Title: Mapping Visitors Behavior to Business Goals through Click Stream Analysis
1 Mapping Visitors Behavior to Business Goals through Click Stream Analysis
- Prof. Vishnuprasad Nagadevara
- Indian Institute of Management Bangalore
- nagadev_at_iimb.ernet.in
2 Definition
- Web Analytics as defined by Web Analytics Association: Web Analytics is the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.
- Clickstream as defined by Internet Advertising Bureau (IAB): The electronic path a user takes while navigating from site to site, and from page to page within a site. It is a comprehensive body of data describing the sequence of activity between a user's browser and any other Internet resource, such as a Web site or third party ad server.
- http://www.webanalyticsassociation.org/aboutus/
3 Information from Web Analytics
- How many visitors visit the page daily?
- Who are the regular visitors?
- What percentage of the visitors to the page are registered users?
- What are the top pages visited on the website?
- What is the average visit time on the website?
- How often does the visitor return to the site?
- What is the average page depth of a visitor?
- What is the geographic distribution of users of the website?
4 Measures
- Clicks: The interaction between the user and the web server is measured by the click of a mouse.
- Visits: The number of times a user visits a specific web site. Every new session is counted as a new visit.
- Hits: Total number of server requests serviced by the server.
- Exits: Site exits, counted by site inactivity for more than 30 minutes.
- Unique Visitors: A unique user who accesses the site in a specified period of time.
- Repeated Visitor: The average number of times a user returns to a site over a specific time period.
- Page views: The view of any page by the user. A page may contain text, images, and other online elements, may be statically or dynamically generated, and could contain single or multiple frames or screens.
- Sessions: The IAB defines a session as a sequence of Internet activity made by one user at one site. If a user makes no request from a site during a 30-minute period of time, the next content or ad request would then constitute the beginning of a new visit.
- Unique authenticated visitors: A unique visitor who logs on to a site via a registration method using his/her user id and password.
5 Metrics
- Page views per visit: Average number of page views per visit.
- Page views per session: Average number of page views per session.
- Page views per hour/day: Average number of page views per hour/day.
- Clicks per session: Average number of clicks per session.
- Clicks per hour: Average number of clicks per hour.
- Time between clicks: The average duration of time spent between two clicks.
- Hits per hour: Average number of hits to the web server per hour.
- Busy hour of the day: The highest number of hits to the web server in a particular hour of a day.
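As a rough illustration of how a few of these metrics could be computed, the sketch below assumes every click has already been reduced to a (session_id, timestamp) pair. That record shape, the function name summarize_clicks and the sample data are assumptions made for illustration, and clicks stand in for hits when finding the busy hour.

```python
from collections import Counter, defaultdict
from datetime import datetime

def summarize_clicks(clicks):
    """Compute a few of the metrics above.

    clicks: list of (session_id, timestamp) pairs, one per click (assumed shape).
    """
    if not clicks:
        raise ValueError("no clicks to summarize")

    by_session = defaultdict(list)
    for session_id, ts in clicks:
        by_session[session_id].append(ts)

    # Time between clicks: gaps between consecutive clicks within a session.
    gaps = []
    for stamps in by_session.values():
        ordered = sorted(stamps)
        gaps.extend((b - a).total_seconds() for a, b in zip(ordered, ordered[1:]))

    # Clock hours that saw at least one click, keyed by (date, hour).
    active_hours = Counter((ts.date(), ts.hour) for _, ts in clicks)
    # Clicks per hour of the day (0-23), pooled over all days.
    per_hour_of_day = Counter(ts.hour for _, ts in clicks)
    busy_hour, busy_hour_clicks = per_hour_of_day.most_common(1)[0]

    return {
        "clicks_per_session": len(clicks) / len(by_session),
        "clicks_per_active_hour": len(clicks) / len(active_hours),
        "mean_seconds_between_clicks": sum(gaps) / len(gaps) if gaps else None,
        "busy_hour": busy_hour,
        "busy_hour_clicks": busy_hour_clicks,
    }

# Hypothetical sample: three sessions on one day.
day = datetime(2008, 5, 23)
sample = [
    ("a", day.replace(hour=0, minute=59, second=30)),
    ("a", day.replace(hour=1, minute=0, second=0)),
    ("a", day.replace(hour=1, minute=1, second=0)),
    ("b", day.replace(hour=1, minute=5, second=0)),
    ("b", day.replace(hour=1, minute=5, second=30)),
    ("c", day.replace(hour=1, minute=30, second=0)),
]
print(summarize_clicks(sample))
```

Averaging clicks over active hours only is one possible reading of "per hour"; dividing by the full length of the observation window would be an equally reasonable choice.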
6 Implementing Web Analytics
- Define your business objectives.
- Define the KPIs that are important for your business based on objectives and goals of business.
- Identify the data that needs to be collected.
- Identify the process to collect the data.
- Prepare the data, analyze and interpret the data.
- Design and implement the plan of action.
- Monitor the data for continuous feedback.
7 Objectives of the Study
- The objectives of this study are to:
- Explore Web analytics and its usefulness to web based business.
- Identify the techniques used in click stream analysis.
- Identify the application of click stream analysis through analyzing click stream data obtained from a particular website using appropriate click stream analysis techniques.
8 Methodology
- This study analyzes the click stream data obtained from a web site, which specializes in an online information exchange service to facilitate identification of suitable partners, in India and other countries.
- The site has a very different revenue model. The visitors are allowed to browse through the site without any initial payment. The visitors are allowed to look at the profiles of prospective partners free of charge. The visitors will have to become members by making a one-time payment only when they need to contact the prospective brides or grooms.
- Users can search for profiles through advanced search options on the site on various preferences ranging from basic details of preferred partner to lifestyle, career, education, profession etc.
9 Methodology
- Members can make initial contact with each other through services available via Chat, SMS, and e-mail.
- Users can register free of charge on the website and are assured of exclusive privacy and confidentiality. The website allows the users to create their profiles, search for other profiles, express interest in other profiles and contact others. Registration and creating a profile are free of cost.
- Registered users can become paid members, which allows them to contact others, view contact details of other members, write personalized messages, initiate chats and let other members view their contact details. Paid memberships are provided for a specified duration.
10 Methodology
- The click stream data is analyzed to identify different paths taken by the visitors and the sequence of pages that lead to payment of membership fee. Based on this analysis, specific strategies are recommended to maximize the revenue for the website.
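One way such a path analysis could be sketched is to count the pages viewed immediately before the first payment page in each session. The code below is only an illustration: it assumes sessions are already available as ordered page lists, and it treats /billing/billing.php as the payment page, which is a guess based on the sample records rather than something the study states. The function name paths_to_payment is likewise hypothetical.

```python
from collections import Counter

# Assumption: this page marks the payment step; the study does not say so.
PAYMENT_PAGE = "/billing/billing.php"

def paths_to_payment(sessions, steps=2, payment_page=PAYMENT_PAGE):
    """Count the `steps` pages viewed just before the first payment page.

    sessions: list of page lists, each in click order (assumed shape).
    """
    counts = Counter()
    for pages in sessions:
        if payment_page not in pages:
            continue  # session never reached the payment page
        idx = pages.index(payment_page)
        lead_in = tuple(pages[max(0, idx - steps):idx])
        if lead_in:
            counts[lead_in] += 1
    return counts

# Hypothetical sessions built from page names that appear in the sample log.
sample_sessions = [
    ["/P/search.php", "/profile/mainmenu.php", "/billing/billing.php"],
    ["/profile/mainmenu.php", "/P/search.php", "/profile/mainmenu.php",
     "/billing/billing.php"],
    ["/P/search.php", "/profile/mainmenu.php"],
]
for path, n in paths_to_payment(sample_sessions).most_common():
    print(n, " -> ".join(path))
```

The most frequent lead-in paths are candidates for the page sequences that tend to end in a membership payment.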
11 DATA PREPARATION
- Problem (format of data): Clickstream data files are neither delimited nor fixed-length files.
- Solution: The date in the clickstream was used as the delimiter to import the data into a database. String handling in the database is then needed to separate out the fields.
10.208.65.96 172.16.8.37, 124.124.35.130 - - [23/May/2008:00:00:00 -0400] "GET /billing/billing.php?usercid22401528da14a61c43512fa025b59578i353273 HTTP/1.0" 200 1832
10.208.65.96 68.126.193.219 - - [23/May/2008:00:00:00 -0400] "GET /profile/js/common.js HTTP/1.1" 200 12462
10.208.65.96 59.95.71.32 - - [23/May/2008:00:00:00 -0400] "GET /P/css/comm_style.css HTTP/1.1" 200 2640
10.208.65.96 122.163.70.145 - - [23/May/2008:00:00:00 -0400] "GET /P/search.php?checksumsearchchecksum16465054j300newsearchinf_checksumcastemappingcrmbacksearchorderTlabel_select_nosavesearchfrom_indexviewallsave_search_redirecthide_search_bary HTTP/1.1" 200 21561
10.208.65.96 61.1.81.153 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.1" 304 26
10.208.65.96 68.197.236.117 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum3590208069017f9d75933dfa9ac9005di537f26ca181f05c308393257397ab261i2810388 HTTP/1.1" 200 3333
10.208.65.96 172.16.25.60, 59.145.189.43 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.0" 304 26
10.208.65.96 10.232.65.96, 10.232.49.1, 203.126.136.220 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum HTTP/1.1" 200 3329
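The study did this splitting inside a database. Purely as an illustration of the same idea, the sketch below uses the date as the delimiter in Python. It assumes the timestamp appears in the usual bracketed web server log form, such as [23/May/2008:00:00:00 -0400], and that whitespace separates the byte count of one record from the server IP of the next. The names TIMESTAMP, BEFORE, AFTER and split_records are hypothetical.

```python
import re

# Assumed timestamp shape, e.g. [23/May/2008:00:00:00 -0400]
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")
# Text expected just before a timestamp: server IP, client IP list, then "- -"
BEFORE = re.compile(r"\s*(\S+)\s+(.+?)\s+-\s+-\s*$")
# Text expected just after a timestamp: "METHOD resource PROTOCOL" status bytes
AFTER = re.compile(r'\s*"([^"]*)"\s+(\d{3})\s+(\d+)')

def split_records(stream):
    """Split an undelimited clickstream into records, using the date as delimiter."""
    # With a capture group, re.split keeps the timestamps:
    # [text before ts0, ts0, text between ts0 and ts1, ts1, ..., text after last ts]
    parts = TIMESTAMP.split(stream)
    records = []
    before = parts[0]
    for i in range(1, len(parts), 2):
        timestamp, rest = parts[i], parts[i + 1]
        head = BEFORE.match(before)
        tail = AFTER.match(rest)
        # Whatever follows the byte count belongs to the next record.
        before = rest[tail.end():] if tail else ""
        if head is None or tail is None:
            continue  # skip fragments that do not fit the assumed layout
        records.append({
            "server_ip": head.group(1),
            "client_ips": [ip.strip() for ip in head.group(2).split(",")],
            "timestamp": timestamp,
            "request": tail.group(1),
            "status": int(tail.group(2)),
            "bytes": int(tail.group(3)),
        })
    return records

# Two records run together without a line break (query strings omitted).
stream = (
    '10.208.65.96 172.16.8.37, 124.124.35.130 - - [23/May/2008:00:00:00 -0400] '
    '"GET /billing/billing.php HTTP/1.0" 200 1832 '
    '10.208.65.96 68.126.193.219 - - [23/May/2008:00:00:00 -0400] '
    '"GET /profile/js/common.js HTTP/1.1" 200 12462'
)
for record in split_records(stream):
    print(record)
```

If a byte count is fused directly onto the next server IP, this sketch cannot tell them apart; handling that case would need extra knowledge, such as the fixed server address.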
12 Data
- Data is obtained from the site in the form of click stream records. Each record consists of the details of clicks by the visitors and contains the following details:
- Server IP
- Client IP
- Time stamp with date
- Status: HTTP status code
- URL requested, which has three subfields, namely the request method, the resource requested and the protocol used
- No. of bytes transferred
- The country of origin for a specific request is identified using the IP address.
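The three subfields of the requested URL can be separated with a few lines of code. The sketch below is one possible approach using Python's standard urllib.parse module. The function name parse_request and the query parameters in the example are made up for illustration and do not come from the site's data.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request(request):
    """Split 'METHOD resource PROTOCOL' into its three subfields.

    The resource is further split into the page path and its query parameters.
    """
    method, rest = request.split(" ", 1)
    resource, protocol = rest.rsplit(" ", 1)
    url = urlsplit(resource)
    return {
        "method": method,
        "page": url.path,                # identifies the page browsed
        "params": parse_qs(url.query),   # dict of parameter name -> list of values
        "protocol": protocol,
    }

# Hypothetical request line; the parameter names are illustrative only.
print(parse_request("GET /P/search.php?newsearch=1&label=basic HTTP/1.1"))
```

Keeping the page path separate from the query string makes it easier to treat all requests for the same page as one node when sequencing visitor movement.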
13 Data
- The URL is used to identify the information/web page browsed by the visitors.
- The time stamp of each click is used to sequence the movement of the visitors across different pages in the website.
- Identifying a unique user session is an important step in the analysis of click stream data. Inactivity for more than 30 minutes is considered as a break of session.
- Because no additional identifying data is available, hits from each unique IP are treated as belonging to a single user within a session.
- This is an approximation, since there could be multiple users accessing from the same IP, or the same user accessing from different IPs.
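The session rule described on this slide (one user per IP, with a break after more than 30 minutes of inactivity) could be sketched as follows. The input shape of (client_ip, timestamp, page) tuples and the names sessionize and SESSION_TIMEOUT are assumptions made for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group clicks into sessions, treating each client IP as one user.

    clicks: iterable of (client_ip, timestamp, page) tuples (assumed shape).
    Returns a list of sessions, each {"ip": ..., "clicks": [(timestamp, page), ...]}.
    """
    sessions = []
    latest = {}  # client IP -> index of that IP's most recent session
    for ip, ts, page in sorted(clicks, key=lambda c: c[1]):
        idx = latest.get(ip)
        # Start a new session on first sight of the IP or after a long gap.
        if idx is None or ts - sessions[idx]["clicks"][-1][0] > timeout:
            sessions.append({"ip": ip, "clicks": []})
            idx = latest[ip] = len(sessions) - 1
        sessions[idx]["clicks"].append((ts, page))
    return sessions

# Hypothetical clicks: the 31-minute gap splits the first IP into two sessions.
t0 = datetime(2008, 5, 23, 0, 0, 0)
sample_clicks = [
    ("59.95.71.32", t0, "/P/search.php"),
    ("61.1.81.153", t0 + timedelta(minutes=5), "/profile/mainmenu.php"),
    ("59.95.71.32", t0 + timedelta(minutes=20), "/profile/mainmenu.php"),
    ("59.95.71.32", t0 + timedelta(minutes=51), "/P/search.php"),
]
for session in sessionize(sample_clicks):
    print(session["ip"], [page for _, page in session["clicks"]])
```

A gap of exactly 30 minutes keeps the session open here, since the rule speaks of inactivity for more than 30 minutes.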
14 No. of Sessions
17 Countries By Hour
18 Exit Points
19 Different Pages Accessed
20 Web Diagram Freq 19,000
21 Web Diagram Freq 1,000
22 Associations
23 Summary and Conclusions
- Usage of the website by time of the day: This helps identify the busy hour, and provides information on the server capacity required for the website and on when a maintenance window can be scheduled.
- Usage of the website from different geographic locations: This provides data on the distribution of users across geographical locations.
- Exit screens: These provide information on where the users exit from the website. This input can help redesign the website where it shows which pages are breaking the flow of the user session.
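Exit screens can be tallied directly once sessions are available. The sketch below counts the last page of each session. It assumes sessions are given as ordered page lists, and the function name exit_pages and the sample sessions are illustrative only.

```python
from collections import Counter

def exit_pages(sessions):
    """Count how often each page is the last one viewed in a session.

    sessions: list of page lists, each in click order (assumed shape).
    """
    return Counter(pages[-1] for pages in sessions if pages)

# Hypothetical sessions built from page names that appear in the sample log.
sample_sessions = [
    ["/P/search.php", "/profile/mainmenu.php"],
    ["/profile/mainmenu.php", "/P/search.php", "/profile/mainmenu.php"],
    ["/P/search.php"],
]
for page, exits in exit_pages(sample_sessions).most_common():
    print(page, exits)
```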
24 Summary and Conclusions
- Most accessed and least accessed pages: This can be used for variable pricing of advertisements on the web pages. It can also be used for better user interface design and space utilization, by removing or repositioning the links that are infrequently accessed.
- Associations: These provide information on unique actions on the website and the sequence in which the user has performed these actions. This can be used in better user interface design.
- Web diagrams: These give information on the co-occurrence of actions on the website and their significance, and also provide inputs on user interface design.
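The co-occurrence and association ideas summarized here can be illustrated with simple counting. The sketch below is not the tool used in the study. It assumes sessions are lists of page labels, counts how many sessions contain each pair of pages, keeps the pairs above a minimum frequency, and computes the confidence of a rule "sessions with A also contain B". All names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def page_pair_counts(sessions, min_freq=1):
    """Count sessions in which two pages co-occur, keeping pairs seen >= min_freq times."""
    pair_counts = Counter()
    for pages in sessions:
        # Each unordered pair is counted once per session, in sorted order.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_freq}

def rule_confidence(sessions, antecedent, consequent):
    """Share of sessions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [s for s in sessions if antecedent in s]
    if not with_antecedent:
        return None
    return sum(consequent in s for s in with_antecedent) / len(with_antecedent)

# Hypothetical sessions with short page labels.
sample_sessions = [
    ["search", "profile", "billing"],
    ["search", "profile"],
    ["search"],
    ["profile", "billing"],
]
print(page_pair_counts(sample_sessions, min_freq=2))
print(rule_confidence(sample_sessions, "profile", "billing"))
```

Raising min_freq thins out the pairs that would be drawn as links, so only the strongest co-occurrences remain.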
25
- Questions?
- Suggestions?
- Comments?