Title: UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data to Improve a Website's Design
1 UBB Mining: Finding Unexpected Browsing Behaviour
in Clickstream Data to Improve a Website's Design
- I-Hsien Ting, Chris Kimble, Daniel Kudenko
- Department of Computer Science, The University of York, United Kingdom
- derrick, kimble, kudenko_at_cs.york.ac.uk
The 2005 IEEE/WIC/ACM International Conference on Web Intelligence, 19-22 September 2005
2 Introduction
- Website design is the most important success factor for a website, especially in e-commerce
- It is valuable to analyse browsing behaviour and apply the results to improve the website design
- Most research that uses web usage mining techniques to discover users' browsing behaviour is based on the Direct Method
- This method processes the Clickstream data directly to find 'an interesting pattern'
- However, patterns of users' browsing behaviour can have very different meanings on different websites
3 Introduction Cont.
- UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data
- The designer of the site defines patterns of 'expected browsing behaviour'
- Browsing routes that do not match the expected route are then identified as unexpected browsing behaviour
- The website designer can then use these to find the reason why this behaviour occurs and act accordingly
- We consider UBB mining to be a form of data mining, since it discovers regularities (or actually irregularities) in the Clickstream data
- The search space for these regularities is restricted by additional knowledge, in our case the expected browsing route
4 Outline
- Introduction
- Discovering users' browsing behaviour
- Different Viewpoints of Browsing Patterns
- UBB Mining
- The process of UBB Mining
- The techniques of UBB Mining
- Expected route definition
- Browsing route segmentation
- UBB discovery
- Experiment Results
- Conclusion
5 Discovering Users' Browsing Behaviours
- Visualisation technique
  - Presents the users' browsing history in a visualised graph or map (Canter et al. 1985, Domel 1994, Ting et al. 2004)
- Advantage
  - The results are easy to read and to understand by eye
- Disadvantage
  - It is not robust enough to deal with a large amount of complex Clickstream data
6 Discovering Users' Browsing Behaviours Cont.
- Web usage mining
  - A clustering algorithm can group the users into suitable clusters according to their browsing behaviour
  - An association rule algorithm can discover the relationship between different users' browsing routes
- Advantage
  - A large amount of data can be processed very efficiently
- Disadvantage
  - The results produced by this kind of tool are sometimes difficult to interpret and explain
7 Different Viewpoints of Browsing Patterns
[Figures: Upstairs Pattern; Fingers Pattern]
- Upstairs Pattern
  - Viewpoint 1: Browsing smoothly
  - Viewpoint 2: Doesn't follow the website's structure
- Fingers Pattern
  - Viewpoint 1: Falling into a browsing loop. The user got lost.
  - Viewpoint 2: The user follows the website's structure. Browsing very well!
8 Different Viewpoints of Browsing Patterns Cont.
- Traditional web usage mining techniques are also problematic
  - The results are not always easy to explain and sometimes a designer cannot identify a problem
- UBB mining
  - Based on the website designer's viewpoint
  - The website designer can get information about how their website is used
  - It will be easier for them to identify instances of unexpected browsing behaviour
  - The results should then be of direct help to the website designer when reviewing or redesigning their site
9 UBB Mining
- The Process of UBB Mining
10 UBB Mining: Data Pre-processing and Data Restoration
- The raw server-side Clickstream data must be pre-processed to remove noisy, incomplete or irrelevant data before it is used for web usage mining
- Clickstream data created by bots needs to be removed to ensure the data really comes from a 'user' (Tan and Kumar 2000)
- User and session identification is needed to distinguish the Clickstream data of different users and to divide each user's browsing history into a number of distinct sessions (Srivastava et al. 2000)
- Some data is lost due to caching. This lost data must be restored to make sure the users' browsing pattern is as correct and complete as possible (Ting et al. 2005)
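The session identification step can be illustrated with a minimal sketch. The slides do not say how sessions are delimited, so the 30-minute inactivity timeout, the `(user_id, timestamp, page)` record layout and the function name `split_sessions` are all assumptions made for illustration; user identification is taken as already done.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cut-off, not taken from the slides


def split_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page) hits into per-user sessions.

    A new session starts whenever the gap between two consecutive hits
    of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page in hits:
        by_user[user_id].append((timestamp, page))

    sessions = []
    for user_id, user_hits in by_user.items():
        user_hits.sort()  # order each user's hits by time
        current = []
        last_time = None
        for timestamp, page in user_hits:
            if last_time is not None and timestamp - last_time > timeout:
                sessions.append((user_id, current))
                current = []
            current.append(page)
            last_time = timestamp
        if current:
            sessions.append((user_id, current))
    return sessions


# Hypothetical log records: timestamps are seconds since an arbitrary start.
hits = [
    ("u1", 0, "index"),
    ("u1", 60, "product_index"),
    ("u2", 30, "index"),
    ("u1", 4000, "index"),  # more than 30 minutes after the previous u1 hit
]
sessions = split_sessions(hits)
```

Each resulting session is a page sequence that can serve as a user route in the later steps. Bot removal and the restoration of cached pages are not covered by this sketch.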
11 Expected Route: Common Subsequence
- Common Subsequence (CS)
- Given two sequences X and Y, if Z is a subsequence of both X and Y, we say that Z is a common subsequence of X and Y (Banerjee, 2001)
- X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
- The common subsequences of X and Y are
- CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
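The definition can be checked by brute force on the example sequences. This is only a sketch: the function names are hypothetical, and enumerating every subsequence is exponential in the sequence length, so it is suitable for short illustrative sequences only.

```python
from itertools import combinations


def subsequences(seq):
    """Return the set of all non-empty subsequences of seq (order preserved)."""
    result = set()
    for length in range(1, len(seq) + 1):
        for indices in combinations(range(len(seq)), length):
            result.add(tuple(seq[i] for i in indices))
    return result


def common_subsequences(x, y):
    """Return every non-empty subsequence shared by x and y."""
    return subsequences(x) & subsequences(y)


X = list("abcdefg")
Y = list("bdehi")
cs = common_subsequences(X, Y)

# Print the shared subsequences, shortest first.
for z in sorted(cs, key=lambda s: (len(s), s)):
    print(",".join(z))
```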
12 Expected Route: Longest Common Subsequence
- Longest Common Subsequence (LCS)
- An LCS is a common subsequence of maximum length (not necessarily unique)
- X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
- CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
- The LCS of X and Y is LCS = <b,d,e>
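An LCS is usually computed with the textbook dynamic-programming table rather than by enumeration. The sketch below uses that standard technique; the slides do not state which algorithm the authors used, and the function name `lcs` is an assumption.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    # table[i][j] = length of an LCS of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover one LCS.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


X = list("abcdefg")
Y = list("bdehi")
print(lcs(X, Y))
```

Because an LCS is not necessarily unique, the function returns only one of the possible answers when several exist.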
13 Expected Route: Continuous Common Subsequence
- Continuous Common Subsequence (CCS)
- A CCS is a special instance of a CS
- Users' browsing behaviour is a continuous behaviour
- Two users' browsing behaviour can be very different if their browsing sequences are not identical
- Consider the two browsing sequences A = <a,b,a,c,a,d,a> and B = <a,b,c,d>
- Both sequences match the LCS <a,b,c,d>, so they appear very similar
- However, their browsing behaviour is quite different
14 Expected Route: Continuous Common Subsequence Cont.
- Continuous Common Subsequence (CCS)
- For X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
- CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
- A CCS is a CS whose elements have no in-between nodes in the original sequences
- The CCS of the two sequences X and Y above is d→e
- A CCS can be divided into many sub-CCSs
- E.g. there is a CCS A = a→b→c→d→e
- It can then be divided into A1 = a→b→c→d→e or A2 = a→b→c→d→e or A3 = a→b→c→d→e, etc.
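The difference between a CS and a CCS can be made concrete with a short sketch. It assumes that the CCS reported for X and Y is the longest run of pages that appears contiguously in both sequences; the function name `longest_ccs` is hypothetical and the slides do not give the authors' own algorithm.

```python
def longest_ccs(x, y):
    """Return one longest run of pages that is contiguous in both x and y."""
    best_len, best_end = 0, 0
    # prev holds, for the previous element of x, the length of the common
    # contiguous run ending at each position of y.
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending just before
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return x[best_end - best_len:best_end]


X = list("abcdefg")
Y = list("bdehi")
print("→".join(longest_ccs(X, Y)))

A = list("abacada")
B = list("abcd")
print("→".join(longest_ccs(A, B)))
```

For A and B the longest contiguous match is much shorter than their LCS, which reflects the point that the two users browse quite differently even though their LCS is <a,b,c,d>.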
15 Expected Route: Continuous Common Subsequence Cont.
- The Expected Route
- In this paper, the concept of a CCS is used to define the expected route
- The expected route is predefined, usually by the website designer or the person who has overall responsibility for site content
- e.g. marketing manager, website owner, or website manager
16 Expected Route: Continuous Common Subsequence Cont.
- A predefined expected route
- ER = R1<x1→x2> → F1<p,a> → R2<x3→x4>
- R1<x1→x2> is a restricted subsequence, i.e. it must be an exact match
- F1<p,a> is a flexible subsequence; it can be viewed as a pattern that matches a set of possible CCSs
- The p in F1 represents the number of pages
- The a represents the attributes of the pages (e.g. information, product, job, etc.)
- Time can also be represented in the flexible subsequence as t. However, this will not be discussed in this paper
- ER = R1<index→product_index> → F1<p1,a1> → R2<cart→checkout>
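One way to hold such an expected route in a program is as a list of restricted and flexible segments. The sketch below is an illustration only: the class names `Restricted` and `Flexible`, the use of `None` for "any" attribute, and the concrete values given to p and a are assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Restricted:
    """R segment: pages that must appear exactly, in this order."""
    pages: Tuple[str, ...]


@dataclass(frozen=True)
class Flexible:
    """F segment: a pattern over the pages between two restricted segments."""
    p: int                   # number of pages
    a: Optional[str] = None  # page attribute; None stands for "any" (assumed)


ExpectedRoute = List[Union[Restricted, Flexible]]

# The shopping example, with illustrative values chosen for p and a.
er: ExpectedRoute = [
    Restricted(("index", "product_index")),
    Flexible(p=3, a="product"),
    Restricted(("cart", "checkout")),
]
```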
17 UBB Mining: Browsing Route Segmentation
- ER = R1<index→product_index> → F1<p1,a1> → R2<cart→checkout>
- UR = index→product_index→product1→product2→service→cart→checkout
- Segmentation: checking whether UR matches ER
- Step 1: Match each restricted route in UR
  - Match R1 in UR
  - UR = R1<index→product_index>→product1→product2→service→cart→checkout
  - If no subsequence matches R1, classify the sequence as UBB
  - Match R2 in UR
  - UR = R1<index→product_index>→product1→product2→service→R2<cart→checkout>
  - If no subsequence matches R2, classify the sequence as UBB
- Step 2: Group the remaining continuous routes between the segmented restricted routes as flexible routes
  - UR = R1<index→product_index> → F1<product1→product2→service> → R2<cart→checkout>
  - Continue analysis
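The two segmentation steps can be sketched as follows. This is not the authors' algorithm, only a minimal reading of the slide under stated assumptions: exactly two restricted routes, the first occurrence of R1 is used, R2 is searched for only after R1, and pages before R1 or after R2 are ignored. The names `find_run` and `segment` are hypothetical.

```python
def find_run(route, pages, start=0):
    """Index of the first contiguous occurrence of pages in route at or after start, else -1."""
    n = len(pages)
    for i in range(start, len(route) - n + 1):
        if route[i:i + n] == pages:
            return i
    return -1


def segment(ur, r1, r2):
    """Split a user route into (R1, F1, R2).

    Returns None when R1 or R2 cannot be matched, i.e. the route is
    classified as unexpected browsing behaviour at this stage.
    """
    i1 = find_run(ur, r1)                # Step 1: match R1
    if i1 == -1:
        return None
    after_r1 = i1 + len(r1)
    i2 = find_run(ur, r2, after_r1)      # Step 1: match R2 after R1
    if i2 == -1:
        return None
    # Step 2: everything between the two restricted routes becomes F1.
    return ur[i1:after_r1], ur[after_r1:i2], ur[i2:i2 + len(r2)]


UR = ["index", "product_index", "product1", "product2",
      "service", "cart", "checkout"]
R1 = ["index", "product_index"]
R2 = ["cart", "checkout"]
print(segment(UR, R1, R2))
```

A route that never reaches cart→checkout, for example, makes `segment` return `None`, which corresponds to the "classify as UBB" branches above.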
18 UBB Mining: Unexpected Browsing Behaviour Discovery
- ER = R1<index→product_index> → F1<p1,a1> → R2<cart→checkout>
- UR = R1<index→product_index> → F1<product1→product2→service> → R2<cart→checkout>
- When p = 2 and a = any in F1<p,a>, then
  - Unexpected Browsing Behaviour
- When p = 3 and a = any in F1<p,a>, then
  - Expected Browsing Behaviour
- When p = 3 and a = product in F1<p,a>, then
  - Unexpected Browsing Behaviour
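The three cases can be reproduced with a small check on the segmented flexible route. The sketch makes two assumptions that the slides do not settle: p is treated as an exact page count, and every page in the flexible route must carry attribute a unless a is "any". The page-to-attribute mapping, including the attribute given to the service page, and the function name `is_expected` are hypothetical.

```python
def is_expected(flexible_pages, p, a, attribute_of):
    """Check a segmented flexible route against the pattern F<p, a>."""
    if len(flexible_pages) != p:  # assumption: p is an exact page count
        return False
    if a == "any":
        return True
    # assumption: every page in the flexible route must have attribute a
    return all(attribute_of.get(page) == a for page in flexible_pages)


# Hypothetical page attributes for the example site.
attribute_of = {
    "product1": "product",
    "product2": "product",
    "service": "service",
}
f1 = ["product1", "product2", "service"]

for p, a in [(2, "any"), (3, "any"), (3, "product")]:
    label = "Expected" if is_expected(f1, p, a, attribute_of) else "Unexpected"
    print(f"p = {p}, a = {a}: {label} Browsing Behaviour")
```

If p were instead meant as an upper bound on the number of pages, only the length test would change; the three outcomes listed above would be the same for this route.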
19 Experiment Results
- A website for a module of a university degree course was selected as a testbed
- A number of predefined expected routes were produced based on the website designer's expectation of the students' use of the site
- ER1 = R1<MIS>
- ER2 = R1<MIS→MIS_overview> → F1<p=1, a=lecture>
- ER3 = R1<MIS→MIS_overview→lecture_i> → F1<p=3, a=lecture_i_related> → R2<MIS_overview→lecture_i+1>
20 Experiment Results Cont.
Table 1: Discovered expected browsing routes (out of 386 sessions)
21 Experiment Results Cont.
Table 2: Discovered UBBs (out of 386 sessions)
22 Performance Evaluation
23 Conclusion
- We proposed a web usage mining approach called UBB mining
  - Based on detecting deviations from predefined routes
- UBB mining is a sequential mining technique based on the concept of a continuous common subsequence (CCS)
- UBB mining comprises two algorithms: the segmentation algorithm and the UBB discovery algorithm
- A website designer can discover interesting, unexpected browsing behaviours of users
24 Future Research
- Our future research will move in two directions
- The first concerns the way in which the expected routes are created
  - With larger sites and different designers, defining expected routes could be more of a problem
  - We are currently working on a tool that will allow a designer to browse a site and "record" an expected route
- The second is to move from the pattern discovery and analysis step to the recommendation and action step
  - We will then be able to focus on how to apply the results discovered by UBB mining to improve a website's design
25 Thanks for Your Attention. Any Questions?
- I-Hsien Ting, Chris Kimble, Daniel Kudenko
- Department of Computer Science
- The University of York, United Kingdom
- derrick, kimble, kudenko_at_cs.york.ac.uk