UBB Mining: Finding Unexpected Browsing Behaviour in Clickstream Data to Improve a Website's Design

1
UBB Mining: Finding Unexpected Browsing Behaviour
in Clickstream Data to Improve a Website's Design
  • I-Hsien Ting, Chris Kimble, Daniel Kudenko
  • Department of Computer Science, The University of
    York, United Kingdom
  • {derrick, kimble, kudenko}@cs.york.ac.uk

The 2005 IEEE/WIC/ACM International Conference on
Web Intelligence, 19-22 September 2005
2
Introduction
  • Web site design is the most important success
    factor for a website, especially in E-commerce
  • It is valuable to analyse browsing behaviour and
    apply the results to improve the website design
  • Most research that uses web usage mining
    techniques to discover users' browsing behaviour
    is based on the Direct Method
  • This method processes the Clickstream data
    directly to find 'an interesting pattern'.
  • However, the same pattern of users' browsing
    behaviour can have very different meanings on
    different websites.

3
Introduction Cont.
  • UBB Mining: Finding Unexpected Browsing Behaviour
    in Clickstream Data
  • The designer of the site defines patterns of
    'expected browsing behaviour'
  • Browsing routes that do not match the expected
    route are then identified as unexpected browsing
    behaviour
  • The website designer can then use these to find
    the reason why this behaviour occurs and act
    accordingly
  • We consider UBB mining to be a form of data
    mining, since it discovers regularities (or
    actually irregularities) in the Clickstream
    data.
  • The search space for these regularities is
    restricted by additional knowledge, in our case
    the expected browsing route

4
Outline
  • Introduction
  • Discovering users' browsing behaviour
  • Different Viewpoints of Browsing Patterns
  • UBB Mining
  • The process of UBB Mining
  • The techniques of UBB Mining
  • Expected route definition
  • Browsing route segmentation
  • UBB discovery
  • Experiment Results
  • Conclusion

5
Discovering Users' Browsing Behaviours
  • Visualisation technique
  • Presents the users' browsing history as a
    visualised graph or map (Canter et al. 1985,
    Domel 1994, Ting et al. 2004)
  • Advantage
  • The results are easy to read and can be
    understood by eye
  • Disadvantage
  • It is not robust enough to deal with a large
    amount of complex Clickstream data

6
Discovering Users' Browsing Behaviours Cont.
  • Web usage mining
  • A clustering algorithm can group the users into
    suitable clusters according to their browsing
    behaviour
  • An association rule algorithm can discover the
    relationships between different users' browsing
    routes
  • Advantage
  • A large amount of data can be processed very
    efficiently
  • Disadvantage
  • The results produced by this kind of tool are
    sometimes difficult to interpret and explain.

7
Different Viewpoints of Browsing Patterns
[Figure: the Upstairs pattern and the Fingers pattern]
  • Upstairs Pattern
  • Viewpoint 1: Browsing smoothly
  • Viewpoint 2: Doesn't follow the website's
    structure
  • Fingers Pattern
  • Viewpoint 1: Falling into a browsing loop. The
    user got lost.
  • Viewpoint 2: The user follows the website's
    structure. Browsing very well!

8
Different Viewpoints of Browsing Patterns Cont.
  • Traditional web usage mining techniques are also
    problematic.
  • The results are not always easy to explain and
    sometimes a designer cannot identify a problem
  • UBB mining
  • Based on the website designer's viewpoint
  • The website designer can see how their website
    is used
  • It will be easier for them to identify instances
    of unexpected browsing behaviour
  • The results should then be of direct help to the
    website designer when reviewing or redesigning
    their site

9
UBB Mining
  • The Process of UBB Mining

10
UBB Mining: Data Pre-processing and Data
Restoration
  • The raw server-side Clickstream data must be
    pre-processed to remove noisy, incomplete or
    irrelevant data before it is used for web usage
    mining.
  • Clickstream data created by bots needs to be
    removed to ensure the data really comes from a
    'user' (Tan and Kumar 2000).
  • User and session identification are needed to
    distinguish the Clickstream data of different
    users and to divide each user's browsing history
    into a number of distinct sessions (Srivastava
    et al. 2000).
  • Some data is lost due to caching. This lost data
    must be restored to make sure the user's browsing
    pattern is as correct and complete as possible
    (Ting et al. 2005).
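
The bot-removal and session-identification steps above can be sketched roughly as follows. This is only an illustration, not the method of the cited papers: the record layout, the `BOT_MARKERS` substrings and the 30-minute `SESSION_TIMEOUT` are all assumptions, and the restoration of cached page views is not covered.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cut-off, not taken from the slides
BOT_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent substrings


def is_bot(user_agent):
    """Crude bot check based on assumed substrings of the user-agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page, user_agent) records into sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page, user_agent in records:
        if not is_bot(user_agent):
            by_user[user_id].append((timestamp, page))

    sessions = []
    for user_id, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user_id, current))
                current = []
            current.append(page)
        sessions.append((user_id, current))
    return sessions


records = [
    ("u1", 0, "index", "Mozilla/5.0"),
    ("u1", 60, "product_index", "Mozilla/5.0"),
    ("u1", 4000, "index", "Mozilla/5.0"),
    ("b1", 10, "index", "Googlebot/2.1"),
]
print(sessionize(records))
```

Each resulting session is an ordered list of pages, which is the form of browsing route used in the rest of the slides.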

11
Expected Route: Common Subsequence
  • Common Subsequence (CS)
  • Given two sequences X and Y, if Z is a
    subsequence of both X and Y, we say that Z is a
    common subsequence of X and Y (Banerjee, 2001)
  • X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
  • The common subsequences of X and Y are
  • CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
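
As a quick check of the example above, the common subsequences of two short sequences can be enumerated by brute force. This is a minimal sketch for illustration only (the function names are made up here, and the approach is exponential in the sequence length, so it is not suitable for real Clickstream data).

```python
from itertools import combinations


def subsequences(seq):
    """Return the set of all non-empty subsequences of seq, as tuples."""
    return {
        sub
        for r in range(1, len(seq) + 1)
        for sub in combinations(seq, r)  # combinations preserves order
    }


def common_subsequences(x, y):
    """Subsequences that occur in both x and y."""
    return subsequences(x) & subsequences(y)


X = list("abcdefg")
Y = list("bdehi")
cs = common_subsequences(X, Y)
for sub in sorted(cs, key=lambda s: (len(s), s)):
    print(",".join(sub))
```

For X and Y this yields the seven common subsequences listed on the slide.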

12
Expected Route: Longest Common Subsequence
  • Longest Common Subsequence (LCS)
  • An LCS is a common subsequence of maximum length
    (not necessarily unique)
  • X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
  • CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
  • The LCS of X and Y is LCS = <b,d,e>
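
For longer sequences the LCS is normally computed with the standard dynamic-programming table rather than by enumeration. The sketch below uses that textbook approach (it is not taken from the slides) and returns one LCS, since there may be several of the same length.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    # table[i][j] = length of the LCS of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


print(lcs(list("abcdefg"), list("bdehi")))  # ['b', 'd', 'e']
```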

13
Expected Route: Continuous Common Subsequence
  • Continuous Common Subsequence (CCS)
  • A CCS is a special instance of a CS
  • Users' browsing behaviour is continuous
  • Browsing behaviour can be very different when the
    browsing sequences of two users are not identical
  • Consider the two browsing sequences
    A = <a,b,a,c,a,d,a> and B = <a,b,c,d>
  • Both sequences match the LCS = <a,b,c,d>, so by
    that measure they are very similar
  • Yet their browsing behaviours are quite different

14
Expected Route: Continuous Common
Subsequence Cont.
  • Continuous Common Subsequence (CCS)
  • For X = <a,b,c,d,e,f,g> and Y = <b,d,e,h,i>
  • CS = {<b>, <d>, <e>, <b,d>, <d,e>, <b,e>, <b,d,e>}
  • A CCS is a CS with no in-between nodes
    in the original sequences
  • The CCS of the two sequences X and Y above is
    d→e
  • A CCS can be divided into many sub-CCSs
  • E.g. given a CCS A = a→b→c→d→e
  • it can be divided into A1 = a→b→c→d→e or
    A2 = a→b→c→d→e or A3 = a→b→c→d→e, etc.
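
One way to make the difference between an LCS and a CCS concrete is to compute the longest run of pages that appears without gaps in both sequences. The function below is a sketch under that reading of "no in-between nodes"; the name `longest_ccs` is made up for this example, and it returns only the longest such run rather than every CCS.

```python
def longest_ccs(x, y):
    """Return one longest run of pages that is contiguous in both x and y."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common run ending at the previous page of x
    #           and at page j of y
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run by one page
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(x[best_end - best_len:best_end])


# The slide's example: only d and e are adjacent in both sequences.
print(longest_ccs(list("abcdefg"), list("bdehi")))  # ['d', 'e']

# Two routes with the same LCS <a,b,c,d> share only a short contiguous run.
print(longest_ccs(list("abacada"), list("abcd")))   # ['a', 'b']
```

The second call shows why contiguity matters: the two routes look alike under the LCS, yet only the first two page views were actually browsed in the same way.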

15
Expected Route: Continuous Common
Subsequence Cont.
  • The Expected Route
  • In this paper, the concept of a CCS is used to
    define the expected route
  • The expected route is predefined, usually by the
    website designer or the person who has overall
    responsibility for site content
  • e.g. marketing manager, website owner, or website
    manager

16
Expected Route: Continuous Common
Subsequence Cont.
  • A predefined expected route
  • ER = R1<x1→x2> → F1<p, a> → R2<x3→x4>
  • R1<x1→x2> is a restricted subsequence, i.e. it
    must be an exact match
  • F1<p, a> is a flexible subsequence; it can be
    viewed as a pattern matching a set of possible
    CCSs
  • The p in F1 represents the number of pages
  • The a represents the attributes of the pages
    (e.g. information, product, job, etc.)
  • Time can also be represented in the flexible
    subsequence as t. However, this is not
    discussed in this paper
  • ER = R1<index→product_index> → F1<p1, a1> →
    R2<cart→checkout>

17
UBB Mining: Browsing Route Segmentation
  • ER = R1<index→product_index> → F1<p1, a1> →
    R2<cart→checkout>
  • UR = index→product_index→product1→product2→
    service→cart→checkout
  • Segmentation: checking whether UR matches ER
  • Step 1: Matching each restricted route in UR
  • Matching R1 in UR
  • UR = R1<index→product_index> → product1→product2→
    service→cart→checkout
  • If no subsequence matches R1, classify UR
    as a UBB sequence
  • Matching R2 in UR
  • UR = R1<index→product_index> → product1→product2→
    service → R2<cart→checkout>
  • If no subsequence matches R2, classify UR
    as a UBB sequence
  • Step 2: Grouping the remaining continuous routes
    between the segmented restricted routes as
    flexible routes
  • UR = R1<index→product_index> →
    F1<product1→product2→service> → R2<cart→checkout>
  • Continue analysis
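
The two segmentation steps above might be sketched as follows. This is an illustrative reading, not the authors' algorithm: the function names are hypothetical, restricted routes are matched left to right at their first contiguous occurrence, and pages before the first or after the last restricted route are simply ignored (the slides do not say how these should be treated).

```python
def find_run(route, pattern, start):
    """Index of the first contiguous occurrence of pattern in route
    at or after position start, or -1 if there is none."""
    for i in range(start, len(route) - len(pattern) + 1):
        if route[i:i + len(pattern)] == pattern:
            return i
    return -1


def segment(user_route, restricted_routes):
    """Split user_route around the restricted routes, in order.

    Returns a list of ("R", pages) and ("F", pages) segments, or None when
    a restricted route cannot be matched (the route is then a UBB).
    """
    segments = []
    pos = 0
    for k, pattern in enumerate(restricted_routes):
        hit = find_run(user_route, pattern, pos)  # Step 1: exact match
        if hit == -1:
            return None
        if k > 0:
            # Step 2: pages between two restricted routes form a flexible route
            segments.append(("F", user_route[pos:hit]))
        segments.append(("R", pattern))
        pos = hit + len(pattern)
    return segments


R1 = ["index", "product_index"]
R2 = ["cart", "checkout"]
UR = ["index", "product_index", "product1", "product2",
      "service", "cart", "checkout"]

for kind, pages in segment(UR, [R1, R2]):
    print(kind, "→".join(pages))
```

For the example UR this produces R1, then a flexible route of three pages, then R2, which is the segmented form shown on the slide.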

18
UBB Mining: Unexpected Browsing Behaviour
Discovery
  • ER = R1<index→product_index> → F1<p1, a1> →
    R2<cart→checkout>
  • UR = R1<index→product_index> →
    F1<product1→product2→service> → R2<cart→checkout>
  • When p=2 and a=any in F1<p, a>, then
  • Unexpected Browsing Behaviour
  • When p=3 and a=any in F1<p, a>, then
  • Expected Browsing Behaviour
  • When p=3 and a=product in F1<p, a>, then
  • Unexpected Browsing Behaviour
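
The three cases above can be reproduced with a small check on the flexible route. The sketch assumes that p is an exact page count and that every page carries a single attribute; the `PAGE_ATTRIBUTE` table and the attribute given to the service page are invented for this example, since the slides do not say how attributes are stored.

```python
# Assumed page-to-attribute lookup (hypothetical values for illustration).
PAGE_ATTRIBUTE = {
    "product1": "product",
    "product2": "product",
    "service": "information",
}


def is_expected(flexible_pages, p, a="any"):
    """True when the flexible route has exactly p pages and, unless a is
    "any", every page has attribute a."""
    if len(flexible_pages) != p:
        return False
    if a == "any":
        return True
    return all(PAGE_ATTRIBUTE.get(page) == a for page in flexible_pages)


f1 = ["product1", "product2", "service"]

for p, a in [(2, "any"), (3, "any"), (3, "product")]:
    verdict = "expected" if is_expected(f1, p, a) else "unexpected"
    print(f"p={p}, a={a}: {verdict}")
```

Under these assumptions only the second case is classified as expected, matching the outcomes listed on the slide.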

19
Experiment Results
  • A website for a module of a university degree
    course was selected as a testbed
  • A number of predefined expected routes were
    produced based on the website designer's
    expectation of the students' use of the site
  • ER1 = R1<MIS>
  • ER2 = R1<MIS→MIS_overview> → F1<p=1, a=lecture>
  • ER3 = R1<MIS→MIS_overview→lecture_i> →
    F1<p=3, a=lecture_i_related> →
    R2<MIS_overview→lecture_(i+1)>

20
Experiment Results Cont.
Table 1: Discovered expected browsing routes (out
of 386 sessions)
21
Experiment Results Cont.
Table 2: Discovered UBBs (out of 386 sessions)
22
Performance Evaluation
23
Conclusion
  • We proposed a web usage mining approach called
    UBB mining
  • Based on detecting deviations from predefined
    routes
  • UBB mining is a sequential mining technique based
    on the concept of a continuous common
    subsequence (CCS)
  • UBB mining includes two algorithms:
    the segmentation algorithm and the UBB discovery
    algorithm
  • A website designer can discover interesting,
    unexpected user browsing behaviours

24
Future Research
  • Our future research will move in two directions
  • The first concerns the way in which the expected
    routes are created
  • With larger sites and different designers,
    defining expected routes could be more of a
    problem
  • We are currently working on a tool that will
    allow a designer to browse a site and "record" an
    expected route
  • We now want to move from the pattern discovery
    and analysis step to the recommendation and
    action step
  • We will then be able to focus on how to apply the
    results discovered by UBB mining to
    improve a website's design

25
Thanks for Your Attention. Any Questions?
  • I-Hsien Ting, Chris Kimble, Daniel Kudenko
  • Department of Computer Science
  • The University of York, United Kingdom
  • {derrick, kimble, kudenko}@cs.york.ac.uk