Web Usage Mining: a practical study

1
Web Usage Mining: a practical study
  • Huysmans Johan
  • Baesens Bart
  • Vanthienen Jan

Firstname.Lastname_at_econ.kuleuven.ac.be

Poland, KAM 2004
2
Web Mining
  • use of data mining techniques to automatically
    discover and extract information from WWW
    documents and services (Etzioni, 1996)

3
Web Usage Mining
  • WUM: application of data mining techniques to
    discover usage patterns from web data
  • On what data?
  • Web Forms
  • Log Files
  • Example log entry:
    195.162.218.155 27/Jun/2002:00:01:54 +0200
    "GET /dutch/shop/detail.html HTTP/1.1" 200
    38890 "http://www.msn.be/shopping/food/"
    "Mozilla/4.0 (MSIE 6.0)"

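A log entry like the one on this slide can be split into named fields before any mining takes place. The sketch below is only an illustration: it assumes the server writes the common "combined" log format, so the two `- -` identity fields and the square brackets around the timestamp are assumptions that do not appear on the slide, and `LOG_PATTERN` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed layout (combined log format):
# ip ident user [time] "method url protocol" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %b expects English month abbreviations (default C locale).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# The slide's entry, padded with the assumed "- -" fields and brackets.
sample = (
    '195.162.218.155 - - [27/Jun/2002:00:01:54 +0200] '
    '"GET /dutch/shop/detail.html HTTP/1.1" 200 38890 '
    '"http://www.msn.be/shopping/food/" "Mozilla/4.0 (MSIE 6.0)"'
)
entry = parse_line(sample)
```

Lines that do not fit the assumed format come back as `None`, so they can be counted and inspected instead of silently dropped.
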
4
Why perform WUM?
  • Personalization
  • Recommender systems
  • Example: a visitor requests information on
    "Beckham": show the link to "Real Madrid" in bold
  • "If I have two million customers on the web, I
    should have 2 million stores on the web" (Jeff
    Bezos, Amazon)
  • Performance Improvement
  • Site Modification
  • Where and why do visitors leave my site?

5
The process of WUM
Raw log → Preprocessing → Preprocessed log → Pattern discovery → Patterns, rules and statistics → Pattern analysis → Interesting patterns, rules and statistics
6
Association Rules
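  • Agrawal & Srikant (1993)
  • 90% of the people that buy bread and butter also
    purchase milk

$X \Rightarrow Y$ with $X, Y \subset I$ and $X \cap Y = \emptyset$ (where $I$ is the set of all items)
Example: $A \Rightarrow C$ (support = 50%, confidence = 66.7%)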
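The transactions behind the slide's example did not survive in the transcript. As a sketch, the four transactions below are an assumed data set, chosen only because it reproduces the quoted support and confidence; `support` and `confidence` are hypothetical helper names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(X and Y together) divided by support(X) for the rule X => Y."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Assumed transactions (not from the slides).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

rule_support = support(transactions, {"A", "C"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"A"}, {"C"})  # 2 of the 3 with A
print(f"support={rule_support:.1%} confidence={rule_confidence:.1%}")
```

With this data, A and C occur together in 2 of 4 transactions (support 50%), and 2 of the 3 transactions containing A also contain C (confidence 66.7%).
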
7
Data Preprocessing
  • Time-consuming Phase !!
  • Combine logs
  • Clean Logs
  • User Identification
  • Robot/Crawler Identification
  • Session Identification
  • Path Completion
  • Transaction Identification

8
Combining and Cleaning Logs
  • Combine
  • Different Log Types/ Different Servers
  • Cleaning
  • One click ≠ one request
  • Removing parameters?
  • /help.html?helpcode=3 and /help.html?helpcode=5
  • Unify names
  • www.site.com versus www.site.be
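The cleaning steps on this slide can be sketched as one small function. Everything specific here is an assumption for illustration: the list of file suffixes treated as embedded resources, the choice of www.site.com as the canonical host, and the names `clean_request`, `EMBEDDED_SUFFIXES` and `HOST_ALIASES`.

```python
from urllib.parse import urlsplit

# Assumed: requests for these file types are side effects of a page view.
EMBEDDED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumed: both host names serve the same site; pick one canonical name.
HOST_ALIASES = {"www.site.be": "www.site.com"}

def clean_request(host, url, strip_parameters=True):
    """Return (canonical_host, page), or None for an embedded resource."""
    parts = urlsplit(url)
    path = parts.path
    if path.lower().endswith(EMBEDDED_SUFFIXES):
        return None  # one click triggers several requests; keep only the page
    host = HOST_ALIASES.get(host.lower(), host.lower())
    if parts.query and not strip_parameters:
        path = f"{path}?{parts.query}"
    return host, path

a = clean_request("www.site.be", "/help.html?helpcode=3")
b = clean_request("www.site.com", "/help.html?helpcode=5")
```

Whether to strip parameters depends on the site: with `strip_parameters=True` both help requests collapse into the single page `/help.html`, which is only appropriate when the parameter does not select materially different content.
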

9
User Identification
  • Assign Requests to users
  • No exact solution → heuristics
  • How?
  • Combination IP-address/User-agent
  • Embedded Session IDs
  • Explicit Registration
  • Cookies
  • Combination of the above techniques
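The first heuristic, the combination of IP address and user agent, might look like the sketch below. It assumes each request is a dict with `ip` and `agent` keys (a hypothetical record layout), and it inherits the known weaknesses of the heuristic: users behind a shared proxy are merged, and a user whose IP address changes is split.

```python
from collections import defaultdict

def assign_users(requests):
    """Group requests by (IP address, user agent) and number the groups."""
    user_ids = {}
    by_user = defaultdict(list)
    for req in requests:
        key = (req["ip"], req["agent"])
        # A new key gets the next free id; a known key keeps its id.
        uid = user_ids.setdefault(key, len(user_ids) + 1)
        by_user[uid].append(req)
    return dict(by_user)

# Made-up requests for illustration.
requests = [
    {"ip": "10.0.0.1", "agent": "Mozilla/4.0 (MSIE 6.0)", "url": "/a.html"},
    {"ip": "10.0.0.1", "agent": "Opera/7.0", "url": "/b.html"},
    {"ip": "10.0.0.1", "agent": "Mozilla/4.0 (MSIE 6.0)", "url": "/c.html"},
]
users = assign_users(requests)
```

Here the same IP address yields two users because two different browsers were seen behind it.
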

10
Robot Identification
  • Robot/Crawler/Wanderer/Spider
  • programs that traverse the web automatically
  • Examples: Googlebot, link checkers, email
    gatherers
  • Identification
  • Robots.txt
  • User agent
  • Specific Behavior
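Two of the three identification cues can be sketched directly: a request for robots.txt and a telltale user-agent string. The keyword list is an assumption and far from complete, the record layout (`ip`, `agent`, `url`) is hypothetical, and the third cue (specific behavior, such as very regular or very fast requests) is not covered here.

```python
# Assumed keywords; real crawlers use many other names.
ROBOT_KEYWORDS = ("bot", "crawler", "spider", "wanderer")

def find_robots(requests):
    """Return the (ip, agent) pairs that look like robots."""
    robots = set()
    for req in requests:
        agent = req["agent"].lower()
        asked_robots_txt = req["url"] == "/robots.txt"
        named_like_robot = any(word in agent for word in ROBOT_KEYWORDS)
        if asked_robots_txt or named_like_robot:
            robots.add((req["ip"], req["agent"]))
    return robots

def remove_robots(requests):
    """Drop every request made by a client flagged as a robot."""
    robots = find_robots(requests)
    return [r for r in requests if (r["ip"], r["agent"]) not in robots]

# Made-up requests for illustration.
requests = [
    {"ip": "10.0.0.1", "agent": "Mozilla/4.0 (MSIE 6.0)", "url": "/index.html"},
    {"ip": "10.0.0.2", "agent": "Googlebot/2.1", "url": "/index.html"},
    {"ip": "10.0.0.3", "agent": "LinkCheck", "url": "/robots.txt"},
    {"ip": "10.0.0.3", "agent": "LinkCheck", "url": "/index.html"},
]
human_requests = remove_robots(requests)
```

Flagging the client rather than the single request matters: once a client has asked for robots.txt, all of its requests are removed, not only that one.
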

11
Session Identification
  • Some users visit the site many times
  • Divide all requests from these users into several
    sessions or visits
  • How?
  • Use a time-out (standard 30 minutes)
  • Small time-out: customers lose their market
    basket
  • Large time-out: server requirements larger
  • Recent research: time-out of at least 60 minutes
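A time-out based split can be sketched as follows, assuming the requests of one user are available as a list of timestamps. The function name `split_sessions` and the sample times are made up for illustration.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Start a new session whenever the gap since the previous
    request of the same user exceeds the time-out."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

# Made-up request times of one user: 0, 10, 50 and 55 minutes after midnight.
start = datetime(2002, 6, 27, 0, 0)
times = [start + timedelta(minutes=m) for m in (0, 10, 50, 55)]

sessions_30 = split_sessions(times)                         # 40-minute gap splits
sessions_60 = split_sessions(times, timedelta(minutes=60))  # gap is tolerated
```

With the 30-minute default the 40-minute gap produces two sessions; with a 60-minute time-out the same requests form a single session, so the number of sessions can only fall or stay equal as the time-out grows.
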

12
Session Identification (cont.)
[Figure: number of sessions as a function of the time-out (minutes)]
13
Path Completion
  • Cache: memory in which a Web browser stores
    information from recently visited Web sites,
    making it easier and faster for the user to
    revisit a site.
  • Results in missing requests in logs
  • Example (pages A, B and C): the logged path
    A-B-C becomes A-B-A-C

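One possible way to complete a path is to use the site's link structure: when the next logged page is not linked from the current one, assume the visitor pressed the back button (served from the cache) until reaching a page that does link to it. The sketch below follows that assumption; the link structure A → B, A → C is inferred from the example rather than stated on the slide, and `complete_path` is a hypothetical name.

```python
def complete_path(path, links):
    """Insert the cached back-steps that are missing from a logged path.

    path  -- list of pages in the order they appear in the log
    links -- dict mapping a page to the set of pages it links to
    """
    completed = []
    stack = []  # pages on the current forward path
    for page in path:
        # Search backwards for the most recent page that links to `page`.
        i = len(stack) - 1
        while i >= 0 and page not in links.get(stack[i], set()):
            i -= 1
        if i < 0:
            stack = [page]  # no known referrer: treat as a fresh entry point
        else:
            # Pages revisited with the back button on the way to stack[i].
            for j in range(len(stack) - 2, i - 1, -1):
                completed.append(stack[j])
            del stack[i + 1:]
            stack.append(page)
        completed.append(page)
    return completed

links = {"A": {"B", "C"}}  # assumed site structure
result = complete_path(["A", "B", "C"], links)
print("-".join(result))  # A-B-A-C
```

B does not link to C, so the sketch inserts the cached return to A before C, which reproduces the slide's example.
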
14
Transaction Identification
  • a semantically meaningful subset of a user
    session (Cooley, 2000)
  • optional step
  • Example: news site
  • Transactions containing sports news,
    international news, ...
  • Maximal Forward Reference (Chen et al., 1996)
  • Repeated visit to a page is for navigational
    purposes
  • A-B-C-D-C-B-E becomes A-B-C-D and A-B-E
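The maximal forward reference idea can be sketched in a few lines: extend the current path while the visitor moves forward, and emit it as a transaction at the moment the first backward step occurs. This is an illustrative reading of the approach credited to Chen et al. on the slide, not their exact algorithm, and the function name is hypothetical.

```python
def maximal_forward_references(session):
    """Split a session into maximal forward reference transactions."""
    transactions = []
    path = []
    moving_forward = False
    for page in session:
        if page in path:
            # Backward reference: the page was already on the current path.
            if moving_forward:
                transactions.append(list(path))
            # Cut the path back to the revisited page.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        transactions.append(list(path))
    return transactions

result = maximal_forward_references(list("ABCDCBE"))
print(["-".join(t) for t in result])  # ['A-B-C-D', 'A-B-E']
```

The revisits to C and B are treated as navigation only, so they end the first transaction but do not appear as its content; the second transaction keeps the prefix A-B that leads to E.
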

15
Pattern Analysis
  • Many Association Rules found
  • Task: detect interesting rules
  • How?
  • Additional measures
  • Visualization
  • Query Languages

16
Visualization
17
Pattern Analysis
  • Payment Process
  • Many different steps: specify name, enter
    address, choose delivery options, enter Visa
    number, ...

[Figure: page sequence Basket/home.html → checkout/entry.html → checkout/address.html → checkout/timeandpayment.html → checkout/overview.html → payment, annotated with the values 12, 98, 85, ?? and 83]
18
Questions?