Web Usage Mining for E-Business Applications

1
Web Usage Mining: an overview
Bettina Berendt
Humboldt-Universität zu Berlin, Institute of
Information Systems, http://www.wiwi.hu-berlin.de/~berendt/
Talk at Universidad Politecnica de
Madrid, 24 February 2005
2
Acknowledgements
  • Andreas Hotho
  • Ernestina Menasalvas
  • Bamshad Mobasher
  • Myra Spiliopoulou
  • Gerd Stumme
  • Max Teltzrow

3
Agenda
Introduction
Data Acquisition and Data Preparation
Forms of analysis mining techniques
A case study
Outlook
4
Data mining: a definition
  • the process
  • of exploration and analysis,
  • by automatic or semi-automatic means,
  • of large quantities of data
  • in order to discover meaningful patterns and
    results.
  • (Berry & Linoff, 1997, 2000)

Picture from http://www.smithsonianmag.si.edu/smithsonian/issues98/jan98/mining_jpg.html
5
Data Mining and Knowledge Discovery
  • Knowledge discovery:
  • the non-trivial process of identifying valid,
    novel, potentially useful, and ultimately
    understandable patterns in data. (from Fayyad,
    U.M., Piatetsky-Shapiro, G., Smyth, P., &
    Uthurusamy, R. (Eds.) (1996). Advances in
    Knowledge Discovery and Data Mining. Boston, MA:
    AAAI/MIT Press.)
  • Data mining:
  • sometimes refers to the whole process of
    knowledge discovery and sometimes to the specific
    machine learning phase.
  • (from Kohavi & Provost's glossary at
    http://robotics.stanford.edu/~ronnyk/glossary.html)

6
What is Web Mining?
  • Despite its success, one problem of the current
    WWW is that much of the knowledge it contains
    lies dormant in the data.
  • Web mining tries to overcome this problem by
    applying data mining techniques to the content,
    (hyperlink) structure, and usage of Web resources.
  • Goals include
  • the improvement of site design and site
    structure,
  • the generation of dynamic recommendations,
  • and the improvement of marketing.

Web Mining Areas: Web content mining
7
Application examples: eCommerce
8
Knowledge discovery: multi-stage and iterative
The CRISP-DM process model
More on CRISP-DM as a framework for Web
mining: Berendt, Menasalvas, &
Spiliopoulou, Tutorial at ECML/PKDD 2004
http://www.crisp-dm.org/Images/187343_CRISPart.jpg
9
Agenda
Introduction
Data Acquisition and Data Preparation
Forms of analysis mining techniques
A case study
Outlook
10
Web Usage Mining
  • Discovery of meaningful patterns from data
    generated by client-server transactions on one or
    more Web servers
  • Typical Sources of Data
  • automatically generated data stored in server
    access logs, referrer logs, agent logs, and
    client-side cookies
  • e-commerce and product-oriented user events
    (e.g., shopping cart changes, ad or product
    click-throughs, etc.)
  • user profiles and/or user ratings
  • meta-data, page attributes, page content, site
    structure

11
Data collection
Web server
Client (Browser)
Proxy
12
What's in a typical Web server log?
(Requests to www.acr-news.org)
<ip_addr> - - <date> <method> <file> <protocol> <code> <bytes> <referrer> <user_agent>
203.30.5.145 - - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology-&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.252.234.33 - - [01/Jun/1999:03:12:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:13:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.30.5.145 - - [01/Jun/1999:03:13:25 -0600] "GET /Calls/AWAC.html HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
13
and what does it mean?
(Requests to www.acr-news.org)
<ip_addr> - - <date> <method> <file> <protocol> <code> <bytes> <referrer> <user_agent>
203.30.5.145 - - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology-&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 - - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.252.234.33 - - [01/Jun/1999:03:12:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:12:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 - - [01/Jun/1999:03:13:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.30.5.145 - - [01/Jun/1999:03:13:25 -0600] "GET /Calls/AWAC.html HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
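One way to read the fields named in the format line is to parse each entry with a regular expression. The sketch below assumes the usual combined-log punctuation (bracketed timestamp, quoted request, quoted referrer and user agent); LOG_PATTERN, parse_log_line and the dictionary keys are illustrative names, not part of the talk.

```python
import re
from datetime import datetime

# Assumed layout: ip - - [date] "method file protocol" code bytes "referrer" "user_agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the textual fields that later steps need as numbers or timestamps.
    record["date"] = datetime.strptime(record["date"], "%d/%b/%Y:%H:%M:%S %z")
    record["code"] = int(record["code"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


SAMPLE = (
    '203.30.5.145 - - [01/Jun/1999:03:09:23 -0600] '
    '"GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 '
    '"http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"'
)
record = parse_log_line(SAMPLE)
```

Read this way, the sample entry says: host 203.30.5.145 fetched an image, the server answered with status 200 and 10689 bytes, and the referrer shows the image was embedded in /Calls/OWOM.html.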
14
Data Preprocessing (1)
  • Data cleaning
  • remove irrelevant references and fields in server
    logs
  • remove references due to spider navigation
  • remove erroneous references
  • add missing references due to caching (done after
    sessionization)
  • Data integration
  • synchronize data from multiple server logs
  • Integrate semantics, e.g.,
  • meta-data (e.g., content labels)
  • e-commerce and application server data
  • integrate demographic / registration data

15
Data Preprocessing (2)
  • Data Transformation
  • user identification
  • sessionization / episode identification
  • pageview identification
  • a pageview is a set of page files and associated
    objects that contribute to a single display in a
    Web Browser
  • Data Reduction
  • sampling and dimensionality reduction (ignoring
    certain pageviews / items)
  • Identifying User Transactions (i.e., sets or
    sequences of pageviews possibly with associated
    weights)

16
Why sessionize?
  • Quality of the patterns discovered in KDD depends
    on the quality of the data on which mining is
    applied.
  • In Web usage analysis, these data are the
    sessions of the site visitors: the activities
    performed by a user from the moment she enters
    the site until the moment she leaves it.
  • Difficult to obtain reliable usage data due to
    proxy servers and anonymizers, dynamic IP
    addresses, missing references due to caching, and
    the inability of servers to distinguish among
    different visits.
  • Cookies and embedded session IDs produce the most
    faithful approximation of users and their visits,
    but are not used in every site, and not accepted
    by every user.
  • Therefore, heuristics are needed that can
    sessionize the available access data.

17
Mechanisms for User Identification
18
Sessionization strategies: Sessionization
heuristics
(Heuristics used in, e.g., CMS99, SF99;
formalized in BMSW01)
19
Path Completion
  • Refers to the problem of inferring missing user
    references due to caching.
  • Effective path completion requires extensive
    knowledge of the link structure within the site
  • Referrer information in server logs can also be
    used in disambiguating the inferred paths.
  • Problem gets much more complicated in frame-based
    sites.

20
Sessionization Example
21
Sessionization Example
1. Sort users (based on IP+Agent)
22
Sessionization Example
2. Sessionize using heuristics (h1 with 30 min)
The h1 heuristic (with timeout variable of 30
minutes) will result in the two sessions given
above.
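A time-based heuristic of this kind can be sketched as follows. The sketch assumes that h1 bounds the total duration of a session (a request more than 30 minutes after the first request of the current session opens a new one), which is how the heuristic is described in BMSW01; the function name and the (timestamp, url) input format are illustrative.

```python
def sessionize_h1(requests, max_duration=30 * 60):
    """Split one user's time-ordered (timestamp_in_seconds, url) requests into sessions.

    Assumption: a session may last at most max_duration seconds,
    measured from its first request.
    """
    sessions = []
    for timestamp, url in requests:
        # sessions[-1][0][0] is the timestamp of the current session's first request.
        if not sessions or timestamp - sessions[-1][0][0] > max_duration:
            sessions.append([])
        sessions[-1].append((timestamp, url))
    return sessions


# Hypothetical requests of a single IP+Agent pair, in seconds since the first hit.
example = [(0, "A"), (600, "B"), (1500, "C"), (2400, "D"), (2700, "E")]
example_sessions = sessionize_h1(example)
```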
23
Sessionization Example
2. Sessionize using heuristics (another example
with href)
In this case, the referrer-based heuristics will
result in a single session, while the h1
heuristic (with timeout 30 minutes) will result
in two different sessions.
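A referrer-based heuristic can be sketched in a similar way. This is a simplified reading, not the exact definition from the cited papers: a request joins the most recent session in which its referrer was already requested, otherwise it opens a new session, and time is ignored. The names and the (timestamp, url, referrer) tuple format are illustrative.

```python
def sessionize_href(requests):
    """Group one user's time-ordered (timestamp, url, referrer) requests by referrer."""
    sessions = []
    for timestamp, url, referrer in requests:
        # Look for the most recent session that already contains the referrer page.
        for session in reversed(sessions):
            if any(previous_url == referrer for _, previous_url, _ in session):
                session.append((timestamp, url, referrer))
                break
        else:
            # No session contains the referrer: start a new one.
            sessions.append([(timestamp, url, referrer)])
    return sessions


# Hypothetical trail with a 45-minute total span; "-" marks a missing referrer.
trail = [(0, "A", "-"), (900, "B", "A"), (2700, "C", "B")]
href_sessions = sessionize_href(trail)
```

Because every request in the hypothetical trail is reachable from an earlier one, the referrer-based grouping keeps a single session even though the trail spans more than 30 minutes.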
24
Sessionization Example
3. Perform Path Completion
A=>C, C=>B, B=>D, D=>E, C=>F
Need to look for the shortest backwards path from
E to C based on the site topology. Note, however,
that the elements of the path need to have
occurred in the user trail previously.
E=>D, D=>B, B=>C
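The backtracking step can be sketched as below. As a simplifying assumption, the sketch does not consult the site topology: it retraces the user's own trail in reverse (as if the back button had been pressed) until it reaches the referrer. The function name and the (referrer, page) input format are illustrative.

```python
def complete_path(requests):
    """Insert cached backward steps into a list of (referrer, page) requests."""
    path = []
    for referrer, page in requests:
        if path and referrer in path and path[-1] != referrer:
            # Position of the most recent visit to the referrer.
            index = len(path) - 1 - path[::-1].index(referrer)
            # Pages re-displayed from the cache while going back to the referrer.
            path.extend(reversed(path[index:len(path) - 1]))
        path.append(page)
    return path


# The trail from the slide; "-" stands for the unknown referrer of the first page.
logged = [("-", "A"), ("A", "C"), ("C", "B"), ("B", "D"), ("D", "E"), ("C", "F")]
completed = complete_path(logged)
```

For the slide's example the completed trail contains the inferred steps D, B and C between E and F.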
25
Why integrate semantics?
  • Basic idea: associate each requested page with
    one or more domain concepts, to better understand
    the process of navigation
  • Example: a shopping site

From ...
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478
To ...
Refine search
Choose item
Search by category
Search by category+title
Look at individual product
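Mapping requests to concepts can be sketched with a small rule table. The rules below are invented for illustration (the talk does not give the actual mapping): they look only at the path of the requested URL and fall back to "Other".

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule table: (pattern on the URL path, concept label).
CONCEPT_RULES = [
    (re.compile(r"^/search\.html$"), "Search"),
    (re.compile(r"^/mlesen\.html$"), "Look at individual product"),
]


def map_to_concept(url):
    """Return the concept of the first rule whose pattern matches the URL path."""
    path = urlsplit(url).path
    for pattern, concept in CONCEPT_RULES:
        if pattern.match(path):
            return concept
    return "Other"


# Hypothetical request URLs in the style of the shopping-site example.
concepts = [map_to_concept(u) for u in ("/search.html?ord=asc", "/mlesen.html?Item=3456")]
```

A finer mapping (for example, telling a first search from a refined one) would also have to inspect the query parameters.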
26
Ontology-based behaviour modelling: basic ideas
(1)
  • Atomic application events: The request for a Web
    page signals interest in the concept(s) and
    relations dealt with in this page, i.e. interest
    in the obtained content as well as in the
    requested service.
  • Formally: a request as a (multi)set, or as a
    vector, of concepts/relations.

27
Ontology-based behaviour modelling: basic ideas
(2)
  • Composite application events: Sequences, regular
    expressions, etc., that consist of atomic
    application events.
  • Ex.: Spiliopoulou, Pohle and Teltzrow (Proc.
    Wirtschaftsinformatik 2002) modelled the customer
    buying process known from marketing. Depending on
    which of its phases a user passes through, and in
    which order, (s)he can be assigned to a user type
    (Moe, J. Consumer Psychology 2002).
  • Example: knowledge builders

28
Basic Framework for E-Commerce Data Analysis
Web Usage and E-Business Analytics
29
Agenda
Introduction
Data Acquisition and Data Preparation
Forms of analysis mining techniques
A case study
Outlook
30
Web Usage and E-Business Analytics
  • Session Analysis
  • Static Aggregation and Statistics
  • OLAP
  • Data Mining

Different Levels of Analysis
31
Session Analysis
  • Simplest form of analysis: examine individual or
    groups of server sessions and e-commerce data.
  • Advantages
  • Gain insight into typical customer behaviors.
  • Trace specific problems with the site.
  • Drawbacks
  • LOTS of data.
  • Difficult to generalize.

32
Static Aggregation (Reports)
  • Most common form of analysis.
  • Data aggregated by predetermined units such as
    days or sessions.
  • Generally gives most bang for the buck.
  • Advantages
  • Gives quick overview of how a site is being used.
  • Minimal disk space or processing power required.
  • Drawbacks
  • No ability to dig deeper into the data.

33
Online Analytical Processing (OLAP)
  • Allows changes to aggregation level for multiple
    dimensions.
  • Generally associated with a Data Warehouse.
  • Advantages & Drawbacks
  • Very flexible
  • Requires significantly more resources than static
    reporting.

34
Data Mining: Going deeper
  • Prediction of next event: Markov chains,
    sequence mining
  • Discovery of associated events or application
    objects: association rules
  • Discovery of visitor groups with common
    properties and interests: clustering
  • Discovery of visitor groups with common
    behaviour: session clustering
  • Characterization of visitors with respect to a
    set of predefined classes: classification
  • Card fraud detection
35
KDD Techniques for Web Applications: Examples (1)
  • Calibration of a Web server
  • Prediction of the next page invocation over a
    group of concurrent Web users under certain
    constraints
  • Sequence mining, Markov chains
  • Cross-selling of products
  • Mapping of Web pages/objects to products
  • Discovery of associated products
  • Association rules, Sequence Mining
  • Placement of associated products on the same page
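Next-page prediction with a first-order Markov chain can be sketched as below: transition counts are collected from sessions, and the most frequent successor of the current page is predicted. This is a minimal illustration with hypothetical page names, not the specific models referenced in the talk.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count page-to-page transitions over all sessions (lists of page names)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Return (most likely next page, estimated probability), or None if unseen."""
    if page not in counts:
        return None
    following, n = counts[page].most_common(1)[0]
    return following, n / sum(counts[page].values())


# Hypothetical sessions.
training = [
    ["home", "catalog", "product"],
    ["home", "catalog", "cart"],
    ["home", "catalog", "product"],
]
model = fit_transitions(training)
prediction = predict_next(model, "catalog")
```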

36
KDD Techniques for Web Applications: Examples (2)
  • Sophisticated cross-selling and up-selling of
    products
  • Mapping of pages/objects to products of different
    price groups
  • Identification of Customer Groups
  • Clustering, Classification
  • Discovery of associated products of the
    same/different price categories
  • Association rules, Sequence Mining
  • Formulation of recommendations to the end-user
  • Suggestions on associated products
  • Suggestions based on the preferences of similar
    users

37
Agenda
Introduction
Data Acquisition and Data Preparation
Forms of analysis mining techniques
A case study
Outlook
38
A multi-channel retailer, its business goals, and
analysis questions
  • General goals: standard e-tailer goals, i.e.
    attract users/shoppers and convert them into
    customers
  • Specific goals: assess the success of the Web
    site in relation to other distribution channels
  • → Questions of the evaluation:
  • What business metrics can be calculated from Web
    usage data, transaction and demographic data for
    determining online success?
  • Are there cross-channel effects between a
    company's e-shop and its physical stores?

Background: Internet market shares (BCG 2002);
TB03, TBG03
39
Outline of the KDD process
  • Business understanding: customer buying process
  • Data
  • Web server sessions, transaction info.
  • Data understanding: main step
  • modelling the semantics of the site in terms of a
    hierarchy of service concepts
  • Data preparation
  • Session IDs, usual data cleaning steps
  • Linking of sessions & transaction information
    (anonymized)
  • Modelling / pattern discovery
  • Web metrics, cluster analysis, association rules,
    sequence mining, correlation analysis,
    questionnaire study, qualitative market analysis
  • Evaluation: interesting patterns

40
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
41
Description of the site and its services
  • The retailer operates an e-shop and more than
    5000 retail shops in over 10 European countries
  • It sells a wide range of consumer electronics
  • Online customers can pay, pick-up/deliver and
    return both online and offline
  • Web pages provide for all tasks in the customer
    buying process

42
Purchase Phases (Page Concepts) at Large MC
Retailers
Home (Acquisition)
1. Acquisition (home): All Web pages that are
semantically related to the initial acquisition
of a visitor
43
Purchase Phases (Page Concepts) at Large MC
Retailers
Home (Acquisition)
Product Impression
2. Catalogue information: pages providing an
overview of product categories.
44
Purchase Phases (Page Concepts) at Large MC
Retailers
Product Click-Through
Home (Acquisition)
Product Impression
3. Information product (infprod): pages
displaying information about a specific product
45
Purchase Phases (Page Concepts) at Large MC
Retailers
Offlineinfo
Home (Acquisition)
Product Click-Through
Product Impression
4. Offline information (offinfo): All pages
related to any offline information: store locator
(pages for finding physical stores in one's
neighbourhood), information about offline
services, offline referrers etc.
46
Purchase Phases (Page Concepts) at Large MC
Retailers
Transaction
Offlineinfo
Home (Acquisition)
Product Click-Through
Product Impression
5. Transaction (transact): steps before an actual
purchase, starting with a customer entering the
order process: check-out, input of customer data,
payment and delivery preferences (online or
offline), etc.
47
Purchase Phases (Page Concepts) at Large MC
Retailers
Transaction
Purchase
Offlineinfo
Home (Acquisition)
Product Click-Through
Product Impression
6. Purchase: indicates if a visitor completed the
transaction process and bought a product, e.g.
invocation of an order confirmation page.
48
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
49
Data and data preparation
  • Data sources and sample
  • 92,467 sessions from the company's Web logs from
    21 days in 2002
  • anonymized transaction information of 13,653
    customers who bought online over a period of 8
    months in 2001/02
  • 621 transaction records (21 days) were linked to
    Web-usage records
  • Data preparation
  • Sessions were determined by session IDs
  • Robot visits eliminated, usual data cleaning
    steps
  • Each URL request mapped to a service concept from
    {c1, ..., cn}
  • Session representation: s = (w1, ..., wn), with wi =
    weight of ci, indicating whether or not the
    concept was visited (1/0), or how often it was
    visited
  • Customer record: feature vector incl. session and
    transaction data
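The two session representations can be sketched as follows. The concept list uses the short names of the purchase phases (home, infprod, offinfo, transact, purchase) plus an assumed label "catalog" for catalogue information; the function name and input format are illustrative.

```python
# Assumed concept order c1..cn; "catalog" is a hypothetical short name.
CONCEPTS = ["home", "catalog", "infprod", "offinfo", "transact", "purchase"]


def session_vector(concept_sequence, concepts=CONCEPTS, dichotomized=False):
    """Turn the concepts requested in one session into a vector (w1, ..., wn).

    Weighted form: wi counts the visits to concept ci.
    Dichotomized form: wi is 1 if ci was visited at all, else 0.
    """
    counts = [concept_sequence.count(c) for c in concepts]
    if dichotomized:
        return [1 if n > 0 else 0 for n in counts]
    return counts


# Hypothetical session after mapping each URL request to a concept.
visited = ["home", "catalog", "infprod", "infprod", "transact", "purchase"]
weighted = session_vector(visited)
binary = session_vector(visited, dichotomized=True)
```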

50
Site semantics: A service concept hierarchy
760,535 page requests were mapped onto the
concepts from this hierarchy
Any
Services
Game
Offline Service and Support
Acquisition
Registration
Company Infos
Offline Referrer
Advertiser
Other
Home
Other
Transaction
Information
Fulfillment/ Service
Customer Data
Shopping Cart
Payment
Store Locator
Information Catalog
Information Product
Multi-Channel Concept
51
Types of patterns
  • Conversion rates (= confidence of
    content-specified sequential association rules)
    for assessing business success
  • Association rule and sequence analysis for
    understanding online/offline preferences and
    their temporal development
  • Cluster analysis for customer segmentation
  • Correlation analysis for investigating the
    relationship between demographic indicators and
    online/offline preferences

52
>> Session representation
  • Each session represented as a feature vector on
    the multi-channel concepts
  • Two methods used for definition of new conversion
    metrics:
  • weighted-concept method (number of visits to a
    concept)
  • dichotomized concept method (whether or not
    concept was visited)

53
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
54
Life Cycle Metrics
  • Developed integrated scheme for formalizing both
    the life-cycle metrics of (Cutler & Sterne 2000)
    and the micro-conversion rates of (Lee et al.
    2001)

W (target market)
S (suspects / site visitors)
nS
nP
P (prospects / active investigators)
C (customers)
Cb (abandon cart)
nC
CR (repeat customers)
CA (attrited customers)
C1 (One time Customers)
(1) M: Marketing Data, C: Cookies, SI: Session IDs,
TA: Transaction Data
55
Micro Conversion Rates
Cutler and Sterne (2001)
W (whole population)
S (suspects / site visitors)
nS
P (prospects / active investigators)
nP
C (customers)
Cb (abandon cart)
nC
CR (repeat customers)
CA (attrited customers)
C1 (One time Customers)
56
Micro Conversion Rates
P
nM1 nC
M1 (saw a product impression)
nM2 nC
M2 (performed a product click through)
nM3 nC
M3 (effected a basket placement)
nM4 Cb
M4 (made a product purchase) C
57
Multi-Channel Metrics
C
WM5 (paid online)
SM5 (paid in store)
WM5 (belong to SM5 in at least one following
transaction)
WM5 (belong to WM5 in every following
transaction)
C
WM6 (direct delivery)
SM6 (pick up in store)
WM6 (belong to SM6 in at least one following
transaction)
WM6 (belong to WM6 in every following
transaction)
58
>> Conversion Formalization
Dichotomized concept conversion rate from concept
ci to concept cj
Weighted concept visit rate
Offline Conversion Rate (OCR)
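The formulas of this slide did not survive in the transcript. As a hedged sketch of the first metric, the code below assumes that the dichotomized conversion rate from ci to cj is the share of sessions that visited ci and also visited cj, in line with the earlier description of conversion rates as rule confidences. The session format and function name are illustrative.

```python
def conversion_rate(sessions, source, target):
    """Share of the sessions visiting `source` that also visited `target`.

    Each session is a dict mapping a concept name to its visit count;
    only "visited or not" is used (dichotomized view).
    """
    with_source = [s for s in sessions if s.get(source, 0) > 0]
    if not with_source:
        return 0.0
    converted = [s for s in with_source if s.get(target, 0) > 0]
    return len(converted) / len(with_source)


# Hypothetical sessions, not data from the case study.
sample_sessions = [
    {"infprod": 2, "transact": 1, "purchase": 1},
    {"infprod": 1},
    {"infprod": 3, "transact": 1},
    {"home": 1},
]
rate_purchase = conversion_rate(sample_sessions, "infprod", "purchase")
rate_transact = conversion_rate(sample_sessions, "infprod", "transact")
```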
59
>> Metrics Results
Time frame: May 2001 to May 2002
60
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
61
Internal consistency of preferences: payment
and delivery preferences
  • Online payment → Direct delivery (s=0.27, c=0.97):
    < 1/3 traditional online users!
  • Online payment → In-store pickup (s=0.02, c=0.03)
  • Cash on delivery → Direct delivery (s=0.02,
    c=0.03)
  • In-store payment → In-store pickup (s=0.69,
    c=0.94)
  • ⇒ Site is primarily used to collect information.

s: support, c: confidence of the sequence
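Support and confidence of a rule A → B can be computed as in the sketch below, using the standard definitions: support is the share of all transactions containing both A and B, confidence is the share of the transactions containing A that also contain B. The transactions shown are hypothetical, not the retailer's data.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical transactions, each a set of chosen options.
orders = [
    {"online payment", "direct delivery"},
    {"in-store payment", "in-store pickup"},
    {"in-store payment", "in-store pickup"},
    {"online payment", "in-store pickup"},
]
s, c = support_confidence(orders, "online payment", "direct delivery")
```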
62
Internal consistency of preferences: return
preferences
s: support, c: confidence of the association rule
  • Return → In-store (s=0.06, c=0.87)
  • Return → Mail-in (s=0.04, c=0.13)
  • ⇒ Customers may wish personal assistance.
  • (a result supported by the service mix analysis
    of different multi-channel retailers and by
    questionnaire results)

63
Development of preferences over time
s: support, c: confidence of the sequence
  • Direct delivery → In-store pickup in ≥1 following
    transaction (s=0.001, c=0.15)
  • Direct delivery → Direct delivery in all
    following transactions (s=0.003, c=0.85)
  • In-store pickup → Direct delivery in ≥1 following
    transaction (s=0.001, c=0.10) (*)
  • In-store pickup → In-store pickup in all following
    transactions (s=0.004, c=0.90)
  • Results for payment migration are similar.
  • ⇒ 90% of repeat customers did not change
    transaction preferences at all.
  • ⇒ Rule (*) as an indicator of the development of
    trust?!

64
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
65
Market segments
Largest group visits all concepts except offline
information
Cluster centers of the weighted purchase
sessions with direct delivery preference
Cluster centers of the weighted purchase
sessions with pick-up in store preference
Tend to arrive with prior knowledge
Tend to be "true multi-channel users"
Tend to be "true online users"
66
Agenda Case Study
Business Understanding
Data understanding and preparation
Pattern discovery & evaluation: Success metrics
Pattern disc. & eval.: Behavioural patterns
Pattern disc. & eval.: User types
Pattern disc. & eval.: Behaviour & demographics
67
Shop and Customer Distribution
Customers
Shops




68
Impact of demographics and of the offline
distribution channel ?!
  • A significant Pearson correlation exists between
  • the number of customers per zip code area,
    normalised by the number of residents/zip code,
    and the distance to the next store (r = -0.3, p <
    0.001).
  • number of residents/zip code and distance to
    store (r = -0.01, p < 0.001)
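The correlation coefficient itself can be computed as in the sketch below, which uses the standard Pearson formula. The significance test (the p value) is not covered, and the input numbers are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical zip code areas: distance to the next store (km) and
# customers per 1000 residents.
distance_km = [1.0, 2.0, 3.0]
customers_per_1000 = [6.0, 4.0, 2.0]
r = pearson_r(distance_km, customers_per_1000)
```

A negative r, as reported on the slide, means that areas farther from a store tend to have fewer customers per resident.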

69
Agenda
Introduction
Data Acquisition and Data Preparation
Forms of analysis mining techniques
A case study
Outlook
70
Many things to do, including ...
  • Deployment of Web mining results
    personalization
  • Architectures for Web mining
  • Methodology integration for treating Web mining
    as a project

71
References
  • AAP99 R. Agarwal, C. Aggarwal, and V. Prasad.
    A tree projection algorithm for generation of
    frequent itemsets. In Proceedings of the High
    Performance Data Mining Workshop, Puerto Rico,
    1999.
  • ACR99 Ackerman, M.S., Cranor, L.F., and Reagle,
    J. Privacy in E-commerce: Examining user
    scenarios and privacy preferences. In Proceedings
    of the ACM Conference on Electronic Commerce
    EC'99 (Denver, CO, Nov). 1999, 1-8.
  • Adam01 Adams, Anne. Users' Perceptions of
    Privacy in Multimedia Communications. PhD Thesis,
    University College London. 2001.
    http://www.cs.mdx.ac.uk/RIDL/aadams/thesis.PDF.
    Access date: 20 June 2002.
  • AE01 Antón, A.E. and Earp, J.B. (2001). A
    Taxonomy for Web Site Privacy Requirements. NCSU
    Technical Report TR-2001-14, 18 December 2001.
    http://www.csc.ncsu.edu/faculty/anton/pubs/antonTSE.pdf.
    Access date: 10 July 2002.
  • AT01 Adomavicius, G. and Tuzhilin, A.,
    Expert-driven validation of rule-based user
    models in personalization applications. Data
    Mining and Knowledge Discovery, 5(1/2),
    33-58, 2000.
  • BE98 Brusilovsky, P., and Eklund, J. (1998). A
    study of user model based link annotation in
    educational hypermedia. Journal of Universal
    Computer Science, 4, 429-448.
  • Bel00 Belkin, N.J. (2000). Helping people find
    what they don't know. Communications of the ACM,
    43 (8), 58-61.
  • Ber02a Berendt, B. (2002). Using site semantics
    to analyze, visualize, and support navigation.
    Data Mining and Knowledge Discovery, 6, 37-59.
  • Ber02b Berendt, B. (2002b). Detail and context
    in Web usage mining: coarsening and visualizing
    sequences. In R. Kohavi, B. Masand, M.
    Spiliopoulou, & J. Srivastava (Eds.), Extended
    Proceedings of WEBKDD 2001 - Mining Log Data
    Across All Customer TouchPoints. Berlin etc.:
    Springer, LNAI 2356.
  • BHS02 Berendt, B., Hotho, A., & Stumme, G.
    (2002). Towards Semantic Web Mining. In I.
    Horrocks & J. Hendler (Eds.), The Semantic Web -
    ISWC 2002 (Proceedings of the 1st International
    Semantic Web Conference, June 9-12th, 2002,
    Sardinia, Italy) (pp. 264-278). LNCS, Heidelberg,
    Germany: Springer.

72
References
  • BMNS02 Berendt, B., Mobasher, B., Nakagawa, M.,
    & Spiliopoulou, M. (2002). The impact of site
    structure and user environment on session
    reconstruction in Web usage analysis. In
    Proceedings of the WebKDD 2002 Workshop at KDD
    2002. July 23rd, 2002, Edmonton, Alberta, CA.
  • BMSW01 Berendt, B., Mobasher, B., Spiliopoulou,
    M., & Wiltshire, J. (2001). Measuring the accuracy
    of sessionizers for web usage analysis. In
    Proceedings of the Workshop on Web Mining at SIAM
    Data Mining Conference 2001 (pp. 7-14). Chicago,
    IL, April 2001.
  • BSM04 Berendt, B., Menasalvas, E., &
    Spiliopoulou, M. (2004). Evaluation in Web
    Mining. Tutorial at the 15th European Conference
    on Machine Learning / 8th European Conference on
    Principles and Practice of Knowledge Discovery in
    Databases (ECML/PKDD'04), Pisa, Italy, 20
    September 2004.
  • BPW96 P. Berthon, L.F. Pitt and R.T. Watson.
    The World Wide Web as an advertising medium.
    Journal of Advertising Research, 36(1), pp.
    43-54, 1996.
  • Brus97 Brusilovsky, P. (1997). Efficient
    techniques for adaptive hypermedia. In C.
    Nicholas and J. Mayfield (Eds.), Intelligent
    hypertext: Advanced techniques for the World Wide
    Web. Berlin: Springer. 12-30.
  • BS00 Berendt, B., & Spiliopoulou, M. (2000).
    Analysing navigation behaviour in web sites
    integrating multiple information systems. The
    VLDB Journal, 9, 56-75.
  • BSH02 Berendt, B., Stumme, G., & Hotho, A.
    (Eds.) (2001). Proceedings of the Workshop
    "Semantic Web Mining" at the 13th European
    Conference on Machine Learning (ECML'02) / 6th
    European Conference on Principles and Practice of
    Knowledge Discovery in Databases (PKDD'02),
    Helsinki, Finland, 20 August 2002.
    http://ecmlpkdd.cs.helsinki.fi/semwebmine-2002.html
  • BSM02 Baron, S. and Spiliopoulou, M.,
    Monitoring the results of the KDD process: An
    overview of pattern evolution. In J.M. Meij (Ed.)
    Dealing with the Data Flood: Mining data, text
    and multimedia. Den Haag, Chapter 5, 2002.
  • CMS99 Cooley, R., B. Mobasher, J. Srivastava.
    1999. Data preparation for mining world wide web
    browsing patterns. Journal of Knowledge and
    Information Systems 1, 5-32.
  • Cool00 Cooley, R. (2000). Web Usage Mining:
    Discovery and Application of Interesting Patterns
    from Web Data. University of Minnesota, Faculty of
    the Graduate School, Ph.D. dissertation.
    http://www.cs.umn.edu/research/websift/papers/rwc_thesis.ps
  • CPP01 Chi, E.H., Pirolli, P., & Pitkow, J.E.
    (2000). The scent of a site: a system for
    analyzing and predicting information scent,
    usage, and usability of a Web site. In
    Proceedings CHI 2000 (pp. 161-168).

73
References
  • CPCP01 Chi, E.-H., Pirolli, P., Chen, K., &
    Pitkow, J.E. (2001). Using information scent to
    model user information needs and actions on the
    Web. In Proceedings CHI 2001 (pp. 490-497).
  • CS00 M. Cutler and J. Sterne. E-metrics:
    Business metrics for the new economy. Technical
    report, NetGenesis Corp.,
    http://www.netgen.com/emetrics (access date: July
    22, 2001)
  • DK00 M. Deshpande and G. Karypis. Selective
    Markov models for predicting Web-page accesses.
    Technical Report 00-056, University of
    Minnesota, 2000.
  • DM02 Dai, H., & Mobasher, B. (2002). Using
    ontologies to discover domain-level Web usage
    profiles. In BSH02.
  • DZ97 X. Dreze and F. Zufryden. Testing web site
    design and promotional content. Journal of
    Advertising Research, 37(2), pp. 77-91, 1997.
  • Eigh97 Eighmey, J. (1997). Profiling user
    responses to commercial web sites. Journal of
    Advertising Research, 37(2), 59-66.
  • Epic97 Electronic Privacy Information Center
    (1997). Surfer Beware: Personal Privacy and the
    Internet.
    http://www.epic.org/reports/surfer-beware.html.
    Access date: 10 July 2002.
  • Epic99 Electronic Privacy Information Center
    (1999). Surfer Beware III: Privacy Policies
    without Privacy Protection.
    http://www.epic.org/reports/surfer-beware3.html.
    Access date: 10 July 2002.
  • EU95 Directive 95/46/EC of the European
    Parliament and the Council of 24 October 1995 on
    the protection of individuals with regard to the
    processing of personal data and on the free
    movement of such data.
    http://europa.eu.int/comm/internal_market/en/dataprot/law/.
    Access date: 10 July 2002.
  • EU00 Safe Harbor Privacy Principles.
    http://europa.eu.int/eurlex/en/consleg/pdf/2000/en_2000D0520_do_001.pdf,
    http://www.ita.doc.gov/td/ecom/menu.html, and
    http://www.export.gov/safeharbor/.
    Access date: 10 July 2002.

74
References
  • FBH00 X. Fu, J. Budzik, and K. J. Hammond.
    Mining navigation history for recommendation. In
    Proc. 2000 International Conference on
    Intelligent User Interfaces, New Orleans, January
    2000. ACM.
  • FGL00 J. Forsyth and T. McGuire and J. Lavoie.
    All visitors are not created equal. McKinsey
    marketing practice. McKinsey & Company.
    Whitepaper. 2000.
  • Flem98 Fleming, J. (1998). Web Navigation.
    Designing the User Experience. Sebastopol, CA:
    O'Reilly.
  • GS02 Garfinkel, S., with Spafford, G. (2002).
    Web Security, Privacy & Commerce. 2nd Ed.
    Sebastopol, CA: O'Reilly.
  • Jane99 Janetzko, D. (1999). Statistische
    Anwendungen im Internet. Daten in Netzumgebungen
    erheben, auswerten und präsentieren. München,
    Germany: Addison-Wesley.
  • JFM97 T. Joachims, D. Freitag, and T. Mitchell.
    Webwatcher: A tour guide for the world wide web.
    In the 15th International Conference on
    Artificial Intelligence, Nagoya, Japan, 1997.
  • JM00 Jendricke, U. and Gerd tom Markotten, D.
    Usability meets security - The Identity Manager
    as your personal security assistant for the
    Internet. In Proceedings of the 16th Annual
    Computer Security Applications Conference (New
    Orleans, LA, Dec.). 2000.
  • KNY00 Kato, H., Nakayama, T., Yamane, Y.
    (2000). Navigation analysis tool based on the
    correlation between contents distribution and
    access patterns. In Working Notes of the Workshop
    "Web Mining for E-Commerce - Challenges and
    Opportunities." 6th ACM SIGKDD Int. Conf. on
    Knowledge Discovery and Data Mining. August
    20-23, 2000. Boston, MA. pp. 95-104. Available at
    http://robotics.stanford.edu/~ronnyk/WEBKDD2000/papers/kato.pdf.
    Access Date: 10 July 2002.
  • Kuhl96 R. Kuhlen. Informationsmarkt: Chancen
    und Risiken der Kommerzialisierung von Wissen.
    2nd edition, 1996 (in German).
  • LAR00 W. Lin, S.A. Alvarez, C. Ruiz.
    Collaborative recommendation via adaptive
    association rule mining. In Proceedings of the
    Web Mining for E-Commerce Workshop (WebKDD'2000),
    August 2000, Boston.

75
References
  • LHM99 B. Liu, W. Hsu, and Y. Ma. Association
    rules with multiple minimum supports. In
    Proceedings of the ACM SIGKDD International
    Conference on Knowledge Discovery and Data Mining
    (KDD-99, poster), San Diego, CA, August 1999.
  • Lieb95 H. Lieberman. Letizia: An agent that
    assists web browsing. In Proc. of the 1995
    International Joint Conference on Artificial
    Intelligence, Montreal, Canada, 1995.
  • LPS00 Juhnyoung Lee, M. Podlaseck, E.
    Schonberg, R. Hoch and S. Gomory. Analysis and
    visualization of metrics for online
    merchandizing. In "Advances in Web Usage Mining
    and User Profiling: Proc. of the WEBKDD'99
    Workshop", LNAI 1836, Springer Verlag, pp.
    123-138, 2000.
  • Maye97 Mayer-Schönberger, V. 1997. The Internet
    and privacy legislation: Cookies for a treat?
    West Virginia Journal of Law & Technology 1.
    http://www.wvu.edu/~wvjolt/Arch/Mayer/Mayer.htm.
    Access Date: 10 July 2002.
  • MDL00 B. Mobasher, H. Dai, T. Luo, Y. Su, and
    J. Zhu. Integrating web usage and content mining
    for more effective personalization. In E-Commerce
    and Web Technologies, volume 1875 of LNCS.
    Springer Verlag, Sept. 2000.
  • MDLN01 B. Mobasher, H. Dai, T. Luo, M.
    Nakagawa. Effective personalization based on
    association rule discovery from Web usage data.
    In Proceedings of the 3rd ACM Workshop on Web
    Information and Data Management (WIDM01), held in
    conjunction with the International Conference on
    Information and Knowledge Management (CIKM 2001),
    ACM Press, Atlanta, November 2001.
  • MDLN02 Mobasher, B., H. Dai, T. Luo, and M.
    Nakagawa 2002. Discovery and evaluation of
    aggregate usage profiles for Web personalization.
    Data Mining and Knowledge Discovery 6, 61-82.
  • Moe W. Moe. Buying, searching, or browsing:
    Differentiating between online shoppers using
    in-store navigational clickstream. In Journal of
    Consumer Psychology.
  • Niel96 Nielsen, J. (1996). Top Ten Mistakes in
    Web Design. Alertbox for May 1996.
    http://www.useit.com/alertbox/9605.html. Access
    Date: 10 July 2002.
  • Niel99 Nielsen, J. (1999). "Top Ten Mistakes"
    Revisited Three Years Later. Alertbox, May 2,
    1999. http://www.useit.com/alertbox/990502.html.
    Access Date: 10 July 2002.

76
References
  • Niel00 Nielsen, J. (2000). Designing Web
    Usability: The Practice of Simplicity. New Riders
    Publishing.
  • Niel01 Nielsen, J. (2001). Usability Metrics.
    Alertbox, January 21, 2001.
    http://www.useit.com/alertbox/20010121.html.
    Access Date: 10 July 2002.
  • Obe00 Oberle, D. Semantic Community Web Portals
    - Personalization. Studienarbeit. Universität
    Karlsruhe, 2000.
  • PP99 J. Pitkow and P. Pirolli. Mining longest
    repeating subsequences to Predict WWW Surfing. In
    Proceedings of the 1999 USENIX Annual Technical
    Conference, 1999.
  • PS02 C. Pohle, M. Spiliopoulou. Building and
    exploiting ad hoc concept hierarchies for Web log
    analysis. In Proc. of DaWaK 2002,
    Aix-en-Provence, France, Springer Verlag, Sept. 2002.
  • PZK01 Padmanabhan, B., Z. Zheng, S.O. Kimbrough.
    2001. Personalization from incomplete data: What
    you don't know can hurt. In Proceedings of ACM
    SIGKDD International Conference on Knowledge
    Discovery and Data Mining, San Francisco, CA.
    154-163.
  • SA95 Srikant, R., Agrawal, R. (1995). Mining
    Generalized Association Rules. In Proceedings of
    the 21st International Conference on Very Large
    Databases (pp. 407-419). Zurich, Switzerland,
    September 1995.
  • SF99 Spiliopoulou, M., L.C. Faulstich. 1999.
    WUM: A tool for Web utilization analysis. In
    Proceedings EDBT (Workshop WebDB'98), LNCS 1590,
    Berlin, Germany: Springer. 184-203.
  • Spiliopoulou, M., Mobasher, B., Berendt, B.
    (2002). Web Usage Mining for E-Business
    Applications. Tutorial at the 13th European
    Conference on Machine Learning (ECML'02) / 6th
    European Conference on Principles and Practice of
    Knowledge Discovery in Databases (PKDD'02),
    Helsinki, Finland, 19 August 2002.
  • SGB01 Spiekermann, S., Grossklags, J., and
    Berendt, B. E-privacy in 2nd generation
    E-Commerce: privacy preferences versus actual
    behavior. In Proceedings of the ACM Conference on
    Electronic Commerce (EC'01). (Tampa, FL, Oct.).
    2001, 38-47.
  • SH01 Shearin, S. and Lieberman, H. Intelligent
    profiling by example. In Proceedings of the ACM
    Conference on Intelligent User Interfaces (Santa
    Fe, NM, January). 2001.
  • SHB01 Stumme, G., Hotho, A., Berendt, B.
    (Eds.) (2001). Proceedings of the Workshop
    "Semantic Web Mining" at the 12th European
    Conference on Machine Learning (ECML'01) / 5th
    European Conference on Principles and Practice
    of Knowledge Discovery in Databases (PKDD'01),
    Freiburg, Germany, 3 September 2001.
    http://semwebmine2001.aifb.uni-karlsruhe.de.
  • Shne98 Shneiderman, B. (1998). Designing the
    User Interface: Strategies for Effective
    Human-Computer Interaction. 3rd edition. Reading,
    MA: Addison-Wesley.

77
References
  • SMBN03 Spiliopoulou, M., Mobasher, B., Berendt,
    B., Nakagawa, M. (2003). A Framework for the
    Evaluation of Session Reconstruction Heuristics
    in Web Usage Analysis. To appear in INFORMS
    Journal on Computing, 15.
  • SP01 M. Spiliopoulou, C. Pohle. Data mining for
    measuring and improving the success of Web sites.
    In Journal of Data Mining and Knowledge
    Discovery, Special Issue on E-commerce, 5, pp.
    85-114. Kluwer Academic Publishers. 2001.
  • Spen99 Spendolini, M. (1999). Customer
    Measurement Systems - Opportunities for
    Improvement. White paper, MJS Associates,
    Accenture CRM Portal.
    http://www.crmproject.com/documents.asp?d_ID753.
    Access Date: 10 July 2002.
  • Spi99 M. Spiliopoulou. The laborious way from
    data mining to Web mining. Int. Journal of Comp.
    Sys., Sci. Eng., Special Issue on "Semantics of
    the Web", 14, pp. 113-126, 1999.
  • SPT02 Spiliopoulou, M., Pohle, C., and
    Teltzrow, M. (2002). Modelling and Mining Web
    Site Usage Strategies. To appear in Proceedings of
    the Multi-Konferenz Wirtschaftsinformatik,
    Nürnberg, Germany, 9-11 September.
  • Sul97 T. Sullivan. Reading reader reaction: A
    proposal for inferential analysis of web server
    log files. Proc. of the Web Conference'97, 1997.
  • TB03 Teltzrow, M., Berendt, B. (2003).
    Web-Usage-Based Success Metrics for Multi-Channel
    Businesses. In Proceedings of the WebKDD 2003
    Workshop - Webmining as a Premise to Effective
    and Intelligent Web Applications. August 27th,
    2003, Washington DC, USA. Held in conjunction
    with The Ninth ACM SIGKDD International
    Conference on Knowledge Discovery and Data
    Mining.
  • TBG03 Teltzrow, M., Berendt, B., Günther, O.
    (2003). Consumer behaviour at multi-channel
    retailers. In Proceedings of the 4th IBM
    eBusiness Conference, School of Management,
    University of Surrey, 9th December 2003.
  • Trus00 TrustE. (2000). TrustE Online Privacy
    Resource Book.
    http://www.truste.org/about/oprah.doc.
    Access Date: 10 July 2002.
  • Usab99 The Usability Group. (1999). What is
    Strategic Usability?
    http://usability.com/umi_what.htm.
    Access Date: 10 July 2002.
  • Volo00 Volokh, E. (2000). Personalization and
    privacy. Communications of the ACM, 43(8), 84-88.
  • WB90 Warren, S. and Brandeis, L. (1890). The
    right to privacy. Harvard Law Review, 4, 193.
  • West67 Westin, A. (1967). Privacy and Freedom.
    Boston: Atheneum Press.
  • W3C00 W3C. The Platform for Privacy Preferences
    1.0 (P3P1.0) Specification.
    http://www.w3.org/TR/2000/CR-P3P-20001215 and
    http://www.w3.org/TR/P3P.
    Access Date: 10 July 2002.