Towards Adaptive Websites: A Conceptual Framework and Case Study Mike PerkowitzOren Etzioni - PowerPoint PPT Presentation

1 / 25
About This Presentation
Title:

Towards Adaptive Websites: A Conceptual Framework and Case Study Mike PerkowitzOren Etzioni

Description:

Towards Adaptive Websites: A Conceptual Framework and Case Study Mike Perkowitz Oren Etzioni Presented By Ben Childs James Hunter Sergey Petrov Issam Souilah – PowerPoint PPT presentation

Number of Views:140
Avg rating:3.0/5.0
Slides: 26
Provided by: iss79
Category:

less

Transcript and Presenter's Notes

Title: Towards Adaptive Websites: A Conceptual Framework and Case Study Mike PerkowitzOren Etzioni


1
Towards Adaptive Websites A Conceptual
Framework and Case StudyMike Perkowitz Oren
Etzioni
  • Presented By
  • Ben Childs James Hunter
  • Sergey Petrov Issam Souilah

Department of Computer Science, November 2005
2
Agenda
  • Introduction
  • Paper Outline
  • Motivation
  • What are Adaptive Websites?
  • Approaches to Adaptation
  • The Index Page Synthesis Use Case
  • The PageGather Algorithm
  • Description of The Algorithm
  • Experimental Method
  • Time Complexity
  • Comparison with Related Algorithms
  • The IndexFinder Algorithm
  • Conceptual Cluster Mining
  • Experiments
  • Implementations
  • Conclusions
  • Related Work
  • Summary
  • Resources

3
Paper Outline
  • Published 1999 (latest version 2001)
  • Explores adaptive web sites
  • Describes the design space of adaptive web sites
  • Considers a case study index page synthesis
  • Presents two algorithms
  • PageGather a statistical cluster mining
    algorithm
  • IndexFinder a conceptual cluster mining algorithm

4
Motivation
  • Designing a complex web site so it readily yields
    its information is tricky, because
  • Different visitors have distinct goals
  • Same user may seek different information at
    different times
  • Many sites outgrow their original design,
    accumulating links and pages in unlikely places
  • A site may be designed for a particular use, but
    may be used in unanticipated ways in practice
  • Too often, web sites are fossils cast in HTML,
    while web navigation is dynamic, time-dependent,
    and idiosyncratic

5
What Are Adaptive Websites?
  • Adaptive websites are sites that automatically
    improve their organization and presentation by
    learning from visitor access patterns
  • They mine the data buried in web server logs to
    produce more easily navigable websites
  • To demonstrate the feasibility of adaptive
    websites, the index page synthesis use case is
    considered

6
Approaches to Adaptation
  • Aim is to make a website better, so we need a
    clear quality measure
  • Quality measure as a function of variables
  • How often users find what they are looking for
  • How many clicks users have to make to get to
    their goal
  • How much time users spend reading link text and
    scrolling through pages
  • Two approaches to adaptation
  • Content-based organizes and presents pages
    based on their content.
  • Access-based uses the way past visitors have
    interacted with the site to guide how information
    is structured.
  • Content-based and access-based adaptations are
    complementary and may be used together

7
The Index Page Synthesis Case Study (1)
  • Page synthesis is the automatic creation of web
    pages
  • An index page is a page consisting of links to a
    set of pages that cover a particular topic
  • Index page synthesis problem given a web site
    and a visitor access log, create new index pages
    containing collections of links to related but
    currently unlinked pages

8
The Index Page Synthesis Case Study (2)
  • The Index Page Synthesis Problem
  • What are the contents (i.e. hyperlinks) of the
    index page?
  • How are the hyperlinks on the page ordered?
  • How are the hyperlinks labeled?
  • What is the title of the page? Does it correspond
    to a coherent concept?
  • Is it appropriate to add the page to the site? If
    so, where?

9
Solutions
  • 2 Algorithms have been suggested by the authors
    of the paper
  • PageGather
  • IndexFinder

10
The PageGather Algorithm
  1. The PageGather algorithm is a statistical cluster
    mining algorithm
  2. Clustering algorithms take a collection of
    objects as their input and produce a partition of
    the collection
  3. Cluster mining is a variation on traditional
    clustering that may place a single object in
    multiple overlapping clusters
  4. PageGather uses cluster mining to find
    collections of related pages at a website

11
Description of PageGather
  1. Process the access log into visits
  2. Compute the co-occurrence frequencies between
    pages and create a similarity matrix
  3. Create the graph corresponding to the matrix, and
    find maximal cliques (or connected components) in
    the graph
  4. Rank the clusters found, and choose which to
    output
  5. Eliminate overlap among the clusters
  6. Present it to the webmaster for evaluation

12
Experimental Method
  • Experiments draw on data collected from three
    distinct collections of web pages
  • The effectiveness of index page synthesis is
    based on three factors
  • Impact How many people use the new pages and how
    often
  • Benefit How much effort is saved by those who
    visit the pages
  • Recall How much information sought by the user
    was actually found

13
Time Complexity
  • What is the running time of PageGather?
  • Let L be the number of page views in the log and
    N the number of pages at the site
  • Step (1) requires O(L log L) time page views
    must be sorted by origin and time
  • Step (2) requires O(L N2) time must process
    the log and create a matrix of size O(N2)
  • In step (3) we may find either connected
    components (linear in the size of the graph) or
    cliques (exponential in general, but since size
    of discovered clusters is bound to k, this step
    is a polynomial of degree k)

14
Comparison with related algorithms
  • PageGather significantly outperforms other
    statistical clustering algorithms, but is not as
    well as human-authored clusters

15
The IndexFinder Algorithm
  • PageGather relies on a statistical approach to
    discovering candidate link sets its candidates
    do not correspond precisely to intuitive
    concepts, whereas human-authored index pages do
  • An algorithm that finds only candidate link sets
    that are conceptually coherent is desired
  • IndexFinder is a key extension to PageGather that
    guarantees that only sets corresponding to topics
    are generated

16
IndexFinder - Problem Definition
  • Given
  • A data collection D (e.g., a set of pages at a
    web site)
  • A pairwise similarity measure m defined over D
    (e.g., page co-occurrence frequencies derived
    from access logs)
  • A conceptual language L for describing elements
    of D (e.g., conjunctions of descriptive features)
  • A description in L of each object in D
  • Output all subsets c of D such that
  • c is highly cohesive with respect to m (e.g., the
    average pairwise similarity of the objects in c
    exceeds some threshold)
  • c corresponds to a concept expressible in L

17
IndexFinder Previous work
  • Relevant previous work of three types
  • Statistical approaches (e.g., PageGather) useful
    for finding cohesive sets in large collections of
    data, but make no attempt to ensure that their
    results correspond to an intuitive concept
  • Conceptual clustering algorithms (e.g., Fishers
    COBWEB 4) partition a data collection into
    clusters of similar objects. Moreover, objects
    are described in a conceptual descriptive
    language
  • Concept learning algorithms aim to find a
    conceptual description of a set of objects from a
    data collection (note that data needs to be
    classified in advance)

18
Experiments
  • Experiments show that IndexFinder outperforms
    both PageGather and COBWEB and is close to the
    performance of the human-authored index pages

19
Implementations
  • And More
  • Use both users path and model to guess what
    pages they are interested in seeing e.g., AVANTI
    Project 1
  • Automatic user categorization
  • Hybrid approach
  • Footprints 2 uses the metaphor of travellers
    creating footpaths in the grass over time
  • Using meta-information e.g., XML, Apples
    Meta-Content Format, STRUDEL 3
  • Client-side customization

20
Conclusions (1)
  • PageGather and IndexFinder outperform traditional
    methods including the Apriori data mining
    algorithm, standard clustering algorithms and the
    COBWEB conceptual clustering algorithm
  • PageGather and IndexFinder are instances of
    novel, domain-independent approaches to
    unsupervised data mining
  • Extensions and applications to these approaches
    outside the domain of adaptive websites can be
    found

21
Conclusions (2)
  • Future work may focus on the automatic placement
    of new index pages at the website
  • Automatically suggesting names for the new pages,
    and deciding where in the site they should be
    located
  • Index page synthesis itself is a step towards the
    long-term goal of change in view adaptive
    websites that automatically suggest
    re-organisations of their contents based on
    visitor access patterns

22
Related Work
  • By the authors
  • Mainly updates to the original paper (most recent
    one in 2001)
  • By others
  • Adaplix 5 A system that extends HTML by
    introducing conditional statements and an
    inductive logic programming component to learn
    the user's browsing preferences
  • WebWatcher 6 A tour guide of the web. It
    accompanies the user from page to page,
    highlighting hyperlinks that it believes will be
    of interest

23
Summary
  • We have covered
  • Adaptive Websites
  • The Index Page Synthesis Use Case
  • The PageGather Algorithm
  • The IndexFinder Algorithm
  • Implementations
  • Related Work

24
Any Questions?
25
Resources
  • 1 J. Fink, A Kobsa, and A. Nill. User-oriented
    Adaptivity and Adaptability in the AVANTI
    Project. In Designing for the Web Empirical
    Studies, Microsoft Usability Group, Redmond
    (WA)., 1996.
  • 2 A. Wexelblat and P. Maes. Footprints
    History-rich web browsing. In Proc. Conf.
    Computer-Assisted Information Retrieval (RIAO),
    pages 75-84, 1997.
  • 3 M. Fernandez, D. Florescu, J. Kang, A. Levy,
    and D. Suciu. System Demonstration - Strudel A
    Web-site Management System. In ACM SIGMOD
    Conference on Management of Data, 1997.
  • 4 D. Fisher. Knowledge Acquisition Via
    Incremental Conceptual Clustering. Machine
    Learning, 2139-172, 1987
  • 5 Nico Jacobs. Adaplix Towards Adaptive
    Websites. In P. De Bra and L. Hardman, editors,
    Proceedings van de Informatiewetenschap'99
    Conferentie, pages 22--28. Eindhoven University
    of Technology, November 1999
  • 6 URL http//www.cs.cmu.edu/webwatcher,
    accessed on 22 November 2005
Write a Comment
User Comments (0)
About PowerShow.com