Towards Adaptive Websites: A Conceptual Framework and Case Study, by Mike Perkowitz and Oren Etzioni

1
Towards Adaptive Websites: A Conceptual
Framework and Case Study
Mike Perkowitz, Oren Etzioni

Presented by
Ben Childs, James Hunter,
Sergey Petrov, Issam Souilah

Department of Computer Science, November 2005
2
Agenda

Introduction
Paper Outline
Motivation
What are Adaptive Websites?
Approaches to Adaptation
The Index Page Synthesis Use Case
The PageGather Algorithm
Description of The Algorithm
Experimental Method
Time Complexity
Comparison with Related Algorithms
The IndexFinder Algorithm
Conceptual Cluster Mining
Experiments
Implementations
Conclusions
Related Work
Summary
Resources

3
Paper Outline

Published 1999 (latest version 2001)
Explores adaptive web sites
Describes the design space of adaptive web sites
Considers a case study: index page synthesis
Presents two algorithms:
PageGather, a statistical cluster mining
algorithm
IndexFinder, a conceptual cluster mining algorithm

4
Motivation

Designing a complex web site so that it readily
yields its information is tricky, because:
Different visitors have distinct goals
The same user may seek different information at
different times
Many sites outgrow their original design,
accumulating links and pages in unlikely places
A site may be designed for a particular use, but
may be used in unanticipated ways in practice
Too often, web sites are fossils cast in HTML,
while web navigation is dynamic, time-dependent,
and idiosyncratic

5
What Are Adaptive Websites?

Adaptive websites are sites that automatically
improve their organization and presentation by
learning from visitor access patterns
They mine the data buried in web server logs to
produce more easily navigable websites
To demonstrate the feasibility of adaptive
websites, the index page synthesis use case is
considered

6
Approaches to Adaptation

The aim is to make a website better, so we need a
clear quality measure
The quality measure is a function of variables
such as:
How often users find what they are looking for
How many clicks users have to make to get to
their goal
How much time users spend reading link text and
scrolling through pages
Two approaches to adaptation:
Content-based: organizes and presents pages
based on their content
Access-based: uses the way past visitors have
interacted with the site to guide how information
is structured
Content-based and access-based adaptations are
complementary and may be used together

7
The Index Page Synthesis Case Study (1)

Page synthesis is the automatic creation of web
pages
An index page is a page consisting of links to a
set of pages that cover a particular topic
Index page synthesis problem: given a web site
and a visitor access log, create new index pages
containing collections of links to related but
currently unlinked pages

8
The Index Page Synthesis Case Study (2)

The Index Page Synthesis Problem
What are the contents (i.e. hyperlinks) of the
index page?
How are the hyperlinks on the page ordered?
How are the hyperlinks labeled?
What is the title of the page? Does it correspond
to a coherent concept?
Is it appropriate to add the page to the site? If
so, where?

9
Solutions

The authors of the paper propose two algorithms:
PageGather
IndexFinder

10
The PageGather Algorithm

The PageGather algorithm is a statistical cluster
mining algorithm
Clustering algorithms take a collection of
objects as their input and produce a partition of
the collection
Cluster mining is a variation on traditional
clustering that may place a single object in
multiple overlapping clusters
PageGather uses cluster mining to find
collections of related pages at a website

11
Description of PageGather

(1) Process the access log into visits
(2) Compute the co-occurrence frequencies between
pages and create a similarity matrix
(3) Create the graph corresponding to the matrix,
and find maximal cliques (or connected components)
in the graph
(4) Rank the clusters found, and choose which to
output
(5) Eliminate overlap among the clusters
(6) Present the resulting clusters to the
webmaster for evaluation
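
The first three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the log format (visitor, timestamp, page), the 600-second idle timeout, the similarity measure (the smaller of the two conditional co-occurrence probabilities) and the 0.6 threshold are all choices made here for the example. It uses connected components rather than cliques.

```python
from collections import defaultdict
from itertools import combinations

def split_into_visits(log, timeout=600):
    """Step (1): group (visitor, timestamp, page) records into visits.
    A new visit starts when a visitor has been idle longer than `timeout`
    seconds (the timeout value is an assumption of this sketch)."""
    visits = []
    last_seen = {}  # visitor -> (timestamp of last view, index of current visit)
    for visitor, ts, page in sorted(log):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev[0] > timeout:
            visits.append(set())
            idx = len(visits) - 1
        else:
            idx = prev[1]
        visits[idx].add(page)
        last_seen[visitor] = (ts, idx)
    return visits

def cooccurrence(visits):
    """Step (2): similarity of p and q, taken here as min(P(p|q), P(q|p))."""
    page_count = defaultdict(int)
    pair_count = defaultdict(int)
    for visit in visits:
        for p in visit:
            page_count[p] += 1
        for p, q in combinations(sorted(visit), 2):
            pair_count[(p, q)] += 1
    return {
        pair: min(n / page_count[pair[0]], n / page_count[pair[1]])
        for pair, n in pair_count.items()
    }

def connected_components(similarity, threshold):
    """Step (3): link pages whose similarity exceeds the threshold and
    return the connected components of the resulting graph."""
    graph = defaultdict(set)
    for (p, q), s in similarity.items():
        if s > threshold:
            graph[p].add(q)
            graph[q].add(p)
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Hypothetical log: visitor "a" returns after a long pause, starting a new visit.
log = [
    ("a", 0, "/guitars"), ("a", 30, "/amps"), ("a", 60, "/strings"),
    ("b", 10, "/guitars"), ("b", 50, "/amps"),
    ("c", 5, "/drums"), ("c", 40, "/cymbals"),
    ("a", 5000, "/drums"), ("a", 5030, "/cymbals"),
]
visits = split_into_visits(log)
clusters = connected_components(cooccurrence(visits), threshold=0.6)
```

With this toy log, "/strings" co-occurs with the other pages in only one visit, so it falls below the threshold and is left out of every cluster. Ranking, overlap elimination and presentation to the webmaster (steps 4 to 6) are not shown.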

12
Experimental Method

Experiments draw on data collected from three
distinct collections of web pages
The effectiveness of index page synthesis is
judged on three factors:
Impact: how many people use the new pages and how
often
Benefit: how much effort is saved by those who
visit the pages
Recall: how much of the information sought by the
user was actually found

13
Time Complexity

What is the running time of PageGather?
Let L be the number of page views in the log and
N the number of pages at the site
Step (1) requires O(L log L) time: page views
must be sorted by origin and time
Step (2) requires O(L + N^2) time: we must process
the log and create a matrix of size O(N^2)
In step (3) we may find either connected
components (linear in the size of the graph) or
cliques (exponential in general, but since the
size of discovered clusters is bounded by k, this
step is polynomial of degree k)
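
To see why the size bound k makes clique finding polynomial, consider a brute-force sketch: there are only O(N^k) subsets of at most k nodes, and each can be checked for being a clique in O(k^2) time. The function names and the toy graph below are hypothetical, and this is not necessarily how the paper's implementation searches for cliques.

```python
from itertools import combinations

def cliques_up_to_k(graph, k):
    """Return all cliques of size 2..k in an undirected graph given as
    {node: set of neighbours}. Every subset of at most k nodes is tested,
    so the work grows as O(N^k) for N nodes."""
    nodes = sorted(graph)
    found = []
    for size in range(2, k + 1):
        for subset in combinations(nodes, size):
            if all(b in graph[a] for a, b in combinations(subset, 2)):
                found.append(set(subset))
    return found

def maximal_only(cliques):
    """Keep only cliques that are not strictly contained in another found clique."""
    return [c for c in cliques if not any(c < other for other in cliques)]

# Toy graph: a, b, c are mutually linked; d is linked only to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
result = maximal_only(cliques_up_to_k(graph, k=3))
```

Note that "maximal" here means maximal among cliques of size at most k, which is all the bounded search can report.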

14
Comparison with related algorithms

PageGather significantly outperforms other
statistical clustering algorithms, but does not
perform as well as human-authored clusters

15
The IndexFinder Algorithm

PageGather relies on a statistical approach to
discovering candidate link sets; its candidates
do not correspond precisely to intuitive
concepts, whereas human-authored index pages do
An algorithm that finds only candidate link sets
that are conceptually coherent is desired
IndexFinder is a key extension to PageGather that
guarantees that only sets corresponding to topics
are generated

16
IndexFinder - Problem Definition

Given:
A data collection D (e.g., a set of pages at a
web site)
A pairwise similarity measure m defined over D
(e.g., page co-occurrence frequencies derived
from access logs)
A conceptual language L for describing elements
of D (e.g., conjunctions of descriptive features)
A description in L of each object in D
Output: all subsets c of D such that:
c is highly cohesive with respect to m (e.g., the
average pairwise similarity of the objects in c
exceeds some threshold)
c corresponds to a concept expressible in L
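
The problem definition above can be restated as a brute-force search. The sketch below is only an illustration of the definition, not the IndexFinder algorithm itself: it assumes L is the language of conjunctions of features, caps conjunctions at `max_features` terms, and uses average pairwise similarity against a threshold as the cohesion test. All names and data are hypothetical.

```python
from itertools import combinations

def average_similarity(members, similarity):
    """Average pairwise similarity m over a set of objects.
    `similarity` maps sorted pairs (a, b) to a value; missing pairs count as 0."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    return sum(similarity.get(pair, 0.0) for pair in pairs) / len(pairs)

def cohesive_concepts(descriptions, similarity, threshold, max_features=2):
    """Return {subset c of D: conjunction describing c} for every subset that
    (a) is exactly the set of objects matching some conjunction of features and
    (b) has average pairwise similarity above `threshold`."""
    all_features = sorted(set().union(*descriptions.values()))
    results = {}
    for size in range(1, max_features + 1):
        for concept in combinations(all_features, size):
            members = frozenset(
                obj for obj, feats in descriptions.items() if set(concept) <= feats
            )
            if len(members) >= 2 and average_similarity(members, similarity) > threshold:
                results.setdefault(members, concept)  # keep the shortest description
    return results

# Hypothetical data collection D with a feature description of each page.
descriptions = {
    "/guitars/electric": {"guitar", "electric"},
    "/guitars/acoustic": {"guitar", "acoustic"},
    "/amps/tube": {"amp", "electric"},
}
# Hypothetical co-occurrence similarities, keyed by sorted page pairs.
similarity = {
    ("/guitars/acoustic", "/guitars/electric"): 0.8,
    ("/amps/tube", "/guitars/electric"): 0.3,
}
results = cohesive_concepts(descriptions, similarity, threshold=0.5)
```

In this example the concept "electric" describes two pages, but they are rarely visited together, so only the "guitar" pages qualify as a candidate index page.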

17
IndexFinder - Previous Work

Relevant previous work is of three types:
Statistical approaches (e.g., PageGather): useful
for finding cohesive sets in large collections of
data, but make no attempt to ensure that their
results correspond to an intuitive concept
Conceptual clustering algorithms (e.g., Fisher's
COBWEB [4]): partition a data collection into
clusters of similar objects. Moreover, objects
are described in a conceptual descriptive
language
Concept learning algorithms: aim to find a
conceptual description of a set of objects from a
data collection (note that the data needs to be
classified in advance)

18
Experiments

Experiments show that IndexFinder outperforms
both PageGather and COBWEB and is close to the
performance of the human-authored index pages

19
Implementations

And More
Use both the user's path and a user model to
guess which pages they are interested in seeing,
e.g., the AVANTI Project [1]
Automatic user categorization
Hybrid approach
Footprints [2] uses the metaphor of travellers
creating footpaths in the grass over time
Using meta-information, e.g., XML, Apple's
Meta-Content Format, STRUDEL [3]
Client-side customization

20
Conclusions (1)

PageGather and IndexFinder outperform traditional
methods including the Apriori data mining
algorithm, standard clustering algorithms and the
COBWEB conceptual clustering algorithm
PageGather and IndexFinder are instances of
novel, domain-independent approaches to
unsupervised data mining
These approaches can be extended and applied
outside the domain of adaptive websites

21
Conclusions (2)

Future work may focus on the automatic placement
of new index pages at the website
Automatically suggesting names for the new pages,
and deciding where in the site they should be
located
Index page synthesis itself is a step towards the
long-term goal of change in view: adaptive
websites that automatically suggest
re-organisations of their contents based on
visitor access patterns

22
Related Work

By the authors
Mainly updates to the original paper (most recent
one in 2001)
By others
Adaplix [5]: a system that extends HTML by
introducing conditional statements and an
inductive logic programming component to learn
the user's browsing preferences
WebWatcher [6]: a tour guide of the web. It
accompanies the user from page to page,
highlighting hyperlinks that it believes will be
of interest

23
Summary

We have covered
Adaptive Websites
The Index Page Synthesis Use Case
The PageGather Algorithm
The IndexFinder Algorithm
Implementations
Related Work

24
Any Questions?
25
Resources

[1] J. Fink, A. Kobsa, and A. Nill. User-oriented
Adaptivity and Adaptability in the AVANTI
Project. In Designing for the Web: Empirical
Studies, Microsoft Usability Group, Redmond
(WA), 1996.
[2] A. Wexelblat and P. Maes. Footprints:
History-rich web browsing. In Proc. Conf.
Computer-Assisted Information Retrieval (RIAO),
pages 75-84, 1997.
[3] M. Fernandez, D. Florescu, J. Kang, A. Levy,
and D. Suciu. System Demonstration - Strudel: A
Web-site Management System. In ACM SIGMOD
Conference on Management of Data, 1997.
[4] D. Fisher. Knowledge Acquisition Via
Incremental Conceptual Clustering. Machine
Learning, 2:139-172, 1987.
[5] Nico Jacobs. Adaplix: Towards Adaptive
Websites. In P. De Bra and L. Hardman, editors,
Proceedings van de Informatiewetenschap'99
Conferentie, pages 22-28. Eindhoven University
of Technology, November 1999.
[6] URL: http://www.cs.cmu.edu/webwatcher,
accessed on 22 November 2005.