Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails

Slides: 47
Provided by: CFI9
Transcript and Presenter's Notes


1
Memex: A Browsing Assistant for Collaborative
Archiving and Mining of Surf Trails
  • Soumen Chakrabarti, Sandeep Srivastava, Mallela
    Subramanyam, Mitul Tiwari
  • Indian Institute of Technology Bombay

2
Sources of Web information
  • Sources already exploited
      • Text on pages (keyword search)
      • Links between pages (popularity rating)
      • Topic taxonomies (query expansion)
  • Sources not exploited enough yet
      • Public surfing history
      • Public bookmarks
  • Collaboration is central to hypertext
  • Lack of trust limits collaboration on Web

3
Our goals
  • Infrastructure to support spontaneous formation
    of topic-based collaborative Web communities
  • Browsing assistant client
  • Community server
  • Mining algorithms for personal and community
    level topic management and collaborative resource
    discovery
  • Extensible API for plugging in additional
    hypertext analysis tools

4
1: Create a Memex account (password sent by email)
3: Allow the Memex client to attach to your Web browser
4: Log on to the ...
5
Function tabs
Memex client applet attaches to browser
Privacy choice
6
Preparing to import initial bookmarks
7
Bookmarks imported
8
For Memex to suggest an initial topic
organization, select all bookmarks
9
and send them to the clustering tab
10
Switch to the clustering tab
URLs to be clustered appear here
11
Submit the URLs to the server-side Memex
clustering demon
12
Check later if the server has completed the
clustering task
13
Two top-level clusters about software and music
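
The slides do not say which clustering algorithm the server-side demon runs. As a minimal sketch of the idea only, the following groups bookmarks by the words in their page text using greedy agglomerative merging on cosine similarity. The URLs, the page text, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased term-frequency vector for a snippet of page text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_bookmarks(bookmarks, k):
    """Merge the most similar pair of clusters until only k clusters remain.

    bookmarks maps URL -> page text (hypothetical input, not the Memex format).
    Returns a list of sorted URL lists.
    """
    clusters = [([url], bag_of_words(text)) for url, text in bookmarks.items()]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # The merged cluster keeps the summed term counts of both members.
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [sorted(urls) for urls, _ in clusters]

# Hypothetical bookmarks about software and music.
bookmarks = {
    "http://example.org/compiler": "open source compiler software download",
    "http://example.org/editor": "text editor software download",
    "http://example.org/jazz": "jazz music concerts recordings",
    "http://example.org/opera": "opera music recordings",
}
clusters = cluster_bookmarks(bookmarks, k=2)
```

With this toy input the two software pages end up in one cluster and the two music pages in the other, which mirrors the two top-level clusters shown on the slide.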
14
Expanding the software cluster to study it
in more detail
15
User can freely reorganize URL placement
using cut-and-paste
18
Moving an entire folder from the cluster tab
19
to the folder tab together with example URLs
21
Folder names can be edited to taste; this also
gives Memex additional clues about the folders'
contents
22
New folders can be created to hold clusters found
in the cluster tab
24
A topic hierarchy which is too detailed for the
user can be flattened
26
Groups of closely related URLs can be moved
back to folders in the folder tab
28
Memex helps the user derive a starting topic
hierarchy from unstructured bookmarks
29
The user then continues browsing in multiple
sessions. Relevant pages found by other members
of the community and made public are
available for collaborative surfing
30
If permission is granted, the Memex applet
monitors the trail that the surfer follows
and uploads it to the server for further analysis
and mining
32
Such surf trails together with page contents are
valuable inputs to the Memex server-side hypertext
mining and resource discovery demons
33
A "?" indicates that Memex is not sure about the
folder assignment. Users can easily correct
mistakes, and each correction becomes additional
training data.
In the background, the Memex classifier finds the
most suitable folder for each history item.
History is never deleted (disk is cheap). When the
user refreshes the view, surf history from others
and from herself appears categorized into the
user's familiar topic tree.
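
The slides do not name the classifier. As a rough sketch of how a folder assignment with a "?" flag could work, the following uses a multinomial naive Bayes model (an assumption) and marks an assignment as unsure when the best posterior falls below a threshold. The class name, the `min_confidence` parameter, and the training text are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    """Split text into lower-cased alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class FolderClassifier:
    """Multinomial naive Bayes over folder examples (assumed model, not Memex's)."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # folder -> term frequencies
        self.doc_counts = Counter()              # folder -> number of training pages
        self.vocab = set()

    def train(self, folder, text):
        """Add one page to a folder; user corrections can be fed in the same way."""
        words = tokens(text)
        self.term_counts[folder].update(words)
        self.doc_counts[folder] += 1
        self.vocab.update(words)

    def posteriors(self, text):
        """Return {folder: probability} from Laplace-smoothed log-likelihoods."""
        words = tokens(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, counts in self.term_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[folder] / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            log_scores[folder] = score
        top = max(log_scores.values())
        scaled = {f: math.exp(s - top) for f, s in log_scores.items()}
        z = sum(scaled.values())
        return {f: v / z for f, v in scaled.items()}

    def assign(self, text, min_confidence=0.8):
        """Return (folder, unsure); unsure=True corresponds to showing a '?'."""
        post = self.posteriors(text)
        folder = max(post, key=post.get)
        return folder, post[folder] < min_confidence

clf = FolderClassifier()
clf.train("Software", "compiler linker debugger source code")
clf.train("Music", "jazz opera concert recordings")
confident = clf.assign("debugger for compiler source")
doubtful = clf.assign("weather forecast")
```

Here the first page shares several words with the Software folder and is assigned without a flag, while the second matches neither folder well and comes back flagged as unsure. A user correction would simply be another call to `train` with the right folder.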
34
Automatic collaborative classification also lets
users return to a topic-restricted surfing
context quickly, and replay the last few
surfing actions within that topic of interest.
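
A topic-restricted replay can be pictured as a filter over the classified history. The sketch below assumes each visit already carries a folder label; the `Visit` record and its fields are hypothetical and not taken from Memex.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Visit:
    time: int     # hypothetical visit timestamp, in seconds
    url: str
    folder: str   # topic folder assigned to the page

def replay(history, folder, last_n=3):
    """Return the URLs of the last_n visits within one folder, oldest first."""
    in_topic = sorted((v for v in history if v.folder == folder), key=lambda v: v.time)
    return [v.url for v in in_topic[-last_n:]]

history = [
    Visit(10, "http://example.org/jazz", "Music"),
    Visit(20, "http://example.org/compiler", "Software"),
    Visit(30, "http://example.org/opera", "Music"),
    Visit(40, "http://example.org/editor", "Software"),
    Visit(50, "http://example.org/blues", "Music"),
]
music_trail = replay(history, "Music", last_n=2)
```

The result skips the interleaved Software visits, which is the difference from a browser's single linear history list.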
35
Personalized topic-based history management is
far superior to the one-dimensional history
list provided by popular browsers
36
Users can switch topics with a single click, and
browsing is not limited by the linear back and
forward paradigm supported by browsers.
38
A flexible interactive search lets the user
locate any page ever visited from anywhere using
this account, combining content with popularity,
site selections and timeliness
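
The slides list the signals that the search combines but not how they are weighted. The sketch below is one possible combination under stated assumptions: content is a term-frequency match, popularity is a log of the visit count, timeliness is an exponential decay, and site selection is a hard filter. The `Page` fields, the formula, and the `half_life_days` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class Page:
    url: str
    text: str
    visits: int               # hypothetical count of visits by community members
    days_since_visit: float   # hypothetical age of the most recent visit

def score(page, query, half_life_days=30.0):
    """Combine content match, popularity and timeliness into one number."""
    terms = query.lower().split()
    words = page.text.lower().split()
    content = sum(words.count(t) for t in terms) / (len(words) or 1)
    popularity = math.log1p(page.visits)
    timeliness = 0.5 ** (page.days_since_visit / half_life_days)
    return content * (1.0 + popularity) * timeliness

def search(pages, query, sites=None):
    """Return matching URLs, best first; sites optionally restricts host names."""
    hits = []
    for page in pages:
        if sites is not None and urlparse(page.url).hostname not in sites:
            continue
        s = score(page, query)
        if s > 0:
            hits.append((s, page.url))
    return [url for _, url in sorted(hits, reverse=True)]

pages = [
    Page("http://a.example/jazz", "jazz concert listings", visits=10, days_since_visit=1),
    Page("http://b.example/jazz", "jazz concert listings", visits=1, days_since_visit=60),
    Page("http://a.example/news", "daily news", visits=50, days_since_visit=0),
]
all_hits = search(pages, "jazz")
site_hits = search(pages, "jazz", sites={"b.example"})
```

Both jazz pages have identical text, so the more popular and more recently visited one ranks first; the news page never matches regardless of its popularity because its content score is zero.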
40
Close integration of the Memex client with
the browser is non-trivial to implement but adds
greatly to comfort and ease of use
41
Memex system diagram
[Diagram. Browser side: Browser, Client JAR, Running client applet. Server side: Memex server, Event-handler servlets, Mining demons. Other labels: Visit, Attach, Download, Folder, Context, Search, Recommendation, Classification, Clustering, Taxonomy synthesis, Resource discovery, Archive, Relational metadata, Text index, Topic models]
Memex client-server protocol and workload sharing
negotiations
42
Document workflow
[Diagram. Labels: Browser, Memex client, Page visit and bookmarking events logged, NODE table, Crawler, Push new version, Per-document version queue, Pop and discard old version, Demon Registry (Search indexer, Classifier service, Clustering service), Garbage collector]
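
The workflow diagram only names its components, so the following is one possible reading rather than the Memex design: a crawler pushes each fetched version onto a per-document queue, registered demons record the newest version they have processed, and a garbage collector pops versions that every demon has already moved past. All class and method names are hypothetical.

```python
from collections import deque

class VersionQueue:
    """Per-document queue of fetched versions, oldest on the left."""

    def __init__(self):
        self.versions = deque()   # entries are (version_number, content)
        self.next_number = 0

    def push(self, content):
        """Append a newly crawled version of the document."""
        self.versions.append((self.next_number, content))
        self.next_number += 1

class DemonRegistry:
    """Tracks, per demon, the last version of each document it has processed."""

    def __init__(self):
        self.progress = {}   # demon name -> {url: version number}

    def register(self, name):
        self.progress[name] = {}

    def process(self, name, url, queue):
        """Let one demon consume the newest version of a document."""
        number, content = queue.versions[-1]
        self.progress[name][url] = number
        return content

    def collect_garbage(self, url, queue):
        """Pop versions older than what every registered demon has processed."""
        done = min((seen.get(url, -1) for seen in self.progress.values()), default=-1)
        discarded = 0
        while len(queue.versions) > 1 and queue.versions[0][0] < done:
            queue.versions.popleft()
            discarded += 1
        return discarded

queue = VersionQueue()
registry = DemonRegistry()
for name in ("search indexer", "classifier service"):
    registry.register(name)
url = "http://example.org/page"
queue.push("first crawl")
registry.process("search indexer", url, queue)
registry.process("classifier service", url, queue)
queue.push("second crawl")
registry.process("search indexer", url, queue)
dropped_early = registry.collect_garbage(url, queue)  # classifier is still on version 0
registry.process("classifier service", url, queue)
dropped_late = registry.collect_garbage(url, queue)   # both demons have seen version 1
```

In this reading an old version survives as long as the slowest demon may still need it, and it is discarded once every demon has caught up.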
43
Autonomous topic organization
  • Bookmarks often collected into topics
  • Surfers use personal topic organization
  • One-size-fits-all taxonomy inadequate
      • Many topics over-developed for most of us
      • http://dmoz.org/Sports/Hockey/Underwater_Hockey/
      • But deeper interests often underdeveloped
  • Structure reorganization also desirable
  • Best taxonomy depends on community behavior as
    well as page content

44
Autonomy and collaboration
  • Personalization ≠ picking Yahoo nodes
  • Complex relations between topics
  • Need simplest common ground
  • Coalesce similar topics where possible
    without sacrificing individual taste

[Figure: topic trees of User1, User2, User3 and Yahoo, with nodes labeled Sports, Cycling, Hiking, Biz, Shops and Bikeshops; relations marked between trees: Subsumption, Tree inversion]
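
The slide does not say how Memex detects that two users' folders are similar or that one subsumes the other. One simple way to illustrate the two relations is to compare the URL sets of the folders: high overlap suggests coalescing, and near-complete containment suggests subsumption. The thresholds, function names, and URLs below are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two URL sets relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def containment(a, b):
    """Fraction of the URLs in a that also appear in b."""
    return len(a & b) / len(a) if a else 0.0

def relate(a, b, same=0.6, inside=0.8):
    """Classify the relation between two folders given as URL sets."""
    if jaccard(a, b) >= same:
        return "coalesce"
    if containment(a, b) >= inside:
        return "a under b"
    if containment(b, a) >= inside:
        return "b under a"
    return "unrelated"

# Hypothetical folders from different users.
user1_cycling = {"c1", "c2", "c3"}
user2_sports = {"c1", "c2", "c3", "h1", "h2", "h3", "h4"}
user2_bikeshops = {"s1", "s2", "s3", "s4"}
user3_bikeshops = {"s1", "s2", "s3"}

cycling_vs_sports = relate(user1_cycling, user2_sports)
shops_vs_shops = relate(user3_bikeshops, user2_bikeshops)
```

With these sets, one user's Cycling folder sits under another user's Sports folder, while the two Bikeshops folders overlap enough to be coalesced. Neither user's own folder is changed, which is in the spirit of finding common ground without sacrificing individual taste.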
45
Taxonomy synthesis example
[Figure: example folders and sites. Folder labels: Media, Broadcasting, Entertainment, Studios. Sites: kpfa.org, bbc.co.uk, kron.com, channel4.com, kcbs.com, foxmovies.com, miramax.com, lucasfilms.com]
  • Generating themes makes map simpler
  • But distorts contents of original folders
  • Joint optimization gives best themes

46
Summary and project status
  • Collaborative resource discovery and topic
    management system
  • Testbed for hypertext mining research
  • Signed Java2 client
      • Netscape 4.5 available
      • IE5 planned
  • Server for Unix and Windows
      • IBM UDB, Berkeley DB, servlets
      • Non-trivial to install and manage
      • Simple-to-use RPMs being planned
  • http://www.cse.iitb.ernet.in/soumen