Architecture for graphical maps of Web contents

1
Architecture for graphical maps of Web contents
  • Krzysztof Ciesielski, Michal Draminski,
    Mieczyslaw Klopotek, Mariusz Kujawiak, Slawomir
    Wierzchon
  • Institute of Computer Science, PAS, Warsaw
  • University of Podlasie, Siedlce
  • Bialystok University of Technology

2
Agenda
  • Motivation
  • Architecture
  • Map interface
  • Map creation
  • Map clustering
  • Execution time of map creation
  • Convergence of map creation
  • Future direction

3
Motivation
  • The Web and intranets are becoming increasingly
    content-rich.
  • A good way of presenting massive document sets in
    an understandable form will be crucial in the near
    future.
  • The BEATCA project envisages a user-friendly
    content presentation for moderate-size document
    collections (with millions of documents).

4
Our approach
  • The presentation method is based on WebSOM's
    map idea, enriched with novel methods of
    document analysis, clustering and visualization.
  • A special architecture has been developed to
    enable experiments with different variants of the
    map creation algorithm.
  • Our research aims at creating a full-fledged
    search engine (working name Beatca) for small
    collections of documents, capable of presenting
    on-line replies to queries in graphical form on a
    document map.

5
Architecture
  • We follow the general architecture of search
    engines.
  • Documents are prepared for retrieval by an
    indexer, which turns the HTML (or other)
    representation of a document into a vector-space
    model representation.
  • The map creator is then applied, turning the
    vector-space representation into a form
    appropriate for on-the-fly map generation.
  • Maps are used by the query processor when
    responding to users' queries.
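
The indexer step above can be illustrated with a minimal sketch. It assumes TF-IDF weighting and a crude regular-expression tag stripper; neither is confirmed by the slides, and the names tokenize and index_documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(html):
    """Crudely strip tags and split the text into lowercase word tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())


def index_documents(docs):
    """Turn raw documents into sparse vectors (dict: term -> TF-IDF weight).

    TF-IDF is an assumption made for this sketch, not the project's
    documented weighting scheme.
    """
    term_counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    n = len(docs)
    return [
        {term: tf * math.log(n / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


docs = ["<p>maps of web documents</p>", "<p>search engine for documents</p>"]
vectors = index_documents(docs)
```

A term that occurs in every document (here "documents") gets weight zero, so it does not influence where a document lands on the map.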

6
Architecture
(Architecture diagram. Components shown: Base Registry, Search Engine, Robot, HT Base, Indexer, Vector Base, Optimizer, Mapper, Map.)
7
User interface
  • Search results are presented on a document map
  • The map can take one of two forms:
      • the traditional flat map
      • the rotating torus

8
(No Transcript)
9
Rotating torus representation of the map
10
How the maps are created
  • A modified WebSOM method is used
  • The modification is based on our observation of a
    radical reduction of document vector variation
  • Multi-level maps
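
For orientation, here is a minimal sketch of plain self-organizing map training on a grid that wraps around like the torus view. It is not the modified WebSOM method mentioned above; the linear decay of the learning rate and radius, the Gaussian neighbourhood, and all function names are assumptions.

```python
import math
import random


def torus_distance(a, b, rows, cols):
    """Grid distance between cells a and b when the map wraps around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))


def find_winner(weights, x):
    """Global winner search: the cell whose reference vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(data, rows, cols, epochs=20, alpha0=0.5, radius0=2.0, seed=0):
    """Train reference vectors; alpha and radius shrink linearly (assumed)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        alpha = alpha0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in data:
            winner = find_winner(weights, x)
            for cell, w in weights.items():
                d = torus_distance(cell, winner, rows, cols)
                if d <= radius:
                    # Gaussian neighbourhood: nearby cells move more.
                    h = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += alpha * h * (x[i] - w[i])
    return weights


data = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
weights = train_som(data, rows=4, cols=4)
```

On a torus the cells on opposite edges are neighbours, so no document cluster is pushed into a corner of the map.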

11
A map for 20 newsgroups
12
A detailed map for the 4 SyskillWebert document groups
13
A high-level map for the 4 SyskillWebert document groups
14
Clustering documents into groups
  • A fuzzy ISODATA method is used
  • Entropy based
  • Initialisation with a minimum-weight spanning tree
  • Clustered documents are labeled by weighted
    centroids of cell reference vectors modified with
    entropy
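
Fuzzy ISODATA is the algorithm also known as fuzzy c-means. The sketch below shows its two alternating update steps in their textbook form, assuming Euclidean distance and fuzzifier m = 2. It omits the entropy-based modification and the spanning-tree initialisation listed above; the initial centres are simply passed in, and the function names are hypothetical.

```python
import math


def memberships(points, centres, m=2.0):
    """u[i][k]: degree to which point i belongs to cluster k (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point sitting on a centre does not divide by 0.
        d = [max(math.dist(p, c), 1e-12) for c in centres]
        u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return u


def update_centres(points, u, m=2.0):
    """Each centre becomes the mean of all points weighted by u ** m."""
    dim = len(points[0])
    centres = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centres.append(
            [sum(wi * p[i] for wi, p in zip(w, points)) / total for i in range(dim)]
        )
    return centres


def fuzzy_isodata(points, centres, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(points, centres, m)
        centres = update_centres(points, u, m)
    return centres, memberships(points, centres, m)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centres, u = fuzzy_isodata(points, centres=[[0.0, 0.0], [10.0, 10.0]])
```

Unlike hard clustering, every document keeps a graded membership in every cluster, which is useful when a document sits between two topics.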

15
Approximate clustering using a minimal spanning tree for 5 newsgroups
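
One common way to obtain an approximate clustering from a minimal spanning tree is to build the tree and cut its heaviest edges. The sketch below does that with Prim's algorithm and a union-find structure; the slides do not say that this is the exact procedure used, and the names minimum_spanning_tree and mst_clusters are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the tree as a list of (weight, i, j) edges.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        w, i, j = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in range(n)
            if b not in in_tree
        )
        edges.append((w, i, j))
        in_tree.add(j)
    return edges


def mst_clusters(points, k):
    """Drop the k-1 heaviest tree edges; return one cluster label per point."""
    kept = sorted(minimum_spanning_tree(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = mst_clusters(points, k=2)
```

The resulting groups can then serve as starting clusters for an iterative method.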
16
Label candidates for clusters (5 newsgroups)
Rank  Cluster 1     Cluster 2           Cluster 3               Cluster 4           Cluster 5       Cluster 6
      sci.math      sci.med / sci.math  talk.religion.misc (a)  soc.culture.israel  comp.windows.x  talk.religion.misc (b)
1     die           cipher              men                     israel              boot            funding
2     probable      block               raped                   palestinian         windows         study
3     theory        stream              women                   gun                 files           taxes
4     registers     key                 children                aziz                menus           stock
5     mathematics   otp                 child                   iraqis              lib             health
6     equation      algorithms          sex                     koppel              icon            market
7     kr            hsm                 soc                     israeli             label           social
8     cos           simon               father                  jews                folder          mercer
9     sequence      combinations        paternity               resolution          msvcrtd         governing
10    tex           shen                feminist                oliver              pcr             vaccinations
11    space         distinction         trolling                utah                daffyd          measurement
12    gravitational encryption          white                   johnc               shortcut        ss
13    wave          epimethius          lib                     nra                 netzero         duke
14    latex         randomness          england                 1991                obj             quantum
15    pdf           smartcard           support                 firearms            tab             jama
16    mac           entropy             woman                   settlements         kernel          hopems
17    files         yahoo               black                   palestine           duck            bushes
18    israel        ici                 brother                 permitted           installed       computer
19    debt          model               chat                    gis                 backup          companies
20    unsigned      lottery             media                   iraq                desktop         diabetes
17
Experiments with execution time
  • The impact of the following factors on the speed
    of map creation was investigated:
      • Map size
      • Optimization method
      • Dictionary optimization (extreme entropy and
        extreme frequency)
      • Reference vector optimization
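
Dictionary optimization can be pictured as dropping terms whose document frequency or entropy is extreme. The sketch below is one possible reading of that idea: the entropy definition (spread of a term's occurrences over documents), the threshold parameters, and the function names are all assumptions, not the project's actual criteria.

```python
import math
from collections import Counter


def term_entropy(term, doc_counts):
    """Shannon entropy of how a term's occurrences spread over documents."""
    occurrences = [c[term] for c in doc_counts if c[term] > 0]
    total = sum(occurrences)
    return -sum((n / total) * math.log(n / total) for n in occurrences)


def prune_dictionary(doc_counts, min_df, max_df, min_entropy, max_entropy):
    """Keep terms whose document frequency and entropy are both in range."""
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())
    kept = set()
    for term, df in doc_freq.items():
        if not (min_df <= df <= max_df):
            continue  # extreme frequency: too rare or too common
        if min_entropy <= term_entropy(term, doc_counts) <= max_entropy:
            kept.add(term)
    return kept


doc_counts = [
    Counter("the map of the web".split()),
    Counter("the search engine".split()),
    Counter("the map clustering".split()),
]
kept = prune_dictionary(doc_counts, min_df=2, max_df=2,
                        min_entropy=0.0, max_entropy=1.0)
```

A smaller dictionary shortens every document vector, which is why it affects map creation time.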

18
(No Transcript)
19
(No Transcript)
20
Convergence
  • We checked the convergence of the maps to a
    stable state depending on:
      • Type of alpha function (search radius reduction)
      • Type of winner search method
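
To make "type of alpha function" concrete, here are two decay schedules often used in SOM training, a linear one and a reciprocal one. Which schedules were actually compared is not stated in the slides; the function names and the parameters alpha0 and half_life are assumptions.

```python
def linear_alpha(t, total_steps, alpha0=0.5):
    """Falls in a straight line from alpha0 at t=0 to 0 at t=total_steps."""
    return alpha0 * (1.0 - t / total_steps)


def reciprocal_alpha(t, alpha0=0.5, half_life=10):
    """Falls quickly at first, then slowly; equals alpha0/2 at t=half_life."""
    return alpha0 * half_life / (half_life + t)


# Compare the two schedules at a few training steps.
schedule = [(t, linear_alpha(t, 100), reciprocal_alpha(t)) for t in (0, 10, 50, 100)]
```

The reciprocal schedule never reaches zero, so late training steps still adjust the map slightly, while the linear schedule freezes the map at the final step.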

21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
Future research
  • We intend to integrate Bayesian and immune system
    methodologies with WebSOM in order to achieve new
    clustering effects.
  • Bayesian networks will be applied in particular
    to classify documents, to accelerate document
    clustering processes, to construct a thesaurus
    supporting query enrichment, and to extract
    keywords.
  • Immuno-genetic systems will be used for adaptive
    document clustering by referring to the mechanism
    of so-called metadynamics, for extraction of
    compact characteristics of document groups by
    exploiting the mechanism of construction of
    universal and specialized antibodies, and for
    visualisation and adjustment of the resolution of
    document maps.

25
Thank you
  • Any questions?