Visualization of Web Search Results - PowerPoint PPT Presentation

1
Visualization of Web Search Results
  • Zhigang Li
  • A Web Technology Presentation, Oct 2001, UTD

2
Outline
  • The Challenge: An Exploding Web
  • Web Directories
    • Examples and demos
    • Advanced presentation techniques
  • Search Engines
    • Spiders
    • Page relevance and ranking
  • Metasearch
    • Major tools
  • Visualizing Search Results
    • Factors for choosing visualizations
    • Synchronized alternative visualization
    • Examples and demos

3
Types of Search Services
  • Subject Directory
  • Search Engine
  • Metasearch

Information Visualization (IV)
  • The use of computer-supported, interactive,
    visual representations of abstract data to
    amplify cognition.
  • IV joins the human's capacity for visual thinking
    and the computer's capacity for analytical
    computing, thereby building a bidirectional
    visual and interactive interface between the human
    user and the information resources.

4
The Big Picture: An Exploding Web
  • 9.6 million web servers as of Dec 1999
  • 72.4 million web sites as of Jan 2000
  • 275 million people online as of Mar 2000
  • 800 million publicly indexable pages
  • 180 million images
  • 30% of web pages are copied or mirrored
  • 1 billion hyperlinks

5
The Challenge
  • Huge number
    • No single search engine indexes more than 16% of
      web sites
    • All search engines combined cover only 42%
  • Extreme heterogeneity
    • Variable information value
    • Variable length
    • Often containing grammatical mistakes and typos
    • Content may be outdated, false, or unreliable
    • Multiple data formats
    • Multiple languages and alphabets
  • Need for speed
    • 15,000 to 20,000 search queries requested per
      minute

6
History
  • Archie (1990)
    • First Internet search engine
    • Directory service of anonymous FTP files
  • WWW Wanderer (1993)
    • First to gather the Internet's content
  • Aliweb (1993)
    • Indexed web pages on servers, which had to
      register with Aliweb
    • Simple retrieval program to search the collected
      indices
  • JumpStation (1993)
    • Used a web robot (spider) to gather information
    • Used exhaustive match to retrieve pages
  • RBSE Spider (1993)
    • First to implement ranked-relevance retrieval
    • Used WAIS (Wide Area Information Servers)

7
Subject Trees (Directories)
Small coverage of the web, high quality of web links

8
Subject Trees: Inadequate UI
  • How to find Yahoo.com from Yahoo's listing
    • Visit http://www.yahoo.com
    • Select "Computers and Internet" from the 14 main
      categories
    • Select "Internet" from the 24 subcategories
    • Select "World Wide Web" from the 44
      sub-subcategories
    • Select "Searching the Web" from the 36
      sub-sub-subcategories
    • Select "Search Engines and Directories" from the
      9 subcategories
    • Find Yahoo there (out of 200 peers)
  • Problems
    • User can't view the whole information structure
    • Navigation is a slow process
    • Easy to get lost

9
New Interfaces: Hyperbolic Tree
  • Designed for exploring hierarchical data
  • Focus + context
  • Intuitive and interactive
  • A Xerox PARC invention

10
New Interfaces: Web Map
  • Multi-level visual directory of 2 million web
    sites
  • Hierarchical categories are represented by
    irregular polygons
  • Zooming brings more details into view
  • Websites shown as symbols
  • The closer that any two documents or categories
    are in terms of content, the closer they appear
    on the map
  • Topography represents rating

11
New Interfaces: Mapuccino
  • Multiple layout schemes for tree structures,
    including fisheye
  • Nodes represent HTML pages
  • An IBM product
  • Zooming / panning

12
Search Engines
Databases of Internet content

13
Sizes of Search Engines
Legend: GG = Google, FAST = FAST, AV = AltaVista,
INK = Inktomi, WT = WebTop.com, NL = Northern Light,
EX = Excite

14
Spiders for Search Engines
  • Where to explore next?
    • Depth-first: high load on servers
    • Breadth-first: favors smaller web servers
    • Best-first: based on a popularity heuristic
  • What information to keep?
    • Titles/headers vs. whole document
    • Manual description vs. automated abstracts

Diagram (crawl loop): create a queue of pages to be
explored; choose a page; fetch page content and
extract all links; add links to queue; process page to
extract information; store in database

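
The crawl loop and the "where to explore next" choice can be sketched as follows. This is only a minimal illustration: the link graph `LINKS` is a hypothetical in-memory stand-in for real page fetching, and `crawl` and its `strategy` argument are names invented for this sketch.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> outgoing links.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}

def crawl(start, strategy="breadth"):
    """Visit pages reachable from `start`; return them in visit order."""
    queue = deque([start])      # queue of pages to be explored
    seen = {start}              # pages already queued, to avoid revisits
    order = []
    while queue:
        # Breadth-first takes the oldest entry, depth-first the newest.
        page = queue.popleft() if strategy == "breadth" else queue.pop()
        order.append(page)      # a real spider would index the page here
        for link in LINKS.get(page, []):   # "fetch" and extract links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

A best-first spider would presumably replace the deque with a priority queue keyed on a popularity estimate; that variant is not shown here.
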
15
Page Relevance/Ranking
  • A common complaint: search engines return too many
    pages (they don't rank the pages very well)
  • Google uses PageRank, based on the link
    structure of the Internet
  • DirectHit uses popularity data (the number of
    visitors to a specific link)
  • More and more search engines provide rankings
    based on comprehensive analysis.
  • Some are offering advanced features like

16
PageRank Algorithm of Google
  • PageRank is computed from: dl, the number of pages
    pointing to the document; nl, the number of
    outgoing links; and a, the damping factor.
  • PageRank is the probability that a random user
    visits a page.
  • The damping factor is the probability that the user
    gets bored at that page and requests another
    random page.
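
The formula itself is not preserved in this transcript. As a rough sketch, the code below assumes the commonly published iterative form, PR(p) = (1 - a) + a * sum(PR(q) / n_q) over the pages q that link to p, with a = 0.85. In that form the surfer follows a link with probability a and jumps to a random page with probability 1 - a. The function name `pagerank` and the three-page `graph` are hypothetical.

```python
def pagerank(links, a=0.85, iterations=50):
    """Iterate PR(p) = (1 - a) + a * sum(PR(q) / n_q for q linking to p)."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Each linking page q passes on its rank split over its n_q links.
            incoming = sum(
                ranks[q] / len(outs)
                for q, outs in links.items()
                if page in outs
            )
            new_ranks[page] = (1 - a) + a * incoming
        ranks = new_ranks
    return ranks

# Hypothetical three-page web: A and C both link to B, B links back to A.
graph = {"A": ["B"], "B": ["A"], "C": ["B"]}
ranks = pagerank(graph)
```

In this toy graph B collects links from two pages and ends up ranked highest, while C, which nothing links to, keeps only the baseline 1 - a.
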

Diagram labels: Document, dl, nl

17
Stockpiling and Retrieval
  • Databases must
    • Allow efficient insertion of new documents
    • Allow efficient update of documents when the
      spider revisits a page
    • Allow random access to records required during
      the retrieval phase
    • Be efficient in terms of storage space
  • Retrieval
    • Boolean keyword query
      • Regular expression matching
      • No relevance order: results are presented in
        database order
      • Scanning the entire database is computationally
        expensive
    • Vector space (statistical) retrieval
      • Inverted file indexing: unique word → list of
        documents (and positions)
      • Relevance: frequency, proximity, position
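
The inverted file idea can be illustrated with a small sketch. It assumes whitespace tokenization and ranks only by term frequency (proximity and position are ignored); `build_index`, `search_all`, and the sample `docs` are hypothetical names and data.

```python
from collections import defaultdict

def build_index(docs):
    """Map each unique word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search_all(index, terms):
    """Boolean AND query, ranked by total term frequency (highest first)."""
    terms = [t.lower() for t in terms]
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []
    matches = set(postings[0])
    for p in postings[1:]:
        matches &= set(p)       # keep documents containing every term

    def frequency(doc_id):
        return sum(len(p[doc_id]) for p in postings)

    return sorted(matches, key=lambda d: (-frequency(d), d))

# Hypothetical sample collection.
docs = {
    1: "web search engines index web pages",
    2: "visualization of search results",
    3: "web directories list web sites by subject",
}
index = build_index(docs)
```

Because each word maps directly to its documents, a query only touches the postings for its own terms instead of scanning the whole database.
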

18
Metasearch: Searching the Search Engines
Accessing various databases from the web

19
MetaCrawler Softbot
  • Design issues
    • Provides a single unified interface
    • Performs tasks as quickly as possible
    • Adapts to a rapidly changing environment
  • Interacting with search engines
    • Must formulate queries in each engine's format
    • Must understand query results
  • Preprocess references
    • Download pages for analysis
  • Results collation
    • Domain, path, title comparison
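
One way to picture results collation is the sketch below, which treats two hits as the same page when their normalized domain, path, and title all match. That matching rule is an assumption made for illustration, not MetaCrawler's documented behavior, and `result_key`, `collate`, and the example hits are hypothetical.

```python
from urllib.parse import urlsplit

def result_key(url, title):
    """Normalize a hit to (domain, path, title) for comparison."""
    parts = urlsplit(url)
    domain = parts.netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    path = parts.path.rstrip("/") or "/"
    return (domain, path, title.strip().lower())

def collate(result_lists):
    """Merge per-engine hit lists, keeping the first copy of each page."""
    seen = set()
    merged = []
    for hits in result_lists:
        for url, title in hits:
            key = result_key(url, title)
            if key not in seen:
                seen.add(key)
                merged.append((url, title))
    return merged

# Hypothetical hits from two engines; the first entries are duplicates.
engine_a = [("http://www.example.org/", "Example Home"),
            ("http://example.org/docs/", "Docs")]
engine_b = [("http://example.org", "example home"),
            ("http://example.net/docs", "Docs")]
merged = collate([engine_a, engine_b])
```

Here the two spellings of the example.org home page collapse into one entry, while the example.net page survives because its domain differs.
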

20
Metasearch: Strengths and Weaknesses
  • Advantages
    • Ability to combine results of multiple search
      engines
    • Ability to provide a consistent user interface
  • Deficiencies
    • Difficulty in ranking the list of results
    • Limited coverage, poor precision
    • Subject to outdated database info in major search
      engines

21
Inquirus and Specific Expressive Forms
  • Inquirus is a metasearch engine from NEC
  • How Inquirus overcomes the deficiencies of
    metasearching
    • Downloads and analyzes the individual documents,
      rather than working with the list of summaries
      returned by search engines
    • Identifies pages that no longer exist or no
      longer contain the queried terms
    • Generates more useful summaries
    • Improves document ranking using proximity
      information
    • Shows local context of query terms
  • Specific Expressive Forms (SEF)
    • A technique that transforms queries into specific
      forms
    • Used by Inquirus
    • Example: "What does NASDAQ stand for?" → "NASDAQ
      stands for", "NASDAQ is an abbreviation", "NASDAQ
      means"
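
The NASDAQ example can be reproduced with a short sketch. The slides do not give Inquirus' actual rules, so the single regular expression and the three templates below are assumptions derived only from that example; `specific_forms` is a hypothetical name.

```python
import re

# Hypothetical rule: only the "What does X stand for?" pattern is handled.
PATTERN = re.compile(r"^what does (.+?) stand for\??$", re.IGNORECASE)
TEMPLATES = ["{x} stands for", "{x} is an abbreviation", "{x} means"]

def specific_forms(query):
    """Return phrase queries for a matching question, else the query itself."""
    match = PATTERN.match(query.strip())
    if not match:
        return [query]          # no rule applies: pass the query through
    subject = match.group(1)
    return [t.format(x=subject) for t in TEMPLATES]
```

Calling `specific_forms("What does NASDAQ stand for?")` yields the three phrases from the slide; any other query is returned unchanged.
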

22
ResearchIndex.org from NEC
  • Autonomous citation indexing
  • Provides reference linking
  • Shows citation context
  • Lists related and similar documents
  • Query-sensitive summaries
  • Page images, PS, PDF
  • Autonomous location of articles

23
New Search Tools: Vivisimo
  • Information clustering: Vivisimo is a metasearch
    engine that categorizes summaries returned by
    other search engines and groups pages
    accordingly.
  • A hierarchy of categories is provided
    automatically.

24
More Search Tools
  • Features results ranking
  • Understands natural-language and Boolean queries
  • Results clustering and ranking
  • Understands Boolean search queries

SearchServer
  • Comprehensive, w/ ranking, but slow
  • Understands natural-language queries
  • Can use comprehensive Boolean queries
  • Results integrated and ranked

25
Results Visualization
  • Phase 1, Formulation: expressing the search
  • Phase 2, Initiation of action: launching the
    search
  • Phase 3, Review of results: reading messages and
    outcomes
    • Set level: representation of the whole set
    • Web site level: the structure of a website
    • Document level: specific URLs
  • Phase 4, Refinement: formulating the next step

26
Factors for choosing visualizations
  • 4-T environment
    • Target user group
    • Type and number of data
    • Task to be done
    • Technical possibilities
  • There is no single best visualization for all
    use cases
  • Synchronized alternative visualizations are
    encouraged

Figure (increasing level of detail): vector, scatter
plot, bar graph, list, tilebars, relevance curve,
thumbnails

27
Synchronized Alternative Visualizations
  • Scatter Plot and Document Vector
    • Scatter plot shows document clusters
    • Document vector plots list them in 1-D
    • User can highlight data points
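
One plausible way to keep alternative views synchronized is a shared selection that every view observes, so highlighting in one view updates the others. The sketch below assumes that design; `Selection` and `View` are hypothetical classes, and no actual drawing is done.

```python
class Selection:
    """Shared set of highlighted document ids, observed by several views."""

    def __init__(self):
        self.ids = set()
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def highlight(self, doc_ids):
        self.ids = set(doc_ids)
        for callback in self.listeners:
            callback(self.ids)      # notify every registered view


class View:
    """Stand-in for a scatter plot or document vector display."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = set()
        selection.subscribe(self.on_change)

    def on_change(self, ids):
        self.highlighted = set(ids)  # a real view would redraw here


selection = Selection()
scatter = View("scatter plot", selection)
vector = View("document vector", selection)
selection.highlight({3, 7})          # both views now highlight docs 3 and 7
```

Keeping the selection outside the views means a new visualization can be added without changing the existing ones.
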

28
Synchronized Alternative Visualizations
  • Bar Graphs
    • Bar graph shows the relevance for each keyword
    • List can be ordered by a column
  • Tilebars and Relevance Curves
    • Each row in the tilebar stands for one keyword
    • Length of the tilebar stands for document size
    • Darkness of a tile stands for relevance
    • Relevance curves plot the same information

Figure panels: tilebars, relevance curve

29
Result Display: Current Practice
  • Ranked list of titles
  • Number of hits of each term
  • Highlighted digest
  • Inter-document similarity

Example result list:
  • VISUALIZING PHYSICS WITH SOUND
    The Trivial Case Listen to this amplitude being
    repeated over and over with time harmonic oscillator
    (40k) realistic pendulum (40k) anharmonic oscillator
    (40k) particle in square well potential with
    driving force (40k) More Visualizing Physics With...
    10, http://goophy.physics.orst.edu/nacse/hans/SOUND/sound.html
    (Direct Hit) More Like This
  • Spotfire - Welcome to Spotfire
    Spotfire is the leading provider of decision
    analytic software solutions, speeding research
    and development of pharmaceutical, biotechnology,
    chemcials, semiconductor and manufacturing
    companies worldwide.
    9, http://www.ivee.com/ (Direct Hit) More Like This
  • Dr. Dobb's Web Site
    Altoweb - Click Here for 30 Day Free Trial
    Altoweb - Click Here for 30 Day Free Trial
    Talarian Macrovision's GLOBEtrotte TECHNETCAST
    DEVSEARCHER OP-EDS COLUMNS ARTICLES CMP's
    Software Development...
    9, http://www.ddj.com/oped/1997/kim.htm
    (Direct Hit) More Like This
  • Business Information Visualization for
    Decision-Making Support
    Business Information Visualization for
    Decision-Making Support -- A Research Strategy
    Introduction In most management domains,
    problem-solving is overwhelming because of the
    large amount of complicated data, multiple complex
    relationships among... Due...
    8, http://hsb.baylor.edu/ramsower/acis/papers/zhang.htm
    (Direct Hit) More Like This
  • infovis.org
    Welcome to the site, which now hosts the
    InfoVis symposium pages and the infovis email
    digest archives. IEEE Symposia InfoVis 2000
    InfoVis 99 InfoVis 98 InfoVis 97 InfoVis 96
    8, http://www.infovis.org/ (Direct Hit) More Like This
  • Software Visualization research at GVU
    Georgia Institute of Technology, Georgia
    7, http://www.cc.gatech.edu/gvu/softviz/SoftViz.html
    (Direct Hit) More Like This
  • IEEE Symposium on Information Visualization
    (InfoVis '97)
    7, http://www.erc.msstate.edu/con...erences/vis97/cfp/infoviz.html
    (Direct Hit) More Like This
  • Graphics and visualization links
    This page under continuous construction. New links
    are added when we have time. We make no claims for
    completeness of any kind... The included links have
    been hand-picked by the visualization crew at CSC
    and lead to sites or articles with... SAL...
    6, http://www.csc.fi/visualization/links.html
    (Direct Hit) More Like This
  • IEEE Symposium on Information Visualization
    infovis_logo.gif (16802 bytes) (InfoVis '98)
    InfoVis '98, the fourth Information Visualization
    Symposium, will be held to focus on the rapidly
    growing area of information visualization.
    Increasing amounts of data and information and
    the availability...
    1000, http://www.erc.msstate.edu/conferences/infovis98/
    (Direct Hit) More Like This

30
Gallery: Envision, Matrix of Icons
  • Sorted by index terms
  • Ranked by relevance
  • Color, icon size, icon shape carry different
    information
  • User rating supported

31
Gallery: Lighthouse, Flying Stars
  • Ranked by relevance, distributed according to
    similarity
  • Animated rotation of the cluster reveals all
    spheres

32
Gallery: MarketMap
  • Traded companies grouped into sectors
  • Neighboring stocks have historically similar
    movements
  • Color and brightness indicate price changes
  • Size corresponds to market capitalization
  • Headliners, gainers, and losers just a click away

33
Gallery: TileBar Patterns
  • Length indicates document size
  • Shade of tile means frequency of queried terms
  • Distribution shown by the tile pattern
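
The tile pattern can be approximated in a few lines. This sketch assumes fixed-size word segments rather than true topical segments, and it draws tiles as text characters; `tilebar`, `render`, the segment size, and the sample text are all hypothetical.

```python
def tilebar(text, keywords, segment_words=5):
    """Return one row per keyword; each cell is that keyword's count
    in one fixed-size segment of the document."""
    words = text.lower().split()
    segments = [
        words[i:i + segment_words]
        for i in range(0, len(words), segment_words)
    ]
    return [[seg.count(k.lower()) for seg in segments] for k in keywords]

def render(rows, shades=" .:#"):
    """Draw each row as text; denser characters mean higher frequency."""
    top = len(shades) - 1
    return ["".join(shades[min(c, top)] for c in row) for row in rows]

# Hypothetical 19-word document, split into four segments.
text = ("web search results are shown as a ranked list but "
        "web search visualization can show where search terms occur")
rows = tilebar(text, ["web", "search"])
bars = render(rows)
```

Each row has one cell per segment, so a longer document produces a longer bar, and the filled cells show where in the document each query term occurs.
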
