Transcript and Presenter's Notes

Title: Searching the Deep Web


1
Searching the Deep Web
  • Abe Lederman, Deep Web Technologies, LLC,
    Los Alamos, NM
  • Arlington, Virginia
  • October 25, 2002

2
What is the Deep Web? (1 of 2)
  • Web content that crawlers cannot reach because it
    is
  • inside databases
  • behind firewalls
  • available only for a fee
  • Publicly available information is 400-550 times
    larger than the known web (surface web)
  • Consists of 7,500 terabytes of information

3
What is the Deep Web? (2 of 2)
  • 200,000 deep web websites have been identified
  • Content is of higher quality than the surface web
  • Growing faster than the surface web
  • (according to a March 2000 BrightPlanet study)

4
Background
  • Co-founder of Verity in 1988
  • Began consulting for Los Alamos National
    Laboratory in 1994
  • Demonstrated a web search retrieval application
    at a Verity Users Conference in April 1994
  • Developed the SciSearch@LANL application in 1995
  • Founded Innovative Web Applications in 1996
  • Deployed the first "deep web" application in the
    Federal government, the Environmental Science
    Network, in February 1999

5
Distributed Explorit Features (1 of 2)
  • Search engine independent
  • Configurable user interface
  • Common, consistent user interface across
    heterogeneous document collections
  • Built-in navigation capabilities
  • Built-in mark-and-download capability

6
Distributed Explorit Features (2 of 2)
  • Field searching, including date-range searching
  • Supports access to login-restricted sites
  • Supports access to sites that use cookies or
    other session IDs
  • Distributed Alerts for notification of new
    relevant content
  • Personal Library to organize search results
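The feature lists on slides 5 and 6 imply a per-source configuration layer that keeps the front end independent of any one search engine. Below is a minimal sketch, in Python, of what such a source descriptor could look like; the field names and URLs are illustrative assumptions, not Distributed Explorit's actual configuration format.

# Hypothetical per-source descriptor for a federated ("deep web") search
# front end. All names below are assumptions made for illustration.
from dataclasses import dataclass, field

@dataclass
class SourceConfig:
    name: str                     # label shown in the common user interface
    search_url: str               # endpoint the query is sent to
    query_param: str              # query parameter name the source expects
    field_map: dict = field(default_factory=dict)  # user field -> source field
    requires_login: bool = False  # source sits behind a login page
    uses_cookies: bool = False    # source tracks a session id via cookies
    date_range_params: tuple = () # e.g. ("date_from", "date_to") if supported

# Two example sources sharing one consistent interface despite different engines.
SOURCES = [
    SourceConfig(
        name="Source A",
        search_url="https://example.gov/a/search",   # placeholder URL
        query_param="q",
        field_map={"author": "au", "title": "ti"},
        date_range_params=("from", "to"),
    ),
    SourceConfig(
        name="Source B",
        search_url="https://example.org/b/query",    # placeholder URL
        query_param="query",
        field_map={"author": "creator", "title": "title"},
        requires_login=True,
        uses_cookies=True,
    ),
]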

7
Distributed Explorit: How it Works (1 of 2)
  • Translates user search requests into the syntax
    that is understood by the web database being
    searched, for example:

The search "deep web" may be translated into
"deep web", deep web, deep AND web, or deep ADJ web,
depending on the target database.
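A minimal sketch of that translation step, assuming three made-up backend syntax rules (quoted phrase, Boolean AND, and an ADJ adjacency operator). The rules are illustrative only; each real connector would encode the syntax its own database understands.

# Rewrite one user query into the syntax each backend understands.
def translate(terms, syntax):
    if syntax == "phrase":        # engines that accept a quoted phrase
        return '"' + " ".join(terms) + '"'
    if syntax == "boolean_and":   # engines that want explicit AND between terms
        return " AND ".join(terms)
    if syntax == "adjacency":     # engines with an ADJ (adjacent-words) operator
        return " ADJ ".join(terms)
    return " ".join(terms)        # default: pass the terms through unchanged

terms = ["deep", "web"]
for syntax in ("phrase", "default", "boolean_and", "adjacency"):
    print(syntax, "->", translate(terms, syntax))
# phrase      -> "deep web"
# default     -> deep web
# boolean_and -> deep AND web
# adjacency   -> deep ADJ web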
8
Distributed Explorit: How it Works (2 of 2)
  • Maps the fields the user is searching to the
    fields that are available in each database
  • Submits searches in parallel, in real time, to
    the various databases selected by the user
  • Sends requests to each web server in the same
    format as a web browser would
  • Result lists are parsed, and field values
    (author, title, etc.) are extracted
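A rough sketch of the flow on this slide: fan the query out to several sources in parallel, fetch each result page the way a browser would, and pull field values out of the returned HTML. The endpoints are placeholders and the regex-based parsing is a toy; a real connector would apply per-source parsing rules.

import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

SOURCES = {
    "source_a": "https://example.gov/search?",   # placeholder endpoints
    "source_b": "https://example.org/query?",
}

def search_one(name, base_url, query):
    """Send the request as a browser would and parse the result list."""
    url = base_url + urlencode({"q": query})
    html = urlopen(url, timeout=30).read().decode("utf-8", errors="replace")
    # Toy extraction: grab result titles; authors etc. would be handled similarly.
    titles = re.findall(r"<h3[^>]*>(.*?)</h3>", html, re.S)
    return name, [{"title": t.strip()} for t in titles]

def search_all(query):
    """Submit the query to every configured source in parallel, in real time."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_one, n, u, query) for n, u in SOURCES.items()]
        return dict(f.result() for f in futures)

# results = search_all("deep web")   # per-source lists of parsed field values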

9
Deep Web Search Application Advantages
  • No crawling or indexing of data required
  • Centralized access to multiple databases from one
    search form
  • User needs to learn only one search language
  • Common user interface across all sources accessed
  • Improved functionality, in many cases, as
    compared to what is provided by the source itself
  • Configurable for a wide variety of web databases
    and information sources

10
Deep Web Search Application Disadvantages
  • Changes in the database engine used, result list
    format, etc. will temporarily disable access to a
    source
  • Not all of a source's search functionality may be
    exposed
  • Some of the information that a source returns may
    be lost
  • Increased network/bandwidth requirements on the
    server

11
Performance Characteristics of a Deep Web Search Application
  • Requires no disk space for indices
  • Requires minimal CPU resources
  • Requires more network resources than a comparable
    application that only searches local content

12
Distributed Explorit is a Stepping Stone
  • From legacy content in government databases
  • To content that is re-architected and re-purposed
    to be accessed via emerging technologies (e.g.,
    XML, Web Services)
  • Enables cheap and quick development of
    applications that showcase the value of universal
    access to content
  • Gateway between legacy world and emerging
    technologies

13
Demo
  • U.S. Government Science Portal
  • Collaborative effort of 10 Federal Government
    Agencies
  • http://www.science.gov
  • We searched three collections for "cyber
    security"

14-18
(No transcript)
19
The Problem
  • The haystack is getting bigger
  • How do you find that needle?

20
The Future (1 of 2)
  • Clustering of results
  • In-depth searching
  • Indexing of results (uniform relevance ranking)
  • Sophisticated result analysis
  • Assist user in identifying best sources for a
    given search
  • Collaborative discovery
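One plausible reading of "uniform relevance ranking" from the list above, sketched in Python: merge the per-source result lists and score every hit with a single rule, so ordering no longer depends on which source returned it. The scoring rule here (query-term counts in the title) is an illustrative assumption.

def uniform_rank(results_by_source, query):
    """Merge per-source result lists and rank them with one shared score."""
    terms = [t.lower() for t in query.split()]
    merged = []
    for source, items in results_by_source.items():
        for item in items:
            title = item.get("title", "").lower()
            score = sum(title.count(t) for t in terms)  # same rule for every source
            merged.append({"source": source, "score": score, **item})
    return sorted(merged, key=lambda r: r["score"], reverse=True)

results = {
    "source_a": [{"title": "Securing the deep web"}],
    "source_b": [{"title": "Cyber security overview"},
                 {"title": "Web crawling basics"}],
}
print(uniform_rank(results, "deep web")[0]["title"])  # -> Securing the deep web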

21
The Future (2 of 2)
  • Web Service compliant interface
  • Output of search results in both HTML and XML
  • Support for querying with XQuery when this W3C
    standard is finalized
  • Dynamic cross-content hyperlinking
  • Leads to an "UberPortal", a portal of portals
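A small sketch of emitting the same parsed results as XML alongside HTML, as this slide anticipates. The element names are assumptions, not a published schema.

import xml.etree.ElementTree as ET

def results_to_xml(results):
    """Serialize a list of {'source', 'title', 'author'} dicts to an XML string."""
    root = ET.Element("results")
    for r in results:
        hit = ET.SubElement(root, "result", source=r.get("source", ""))
        ET.SubElement(hit, "title").text = r.get("title", "")
        ET.SubElement(hit, "author").text = r.get("author", "")
    return ET.tostring(root, encoding="unicode")

print(results_to_xml([{"source": "source_a",
                       "title": "Securing the deep web",
                       "author": "A. Author"}]))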

22-28
(No transcript)
29
For additional information
  • "Searching the Deep Web: Directed Query Engine
    Applications at the Department of Energy"
  • www.dlib.org/dlib/january01/warnick/01warnick.html
  • "The Deep Web: Surfacing Hidden Value"
  • www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp

30
Contact Information
  • Abe Lederman, Deep Web Technologies
    154 Piedra Loop, Los Alamos, NM 87544
    (505) 672-0007
  • abe@deepwebtech.com

31
(No Transcript)