The University of Illinois DLI Project - PowerPoint PPT Presentation

1
Federating Repositories of Scientific Literature
www.canis.uiuc.edu
  • The Interspace Prototype (1997-2000)
  • Digital Libraries Initiative (1994-1998)
  • Worm Community System (1990-1993)
  • Telesophy System (1984-1989)
2
Federating Repositories of Scientific Literature
The University of Illinois Digital Libraries Initiative (DLI)
Project Status Retrospective
Bruce R. Schatz, dli@uiuc.edu
http://dli.grainger.uiuc.edu
AAAS-98, Digital Libraries Session
Philadelphia, February 1998
3
Evolution of Information Retrieval across the Net
from Bruce R. Schatz, "Information Retrieval in
Digital Libraries: Bringing Search to the Net",
cover article in Science, vol 275, Jan 17, 1997,
special issue on Bioinformatics
4
Illinois DLI Status
  • Production Testbed based in a Real Library
  • Document Search based on Structure
  • SGML Publisher Stream deployed at U of Illinois
  • Technology Research for Scalable Federation
  • Concept Search based on Semantics
  • Statistical Indexes across subjects and media

5
Production Testbed Status
  • Based in major Engineering Library
  • Production Stream - in testbed before on shelves
  • Full-text SGML -- Federated Structure Search
  • 5 publishers, 55 journals, 40,000 articles
  • Web version campus rollout October 1997
  • integrated within library information services

6
Production Testbed Evaluation
  • 700 users, steadily increasing to max 1500
  • used in intro Computer Science classes
  • developers and evaluators work closely
  • needs assessment and usability studies
  • careful multi-modal usage evaluation
  • session observations and transaction logs

7
Primary Partners
  • journal/magazine publishers:
  • American Institute of Physics (AIP)
  • American Physical Society (APS)
  • American Astronomical Society (AAS)
  • American Society of Civil Engineers (ASCE)
  • American Society of Mechanical Engineers (ASME)
  • American Society of Agricultural Engineers (ASAE)
  • American Institute of Aeronautics and Astronautics
    (AIAA)
  • Institute of Electrical and Electronics Engineers
    (IEEE)
  • Institution of Electrical Engineers (IEE)
  • IEEE Computer Society (IEEE-CS)
  • testbed: SoftQuad, OpenText
  • infrastructure: Hewlett-Packard, Microsoft

8
DeLIver Search Interface
9
DeLIver Search Results
10
(Full Text Retrieval)
11
Result of Figure Caption Search
12
Dynamic Linking in Bibliography
13
Testbed Difficulties
  • Original plan was to modify Mosaic for search
  • Web became commercial -- we lost control of
    developers
  • Plan was to use standard BRS as full-text backend
  • needed to use SGML-specific OpenText search
    engine
  • good-quality SGML simply not available
  • we had to train every publisher; nothing was
    ready
  • SGML interactive display not journal quality
  • physics requires equations -- hard to display
    well
  • Custom software hard to deploy widely
  • Web widespread but too low-end for professional
    search

14
Testbed Successes
  • Willing to build custom encoding procedures
  • so succeed with SGML where Elsevier and OCLC
    failed
  • Canonical encoding for structure tags
  • so can federate across publishers and journals
  • Willing to build custom software for Search
  • so able to do multiple views not single stream
    like Web
  • Production repositories for real Publishers
  • became R&D arm of major scientific publishers
  • Changing the nature of libraries with research
  • research prototype becomes standard service
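
The point about canonical structure tags can be illustrated with a minimal sketch. It assumes each publisher delivers the same kinds of fields under different tag names; the publisher keys, tag names, and records below are hypothetical and not taken from the testbed.

```python
# Hypothetical publisher-specific tag names mapped onto one canonical set.
CANONICAL_TAGS = {
    "pub_a": {"ti": "title", "au": "author", "figcap": "caption"},
    "pub_b": {"arttitle": "title", "aut": "author", "fig-caption": "caption"},
}


def canonicalize(publisher, fields):
    """Rename a record's publisher-specific tags to the canonical tags."""
    mapping = CANONICAL_TAGS[publisher]
    return {mapping.get(tag, tag): text for tag, text in fields.items()}


def search(records, tag, term):
    """Return ids of records whose canonical field `tag` contains `term`."""
    term = term.lower()
    return [rid for rid, fields in records if term in fields.get(tag, "").lower()]


# Invented sample records from two publishers with different tag sets.
records = [
    ("a1", canonicalize("pub_a", {"ti": "Laser cooling", "figcap": "Trap geometry"})),
    ("b1", canonicalize("pub_b", {"arttitle": "Bridge loads", "fig-caption": "Laser scan of deck"})),
]

caption_hits = search(records, "caption", "laser")
title_hits = search(records, "title", "laser")
```

Once every record uses the same tag names, one structure query (for example, a figure caption search) can run across publishers without per-publisher logic.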

15
Technology Transfer
  • Illinois DLI considered R&D arm of publishers
  • broad spectrum of major publishers in scientific
    literature
  • successful annual partners workshop plus
    high-level visits
  • Technology transferred to Publisher partners
  • contract with AIP to clone testbed software
    processing
  • arrangements with ASCE for a second cloning
  • Testbed Continuance by University Library
  • industrial partners program between Library
    and Publishers
  • company formed to provide software and service

16
Technology Research
  • Scalable Semantics becoming feasible
  • statistical clustering proves useful
    interactively
  • concept spaces and category maps
  • Semantic indexes for large collections
  • 400K Inspec (1995)
  • 4M Compendex (1996)
  • Simulation of Community Repositories
  • 1000 collections across all of engineering
  • testbed for vocabulary switching (federation)

17
Vocabulary Switching
  • Grand Challenge of Digital Libraries
  • semantic interoperability across subject domains
  • vocabulary switching to suggest across domains
  • Generating 1000 community repositories
  • 600 categories across engineering (38 top-level)
  • 150 categories across EE, CS, physics
  • 3M raw abstracts, about 10M in community spaces
  • large-scale supercomputer simulation
  • 7 days of dedicated computation (10 days overall)
  • have space navigation; need space intersection
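
One way to picture vocabulary switching through space intersection is the sketch below. It assumes each community's concept space is a mapping from a term to weighted related terms, and that terms present in both spaces act as bridges. The function name, the scoring rule (product of weights), and the sample terms are illustrative assumptions, not the project's algorithm.

```python
def switch_vocabulary(term, source_space, target_space, top_n=3):
    """Suggest target-domain terms for a source-domain term.

    Each space maps a term to {related_term: weight}. A related term of
    `term` that is also a key in the target space acts as a bridge.
    """
    scores = {}
    for bridge, w_src in source_space.get(term, {}).items():
        for candidate, w_tgt in target_space.get(bridge, {}).items():
            if candidate != term:
                scores[candidate] = scores.get(candidate, 0.0) + w_src * w_tgt
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:top_n]]


# Invented concept spaces for two engineering communities.
civil_space = {"bridge fatigue": {"crack growth": 0.9, "vibration": 0.5}}
materials_space = {
    "crack growth": {"fracture toughness": 0.8, "grain boundary": 0.4},
    "vibration": {"damping": 0.6},
}

suggestions = switch_vocabulary("bridge fatigue", civil_space, materials_space)
```

A term with no bridge into the target space simply yields no suggestions, which mirrors the slide's point that navigation within one space is not enough.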

18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
Multimedia Federation
  • Semantic Indexing within Media
  • Text, Image, Number
  • Semantic Interoperability across Media
  • Spatial Data (GIS) dataset intersection
  • Multi-site DLI Collaboration
  • U Illinois: systems and supercomputers
  • U Arizona: algorithms and experiments
  • UC Santa Barbara: collections and metadata

29
Semantic Analysis of Multimedia
  • Collections of Objects containing Units
  • Text: community repository (topic proximity)
  • document abstracts containing noun phrases
  • Image: aerial photograph (spatial proximity)
  • feature regions containing texture tiles
  • Units are media-dependent (statistical parsers)
  • Text: phrase segmentation (nouns on word parts of
    speech)
  • Image: texture segmentation (orientation on pixel
    densities)
  • Indexes are media-independent (statistical
    clusters)
  • Concept: co-occurrence similarity of units within
    objects
  • Category: self-organizing maps of objects within
    collections
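
The idea of co-occurrence similarity of units within objects can be sketched as follows. This is a simplified stand-in, assuming a plain conditional frequency as the weight; the slides do not give the actual weighting function, and the sample units are invented. Because units are opaque labels here, the same code would apply to noun phrases in abstracts or to texture tile ids in image regions.

```python
from collections import Counter
from itertools import permutations


def concept_space(objects):
    """Build asymmetric co-occurrence weights from objects (sets of units).

    weight(a -> b) = (objects containing both a and b) / (objects containing a)
    """
    unit_count = Counter()
    pair_count = Counter()
    for units in objects:
        units = set(units)
        unit_count.update(units)
        # Ordered pairs, so (a, b) and (b, a) are counted separately.
        pair_count.update(permutations(sorted(units), 2))
    space = {}
    for (a, b), n in pair_count.items():
        space.setdefault(a, {})[b] = n / unit_count[a]
    return space


# Invented objects: each abstract is reduced to its set of noun phrases.
abstracts = [
    {"digital library", "information retrieval"},
    {"digital library", "sgml"},
    {"information retrieval", "sgml", "digital library"},
]

space = concept_space(abstracts)
```

The weights are asymmetric on purpose: a rare unit that always appears with a common one points strongly to it, while the reverse link is weaker.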

30
Media Interoperability Experiment
  • Feature regions containing texture tiles in
    aerial photos
  • 1M regions in 5K photos around southern
    California (GIS)
  • text: concept space and category map in geoscience
  • 10M phrases in 500K abstracts from Georef and
    Petroleum Abstracts
  • image: concept space and category map in aerial
    photos
  • tile similarity space and visual thesaurus maps
    (10M tiles)
  • numeric: satellite sensor data
  • 1M NASA AVHRR temperature records, 2M GNIS
    feature names
  • spatial: gazetteer as bridge, image<>text<>number
  • images are labeled by GNIS gazetteer (feature
    names for text search)

31
(No Transcript)
32
(No Transcript)
33
Federated Search
  • Multiple Indexes in Distributed Repositories
  • text search: SGML for full-text articles
    (Testbed), bibliographic abstracts for full
    coverage (INSPEC)
  • term suggestion: thesaurus for taxonomy (INSPEC)
  • concept spaces for term coverage (SGML)
  • Multiple View User Interface Client
  • uniform displays for multiple indexes
  • drag-and-drop between display views to
    mix-and-match
  • uniform search across multiple repositories
  • Multiple Protocol Stateful Gateway
  • single query stream analog to single user
    interface
  • will handle distributed repositories for
    federation, e.g. AAS
  • OpenText (socket), term-suggest (SQL), Ovid/DRA
    (Z39.50)
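
The gateway idea (one query stream fanned out to several repositories) can be sketched roughly as below. The class, its methods, and the in-memory backends are hypothetical; in a real gateway each backend function would wrap its own protocol (socket, SQL, Z39.50), which this sketch does not attempt.

```python
class Gateway:
    """Send one query to every registered backend and merge the hits."""

    def __init__(self):
        self.backends = {}  # backend name -> search function
        self.history = []   # queries seen in this session (the stateful part)

    def register(self, name, search_fn):
        self.backends[name] = search_fn

    def search(self, query):
        self.history.append(query)
        merged = []
        for name, search_fn in self.backends.items():
            for hit in search_fn(query):
                merged.append((name, hit))
        return merged


def make_backend(titles):
    """Stand-in backend: case-insensitive substring match over a title list."""
    def search_fn(query):
        q = query.lower()
        return [title for title in titles if q in title.lower()]
    return search_fn


gateway = Gateway()
gateway.register("fulltext", make_backend(["Laser cooling of atoms", "Bridge deck loads"]))
gateway.register("abstracts", make_backend(["Laser ranging survey"]))

results = gateway.search("laser")
```

Tagging every hit with its source keeps the client free to show uniform displays while still letting the user see which repository answered.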

34
IODyne Engineering Search Example
35
Building a new Community
  • starting the field of Digital Libraries
  • IEEE Computer DLI special issue May 1996
  • Computer DLI retrospective planned for 1999
  • Allerton workshops on DL Sociology
  • edited book planned on DL Evaluation
  • DLI National Coordination effort
  • Illinois DLI retrospective conference (Mar 98)

36
The 21st Century Analysis
  • Beyond Search to Analysis
  • Cross-Correlating Information from many sources
    across the Net
  • The Net solves problems
  • Every community has its own special library
  • Every community and every person does indexing!!
  • The Internet evolves into the Interspace