Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002

Toward Open Digital Libraries, ICADL 2002, Singapore, Dec. 2002. Edward A. Fox (with Hussein Suleman, Ming Luo), fox_at_vt.edu, http://fox.cs.vt.edu


Title: Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002


1
Building Digital Libraries Made Easy: Toward Open Digital Libraries
ICADL 2002, Singapore, Dec. 2002
  • Edward A. Fox
  • (with Hussein Suleman, Ming Luo)
  • fox_at_vt.edu, http://fox.cs.vt.edu
  • CS, DLRL, Internet TIC
  • NDLTD, CITIDEL, NSDL
  • Virginia Tech, Blacksburg, VA, USA

2
Acknowledgements (Selected)
  • Sponsors: ACM, Adobe, DLF, IBM, Mellon
    Foundation, Microsoft, NSF (Grants CDA-9312611;
    DUE-0121741, 0136690, 0121679; IIS-0080748,
    0086227, 0002935, and 9986089), OCLC, SOLINET,
    UNESCO, US Dept. Ed. (FIPSE), VTLS, ...
  • Faculty/Staff (now): Boots Cassel, Su-Shing Chen,
    Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee
    Giles, Martin Halbert, Rex Hartson, John
    Impagliazzo, Deborah Knox, J.A.N. Lee, Kurt Maly,
    Gail McMillan, Eric Morgan, Manuel Perez,
    Muhammad Zubair, ...
  • Students: Fernando Das Neves, Marcos Goncalves,
    Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan
    Richardson, Priya Shivakumar, Wensi Xi, Liang Xu,
    Baoping Zhang, ...

3
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

4
Overview
  • We
    • address the problem of how to develop DLs
    • build on experience in building many DLs
    • strive for simplicity as per the OCKHAM initiative
    • build upon the Open Archives Initiative
    • demonstrate our approach in diverse situations
  • and invite all to
    • use DL-in-a-box and
    • help build Open Digital Libraries.

5
Problem
  • Why do DL developers continue to reinvent the
    wheel? The top 10 reasons are:
  • The library budget won't allow purchase of a
    commercial DL system.
  • Unless the development effort is local, there
    won't be any control.
  • DLs are extensions of DBMSs, so they are simple
    applications to develop.
  • Since DLs operate on the Web, one must adopt the
    newest W3C proposal.

6
Problem (cont'd)
  1. Since technology moves so quickly, it is
    essential to follow the latest fad.
  2. CS students always develop from scratch.
  3. This team knows it can do it better.
  4. This system must have more capabilities than any
    other system.
  5. This DL has to be more flexible and extensible.
  6. This is the right system architecture at last!

7
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

8
Experience: Case Study Projects
  • AmericanSouth.org
  • NDLTD
  • CSTC
  • JERIC
  • CITIDEL
  • NSDL
  • Digital Library in a Box

9
AmericanSouth.org
  • Domain: culture and history of the southern
    region of America (USA)
  • Genre: diverse distributed collections at a dozen
    universities
  • Submission/Collection: local sites → Emory
    University (for SOLINET)

10
Networked Digital Library of Theses and
Dissertations (NDLTD)
  • Domain: graduate education and research
  • Genre: electronic theses and dissertations (ETDs)
  • Submission/Collection: local sites →
    www.ndltd.org, www.theses.org

11
Computer Science Teaching Center (CSTC)
  • Domain: teaching computer science
  • Genre: courseware
  • Submission/Collection: www.cstc.org

12
CS Teaching Center (CSTC): Lessons Learned
  • Instead of building large, expensive multimedia
    packages that become obsolete and are difficult
    to re-use, concentrate on small knowledge units.
  • Learners benefit from having well-crafted modules
    that have been reviewed and tested.
  • Use digital libraries to build a powerful base of
    support for learners, upon which a variety of
    courses, self-study tutorials, and reference
    resources can be built.

14
Browsing (2)
18
ACM Journal of Educational Resources in Computing
(JERIC)
  • Domain: teaching computer science
  • Genre: courseware, scholarly articles
  • Submission/Collection: CSTC, ACM Digital Library

19
JERIC
  • Journal of Educational Resources in Computing
  • Accessible from www.cstc.org, www.acm.org, and
    www.citidel.org
  • ACM and SIGCSE support
  • Refereed and interactive
  • Part of ACM Digital Library

20
Computing and Information Technology Interactive
Digital Educational Library (CITIDEL)
  • Domain: computing / information technology
  • Genre: one-stop shopping for teachers and learners:
    courseware (CSTC, JERIC), leading DLs (ACM,
    IEEE-CS, DBLP, CiteSeer), PlanetMath.org,
    technical reports, ...
  • Submission/Collection: sub/partner collections
    → www.citidel.org

21
CITIDEL Team
  • An NSDL Collection Track project
  • Led by Virginia Tech, with co-PIs
    • Fox (director, DL systems)
    • Lee (history)
    • Perez (user interface, Spanish support)
  • Partners
    • College of New Jersey (Knox)
    • Hofstra (Impagliazzo)
    • Villanova (Cassel)
    • Penn State (Giles)

22
Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes
Size of Collection                 1-5 items   6-100 items   101-999 items   1000+ items
Number of Collections Identified   100-300     50            20-35           10-25
23
Multi-dimensional Categorization
24
CITIDEL Collection Sources
(Diagram. Node labels: Experts' finding aids, IEEE-CS, CSTC, Research Index, ACM, NCSTRL, metadata, fulltext, NEC's data, data processed w. R.I., Borner's info viz software repository, JERIC, SIGCSE proceedings, ACM DL. Edge label: include.)
25
CITIDEL Collection Building
(Diagram. Node labels: Nominating, Submitting, Creating, Searching, Browsing, Crawling, Composing, Classifying, GetSmart, Crawlifier, VIADUCT. Edge labels: thru, or thru, include after, after, aided by, using.)
26
Overview of CITIDEL architecture
27
Distributed repository structure
28
Digital library architecture for local and
interoperable CITIDEL services
29
National Science Digital Library (NSDL)
  • Domain: undergraduate and K-12 education, etc.
  • Genre: educational resources
  • Submission/Collection: sites of 90 projects →
    www.nsdl.org

30
NSDL Information Architecture
Developed by the Technical Infrastructure Workgroup
(Diagram. Labels: User Interfaces, Core NSDL Bus, Usage Enhancement, Collection Building.)
31
Digital Library in a Box
  • Domain: helping DL projects
  • Genre: any domain, but especially those involved
    in NSDL (since it is funded in part through NSDL,
    with U. FL and NCSA)
  • Software and Documentation: http://dlbox.nudl.org

32
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

33
Open Archives Initiative (OAI)
www.openarchives.org
openarchives_at_openarchives.org
34
The World According to OAI
Service Providers
Discovery
Current Awareness
Preservation
Data Providers
35
Technical Umbrella for Practical Interoperability
Metadata Harvesting
Reference Libraries
Museums
Publishers
E-Print Archives
that can be exploited by different communities
36
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
37
OAI Black Box Perspective
(Diagram. Labels: Services, Browse, Summarize, Search, Visualize, Metadata, Docs, and DO repeated seven times.)
38
Aggregation through OAI Harvesting
IEEE-CS, ACM, ...
39
Protocol for Metadata Harvesting
  • Service Requests
    • Identify
    • ListMetadataFormats
    • ListSets
    • GetRecord
    • ListIdentifiers
    • ListRecords
  • Metadata Multiplicity
  • Date/Time Ranges
  • Sets (with semantics depending on local data
    providers)
  • Resumption Tokens
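
The service requests and resumption tokens listed on this slide can be sketched in a few lines of Python. This is only an illustration and not code from the talk: the base URL `http://example.org/oai`, the helper names `build_request` and `harvest_identifiers`, and the injected `fetch` callable are all assumptions. The verb and argument names (`verb`, `metadataPrefix`, `from`, `set`, `resumptionToken`) are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL; arguments set to None are omitted."""
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass from_ instead.
            params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc", set_spec=None):
    """Yield record identifiers, following resumption tokens until none is left.

    fetch is any callable that maps a URL to the XML response text
    (hypothetical; in practice it would perform an HTTP GET).
    """
    url = build_request(base_url, "ListIdentifiers",
                        metadataPrefix=metadata_prefix, set=set_spec)
    while url:
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = (root.findtext(".//" + OAI_NS + "resumptionToken") or "").strip()
        # A follow-up request carries only the verb and the token.
        url = (build_request(base_url, "ListIdentifiers", resumptionToken=token)
               if token else None)
```

For example, `build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc", from_="2002-01-01")` produces a date-limited request, and `harvest_identifiers` keeps asking for the next chunk for as long as the data provider returns a non-empty resumption token.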

40
NDLTD OAI Example
41
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

42
Open Digital Library (ODL) Hypothesis (Hussein
Suleman)
  • Can we leverage the successful model of the OAI
    Protocol for Metadata Harvesting to alleviate our
    architectural problems?
  • Maybe, if
    • Digital Libraries can be modeled as
    • networks of extended Open Archives, where
    • each extended Open Archive is a
    • source of data and/or a provider of services.

43
Example Architecture (NDLTD)
(Diagram. Labels: User Interface, Browse, Search, Recent, Union Catalog, MIT Filter, Virginia Tech, PhysNet, Humboldt, Duisburg, CalTech, Dresden, MIT. Legend: User Interface, OAI/ODL archive, OAI/ODL protocol.)
44
ODL Demonstration - FrontPage
45
ODL Demonstration - Search
46
ODL Demonstration - Browse
47
Hussein Suleman's Thesis: Summary
  • Open Digital Libraries (DLs)
  • Open Archives Initiative (OAI)
  • Protocol for Metadata Harvesting (PMH)
  • Extending OAI-PMH provides the glue for building
    componentized DLs.
  • Lightweight protocols connect the components to
    support modular systems with good efficiency.

48
Research in a Nutshell
  • We build extensible modular systems with
    customizable services.
  • This supports interoperability and allows
    distributed development.
  • This is in use in www.cstc.org,
    AmericanSouth.org, www.citidel.org, ...
  • Components include search, browse, annotate,
    editorial support, union, filter, what's-new,
    submit, rate, recommend, ...

49
?
users
digital objects
50
componentized digital library
51
open digital library
52
ODL Component Requirements
  • Search
    • Retrieve a list of items
    • Index new items
  • Annotate
    • Add annotation to item
    • Retrieve a list of annotations for an item
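
To make these requirements concrete, here is a minimal in-memory sketch of the two components. It is an assumption-laden illustration, not the ODL implementation: the class names, method names, and the simple "all terms must match" search rule are invented for this example, and the real components communicate over extended OAI-PMH rather than direct method calls.

```python
class SearchComponent:
    """Hypothetical search component: index new items, retrieve a list of items."""

    def __init__(self):
        self._index = {}  # term -> set of item identifiers

    def index_item(self, identifier, text):
        """Index a new item under each whitespace-separated, lowercased term."""
        for term in text.lower().split():
            self._index.setdefault(term, set()).add(identifier)

    def search(self, query):
        """Return the sorted identifiers of items that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


class AnnotateComponent:
    """Hypothetical annotation component: add and list annotations per item."""

    def __init__(self):
        self._annotations = {}  # item identifier -> list of annotation texts

    def add_annotation(self, identifier, note):
        """Add an annotation to an item."""
        self._annotations.setdefault(identifier, []).append(note)

    def list_annotations(self, identifier):
        """Retrieve a copy of the list of annotations for an item."""
        return list(self._annotations.get(identifier, []))
```

Each component exposes only the two operations the slide names, which is the point of the requirement list: a small, fixed surface that a protocol layer can wrap.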

53
Open Digital Library Components
  • Running now
    • XML-File (data provider from file system)
    • Union, search, browse, recent, filter
    • E-journal/review, Submit, Edit, Annotation
  • Class projects
    • High performance multilingual search
    • Recommender, Rating, Mirroring (see JCDL'02)
  • Working with NCSA: from DB, unstructured text
  • Others discussed
    • Classification/categorization
    • DL-Viz interconnection (VIDI: Jun Wang ETD)

54
Open Digital Library Extended
(Diagram. Service roles: Metadata Search Service Provider, Metadata Browse Service Provider, What's New Service Provider, Annotation Search Service Provider, Recommend & Rate Service Provider. Component labels: DBBrowse Browse Engine, IRDB-1 Search Engine, IRDB-2 Search Engine, Recommend, What's New Engine, Rate Engine, Annotation Engine, DBUnion Archive Merger Component, Filter, Submit Archive. Data provider labels: XML File Coll. Data Provider 1, 2, and 3, OAI-PMH Data Provider, OAIB (NCSA from RDBMS). Edge label: Harvest from data providers.)
55
Example Open Digital Library
Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
(Diagram. Labels: User Interface, Students and researchers, ETD collections, Search, Browse, Recent, Union, Filter, ODLSearch, ODLBrowse, ODLRecent, ODLUnion, PMH.)
Example Open Digital Library
Digital Library for the Computer Science Teaching
Center (www.cstc.org)
57
CSTC User Interface
58
OPEN ARCHIVE
59
Layer 1: OAI-PMH
  • Protocol for Metadata Harvesting
  • Transfer stream of metadata from one archive or
    component to another
  • Service Requests
    • Identify, ListSets, ListMetadataFormats
    • GetRecord, ListIdentifiers, ListRecords

60
Layer 2: Extended OAI-PMH
  • OAI-PMH extensions for general-purpose
    inter-component communication
  • Added generic containers in every response for
    additional information
  • Added PutRecord to submit a record
  • Increased granularity to support times as well as
    dates (same as OAI-PMH v2.0)
  • Ignored the Dublin Core (DC) requirement
61
Layer 3: ODL Protocols
  • Specialized protocol semantics for different
    components, e.g.
  • Search component uses ODLSearch protocol
    • ListRecords and ListIdentifiers embed query terms
      in set parameter
  • Annotation component uses ODLAnnotate protocol
    • ListRecords and ListIdentifiers specify the item
      for which annotations are requested in the set
      parameter
    • PutRecord adds an annotation to an item
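
A rough sketch of how a client might reuse the `set` parameter in this way follows. The exact syntax that ODLSearch and ODLAnnotate expect inside `set` is not given on the slide, so the encoding here (the raw query string, or the raw item identifier) is an assumption, as are the base URLs and function names.

```python
from urllib.parse import urlencode


def odl_request(base_url, verb, set_value, metadata_prefix="oai_dc"):
    """Build an OAI-PMH style request whose set argument carries component input."""
    params = {"verb": verb, "metadataPrefix": metadata_prefix, "set": set_value}
    return base_url + "?" + urlencode(params)


def search_request(base_url, query):
    """Search: embed the query terms in set (assumed encoding: the plain query)."""
    return odl_request(base_url, "ListIdentifiers", query)


def annotations_request(base_url, item_id):
    """Annotate: name the item whose annotations are wanted in set (assumed encoding)."""
    return odl_request(base_url, "ListRecords", item_id)
```

The design choice this illustrates is that no new verbs are needed for retrieval: a search result list and an annotation list are both just record lists, so the harvesting verbs are reused and only the meaning of `set` changes per component.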

62
Performance Optimizations
  • Caching of responses
  • Persistent CGI mechanisms
    • FastCGI
    • SpeedyCGI
  • Request multiple records in a single operation
    (proposed)
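
Response caching, the first optimization listed, can be sketched as a thin wrapper around whatever function fetches a response. This is a minimal illustration under stated assumptions: the class name `CachingFetcher`, the `misses` counter, and the choice to key the cache on the full request URL are not from the talk, and a real cache would also need an expiry policy.

```python
class CachingFetcher:
    """Wrap a fetch(url) callable and reuse responses for repeated request URLs."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}  # request URL -> response text
        self.misses = 0   # number of requests actually forwarded

    def __call__(self, url):
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

Because every inter-component request in this architecture is a plain URL, identical requests are easy to recognize, which is what makes this kind of caching cheap to add between components.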

63
What have we accomplished?
  • Complete protocol-level separation among
    components within the DL
  • Seamless integration with little glue
  • Simple extensions of OAI-PMH
  • Modular and portable components
  • Efficient in speed but not as efficient in
    storage

64
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

65
Digital Library In A Box
  • http://dlbox.nudl.org
  • Part of NSF's National Science Digital Library
    (www.nsdl.org)
  • Offers shrink-wrap Open Digital Library
    components as open source software
  • Users install ready-made digital library
    solutions, or build their own from snap-together
    components.

67
OCKHAM
  • Simplicity (à la Occam's razor)
  • Support by Mellon and DLF
  • Next meeting in Atlanta, Jan. 8, 2003
  • Four main ideas
    • Components
    • Lightweight protocols
    • Open reference models (e.g., 5S, OAIS)
    • Community perspective and involvement

68
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
69
Outline
  • Overview, Problem
  • Experience: Case Study Projects
  • Open Archives Initiative
  • Hussein Suleman Dissertation
  • DL in a Box, OCKHAM
  • Summary and Conclusion

70
Summary and Conclusion
  • It is possible to build DLs easily.
  • The ODL approach to this has been developed and
    validated in a number of settings.
  • Everyone is invited to
    • Use ODL components
    • Refine or add ODL components, protocols
    • Join ODL and OCKHAM
  • For more information see

71
(Somewhat) Open Issues
  • Is this scalable? Portable? Extensible?
  • Can we define all popular DL services using such
    a methodology? (completeness problem)
  • Can we define DLs as configurations of ODL
    components? (composition problem)
  • Is OAI-PMH a good baseline protocol? Can we
    design a better baseline protocol upon which to
    base harvesting and repository access?
  • To what degree is an ODL network equivalent to a
    monolithic system? (comparison problem)

72
Ultimate Goal
  • Package different configurations into instant DL
    systems or subsystems
  • DL building becomes component configuration
  • All DLs speak the same language(s)
  • Basic services are trivial to provide so more
    effort is spent on advanced capabilities of DLs

73
Selected Links
  • CITIDEL: www.citidel.org
  • NCSTRL: www.ncstrl.org
  • NDLTD: www.ndltd.org
  • NSDL: www.nsdl.org
  • Open Archives Initiative
    • www.openarchives.org
    • www.openarchives.org/OAI/openarchivesprotocol.htm
    • www.dlib.vt.edu/projects/OAI/

74
More Links
  • Hussein Suleman's Dissertation
    • http://purl.org/net/hsdiss/odl.pdf
  • Repository Explorer
    • http://purl.org/net/oai_explorer
  • DL Courseware: http://ei.cs.vt.edu/dlib
  • Virginia Tech Digital Library Research Laboratory
    (DLRL): www.dlib.vt.edu
  • Listservs
    • dl-in-a-box-l_at_listserv.vt.edu
    • ockham-sys_at_listserv.cc.emory.edu