Lightweight Federation of Interoperable Digital Libraries - PowerPoint PPT Presentation

1
Lightweight Federation of Interoperable Digital
Libraries
  • Ph.D. Dissertation Defense
  • Rong Shi
  • Department of Computer Science
  • Old Dominion University
  • November, 2004

2
Overview
  • Introduction: problem statement/motivation
  • Background and related work
  • Lightweight Federated Digital Library (LFDL)
    system architecture
  • Design and implementation
  • Conclusion and future work
  • Demo

3
Why Digital Libraries
  • Proliferation of digital information
  • Digital library: a digitized, organized,
    universally accessible collection of information
  • DL vs. WWW/Web search engine
  • Search spectrum
  • Contents and management
  • User interface and service

4
Motivation - why DL interoperability
  • Digital libraries are important tools used by a
    large number of people every day
  • Search interfaces of digital libraries vary
    greatly and can lead to confusion
  • Redundant work needs to be done when searching
    multiple digital libraries
  • Most digital libraries are unable to interact
    with each other
  • Many digital libraries have proprietary
    architectures and won't change them to
    interoperate with other libraries

5
Objectives
  • Goal of interoperability
  • Interoperability among non-cooperating DLs
  • Federated search service
  • Existing DL systems intact
  • Transparent to users when participants are added,
    deleted, or modified
  • Lightweight, dynamic, flexible
  • Service quality, usability, performance, and
    scalability
  • Solution
  • LFDL: Lightweight Federated Digital Library, or
    InterOp

6
Framework
7
Addressed Issues
  • The feasibility of interoperability of
    non-cooperating DLs
  • The architecture of building a federation service
  • The quality and usability of the federated
    service
  • System performance and robustness

8
Background
  • General approaches
  • Distributed search
  • Accurate, fresher results, but
  • a joint protocol is required, with performance,
    reliability, and scalability issues
  • Models and sample solutions
  • Fully cooperative federation: NCSTRL/NCSTRL
  • Protocol exchange: Z39.50, STARTS, SDLIP, SDARTS
  • Results gathering: meta web search engines,
    SearchLight
  • Metadata harvesting
  • Better scalability and enhanced value-added
    services, but
  • Update overhead and synchronization for data
    freshness, harvesting protocol required

9
Distributed Search Fully Cooperative
Federation NCSTRL
  • Dienst, Distributed search, UI,
  • High cost
  • Install and run same software
  • Inflexible
  • Software update?
  • Internal structure change?
  • Most suitable for each participant?
  • Performance, reliability, scalability
  • Only returns after all sites have responded or a
    timeout occurs
  • Response time: worst case
  • Only as good as its weakest link; not scalable

10
Distributed Search Protocol Exchange SDLIP
  • By Andreas Paepcke et al. from Stanford
  • Protocol between client and LSP
  • LSP: Library Service Proxy, a wrapper for
    information sources

11
SDLIP Review
  • API or middleware toolkit, not for end users and
    DLs
  • Need to code both the client side and the DL LSP
  • Inflexible proxy
  • For each DL, need to code an LSP
  • Hard-coded access rules and results parsing rules
  • Add a DL or change a DL? Change code, recompile,
    restart
  • Not so lightweight
  • Too comprehensive and complicated, detailed down
    to a low level
  • Need to install software or follow the API to be
    fully compliant with SDLIP
  • Inefficient interface
  • Not really a unified UI; non-simultaneous search
12
Distributed Search Results Gathering
SearchLight
  • California Digital Library initiatives
  • Similar to meta web search engine
  • Simple search interface
  • Search multiple DLs at the same time
  • Problem
  • Same as meta web search engine
  • Simple UI: only keyword search, inaccurate query
    mapping, irrelevant results
  • Non-uniform results display: no results parsing,
    processing, or merging; only shows returned hits;
    must search the DL again for a result document
  • Inflexible: resource changes?
  • Inefficient: performance suffers while waiting
    for all results to come back

13
Metadata Harvesting OAI
  • Meta-data harvesting protocol
  • Data providers: manage and maintain documents and
    archives
  • Service providers: end-user services to access
    archives
  • OAI protocol: OAI-PMH (Protocol for Metadata
    Harvesting)
  • Defines metadata harvesting framework, de-facto
    standard
  • Syntax
  • HTTP URL request and XML response
  • Validated using XML Schema
  • Metadata format: Dublin Core
  • OAI review
  • For data providers
  • Agreement on how to publicize metadata
  • Easier to participate, but
  • Still need to stick to some convention like in
    federation
  • For service providers
  • Agreement on how to utilize metadata
  • Can provide scalable, efficient service but
  • Need to follow convention like in federation,
    data synchronization, server availability
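
To make the "HTTP URL request" syntax above concrete, here is a minimal sketch that assembles an OAI-PMH `ListRecords` request for Dublin Core records. The `verb`, `metadataPrefix`, and `from` parameters and the `oai_dc` prefix come from OAI-PMH itself; the class name, the method name, and any base URL passed in are hypothetical and are not taken from the slides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class OaiRequestSketch {
    // Builds a harvesting request URL. baseUrl is whatever endpoint a data
    // provider exposes; fromDate may be null to harvest everything.
    static String listRecordsUrl(String baseUrl, String fromDate) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListRecords");
        params.put("metadataPrefix", "oai_dc"); // unqualified Dublin Core
        if (fromDate != null) {
            params.put("from", fromDate);       // selective (incremental) harvest
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

The response to such a request is an XML document that a service provider would validate and store, which is where the synchronization cost noted above comes from.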

14
Summary of Current Approaches
15
LFDL Introduction
  • General principle
  • Distributed search: results gathering
  • Lightweight: no work for data providers to join
  • Basic solution
  • Dynamic DL specification registration
  • Dynamic universal interface
  • Dynamic Query mapping
  • Efficient system management
  • Local repository

16
LFDL Basic Approach
17
LFDL Architecture
18
Data-centered, specification-based LFDL Federation
19
LFDL Design DL specification
  • DLDL in XML
  • Structure
  • General info on a digital library
  • Search URL
  • Search method
  • Query Mapping rules
  • Access methods of the digital library
  • Search interface definition
  • Mapped to LFDL universal interface
  • Results retrieval and parsing rules
  • Information to be retrieved from the digital
    library
  • Specification DTD

20
DL Specification - sample
  • Specification for NEEDS
  • Access information
  • <SEARCHDATA Title="Search Info">
  • <SEARCH-METHOD Title="Search Method">POST</SEARCH-METHOD>
  • <SEARCH-URL Title="Search URL">http://www.needs.org/needs/public/search/search_results/index.jhtml?_DARGS=/needs/public/search/index_body.jhtml</SEARCH-URL>
  • Search interface
  • <FORMFIELD>
  • <INPUTNAME>
  • <INPUTNAME_VALUE Title="Internal Form Name">/smete/forms/FindLearningObjects.keyword</INPUTNAME_VALUE>
  • <INPUTNAME_MAPPING Title="Mapped UI Field Name">UI_keyword</INPUTNAME_MAPPING>
  • </INPUTNAME>
  • <INPUTTYPE Title="Form Type">text input</INPUTTYPE>
  • <INPUTVALUE/>
  • </FORMFIELD>
  • Simulated search interface from specification
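
One way to read the search-interface part of the specification is as a lookup table from universal-interface field names to a DL's native form field names. The sketch below assumes that reading: it parses FORMFIELD entries and translates the values a user entered in the universal interface into native form parameters. The class and method names are hypothetical, and the real LFDL engine may well process the specification differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class QueryMappingSketch {
    // Maps universal-interface values (e.g. UI_keyword -> "html") to the
    // native form parameters named in the DL specification.
    static Map<String, String> mapQuery(String specXml, Map<String, String> uiValues) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(specXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid DL specification", e);
        }
        Map<String, String> nativeParams = new LinkedHashMap<>();
        NodeList fields = doc.getElementsByTagName("FORMFIELD");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String nativeName = text(field, "INPUTNAME_VALUE");
            String uiName = text(field, "INPUTNAME_MAPPING");
            if (uiValues.containsKey(uiName)) {
                nativeParams.put(nativeName, uiValues.get(uiName));
            }
        }
        return nativeParams;
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

The resulting parameters would then be sent to the SEARCH-URL using the SEARCH-METHOD (POST for NEEDS). Adding or changing a DL only changes the specification text, not this code.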

21
DL Specification sample cont.
  • Results matching
  • <OVAR-TAG Title="Output Tag">A</OVAR-TAG>
  • <OVAR-MATCH Title="Output Match">needs/public/search/search_results/learning_resource/summary</OVAR-MATCH>
  • <COMMENT-MATCH-START Title="Comment match start">, </COMMENT-MATCH-START>
  • <COMMENT-MATCH-END Title="Comment match end">/p></COMMENT-MATCH-END>
  • Multiple results page
  • <MULTIPAGE Title="Multi Page Information">
  • <MULTI-PAGE Title="MultiPage">yes</MULTI-PAGE>
  • <HAS-NEXT Title="Contains Next Link">no</HAS-NEXT>
  • <NEXT-URL Title="Matching String">null</NEXT-URL>
  • <LINK-URL Title="Matching String">/needs/public/search/search_results/index.jhtml?queryId</LINK-URL>
  • <URL-ADDITIONAL-MATCH>page</URL-ADDITIONAL-MATCH>
  • <PAGE-HIT Title="No. of hits per page">10</PAGE-HIT>
  • </MULTIPAGE>
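
The following sketch shows how rules like these could be applied to a returned results page. It assumes that OVAR-TAG names the HTML tag to inspect (here `A`) and that OVAR-MATCH is a substring that a result link's URL must contain; it also derives the number of result pages from PAGE-HIT. Those semantics, and all class and method names, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResultLinkSketch {
    // Captures the href of anchor tags (double-quoted attribute values only).
    private static final Pattern ANCHOR_HREF =
            Pattern.compile("<a\\b[^>]*?\\bhref=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Keeps only the links whose URL contains the OVAR-MATCH string.
    static List<String> resultLinks(String html, String ovarMatch) {
        List<String> links = new ArrayList<>();
        Matcher m = ANCHOR_HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.contains(ovarMatch)) {
                links.add(href);
            }
        }
        return links;
    }

    // Number of result pages to fetch, given the total hits and PAGE-HIT.
    static int pageCount(int totalHits, int hitsPerPage) {
        return (totalHits + hitsPerPage - 1) / hitsPerPage;
    }
}
```

A regular expression is enough for a sketch; a production parser would more likely use a tolerant HTML parser, since real result pages are rarely well formed.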

22
Query Mapping Samples
23
Specification Issues
  • Interface capture and query mapping
  • Semantics mapping
  • Non-web form based proprietary interface
  • Java applet
  • Multimedia
  • Search behavior simulation and specification
  • Access control
  • Multi-steps search

24
LFDL Prototype Universal Search Interface
25
LFDL Search Service
26
LFDL Search Service User-centered Dynamic Search
  • Keyword-driven dynamic interface
  • Based on the keyword-hit set of each DL
  • Generating keyword-hit set for each DL
  • Source of base keyword set from OAI test-bed
  • Data from Arc metadata database
  • Data from user search logs
  • Generate keyword-hit set
  • Calculate hits from Arc metadata database
  • Query remote DL based on base keywords, parse
    results
  • DL specification: add parsing rules
  • <DOCHIT Title="Doc hits match string">
  • <MATCHSTRING Title="Output Match">total results</MATCHSTRING>
  • <BEFORESTRING Title="before string">of</BEFORESTRING>
  • <AFTERSTRING Title="after string">total results</AFTERSTRING>
  • </DOCHIT>
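
As a rough illustration of the DOCHIT rule, the sketch below pulls a hit count out of a results page by taking the text between BEFORESTRING and AFTERSTRING. The interpretation of the two strings as delimiters around the number, the sample sentence in the usage note, and the names used are assumptions and are not taken from the LFDL code.

```java
final class DocHitSketch {
    // Returns the number found between beforeString and afterString,
    // or -1 when the page does not contain the expected pattern.
    static int parseHits(String page, String beforeString, String afterString) {
        int end = page.indexOf(afterString);
        if (end < 0) {
            return -1;
        }
        // Last occurrence of beforeString that ends at or before afterString.
        int start = page.lastIndexOf(beforeString, end - beforeString.length());
        if (start < 0) {
            return -1;
        }
        String raw = page.substring(start + beforeString.length(), end);
        String digits = raw.replaceAll("[^0-9]", ""); // drop spaces and thousands separators
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }
}
```

For a page containing the text "Results 1 - 10 of 1,245 total results", calling `parseHits(page, "of", "total results")` yields 1245, which would then be recorded as the keyword-hit value for that DL.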

27
Keyword-hit - from Arc DB and remote DL
28
Populating Keyword-hit for a DL
29
Dynamic Interface- Design
  • Interface generation
  • Generic universal interface
  • Based on Dublin Core
  • Complete DL specification in DLDL
  • Filter field type, name, values
  • Mapping with UI field
  • Allow DL unique features (no mapping)

30
Interface Generation Algorithm
  • Factors
  • Input keyword
  • Generic base UI
  • DL keyword-hit
  • DL specification
  • Algorithm
  • Weight based DL features selection
  • DL weight determined by keyword-hit
  • Absolute feature weight within UI
  • Relative feature weight within a DL
  • User behavior from user features selection log
  • Algorithm
  • balance all weights
  • select features with highest weight
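
The slides do not give the weighting formula, so the following is only one plausible way to "balance all weights". It assumes each DL's weight is its share of the keyword hits, adds the DL-weighted relative feature weights to the absolute UI weight of each feature, and keeps the highest-scoring features. The scoring rule and every name in it are hypothetical, and the user-selection log is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FeatureSelectionSketch {
    // dlHits:   DL name -> keyword hits for the entered keyword
    // relative: DL name -> (feature -> relative weight within that DL)
    // absolute: feature -> absolute weight within the generic UI
    static List<String> selectFeatures(Map<String, Integer> dlHits,
                                       Map<String, Map<String, Double>> relative,
                                       Map<String, Double> absolute,
                                       int maxFeatures) {
        double totalHits = 0;
        for (int hits : dlHits.values()) {
            totalHits += hits;
        }
        Map<String, Double> score = new HashMap<>(absolute);
        for (Map.Entry<String, Integer> dl : dlHits.entrySet()) {
            // Assumed DL weight: the DL's share of all keyword hits.
            double dlWeight = totalHits == 0 ? 0 : dl.getValue() / totalHits;
            Map<String, Double> features = relative.getOrDefault(dl.getKey(), new HashMap<>());
            for (Map.Entry<String, Double> f : features.entrySet()) {
                score.merge(f.getKey(), dlWeight * f.getValue(), Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties broken by name for a stable order.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(maxFeatures, ranked.size())));
    }
}
```

Under this rule, a DL with many hits for the keyword pulls its own search features into the generated interface, while DLs with few hits contribute little.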

31
Universal Search Interface Based on DC Element
32
Enhanced DLDL and Specification
  • <FORMFIELD>
  •   <REQUIRED Title="Required Field or not">Y</REQUIRED>
  •   <WEIGHT Title="Weight of Field">1</WEIGHT>
  •   <TYPE Title="Search Criteria or Display Option">Search Criteria</TYPE>
  •   <LABEL Title="Displayed Field Name">Keywords</LABEL>
  •   <LENGTH Title="Field Length">35</LENGTH>
  •   <INPUTNAME>
  •   <INPUTNAME_VALUE Title="Internal Form Name">keywords</INPUTNAME_VALUE>
  •   <INPUTNAME_MAPPING Title="Mapped UI Field Name">UI_keyword</INPUTNAME_MAPPING>
  •   </INPUTNAME>
  •   <INPUTTYPE Title="Form Type">text input</INPUTTYPE>
  •   <INPUTVALUE />
  • </FORMFIELD>

33
Experimentation and Implementation: Interface
for keyword 'html'
34
Experimentation and Implementation: Interface
for keyword 'university'
35
Experimentation and Implementation: Interface
customization
36
Results Presentation Service Automatic Metadata
Extraction
  • Metadata is key
  • Service usability: present rich, interactive,
    and dynamic search results consistently
  • Performance and robustness: local repository and
    intelligent cache
  • Available metadata sources
  • List page of search results
  • Detail page of a selected document/record
  • Metadata retrieval approach
  • Define specification on how metadata are
    presented in those pages
  • Use Dublin Core as common metadata mapping set
  • Develop metadata parser to extract metadata
  • Store parsed metadata in local repository
  • Build up metadata repository
  • Proactive
  • Passive or piggyback

37
Metadata Extraction Approach
38
(No Transcript)
39
Metadata Retrieval and Parsing Workflow
40
Metadata Parsing Rules Definition
  • Extended DLDL
  • Two levels: list page and record page
  • String parsing: separate the raw string into
    segments corresponding to metadata fields

41
Part of DTD for DL parsing rules specification
  • <!ELEMENT RESULT-METADATA (MATCH-START,MATCH-END,EXCLUDE,REPLACE,DELIMITER,METADATA-FIELD)>
  • <!ELEMENT RECORD-METADATA (MATCH-START?,MATCH-END?,EXCLUDE,REPLACE,DELIMITER,METADATA-FIELD)>
  • <!ELEMENT METADATA-FIELD (#PCDATA)>
  • <!ATTLIST METADATA-FIELD Title CDATA "information about a particular metadata field">
  • <!ATTLIST METADATA-FIELD order CDATA #IMPLIED>
  • <!ATTLIST METADATA-FIELD multiple (true | false) #IMPLIED>
  • <!ATTLIST METADATA-FIELD delimeter CDATA #IMPLIED>
  • <!ATTLIST METADATA-FIELD format CDATA #IMPLIED>
  • <!ATTLIST METADATA-FIELD null_value_string CDATA #IMPLIED>

42
Sample Specification for CogPrints
  • <RESULT-METADATA hasRecordLevel="true">
  • <MATCH-START>null</MATCH-START>
  • <MATCH-END>null</MATCH-END>
  • </RESULT-METADATA>
  • <RECORD-METADATA>
  • <MATCH-START>name="DC.title"</MATCH-START>
  • <MATCH-END isLastIndex="true">" name="DC.creator"</MATCH-END>
  • <EXCLUDE>/&gt;&lt;meta content="</EXCLUDE>
  • <REPLACE>
  • <OLD-STRING>" name="DC.creator"</OLD-STRING>
  • <NEW-STRING> </NEW-STRING>
  • </REPLACE>
  • <METADATA-FIELD order="1" multiple="true" delimeter=" ">CREATOR</METADATA-FIELD>
  • </RECORD-METADATA>
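
A hedged sketch of how record-level rules such as these might be applied: cut the page between MATCH-START and MATCH-END, apply the REPLACE substitution, remove the EXCLUDE string, and split what remains on the field delimiter. The order of these steps, the `|` delimiter used in the usage note, and the class and method names are assumptions; the slides do not show the parser itself, and the delimiter value in the sample above did not survive transcription.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class FieldParserSketch {
    // Extracts the values of one (possibly multi-valued) metadata field.
    static List<String> extract(String page, String matchStart, String matchEnd,
                                boolean endIsLastIndex, String exclude,
                                String oldString, String newString, String delimiter) {
        List<String> values = new ArrayList<>();
        int start = page.indexOf(matchStart);
        if (start < 0) {
            return values;
        }
        start += matchStart.length();
        // isLastIndex="true" is read here as "use the last occurrence of MATCH-END".
        int end = endIsLastIndex ? page.lastIndexOf(matchEnd) : page.indexOf(matchEnd, start);
        if (end < start) {
            return values;
        }
        String raw = page.substring(start, end)
                .replace(oldString, newString) // REPLACE rule
                .replace(exclude, "");         // EXCLUDE rule
        for (String part : raw.split(Pattern.quote(delimiter))) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }
}
```

Given a record page whose head contains `<meta content="A title" name="DC.title" /><meta content="Shi, R." name="DC.creator" /><meta content="Maly, K." name="DC.creator" />`, calling `extract` with the CogPrints strings above and `|` as both the replacement string and the delimiter returns the two creator names.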

43
Results
44
Results Merging and Presentation
  • Group results based on metadata field
  • Can further tailor interface and format results
    using XSLT

45
Performance Improvement Intelligent Cache
  • Search scenario
  • Case 1: a query for keyword=computer
  • Case 2: a query for keyword=computer AND
    date=2002
  • Results: LFDL prototype caching
  • Cache grouped by query string, so
  • Case 1: no cache hits, distributed search request
    sent to DLs
  • Case 2: no cache hits, distributed search request
    sent to DLs
  • Intelligent cache: enhanced LFDL caching
  • Cache grouped by metadata, so
  • Case 1: no cache hits, distributed search request
    sent to DLs
  • Case 2: cache hits, search served locally
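
The difference between the two cache designs can be sketched as follows. A cache keyed by the query string can only answer an identical query, whereas a cache that stores metadata records can answer any query whose conditions are satisfied by records it already holds. The record layout (a map of field names to values) and all names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MetadataCacheSketch {
    // Each cached record maps field names to values, e.g. keyword=computer, date=2002.
    private final List<Map<String, String>> records = new ArrayList<>();

    // Called for every record parsed from a DL's results (e.g. after Case 1).
    void store(Map<String, String> record) {
        records.add(new HashMap<>(record));
    }

    // Serves a conjunctive query from the cache: every field=value pair must match.
    List<Map<String, String>> search(Map<String, String> query) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (record.entrySet().containsAll(query.entrySet())) {
                hits.add(record);
            }
        }
        return hits;
    }
}
```

After the records returned for Case 1 have been stored, the Case 2 query (keyword=computer AND date=2002) is a filter over those same records, so it finds hits locally even though that exact query string has never been issued.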

46
Local Metadata Repository
  • All searches are served locally first
  • A secondary in-memory metadata cache for better
    performance and system reliability
  • Cache grouped by metadata instead of query string
  • Cache-based distributed search
  • Display results from cache, at the same time
  • Still send out query to DLs to update cache
  • Transparent to end users

47
Metadata Cache and Repository
48
Cache Replacement Algorithm
  • Replacement algorithm: least used plus least
    recently used metadata
  • Initial system-wide parameters: cache size, cache
    keep-safe size
  • Runtime parameters per metadata record:
    date_last_used, total_usage
  • Algorithm implementation
  • On first start, load from the db ordered by
    date_last_used and total_usage, and pick based on
    cache size
  • String orderBy = " ORDER BY total_usage desc,
    date_last_used desc";
  • String selectMetadata = "SELECT internalID,
    identifier, archive, datestamp, title, creator,
    subject, description, publisher, publication,
    keyword, category, contributor, type, format,
    source, language, status, date_last_used,
    total_usage FROM dc" + orderBy;
  • Each time a user views a metadata record, update
    date_last_used and total_usage
  • If the cache is full, remove the least used from
    the cache and save to db (first sort by
    date_last_used, keep safe, then sort by
    total_usage)
  • Cache size and keep-safe size can be changed at
    runtime
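
One reading of the eviction step is sketched below: sort the cached records by date_last_used, protect the keep-safe number of most recently used records, and evict the record with the lowest total_usage among the rest. The slide is terse, so this interpretation, the tie-breaking rule, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CacheEvictionSketch {
    static final class Entry {
        final String id;
        final long dateLastUsed; // e.g. epoch milliseconds of the last view
        final int totalUsage;    // number of times the record was viewed

        Entry(String id, long dateLastUsed, int totalUsage) {
            this.id = id;
            this.dateLastUsed = dateLastUsed;
            this.totalUsage = totalUsage;
        }
    }

    // Picks the record to evict, or returns null if every record is protected.
    static Entry chooseVictim(List<Entry> cache, int keepSafeSize) {
        List<Entry> byRecency = new ArrayList<>(cache);
        // Most recently used first.
        byRecency.sort(Comparator.comparingLong((Entry e) -> e.dateLastUsed).reversed());
        int protectedCount = Math.max(0, keepSafeSize);
        if (byRecency.size() <= protectedCount) {
            return null;
        }
        // Among unprotected records: least used, and oldest on ties.
        return byRecency.subList(protectedCount, byRecency.size()).stream()
                .min(Comparator.comparingInt((Entry e) -> e.totalUsage)
                        .thenComparingLong(e -> e.dateLastUsed))
                .orElse(null);
    }
}
```

The keep-safe window stops a record that was just fetched, and therefore has a low usage count, from being evicted immediately.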

49
Registration and Management service
50
Registration and Management
  • Registration service
  • Validate, add, update, and remove a DL's
    specification
  • Implementation
  • LDAP-based
  • Tightly-integrated
  • Management service
  • Real-time system monitoring
  • Registered DL
  • Average system response time
  • Resource usage
  • Search activities
  • Real-time system reconfiguration and maintenance
  • Turn on/off debug mode
  • Reallocate system resources
  • Update keyword-hits database

51
Major Contributions
  • Scope
  • provide service for non-cooperating DLs
  • DLs like IEEE and ACM may continue to work
    independently without participating in any
    interoperation
  • Automatic metadata extraction from
    non-cooperating DLs
  • Architecture
  • lightweight, dynamic, data-centered, rule-driven
    architecture
  • DL specification defines interoperability
    processing rules using DLDL, a human-readable and
    highly maintainable XML-based language
  • Dynamic DL registration, removal, or modification
  • Powerful federation engine enforces the rules
    defined in the specification and enables DLs to
    join the federation in real time, with no code
    change or system restart
  • Quickly forms a gathering service for a special
    community: just add the specification of a DL and
    that DL is incorporated into the service on the
    fly
  • Lots of work behind the scenes: come up with a
    new language, then develop a processing engine
    for specifications written in that language
  • Can be used in other domains, such as web robots,
    shopping agents, and price comparison agents

52
Major Contributions
  • Approach
  • Lies between distributed search with no caching
    (Dienst) and harvesting (Arc)
  • So it can achieve the advantages of both: data
    freshness from distributed search, and rich
    service, reliability, and performance from
    harvesting
  • Service design
  • Service quality and usability
  • dynamic user-centered, keyword driven search
    interface
  • can be applied to other DL applications, like
    archon, to design flexible interface based on
    archive and metadata
  • Results processing and presentation for rich,
    user friendly service
  • Parses results so that they can be displayed
    uniformly, without showing each DL's native
    results page
  • Automatic metadata parsing and retrieval can be
    used by other domains and applications such as
    metadata extraction from PDF files
  • System performance, efficiency and robustness
  • Local metadata repository and intelligent cache

53
Publications
  • R. Shi, K. Maly and M. Zubair. Interoperable
    Federated Digital Library using XML and LDAP.
    Global Digital Library Development in the New
    Millennium, pages 277-286, May 2001.
  • R. Shi, K. Maly and M. Zubair. Dynamic
    Interoperation of Non-cooperating Digital
    Libraries. In Proceedings of International
    Conference on Digital Library - IT Opportunities
    and Challenges in the New Millennium, Beijing,
    China, July 2002.
  • R. Shi, K. Maly and M. Zubair. Automatic Metadata
    Discovery from Non-cooperative Digital libraries.
    In Proceedings of IADIS International Conference
    on e-Society 2003, Lisbon, Portugal, June 2003.
  • R. Shi, K. Maly and M. Zubair. Improving
    Federated Service for Non-cooperating Digital
    Libraries. In Proceedings of International
    Conference on Digital Libraries, New Delhi,
    India, February 2004.
  • M. Zubair, K. Maly, and R. Shi. Focus Research
    Libraries in Support of Active Learning. In
    Proceedings of International Conference on
    Information and Communication Technologies for
    Education, Vienna, December 2000.

54
Major Issues
  • Scalability
  • Not easy to incorporate a large number of DLs at
    one time
  • Cache size limits caching
  • Resource intensive when serving queries with a
    large number of results to process
  • However, it is still useful for building services
    for special communities with a limited number of
    DLs
  • Incorporating DLs with complex or non-web-based
    search interfaces
  • DL specification generation is not automatic
  • DL behavior discovery: human intervention is
    still needed to find out whether a DL has changed
    its searching and presenting mechanism
  • Effective evaluation, measurement, usefulness
    testing

55
Conclusion and Future Works
  • Federation service for non-cooperating DLs is
    possible
  • Dynamic user-centered interface is practical to
    improve quality of service
  • Locally harvested metadata improve service
    usability and performance
  • Future works
  • Complex interface mapping, access control
  • Scalability, and performance
  • Automatic specification generation
  • DL behavior changes discovery
  • Dynamic interface: keyword relevance instead of
    hits (only if the user selects that DL or the DL
    has relevant results)
  • Personalized portal: customized interface and
    results display, most often used searches,
    remembered search preferences, and caching
    options for fresh data or fast results

56
Demo
  • Steps: http://www.cs.odu.edu/shi/interop/demo/doc/dissertation/demo/demo_steps.html
  • Site: http://128.82.7.86:8088/interop/demo/index.html