Awareness Services for Digital Libraries - PowerPoint PPT Presentation

Awareness Services for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University – PowerPoint PPT presentation

Title: Awareness Services for Digital Libraries


1
Awareness Services for Digital Libraries
  • Arturo Crespo
  • Hector Garcia-Molina
  • Stanford University

2
Motivation
  • Our objective: create the next generation of data
    repositories tailored to digital library needs
  • Persistence, Distribution, Intellectual Property,
    Indexing and Cataloging, Replication, ...

[Figure: Data Storage Clients (Indexers, Replica, Naming) and Data Storage]
3
Data Stores and Clients
[Figure: Data Stores (DB Tech Reports, AI Tech Reports, HCI Tech Reports) and Clients (DB Indexer, CS Indexer)]
4
Data Store Services
  • Object access: via a handle
  • Object awareness: clients must be aware of changes
    at the store

5
A Case Study: CS-TR and SIFT
  • SIFT: a selective dissemination service
  • CS-TR: a digital library of technical reports
    from about 50 universities
  • Awareness based on timestamps
  • Problems: file system timestamps, application
    timestamps, deletions
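
To illustrate why deletions are a problem for timestamp-based awareness, here is a minimal sketch. It is not the CS-TR or SIFT code; the `StoredObject` type, the `changed_since` helper, and the logical integer timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    handle: str
    modified: int  # logical timestamp kept by the store (assumption)


def changed_since(store, last_update):
    """Return handles whose timestamp is newer than the client's last update."""
    return sorted(h for h, obj in store.items() if obj.modified > last_update)


# Hypothetical store contents; "tr-3" has already been deleted at the store.
store = {
    "tr-1": StoredObject("tr-1", modified=5),
    "tr-2": StoredObject("tr-2", modified=12),
}
client_view = {"tr-1", "tr-2", "tr-3"}  # what the client indexed earlier

changed = changed_since(store, last_update=10)

# A deleted object leaves no timestamp behind, so the query above cannot
# report it; the client only finds out by comparing full handle lists.
missed_deletions = sorted(client_view - set(store))
```

In this sketch `changed` contains only "tr-2", while the deletion of "tr-3" shows up only in `missed_deletions`, which requires the full list of handles rather than a timestamp query.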

6
The Problem
  • How can a Data Storage Client detect the changes
    that have happened in remote Data Storages since
    the last update?
  • There is no perfect algorithm
  • The best algorithm for solving this problem
    depends on the characteristics of the relationship
    between the Data Storage and the client

7
The Design Space
  • Ratio of Data Storages per Client
  • Stateful versus Stateless Data Storages with
    respect to the Clients
  • Push versus Pull Model
  • Update Frequency: how often the repository changes
    and how often the client is updated
  • Client awareness of Data Storages
  • Complexity of the Algorithm

8
Standard Mechanisms for Client Updating
  • Key Query Algorithm
  • Snapshot Differential Algorithm
  • Timestamps and Versions
  • Logs
  • Triggers
  • Signatures
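
As one example of the mechanisms listed above, a snapshot differential can be sketched as a comparison of two full snapshots. This is a minimal illustration, assuming each snapshot is a mapping from handle to content; the function name `snapshot_diff` is hypothetical.

```python
def snapshot_diff(old, new):
    """Compare two snapshots that map handle -> content."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # Handles present in both snapshots whose content differs.
    modified = sorted(h for h in old.keys() & new.keys() if old[h] != new[h])
    return added, deleted, modified


# Hypothetical snapshots taken at two consecutive client updates.
old_snapshot = {"a": "v1", "b": "v1", "c": "v1"}
new_snapshot = {"a": "v1", "b": "v2", "d": "v1"}

added, deleted, modified = snapshot_diff(old_snapshot, new_snapshot)
```

Unlike a timestamp query, this approach detects deletions, at the cost of keeping and comparing complete snapshots.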

9
Contributions
  • Survey of the spectrum of awareness options, with
    the advantages and disadvantages of each one
  • All mechanisms can be captured by a single
    algorithm: the UNI-AWARE algorithm
  • Enhancements for signature-based schemes: reduced
    computation and reduced communication costs

10
Related Work
  • Database replica maintenance
  • Remote file comparison
  • Deployment of programs over the network

11
The UNI-AWARE Algorithm
  • A unified algorithm that covers known schemes:
    snapshot algorithm, timestamps and versions, logs,
    triggers, signatures
  • The algorithm is tailored to a specific scheme
    through the definition of custom functions
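
The idea of tailoring one algorithm through custom functions can be sketched as follows. This is not the UNI-AWARE algorithm from the paper, whose details the slides do not give; `detect_changes`, `describe`, and `differs` are hypothetical names, and the objects are plain dictionaries chosen for illustration.

```python
import operator
import zlib


def detect_changes(objects, remembered, describe, differs):
    """Generic detection loop; `describe` and `differs` are scheme-specific hooks."""
    current = {h: describe(obj) for h, obj in objects.items()}
    changed = sorted(
        h for h, d in current.items()
        if h not in remembered or differs(remembered[h], d)
    )
    deleted = sorted(h for h in remembered if h not in current)
    return changed, deleted, current


# Hypothetical store contents.
objects = {
    "a": {"mtime": 3, "body": b"one"},
    "b": {"mtime": 7, "body": b"new"},
}

# Tailored to timestamps: remember the last seen timestamp per handle.
by_time = detect_changes(
    objects, {"a": 3, "b": 5, "c": 1},
    describe=lambda o: o["mtime"],
    differs=lambda old, new: new > old,
)

# Tailored to signatures: remember a CRC-32 of each body.
old_sigs = {"a": zlib.crc32(b"one"), "b": zlib.crc32(b"old")}
by_sig = detect_changes(
    objects, old_sigs,
    describe=lambda o: zlib.crc32(o["body"]),
    differs=operator.ne,
)
```

The detection loop stays the same in both calls; only the two hook functions change with the scheme.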

12
UNI-AWARE Signature Algorithm
  • Signature: a token associated with each document
    that has a high probability of being unique and
    changes when the content of the object changes
  • Examples: CRC, checksums
  • Advantages: robust, as it does not require
    metadata maintenance; easy to manage consistently
    when the store fails or an object migrates

13
UNI-AWARE Signature Algorithm
[Figure: Data Store and Client; all document signatures are transferred, then the client requests documents]
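
The exchange on this slide can be sketched as below, assuming CRC-32 (via Python's `zlib.crc32`) as the signature and a pull model in which the client compares the store's signatures against its own. The function names and sample documents are hypothetical.

```python
import zlib


def signatures(store):
    """One CRC-32 signature per document; `store` maps handle -> bytes."""
    return {handle: zlib.crc32(body) for handle, body in store.items()}


def documents_to_request(remote_sigs, local_sigs):
    """Handles that are new to the client or whose signature differs."""
    return sorted(h for h, s in remote_sigs.items() if local_sigs.get(h) != s)


# Hypothetical contents of the data store and of the client's copy.
data_store = {"tr-1": b"old text", "tr-2": b"new text", "tr-3": b"brand new"}
client_copy = {"tr-1": b"old text", "tr-2": b"old text"}

# Step 1: all signatures are transferred. Step 2: the client requests documents.
to_request = documents_to_request(signatures(data_store), signatures(client_copy))
```

Every signature crosses the network on each update in this sketch, which is the cost the hierarchical scheme on the next slides aims to reduce.
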
14
DIST-UNI-AWARE Algorithm
  • Objective: reduce the amount of data exchanged
    between data store and clients
  • DIST-UNI-AWARE: a unified algorithm that can be
    tailored to different schemes, such as
    hierarchical signatures and hierarchical
    timestamps

15
DIST-UNI-AWARE
[Figure: Data Store and Client; signatures of buckets are transferred, the client requests more signatures, then requests documents]
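
A two-level version of this exchange can be sketched as follows. It is an illustration only, not the DIST-UNI-AWARE algorithm itself: the bucket assignment by hashed handle, the fixed `n_buckets`, and all function names are assumptions.

```python
import zlib


def bucket_of(handle, n_buckets):
    # Stable assignment of handles to buckets (hypothetical choice).
    return zlib.crc32(handle.encode()) % n_buckets


def bucket_signatures(store, n_buckets):
    """One running CRC-32 per bucket over its (handle, body) pairs in sorted order."""
    sigs = [0] * n_buckets
    for handle in sorted(store):
        b = bucket_of(handle, n_buckets)
        sigs[b] = zlib.crc32(handle.encode() + b"\0" + store[handle], sigs[b])
    return sigs


def sync_plan(server, client, n_buckets):
    """Return (mismatched buckets, handles the client should request)."""
    s_sigs = bucket_signatures(server, n_buckets)
    c_sigs = bucket_signatures(client, n_buckets)
    dirty = [b for b in range(n_buckets) if s_sigs[b] != c_sigs[b]]
    # Second round: per-document signatures only for the mismatched buckets.
    to_request = sorted(
        h for h, body in server.items()
        if bucket_of(h, n_buckets) in dirty
        and (h not in client or zlib.crc32(client[h]) != zlib.crc32(body))
    )
    return dirty, to_request


# Hypothetical contents: only "tr-2" differs between store and client.
server = {"tr-1": b"old text", "tr-2": b"new text",
          "tr-3": b"old text", "tr-4": b"old text"}
client = {**server, "tr-2": b"old text"}

dirty, to_request = sync_plan(server, client, n_buckets=2)
```

Only the signatures of mismatched buckets are expanded into per-document signatures, so unchanged buckets cost one signature each instead of one per document.
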
16
Advantages of Signature Algorithms
  • Support the push and pull models
  • No need for reliable storage of additional data
    structures: if signatures are lost or corrupted,
    they can be recomputed
  • Efficient in their use of network, client, and
    data store resources
  • Scale well with the number of clients and
    documents

17
DIST-UNI-AWARE Enhancements
  • Increase group split factor
  • Client sends additional information at split time
  • Clustering of changed objects

18
Conclusions
  • Awareness mechanism for digital libraries
  • Separation of storage functionality and other
    services
  • Awareness schemes must be resilient to computer
    environment changes and bugs
  • UNI-AWARE and DIST-UNI-AWARE

19
Reference
  • Arturo Crespo, Hector Garcia-Molina. "Awareness
    Services for Digital Libraries." ECDL'97.
    http://www-db.stanford.edu/crespo/publications/
