Open Access to Digital Libraries


1
Open Access to Digital Libraries
  • Donatella Castelli
  • Pasquale Pagano
  • Manuele Simi
  • ISTI-CNR
  • Pisa

2
Outline
  • 15 May 2003
  • 09.30 What is a digital library?
  • What is a second generation digital
    library?
  • 11.00 Coffee break
  • 11.30 A second generation DL: Scholnet
  • Introduction to its functionality
  • DoMDL document model
  • Annotation model
  • 13.00 Lunch
  • 14.30 Scholnet demo
  • 15.45 Break
  • 16.00 Experimentation of Scholnet by group 1
  • 17.00 End of Day 1

3
Outline (cont.)
  • 16 May 2003
  • 09.30 Experimentation of Scholnet by group 2
  • 10.15 OpenDLib demo
  • 10.30 The Scholnet Architecture
  • 11.00 Coffee break
  • 11.30 Scholnet demo
  • 12.00 How to set up a Scholnet DL
  • 13.00 Lunch
  • 14.30 Comparison with other DL systems
  • 15.00 Discussion and Questionnaire
  • 15.30 End of day 2

4
What is a DL? (traditional definition)
  • An institution which performs and/or supports
    (at least) the functions of a library in the
    context of distributed, networked collections of
    information objects in digital form
  • Nicholas Belkin
  • Eighth DELOS Workshop
  • Stockholm, 1998

5
Digital objects
  • Not only document descriptions (metadata)
  • but also
  • documents (texts, videos, sounds, 3D images,
    data, programs, maps, ...)
  • Not only a catalogue
  • but also a repository

6
DL basic services
  • Acquisition
  • Submission
  • Repository
  • Search/Browsing/Retrieval
  • Dissemination
  • User Interface

7
A digital library is thus the analogue of
  • A digital library
  • A digital museum
  • A digital archive
  • A digital audio-video archive
  • A data center
  • ...

8
The Origin
  • The library systems
  • The Web

9
Library systems in the past
  • from direct (on-line) communication ...
  • ... to communication via the Web (HTTP protocol)
10
Web Access to OPACs
  • Generic users access only the search/retrieval
    service through the Web (HTTP protocol)
  • The software modules that implement the other
    services do not communicate through the Web
  • [Diagram labels: Internet, Web Interface, Search,
    Cataloguing Service, Loan Service, Catalogue]
11
Web access to OPACs
  • Each catalogue is accessible through its own user
    interface
  • User interfaces differ in
  • access points
  • names of the access points
  • language
  • graphics
  • ...
  • Users must be familiar with many interfaces
  • No cross-searches are possible

12
  • An example of a user interface

13
A solution: Z39.50
  • Standard communication protocol for
    information search and retrieval

  • It establishes the rules that regulate the
    communication between the clients and the
    servers (automatic catalogues)
  • [Diagram labels: Client, Server, Translation
    (twice), Protocol Messages]
14
Z39.50
  • It allows the search and retrieval of
    bibliographic information from different
    distributed OPACs by issuing a single query
    through a common user interface

  • [Diagram labels: OPAC-A, OPAC-B, Server Z39.50
    (twice), Protocol Z39.50, User Interface /
    Client Z39.50]
15
Virtual Library
  • Parallel access to selected Z39.50 OPACs
  • Common user interface
  • WEB access with mapping HTTP-Z39.50 (and
    vice-versa)
  • Traditional bibliographic search based on
    OPACs
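
The parallel access behind a common interface can be sketched as follows. This is a minimal illustration, not a Z39.50 implementation: the catalogues are in-memory stand-ins, and the names CATALOGUES, search_catalogue and federated_search are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote OPACs (not real Z39.50 servers).
CATALOGUES = {
    "OPAC-A": ["Digital Libraries", "Metadata Basics"],
    "OPAC-B": ["Introduction to Digital Archives", "Web Protocols"],
}


def search_catalogue(name, term):
    """Return (catalogue name, title) pairs whose title contains the term."""
    term = term.lower()
    return [(name, title) for title in CATALOGUES[name] if term in title.lower()]


def federated_search(term):
    """Send one query to every catalogue in parallel and merge the results."""
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda name: search_catalogue(name, term), CATALOGUES)
        return sorted(hit for hits in result_lists for hit in hits)


if __name__ == "__main__":
    print(federated_search("digital"))
```

In a real virtual library each call to search_catalogue would be a protocol exchange with a remote server; the merge step is what gives the user a single result list.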

16
Use of Z39.50
  • The use of Z39.50 is limited to catalogue
    resources
  • No project implements the protocol functions for
    the management of electronic documents

17
The Web
  • 1990s: global access to information resources
  • Different types of resources, stored on
    distributed Internet servers and accessible
    through the World Wide Web (WWW)
  • The Web allows access to a specific resource by
    specifying its network address (URL)

18
Search engines
  • Index the words in the Web pages
  • Allow the resource discovery (without explicitly
    knowing the address)
  • Implement their own resource selection policies
    and their own indexing techniques
  • Offer statistical/probabilistic search
  • Return Web addresses
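
Word indexing and address retrieval can be illustrated with a tiny inverted index. This is only a sketch under simplifying assumptions: the pages and URLs are invented, matching is exact and Boolean (all query words must occur), and the statistical ranking that real engines apply is left out.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> page text.
PAGES = {
    "http://example.org/a": "Digital libraries store documents",
    "http://example.org/b": "Search engines index documents on the Web",
}


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    print(search(build_index(PAGES), "digital documents"))
```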

19
Resource discovery on the Web
  • Finding relevant information on the World
    Wide Web has become increasingly problematic
    due to the explosive growth of networked
    resources. Current Web indexing evolved rapidly
    to fill the demand for resource discovery tools,
    but that indexing, while useful, is a poor
    substitute for richer varieties of resource
    description.
  • Dublin Core Metadata Initiative
    <http://www.ietf.org/rfc/rfc2413.txt>

20

The problem of noise in resource discovery
  • There is a need for some form of cataloguing
    of the resources available on the Internet in
    order to achieve a good balance between recall
    and precision
  • Descriptive rules must be suitable for all the
    types of information resources
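
For reference, recall and precision can be computed as below. The document identifiers are invented for the example, and the convention of returning 0.0 for an empty set is an assumption of this sketch.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four results returned, two of them among the three relevant documents.
    print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

A noisy result list lowers precision; a search that misses relevant resources lowers recall.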

Dublin Core Metadata Format
21

What is Dublin Core?
  • Dublin Core metadata is used to supplement
    existing methods for searching and indexing
    Web-based metadata, regardless of whether the
    corresponding resource is an electronic document
    or a "real" physical object.
  • Dublin Core metadata provides card catalog-like
    definitions for defining the properties of
    objects for Web-based resource discovery systems.

22
What is the DC Metadata Element Set?
  • It is a set of 16 descriptive semantic
    definitions (the 15 core elements plus Audience).
    It represents a core set of elements likely to
    be useful across a broad range of vertical
    industries and disciplines of study
  • Title - Creator
  • Subject - Description
  • Publisher - Contributor
  • Date - Type
  • Format - Identifier
  • Source - Language
  • Relation - Coverage
  • Rights - Audience
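
As an illustration, a few of these elements can be serialised as XML with the Python standard library. The wrapper element name "record" and the helper dc_record are assumptions of this sketch; the element values are taken from the title slide and the outline of this presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def dc_record(fields):
    """Build a <record> element with one dc:* child per (element, value) pair."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


record = dc_record([
    ("title", "Open Access to Digital Libraries"),
    ("creator", "Donatella Castelli"),
    ("creator", "Pasquale Pagano"),
    ("creator", "Manuele Simi"),
    ("date", "2003-05-15"),
])
xml_text = ET.tostring(record, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Elements are repeatable, which is why the three authors appear as three separate creator entries.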

23
Who can benefit from using DC metadata?
  • Dublin Core metadata is being used as the basis
    for descriptive systems by several interest
    groups such as
  • educational organizations
  • libraries
  • government institutions
  • scientific research sectors
  • Web page authors
  • businesses requiring more searchable sites
  • corporations with vast knowledge management
    systems

24
More Info about DC and metadata
  • http://dublincore.org
  • http://dublincore.org/usage/terms/dc/current-elements/
  • Maria Bruna Baldacci, Rappresentazioni
    formalizzate,
    http://dlibcenter.iei.pi.cnr.it/it/index.html

25
First generation DLs in the US
  • 1994-1998 Digital Library Initiative Phase I
  • Funded by
  • National Science Foundation(NSF)
  • Department of Defense Advanced Research Projects
    Agency (DARPA)
  • National Aeronautics and Space Administration
    (NASA)
  • Objective
  • The focus is to dramatically advance the
    means to collect, store, and organize information
    in digital forms, and make it available for
    searching, retrieval, and processing via
    communication networks, all in user-friendly
    ways

26
First generation DLs in Europe
  • 1996
  • ERCIM Technical Reference Digital Library

ERCIM: European Research Consortium for
Informatics and Mathematics
27
An example: the NCSTRL DL
  • Networked Computer Science Technical Reports
    Library (NCSTRL)
  • Focus
  • Proving the possibility of creating a DL as a
    federation of distributed services
  • Result
  • System operational on around one hundred
    widely distributed servers

28
Distributed services
  • Users access the system through a Web interface
  • The services are distributed on the Internet.
    They communicate through an established
    protocol.
  • [Diagram labels: Internet, Interface Service,
    Search service, Browse Service, Repository
    Service (three instances), Loan Service?]
29
NCSTRL (cont.)
  • Documents
  • Computer Science Technical Reports published by
    more than one hundred research institutions
  • Descriptive metadata format
  • Author, title, abstract
  • (provided by the author of the doc)
  • Services
  • Search and browse on author, title and abstract
  • Submission is carried out by the author by
    sending the document and its metadata to the
    NCSTRL administrator

30
Another example: the Informedia DL
  • Centralized audio-video DL system
  • Focus
  • Automatic content metadata extraction through
    the integration of various technologies
  • Speech understanding for automatically derived
    transcripts
  • Face, text and object recognition
  • Key frame extraction and indexing
  • Geocoding
  • Topic assignment

31
The Informedia DL
  • Documents
  • Audio-video resources (mainly News)
  • Metadata
  • Terms extracted from the transcript and from the
    image captions
  • Locations
  • Keyframe
  • Faces
  • Names of the speakers
  • Video abstract

32
The Informedia DL
  • Services
  • Search based on
  • Free text
  • Image similarity
  • Face and object similarity
  • Geographical information
  • Multiple presentation styles of query results

33
Informedia DL: underlying technologies
  • Automatic indexing through
  • Speech understanding for automatically derived
    transcripts
  • Face, text and object recognition
  • Key frame extraction and indexing
  • Geocoding
  • Topic assignment
  • Automatic abstract generation

34
The Informedia DL: example 1
35
The Informedia DL: example 2
36
From the first to the second generation DLs
  • A DL is not only an instrument for a
  • wider
  • cheaper
  • faster
  • dissemination of information, but it can also be
  • a means of supporting the communication and
    collaboration between the members of a community
    of interest

37
Second generation DLs
  • A DL can offer more
  • New types of digital objects
  • New services

38
New types of digital objects
  • Multimedia
  • Structured
  • Annotated
  • Multilingual

The new document types enrich the possible forms
of remote collaboration among the members of a
community of interest
39
Multimedia documents
  • Videos and slides of tutorials, seminars,
    lectures
  • Training sessions
  • Project presentations
  • Demos

40
Structured documents
41
Annotated documents
  • rating
  • comment
  • description
  • link
  • agreement
  • disagreement
  • explanation
  • on the whole document or on its parts
  • authored by different people
  • public or restricted
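
One possible way to model such annotations is sketched below. It is not the annotation model of any particular system: the class, its field names, and the rule that a restricted annotation is visible only to its author are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

ANNOTATION_KINDS = {"rating", "comment", "description", "link",
                    "agreement", "disagreement", "explanation"}


@dataclass
class Annotation:
    kind: str                          # one of ANNOTATION_KINDS
    author: str
    body: str
    target_part: Optional[str] = None  # None means the whole document
    public: bool = True                # False means restricted

    def __post_init__(self):
        if self.kind not in ANNOTATION_KINDS:
            raise ValueError(f"unknown annotation kind: {self.kind}")


def visible_annotations(annotations, reader):
    """Public annotations plus the restricted ones written by the reader."""
    return [a for a in annotations if a.public or a.author == reader]


if __name__ == "__main__":
    notes = [
        Annotation("rating", "anna", "4/5"),
        Annotation("comment", "bob", "Unclear", target_part="slide 3", public=False),
    ]
    print([a.kind for a in visible_annotations(notes, "anna")])
```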

42
Multilingual documents
  • Documents in different languages can be
    maintained in the same DL
  • These documents can be accessed by querying in
    the language of the document and in any other
    supported language

43
New Services (1)
  • New document types impose a re-thinking of the
    traditional library services
  • Submission
  • Description
  • Search
  • Dissemination

44
An example: the acquisition of video
documents
It must be possible to structure the video into
meaningful parts (sequences, scenes, frames)
45
The description of video documents
... and to describe the video and its parts separately
46
Another example: search
  • Multiple search type options
  • Free text search
  • Fielded search
  • Monolingual and cross language search
  • Similarity search
  • Search using the doc structure
  • Search on annotations
  • ...

47
An example of a complex query
  • All the seminars that
  • contain a slide that
  • is about XML
  • and contain a video in which
  • there is an image of Serge Abiteboul
  • and have been created after December 2000
  • and have a good rating
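
A query of this kind combines structure, content, metadata and annotations. The sketch below expresses it as a filter over a simplified in-memory model; the Seminar class, its field names, the 0 to 5 rating scale and the threshold for a "good" rating are assumptions, not part of any real DL query language.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Seminar:
    title: str
    created: date
    rating: float                                         # assumed scale: 0.0 to 5.0
    slide_topics: list = field(default_factory=list)      # one topic per slide
    people_in_videos: list = field(default_factory=list)  # people recognised in videos


def matches(seminar, good_rating=4.0):
    """True if the seminar satisfies every clause of the example query."""
    return (
        "XML" in seminar.slide_topics                          # a slide about XML
        and "Serge Abiteboul" in seminar.people_in_videos      # image in a video
        and seminar.created > date(2000, 12, 31)               # after December 2000
        and seminar.rating >= good_rating                      # good rating
    )


SEMINARS = [
    Seminar("Seminar A", date(2001, 3, 1), 4.5, ["XML", "Databases"], ["Serge Abiteboul"]),
    Seminar("Seminar B", date(2000, 6, 1), 4.8, ["XML"], ["Serge Abiteboul"]),
    Seminar("Seminar C", date(2002, 1, 9), 2.0, ["XML"], ["Serge Abiteboul"]),
]

if __name__ == "__main__":
    print([s.title for s in SEMINARS if matches(s)])
```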

48
New services (2)
  • New services (not necessarily document centered)
    enabled by recently developed technologies, can
    be included in a DL to improve its potential
    usages
  • Recommenders
  • Co-operative work services
  • Peer-reviewing supporting services
  • Authoring services
  • E-learning tools
  • ...

49
An example: a collaborative environment
50
An example: a collaborative environment (cont.)
51
An example: a recommender system
52
A new research trend: from a DL ...
  • DLs have been developed as ad-hoc systems
  • These systems require a great investment in terms
    of manpower and technologies for the
    implementation of the software and its
    maintenance
  • Skilled personnel are needed
  • Few communities can build their own DL

53
... to a digital library service system
  • A digital library service system
  • Flexible and open DL system that offers DL
    services. It can be customized according to the
    characteristics of the context where the DL must
    operate and to the needs of its users
  • (like a DBMS)

54
Example of customization dimensions
  • Service specific parameters
  • Metadata formats
  • Document types
  • Annotation model
  • Controlled vocabulary
  • Query language
  • Formats of the results

55
Customization is not enough
  • Each community needs specific services in
    addition to the basic services
  • (basic services: e.g. search and browse)
  • For example
  • A community of physicists may need a specific
    service to test the consistency of the results
    published
  • A worldwide medical community may need a
    translation service

56
Openness is also required
  • Open means that new services can be easily
    added (expandability)
  • Each community of users can add their own
    specific services
  • The use of a DL may raise new requirements:
    other services can be added over the DL lifetime
    (dynamic expandability) to cover emerging needs
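
Expandability can be pictured as a registry to which services are added at run time. The sketch below is a toy model under that assumption: the DigitalLibrary class and the two registered services are hypothetical and stand in for real, distributed services.

```python
class DigitalLibrary:
    """A DL whose set of services can grow over its lifetime."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Add a service; existing services are left untouched."""
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = handler

    def call(self, name, *args):
        """Invoke a registered service by name."""
        return self._services[name](*args)

    def services(self):
        """Names of the services currently available, in sorted order."""
        return sorted(self._services)


library = DigitalLibrary()
documents = ["Open Access to Digital Libraries", "Scholnet demo"]

# A basic service available from the start.
library.register("search", lambda term: [d for d in documents if term.lower() in d.lower()])
# A community-specific service added later (toy stand-in for a real one).
library.register("word_count", lambda text: len(text.split()))

if __name__ == "__main__":
    print(library.services())
```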

57
Another trend: exploiting existing content
  • The production of the DL content is a very
    expensive process
  • DLs can also be built by exploiting content
    stored by existing distributed heterogeneous
    sources

58
The reference model
  • Each service operates on the data of multiple
    archives
  • Data providers open their archives
59
Data and service providers
  • An existing source can act as a data provider
    and a service provider
  • Advantages of this approach
  • Third-party services that operate on existing
    data can be implemented
  • A source can be accessed through advanced
    services built by others

60
Interoperability solutions
  • Several solutions are possible
  • The services apply schema mappings
  • Data providers implement a more or less complex
    protocol (e.g. OAI)
  • Automatic mapping generation (a current research
    topic)
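
For the protocol-based option, the sketch below shows what a harvesting request and response look like in the OAI Protocol for Metadata Harvesting. The ListRecords verb, the oai_dc metadata prefix and the XML namespaces are part of the protocol; the base URL, the sample response and the helper names are invented, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request from a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical response from a data provider, heavily shortened.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample technical report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvested_titles(response_xml):
    """Return (identifier, title) pairs for every record in a response."""
    root = ET.fromstring(response_xml)
    return [
        (rec.findtext("oai:header/oai:identifier", namespaces=NS),
         rec.findtext(".//dc:title", namespaces=NS))
        for rec in root.iterfind(".//oai:record", NS)
    ]


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(harvested_titles(SAMPLE_RESPONSE))
```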

61
An example: NSDL
  • The National Science, Mathematics, Engineering
    and Technology Education Digital Library (NSDL)
  • Over the next five years NSDL is expected to
    serve millions of users and provide access to
    tens of millions of digital resources

62
NSDL core architecture
  • [Diagram labels: Users, Portals, Search
    Discovery Services, Metadata Repository,
    Direct entry, Gathering, OAI Harvest,
    Collections]
63
Spectrum of interoperability
  • To achieve widespread adoption, the cost of
    adoption must be low
  • Few collections have metadata conforming to
    common and well-established standards, if they
    have metadata at all
  • Sources do not necessarily implement a protocol
    that allows harvesting of resources

64
The NSDL metadata strategy
  • Collect (through a variety of ingesting
    mechanisms) item metadata from cooperating
    collections in any of eight supported native
    formats
  • When appropriate, crosswalk native metadata to
    Qualified Dublin Core, which will provide a lingua
    franca for interoperability
  • When item-level metadata does not exist and where
    possible, process content and generate metadata
    automatically
  • Accept that item-level metadata will not always
    exist. Concentrate limited human effort on the
    creation of collection-level metadata
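
A crosswalk is essentially a field-to-field mapping. The sketch below maps a hypothetical native record to unqualified Dublin Core element names; the native field names and the mapping table are assumptions, and real crosswalks to Qualified Dublin Core also handle refinements and encoding schemes that are ignored here.

```python
# Hypothetical native field names mapped to Dublin Core element names.
CROSSWALK = {
    "author": "creator",
    "title": "title",
    "abstract": "description",
}


def crosswalk(native_record, mapping=CROSSWALK):
    """Translate a native record into Dublin Core.

    Returns (dc_record, unmapped): dc_record maps each Dublin Core element
    to a list of values; unmapped keeps the fields with no known mapping.
    """
    dc_record, unmapped = {}, {}
    for field_name, value in native_record.items():
        if field_name in mapping:
            dc_record.setdefault(mapping[field_name], []).append(value)
        else:
            unmapped[field_name] = value
    return dc_record, unmapped


if __name__ == "__main__":
    native = {"author": "A. Author", "title": "A sample report", "shelf": "B12"}
    print(crosswalk(native))
```

Keeping the unmapped fields visible, rather than dropping them silently, makes it easier to see how much information a crosswalk loses.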

65
Mechanisms for metadata entry
  • Metadata ingest via the Open Archives Initiative
    Protocol for Metadata Harvesting (OAI-PMH)
  • Metadata ingest via FTP, e-mail or web upload
  • (XML-based text file, Excel spreadsheet,
    tab-delimited text file)
  • Metadata ingest by direct entry
  • (by authorised users)
  • Metadata ingest by gathering
  • (web crawling, automatic metadata generation)
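
As a small illustration of the upload route, a tab-delimited text file can be turned into metadata records with the Python standard library. The column names and the sample row are invented for this sketch; it says nothing about the formats NSDL actually accepts.

```python
import csv
import io

# Hypothetical uploaded file: a header row followed by one record per line.
SAMPLE_UPLOAD = "title\tcreator\tdate\nA sample report\tA. Author\t2003-05-15\n"


def parse_tab_delimited(text):
    """Return one dict per data row, keyed by the header row's column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(parse_tab_delimited(SAMPLE_UPLOAD))
```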

66
User interface through portals
  • The NSDL users will be very diverse, including
    students, instructors, the public at all levels,
    librarians, community interest groups, NSDL
    federated partners
  • Access to the DL will be through portals (main
    portal, specialised portals, personalized
    portals)