The Mellon-Funded Fedora Project: A Briefing for the Los Alamos National Laboratory, August 26, 2002

1
The Mellon-Funded Fedora Project: A Briefing for
the Los Alamos National Laboratory, August 26,
2002
  • Sandy Payette
  • Cornell Information Science

2
Motivation
  • The Problem of Complex Content

3
Digital Library Content: not just documents ...
  • Some familiar objects
  • Complex, compound, dynamic objects

4
Key Research Questions
  • How can clients interact with heterogeneous
    collections of complex objects in a simple and
    interoperable manner?
  • How can complex objects be designed to be both
    generic and genre-specific at the same time?
  • How can we hide the complexity of an object's
    underlying data structures and relationships from
    clients?
  • How can we associate services and tools with
    objects to provide different presentations or
    transformations of the object content?
  • How can we associate specialized, fine-grained
    access control policies with specific objects, or
    with groups of objects?

5
The Flexible Extensible Digital Object Repository
Architecture (FEDORA)
  • Developed as a DARPA and NSF-funded research
    project at Cornell (1997-present)
  • CORBA-based reference implementation
  • Extensive interoperability testing
  • Policy Enforcement
  • Interpreted and re-implemented at University of
    Virginia (1999)
  • Simple web-oriented implementation, focused on
    access to collections
  • Java servlet and relational db
  • Virginia prototype supported testbed of
    10,000,000 digital objects with very good results
    (1999-2001)
  • Andrew W. Mellon Foundation granted Virginia and
    Cornell $1,000,000 to develop a full-featured
    production FEDORA system that is web-based
    (2002)

6
FEDORA: Original Research Goals
  • Flexibility: object model that fits many
    different contexts
  • Management: of distributed digital content and
    services
  • Access: stable interfaces to digital objects,
    behavior-centric
  • Interoperability: among digital objects and
    repositories
  • Extensibility: easy evolution of object
    behaviors
  • Security: rights management and access control
  • Preservation: of content, plus look and feel

7
Model for Collaboration: Digital Library Research
and Real Library Requirements
  • University of Virginia developing extensive
    digital collections since 1992
  • Virginia Digital Library R&D Group chartered with
    finding solution for integration
  • Formal Requirements analysis
  • Search for commercial products
  • Discovery: Cornell research parallels stated
    requirements

8
Virginia Requirements: Heterogeneous Digital
Collections
9
Virginia Requirements: Managing the Collections
  • Scalability to support hundreds of millions of
    objects
  • Persistent unique names for all resources without
    respect to machine address
  • Support inter-relationships among objects
  • Manage the digital resources and metadata, as
    well as computer programs, services and tools
    that support them
  • Enforce appropriate policies for use of Library
    resources
  • Provide a high level of security
  • Support preservation activities appropriately

10
Virginia Requirements: Delivering the Collections
  • Well-architected, flexible relationships between
    services/tools and digital content
  • Digital objects themselves have the ability to
    provide users with an appropriate launch-pad or
    tool to use the object content
  • Every resource can be used in any number of
    contexts
  • Move towards a digital library that is
    configurable by an aware user
  • Provide resource discovery (searching) across the
    full collection
  • Deep searching in particular collections

11
Shortcomings of commercial digital library
products
  • Narrow focus on specific media formats (e.g.
    image databases, document management)
  • Fail to effectively address interrelationships
    among digital entities
  • Fail to address interoperability: no open
    interfaces to facilitate sharing of services; no
    standard protocols for cross-system
    interoperability
  • Fail to provide facilities for managing programs
    and tools that are integral to delivering digital
    content.
  • Not extensible: do not enable easy integration
    of new tools and services

12
The Fedora Architecture
  • Overview of Basic Model

13
FEDORA Basic Architectural Abstractions
  • Digital Object
  • Container for aggregating any digital content
  • Content disseminations based on behavior
    definitions
  • Extensibility of behavior mechanisms
  • Repository
  • Service layer for contained Digital Objects
  • Object lifecycle management
  • Access management

14
FEDORA Digital Object
  • Persistent ID (PID): globally unique persistent id
  • Disseminators (public view): access methods for
    obtaining disseminations of digital object content
  • System Metadata (internal view): metadata necessary
    to manage the object
  • Datastreams (protected view): content that makes up
    the basis of the object
15
FEDORA Digital Object Architecture
  • Data Object: Persistent ID (PID), System Metadata,
    Datastreams, Disseminators
  • Behavior Definition Object: Persistent ID (PID),
    System Metadata, Datastreams (Service Definition
    Metadata)
  • Behavior Mechanism Object: Persistent ID (PID),
    System Metadata, Datastreams (Service Binding
    Metadata)
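The parts of a data object named on the last two slides can be pictured as a small data structure. This is only a sketch for illustration: the class names, field names, and the PIDs (demo:101 and so on) are assumptions, not the Fedora object format.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One unit of content that belongs to a digital object."""
    datastream_id: str
    mime_type: str
    location: str  # where the bytes live; not exposed to access clients


@dataclass
class Disseminator:
    """Ties a data object to a Behavior Definition and a Behavior Mechanism."""
    behavior_definition_pid: str
    behavior_mechanism_pid: str


@dataclass
class DigitalObject:
    """Container with the four parts listed on the slide."""
    pid: str
    system_metadata: dict = field(default_factory=dict)
    datastreams: list = field(default_factory=list)
    disseminators: list = field(default_factory=list)


# Hypothetical identifiers, chosen only for this example.
image = DigitalObject(pid="demo:101")
image.datastreams.append(
    Datastream("DS1", "image/jpeg", "file:///data/img1.jpg"))
image.disseminators.append(
    Disseminator("demo:bdef-image", "demo:bmech-image"))
```

Note that the Disseminator holds only references (PIDs) to the other two object types, which is what lets the same behavior definition be shared by many data objects.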
16
Data Object Association to External Behavior
Service
17
Digital Object Interoperability: Common Behaviors
for Variable Content
Functional equivalency
18
Digital Object Extensibility: Adding New Behaviors
Digital Object
The same underlying content...
can be operated on in novel ways
to create new disseminations not originally
conceived of
19
Virginia Prototype
  • Content Models and Fedora Demos

20
General Image Content Model
(Mycenae image example)
21
MrSID Image Content Model
(Pavilion III image example)
22
Finding Aid Content Model
(Finding Aid example)
23
TEI Letter Content Model
(TEI letter example)
24
TEI Book Content Model
(TEI book example)
25
GDMS Content Model
(Mycenae example)
(lawn example)
26
Numerical Data Content Model
(ICPSR survey example)
27
The New FEDORA
  • Technical Specifications Part I

28
Background Material
  • Overview of Web Service Technologies

29
What is a Web Service?
  • A distributed application that runs over the
    internet.
  • An addressable network endpoint which receives
    structured messages and returns structured responses.
  • A web application that publishes an open
    interface through which clients can send requests
    and receive responses.

30
How is this different from plain old web
applications?
  • Formally defined API (application programming
    interface) defines a set of abstract operations
    for a web service
  • Published bindings for client to run operations
  • Standard protocol for invoking operations on the
    service.
  • XML as standard means of encoding service
    requests and responses.

31
Why are Web Services important?
  • Interoperability
  • Web applications can interact and build upon each
    other
  • Data is transferred in an interoperable manner
    (e.g., over HTTP)
  • Data is encoded in an interoperable format (XML)
  • Works in decentralized, distributed,
    operating-system independent environment.
  • Standards-oriented
  • Means to expose complex operations with rich data
    typing (via XML Schema language typing)
  • Ease of integrating distributed systems via the
    Web
  • W3C effort to develop this service architecture

32
How are Web Services Implemented?
  • The Simple Object Access Protocol (SOAP) Approach
  • SOAP is a messaging protocol that can run over
    different transport protocols (e.g., HTTP, SMTP)
  • Operation oriented (send a request to an
    endpoint)
  • Like CORBA, RMI, DCOM, but for the Web and simpler
  • Application APIs can be defined and published
    using the Web Service Description Language (WSDL)
  • Requests and responses sent as XML messages
  • Supports simple and complex data typing in
    requests and responses
  • Supports transmission of binary data within
    requests or response packages

33
How are Web Services Implemented?
  • The REST (Representational State Transfer)
    Approach
  • URI + HTTP + XML
  • URI/resource driven: message built into a URI
    (URL)
  • HTTP GET or POST
  • Response is XML data
  • Issues
  • Not a standard, but a style of doing web apps;
    arguably it just gives a fancy name to how lots
    of people do applications on the web by default.
    Nothing really new here; it just argues to do things
    the way we have been, maybe a little more
    standard by using XML.
  • Fragile service definition: URLs change
  • No data typing on requests
  • Limited ability to transmit complex requests on
    URL
  • W3C behind SOAP, but only one strong voice out
    there for REST (Prescod).
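To make the "message built into a URI" point concrete, here is a minimal sketch of a REST-style request for the same spelling lookup used in the SOAP example. The endpoint http://example.org/spelling and the parameter name phrase are placeholders, not a real service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; not a real service address.
BASE_URL = "http://example.org/spelling"


def build_rest_request(phrase: str) -> str:
    """Encode the whole request in the URL, ready for an HTTP GET."""
    return f"{BASE_URL}?{urlencode({'phrase': phrase})}"


url = build_rest_request("payet")

# On the receiving side every parameter arrives as a plain string,
# which illustrates the "no data typing on requests" issue.
params = parse_qs(urlsplit(url).query)
```

The sketch only builds and parses the URL; it does not send anything over the network.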

34
Example of Web Service using SOAP
  • My Application sends a SOAP Request (XML) over
    SOAP/HTTP to the Google Web Service:
    doSpellingSuggestion(payet)
  • The Google Web Service returns a SOAP Response (XML)
    over SOAP/HTTP: payette
35
XML SOAP Request
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
      <key>/e325JlNPASJu</key>
      <phrase>payet</phrase>
    </m:doSpellingSuggestion>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
36
XML SOAP Response
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
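A client typically does not read the envelope by hand; it pulls the return value out of the SOAP Body. The following is a minimal sketch using only the Python standard library, with the response from this slide embedded as a constant. The function name extract_suggestion is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The SOAP response from the slide, as bytes so the XML declaration is honored.
SOAP_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes used in the search path below; they map to the namespace URIs
# declared in the message and need not match the prefixes used there.
NS = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "g": "urn:GoogleSearch",
}


def extract_suggestion(xml_bytes: bytes):
    """Return the text of the <return> element, or None if it is absent."""
    root = ET.fromstring(xml_bytes)
    node = root.find("soapenv:Body/g:doSpellingSuggestionResponse/return", NS)
    return None if node is None else node.text


suggestion = extract_suggestion(SOAP_RESPONSE)
```

In practice a SOAP toolkit generated from the service's WSDL would do this unwrapping; the sketch just shows what that toolkit has to find in the message.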
37
New Fedora Key Features
  • Repository system exposed as two related Web
    services
  • described using WSDL
  • both SOAP and HTTP bindings
  • Digital objects encoded and stored as XML using
    Metadata Encoding and Transmission Standard
    (METS)
  • Digital object behaviors implemented as linkages
    to distributed web services (also described using
    WSDL)
  • Digital objects support versioning of both
    content and services.

38
New Fedora System
39
Web Service Communication View
40
The New FEDORA
  • Encoding Digital Objects in XML

41
Metadata Encoding and Transmission Standard (METS)
  • XML standard for encoding descriptive,
    administrative, and structural metadata of
    digital library objects
  • Developed under auspices of the Digital Library
    Federation
  • METS standard maintained by the Network
    Development and MARC Standards Office of the
    Library of Congress

http://www.loc.gov/standards/mets/
42
METS Schema
  • METS is written in the XML Schema Language
  • METS defines four sections for an object
  • Descriptive metadata
  • Administrative metadata
  • File group
  • Structure map
  • METS goals include
  • Facilitate management of objects within a
    repository
  • Provide a standard format for exchange of objects
    between repositories
  • Provide standard format for transmission of
    objects to users for rendering (via tools or
    applications)
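The four sections listed above correspond to the METS elements dmdSec, amdSec, fileSec (which holds file groups), and structMap. As a rough sketch, an empty skeleton with those sections can be built as follows; the OBJID value and the section IDs are made up, and the result is not schema-valid METS because the sections have no content.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(obj_id: str) -> ET.Element:
    """Create a METS root with the four (empty) sections, in order."""
    root = ET.Element(q("mets"), {"OBJID": obj_id})
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})   # descriptive metadata
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})   # administrative metadata
    file_sec = ET.SubElement(root, q("fileSec"))       # file groups live here
    ET.SubElement(file_sec, q("fileGrp"))
    ET.SubElement(root, q("structMap"))                # structure map
    return root


skeleton = build_mets_skeleton("demo:101")  # hypothetical object id
section_names = [child.tag.split("}")[1] for child in skeleton]
xml_text = ET.tostring(skeleton, encoding="unicode")
```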

43
Mapping Fedora to METS
44
Mapping Fedora to METS
45
Digital Object Versioning
  • Versioning within Data Objects
  • Datastream versioning
  • Date/time stamped
  • New version every time datastream is modified
  • Disseminator versioning
  • Date/time stamped
  • New version if disseminator is modified to
    reference a different Behavior Mechanism (better
    mousetrap)
  • Versioning within Behavior Definition and
    Mechanism Objects
  • New versions of WSDL metadata recorded in these
    objects (with date/time stamps)
  • This deserves much more explanation than this
    slide can offer!
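One way to picture datastream versioning is an append-only list of date/time-stamped versions, where every modification adds an entry and older entries are never overwritten. This is a sketch under that assumption; the class and method names are invented for illustration and are not taken from the Fedora design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatastreamVersion:
    created: datetime  # date/time stamp of this version
    location: str      # where this version's content is stored


@dataclass
class VersionedDatastream:
    datastream_id: str
    versions: list = field(default_factory=list)

    def modify(self, location: str, when: datetime = None) -> None:
        """Record a new version; existing versions are left untouched."""
        stamp = when or datetime.now(timezone.utc)
        self.versions.append(DatastreamVersion(stamp, location))

    def as_of(self, when: datetime):
        """Return the version in effect at `when`, or None if none existed."""
        candidates = [v for v in self.versions if v.created <= when]
        return max(candidates, key=lambda v: v.created) if candidates else None


ds = VersionedDatastream("DS1")
t1 = datetime(2002, 8, 1, tzinfo=timezone.utc)
t2 = datetime(2002, 8, 26, tzinfo=timezone.utc)
ds.modify("file:///data/img1-v1.jpg", t1)  # hypothetical locations
ds.modify("file:///data/img1-v2.jpg", t2)
mid_august = ds.as_of(datetime(2002, 8, 10, tzinfo=timezone.utc))
```

Disseminator versioning could follow the same pattern, with a new entry recorded when the referenced Behavior Mechanism changes.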

46
METS Sample Fedora Object
Click here for image digital object
47
Fedora Dissemination Database
  • Alternate form of object storage that will act as
    a cache of most recent versions of digital
    objects
  • Ensure high-performance access (disseminations)
  • Repository system replicates from authoritative
    XML version of objects to relational database
  • Plan to phase out the database in Phases 2-3
  • Access sub-system to work completely off the XML
    storage, as XML tools improve performance-wise.
  • Pursue different caching strategies as necessary

48
The New FEDORA
  • Repository System Design

49
Fedora Repository System
50
FEDORA Web Service API Definitions
  • API-M: interface for management sub-system
  • Operations necessary to create and maintain
    objects and their components
  • Interface directly with authoritative XML version
    of object
  • API-A: interface for access sub-system
  • Operations necessary for clients to perform
    disseminations on objects in the repository
  • No direct access to object internal structure or
    components
  • Will work against cached representation of object
    to optimize performance.
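The split between the two APIs can be sketched as two abstract interfaces over one repository: management writes the authoritative XML, and access reads only from a replicated cache. All operation names below (ingest_object, purge_object, get_behavior_definitions) are assumptions for illustration; the actual operations are defined in the project's WSDL.

```python
from abc import ABC, abstractmethod


class ManagementAPI(ABC):
    """API-M: create and maintain objects (hypothetical operations)."""

    @abstractmethod
    def ingest_object(self, pid, xml, behavior_definitions): ...

    @abstractmethod
    def purge_object(self, pid): ...


class AccessAPI(ABC):
    """API-A: client-facing access, no view of object internals."""

    @abstractmethod
    def get_behavior_definitions(self, pid): ...


class InMemoryRepository(ManagementAPI, AccessAPI):
    def __init__(self):
        self._xml_store = {}  # authoritative XML, touched only via API-M
        self._cache = {}      # replicated view, read by API-A

    def ingest_object(self, pid, xml, behavior_definitions):
        self._xml_store[pid] = xml
        # Replicate what the access side needs into the cache.
        self._cache[pid] = list(behavior_definitions)

    def purge_object(self, pid):
        self._xml_store.pop(pid, None)
        self._cache.pop(pid, None)

    def get_behavior_definitions(self, pid):
        return list(self._cache[pid])


repo = InMemoryRepository()
repo.ingest_object("demo:101", "<mets/>", ["demo:bdef-image"])
bdefs = repo.get_behavior_definitions("demo:101")
```

Because the access method returns a copy of the cached list, a client cannot alter repository state through API-A.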

51
Fedora Management Sub-System: Implements API-M
  • Object Management
  • Object Component Management
  • Object Validation
  • PID Generation
  • Interacts with Storage Subsystem

52
Other Sub-systems
  • Storage Sub-system
  • Responsible for all matters pertaining to reading
    and writing objects from persistent storage
  • Modular design: can configure different object
    readers and writers to suit the context.
  • Modular design: can configure different data
    store strategies (in phase 1 will have file
    system and relational database)
  • Security Sub-system
  • Store access control policies for repository and
    objects
  • Store user and group information
  • Enforcement of policies

53
Security Sub-system: Access Control Policies
  • General Purpose
  • Only repository managers can add new
    disseminators to digital objects in the
    repository.
  • Object-Specific (e.g., Lecture object)
  • Guests may view course syllabus and slides 1-10
    of Lecture 1, but may not view the lecture video
    or any other slides.
  • Students may not view Lecture 2 video unless
    they submit assignment for Lecture 1.
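The three example policies above can be written as a single decision function. This is a toy sketch, not the policy language from the Cornell research: the role names, the resource dictionaries, and the deny-by-default rule for unknown actions are all assumptions.

```python
def is_permitted(role, action, resource, submitted_assignments=frozenset()):
    """Evaluate the three example policies from the slide.

    `resource` is a dict such as {"kind": "slide", "lecture": 1, "number": 7}.
    """
    kind = resource.get("kind")

    # General purpose: only repository managers may add disseminators.
    if action == "add_disseminator":
        return role == "manager"

    if action != "view":
        return False  # deny anything else by default (assumption)

    if role == "guest":
        # Guests: syllabus, plus slides 1-10 of Lecture 1, nothing else.
        if kind == "syllabus":
            return True
        return (kind == "slide"
                and resource.get("lecture") == 1
                and 1 <= resource.get("number", 0) <= 10)

    if role == "student":
        # Students: Lecture 2 video requires the Lecture 1 assignment.
        if kind == "video" and resource.get("lecture") == 2:
            return 1 in submitted_assignments
        return True

    return role == "manager"


lecture2_video = {"kind": "video", "lecture": 2}
before = is_permitted("student", "view", lecture2_video)
after = is_permitted("student", "view", lecture2_video,
                     submitted_assignments={1})
```

The last policy depends on state outside the object (whether an assignment was submitted), which is why the function takes that state as an extra argument.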

See research at
http://www.cs.cornell.edu/payette/prism/security/policy.htm
54
Fedora Repository System
55
Fedora Access Sub-System: Implements API-A
  • Object Reflection
  • Identify the types of Behavior Definitions to
    which an object subscribes (via the object's
    Disseminators)
  • Reflect on a Behavior Definition to identify the
    kinds of disseminations that can be run on the
    object (i.e., as method requests)
  • Dissemination
  • Fulfills requests for particular methods (i.e.,
    of a Behavior Definition) to be run on an object
  • Mediates access to supporting services (i.e.,
    Behavior Mechanisms) used to present or transform
    datastreams of the object
  • Returns a view of the object's content to the client

56
API-A Object Reflection Requests: Identify Types
of Behavior Definitions
  • Each Disseminator is said to subscribe to a
    Behavior Definition
  • It does this by referencing the PID of a
    particular Behavior Definition Object.
  • Each Behavior Definition Object contains metadata
    that describes a set of related behaviors (or
    operations)
  • Via API-A, clients can send a service request to
    determine what Behavior Definitions an object
    subscribes to.

57
API-A Object Reflection Request: Get Behavior
Methods
  • Each Disseminator has a Behavior Definition
    Object associated with it.
  • Each Disseminator has a Behavior Mechanism Object
    associated with it that describes how to bind to
    a particular service that complies with the
    Disseminator's Behavior Definition.
  • Via API-A, clients can send a service request to
    obtain the list of method definitions associated
    with a particular Disseminator of the digital
    object.
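The two reflection requests can be sketched against a small in-memory registry: first list the Behavior Definitions an object subscribes to, then list the methods one of those definitions offers. The function names, PIDs, and method names (getThumbnail, getFullSize) are hypothetical and stand in for the real API-A operations.

```python
# Behavior Definition Objects: PID -> names of the methods they define.
BEHAVIOR_DEFINITIONS = {
    "demo:bdef-image": ["getThumbnail", "getFullSize"],
}

# Data objects: PID -> Disseminators, each referencing a Behavior
# Definition PID and a Behavior Mechanism PID.
DATA_OBJECTS = {
    "demo:101": [
        {"bdef": "demo:bdef-image", "bmech": "demo:bmech-image"},
    ],
}


def get_behavior_definitions(pid):
    """Which Behavior Definitions does this object subscribe to?"""
    return [d["bdef"] for d in DATA_OBJECTS[pid]]


def get_behavior_methods(pid, bdef_pid):
    """Which methods can be run on this object under that definition?"""
    if bdef_pid not in get_behavior_definitions(pid):
        raise KeyError(f"{pid} does not subscribe to {bdef_pid}")
    return list(BEHAVIOR_DEFINITIONS[bdef_pid])


subscribed = get_behavior_definitions("demo:101")
methods = get_behavior_methods("demo:101", subscribed[0])
```

Neither call reveals the Behavior Mechanism PID, which matches the idea that clients see behaviors, not how they are bound to services.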

58
API-A Object Reflection Requests
PID 101
59
API-A Dissemination Request
  • Clients can obtain content from a digital object
    with minimal knowledge about the object.
  • Behavior Definition identifiers and method
    definitions are the basis for making
    dissemination requests on digital objects
  • Clients do not need to know particulars of how
    to attach to the service (Behavior Mechanism)
    that is operating on its behalf.
  • A dissemination request requires just three
    things:
  • Digital Object Identifier (PID)
  • Behavior Definition Identifier (BID)
  • Method name (and optional parameters) for a
    behavior
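A dissemination request built from those three items can be sketched as a lookup followed by a call: the repository resolves the Behavior Mechanism on the client's behalf and applies the requested method to the object's content. Everything named here (the tables, the getThumbnail method, the size parameter, the PIDs) is an illustrative assumption.

```python
# Object content, keyed by PID (stands in for the object's datastreams).
DATASTREAMS = {"demo:101": b"raw image bytes"}

# Disseminators: (PID, BID) -> Behavior Mechanism identifier.
BINDINGS = {("demo:101", "demo:bdef-image"): "demo:bmech-image"}


def _thumbnail(content: bytes, params: dict) -> bytes:
    """Toy 'service': truncate the content to `size` bytes."""
    return content[: int(params.get("size", 3))]


# Behavior Mechanisms: identifier -> method name -> implementation.
MECHANISMS = {"demo:bmech-image": {"getThumbnail": _thumbnail}}


def disseminate(pid, bdef_pid, method, params=None):
    """Run `method` of a Behavior Definition on the object `pid`.

    The client supplies only PID, BID, and method name; which mechanism
    does the work is resolved here and never exposed.
    """
    mechanism_pid = BINDINGS[(pid, bdef_pid)]
    operation = MECHANISMS[mechanism_pid][method]
    return operation(DATASTREAMS[pid], params or {})


small = disseminate("demo:101", "demo:bdef-image", "getThumbnail")
larger = disseminate("demo:101", "demo:bdef-image", "getThumbnail",
                     {"size": "9"})
```

Swapping in a different mechanism only means changing the BINDINGS entry; the client's request stays the same.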

60
API-A Dissemination Request
(Bird Digital Library example: White Birds, Image 1, Image 2, Image 3)
61
Disseminations: Benefits
  • Simple access: dissemination requests shield
    clients from the internal structure of digital
    objects
  • Stable interface: dissemination requests are like
    requests against an abstract interface in that
    they are not tied to object implementation
    details that may change over time (e.g., storage
    locations of datastreams)
  • Foster interoperability: different digital
    objects can vary in both the format of content
    and how it is structured, yet we can access them
    in a consistent manner via disseminations.

62
The New FEDORA
  • Software Deployment

63
Fedora Software Deployment Goals
  • An efficient, scalable, freely distributable
    FEDORA repository system ASAP
  • Make all software open source
  • Complete basic management and access interfaces
    with the initial release
  • Add other important digital library functionality
    in later releases
  • Create multiple testbed repositories to deploy
    and evaluate the software
  • Interoperability testing, including sharing of
    content and mechanisms among deployment partner
    repositories.

64
Deployment Group
  • Indiana University: Digital Library group
  • NYU: Humanities Computing group
  • Tufts: Digital Collections and Archives
    Department
  • King's College London: Humanities Computing
  • Oxford: Oxford Digital Library and The Refugee
    Studies Center
  • Library of Congress: Motion Picture and Recorded
    Sound Division
  • Northwestern University: library/academic
    computing
  • Los Alamos National Laboratory: Research Library

65
Fedora Project Plan
  • Phase 1 (pre-release Oct 31, 2002; final Jan
    2003)
  • Repository system with management and access
    subsystems exposed as web services
  • Storage subsystem with XML object store and
    replication to relational database cache
  • Object builder tools (GUI and batch)
  • Basic set of behavior services
  • Phase 2: Add more production support
  • Security and policy enforcement
  • Additional management tools
  • Optimize performance for accessing XML objects
  • Object versioning
  • Collection objects
  • Advanced disk management
  • Phase 3: Enhance end-user support
  • New kinds of disseminators, with supporting
    behavior services
  • Efficiency and scale optimization

66
FEDORA Web Site: www.fedora.info
67
Questions and Discussion