Title: The Mellon-Funded Fedora Project: A Briefing for the Los Alamos National Laboratory, August 26, 2002
1 The Mellon-Funded Fedora Project: A Briefing for the Los Alamos National Laboratory, August 26, 2002
- Sandy Payette
- Cornell Information Science
2 Motivation
- The Problem of Complex Content
3 Digital Library Content: not just documents ...
- Complex, compound, dynamic objects
4 Key Research Questions
- How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
- How can complex objects be designed to be both generic and genre-specific at the same time?
- How can we hide the complexity of an object's underlying data structures and relationships from clients?
- How can we associate services and tools with objects to provide different presentations or transformations of the object content?
- How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?
5 The Flexible Extensible Digital Object Repository Architecture (FEDORA)
- Developed as a DARPA- and NSF-funded research project at Cornell (1997-present)
- CORBA-based reference implementation
- Extensive interoperability testing
- Policy enforcement
- Interpreted and re-implemented at the University of Virginia (1999)
- Simple web-oriented implementation, focused on access to collections
- Java servlet and relational database
- Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001)
- Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured, web-based production FEDORA system (2002)
6 FEDORA: Original Research Goals
- Flexibility: object model that fits many different contexts
- Management: of distributed digital content and services
- Access: stable interfaces to digital objects (behavior-centric)
- Interoperability: among digital objects and repositories
- Extensibility: easy evolution of object behaviors
- Security: rights management and access control
- Preservation: of content, plus look and feel
7 Model for Collaboration: Digital Library Research and Real Library Requirements
- University of Virginia developing extensive digital collections since 1992
- Virginia Digital Library R&D Group chartered with finding a solution for integration
- Formal requirements analysis
- Search for commercial products
- Discovery: Cornell research parallels stated requirements
8 Virginia Requirements: Heterogeneous Digital Collections
9 Virginia Requirements: Managing the Collections
- Scalability to support hundreds of millions of objects
- Persistent unique names for all resources without respect to machine address
- Support inter-relationships among objects
- Manage the digital resources and metadata, as well as the computer programs, services and tools that support them
- Enforce appropriate policies for use of Library resources
- Provide a high level of security
- Support preservation activities appropriately
10 Virginia Requirements: Delivering the Collections
- Well-architected, flexible relationships between services/tools and digital content
- Digital objects themselves have the ability to provide users with an appropriate launch-pad or tool to use the object content
- Every resource can be used in any number of contexts
- Move towards a digital library that is configurable by an aware user
- Provide resource discovery (searching) across the full collection
- Deep searching in particular collections
11 Shortcomings of commercial digital library products
- Narrow focus on specific media formats (e.g., image databases, document management)
- Fail to effectively address interrelationships among digital entities
- Fail to address interoperability: no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability
- Fail to provide facilities for managing programs and tools that are integral to delivering digital content
- Not extensible: do not enable easy integration of new tools and services
12 The Fedora Architecture
13 FEDORA: Basic Architectural Abstractions
- Digital Object
- Container for aggregating any digital content
- Content disseminations based on behavior definitions
- Extensibility of behavior mechanisms
- Repository
- Service layer for contained Digital Objects
- Object lifecycle management
- Access management
14 FEDORA Digital Object
- Persistent ID (PID): globally unique persistent id
- Disseminators: public view; access methods for obtaining disseminations of digital object content
- System Metadata: internal view; metadata necessary to manage the object
- Datastreams: protected view; content that makes up the basis of the object
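The four parts of a digital object listed above can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, example PIDs, and URL are assumptions made for this example and are not taken from the Fedora specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastream:
    """Protected view: one unit of content that makes up the object."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical: where the content bytes live


@dataclass
class Disseminator:
    """Public view: ties the object to a set of behaviors."""
    behavior_definition_pid: str
    behavior_mechanism_pid: str


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent ID
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    disseminators: Dict[str, Disseminator] = field(default_factory=dict)


# Hypothetical example object; every identifier here is made up.
obj = DigitalObject(pid="demo:101")
obj.system_metadata["created"] = "2002-08-26"
obj.datastreams["DS1"] = Datastream("DS1", "image/jpeg", "http://example.org/img1.jpg")
obj.disseminators["DISS1"] = Disseminator("demo:bdef-image", "demo:bmech-image")
```

Keeping datastreams and disseminators in separate maps mirrors the split between the protected view (content) and the public view (access methods).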
15 FEDORA Digital Object Architecture
[Diagram: three kinds of object]
- Data Object: Persistent ID (PID), System Metadata, Datastreams, Disseminators
- Behavior Definition Object: Persistent ID (PID), System Metadata, Datastreams (Service Definition Metadata)
- Behavior Mechanism Object: Persistent ID (PID), System Metadata, Datastreams (Service Binding Metadata)
16 Data Object Association to External Behavior Service
17 Digital Object Interoperability: Common Behaviors for Variable Content
Functional equivalency
18 Digital Object Extensibility: Adding New Behaviors
Digital Object: the same underlying content can be operated on in novel ways to create new disseminations not originally conceived of
19 Virginia Prototype
- Content Models and Fedora Demos
20 General Image Content Model
(Mycenae image example)
21 MrSID Image Content Model
(Pavilion III image example)
22 Finding Aid Content Model
(Finding Aid example)
23 TEI Letter Content Model
(TEI letter example)
24 TEI Book Content Model
(TEI book example)
25 GDMS Content Model
(Mycenae example)
(lawn example)
26 Numerical Data Content Model
(ICPSR survey example)
27 The New FEDORA
- Technical Specifications Part I
28 Background Material
- Overview of Web Service Technologies
29 What is a Web Service?
- A distributed application that runs over the internet.
- An addressable network endpoint that receives structured messages and returns structured responses.
- A web application that publishes an open interface through which clients can send requests and receive responses.
30 How is this different from plain old web applications?
- A formally defined API (application programming interface) defines a set of abstract operations for a web service
- Published bindings for clients to run operations
- Standard protocol for invoking operations on the service
- XML as the standard means of encoding service requests and responses
31 Why are Web Services important?
- Interoperability
- Web applications can interact and build upon each other
- Data is transferred in an interoperable manner (e.g., over HTTP)
- Data is encoded in an interoperable format (XML)
- Works in a decentralized, distributed, operating-system-independent environment
- Standards-oriented
- Means to expose complex operations with rich data typing (via XML Schema language typing)
- Ease of integrating distributed systems via the Web
- W3C effort to develop this service architecture
32 How are Web Services Implemented?
- The Simple Object Access Protocol (SOAP) Approach
- SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP)
- Operation-oriented (send a request to an endpoint)
- Like CORBA, RMI, DCOM, but for the Web and simpler
- Application APIs can be defined and published using the Web Services Description Language (WSDL)
- Requests and responses sent as XML messages
- Supports simple and complex data typing in requests and responses
- Supports transmission of binary data within request or response packages
33 How are Web Services Implemented?
- The REST (Representational State Transfer) Approach
- URI + HTTP + XML
- URI/resource driven: the message is built into a URI (URL)
- HTTP GET or POST
- Response is XML data
- Issues
- Not a standard, but a style of doing web applications. Arguably it just gives a fancy name to how many people already build applications on the web by default; nothing really new here, it just argues for doing things the way we have been, perhaps a little more standardized by using XML.
- Fragile service definition: URLs change
- No data typing on requests
- Limited ability to transmit complex requests in a URL
- W3C is behind SOAP, but there is only one strong voice out there for REST (Prescod).
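To make "message built into a URI" concrete, here is a minimal sketch that encodes an operation and its parameters into a URL suitable for an HTTP GET. The base URL, the operation name, and the function `build_rest_url` are hypothetical, chosen only for illustration; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_rest_url(base_url, operation, **params):
    """Encode an operation and its parameters into a single URL."""
    query = urlencode(sorted(params.items()))
    return f"{base_url.rstrip('/')}/{operation}?{query}"


# Hypothetical endpoint and operation name.
url = build_rest_url("http://example.org/spell", "suggest", phrase="payet")

# On the receiving side every parameter arrives as a plain string,
# which is the "no data typing on requests" issue noted on the slide.
parsed = parse_qs(urlsplit(url).query)
```

Because the whole request lives in the URL, any change to the path or parameter names changes the service definition, which is the fragility the slide points out.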
34 Example of Web Service using SOAP
[Diagram: My Application sends a SOAP Request (XML) over SOAP/HTTP to the Google Web Service, calling doSpellingSuggestion(payet); the service returns a SOAP Response (XML) over SOAP/HTTP containing "payette".]
35 XML SOAP Request
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
      <key>/e325JlNPASJu</key>
      <phrase>payet</phrase>
    </m:doSpellingSuggestion>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
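A request of this shape can be assembled programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_spelling_request` and the placeholder key are assumptions for illustration. It only builds the XML text and sends nothing, and it omits the `xsi`/`xsd` namespace declarations, which the request body does not use.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"


def build_spelling_request(key, phrase):
    """Return a SOAP envelope (as text) calling doSpellingSuggestion."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE_NS}}}doSpellingSuggestion")
    # The two parameters are unqualified child elements, as on the slide.
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")


# "YOUR-KEY" is a placeholder, not a real license key.
request_xml = build_spelling_request("YOUR-KEY", "payet")
```

In a real client this text would be POSTed over HTTP to the service endpoint published in the WSDL; that step is left out here.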
36 XML SOAP Response
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
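On the client side, the typed return value has to be pulled back out of the envelope. The following is a minimal sketch using the standard library; the function name `extract_return_value` is an assumption for illustration, and the sample text is the response from the slide.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

RESPONSE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def extract_return_value(soap_xml):
    """Return the text of the <return> element in a SOAP response."""
    root = ET.fromstring(soap_xml)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body is missing or empty")
    result = body[0].find("return")  # first child is the operation response
    if result is None:
        raise ValueError("no <return> element in response")
    return result.text


suggestion = extract_return_value(RESPONSE_XML)
```

A fuller client would also inspect the `xsi:type` attribute to convert the value to the declared type; here it is simply read as text.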
37 New Fedora: Key Features
- Repository system exposed as two related Web services
- described using WSDL
- both SOAP and HTTP bindings
- Digital objects encoded and stored as XML using the Metadata Encoding and Transmission Standard (METS)
- Digital object behaviors implemented as linkages to distributed web services (also described using WSDL)
- Digital objects support versioning of both content and services
38 New Fedora System
39 Web Service Communication View
40 The New FEDORA
- Encoding Digital Objects in XML
41 Metadata Encoding and Transmission Standard (METS)
- XML standard for encoding descriptive, administrative, and structural metadata of digital library objects
- Developed under the auspices of the Digital Library Federation
- METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress: http://www.loc.gov/standards/mets/
42 METS Schema
- METS is written in the XML Schema Language
- METS defines four sections for an object
- Descriptive metadata
- Administrative metadata
- File group
- Structure map
- METS goals include
- Facilitate management of objects within a repository
- Provide a standard format for exchange of objects between repositories
- Provide a standard format for transmission of objects to users for rendering (via tools or applications)
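The four sections listed above correspond to the METS elements `dmdSec`, `amdSec`, `fileSec` (which holds `fileGrp`), and `structMap`. The sketch below builds an empty skeleton with those sections. It is an outline only and not schema-valid (for example, a real `structMap` needs a `div`); the function name, the ID values, and the example OBJID are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def build_mets_skeleton(object_id):
    """Return a <mets> element containing the four main sections, empty."""
    ET.register_namespace("METS", METS_NS)

    def q(name):
        return f"{{{METS_NS}}}{name}"

    mets = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})     # descriptive metadata
    ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})     # administrative metadata
    file_sec = ET.SubElement(mets, q("fileSec"))         # file section
    ET.SubElement(file_sec, q("fileGrp"), {"ID": "FG1"})  # file group
    ET.SubElement(mets, q("structMap"))                  # structure map
    return mets


# "demo:101" is a made-up identifier.
mets = build_mets_skeleton("demo:101")
section_names = [child.tag.split("}")[1] for child in mets]
```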
43 Mapping Fedora to METS
44 Mapping Fedora to METS
45 Digital Object Versioning
- Versioning within Data Objects
- Datastream versioning
- Date/time stamped
- New version every time a datastream is modified
- Disseminator versioning
- Date/time stamped
- New version if a disseminator is modified to reference a different Behavior Mechanism (a "better mousetrap")
- Versioning within Behavior Definition and Mechanism Objects
- New versions of WSDL metadata recorded in these objects (with date/time stamps)
- This deserves much more explanation than this slide can offer!
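One way to picture date/time-stamped datastream versioning is an append-only list of versions that can be queried "as of" a moment in time. This is a sketch under assumptions, not the Fedora implementation: the class names, the `modify` and `as_of` methods, and the sample content are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DatastreamVersion:
    created: datetime
    content: bytes


@dataclass
class VersionedDatastream:
    ds_id: str
    versions: List[DatastreamVersion] = field(default_factory=list)

    def modify(self, content: bytes, when: datetime) -> None:
        """Every modification appends a new stamped version."""
        if self.versions and when <= self.versions[-1].created:
            raise ValueError("versions must be added in chronological order")
        self.versions.append(DatastreamVersion(when, content))

    def as_of(self, when: datetime) -> Optional[DatastreamVersion]:
        """Return the version that was current at the given moment."""
        stamps = [v.created for v in self.versions]
        index = bisect_right(stamps, when)
        return self.versions[index - 1] if index else None


ds = VersionedDatastream("DS1")
ds.modify(b"first draft", datetime(2002, 8, 1))
ds.modify(b"second draft", datetime(2002, 8, 20))
```

Nothing is ever overwritten, so a request dated before the second modification still resolves to the first version.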
46 METS Sample Fedora Object
Click here for image digital object
47 Fedora Dissemination Database
- Alternate form of object storage that will act as a cache of the most recent versions of digital objects
- Ensures high-performance access (disseminations)
- Repository system replicates from the authoritative XML version of objects to a relational database
- Plan to phase out the database in Phases 2-3
- Access sub-system to work completely off the XML storage, as the performance of XML tools improves
- Pursue different caching strategies as necessary
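The replication idea can be sketched as follows: read the authoritative XML, keep only the most recent version of each datastream, and write it to a relational table. The XML layout here is a simplified stand-in (not real METS), and the table name, column names, and function names are assumptions; an in-memory SQLite database stands in for the relational store.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified object XML: one <version> per modification,
# stamped with an ISO date so that string order equals date order.
OBJECT_XML = """<object pid="demo:101">
  <datastream id="DS1">
    <version created="2002-08-01" location="http://example.org/v1.jpg"/>
    <version created="2002-08-20" location="http://example.org/v2.jpg"/>
  </datastream>
</object>"""


def open_cache():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datastream_cache ("
        " pid TEXT, ds_id TEXT, created TEXT, location TEXT,"
        " PRIMARY KEY (pid, ds_id))"
    )
    return conn


def replicate(conn, object_xml):
    """Copy the latest version of each datastream into the cache table."""
    root = ET.fromstring(object_xml)
    pid = root.get("pid")
    for ds in root.findall("datastream"):
        versions = ds.findall("version")
        if not versions:
            continue
        latest = max(versions, key=lambda v: v.get("created"))
        conn.execute(
            "INSERT OR REPLACE INTO datastream_cache VALUES (?, ?, ?, ?)",
            (pid, ds.get("id"), latest.get("created"), latest.get("location")),
        )
    conn.commit()


conn = open_cache()
replicate(conn, OBJECT_XML)
row = conn.execute(
    "SELECT created, location FROM datastream_cache WHERE pid = ? AND ds_id = ?",
    ("demo:101", "DS1"),
).fetchone()
```

Because the XML remains authoritative, the cache can be rebuilt at any time by running the replication again.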
48 The New FEDORA
49 Fedora Repository System
50 FEDORA Web Service API Definitions
- API-M: interface for management sub-system
- Operations necessary to create and maintain objects and their components
- Interfaces directly with the authoritative XML version of an object
- API-A: interface for access sub-system
- Operations necessary for clients to perform disseminations on objects in the repository
- No direct access to an object's internal structure or components
- Will work against a cached representation of the object to optimize performance
51 Fedora Management Sub-System: Implements API-M
- Object Management
- Object Component Management
- Object Validation
- PID Generation
- Interacts with Storage Sub-system
52 Other Sub-systems
- Storage Sub-system
- Responsible for all matters pertaining to reading and writing objects from persistent storage
- Modular design: can configure different object readers and writers to suit the context
- Modular design: can configure different data store strategies (Phase 1 will have file system and relational database)
- Security Sub-system
- Store access control policies for repository and objects
- Store user and group information
- Enforcement of policies
53 Security Sub-system: Access Control Policies
- General Purpose
- Only repository managers can add new disseminators to digital objects in the repository.
- Object-Specific (e.g., Lecture object)
- Guests may view the course syllabus and slides 1-10 of Lecture 1, but may not view the lecture video or any other slides.
- Students may not view the Lecture 2 video unless they submit the assignment for Lecture 1.
See research at http://www.cs.cornell.edu/payette/prism/security/policy.htm
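The example policies above can be expressed as small predicate functions. This is only a sketch of the rules as stated on the slide, not the project's policy language: the role names, the `User` class, and both function names are assumptions, as is the default that students and managers may view anything not explicitly restricted.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    name: str
    role: str  # hypothetical roles: "guest", "student", "manager"
    submitted_assignments: Set[int] = field(default_factory=set)


def may_add_disseminator(user: User) -> bool:
    """General-purpose policy: only repository managers may add disseminators."""
    return user.role == "manager"


def may_view(user: User, part: str,
             lecture: Optional[int] = None,
             slide: Optional[int] = None) -> bool:
    """Object-specific policy for the Lecture example."""
    if user.role == "guest":
        if part == "syllabus":
            return True
        # Guests see only slides 1-10 of Lecture 1; no video, no other slides.
        return (part == "slide" and lecture == 1
                and slide is not None and 1 <= slide <= 10)
    if user.role == "student":
        if part == "video" and lecture == 2:
            # Lecture 2 video is unlocked by the Lecture 1 assignment.
            return 1 in user.submitted_assignments
        return True  # assumption: students may view everything else
    return user.role == "manager"  # assumption: managers may view everything


guest = User("visitor", "guest")
student = User("pat", "student")
```

Note that the student rule depends on state (a submitted assignment), which is what makes such policies fine-grained rather than simple role checks.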
54 Fedora Repository System
55 Fedora Access Sub-System: Implements API-A
- Object Reflection
- Identify the types of Behavior Definitions to which an object subscribes (via the object's Disseminators)
- Reflect on a Behavior Definition to identify the kinds of disseminations that can be run on the object (i.e., as method requests)
- Dissemination
- Fulfills requests for particular methods (i.e., of a Behavior Definition) to be run on an object
- Mediates access to supporting services (i.e., Behavior Mechanisms) used to present or transform datastreams of the object
- Returns a view of the object's content to the client
56 API-A Object Reflection Requests: Identify Types of Behavior Definitions
- Each Disseminator is said to subscribe to a Behavior Definition
- It does this by referencing the PID of a particular Behavior Definition Object.
- Each Behavior Definition Object contains metadata that describes a set of related behaviors (or operations)
- Via API-A, clients can send a service request to determine what Behavior Definitions an object subscribes to.
57 API-A Object Reflection Request: Get Behavior Methods
- Each Disseminator has a Behavior Definition Object associated with it.
- Each Disseminator has a Behavior Mechanism Object associated with it that describes how to bind to a particular service that complies with the Disseminator's Behavior Definition.
- Via API-A, clients can send a service request to obtain the list of method definitions associated with a particular Disseminator of the digital object.
58 API-A Object Reflection Requests
[Diagram: example object, PID 101]
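The two reflection requests can be sketched against a tiny in-memory repository. All names here are hypothetical: the function names, the PIDs, and the method names are invented for illustration and are not claimed to match the actual API-A operations.

```python
# Hypothetical Behavior Definition Objects: PID -> method names they define.
BEHAVIOR_DEFINITIONS = {
    "demo:bdef-image": ["getThumbnail", "getFullSize"],
    "demo:bdef-text": ["getPage"],
}

# Hypothetical data objects: PID -> disseminators, each of which
# references a Behavior Definition and a Behavior Mechanism by PID.
OBJECTS = {
    "demo:101": {
        "DISS1": {"bdef": "demo:bdef-image", "bmech": "demo:bmech-jpeg"},
    },
}


def get_behavior_definitions(pid):
    """Which Behavior Definitions does this object subscribe to?"""
    return sorted({d["bdef"] for d in OBJECTS[pid].values()})


def get_behavior_methods(pid, bdef_pid):
    """Which methods can be run on this object under one Behavior Definition?"""
    if bdef_pid not in get_behavior_definitions(pid):
        raise KeyError(f"{pid} does not subscribe to {bdef_pid}")
    return list(BEHAVIOR_DEFINITIONS[bdef_pid])


bdefs = get_behavior_definitions("demo:101")
methods = get_behavior_methods("demo:101", bdefs[0])
```

The client learns what it can ask for without ever seeing the Behavior Mechanism binding, which stays internal to the repository.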
59 API-A Dissemination Request
- Clients can obtain content from a digital object with minimal knowledge about the object.
- Behavior Definition identifiers and method definitions are the basis for making dissemination requests on digital objects
- Clients do not need to know the particulars of how to attach to the service (Behavior Mechanism) that is operating on their behalf.
- A dissemination request requires just three things
- Digital Object Identifier (PID)
- Behavior Definition Identifier (BID)
- Method name (and optional parameters) for a behavior
60 API-A Dissemination Request
[Diagram: Bird Digital Library; White Birds: Image 1, Image 2, Image 3]
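A dissemination request, then, is a triple of PID, Behavior Definition identifier, and method name, plus optional parameters. The sketch below shows how a repository might resolve such a request to a mechanism internally. Everything in it is hypothetical: the PIDs, the method names, the file names, and the `get_dissemination` function are assumptions for illustration.

```python
# Hypothetical object: three image datastreams and one disseminator that
# maps a Behavior Definition PID to a Behavior Mechanism PID.
OBJECTS = {
    "demo:101": {
        "datastreams": {"DS1": "swan.jpg", "DS2": "egret.jpg", "DS3": "gull.jpg"},
        "disseminators": {"demo:bdef-image-set": "demo:bmech-image-set"},
    }
}


def _list_images(datastreams):
    return sorted(datastreams.values())


def _get_image(datastreams, number):
    return datastreams[f"DS{number}"]


# Hypothetical Behavior Mechanisms: PID -> method name -> implementation.
MECHANISMS = {
    "demo:bmech-image-set": {"listImages": _list_images, "getImage": _get_image},
}


def get_dissemination(pid, bdef_pid, method, **params):
    """Resolve (PID, BID, method) to a mechanism and run it on the datastreams."""
    obj = OBJECTS[pid]
    bmech_pid = obj["disseminators"][bdef_pid]  # binding is hidden from the client
    operation = MECHANISMS[bmech_pid][method]
    return operation(obj["datastreams"], **params)


result = get_dissemination("demo:101", "demo:bdef-image-set", "getImage", number=2)
```

Swapping in a different mechanism PID for the same Behavior Definition would change how the request is fulfilled without changing what the client sends.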
61 Disseminations: Benefits
- Simple access: dissemination requests shield clients from the internal structure of digital objects
- Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams)
- Foster interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.
62 The New FEDORA
63 Fedora Software Deployment Goals
- An efficient, scalable, freely distributable FEDORA repository system ASAP
- Make all software open source
- Complete basic management and access interfaces with the initial release
- Add other important digital library functionality in later releases
- Create multiple testbed repositories to deploy and evaluate the software
- Interoperability testing, including sharing of content and mechanisms among deployment partner repositories
64 Deployment Group
- Indiana University: Digital Library group
- NYU: Humanities Computing group
- Tufts: Digital Collections and Archives Department
- King's College London: Humanities Computing
- Oxford: Oxford Digital Library and The Refugee Studies Center
- Library of Congress: Motion Picture and Recorded Sound Division
- Northwestern University: library/academic computing
- Los Alamos National Laboratory: Research Library
65 Fedora Project Plan
- Phase 1 (pre-release Oct 31, 2002; final Jan 2003)
- Repository system with management and access subsystems exposed as web services
- Storage subsystem with XML object store and replication to relational database cache
- Object builder tools (GUI and batch)
- Basic set of behavior services
- Phase 2: Add more production support
- Security and policy enforcement
- Additional management tools
- Optimize performance for accessing XML objects
- Object versioning
- Collection objects
- Advanced disk management
- Phase 3: Enhance end-user support
- New kinds of disseminators, with supporting behavior services
- Efficiency and scale optimization
66 FEDORA Web Site: www.fedora.info
67 Questions and Discussion