Title: The Fedora Project March 19, 2003 ISTEC Symposium, Brazil
1. The Fedora Project
- Sandy Payette
- Cornell Information Science
2. Motivation
- The Problem of Complex Content
3. Digital Library Content: not just documents ...
- Complex, compound, dynamic objects
4. Research Questions
- How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
- How can complex objects be designed to be both generic and genre-specific at the same time?
- How can we associate services and tools with objects to provide different presentations or transformations of the object content?
- How can we associate fine-grained access control policies with specific objects, or with groups of objects?
- How can we facilitate the long-term management and preservation of complex objects that have dependencies on distributed content and services?
5. The Flexible Extensible Digital Object Repository Architecture (FEDORA)
- DARPA- and NSF-funded research at Cornell (1997-present)
  - CORBA-based reference implementation (Payette/Lagoze)
  - Extensive interoperability testing (with Arms/Blanchi/Overly)
  - Policy enforcement (Payette/Schneider)
- Interpreted and re-implemented at the University of Virginia (1999-)
  - Simple web-oriented implementation, focused on access to collections
  - Java servlet and relational database
  - Testbed of 10,000,000 objects with performance metrics (1999-2001)
- Mellon-funded FEDORA software (2002-)
  - University of Virginia and Cornell: joint development
  - Open source
  - Web services and XML
  - Mediation of distributed services
  - Preservation focus
6. Fedora Key Features
- Open system: public APIs, exposed as web services
- Flexible digital object model
  - XML submission and storage (METS schema)
  - Local and distributed content
  - Data (any type) and metadata (any schema: DC, other)
  - Supports inter-relationships among objects
  - Behavior contracts for objects
    - Associate services with objects
    - Objects can provide a launch pad or tool for using object content
- Repository system
  - Management service: manages digital resources and metadata, as well as the computer programs, services, and tools that support them
  - Access service: repository search and object disseminations
  - Mediation: interacts with other distributed web services for content transformation and presentation
  - OAI provider
  - Access control
  - Preservation service (future release)
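Fedora accepts objects submitted as XML conforming to the METS schema. As a rough sketch of what reading such a package involves, the Python below uses the standard library's ElementTree to list the files declared in a minimal, generic METS document. The sample document, the `demo:1` identifier, and the function `list_datastreams` are hypothetical; Fedora's own METS profile carries more sections than shown here.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily trimmed METS document; a real Fedora submission
# package would also carry administrative metadata, behaviors, etc.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="demo:1">
  <mets:fileSec>
    <mets:fileGrp ID="DS1">
      <mets:file ID="DS1.0" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/photo.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp ID="DC">
      <mets:file ID="DC.0" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/dc.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_datastreams(mets_xml):
    """Return (object id, [(group id, MIME type, location), ...])."""
    root = ET.fromstring(mets_xml)
    entries = []
    for grp in root.iterfind("mets:fileSec/mets:fileGrp", NS):
        for f in grp.iterfind("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            # Attributes in the XLink namespace are keyed by "{uri}name".
            href = loc.get("{http://www.w3.org/1999/xlink}href") if loc is not None else None
            entries.append((grp.get("ID"), f.get("MIMETYPE"), href))
    return root.get("OBJID"), entries
```

Both data (the JPEG) and metadata (the DC record) appear here simply as content referenced by location, which mirrors the slide's point that an object may hold data of any type and metadata of any schema.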
7. Requirements: Heterogeneous Digital Collections
8. Shortcomings of Commercial Digital Library Products
- Narrow focus on specific media formats (e.g., image databases, document management)
- Fail to effectively address interrelationships among digital entities
- Fail to address interoperability: no open interfaces to facilitate sharing of services, and no standard protocols for cross-system interoperability
- Fail to provide facilities for managing the programs and tools that are integral to delivering digital content
- Not extensible: do not enable easy integration of new tools and services
- Do not address fine-grained access control and preservation issues
9. The Fedora Architecture
- Digital Object Model
- The Repository
- Web Services
10. FEDORA Basic Object Architecture
- Digital Object Model
  - Container to aggregate digital content of any type
    - Data or metadata
    - Local or distributed
  - Behavior contracts
    - Definitions of abstract operations
    - Fulfillment via bindings to external services
    - Enables multiple disseminations of content
11. Digital Object Model: Functional View
[Figure: functional view of the digital object model; surviving label: "Application services"]
12. Digital Object Model: Architectural View
- Persistent ID (PID): globally unique persistent identifier
- Disseminators (public view): access methods for obtaining disseminations of digital object content
- System Metadata (internal view): metadata necessary to manage the object
- Datastreams (protected view): content that makes up the basis of the object
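The four parts of the architectural view can be pictured as fields of a container type. The Python dataclasses below are a minimal sketch under that reading: `Datastream`, `DigitalObject`, and `public_view` are hypothetical names, disseminators are reduced to lists of method names, and nothing here reflects Fedora's actual Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    """Protected view: one unit of content, stored locally or referenced by URL."""
    ds_id: str
    mime_type: str
    location: str


@dataclass
class DigitalObject:
    """Container with the four parts shown in the architectural view."""
    pid: str                                                           # globally unique persistent ID
    system_metadata: Dict[str, str] = field(default_factory=dict)      # internal view
    datastreams: Dict[str, Datastream] = field(default_factory=dict)   # protected view
    disseminators: Dict[str, List[str]] = field(default_factory=dict)  # public view: name -> methods

    def add_datastream(self, ds: Datastream) -> None:
        # Datastream ids must be unique within one object.
        if ds.ds_id in self.datastreams:
            raise ValueError(f"duplicate datastream id: {ds.ds_id}")
        self.datastreams[ds.ds_id] = ds

    def public_view(self) -> Dict[str, List[str]]:
        # Clients see only disseminator methods, never the raw
        # datastreams or the system metadata.
        return {name: list(methods) for name, methods in self.disseminators.items()}
```

Keeping datastreams behind the disseminators is what lets the same content be presented in several ways without clients depending on how it is stored.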
13. Digital Object Model: Example Disseminators
- Persistent ID (PID)
- Disseminators
  - Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
  - Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
- System Metadata
- Datastreams
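One way to picture the example is as a two-level lookup: disseminator name, then method name, then a function that produces a dissemination from the object's datastreams. The sketch below assumes that model; the class `DisseminatingObject`, the datastream ids (`THUMB`, `MEDIUM`, ...), and the example URLs are hypothetical, and only the Simple Image disseminator is modeled. `list_methods` plays the role of the Default disseminator's List Methods.

```python
from typing import Callable, Dict, List

# A method turns the object's datastreams (id -> location) into a dissemination.
Method = Callable[[Dict[str, str]], str]


class DisseminatingObject:
    def __init__(self, pid: str, datastreams: Dict[str, str]):
        self.pid = pid
        self.datastreams = datastreams
        self.disseminators: Dict[str, Dict[str, Method]] = {}

    def add_disseminator(self, name: str, methods: Dict[str, Method]) -> None:
        self.disseminators[name] = methods

    def list_methods(self) -> Dict[str, List[str]]:
        # Reflection: report every method the object can disseminate.
        return {name: sorted(m) for name, m in self.disseminators.items()}

    def disseminate(self, disseminator: str, method: str) -> str:
        try:
            fn = self.disseminators[disseminator][method]
        except KeyError:
            raise LookupError(f"{self.pid} has no method {disseminator}/{method}") from None
        return fn(self.datastreams)


img = DisseminatingObject("demo:image1", {
    "THUMB": "http://example.org/img/thumb.jpg",
    "MEDIUM": "http://example.org/img/medium.jpg",
    "HIGH": "http://example.org/img/high.jpg",
    "VERYHIGH": "http://example.org/img/veryhigh.jpg",
})
# Each Simple Image method simply returns one of the stored resolutions.
img.add_disseminator("SimpleImage", {
    "GetThumbnail": lambda ds: ds["THUMB"],
    "GetMedium": lambda ds: ds["MEDIUM"],
    "GetHigh": lambda ds: ds["HIGH"],
    "GetVeryHigh": lambda ds: ds["VERYHIGH"],
})
```

A client asking for `img.disseminate("SimpleImage", "GetThumbnail")` never needs to know which datastream backs that method, so the mapping could later be switched to an on-the-fly resizing service without changing the client.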
14. Object Behavior Contracts
[Figure: a Data Object, a Behavior Definition Object, a Behavior Mechanism Object, and a Web Service, connected by a behavior subscription, a behavior contract, and a data contract]
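The contracts can be read as two checks a repository might run before letting a data object use a behavior: the mechanism must bind every abstract operation of the definition (behavior contract), and the object must supply the datastreams the mechanism expects (data contract). This is a minimal sketch under that interpretation of the figure. The classes, `check_subscription`, and all PIDs and URLs are hypothetical, and the `bindings` dict stands in for the WSDL that Fedora keeps in Behavior Mechanism objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class BehaviorDefinition:
    """Defines abstract operations only; no implementation."""
    pid: str
    methods: Set[str]


@dataclass
class BehaviorMechanism:
    """Fulfills a behavior definition by binding its operations to a service."""
    pid: str
    bdef_pid: str                  # the behavior definition it implements
    bindings: Dict[str, str]       # operation -> service endpoint (stand-in for WSDL)
    data_contract: Dict[str, str]  # required datastream id -> expected MIME type


def check_subscription(bdef: BehaviorDefinition,
                       bmech: BehaviorMechanism,
                       datastreams: Dict[str, str]) -> List[str]:
    """Return contract violations for a data object (datastream id -> MIME type).

    An empty list means the object can subscribe to bdef through bmech.
    """
    problems = []
    if bmech.bdef_pid != bdef.pid:
        problems.append(f"{bmech.pid} does not implement {bdef.pid}")
    # Behavior contract: every abstract operation needs a concrete binding.
    for op in sorted(bdef.methods - set(bmech.bindings)):
        problems.append(f"behavior contract: no binding for {op}")
    # Data contract: the object must supply each datastream the mechanism needs.
    for ds_id, mime in sorted(bmech.data_contract.items()):
        if datastreams.get(ds_id) != mime:
            problems.append(f"data contract: needs {ds_id} ({mime})")
    return problems


bdef = BehaviorDefinition("demo:bdef-image", {"GetThumbnail", "GetHigh"})
bmech = BehaviorMechanism(
    "demo:bmech-image", "demo:bdef-image",
    {"GetThumbnail": "http://example.org/svc/thumb",
     "GetHigh": "http://example.org/svc/high"},
    {"IMAGE": "image/jpeg"},
)
```

Separating the definition from the mechanism means several mechanisms (for example, different image services) could fulfill the same abstract operations, while clients program only against the definition.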
15. FEDORA Basic Repository Architecture
- Repository System
  - Object Management
    - Lifecycle (Ingest/create → Store → Delete → Approve → Purge)
    - Validation
    - PID generation
    - Version management
    - Access control
    - Preservation support
  - Object Access
    - Object dissemination
    - Object reflection
    - Service mediation
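The object-management items can be illustrated with a small state machine plus a PID generator. This sketch assumes the lifecycle stages are traversed strictly in the order listed (so an object must be deleted and approved before it is purged); that ordering rule, the `namespace:N` PID scheme, and the `Repository` class are all hypothetical and are not taken from Fedora's implementation.

```python
import itertools
from typing import Dict

# Lifecycle stages in the order listed above; treating them as a strict
# linear sequence is an assumption made for this sketch.
LIFECYCLE = ["Ingested", "Stored", "Deleted", "Approved", "Purged"]


class Repository:
    def __init__(self, namespace: str):
        self.namespace = namespace
        self._counter = itertools.count(1)
        self.states: Dict[str, str] = {}  # pid -> current lifecycle stage

    def next_pid(self) -> str:
        # Hypothetical PID scheme: "<namespace>:<sequence number>".
        return f"{self.namespace}:{next(self._counter)}"

    def ingest(self) -> str:
        pid = self.next_pid()
        self.states[pid] = LIFECYCLE[0]
        return pid

    def advance(self, pid: str, target: str) -> None:
        # Only a move to the immediately following stage is allowed.
        current = LIFECYCLE.index(self.states[pid])
        if LIFECYCLE.index(target) != current + 1:
            raise ValueError(f"{pid}: cannot go from {self.states[pid]} to {target}")
        if target == "Purged":
            del self.states[pid]  # a purged object is removed entirely
        else:
            self.states[pid] = target
```

Requiring an explicit approval step between deletion and purging gives curators a chance to recover an object before it is irreversibly removed, which fits the preservation focus described earlier.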
16. Fedora Implementation
- Understanding the system implementation
- Web Services
- Server Design
17. What Is a Web Service?
- A distributed application that runs over the internet
- A web application that publishes an open interface through which clients can send requests and receive responses
- Standards
  - Transport protocol: HTTP, others
  - Messaging protocol: SOAP, HTTP GET/POST
  - Message encoding: XML
  - Service description: WSDL
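To make the SOAP and XML layers concrete, the sketch below wraps an operation call in a SOAP 1.1 envelope and parses it back, using only the standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:fedora-access`, the operation name `GetDissemination`, and its parameters are hypothetical illustrations, not Fedora's published WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:fedora-access"                     # hypothetical service namespace


def build_request(operation: str, params: dict) -> bytes:
    """Client side: wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload: bytes):
    """Server side: recover the operation name and its parameters."""
    root = ET.fromstring(payload)
    call = root.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    op = call.tag.split("}", 1)[1]              # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return op, params
```

In a real deployment the payload would travel as the body of an HTTP POST, and a WSDL document would describe the operation and parameter names so that clients could generate this code instead of writing it by hand.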
18. Fedora and Web Services
- Fedora repository system is a web service
  - Access/Search (API-A) and Management (API-M)
  - Service descriptions published using WSDL
  - Both SOAP and HTTP bindings
- Back-end services
  - Digital object behaviors implemented as linkages to other distributed web services
  - Service binding metadata (WSDL) stored in special Fedora Behavior Mechanism objects
  - Fedora acts as mediator to these services
19. Fedora Repository System: Client and Web Service Interactions
[Figure: on the frontend, users reach the Fedora Repository System through client applications or a web browser; on the backend, the repository's web service dispatch calls external services such as a Content Transform Service]
20. Fedora Server Design
- 3-tiered architecture
- Modular, extensible
- System Diagram
21. Server Design: 3 Layers
22. Fedora System Diagram
23. Open Source Fedora Implementation Technologies
- Fedora Web Services Layer
  - Apache Axis for SOAP over HTTP
  - Apache Tomcat 4.1
- Core Repository System
  - Sun Java J2SDK 1.4
  - Xerces2 2.0.2 for XML parsing and validation
  - Saxon 6.5 for XSLT transformation
  - Schematron 1.5 for validation
  - MySQL 3.23.52 and Mckoi relational databases
- Deployment Platforms
  - Windows 2000, NT, XP
  - Solaris
  - Linux
24. DEMO: Use Cases
25. Release Plan
- Phase 1: Fedora 1.0 (public release May 1, 2003)
- Phase 2/3 (2003-2005)
  - Advanced access control
  - Preservation service
  - R2R (repository-to-repository) federation
  - Reliability
    - Fault tolerance
    - Mirroring and replication
  - Performance tuning
    - Caching
    - Load balancing
    - Storage scalability
26. Deployment Partners
- Los Alamos National Laboratory Research Library
- Library of Congress Motion Picture and Recorded Sound Division
- Indiana University Digital Library group
- King's College London Humanities Computing
- NYU Humanities Computing
- Northwestern University Academic Computing
- Oxford: Oxford Digital Library and The Refugee Studies Center
- Tufts Digital Collections and Archives Department
27. More Information