Title: The Mellon-Funded Fedora Project A Briefing for the Cornell University Library January 24, 2002
1The Mellon-Funded Fedora Project: A Briefing for the Cornell University Library, January 24, 2002
- Sandy Payette
- Thorny Staples
- Ross Wayland
2The Mellon Fedora Project
3The FEDORA Open-source Development Project, January 24, 2002
4Digital Library Projects
- Web sites with links to on-line resources
- Specific, boutique collections
- Large collections in one or two areas
- A broad research collection in all media types and content areas
- Ideally, the digital library includes all information
5Library Digital Centers
6Library Digital Collections
Books Rare Books Multimedia Music
E-texts Maps Photographs Statistics
Video Art Manuscripts Data
Images 3-D Objects Journals Sound Effects
7Other Library Services
- Electronic Cataloger in the Cataloging Department
- Digital Library Research and Development Department
- Digital Services Integration (DSI) Coordinator
- Digital Library Production Services
8Other Services Housed in the Library
- The Institute for Advanced Technology in the Humanities
- The Virginia Center for Digital History
- The Teaching Technologies Initiative
- The Media Studies Program Offices
9Information Communities
Richer collections
Community-oriented resources
Discipline-specific services
Specialized access and delivery
10Managing the Collection
- Provide a way to universally name all resources without respect to machine address
- Track all files for resources, metadata and computer programs consistently
- Enforce appropriate policies for use of Library resources
- Provide a high level of security
- Support preservation activities appropriately
11Delivering the Collection
- Deliver tools with content
- Allow every resource to be used in any number of contexts
- Discovery searching across the full collection
- Deep searching in particular collections
- Move towards a library which aware users can configure for themselves
12Supporting Digital Scholarship
- Supporting the creation of digital scholarly projects
- Collecting born-digital scholarly projects
  - For preservation
  - Taking over responsibility for primary delivery
- Supporting information communities
13Metadata
- Descriptive: metadata that users use to find things, like traditional library catalog records
- Administrative: metadata that the library uses to manage library resources
- Structural: metadata about the relationships among resources
- Behavioral: computer programs that deliver digital resources to users
14Digital Library Management and Delivery System
15The Flexible Extensible Digital Object Repository Architecture (FEDORA)
- Developed as an NSF-funded research project at Cornell
- Interpreted and re-implemented at UVA
- Testbed of 10,000,000 digital objects with very good results
- Mellon gave us $1,000,000 to develop a usable system around FEDORA
16Repository Development Project Goals
- An efficient, scalable, freely distributable FEDORA repository system ASAP
- A complete basic management interface with the initial release
- Add important digital library functionality in later releases
- Create multiple testbed repositories to deploy and evaluate the software
- Make all software open source
17Deployment Group
- The Digital Library group, Indiana U.
- The Humanities Computing group, New York U.
- The Digital Collections and Archives Department, Tufts U.
- The Humanities Computing group, King's College London
- The Oxford Digital Library and The Refugee Studies Center, Oxford U.
- Audio/Video Project, Library of Congress
- A library/academic computing group, Northwestern University
18Project Plan
- Phase 1: Deliver the repository system and the full management interface
- Phase 2: Add more production support
  - Security and policy enforcement
  - Collection objects
  - Disk management
- Phase 3: Enhance end-user support
  - Versioning and Editions
  - Dynamic, Context Sensitive Behaviors
  - Efficiency and scale optimization
19 FEDORA Development Project Description: http://fedora.comm.nsdlib.org/
20Fedora Architecture
- Research History and Overview
21FEDORA: Original Research Goals
- Management - of distributed digital content and services
- Access - via stable interfaces to digital objects
- Interoperability - for digital objects and repositories
- Extensibility - easy evolution of object behaviors
- Flexibility - community-defined content models
- Security - rights management and access control
- Preservation - of content and look and feel
22FEDORA Basic Architectural Abstractions
- Digital Object
  - Container for aggregating any digital content
  - Content disseminations based on behavior definitions
  - Extensibility of behavior mechanisms
- Repository
  - Service layer for contained Digital Objects
  - Object lifecycle management
  - Access management
23FEDORA Digital Object
- Persistent ID (PID): globally unique persistent id
- Disseminators: public view; access methods for obtaining disseminations of digital object content
- System Metadata: internal view; metadata necessary to manage the object
- Datastreams: protected view; content that makes up the basis of the object
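The four parts of a digital object named on this slide can be pictured as a simple record. The following is only an illustrative Python sketch, not the FEDORA implementation: the class names, field names, sample PIDs and the file path are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """Protected view: one piece of content that makes up the object."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical storage location, hidden from clients


@dataclass
class Disseminator:
    """Public view: ties the object to one set of behaviors."""
    behavior_definition_pid: str  # the interface (assumed field name)
    behavior_mechanism_pid: str   # the implementation (assumed field name)


@dataclass
class DigitalObject:
    pid: str                                             # globally unique persistent id
    system_metadata: dict = field(default_factory=dict)  # internal view
    datastreams: list = field(default_factory=list)      # protected view
    disseminators: list = field(default_factory=list)    # public view


# Every identifier below is made up for the example.
image = DigitalObject(
    pid="example:image-1",
    system_metadata={"created": "2002-01-24"},
    datastreams=[Datastream("ds1", "image/tiff", "/hypothetical/image.tif")],
    disseminators=[Disseminator("example:bdef-image", "example:bmech-image-std")],
)
print(image.pid, len(image.datastreams), len(image.disseminators))
```

Keeping datastream locations inside the record, and exposing only disseminators, mirrors the public/protected split described on the slide.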
24Digital Object Interoperability: Common Behaviors for variable content
Digital Object 2
Digital Object 1
Functional equivalency
25Digital Object Extensibility: Adding New Behaviors
Digital Object 3
The same underlying content can be operated on in novel ways to create new disseminations not originally conceived of.
26FEDORA Digital Object Architecture
- Data Object
  - Persistent ID (PID)
  - System Metadata
  - Disseminators
  - Datastreams
- Behavior Definition Object
  - Persistent ID (PID)
  - System Metadata
  - Method Definition Metadata
  - Datastreams (specs)
- Behavior Mechanism Object
  - Persistent ID (PID)
  - System Metadata
  - Method Implementation Metadata
  - Datastreams (executables)
27UVA Example Shared Image Behavior Definitions
28UVA Example Default Behavior Definitions
29Fedora Repository System
Management
Access
Digital Objects with fine-grained access control
Storage
general-purpose access control
30Access Management: Policy Enforcement
- Semantics of policy language must parallel the behavioral semantics of digital objects
- Fine-grained, context-sensitive policies
- Extensibility for policies and enforcement mechanisms
- Support for portability of digital objects
- Decentralized policy management
31Access Control Policies
- General Purpose
  - Only repository managers can add new disseminators to digital objects in the repository.
- Object-Specific (e.g., Lecture object)
  - Guests may view course syllabus and slides 1-10 of Lecture 1, but may not view the lecture video or any other slides.
  - Students may not view Lecture 2 video unless they submit assignment for Lecture 1.
See research at http://www.cs.cornell.edu/payette/prism/security/policy.htm
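The example policies on this slide can be written down as small predicate functions. This is a minimal sketch under stated assumptions, not the policy language from the research cited above: the function names, role strings and item names are hypothetical, and the rule for lecture videos other than Lecture 2 is an assumption because the slide does not cover it.

```python
def may_add_disseminator(role):
    """General-purpose policy: only repository managers add disseminators."""
    return role == "repository_manager"


def guest_may_view(lecture, item, slide_number=None):
    """Object-specific policy for guests on a Lecture object."""
    if lecture != 1:
        return False
    if item == "syllabus":
        return True
    if item == "slide":
        # Only slides 1-10 of Lecture 1 are open to guests.
        return slide_number is not None and 1 <= slide_number <= 10
    return False  # the lecture video and anything else is denied


def student_may_view_video(lecture, submitted_assignments):
    """Students see the Lecture 2 video only after submitting assignment 1."""
    if lecture == 2:
        return 1 in submitted_assignments
    return True  # assumption: other lecture videos are unrestricted


print(guest_may_view(1, "slide", slide_number=10))
print(guest_may_view(1, "video"))
print(student_may_view_video(2, submitted_assignments=set()))
```

The student rule depends on what the user has already done, which is the kind of context-sensitive policy the previous slide calls for.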
32UVA Prototypes
- UVA Content Models and Demos
33Finding Aid Content Model
(Finding Aid example)
34TEI Letter Content Model
(TEI letter example)
35TEI Book Content Model
(TEI book example)
36General Image Content Model
(Mycenae image example)
37MrSID Image Content Model
(Pavilion III image example)
38 1-bit B/W TIFF Content Model
(1-bit B/W TIFF example)
39GDMS Content Model
(Mycenae example)
(lawn example)
40Numerical Data Content Model
(ICPSR survey example)
41FEDORA Specifications Part I
42New Repository System
43FEDORA XML using METS
44Metadata Encoding and Transmission Standard (METS)
- XML standard for encoding descriptive, administrative, and structural metadata of digital library objects
- Developed under auspices of the Digital Library Federation
- METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress
http://www.loc.gov/standards/mets/
45METS Schema
- METS is written in the XML Schema Language
- METS defines four sections for an object:
  - Descriptive metadata
  - Administrative metadata
  - File group
  - Structure map
- METS goals include:
  - Facilitate management of objects within a repository
  - Provide a standard format for exchange of objects between repositories
  - Provide standard format for transmission of objects to users for rendering (via tools or applications)
46Mapping Fedora to METS
Fedora component and its METS encoding:
- Persistent Identifier (PID): <METS:mets OBJID="PID1"/>
- Disseminator: <METS:behaviorSec STRUCTID="S1"> <METS:mechanism/> <METS:interfaceDef/> </METS:behaviorSec> <METS:structMap ID="S1"> <METS:div> <METS:fptr FILEID="ds1"/> </METS:div> </METS:structMap>
- System Metadata: <METS:dmdSec/> <METS:amdSec/>
- Datastreams: <METS:fileGrp> <METS:file> <METS:FLocat ID="ds1" LOCATION="" xlink:simpleLink=""/> </METS:file> </METS:fileGrp>
New in METS
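The mapping on this slide can be tried out by building a skeleton with Python's standard `xml.etree.ElementTree`. This is a rough sketch only: element and attribute placement follow the slide rather than the published METS schema, the output is not validated, and the function name and sample identifiers are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("METS", METS_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_skeleton(pid, datastream_id):
    """Build the element skeleton from the slide (hypothetical helper)."""
    root = ET.Element(q("mets"), {"OBJID": pid})          # Persistent Identifier
    ET.SubElement(root, q("dmdSec"))                      # System Metadata
    ET.SubElement(root, q("amdSec"))                      # System Metadata
    file_grp = ET.SubElement(root, q("fileGrp"))          # Datastreams
    file_el = ET.SubElement(file_grp, q("file"))
    ET.SubElement(file_el, q("FLocat"), {"ID": datastream_id, "LOCATION": ""})
    struct_map = ET.SubElement(root, q("structMap"), {"ID": "S1"})
    div = ET.SubElement(struct_map, q("div"))
    ET.SubElement(div, q("fptr"), {"FILEID": datastream_id})
    # Disseminator: the behavior section points at the structure map by ID.
    behavior = ET.SubElement(root, q("behaviorSec"), {"STRUCTID": "S1"})
    ET.SubElement(behavior, q("mechanism"))
    ET.SubElement(behavior, q("interfaceDef"))
    return root


root = build_skeleton("example:image-1", "ds1")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The behavior section reaches the datastream indirectly, through the structure map and its file pointer, which is the linkage the Disseminator row of the mapping describes.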
47METS Sample Fedora Object
(image digital object example)
48METS Sample Fedora Behavior Definition Object
(Behavior Definition object for DC)
(Behavior Definition object for UVA_Images)
49METS Sample Fedora Behavior Mechanism Object
(Behavior Mechanism object for UVA_MARC_DC)
(Behavior Mechanism object for UVA_Image_STD)
(Behavior Mechanism object for UVA_Image_MRSID)
50Fedora Relational Database
- Phase 1: Alternate form of object storage to support high-performance access (disseminations)
  - Repository system replicates from authoritative XML version of objects to relational database
- Phase 2-3: Access sub-system works completely off the XML storage, as the performance of XML tools improves.
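The Phase 1 replication step can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is an illustration under stated assumptions, not the project's schema: the simplified object XML, the `datastream` table and the `replicate` function are all hypothetical (real FEDORA objects are encoded in METS).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified object XML used only for this example.
OBJECT_XML = """
<object pid="example:image-1">
  <datastream id="ds1" mime="image/tiff" location="/hypothetical/image.tif"/>
  <datastream id="ds2" mime="text/xml" location="/hypothetical/dc.xml"/>
</object>
"""


def replicate(xml_text, conn):
    """Copy the datastream entries of one XML object into the database."""
    obj = ET.fromstring(xml_text)
    pid = obj.get("pid")
    with conn:  # commit on success, roll back on error
        # Drop any earlier copy first, so the XML version stays authoritative.
        conn.execute("DELETE FROM datastream WHERE pid = ?", (pid,))
        conn.executemany(
            "INSERT INTO datastream (pid, ds_id, mime, location) VALUES (?, ?, ?, ?)",
            [(pid, ds.get("id"), ds.get("mime"), ds.get("location"))
             for ds in obj.findall("datastream")],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datastream (pid TEXT, ds_id TEXT, mime TEXT, location TEXT)")
replicate(OBJECT_XML, conn)
replicate(OBJECT_XML, conn)  # replicating again does not duplicate rows
rows = conn.execute("SELECT ds_id, mime FROM datastream ORDER BY ds_id").fetchall()
print(rows)
```

Because the database copy is rebuilt from the XML each time, it can be discarded and regenerated without losing anything.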
51FEDORA Database Schema
52FEDORA Specifications Part II
53New Repository System
54FEDORA API Definitions
- API-1: interface for management sub-system
  - Operations necessary to create and maintain objects and their components
  - Interface directly with XML version of the object
- API-2: interface for access sub-system
  - Operations necessary for clients to perform disseminations on objects in the repository
  - Interface directly with SQL representation of objects
  - No direct access to object internal structure or components
55Fedora Management Sub-System API-1
- Create object
- Modify object
- Delete object
- Examine object
- Search objects
- Create/maintain Behavior Definition object
- Create/maintain Behavior Mechanism object
- Repair repository/objects
- Batch functions
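A toy in-memory stand-in shows the shape the first five operations in this list might take. It is only a sketch: the class name, method names and signatures are assumptions for illustration and are not the API-1 definition, which works against the XML version of each object.

```python
class ManagementSubsystem:
    """Toy in-memory stand-in for the management operations (hypothetical)."""

    def __init__(self):
        self._objects = {}  # PID -> metadata dictionary

    def create_object(self, pid, metadata):
        if pid in self._objects:
            raise ValueError(f"PID already in use: {pid}")
        self._objects[pid] = dict(metadata)

    def modify_object(self, pid, **changes):
        self._objects[pid].update(changes)

    def delete_object(self, pid):
        del self._objects[pid]

    def examine_object(self, pid):
        # Return a copy so callers cannot edit the stored object in place.
        return dict(self._objects[pid])

    def search_objects(self, **criteria):
        """Return the PIDs whose metadata matches every given criterion."""
        return sorted(
            pid for pid, meta in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


repo = ManagementSubsystem()
repo.create_object("example:1", {"type": "image", "title": "Lawn"})
repo.create_object("example:2", {"type": "text", "title": "Letter"})
repo.modify_object("example:1", title="The Lawn")
print(repo.search_objects(type="image"))
print(repo.examine_object("example:1"))
```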
56Create and Maintain Behavior Definition Objects
- Create PID
- Create System Metadata
- Method definition metadata (e.g., in WSDL)
- Create/maintain Datastreams
- Alternate expressions of interface definitions
- User guides or documentation
- Register the Behavior Definition
57Create and Maintain Behavior Mechanism Objects
- Create PID
- Create System Metadata
- Method implementation metadata (e.g., in WSDL)
- Create/maintain Datastreams
- Executables
- Programmer documentation
- Register the Behavior Mechanism
58Other Functionality
- Repair
  - Ability to fix inconsistencies in repository/object structure when they arise
- Batch operations
  - Ability to perform all common functions in batch mode
59Fedora Access Sub-System API-2
- Identify Behavior Types to which an object subscribes (via the object's Disseminators)
- Get the Behavior Definition (method definitions) for a given Behavior Type
- Get Disseminations of digital object content
60Fedora Access Sub-System
Web browsers
Web Server
Fedora Access Sub-System (API-2)
Fedora Management Sub-System (API-1)
Digital Objects (SQL)
Digital Objects (XML)
Web Service
Web Service
Web Service
MRSID Image Mechanism
TEI Book Mechanism
Other Mechanism
61Access Request: Identify Behavior Types
- Each Disseminator has a Behavior Definition Object associated with it.
- Each Behavior Definition Object has a PID that also serves as the Behavior Type Identifier for a set of related behaviors.
- Clients can query a digital object for what Behavior Types it subscribes to.
62Access Request: Get Behavior Definition
- Each Disseminator has a Behavior Definition Object associated with it.
- Each Behavior Definition Object is stored as a Fedora digital object.
- A Behavior Definition Object contains a set of method definitions that represent a set of related behaviors for a Behavior Type.
- Clients can query a digital object to get a set of method definitions for a particular Behavior Type.
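The two identification requests described on this slide and the previous one can be sketched with plain dictionaries. This is a hedged illustration, not the API-2 definition: the function names, PIDs and method names below are all hypothetical.

```python
# Behavior Definition objects keyed by PID; the PID doubles as the
# Behavior Type identifier. All names here are made up for the example.
BEHAVIOR_DEFINITIONS = {
    "example:bdef-image": ["get_thumbnail", "get_full_size"],
    "example:bdef-dc": ["get_dublin_core"],
}

# Each digital object lists the Behavior Definition PIDs of its Disseminators.
DIGITAL_OBJECTS = {
    "example:image-1": ["example:bdef-image", "example:bdef-dc"],
    "example:letter-1": ["example:bdef-dc"],
}


def get_behavior_types(pid):
    """Which Behavior Types does this digital object subscribe to?"""
    return list(DIGITAL_OBJECTS[pid])


def get_behavior_definition(pid, behavior_type):
    """Return the method definitions for one of the object's Behavior Types."""
    if behavior_type not in DIGITAL_OBJECTS[pid]:
        raise KeyError(f"{pid} does not subscribe to {behavior_type}")
    return list(BEHAVIOR_DEFINITIONS[behavior_type])


print(get_behavior_types("example:image-1"))
print(get_behavior_definition("example:image-1", "example:bdef-image"))
```

A client needs only a PID to start: it first asks what Behavior Types exist, then asks for the methods of the one it cares about.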
63Behavior Identification Requests
64Access Request: Get Dissemination
- Clients can obtain content from a digital object with minimal knowledge about the object.
- Behavior Type identifiers and method definitions are the basis for making dissemination requests on digital objects.
- A dissemination request requires just three things:
  - Digital Object Identifier (PID)
  - Behavior Type Identifier (BID)
  - Method name (and optional parameters) for a behavior
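A dissemination request built from those three things can be sketched as a lookup followed by a call. This is an illustrative sketch only, not the FEDORA access sub-system: the `get_dissemination` function, the `get_thumbnail` method, the object PIDs and the file names are assumptions, while `UVA_Images`, `UVA_Image_STD` and `UVA_Image_MRSID` are borrowed from the sample objects named earlier in the deck.

```python
def std_get_thumbnail(datastream, size=100):
    return f"thumbnail({datastream}, {size}px) via standard image mechanism"


def mrsid_get_thumbnail(datastream, size=100):
    return f"thumbnail({datastream}, {size}px) via MrSID mechanism"


# Behavior Mechanisms: implementations of the methods of a Behavior Type.
MECHANISMS = {
    "UVA_Image_STD": {"get_thumbnail": std_get_thumbnail},
    "UVA_Image_MRSID": {"get_thumbnail": mrsid_get_thumbnail},
}

# Digital objects: Behavior Type -> (mechanism, datastream). Hypothetical data.
OBJECTS = {
    "example:1": {"UVA_Images": ("UVA_Image_STD", "photo.jpg")},
    "example:2": {"UVA_Images": ("UVA_Image_MRSID", "scan.sid")},
}


def get_dissemination(pid, bid, method, **params):
    """Resolve PID + Behavior Type + method name to a mechanism call."""
    mechanism_name, datastream = OBJECTS[pid][bid]
    implementation = MECHANISMS[mechanism_name][method]
    return implementation(datastream, **params)


# The same request works on both objects, whatever format they store.
for pid in ("example:1", "example:2"):
    print(get_dissemination(pid, "UVA_Images", "get_thumbnail", size=50))
```

The client never names a mechanism or a datastream, so either can change without breaking the request.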
65Access Request: Get Dissemination
Bird Digital Library1
White Birds Image 1 Image 2 Image 3
66Disseminations: Benefits
- Simple access: dissemination requests shield clients from the internal structure of digital objects.
- Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams).
- Foster interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.
67Questions and Discussion