Title: Best Practices in Developing a Digital Library Repository Using Fedora at the UVa Library
1Best Practices in Developing a Digital Library
Repository Using Fedora at the UVa Library
- Leslie Johnston
- Head of Digital Access Services, University of
Virginia Library - Fedora Users Meeting, Open Repositories 2007
2Underlying Object Architecture Must be First
- The object architecture guides all repository
development steps. - Based on core architectural and service
assumptions, such as these at UVa - We have complex objects with many relationships.
- We have simple objects that users just want to
find and use. - Any given resource can be associated with and
presented in any number of contexts. - The Repository will have a public interface to
support discovery and use of the collections by
the UVa community, accompanied by tools for their
use in instruction and research.
3Underlying UVa Object Architecture
- At UVA, an atomic or granular approach
- Media Objects (art and architecture images, page
images, video, audio) are the most granular, and
are always children of work objects. The
datastreams are metadata and media files - Work Objects represent a logical object or
concept (a visual resources concept of site or
work, a text volume, an episode of a television
news show, an oral history interview). Work
Objects provide the principal mode of discovery
for Media Objects. Work Objects are not required
to have Media Object children. The datastreams
are metadata and inline content, such as TEI
transcriptions and EAD finding aids - A Work Object can be a child of an Aggregation
Object, representing a real or virtual
collection, a serial publication, a television
news series, etc. The datastreams are metadata
for the aggregation set and rules specific to the
set used to present all or subsets of its members.
4Underlying UVa Object Architecture
- Many relationships are possible
- Media Objects can be the children of multiple
Work Objects. - A drawing of a building in a letter can be part
of a visual resources object representing the
structure, as well as the child of a text object
for the letter). - Work Objects may be part of many Aggregation
Objects. - A painting may be part of a collection of work by
that one artist, and part of a virtual collection
representing a particular subject theme. - Aggregation Objects may be the children of other
Aggregation Objects, where appropriate.
5Disseminators in a Granular Architecture
- Media Objects have disseminators that primarily
provide access to the datastreams. - Examples
- The uvaDefault disseminator includes a
getDefaultContent behavior which gets whichever
datastream is considered the default view for a
user. - The uvaImage disseminator includes the behaviors
getPreview and getScreen, which simply get
specific datastreams. It also includes a
getImageViewer behavior which gets an image
datastream and wraps it in an applet that allows
manipulation of the image. - The uvaDefault and uvaMeta disseminators include
behaviors that return xml (get) or styled html
versions (view) of descriptive or administrative
metadata, or Dublin Core.
6Disseminators in a Granular Architecture
- Work Objects have disseminators that provide
access to the work and its media children. - Examples
- The default disseminator includes the getFullView
behavior which returns a styled presentation of a
text and its child page images, if there are any. - The uvaPageBook disseminator includes the
getPageTurner behavior which gets the child
images and presents them for navigation in a page
turner. - The uvaDefault and uvaMeta disseminators include
behaviors that return xml (get) or styled html
versions (view) of descriptive or administrative
metadata, or Dublin Core.
7Disseminators in a Granular Architecture
- Aggregation Objects primarily have disseminators
that provide access to a set of Work Objects. - Examples
- getCollection presents the default view of the
collection. - getMembers returns a list of members all or
only those which meet criteria expressed through
an XPath statement. - getPids returns a list of all PIDS for member
Work Objects. - viewDublin Core presents styled Dublin Core data
for the aggregation.
8Content Model Development
- Not surprisingly, the first step is the
identification of functional requirements - Coordinated in a format-by-format basis involving
research and development staff, repository
operations staff, and public services staff who
are specific format experts. - Identify all functional requirements for end-user
use, including default delivery options,
interaction with the objects through the web
interface, and downloading possibilities.
9Content Model Development
- Inventorying of digital media
- Detailed review of the legacy collections as to
media formats, configurations of media files,
available metadata, and metadata format.
10Content Model Development
- Development of production standards
- Production standards are developed out of the
inventory of legacy materials and metadata and
the wish list for functional requirements, where
we identify the appropriate formats and
configuration of files to enable the desired
functionality. - The standards must jibe with what we have, what
we can migrate legacy collections to, and
generate in a scaleable way in all future
production. - http//www.lib.virginia.edu/digital/reports/uvalib
_production_standards.htm - http//www.lib.virginia.edu/digital/metadata/
11Content Model Development
- Translation of the output of all those processes
into content models and disseminators - The key to developing content models is limiting
the number of content models you will have for
each format class, such as images or electronic
texts or video. Multiple content models for each
format class are often needed. - Each content model for each format class
corresponds to a necessary variation of the
configuration of media files, which require
different mechanisms for the behaviors and
therefore constitute a different content model. - All content models have a default disseminator,
and metadata disseminator, and an asset
definition disseminator, plus disseminators that
are specific to the media type.
12Content Model Development
- Translation of functional specifications into
content models and disseminators - Each functional requirement for the end-user
interface translates to one or more disseminators
with behaviors that objects should be able to
present, such as delivering subsets of their
content or metadata, delivery of static files or
on-the-fly transformations (such as raw XML
delivery versus delivery of XHTML generated from
the same XML file), or supporting the download of
image files. - Great care was needed to confirm that the desired
functionality for the search and delivery in the
public interface matched behaviors in the content
models, and that the correct number and types of
media file and metadata datastreams were present
to support those behaviors.
13UVa Content Models
- Media Objects
- Bitonal
- 1 Bitonal TIFF datastream
- 1 static JPEG thumbnail datastream
- HighRes
- 1 MRSID datastream
- 1 static JPEG screen size datastream
- 1 static JPEG thumbnail datastream
- LowRes
- 1 JPEG screen size datastream
- 1 JPEG thumbnail datastream
- Work Objects
- GDMS (image metadata for works)
- EAD (metadata describing archival collection)
- GenText (TEI transcription only)
- Book (TEI transcription and page images)
- PageBook (page images only)
- Aggregation Objects Still under review for
final content models
14Supporting it all Production Workflows
- Translation of the production standards and
configuration of content models into scaleable,
sustainable workflows. - Coordinated by research and development staff,
repository operations staff, and production staff.
15Production Workflows
- Subject librarians select all content, with the
additional step of a technical assessment.
Assessment includes review of media formats and
metadata against production standards. - After the assessment, new production and legacy
collection migration are prioritized. - Ease of production, e.g., existing workflows in
place - Need for metadata enrichment or normalization
- Time constraints, such as instructional deadlines
- Funding availability for staff and infrastructure
- Workflows are in place for Images, Electronic
Texts, and EAD Finding Aids. This includes all
steps for handoffs of materials for digitization,
digitization, cataloging and export of cataloging
records from native systems to XML, programmatic
creation of administrative and technical
metadata, QA, handoff to repository operations,
object building and ingest, and indexing for
delivery.
16(No Transcript)
17(No Transcript)
18Ongoing Process
- Review and revision of what weve done
architecture, standards, and disseminators. - Dont be afraid to assess and update.
- Still adding new format types video, datasets,
and printed music are under way, with audio and
GIS next.