
1
Implementing PREMIS in Container Formats
  • Rebecca Guenther, Library of Congress
  • rgue@loc.gov
  • Zhiwu Xie, Los Alamos National Laboratory
  • zxie@lanl.gov
  • IS&T's Archiving 2007
  • Arlington, VA, May 23, 2007

2
OUTLINE
  • Introduction
  • OAIS Reference Model and Containers
  • METS
  • MPEG-21 DID
  • Implementing PREMIS in METS
  • Implementing PREMIS in MPEG-21 DID
  • Summary

3
Digital preservation imperative and challenge
  • More and more of the scholarly and cultural record
    exists in digital form; steps must be taken to
    secure its long-term future
  • Significant progress has been made in raising
    awareness of the digital preservation imperative
  • Shift in focus from articulating the problem to
    solving it
  • Not so much "Why is digital preservation
    important?" but "What must be done to achieve
    preservation objectives?"
  • Many practical challenges in implementing
    reliable, sustainable digital preservation
    programs
  • One key challenge: preservation metadata

4
PREMIS Working Group
  • June 2003: OCLC and RLG sponsored a new international
    working group
  • PREMIS: Preservation Metadata: Implementation
    Strategies
  • Membership
      • More than 30 experts from 5 countries, representing
        libraries, museums, archives, government
        agencies, and the private sector
      • Co-chairs: Priscilla Caplan (FCLA), Rebecca
        Guenther (LC)
  • Objective 1: Identify and evaluate alternative
    strategies for encoding, storing, managing, and
    exchanging preservation metadata
      • PREMIS Survey Report (September 2004)
      • Snapshot of current practices/emerging trends
        related to managing and using preservation
        metadata in digital archiving systems
      • http://www.oclc.org/research/projects/pmwg/surveyreport.pdf
  • Objective 2: Define implementable, core
    preservation metadata, with guidelines/recommendations
    for management and use

5
PREMIS Data Model
[Figure: diagram of the five PREMIS entities: Intellectual Entities, Rights, Agents, Objects, and Events]
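As a rough illustration of the data model, the five entities and the links between them can be sketched as Python dataclasses. This is a simplified, hypothetical model: the class and field names are assumptions, and it keeps only identifiers and cross-entity links (objects point to intellectual entities; events and rights statements point to objects and agents), not the full set of PREMIS semantic units.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified classes: only identifiers and the links between
# entities are modelled, not the full PREMIS semantic units.
@dataclass
class IntellectualEntity:
    identifier: str


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class PremisObject:
    identifier: str
    category: str  # "representation", "file" or "bitstream"
    intellectual_entities: List[str] = field(default_factory=list)


@dataclass
class Event:
    identifier: str
    event_type: str
    objects: List[str] = field(default_factory=list)  # linked object identifiers
    agents: List[str] = field(default_factory=list)   # linked agent identifiers


@dataclass
class RightsStatement:
    identifier: str
    objects: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)


def events_for_object(object_id: str, events: List[Event]) -> List[str]:
    """Return the identifiers of the events that link to the given object."""
    return [e.identifier for e in events if object_id in e.objects]


if __name__ == "__main__":
    page = PremisObject("FID1", "file", ["book-1"])
    ingest = Event("echo12345", "ingestion", objects=["FID1"], agents=["agent-1"])
    print(events_for_object(page.identifier, [ingest]))  # ['echo12345']
```

Keeping links as identifier lists, rather than nested objects, mirrors the way PREMIS entities refer to each other by identifier instead of containing each other.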

6
PREMIS Data Dictionary
  • May 2005: Data Dictionary for Preservation
    Metadata: Final Report of the PREMIS Working
    Group
  • 237-page report includes:
      • PREMIS Data Dictionary 1.0
      • Context/assumptions, data model, usage examples
      • Set of XML schemas to support implementation
  • Data Dictionary
      • Comprehensive view of information needed to
        support digital preservation
      • Guidelines/recommendations to support creation,
        use, and management
      • Used the OCLC/RLG preservation metadata Framework
        as a starting point
      • Based on a deep pool of institutional experience
        in setting up and managing operational capacity
        for digital preservation
  • Received the 2005 Digital Preservation Award (UK)
    and the 2006 Society of American Archivists
    Publication Award

http://www.oclc.org/research/projects/pmwg/premis-final.pdf

7
Some guiding principles
  • Implementable, core preservation metadata
  • Preservation metadata: maintains viability,
    renderability, understandability, authenticity,
    and identity in a preservation context
  • Core: what most preservation repositories need
    to know to preserve digital materials over the
    long term
  • Implementable: rigorously defined, supported by
    usage guidelines/recommendations, with emphasis on
    automated workflows
  • Technical neutrality
      • Digital archiving system: no assumptions about
        specific archiving technology, system/DB
        architectures, or preservation strategy
      • Metadata management: no assumptions about whether
        metadata is stored locally or in an external
        registry, recorded explicitly or known
        implicitly, or instantiated in one metadata element
        or in multiple elements
      • Promotes flexibility and applicability in a wide range
        of contexts

8
Scope
  • What the PREMIS DD is
      • A common data model for organizing/thinking about
        preservation metadata
      • Guidance for local implementations
      • A standard for exchanging information packages
        between repositories
  • What the PREMIS DD is not
      • An out-of-the-box solution: its semantic units need
        to be instantiated as metadata elements in a
        repository system
      • All needed metadata: it excludes business rules,
        format-specific technical metadata, descriptive
        metadata for access, and non-core preservation
        metadata
      • Lifecycle management of objects outside the
        repository
      • Rights management: it is limited to permissions
        regarding actions taken within the repository

9
PREMIS Maintenance Activity
  • Web site
      • Permanent Web presence, hosted by the
        Library of Congress
      • Central destination for PREMIS-related
        info, announcements, and resources
      • Home of the PREMIS Implementers Group (PIG)
        discussion list
  • PREMIS Editorial Committee
      • Sets directions/priorities for PREMIS development
      • Coordinates future revisions of the Data Dictionary
        and XML schema
      • Membership: Library of Congress, OCLC, FCLA,
        National Archives of Scotland, British Library,
        National Library of Australia, U. of Goettingen,
        LANL, Ex Libris, Library and Archives Canada

http://www.loc.gov/standards/premis/

10
OAIS Reference Model and Containers

11
OAIS Reference Model
  • Developed by the Consultative Committee for Space
    Data Systems (CCSDS)
  • ISO 14721:2003
  • A functional model for preservation activities
  • An information model specifying types of
    information required for long-term preservation

12
OAIS Reference Model and PREMIS
  • OAIS reference model specifies the Preservation
    Description Information (PDI)
  • PREMIS used the OAIS information model as a
    starting point
  • PREMIS Data Dictionary consolidated and further
    developed the conceptual types of information
    objects into more than 100 structured and
    logically integrated semantic units.
  • PREMIS Data Dictionary provided detailed
    descriptions and guidelines to implement these
    semantic units.
  • PREMIS Data Dictionary does not provide semantic
    units for Intellectual Entities, but provides
    semantic units to link to other metadata sources
    for Intellectual Entities
  • All entities have reference (identification)
    information.
  • PREMIS defines no packaging information linking
    content with metadata, but it can be used with
    container schemas
  • PREMIS deals mostly with representation, context,
    provenance, and fixity information, in keeping
    with PREMIS definition of preservation metadata.

13
PREMIS XML schemas
  • One schema for each PREMIS entity in the data model
      • Allows the user to choose which parts of PREMIS to
        use
  • PREMIS container schema
      • References the schema for each entity type
      • Provides a container if it is desirable to keep
        some or all PREMIS metadata together
      • If the container is used, it requires at least one
        object, which in turn requires objectIdentifier and
        objectCategory
      • Individual schemas may be used alone or with the
        container
  • Semantic units in PREMIS schemas
      • The XML is faithful to the data dictionary
      • Only those units mandatory for all categories of
        objects are mandatory in the object schema
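A minimal sketch of the container rule, using Python's standard xml.etree.ElementTree. It matches elements by local name so that it does not depend on a particular namespace URI; the namespace URI and identifier values in the sample are assumptions for illustration, and a check like this is no substitute for validating against the PREMIS schemas.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching works with any namespace URI."""
    return tag.rsplit("}", 1)[-1]


def check_premis_container(xml_text: str) -> list:
    """Return the problems found; an empty list means the minimal rule is met.

    Rule checked: the container holds at least one object, and every object
    has objectIdentifier and objectCategory children.
    """
    root = ET.fromstring(xml_text)
    problems = []
    objects = [el for el in root if local(el.tag) == "object"]
    if not objects:
        problems.append("container has no object")
    for i, obj in enumerate(objects):
        children = {local(child.tag) for child in obj}
        for required in ("objectIdentifier", "objectCategory"):
            if required not in children:
                problems.append(f"object {i}: missing {required}")
    return problems


# Hypothetical sample; the namespace URI is an assumption.
SAMPLE = """<premis xmlns="http://www.loc.gov/standards/premis">
  <object>
    <objectIdentifier>
      <objectIdentifierType>local</objectIdentifierType>
      <objectIdentifierValue>FID1</objectIdentifierValue>
    </objectIdentifier>
    <objectCategory>file</objectCategory>
  </object>
</premis>"""

if __name__ == "__main__":
    print(check_premis_container(SAMPLE))  # []
```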

14
Need a Container in the XML Implementation
  • Archival Information Package (AIP) may include
    much more metadata besides the preservation
    metadata
  • A well-defined container is usually necessary to
    group and appropriately associate these metadata
    with the data object
  • For example, METS or MPEG-21 DID

15
  • METS records the (possibly hierarchical)
    structure of digital objects, the names and
    locations of the files that comprise those
    objects, and the associated metadata
  • A METS document may be a unit of storage (e.g., an
    OAIS AIP) or a transmission format (e.g., an OAIS SIP
    or DIP)
  • METS is extensible and modular
  • METS uses extension wrappers or sockets where
    elements from other schemas can be plugged in
  • METS uses the XML Schema facility for combining
    vocabularies from different Namespaces
  • The METS Editorial Board has endorsed PREMIS as
    an extension schema
  • Many institutions are trying to use PREMIS within the
    METS context

16
The structure of a METS file

17
OAIS and METS
[Diagram: OAIS Archival Information Package components (Packaging, Descriptive, Content, Preservation Description, Reference, Context, Provenance, Fixity, and Representation Information; Data Object; Structure; Semantics; File formats) mapped to METS elements (<METS>, <dmdSec>, <amdSec>, <techMD>, <rightsMD>, <digiprovMD>, <sourceMD>, <mdRef>, <fileGrp>, <file>, <structMap>) and to extension schemas (MODS, MARCXML, DC, metsRights, premis:rights, premis:event, premis:object, textMD, MIX)]

18
METS extension schemas
  • Wrappers or sockets where elements from other
    schemas can be plugged in
      • Provide extensibility
      • Use the XML Schema facility for combining
        vocabularies from different namespaces
  • Endorsed extension schemas
      • Descriptive: MODS, DC, MARCXML
      • Technical metadata: MIX (images), textMD (text)
      • Preservation-related: PREMIS

19
Issues in using PREMIS with METS
  • Which METS sections to use and how many
  • Whether to record elements redundantly in PREMIS
    that are defined explicitly in the METS schema
  • How to record elements that are also part of a
    format-specific technical metadata schema (e.g.,
    MIX)
  • Recording structural relationships
  • How to deal with locally controlled vocabularies
  • Whether to use the PREMIS container

20
PREMIS and METS sections
  • Flexibility of METS requires implementation
    decisions
  • You can't put all PREMIS metadata directly under
    amdSec
  • What sections to use for PREMIS metadata?
      • Alternative 1
          • Object in techMD
          • Event in digiprovMD
          • Rights in rightsMD
          • Agent with Event or Rights
      • Alternative 2
          • Everything in digiprovMD
      • Alternative 3
          • Everything in techMD
  • How many administrative metadata sections to use?
  • Experimentation will result in best practices
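One way to make a local choice among the three alternatives explicit and testable is a lookup table. This is a hypothetical helper: the table and function names are not part of METS or PREMIS. Under Alternative 1 an agent has no section of its own, so the caller says whether it belongs with an event or a rights statement.

```python
# Hypothetical lookup table for the three alternatives on the slide.
ALTERNATIVES = {
    1: {"object": "techMD", "event": "digiprovMD", "rights": "rightsMD"},
    2: {"object": "digiprovMD", "event": "digiprovMD", "rights": "digiprovMD"},
    3: {"object": "techMD", "event": "techMD", "rights": "techMD"},
}


def section_for(entity: str, alternative: int, agent_context: str = None) -> str:
    """Return the amdSec subsection that holds a PREMIS entity.

    entity: "object", "event", "rights" or "agent".
    agent_context: "event" or "rights"; needed only for agents under Alternative 1.
    """
    table = ALTERNATIVES[alternative]
    if entity == "agent":
        if alternative == 1:
            if agent_context not in ("event", "rights"):
                raise ValueError("Alternative 1 places an agent with an event or rights")
            return table[agent_context]
        # Alternatives 2 and 3 use a single section type for everything.
        return table["object"]
    return table[entity]


if __name__ == "__main__":
    print(section_for("agent", 1, agent_context="event"))  # digiprovMD
```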

21
Inserting technical metadata in a METS Document
<mets>
  <amdSec>
    <techMD>
      <mdWrap>
        <xmlData>
          <!-- insert data from different namespace here -->
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <fileSec/>
  <structMap/>
</mets>
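The same skeleton can be generated with ElementTree. This sketch uses the METS namespace URI http://www.loc.gov/METS/; the ID value, the "OTHER" MDTYPE default, and the example.org payload namespace are hypothetical, and like the slide the result is a skeleton, not a schema-valid METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag: str) -> str:
    """Qualify a METS element name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def wrap_tech_md(payload: ET.Element, md_id: str, mdtype: str = "OTHER") -> ET.Element:
    """Build a METS skeleton whose techMD wraps an element from another namespace."""
    mets = ET.Element(m("mets"))
    amd_sec = ET.SubElement(mets, m("amdSec"))
    tech_md = ET.SubElement(amd_sec, m("techMD"), ID=md_id)
    md_wrap = ET.SubElement(tech_md, m("mdWrap"), MDTYPE=mdtype)
    xml_data = ET.SubElement(md_wrap, m("xmlData"))
    xml_data.append(payload)  # data from a different namespace goes here
    ET.SubElement(mets, m("fileSec"))
    ET.SubElement(mets, m("structMap"))
    return mets


if __name__ == "__main__":
    # Hypothetical payload namespace, standing in for MIX, textMD, PREMIS, etc.
    width = ET.Element("{http://example.org/techmd}imageWidth")
    width.text = "1024"
    print(ET.tostring(wrap_tech_md(width, "TMD1"), encoding="unicode"))
```

Because the wrapped element keeps its own namespace, the serializer declares both namespaces and the two vocabularies stay distinguishable inside one document.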

22
<fileSec><fileGrp>
  <file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT"
        CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1">
    <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
  </file>
</fileGrp></fileSec>
<techMD ID="TMD1PREMIS">
  <mdWrap MDTYPE="PREMIS">
    <xmlData>
      <premis:object>
        <objectCharacteristics>
          <fixity>
            <messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>
            <messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</messageDigest>
            <messageDigestOriginator>EchoDep</messageDigestOriginator>
          </fixity>
          <size>184302</size>
        </objectCharacteristics>
  • Elements defined in both METS and PREMIS
      • METS: CHECKSUM, CHECKSUMTYPE
          • attributes of <file>
          • not repeatable
      • PREMIS: fixity
          • also includes messageDigestOriginator
          • repeatable (allows multiple digests)
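If the digest is recorded redundantly in both places, the two copies can drift apart. A minimal sketch of a consistency check, assuming SHA-1 as in the example; the function names are hypothetical and the check matches PREMIS elements by local name only. It compares the METS CHECKSUM, every SHA-1 value in the (repeatable) PREMIS fixity blocks, and a freshly computed digest of the file content.

```python
import hashlib
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def premis_digests(tech_md: ET.Element) -> list:
    """Collect (algorithm, digest) pairs from every PREMIS fixity block."""
    pairs = []
    for el in tech_md.iter():
        if local(el.tag) == "fixity":
            values = {local(c.tag): (c.text or "").strip() for c in el}
            pairs.append((values.get("messageDigestAlgorithm"), values.get("messageDigest")))
    return pairs


def check_sha1_fixity(file_el: ET.Element, tech_md: ET.Element, content: bytes) -> dict:
    """Compare METS CHECKSUM, PREMIS SHA-1 digests and a freshly computed SHA-1."""
    if file_el.get("CHECKSUMTYPE") != "SHA-1":
        raise ValueError("this sketch only handles CHECKSUMTYPE='SHA-1'")
    computed = hashlib.sha1(content).hexdigest()
    mets_value = (file_el.get("CHECKSUM") or "").lower()
    premis_values = [d.lower() for alg, d in premis_digests(tech_md) if alg == "SHA-1" and d]
    return {
        "mets_matches_content": mets_value == computed,
        "premis_matches_content": computed in premis_values,
        "mets_matches_premis": mets_value in premis_values,
    }


if __name__ == "__main__":
    # SHA-1 of b"abc" is the standard test vector a9993e36...
    digest = "a9993e364706816aba3e25717850c26c9cd0d89d"
    f = ET.fromstring(f'<file CHECKSUM="{digest}" CHECKSUMTYPE="SHA-1"/>')
    t = ET.fromstring(
        "<techMD><fixity><messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>"
        f"<messageDigest>{digest}</messageDigest></fixity></techMD>"
    )
    print(check_sha1_fixity(f, t, b"abc"))
```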

23
<fileSec><fileGrp>
  <file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT" MIMETYPE="image/jpeg">
    <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
  </file>
</fileGrp></fileSec>
<techMD ID="TMD1PREMIS">
  <mdWrap MDTYPE="PREMIS">
    <xmlData>
      <premis:object>
        <objectCharacteristics>
          <format>
            <formatDesignation>
              <formatName>image/jpeg</formatName>
              <formatVersion>1.02</formatVersion>
            </formatDesignation>
          </format>
        </objectCharacteristics>
  • Elements defined in both METS and PREMIS
      • METS: MIMETYPE
          • attribute of <file>
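The same redundancy question applies to the format. A minimal sketch, assuming (as in the example) that PREMIS formatName holds a MIME type; a format name taken from a format registry would need a lookup table instead. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def mimetype_consistent(file_el: ET.Element, tech_md: ET.Element) -> bool:
    """True if METS MIMETYPE is absent or equals one of the PREMIS formatName values.

    Assumes formatName holds a MIME type, as in the slide ("image/jpeg").
    """
    mimetype = file_el.get("MIMETYPE")
    if mimetype is None:
        return True
    names = {
        (el.text or "").strip().lower()
        for el in tech_md.iter()
        if local(el.tag) == "formatName"
    }
    return mimetype.strip().lower() in names


if __name__ == "__main__":
    f = ET.fromstring('<file MIMETYPE="image/jpeg"/>')
    t = ET.fromstring(
        "<techMD><format><formatDesignation><formatName>image/jpeg</formatName>"
        "<formatVersion>1.02</formatVersion></formatDesignation></format></techMD>"
    )
    print(mimetype_consistent(f, t))  # True
```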

24
<fileSec><fileGrp>
  <file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT">
<techMD ID="TMD1PREMIS">
  <linkingEventIdentifier>
    <linkingEventIdentifierType>ECHODEP Hub Event</linkingEventIdentifierType>
    <linkingEventIdentifierValue>echo12345</linkingEventIdentifierValue>
  </linkingEventIdentifier>
<digiprovMD ID="DP1EVENT">
  <premis:event>
    <eventIdentifier>
      <eventIdentifierType>ECHODEP Hub Event</eventIdentifierType>
      <eventIdentifierValue>echo12345</eventIdentifierValue>
    </eventIdentifier>
    <eventType>ingestion</eventType>
    <eventDateTime>2006-05-02T15:12:53</eventDateTime>
  </premis:event>
  • Elements defined in both METS and PREMIS
      • METS ID/IDREF used to associate metadata in
        different sections and for different files
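Two kinds of links are in play here: METS ID/IDREF (the ADMID tokens on <file>) and PREMIS identifier pairs (linkingEventIdentifier pointing to an eventIdentifier). A minimal sketch that checks both; the function names are hypothetical, the PREMIS namespace URI in the sample is an assumption, and the sample is trimmed so that DP1AGENT deliberately has no target section.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def resolve_admids(mets_root: ET.Element, file_id: str) -> dict:
    """Map each ADMID token of a <file> to the local name of the section it
    points to; None marks a dangling IDREF."""
    by_id = {el.get("ID"): el for el in mets_root.iter() if el.get("ID")}
    tokens = by_id[file_id].get("ADMID", "").split()
    return {t: local(by_id[t].tag) if t in by_id else None for t in tokens}


def identifier_pairs(root: ET.Element, wrapper: str, type_tag: str, value_tag: str) -> set:
    """Collect (type, value) pairs from every <wrapper> element below root."""
    pairs = set()
    for el in root.iter():
        if local(el.tag) == wrapper:
            values = {local(c.tag): (c.text or "").strip() for c in el}
            pairs.add((values.get(type_tag), values.get(value_tag)))
    return pairs


def unresolved_event_links(mets_root: ET.Element) -> set:
    """Return linkingEventIdentifier pairs that match no PREMIS eventIdentifier."""
    events = identifier_pairs(mets_root, "eventIdentifier",
                              "eventIdentifierType", "eventIdentifierValue")
    links = identifier_pairs(mets_root, "linkingEventIdentifier",
                             "linkingEventIdentifierType", "linkingEventIdentifierValue")
    return links - events


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <amdSec>
    <techMD ID="TMD1PREMIS"><mdWrap MDTYPE="PREMIS"><xmlData>
      <object xmlns="http://www.loc.gov/standards/premis">
        <linkingEventIdentifier>
          <linkingEventIdentifierType>ECHODEP Hub Event</linkingEventIdentifierType>
          <linkingEventIdentifierValue>echo12345</linkingEventIdentifierValue>
        </linkingEventIdentifier>
      </object>
    </xmlData></mdWrap></techMD>
    <digiprovMD ID="DP1EVENT"><mdWrap MDTYPE="PREMIS"><xmlData>
      <event xmlns="http://www.loc.gov/standards/premis">
        <eventIdentifier>
          <eventIdentifierType>ECHODEP Hub Event</eventIdentifierType>
          <eventIdentifierValue>echo12345</eventIdentifierValue>
        </eventIdentifier>
        <eventType>ingestion</eventType>
      </event>
    </xmlData></mdWrap></digiprovMD>
  </amdSec>
  <fileSec><fileGrp>
    <file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT"/>
  </fileGrp></fileSec>
</mets>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(resolve_admids(root, "FID1"))
    print(unresolved_event_links(root))  # set()
```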

25
<structMap TYPE="physical">
  <div ORDER="1" TYPE="text">
    <fptr FILEID="FID9"/>
    <div ORDER="1" TYPE="page" LABEL="Page 1">
      <fptr FILEID="FID1"/>
    </div>
    <div ORDER="2" TYPE="page" LABEL="Page 2">
      <fptr FILEID="FID2"/>
    </div>
  </div>
<relationship>
  <relationshipType>structural</relationshipType>
  <relationshipSubType>is sibling of</relationshipSubType>
  <relatedObjectIdentification>
    <relatedObjectIdentifierType>UCB</relatedObjectIdentifierType>
    <relatedObjectIdentifierValue>FID2</relatedObjectIdentifierValue>
    <relatedObjectSequence>1</relatedObjectSequence>
  • Elements defined in both METS and PREMIS
      • METS: structMap
      • PREMIS: relationship
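If structural relationships are recorded in the structMap, PREMIS sibling relationships could be derived from it rather than maintained by hand. A minimal sketch, assuming that "sibling" means files referenced from divs that share a parent div; this interpretation, the function names, and the dict form of the relationship are assumptions, and relatedObjectSequence is not derived.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DIV = f"{{{METS_NS}}}div"
FPTR = f"{{{METS_NS}}}fptr"


def sibling_relationships(struct_map: ET.Element) -> list:
    """Return (file_id, sibling_file_id) pairs for files whose divs share a parent div."""
    pairs = []
    for parent in struct_map.iter(DIV):
        # FILEIDs pointed to by the direct child divs of this div
        child_files = [fptr.get("FILEID")
                       for child in parent.findall(DIV)
                       for fptr in child.findall(FPTR)]
        pairs.extend((a, b) for a in child_files for b in child_files if a != b)
    return pairs


def as_premis_relationship(related_id: str) -> dict:
    """Hypothetical dict form of the PREMIS relationship shown on the slide."""
    return {
        "relationshipType": "structural",
        "relationshipSubType": "is sibling of",
        "relatedObjectIdentifierValue": related_id,
    }


SAMPLE = """<structMap xmlns="http://www.loc.gov/METS/" TYPE="physical">
  <div ORDER="1" TYPE="text">
    <fptr FILEID="FID9"/>
    <div ORDER="1" TYPE="page" LABEL="Page 1"><fptr FILEID="FID1"/></div>
    <div ORDER="2" TYPE="page" LABEL="Page 2"><fptr FILEID="FID2"/></div>
  </div>
</structMap>"""

if __name__ == "__main__":
    for a, b in sibling_relationships(ET.fromstring(SAMPLE)):
        print(a, as_premis_relationship(b))
```

Deriving one representation from the other is one way to avoid the two copies disagreeing; whether to store the derived PREMIS relationships at all is the redundancy decision raised on the earlier "Issues" slide.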

26
How to record elements from 2 different technical
metadata schemas
  • Format specific metadata may be included in
    addition to PREMIS general technical metadata
  • Use multiple techMD sections and specify the source
    in the MDTYPE attribute and/or namespace declaration
      • e.g., MDTYPE="NISOIMG" or MDTYPE="PREMIS"
  • Give the MIX schema declaration in the METS document
  • MIX was recently revised to correspond with the
    revision of the Z39.87 technical metadata for
    digital still images standard; names were harmonized
    with the corresponding PREMIS semantic units
  • For digital still images, best practice may be to
    use PREMIS for the general semantic units it defines
    and MIX for format-specific units, without
    redundancy
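When a file points to several techMD sections, a consumer can use the MDTYPE attribute to tell the PREMIS section from the MIX one. A minimal sketch, with a hypothetical function name and a trimmed sample document; sections other than techMD (such as the digiprovMD here) are skipped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

METS_NS = "http://www.loc.gov/METS/"


def tech_md_by_type(mets_root: ET.Element, file_id: str) -> dict:
    """Group the techMD sections referenced by a file's ADMID by their mdWrap MDTYPE."""
    by_id = {el.get("ID"): el for el in mets_root.iter() if el.get("ID")}
    groups = defaultdict(list)
    for ref in by_id[file_id].get("ADMID", "").split():
        section = by_id.get(ref)
        if section is None or section.tag != f"{{{METS_NS}}}techMD":
            continue  # dangling reference or not a techMD section
        wrap = section.find(f"{{{METS_NS}}}mdWrap")
        mdtype = wrap.get("MDTYPE") if wrap is not None else None
        groups[mdtype].append(ref)
    return dict(groups)


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <amdSec>
    <techMD ID="TMD1PREMIS"><mdWrap MDTYPE="PREMIS"><xmlData/></mdWrap></techMD>
    <techMD ID="TMD1MIX"><mdWrap MDTYPE="NISOIMG"><xmlData/></mdWrap></techMD>
    <digiprovMD ID="DP1EVENT"><mdWrap MDTYPE="PREMIS"><xmlData/></mdWrap></digiprovMD>
  </amdSec>
  <fileSec><fileGrp>
    <file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT"/>
  </fileGrp></fileSec>
</mets>"""

if __name__ == "__main__":
    print(tech_md_by_type(ET.fromstring(SAMPLE), "FID1"))
```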

27
MPEG-21 Digital Item Declaration (DID)
  • ISO/IEC 21000-2 Digital Item Declaration
  • A promising alternative to represent Digital
    Objects
  • Beginning to be supported by some repositories,
    e.g., aDORe, DSpace, Fedora
  • A flexible and expressive model that easily
    represents compound objects (recursive item)
  • Attach well-formed XML from persistent namespaces
    as metadata
  • Strong industry support

28
Abstract Model for MPEG-21 DID
[Figure: the MPEG-21 DID abstract model, a container holding nested items, components, resources, and descriptor/statement constructs]
  • Container: a grouping of items and the
    descriptor/statement constructs pertaining to the
    container
  • Item: represents a Digital Item (aka Digital
    Object, aka asset); descriptor/statement constructs
    convey information about the Digital Item
  • Component: binding of descriptor/statements to
    datastreams
  • Resource: a datastream
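The recursive containment in this model can be sketched as Python dataclasses. This is a simplified, hypothetical rendering of the slide's definitions (class and field names are assumptions, and other DID constructs such as choices and anchors are omitted); the point is that items nest recursively while components bind descriptors to resources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Descriptor:
    statement: str  # e.g. a serialized XML metadata record


@dataclass
class Resource:
    ref: str        # location of the datastream
    mime_type: str


@dataclass
class Component:
    # Binds descriptor/statements to one or more datastreams.
    descriptors: List[Descriptor] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Item:
    descriptors: List[Descriptor] = field(default_factory=list)
    items: List["Item"] = field(default_factory=list)  # items nest recursively
    components: List[Component] = field(default_factory=list)


@dataclass
class Container:
    descriptors: List[Descriptor] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)


def all_resources(item: Item) -> List[Resource]:
    """Depth-first list of every resource reachable from an item."""
    found = [r for c in item.components for r in c.resources]
    for child in item.items:
        found.extend(all_resources(child))
    return found


if __name__ == "__main__":
    page1 = Component(resources=[Resource("BXF22.JPG", "image/jpeg")])
    page2 = Component(resources=[Resource("BXF23.JPG", "image/jpeg")])  # hypothetical
    book = Item(items=[Item(components=[page1]), Item(components=[page2])])
    print([r.ref for r in all_resources(book)])  # ['BXF22.JPG', 'BXF23.JPG']
```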

29
Implementing PREMIS in DID
  • The DID abstract model is an object-centric
    containment model
  • Semantically, descriptor/statement constructs
    at a given level are metadata about the DID
    container, item, or component at that level
  • Descriptors/statements about the DID container
    map to OAIS packaging information and are
    therefore out of PREMIS scope
  • Rights, Agents, and Events in the PREMIS model
    are linked to the objects, but are not about the
    objects
  • However, the PREMIS metadata as a whole
    (premis:premis) is about an object (the target
    of preservation)

30
Mapping
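[Diagram: a DID containing DIDInfo and a top-level object (object1) with nested objects object2, object3, and object4. object1 carries a premis:premis descriptor; the nested objects carry premis:object descriptors and resources.]
Note on the premis:premis descriptor: all rights, events, and agents go here. The top-level object goes here. Other objects may be duplicated here or linked here.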
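A minimal sketch of this mapping with ElementTree, assuming the MPEG-21 DIDL namespace urn:mpeg:mpeg21:2002:02-DIDL-NS. Modelling the lower-level objects as Components (rather than nested Items), the "text/xml" Statement mimeType, and the helper names are all assumptions; the PREMIS namespace URI in the usage lines is likewise assumed.

```python
import xml.etree.ElementTree as ET

DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def d(tag: str) -> str:
    """Qualify a DIDL element name."""
    return f"{{{DIDL_NS}}}{tag}"


def add_descriptor(parent: ET.Element, metadata: ET.Element) -> ET.Element:
    """Attach an XML metadata element to parent as Descriptor/Statement."""
    descriptor = ET.SubElement(parent, d("Descriptor"))
    statement = ET.SubElement(descriptor, d("Statement"), mimeType="text/xml")
    statement.append(metadata)
    return descriptor


def build_didl(premis_root: ET.Element, files: list) -> ET.Element:
    """The top-level Item carries the whole premis:premis record; each file
    becomes a Component with its own premis:object and a Resource.

    files: list of (ref, mime_type, premis_object_element) tuples.
    """
    didl = ET.Element(d("DIDL"))
    top = ET.SubElement(didl, d("Item"))
    add_descriptor(top, premis_root)  # rights, events, agents, top-level object
    for ref, mime_type, premis_object in files:
        component = ET.SubElement(top, d("Component"))
        add_descriptor(component, premis_object)  # descriptors precede the resource
        ET.SubElement(component, d("Resource"), mimeType=mime_type, ref=ref)
    return didl


if __name__ == "__main__":
    P = "http://www.loc.gov/standards/premis"  # assumed namespace URI
    didl = build_didl(ET.Element(f"{{{P}}}premis"),
                      [("BXF22.JPG", "image/jpeg", ET.Element(f"{{{P}}}object"))])
    print(ET.tostring(didl, encoding="unicode"))
```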

31
Partial Implementation in DID
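When metadata are not sufficient to form the top-level PREMIS elements, a partial implementation may be done if PREMIS elements are globally defined.
[Diagram: the same DID structure (DIDInfo, object1 with premis:premis, nested object2, object3, and object4 with resources), with lower-level objects carrying individual PREMIS elements such as premis:significantProperties, premis:creatingApplication, and premis:format instead of complete premis:object records]

32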
Examples of PREMIS in XML containers
  • PREMIS in METS
  • Portrait of Louis Armstrong (Library of Congress)
  • PREMIS in MPEG DID
  • aDORe example (LANL)

33
Proposed schema changes for new version
  • Define an abstract object type to allow for
    better validation of object category
    (representation, file, bitstream)
  • Define elements and types globally to allow for
    reuse
  • Implement an extensibility mechanism to provide
    for further structure when needed
  • Implement a mechanism to use controlled
    vocabularies
  • Adjust schemas to support changes in version 2 of
    data dictionary

34
Summary: container formats
  • A container format is needed to package together
    all forms of metadata (of which PREMIS is one)
    and digital content
  • Use of a container is compatible with, and an
    implementation of, the OAIS information package
    concept
  • Co-existence with other types of metadata
    requires best practices for both approaches;
    redundancy seems to be preferred
  • Changes to the next version of the PREMIS XML
    schemas will facilitate a phased approach to full
    PREMIS implementation
  • Development of registries (informal or formal)
    for controlled vocabularies will benefit
    implementation

35
Summary: METS vs. MPEG-21 DID
  • METS and MPEG-21 DID are similar types of container
    formats: both are expressed in XML, both
    represent the structure of digital objects, and
    both include metadata
  • MPEG-21 DID doesn't have the segmentation into
    metadata sections that METS does, so this
    implementation decision need not be made in DID
  • METS is open source and developed through open
    discussion, mainly within the cultural heritage community
  • MPEG-21 DID is an ISO standard and has industry
    support, but is often implemented in a
    proprietary way, and its standards development is
    closed
  • It would be possible to transform a METS
    container to an MPEG-21 DID and vice versa;
    development of stylesheets will enable such
    transformations
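The slide anticipates stylesheets (XSLT) for this. As an illustration of the kind of mapping involved, here is a minimal Python sketch of one direction only, METS to DIDL; it is not a stylesheet and not a complete crosswalk. The mapping choices are assumptions: each METS <file> becomes a DIDL Component, records wrapped in the sections named by ADMID become Descriptor/Statement, and FLocat/@xlink:href and @MIMETYPE become Resource attributes. Structural maps, dmdSec, and the reverse direction are not handled.

```python
import copy
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DIDL = "urn:mpeg:mpeg21:2002:02-DIDL-NS"


def q(ns: str, tag: str) -> str:
    """Build a namespace-qualified element or attribute name."""
    return f"{{{ns}}}{tag}"


def mets_to_didl(mets_root: ET.Element) -> ET.Element:
    """Map every METS <file> to a DIDL Component inside one top-level Item."""
    by_id = {el.get("ID"): el for el in mets_root.iter() if el.get("ID")}
    didl = ET.Element(q(DIDL, "DIDL"))
    item = ET.SubElement(didl, q(DIDL, "Item"))
    for f in mets_root.iter(q(METS, "file")):
        component = ET.SubElement(item, q(DIDL, "Component"))
        # Records wrapped in the sections named by ADMID become descriptors.
        for ref in f.get("ADMID", "").split():
            section = by_id.get(ref)
            if section is None:
                continue
            xml_data = section.find(q(METS, "mdWrap") + "/" + q(METS, "xmlData"))
            if xml_data is None:
                continue
            for record in xml_data:
                descriptor = ET.SubElement(component, q(DIDL, "Descriptor"))
                statement = ET.SubElement(descriptor, q(DIDL, "Statement"), mimeType="text/xml")
                statement.append(copy.deepcopy(record))
        # Location and MIME type become the Resource (after the descriptors);
        # the octet-stream fallback is an assumption.
        flocat = f.find(q(METS, "FLocat"))
        href = flocat.get(q(XLINK, "href"), "") if flocat is not None else ""
        ET.SubElement(component, q(DIDL, "Resource"),
                      mimeType=f.get("MIMETYPE", "application/octet-stream"), ref=href)
    return didl


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <amdSec>
    <techMD ID="TMD1"><mdWrap MDTYPE="PREMIS"><xmlData>
      <object xmlns="http://www.loc.gov/standards/premis"/>
    </xmlData></mdWrap></techMD>
  </amdSec>
  <fileSec><fileGrp>
    <file ID="FID1" MIMETYPE="image/jpeg" ADMID="TMD1">
      <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
    </file>
  </fileGrp></fileSec>
</mets>"""

if __name__ == "__main__":
    print(ET.tostring(mets_to_didl(ET.fromstring(SAMPLE)), encoding="unicode"))
```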