What will you need to know - PowerPoint PPT Presentation

1
What will you need to know?
  • The role of metadata in keeping digital content
    alive

Robin Wendler, Harvard University Library
November 2, 2005
r_wendler_at_harvard.edu
The Crystal Ball. John William Waterhouse.
Private collection
2
Let me count the ways digital stuff goes bad
  • Media become obsolete
  • Media decay
  • Formats are superseded
  • Proprietary formats may be orphaned
  • Hardware breaks
  • Software is orphaned
  • Encryption may hinder preservation
  • User requirements change

3
How will you know?
  • Preservation Planning
  • Monitor your data through metadata for
      • Integrity
      • Renderability
      • Understandability
      • Authenticity
      • Identity
      • Responsibility
  • Monitor the community
      • Format support
      • Requirements

4
What will you do?
  • Identify materials at risk
  • Analyze options
  • Categorize objects
  • Formal characteristics
  • Purpose
  • Antecedents
  • Communicate with owners
  • Perform preservation actions
  • Create audit trail

All utilize and/or generate metadata
5
Gradual understanding
  • OAIS (1st workshop 1995; Blue Book 2002)
      • http://www.ccsds.org/docu/dscgi/ds.py/Get/File-143/650x0b1.pdf
  • NLA PANDORA (1996-)
      • http://pandora.nla.gov.au/index.html
  • CEDARS (1998-2002)
      • http://www.leeds.ac.uk/cedars/index.htm
  • NEDLIB (1998-2000)
      • http://www.kb.nl/coop/nedlib/
  • OCLC/RLG Preservation Framework Working Group
    (2000-2001)
      • http://www.oclc.org/research/projects/pmwg/wg1.htm
  • PREMIS (2003-2005)
      • http://www.oclc.org/research/projects/pmwg/

6
Preservation Metadata
  • the information necessary to carry out,
    document and evaluate the processes that support
    the long-term retention and accessibility of
    digital content.
  • Moving digital objects and their metadata
    across space and time requires standard
    mechanisms for encoding and exchange
  • Brian Lavoie
  • Viewed from a preservation lens, all metadata is
    preservation metadata
  • Categories of metadata overlap; a single piece of
    metadata can serve many purposes

7
OAIS Functional Model
Archival Information Systems are permeated by
metadata. Metadata is the difference between a
repository and just files on a disk.
8
OAIS Information Model
9
OAIS Content Information Framework, Expanded by
OCLC/RLG WG
OAIS Model
OCLC/RLG Extensions
Still a framework, not usable, defined elements
10
OAIS Preservation Description Information
Framework
  • Reference: provides identifiers and describes
    mechanisms by which ids are assigned
  • Context: documents relationships of content to
    its environment (why created, other formats,
    editions)
  • Provenance: documents the history, changes,
    custody of content
  • Fixity: documents data integrity checks or
    validation and verification keys to ensure no
    unauthorized changes
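
The Fixity entry on this slide can be made concrete with a message digest. What follows is a minimal sketch, not something prescribed by OAIS: the choice of SHA-256 and the helper names `file_digest` and `verify_fixity` are assumptions made for illustration.

```python
import hashlib
import os
import tempfile

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex message digest of the file at `path`."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(path, recorded_digest, algorithm="sha256"):
    """True if the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == recorded_digest

# Demonstration on a throwaway file (hypothetical content)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"archival content")
    demo_path = tmp.name

recorded = file_digest(demo_path)            # stored as fixity metadata at ingest
unchanged = verify_fixity(demo_path, recorded)

with open(demo_path, "ab") as f:
    f.write(b"!")                            # simulate an unauthorized change
change_detected = not verify_fixity(demo_path, recorded)

os.remove(demo_path)
```

The recorded digest only helps if it is stored separately from the file it describes, which is one reason fixity is treated as metadata rather than as part of the content.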
11
Metadata relevant to preservation
  • Storage management and fixity
  • Technical characteristics
  • Structure
  • Provenance
  • Rights
  • Digital signature trail, where applicable
  • Intellectual access / description

12
PREMIS: Preservation Metadata Implementation
Strategies
  • Surveyed implementation of digital repositories,
    assessed adoption of metadata standards
    (2003/2004)
  • Defined a core set of implementable preservation
    metadata elements (2005)
      • Implementation-independent
      • Explicit or implicit
      • Not reinventing the wheel
          • Descriptive, rights, agents
      • Privilege automatically-suppliable values
  • Defined associated XML schemas
  • Set up ongoing maintenance activity
      • http://www.loc.gov/standards/premis/

13
PREMIS Data Model
Intellectual Entities
Rights
Objects
Agents
Events
14
Importance of object modeling
  • Metadata must adhere to the right thing
  • Representation
      • The set of files, including structural
        metadata, needed for a complete and reasonable
        rendition of an Intellectual Entity.
  • File
      • A named and ordered sequence of bytes that is
        known by an operating system.
  • Bitstream
      • Contiguous or non-contiguous data within a file
        that has meaningful common properties for
        preservation purposes.

Any can express an Intellectual Entity
All are kinds of Objects in PREMIS
All can be affected by Events
Rights adhere to all
15
Sample PREMIS semantic unit
16
Core Object Metadata (Yes, this is to make you
sweat)
  • objectIdentifier
  • preservationLevel
  • objectCategory
  • objectCharacteristics
      • compositionLevel
      • fixity
          • messageDigestAlgorithm
          • messageDigest
          • messageDigestOriginator
      • size
      • format
          • formatDesignation
              • formatName
              • formatVersion
          • formatRegistry
              • formatRegistryName
              • formatRegistryKey
              • formatRegistryRole
      • significantProperties
  • storage
      • contentLocation
      • storageMedium
  • environment
      • environmentCharacteristic
      • environmentPurpose
      • environmentNote
      • dependency
          • dependencyName
          • dependencyIdentifier
      • software
          • swName
          • swVersion
          • swType
          • swOtherInformation
          • swDependency
      • hardware
          • hwName
          • hwType
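
To show how a few of these semantic units might sit together in one record, here is a rough sketch as a nested Python dictionary. Only the key names come from the slide. The nesting, every value, and the helper `missing_units` are assumptions for illustration, and the list of units treated as required is a hypothetical local policy rather than a statement about what PREMIS mandates.

```python
import json

# Hypothetical record: key names are from the slide, values are invented.
premis_object = {
    "objectIdentifier": "example-0001",
    "objectCategory": "file",
    "objectCharacteristics": {
        "compositionLevel": 0,
        "fixity": {
            "messageDigestAlgorithm": "SHA-256",
            "messageDigest": "<hex digest goes here>",
            "messageDigestOriginator": "repository",
        },
        "size": 1048576,
        "format": {
            "formatDesignation": {
                "formatName": "image/tiff",
                "formatVersion": "6.0",
            },
        },
    },
    "storage": {
        "contentLocation": "/archive/example-0001.tif",
        "storageMedium": "disk",
    },
}

def missing_units(obj, required):
    """Return the required top-level unit names absent from `obj`."""
    return [name for name in required if name not in obj]

# Assumption: this repository chooses to require these three units.
required_top_level = ["objectIdentifier", "objectCategory", "objectCharacteristics"]
problems = missing_units(premis_object, required_top_level)

# Controlled, machine-readable values serialize cleanly for exchange.
record_json = json.dumps(premis_object, indent=2)
```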

17
Significant Properties
  • objective technical characteristics
    subjectively considered important, or
    subjectively determined characteristics.
  • Requires identification in advance of what's
    crucial, what might be at risk, and how to codify
    it.

Mondrian. Composition with large red plane,
yellow, black, gray and blue. 1921. Haags
Gemeentemuseum, The Hague
Monet. Waterloo Bridge, London, at Sunset,
1904. Collection of Mr. and Mrs. Paul Mellon.
National Gallery of Art.
18
Rights
  • Different flavors
      • Rights
      • Permissions
      • Licenses
      • Submission Agreements
  • Multiple rights languages
      • XrML (eXtensible rights Markup Language)
          • http://www.xrml.org/
      • ODRL (Open Digital Rights Language)
          • http://odrl.net/
      • Designed to support DRM
      • Complex
      • Patent/licensing issues
  • PREMIS Rights
      • Lightweight
      • Focused on right to preserve
      • Statements, rather than DRM

19
PREMIS Permission Statement
  • permissionStatementIdentifier
  • linkingObject
  • grantingAgent
  • grantingAgreement
  • permissionGranted
      • act
      • restriction
      • termOfGrant
          • startDate
          • endDate
      • permissionNote
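
A statement like this becomes useful when a repository can ask it a question, such as "may I perform this act on this date?". The sketch below is one possible reading, not the PREMIS definition: the class layout, the function `is_permitted`, the act names, and the rule that a missing endDate means an open-ended grant are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class PermissionGranted:
    act: str                                  # e.g. "migrate" (hypothetical value)
    restriction: List[str] = field(default_factory=list)
    startDate: Optional[date] = None          # None: no lower bound (assumption)
    endDate: Optional[date] = None            # None: open-ended grant (assumption)

def is_permitted(permissions, act, on):
    """True if any grant covers `act` on the date `on`."""
    for p in permissions:
        if p.act != act:
            continue
        if p.startDate is not None and on < p.startDate:
            continue
        if p.endDate is not None and on > p.endDate:
            continue
        return True
    return False

grants = [
    PermissionGranted(act="migrate",
                      startDate=date(2005, 1, 1),
                      endDate=date(2010, 12, 31)),
]

ok_in_term = is_permitted(grants, "migrate", date(2005, 11, 2))
ok_after_term = is_permitted(grants, "migrate", date(2011, 1, 1))
ok_other_act = is_permitted(grants, "disseminate", date(2005, 11, 2))
```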

20
Event Metadata
  • Events in the life of a digital object
      • What was done
      • Who did it
      • When
      • Who authorized it
      • What was the outcome
  • General
      • PREMIS Events
  • Specific, e.g.
      • AES Process History

21
PREMIS Events
  • Must be related to one or more objects
  • Can be related to one or more agents
  • Consist of
      • eventIdentifier
      • eventType
      • eventDateTime
      • eventDetail
      • eventOutcomeInformation
      • linkingAgentIdentifier
      • linkingObjectIdentifier
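
The two rules on this slide (at least one object, optionally some agents) can be enforced when an event is recorded. This is a sketch under assumptions: the `Event` class, the use of UUIDs for eventIdentifier, and the ISO 8601 timestamp format are illustrative choices, not requirements stated in the slides.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Event:
    eventType: str
    linkingObjectIdentifier: List[str]        # must not be empty
    eventDetail: str = ""
    eventOutcomeInformation: str = ""
    linkingAgentIdentifier: List[str] = field(default_factory=list)  # may be empty
    eventIdentifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    eventDateTime: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        if not self.linkingObjectIdentifier:
            raise ValueError("an event must be related to at least one object")

# A simple audit trail: identifiers and values below are hypothetical.
audit_trail = []
audit_trail.append(Event("fixity check", ["example-0001"],
                         eventOutcomeInformation="pass",
                         linkingAgentIdentifier=["agent-01"]))

# An event with no linked object is rejected.
try:
    Event("migration", [])
    rejected = False
except ValueError:
    rejected = True
```

Supplying the identifier and timestamp automatically fits the deck's broader advice to let machines fill in whatever they can.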

22
Beyond PREMIS
  • Format-specific technical metadata
  • Detailed event metadata
  • Structural metadata / content packaging
  • Specific descriptive metadata

23
Technical Metadata
  • Formally characterizes
      • a class of objects
      • an individual object
  • Some technical metadata applies to all formats;
    most is specific to a category of formats, e.g.
      • NISO Z39.87 Technical Metadata for Still Images
          • http://www.niso.org/standards/resources/Z39_87_trial_use.pdf
          • MIX (XML schema for Z39.87): http://www.loc.gov/standards/mix//
      • Audio Engineering Society Core Technical
        Metadata for Audio (in draft)
      • TextMD
          • http://dlib.nyu.edu/METS/textmd.xsd

24
Structural Metadata
  • Not only content, but also metadata and binding
    must be preserved
  • Enables a complex object to be assembled from its
    constituent parts
  • Content, Metadata, Relationships, Behaviors

25
Structural and Packaging Metadata
  • Many formats developed in different communities,
    e.g.,
      • Digital library: METS
          • http://www.loc.gov/standards/mets/
      • Commercial media: MPEG 21 DIDL
          • Available from ISO: www.iso.org
      • Learning objects: IMS Content Packaging
          • http://www.imsglobal.org/content/packaging/
      • Space data: XFDU (still in draft)
          • http://www.ccsds.org/docu/dscgi/ds.py/GetRepr/File-1912/html
      • Audio-visual: Advanced Authoring Format (AAF)
          • http://www.aafassociation.org/html/techinfo/index.html
      • Television: Material Exchange Format (MXF)
          • Available from SMPTE: www.smpte.org
  • No consolidation of formats, but dialog and
    mapping

26
METS Basics
  • METS provides a framework for
      • Content files
      • Metadata
          • Descriptive
          • Structural
          • Technical
          • Provenance
          • Source
      • Relationships
      • Behaviors
  • Suitable for
      • Open Archival Information Systems
          • Archival information package (AIP)
          • Submission information package (SIP)
          • Dissemination information package (DIP)
      • Display and navigation of digital objects
      • Sharing of digital objects among libraries and
        archives

27
RLG's METS Viewer
Structural Metadata
Descriptive Metadata
Behaviors
Content
28
Structure of a METS File
METS
  • metsHdr: Header describing METS file itself
  • fileSec: Inventory or manifest of component files
  • dmdSec: Descriptive metadata
  • amdSec: Administrative metadata (technical,
    source, rights, provenance)
  • structMap: Structure map, the heart of METS
  • structLink: Structural map linking, i.e.,
    hyperlinks
  • behaviorSec: Executable behaviors
Less commonly used
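
As a rough illustration of how these sections fit together, the sketch below builds a METS skeleton with Python's standard `xml.etree.ElementTree`. Element names follow the METS schema, where the administrative section is spelled `amdSec`. The IDs, labels, and two-page book are hypothetical, the file entries carry no locations, and the result has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

mets = ET.Element(q("mets"))
ET.SubElement(mets, q("metsHdr"))                 # describes the METS file itself
ET.SubElement(mets, q("dmdSec"), ID="D1")         # descriptive metadata
ET.SubElement(mets, q("amdSec"), ID="A1")         # administrative metadata

# fileSec: inventory of component files
file_sec = ET.SubElement(mets, q("fileSec"))
file_grp = ET.SubElement(file_sec, q("fileGrp"))
for file_id in ("FILE_A", "FILE_B"):
    ET.SubElement(file_grp, q("file"), ID=file_id)

# structMap: nested divisions that point back to files by ID
struct_map = ET.SubElement(mets, q("structMap"))
book = ET.SubElement(struct_map, q("div"), TYPE="book", DMDID="D1")
pages = [("page 1", "FILE_A"), ("page 2", "FILE_B")]
for order, (label, file_id) in enumerate(pages, start=1):
    page = ET.SubElement(book, q("div"), TYPE="page",
                         ORDER=str(order), LABEL=label)
    ET.SubElement(page, q("fptr"), FILEID=file_id)

xml_text = ET.tostring(mets, encoding="unicode")
section_names = [child.tag.split("}")[1] for child in mets]
```

The structure map holds no content of its own. It only orders and labels divisions and points at entries in the file inventory, which is what lets a viewer assemble a complex object from its parts.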
29
Structure Map
[Figure: example structure map for a book. Nested divisions carry TYPE, ORDER, ORDERLABEL, LABEL and FILEID attributes; the divisions shown are Title page, Preface, page i, page ii, Chapter 1, page 1, page 2.]
30
Referring to Metadata
METS does not define descriptive or
administrative metadata elements. dmdSec and
amdSec are buckets or sockets where
externally-defined metadata can be supplied or
referenced
  • Three techniques
      • In-line XML
      • Wrapped base-64 encoded data
      • Pointers to external information
        (e.g., URNs, handles)

METS Board endorses a range of recommended
extension schemas
Diagram: METS (metsHdr, fileSec, dmdSec, amdSec,
structMap, structLink, behaviorSec)
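
The three techniques named on this slide correspond to three METS constructs: `mdWrap` with `xmlData`, `mdWrap` with `binData`, and `mdRef`. The sketch below shows one dmdSec of each kind. The title, the binary record, and the URN are invented placeholders, and the output is not schema-validated.

```python
import base64
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
MODS = "http://www.loc.gov/mods/v3"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("mods", MODS)):
    ET.register_namespace(prefix, uri)

def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"

root = ET.Element(m("mets"))

# 1. In-line XML: a MODS record embedded in xmlData
dmd1 = ET.SubElement(root, m("dmdSec"), ID="D1")
wrap1 = ET.SubElement(dmd1, m("mdWrap"), MDTYPE="MODS")
xml_data = ET.SubElement(wrap1, m("xmlData"))
mods = ET.SubElement(xml_data, f"{{{MODS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS}}}title").text = "Example title"

# 2. Wrapped base-64 encoded data: a non-XML record carried in binData
dmd2 = ET.SubElement(root, m("dmdSec"), ID="D2")
wrap2 = ET.SubElement(dmd2, m("mdWrap"), MDTYPE="MARC")
raw_record = b"hypothetical binary catalog record"
ET.SubElement(wrap2, m("binData")).text = base64.b64encode(raw_record).decode("ascii")

# 3. Pointer to external information: mdRef with an xlink:href
dmd3 = ET.SubElement(root, m("dmdSec"), ID="D3")
ET.SubElement(dmd3, m("mdRef"), {
    "LOCTYPE": "URN",
    "MDTYPE": "MARC",
    f"{{{XLINK}}}href": "urn:example:record-1",   # hypothetical identifier
})

# Round-trip check: the wrapped bytes decode back to the original record
decoded = base64.b64decode(root.find(".//" + m("binData")).text)
```

In-line XML keeps the package self-contained, while a pointer keeps it small but makes it depend on the external record staying resolvable.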
31
Use of MODS Extension Schema for Descriptive
Metadata
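[Figure: METS example using MODS. Structure map divisions (Book, Chapter 1, page 1, page 2) carry DMDID attributes (D1, CH1) that refer to descriptive metadata sections. An in-line MODS record (schema http://www.loc.gov/mods/v3) gives the name "Radcliffe College" and the title "Reports of the president and treasurer for...". A reference with MDTYPE MARC and an xlink:href (http//... BNI3165/) points to the catalog record.]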
32
Where does all this metadata come from?
  • Look, Ma, no hands! (as much as possible, that
    is)
  • Don't make people create it
  • Machines are faster, cheaper, more accurate
  • Don't make people read it
  • Use controlled values
  • Expect bulk preservation of like objects
  • Artisanal preservation is not affordable
  • Develop and share tools to automate creation,
    ingest, extraction, exchange

33
JHOVE: JSTOR/Harvard Object Validation Environment
  • Format Identification
  • Format Validation
      • Well-formedness (Syntactical)
      • Validity (Semantic)
  • Format Characterization
  • http://hul.harvard.edu/jhove/
  • Modules for
      • AIFF
      • ASCII
      • BYTESTREAM
      • GIF
      • HTML
      • JPEG
      • JPEG2000
      • PDF
      • TIFF
      • UTF8
      • WAVE
      • XML
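
To make the difference between identification and well-formedness checking concrete, here is a small Python sketch. It is not JHOVE and does not reflect how JHOVE is implemented. It identifies a handful of formats by their leading byte signatures and checks XML well-formedness with the standard library; the function names and the fallback to BYTESTREAM are assumptions.

```python
import xml.etree.ElementTree as ET

# Leading byte signatures for a few of the formats listed on the slide
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),   # little-endian byte order
    (b"MM\x00*", "TIFF"),   # big-endian byte order
]

def identify(data):
    """Guess a format name from the first bytes of `data`."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "BYTESTREAM"      # fallback: an uninterpreted sequence of bytes

def xml_well_formed(data):
    """True if `data` parses as XML (syntactic check only)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

pdf_guess = identify(b"%PDF-1.4\n")
unknown_guess = identify(b"hello world")
good_xml = xml_well_formed(b"<a><b/></a>")
bad_xml = xml_well_formed(b"<a><b></a>")
```

A matching signature says only what a file claims to be. Well-formedness, and validity beyond it, require reading the rest of the file against the format's rules.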

34
Automatic Exposure
  • RLG initiative advocates for capturing standard
    technical metadata about digital images
    automatically as part of image creation
  • engage manufacturers in dialog about what
    technical metadata their products currently
    capture vs what is required for digital archiving
  • leverage existing industry efforts
  • identify and evaluate tools for harvesting
    technical metadata and explore how those tools
    can scale to serve the entire community.

35
Format Registries
  • Detailed documentation of how typed content is
    represented
  • Persistent, unambiguous association between
    public identifiers for digital formats and their
    documentation
  • Lists of systems and services which use or
    produce the format
  • Must be inclusive, detailed, rigorous, public,
    and sustainable
  • Format Registry projects
      • PRONOM
          • http://www.nationalarchives.gov.uk/pronom/
      • Global Digital Format Registry
          • http://hul.harvard.edu/gdfr/
      • TOM
          • http://tom.library.upenn.edu/
      • FRED demonstration system
          • http://tom.library.upenn.edu/fred/
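
The core idea of a format registry, a persistent and unambiguous link from a public identifier to documentation, can be sketched as a small in-memory structure. This does not model PRONOM, GDFR, or any other listed project. The class names, the identifier scheme, and the sample entry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class FormatEntry:
    identifier: str                    # public identifier (hypothetical scheme)
    name: str
    version: str
    documentation: str                 # where the format is specified
    systems: Tuple[str, ...] = ()      # systems that use or produce the format

class FormatRegistry:
    def __init__(self):
        self._entries: Dict[str, FormatEntry] = {}

    def register(self, entry):
        # Unambiguous: an identifier can never be reassigned
        if entry.identifier in self._entries:
            raise ValueError(f"identifier already assigned: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def lookup(self, identifier):
        return self._entries[identifier]

registry = FormatRegistry()
registry.register(FormatEntry(
    identifier="example:tiff/6.0",
    name="TIFF",
    version="6.0",
    documentation="(link to the published specification)",
    systems=("hypothetical scanner software",),
))

found = registry.lookup("example:tiff/6.0")

try:
    registry.register(FormatEntry("example:tiff/6.0", "Other", "1", "n/a"))
    reassigned = True
except ValueError:
    reassigned = False
```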

36
Other Registries (Extant and Posited)
  • Registry of Digital Masters
      • "I will preserve this digital thing"
      • http://www.oclc.org/digitalpreservation/why/digitalregistry/default.htm
  • Profile registries
      • "I restrict this broader standard in the
        following ways"
  • Metadata Element/Schema registries
      • "I use these elements to mean these things"
      • http://www.xml.org/xml/registry.jsp
      • http://www.ukoln.ac.uk/projects/iemsr/
      • Etc.
  • Environment registries
      • Hardware/software configurations in which given
        software is known to work

37
Digital Information Community benefits from
metadata cooperation
  • Develop common understanding
      • Crucial metadata
      • Standards!
      • Trusted repository certification
      • Acceptable preservation strategies
      • Needs and costs
  • Automate capture/creation of metadata
      • Work with equipment manufacturers
      • Develop open source tools
  • Share burden
      • Monitor/document digital formats
      • Avoid duplicate digitization