DCC Representation Information Registry - PowerPoint PPT Presentation

1
DCC Representation Information Registry
Digital Curation Centre
a centre of expertise in data curation and
preservation
  • D Giaretta
  • http://www.dcc.ac.uk
  • http://dev.dcc.ac.uk

Funders
2
Outline
  • OAIS key points
  • Representation Information
  • Registry
  • Conclusions

3
Fundamentals
  • OAIS Reference Model is a key (the only?)
    standard for the long-term preservation of
    information
  • Digital Preservation/Curation covers many issues
    including financial, scientific, technical, legal
    and sociological ones.
  • OAIS does not cover all these issues and so
    addresses only part of the solution but a vital
    part

4
OAIS Reminder
  • OAIS is a standard about the long-term
    preservation of information
  • Information, not just bits
  • Must be usable at least by the Designated
    Community
  • How can this be done in a manageable way?

5
OAIS Responsibilities
  • The OAIS must:
  • Negotiate for and accept appropriate information
    from information Producers.
  • Obtain sufficient control of the information
    provided to the level needed to ensure Long-Term
    Preservation.
  • Determine, either by itself or in conjunction
    with other parties, which communities should
    become the Designated Community and, therefore,
    should be able to understand the information
    provided.
  • Ensure that the information to be preserved is
    Independently Understandable to the Designated
    Community. In other words, the community should
    be able to understand the information without
    needing the assistance of the experts who
    produced the information.
  • Follow documented policies and procedures which
    ensure that the information is preserved against
    all reasonable contingencies, and which enable
    the information to be disseminated as
    authenticated copies of the original, or as
    traceable to the original.
  • Make the preserved information available to the
    Designated Community.

6
OAIS Reference Model Functional Model
7
OAIS Information Definition
  • Information is always expressed (i.e.
    represented) by some type of data
  • Data interpreted using its Representation
    Information yields Information
  • Information Object preservation requires clear
    identification and understanding of the Data
    Object and its associated Representation
    Information

[Figure: a Data Object, interpreted using its Representation Information, yields an Information Object]
8
Information Objects
9
Representation Information
  • The Data Object is interpreted using the
    Representation Information (RepInfo)
  • The Reference Model is designed to ensure that an
    OAIS is not set the impossible task of having to
    provide all possible RepInfo immediately
  • Hence, take account of the Designated Community and its
    associated Knowledge Base
  • Note that RepInfo may itself need further RepInfo

10
Designated Community
  • General English-reading public educated to High
    School and above, with access to a Web Browser
    (HTML 4.0 capable)
  • GIS data: GIS researchers (undergraduates and
    above) who understand the concepts of
    geographic data and have access to current (2005,
    USA) GIS tools/computer software, e.g. ArcInfo
    (2005)
  • Astronomer (undergraduate and above) with access
    to FITS software such as FITSIO, familiar with
    astronomical spectrographic instruments
  • Student of Middle English with an understanding
    of TEI encoding and access to an XML rendering
    environment.
  • Variant 1: Cannot understand TEI
  • Variant 2: Cannot understand TEI and no access to
    XML rendering environment
  • Variant 3: No understanding of Middle English but
    does understand TEI and XML

11
Representation Information
  • The information that maps a Data Object into more
    meaningful concepts. An example is the ASCII
    definition that describes how a sequence of bits
    (i.e., a Data Object) is mapped into a symbol.
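
As a minimal sketch of the ASCII example above, the function below treats a string of bits as the Data Object and applies the ASCII mapping as its Representation Information. The function name and the sample bit string are made up for illustration.

```python
def bits_to_text(bits: str) -> str:
    """Interpret a string of '0'/'1' characters as 8-bit ASCII codes."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a whole number of 8-bit groups")
    codes = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # The ASCII definition is the RepInfo: it says which symbol each code means.
    return bytes(codes).decode("ascii")


# Data Object: 32 bits with no meaning until the mapping is applied.
data_object = "01000110" "01001001" "01010100" "01010011"
print(bits_to_text(data_object))  # prints "FITS"
```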

12
Representation Information
  • The Representation Information accompanying a
    physical object, like a moon rock, may give
    additional meaning
  • It typically is a result of some analysis of the
    physically observable attributes of the rock
  • The Representation Information accompanying a
    digital object, or sequence of bits, is used to
    provide additional meaning.
  • It typically maps the bits into commonly
    recognized data types such as character, integer,
    and real and into groups of these data types.
  • It associates these with higher level meanings
    which can have complex inter-relationships that
    are also described
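
The two layers described above (bits mapped to data types, then data types tied to higher-level meaning) might look like this in practice. The 16-byte record layout, the field names and the units are all invented for the sketch; only the struct module calls are real Python.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    name: str      # higher-level meaning: what the value is called
    fmt: str       # structure: struct format code for the data type
    unit: str      # higher-level meaning: physical unit, if any


# Hypothetical RepInfo for a made-up big-endian 16-byte record.
BYTE_ORDER = ">"
RECORD_REPINFO = [
    FieldInfo("target", "8s", ""),       # 8 ASCII characters, space padded
    FieldInfo("count", "i", "photons"),  # 32-bit signed integer
    FieldInfo("flux", "f", "Jy"),        # 32-bit real
]


def interpret(data: bytes) -> dict:
    """Apply the RepInfo to raw bytes, returning name -> (value, unit)."""
    fmt = BYTE_ORDER + "".join(field.fmt for field in RECORD_REPINFO)
    values = struct.unpack(fmt, data)
    result = {}
    for field, value in zip(RECORD_REPINFO, values):
        if isinstance(value, bytes):
            value = value.decode("ascii").rstrip()
        result[field.name] = (value, field.unit)
    return result


raw = struct.pack(">8sif", b"NGC1068 ", 42, 1.5)  # stand-in Data Object
record = interpret(raw)
print(record)
```

Without RECORD_REPINFO the 16 bytes could equally be read as four integers or sixteen characters; the bits alone do not say which.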

13
RepInfo Classification
14
Structure including Formats
  • Distinguish:
  • formats which are used mainly for rendering, to
    be followed by human inspection, and
  • formats used for automated processing
    (particularly important for science data)
  • Distinguish:
  • Things with unknown structure: these need software
  • proprietary software, e.g. MS Word
  • Open Source software, e.g. CDF
  • Things with known/well described structure
  • ASCII file, FITS file, telemetry etc.
  • Document the format
  • Use a description language if possible, e.g. EAST,
    DFDL
  • The EAST tools are themselves Representation
    Information which in due course will have to be
    fully defined; the closure of their
    Representation Nets will be the EAST standard
  • Higher level definitions should include useful
    scientific objects and humanities objects

15
Layered Model from OAIS
16
Semantics
  • Meaning/ Relationships
  • Data Dictionaries
  • Thesauri
  • Ontologies
  • Semantic interoperability

17
Time Dependent Information
  • Many, perhaps most, datasets change over time and
    the state at each particular moment in time may
    be important. It may be useful to break the issue
    into separate parts.
  • at each moment in time we could, in principle,
    take a snapshot and store it. That snapshot has
    its associated Representation Net.
  • efficient storage of a series of snapshots may
    lead one to store differences or include time
    tags in the data
  • Additional Representation Information would be
    needed which describes how to get to a particular
    time's snapshot from the efficiently encoded
    version.
  • Also applies to ANNOTATION: who said what about
    which, and when did they say it
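
A rough sketch of the snapshot idea above, assuming a dataset stored as one base state plus time-tagged differences. The data, the encoding (None meaning "key removed") and the function snapshot_at are invented; the point is that the rule for replaying differences is itself Representation Information that has to be recorded.

```python
from datetime import datetime

# Hypothetical efficient encoding: a base state plus time-tagged differences.
BASE = {"status": "draft", "rows": 100}
CHANGES = [  # must be sorted by time; a value of None removes the key
    (datetime(2005, 1, 10), {"rows": 120}),
    (datetime(2005, 3, 2), {"status": "released"}),
    (datetime(2005, 6, 15), {"rows": 875, "status": None}),
]


def snapshot_at(when: datetime) -> dict:
    """Rebuild the dataset state at `when` by replaying the differences."""
    state = dict(BASE)
    for timestamp, delta in CHANGES:
        if timestamp > when:
            break  # later changes are not part of this snapshot
        for key, value in delta.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


print(snapshot_at(datetime(2005, 2, 1)))
print(snapshot_at(datetime(2005, 12, 31)))
```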

18
Actions and Processes (Behaviour)
  • Some information has, as an integral part of its
    content, an implicit or explicit process
    associated with it
  • An example of this is a database or other time
    dependent or reactive system such as a Neural
    Net.
  • Emulations
  • Limited but may be adequate for rendered
    document-type data

19
Is saying it's XML enough?
    <?xml version='1.0'?>
    <VOTABLE version="1.1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1"
     xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
    <!--
     ! VOTable written by uk.ac.starlink.votable.VOTableWriter
     !-->
    <RESOURCE>
    <TABLE name="6dfgs_E7_subset" nrows="875">
    <PARAM arraysize="" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz">
    <DESCRIPTION>URL of data file used to create this table.</DESCRIPTION>
    </PARAM>
    <PARAM arraysize="" datatype="char" name="Credits" value="Column explanations provided by Mike Read (ROE) from 6dfGS project."/>
    <PARAM arraysize="" datatype="char" name="Conversion" value="Converted from 6dfgs_E7.fld.gz by Mark Taylor (Starlink) using STIL."/>
    <PARAM arraysize="" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/>
    <FIELD arraysize="15" datatype="char" name="TARGET">
    <DESCRIPTION>Target name</DESCRIPTION>

Or here
NO!
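
To illustrate why "it's XML" is not enough, the sketch below parses a small VOTable-like fragment with Python's standard ElementTree. The fragment is hand-completed for the example (the closing tags and the RA field with unit HMS are assumptions added so that it parses). A generic XML parser recovers the structure, but nothing in it says what TARGET or HMS mean; that has to come from further Representation Information such as the VOTable schema and its documentation.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.1"}

# Hand-completed fragment, for illustration only.
SAMPLE = """<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
<RESOURCE>
<TABLE name="6dfgs_E7_subset" nrows="875">
<FIELD arraysize="15" datatype="char" name="TARGET">
<DESCRIPTION>Target name</DESCRIPTION>
</FIELD>
<FIELD arraysize="11" datatype="char" name="RA" unit="HMS"/>
</TABLE>
</RESOURCE>
</VOTABLE>"""

root = ET.fromstring(SAMPLE)

# Structure is recoverable from XML syntax alone...
fields = [
    (f.get("name"), f.get("datatype"), f.get("unit"))
    for f in root.iterfind(".//v:FIELD", NS)
]
print(fields)

# ...but the parser only hands back strings; it has no idea that "HMS"
# denotes hours, minutes and seconds, or that RA is a sky coordinate.
```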
20
Preservation Issues
  • Given a file or a stream of bits, how does one
    know what Representation Information is needed?
    (This question applies to Representation
    Information itself as well as to the digital
    objects we are primarily interested in preserving
    and using.) How does one know, for example, if
    this thing is in FITS format?
  • Someone may simply know what it is and how to
    deal with it i.e. the bits are within the
    Knowledge Base
  • One may be able to recognise the format by
    looking for various types of patterns.
  • One may feed the bits into all available
    interpreters to see which accept the data as
    valid
  • Other means.
  • The only safe way: have an associated label which
    points to the appropriate Representation
    Information
  • Note: this does not exclude the other methods, e.g.
    for data rescue
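
The options above can be sketched as a lookup that trusts an explicit label first and falls back to pattern matching only when no usable label exists. The registry contents and the label scheme are hypothetical; the byte signatures are the well-known starts of FITS, PDF and XML files.

```python
# Well-known leading bytes; matching them is only a guess about the format.
SIGNATURES = {
    b"SIMPLE  =": "FITS",
    b"%PDF-": "PDF",
    b"<?xml": "XML",
}

# Hypothetical registry: label -> name of the Representation Information.
REGISTRY = {"example:repinfo/fits": "FITS"}


def guess_format(data: bytes):
    """Recognise a format by looking for known byte patterns."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def identify(data: bytes, label=None):
    """Return (format, how_it_was_found); the label wins when it resolves."""
    if label is not None and label in REGISTRY:
        return REGISTRY[label], "label"
    guess = guess_format(data)
    return guess, ("pattern" if guess else "unknown")


header = b"SIMPLE  =                    T / conforms to FITS standard"
print(identify(header, label="example:repinfo/fits"))
print(identify(header))            # data-rescue case: no label available
print(identify(b"\x00\x01\x02"))
```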

21
Registry for Representation Info
The Digital Object could have RepInfo packed with
it
Support automated access processing
Example of use of Representation Information
Labelling
22
CPID
Registry
External
23
Preservation Perspectives
  • Migration
  • Refresh
  • Replicate
  • Repackage
  • Transform
  • Access Service
  • Dissemination API
  • Data Virtualisation
  • Source code
  • Emulation
  • Archive Interoperability
  • Federated
  • Resource sharing

24
Types of Information Used in OAIS
25
Preservation Description Info
Issues of Trust
26
Archival Information Package
27
Preservation Description Information
  • Provenance Information
  • Describes the source of Content Information, who
    has had custody of it, what is its history
  • Context Information
  • Describes how the Content Information relates to
    other information outside the Information Package
  • Reference Information
  • Provides one or more identifiers, or systems of
    identifiers, by which the Content Information may
    be uniquely identified
  • Fixity Information
  • Protects the Content Information from
    undocumented alteration
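
One common way to realise Fixity Information is a cryptographic digest recorded at ingest and rechecked later. The sketch below uses SHA-256 from Python's hashlib; the choice of algorithm and the helper names are assumptions for the example, not something the slides prescribe.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Compute a SHA-256 digest to store alongside the Content Information."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """Return True if the data still matches the digest recorded at ingest."""
    return fixity(data) == recorded_digest


content = b"SIMPLE  =                    T"
recorded = fixity(content)                 # stored in the PDI at ingest

print(verify(content, recorded))           # unchanged content
print(verify(content + b" ", recorded))    # a single added byte is detected
```

A digest detects alteration but does not say who changed what; that history belongs to Provenance Information.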

28
Conclusions
  • OAIS provides the framework for information
    preservation
  • Representation Information is key
  • Representation Information is more than just
    format
  • Desirable to share the effort
  • Also need PDI, Packaging, etc