JHOVE2: A Next-Generation Architecture for Format-Aware Preservation Processing

1
JHOVE2: A Next-Generation Architecture
for Format-Aware Preservation Processing
Digital Library Federation Fall Forum
Philadelphia, November 5-7, 2007
  • Stephen Abrams, Harvard University
  • Evan Owens, Portico
  • Tom Cramer, Stanford University

2
JHOVE2 project
  • Two-year, NDIIPP-funded collaborative project to
    develop a next-generation architecture for
    format-aware preservation processing
  • Harvard University
      • Stephen Abrams, Gary McGath, Robin Wendler
  • Portico
      • Evan Owens, John Meyer, Sheila Morrissey
  • Stanford University
      • Tom Cramer, Richard Anderson, Hannah Frost,
        Rachel Gollub, Nancy Hoebelheinrich,
        Keith Johnson
  • Open source
      • Educational Community License (ECL)
      • Distribution via SourceForge

3
JHOVE2 project goals
  • Refactor the existing architecture
      • Rectify known inefficiencies and idiosyncrasies
      • Simplify the process of integration
      • Encourage third-party extensions
  • Provide enhancements
      • Separate identification from validation
      • Standardized error handling
      • Standardized handling of validation profiles
      • Standardized reporting using METS, with XSL
        transform
      • More sophisticated data model
      • Arbitrary processing modules

4
JHOVE2 project goals
  • Develop modules
      • Signature-based identification using DROID
      • Validation and characterization
      • Symbolic display of selected binary formats
      • API-level editing capability
      • Policy-based assessment
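Signature-based identification works by matching known byte patterns ("magic numbers") at fixed offsets in a file. The Python sketch below is a simplified, hypothetical stand-in, not DROID or its PRONOM signature files: the `SIGNATURES` table and the `identify` function are illustrative names, and the table covers only a few of the formats this project targets. It returns every matching format rather than a single verdict, which keeps identification separate from any later validation step.

```python
# Hypothetical signature table (not DROID/PRONOM data): each entry is a format
# name plus (offset, magic bytes) pairs that must ALL match for a hit.
SIGNATURES: list[tuple[str, list[tuple[int, bytes]]]] = [
    ("TIFF", [(0, b"II*\x00")]),                      # little-endian header
    ("TIFF", [(0, b"MM\x00*")]),                      # big-endian header
    ("GIF", [(0, b"GIF87a")]),
    ("GIF", [(0, b"GIF89a")]),
    ("JPEG", [(0, b"\xff\xd8\xff")]),                 # SOI marker, then another marker
    ("JPEG 2000", [(0, b"\x00\x00\x00\x0cjP  \r\n\x87\n")]),  # JP2 signature box
    ("PDF", [(0, b"%PDF-")]),
    ("WAVE", [(0, b"RIFF"), (8, b"WAVE")]),
    ("AIFF", [(0, b"FORM"), (8, b"AIFF")]),
]


def identify(data: bytes) -> list[str]:
    """Return the names of all formats whose signatures match, without duplicates."""
    matches: list[str] = []
    for name, patterns in SIGNATURES:
        hit = all(data[offset:offset + len(magic)] == magic
                  for offset, magic in patterns)
        if hit and name not in matches:
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(identify(b"GIF89a" + bytes(10)))            # ['GIF']
    print(identify(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # ['WAVE']
    print(identify(b"plain text"))                    # []
```

Returning a list instead of a single name leaves room for ambiguous inputs (a file that matches more than one signature) to be resolved by a later process.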

5
Data model
  • Implicit assumption in JHOVE
      • 1 object, 1 file, 1 format
  • But what about:
      • TIFF with embedded ICC profile and XMP metadata
          • 1 object, 1 file, 3 formats
      • JPEG 2000 JPX fragmentation
          • 1 object, n files, 1 format
      • ESRI Shapefile
          • 1 object, 3 files, 3 formats
  • JHOVE2 will support processing of complex
    aggregate objects and nested formatted bit
    streams
      • 1 object, n files, m formats
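One way to picture the "1 object, n files, m formats" model is as a tree in which each file may contain nested bit streams in other formats. The Python sketch below is hypothetical: the `Source` and `DigitalObject` classes and their fields are illustrative names, not JHOVE2's actual classes, and the file names are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    """A file, or a bit stream nested inside another source (hypothetical model)."""
    name: str
    format_name: str
    children: list[Source] = field(default_factory=list)

    def formats(self) -> set[str]:
        """Collect this source's format plus those of all nested bit streams."""
        found = {self.format_name}
        for child in self.children:
            found |= child.formats()
        return found


@dataclass
class DigitalObject:
    """One preservation object made up of one or more files."""
    files: list[Source]

    def summary(self) -> tuple[int, int]:
        """Return (number of files, number of distinct formats)."""
        formats: set[str] = set()
        for source in self.files:
            formats |= source.formats()
        return len(self.files), len(formats)


# TIFF with embedded ICC profile and XMP metadata: 1 object, 1 file, 3 formats
tiff = DigitalObject([Source("scan.tif", "TIFF",
                             [Source("icc", "ICC"), Source("xmp", "XMP")])])
# ESRI Shapefile: 1 object, 3 files (.shp, .shx, .dbf), 3 formats
shapefile = DigitalObject([Source("roads.shp", "Shapefile main file"),
                           Source("roads.shx", "Shapefile index"),
                           Source("roads.dbf", "dBASE table")])

print(tiff.summary())       # (1, 3)
print(shapefile.summary())  # (3, 3)
```

Making nesting recursive means an embedded stream can itself contain further streams, which the flat "1 object, 1 file, 1 format" assumption cannot express.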

6
Common backplane
  • Outer loop iterates over digital objects
  • Inner loop applies each process in turn to the
    current object, passing a common memory structure
    (the state) to every process
  • while (has-another-object)
        while (has-another-process)
            process (object, state)
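The two nested loops can be sketched in Python as follows. This is a minimal illustration under stated assumptions: objects are plain byte strings, the shared state is a dictionary created fresh for each object, and `run_backplane`, `measure`, `detect_pdf`, and `flag_unidentified` are hypothetical names, not JHOVE2 APIs.

```python
from typing import Any, Callable, Iterable

# A process inspects one object and records its findings in the shared state dict.
Process = Callable[[bytes, dict[str, Any]], None]


def run_backplane(objects: Iterable[bytes],
                  processes: list[Process]) -> list[dict[str, Any]]:
    """Apply every process to every object; return one state dict per object."""
    results: list[dict[str, Any]] = []
    for obj in objects:                  # outer loop: digital objects
        state: dict[str, Any] = {}       # common memory structure for this object
        for process in processes:        # inner loop: processes, in order
            process(obj, state)          # later processes see earlier findings
        results.append(state)
    return results


# Hypothetical example processes.
def measure(obj: bytes, state: dict[str, Any]) -> None:
    state["size"] = len(obj)


def detect_pdf(obj: bytes, state: dict[str, Any]) -> None:
    state["format"] = "PDF" if obj.startswith(b"%PDF-") else "unknown"


def flag_unidentified(obj: bytes, state: dict[str, Any]) -> None:
    # Relies on the "format" key written by detect_pdf() earlier in the loop.
    state["needs_review"] = state["format"] == "unknown"


states = run_backplane([b"%PDF-1.4", b"hello"],
                       [measure, detect_pdf, flag_unidentified])
print(states)
```

Because every process receives the same state, a later process (here `flag_unidentified`) can build on what an earlier one recorded; the trade-off is an implicit ordering dependency between processes.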

7
Validation
  • There is a useful distinction between
    well-formedness, validity, renderability, and
    usability
  • Well-formedness and validity are bright-line
    determinations relative to a specification
  • Renderability is a bright-line determination
    relative to a specific rendering tool
  • Usability is a fuzzy determination relative to
    local policies and heuristics
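A small record type can keep the four determinations apart, so that, for example, a file that fails validation can still be recorded as renderable in a particular tool. The Python sketch below is hypothetical: the field names, the tool names `viewer-a` and `viewer-b`, the 0 to 1 usability scale, and the choice to treat untested tools as not renderable are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Hypothetical record separating the four determinations."""
    well_formed: bool                 # bright line: relative to the specification
    valid: bool                       # bright line: relative to the specification
    renderable_in: dict[str, bool] = field(default_factory=dict)  # bright line, per tool
    usability_score: float = 0.0      # fuzzy: output of local heuristics (assumed 0..1)

    def renderable_by(self, tool: str) -> bool:
        """Renderability only has meaning relative to a named tool.

        Assumption: a tool that was never tested counts as not renderable.
        """
        return self.renderable_in.get(tool, False)

    def usable_under(self, threshold: float) -> bool:
        """Usability depends on a locally chosen threshold, not on the spec."""
        return self.usability_score >= threshold


result = ValidationResult(well_formed=True, valid=False,
                          renderable_in={"viewer-a": True},
                          usability_score=0.7)
print(result.renderable_by("viewer-a"), result.renderable_by("viewer-b"))  # True False
print(result.usable_under(0.5), result.usable_under(0.8))                  # True False
```

The same file can be judged usable under one institution's threshold and unusable under another's, while its well-formedness and validity stay fixed.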

8
Policy-based assessment
  • Evaluate objects based on prior characterization
    and locally defined policy rules and heuristics,
    for example:
      • Risk of technological obsolescence
      • Risk of transformative loss
  • Codify assessment methodologies and best-practice
    recommendations
  • Develop a formal language in which to express
    policy rules
  • Implement a rules engine
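As a rough illustration of rules evaluated against prior characterization, the Python sketch below uses plain predicates in place of the formal policy language proposed here. `Rule`, `assess`, `LOCAL_POLICY`, `PREFERRED_FORMATS`, and the property names are hypothetical, and the preferred-format list is an arbitrary example, not a recommendation. A declarative rule language would let policies change without touching code; the lambdas only stand in for it.

```python
from dataclasses import dataclass
from typing import Any, Callable

Characterization = dict[str, Any]  # properties recorded by earlier characterization


@dataclass(frozen=True)
class Rule:
    name: str
    risk: str                                       # category of risk flagged
    condition: Callable[[Characterization], bool]   # True means the rule fires


def assess(props: Characterization, rules: list[Rule]) -> list[tuple[str, str]]:
    """Evaluate every rule against one object's properties; return those that fire."""
    return [(rule.name, rule.risk) for rule in rules if rule.condition(props)]


# Hypothetical local policy: the format list and property names are illustrative.
PREFERRED_FORMATS = {"TIFF", "JPEG 2000", "PDF", "XML"}
LOCAL_POLICY = [
    Rule("format-not-preferred", "technological obsolescence",
         lambda p: p.get("format") not in PREFERRED_FORMATS),
    Rule("lossy-compression", "transformative loss",
         lambda p: p.get("compression") == "lossy"),
]

print(assess({"format": "GIF", "compression": "lossless"}, LOCAL_POLICY))
print(assess({"format": "JPEG 2000", "compression": "lossy"}, LOCAL_POLICY))
```

Keeping the rules as data separate from the `assess` loop mirrors the split between a policy language and the engine that evaluates it.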

9
Format support
  • Audio: AIFF, WAVE
  • Color: ICC
  • Document: PDF
  • GIS: Shapefile
  • Image: GIF, JPEG, JPEG 2000, TIFF
  • Text: ASCII, HTML, SGML, UTF-8, XML

10
Schedule
  • 6 months of community outreach, requirements
    gathering, and design
  • 6 months of implementation of the core APIs and
    the engine
  • 1 year of module implementation
  • Continual prototyping and refactoring

11
Questions (for you)?
  • Do you care about the open source license (ECL)?
  • Do you care about the distribution platform
    (SourceForge)?
  • Do you have functional requirements or use cases?
  • How do you use JHOVE today?
  • What needs doesn't it meet?
  • What types of policy assessments do you perform?
  • How do you quantify risk?
  • What is your underlying assessment model?
  • Are you aware of existing expression languages
    and engines for rules-based assessment?

12
Questions (for you)?
  • What can we do to facilitate integration into
    existing (or planned) systems and workflows?
  • What can we do to facilitate third-party
    development and extension?
  • What help would you need to implement your own
    modules?
  • Would you be interested in a co-development
    arrangement with the JHOVE2 project?
  • Do you have interesting test files that you are
    willing to share?

13
Questions (from you)?