Digital Preservation at HUL

Title: Digital Preservation at HUL


1
Digital Preservation at HUL DRS 2
  • HMS Countway Library
  • Andrea Goethals
  • July 20, 2009

2
Agenda
  • The problem
  • What are we doing about it?
  • DRS 2
  • Open for questions

3
1. The problem
4
The problem is twofold
  • 1. Keeping the bits safe
  • 2. Keeping the bits useful to people
5
Keeping the bits safe
  • Digital things are amazingly easy to destroy
      • Bad people
      • Software or hardware failure
      • Human mistakes
  • Destruction is not always apparent
      • Data not used frequently is at risk of unnoticed
        damage
      • Some damage is not noticeable to human eyes and
        ears

6
Keeping the bits useful to people
  • Digital material is fragile
  • Humans are dependent on technology to interpret
    the content...
  • Technologies must understand the format of the
    content
  • Technologies age and disappear!

7
Using information content
Analog book: unmediated use
Digital book: technology-mediated use
8
Formats are key to determining usability
Formats are the bridge between the content we
want to preserve and supporting technologies
[Diagram: digital content, connected through formats to supporting technologies]
9
2. What are we doing about it?
10
Keeping the bits safe
  • Store the bits in multiple copies, in multiple
    places
  • Make sure the bits are not corrupt
  • Replace media periodically
  • Restrict who can access the bits
  • Be able to recover the bits!

11
Keeping the bits safe at HUL
  • 3-4 copies of each file, 2 different media
      • 1-2 (tape and sometimes disk): 60 Oxford Street,
        Cambridge
      • 1 (disk): Summer Street, Boston
      • 1 (tape): Southborough

12
Keeping the bits safe at HUL
  • Automated integrity monitoring
      • Drscheck script
      • Compares the MD5 checksum of each file at the Summer
        Street location to the MD5 stored in a database
      • Also checks the 60 Oxford Street disk copy
      • A copy of each file is checked every 2 weeks
      • Recent enhancement: trigger on database update of
        MD5
  • Storage media replaced every 4-5 years
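
The checksum comparison described on this slide can be sketched in a few lines of Python. This is not the Drscheck script itself, only a minimal illustration of the idea: the function names md5_of and check_fixity are hypothetical, and a plain dictionary stands in for the database of recorded MD5 values.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(recorded):
    """Compare files on disk against recorded checksums.

    `recorded` maps a file path to the MD5 hex digest stored at deposit
    time (a stand-in for the database; this layout is an assumption).
    Returns a list of (path, problem) tuples; an empty list means every
    file still matches its recorded checksum.
    """
    problems = []
    for path, expected in recorded.items():
        if not Path(path).is_file():
            problems.append((path, "missing"))
        elif md5_of(path) != expected.lower():
            problems.append((path, "checksum mismatch"))
    return problems
```

Run on a schedule (every 2 weeks in the slide's description), a non-empty result would be the signal to restore the affected file from one of the other copies.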

13
Keeping the bits safe at HUL
  • Overseen by OIS and UIS IT staff
  • Just-in-case plans
  • Disaster recovery
  • Server fail-overs
  • Software failure
  • Tape libraries
  • Fabric switches
  • Lost or damaged tapes
  • Data recovery (corruption)

14
It's safe - but is it usable???
  • It's not enough to preserve the bits if the
    format of the bits is obsolete!
  • WordStar? AppleWorks? Excel 1.0?
  • For digital content we are dependent on software
    that can understand the format

15
The importance of format
  • Understanding formats is fundamental to
    preservation

ffd8ffe000104a46494600010201 008300830000ffed0fb05
0686f74 6f73686f7020332e30003842494d 03e90a5072696
e7420496e666f00 0000007800000000004800480000 00000
2f40240ffeeffee03060252 0347052803fc00020000004800
48 0000000002d80228000100000064 000000010003030300
000001270f 0001000100000000000000000000 0000600800
190190000000000000 0000000000000000000000000000 00
00000000000000000000003842 494d03ed0a5265736f6c757
4696f 6e0000000010008313a3000200 ...
16
The importance of format
  • Understanding formats is fundamental to
    preservation

ffd8ffe000104a46494600010201 008300830000ffed0fb05
0686f74 6f73686f7020332e30003842494d 03e90a5072696
e7420496e666f00 0000007800000000004800480000 00000
2f40240ffeeffee03060252 0347052803fc00020000004800
48 0000000002d80228000100000064 000000010003030300
000001270f 0001000100000000000000000000 0000600800
190190000000000000 0000000000000000000000000000 00
00000000000000000000003842 494d03ed0a5265736f6c757
4696f 6e0000000010008313a3000200 ...
SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0
183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2 ...
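
The marker names on this slide come from reading the hex dump as a JPEG stream: each segment starts with 0xFF plus a marker code, followed by a two-byte big-endian length. A minimal Python sketch of that reading follows, using only the first 24 bytes of the dump above. The function names list_segments and jfif_version are illustrative, and the sketch stops at SOS rather than decoding the entropy-coded data.

```python
import struct

# Marker codes for the segment names shown on the slide.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data):
    """Return (marker name, declared length) pairs for a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    segments = [("SOI", 0)]  # SOI carries no length field
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((MARKER_NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:  # compressed image data follows SOS
            break
        pos += 2 + length  # the length field counts itself but not the marker
    return segments


def jfif_version(data):
    """Return the JFIF version string, or None if APP0 is not JFIF."""
    if data[6:11] != b"JFIF\x00":
        return None
    return f"{data[11]}.{data[12]}"


# First 24 bytes of the hex dump on the slide, with the spaces removed.
head = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
print(list_segments(head))  # [('SOI', 0), ('APP0', 16), ('APP13', 4016)]
print(jfif_version(head))   # 1.2
```

Without knowledge of the format, the same bytes are just the opaque hex shown on the previous slide.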
18
Keeping the bits useful to people
  • Know what formats you have
  • Make sure there's technology to support the
    formats!
  • Provide ways for people to find it
  • Provide ways for curators to manage it
  • Keep records of significant events
  • Repair, replace
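
As a rough illustration of "know what formats you have", a file's format can often be guessed from the signature bytes at its start. The Python sketch below uses a handful of well-known signatures; the names SIGNATURES, identify and inventory are hypothetical. Characterization tools such as JHOVE and FITS go much further than this, so treat it as a toy model of the idea only.

```python
from collections import Counter

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header):
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def inventory(headers):
    """Count how many items of each guessed format are present."""
    return Counter(identify(h) for h in headers)
```

An "unknown" result is itself useful information: it flags content whose format needs a closer look before anyone can promise to keep it usable.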

19
Can we approach the problem differently?
  • In a way that's more proactive?
  • And more efficient?
  • And less expensive?

Yes
20
The content production matters!
  • The least expensive and most effective
    preservation measure is to think about the future
    when digital content is created!
  • It makes good sense to try to influence the
    content creation process

21
Preservation lifecycle
  • Create digital content
  • Ingest into a preservation repository
  • Continuous cycle of
      • Monitoring
      • Planning
      • Intervention
  • Subject to collection management decisions
  • Transfer to next generation of the repository or
    to a different repository

22
Keeping the bits useful to people at HUL
  • Guidelines
      • More preservable files
      • Formats: standard, well-understood,
        well-supported, open
      • Recommended supplementary documentation
        (metadata)
  • Tools
      • FITS, JHOVE: check quality of files, automated
        metadata extraction
  • Staff available to consult

23
Keeping the bits useful to people at HUL
  • Collection management applications
  • Discoverable content
      • Catalogs
      • Persistent names
      • Search engines
  • Extensive metadata
      • Administrative, Technical, Structural, Provenance
  • Suite of delivery applications

24
Keeping the bits useful to people at HUL
  • Suite of delivery services
  • Delivery applications created and maintained at
    OIS
      • IDS, PDS, SDS, ADS, FTS
  • Third-party middleware maintained at OIS
      • RealServer, Luratech JPEG 2000 Server
  • Third-party rendering applications on users'
    desktops
      • Web browsers, RealAudio players, TIFF viewers,
        ZIP utilities

25
Involvement in broader preservation community
efforts
  • E-journal archiving
  • Technical metadata
  • Still images, audio, documents
  • METS (package for metadata and digital objects)
  • PDF/A
  • PREMIS (preservation metadata)
  • AIHT (repository interaction demonstration)
  • Registry of digital masters
  • Repository certification
  • Formats registry (UDFR)

26
4. DRS 2
27
DRS 2 changes
  • Why?
  • To better support digital preservation
  • To better support needs of DRS depositors,
    curators and collection managers

28
DRS 2 changes
  • New conceptual foundation
      • Objects, content models
  • User improvements
      • Opaque objects, new file formats, tools, guidance
  • A new approach to metadata
  • Increased preservation planning and activities

29
Objects
  • The DRS currently has only a file level
      • All management has to be done at the individual
        file level
  • Objects are aggregations of files
      • Page-turned object
      • Still image object
  • More intuitive unit for management, reporting and
    searching
      • Example: How many page-turned objects do I have
        in the DRS?
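
To make the object idea concrete, here is a small Python sketch of objects as aggregations of files, with a per-content-model count that answers the example question on this slide. The class name DigitalObject, its fields, and the sample identifiers are all hypothetical; they are not the DRS 2 data model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """An object groups the files that make up one logical item."""
    object_id: str        # hypothetical identifier
    content_model: str    # e.g. "page-turned", "still image"
    files: list = field(default_factory=list)


def count_by_content_model(objects):
    """Answer questions such as 'how many page-turned objects do I have?'"""
    return Counter(obj.content_model for obj in objects)


# Made-up sample holdings, for illustration only.
holdings = [
    DigitalObject("obj-1", "page-turned", ["p001.jp2", "p002.jp2", "ocr.txt"]),
    DigitalObject("obj-2", "page-turned", ["p001.jp2"]),
    DigitalObject("obj-3", "still image", ["master.tif", "deliverable.jp2"]),
]
print(count_by_content_model(holdings)["page-turned"])  # 2
```

With only a file level, the same question would require knowing which of the six files belong together; the object level records that grouping once.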

30
Content models
  • Types of objects
  • Example: audio content model

31
Support for opaque objects
  • A special content model
  • Allows files in any format
  • Digital equivalent of buying time at HD
  • Content can be minimally processed, or can be
    fully processed by depositors but not yet
    supported by the DRS
  • Must be intended for long-term preservation
  • Will receive some preservation services
  • Will be on a path to fuller DRS preservation

32
Support for new file formats
  • PDF
  • Audio
      • MP3, MP4/AAC
  • Drawings
      • AutoCAD
      • Adobe Illustrator
  • Video
  • What's next?

33
Deposit, management & delivery tools
  • Enhanced Batch Builder
      • Integrated with File Information Tool Set (FITS)
  • Enhanced DRS Web Admin
      • Better searching
      • Richer management and reporting
      • Ability to perform batch updates
  • File Delivery Service (FDS)
      • Created for PDF delivery
      • Delivers a file to the user's web browser

34
Future of http://hul.harvard.edu/ois/
35
Guidance & user community
  • New website for digital preservation
      • Formats central
      • Content models
      • DRS practices
      • HUL digital preservation projects
      • Emerging standards and best practices
      • Tools, services, registries
      • Resources & experts

36
A new approach to metadata
  • Moving towards community-standard schemas
      • PREMIS, MODS, MIX, textMD, etc.
  • Metadata files on the file system alongside
    content files
      • Object descriptor files
      • Preservation, rights, descriptive metadata
  • More reliance on embedded metadata
      • Automatic extraction at deposit time by FITS
      • Third-party delivery applications are becoming
        aware of file-embedded metadata

37
Increased preservation planning and activities
  • More granular format identification
  • Sub-file characterization
  • Preservation plans per content model
  • Digital first aid (content metadata)
  • Localization, migrations, normalizations
  • Technology watch
  • Virus checking

38
5. Open questions