Title: Risk Assessment and Cost Appraisal Delos - November 15-16 Academia Nazionale Dei Lincei - Rome
1Risk Assessment and Cost Appraisal
Delos - November 15-16, Academia Nazionale Dei Lincei - Rome
- Rory McLeod, Digital Preservation Manager
- The British Library, London
2Overview
- Risk assessment
- Background
- Components
- Key findings and activities 2007/08
- Cost appraisal
- Lifecycle management
- LIFE1
- LIFE2
3Content analysis from 2006 - DPC Mind the Gap
- Over 200 terabytes of data, growing by over 50 terabytes a year
- Majority is from the sound archive, however
- Manuscripts 1.5 terabytes
- Asia, Pacific and Africa 9.5 terabytes
- European and American 0.5 terabytes
- Commercial services 5 terabytes
- Voluntary deposit of electronic publications 1.5 terabytes
- Newspapers potentially 70 terabytes at project conclusion
- Microsoft digitisation 30 terabytes
- Sound recordings 12 terabytes
- A wide variety of formats are represented in the data: the most common formats are found, but there are also smaller amounts of rare and proprietary data
- Update: based upon the risk assessment, we estimate the total size to be closer to 300 TB.
4Background - Risk assessment 2007
- The objective of the Digital Preservation Team is to address the risk of deterioration of digital material, from short-term access through to long-term preservation.
- Building on a 2003 study
- When we examined the 2003 risk assessment we concluded
- Having the object isn't enough (CDR)
- Knowing the format of the object isn't enough (EPS)
- You need software to use it, and a computer to run it on (Postscript Parser)
- The functionality and access of the object can intimately depend on the details of the environment, most of which we don't have (operating system, hardware requirements)
- Taking those basic concepts a little further...
5Background - Risk assessment 2007
- Physical media deterioration
- The lifetimes of physical media can be measured in years (or even months, e.g. recordable CD/DVD)
- Unlike books, which can be kept for centuries in the right conditions
- Technical obsolescence
- 3.5" floppy disk drives used to be ubiquitous; now only a few computers have them
- File format obsolescence
- Keeping the bitstream isn't enough: we need to understand it
- Many file formats are undocumented: we can't understand the files, and need the software
- Software gets abandoned (who uses WordStar any more?) and new versions can be incompatible with old files
- Environmental obsolescence
- Keeping the software isn't enough: we need to run it
- But old hardware doesn't work, or isn't available
6Background - Risk assessment 2007
- So our starting point
- Identify the digital assets currently held
- Identify the environmental requirements of those assets
- Assess the risks jeopardizing the use of those assets
- React to those risks (save those assets to which access no longer exists or will soon be lost)
- Proactively respond to those risks (prevent assets from becoming inaccessible in the future)
7Background - Risk assessment 2007
- This produced an accurate and detailed digital holdings list
- Be as near to exhaustive as possible
- Be detailed
- Physical formats
- Disk/file-system formats
- File formats
- Operating system requirements
- Application requirements
Not covered in 2003
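One way to picture an entry in such a holdings list is as a small record with one field per level of detail named above. This is only a sketch: the class name, field names and example values are illustrative assumptions, not the structure actually used for the BL holdings list.

```python
from dataclasses import dataclass, field


@dataclass
class HoldingRecord:
    """One entry in a digital holdings list (all names here are hypothetical)."""
    collection_area: str
    description: str
    physical_format: str        # e.g. "CD-R"
    filesystem_format: str      # e.g. "ISO 9660"
    file_formats: list[str] = field(default_factory=list)
    os_requirements: list[str] = field(default_factory=list)
    application_requirements: list[str] = field(default_factory=list)

    def missing_details(self) -> list[str]:
        """Return the names of the list-valued fields that are still empty."""
        checks = {
            "file_formats": self.file_formats,
            "os_requirements": self.os_requirements,
            "application_requirements": self.application_requirements,
        }
        return [name for name, value in checks.items() if not value]


# Hypothetical example: the file format is known, the environment is not.
record = HoldingRecord(
    collection_area="Manuscripts",
    description="Author's working files",
    physical_format="CD-R",
    filesystem_format="ISO 9660",
    file_formats=["EPS"],
)
print(record.missing_details())
```

A record that still reports missing details is a candidate for a follow-up visit to the collection area.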
8Background - Risk assessment
- From this holdings list, we have performed a risk assessment to
- Enumerate the risks faced
- Evaluate the likelihood and impact of each risk
- Rank holdings according to risk
- Perform risk-based triage on holdings
- And longer term, from the assessment and ranking we will
- Prioritise ingest into DOM
- Write preservation plans and take preservation actions to target the highest-risk material
- Possibly migrate it to less risky file formats
- Preserve software environments (emulators etc.)
- Guide future ingest
- Determine a set of preferred long-term preservation formats
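The evaluate, rank and triage steps can be sketched in a few lines. The slides do not say how likelihood and impact were combined, so the multiplication used here is the common risk-matrix convention and an assumption, as are the numeric scales, the threshold and the example holdings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEstimate:
    holding: str
    risk: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 0 (none) to 5 (cataclysmic)

    @property
    def score(self) -> int:
        # Assumption: score = likelihood x impact.
        return self.likelihood * self.impact


def rank_holdings(estimates):
    """Return (holding, worst score) pairs, highest risk first."""
    worst = {}
    for e in estimates:
        worst[e.holding] = max(worst.get(e.holding, 0), e.score)
    return sorted(worst.items(), key=lambda item: (-item[1], item[0]))


def triage(ranked, threshold):
    """Split ranked holdings into 'act now' and 'monitor' lists."""
    urgent = [holding for holding, score in ranked if score >= threshold]
    monitor = [holding for holding, score in ranked if score < threshold]
    return urgent, monitor


# Hypothetical estimates for two holdings.
estimates = [
    RiskEstimate("Holding A", "Media degradation", 4, 5),
    RiskEstimate("Holding A", "Software obsolescence", 2, 3),
    RiskEstimate("Holding B", "File format obsolescence", 3, 3),
]
ranked = rank_holdings(estimates)
print(ranked)
print(triage(ranked, threshold=15))
```

Ranking each holding by its worst single risk is a design choice; summing or averaging the scores would be equally easy to substitute.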
9Components - Risk assessment 2007
- AS/NZS 4360:2004 risk standard
- Follows on from 2003 study
- Methodology is split into context, identify, analyse, evaluate and treat
- A new value scale for BL holdings was used, plus DRAMBORA, an established risk toolkit, for impact (cataclysmic to none)
- Representative content was analysed from all areas of the collections, the first time this has been done.
- 23 different risks were identified; these were gathered into 6 direct and 2 indirect risks
- Media degradation, media obsolescence, file format obsolescence, hardware obsolescence, operating system file system obsolescence, software obsolescence
- Poor policy (cataloguing, metadata), poor policy other (handling, training)
10Components - Risk assessment 2007
- The AS/NZS 4360:2004 Risk Management standard defines a seven-step approach to risk management
- Communicate and consult
- Communicate and consult with internal and external stakeholders as appropriate at each stage of the risk management process and concerning the process as a whole.
- Establish the context
- Stakeholders are identified, and the objectives of the stakeholders and the organization as a whole are established.
- Identify the risks
- In this stage, the risks (that is, what can go wrong) are enumerated and described. We used a combination of industry analysis and real-life scenarios.
11Components - Risk assessment 2007
- Analyse the risks
- This step covers the evaluation of the impact of the risks, and the likelihood of those risks. The evaluation may be qualitative or quantitative.
- Evaluate the analysis
- At this stage, negligible risks might be discarded (to simplify analysis), and evaluations (especially qualitative evaluations) adjusted.
- Treat the risks
- The options to address the risks are identified, the best option chosen, and implemented. This may include taking no action if no risk is sufficiently serious.
- This step was felt to be beyond the remit of this assessment project.
12Components - Risk assessment 2007
- Monitor and review
- It is necessary to monitor the effectiveness of all steps of the risk management process. This is important for continuous improvement. Risks and the effectiveness of treatment measures need to be monitored to ensure changing circumstances do not alter priorities.
- The assessment also uses the impact scale devised in the DRAMBORA methodology (Digital Repository Audit Method Based on Risk Assessment, http://www.repositoryaudit.eu/).
- Summary
- The first part of the analysis was to create an inventory of the digital assets. Each collection area was visited, staff were interviewed, and a partial audit of their digital material was conducted. This provided an indicative sample of the current state of play within the Library. It is likely that continued annual updating of this list will form part of the long-term maintenance of the analysis.
13Components - Risk assessment 2007
- Broad results were returned, which can be split into technical and policy; the headlines are
- 12 of the 13 case studies returned results consistent with the highest category of risk identified (Media Degradation)
- Secondary risks associated with software and hardware are less severe, but without addressing Media Degradation all data is at the same high level of risk.
- Failure rates for disks within the BL collections have reached a high level (up to 3)
- No central store or service for this digital content
- The proposed timeframes for ingest of this material mean that an interim solution must now be considered to safeguard this material (and prepare it for ingest)
- There is a lack of awareness of the fragility of these collection items across the BL
- There is a need for training in both handling and data stewardship skills across the collection areas
14Components - Risk assessment 2007
Risk ranking Risk Access type jeopardized
8 Media degradation Bit-stream
7 Media obsolescence Bit-stream
6 File format obsolescence File/Semantic
5 Hardware obsolescence File/Semantic
4 Operating system file system obsolescence File/Semantic
3 Software obsolescence File/Semantic
2 Poor policy (improper cataloguing, metadata) Semantic
1 Poor policy (other) Semantic/File/Bit-stream
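The ranking table above works naturally as a lookup: given the risks that apply to a holding, find the highest-ranked one and the type of access it jeopardizes. The sketch below copies the table values from this slide; the function name and the example input are assumptions made for illustration.

```python
# Risk name -> (risk ranking, access type jeopardized), as in the table above.
RISK_RANKING = {
    "Media degradation": (8, "Bit-stream"),
    "Media obsolescence": (7, "Bit-stream"),
    "File format obsolescence": (6, "File/Semantic"),
    "Hardware obsolescence": (5, "File/Semantic"),
    "Operating system file system obsolescence": (4, "File/Semantic"),
    "Software obsolescence": (3, "File/Semantic"),
    "Poor policy (improper cataloguing, metadata)": (2, "Semantic"),
    "Poor policy (other)": (1, "Semantic/File/Bit-stream"),
}


def most_serious(risks):
    """Return (risk, ranking, access type) for the highest-ranked risk given."""
    top = max(risks, key=lambda name: RISK_RANKING[name][0])
    ranking, access_type = RISK_RANKING[top]
    return top, ranking, access_type


# Hypothetical holding exposed to two of the eight risks.
print(most_serious(["Software obsolescence", "Media obsolescence"]))
```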
15Risk assessment - Final Prioritisation
16Components - Risk assessment 2007
- BL preservation value system
- Our obligation to preserve the material
- Estimates of the cost/effort to mitigate the risks
- Estimates of the resource available to the Digital Preservation Team
- Estimates of the cultural significance and value of the collection
- The commercial significance and value of the collection
- The need for further analysis of the collection to inform future preservation activities
- Reader and researcher needs
- Interest and demand
18Key findings - Risk Assessment 2007
- DPT needs to create and implement a policy that deals with all digital content consistently
- This reduces the variations seen in how digital material is cared for
- BL needs to move from at-risk physical media to online hard disk-based managed storage.
- This addresses media deterioration, physical damage, environmental damage, and media obsolescence, and is believed to be the best long-term storage mechanism option available
- This also enhances manageability of the digital collection
- Where migration to hard disk is not immediately possible, move to climate-controlled (etc.) storage to ensure that the physical media last as long as possible (and back up)
- This reduces the problems due to media deterioration, physical damage, and environmental damage
- Failure rates for disks within the BL collections have reached unacceptably high levels (up to 3) for hand-held media.
19Activities to mitigate 07/08 - Risk Assessment 2007
- Monitor and review
- DPT will use a continuous improvement approach, constantly reducing the level of risk
- Annual update to the risk assessment to continuously improve the condition of the collection-based digital objects
- Annual identification of resulting actions to mitigate risks
- Management of the digital preservation prioritisation table
- Key performance indicators to be drawn from the risk factors within the prioritisation table, to be monitored by the digital preservation steering group. (Ideally all risk factors should be in a continuous process of reduction)
- Resource Plan
- DPT will take responsibility for this effort by writing a resource plan to establish next stage activity. This will involve people, equipment, storage and policy issues.
- Establishment of the British Library centre for digital preservation based upon this risk work.
- This work is already underway.
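The idea that every risk factor should be in a continuous process of reduction can be checked mechanically once annual scores exist. The following is a minimal sketch: the data layout, the function name and the scores are hypothetical, since the slides do not describe how the prioritisation table is stored.

```python
def risks_not_reducing(history):
    """Flag risk factors whose latest annual score did not fall.

    `history` maps a risk factor to its annual scores, oldest first
    (an assumed layout). A factor is flagged when its latest score is
    not lower than the previous year's.
    """
    flagged = []
    for risk, scores in history.items():
        if len(scores) >= 2 and scores[-1] >= scores[-2]:
            flagged.append(risk)
    return sorted(flagged)


# Hypothetical scores from three annual updates of the risk assessment.
history = {
    "Media degradation": [20, 16, 12],
    "Software obsolescence": [6, 6, 6],
}
print(risks_not_reducing(history))
```

Flagged factors are the ones a steering group would want to look at first.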
20The cost of digitisation and preservation: The LIFE Project
21Cost appraisal overview
- What is the LIFE Project?
- LIFE1 and LIFE2
- LIFE Models
- Burney Case Study
- Benefits
- Further Information
22Lifecycle Information for E-literature
- Project phases
- LIFE1 (12 months)
- LIFE2 (18 months)
23LIFE starts to answer the question
- What is the long-term cost of preserving digital material?
24Why use lifecycle costing?
- Enables evaluation of all the financial commitments for an item in a collection
- Important for digital collections, where many costs are largely unknown
25Lifecycle management - aims
- Better understanding of the digital lifecycle
- Plan and prepare for digital preservation activities
- Evaluate and improve efforts
- Compare analogue and digital
26LIFE1 project
- Literature Review
- Economic Lifecycle Model
- Generic Preservation Model
- Case Studies
- International Conference
27LIFE1 Case Studies
- e-Journals
- Web Archiving
- Voluntary Deposit
28LIFE2
29Aim of LIFE2
- To evaluate, refine and further develop the techniques developed in phase one of LIFE
30LIFE2 deliverables
- Economic Evaluation of LIFE1
- Revision of the LIFE Model
- Version 1.1 (October 2007)
- Version 2 (Summer 2008)
- Updated Preservation Model (Summer 2008)
- Final report
- End of project conference
31The LIFE Model v1.1
Lifecycle stages, each with its lifecycle elements:
- Access: Access Provision, Access Control, User Support
- Content Preservation: Preservation Watch, Preservation Planning, Preservation Action, Re-ingest
- Bit-stream Preservation: Repository Admin, Storage Provision, Refreshment, Backup, Inspection
- Metadata Creation: Re-use Existing Metadata, Metadata Creation, Metadata Extraction
- Ingest: Quality Assurance, Deposit, Holdings Update, Reference Linking
32LIFE Model v1.1 Non-lifecycle Elements
Non-lifecycle stages, each with its non-lifecycle elements:
- Management and Administration: Management, Administration
- Systems / Infrastructure: Repository Software
- Economic Adjustments: Inflation, Discounting
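To show how stage costs and the two economic adjustments might fit together, the sketch below sums per-stage costs year by year, inflates them, and discounts them back to year 0. This is standard present-value arithmetic offered as an assumption; the slides do not give the LIFE Model's own formula, and the function name, rates and figures are hypothetical.

```python
def lifecycle_cost(annual_costs, inflation=0.0, discount=0.0):
    """Total lifecycle cost expressed in year-0 terms.

    `annual_costs` is a list with one dict per year (year 0 first), each
    mapping a lifecycle stage name to its cost in today's prices. Each
    year's total is inflated to a nominal figure, then discounted back.
    """
    total = 0.0
    for year, stages in enumerate(annual_costs):
        nominal = sum(stages.values()) * (1 + inflation) ** year
        total += nominal / (1 + discount) ** year
    return total


# Hypothetical two-year lifecycle using stage names from the LIFE Model.
costs = [
    {"Ingest": 100.0, "Metadata Creation": 50.0},
    {"Bit-stream Preservation": 10.0, "Access": 5.0},
]
print(lifecycle_cost(costs))
print(lifecycle_cost(costs, inflation=0.03, discount=0.035))
```

With both rates at zero the function reduces to a plain sum of all stage costs.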
33Generic LIFE Preservation Model
- The GPM predicted large costs and much activity; the challenge is reducing both.
34Generic LIFE Preservation Model
- Preservation cost of n objects of a particular format for the period 0 to t, e.g. 200,000 objects of the GIF format for a period of 10 years
- Tech Watch: monitoring formats and software for obsolescence, preservation planning, updating metadata
- Frequency of action: the number of preservation actions within the time period calculated
- Preservation action: cost of preservation tool, perform preservation action, update object and event metadata, Q/A
35Complexity of file formats
Category Complexity Examples
Simple 0.1 ASCII, Unicode
Bitmap 0.2 JPEG, GIF
Mark-up 0.3 XML, HTML
Vector 0.4 EMF, Draw
Multimedia 0.6 MPEG3, WAV
Document 0.8 Word, PDF
Complex 1 Oracle database dump
- Size
- Complexity
- Proprietary
- Open
- Standardised
- Diagram labels: Format Complexity, Cost of Preservation tool, Perform preservation action, Update metadata, Q/A
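The complexity table can be combined with the model's named components (Tech Watch, frequency of action, preservation action) in a rough calculation. The slides do not reproduce the GPM formula, so the way the terms are combined below is purely an assumption for illustration, as are all parameter names and cost figures; only the complexity values come from the table.

```python
import math

# Complexity values copied from the table above.
FORMAT_COMPLEXITY = {
    "Simple": 0.1,
    "Bitmap": 0.2,
    "Mark-up": 0.3,
    "Vector": 0.4,
    "Multimedia": 0.6,
    "Document": 0.8,
    "Complex": 1.0,
}


def preservation_cost(n_objects, category, years, tech_watch_per_year,
                      n_actions, tool_cost, action_cost_per_object):
    """Hypothetical cost of preserving n objects of one format category.

    Assumed structure (not the published GPM formula):
      tech watch over the period
      + number of actions x (tool cost + per-object work scaled by complexity)
    """
    complexity = FORMAT_COMPLEXITY[category]
    per_action = tool_cost + n_objects * action_cost_per_object * complexity
    return tech_watch_per_year * years + n_actions * per_action


# The slide's example: 200,000 GIF (Bitmap) objects over 10 years.
# Every cost figure below is invented for the sake of the example.
cost = preservation_cost(
    n_objects=200_000, category="Bitmap", years=10,
    tech_watch_per_year=1_000.0, n_actions=2,
    tool_cost=5_000.0, action_cost_per_object=0.5,
)
print(round(cost, 2))
assert math.isclose(cost, 60_000.0)
```

Under this assumed structure, the same collection in the Document category (0.8) would cost four times as much per action in per-object work as in the Bitmap category (0.2).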
36LIFE2 Case Studies
- Institutional Repositories
- Primary Data
- Digitised Newspapers
37The Burney Collection
- Purchased by the British Library in 1818 for £13,500
- 1,100 volumes of the earliest known newspapers
- 1,000,000 pages from the 17th, 18th and 19th centuries
- Re-scanning or re-microfilming is not possible
- Microfilmed in the 1970s
- Digitisation started in 1995-96 and ran until 2004
38Questions that arise from Burney
- Comparing digital and analogue lifecycles
- What is the lifecycle cost to an institution of producing digitised surrogates?
- What are the key preservation issues common across digitisation projects of differing scales?
39Benefits of LIFE
- Assess the financial commitment for acquiring or creating new digital materials
- More effective planning for preservation activities
- Comparison of digital lifecycles across collections
- Evaluation and optimisation of existing digital lifecycles
- Prediction of the future cost of digital preservation
40LIFE Website Blog
- Website: www.life.ac.uk
- LIFE Blog: www.life.ac.uk/blog
41Thanks and Acknowledgements
- Thanks for your attention.
- Risk Assessment 2007 (Peter Bright and Paul Wheatley)
- LIFE Team (Paul Ayris, Helen Shenton, Paul Wheatley and Richard Davies)