Digital Curation and Preservation: Defining the research agenda for the next decade

Provided by: dcc8

1
Digital Curation and Preservation: Defining the
research agenda for the next decade
Warwick Workshop, 7/8 November 2005
Breakout Group: Distributed Architectures
Scope and definition: Technologies and services
for digital curation
2
Where are we now?
  • Domain-based archive systems: lifetime not
    guaranteed past the life of the organisation.
  • Centralised, distributed and scalable storage
    systems (TB-PB). Scalability beyond single file
    systems, to 10s of millions.
  • Fair interoperability (1 to few), but clunky user
    interfaces and tools despite a proliferation of
    user access methods. Search and discovery: good
    tools exist but are not widespread or well
    integrated.
  • Teetering on the brink of federation. Systems are
    looking towards virtualisation, but it is
    difficult to build future-proof solutions
    (e.g. XML).
  • Very painful, costly and regular migrations of
    hardware and software. Format migrations can be
    managed, but not as standard. The next complete
    service?
  • Ingestion standards, certification and trust
    are also weak areas.

3
5 years' time?
  • Have a solid certification standard with an
    accreditation system.
  • Heading towards massively scalable architectures
    (10s of PB; centralised services vs distributed),
    which include improved migration services,
    workflow processes (tools) and error levels.
  • Keep the domain-based approach, but with more
    widely dispersed search and discovery, and ease
    the clunky user interfaces (standardisation).
  • Progress on automated validation and on
    ingestion issues.
  • Automated format migration as standard. Improved
    trust (via virtualisation?).
  • Start to feel the effect of high-level policies
    beyond the organisation.
  • Partially crack the "Representation Information
    for my designated community" problem.
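
One small piece of the automated validation at ingestion mentioned above can be sketched as a fixity check: the archive recomputes a checksum for each incoming object and compares it with the digest declared by the depositor. This is only a minimal illustration, not something taken from the slides. The function names, the choice of SHA-256, and the idea that the depositor supplies a digest in a manifest are all assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def validate_on_ingest(data: bytes, declared_sha256: str) -> bool:
    """Accept an object only if its recomputed digest matches the declared one.

    The declared digest is assumed (hypothetically) to come from a
    depositor-supplied manifest; whitespace and letter case are normalised.
    """
    return sha256_hex(data) == declared_sha256.strip().lower()


# Depositor side (assumed): compute the digest before transfer.
payload = b"example archival object"
manifest_digest = sha256_hex(payload)

# Archive side: an intact object passes, a corrupted one is rejected.
print(validate_on_ingest(payload, manifest_digest))
print(validate_on_ingest(payload + b"x", manifest_digest))
```

The same comparison could in principle be rerun after each hardware or software migration to confirm that the stored bits are unchanged. It says nothing about whether a format migration preserved meaning.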

4
10 years' time?
  • Massively scalable systems (100s of PB to exabytes)
  • Seamless migration tools for hardware and
    software systems
  • Seamless international interoperability
  • Natural language and fully automated
    queries/discovery
  • Federations of federations, including
    cross-domain
  • Knowledge and management virtualisation
  • Fully crack the "Representation Information for
    my designated community" problem
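
To get a feel for what the scale targets imply, the step from the 5-year figure (10s of PB) to the 10-year figure (100s of PB to exabytes) can be turned into a compound annual growth factor. The sketch below is a rough worked calculation under stated assumptions: "10s of PB" is read as 10 PB, "100s of PB" as 100 PB, and "exabytes" as 1000 PB (1 EB). The slides give only orders of magnitude, so these endpoints are illustrative.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth factor needed to go from start to end."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years)


START_PB = 10.0   # assumed reading of "10s of PB" at the 5-year mark
YEARS = 5         # from the 5-year target to the 10-year target

# Assumed readings of "100s of PB" and "exabytes" at the 10-year mark.
for end_pb in (100.0, 1000.0):
    factor = implied_annual_growth(START_PB, end_pb, YEARS)
    print(f"{START_PB:.0f} PB -> {end_pb:.0f} PB in {YEARS} years: "
          f"x{factor:.2f} per year")
```

Under these assumptions, capacity would need to grow by a factor of roughly 1.6 to 2.5 every year, which is why the growth-rate dependency listed later matters.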

5
How to get there? 5 years
  • Certification standards, an accreditation system,
    a certification process, etc.
  • Incremental prototypes for virtualisation of
    layers
  • Check domain specificity
  • Addition of services and methods

6
How to get there? 10 years
  • Abstraction techniques
  • Semantics research
  • Modelling
  • Case studies
  • Incremental prototypes with significant lifetimes
    and operational loads - rather than big-bang
    modelling

7
What are the right research questions?
  • Process
      • Management: What are the preservation
        policies? How are policies tuned to be
        cost-effective? What is acceptable loss? How
        do you define the chain of responsibilities?
      • Trust: How can I trust an external
        organisation? Or alternatively, what ways do
        we have to engineer away (around) the trust
        issues? (Isn't this just risk management?)
      • Workflow: What are the tools required to
        evaluate the cost-effectiveness of proposed
        workflows? System integration?
  • Information
      • Knowledge: Focus on automated ingest:
        maximise the value of ingest, minimise
        subsequent mining. E.g. evolution of
        ontologies.
      • Data: Need to model metadata over time.
        Time-dependent data (genomics is constantly
        evolving).
  • Hardware: Holographic storage from 2008? How
    real is this?

8
Dependencies
  • Growth rate of storage, CPU, bandwidth assumed to
    hold (but worry that it might not)
  • Need to maintain balance between CPU and I/O
  • Storage
  • Depends on commercial opportunities
  • Bit-error rates demanded commercially may not be
    adequate for exabyte-scale data
  • Related to commercial plans
  • More effective access to archives must go hand in
    hand with preservation requirements and available
    resource constraints
  • Preserving data beyond the organisation's lifetime
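
The concern about bit-error rates at exabyte scale comes down to simple arithmetic: the expected number of corrupted bits is the number of bits multiplied by the per-bit error rate. The sketch below works this through for one terabyte and one exabyte. The error rate of 1 in 10^15 bits is an illustrative assumption, not a figure from the slides, and errors are assumed to be independent.

```python
import math

BITS_PER_BYTE = 8
TERABYTE = 10**12   # bytes
EXABYTE = 10**18    # bytes

# Illustrative assumption only: one bad bit per 10^15 bits handled.
ASSUMED_BER = 1e-15


def expected_bit_errors(num_bytes: int, bit_error_rate: float) -> float:
    """Expected number of bad bits, assuming independent bit errors."""
    return num_bytes * BITS_PER_BYTE * bit_error_rate


def prob_at_least_one_error(num_bytes: int, bit_error_rate: float) -> float:
    """Poisson approximation: P(at least one error) = 1 - exp(-expected)."""
    return -math.expm1(-expected_bit_errors(num_bytes, bit_error_rate))


for label, size in (("1 TB", TERABYTE), ("1 EB", EXABYTE)):
    expected = expected_bit_errors(size, ASSUMED_BER)
    probability = prob_at_least_one_error(size, ASSUMED_BER)
    print(f"{label}: expected bad bits = {expected:g}, "
          f"P(at least one) = {probability:.4f}")
```

At the assumed rate, a terabyte has under a 1% chance of containing any bad bit, while an exabyte would be expected to contain thousands. An error rate that is acceptable commercially at smaller scales may therefore not be adequate for an exabyte archive.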