Title: Data Curation and Digital Preservation: a view from the UK Part I
1 Data Curation and Digital Preservation: a view from the UK (Part I)
IASSIST, May 2004, Madison, Wisconsin
- Peter Burnhill & Robin Rice
- www.dcc.ac.uk
2 An Overview
- Part 1: Digital Curation Centre
- what it is (and is not)
- where it came from
- who's involved
- challenge of set-up & go
- Part 2: Digital Curation Across Disciplines
- Digital Preservation Approaches
- Data Curation Activities
- Lessons from & for the Social Sciences
- We must all be Data Curators now
3 UK Digital Curation Centre
- identified in Report commissioned by JISC Committee for Support of Research (Lord Macdonald, May 2003)
- Twin drivers
- Digital Preservation: ePublishing (DPC), eLearning
- Continuing Access: e-Science, data deluge, Research Council policies
- Call to set up DCC in JISC Circular 6/03, June 2003
- Ambitious & demanding remit
- Joint funding by JISC and e-Science Core Programme
- Funding for outreach, services & development
- Funding for research programme
- Task entrusted to Consortium of four partners
- award made Feb/March 2004
4 Terminology: Digital Curation
- actions needed to maintain and utilise digital data & research results over entire life-cycle
- for current and future generations of users
- Digital Curation
- Digital Preservation + Data Curation
- Digital Preservation
- long-run technological/legal accessibility & usability
- Data curation in science
- maintenance of body of trusted data to represent current state of knowledge in area of research
5 Terminology: e-Science
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it"
- John Taylor
- (former) Director-General of Research Councils,
- (UK) Office of Science & Technology
- e-Research now becoming adopted as a term that is inclusive of scholarship in Arts and Humanities and the Social Sciences
6 Terminology: e-Science
- Invention & exploitation of advanced computational methods
- to generate, curate and analyse research data
- From experiments, observations and simulations
- Quality management, preservation and reliable evidence
- to develop and explore models and simulations
- Computation and data at extreme scales
- Trustworthy, economic, timely and relevant results
- to enable dynamic distributed virtual organisations
- Facilitating collaboration with information and resource sharing
- Security, reliability, accountability, manageability and agility
Malcolm Atkinson, Director, National e-Science Centre.
7 Overall Aim of DCC
- continuing quality improvement in data curation & digital preservation
- Initial focus
- data as evidential base for scholarly conclusions
- role of data archiving & preservation as keys to reproducibility and reuse
- Wider context & remit
- worlds of scholarly communication & eLearning
8 Objectives
- vibrant research programme
- addressing the wider issues of digital curation
- Collaborative Associates Network of Data Organisations
- outreach for strong links across existing community of practice
- engagement with curators (individuals & organisations)
- service definition and delivery
- to evaluate tools, methods, standards and policies
- a repository of tools and technical information
- virtuous circle
- expertise, experience & requirements feed into the DCC research programme
9 What the DCC is not ...
- a national digital repository
- an attempt to impose "one size fits all"
- and to rubbish everybody's past efforts
10 Who's Involved: DCC Consortium Partners
- Four Consortium partner institutions
- University of Edinburgh - lead partner
- University of Glasgow (HATII)
- University of Bath (UKOLN)
- CCLRC (Rutherford and Daresbury Laboratories)
- Prior links via National eScience Centre (NeSC)
- jointly managed by Universities of Edinburgh & Glasgow
11 [Diagram labels: NDCC, CANDO, CCLRC, UKOLN, DELOS, DPC, DLI (US), NeSC, UofG, UofE]
12 developing the collaborative model
[Diagram labels: curation organisations, e.g. DPC; communities of practice, users; community support & outreach; service definition & delivery; Collaborative Associates Network of Data Organisations; management & admin support; research collaborators; research; development co-ordination; testbeds & tools; Industry; standards bodies]
13 Responsibilities across the DCC
- Them with titles
- Peter Burnhill, Director (Phase One)
- with Robin Rice, Phase One Project Co-ordinator
- EDINA & Data Library, University of Edinburgh
- Peter Buneman, Research Director (& PI on EPSRC grant)
- Informatics, University of Edinburgh
- Liz Lyon, Associate Director (Community Support & Outreach)
- UKOLN, University of Bath
- Seamus Ross, Associate Director (Service Definition & Delivery)
- HATII, University of Glasgow
- David Giaretta, Associate Director (Development)
- CCLRC
- Two significant & well known Ex Portfolio names
- Malcolm Atkinson, Director, NeSC
- Chris Rusbridge, Director, Information Services, UofGlasgow
14 Setting up the DCC
- Funding from the JISC began on 1 March 2004
- EPSRC Research funding begins on 1 September 2004
- expect to harvest early crop from extant research
- Phase One: Set-up
- from now until Launch of Centre in October 2004
- face2face meetings 20/21 March & 24/25 June
- drawing up programme of deliverables
- re-deploying & recruiting staff
- aim to have appointed full time director in time for Launch
15 What needs to be done
- Respond to policy imperatives
- twin aims: excellence in research & excellence in service
- international respect & national leadership
- meeting the needs of e-Science
- impact now and into the future
- Bridge across communities
- universities & research institutes
- scientific data tradition & document tradition
- different disciplinary perspectives
- engaging the information & computing sciences
- Develop a collaborative model
- Associates Network of Data Organisations
16 Key Themes for the DCC
- data as evidence
- for one or more designated communities
- archival responsibility
- at one or more institutional levels
- keen to invoke practices from archiving
- appraisal & retention/disposal
- logical & physical integrity & authenticity/security
- keen to locate research problems with Informatics and with Law
17 Research Agenda
- Aims
- evidence & curation as integrative activities
- usability & automation
- novel & visible research
- deliverables/testbeds
- Hot Topics
- annotation & provenance
- universal interest, wide subject, e.g. referencing
- data publishing
- metadata, Grid services, integration, security, optimisation
- archiving and appraisal
- process automation at ingest, curating change, scalability
- socio-economic and legal
- organisational dynamics, rights/responsibilities
- Reach out & listen - virtuous circle
18 Research & Development
- Research
- Annotation, Data integration and publication
- Appraisal and long-term preservation
- Socio-economic & legal context
- rights, responsibilities and viability
- Performance and Optimisation
- Development into Services
- Standards & Testbeds
- File Formats
- Registry of Metadata Standards
- Further topics
- Evolution of structure, Ontologies, Emulation
19 Services Development
- Turns Research into Products for Research that our communities can use with confidence
- tracking and testing tools and standards
- that are correct, usable, reliable, well documented
- e.g. for ingest, repository management, data exchange, ontologies
- working with tool developers wherever possible
- developing testbeds & interworking with other testbeds
- aim to gain leverage
- formats
- working with other projects worldwide
- using generic tools and techniques
- to develop strategies for emerging digital formats
- Metadata standards
- long-term viability of metadata
- Registries underpin this work to provide basis of Advisory Service
20 Early deliverables
- Website at www.dcc.ac.uk
- visit to learn of updates & progress
- and Presentations like this
- digitalcuration_at_ed.ac.uk
- contact us with offers of collaboration