A vision involving raw data archiving via local archives as a supplement to the existing processed data archives (PDB, CSD, ICDD etc) - PowerPoint PPT Presentation

Learn more at: http://www.iucr.org
Transcript and Presenter's Notes

1
A vision involving raw data archiving via local
archives as a supplement to the existing
processed data archives (PDB, CSD, ICDD etc)
John R. Helliwell, Brian McMahon, Tom Terwilliger
john.helliwell_at_manchester.ac.uk bm_at_iucr.org
terwilliger_at_lanl.gov
2
Options
  • Do nothing for ensuring raw data archiving
  • Do what we can, e.g. via centralised facilities'
    raw data archiving along with universities' own
    data archives, both as supplements to the
    processed data archiving at the CSD, PDB etc., or
    at the very least via personal web page links
  • Seek a blue skies solution where all raw data are
    compulsorily archived at centralised repositories

3
During the last year detailed options were
sketched out: firstly
  • At the Launch Meeting of the DDD WG in Madrid it
    was suggested that a pilot project involving
    digital object identifier (DOI) registrations of
    a test group of data sets could be established;
    this would be led by an SR facility that is
    keeping a raw data archive in any case.
  • This was enthusiastically supported and DLS
    agreed to take this forward with 100 MX data
    sets.
  • JRH in parallel continued to investigate the
    local University reprint repository archive
    option, which accepts data (in U. Manchester) for
    small data sets; this led to finding out that
    U. Manchester in any case was setting up a data
    archive for its researchers so as to satisfy
    funding bodies' requirements of its grant holders
    (launch expected September 2012).
  • The local University Data Archive would be the
    vehicle for locally measured diffractometer data
    sets and also perhaps those from SR and neutron
    Facilities that made it into publications by
    academics at that University.

4
During the last year detailed options were
sketched out: secondly
  • A draft proposal was also written by JRH
    exploring the possibility of Acta
    Crystallographica Section E: Structure Reports
    Online hosting raw data (the set of diffraction
    data images) for each structure
  • Preliminary analysis, in discussions with IUCr
    Journals in Chester, identified the major
    bottleneck as network bandwidth (Chester has
    2 x 2 Mbps, but there were also concerns about
    bandwidth limits on international pipes,
    especially to individual laboratories)
  • Also, building costs would be involved to upgrade
    a server room for higher-capacity storage,
    although preliminary estimates suggested
    per-article storage overhead could be sustainable
    within the journal's open-access charging model
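
To see why bandwidth was identified as the bottleneck, a rough transfer-time estimate can be sketched as below. Only the 2 x 2 Mbps figure comes from the slide; the 5 GB data set size and the function name are hypothetical, and the calculation assumes the link is fully used with no protocol overhead.

```python
def transfer_hours(dataset_gigabytes: float, link_megabits_per_s: float) -> float:
    """Hours needed to move one data set over a fully used link."""
    bits = dataset_gigabytes * 8e9                 # decimal gigabytes -> bits
    seconds = bits / (link_megabits_per_s * 1e6)   # megabits/s -> bits/s
    return seconds / 3600.0


# 2 x 2 Mbps as quoted on the slide; the 5 GB data set size is an assumption.
hours = transfer_hours(dataset_gigabytes=5.0, link_megabits_per_s=2 * 2.0)
print(f"{hours:.1f} h per data set")
```

Under these assumptions a single data set would occupy both lines for several hours, which is consistent with bandwidth rather than storage being the first limit reached.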

5
JRH with LK-B write an article with links to raw
data sets
  • Tanley, S. W. M., Schreurs, A. M. M., Helliwell,
    J. R. and Kroon-Batenburg, L. M. J. (2012).
    Experience with exchange and archiving of raw
    data: comparison of data from two
    diffractometers and four software packages on a
    series of lysozyme crystals. J. Appl. Cryst.
    Submitted.
  • Explores comparative metadata associated with
    different instruments, emphasising benefit of
    standard ontologies (e.g. imgCIF)
  • Demonstrates scientific usefulness of detailed
    data reanalysis

6
New reports appear from learned bodies
  • In addition to ICSU's Strategic Committee on Data
    Report
  • The Royal Society (June 2012) enthusiastically
    endorses the importance of access to data; their
    Committee defines data in its view as
  • and states: "For example, the annual cost of
    managing the world's data on protein structures
    in the world wide Protein Data Bank is less than
    1% of the cost of generating that data."
  • Their data definitions unfortunately seem to miss
    the distinction between processed data and raw
    data.

7
Is a Blue Skies option still out of the question?
  • One or more centralised global repositories might
    take on the raw data archiving?
  • The PDB has given a careful and detailed analysis
    at this Workshop.

8
Is the option of localised repositories (near to
where data are measured) secure yet?
  • CSynR has started a survey of SR facilities (8
    reported so far) suggesting that this is
    promising as an option, but each SR facility
    emphasised that they are not to be regarded as an
    archive. Neither
  • instantaneous delivery of data
  • nor provision of data sets certified to be 100%
    free of data corruption
  • could be guaranteed.
  • The universities' data archive experience, even at
    the most advanced in their planning (e.g.
    University of Manchester), is yet to be seen in
    practice, e.g. with respect to the two issues
    mentioned in point 1 above.

9
Possibilities for SR facility temporary
repositories
  • Most synchrotron facilities already maintain
    simple archives of users' data
  • Perhaps 99% access and availability is plenty
    (and better than nothing)
  • A simple approach:
  • Save raw data at SR, tagged with identifier(s).
    Optimally tag meta-data also. (Perhaps one DOI
    per dataset generated at this time and provided
    to user and stored in image headers)
  • Processing programs keep track of identifiers so
    that processed data is linked to raw data
  • On PDB deposition, the DOI is deposited. On
    publication it is listed.
  • PDB notifies SR, the flagged data are copied to a
    long-term storage location
  • Perhaps some day the PDB pulls this data in
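
The simple approach above can be sketched as a minimal data model. Every class, function, identifier format and path here is hypothetical and chosen only for illustration; in particular the "raw-id-N" strings stand in for real DOIs, and no actual facility or PDB interface is implied.

```python
from dataclasses import dataclass, field


@dataclass
class RawDataset:
    identifier: str          # stand-in for a DOI issued at collection time
    location: str            # where the images currently live at the facility
    long_term: bool = False  # True once copied to long-term storage


@dataclass
class FacilityArchive:
    datasets: dict = field(default_factory=dict)

    def save(self, location: str) -> str:
        """Save raw data at the facility and return the identifier given to the user."""
        identifier = f"raw-id-{len(self.datasets) + 1}"  # hypothetical format
        self.datasets[identifier] = RawDataset(identifier, location)
        return identifier

    def promote(self, identifier: str) -> None:
        """On notification of deposition, flag the data as copied to long-term storage."""
        self.datasets[identifier].long_term = True


@dataclass
class ProcessedData:
    name: str
    raw_identifiers: list    # carried along by the processing programs


def deposit(processed: ProcessedData, archive: FacilityArchive) -> list:
    """Deposit processed data with its raw-data identifiers and notify the facility."""
    for identifier in processed.raw_identifiers:
        archive.promote(identifier)
    return processed.raw_identifiers


archive = FacilityArchive()
raw_id = archive.save("/facility/visit-001/images")   # hypothetical path
processed = ProcessedData("lysozyme-merged", [raw_id])
deposited = deposit(processed, archive)
```

The point of the sketch is that the identifier is created once, at data collection, and is then passed unchanged through processing and deposition, so the facility only needs to retain long term the data sets that were actually deposited.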

10
Might we still need additional fallback
positions?
  • Corresponding authors set up web links to their
    data sets that underpin their publications.
  • These may or may not be DOI linked; such a
    requirement would be difficult to enforce,
    although journals could strongly recommend it.
  • How would such a method for data archiving and
    access by readers be kept up to date, e.g. in the
    event of an author retiring (or what to do after
    their death)?

11
Conclusions
  • There is an enthusiasm and encouragement to
    archive more than derived or processed data in
    many areas of science besides our own.
  • The crystallographic community prides itself on
    making its processed data accompany its
    publications; indeed it has been obligatory these
    last 10 years or so.
  • We have three practical options in the near
    future to extend these principles to our raw
    data:
  • via the local Data Archive
  • via synchrotron data storage
  • or via the corresponding author setting up a
    personal link to datasets underpinning
    publications on their personal websites.

12
So, we suggest a proposal
  • We suggest that we adopt the above three
    practical options to make feasible a
    recommendation to the IUCr Executive Committee
    that
  • Authors should provide a permanent and prominent
    link from an article to the raw data sets
    underpinning a journal publication
  • with a view to making this a formal requirement
    on authors at such time as the community has
    adopted raw data deposition as a routine
    procedure.