Title: Linking research papers and research data: possibilities for a generic solution
1. WWW2006 repositories workshop
- Linking research papers and research data: possibilities for a generic solution
2. Outputs & benefits
Outputs
- The identification of
  - workflows, norms and perceived problems in the use of source and output repositories
  - common attributes across disciplines
- A generic technical specification for functional enhancements to source and output repositories, identified from a survey of active researchers
- A pilot system that demonstrates the linking of holdings in a source repository (the UK Data Archive) to research papers stored in output repositories
Benefits
- The ability more conclusively to track the use and influence of one's published research
- A structured means of surveying research publications and their associated source data across an entire discipline or within a specific research theme
- An environment with added value: output repositories that link to their sources and source repositories that link to their outputs will expand the opportunities for dissemination of research and scholarship.
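The two-way linking of source holdings and research papers described above can be pictured as a small index that records each link once and answers queries in both directions. This is a minimal sketch under stated assumptions, not the pilot system's implementation: the class name `LinkIndex` and the identifiers are hypothetical and used for illustration only.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical two-way index between source datasets and output papers."""

    def __init__(self):
        self._outputs_by_source = defaultdict(set)
        self._sources_by_output = defaultdict(set)

    def link(self, source_id, output_id):
        # Record the link once; it becomes visible from both directions.
        self._outputs_by_source[source_id].add(output_id)
        self._sources_by_output[output_id].add(source_id)

    def outputs_for(self, source_id):
        # Papers that draw on a given dataset (source -> output).
        return sorted(self._outputs_by_source.get(source_id, ()))

    def sources_for(self, output_id):
        # Datasets that underpin a given paper (output -> source).
        return sorted(self._sources_by_output.get(output_id, ()))


index = LinkIndex()
index.link("dataset:example-001", "paper:example-A")  # made-up identifiers
index.link("dataset:example-001", "paper:example-B")
print(index.outputs_for("dataset:example-001"))
print(index.sources_for("paper:example-B"))
```

Keeping both directions in one structure means a link entered by either repository is immediately usable by the other, which is the added value the benefits list points to.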
3. Constituency
- Seven scientific disciplines surveyed
  - Archaeology, Astronomy, Biochemistry, Biosciences, Chemistry, Physics, Social Policy
- Academic researchers (staff & PGs), independent researchers, government
- 3,700 e-mail invitations despatched
- 377 online respondents (10%)
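4. Response rates

| Discipline | Respondents | % of respondents |
|---|---|---|
| Astronomy | 64 | 16.9 |
| Archaeology | 63 | 16.7 |
| Physics | 63 | 16.7 |
| Social Policy | 62 | 16.5 |
| Biochemistry | 44 | 11.7 |
| Biosciences | 42 | 11.1 |
| Chemistry | 39 | 10.4 |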
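The overall response rate and the per-discipline shares follow directly from the counts reported on the slides. The short sketch below recomputes them; the variable names are illustrative, and shares recomputed from the raw counts may differ from the quoted figures in the last decimal place because of rounding.

```python
# Counts taken from the slides: 3,700 invitations, respondents per discipline.
invitations = 3700
respondents_by_discipline = {
    "Astronomy": 64,
    "Archaeology": 63,
    "Physics": 63,
    "Social Policy": 62,
    "Biochemistry": 44,
    "Biosciences": 42,
    "Chemistry": 39,
}

total = sum(respondents_by_discipline.values())
response_rate = 100 * total / invitations

print(f"{total} respondents, {response_rate:.1f}% of invitations")
for discipline, count in respondents_by_discipline.items():
    # Share of all respondents contributed by this discipline.
    print(f"{discipline}: {100 * count / total:.1f}%")
```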
5. Endorsement - 1
The value of direct links from source to output data

| Rating | University academic staff | University research assistant | PG student | Contract researcher | Independent researcher | Other | Totals |
|---|---|---|---|---|---|---|---|
| Significant advantage | 85 | 18 | 33 | 11 | 2 | 26 | 175 |
| Useful | 78 | 9 | 41 | 5 | 4 | 9 | 146 |
| Interesting | 24 | 4 | 5 | 3 | 0 | 5 | 41 |
| Of no interest | 9 | 0 | 0 | 0 | 0 | 1 | 10 |
| Not sure | 7 | 0 | 7 | 0 | 1 | 2 | 17 |
| Other | 1 | 1 | 0 | 0 | 0 | 1 | 3 |
| Totals | 204 | 32 | 86 | 19 | 7 | 44 | 392 |
6. Endorsement - 2
The value of direct links from output to source data

| Rating | University academic staff | University research assistant | PG student | Contract researcher | Independent researcher | Other | Totals |
|---|---|---|---|---|---|---|---|
| Significant advantage | 83 | 19 | 39 | 12 | 3 | 19 | 175 |
| Useful | 84 | 9 | 35 | 5 | 3 | 15 | 151 |
| Interesting | 16 | 2 | 6 | 1 | 1 | 3 | 29 |
| Of no interest | 7 | 0 | 0 | 0 | 0 | 1 | 8 |
| Not sure | 2 | 0 | 2 | 0 | 0 | 1 | 5 |
| Other | 5 | 1 | 1 | 0 | 0 | 2 | 9 |
| Totals | 197 | 31 | 83 | 18 | 7 | 41 | 377 |
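One way to summarise the two endorsement tables is the share of responses that rated the links as either "Significant advantage" or "Useful". Grouping those two ratings as "positive" is an assumption made here for illustration, not a measure defined in the survey; the function name `positive_share` is likewise hypothetical.

```python
# Totals column from each endorsement table.
endorsement = {
    "source to output": {
        "Significant advantage": 175, "Useful": 146, "Interesting": 41,
        "Of no interest": 10, "Not sure": 17, "Other": 3,
    },
    "output to source": {
        "Significant advantage": 175, "Useful": 151, "Interesting": 29,
        "Of no interest": 8, "Not sure": 5, "Other": 9,
    },
}


def positive_share(ratings):
    """Percentage of responses rated 'Significant advantage' or 'Useful'."""
    total = sum(ratings.values())
    positive = ratings["Significant advantage"] + ratings["Useful"]
    return 100 * positive / total


for direction, ratings in endorsement.items():
    print(f"{direction}: {positive_share(ratings):.1f}% positive")
```

On these counts, a little over four fifths of responses fall into the two most favourable ratings in each direction.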
7. Who uses source repositories?

| Repository | Submitters | % per discipline |
|---|---|---|
| Archaeology Data Service | 29 | 45 |
| Brookhaven National Laboratories | 4 | 6 |
| CERN | 13 | 21 |
| Genbank | 23 | 51 |
| National Crystallography Service | 8 | 20 |
| Protein Structures | 19 | 47 |
| SuperCOSMOS | 3 | 5 |
| UK Data Archive | 11 | 18 |
| Uniprot | 2 | 4 |
| Other | 99 | |
8. Frequency of submission
9. Who uses the Other source repositories and what are they?
- Archaeology
  - English Heritage, Portable Antiquities Database
- Astronomy (40% of total Other)
  - NED (NASA-IPAC Extragalactic Database), CDS (Centre de Données Stellaires), ADS (Harvard), SIMBAD, NASA data archives, Advanced Camera for Surveys Science Archive, Very Large Array Archive, European Southern Observatory Archive, etc.
- Biosciences
  - BioMagResBank
10. Source data formats
11. The 76 significant others?
- latex.cc source code, .cif (crystallographic data), .pdb, .mtz, .pool, .root, .raw, .swf, .fla, .raw, .mpg, binary files, ChemDraw cdx, XWIN-NMR files, .ps files, .fla, .swf, MassLynx files, Mathematica, derived data in PAW-format ntuples, raw mass spectrometry data, Kanga, X-ray diffraction data, KaleidaGraphs, ATLAS.ti hermeneutic unit files, C/shell scripts, Fourier induction decay files, spectra, TeX source (math), etc.
12. Who assigns metadata?
Who assigns metadata to your research data?

| Response | Academic staff | Research assistant | Postgraduate student | Contracting researcher | Independent researcher | Other | Totals |
|---|---|---|---|---|---|---|---|
| I decide which terms to use and I assign them | 118 | 15 | 47 | 11 | 5 | 16 | 212 |
| Research colleague(s) assign metadata on the team's behalf | 34 | 6 | 6 | 3 | 1 | 5 | 55 |
| Research support staff assign metadata on the team's behalf | 13 | 3 | 1 | 3 | 0 | 2 | 22 |
| Metadata are assigned by library/information services staff | 4 | 0 | 0 | 0 | 0 | 0 | 4 |
| Metadata are assigned by the repository administrators | 29 | 2 | 0 | 2 | 0 | 4 | 37 |
| Metadata are generated automatically | 31 | 9 | 9 | 4 | 0 | 10 | 63 |
| It is not known who assigns metadata | 30 | 8 | 23 | 0 | 1 | 6 | 68 |
| Other | 12 | 5 | 9 | 1 | 1 | 9 | 37 |
| Totals | 271 | 48 | 95 | 24 | 8 | 52 | 498 |
13. Archaeology respondents refer in some cases to the use of thesauri, Dublin Core, etc.
14. Key metadata
15. The Other metadata
- Some examples
  - Archaeological period, artefact material, artefact type, conservation method
  - Celestial object, position and observation date
  - Chemical entity, chemical identifier (InChI)
  - Description of the instrument operating mode
  - Description of GIS processes applied, min/max co-ordinates, cell resolution for raster data
  - Description of experimental conditions under which the data was generated
  - Experimental method used
  - Protein sequence
16. Output repositories
What level of searching do you normally find sufficient when using an output repository?

| Level of searching | % | Respondents |
|---|---|---|
| Simple - e.g. author, title, keyword, date | 59.3 | 223 |
| Advanced, using a range of fields and identifiers | 21.5 | 81 |
| Employing Boolean logic | 7.2 | 27 |
| Using a subject thesaurus or subject headings | 1.3 | 5 |
| No preference | 8.2 | 31 |
| Other (please specify) | 2.4 | 9 |
17. Evolving strategy
- These early indications from the StORe questionnaire confirm a strategy in which
  - the pilot middleware could provide a broad core generic solution
  - the middleware must be capable of accepting a limited number of discipline-specific add-ons
  - a standard platform for metadata can be established to reflect a large proportion of practices and needs.
- In addition, further analysis is determining that
  - cross-discipline data requirements must be met for output and source data
  - a range of different attitudes to data sharing will have to be supported by effective validation if repositories are to be accepted and effective
  - improved online support is expected to be the most appropriate and economical means of meeting expectations for help
  - there are indications of a considerable lack of awareness of repositories amongst academic staff and postgraduates.
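The strategy of a generic core with a limited number of discipline-specific add-ons can be illustrated as a shared metadata record plus optional per-discipline fields. The following is a sketch only, not the StORe middleware design: the names `CoreRecord`, `ADDONS` and `missing_addon_fields` are hypothetical, the core fields are assumed, and the add-on fields are borrowed from the examples on the "Other metadata" slide.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRecord:
    """Hypothetical core metadata shared by every discipline."""
    title: str
    creator: str
    date: str
    keywords: list = field(default_factory=list)
    # Discipline-specific add-on values live here, outside the core.
    extensions: dict = field(default_factory=dict)


# Assumed add-on definitions: discipline -> extra fields it expects.
ADDONS = {
    "astronomy": {"celestial_object", "position", "observation_date"},
    "chemistry": {"chemical_entity", "inchi"},
}


def missing_addon_fields(record, discipline):
    """List add-on fields the discipline expects but the record lacks."""
    required = ADDONS.get(discipline, set())
    return sorted(required - set(record.extensions))


record = CoreRecord(
    title="Example survey field",      # made-up values for illustration
    creator="A. Researcher",
    date="2006-05-22",
    extensions={"celestial_object": "example", "position": "example"},
)
print(missing_addon_fields(record, "astronomy"))
print(missing_addon_fields(record, "social policy"))
```

A discipline without a registered add-on simply uses the core record unchanged, which keeps the generic solution usable on its own.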