Jeremy G. Frey - PowerPoint PPT Presentation

About This Presentation
Title:

Jeremy G. Frey

Description:

The curation of laboratory experimental data as part of the overall data lifecycle. Jeremy G. Frey, School of Chemistry, University of Southampton, UK


Transcript and Presenter's Notes

1
The curation of laboratory experimental data as
part of the overall data lifecycle
  • Jeremy G. Frey
  • School of Chemistry, University of Southampton,
    UK
  • 21 Nov 2006
  • DCC Conference, Glasgow

2
If you do things right at the start then all the
following processes are much easier!
Exponentially growing amount of data - the future
overwhelms the past
3
The CombeChem Project
  • End to End linking of data and information
  • Publication@Source
  • So collect data with regard to how it could
    eventually be used
  • Make sure the metadata is of high quality
  • Record properly at source in Digital Form
  • The Chemistry Lab
  • People and Machines working together

4
CombeChem
E-Malaria
Smart Lab
R4L
e-Bank
Instruments on the Grid
Statistics
BioSimGrid
5
The concept of Publication@Source
Goal
Knowledge
not just one laboratory but many
co-laboratories working together
Literature
Smart Dissemination
Smart Laboratory
Report
Plan COSHH
Information Integration
Smart HCI
Digital Model
Analysis
Smart Workflow
Smart Storage
Synthesis
6
Typical Laboratory
7
Need to make the data available. Need to be able
to find it. But how to expose it?
First, they do an online search
8
I am sure we collected that information a few
years ago
The details should be in her thesis...
Can you read what he says here...?
Can you find the file of data that were used to
make the plot?
Some of these problems are due to the lack of
information recorded at the time. Others are due
to loss of information over time.
9
What are the people up to?
  • Capture Data and Context
  • People
  • Process
  • Environment

10
(No Transcript)
11
If you are caught using the scrap of paper
technique, your improperly recorded data may be
confiscated by your TA
12
COSHH: Leverage off things we already have to do
We have a cunning plan
13
(No Transcript)
14
(No Transcript)
15
Pub-Sub systems provide the flexible, extensible
approach to distribution of real-time laboratory
monitoring and archiving
Smart Laboratory Spaces
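
The publish/subscribe idea on this slide can be sketched as follows. This is a minimal in-process illustration, not the system used in the Smart Lab: the `Broker` class, the topic name `lab/temperature`, the message fields and the 30 C limit are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


class Broker:
    """Tiny in-process publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a message to all subscribers; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


broker = Broker()
archive = []  # stands in for a long-term archive (assumption)
alerts = []   # stands in for a live monitoring display (assumption)


def archiver(topic: str, message: dict) -> None:
    # Archiving keeps every reading, whatever its value.
    archive.append((topic, message))


def monitor(topic: str, message: dict) -> None:
    # Monitoring only reacts to readings above a hypothetical limit.
    if message["celsius"] > 30.0:
        alerts.append(f"{topic}: {message['celsius']} C exceeds limit")


broker.subscribe("lab/temperature", archiver)
broker.subscribe("lab/temperature", monitor)

# The sensor publishes without knowing who is listening.
broker.publish("lab/temperature", {"celsius": 21.5})
broker.publish("lab/temperature", {"celsius": 31.2})
```

The point of the design is that the sensor side never changes when a new consumer is added: monitoring and archiving are simply two subscribers to the same topic.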
16
But what about the laboratory environment?
I just realized, Howard, that everything in this
apartment is more sophisticated than we are
17
Semantic DataGrid
  • CombeChem used, tested and strained the Semantic
    Web for
  • Enhanced (annotated) DataGrid over multiple
    diverse stores
  • Storage of Provenance Information
  • Some Data Storage
  • Annotated multimedia streams
  • Units and Properties Ontology
  • Multiple Triple Stores
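
As a rough illustration of how provenance can be held as triples, the sketch below stores (subject, predicate, object) statements in memory and follows links from a plot back to its raw data. It is not the CombeChem triple store: the `TripleStore` class, the `ex:` identifiers and the predicate names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


def trace_to_source(store: TripleStore, item: str) -> List[str]:
    """Follow ex:derivedFrom links from an item back towards its raw data."""
    chain = [item]
    seen = {item}
    while True:
        links = store.match(subject=chain[-1], predicate="ex:derivedFrom")
        # Stop when there is no earlier step, or when a cycle is detected.
        if not links or links[0][2] in seen:
            return chain
        chain.append(links[0][2])
        seen.add(links[0][2])


store = TripleStore()
# Hypothetical provenance statements for one published plot.
store.add("ex:plot1", "ex:derivedFrom", "ex:analysis1")
store.add("ex:analysis1", "ex:derivedFrom", "ex:rawdata1")
store.add("ex:rawdata1", "ex:measuredBy", "ex:spectrometer")
store.add("ex:rawdata1", "ex:hasUnit", "ex:nanometre")

provenance = trace_to_source(store, "ex:plot1")
```

Here `provenance` lists the plot, the analysis and the raw data set in order, which is the kind of chain a reader would follow from a published figure back to the original measurement.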

18
Laboratory Blogs
  • Laboratory notebook is a Blog
  • Encourage and facilitate collaboration
  • Need a data repository behind the Blog
  • R4L
  • E-Bank
  • Flexible
  • Service oriented approach being developed
  • A VRE

19
Instrument Blog
Blog-jects
20
The Scientific Blog is being tried in an
attempt to combine laboratory notebooks and
publication
21
Format Issues: everyday and for the long term
22
Note the use of YouTube
An experiment that failed. Publishable? Useful?
23
CoAKTinG and Memetic
Record the Scientific Conversation: this part
of the record often exists only in the grey
literature
24
Laboratory IRs and Information Management
25
Repositories
26
Validation
  • Increasing the value of data
  • How to bring all the necessary information
    together to enable appropriate validation
  • Increasingly difficult and expensive to achieve
  • Need provenance and context
  • Essential step; otherwise just a collection of
    items

27
Why? Publishing Data and Information Loss
28
Paper organized using RDF
SVG active graphics
Link to data, follow links back to the raw data
archive
Link to simulation, full simulation data archived
in BioSimGrid
R4L
29
Access to information requires crossing
administrative domains
National Archive
Research Group
Researcher
Research Group
Institution
International Database
30
Subversive and furtive sharing and exploitation of
data in virtual space
Digital Repository
Labs
RDF
E-
CAS
OAI Taxi
user
Data
31
He is charged with expressing contempt for
meta-data
32
Metadata Lifecycle
  • Creation and maintenance of metadata
  • Need a metadata infrastructure as well as a data
    infrastructure
  • Capture process as well as results
  • Automatic metadata generation when possible
  • Human annotation will always be needed
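
The split between automatic metadata generation and human annotation might look like the sketch below. It is an illustration under stated assumptions, not the project's metadata schema: the field names, the `auto_metadata` and `annotate` helpers, the instrument label and the sample payload are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def auto_metadata(name: str, payload: bytes, instrument: str,
                  captured_at: Optional[datetime] = None) -> dict:
    """Build the metadata that can be generated without human input."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "name": name,
        "size_bytes": len(payload),
        # A checksum lets later users confirm the data are unchanged.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "instrument": instrument,
        "captured_at": captured_at.isoformat(),
        "annotations": [],  # left empty for people to fill in later
    }


def annotate(record: dict, author: str, note: str) -> dict:
    """Return a copy of the record with one human annotation appended."""
    updated = dict(record)
    updated["annotations"] = record["annotations"] + [
        {"author": author, "note": note}
    ]
    return updated


payload = b"wavelength_nm,intensity\n532,0.81\n"
record = auto_metadata(
    "run-001.csv",
    payload,
    instrument="hypothetical-spectrometer",
    captured_at=datetime(2006, 11, 21, tzinfo=timezone.utc),
)
annotated = annotate(record, "researcher", "Sample was slightly cloudy.")
```

Everything in `record` is produced at the moment of capture; only the note in `annotated` needs a person, which is where context that no instrument can record gets added.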

33
Plans
  • Plans are useful
  • This is the way things are supposed to be done
  • The Plan provides a digital context so increases
    the value of planning
  • Key to our Smart Lab approach.
  • Is it the best way?

34
Who is responsible?
  • Context is crucial for curation
  • Every person, at each step of the process of
    converting data to knowledge
  • Need to consider the future access to this
    information by themselves and others.

35
These are the same people: if we can talk to
ourselves efficiently over time, then that is a
good start to being able to talk to others
Information Providers
Information Consumers
36
We must speed up the knowledge discovery process
All I am saying is that now is the time to
develop the technology to deflect an asteroid
37
PEOPLE
  • Southampton ECS, MATHS and CHEMISTRY
  • IT-INNOVATION
  • BRISTOL
  • UKOLN
  • CCLRC
  • INDIANA
  • SYDNEY
  • MANCHESTER
  • EPSRC e-Science Chemistry Programmes
  • JISC e-Infrastructure
  • DTI
  • See web site for full details and links
  • www.combechem.org