Exploiting scientific data in the domain of omics

1
Exploiting scientific data in the domain of omics
  • Genomic Standards Consortium: ontology
    requirements and experiences

Dawn Field, Centre for Ecology and Hydrology (CEH), Oxford
2
Overview
  • Goal of this Workshop: to explore what's been
    achieved to date with RDF, metadata and
    ontologies in exploiting scientific data,
    particularly data integration, discovery and
    sharing
  • what we have achieved
  • the challenges we face
  • what we hope to achieve in the near future
  • what are the major issues requiring further
    research

3
Challenges and Opportunities
  • Rapidly growing collection of genomes
  • Increasing need for researchers to access,
    combine and analyze data sets containing genomic,
    taxonomic, ecological and environmental data
  • Increasing number of initiatives capturing
    metadata
  • Additional information about complete genome
    sequences would be beneficial

4
De novo DNA sequencing continues to grow
exponentially
SymBio Corporation
5
Data scope of genome resources at NCBI
  • Environmental samples?
  • Organisms
  • Nematoda: C. elegans, C. briggsae
  • Microbes
  • Viruses
  • Fungi/small eukaryotes
  • Plants: A. thaliana, barley, corn, oat, rice,
    soybean, tomato, wheat
  • Fishes
  • Insects: D. melanogaster, A. gambiae,
    D. pseudoobscura, honey bee
  • Chicken
  • Dog
  • Mouse/Rat
  • Pig, cow
  • Human
  • Chimpanzee
6
The Promise of Metagenomics
7
Features of GBMF marine microbial genome
sequencing project webpage
  • Acts as portal to primary investigator webpage
  • Provides basic information about the organism
    1) Phylogeny of organism
    2) Physiology, if known
    3) Habitat
    4) Geographic location
    5) Isolation technique
    6) Primary citation
    7) Culture collection
www.moore.org/microgenome
8
Problems
  • DATA INTEGRATION!
  • INSUFFICIENT DATA REGARDING THE PHYSIOLOGY OF
    ORGANISMS!

10
Morphology and Growth
  • Haemophilus influenzae is a non-motile,
    Gram-negative, rod-shaped bacterium. Optimal
    growth temperature is 37 °C and doubling
    time in culture is 26 minutes.

11
Interactions and Ecology
  • H. influenzae is an obligate commensal with the
    ability to cause disease, including meningitis and
    otitis media. The primary habitat of this species
    is the human nasopharynx. This bacterium is
    facultatively anaerobic and uses organic matter as
    a source of both carbon and energy.

12
What we have achieved
13
Cataloguing our Complete Genome Collection
  • Proposal: Field D, Hughes J (2005). Cataloguing
    our current genome collection. Microbiology 151:
    1016-1019
  • Analysis: Hughes J, Field D (2005). Ecological
    perspectives on our complete genome collection.
    Ecology Letters 8: 1334-1345
  • Workshop: Cataloguing our current genome
    collection, Sept 7-9, 2005, Cambridge, UK (NIEeS).
    D. Field, G. Garrity, N. Morrison, J. Selengut,
    P. Sterk, N. Thomson, T. Tatusova. Meeting
    report. Comp. Func. Genomics.
  • Genomic Standards Consortium (GSC):
    http://gensc.sourceforge.net
  • Funding: Cataloguing our current genome
    collection (NERC International Opportunities
    Fund Award NE/3521773/1)

14
Cataloguing our Complete Genome Collection
  • Workshop: Cataloguing our current genome
    collection II, Nov 10-11, 2005, EBI, Cambridge,
    UK. D. Field, N. Morrison, J. Selengut, P. Sterk.
    Meeting report, OMICS (in press)
  • Special issue of OMICS on data standards: guest
    editors Dawn Field and Susanna Sansone; organized
    around first two GSC workshops
  • Funding: Cataloguing our current genome
    collection; funding from NIEeS for two more
    workshops in June 2006 and 2007
  • Workshop: 3rd GSC workshop, Sept 11-13, 2006,
    NIEeS, Cambridge, UK. Co-organizers Dawn Field and
    Tatiana Tatusova
  • Genome Catalogue: Launch of implementation of
    MIGS checklist as a database ready to accept case
    study genomes

15
Overview of GSC activities
  • The aim of the Genomic Standards Consortium (GSC)
    is to support the community-based development of
    a genomic standard that captures a richer set of
    information about complete genomes and
    metagenomic datasets.
  • Checklist
  • Implementation
  • Ontology development
  • Metadata exchange

16
Overview of GSC activities
  • Checklist: The GSC is currently working together
    towards the "Minimal Information about a Genome
    Sequence" (MIGS) specification.
  • Implementation: To promote discussion and support
    the capture of preliminary data, an XML schema has
    been built from the checklist and implemented as
    the Genome Catalogue database.
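
As a rough illustration of the checklist-to-XML idea above, the sketch below builds one XML record and rejects it when a required field is absent. The element names, the choice of required fields and the placeholder values are assumptions made for illustration; they are not the actual MIGS fields or the Genome Catalogue schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields, loosely named after the checklist concepts
# mentioned in this talk; the real MIGS field names may differ.
REQUIRED_FIELDS = ("organism", "environment", "sample_processing", "data_processing")


def build_record(fields):
    """Return an XML element for one genome, or raise if a required field is missing."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    record = ET.Element("genome_record")  # hypothetical root element name
    for name, value in sorted(fields.items()):
        ET.SubElement(record, name).text = str(value)
    return record


example = build_record({
    "organism": "Haemophilus influenzae",
    "environment": "human nasopharynx",
    "sample_processing": "unspecified (placeholder)",
    "data_processing": "unspecified (placeholder)",
})
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Checking for required fields before serialising keeps incomplete records out of the store, which is one simple way a checklist can be enforced in practice.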

17
Overview of GSC activities
  • Ontology development: The GSC is also working
    towards the development of controlled
    vocabularies for describing genomes, and this work
    feeds into the FuGO project (A Functional
    Genomics Investigation Ontology).
  • Metadata exchange: GFF3 and GnoME
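
For readers unfamiliar with GFF3, the sketch below parses one feature line: nine tab-separated columns, with the last column holding `key=value` attributes separated by semicolons and percent-encoded where needed. The slide does not say how the GSC intends to carry metadata in GFF3, so the example line and its lowercase `habitat` attribute are purely hypothetical.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; attributes become a dict of lists."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError("a GFF3 feature line has nine tab-separated columns")
    feature = dict(zip(GFF3_COLUMNS, parts))
    attributes = {}
    for pair in feature["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # Multiple values for one tag are comma-separated in GFF3.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    feature["start"], feature["end"] = int(feature["start"]), int(feature["end"])
    return feature


# Hypothetical line: the identifiers and the custom 'habitat' tag are invented.
line = "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;habitat=human%20nasopharynx"
feature = parse_gff3_line(line)
print(feature["attributes"])
```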

18
The challenges we face
19
Challenges
  • Defining the standard
  • Collecting the data
  • Fields can be calculated in a variety of ways:
    separate curated and calculated fields
  • We don't know enough about many of these genomes
    with respect to lifestyle
  • Relationships between genomes
  • Completeness of data
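
One way to picture the separation of curated and calculated fields, and the completeness problem, is the minimal sketch below. The class, the field names and the GC-content calculation are illustrative assumptions, not part of the GSC's design.

```python
from dataclasses import dataclass, field


def gc_content(sequence):
    """Fraction of G and C bases; one of several ways such a field could be calculated."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


@dataclass
class GenomeEntry:
    # Values entered by a curator (hypothetical field names).
    curated: dict = field(default_factory=dict)
    # Values derived from the sequence, kept apart so they can be recomputed.
    calculated: dict = field(default_factory=dict)

    def completeness(self, expected_fields):
        """Share of expected curated fields that actually hold a value."""
        filled = sum(1 for name in expected_fields
                     if self.curated.get(name) not in (None, ""))
        return filled / len(expected_fields)


entry = GenomeEntry(curated={"habitat": "human nasopharynx", "isolation_technique": None})
entry.calculated["gc_content"] = gc_content("ATGCGC")  # toy sequence, not real data
print(entry.completeness(["habitat", "isolation_technique"]))
```

Keeping the two dictionaries apart means a calculated value can be regenerated with a different method without touching anything a curator has entered.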

20
Defining the Checklist
  • Concepts: Organism, Phenotype, Environment,
    Sample Processing, Data Processing
  • Taxonomic Groups: Eukaryotes, Bacteria/Archaea,
    Plasmids, Viruses, Organelles, Metagenomes
  • Implementation Working Group
  • Metadata Exchange Working Group
21
Proliferation of MI Checklists
  • Upcoming special issue of OMICS: A Journal of
    Integrative Biology on data standards includes
    descriptions of 7 checklists
  • Upcoming issue of Nature Biotechnology expected
    to include more

22
Proteomics Standards Initiative (June 2006)
Special session: The proliferation of MI checklists:
opportunities and challenges
  • Chris Taylor (EBI): Minimum Information About a
    Proteomics Experiment (MIAPE) and MIxxx and the
    need for a central registry
  • Dawn Field (CEH Oxford): Minimal Information about
    a Genome Sequence (MIGS)
  • Don Robertson (Pfizer Global R&D, Ann Arbor MI):
    MSI, the Metabolomics Standards Initiative
  • Graeme Grimes (Scottish Centre for Genomic
    Technology and Information, Edinburgh, UK):
    Minimum Information About an RNAi Experiment
    (MIARE)
  • Stefan Wiemann (DKFZ, Heidelberg, Germany):
    Minimum Information About a Cellular Assay
    (MIACA)
  • Ryan Brinkman (UBC, Canada), presented by Chris
    Taylor (EBI): Minimum Information for a
    Fluorescence Activated Cell Experiment (MIFACE)

23
  • MICheck: A Minimum Information Checklist Portal
  • Chris Taylor, Dawn Field, Susanna-Assunta
    Sansone, Rolf Apweiler, Michael Ashburner, Cathy
    Ball, Pierre-Alain Binz, Alvis Brazma, Ryan
    Brinkman, Eric Deutsch, Oliver Fiehn, Jennifer
    Fostel, Peter Ghazal, Graeme Grimes, Nigel Hardy,
    Henning Hermjakob, Randall Julian, Martin Kuiper,
    Nicholas Le Novère, Jim Leebens-Mack, Suzi Lewis,
    Ruth McNally, Norman Morrison, Norman Paton, John
    Quackenbush, Donald Robertson, Philippe
    Rocca-Serra, Barry Smith, Jason Snape, Stefan
    Wiemann

24
micheck.sourceforge.net
  • The MICheck website will provide:
  • a comprehensive list of MI checklists
  • convenience links to relevant resources:
    appropriate tools, data formats, ontologies
  • links to relevant policy statements from various
    external bodies (such as funders' data sharing
    policies, journals' publication guidelines and so
    forth)
  • contact(s) for submitting feedback
  • where possible, most recent versions of
    checklists (either as a local copy or a link)
  • charter for the group
  • guidelines for registering a checklist
  • sign-up details for the mailing list.

25
micheck.sourceforge.net
  • The MICheck website will provide:
  • Minimal Information about a Minimal Information
    Checklist (MIMI)
  • Searchable database of terms from all checklists

26
We propose that the MICheck play two primary
roles
  • The first is to provide a one-stop shop for
    researchers, journal editors and reviewers, and
    funders, providing a quick and simple way to
    discover (whether there are) guidelines for a
    particular domain.
  • The second is to facilitate investigation of the
    boundaries, overlaps and gaps between projects,
    minimally by raising awareness of the scope and
    progress of extant efforts.
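
A searchable index of terms across checklists, and a first look at overlaps between them, could be sketched as follows. The MIGS terms are the key concepts named in this talk; "ChecklistB" and its terms are hypothetical stand-ins, and real term matching would need more than exact string comparison.

```python
from collections import defaultdict
from itertools import combinations

checklists = {
    "MIGS": {"organism", "phenotype", "environment",
             "sample processing", "data processing"},
    # Hypothetical second checklist, invented for illustration only.
    "ChecklistB": {"organism", "sample processing", "instrument"},
}


def build_term_index(checklists):
    """Map each term to the set of checklists that use it."""
    index = defaultdict(set)
    for name, terms in checklists.items():
        for term in terms:
            index[term.lower()].add(name)
    return index


def overlaps(checklists):
    """Terms shared by each pair of checklists, keyed by the sorted pair of names."""
    return {
        (a, b): checklists[a] & checklists[b]
        for a, b in combinations(sorted(checklists), 2)
    }


index = build_term_index(checklists)
shared = overlaps(checklists)
print(sorted(shared[("ChecklistB", "MIGS")]))
```

Terms that appear in only one checklist hint at gaps or domain-specific needs, while shared terms are candidates for common modules.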

27
These two roles translate into two distinct parts
of MICheck
  • Portal: exists simply to raise awareness of, and
    afford simple access to, a wide range of
    checklists; registering for the portal implies no
    commitment to integrate by the registrant.
  • Foundry: communities can, if motivated, sign up
    to the foundry to jointly examine ways to
    refactor the checklists over which they have
    control and begin to produce the first components
    of a suite of self-consistent, clearly bounded,
    orthogonal, integrable checklist modules.

28
Registering a project
  • Domain: Genomics and metagenomics
  • Checklist type: Primary guidelines
  • Community Name: The Genomic Standards Consortium
  • Main website: http://gensc.sourceforge.org/
  • MI Checklist Name: Minimal Information about a
    Genome Sequence
  • MI Checklist Acronym: MIGS
  • Current Version Number: 0.1
  • Release Date for current version: 2006-01-01
  • Primary Contact Person: Dr Jane Doe
  • Comments: Early draft based on first two
    exploratory workshops; public distribution for
    comment
  • Key concepts: eukaryotes, bacteria/archaea,
    plasmids, organelles, viruses, metagenomes,
    organism, phenotype, environment, sample
    processing, data processing
  • Bibliography: Publications to be reposited where
    possible
  • Location of document(s):
    http://sourceforge.net/project/showfiles.php?group_id=153365
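
The registration entry above maps naturally onto a simple record that a portal could check on submission. The sketch below is one possible shape, assuming hypothetical field keys and assuming that every listed field is mandatory; the slide does not specify either.

```python
from datetime import date

# Hypothetical keys for the registration fields shown on the slide.
REGISTRATION_FIELDS = (
    "domain", "checklist_type", "community_name", "main_website",
    "checklist_name", "acronym", "version", "release_date",
    "contact", "key_concepts",
)

registration = {
    "domain": "Genomics and metagenomics",
    "checklist_type": "Primary guidelines",
    "community_name": "The Genomic Standards Consortium",
    "main_website": "http://gensc.sourceforge.org/",
    "checklist_name": "Minimal Information about a Genome Sequence",
    "acronym": "MIGS",
    "version": "0.1",
    "release_date": "2006-01-01",
    "contact": "Dr Jane Doe",
    "key_concepts": ["eukaryotes", "bacteria/archaea", "plasmids", "organelles",
                     "viruses", "metagenomes", "organism", "phenotype",
                     "environment", "sample processing", "data processing"],
}


def validate_registration(record):
    """Return a list of problems; an empty list means the record passes these checks."""
    problems = ["missing: " + name for name in REGISTRATION_FIELDS if not record.get(name)]
    try:
        date.fromisoformat(str(record.get("release_date") or ""))
    except ValueError:
        problems.append("release_date is not an ISO date (YYYY-MM-DD)")
    return problems


print(validate_registration(registration))
```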

29
Proteomics: three main efforts
  • The Minimum Information About a Proteomics
    Experiment (MIAPE)
  • - HUPO Proteomics Standards Initiative
  • The Paris Guidelines
  • - sponsored by MCP
  • Guidelines for the Next Ten Years of Proteomics
  • - published by Proteomics

30
Integrative Activities
31
Defining the Checklist
  • Investigation
  • Concepts: Organism, Phenotype, Environment,
    Sample Processing, Data Processing
  • Taxonomic Groups: Eukaryotes, Bacteria/Archaea,
    Plasmids, Viruses, Organelles, Metagenomes
  • Study
  • Assay
  • Implementation Working Group
  • Metadata Exchange Working Group
32
What we hope to achieve in the near future
33
FuGO: An Ontology for Functional Genomics
Investigation
  • Susanna-Assunta Sansone (EBI): Overview
  • Trish Whetzel (Un of Penn): Microarray
  • Daniel Schober (EBI): Metabolomics
  • Chris Taylor (EBI): Proteomics
On behalf of the FuGO working group
http://fugo.sourceforge.net
34
FuGO - Rationale
  • Standardization activities in (single) domains
  • Reporting structures, CVs/ontology and exchange
    formats
  • Pieces of a puzzle
  • Standards should stand alone BUT also function
    together
  • - Build it in a modular way, maximizing
    interactions
  • Capitalize on synergies, where commonality
    exists
  • Develop a common terminology for those parts of
    an investigation that are common across
    technological and biological domains

                                       
 
35
FuGO - Overview
  • Purpose
  • NOT model biology, NOR the laboratory workflow
  • BUT provide core of universal descriptors for
    its components
  • To be extended by biological and technological
    domain-specific WGs
  • No dependency on any Object Model
  • - Can be mapped to any object model, e.g. FuGE OM
  • Open source approach
  • Protégé tool and Web Ontology Language (OWL)

36
FuGO Communities and Funds
  • List of current communities
  • Omics technologies
  • HUPO - Proteomics Standards Initiative (PSI)
  • Microarray Gene Expression Data (MGED) Society
  • Metabolomics Society - Metabolomics Standards
    Initiative (MSI)
  • Other technologies
  • Flow cytometry
  • Polymorphism
  • Specific domains of application
  • Environmental groups (crop science and
    environmental genomics)
  • Nutrition group
  • Toxicology group
  • Immunology groups
  • List of current funds
  • NIH-NHGRI grant (C. Stoeckert, Un of Penn) for
    workshops and ontologist
  • BBSRC grant (S.A. Sansone, EBI) for ontologist

37
FuGO Processes
  • Coordination Committee
  • Representatives of technological and biological
    communities
  • - Monthly conference calls
  • Developers WG
  • Representatives and members of these communities
  • - Weekly conference calls
  • Documentation
  • http://fugo.sourceforge.net
  • Advisory Board
  • Advise on high level design and best practices
  • Provide links to other key efforts
  • Barry Smith, Buffalo Un and IFOMIS
  • Frank Hartel, NIH-NCI
  • Mark Musen, Stanford Un and Protégé Team
  • Robert Stevens, Manchester Un
  • Steve Oliver, Manchester Un
  • Suzi Lewis, Berkeley Un and GO

38
FuGO Strategy
  • Use cases -> within community activity
  • Collect real examples
  • Bottom up approach -> within community activity
  • Gather terms and definitions
  • - Each community in its own domain
  • Top down approach -> collaborative activity
  • Develop a naming convention
  • Build a top level ontology structure, is_a
    relationships
  • Other foreseen relationships
  • - part_of (currently expressed in the taxonomy as
    cardinal_part_of)
  • - participate_in (input) and derive_from
    (output)
  • - describe or qualify
  • - located_in and contained_in
  • Binning terms in the top level ontology
    structure
  • The higher semantics helps for faster binning
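
The binning step can be pictured as walking a term's is_a chain until it reaches a top-level class. The sketch below assumes a single is_a parent per term, which is a simplification, and all terms in it are hypothetical; only the relationship names is_a and part_of come from the slide.

```python
# Hypothetical terms; only the relationship names come from the slide.
IS_A = {
    "mass spectrometer": "instrument",
    "instrument": "object",
    "centrifugation": "process",
}
# A part_of link is stored separately and does not affect binning here.
PART_OF = {"ion source": "mass spectrometer"}


def ancestors(term, is_a=IS_A):
    """Follow is_a links upwards and return the chain of parents, nearest first."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:
            raise ValueError("cycle in is_a hierarchy")
        seen.add(term)
        chain.append(term)
    return chain


def bin_term(term, top_level, is_a=IS_A):
    """Return the top-level class a term falls under, or None if it is unbinned."""
    for candidate in [term] + ancestors(term, is_a):
        if candidate in top_level:
            return candidate
    return None


top_level = {"object", "process"}
print(bin_term("mass spectrometer", top_level))
print(bin_term("ion source", top_level))
```

A term that has only a part_of link, such as the hypothetical "ion source", stays unbinned until someone gives it an is_a parent, which is the kind of gap a binning pass is meant to expose.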

39
FuGO Status and Plans
  • Binning process - ongoing
  • Reconciliations into one canonical version
  • Iterative process
  • Common working practices - established
  • Each class consists of term ID, preferred
    term, synonyms, definition and comments
  • Sourceforge tracker to send comments on terms,
    definitions, relationships
  • Timeline for completion of core omics
    technologies
  • Two years and several intermediate milestones
  • Interim solution
  • - Community-specific CVs posted under the OBO
  • Ultimately FuGO will be part of the OBO Foundry
    (Core) Ontology
  • Overview paper: Special Issue on Data
    Standards, OMICS journal
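
The class structure listed above (term ID, preferred term, synonyms, definition, comments) and the reconciliation of community submissions into one canonical version might look like the sketch below. The merge rule, which groups submissions by preferred term and keeps the first definition, is an assumption for illustration and not the FuGO working group's actual procedure; the identifiers and terms are invented.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    term_id: str
    preferred_term: str
    definition: str
    synonyms: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def reconcile(submissions):
    """Merge submissions that share a preferred term into one canonical class each."""
    canonical = {}
    for item in submissions:
        key = item.preferred_term.lower()
        if key not in canonical:
            canonical[key] = OntologyClass(item.term_id, item.preferred_term,
                                           item.definition, list(item.synonyms),
                                           list(item.comments))
            continue
        kept = canonical[key]
        kept.synonyms.extend(s for s in item.synonyms if s not in kept.synonyms)
        if item.definition != kept.definition:
            # Keep the competing definition visible for later discussion.
            kept.comments.append("alternative definition: " + item.definition)
    return list(canonical.values())


# Hypothetical submissions from two communities.
submissions = [
    OntologyClass("EX:0001", "sample", "Material taken for study.", ["specimen"]),
    OntologyClass("EX:0417", "Sample", "A portion of a biological source.", ["aliquot"]),
]
merged = reconcile(submissions)
print(merged[0].synonyms)
```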

40
Areas requiring significant research
41
Summary gensc.sf.net
  • The GSC is tackling the issue of describing our
    complete genome collections in greater detail
    through
  • MIGS
  • Genome Catalogue
  • Ontology Development
  • Metadata Exchange
  • In co-ordination with
  • MICheck micheck.sf.net
  • FuGO fugo.sf.net

42
Acknowledgements
  • GSC Coordinators
  • Dawn Field (CEH Oxford)
  • George Garrity (Bergey's Trust)
  • Norman Morrison (NEBC)
  • Jeremy Selengut (TIGR)
  • Peter Sterk (EBI)
  • Tatiana Tatusova (NCBI)
  • Nick Thomson (Sanger)

  • Working Groups
  • General Members of the GSC
  • Participants of all meetings
gensc.sf.net
43
Acknowledgements
  • MICheck: A Minimum Information Checklist Portal
  • Chris Taylor, Dawn Field, Susanna-Assunta
    Sansone, Rolf Apweiler, Michael Ashburner, Cathy
    Ball, Pierre-Alain Binz, Alvis Brazma, Ryan
    Brinkman, Eric Deutsch, Oliver Fiehn, Jennifer
    Fostel, Peter Ghazal, Graeme Grimes, Nigel Hardy,
    Henning Hermjakob, Randall Julian, Martin Kuiper,
    Nicholas Le Novère, Jim Leebens-Mack, Suzi Lewis,
    Ruth McNally, Norman Morrison, Norman Paton, John
    Quackenbush, Donald Robertson, Philippe
    Rocca-Serra, Barry Smith, Jason Snape, Stefan
    Wiemann

44
FuGO: An Ontology for Functional Genomics
Investigation
  • Susanna-Assunta Sansone (EBI): Overview
  • Trish Whetzel (Un of Penn): Microarray
  • Daniel Schober (EBI): Metabolomics
  • Chris Taylor (EBI): Proteomics
On behalf of the FuGO working group
http://fugo.sourceforge.net