The GenomeMine: The need for improved genomic metadata capture and exchange

1
The GenomeMine: The need for improved genomic
metadata capture and exchange
  • Dawn Field
  • Oxford Centre for Ecology and Hydrology

2
Deluge of Data
3
The GenomeMine
4
The Genome
  • The H. influenzae genome was published in 1995
    and was the first bacterial genome to be
    completed. This genome is notable because it was
    the first bacterial genome sequenced.

5
NCBI Taxonomy
  • The NCBI RefSeq number for this genome is
    NC_000907 and the taxid for this species is
    71421. This species belongs to the division of
    proteobacteria (gamma subdivision).

6
General Genomic Features
  • The genome of index strain RD contains a single
    circular chromosome of 1,830,137 bp in size with
    an average GC content of 38%. Strain RD has 1743
    ORFs, 6 rRNA operons, and 54 tRNAs. Strain RD has
    0 plasmids, 0 mega-plasmids, and 0 bacteriophage.
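
Features such as chromosome length and GC content can be calculated directly from the sequence. The following is a minimal sketch in Python, assuming the sequence is already available as a plain string. The function name `gc_content` and the toy sequence are illustrative assumptions, not part of the GenomeMine.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


toy_sequence = "ATGCGCATTA"  # toy data, not a real genome
print(len(toy_sequence), gc_content(toy_sequence))
```

Ignoring ambiguous bases is one of several possible conventions, which is one reason a calculated value should record how it was derived.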

7
Morphology and Growth
  • Haemophilus influenzae is a non-motile,
    Gram-negative, rod-shaped bacterium. Optimal
    growth temperature is 37 °C and doubling
    time in culture is 26 minutes.

8
Interactions and Ecology
  • H. influenzae is an obligate commensal with the
    ability to cause disease including meningitis and
    otitis media. The primary habitat of this species
    is the human nasopharynx. This bacterium is
    facultatively anaerobic and uses organic matter as
    a source of carbon and organic matter as a source
    of energy.

9
Cataloguing our Complete Genome Collection
  • Proposal: Field D, Hughes J (2005). Cataloguing
    our current genome collection. Microbiology 151,
    1016-9.
  • Hughes J, Field D (2005). Ecological perspectives
    on our complete genome collection. Ecology
    Letters (in press).
  • Workshop: Cataloguing our current genome
    collection, Sept 7-9, 2005, Cambridge, UK (NIEeS)
  • Website: Cataloguing our current genome
    collection (NERC International Opportunities
    Fund Award NE/3521773/1)

10
Cataloguing our Complete Genome Collection
  • Proposal: Field D, Hughes J (2005). Cataloguing
    our current genome collection. Microbiology 151,
    1016-9.
  • More information describing our complete genome
    collection would be valuable (prokaryotes,
    eukaryotes, viruses, plasmids, and organelles)
  • A complete set of information is not readily
    available: it is distributed widely among
    different sources
  • A new standard could capture minimal information
    about a genome sequence
  • Developed under the auspices of an international
    working group
  • E-Science implementation

11
MIGS Genome Reports
  • Origin of Strain and Availability
  • Geographical Origin and Environmental Context
  • Ecology
  • Taxonomy
  • Annotation Process
  • Genomic Features
  • Publication and Electronic Resources
  • Contact Information
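
The eight report sections listed above could be held in a simple record type. This is only a sketch: the class name `GenomeReport`, the attribute names, and the use of one free-form dictionary per section are assumptions made for illustration, not the MIGS specification. The RefSeq number and taxid are those quoted earlier in this talk.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GenomeReport:
    """One genome report with a free-form dictionary per section."""
    origin_of_strain_and_availability: dict = field(default_factory=dict)
    geographical_origin_and_environmental_context: dict = field(default_factory=dict)
    ecology: dict = field(default_factory=dict)
    taxonomy: dict = field(default_factory=dict)
    annotation_process: dict = field(default_factory=dict)
    genomic_features: dict = field(default_factory=dict)
    publication_and_electronic_resources: dict = field(default_factory=dict)
    contact_information: dict = field(default_factory=dict)

    def empty_sections(self) -> list:
        """Names of the sections that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


report = GenomeReport(taxonomy={"refseq": "NC_000907", "taxid": 71421})
print(report.empty_sections())
```

Keeping each section as its own attribute makes it straightforward to see which parts of a report are still empty.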

12
MIGS Genome Reports
  • Origin of Strain and Availability
  • At a minimum this section should provide enough
    information for a user to easily request a stock
    sample of this genome (culture collection ID).
  • Ideally, this section should contain a
    description of the salient features of the index
    strain (time in culture, risk of extinction, why
    it was selected, notable phenotypes)

13
MIGS Genome Reports
  • Geographical Origin and Environmental Context
  • At a minimum this section should provide enough
    information for a user to be able to re-sample
    the location (under similar environmental
    conditions) from which the genome was taken using
    the same isolation methodologies.
  • Ideally, it should provide a range of information
    about the environment that an organism was taken
    from at the time of sampling.

14
MIGS Genome Reports
  • Ecology
  • At a minimum this section should provide
    information on the relationship between this
    genome and other genomes.
  • Ideally, this section will allow the collection
    of a rich set of ecological information for each
    genome.

15
MIGS Genome Reports
  • Taxonomy
  • At a minimum this section should provide enough
    information for a genome to be uniquely and
    unambiguously identified by name in the literature.
  • Ideally, it should provide the most up-to-date
    understanding possible of an organism's
    taxonomy.

16
MIGS Genome Reports
  • The Annotation Process
  • At a minimum this section could be optional.
  • Ideally, it will give an overview of the
    annotation process, so that the same level of
    annotation could be regenerated from scratch.

17
MIGS Genome Reports
  • Genomic Features
  • At a minimum this section could be optional.
  • Ideally, it should confirm a variety of genomic
    features that cannot be calculated from the
    genome sequence.

18
MIGS Genome Reports
  • Related Publications and Electronic Databases
  • At a minimum this section should provide enough
    information to find the genome in at least one
    electronic resource (primary genome publication or
    international sequence database, e.g. GenBank).
  • Ideally it would provide sufficient information
    about all relevant resources containing
    information on this genome.

19
MIGS Genome Reports
  • Contact
  • At a minimum this section should contain the
    name and email of the person proposing to
    sequence the genome or the corresponding author
    on the primary genome paper.
  • Ideally, it will contain contact details for
    other people with relevant areas of expertise
    including the person who submitted the genome
    (e.g. the bioinformaticians who did the genome
    annotation, the ecologist who isolated the
    organism, the taxonomist who is an expert on this
    group, etc).
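
The "at a minimum" statements on the preceding slides suggest a simple completeness check. The sketch below is one hypothetical way to express it: the section and key names in `MINIMUM_KEYS` are assumptions invented for illustration, and only the minimums that reduce to the presence of a value are covered (re-sampling information and relationships between genomes are harder to test mechanically and are left out).

```python
# Hypothetical key names; sections that are optional at the minimum
# level are listed with no required keys.
MINIMUM_KEYS = {
    "origin_of_strain": ["culture_collection_id"],
    "taxonomy": ["name"],
    "annotation_process": [],
    "genomic_features": [],
    "publications_and_resources": ["electronic_resource"],
    "contact": ["name", "email"],
}


def missing_minimum(report: dict) -> list:
    """Return 'section.key' for every minimum item absent from the report."""
    missing = []
    for section, keys in MINIMUM_KEYS.items():
        data = report.get(section, {})
        for key in keys:
            if not data.get(key):
                missing.append(f"{section}.{key}")
    return missing


draft = {
    "taxonomy": {"name": "Haemophilus influenzae"},
    "contact": {"name": "A. Researcher"},  # placeholder contact
}
print(missing_minimum(draft))
```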

20
Challenges
  • Defining the standard
  • Collecting the data
  • Fields can be calculated in a variety of ways:
    separate curated and calculated fields
  • We don't know enough about many of these genomes
    with respect to lifestyle
  • Relationships between genomes
  • Completeness of data

21
Future uses
  • Research tool
  • Explore relationships between features
  • Compare outputs of different methodologies
  • Look-up Reference
  • Network of researchers
  • Teaching tool

22
Minimal Requirements
  • must describe all genomes (eukaryotes,
    prokaryotes, plasmids, viruses, and organelles)
  • must be easy to enter descriptions into the
    genome catalog
  • possible to update descriptions / specification
  • descriptions must be comparable across genomes
  • descriptions easy to integrate with other
    resources
  • each description must be associated with unique
    and permanent identifiers
  • all information collected must be freely
    available

23
Discussion Document
  • Table of Contents
  • Executive summary
  • The need for a new genomic standard
  • The process of consensus building
  • The specification checklist
  • The specification: draft version available
  • Issues and challenges
  • Considerations
  • Case Studies
  • Implementation options
  • Genomic metadata exchange

24
Building Consensus
  • Identification of the need for a standard
  • The formation of a working group (community)
  • Selection of case studies
  • Development of the specification checklist,
    specification attributes, definitions, and
    allowed terms
  • Development of a suitable implementation
  • Development of a repository to store, view, and
    distribute annotations
  • Final annotation of genomes to compliant format
    and submission to the repository.

25
Open call for Case Studies
  • Contributing a first-pass description of your
    genome(s) will help provide input into the design
    of the specification
  • Both descriptions and comments are collected from
    each contributor and these lead to the refinement
    of the specification

26
Call for contributed data sets
  • GenomeMine re-designed to be a community archive
  • Supplementary information for publications
  • Archive of published data sets
  • Selected data from other databases
  • Each dataset captured in GnoME format
  • Credit can be ascribed to author(s) (provenance)
  • Data can be regenerated (variable definitions)

27
Summary
  • The GenomeMine contains information about
    genomes; data comes from a variety of sources;
    the aim is to serve as a community archive:
    http://www.genomics.ceh.ac.uk/GMINE/
  • From the difficulties associated with collecting
    curated data, the "Cataloguing our complete
    genome collection" project was launched
  • Open call for case studies (MIGS project) to
    describe genomes with curated information
  • Open call for contributed data sets describing
    genomes to the GenomeMine using GnoME
    (curated, calculated and extracted variables
    which are defined)

28
Acknowledgements
  • GenomeMine
  • Jennifer Hughes
  • Tanya Gray
  • Milo Thurston
  • Gareth Wilson
  • Adrian Tett
  • Paul Swift
  • Chimdi Ekeke
  • Towards a new genomic standard
  • Tatiana Tatusova, Jen Hughes, James Cole, Tanya
    Gray, Robert Feldman, Naomi Forrester, Dan Haft,
    Andy Lilley, Nick Mann, Victor Markowitz, Norman
    Morrison, Julian Parkhill, Lita Proctor, Jeremy
    Selengut, Susanna-Assunta Sansone, Paul Swift,
    Adrian Tett, Nick Thomson, Sarah Turner, Gareth
    Wilson, Anil Wipat
  • and Everyone at this workshop

29
Genomic Metadata Exchange Format (GnoME)