Title: Overview of the Functional Genomics Investigation Ontology (FuGO) and its Relationship to the Genomic Standards Consortium (GSC)
1. Overview of the Functional Genomics Investigation Ontology (FuGO) and its Relationship to the Genomic Standards Consortium (GSC)
- Trish Whetzel, on behalf of the FuGO Working Group
- September 12, 2006
2. FuGO
- Purpose
  - Provide a resource for the unambiguous description of the components of biomedical investigations such as the design, protocols and instrumentation, material, data and types of analysis on the data
  - NOT designed to model biology
  - NOT dependent on any given Object Model
  - NOT limited to only functional genomics
  - Considering re-naming the project to better reflect the scope, e.g. Ontology of Biomedical Investigations
- Enables
  - Allow consistent annotation of data across different technological and biological domains
  - Enable powerful concept-driven queries
  - Facilitate semantically-driven data integration
3. Motivation for FuGO
- Standardization efforts in biological and technological domains
  - Standard syntax - Data exchange formats
    - To provide a mechanism for software interoperability, e.g. FuGE Object Model
  - Standard semantics - Controlled vocabularies or ontology
    - Centralize commonalities for annotation term needs across domains to describe an investigation/study/experiment, e.g. FuGO
4. Biomedical Investigation Components
Describe the material and characteristics.
Describe the manipulations or perturbations or
observations performed on the material to meet
the general aim of the investigation.
Describe how the material was prepared for
analysis - e.g. labeling, protein digest, etc.
Describe the instrument and settings that were
used.
Describe the results from the instrument, e.g.
what units are represented.
Describe the type of analysis performed to
confirm/deny the hypothesis, e.g. clustering.
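The six descriptions above can be read as the fields of one annotation record. The following is a minimal Python sketch under that reading; the class name, field names and example values are hypothetical and are not taken from FuGO itself.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class InvestigationDescription:
    """Hypothetical record with one field per component listed on the slide."""
    material: str      # the material and its characteristics
    manipulation: str  # manipulations, perturbations or observations
    preparation: str   # how the material was prepared, e.g. labeling
    instrument: str    # the instrument and the settings used
    result: str        # what the instrument reported, including units
    analysis: str      # type of analysis performed, e.g. clustering


def missing_components(description: InvestigationDescription) -> List[str]:
    """Return the names of components that were left empty."""
    return [f.name for f in fields(description)
            if not getattr(description, f.name).strip()]


# Illustrative values only, not taken from a real investigation.
example = InvestigationDescription(
    material="soil sample",
    manipulation="",
    preparation="DNA extraction",
    instrument="sequencer, default settings",
    result="read counts",
    analysis="clustering",
)
print(missing_components(example))
```

A check like this only tells an annotator which components still need a description; it says nothing about which controlled terms should fill them.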
5. FuGO Development Strategy Decisions
- Unified Development
  - Pros
    - Overlap of terms is identified early in development
    - Universal/Common terms are defined by all those collaborating
    - Additional technological or biological terms can be added as needed by collaborators
  - Cons
    - Time needed to develop the ontology
- Independent Development
  - Pros
    - Develop Ontology in a time frame limited only by the community
  - Cons
    - Development of different working policies?
    - Use of different top level classes?
    - Overlap of terms at lower levels of the ontology tree
6. FuGO Development Process
- Collect Use Cases - within community activity
  - Collect examples of investigations as performed within a community and present Use Cases to developers group
- Bottom up approach - within community activity
  - Identify concepts to describe using controlled terms
  - Collect terms and their definitions
  - Bin terms in the top level ontology structure
- Top down approach - collaborative activity
  - Build a top level ontology structure, is_a (vertical) relationships
  - Make a list of other foreseen (horizontal) relationships
  - Review how Top Level Nodes fit in with the Upper Level Ontologies
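The bottom-up step (collect terms with definitions, then bin them under the top level structure) might look roughly like the sketch below. The term list, its definitions and the bin assignments are illustrative assumptions, and rejecting terms that lack a definition is a rule added here for the example, not a stated FuGO policy.

```python
from collections import defaultdict

# Top-level bins named in this presentation.
TOP_LEVEL = {"Continuant", "Occurrent"}

# (term, definition, proposed top-level bin); all entries are illustrative.
collected_terms = [
    ("instrument", "a physical device used in an assay", "Continuant"),
    ("assay", "process of performing tests and recording results", "Occurrent"),
    ("organism", "biological material under study", "Continuant"),
    ("labeling", "", "Occurrent"),  # no definition supplied yet
]


def bin_terms(terms):
    """Group terms by top-level bin; set aside terms that are not ready."""
    bins = defaultdict(list)
    rejected = []
    for name, definition, top in terms:
        if top not in TOP_LEVEL or not definition:
            rejected.append(name)
        else:
            bins[top].append(name)
    return dict(bins), rejected


bins, rejected = bin_terms(collected_terms)
print(bins)
print(rejected)
```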
7. FuGO - Top Level Classes
- Continuant: an entity that endures/remains the same through time
  - Dependent Continuant: depends on another entity
    - E.g. Environment (depends on the set of ranges of conditions, e.g. geographic location)
    - E.g. Characteristics (entity that can be measured, e.g. temperature, unit)
    - Realizable: an entity that is realizable through a process (executed/run)
      - E.g. Software (a set of machine instructions)
      - E.g. Design (the plan that can be realized in a process)
      - E.g. Role (the part played by an entity within the context of a process)
  - Independent Continuant: stands on its own
    - E.g. All physical entities (instrument, technology platform, document etc.)
    - E.g. Biological material (organism, population etc.)
- Occurrent: an entity that occurs/unfolds in time
  - E.g. Temporal Regions, Spatio-Temporal Regions (single actions or Event)
  - Process
    - E.g. Investigation (the entire experimental process)
    - E.g. Study (process of acquiring and treating the biological material)
    - E.g. Assay (process of performing some tests and recording the results)
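The is_a (vertical) relationships among these classes can be sketched as a child-to-parent table. This is only an illustration: the dictionary layout and function names are hypothetical, and placing Realizable under Dependent Continuant is an assumption read off the slide's nesting.

```python
# Child -> parent is_a links, transcribed from the slide. The position of
# "Realizable" under "Dependent Continuant" is an assumption.
IS_A = {
    "Dependent Continuant": "Continuant",
    "Independent Continuant": "Continuant",
    "Environment": "Dependent Continuant",
    "Characteristics": "Dependent Continuant",
    "Realizable": "Dependent Continuant",
    "Software": "Realizable",
    "Design": "Realizable",
    "Role": "Realizable",
    "Biological material": "Independent Continuant",
    "Process": "Occurrent",
    "Investigation": "Process",
    "Study": "Process",
    "Assay": "Process",
}


def ancestors(term):
    """Walk the is_a links upward and return every ancestor, nearest first."""
    path = []
    while term in IS_A:
        term = IS_A[term]
        path.append(term)
    return path


def is_a(term, candidate):
    """True if `term` is `candidate` or a descendant of it."""
    return term == candidate or candidate in ancestors(term)


print(ancestors("Software"))
print(is_a("Assay", "Occurrent"))
print(is_a("Assay", "Continuant"))
```

A lookup of this kind is what makes a concept-driven query possible: asking for every Occurrent also returns data annotated with Assay or Study.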
8. Emerging FuGO Design Principles
- OBO Foundry ontology, utilize ontology best practices
  - Inherit top level classes from an Upper Level ontology
  - Use of the Relation Ontology
  - Follow additional OBO Foundry principles
  - Facilitates interoperability with other OBO Foundry ontologies
- Develop recommendations for naming conventions and metadata
  - Format for term names, e.g. underscore vs. camel case, no plurals
  - Use of alphanumeric identifier for terms, i.e. something that does not have semantic meaning
  - Mechanisms for adding synonyms, etc.
- Open source approach
  - Protégé/OWL
  - Weekly conference calls
  - Shared environment using Sourceforge (SF) and SF mailing lists
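A naming recommendation is easier to follow when it can be applied mechanically. The sketch below assumes, purely for illustration, that the underscore style is chosen over camel case; the slide leaves that choice open, and the function name is hypothetical.

```python
import re


def to_underscore(name: str) -> str:
    """Convert a camel-case or spaced term name to lower-case underscore form."""
    # Put an underscore between a lower-case letter or digit and the
    # upper-case letter that follows it.
    split = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse runs of whitespace or hyphens into a single underscore.
    return re.sub(r"[\s\-]+", "_", split).lower()


print(to_underscore("biologicalMaterial"))
print(to_underscore("Technology Platform"))
```

Lower-casing everything also flattens acronyms such as DNA, so a real convention would need an explicit rule for them.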
9. Future Plans
- Binning process - ongoing
  - Reconciliations into one canonical version
  - Iterative process
- Common working practices - established
  - Each class consists of unique alphanumeric identifier, human readable string name, definition and comments
  - Sourceforge tracker in place to collect comments on terms, definitions, relationships
- Review ontology so that top level classes meet the needs of all involved communities
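The four parts of a class named above (identifier, name, definition, comments) could be held in a record such as the one sketched here. The `TERM_` prefix and the seven-digit counter are a hypothetical identifier scheme chosen only to show an identifier without semantic meaning; they are not FuGO's actual format.

```python
import itertools
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifier scheme: fixed prefix plus a zero-padded counter.
ID_PATTERN = re.compile(r"^TERM_\d{7}$")
_counter = itertools.count(1)


def next_identifier() -> str:
    """Issue the next opaque identifier; it encodes nothing about the term."""
    return f"TERM_{next(_counter):07d}"


@dataclass
class OntologyClass:
    name: str                                   # human readable string name
    definition: str
    comments: List[str] = field(default_factory=list)
    identifier: str = field(default_factory=next_identifier)


# Definitions quoted from the Top Level Classes slide.
assay = OntologyClass(
    name="assay",
    definition="process of performing some tests and recording the results",
)
study = OntologyClass(
    name="study",
    definition="process of acquiring and treating the biological material",
)
print(assay.identifier, study.identifier)
```

Because the identifier carries no meaning, a term can be renamed or moved in the hierarchy without invalidating existing annotations that point to it.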
10. FuGO Collaborating Communities
- Crop sciences: Generation Challenge Programme (GCP), www.generationcp.org
- Environmental genomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
- Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue
- HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
- Immunology Database and Analysis Portal, www.immport.org
- Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do
- International Society for Analytical Cytology, http://www.isac-net.org/
- Metabolomics Standards Initiative (MSI), msi.workgroups.sourceforge.net
- Neurogenetics, Biomedical Informatics Research Network (BIRN), www.nbirn.net
- Nutrigenomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
- Polymorphism
- Toxicogenomics: MGED RSBI Group, www.mged.org/Workgroups/rsbi
- Transcriptomics: MGED Ontology Group, mged.sourceforge.net/ontologies
11. FuGO and its Relationship to GSC
- Organism
  - Ploidy level - Can Locate in FuGO
  - Is this a model organism - Locate in XML Schema or FuGO
  - Reference for the description of the biological material sequenced (isolate, soil sample etc) - Can Locate in FuGO
  - Host - point to the NCBI Taxonomy
  - Health/disease status of source host at time of collection - point to Disease Ontology
12. FuGO and its Relationship to GSC
- Phenotype
  - Generally, look to PATO for these terms
  - FuGO will point out to PATO
13. FuGO and its Relationship to GSC
- Environment
  - Date and time of sample collection - Can Locate in FuGO (notion of a timeline and time points and the process in which these occur)
  - Geographic location (latitude, longitude, depth / altitude of sample) - Can Locate in FuGO
  - Habitat type - Can Locate in FuGO
14. FuGO and its Relationship to GSC
- Sample Processing
  - Volume of sample - Point to Unit Ontology
  - Sampling strategy (was it enriched, screened, normalized) - Can Locate in FuGO
  - DNA preparation (DNA extraction method and amplification e.g. MDA, emPCR, plones) - Can Locate in FuGO
  - Sequencing Method Used (e.g. dideoxysequencing, pyrosequencing, polony) - Can Locate in FuGO
15. FuGO and its Relationship to GSC
- Data Processing
  - Assembly (assembly method, estimated error rate and method of calculation) - Can Locate in FuGO
  - Classification (binning) method for fragments - Can Locate in FuGO
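The placements proposed on the preceding GSC slides (which descriptors can be located in FuGO and which point out to another resource) can be collected into one lookup table. The sketch below transcribes them; the dictionary layout, the shortened keys and the function name are assumptions, and the "model organism" descriptor is left out because the slide has not decided between XML Schema and FuGO.

```python
# Descriptor -> resource expected to supply its terms, per the slides.
TERM_SOURCE = {
    "ploidy level": "FuGO",
    "reference for biological material": "FuGO",
    "host": "NCBI Taxonomy",
    "health/disease status of host": "Disease Ontology",
    "phenotype": "PATO",
    "date and time of sample collection": "FuGO",
    "geographic location": "FuGO",
    "habitat type": "FuGO",
    "volume of sample": "Unit Ontology",
    "sampling strategy": "FuGO",
    "DNA preparation": "FuGO",
    "sequencing method": "FuGO",
    "assembly": "FuGO",
    "classification (binning) method": "FuGO",
}


def descriptors_by_source(mapping):
    """Invert the table: resource -> list of descriptors it should cover."""
    grouped = {}
    for descriptor, source in mapping.items():
        grouped.setdefault(source, []).append(descriptor)
    return grouped


grouped = descriptors_by_source(TERM_SOURCE)
external = sorted(source for source in grouped if source != "FuGO")
print(len(grouped["FuGO"]), "descriptors located in FuGO")
print("External resources:", external)
```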
16. Mechanisms for GSC to Contribute to FuGO
- Cost involved in standardization process
  - Benefit - choice in where to encounter that cost
- Choice of Methods
  - Independent
  - Unified
  - Hybrid
    - Pros
      - Extends the work done by all those collaborating for use by a given community
    - Cons
      - Overlap of terms at lower levels of the ontology tree?
      - Need to re-build when top-level changes?
17. Considerations for Method to Contribute to FuGO
- Timeframe for annotated data in production system
- Need for terms in an ontology versus taxonomy/controlled vocabulary
- Expertise in ontology development
- Ease of using CV in XML Schema vs use of ontology, e.g. OWL file
- Others?
18. Acknowledgments
- Jennifer Fostel, NIEHS-NCT
- Tanya Gray, University of Manchester
- Mervi Heiskanen, NCI
- Norman Morrison, University of Manchester
- Helen Parkinson, EBI
- Philippe Rocca-Serra, EBI
- Susanna-Assunta Sansone, EBI
- Daniel Schober, EBI
- Chris Stoeckert, University of Pennsylvania
- Chris Taylor, EBI
- Joe White, Dana Farber Cancer Center
- FuGO Working Group
- FuGO Advisory Board
- FuGO Coordinators
- Ryan Brinkman, Terry Fox Laboratory
- Richard Bruskiewich, International Rice Research Institute
- William Bug, Drexel University College of Medicine
- Tina Boussard, Stanford University
- Helen Causton, MRC Clinical Sciences Centre
- Liju Fan, Ontology Workshop LLC
- Dawn Field, Centre for Ecology & Hydrology, Oxford
- Gilberto Fragoso, NCI
19. http://fugo.sourceforge.net