Title: Moving Beyond Ontology Libraries: Integrating and Accessing Biomedical Ontologies to Annotate Experi
1 Moving Beyond Ontology Libraries: Integrating and Accessing Biomedical Ontologies to Annotate Experimental Data
- Daniel L. Rubin
- Chris J. Mungall
- Suzanna E. Lewis
- Monte Westerfield
- Michael Ashburner
- Mark A. Musen
2 The biomedical data explosion
- Huge growth in online biomedical data sets
- Genomics (genetic sequences, SNPs)
- Gene expression microarrays
- Proteomics (mass spectrometry, protein arrays)
- Tissue arrays, ICH
- Need for people and machines to make sense of massive data sets
3 Ontologies have an important role in e-science
- Ontologies: formal and explicit declarations of the entities and relationships in applications
- Relevance: built by humans, processed by machines
- Capabilities:
- Relate disparate data
- Enable data summarization, data mining
4 Ontology development is fragmented
- Separate communities of biomedical researchers creating and maintaining ontologies
- Different model organism databases using ontologies to annotate experimental data
- Bioinformaticians creating algorithms to analyze these annotations
- These activities are not unified; unification could allow:
- Integration with other data
- Cross-species analysis
5 Problems facing ontology content curation
- Many different groups/consortia create ontologies; their efforts are uncoordinated
- Many different ontologies, overlapping content and variable quality
- Ontologies are not interoperable
- Data integration efforts are laborious: barriers to accessing and effectively using expanding data repositories
6 Problems facing experimental data annotation
- Growing number of biomedical resources annotate data with ontologies (GO, MGED, BioPAX)
- Current resources confined to using single ontology for annotations
- Difficult to relate different annotation repositories to each other
7 NIH has funded a National Center for Biomedical Ontology
- Mission: Advance biomedicine with tools and methodologies for the structured organization of knowledge
- Strategy: Develop, disseminate, and support:
- Open-source ontology development and data annotation tools
- Resources (OBO, OBD) enabling scientists to access, review, and integrate disparate knowledge resources
8 http://bioontology.org/
10 cBiO Software and Resources
- Open Biomedical Ontologies (OBO)
- An integrated virtual library of biomedical ontologies
- Open Biomedical Data (OBD)
- An online repository of OBO annotations on experimental data accessible via BioPortal
- BioPortal: A Web-based portal
- Allow investigators and intelligent computer programs to access and use OBO
- Use OBO to annotate experimental data in OBD
- Visualize and analyze OBD annotations
11 Methods (1): Integrating diverse ontologies
- obo.sourceforge.net: an initial effort at integration; hosts bio-ontologies
- Variety of formats: OBO-EDIT, DAG-EDIT, Protégé, XML
- Must be viewed by tool that created them
- No mappings between them: no way to relate/compare them
12 obo.sourceforge.net
13 Strategy: access ontologies via common representation
- Protégé (http://protege.stanford.edu)
- Platform to access/manage ontologies
- Provides plug-in architecture
- Existing plug-ins handle most formats hosted by obo.sourceforge.net
- DAG-EDIT (Gennari, 2005)
- GO (Yeh, Karp, et al. 2003)
- OWL (Knublauch, Fergerson et al. 2004)
- No plug-in for OBO-EDIT; thus, we created Protégé extension to read OBO-EDIT
Recently, an OBO-EDIT plug-in was created by Gennari (2005).
14 Creating Protégé plug-in to read OBO-EDIT
- Approach: translate OBO-EDIT representation into Protégé frames representation
- Concepts in OBO-EDIT → Protégé classes
- Relationship types in OBO-EDIT → Protégé slots
- Operations performed using Python script
- Validation of import: manual comparison of original and imported ontologies
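The mapping above (concepts to classes, relationship types to slots) can be sketched in Python. This is only an illustrative sketch, not the script used in this work: it assumes the ontology is stored in the OBO flat-file syntax of `[Term]` and `[Typedef]` stanzas, and `FrameClass`, `parse_stanzas`, `to_frames`, and the `EX:` identifiers are hypothetical names that stand in for the real Protégé frames API and real ontology content.

```python
from dataclasses import dataclass, field


@dataclass
class FrameClass:
    """Hypothetical stand-in for a frame-based class (not the Protégé API)."""
    name: str
    label: str = ""
    superclasses: list = field(default_factory=list)
    slot_values: dict = field(default_factory=dict)  # slot name -> target ids


def parse_stanzas(text):
    """Split OBO flat-file text into (stanza_type, [(tag, value), ...]) pairs."""
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], [])
            stanzas.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current[1].append((tag.strip(), value.strip()))
        # header lines before the first stanza are ignored in this sketch
    return stanzas


def to_frames(text):
    """Map [Term] stanzas to classes and relationship types to slots."""
    classes, slots = {}, set()
    for kind, pairs in parse_stanzas(text):
        tags = {}
        for tag, value in pairs:
            tags.setdefault(tag, []).append(value)
        if kind == "Typedef":
            slots.add(tags["id"][0])
        elif kind == "Term":
            cls = FrameClass(name=tags["id"][0],
                             label=tags.get("name", [""])[0])
            cls.superclasses = list(tags.get("is_a", []))
            for rel in tags.get("relationship", []):
                slot, target = rel.split(None, 1)
                slots.add(slot)
                cls.slot_values.setdefault(slot, []).append(target)
            classes[cls.name] = cls
    return classes, slots


# Made-up sample content, for illustration only.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0001
name: head

[Term]
id: EX:0002
name: eye
is_a: EX:0003 ! sense organ
relationship: part_of EX:0001 ! head

[Typedef]
id: part_of
name: part of
"""

classes, slots = to_frames(SAMPLE)
```

Comparing `classes` and `slots` against the source file by hand corresponds to the manual validation step described on the slide.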
15 Accessing OBO-EDIT in Protégé
We access OBO-EDIT ontologies by importing them via a Python script that maps the OBO-EDIT ontologies into Protégé ontologies.
[Figure: Ontology in OBO-EDIT → Python script → Ontology in Protégé]
16 Benefits of accessing OBO ontologies in Protégé
- Unified access to all OBO ontology content via single API
- Access to Protégé tools for alignment and diff
- Access to ontology visualization tools
- Avoid necessity to use multiple tools to access OBO ontologies
17 DAG-EDIT → Protégé
The PaTO ontology (originally in DAG-EDIT format
and imported using DAG-EDIT plug-in to Protégé).
18 OBO-EDIT → Protégé
The Drosophila anatomy ontology (originally in
OBO-EDIT format and imported with OBO-EDIT
extension to Protégé). The contents of both PaTO
and Drosophila ontologies are now accessible in
the same common format.
19 Methods (2): Managing/accessing annotations
- Numerous model organism databases being developed (e.g., FlyBase, ZFIN, SGD)
- Collect experimental and computed data
- Annotate data using OBO ontologies, providing computable representation of anatomy, biology, phenotype, etc.
- Annotations are evolving
- Past: single terms from one ontology
- Current/future: multiple composed terms taken from several ontologies
20 New types of annotations
- New expressive annotations are composed of:
- Ontology entities (nouns), e.g., phenotype ontology
- Ontology attributes (verbs), e.g., PaTO
- Values to which annotation is applicable
- For example:
- Datum annotated: FBal0145168 allele
- Entity: atresia
- Attribute: shape
- Value: abnormal
i.e., the FBal0145168 allele is associated with atresia in the shape phenotype, which is abnormal
- We created prototype resource allowing users to browse these annotations: OBD
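A composed annotation of this kind can be modeled as a small record. The sketch below is one possible representation, not the OBD schema: the `Annotation` type and its field names are assumptions, and every part of the triple is optional so that an annotation can carry any subset of entity, attribute, and value.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    """Hypothetical entity-attribute-value (EAV) annotation on one datum."""
    datum: str                        # identifier of the annotated data item
    entity: Optional[str] = None      # ontology entity term
    attribute: Optional[str] = None   # ontology attribute term, e.g. from PaTO
    value: Optional[str] = None       # value the annotation applies to

    def describe(self):
        """Render the annotation compactly, skipping parts that are absent."""
        fields = [("E", self.entity), ("A", self.attribute), ("V", self.value)]
        parts = [f"{key}={val}" for key, val in fields if val is not None]
        return f"{self.datum}: " + ", ".join(parts)


# The example from the slide.
example = Annotation("FBal0145168", entity="atresia",
                     attribute="shape", value="abnormal")
print(example.describe())  # FBal0145168: E=atresia, A=shape, V=abnormal
```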
21 Open Biomedical Data (OBD) (taken from FlyBase and ZFIN data)
All Alleles
OBD collects annotations on experimental data using OBO ontologies. LEFT: Ontology annotations on alleles. The annotations consist of entities, attributes, and/or values (EAV). RIGHT: Detailed view showing all annotations on a particular allele in the EAV format.
22 Advantages of OBD
- Unification of annotations in disparate model organism databases
- Browse and search for genes/alleles having particular types of attributes or values
- More expressive queries
- e.g., find alleles associated with lethal embryo (E=embryo, A=viability, V=lethal) and abnormal embryonic head (E=embryonic head, A=qualitative, V=abnormal)
- Potential to link similar phenotypes to similar genes
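The conjunctive query in the example above can be sketched as a set intersection over EAV tuples. This is a minimal illustration and not the OBD query interface: the tuple layout, the helper names, and the `allele_N` identifiers are all invented for the example.

```python
def matches(annotation, entity=None, attribute=None, value=None):
    """True if an (allele, E, A, V) tuple agrees with every given constraint."""
    _, e, a, v = annotation
    return ((entity is None or e == entity)
            and (attribute is None or a == attribute)
            and (value is None or v == value))


def alleles_matching_all(annotations, patterns):
    """Alleles having at least one annotation matching each pattern."""
    result = None
    for pattern in patterns:
        hits = {ann[0] for ann in annotations if matches(ann, **pattern)}
        result = hits if result is None else result & hits
    return result if result is not None else set()


# Toy annotations with made-up allele identifiers.
ANNOTATIONS = [
    ("allele_1", "embryo", "viability", "lethal"),
    ("allele_1", "embryonic head", "qualitative", "abnormal"),
    ("allele_2", "embryo", "viability", "lethal"),
    ("allele_3", "embryonic head", "qualitative", "abnormal"),
]

QUERY = [
    dict(entity="embryo", attribute="viability", value="lethal"),
    dict(entity="embryonic head", attribute="qualitative", value="abnormal"),
]

print(alleles_matching_all(ANNOTATIONS, QUERY))  # {'allele_1'}
```

Leaving a constraint as `None` acts as a wildcard, so the same helper also covers browsing by attribute or value alone.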
23 Example: holoprosencephaly (h.p.)
- Locus of lesion causing human h.p. was incompletely understood; SHH mutations can cause midline defects (cleft palate or h.p.)
- Query: find genes with similar mutant phenotypes
- Human → SHH gene
- Zebrafish → shh and oep genes
- Query: Zebrafish oep gene → annotations show nearly all defects seen in human h.p. (suggests oep ortholog in human may be responsible for human h.p.)
- Knowledge of ZFIN oep gene was available in 1998, and provided candidate for cause of human h.p.; mutation of human oep ortholog (TDGF1) not found until 2002!
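One way to picture a "find genes with similar mutant phenotypes" query is to rank genes by the overlap between their phenotype annotation sets. The slides do not say which similarity measure was used; the sketch below swaps in Jaccard similarity as an assumption, and the gene names and phenotype labels are placeholders rather than real annotations.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of phenotype terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(query_profile, profiles):
    """Return (score, gene) pairs, best match first, ties broken by name."""
    scored = [(jaccard(query_profile, terms), gene)
              for gene, terms in profiles.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Placeholder phenotype profiles (P1..P5 are not real ontology terms).
PROFILES = {
    "gene_a": {"P1", "P2", "P3"},
    "gene_b": {"P1"},
    "gene_c": {"P4"},
}
QUERY_PROFILE = {"P1", "P2", "P3", "P5"}

ranking = rank_similar(QUERY_PROFILE, PROFILES)
print(ranking)  # [(0.75, 'gene_a'), (0.25, 'gene_b'), (0.0, 'gene_c')]
```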
24 Discussion
ONTOLOGY DEVELOPMENT
- Bio-ontologies being developed in vertical communities with little or no coordination
- Redundancy, variable quality, confusing array of ontologies
- At present, obo.sourceforge.net is a catch-all collection of ontologies, without integration of actual content
- We demonstrated utility of Protégé to integrate diverse ontology content
- Can inter-relate diverse ontologies
- Protégé provides tools for ontology alignment, GUI for viewing ontologies, and API for applications
25 Discussion
ONTOLOGY USE IN ANNOTATION
- Increasing number of biological databases using ontologies to annotate data content, but they are not integrated
- Difficult to perform cross-species analysis
- OBD will unify annotations among model organism databases
- OBD will support search/query with richer annotations (EAV), making it possible to describe richer phenotypes
- Linking OBD to OBO will permit more biologically-relevant queries, because of access to all parents of terms used for annotation
26 Future work: unifying ontologies and annotations
- BioPortal: a Web portal accessing and linking OBO and OBD
- Benefit: access semantics in OBO ontologies to refine search/visualization of OBD annotations
- e.g., search based on parents of annotation terms
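Searching on the parents of annotation terms amounts to matching a query term against each annotation term and all of its ancestors. The sketch below illustrates the idea under stated assumptions: the `PARENTS` hierarchy, the datum identifiers, and the function names are invented for the example and do not reflect actual OBO content or a BioPortal API.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search_by_term(query, annotations, parents):
    """Data annotated with `query` itself or with any of its descendants."""
    return sorted({datum for datum, term in annotations
                   if term == query or query in ancestors(term, parents)})


# Invented toy hierarchy: child term -> list of parent terms.
PARENTS = {
    "embryonic head": ["head"],
    "head": ["anatomical structure"],
    "tail": ["anatomical structure"],
}

# Invented (datum, annotation term) pairs.
TERM_ANNOTATIONS = [
    ("datum_1", "embryonic head"),
    ("datum_2", "tail"),
    ("datum_3", "head"),
]

print(search_by_term("head", TERM_ANNOTATIONS, PARENTS))
# ['datum_1', 'datum_3']
```

A query for a general term therefore also retrieves data annotated only with more specific terms, which a plain string match on the annotation term would miss.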
27 BioPortal
28 OBD
29 Acknowledgements
- National Center for Biomedical Ontology
- Executive Team: Mark Musen, Suzanna Lewis, Daniel Rubin, Sima Misra
- cBiO staff: Natasha Noy, Ray Fergerson, Lynn Murphy, Archana Verbakam, Chris Mungall, Harold Solbrig
- Collaborators: Michael Ashburner, Monte Westerfield, Ida Sim, Chris Chute, Barry Smith, Peggy Storey, Richard Olshen, Werner Ceusters, Deborah McGuinness
- Students and postdocs: Kaustubh Supekar, Nigam Shah, Fabian Neuhaus
- Funded through NIH Roadmap for Medical Research grant U54 HG004028
- Program officer: Peter Good (NIGMS)
30 Thank you.
Contact information Center feedback_at_cbio.us
32 Planning the Center: Structure of the grant
33 cBiO Resources
- Software resources
- Scientific investigation
- Community outreach
34 cBiO Scientific Team
- Computer Science (Musen, Stanford)
- Ontology management/alignment/diff
- Ontology integration
- Terminology access/query (Chute, Mayo)
- Ontology visualization/browsing/search (Storey, UVIC)
- Bioinformatics (Lewis, Berkeley)
- Data/image annotation tools
- Annotation databases
- Driving Biological Projects
- FlyBase (Ashburner, Cambridge)
- ZFIN (Westerfield, Oregon)
- HIV (Sim, UCSF)
- Education/Dissemination
- Educational workshops (Smith, University at Buffalo)
- Ontology development workshops