Title: Linking Multiple Ontologies: The OBO Foundry Approach


1
Linking Multiple Ontologies: The OBO Foundry Approach
  • Chris Mungall
  • NIAID Cell Ontology Workshop
  • May 2008

2
Outline
  • Introduction to ontologies
  • The OBO perspective
  • Case study in the Gene Ontology
  • The OBO Foundry goals and principles
  • The OBO relation ontology
  • Organization of ontologies in OBO
  • Modularity
  • An example from CL
  • Linking CL to the OBO Foundry

3
What is an ontology?
  • A computable representation of some domain
  • What kinds of things exist?
  • What are the relations that hold between them?

Heart is_a Cavitated organ
Heart part_of Cardiovascular System
Mitral valve part_of Heart
Aortic valve part_of Heart
4
Aspects of an ontology
  • Identifiers
  • Uniquely identify a class / term
  • E.g. CL:0000037 is the ID for the term hematopoietic
    stem cell
  • Identifier metadata
  • Terminological aspects
  • Names and synonyms/alternate labels
  • CL:0000037 has hemopoietic progenitor cell as a
    related synonym and hemopoietic stem cell as an
    exact synonym
  • Logical aspects
  • Relations
  • Definitions

Provenance
5
Some ontologies and their uses
  • The Gene Ontology
  • Annotation of gene products
  • Analyzing high-throughput datasets
  • Anatomical ontologies (including CL)
  • Experimental metadata
  • Image annotation
  • Indicating location of gene expression
  • Creating Phenotypic descriptions
  • Others
  • NLP
  • Annotating information models
  • Database integration

6
Origins of OBO: The Gene Ontology (GO)
  • 3 ontologies for annotating genes and gene
    products

Ontology             Terms   Links
Molecular function    7889    9225
Biological process   13978   25065
Cellular component    2034    3894
  • These ontologies are organised as a collection of
    related terms, constituting nodes in a graph
  • Gradually incorporating other logical axioms

7
Annotation and GO
  • GO Annotations
  • Associations between genes and GO terms, with
    evidence
  • Met17: methionine metabolism (GO:0006555)
  • 222,000 genes and gene products have high quality
    annotations to GO terms
  • 3.4m including automated predictions
  • 66,000 publications curated
  • Variety of analysis tools
  • http://www.geneontology.org/GO.tools.shtml#micro

8
GO and high-throughput biology: Over-representation
of GO terms for gene sets
GOTermFinder (Sherlock et al.)
9
GO and the need for OBO
  • GO terms implicitly reference kinds of entities
    outwith the scope of GO
  • Methionine biosynthesis
  • Neural crest cell migration
  • Cardiac muscle morphogenesis
  • Regulation of vascular permeability
  • OBO was born from the need to create source
    ontologies for GO term cross-products
  • Define composite classes in terms of simpler ones

chemical
cell
anatomy
quality
10
The Open Biomedical Ontologies (OBO) Foundry
  • A collection of orthogonal reference ontologies
    in the biological/biomedical domain
  • The OBO Foundry: each ontology is committed to an
    agreed-upon set of principles governing best
    practices in ontology development

11
Some OBO ontologies
  • Gene Ontology
  • ChEBI - chemical entities
  • OBI - investigations
  • PATO, MP - phenotypes
  • CL - cells
  • ENVO - environment and habitat
  • DO - Human diseases
  • CARO - common anatomy
  • FMA - human anatomy
  • SO - sequence features
  • Model organism anatomy
  • ZFA
  • Fly_anat
  • Dicty_anat
  • Mouse_anat
  • OBO Relation Ontology

12
OBO Foundry criteria, v1
  • Open
  • Well-defined exchange format
  • E.g. OBO or OWL
  • Uses identifiers according to OBO ID policy
  • Ontology Life-cycle / versioning
  • Has clearly specified and delineated content
  • Has unambiguous definitions
  • Uses or extends relations in the OBO Relation
    Ontology
  • Well documented
  • Has a plurality of users (and a mailing list and
    issue tracker)
  • Developed collaboratively
  • Orthogonal, modular

http://obofoundry.org/
13
OBO Relation Ontology
  • Edges can link nodes
  • Within ontologies
  • Across ontologies
  • The precise meaning of the relation is important
  • Relations have formal definitions
  • Rules for composing relations together
  • http://obofoundry.org/ro/

14
Is_a
  • X is_a Y
  • If something is an instance of X (at time t),
    then it is also an instance of Y (at t)
  • Transitive
  • B1 B cell is_a B cell
  • B cell is_a lymphocyte
  • Therefore B1 B cell is_a lymphocyte
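
The transitivity rule above can be sketched in a few lines of Python. This is a minimal illustration rather than anything shown in the slides: the dictionary IS_A and the function is_a_ancestors are hypothetical names, and only the two asserted links from the example are included.

```python
# Asserted is_a links from the example (child -> set of direct parents).
IS_A = {
    "B1 B cell": {"B cell"},
    "B cell": {"lymphocyte"},
}


def is_a_ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(is_a_ancestors("B1 B cell")))  # ['B cell', 'lymphocyte']
```

The link from B1 B cell to lymphocyte is never asserted; it falls out of following the chain, which is all that transitivity means here.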

15
Part_of
  • Instance level part_of relation is primitive
  • Between classes
  • X part_of Y
  • Every instance of X is part_of some instance of Y
  • Paneth cell part_of intestine YES
  • Nucleus part_of Cell YES
  • Neuron part_of brain NO
  • (there are some neurons that are part of other
    parts of the nervous system)
  • Transitive
  • X part_of Y, Y part_of Z
  • Therefore, X part_of Z
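
The class-level reading of part_of ("every instance of X is part_of some instance of Y") can be checked against a toy set of instances. The sketch below is an assumption-laden illustration: the instance names, the instance_part_of pairs, and the function class_part_of are all hypothetical, and the instance-level pairs are taken as given rather than closed under transitivity.

```python
def class_part_of(instances_of_x, instance_part_of, instances_of_y):
    """True if every instance of X is part_of at least one instance of Y."""
    return all(
        any((x, y) in instance_part_of for y in instances_of_y)
        for x in instances_of_x
    )


# Hypothetical toy instances, invented for illustration only.
neurons = {"neuron_1", "neuron_2"}
cells = {"neuron_1", "neuron_2"}
nuclei = {"nucleus_1", "nucleus_2"}
brains = {"brain_1"}

# Instance-level part_of pairs (part, whole).
instance_part_of = {
    ("neuron_1", "brain_1"),        # a neuron located in the brain
    ("neuron_2", "spinal_cord_1"),  # a neuron elsewhere in the nervous system
    ("nucleus_1", "neuron_1"),
    ("nucleus_2", "neuron_2"),
}

print(class_part_of(nuclei, instance_part_of, cells))    # True
print(class_part_of(neurons, instance_part_of, brains))  # False
```

A single counter-example (neuron_2) is enough to make the class-level statement false, which is why "Neuron part_of brain" is marked NO above.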

16
Has_part
  • Instance level inverse of part_of
  • X has_part Y
  • Every X has some Y as part
  • Cell has_part nucleus NO
  • Nucleate erythrocyte has_part nucleus YES

17
Develops_from
  • X develops_from Y
  • Every instance of X was once a Y, or inherited a
    significant portion of its matter from a Y
  • Example: erythrocyte develops_from reticulocyte
  • Transitive
  • erythrocyte develops_from reticulocyte
  • reticulocyte develops_from orthochromatic
    erythroblast
  • =>
  • erythrocyte develops_from orthochromatic
    erythroblast

18
Transformation and derivation
  • Develops_from relation can be refined into two
    cases
  • Transformation_of
  • X transformation_of Y
  • Any instance of X was previously an instance of Y
  • Example: erythrocyte transformation_of
    reticulocyte
  • Derives_from
  • X derives_from Y
  • Holds between distinct instances where X inherits
    matter from Y
  • Most OBO ontologies just use the develops_from
    relation

19
Other relations
  • Inherence
  • Between a quality and an object
  • E.g. between a specific shape and a cell
  • Participation
  • Between a process and an object
  • E.g. between a B cell and an immune process

20
Definitions state necessary and sufficient
conditions
  • Links in the ontology graph state necessary
    conditions for a class
  • E.g. erythroid progenitor cell develops_from
    megakaryocyte erythroid progenitor
  • These characteristics may not be unique
  • A definition should state necessary and
    sufficient conditions for a class
  • The characteristics must be unique to the defined
    class
  • E.g. progenitor cell that is committed to the
    erythroid lineage
  • Definition should be precise and (as far as
    possible) translated / translatable to logical
    computable form

21
Genus differentia definitions
  • Of the form
  • An X is a G that D
  • G should be in the same ontology
  • D states the discriminating characteristics that
    differentiate (in the classification sense) Xs
    from other Gs.
  • Relations to terms in an ontology (the same
    ontology or a different one)
  • Example
  • A B cell is a lymphocyte that expresses an
    immunoglobulin complex
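
One way to hold a genus-differentia definition in a structured form is sketched below. It is only an illustration of the "An X is a G that D" pattern: the class name GenusDifferentia, its fields, and the relation label "expresses" are assumptions, not part of any OBO format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenusDifferentia:
    """A definition of the form 'An X is a G that D'."""
    term: str                                  # X, the class being defined
    genus: str                                 # G, the parent class
    differentia: Tuple[Tuple[str, str], ...]   # D, as (relation, target) pairs

    def to_text(self) -> str:
        clauses = " and ".join(f"{rel} {target}" for rel, target in self.differentia)
        return f"A {self.term} is a {self.genus} that {clauses}"


b_cell = GenusDifferentia(
    term="B cell",
    genus="lymphocyte",
    differentia=(("expresses", "an immunoglobulin complex"),),
)

print(b_cell.to_text())
```

Keeping the genus and the differentia as separate fields is what later allows a program, rather than a person, to compare definitions.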

22
Orthogonality of ontologies
  • No two ontologies should represent the same kind
    of entity
  • E.g. B-cell should only be represented in one
    ontology
  • Related entities should be coordinated across
    ontologies
  • E.g. GO: B-cell differentiation
  • Exceptions
  • The term cell connects GO Cellular Component
    (cell parts) and CL (cells)
  • Advantages
  • Reduces redundancy and work
  • Easier to make the union consistent

23
Some OBO terms..
bile
liver
liver development
obesity
fat body
hepatic artery
oenocyte
oenocyte differentiation
hepatoma
hepatocyte
insulin
increased circulating glucose level
carbohydrate metabolism
glucose
glycogen
24
[Diagram: the terms from the previous slide, each labelled with its source ontology: FMA (adult human), MA (mouse), FBbt (fly), MP (mammal phenotype), GO (biological process), DO, CL, PRO, CHEBI]
26
[Diagram: the same terms and source ontologies as on slide 24]
How should we organize this?
27
Top-level organisation (BFO: Basic Formal
Ontology)
  • General categories
  • 3D things (continuants)
  • Independent
  • Cells, organs, molecules
  • Dependent
  • Shapes, sizes, concentrations,
  • 4D things (processes)
  • Processes
  • Useful organisational principle for OBO
  • is_a and part_of should not cross top level
    categories
  • Levels of granularity (scale)
  • Population
  • Organism
  • Organ
  • Cell
  • Molecule
  • part_of relations can cross levels
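
The rule that is_a and part_of should stay within one top-level category lends itself to a simple automated check. The sketch below assumes a hand-made CATEGORY table and a hypothetical crossing_links helper; the class-to-category assignments are illustrative and are not taken from BFO itself.

```python
# Hypothetical assignment of classes to top-level categories.
CATEGORY = {
    "hepatocyte": "independent continuant",
    "liver": "independent continuant",
    "shape": "dependent continuant",
    "liver development": "process",
}


def crossing_links(links, category=CATEGORY):
    """Return is_a/part_of links whose two ends lie in different categories."""
    return [
        (child, rel, parent)
        for child, rel, parent in links
        if rel in {"is_a", "part_of"} and category[child] != category[parent]
    ]


links = [
    ("hepatocyte", "part_of", "liver"),         # same category: allowed
    ("liver development", "part_of", "liver"),  # process vs object: flagged
    ("shape", "inheres_in", "hepatocyte"),      # other relations may cross
]

print(crossing_links(links))  # [('liver development', 'part_of', 'liver')]
```

Note that the check says nothing about granularity: a part_of link from a cell to an organ crosses levels of scale but stays inside one category, so it passes.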

28
[Diagram: the same terms and source ontologies as on slide 24, grouped under three headings: Objects, Qualities etc, Processes]
29
(No Transcript)
30
The OBO Foundry can help with modular ontology
design
  • Biology is complex
  • So our ontologies will be complex
  • Multiple purposes
  • Multiple means of classifying
  • Separate out different aspects
  • Modular approach
  • Avoid multiple inheritance (>1 is_a parent)
  • Don't over-use is_a
  • Don't cross aspects with is_a
  • Make complex descriptions from simpler parts
  • Polyhierarchies arise from composition

31
Cysteine biosynthesis (trimmed)
GO
Tangled polyhierarchy
32
Cysteine biosynthesis (trimmed)
Process axis
33
Cysteine biosynthesis (trimmed)
Chemical structure axis
34
Cysteine biosynthesis (trimmed)
ChEBI (trimmed)
35
Cysteine biosynthesis (trimmed)
ChEBI (trimmed)
36
Cysteine biosynthesis (trimmed)
ChEBI (trimmed)
37
Cysteine biosynthesis (trimmed)
ChEBI (trimmed)
We can do more than simply link terms:
cross-products (aka logical definitions,
computable genus-differentia definitions)
38
Cysteine biosynthesis (trimmed)
ChEBI (trimmed)
Cysteine biosynthesis (GO:0019344) = a biosynthetic
process (GO:0009058) [genus] that results_in_creation_of
cysteine (CHEBI:13536) [differentia]
39
(No Transcript)
40
results_in_change_to
Cysteine biosynthetic process =
biosynthetic process that results_in_change_to
cysteine
41
Let the computer do the work..
Given cross-products, a reasoner can add all
links
Underlying representation is normalized
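
A toy version of what such a reasoner does is sketched below. It is not the algorithm of any particular OBO or OWL reasoner: it handles only single-relation cross-products, the chemical is_a links are illustrative assumptions, and the names IS_A, XP, subsumed_by and infer_is_a are hypothetical.

```python
# Asserted is_a links in the source ontologies (child -> parent).
# The chemical classification here is an illustrative assumption.
IS_A = {
    "cysteine": "sulfur amino acid",
    "sulfur amino acid": "amino acid",
}

# Cross-product definitions: term -> (genus, relation, filler).
XP = {
    "cysteine biosynthesis":
        ("biosynthetic process", "results_in_creation_of", "cysteine"),
    "sulfur amino acid biosynthesis":
        ("biosynthetic process", "results_in_creation_of", "sulfur amino acid"),
    "amino acid biosynthesis":
        ("biosynthetic process", "results_in_creation_of", "amino acid"),
}


def subsumed_by(child, parent):
    """True if child equals parent or reaches it via asserted is_a links."""
    while child is not None:
        if child == parent:
            return True
        child = IS_A.get(child)
    return False


def infer_is_a(xp=XP):
    """Infer is_a links between defined terms from their definitions."""
    inferred = set()
    for a, (genus_a, rel_a, filler_a) in xp.items():
        for b, (genus_b, rel_b, filler_b) in xp.items():
            if (a != b and rel_a == rel_b
                    and subsumed_by(genus_a, genus_b)
                    and subsumed_by(filler_a, filler_b)):
                inferred.add((a, b))
    return inferred


for child, parent in sorted(infer_is_a()):
    print(f"{child} is_a {parent}")
```

None of the three process-to-process links is asserted by hand; each follows from the definitions plus the chemical hierarchy. The result includes links that are themselves implied by transitivity, so a real tool would usually reduce it before display.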
42
Example of is_a overloading: OBO Cell
Ontology (current)
CL
43
CL
X
  • Try not to assert too many is_a parents

44
CL
GO
X
?
Has function
  • Reuse existing ontologies
  • Non-is_a relation

45
How CL can use other OBO ontologies
  • GO Cellular component
  • Mononuclear phagocyte
  • B cell (expresses immunoglobulin complex)
  • GO Biological process
  • Photosynthetic cell
  • PATO Qualities
  • Spiny neuron
  • CHEBI Chemical entities
  • X secreting cell
  • Anatomy Ontologies
  • CNS neuron

Molecular function, PRO - CD4 positive cell
46
How CL is used by other ontologies
Ontology         Example                         Genus                 Differentia
GO-BP            T cell differentiation          Cell differentiation  Results_in_acquisition_of_features_of T cell
GO-CC            Germ cell nucleus               Nucleus               Part_of germ cell
MP               Abnormal macrophage morphology  Abnormal morphology   Inheres_in macrophage
ZFA (zebrafish)  erythrocyte                     erythrocyte           In_organism Danio, Has_part nucleus
OBI
DO (disease)

Ontology         Example                         Relationship
Fly anatomy      R8 photoreceptor cell           Part_of ommatidium
47
Results
  • Biological process x CL
  • http://wiki.geneontology.org/index.php?XP:biological_process_xp_cell
  • Uncovered inconsistencies between GO and CL
  • Oenocyte differentiation is_a columnar/cuboidal
    epithelial cell differentiation
  • MP x CL
  • http://wiki.geneontology.org/index.php/XP:mammalian_phenotype_xp
  • Resulted in various fixes to MP

48
OBD Ontology Annotation Database
49
Summary
  • The cell ontology is a representation of the
    types of cell that exist
  • The OBO Foundry provides
  • Principles
  • A framework for connecting ontologies
  • There are many points of coordination between CL
    and other OBO ontologies
  • CL could benefit from the gradual introduction of
    a modular approach

50
(No Transcript)
51
The Gene Ontology and beyond
  • Curation of genes and gene products
  • Molecular function
  • Biological process
  • Cellular component

GO
Multiple databases using the same ontology
52
The Gene Ontology and beyond
  • Curation of genes and gene products
  • Molecular function
  • Biological process
  • Cellular component
  • What about curation of other data types?
  • Expression, transcriptomics
  • Genetics, phenotypes and disease
  • Many others..
  • OBO
  • Open Bio-Ontologies
  • Arose partly in response to requirements outside
    scope of GO

GO
53
Islands of biological data
Anatomy ontologies
GO
Phenotype ontologies
54
Connecting the islands
55
Connecting the islands
56
Amino acid cross-products in GO
Bada et al.: GO to ChEBI
http://www.berkeleybop.org/obol
57
http://www.berkeleybop.org/obol
58
  • GO approach is retrospective
  • Text based approaches to decompose terms
  • Obol
  • Bada/Hunter
  • Born of necessity
  • OBO did not exist when GO started
  • Hard work
  • New ontologies should take the prospective
    approach
  • Separate out aspects from the outset
  • No heuristic parsing necessary

59
Prospective approach: Sequence Ontology
Separate hierarchies created from the outset -
cross-products made from the beginning
60
(No Transcript)
61
OBI: Ontology for Biomedical Investigations
  • Successor to MGED/FuGO
  • Represents the realm of investigations
  • Biomaterials
  • Equipment
  • Protocols
  • Data transformations
  • Makes maximal use of OBO
  • PATO
  • ChEBI
  • Primary representation language is OWL
  • Uses OWL translations at http://purl.org/obo/

62
Social Insect Behavior Ontology
  • 4 distinct hierarchies
  • Anatomical entity
  • Behavior
  • Chemical entity
  • Species
  • Links
  • derives_from, between chemical and anatomical
    entity
  • Future plans
  • Submit chemical terms to ChEBI
  • Upper level behavior ontology?

63
Anatomy
  • GO is relevant for all kingdoms of life
  • Development of anatomical ontologies has been
    less coordinated
  • Cell and subcellular: one ontology applicable to
    all
  • Gross anatomy: multiple ontologies
  • Vertebrate
  • MA, EMAP: Mouse
  • FMA: Human (adult)
  • EHDA: Human
  • ZFA: Zebrafish
  • TAO: teleost anatomy
  • XAO: Xenopus
  • Invertebrate
  • FBbt: Drosophila anatomy
  • Tick anatomy
  • Mosquito anatomy

64
Anatomy: Ongoing work
  • CARO
  • Upper level shared anatomical ontology
  • Very general terms
  • Teleost anatomy ontology
  • Broader than zebrafish anatomy ontology
  • Will include homology links
  • Linking cells to gross anatomical entity
  • Purkinje cell part_of cerebellum
  • Spans ontologies (CL and ssAO)
  • BIRNLex
  • Stages and development

poster
poster
poster
talk
65
Using multiple ontologies: Pre vs post composition
  • Complex descriptions (aka cross-products) can be
    composed from 2 or more terms
  • By ontology editors (pre)
  • By curators (post)
  • Example
  • Liver hyperplasia
  • Precomposed phenotype ontology
  • MP:0005141 liver hyperplasia: increased size of
    liver due to increased hepatocyte cell number
  • Post-composition at time of genotype curation
  • PATO:0000644 hyperplastic
  • MA:0000358 liver
  • Which strategy to choose?
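
The two strategies can be made interchangeable when each precomposed term carries a computable definition. The sketch below assumes, for illustration, that the MP term in the example is defined as the PATO quality applied to the MA entity; the PRECOMPOSED table and the precomposed_equivalent function are hypothetical, and the identifiers are written in the usual prefix:number form.

```python
# Hypothetical cross-product definitions for precomposed phenotype terms:
# term ID -> (quality ID, entity ID). The single entry mirrors the
# liver hyperplasia example and is an assumption, not curated data.
PRECOMPOSED = {
    "MP:0005141": ("PATO:0000644", "MA:0000358"),  # hyperplastic + liver
}


def precomposed_equivalent(quality_id, entity_id, precomposed=PRECOMPOSED):
    """Return the precomposed term matching a post-composed description, if any."""
    for term_id, definition in precomposed.items():
        if definition == (quality_id, entity_id):
            return term_id
    return None


# A curator post-composes "hyperplastic" + "liver" at annotation time.
print(precomposed_equivalent("PATO:0000644", "MA:0000358"))  # MP:0005141
```

With a lookup like this, annotations made either way can be queried together, which is why the choice between pre- and post-composition need not be exclusive.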

66
  • Either strategy can be used
  • Or mixed and matched
  • Caveat
  • Pre-composed terms must have computable
    definitions (cross-products)
  • Currently created retrospectively
  • Current progress
  • MP (Mammalian Phenotype)
  • 4136/5760 xp defs, partially vetted
  • Caveat: species-specificity
  • WormPhenotype
  • 350/1569 xp defs
  • PlantTrait
  • 340/765 xp defs, partially vetted

67
Other ontologies
  • Envo and GAZ
  • Environmental ontology and gazetteer
  • Habitats
  • Host (anatomy)
  • Geographical features (e.g. hydrothermal vents)
  • Qualities, chemical entities
  • BIRNLex
  • Protein Ontology
  • Links to/from GO
  • Complexes
  • Functions of ancestral proteins

68
Envo-based annotation in Phenote
69
Technical consequences of modular approach
  • Dependencies
  • Technical issues
  • Dependence on network?
  • Formats - converters
  • Social management issues
  • Change and versioning
  • http://www.bioontologies.org/
  • Managing dependencies
  • http://obofoundry.org/wiki/index.php/Mappings
  • Stable URLs for downloading ontologies in obo or
    owl
  • http://purl.org/obo/
  • OBO Identifier policy
  • http://obofoundry.org/wiki/index.php/Identifiers

70
Conclusions
  • Be modular
  • Distinct hierarchies
  • Avoid is_a overloading
  • Link to existing ontologies
  • Rewards
  • Standards
  • Increases value of curated data
  • Reduces duplication of effort and maximises
    curation effort
  • Ontologies are long term infrastructure
  • It's worth getting them right

71
Learning more
  • http://www.bioontology.org
  • National Center for Biomedical Ontology
  • Browse and search OBO
  • Coming soon: inter-ontology links
  • http://obofoundry.org
  • Principles and recommendations
  • Participation
  • Mailing lists
  • Trackers

72
Restructuring Cell.obo
73
OBO Cell Ontology
  • Current version
  • Overloading of is_a hierarchy
  • Difficult to maintain
  • Leads to true path violations
  • Refactoring
  • Replace is_a links with has_function
  • Keep main axis structure-based (but not
    religiously so)

74
  • For every term immediately under
    cell-by-function, we made a new function term
  • propagation of genome
  • to circulate
  • to secrete
  • to metabolise
  • to contract
  • Electrical absorption
  • Barrier
  • Motility
  • Structural
  • to accumulate stuff
  • signaling (mitogenic)
  • to die
  • Defense
  • Transport
  • to photosynthesize
  • to support
  • Valve
  • to fix nitrogen
  • Also create grouping terms

75
(No Transcript)
76
(No Transcript)
77
  • Replaced is_a links to cell-by-function terms
    with has_function links to corresponding function
    terms

78
(No Transcript)
79
  • What do we do about the old cell-by-function
    terms?
  • We can eliminate them..
  • OR we can support them, but infer the tangled
    DAG
  • Requires xp defs
  • Nitrogen fixing cell = cell THAT has_function
    nitrogen-fixing

80
(No Transcript)
81
  • Future work / ongoing issues
  • Redundancy between cell functions and GO
    biological process?
  • Cell-by-lineage

82
Synchronizing ssAOs and CL
  • Fly_anat, zfa, plant_anat all represent cell
    types
  • Part_of links from cells to gross anatomy
  • E.g. purkinje_cell part_of cerebellum
  • Methodology
  • Xrefs from ssAOs to CL IDs
  • Treat as ss subtypes
  • Use reasoner to stay in sync
  • http://www.bioontology.org/wiki/index.php/CL:Aligning_species-specific_anatomy_ontologies_with_CL
  • Examples
  • http://www.berkeleybop.org/obol/fly_anatomy_xp_cell-obol

83
Transformation_of
  • Class-level relation between continuant types
  • Transitive
  • Relation between two classes, in which instances
    retain their identity yet change their
    classification by virtue of some kind of
    transformation. Formally: C transformation_of C'
    if and only if, given any c and any t, if c
    instantiates C at time t, then for some t', c
    instantiates C' at t' and t' is earlier than t,
    and there is no t2 such that c instantiates C at
    t2 and c instantiates C' at t2
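
The prose definition above can be written compactly in first-order notation. This is one possible rendering, assuming a predicate inst(c, C, t) for "c instantiates C at time t" and writing t' < t for "t' is earlier than t"; the predicate name is an assumption made for this sketch.

```latex
\[
C \;\mathit{transformation\_of}\; C'
\iff
\forall c\,\forall t\,
\Bigl[\,
  \mathrm{inst}(c, C, t) \rightarrow
  \exists t'\,\bigl(
    \mathrm{inst}(c, C', t') \wedge t' < t \wedge
    \neg\exists t_2\,\bigl(\mathrm{inst}(c, C, t_2) \wedge \mathrm{inst}(c, C', t_2)\bigr)
  \bigr)
\Bigr]
\]
```

The final conjunct is what separates transformation from ordinary subclassing: the same individual is never an instance of both classes at the same time.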

84
(No Transcript)
85
Derives_from
  • Holds between continuants
  • transitive
  • Derivation on the instance level (derives_from)
    holds between distinct material continuants when
    one succeeds the other across a temporal divide
    in such a way that at least a biologically
    significant portion of the matter of the earlier
    continuant is inherited by the later
  • We say that one class C derives_from class C' if
    instances of C are connected to instances of C'
    via some chain of instance-level derivation
    relations.
  • Examples
  • osteocyte derives_from osteoblast

86
(No Transcript)
87
(No Transcript)