How to Build an Ontology

Barry Smith, http://ontology.buffalo.edu/smith

Slides: 87
Provided by: phismith
Learn more at: http://ncor.buffalo.edu
Transcript and Presenter's Notes



1
How to Build an Ontology
  • Barry Smith
  • http://ontology.buffalo.edu/smith

2
Schedule and Rules for Practicum
  • http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology

3
Why to Build an Ontology?
  • Scientific data are stored in databases
  • There are few constraints on the creation of new
    databases
  • Scientific data are siloed
  • How to counteract this silo-formation?
  • Create a common non-redundant suite of ontologies
    covering all scientific domains to annotate
    (tag, curate) scientific data

4
More precisely
  • How to build this suite of ontologies?
  • How to build ontologies that will integrate well
    together?
  • One answer: the Semantic Web

5
Integration via Linked Open Data
  • HTML demonstrated the power of the Web to allow
    sharing of information
  • use the power of hyperlinks to break down silos and
    create useful integration of online data
  • via Web Ontology Language (OWL)

6
Ontology success stories, and some reasons for
failure
Not all of the links here are what they seem

A fragment of the Linked Open Data in the
biomedical domain
7
The more Semantic Technology is successful, the
more it fails to solve the problem of silos
  • Indeed it leads to the creation of multiple,
    new, semantic silos

11
Ontology success stories, and some reasons for
failure

Linked Open Data integration via mappings
12
What you get with mappings
  • all phenotypes (excess hair loss, duck feet)

13
What you get with mappings
  • HPO: all phenotypes (excess hair loss, duck feet
    ...)
  • NCIT: all organisms

14
What you get with mappings
  • all phenotypes (excess hair loss, duck feet)
  • all organisms
  • allose (a form of sugar)

15
What you get with mappings
  • all phenotypes (excess hair loss, duck feet)
  • all organisms
  • allose (a form of sugar)
  • Acute Lymphoblastic Leukemia (A.L.L.)
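
The build-up on slides 12 to 15 can be illustrated with a minimal sketch of purely lexical mapping. The term records, the source names other than HPO and NCIT, and the naive_map helper are hypothetical, and the matching rule is an assumption chosen for illustration. Real mapping tools are more careful, but they face the same kind of ambiguity.

```python
import re
from collections import defaultdict

# Hypothetical term records: (source, label). Labels are modeled on the
# slide examples; "chemistry" and "oncology" are invented source names.
TERMS = [
    ("HPO", "All"),            # root term: all phenotypes
    ("NCIT", "All"),           # root term: all organisms
    ("chemistry", "allose"),   # a form of sugar
    ("oncology", "A.L.L."),    # Acute Lymphoblastic Leukemia
]


def normalize(label: str) -> str:
    """Lowercase the label and drop everything that is not a letter."""
    return re.sub(r"[^a-z]", "", label.lower())


def naive_map(terms, min_prefix=3):
    """Group terms whose normalized labels share their first min_prefix letters."""
    groups = defaultdict(list)
    for source, label in terms:
        groups[normalize(label)[:min_prefix]].append((source, label))
    # Keep only groups that link two or more terms, i.e. candidate mappings.
    return {key: members for key, members in groups.items() if len(members) > 1}


mappings = naive_map(TERMS)
```

Under these assumptions all four terms land in one group keyed by "all", although they denote phenotypes, organisms, a sugar and a disease.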

16
Mappings are hard
  • They are fragile, and expensive to maintain
  • This yields a new risk of forking
  • The goal should be to reduce the need for mappings to
    the minimum possible
  • By creating orthogonal ontologies: one ontology
    for each domain
  • Where to begin?

17
Uses of ontology in PubMed abstracts
18
By far the most successful: GO (Gene Ontology)
19
GO provides a controlled system of terms for use
in annotating (describing, tagging) data
  • multi-species, multi-disciplinary, open source
  • contributing to the cumulativity of scientific
    results obtained by distinct research communities
  • compare use of kilograms, meters, seconds in
    formulating experimental results

20
Hierarchical view representing relations between
represented types
21
[Figure: hierarchy of anatomical types connected by
is_a and part_of links. Types shown: Anatomical Space,
Anatomical Structure, Organ Cavity Subdivision, Organ
Cavity, Organ, Serous Sac, Organ Component, Serous Sac
Cavity, Tissue, Serous Sac Cavity Subdivision, Pleural
Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal
Pleura, Visceral Pleura, Interlobar recess, Mediastinal
Pleura, Mesothelium of Pleura]
22
US$200 million invested in literature and data
curation using GO
  • more than 12 million annotations relating gene
    products described in the UniProt, Ensembl and
    other databases to terms in the GO
  • experimental results reported in more than 60,000
    scientific journal articles manually annotated by
    expert biologists using GO

23
GO has learned the lessons of successful
cooperation
  • Based on community consensus
  • Updated every night
  • Clear documentation
  • The terms chosen are already familiar
  • Fully open source
  • Subjected to considerable third-party critique
  • Tracker with rapid turnaround to identify errors
    and gaps

24
compare legends for maps
25
compare legends for diagrams
26
ontologies are legends for data
27
ontologies are legends for databases
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
28
annotation using common ontologies yields
integration of databases
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
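
A minimal sketch of how annotation with a common ontology term lets records from separate databases be joined. The database names and term labels are taken from slides 27 and 28; the record identifiers, the assignment of terms to databases and the index_by_term helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Which database uses which term is invented for illustration.
ANNOTATIONS = [
    ("GlyProt", "gp-001", "Holliday junction helicase complex"),
    ("MouseEcotope", "me-417", "Holliday junction helicase complex"),
    ("DiabetInGene", "dg-023", "Holliday junction helicase complex"),
    ("GluChem", "gc-950", "sphingolipid transporter activity"),
]


def index_by_term(annotations):
    """Build term -> {database: [record ids]} so records can be joined by term."""
    index = defaultdict(lambda: defaultdict(list))
    for database, record_id, term in annotations:
        index[term][database].append(record_id)
    return index


index = index_by_term(ANNOTATIONS)
# Databases that become linked because they share one ontology term.
linked = sorted(index["Holliday junction helicase complex"])
```

No pairwise mapping between the databases is needed here: the shared term is the join key.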
29
GO has limited coverage
  • represents only three groups of biological
    entities
      • cellular components
      • molecular functions
      • biological processes
  • and it does not provide representations of
    proteins, diseases, symptoms, ...
  • → OPEN BIOMEDICAL ONTOLOGIES FOUNDRY

30
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Original OBO Foundry ontologies (Gene Ontology in yellow)
31
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
Additional CONTINUANT, INDEPENDENT column: Environment Ontology (environments are here)
32
RELATION TO TIME / GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Family, Community, Deme, Population | Organ Function (FMP, CPRO) | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Biological Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
http://obofoundry.org
33
The OBO Foundry: a step-by-step, evidence-based
approach to expand the GO
  • Developers commit to working to ensure that, for
    each domain, there is community convergence on a
    single ontology
  • and agree in advance to collaborate with
    developers of ontologies in adjacent domains.
  • http://obofoundry.org

34
OBO Foundry Principles
  • Common governance (coordinating editors)
  • Common training
  • Common architecture
      • simple shared top level ontology: BFO
      • shared Relation Ontology: www.obofoundry.org/ro
  • One ontology for each domain, so no need for
    mappings

35
[Figure repeated from slide 21: hierarchy of anatomical
types connected by is_a and part_of links]
36
For ontologies
  • it is generalizations that are important: types,
    kinds, species

37
Catalog vs. inventory
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt

38
types vs. instances
39
names of instances
40
names of types
41
An ontology is a representation of types
  • We learn about types in reality from looking at
    the results of scientific experiments in the form
    of scientific theories
  • experiments relate to what is particular;
    science describes what is general

42
types
mammal
frog
instances
43
3 kinds of (binary) relations
  • Between types
      • human is_a mammal
      • human heart part_of human
  • Between an instance and a type
      • this human instance_of the type human
      • this human allergic_to the type tamiflu
  • Between instances
      • Mary's heart part_of Mary
      • Mary's aorta connected_to Mary's heart

44
Type-level relations presuppose the underlying
instance-level relations
  • A is_a B =def. A and B are types and all
    instances of A are instances of B
  • A part_of B =def. All instances of A are
    instance-level-parts-of some instance of B
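
The two definitions above can be read as checks over instance-level data. The following is a minimal sketch under loud assumptions: the instance sets are hypothetical and finite, whereas the definitions quantify over all instances in reality, and the function names simply mirror the relation names on the slide.

```python
# Hypothetical instance data: each type is listed with the instances it has.
INSTANCES_OF = {
    "human": {"Mary", "John"},
    "mammal": {"Mary", "John", "Rex"},
    "human heart": {"Mary's heart", "John's heart"},
}

# Instance-level parthood, given as (part, whole) pairs.
INSTANCE_PART_OF = {("Mary's heart", "Mary"), ("John's heart", "John")}


def is_a(a: str, b: str) -> bool:
    """A is_a B: all instances of A are instances of B."""
    return INSTANCES_OF[a] <= INSTANCES_OF[b]


def part_of(a: str, b: str) -> bool:
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in INSTANCES_OF[b])
        for x in INSTANCES_OF[a]
    )


human_is_mammal = is_a("human", "mammal")
mammal_is_human = is_a("mammal", "human")
heart_part_of_human = part_of("human heart", "human")
```

With this sample, human is_a mammal holds, the converse fails because of Rex, and human heart part_of human holds because every heart in the sample is part of some human.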

45
[Figure repeated from slide 21: hierarchy of anatomical
types connected by is_a and part_of links]
46
The assertions linking terms in ontologies must
hold universally
  • Hence all type-level relations are provided with
    All-Some definitions
  • A has-part B =def. All As have some B as
    instance-level part
  • A part-of B =def. All As are instance-level
    parts of some B
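
One consequence of the All-Some form is that type-level has-part and part-of are not automatically inverses of each other. A minimal sketch, assuming a hypothetical finite sample in which one cell has no nucleus (the cell and nucleus example is an illustration added here, not taken from the slides):

```python
# Hypothetical instances; cell_2 stands for a cell without a nucleus.
INSTANCES_OF = {
    "cell": {"cell_1", "cell_2"},
    "cell nucleus": {"nucleus_1"},
}

# Instance-level parthood, given as (part, whole) pairs.
INSTANCE_PART_OF = {("nucleus_1", "cell_1")}


def type_part_of(a: str, b: str) -> bool:
    """A part-of B: all As are instance-level parts of some B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in INSTANCES_OF[b])
        for x in INSTANCES_OF[a]
    )


def type_has_part(a: str, b: str) -> bool:
    """A has-part B: all As have some B as instance-level part."""
    return all(
        any((y, x) in INSTANCE_PART_OF for y in INSTANCES_OF[b])
        for x in INSTANCES_OF[a]
    )


nucleus_part_of_cell = type_part_of("cell nucleus", "cell")
cell_has_part_nucleus = type_has_part("cell", "cell nucleus")
```

In this sample "cell nucleus part-of cell" holds universally, while "cell has-part cell nucleus" does not, so each direction has to be asserted and checked separately.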

47
Ontology for Biomedical Investigations
48
OBI representation of a trial in a neuroscience
study
49
OBI representation of a vaccine protection
investigation
51
Examples of ontology classes (types) used in
these examples
52
Ontology relations
53
BFO Top-Level Ontology
  • Continuant
      • Independent Continuant
      • Dependent Continuant
  • Occurrent (always dependent on one or more
    independent continuants)
54
GRANULARITY | CONTINUANT, INDEPENDENT | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy) | Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO) | Phenotypic Quality (PaTO) | Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL) | Cellular Component (FMA, GO) | Cellular Function (GO) | Phenotypic Quality (PaTO) | Cellular Process (GO)
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Function (GO) | Molecular Process (GO)
OBO Foundry coverage
55
Two kinds of entities
  • occurrents (processes, events, happenings)
  • continuants (objects, qualities, states...)

56
You are a continuant
  • Your life is an occurrent
  • You are 3-dimensional
  • Your life is 4-dimensional

57
Dependent entities
  • require independent continuants as their bearers
  • There is no run without a runner
  • There is no grin without a cat
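
A minimal sketch of the dependence constraint as a data structure: a dependent entity cannot be created without a bearer. The class names are hypothetical and are not BFO identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndependentContinuant:
    """Something that can bear dependent entities, such as a cat or a runner."""
    name: str


@dataclass(frozen=True)
class DependentEntity:
    """A quality, role or process; it requires an independent continuant as bearer."""
    name: str
    bearer: IndependentContinuant

    def __post_init__(self):
        # Reject construction when the bearer is missing or of the wrong kind.
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError(
                f"{self.name!r} requires an independent continuant as its bearer"
            )


cat = IndependentContinuant("cat")
grin = DependentEntity("grin", bearer=cat)
```

Attempting DependentEntity("grin", bearer=None) raises TypeError, which is the code-level counterpart of "no grin without a cat".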

58
Phenotype Ontology (PATO)
59
  • red is_a color
  • eye is_a anatomical structure
  • the particular case of redness (of a particular
    fly eye) instantiates red
  • an instance of an eye (in a particular fly)
    instantiates eye
  • the particular case of redness depends on the
    instance of an eye
60
Dependent vs. independent continuants
  • Independent continuants (organisms, buildings,
    environments)
  • Dependent continuants (quality, shape, role,
    propensity, function, status, power, right)

61
All occurrents are dependent entities
  • They are dependent on those independent
    continuants which are their participants (agents,
    patients, media ...)

62
BFO Top-Level Ontology
  • Continuant
      • Independent Continuant
      • Dependent Continuant
  • Occurrent (always dependent on one or more
    independent continuants)
63
Blinding Flash of the Obvious (BFO)
  • Continuant
      • Independent Continuant: thing
      • Dependent Continuant: quality
  • Occurrent: process
  • ...
64
OBO Foundry organized in terms of Basic Formal
Ontology
  • Each Foundry ontology can be seen as an
    extension of a single upper level ontology (BFO)
      • either post hoc, as in the case of the GO
      • or in virtue of creation ab initio via downward
        population from BFO

65
top level | Basic Formal Ontology (BFO)
mid-level | Information Artifact Ontology (IAO) | Ontology for Biomedical Investigations (OBI) | Spatial Ontology (BSPO)
domain level | Anatomy Ontology (FMA, CARO) | Anatomy Ontology (FMA, CARO) | Environment Ontology (EnvO) | Infectious Disease Ontology (IDO) | Biological Process Ontology (GO)
domain level | Cell Ontology (CL) | Cellular Component Ontology (FMA, GO) | Environment Ontology (EnvO) | Infectious Disease Ontology (IDO) | Biological Process Ontology (GO)
domain level | Cell Ontology (CL) | Cellular Component Ontology (FMA, GO) | Environment Ontology (EnvO) | Phenotypic Quality Ontology (PaTO) | Biological Process Ontology (GO)
domain level | Subcellular Anatomy Ontology (SAO) | Subcellular Anatomy Ontology (SAO) | Subcellular Anatomy Ontology (SAO) | Phenotypic Quality Ontology (PaTO) | Biological Process Ontology (GO)
domain level | Sequence Ontology (SO) | Sequence Ontology (SO) | Sequence Ontology (SO) | Molecular Function (GO) | Biological Process Ontology (GO)
domain level | Protein Ontology (PRO) | Protein Ontology (PRO) | Protein Ontology (PRO) | Molecular Function (GO) | Biological Process Ontology (GO)
Extension Strategy, Modular Organization
66
Principle of Low Hanging Fruit
  • Include even absolutely trivial assertions
    (assertions you know to be universally true)
  • pneumococcal bacterium is_a bacterium
  • Computers need to be led by the hand

67
Principle of singular nouns
  • Terms in ontologies represent types
  • Goal: each term in an ontology should represent
    exactly one type
  • Thus every term should be a singular noun

68
MeSH
  • MeSH Descriptors > Index Medicus Descriptor >
    Anthropology, Education, Sociology and Social
    Phenomena (MeSH Category) > Social Sciences >
    Political Systems > National Socialism
  • National Socialism is_a Political Systems
  • National Socialism is_a Anthropology ...

69
Principle: do not confuse words with things
  • mouse =def. common name for the species Mus
    musculus
  • swimming is healthy and has eight letters

70
Principle of Aristotelian definitions
  • All definitions should be of the form
      • an S =def. a G which Ds
      • where G (for genus) is the parent term of S
        (for species) in the corresponding reference
        ontology
  • For example
      • A human being is an animal which is rational
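
The genus-plus-differentia pattern can be sketched as a small data structure from which definitions are rendered and unfolded along the is_a chain. Only the "human being" entry comes from the slide; the "animal" entry, its wording and the helper names are hypothetical.

```python
# term -> (genus, differentia). The genus is the term's is_a parent.
# The "animal" entry is an invented illustration, not a vetted definition.
DEFINITIONS = {
    "human being": ("animal", "is rational"),
    "animal": ("living thing", "has sensation"),
}


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough rule, enough for this sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def define(term: str) -> str:
    """Render 'an S =def. a G which Ds' for one term."""
    genus, differentia = DEFINITIONS[term]
    return f"{article(term)} {term} =def. {article(genus)} {genus} which {differentia}"


def unfold(term: str) -> list[str]:
    """Follow genus links upward, collecting one definition per level."""
    chain = []
    while term in DEFINITIONS:
        chain.append(define(term))
        term = DEFINITIONS[term][0]
    return chain


human_definition = define("human being")
chain = unfold("human being")
```

Because each definition names exactly one genus, unfolding follows a single path upward, which is one reason the slides pair this principle with single inheritance.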

71
Single Inheritance
  • No kind in a classificatory hierarchy should be
    asserted to have more than one is_a parent on the
    immediate higher level

72
Multiple Inheritance
  • thing
      • car
      • blue thing
  • blue car is_a car
  • blue car is_a blue thing
73
Multiple Inheritance
  • is a source of errors
  • encourages laziness
  • serves as obstacle to integration with
    neighboring ontologies
  • hampers use of Aristotelian methodology for
    defining terms
  • hampers use of statistical search tools

74
Multiple Inheritance
[Figure: blue car is linked to blue thing and to car
by two distinct relations, is_a1 and is_a2; blue thing
and car both fall under thing]
75
Principle of asserted single inheritance
  • Each reference ontology module should be built
    as an asserted monohierarchy (a hierarchy in
    which each term has at most one parent)
  • Asserted hierarchy vs. inferred hierarchy
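
A minimal sketch of a check for asserted single inheritance: it lists every term that has more than one asserted is_a parent. The edge list follows the blue car example from the slides; the function name is hypothetical, and the check looks only at asserted edges, not at anything a reasoner might infer.

```python
from collections import defaultdict

# Asserted is_a edges as (child, parent) pairs, following the blue car example.
ASSERTED_IS_A = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]


def multiple_inheritance(edges):
    """Return {term: sorted parents} for every term with more than one asserted parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {term: sorted(ps) for term, ps in parents.items() if len(ps) > 1}


violations = multiple_inheritance(ASSERTED_IS_A)
```

Here only "blue car" is reported. One way to repair it, in line with the asserted vs. inferred distinction, is to assert a single parent and leave the second classification to inference.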

76
Ontology Development Principles
  • Reference ontologies capture generic content
    and are designed for aggressive reuse in multiple
    different types of context
  • Single inheritance
  • Single reference ontology for each domain of
    interest
  • Application ontologies created by combining
    local content with generic content taken from
    relevant reference ontologies

77
[Table repeated from slide 65: top-level, mid-level and
domain-level ontologies]
OBO Foundry: Downward Population from BFO
78
Example: The Cell Ontology
79
How to build an ontology
  • import BFO into ontology editor (Protégé)
  • work with domain experts to create an initial
    mid-level classification
  • find 50 most commonly used terms corresponding
    to types in reality
  • arrange these terms into an informal is_a
    hierarchy according to this universality
    principle
  • A is_a B ⇔ every instance of A is an instance of
    B
  • fill in missing terms to give a complete
    hierarchy
  • (leave it to domain experts to populate the lower
    levels of the hierarchy)
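
The "fill in missing terms" step can be supported by a simple completeness check: every term in the draft hierarchy should reach an upper-level term through its is_a chain. This is a sketch with hypothetical data; "material entity" and "process" stand in for the imported upper-level terms, and the draft is_a assertions are invented for illustration.

```python
# Upper-level terms assumed to come from the imported top-level ontology.
UPPER_LEVEL = {"material entity", "process"}

# Hypothetical draft hierarchy: child -> asserted is_a parent.
IS_A = {
    "cell": "material entity",
    "mast cell": "cell",
    "basophil": "cell",
    "degranulation": "secretion",  # "secretion" has no parent yet
}


def unanchored_terms(is_a, roots):
    """Return terms whose is_a chain does not reach one of the upper-level roots."""
    missing = set()
    for term in set(is_a) | set(is_a.values()):
        seen, current = set(), term
        # Walk upward until a root, a dead end, or a cycle is reached.
        while current not in roots and current in is_a and current not in seen:
            seen.add(current)
            current = is_a[current]
        if current not in roots:
            missing.add(term)
    return missing


gaps = unanchored_terms(IS_A, UPPER_LEVEL)
```

With this draft the check reports "secretion" and "degranulation", signalling that a parent for "secretion" still has to be added before the hierarchy is complete.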

80
BFO Top-Level Ontology
  • Continuant
      • Independent Continuant: Material Entity
      • Dependent Continuant: Attribute
  • Occurrent (always dependent on one or more
    independent continuants): Process
81
BFO Top-Level Ontology
  • Continuant
      • Material Entity
      • Attribute
  • Process
82
http://ncorwiki.buffalo.edu/index.php/Immunology_Ontology
83
Terms for an Allergy Ontology
  • peanut allergy disease
  • IgE-mediated hypersensitivity to peanut allergen(s)
  • peanut allergy disorder
  • Mast cells and basophils with peanut
    allergen-specific IgE bound to their membranes
  • mast cell/basophil degranulation
  • allergic reaction
  • acute urticaria (hives)
  • allergic angioedema
  • allergic rhinitis
  • allergic asthma
  • anaphylaxis (as reaction)
  • anaphylaxis (as syndrome)
  • peanut allergy
  • milk allergy
  • ragweed allergy
  • dust mite allergy
85
From the allergy example