Title: Ontologies, Clinical and Genomic Information How to say what we mean and mean what we say Opportunities
1 Ontologies, Clinical and Genomic Information: How to say what we mean and mean what we say. Opportunities and Pitfalls
- Alan Rector, Jeremy Rogers, Chris Wroe
- Information Management Group / Bio Health Informatics Group
- Department of Computer Science, University of Manchester
- rector_at_cs.man.ac.uk
- www.clinical-escience.org, www.co-ode.org, www.opengalen.org, protege.stanford.org
2 What Is An Ontology?
- Ontology (Socrates and Aristotle, 400-360 BC)
- The study of being
- Word borrowed by computing for the explicit description of the conceptualisation of a domain
- concepts (entities)
- properties and attributes of concepts
- constraints on properties and attributes
- Individuals (often, but not always)
- An ontology defines
- a common vocabulary
- a shared understanding
- a classification
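The ingredients listed above (concepts, properties, constraints, individuals) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class names `Concept` and `Individual`, the `violations` helper, and the mini-domain (`Disease`, `Gene`, `GeneticDisease`, `caused_by`) are all hypothetical and are not taken from any ontology language or tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A category in the domain, with an optional parent and property constraints."""
    name: str
    parent: Optional["Concept"] = None
    # property name -> concept that every value of that property must belong to
    constraints: dict = field(default_factory=dict)

    def is_kind_of(self, other: "Concept") -> bool:
        """Walk up the parent chain looking for `other`."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False


@dataclass
class Individual:
    """A particular thing described using the vocabulary of concepts."""
    name: str
    concept: Concept
    values: dict = field(default_factory=dict)  # property name -> Individual


def violations(individual: Individual) -> list:
    """Properties whose value breaks a constraint on the concept or its ancestors."""
    broken = []
    concept = individual.concept
    while concept is not None:
        for prop, required in concept.constraints.items():
            value = individual.values.get(prop)
            if value is not None and not value.concept.is_kind_of(required):
                broken.append(prop)
        concept = concept.parent
    return broken


# Hypothetical mini-domain, for illustration only.
disease = Concept("Disease")
gene = Concept("Gene")
genetic_disease = Concept("GeneticDisease", parent=disease,
                          constraints={"caused_by": gene})

some_gene = Individual("gene-1", gene)
good_case = Individual("case-1", genetic_disease, {"caused_by": some_gene})
bad_case = Individual("case-2", genetic_disease, {"caused_by": good_case})
```

Here `violations(good_case)` is empty, while `bad_case` breaks the `caused_by` constraint because its value is not a kind of `Gene`. The shared vocabulary is the set of concept and property names; the shared understanding is the constraints attached to them.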
3 Sharing info ≠ Sharing meaning
- Metadata
- Data describing the content and meaning of resources and services
- But everyone must speak the same language
- Terminologies
- Shared and common vocabularies
- For search engines, agents, curators, authors and users
- But everyone must mean the same thing
- Ontologies
- Shared and common understanding of a domain
- Essential for search, exchange and discovery
4 Measure the world: quantitative models (not ontologies)
- Quantitative
- Numerical data
- 2mm, 2.4V, between 4 and 5 feet
- Unambiguous tokens
- Main problem is accuracy at initial capture
- Numerical analysis (e.g. statistics) well understood
- Examples
- How big is this breast lump?
- What is the average age of patients with cancer?
- How much time elapsed between original referral and first appointment at the hospital?
5 Describe the world: ontologies
- Qualitative
- Descriptive data
- Cold, colder, blueish, not pink, drunk
- Ambiguous tokens
- What's wrong with being drunk?
- Ask a glass of water.
- Accuracy poorly defined
- More examples
- How pleomorphic are the cells in the biopsy?
- What is a protein's function?
- What is the derivation of a tissue?
6 Why Develop an Ontology? Naming, Classifying, Indexing
- To share common understanding of the structure of descriptive information
- among people
- among software agents
- between people and software
- To enable reuse of domain knowledge
- to introduce standards to allow interoperability
- To index and annotate other resources
Semantic Interoperability: Foundation of the Semantic Web/Grid
7 More Reasons
- To make domain assumptions explicit
- easier to change domain assumptions (consider a genetics knowledge base)
- easier to understand and update legacy data
- To separate domain knowledge from the operational knowledge
- re-use domain and operational knowledge separately (e.g., configuration based on constraints)
- To manage the combinatorial explosion
8 A semantic continuum
- Mike Uschold, Boeing Corp
Shared human consensus
Implicit
→ Further to the right →
- Less ambiguity
- Better inter-operation
- More robust, less hardwiring
- More difficult
9 An Ontology should be just the Beginning
Databases
Declare structure
Ontologies
Knowledge bases
The Semantic Web
Provide domain description
Software agents
Problem-solving methods
10 What an Ontology Isn't (It won't make the coffee)
- A database
- Ontologies are about categories/classes/types/concepts/entities, not instances
- ABOUT diseases, genes, proteins, ...
- NOT ABOUT specific patients, samples, studies, ...
- A database/EHR schema
- An ontology is about meaning rather than storage
- Although ontology technologies are a means for merging schemas
- A decision support/protocol management system
- The entities used in the rules, not the rules
- A metadata schema
- The entities used in the metadata, not the schema itself
- A lexicon
- Meaning rather than language
- But every ontology needs language tools
11 Ontology Technologies
- Description logics (DLs), OWL
- Designed to provide logical support for automatic classification and consistency checking
- Designed for sharing and software engineering
- Leverage off the Semantic Web / Grid community
- But not everything in OWL is an ontology
- RDF(S)
- Specialised for groups
- DAGEdit and other OBO tools, FMA explorer, ...
- UML
- Carefully developed UML models convey much information for an ontology
- But support only very simple inference and checking
12 Why it's hard (1)
- Language is slippery and local; rigour and logic are hard
- Classification is too easy for people (to do badly)
- But logical/computational properties unintuitive
- Combinatorial explosions
- Philosophical and religious differences
- Information capture
- Data quality
- Tools and environments
- Different points of view
- Oncology, Cardiology, ...
- Adult, developmental, aetiological, ...
- Clinical, genetic, genomic, ...
13 Why it's hard (2)
- Need a combined model of meaning
- The EHR/Database holding the ontology PLUS the ontology held
- Hard to scope; easy to do too much
- Just in time ontology
- Better in the bio than the medical community
- Software engineering methods poorly understood
14 Classification is easy for people (to do badly)
- "On those remote pages it is written that animals are divided into
- a. those that belong to the Emperor
- b. embalmed ones
- c. those that are trained
- d. suckling pigs
- e. mermaids
- f. fabulous ones
- g. stray dogs
- h. those that are included in this classification
- i. those that tremble as if they were mad
- j. innumerable ones
- k. those drawn with a very fine camel's hair brush
- l. others
- m. those that have just broken a flower vase
- n. those that resemble flies from a distance"
From The Celestial Emporium of Benevolent Knowledge, Borges
15 Avoiding combinatorial explosions
- The Exploding Bicycle: from phrase book to dictionary and grammar
- 1980 - ICD-9 (E826): 8 codes
- 1990 - READ-2 (T30..): 81 codes
- 1995 - READ-3: 87 codes
- 1996 - ICD-10 (V10-19 Australian): 587 codes
- V31.22 Injury or accident to the occupant of three-wheeled motor vehicle in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
- and meanwhile elsewhere in ICD-10
- W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
- X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
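The growth in the code counts above is what happens when every combination of details gets its own pre-written entry (the "phrase book"). A rough sketch of the arithmetic, using hypothetical axes and values that are illustrative only and not taken from ICD or READ, shows why composing from a small vocabulary (the "dictionary and grammar") scales better:

```python
# Hypothetical axes for describing a transport accident. The values are
# illustrative assumptions, not the real ICD or READ code structure.
AXES = {
    "victim_vehicle": ["pedal cycle", "motorcycle",
                       "three-wheeled motor vehicle", "car"],
    "other_party": ["pedestrian", "pedal cycle", "car",
                    "heavy vehicle", "fixed object"],
    "victim_role": ["driver", "passenger", "person on outside of vehicle"],
    "setting": ["traffic", "nontraffic"],
    "activity": ["sports", "leisure", "working for income", "resting"],
}


def phrase_book_size(axes):
    """Number of pre-enumerated codes needed to cover every combination."""
    size = 1
    for values in axes.values():
        size *= len(values)
    return size


def dictionary_size(axes):
    """Number of primitive terms needed when combinations are composed on demand."""
    return sum(len(values) for values in axes.values())


def compose(**choices):
    """Build one description from primitives, rejecting unknown values."""
    for axis, value in choices.items():
        if value not in AXES[axis]:
            raise ValueError(f"{value!r} is not a known value for {axis}")
    return tuple(sorted(choices.items()))
```

With these five small axes, `phrase_book_size(AXES)` is 480 while `dictionary_size(AXES)` is 18. Adding one value to an axis adds one dictionary entry but multiplies the phrase book, and `compose` only needs to state the details that matter for a given record.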
16 The ontology nested in the EHR
the ehr (hl7 rim) moodCodeEvent
subjectRelative code
diabetes (subject person_in_family)
the ontology (snomed-ct)
<family_hx (assoc_find Diabetes)>
the combined meaning
What is legal? Required? Mandatory?
17 Developing Software Engineering Methodologies for Ontologies
- Building a life cycle
- Use/test cases and exemplars
- Identifying problems and alternative solutions
- Exploring consequences and deciding amongst alternatives
- Specifying solutions
- Human and machine readable form
- Setting conformance tests for specifications
- Building reference implementations
- Monitoring for problems
- Recording of problems and changes
18 Logic-based Ontologies: Conceptual Lego
gene
protein
cell
expression
chronic
acute
bacterial
deletion
polymorphism
ischaemic
19 Logic-based Ontologies: Conceptual Lego
SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis
Hand which is anatomically normal
20 Logical Constructs build complex concepts from modularised primitives
Species
Genes
Function
Disease
21 Normalising (untangling) Ontologies
22 A simplified example: Build a simple tree, easy to maintain
23 Let the classifier organise it
24 If you want more abstractions, just add new definitions (re-use existing data)
Diseases linked to abnormal proteins
25 And let the classifier work again
26 And again, even for a quite different category
Diseases linked to genes described in the mouse
27 Untangling and Enrichment: Using a classifier to make life easier
Substance
- Protein
- - ProteinHormone
- - - Insulin
- Steroid
- - SteroidHormone
- - - Cortisol
- Hormone
- - ProteinHormone
- - - Insulin
- - SteroidHormone
- - - Cortisol
- Catalyst
- - Enzyme
- - - ATPase

Substance
- Protein
- - ProteinHormone
- - - Insulin
- - Enzyme
- - - ATPase
- Steroid
- - SteroidHormone
- - - Cortisol
- Hormone
- - ProteinHormone
- - - Insulin
- - SteroidHormone
- - - Cortisol
- Catalyst
- - Enzyme
- - - ATPase
Hormone ≡ Substance & playsRole someValuesFrom HormoneRole
ProteinHormone ≡ Protein & playsRole someValuesFrom HormoneRole
SteroidHormone ≡ Steroid & playsRole someValuesFrom HormoneRole
Catalyst ≡ Substance & playsRole someValuesFrom CatalystRole
Enzyme ≡ Protein & playsRole someValuesFrom CatalystRole
Insulin → playsRole someValuesFrom HormoneRole
Cortisol → playsRole someValuesFrom HormoneRole
ATPase → playsRole someValuesFrom CatalystRole
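How a classifier can derive the tangled hierarchy from the definitions above can be sketched in a few lines. The sketch makes several assumptions that should be read as such: the first five lines are treated as full definitions (necessary and sufficient), the last three as necessary conditions only, and Insulin, Cortisol and ATPase are asserted under Protein, Steroid and Protein as in the trees. The subsumption test is a toy structural check that matches role fillers by name; it is not a description logic reasoner.

```python
# name -> (is_defined, asserted named parents, existential restrictions)
ONTOLOGY = {
    "Substance":      (False, set(),         set()),
    "Protein":        (False, {"Substance"}, set()),
    "Steroid":        (False, {"Substance"}, set()),
    "Hormone":        (True,  {"Substance"}, {("playsRole", "HormoneRole")}),
    "ProteinHormone": (True,  {"Protein"},   {("playsRole", "HormoneRole")}),
    "SteroidHormone": (True,  {"Steroid"},   {("playsRole", "HormoneRole")}),
    "Catalyst":       (True,  {"Substance"}, {("playsRole", "CatalystRole")}),
    "Enzyme":         (True,  {"Protein"},   {("playsRole", "CatalystRole")}),
    "Insulin":        (False, {"Protein"},   {("playsRole", "HormoneRole")}),
    "Cortisol":       (False, {"Steroid"},   {("playsRole", "HormoneRole")}),
    "ATPase":         (False, {"Protein"},   {("playsRole", "CatalystRole")}),
}


def told_ancestors(name):
    """Classes reachable through asserted parents, including `name` itself."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY[current][1])
    return seen


def restrictions(name):
    """Restrictions asserted on a class or inherited from its told ancestors."""
    result = set()
    for ancestor in told_ancestors(name):
        result |= ONTOLOGY[ancestor][2]
    return result


def is_subsumed_by(sub, sup):
    """True if every `sub` must also be a `sup` under the definitions above."""
    if sup in told_ancestors(sub):
        return True
    is_defined, parents, needed = ONTOLOGY[sup]
    if not is_defined:
        return False  # primitive classes only have asserted subclasses
    return (all(is_subsumed_by(sub, parent) for parent in parents)
            and needed <= restrictions(sub))


def inferred_parents(name):
    """Most specific named subsumers, i.e. direct parents after classification."""
    subsumers = {c for c in ONTOLOGY if c != name and is_subsumed_by(name, c)}
    return {c for c in subsumers
            if not any(d != c and is_subsumed_by(d, c) for d in subsumers)}
```

Only single parents were asserted, yet `inferred_parents("Enzyme")` yields both `Protein` and `Catalyst`, and `inferred_parents("ATPase")` yields `Enzyme`. The multiple parents are derived from the definitions, so nobody has to maintain them by hand.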
28 Ontologies and Reference Information Resources
- An ontology is just one part
- Naming - Definitions and necessary conditions
- Classification
- Indexing
- Knowledge bases
- What we know about those entities: what is true in general
- Databases
- What we know about individuals
- Instance stores: specialised databases that link to ontologies
- Plus
- Lexicons
- Metadata
- Mappings
29 Definitional knowledge
Ontology
Linguistic
Knowledge
30 Example 1: Indexing Drug Contraindications (or guidelines or information or ...)
31 Example 2: Indexing data entry forms. Fractal tailoring of forms for clinical trials
Hypertension
Idiopathic Hypertension
In our company's studies
In Phase 2 studies
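One way to read the labels above is that form content is indexed at the most general condition and study context where it applies, and a specific form is assembled by collecting everything indexed at that point or above it. The sketch below is an interpretation under that assumption, not the authors' implementation; the `PARENT` links, the `FORM_INDEX` entries and the item names are hypothetical.

```python
# Hypothetical kind-of links, for conditions and for study contexts alike.
PARENT = {
    "Idiopathic Hypertension": "Hypertension",
    "In Phase 2 studies": "In our company's studies",
}

# Hypothetical form items, indexed at the most general point where they apply.
# A context of None means "in any context".
FORM_INDEX = {
    ("Hypertension", None): ["blood pressure"],
    ("Hypertension", "In our company's studies"): ["study identifier"],
    ("Idiopathic Hypertension", "In Phase 2 studies"):
        ["exclusion of secondary causes"],
}


def chain(name):
    """A name followed by its ancestors, ending with None (meaning 'any')."""
    result = []
    while name is not None:
        result.append(name)
        name = PARENT.get(name)
    return result + [None]


def form_items(condition, context):
    """Items indexed at this condition and context or anything more general.

    General items come first, so each level of specialisation only adds to
    the form built for the level above it.
    """
    items = []
    for cond in reversed(chain(condition)):
        for ctx in reversed(chain(context)):
            for item in FORM_INDEX.get((cond, ctx), []):
                if item not in items:
                    items.append(item)
    return items
```

For example, `form_items("Hypertension", None)` returns only the general item, while `form_items("Idiopathic Hypertension", "In Phase 2 studies")` returns all three. The same lookup works at every level of detail, which is one sense in which the tailoring could be called fractal.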
32 Example 3: PEN&PAD. Fractal Tailoring of fail-soft forms
What is it sensible to say about ...?
34 Technical Barriers to linking ontologies
- Overlap
- Linking independent ontologies is easy; overlap ALWAYS brings differences in meaning
- To integrate, separate
- Appropriate levels of abstraction
- Genetics/Genomics is changing disease classification
- Anti-angina drugs
- Ingredients conjugated in the liver
- Feedback
- New biology → new clinical classifications → discipline required to keep separations
- Views
- Anatomy: Tissues (developmental) vs Structures vs Functions
35 Nontechnical barriers to linking ontologies
- Organisational barriers
- How to keep separation and scope of individual ontologies
- All enterprises tend to expand and encroach
- Discipline barriers
- Task barriers
- Fit for one purpose is not fit for all purposes
- Language barriers
- Between communities as well as languages
- IP barriers
- Process
- Collaborative distributed vs Centralised
- Authority
- Life cycle and rate of change
- GO runs at web speed: seconds to days
- SNOMED runs at e-publishing speed: 6 months to 3 years
- ICD runs at print/committee speed: 10-20 years
36 Good ontologies
- Fitness for purpose
- What's it for?
- Defined scope
- Ownership by users
- A language belongs to its community
- Human factors
- Understandability, Reliability!
- Evaluation criteria
- How do we know if it meets its purpose?
Evolution: Process not Product!
37 Good ontologies
- Internal Structure
- Consistency
- Modularity and Normalisation
- Software engineering issues: Architecture and Tools
- It's software! It evolves! It's a standard! Conformance and regression testing matter
- Philosophical clarity
- Class-instance divide correct
- Instances are different in ontologies and databases
- Ontologies are about a view of the world, not about how to store information in a database
- Clear distinction between part-whole and kind-of
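The distinction between part-whole and kind-of can be illustrated by keeping the two relations in separate tables and querying them separately. The anatomy fragment below is a hypothetical example chosen for illustration; the function names are assumptions, not any tool's API.

```python
# Hypothetical anatomy fragment, with the two relations kept apart.
IS_A = {"Thumb": {"Finger"}, "Finger": {"BodyPart"}, "Hand": {"BodyPart"}}
PART_OF = {"Finger": {"Hand"}, "Hand": {"UpperLimb"}}


def ancestors(relation, start):
    """Transitive closure of `relation` from `start`, excluding `start`."""
    seen, stack = set(), list(relation.get(start, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(relation.get(current, ()))
    return seen


def kinds_of(concept):
    """Everything `concept` is a kind of."""
    return ancestors(IS_A, concept)


def wholes_of(concept):
    """Everything `concept` is part of.

    Parthood is inherited down the kind-of hierarchy: a Thumb is a Finger,
    and Fingers are parts of Hands, so a Thumb is part of a Hand.
    """
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for kind in {current} | kinds_of(current):
            for whole in PART_OF.get(kind, ()):
                if whole not in found:
                    found.add(whole)
                    frontier.append(whole)
    return found
```

With this fragment, `kinds_of("Thumb")` contains `Finger` and `BodyPart` but not `Hand`, while `wholes_of("Thumb")` contains `Hand` and `UpperLimb`. Storing both relations in one undifferentiated "parent" link would make a thumb look like a kind of hand.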
38 Grounding cost vs Cleanup cost
- What do we need to share?
- What is broken?
- How much do we need to know to communicate?
- Easy to build too much
- And very costly!
- Just in time ontology
- Use logic
- Use the web
- Bio / OBO does well; Medicine so far doing badly
39 Important Ontologies and related standards
- OBO (Open Biomedical Ontologies)
- Gene Ontology
- MGED family
- ...
- UMLS
- Massive resource for cross referencing
- Use CUIs and LUIs: Concept Unique IDs and Lexical Unique IDs
- SNOMED-CT
- SNOMED-International
- Anatomy
- Digital Anatomist FMA, Mouse Developmental, Mouse Adult
- SAEL: Standard Anatomy Entry List
- NCICB
- CaCORE ontology
- National minimum data sets and controlled vocabularies
- HL7, LOINC, DICOM, CDISC, ...
- OpenGALEN: source for experimentation and development
- Bio databases: at least implicit controlled vocabularies
- Swissprot, OMIM, ..., ENSEMBLE, PRINTs, ...
40 Summary: Planning for Naming, Classifying, Indexing
- What is it for? Is there a gap? What is needed?
- What are the use cases? Criteria for success?
- Does it exist already?
- Is an ontology the answer? Is an ontology needed for the answer?
- What else is needed?
- A reference knowledge source?
- What is the MINIMUM that one can do?
- Who will own it?
- Can we build it collaboratively?
- What is the authority?
- How will it evolve?
- What is the pace of change?
- Can we do it just in time?
- Can we evaluate and test it again and again?