Title: Building Ontologies from the Ground Up: When users set out to model their professional activity
1Building Ontologies from the Ground Up: When
users set out to model their professional activity
- Mark A. Musen
- Professor of Medicine and Computer Science
- Stanford University
v 1.00
2An ontology is a specification of a
conceptualization (T. Gruber)
- A conceptualization is the way we think about a
domain
- A specification provides a formal way of writing
it down
3Porphyry's depiction of Aristotle's Categories
Supreme genus: SUBSTANCE
Differentiae: material, immaterial
Subordinate genera: BODY, SPIRIT
Differentiae: animate, inanimate
Subordinate genera: LIVING, MINERAL
Differentiae: sensitive, insensitive
Proximate genera: ANIMAL, PLANT
Differentiae: rational, irrational
Species: HUMAN, BEAST
Individuals: Socrates, Plato, Aristotle
5Creating Ontologies in Machine-Processable Form
- Provides a mechanism for developers to codify
salient distinctions about the world or some
application area
- Provides a structure for knowledge bases that can
enable:
- Information retrieval
- Information integration
- Automated translation
- Decision support
6The New Philosophers
- Categorizing what exists in machine-understandable
form
- Providing a structure that enables:
- Developers to locate and update relevant
descriptions
- Computers to infer relationships and properties
- Creating new abstractions to facilitate the
creation of this structure
8Part of the CYC Upper Ontology
9There is a misconception
- That people building ontologies are all well
versed in metaphysics, computer science,
knowledge representation, and the content domain
- That ontologies in the real world are as clean
as SUMO, DOLCE, and other upper-level ontologies
- That most people who are creating ontologies
understand all the ramifications of what they are
doing!
10Lots of ontology builders are not very good
philosophers
- Nearly always, ontologies are created to address
pressing professional needs
- The people who have the most insight into
professional knowledge may have little
appreciation for metaphysics, principles of
knowledge representation, or computational logic
- There simply aren't enough good philosophers to
go around
11Practical Problems
12The pressing need to standardize the names of
human genes
13But the human genome is only part of the problem
- Scientists maintain huge databases of gene
sequences and gene expression for a wide range of
model organisms (e.g., mouse, rat, yeast, fruit
fly, round worm, slime mold)
- Database entries are annotated with information
such as the name of a gene, the function of the
gene, and so on
- How do you ensure uniformity in the nature of
these annotations?
14Gene Ontology Consortium
- Founded in 1998 as a collaboration among
scientists responsible for developing different
databases of genomic data for model organisms
(fruit fly, yeast, mouse)
- Now, essentially all developers of all
model-organism databases participate
- Goal: To produce a dynamic, controlled
vocabulary that can be applied to all organism
databases even as knowledge of gene and protein
roles in cells is accumulating and changing
15Gene Ontology (GO)
- Comprises three independent ontologies:
- molecular function of gene products
- cellular component of gene products
- biological process representing the gene
product's higher-order role
- Uses these terms as attributes of gene products
in the collaborating databases (gene product
associations)
- Allows queries across databases using GO terms,
providing linkage of biological information
across species
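A minimal sketch of what "queries across databases using GO terms" could look like. The database names, gene product identifiers, and annotation tables are invented for illustration and do not come from any real model-organism database; only the GO term names are taken from the slides.

```python
# Hypothetical annotation tables: database -> {gene product: set of GO term names}.
# All database names and gene product ids below are invented.
ANNOTATIONS = {
    "fly_db": {
        "fly_gene_1": {"DNA binding", "cell nucleus"},
        "fly_gene_2": {"secretion"},
    },
    "yeast_db": {
        "yeast_gene_1": {"DNA binding"},
    },
    "mouse_db": {
        "mouse_gene_1": {"secretion", "cell nucleus"},
    },
}


def find_by_term(annotations, term):
    """Return {database: sorted gene products} annotated with the given term."""
    hits = {}
    for database, gene_products in annotations.items():
        matching = sorted(g for g, terms in gene_products.items() if term in terms)
        if matching:  # omit databases with no match
            hits[database] = matching
    return hits


dna_binders = find_by_term(ANNOTATIONS, "DNA binding")
print(dna_binders)
```

Because every database uses the same controlled term, one lookup spans all species without any per-database translation step.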
16GO: Three Ontologies
- Molecular Function
- elemental activity or task
- example: DNA binding
- Cellular Component
- location or complex
- example: cell nucleus
- Biological Process
- goal or objective within cell
- example: secretion
18GO has been wildly successful!!
- Dozens of biologists around the world contribute
to GO on a regular basis
- The ontology is updated every 30 minutes!
- It's now impossible to work in most areas of
computational biology without making use of GO
terms
19But GO has real problems
- Ontologies are represented in an idiosyncratic
format that is not compatible with standard
knowledge-representation systems
- The format is based on directed acyclic graphs of
concepts, without the general ability to specify
machine-interpretable properties of concepts or
definitions of concepts
- Because of the informal knowledge-representation
system, lots of errors have crept into GO:
- Terms that are duplicated in different places
- Terms with no superclasses
- Uncertain relationships between terms
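Two of the error types listed above can be detected mechanically once the graph is in a machine-checkable form. The sketch below assumes a toy vocabulary stored as a Python dictionary; the term ids, the sample entries, and the function names are hypothetical and are not GO's actual format or tooling.

```python
from collections import defaultdict

# Hypothetical mini-vocabulary: term id -> (name, list of parent term ids).
# The ids are invented for illustration and are not real GO accessions.
TERMS = {
    "T1": ("biological process", []),   # intended root of the graph
    "T2": ("secretion", ["T1"]),
    "T3": ("secretion", ["T1"]),        # same name entered a second time
    "T4": ("DNA binding", []),          # non-root term with no superclass
    "T5": ("protein secretion", ["T2"]),
}
ROOTS = {"T1"}


def duplicated_names(terms):
    """Return {name: sorted term ids} for names used by more than one term."""
    by_name = defaultdict(list)
    for term_id, (name, _parents) in terms.items():
        by_name[name.lower()].append(term_id)
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


def orphan_terms(terms, roots):
    """Return sorted ids of non-root terms that declare no superclass."""
    return sorted(
        term_id
        for term_id, (_name, parents) in terms.items()
        if not parents and term_id not in roots
    )


print(duplicated_names(TERMS))
print(orphan_terms(TERMS, ROOTS))
```

The third error type, uncertain relationships between terms, cannot be caught this way; it needs explicit, typed relationships in the representation itself.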
21Tension in the GO Community
- Biologists around the world with pressing needs
to integrate research databases work together to
add terms to GO nearly continuously:
- Using an impoverished, nonstandard
knowledge-representation system
- Using no standards to assure uniform modeling
conventions from one part of GO to another
- Computer scientists bemoan all this ad-hoc-ery
and condemn GO as a hack that will become
increasingly unusable and unmaintainable
22The Capulets and Montagues: A plague on both your
houses?
A wonderful keynote talk from the recent meeting
on Standards and Ontologies for Functional
Genomics
- Professor Carole Goble
- University of Manchester, UK
- Warning: This talk contains sweeping generalisations
23Prologue
© Carole Goble
- Two households, both alike in dignity,
- In fair genomics, where we lay our scene,
- (One, comforted by its logic's rigour,
- Claims ontology for the realm of pure,
- The other, with blessed scientists' vigour,
- Acts hastily on models that endure),
- From ancient grudge break to new mutiny,
- When being drives a fly-man to blaspheme.
- From forth the fatal loins of these two foes
- Researchers to unlock the book of life
- Whole misadventured piteous overthrows
- Can with their work bury their clans' strife.
- The fruitful passage of their GO-mark'd love,
- And the continuance of their studies sage,
- Which, united, yield ontologies undreamed-of,
- Is now the hours' traffic of our stage
- The which if you with patient ears attend,
- What here shall miss, our toil shall strive to
mend.
Based on an idea by Shakespeare
24The Montagues
© Carole Goble
One, comforted by its logic's rigour, Claims
ontology for the realm of pure
- Computer Science, Knowledge engineering, AI
- Logic and Languages
- Theory
- Top down, well-behaved neatness
- Generic and lots of toys
- Methodologies, patterns
- Tools and standards
- Technology push
- Academic pursuit
25The Capulets
© Carole Goble
The other, with blessed scientists' vigour, Acts
hastily on models that endure
- Life Scientists
- Practice
- Bottom up, real-world
- Specific and many of them
- Methodologies, community practice
- Tools and standards
- Application pull
- Practical pursuit: build 'n' use it
26The Philosophers
© Carole Goble
One, comforted by its logic's rigour, Claims
ontology for the realm of pure
- Philosophers
- Theory
- Truth
- Generic: the one true ontology?
- Methodologies, patterns: foundational ontologies
- Not really into tools
- No push or pull
- Academic pursuit
27© Carole Goble
Philosophers: spiritual guides; aesthetics
Life Scientists (Capulets): pragmatists; a means to an end; content providers
KR (Montagues): theoreticians; the end; mechanism providers
28The Princes of Genomics
© Carole Goble
- Rebellious subjects, enemies to peace,
- Profaners of this neighbour-stained steel,--
- Will they not hear? What, ho! you men, you
beasts,
- That quench the fire of your pernicious rage
- With purple fountains issuing from your veins,
- On pain of torture, from those bloody hands
- Throw your mistemper'd weapons to the ground,
- And hear the sentence of your moved prince.
- Three civil brawls, bred of an airy word,
- By thee, old Capulet, and Montague,
- Have thrice disturb'd the quiet of our streets,
- And made genomics's ancient citizens
- Cast by their grave beseeming ornaments,
- To wield old partisans, in hands as old,
- Canker'd with peace, to part your canker'd hate
29A tragedy?
As in Romeo and Juliet, the threats are
political and sociological
30Creating ontologies has become a widespread
cottage industry
- Professional Societies
- MGED: Microarray Gene Expression Data Society
- HUPO: Human Proteome Organization
- Government
- NCI Thesaurus
- NIST Process Specification Language
- Open Biological Ontologies
- GO
- Three dozen (and growing) other ontologies
- Mostly in DAG-Edit, some in Protégé format
32Government Continues to be a Major Driving Force
- Highly visible intramural initiatives to create
public ontologies at many agencies, including
NIST, NIH, VA, CDC
- Notable variation in these ontologies:
- Scope
- Representational sophistication
- Openness of content
- Opportunities for peer review
33NCI Enterprise Vocabulary Services
- 1997: R. Klausner, Director, NCI, wanted a
science management system
- Know about everything funded by NCI
- Goals and results: bench to bedside
- Thereby improve and speed translation of research
- Approach:
- Create integrative terminology
- Evolve terminology scope from supporting grants
management to supporting science
- Build Web-accessible infrastructure: caCORE
35More than 37,000 concepts are represented with
extremely detailed granularity in many areas
36Definitions may include considerable detail with
respect to properties that establish
relationships with other concepts
37NCI Thesaurus is in Active Use
- nciterms.nci.nih.gov
- ncicb.nci.nih.gov/core/EVS (more info)
- Website: 1500-4000 page hits daily, 14K unique
visitors (2004)
- API: NCICB and external applications
- Fulfills NCI's and collaborators' needs for
controlled vocabulary
- Public domain, open content license
38NCI Thesaurus Guidelines
- Develop content model (based on Ontylog
description logic from Apelon, Inc.)
- Leverage existing sources as appropriate:
- MeSH, VA NDF-RT, MedDRA
- Develop unique content where needed:
- Cancer genes, gene products, cancer diagnoses,
drugs, chemotherapies, molecular abnormalities
etc., and relationships among them
- Link to other standards using URLs where possible:
- OMIM, Swissprot, GO
39NCI uses an Elaborate Process for Editing and
Maintenance
40The NCI Thesaurus is not without its problems
- Upper-level concepts are sometimes used
inconsistently or not at all
- Textual definitions of concepts may not always
reflect the meaning implied by the concept's
position in the ontology
- Reliance on a proprietary knowledge-representation
system:
- Prevents the ability to disseminate the ontology
freely
- Adds an unfortunate degree of uncertainty to the
semantics
41Throughout this cottage industry
- Lots of ontology development, principally by
content experts with little training in
conceptual modeling
- Use of development tools and ontology-definition
languages that may be:
- Extremely limited in their expressiveness
- Useless for detecting potential errors and
guiding correction
- Nonadherent to recognized standards
- Proprietary and expensive
42But the world is beginning to change!
- The Montagues do want to get the modeling right!
- The Capulets do want to see their work used by
others!
- Useful, open tools and standards are now
available that make it hard to justify closed,
proprietary approaches
43Some signs the world is changing
- Developers of several overlapping and
incompatible ontologies of anatomy suddenly are
trying to understand why their models do not
agree
- Philosopher Barry Smith suddenly is camping out
at biomedical informatics meetings to get the
attention of ontology developers
- NCI is piloting the use of OWL and Protégé to
encode and manage the NCI Thesaurus
- MGED and several other biomedical ontologies are
being authored in OWL and Protégé from the
beginning
- Downloads of the Protégé system continue to
escalate
46Protégé's main features
- Simplified editing of ontologies and knowledge
bases
- Open-source distribution to encourage development
by a world-wide community of users
- A plug-in architecture that enables developers to
add new features easily
- Support for a wide range of representation
formats:
- CLIPS/COOL
- XML Schema
- UML
- RDF
- OWL
47Protégé is ecumenical in its support for formal
languages
- Open Knowledge Base Connectivity Protocol
- CLIPS/COOL
- UML
- XML Schema
- RDF and RDFS
- Topic Maps
- Web Ontology Language (OWL)
48Protégé remains successful because of its user
community
- There are now 89 plug-ins available for use with
Protégé
- Collaboration with our users enables rapid
debugging and code fixes
- Some development, such as the creation of
extensions to our basic OWL capabilities, has
been a major collaborative experience
- Annual users' group meetings provide great
opportunities for developers to share strategies,
principles, and war stories
- Members of the international Protégé community
are a huge support base for new users and for
fledgling projects
49The NCI Thesaurus
50Moving from cottage industry to the industrial
age
- There must be widely available tools that are
open-source, that are easy to use, and that
adhere to knowledge representation standards;
Protégé certainly is a candidate
- There must be a large user community of
developers who use the tools and who can provide
feedback to one another and to the core team of
tool builders
51Moving from cottage industry to the industrial
age II
- Government and professional societies must set
expectations regarding the need for appropriate
standards
- Government and professional societies must invest
in educational programs to teach Montagues to
identify with Capulets, and vice versa
- Demonstration projects must communicate to the
potential developers of future ontologies the
strengths and weaknesses of the guidelines,
tools, and languages that facilitated the
development work
52A thousand flowers are blooming from every corner
of the landscape
- Ontologies are being developed by interested
groups from every sector of academia, industry,
and government
- Many of these ontologies have proven to be
extraordinarily useful to wide communities
- Many of these same ontologies have been shown to
be structurally flawed and of uncertain semantics
- We finally are at the stage where we have tools
and representation languages that can lift us out
of the grass roots to create durable and
maintainable ontologies with rich semantic content
53An infrastructure is now in place
- The need to build new ontologies in environmental
health, phenotypic expression in model organisms,
developmental biology, and many, many other
domains is getting wide attention
- We finally have the tools and the languages to do
things right
- All we need now is the will, the educational
opportunities, and the community feedback to help
developers at the grass roots to reemerge as
philosophers and princes.
55Editing OWL Ontologies with Protégé
- Holger Knublauch
- Stanford University
- July 06, 2004
56This Tutorial
- Introduction to OWL, the Semantic Web, and the
Protégé OWL Plugin
- Theory + Walkthrough
- Also available: Tutorial by Matthew Horridge
(http://www.co-ode.org)
- Similar content but more details on logic
- Other example scenario (Pizzas)
- ... Workshop (this afternoon)
- ... Talks (tomorrow morning)
57Overview
The Semantic Web and OWL
Basic OWL
Interactive: Classes, Properties
Advanced OWL
Interactive: Class Descriptions
Creating Semantic Web Contents
58The Semantic Web
- Shared ontologies help to exchange data and
meaning between web-based services
(Image by Jim Hendler)
59Wine Example Scenario
Tell me what wines I should buy to serve with
each course of the following menu.
Books Agent
Wine Agent
I recommend Chardonnay or DryRiesling
Grocery Agent
60Ontologies in the Semantic Web
- Provide shared data structures to exchange
information between agents
- Can be explicitly used as annotations in web
sites
- Can be used for knowledge-based services using
other web resources
- Can help to structure knowledge to build domain
models (for other purposes)
61OWL
- Web Ontology Language
- Official W3C Standard since Feb 2004
- Based on predecessors (DAML+OIL)
- A Web Language: Based on RDF(S)
- An Ontology Language: Based on logic
62OWL Ontologies
- What's inside an OWL ontology:
- Classes + class hierarchy
- Properties (Slots) / values
- Relations between classes (inheritance,
disjoints, equivalents)
- Restrictions on properties (type, cardinality)
- Characteristics of properties (transitive, ...)
- Annotations
- Individuals
- Reasoning tasks: classification, consistency
checking
63OWL Use Cases
- At least two different user groups:
- OWL used as data exchange language (define
interfaces of services and agents)
- OWL used for terminologies or knowledge models
- OWL DL is the subset of OWL (Full) that is
optimized for reasoning and knowledge modeling
64Protégé OWL Plugin
- Extension of Protégé for handling OWL ontologies
- Project started in April 2003
- Features:
- Loading and saving OWL files and databases
- Graphical editors for class expressions
- Access to description logics reasoners
- Powerful platform for hooking in custom-tailored
components
65Tutorial Scenario
- Semantic Web for Tourism/Traveling
- Goal: Find matching holiday destinations for a
customer
I am looking for a comfortable destination with
beach access
Tourism Web
66Scenario Architecture
- A search problem: Match customers' expectations
with potential destinations
- Required: Web Service that exploits formal
information about the available destinations:
- Accommodation (Hotels, B&B, Camping, ...)
- Activities (Sightseeing, Sports, ...)
67Tourism Semantic Web
- Open World:
- New hotels are being added
- New activities are offered
- Providers publish their services dynamically
- Standard format / grounding is needed →
Tourism Ontology
68Tourism Semantic Web
[Diagram: Tourism Ontology (Destination, Accomodation, Activity); OWL Metadata (Individuals); Web Services]
69OWL (in Protégé)
- Individuals (e.g., FourSeasons)
- Properties
- ObjectProperties (references)
- DatatypeProperties (simple values)
- Classes (e.g., Hotel)
70Individuals
- Represent objects in the domain
- Specific things
- Two names could represent the same real-world
individual
71ObjectProperties
- Link two individuals together
- Relationships (0..n, n..m)
72Inverse Properties
- Represent bidirectional relationships
- Adding a value to one property also adds a value
to the inverse property
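A minimal sketch of the bookkeeping behind inverse properties, assuming a tiny in-memory store. The class `PropertyStore` is hypothetical, and pairing `hasActivity` with `isPossibleIn` as inverses is an assumption made for illustration.

```python
from collections import defaultdict


class PropertyStore:
    """Tiny property-value store that keeps a property and its inverse in sync."""

    def __init__(self):
        self.values = defaultdict(set)  # (subject, property) -> set of objects
        self.inverse_of = {}            # property -> its inverse property

    def declare_inverse(self, prop, inverse):
        # Inverse declarations are symmetric, so record both directions.
        self.inverse_of[prop] = inverse
        self.inverse_of[inverse] = prop

    def add(self, subject, prop, obj):
        self.values[(subject, prop)].add(obj)
        inverse = self.inverse_of.get(prop)
        if inverse is not None:
            # Adding a value also adds the mirrored value on the inverse.
            self.values[(obj, inverse)].add(subject)

    def get(self, subject, prop):
        return set(self.values.get((subject, prop), set()))


store = PropertyStore()
store.declare_inverse("hasActivity", "isPossibleIn")  # assumed inverse pair
store.add("Sydney", "hasActivity", "ManicSuperBunjee")
print(store.get("ManicSuperBunjee", "isPossibleIn"))
```

Only one direction is asserted; the other is filled in by the store, which is the behaviour the slide describes.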
73Transitive Properties
- If A is related to B and B is related to C, then A
is also related to C
- Often used for part-of relationships
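The rule above amounts to computing a transitive closure. The sketch below uses a naive fixed-point loop over a hypothetical part-of relation; the place names are illustrative and the function is not taken from any OWL reasoner.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted part-of facts (hypothetical example data).
PART_OF = {
    ("BondiBeach", "Sydney"),
    ("Sydney", "NewSouthWales"),
    ("NewSouthWales", "Australia"),
}

inferred = transitive_closure(PART_OF) - PART_OF
print(sorted(inferred))
```

Three facts are asserted and three more follow from transitivity alone, which is why part-of hierarchies are a common use of this property characteristic.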
74DatatypeProperties
- Link individuals to primitive values (integers,
floats, strings, booleans, etc.)
- Often: AnnotationProperties without formal
meaning
hasSize = 4,500,000; isCapital = true; rdfs:comment =
"Don't miss the opera house"
75Classes
- Sets of individuals with common characteristics
- Individuals are instances of at least one class
76Range and Domain
- Property characteristics
- Domain: left side of relation (Destination)
- Range: right side (Accomodation)
77Domains
- Individuals can only take values of properties
that have matching domain:
- Only Destinations can have Accomodations
- Domain can contain multiple classes
- Domain can be undefined: Property can be used
everywhere
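The three domain rules above can be sketched as a simple editor-style check. The property names, the multi-class domain for `hasContact`, and the function `may_use` are assumptions made for illustration; this mirrors the behaviour described on the slide, not the formal OWL semantics of `rdfs:domain`.

```python
# property -> set of domain classes; a missing entry means no domain is declared.
DOMAINS = {
    "hasAccomodation": {"Destination"},
    "hasContact": {"Activity", "Accomodation"},  # assumed multi-class domain
}

# individual -> classes it is an instance of (hypothetical data).
TYPES = {
    "Sydney": {"Destination"},
    "FourSeasons": {"Accomodation"},
}


def may_use(individual, prop):
    """Return True if the individual may take values for the property."""
    domain = DOMAINS.get(prop)
    if domain is None:
        return True  # undefined domain: property can be used everywhere
    # Otherwise the individual must belong to at least one domain class.
    return bool(TYPES.get(individual, set()) & domain)


print(may_use("Sydney", "hasAccomodation"))
print(may_use("FourSeasons", "hasAccomodation"))
```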
78Superclass Relationships
- Classes can be organized in a hierarchy
- Direct instances of subclass are also (indirect)
instances of superclasses
79Class Relationships
- Classes can overlap arbitrarily
80Class Disjointness
- All classes could potentially overlap
- In many cases we want to make sure they don't
share instances
disjointWith
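A minimal sketch of what a disjointness declaration lets a tool check: any individual asserted into two disjoint classes is an inconsistency. The instance data and the helper name are hypothetical.

```python
# class -> asserted instances (hypothetical data; "SeaView" is listed twice on purpose).
INSTANCES = {
    "Hotel": {"FourSeasons", "SeaView"},
    "Campground": {"SeaView", "BushCamp"},
    "Activity": {"Surfing"},
}

# Pairs of classes declared disjointWith each other (assumed declarations).
DISJOINT = [("Hotel", "Campground"), ("Hotel", "Activity")]


def disjointness_violations(instances, disjoint_pairs):
    """Return {(class_a, class_b): sorted shared instances} for violated pairs."""
    violations = {}
    for class_a, class_b in disjoint_pairs:
        shared = instances[class_a] & instances[class_b]
        if shared:
            violations[(class_a, class_b)] = sorted(shared)
    return violations


print(disjointness_violations(INSTANCES, DISJOINT))
```

Without the `disjointWith` declaration the overlap would be perfectly legal, since all classes could potentially share instances.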
81(Create a new OWL project)
82(Create simple classes)
83(Create class hierarchy and set disjoints)
84(Create Contact class with datatype properties)
85(Edit details of datatype properties)
86(Create an object property hasContact)
87(Create an object property with inverse)
88(Create the remaining classes and properties)
89Class Descriptions
- Classes can be described by their logical
characteristics
- Descriptions are anonymous classes
90Class Descriptions
- Define the meaning of classes
- Anonymous class expressions are used:
- "All national parks have campgrounds."
- "A backpackers destination is a destination that
has budget accommodation and offers sports or
adventure activities."
- Expressions mostly restrict property values (OWL
Restrictions)
91Class Descriptions: Why?
- Based on OWL's Description Logic support
- Formalize intentions and modeling decisions
(comparable to test cases)
- Make sure that individuals fulfill conditions
- Tool-supported reasoning
92Reasoning with Classes
- Tool support for three types of reasoning exists:
- Consistency checking: Can a class have any
instances?
- Classification: Is A a subclass of B?
- Instance classification: Which classes does an
individual belong to?
- For Protégé we recommend RACER (but other tools
with DIG support work too)
93Restrictions (Overview)
- Define a condition for property values
- allValuesFrom
- someValuesFrom
- hasValue
- minCardinality
- maxCardinality
- cardinality
- An anonymous class consisting of all individuals
that fulfill the condition
94Cardinality Restrictions
- Meaning: The property must have at least/at
most/exactly x values
- Exactly (=) is the shortcut for at least (≥) and
at most (≤) with the same x
- Example: A FamilyDestination is a Destination
that has at least one Accomodation and at least 2
Activities
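The FamilyDestination example can be sketched as a count over property values. The property names `hasAccomodation` and `hasActivity`, the sample destinations, and the helper functions are assumptions for illustration. The check counts only the values that are listed, which is a closed-world simplification.

```python
def cardinality_ok(values, minimum=0, maximum=None):
    """Check minCardinality / maxCardinality against the number of values."""
    count = len(values)
    return count >= minimum and (maximum is None or count <= maximum)


def exactly(values, x):
    """cardinality x is shorthand for minCardinality x and maxCardinality x."""
    return cardinality_ok(values, minimum=x, maximum=x)


# Hypothetical individuals with their property values.
DESTINATIONS = {
    "Sydney": {
        "hasAccomodation": {"FourSeasons"},
        "hasActivity": {"Surfing", "Sightseeing"},
    },
    "Woomera": {
        "hasAccomodation": {"OutbackMotel"},
        "hasActivity": {"Sightseeing"},
    },
}


def is_family_destination(props):
    """At least one Accomodation and at least 2 Activities."""
    return cardinality_ok(props.get("hasAccomodation", set()), minimum=1) and \
        cardinality_ok(props.get("hasActivity", set()), minimum=2)


family = sorted(name for name, props in DESTINATIONS.items()
                if is_family_destination(props))
print(family)
```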
95allValuesFrom Restrictions
- Meaning: All values of the property must be of a
certain type
- Warning: Also individuals with no values fulfill
this condition (trivial satisfaction)
- Example: Hiking is a Sport that is only possible
in NationalParks
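The trivial-satisfaction warning is easy to see in code, because a universal check over an empty collection is true. The sketch assumes hypothetical individuals and a property named `isPossibleIn`; it checks listed values only and is not a description-logic reasoner.

```python
# individual -> classes it belongs to (hypothetical data).
TYPES = {
    "BlueMountains": {"NationalPark"},
    "Sydney": {"City"},
}

# activity -> values of its isPossibleIn property (hypothetical data).
IS_POSSIBLE_IN = {
    "Hiking": {"BlueMountains"},
    "Surfing": {"Sydney"},
    "Yoga": set(),  # no values at all
}


def all_values_from(values, required_class, types):
    """True if every value is an instance of required_class."""
    return all(required_class in types.get(value, set()) for value in values)


results = {
    activity: all_values_from(places, "NationalPark", TYPES)
    for activity, places in IS_POSSIBLE_IN.items()
}
print(results)
```

`Yoga` passes even though it is not possible anywhere, so an allValuesFrom restriction is often paired with a someValuesFrom or minCardinality restriction when at least one value is intended.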
96someValuesFrom Restrictions
- Meaning: At least one value of the property must
be of a certain type
- Others may exist as well
- Example: A NationalPark is a RuralArea that has
at least one Campground and offers at least one
Hiking opportunity
97hasValue Restrictions
- Meaning: At least one of the values of the
property is a certain value
- Similar to someValuesFrom but with
Individuals and primitive values
- Example: A PartOfSydney is a Destination where
one of the values of the isPartOf property is
Sydney
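A side-by-side sketch of someValuesFrom and hasValue: the first tests the class of a value, the second tests for one specific value. The individuals, the property name `hasAccomodation`, and both helper functions are hypothetical.

```python
# individual -> classes it belongs to (hypothetical data).
TYPES = {
    "BushCamp": {"Campground"},
    "FourSeasons": {"Hotel"},
}

# individual -> property -> values (hypothetical data).
PROPERTIES = {
    "BlueMountains": {
        "hasAccomodation": {"BushCamp", "MountainLodge"},
        "isPartOf": {"NewSouthWales"},
    },
    "BondiBeach": {
        "hasAccomodation": {"FourSeasons"},
        "isPartOf": {"Sydney"},
    },
}


def some_values_from(values, required_class, types):
    """True if at least one value is an instance of required_class."""
    return any(required_class in types.get(value, set()) for value in values)


def has_value(values, required_value):
    """True if the specific individual or literal is among the values."""
    return required_value in values


has_campground = {
    name: some_values_from(props["hasAccomodation"], "Campground", TYPES)
    for name, props in PROPERTIES.items()
}
part_of_sydney = {
    name: has_value(props["isPartOf"], "Sydney")
    for name, props in PROPERTIES.items()
}
print(has_campground, part_of_sydney)
```

`BlueMountains` satisfies the someValuesFrom check even though `MountainLodge` is not a Campground, because other values may exist as well.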
98Enumerated Classes
- Consist of exactly the listed individuals
99Logical Class Definitions
- Define classes out of other classes
- unionOf (or)
- intersectionOf (and)
- complementOf (not)
- Allow arbitrary nesting of class descriptions (A
and (B or C) and not D)
100unionOf
- The class of individuals that belong to class A
or class B (or both)
- Example: Adventure or Sports activities
101intersectionOf
- The class of individuals that belong to both
class A and class B
- Example: A BudgetHotelDestination is a
destination with accommodation that is a budget
accommodation and a hotel
102Implicit intersectionOf
- When a class is defined by more than one class
description, then it consists of the intersection
of the descriptions
- Example: A luxury hotel is a hotel that is also
an accommodation with 3 stars
103complementOf
- The class of all individuals that do not belong
to a certain class
- Example: A quiet destination is a destination
that is not a family destination
104Class Conditions
- Necessary Conditions (Primitive / partial
classes): If we know that something is an X, then
it must fulfill the conditions...
- Necessary & Sufficient Conditions (Defined /
complete classes): If something fulfills the
conditions..., then it is an X.
105Class Conditions (2)
(not everything that fulfills these conditions is
a NationalPark)
(everything that fulfills these conditions is a
QuietDestination)
106Classification
- A RuralArea is a Destination
- A Campground is BudgetAccomodation
- Hiking is a Sport
- Therefore: Every NationalPark is a
BackpackersDestination
(Other BackpackersDestinations)
107Classification (2)
- Input: Asserted class definitions
- Output: Inferred subclass relationships
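The NationalPark inference can be reproduced with a toy structural check: asserted definitions go in, inferred subclass pairs come out. This is a deliberately simplified sketch that only handles named superclasses and someValuesFrom restrictions; it is not how RACER or any DIG reasoner works, and the property names and data layout are assumptions.

```python
# Asserted named-class hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "RuralArea": {"Destination"},
    "NationalPark": {"RuralArea"},
    "Campground": {"BudgetAccomodation"},
    "Hiking": {"Sport"},
}

# Necessary conditions: class -> set of (property, filler class) someValuesFrom pairs.
NECESSARY = {
    "NationalPark": {("hasAccomodation", "Campground"), ("hasActivity", "Hiking")},
}

# Defined classes (necessary & sufficient):
# name -> (superclass, [(property, set of acceptable filler classes)]).
DEFINED = {
    "BackpackersDestination": (
        "Destination",
        [("hasAccomodation", {"BudgetAccomodation"}),
         ("hasActivity", {"Sport", "Adventure"})],
    ),
}


def ancestors(cls):
    """Return cls together with every named superclass reachable from it."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, ()))
    return seen


def restrictions_of(cls):
    """Collect restrictions asserted on cls or inherited from its ancestors."""
    collected = set()
    for ancestor in ancestors(cls):
        collected |= NECESSARY.get(ancestor, set())
    return collected


def is_subsumed_by(cls, defined_name):
    """True if cls provably meets every condition of the defined class."""
    superclass, required = DEFINED[defined_name]
    if superclass not in ancestors(cls):
        return False
    asserted = restrictions_of(cls)
    return all(
        any(prop == p and ancestors(filler) & fillers for p, filler in asserted)
        for prop, fillers in required
    )


inferred = {
    (cls, defined)
    for cls in SUPERCLASSES
    for defined in DEFINED
    if is_subsumed_by(cls, defined)
}
print(inferred)
```

`RuralArea` is a Destination but carries no restrictions, so it is not classified under `BackpackersDestination`; only `NationalPark` is.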
108(Create an enumerated class out of individuals)
109(Create a hasValue restriction)
110(Create a hasValue restriction)
111(Create a defined class)
112(Classify Campground)
113(Add restrictions to City and Capital)
114(Create defined class BackpackersDestination)
115(Create defined class FamilyDestination)
116(Create defined class QuietDestination)
117(Create defined class RetireeDestination)
118(Classification)
119(Consistency Checking)
120Visualization with OWLViz
121OWL Wizards
122Putting it All Together
- Ontology has been developed
- Published on a dedicated web address
- Ontology provides standard terminology
- Other ontologies can extend it
- Users can instantiate the ontology to provide
instances:
- specific hotels
- specific activities
123Ontology Import
- Adds all classes, properties and individuals from
an external OWL ontology into your project
- Allows you to create individuals, subclasses, or to
further restrict imported classes
- Can be used to instantiate an ontology for the
Semantic Web
124Tourism Semantic Web (2)
[Diagram: Tourism Ontology (Destination, Accomodation, Activity); OWL Metadata (Individuals); Web Services]
125Ontology Import with Protégé
- On the Metadata tab
- Add namespace, define prefix
- Check Imported and reload your project
126Individuals
127Individuals
128OWL File
<?xml version="1.0"?>
<rdf:RDF
    xmlns="http://protege.stanford.edu/plugins/owl/owl-library/heli-bunjee.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:travel="http://protege.stanford.edu/plugins/owl/owl-library/travel.owl#"
    xml:base="http://protege.stanford.edu/plugins/owl/owl-library/heli-bunjee.owl">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://protege.stanford.edu/plugins/owl/owl-library/travel.owl"/>
  </owl:Ontology>
  <owl:Class rdf:ID="HeliBunjeeJumping">
    <rdfs:subClassOf rdf:resource="http://protege.stanford.edu/plugins/owl/owl-library/travel.owl#BunjeeJumping"/>
  </owl:Class>
  <HeliBunjeeJumping rdf:ID="ManicSuperBunjee">
    <travel:isPossibleIn>
      <rdf:Description rdf:about="http://protege.stanford.edu/plugins/owl/owl-library/travel.owl#Sydney">
        <travel:hasActivity rdf:resource="#ManicSuperBunjee"/>
      </rdf:Description>
    </travel:isPossibleIn>
    <travel:hasContact>
      <travel:Contact rdf:ID="MSBInc">
        <travel:hasEmail rdf:datatype="http://www.w3.org/2001/XMLSchema#string">msb@manicsuperbunjee.com</travel:hasEmail>
        <travel:hasCity rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Sydney</travel:hasCity>
        <travel:hasStreet rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Queen Victoria St</travel:hasStreet>
        <travel:hasZipCode rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1240</travel:hasZipCode>
      </travel:Contact>
    </travel:hasContact>
    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Manic super bunjee now offers nerve wrecking jumps from 300 feet right out of a helicopter. Satisfaction guaranteed.</rdfs:comment>
  </HeliBunjeeJumping>
</rdf:RDF>