Knowledge Representation Chapter 10 - PowerPoint PPT Presentation

Title:

Knowledge Representation Chapter 10

Slides: 98
Provided by: RobinMc7

1
Knowledge Representation: Chapter 10
2
Outline
  • KR Introduction
  • Ontological Engineering
  • Categories and Objects
  • Actions, Situations, and Events
  • Mental Events and Mental Objects
  • Reasoning Systems for Categories
  • Reasoning with Default Information
  • Truth Maintenance Systems
  • Bio-Ontologies

3
KR Introduction
  • General problem in Computer Science
  • Solutions: data structures
  • words
  • arrays
  • records
  • lists
  • More specific problem in AI
  • Solutions: knowledge structures
  • lists
  • trees
  • procedural representations
  • logic and predicate calculus
  • rules
  • semantic nets and frames
  • scripts

4
Kinds of Knowledge
Things we need to talk about and reason about
what do we know?
  • Objects
  • Descriptions
  • Classifications
  • Events
  • Time sequence
  • Cause and effect
  • Relationships
  • Among objects
  • Between objects and events
  • Meta-knowledge

Distinguish between knowledge and its
representation
5
Representation Mappings
[Figure: mappings between Facts, their Internal Representation (operated on by Reasoning Programs) and their English Representation]
  • Knowledge Level
  • Symbol Level
  • Mappings are not one-to-one
  • Never get it complete or exactly right

6
Ontological Engineering
  • Like knowledge engineering but applies to
    general-purpose knowledge bases
  • Ultimate goal is to represent everything in the
    world!!
  • Result is an upper ontology

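[Figure: upper ontology; each indented item is a specialization of the item above it]
Anything (Root)
    AbstractObjects
        Sets
            Categories
        Numbers
        RepresentationalObjects
            Sentences
            Measurements
                Times
                Weights
    GeneralizedEvents
        Intervals
            Moments
        Places
        PhysicalObjects
            Things
                Animals
                Agents
                Humans (under both Animals and Agents)
            Stuff
                Solid
                Liquid
                Gas
        Processes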
7
Special- and General-purpose Ontologies
  • Special-purpose ontology
  • Designed to represent a specific domain of
    knowledge
  • genetics (GO)
  • immune system (IMGT)
  • mathematics (Tom Gruber)
  • General-purpose ontology
  • Should be applicable in any special-purpose
    domain
  • Unifies different domains of knowledge
  • Upper ontology provides highest level framework -
    all other concepts follow

8
Cyc Upper Ontology
  • Cycorp released 3,000 upper-level concepts into
    public domain
  • Cyc Upper Ontology satisfies two important
    criteria
  • It is universal: every concept can be linked to
    it
  • It is articulate: distinctions are necessary and
    sufficient for most purposes

9
Categories - Representation
  • Two choices for representation
  • Predicate
  • Basketball(b)
  • Object
  • Basketballs
  • Member(b, Basketballs)
  • Subset(Basketballs, Balls)

10
Categories - Organizing
  • Inheritance
  • All instances of the category Food are edible
  • Fruit is a subclass of Food
  • Apples is a subclass of Fruit
  • Therefore, Apples are edible
  • The Class/Subclass relationships among Food,
    Fruit and Apples is a taxonomy
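
The inheritance chain above can be sketched in a few lines of Python. This is only an illustration, not a full reasoning system: the dictionary names `SUBCLASS_OF` and `PROPERTIES` and the function `inherited_properties` are assumptions made for this example.

```python
# Hypothetical taxonomy: each category maps to its direct superclass.
SUBCLASS_OF = {"Apples": "Fruit", "Fruit": "Food"}

# Properties asserted directly on a category.
PROPERTIES = {"Food": {"edible"}}


def inherited_properties(category):
    """Collect the properties of a category and of all its ancestors."""
    props = set()
    while category is not None:
        props |= PROPERTIES.get(category, set())
        category = SUBCLASS_OF.get(category)  # None once the root is passed
    return props


print(inherited_properties("Apples"))  # {'edible'}
```

Apples has no properties of its own here; "edible" is found by walking up the taxonomy to Food.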

11
Categories - Partitioning
  • Disjoint: the categories have no members in
    common
  • Exhaustive Decomposition: every member of the
    category is included in at least one of the
    subcategories
  • Partition: a disjoint exhaustive decomposition

17
Categories - Partitioning
  • Disjoint({Animals,Vegetables})
  • Disjoint(s) ⇔ (∀c1,c2 c1∈s ∧ c2∈s ∧ c1≠c2 ⇒
    Intersection(c1,c2) = { })
  • ExhaustiveDecomposition({Americans,Canadians,
    Mexicans},NorthAmericans)
  • ExhaustiveDecomposition(s,c) ⇔ (∀i i∈c ⇔ ∃c2
    c2∈s ∧ i∈c2)
  • Partition({Males,Females},Animals)
  • Partition(s,c) ⇔ Disjoint(s) ∧
    ExhaustiveDecomposition(s,c)
18
Categories - More
  • PartOf
  • PartOf(Bucharest,Romania)
  • PartOf(Romania,EasternEurope)
  • PartOf(EasternEurope,Europe)
  • PartOf(Europe,Earth)
  • Composite Objects
  • Biped(a) ⇒ ∃c1,c2,b Leg(c1) ∧ Leg(c2) ∧ Body(b)
    ∧ PartOf(c1,a) ∧ PartOf(c2,a) ∧ PartOf(b,a) ∧
    Attached(c1,b) ∧ Attached(c2,b) ∧ c1≠c2 ∧ [∀c3
    Leg(c3) ∧ PartOf(c3,a) ⇒ (c3=c1 ∨ c3=c2)]

19
Categories And More
  • Count Nouns and Mass Nouns
  • How many aardvarks? How many butters!?!
  • x∈Butter ∧ PartOf(y,x) ⇒ y∈Butter
  • Intrinsic and Extrinsic Properties
  • Intrinsic properties belong to the very substance
    of the object and are retained when it is
    divided, e.g. flavor, color, density, boiling
    point, etc.
  • Extrinsic properties change if the object is
    changed (cut in half), e.g. weight, length,
    shape, etc.

20
Actions, Situations and Events
21
Situation Calculus
  • The states resulting from executing actions
  • Ontology
  • Situations: logical terms describing the initial
    situation and all situations that result from
    executing actions in a given situation
  • Result(a,s)
  • Fluents: functions and predicates that may be
    different in different situations
  • Age(Wumpus,S0) is the Wumpus's age in situation S0
  • Atemporal (eternal) functions and predicates:
    constant across all situations
  • Gold(G1)

22
Situation Calculus Actions
  • Each action described by two axioms
  • Possibility Axiom
  • Preconditions ⇒ Poss(a,s)
  • Effect Axiom
  • Poss(a,s) ⇒ changes that result from taking
    action

23
Situation Calculus - Example
  • Possibility Axioms
  • At(Agent,x,s) ∧ Adjacent(x,y) ⇒ Poss(Go(x,y),s).
  • Gold(g) ∧ At(Agent,x,s) ∧ At(g,x,s) ⇒
    Poss(Grab(g),s).
  • Holding(g,s) ⇒ Poss(Release(g),s).
  • Effect Axioms
  • Poss(Go(x,y),s) ⇒ At(Agent,y,Result(Go(x,y),s)).
  • Poss(Grab(g),s) ⇒ Holding(g,Result(Grab(g),s)).
  • Poss(Release(g),s) ⇒
    ¬Holding(g,Result(Release(g),s)).

24
Go for the Gold!
  • GOAL: Bring the gold from [1,2] to [1,1]
  • At(Agent,[1,1],S0) ∧ At(G1,[1,2],S0).
  • ¬Holding(G1,S0).
  • Gold(G1).
  • Adjacent([1,1],[1,2]) ∧ Adjacent([1,2],[1,1]).
  • Do It
  • Go([1,1],[1,2]).
  • Result
  • At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
  • Now, can I grab the gold?
  • Grab(G1).

30
The Frame Problem
  • Result
  • At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).
  • Now, can I grab the Gold?
  • Grab(G1).
  • What in the knowledge base allows me to go from
    my Result (above) to Grab(G1)?
  • Nothing: the effect axioms say what changes, but
    not that the gold is still at [1,2] after Go

31
The Frame Problem
  • How do we represent all the things in the world
    that stay the same?
  • Represent all things at all situations: the
    representational frame problem
  • Project the results of a sequence of actions: the
    inferential frame problem

32
Representational Frame Problem
  • Successor-State Axiom
  • Action is possible ⇒ (Fluent is true in result
    state ⇔ Action's effect made it true ∨ It was
    true before and action left it alone).
  • Truth value of each fluent in the next state
    depends on action and truth value in the current
    state
  • Poss(a,s) ⇒ (At(Agent,y,Result(a,s)) ⇔ a =
    Go(x,y) ∨ (At(Agent,y,s) ∧ a ≠ Go(y,z))).
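
To see how successor-state axioms let facts persist, here is a small Python sketch of the gold example. It is a simplification under stated assumptions: a situation is encoded as the tuple of actions executed since S0, there is a single piece of gold (so Grab and Release take no argument), the gold moves with the agent while held, and all function names are made up for this illustration.

```python
# Sketch only: squares, action encoding and helper names are illustrative assumptions.
SQUARES = [(1, 1), (1, 2)]
ADJACENT = {((1, 1), (1, 2)), ((1, 2), (1, 1))}
S0 = ()  # a situation is the tuple of actions executed since S0


def at_agent(y, s):
    """At(Agent,y,s), computed with the successor-state axiom."""
    if s == S0:
        return y == (1, 1)
    prev, a = s[:-1], s[-1]
    moved_to_y = a[0] == "Go" and a[2] == y  # the action's effect made it true
    moved_away = a[0] == "Go" and a[1] == y  # the action made it false
    return moved_to_y or (at_agent(y, prev) and not moved_away)


def holding(s):
    """Holding(G1,s): just grabbed, or held before and not released."""
    if s == S0:
        return False
    prev, a = s[:-1], s[-1]
    return a == ("Grab",) or (holding(prev) and a != ("Release",))


def at_gold(y, s):
    """At(G1,y,s): the gold travels with the agent while held, else it stays put."""
    if s == S0:
        return y == (1, 2)
    if holding(s):
        return at_agent(y, s)
    return at_gold(y, s[:-1])


def poss(a, s):
    """Possibility axioms for Go, Grab and Release."""
    if a[0] == "Go":
        return at_agent(a[1], s) and (a[1], a[2]) in ADJACENT
    if a[0] == "Grab":
        return any(at_agent(x, s) and at_gold(x, s) for x in SQUARES)
    if a[0] == "Release":
        return holding(s)
    return False


def result(a, s):
    """Result(a,s), defined only when the action is possible."""
    if not poss(a, s):
        raise ValueError(f"{a} is not possible in this situation")
    return s + (a,)


plan = [("Go", (1, 1), (1, 2)), ("Grab",), ("Go", (1, 2), (1, 1)), ("Release",)]
s = S0
for action in plan:
    s = result(action, s)
print(at_gold((1, 1), s))  # True
```

Because `at_gold` falls back to the previous situation when nothing affected it, the gold is still at [1,2] after the first Go, which is exactly the fact that the effect axioms alone could not supply.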

33
Time and Event Calculus
  • Event Calculus: based on points in time
  • Fluents hold at points in time as opposed to
    holding in situations
  • A fluent is true at a point in time if the
    fluent was initiated by an event at some time in
    the past and was not terminated by an intervening
    event.

34
Event Calculus
  • Initiates(e,f,t) and Terminates(e,f,t)
  • Event Calculus Axiom
  • T(f,t2) ⇔ ∃e,t Happens(e,t) ∧ Initiates(e,f,t)
    ∧ (t<t2) ∧ ¬Clipped(f,t,t2)
  • Clipped(f,t,t2) ⇔ ∃e,t1 Happens(e,t1) ∧
    Terminates(e,f,t1) ∧ (t<t1) ∧ (t1<t2)
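
The two axioms translate almost directly into Python. The following is a minimal sketch over a finite event log; the light-switch events, the fluent name `LightOn` and the table names are invented for the example.

```python
# Invented event log: (event, time) pairs.
HAPPENS = [("TurnOn", 1), ("TurnOff", 5)]
# Invented effect tables: which event starts or stops which fluent.
INITIATES = {("TurnOn", "LightOn")}
TERMINATES = {("TurnOff", "LightOn")}


def initiates(e, f, t):
    return (e, f) in INITIATES


def terminates(e, f, t):
    return (e, f) in TERMINATES


def clipped(f, t, t2):
    """Clipped(f,t,t2): some event terminates f strictly between t and t2."""
    return any(terminates(e, f, t1) and t < t1 < t2 for e, t1 in HAPPENS)


def holds(f, t2):
    """T(f,t2): f was initiated before t2 and not clipped since."""
    return any(
        initiates(e, f, t) and t < t2 and not clipped(f, t, t2)
        for e, t in HAPPENS
    )


print(holds("LightOn", 3))  # True
print(holds("LightOn", 7))  # False
```

The light is on at time 3 because TurnOn happened at 1 and nothing intervened; at time 7 the TurnOff at 5 has clipped it.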

35
Event Calculus - more
  • Can be extended to handle
  • indirect effects
  • continuous change
  • nondeterministic effects
  • causal constraints
  • . . .

36
Generalized Events
  • Combines aspects of space and time calculus
  • Allows representation of events occurring in a
    space-time continuum
  • World War II is an event that happened in
    various geographic locations during a specific
    period of time within the 20th century.

37
Processes
  • Discrete Events: the event is a whole, and a part
    of the event is no longer the same event
  • Processes can include subintervals: a part of a
    plane flight is still a member of the Flying
    class (aka liquid events)
  • Stated more precisely: any subinterval of a
    process is also a member of the same process
    category.

38
Intervals
  • Moment: has temporal duration of zero
  • Extended Interval: has temporal duration of
    greater than zero
  • Partition({Moments,ExtendedIntervals},Intervals)
  • Member(i,Moments) ⇔ Duration(i) = Seconds(0).

39
Intervals Ontology
  • Meet(i,j) ⇔ Time(End(i)) = Time(Start(j)).
  • Before(i,j) ⇔ Time(End(i)) < Time(Start(j)).
  • After(j,i) ⇔ Before(i,j).
  • During(i,j) ⇔ Time(Start(j)) ≤ Time(Start(i)) ∧
    Time(End(i)) ≤ Time(End(j)).
  • Overlap(i,j) ⇔ ∃k During(k,i) ∧ During(k,j).
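
These interval relations are easy to try out in Python. The sketch below assumes an interval is just a pair of numeric start and end times; the `Interval` class and the sample intervals are illustrative, not part of the slides.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


def meet(i, j):
    return i.end == j.start


def before(i, j):
    return i.end < j.start


def after(j, i):
    return before(i, j)


def during(i, j):
    return j.start <= i.start and i.end <= j.end


def overlap(i, j):
    # Some k lies during both i and j exactly when the latest start
    # does not come after the earliest end.
    return max(i.start, j.start) <= min(i.end, j.end)


morning = Interval(8, 12)
lunch = Interval(12, 13)
meeting = Interval(9, 10)

print(meet(morning, lunch))      # True
print(during(meeting, morning))  # True
print(before(meeting, lunch))    # True
print(overlap(meeting, lunch))   # False
```

Note that with the non-strict comparison in During, two intervals that merely meet also count as overlapping, since the shared moment lies during both.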

40
Mental Events and Mental Objects
  • Knowledge about beliefs, specifically about those
    beliefs held by an agent
  • Which agent knows about the geography of Maine?
  • Provides an agent the ability to reason about
    beliefs of agents
  • However, need to define propositional attitudes,
    such as Believes, Knows and Wants as relations
    where the second argument is referentially opaque
    (no substitution of equal terms)

41
Reasoning Systems for Categories
  • Categories are KR building blocks
  • Two primary systems for reasoning
  • Semantic Networks
  • Graphical aids for visualizing knowledge
  • Mechanisms for inferring properties of objects
    based on category membership
  • Description Logics
  • Formal language for constructing and combining
    category definitions
  • Algorithms for classifying objects and
    determining subsumption relationships

42
Semantic Networks
  • Graphical notation with underlying logical
    representation
  • A form of logic, but not FOL
  • Capable of representing objects, relations,
    quantification,
  • Convenient representation of inheritance
  • Multiple Inheritance (sometimes)
  • Inverse links
  • Extendable using procedural attachments

43
Semantic Networks - More
  • Can only express binary relationships making it
    more difficult to express n-ary predicates e.g.
    Fly(Shankar,NewYork,NewDelhi,Monday)
  • Negation, disjunction, nested function symbols,
    and existential quantification are missing
  • Some SNs include procedural attachments
  • Represents default values: assertions may be
    overridden by more specific values

44
Semantic Networks
[Figure: semantic network with category nodes Mammals, Persons, Females and Males, and object nodes Mary and John; links SubsetOf, HasMother and SisterOf; Legs is 2 for Persons and 1 for John]
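
Inheritance with default values in a network like the one in the figure can be sketched as a lookup that walks up the links and stops at the most specific value. The `MEMBER_OF` links from Mary and John to their categories are an assumption (they do not appear in the transcript), as are the table and function names.

```python
# Links of the network; MEMBER_OF is an assumed link type for this sketch.
MEMBER_OF = {"Mary": "Females", "John": "Males"}
SUBSET_OF = {"Females": "Persons", "Males": "Persons", "Persons": "Mammals"}

# Attribute values stored on nodes; John overrides the default for Persons.
ATTRS = {"Persons": {"Legs": 2}, "John": {"Legs": 1}}


def lookup(node, attr):
    """Return the most specific value of attr, following links upward."""
    while node is not None:
        if attr in ATTRS.get(node, {}):
            return ATTRS[node][attr]
        node = MEMBER_OF.get(node) or SUBSET_OF.get(node)
    return None


print(lookup("Mary", "Legs"))  # 2
print(lookup("John", "Legs"))  # 1
```

Mary inherits the default from Persons, while the value asserted directly on John takes precedence over it.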
45
Description Logics
  • Notations to make it easier to describe
    definitions and properties of categories
  • Taxonomic structure is organizing principle
  • Subsumption: determine if one category is a
    subset of another
  • Classification: determine the category in which
    an object belongs
  • Consistency: determine if membership criteria are
    logically satisfiable

46
Description Logics
  • CLASSIC was one of the first languages (Borgida
    et al., 1989)
  • All bachelors are unmarried adult males.
  • DL
  • Bachelor = And(Unmarried,Adult,Male).
  • FOL
  • Bachelor(x) ⇔ Unmarried(x) ∧ Adult(x) ∧ Male(x)
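
For concepts built only with And over atomic categories, subsumption and classification reduce to set inclusion. The sketch below covers just that fragment, not CLASSIC itself; the helper names and the extra concept `Man` are assumptions made for the example.

```python
def concept(*atoms):
    """A conjunction And(a1, ..., an) of atomic categories."""
    return frozenset(atoms)


def subsumes(general, specific):
    """general subsumes specific if every conjunct of general is required by specific."""
    return general <= specific


# "Man" is a hypothetical second definition added for illustration.
DEFINITIONS = {
    "Bachelor": concept("Unmarried", "Adult", "Male"),
    "Man": concept("Adult", "Male"),
}


def classify(properties):
    """Names of all defined concepts that an object with these properties belongs to."""
    return sorted(name for name, d in DEFINITIONS.items() if d <= properties)


print(subsumes(DEFINITIONS["Man"], DEFINITIONS["Bachelor"]))      # True
print(classify(concept("Unmarried", "Adult", "Male", "Doctor")))  # ['Bachelor', 'Man']
```

Both checks are a single subset test, which hints at why restricted description logics can keep inference tractable.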

47
Description Logics
  • What does this DL statement say?
  • And(Man,AtLeast(3,Son), AtMost(2,Daughter),
    All(Son,And(Unemployed,Married,
    All(Spouse,Doctor))), All(Daughter,And(Professor,
    Fills(Department,Physics,Math)))).

48
Description Logics - More
  • Emphasis on tractability of inference
  • Inference happens by
  • Describing the problem instance
  • Asserting the instance into the KB to be handled
    by the subsumption apparatus
  • FOL: solution time cannot be predicted
  • DLs: solve in time polynomial in the size of the
    KB
  • DLs usually lack disjunction and negation (for
    time/speed considerations)

49
Current Description Logic
  • DAML+OIL
  • DARPA Agent Markup Language + Ontology Inference
    Layer (OIL)
  • Comes out of DARPA initiative
  • OIL from University of Manchester
  • http://www.w3.org/TR/daml+oil-reference
  • OWL
  • Web Ontology Language
  • A language for the semantic web
  • Next generation of DAML+OIL
  • Flavors: OWL Lite, OWL DL and OWL Full
  • W3C recommendation as of Feb 10, 2004
  • http://www.w3.org/TR/2004/REC-owl-features-20040210/

50
Reasoning with Default Information
  • Open and Closed worlds
  • Open World: information provided is not assumed
    to be complete, therefore inferences may result
    in sentences whose truth value is unknown
  • Closed World: information provided is assumed
    complete, therefore ground sentences not asserted
    to be true are assumed false
  • Negation as Failure: a negative literal, not P,
    can be proved true if the proof of P fails
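
The difference between the two assumptions shows up as soon as a query is not in the KB. A minimal sketch, assuming the KB is a plain set of ground facts (the flight facts and function names are invented):

```python
from typing import Optional

# Invented ground facts.
KB = {("Flight", "Rome", "Paris")}


def ask_closed(fact) -> bool:
    """Closed world: anything not asserted is assumed false."""
    return fact in KB


def ask_open(fact) -> Optional[bool]:
    """Open world: anything not asserted is unknown (None), not false."""
    return True if fact in KB else None


def not_(fact) -> bool:
    """Negation as failure: 'not P' succeeds when the proof of P fails."""
    return not ask_closed(fact)


query = ("Flight", "Rome", "Oslo")
print(ask_closed(query))  # False
print(ask_open(query))    # None
print(not_(query))        # True
```

Here "proof" is just a membership test; a real system would run its inference procedure and treat failure to find a proof the same way.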

51
Nonmonotonic Logics: Circumscription
  • Version of closed-world assumption
  • Specify predicates that are almost always false
  • Default rule stating that birds fly
  • Bird(x) ∧ ¬Abnormal(x) ⇒ Flies(x)
  • Abnormal() is circumscribed: the reasoner assumes
    ¬Abnormal() unless Abnormal() is known to be true
  • Circumscription is a model preference logic:
    notion of preferred models in KB
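
The bird rule can be mimicked by treating Abnormal as false for everything not explicitly listed. This is only a sketch of the effect of circumscribing Abnormal, not of model preference in general; the bird names and the function signature are invented.

```python
def flies(x, birds, known_abnormal):
    """Bird(x) and not Abnormal(x) implies Flies(x), with Abnormal minimized:
    x counts as abnormal only if that is explicitly known."""
    return x in birds and x not in known_abnormal


birds = {"Tweety", "Opus"}
known_abnormal = set()

print(flies("Tweety", birds, known_abnormal))  # True

# Learning a new fact withdraws an earlier conclusion: the logic is nonmonotonic.
known_abnormal.add("Tweety")
print(flies("Tweety", birds, known_abnormal))  # False
```

Adding knowledge made a previously derived sentence underivable, which cannot happen in ordinary first-order logic.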

52
Nonmonotonic Logics: Default Logic
  • Default rules express contingencies
  • Bird(x) : Flies(x) / Flies(x)
  • If Bird(x) is true, and Flies(x) is consistent
    with the KB, then conclude Flies(x) (by default)
  • Default rule form is
  • P : J1, ..., Jn / C
  • P: Prerequisite; J: Justifications; C:
    Conclusion
  • If any J can be proven false, then C cannot be
    concluded

61
Truth Maintenance Systems
  • Designed to handle Belief Revision
  • Let's say our KB contains sentence P
  • But P is found to be incorrect/untrue
  • So, we want to say Tell(KB,¬P)
  • First, though, Retract(KB,P) to avoid P ∧ ¬P
  • What if P ⇒ Q? What happens to Q?
  • Retract Q?
  • But what if we also have R ⇒ Q?
  • Therefore

62
Truth Maintenance Systems
  • Rollback mechanism doesn't scale up
  • Justification-based Truth Maintenance System
    (JTMS)
  • Includes in the KB the set of sentences from
    which the sentence was inferred
  • Sentences are in or out, based on truth value of
    supporting sentences
  • Assumption-based Truth Maintenance System (ATMS)
  • Maintains a set of supporting sentences,
    representing all states
  • Sentence holds in just those cases where all
    assumptions in one of the assumptions sets hold

67
Justification-based TMS
  • Each sentence in KB includes all sentences that
    made it true
  • Given P ⇒ Q, Q has justification {P, P ⇒ Q}
  • What if Q has the following justifications, and
    we Retract(P)?
  • {P, P ⇒ Q}
  • {P, P ∨ R ⇒ Q}
  • {R, R ∨ P ⇒ Q}
  • Sentences that make up justifications are marked
    in or out (not removed from the KB), for
    efficiency
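
The question on this slide can be answered with a toy justification store. The sketch below is an assumption-laden simplification: a real JTMS keeps in/out labels on sentences and propagates changes, whereas this version recomputes support on demand, and the class and method names are invented. Sentences are plain strings.

```python
class ToyJTMS:
    """Toy justification store; not a full JTMS."""

    def __init__(self):
        self.premises = set()     # sentences asserted directly
        self.justifications = {}  # sentence -> list of supporting sets

    def tell(self, sentence):
        self.premises.add(sentence)

    def retract(self, sentence):
        self.premises.discard(sentence)

    def justify(self, sentence, support):
        self.justifications.setdefault(sentence, []).append(frozenset(support))

    def is_in(self, sentence, _seen=frozenset()):
        """A sentence is in if asserted, or if some justification is entirely in."""
        if sentence in self.premises:
            return True
        if sentence in _seen:  # guard against circular support
            return False
        seen = _seen | {sentence}
        return any(
            all(self.is_in(p, seen) for p in support)
            for support in self.justifications.get(sentence, [])
        )


tms = ToyJTMS()
for premise in ["P", "R", "P => Q", "P v R => Q"]:
    tms.tell(premise)
tms.justify("Q", {"P", "P => Q"})
tms.justify("Q", {"P", "P v R => Q"})
tms.justify("Q", {"R", "P v R => Q"})

tms.retract("P")
print(tms.is_in("Q"))  # True
tms.retract("R")
print(tms.is_in("Q"))  # False
```

After Retract(P) the first two justifications fail, but the third, which rests on R, keeps Q in; only when R is retracted as well does Q go out.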

68
Assumption-based TMS
  • Designed to make Belief Revision efficient
  • Represents all states at the same time
  • Each sentence in the KB has a set of assumption
    sets
  • For each sentence in the KB, the sentence holds
    when all assumptions in one of its assumption
    sets hold
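
The "set of assumption sets" idea can be sketched as a label attached to each sentence. The label for Q and the assumption names below are invented for illustration; an actual ATMS also computes and maintains these labels, which this sketch does not attempt.

```python
def holds(label, environment):
    """A sentence holds when every assumption in at least one of its
    assumption sets is present in the current environment."""
    return any(assumptions <= environment for assumptions in label)


# Hypothetical label: Q follows from P alone, or from R alone.
LABEL_Q = [frozenset({"P"}), frozenset({"R"})]

print(holds(LABEL_Q, {"P", "R"}))  # True
print(holds(LABEL_Q, {"R"}))       # True
print(holds(LABEL_Q, set()))       # False
```

Switching between states then only means changing the environment; nothing has to be retracted or re-derived.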

69
Ontologies in Practice: The Bio-Ontologies
Consortium
70
Outline
  • Motivation
  • The problem
  • The solution
  • Exchange Languages Evaluation
  • Initial Evaluation
  • Second-level Evaluation
  • Conclusions/Recommendations
  • Future Work

71
Motivation
  • Explosive and uncontrolled growth of
    Bioinformation
  • It is increasingly important in the life sciences
    to integrate information across scientific
    disciplines and business areas
  • Terminology in the domain of molecular biology is
    inconsistent - information searches can be
    incomplete and inaccurate
  • Definitions and descriptions of life sciences
    objects differ among data sources - significant
    time and effort is required to integrate those
    data sources

72
What is DNA Topoisomerase?
UMLS says it's:
  • EC 5.99.1.2
  • DNA Nicking-Closing Protein
  • DNA Relaxing Enzyme
  • DNA Relaxing Protein
  • DNA Topoisomerase
  • DNA Topoisomerase I
  • DNA Type 1 Topoisomerase
  • DNA Untwisting Enzyme
  • DNA Untwisting Protein
  • Omega Protein
  • Topoisomerase I
  • Type I DNA Topoisomerase
  • Nicking-closing enzyme
  • Relaxing enzyme
  • Untwisting enzyme
  • w-Protein
  • Swivelase
73
Motivation - Shared Ontologies
  • Ontologies in the life sciences currently exist,
    but not in a coordinated/shared manner
  • Shared ontologies provide benefits
  • sharing the work
  • database integration
  • exchange of biological data
  • developing shared understandings
  • differences can provide focus on interesting
    problems

74
The Solution: Ontologies
  • An ontology is a specification of a
    conceptualization.
  • An ontology is a description of the concepts and
    relationships that can exist for an agent or a
    community of agents. ... A common ontology
    defines the vocabulary with which queries and
    assertions are exchanged among agents.
  • T.R. Gruber (1993)

75
Goals of Ontologies
  • Provide standardized vocabularies for text mining
    and information retrieval
  • Formalized ontologies are expressed in a common
    language (or a small number of languages),
    facilitating representation and exchange of
    ontological knowledge
  • Building common ontologies will establish shared
    understandings within the community → so, create
    a consortium as a forum to develop these
    ontologies

76
Bio-Ontologies Consortium Goals
  • Enable interoperability/exchange of life sciences
    information
  • Establish a consortium for promoting and sharing
    open-source ontologies in the Life Sciences
  • Establish user community for sharing experiences
    with designing and building ontologies for the
    Life Sciences
  • Develop synergies with the Knowledge Management
    community to target tools/languages to life
    sciences ontologies
  • Create a permanent portal for the exchange of
    ontologies and ontology building tools

77
Bio-Ontologies Consortium Activity
  • Enable interoperability/exchange of life sciences
    information
  • Successful exchange depends on
  • Common, shared definitions
  • Common language to describe definitions
  • Therefore, select a language, or a small set of
    languages, for the exchange of life sciences
    ontologies

78
Select Candidate Languages (1)
  • Ontolingua
  • Long-standing effort in KR community
  • Based on work for common interchange language
  • CycL
  • Significant effort in KR community
  • Largest commercial vendor of ontological tools
  • OML/CKML
  • XML based language
  • new language, so possible to influence
    development
  • OPM
  • OO model to describe single- and multi-DB schemas
  • tool used in bioinformatic community

79
Select Candidate Languages (2)
  • XML and XML/RDF
  • Web-based language
  • Significant work going on to extend expressivity
  • UML
  • Widely used modeling tool in commercial
    marketplace
  • Based on OO concepts (supported by industry)
  • OKBC
  • API for accessing distributed Knowledge Bases
  • Current work by KR community
  • ASN.1
  • Early representation language for Bioinformatics
  • ODL
  • De facto standard for OO databases

80
Evaluation Criteria (1)
  • Language Support and Standardization
  • Does the language have a formal specification?
  • What support (documentation, tutorials, tech
    support, ...) is available?
  • Does the language implement a standard? If so,
    who controls this standard?
  • Data model/capabilities
  • How rich is the expressiveness of the language,
    i.e., does the language support negation,
    conjunction, disjunction, relations, ...?

81
Evaluation Criteria (2)
  • Performance
  • Scalability to real-world problems
  • Stability (languages with tools/environments)
  • Other Issues/Pragmatics
  • Current users of the language
  • Domains in which the language has been applied
  • Connection to data sources (knowledge sources,
    storage formats (relational, OO, ...))

82
Initial Evaluation - Results
  • Keys to acceptance
  • Rich expressive power
  • Stability and history of use
  • Approachable/understandable syntax
  • Open to collaboration
  • Keys to non-acceptance
  • Proprietary language
  • Wedded to a commercial system

83
Initial Evaluation - Results
84
Next Level Evaluation
  • Two languages stood out as strong candidates
  • Ontolingua
  • OML/CKML
  • Conduct experiments to represent biological
    entities
  • select two life sciences ontologies
  • Ecocyc Gene Ontology
  • GeneClinics data model/ontology
  • represent each ontology in both Ontolingua and OML

85
Gene Ontology - Ontolingua (1)
(DEFINE-CLASS Genes (?X)
  "The class of all genes is divided into several subclasses. Genes
   whose function is unknown or known only approximately are grouped
   into the classes ORFs and Unclassified-Genes, respectively. Genes of
   known function have been classified using two orthogonal
   classification schemes developed by Monica Riley. One scheme
   classifies genes according to the physiological role of their
   product class (Physiological-Roles); the other scheme classifies
   genes according to the function of their product, such as enzymes
   and transport proteins (Product-Types)."
  :DEF (AND (DNA-Segments ?X)))

(DEFINE-FUNCTION CENTISOME-POSITION (?FRAME) :-> ?VALUE
  "This slot lists the map position of this gene on the chromosome in
   centisome units."
  :DEF (AND (Genes ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION CITATIONS (?FRAME ?VALUE)
  "This slot lists general citations pertaining to the object
   containing the slot. Each value of the slot is a citation of the
   form reference-id."
  :DEF (AND (Organisms ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION COMMENT (?FRAME ?VALUE)
  "The Comment slot stores a general comment about the object that
   contains the slot."
  :DEF (AND (THING ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION COMMON-NAME (?FRAME) :-> ?VALUE
  "The primary name by which an object is known to scientists -- a
   widely used and familiar name (in some cases arbitrary choices must
   be made)."
  :DEF (AND (Organisms ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION EVIDENCE (?FRAME ?VALUE)
  "Describes evidence for the defined function of this object.
   Currently we distinguish between function that is determined
   experimentally, and function that is determined through
   computational sequence analysis."
  :DEF (AND (Genes ?FRAME)
            ((ONE-OF EXPERIMENT SEQUENCE-ANALYSIS) ?VALUE)))
86
Gene Ontology - Ontolingua (2)
(DEFINE-RELATION HISTORY (?FRAME ?VALUE)
  "Contains a textual history of changes made to this frame. Each item
   is either a string or a note frame."
  :DEF (AND (THING ?FRAME) ((OR STRING Notes) ?VALUE)))

(DEFINE-FUNCTION INTERRUPTED? (?FRAME) :-> ?VALUE
  "The value of this slot is T for genes that are interrupted, i.e.,
   those that have an early stop codon inserted."
  :DEF (AND (Genes ?FRAME) (BOOLEAN ?VALUE)))

(DEFINE-FUNCTION LEFT-END-POSITION (?FRAME) :-> ?VALUE
  :DEF (AND (DNA-Segments ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION PRODUCT (?FRAME ?VALUE)
  "This slot lists the product of a gene, which could be a polypeptide
   or a tRNA. Multiple products will be recorded in the case that
   several chemically modified forms of the protein product exist."
  :DEF (AND (Genes ?FRAME) ((OR Polypeptides RNA) ?VALUE)))

(DEFINE-RELATION PRODUCT-STRING (?FRAME ?VALUE)
  "This slot holds a text string that describes the product of this
   gene; this slot is only used when EcoCyc does not describe the gene
   product as a frame (such as a polypeptide frame)."
  :DEF (AND (Genes ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION PRODUCT-TYPES (?FRAME ?VALUE)
  "Describes the type of the gene product, e.g., is it an enzyme, an
   RNA, etc."
  :DEF (AND (Genes ?FRAME)
            ((ONE-OF ENZYME REGULATOR LEADER MEMBRANE TRANSPORT
                     STRUCTURAL RNA PHENOTYPE FACTOR CARRIER) ?VALUE)))
87
Gene Ontology - Ontolingua (3)
(DEFINE-FUNCTION RIGHT-END-POSITION (?FRAME) :-> ?VALUE
  :DEF (AND (DNA-Segments ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION SYNONYMS (?FRAME ?VALUE)
  "One or more secondary names for an object -- names that a scientist
   might attempt to use to retrieve the object. The Synonyms should
   include any name a user might use to try to retrieve an object."
  :DEF (AND (Generalized-Reactions ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION TRANSCRIPTION-DIRECTION (?FRAME) :-> ?VALUE
  "This slot specifies the direction along the chromosome in which
   this gene is transcribed; allowable values are + or -."
  :DEF (AND (DNA ?FRAME) ((ONE-OF "+" "-") ?VALUE)))
88
Gene Ontology - OML/CKML (1)
<CKML>
<Ontology id="Riley's Gene Classes" version="1.0">
  <comment>
    This OML ontology defines an encoding of the gene classification
    system developed by Monica Riley.
  </comment>
  <extends ontology="http://www.ckml.org/ontology/" prefix="CKML"/>
  <Object type="Genes">
    <comment>
      The class of all genes is divided into several subclasses. Genes
      whose function is unknown or known only approximately are grouped
      into the classes ORFs and Unclassified-Genes, respectively. Genes
      of known function have been classified using two orthogonal
      classification schemes developed by Monica Riley. One scheme
      classifies genes according to the physiological role of their
      product class (Physiological-Roles); the other scheme classifies
      genes according to the function of their product, such as enzymes
      and transport proteins (Product-Types).
    </comment>
  </Object>
  <Function type="LEFT-END-POSITION" srcType="Genes" tgtType="data.Real"/>
  <Function type="INTERRUPTED?" srcType="Genes" tgtType="data.Boolean">
    <comment>
      The value of this slot is T for genes that are interrupted, i.e.,
      those that have an early stop codon inserted.
    </comment>
  </Function>
  <BinaryRelation type="HISTORY" srcType="CKMLObject" tgtType="data.String">
    <comment>
      Contains a textual history of changes made to this frame. Each
      item is either a string or a note frame.
    </comment>
  </BinaryRelation>
  <Theory genus="Evidence">
    <Object type="EXPERIMENT"/>
    <Object type="SEQUENCE-ANALYSIS"/>
  </Theory>
89
Gene Ontology - OML/CKML (2)
  <BinaryRelation type="EVIDENCE" srcType="Genes" tgtType="Evidence">
    <comment>
      Describes evidence for the defined function of this object.
      Currently we distinguish between function that is determined
      experimentally, and function that is determined through
      computational sequence analysis.
    </comment>
  </BinaryRelation>
  <Function type="CENTISOME-POSITION" srcType="Genes" tgtType="data.Real">
    <comment>
      This slot lists the map position of this gene on the chromosome
      in centisome units.
    </comment>
  </Function>
  <BinaryRelation type="CITATIONS" srcType="CKMLObject" tgtType="data.String">
    <comment>
      This slot lists general citations pertaining to the object
      containing the slot. Each value of the slot is a citation of the
      form reference-id.
    </comment>
  </BinaryRelation>
  <BinaryRelation type="COMMENT" srcType="CKMLObject" tgtType="data.String">
    <comment>
      The Comment slot stores a general comment about the object that
      contains the slot.
    </comment>
  </BinaryRelation>
  <Function type="COMMON-NAME" srcType="CKMLObject" tgtType="data.String">
    <comment>
      The primary name by which an object is known to scientists -- a
      widely used and familiar name (in some cases arbitrary choices
      must be made).
    </comment>
  </Function>
  <Theory genus="Transcription-Direction">
    <Object type="+"/>
    <Object type="-"/>
  </Theory>
  <Function type="TRANSCRIPTION-DIRECTION" srcType="Genes"
            tgtType="Transcription-Direction">
    <comment>
      This slot specifies the direction along the chromosome in which
      this gene is transcribed; allowable values are + or -.
    </comment>
  </Function>
  <BinaryRelation type="PRODUCT" srcType="Genes" tgtType="Polypeptides"/>
90
Gene Ontology - OML/CKML (3)
<BinaryRelation type="SYNONYMS" srcType="CKMLObject" tgtType="data.String">
  <comment> One or more secondary names for an object -- names that a
  scientist might attempt to use to retrieve the object. The Synonyms
  should include any name a user might use to try to retrieve an
  object. </comment>
</BinaryRelation>
<BinaryRelation type="PRODUCT-STRING" srcType="Genes" tgtType="data.String">
  <comment> This slot holds a text string that describes the product of
  this gene; this slot is only used when EcoCyc does not describe the
  gene product as a frame (such as a polypeptide frame). </comment>
</BinaryRelation>
<Theory genus="Product-Types">
  <Object type="ENZYME"/>
  <Object type="REGULATOR"/>
  <Object type="LEADER"/>
  <Object type="MEMBRANE"/>
  <Object type="TRANSPORT"/>
  <Object type="STRUCTURAL"/>
  <Object type="RNA"/>
  <Object type="PHENOTYPE"/>
  <Object type="FACTOR"/>
  <Object type="CARRIER"/>
</Theory>
<BinaryRelation type="PRODUCT-TYPES" srcType="Genes" tgtType="Product-Types">
  <comment> Describes the type of the gene product, e.g., is it an
  enzyme, an RNA, etc. </comment>
</BinaryRelation>
<Function type="RIGHT-END-POSITION" srcType="Genes" tgtType="data.Real"/>
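The declarations above pair each slot with a source type and a target type. A minimal Python sketch of how a tool might check a value against its declared target type follows. It reads each Theory block as an enumerated value set, which is an interpretation of these slides and not a statement of OML semantics; the names THEORIES, SLOTS, and check_value are hypothetical.

```python
# Value sets copied from the <Theory> blocks on the slides.
# Treating a Theory as a plain enumeration is an assumption of this sketch.
THEORIES = {
    "Evidence": {"EXPERIMENT", "SEQUENCE-ANALYSIS"},
    "Product-Types": {
        "ENZYME", "REGULATOR", "LEADER", "MEMBRANE", "TRANSPORT",
        "STRUCTURAL", "RNA", "PHENOTYPE", "FACTOR", "CARRIER",
    },
}

# A few slot declarations from the slides: slot name -> (srcType, tgtType).
SLOTS = {
    "EVIDENCE": ("Genes", "Evidence"),
    "PRODUCT-TYPES": ("Genes", "Product-Types"),
    "LEFT-END-POSITION": ("Genes", "data.Real"),
    "PRODUCT-STRING": ("Genes", "data.String"),
}


def check_value(slot, value):
    """Return True if value is acceptable for the slot's declared tgtType."""
    _, tgt_type = SLOTS[slot]
    if tgt_type in THEORIES:
        # Enumerated target: the value must be one of the listed Objects.
        return value in THEORIES[tgt_type]
    if tgt_type == "data.Real":
        try:
            float(value)
        except (TypeError, ValueError):
            return False
        return True
    if tgt_type == "data.String":
        return isinstance(value, str)
    raise ValueError(f"no checker for target type {tgt_type!r}")


if __name__ == "__main__":
    print(check_value("EVIDENCE", "EXPERIMENT"))
    print(check_value("EVIDENCE", "GUESS"))
```

Keeping the declarations in plain dictionaries makes the check independent of any particular XML parser; a real tool would read them from the schema document instead.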
91
Gene Ontology - OML/CKML (4)
<Collection.Object>
  <Genes id="EG10707" text="pheA">
    <LEFT-END-POSITION tgt="2735765"/>
    <CENTISOME-POSITION tgt="58.97035d0"/>
    <TRANSCRIPTION-DIRECTION tgt="+"/>
    <RIGHT-END-POSITION tgt="2736925"/>
  </Genes>
</Collection.Object>
<Collection.BinaryRelation>
  <EVIDENCE src="EG10707" tgt="EXPERIMENT"/>
  <NAMES src="EG10707" tgt="pheA"/>
  <NAMES src="EG10707" tgt="b2599"/>
  <PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/>
  <PRODUCT-STRING src="EG10707" tgt="chorismate mutase-P and prephenate dehydratase"/>
</Collection.BinaryRelation>
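As an illustration of how such instance data could be consumed, here is a minimal Python sketch that loads an excerpt of the pheA record into a dictionary of frames using the standard library's xml.etree.ElementTree. The wrapping Instances root element, the function names, and the inclusive-coordinate convention used for the gene length are assumptions of this sketch, not part of OML/CKML.

```python
import xml.etree.ElementTree as ET

# Excerpt of the slide's instance data. The <Instances> wrapper is a
# hypothetical root, added only because the fragment has two top-level
# elements and XML requires exactly one.
CKML_FRAGMENT = """
<Instances>
  <Collection.Object>
    <Genes id="EG10707" text="pheA">
      <LEFT-END-POSITION tgt="2735765"/>
      <CENTISOME-POSITION tgt="58.97035d0"/>
      <RIGHT-END-POSITION tgt="2736925"/>
    </Genes>
  </Collection.Object>
  <Collection.BinaryRelation>
    <EVIDENCE src="EG10707" tgt="EXPERIMENT"/>
    <NAMES src="EG10707" tgt="pheA"/>
    <NAMES src="EG10707" tgt="b2599"/>
    <PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/>
  </Collection.BinaryRelation>
</Instances>
"""


def lisp_float(text):
    """Convert a Lisp double-float literal such as '58.97035d0' to a float."""
    return float(text.lower().replace("d", "e"))


def load_frames(xml_text):
    """Build {frame id: {"name": ..., "slots": {slot: [values]}}}."""
    root = ET.fromstring(xml_text)
    frames = {}
    # Function-valued slots are nested inside the object element.
    for gene in root.iter("Genes"):
        slots = {child.tag: [child.get("tgt")] for child in gene}
        frames[gene.get("id")] = {"name": gene.get("text"), "slots": slots}
    # Binary relations are listed separately and refer to a frame by src.
    for collection in root.iter("Collection.BinaryRelation"):
        for relation in collection:
            frame = frames.get(relation.get("src"))
            if frame is not None:
                values = frame["slots"].setdefault(relation.tag, [])
                values.append(relation.get("tgt"))
    return frames


def gene_length(frame):
    """Length in base pairs, assuming both end positions are inclusive."""
    left = int(frame["slots"]["LEFT-END-POSITION"][0])
    right = int(frame["slots"]["RIGHT-END-POSITION"][0])
    return right - left + 1


if __name__ == "__main__":
    frames = load_frames(CKML_FRAGMENT)
    print(frames["EG10707"]["slots"]["NAMES"])
    print(gene_length(frames["EG10707"]))
```

Note that the centisome value uses the Lisp exponent marker d0, so it needs a small conversion before a non-Lisp consumer can read it as a number. Under the inclusive-coordinate assumption the sketch reports a length of 1161 base pairs for pheA.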
92
Experiments - Results (Ecocyc)
  • OML representation
    • OML's expressive capabilities captured most aspects of the gene
      ontology
    • Some limitations in expressive capability: no facets, cardinality,
      or multiple collection types
    • Terminology differences, and definitions not modular
  • Ontolingua representation
    • Ontolingua expressed all of the gene ontology
    • Lisp syntax of Ontolingua not readily approachable

93
Experiments - Results (GeneClinics)
  • OML representation
    • Expressive capabilities adequate to the job
    • OML/CKML is based on conceptual graphs and may have more
      expressive capabilities in the long term
  • Ontolingua representation
    • Ontolingua is based on frame semantics, which align more closely
      with relational and OO data models
    • Lisp syntax not acceptable to the larger community
  • Both languages would benefit from life sciences examples

94
Conclusions and Recommendations
  • The language most suitable for the exchange of life sciences
    ontologies should have the following key characteristics:
  • Frame-based representation
    • Long history of work with the frame-based representation model
    • Mappings between this model and relational and/or OO data sources
      are easily expressed
  • XML-based syntax
    • Critical for exchange among a physically dispersed community
    • New tools being developed in the XML community
    • Lots of momentum in the web-based community
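
The two recommendations can be illustrated together with a minimal Python sketch: one frame is held as an object, flattened to relational rows, and serialized to XML and back. The Frame class and the frame, slot, and value element names are hypothetical choices for this sketch; they are not XOL or OML/CKML syntax.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Frame:
    """A frame: an identifier, a class, and multi-valued slots."""
    frame_id: str
    frame_class: str
    slots: dict = field(default_factory=dict)  # slot name -> list of values


def to_rows(frame):
    """Flatten a frame into (frame_id, slot, value) rows for a relational table."""
    return [
        (frame.frame_id, slot, value)
        for slot, values in frame.slots.items()
        for value in values
    ]


def to_xml(frame):
    """Serialize a frame using hypothetical <frame>/<slot>/<value> elements."""
    elem = ET.Element("frame", {"id": frame.frame_id, "class": frame.frame_class})
    for slot, values in frame.slots.items():
        slot_elem = ET.SubElement(elem, "slot", {"name": slot})
        for value in values:
            ET.SubElement(slot_elem, "value").text = value
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Rebuild a Frame from the XML produced by to_xml."""
    elem = ET.fromstring(text)
    slots = {s.get("name"): [v.text for v in s] for s in elem}
    return Frame(elem.get("id"), elem.get("class"), slots)


if __name__ == "__main__":
    gene = Frame("EG10707", "Genes",
                 {"NAMES": ["pheA", "b2599"], "EVIDENCE": ["EXPERIMENT"]})
    for row in to_rows(gene):
        print(row)
    print(to_xml(gene))
```

Because every slot value becomes one row or one element, the same frame moves between the object, relational, and XML forms without loss, which is the property the first characteristic relies on.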

95
Current Efforts
  • Developed specification for an XML-based exchange
    language, XOL (XML Ontology Language), based on
    Ontolingua (Karp/Chaudhri)
  • Frame-based semantics for OML/CKML
  • Developing process for submission of life
    sciences ontologies to the Bio-Ontologies
    Consortium

96
Other Ontology Efforts
  • Gene Ontology Consortium
    (http://genome-www.stanford.edu/GO/)
  • BioPathways Consortium
    (http://www.3rdmill.com/BioPathways)
  • mmCIF (http://ndbserver.rutgers.edu/mmcif)

97
Bio-Ontologies Consortium - Future Work
  • Content development
  • Elicit and review ontology submissions
  • Synergies with OMG
  • Provide public-domain ontologies to the Life
    Sciences community and encourage use of those
    ontologies
  • Bio-Ontologies 2000