Dr. Douglas B. Lenat - PowerPoint PPT Presentation

About This Presentation
Title:

Dr. Douglas B. Lenat

Description:

CYC: Software that partially understands. The impact of that on the AIM dream. Computers versus Common Sense. Dr. Douglas B. Lenat, 7718 Wood Hollow Drive, Austin, TX 78731

Slides: 83

Transcript and Presenter's Notes

Title: Dr. Douglas B. Lenat


1
  • CYC: Software that partially understands
  • The impact of that on the AIM dream

Computers versus Common Sense
  • Dr. Douglas B. Lenat
  • 7718 Wood Hollow Drive, Austin, TX 78731
  • Email: Lenat_at_cyc.com
    Phone: 512-342-4001

2
ELIZA (DOCTOR)
  • 1965: Joe Weizenbaum, MIT
  • Carl Rogers advocated reflection
  • Patient: "I swear a lot."
  • ELIZA: "How do you feel about the fact that you
    swear a lot?"
  • Counted a lot on anthropomorphism
  • Counted on patient-to-doctor respect/awe/

Patient: "My dog's mother died recently."
ELIZA: "Tell me more about your mother."
Veneer of intelligence, not intelligence
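
The reflection trick above can be sketched in a few lines. This is a minimal illustration, not Weizenbaum's actual script: the names `REFLECT`, `reflect`, and `respond`, the pronoun table, and the fallback reply are all assumptions made for this example. It matches surface patterns only, which is why it reproduces the blunder about the dog's mother.

```python
import re

# Illustrative subset of first-person -> second-person swaps (an assumption).
REFLECT = {"i": "you", "my": "your", "me": "you", "am": "are"}


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    return " ".join(REFLECT.get(word.lower(), word) for word in fragment.split())


def respond(utterance: str) -> str:
    """Pick a canned template based on surface patterns only."""
    text = utterance.strip().rstrip(".!?")
    # Keyword rule: any mention of "mother" triggers the same reply,
    # no matter whose mother it is.
    if re.search(r"\bmother\b", text, re.IGNORECASE):
        return "Tell me more about your mother."
    match = re.match(r"i (.+)", text, re.IGNORECASE)
    if match:
        return f"How do you feel about the fact that you {reflect(match.group(1))}?"
    return "Please go on."


print(respond("I swear a lot."))
print(respond("My dog's mother died recently."))
```

Nothing in the sketch represents what a dog, a mother, or swearing is, so no amount of extra templates turns the veneer into understanding.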
3
  • ALICE (40 years post-ELIZA)
  • JUDGE11 [9:26:0] I WILL ASK YOU SOME SIMPLE
    QUESTIONS AND GIVE SIMPLE COMMANDS
  • PROGRAM [9:26:0] Do you think your plan will
    succeed?
  • JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
  • PROGRAM [9:26:16] WHAT COLOR IS a blue car
  • JUDGE11 [9:26:35] WHERE IS SUE'S NOSE WHEN SUE IS
    IN HER HOUSE
  • PROGRAM [9:26:35] Where it belongs.
    Try searching the Web.

4
(No Transcript)
5
(No Transcript)
6
2009: Software is still Brittle
Is the Space Needle taller than the Eiffel
Tower?
How old was Martin van Buren when he was
elected President of the U.S.?
7
Natural Language Understanding requires having
lots of knowledge
  • 1. The pen is in the box.
  • The box is in the pen.
  • 2. The police watched the demonstrators because
    they feared violence.
  • The police watched the demonstrators because
    they advocated violence.
  • 3. Mary and Sue are sisters.
  • Mary and Sue are mothers.
  • 4. Every American has a mother.
  • Every American has a president.
  • 5. John saw his brother skiing on TV. The fool
    didn't have a coat on!
  • John saw his brother skiing on TV. The fool
    didn't recognize him!

8
  • 7. "include all the re-do CABG procedures
    utilizing ITA and SVG in 1991."
  • "And" usually does mean "and". But in this
    query, "and" really must mean "or". Medical
    knowledge, not grammar, disambiguates this: a
    single CABG will not have both an ITA and an SVG.
  • 8. "that the tumor cells are stopping dividing
    or dying"
  • Do they mean "stopping dividing or stopping
    dying"? Of course not, but in 16 of 30
    randomly selected syntactically similar
    constructions from www.clinicaltrials.gov, the
    coordination (i.e., the wider scope of the
    modifier, in this case the word "stopping") was
    the intended meaning. In each case, only one
    choice makes sense (is consistent with medical
    knowledge and common sense).
  • 9. "Adult patients who underwent MAZE III with or
    without Mitral Valve Repair or Replacements."
  • Is the second half of that query just a waste of
    space? Discourse pragmatics says no, the
    physician must have had some reason for saying
    that. Medical knowledge provides a plausible
    interpretation: "Adult patients who underwent
    MAZE III with no concomitant procedures other
    than Mitral Valve Repair or Replacements."

9
Okay, so let's tell the computer the same sorts
of things that human beings know about cars, and
colors, heights, movies, time, driving to a
place, etc.,
plus all the other stuff that everybody knows.
  • The basic idea
  • Get the computer to understand, not just store,
    information. Then it can reason to answer your
    queries.

2 July 2005
10
MicrowaveOven is a type of Kitchen-Appliance
Dishwasher is a type of Kitchen-Appliance
11
You can't use X if it alorxes Y but lacks any Y
Rthagide-disjaks is a type of Kitchen-Appliance
Gracinimumples is a type of Kitchen-Appliance
Rthagide-disjaks alorxes Vorawnistz.
Gracinimumples alorxes Vorawnistz and Buzqa.
Buzqa is a Thwarn and supplied through Epluns.
12
etc., plus all the other stuff that everybody knows.
Eventually, after writing millions of these
rules, the system knows as much about pipes,
liquids, water, electricity, microwave ovens,
dishwashers, cars, colors, movies, heights, etc.
as you and I do.
Ultimately, there is just 1 interpretation of
that model, and it corresponds to the real world.
Long before that, incrementally, the system
gains competence and trustworthiness
13
Cyc is
  • The typical bird has 1 beak, 1 heart, lots of
    feathers,
  • Hearts are internal organs; feathers are external
    protrusions
  • Most vehicles are steered by an awake, sane,
    adult, human
  • Tangible objects can't be in 2 (disjoint) places
    at once
  • Badly injuring a child is much worse than killing
    a dog
  • Causes temporally precede (i.e., start before)
    their effects
  • A stabbing requires 2 cotemporal and proximate
    actors
  • etc.

14
Cyc is
  • Each of these represented in formal logic
  • Info. about a set of hundreds of thousands of
    terms
  • Language-independent

ChineseWordForWritingPen
15
Cyc is
16
What Needs to be Shared?
  • bits/bytes/streams/network
  • alphabet, special characters,
  • words, morphological variants,
  • syntactic meta-level markups (HTML)
  • semantic meta-level markups (SGML, XML)
  • content (logical representation of doc/page/...)
  • context (common sense, recent utterances, and n
    dimensions of metadata: time, space, level of
    granularity, the source's purpose, etc.)

Sem. Web
17
How formalized knowledge helps search
(ForAll ?P
  (ForAll ?C
    (implies
      (and (isa ?P Person)
           (children ?P ?C))
      (loves ?P ?C))))
When you become happy, you smile. You become
happy when someone you love accomplishes a
milestone. Taking one's first step is a
milestone. Parents love their children.
  • Query: Someone smiling

find information by inference (KB)
  • Caption: A man helping his daughter take
    her first step
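
How the query "someone smiling" can reach that caption may be sketched with a tiny forward chainer. This is an assumption-laden illustration, not Cyc's inference engine: the tuple encoding, the constants (`Man1`, `Daughter1`, `TakingFirstStep`), the predicates `accomplishes`, `happy`, and `smiles`, and the functions `derive` and `forward_chain` are all hypothetical. Only the four English rules come from the slide.

```python
# Facts a caption parser might extract (hypothetical encoding).
CAPTION_FACTS = {
    ("isa", "Man1", "Person"),
    ("children", "Man1", "Daughter1"),
    ("accomplishes", "Daughter1", "TakingFirstStep"),
}
# Background knowledge: taking one's first step is a milestone.
BACKGROUND_FACTS = {("isa", "TakingFirstStep", "Milestone")}


def derive(facts):
    """Apply each rule once and return only the newly derived facts."""
    new = set()
    for fact in facts:
        # Parents love their children.
        if fact[0] == "children" and ("isa", fact[1], "Person") in facts:
            new.add(("loves", fact[1], fact[2]))
        # You become happy when someone you love accomplishes a milestone.
        if fact[0] == "loves":
            for other in facts:
                if (other[0] == "accomplishes" and other[1] == fact[2]
                        and ("isa", other[2], "Milestone") in facts):
                    new.add(("happy", fact[1]))
        # When you become happy, you smile.
        if fact[0] == "happy":
            new.add(("smiles", fact[1]))
    return new - facts


def forward_chain(facts):
    """Repeat rule application until no new fact appears."""
    facts = set(facts)
    while True:
        new = derive(facts)
        if not new:
            return facts
        facts |= new


closure = forward_chain(CAPTION_FACTS | BACKGROUND_FACTS)
print(sorted(f[1] for f in closure if f[0] == "smiles"))
```

The caption never mentions smiling; the match exists only in the derived facts, which is the point of searching by inference rather than by keyword.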

18
How formalized knowledge helps search
  • Query: Show me pictures of strong and
    adventurous people
  • Caption: A man climbing a rock face

find information by inference (KB)
19
How formalized knowledge helps search
  • Query: Government buildings damaged in
    terrorist events in Beirut between 1990 and 2001
  • Document: 1993 pipe bombing of France's embassy
    in Lebanon.

Text Document
find information by inference (KB)
20
How can our programs be intelligent, not merely
have the veneer of it?
  • ANSWER: By having a large corpus of knowledge,
    spanning the gamut from specific domain-dependent
    all the way up to general common sense.
  • The computer needs to be able to apply the
    knowledge, not just store some English gloss
  • Represent it formally (predicate calculus), and
    apply logic
  • Represent it numerically, and apply
    mathematics/statistics
  • And after all that: be compelling to the
    human deciding

21
One Good Explanation is worth 20 points of IQ
  • Magic tricks
  • "How do they do that?!" → "How was I ever
    fooled by that?!"
  • Efficacy of punishment vs reward
  • Punishment is more effective, and the statistics
    back me up
  • Clinical decision-making (by doctors and by
    patients)
  • "Because 0.814" versus "Because <plausible
    causal rationale>"
  • Organ donation in European countries
  • Why is it so often 15/85 or 85/15?
  • Answer: Because when you apply for a driver's
    license, in some countries you have to check a
    box to opt in; in others, you have to check a
    box to opt out; and in the U.S. and most
    European countries at least, 85% of the people
    don't know what they should do, even though it's
    an emotional, serious choice, and end up just
    leaving it unchecked.
  • And after all that: be compelling to the
    human deciding

22
Reflection Framing Effect
Philadelphia is preparing for a Legionnaires'
Disease outbreak expected to kill 600 people
today. Two alternative programs to combat the
disease have been proposed. The consequences
of each program are as follows:

If Program A is adopted, 200 people will be
saved. (72%)
If Program B is adopted, there is
a 1/3 chance that all 600 will be saved, and a
2/3 chance that no lives will be saved. (28%)

If Program A is adopted, 400 people will die. (22%)
If Program B is adopted, there
is a 2/3 chance that 600 will die, and a 1/3
chance that no one will die. (78%)
For more information, see Kahneman, D. and
Tversky, A. (1984). Choices, values, and
frames. American Psychologist, 39, 341-350.
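
A quick arithmetic check makes the framing effect concrete: all four options have the same expected outcome, so only the wording differs. The variable names below are illustrative; the numbers are the ones on the slide.

```python
from fractions import Fraction

AT_RISK = 600  # people expected to die if nothing is done

# "Lives saved" framing.
saved_a = Fraction(200)
saved_b = Fraction(1, 3) * 600 + Fraction(2, 3) * 0

# "Lives lost" framing, converted to expected lives saved.
saved_a_loss_frame = AT_RISK - Fraction(400)
saved_b_loss_frame = AT_RISK - (Fraction(2, 3) * 600 + Fraction(1, 3) * 0)

print(saved_a, saved_b, saved_a_loss_frame, saved_b_loss_frame)
```

Each expression evaluates to 200 expected survivors, yet the majority preference flips between the two framings.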
23
Conjunction Fallacy
  • A health survey was conducted in a
    representative sample of adult males in Chicago
    of all ages and occupations. Mr. F was included
    in the sample. He was selected by random chance
    from the list of participants.
  • Please rank the following statements in terms of
    which is most likely to be true of Mr. F. (1 = more
    likely to be true, 6 = least likely)
  • ____ Mr. F smokes more than 1 cigarette per day
    on average.
  • ____ Mr. F has had one or more heart attacks.
    [A]
  • ____ Mr. F had a flu shot this year.
  • ____ Mr. F eats red meat at least once per week.
  • ____ Mr. F has had one or more heart attacks and
    he is over 55 years old. [A and B]
  • ____ Mr. F never flosses his teeth.

58% rated "A and B" more likely than "A"
For more information, see Tversky, A. and
Kahneman, D. (1983). Extensional vs. intuitive
reasoning: The conjunction fallacy in
probability judgment. Psych. Rev. 90, 293-315.
24
Why there is a need for meta-logical elements
(rationale and POV) to convince decision-makers
  • Early hominids: pre-rational decision-makers
  • Later hominids: usually rational
  • Even later hominids: almost always rational

YOU ARE HERE
25
  • A 67-year-old woman suffering from ICM with
    elevated bilirubin, history of diabetes, body
    mass index of 39.5, NYHA function class III,
    mitral valve regurgitation grade (MVRG) of 2,
    and no aortic valve regurgitation (AVR) is
    assigned to CABG surgery. RFCyc is consulted,
    and the RF (random forest statistical reasoning)
    component, having been trained on a large
    database, identifies CABG alone as the most
    likely treatment option, citing an odds ratio of
    2.6 over the next most favorable treatment,
    CABG+MVA. As rationale, the Cyc (AI) component
    observes that the low MVRG is atypical of MVA,
    which is a surgical procedure typically reserved
    for patients with severe mitral regurgitation, and
    thus the simpler CABG procedure is preferred.
    However, an intraoperative transesophageal
    echocardiogram (TEE) suggests MVRG is 3. Based
    on this, the surgical team overrides the initial
    diagnosis without consultation, opting instead
    for CABG+MVA. The patient dies 3 days later from
    complications due to surgery.
  • In this setting, RFCyc, if consulted, could
    have alerted the heart team to additional data
    that might have swayed their decision, thus
    potentially saving a life. RFCyc would have
    noted that while an MVRG of 3 is consistent with
    CABG+MVA, the odds favoring CABG only marginally
    decrease from 2.6:1 to 1.7:1 when MVRG is
    upstaged for this patient from 2 to 3, and that
    surgery under CABG alone offers a 20% increase in
    median survival compared to CABG+MVA. RFCyc
    could further argue that intraoperative MVRG can
    falsely appear to be upstaged due to altered
    hemodynamics in anesthetized patients. A
    Cyc-assisted semantic search of the recent
    literature reveals that transthoracic
    echocardiograms (TTE) more reliably
    reflect the degree of mitral regurgitation than
    TEE. That (plus co-morbidities) argues for just
    CABG.

26
4 Pitfalls of Semantic Technology
  • Ignorance-based: a small theory size (terms,
    instances, rules)
  • Static KB (massively tuned, optimized, cached
    ahead of time)
  • Simple assertions (SAT constraints,
    propositional calculus, Horn clause logic,
    Description Logic, first order logic)
  • 1 global context (no contradictions, tiny domain,
    simplified world)

27

Applying Cyc
  • Cyc is a power source, not a single application.
  • Like oil, electricity, telephony, computers,
    Cyc can spawn and sustain a knowledge utility
    industry.
  • It can cost-effectively underlie almost all apps.
  • (Provide a common-sense layer to reduce
    brittleness when faced with unexpected
    inputs/situations)
  • To apply Cyc, we extend its ontology, its KB, and
    possibly its suite of specialized reasoning
    modules

28
The Analyst's Knowledge Base
CT Analyst
"What sequences of events could lead to the
destruction of Hoover Dam?"
"Were there any attacks on targets of symbolic
value to Muslims since 1987 on a Christian holy
day?"

[Architecture diagram. Labels: Domain Experts; CT Analyst;
Scenario Generation / Scenario Generator; Explanation
Generation / Explanation Generator; Query Formulation /
Query Formulator; Others/GOTS; Cycorp Tools For
Ontology-Building, -Browsing, -Editing, Fact/Rule Entry;
AKB (General Knowledge, Terrorism Knowledge, Analysis and
Collaboration Components); OWL; Relational DB projection
of the AKB]
29
A more recent example
  • What major US cities are particularly vulnerable
    to an anthrax attack?
  • The answer is logically implied by data
    dispersed through several sources

30
What major US cities are particularly
vulnerable to an anthrax attack?
  • "major US city" → ?C is a U.S. City with >1M
    population
  • "particularly vulnerable to an anthrax attack" →
  • the current ambient temperature at ?C is above
    freezing, and
  • ?C has more than 100 people for each hospital
    bed, and
  • the number of anthrax host animals near ?C
    exceeds 100k
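
The decomposition above amounts to a conjunction of checks per candidate city. The following sketch shows that shape only; the `City` record, its field names, and the three placeholder cities with their numbers are invented for illustration and are not real data. In the presentation each value comes from a different source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    population: int
    temperature_c: float       # current ambient temperature
    hospital_beds: int
    host_animals_nearby: int   # anthrax host animals near the city


def is_major(city: City) -> bool:
    """'major US city': population above 1M."""
    return city.population > 1_000_000


def is_vulnerable(city: City) -> bool:
    """All three conditions from the slide must hold."""
    return (city.temperature_c > 0.0
            and city.population / city.hospital_beds > 100
            and city.host_animals_nearby > 100_000)


def answer(cities):
    return [c.name for c in cities if is_major(c) and is_vulnerable(c)]


# Placeholder records with made-up values.
CITIES = [
    City("CityA", 1_500_000, 12.0, 10_000, 250_000),
    City("CityB", 2_000_000, -5.0, 10_000, 250_000),   # below freezing
    City("CityC", 400_000, 20.0, 2_000, 300_000),      # not a major city
]
print(answer(CITIES))
```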

31
 state | name              | type  | county     | state_fips
-------+-------------------+-------+------------+------------
 TX    | Dallas            | ppl   | Dallas     | 48
 MN    | Hennepin County   | civil | Hennepin   | 27
 CA    | Sacramento County | civil | Sacramento | 6
 AZ    | Phoenix           | ppl   | Maricopa   | 4

 primary_lat | primary_long | elevation | population | status
-------------+--------------+-----------+------------+--------------------
 32.78333    | -96.8        | 463       | 1022830    | BGN 1978 1959
 45.01667    | -93.45       | 0         | 1032431    |
 38.46667    | -121.31667   | 0         | 1041219    |
 33.44833    | -112.07333   | 1072      | 1048949    | BGN 1931 1900 1897
The Geographic Names Information System (GNIS) DB
maintained by the US Geological Survey (USGS).
32
  • So how do we explain to our system that
  • row 1 of that table is about the city of
    Dallas, TX
  • the population field of that table contains the
    number of inhabitants of the city that that row
    is about
  • here is exactly how to access tuples of that
    database
  • that access will be fast, accurate, recent,
    complete

33
  • the population field of that table contains the
    number of inhabitants of the city that that row
    is about
  • We provide the field encodings and decodings,
    some of which correspond to explicit fields like
    population, two-letter state codes, etc

(fieldDecoding Usgs-Gnis-LS ?x
  (TheFieldCalled population)
  (numberOfInhabitants
    (TheReferentOfTheRow Usgs-Gnis) ?x))
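
Read procedurally, a field decoding says: take the value found in a named column and assert it, with a given predicate, about whatever the row refers to. A minimal sketch under that reading follows. The functions `referent_of_row` and `decode_population` and the tuple encoding are hypothetical; only the predicate name and the Dallas values come from the slides.

```python
# One row of the table shown earlier, reduced to the fields used here.
DALLAS_ROW = {"state": "TX", "name": "Dallas", "population": 1022830}


def referent_of_row(row: dict) -> tuple:
    """Stand-in for TheReferentOfTheRow: identify the place by name and state."""
    return (row["name"], row["state"])


def decode_population(row: dict) -> tuple:
    """Turn the 'population' field into a numberOfInhabitants assertion."""
    return ("numberOfInhabitants", referent_of_row(row), row["population"])


print(decode_population(DALLAS_ROW))
```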
34
  • how to access tuples of that database
  • We provide all the information needed for a JDBC
    connection script
  • We assert, in the context (MappingMtFn Usgs-KS),
    all of these

(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS Usgs-KS "sksi")
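
Those assertions carry what a JDBC connection needs. The sketch below assembles a PostgreSQL-style JDBC URL from them. It is an illustration only: the dictionary encoding and the function `jdbc_url` are hypothetical, and treating `structuredKnowledgeSourceName` as the database name is an assumption the slide does not confirm.

```python
# The slide's assertions about Usgs-KS, flattened into a dictionary.
SKS_ASSERTIONS = {
    "serverOfSKS": "sksi.cyc.com",
    "portNumberForSKS": 4032,
    "subProtocolForSKS": "postgresql",
    "structuredKnowledgeSourceName": "usgs",  # assumed to be the database name
    "userNameForSKS": "sksi",
    "passwordForSKS": "geografy",
}


def jdbc_url(a: dict) -> str:
    """Build jdbc:<subprotocol>://<host>:<port>/<database>."""
    return (f"jdbc:{a['subProtocolForSKS']}://{a['serverOfSKS']}"
            f":{a['portNumberForSKS']}/{a['structuredKnowledgeSourceName']}")


print(jdbc_url(SKS_ASSERTIONS))
```

The user name and password would be passed separately as connection properties rather than embedded in the URL.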
35
  • that access will be fast, accurate, recent,
    complete
  • We provide meta-level assertions about the
    database, about each table of the database, about
    the completeness etc. of various kinds of data in
    the DB, etc.
  • We assert, in the context (MappingMtFn Usgs-KS)

(schemaCompleteExtentKnownForValueTypeInArg
  Usgs-Gnis-LS USCity numberOfInhabitants 1)
36
  • that access will be fast, accurate, recent,
    complete
  • We provide meta-level assertions about the
    database, about each table of the database, about
    the completeness etc. of various kinds of data in
    the DB, etc.
  • We assert, in the context (MappingMtFn Usgs-KS)

(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
  TheEmptySet
  60.0)
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
          (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
          (PhysicalFieldFn Usgs-Gnis-PS "name"))
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "county")
          (PhysicalFieldFn Usgs-Gnis-PS "state"))
  530.36)
37
What major US cities are particularly
vulnerable to an anthrax attack?
  • "major US city" → U.S. City with >1M population
  • "particularly vulnerable to an anthrax attack" →
  • the current ambient temperature at ?C is above
    freezing, and
  • ?C has more than 100 people for each hospital
    bed, and
  • the number of anthrax host animals near ?C
    exceeds 100k

Cyc knows that pullets are chickens, so don't add
those two numbers together!
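
The pullet remark can be sketched as a subsumption-aware sum. Everything here is illustrative: the tiny `GENLS` table, the function names, and the counts are made up, and the sketch assumes that a count for a general kind already includes its specializations.

```python
# child kind -> more general kind (tiny illustrative fragment)
GENLS = {"Pullet": "Chicken", "Chicken": "Poultry"}


def is_specialization(kind: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `kind` by following GENLS."""
    while kind in GENLS:
        kind = GENLS[kind]
        if kind == ancestor:
            return True
    return False


def total_without_double_counting(counts: dict) -> int:
    """Sum counts, skipping kinds covered by a more general kind in the table."""
    return sum(n for kind, n in counts.items()
               if not any(is_specialization(kind, other) for other in counts))


# Made-up counts for one region.
COUNTS = {"Chicken": 90_000, "Pullet": 30_000, "Cattle": 40_000}
print(total_without_double_counting(COUNTS))  # pullets are not added again
```

A naive sum over the same table would give 160,000 and could push a city over the 100k threshold by mistake.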
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Even simple queries often require 1-4 reasoning
steps
In what countries bordering Pakistan are there
members of the ANVC?
  • Each answer that CAE finds for this generally
    involves a 1-4-step (not 0-step) argument
    (reasoning chain)
  • E.g., for the answer India, the justification
    is:
  • According to the web site Inside Terrorism,
    the ANVC's headquarters has been in Garo Hills,
    India from the beginning of January, 1996 through
    today.
  • If an organization's HQ is in place x, then
    there are members of that organization in place
    x.
  • If someone is in place x, they are in every
    super-region of x.
  • India borders Pakistan.

Don't include Prior Tacit Knowledge
44
The Cyc Knowledge Base
  • Represented in
  • First Order Logic
  • Higher Order Logic
  • Context Logic
  • Micro-theories

Cyc contains:
15,000 Predicates
500,000 Concepts
5,200,000 Assertions
These numbers are not a good way to really get a
handle on the Cyc KB
General Knowledge about Various Domains
Specific data, facts, and observations


45
The Cyc Knowledge Base
Cyc contains:
15,000 Predicates
500,000 Concepts
5,200,000 Assertions
Is any seagull also a moose? If Cyc knows
10,000 kinds of animals, it should be able to
answer 100,000,000 queries like that.
Option 1: Add those 100M assertions to the KB
Option 2: Add 50M disjointWith assertions instead
Option 3: Add about 10k Linnaean taxonomy
assertions to the KB, plus one extra assertion:
(isa BiologicalTaxon SiblingDisjointCollectionType)
If taxons A and B are not explicitly known (via
those 10k assertions) to be in a subset/superset
relationship, then assume that they are disjoint.
These numbers are not a good way to really get a
handle on the Cyc KB
A few hundred such SiblingDisjoint assertions
take the place of over 6 billion disjointness
ones,
which in turn take the place of 100 trillion ones
like this: (not (isa Cher Moose))
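
Option 3 can be sketched as a default rule over a taxonomy: two taxa are taken to be disjoint unless one is known to subsume the other. The four-entry `GENLS` table and the functions `ancestors` and `disjoint` are illustrative assumptions, not Cyc's implementation. The last lines check the slide's counts for 10,000 kinds.

```python
# taxon -> parent taxon (tiny illustrative fragment of a taxonomy)
GENLS = {
    "Seagull": "Bird", "Bird": "Animal",
    "Moose": "Mammal", "Mammal": "Animal",
}


def ancestors(taxon: str) -> set:
    """All taxa reachable by following parent links upward."""
    seen = set()
    while taxon in GENLS:
        taxon = GENLS[taxon]
        seen.add(taxon)
    return seen


def disjoint(a: str, b: str) -> bool:
    """Sibling-disjoint default: disjoint unless one subsumes the other."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)


print(disjoint("Seagull", "Moose"), disjoint("Seagull", "Bird"))

# Counting check for 10,000 kinds of animals.
ordered_pairs = 10_000 ** 2             # "is any X also a Y?" queries
unordered_pairs = 10_000 * 9_999 // 2   # symmetric disjointWith pairs
print(ordered_pairs, unordered_pairs)
```

The number of stored links grows with the number of taxa, while the number of answerable disjointness questions grows with its square.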
46
There is no one correct monolithic ontology.
E.g., Cyc's 5M axioms are divided into thousands
of contexts by granularity, topic, culture,
geospatial place, time, ...
There is a correct monolithic reasoning
mechanism, but it is so deadly slow that we never
call on it unless we have to
E.g., the Cyc inference engine is a community of
1000 agents that attack every problem and,
recursively, every subproblem (subgoal). One
of these 1000 is a general theorem prover; the
others have special-purpose data
structures/algorithms to handle the most
important, most common cases, very fast.
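
The division of labor described above can be sketched as a dispatcher: fast special-purpose modules get the first chance at a query, and the slow general prover is called only if none of them applies. This is a simplified, sequential illustration. The module names, the tuple query format, and the lookup that stands in for a theorem prover are all assumptions; the slide describes a community of about 1000 agents, not this code.

```python
from typing import Optional

# Tiny fact set standing in for the knowledge base (made up).
FACTS = {("isa", "Fido", "Dog")}


def equality_module(query: tuple) -> Optional[bool]:
    """Fast special case: (equals a b) on literal values."""
    if query[0] == "equals":
        return query[1] == query[2]
    return None  # not applicable


def arithmetic_module(query: tuple) -> Optional[bool]:
    """Fast special case: (greaterThan x y) on numbers."""
    if query[0] == "greaterThan" and all(isinstance(x, (int, float)) for x in query[1:]):
        return query[1] > query[2]
    return None  # not applicable


def general_prover(query: tuple) -> bool:
    """Stand-in for the slow general mechanism: here, a plain lookup."""
    return query in FACTS


SPECIALISTS = [equality_module, arithmetic_module]


def ask(query: tuple):
    """Return (answer, name of the module that produced it)."""
    for module in SPECIALISTS:
        answer = module(query)
        if answer is not None:
            return answer, module.__name__
    return general_prover(query), "general_prover"


print(ask(("greaterThan", 5, 3)))
print(ask(("isa", "Fido", "Dog")))
```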
47
What factors argue <for/against> the conclusion
that <ETA> <performed> <the March 2004 Madrid
attacks>?
48
Building Cyc qua Engineering Task
[Timeline figure. Labels: 1984; 2004; today; "codify and
enter each piece of knowledge, by hand"; "learning via
natural language"; "learning by discovery"]
CYC: 900 person-years, 23 realtime years, 90 million
49
(No Transcript)
50
Temporal Relations
37 Relations Between Temporal Things
  • temporalBoundsContain
  • temporalBoundsIdentical
  • startsDuring
  • overlapsStart
  • startingPoint
  • simultaneousWith
  • after
  • temporalBoundsIntersect
  • temporallyIntersects
  • startsAfterStartingOf
  • endsAfterEndingOf
  • startingDate
  • temporallyContains
  • temporallyCooriginating

51
Temporal Relations
Ariel Sharon was in Jerusalem during 2005 with
granularity calendar-week
Condoleezza Rice made a ten-day trip to
Jerusalem in February of 2005
52
  • Rather than struggling to reason in natural
    language sentences, use logic as the
    representation language.
  • Most knowledge is default: reason by
    argumentation
  • Rather than striving in vain for a single fast
    inference engine, use a suite of 1000 heuristic
    modules that each handles a class of
    commonly-occurring problems very fast. EL/HL
    split
  • Some of these HL modules act as tacticians
    (meta-reasoners) to guide the reasoning; a few
    are strategists (meta-meta-reasoners)
  • Bridging the knowledge gap: do the intermediate
    theories.
  • Probabilities / certainty factors are useful
    (risk: overdependence)
  • Rather than striving in vain for a monolithic
    consistent KB, divide the KB up into many
    locally-consistent contexts

53
Each assertion should be situated in a context
in a region of context-space
  • We identified 12 dimensions of mt-space
  • We developed a vocabulary of predicates and terms
    to describe points and regions along each of
    those 12 dimensions and
  • We have been situating assertions more and more
    precisely, and we have been working out calculi
    for inferring contexts
  • E.g., if P is true in C1, and P => Q is true in C2,
    in what context C3 can Q be validly concluded?
  • Anthropacity
  • Time
  • GeoLocation
  • TypeOfPlace
  • TypeOfTime
  • Culture
  • Sophistication/Security
  • Topic
  • Granularity
  • Modality/Disposition /Epistemology
  • Argument-Preference
  • Justification

54
Mathematical Factoring of Context-space Dimensions
There are at least 900,000 doctors.
This inference depends on the time, space,
and respective granularities of the contexts.
LehighCountyInFebruary1985Context Dick
Thornburgh is governor and Ronald Reagan is
president.
Dick Thornburgh is governor and there
are at least 900,000 doctors.
55
Time Indices and Granularities
Doug is talking, at 1400-1500, on 4 May 2009.
56
Time Indices and Granularities
Doug is talking, at 1400 to 1500, on 4 May
2009, with temporal granularity 1 calendar minute
P: Doug is talking.
[Timeline figure in Calendar Minutes, running from Past
to Future. One marked interval t is that two-hour
interval; another marked interval t is a continuous
15-min. sub-interval]
So: Talking during each 15-minute interval?
Yes. Talking during each 2-second interval?
Unknown.
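
One simplified reading of that slide: an assertion made at a given granularity licenses conclusions about sub-intervals at least that long, and says nothing about shorter ones. The function below encodes only that reading. Its name, its signature, and the use of `None` for "unknown" are assumptions, and it presumes the sub-interval lies inside the asserted span.

```python
from typing import Optional


def holds_throughout(sub_interval_s: float, granularity_s: float) -> Optional[bool]:
    """True if the sub-interval is at least as long as the granularity.

    Returns None (unknown) for anything finer than the granularity:
    the speaker may pause for a few seconds within a minute of talking.
    """
    if sub_interval_s >= granularity_s:
        return True
    return None


ONE_CALENDAR_MINUTE = 60
print(holds_throughout(15 * 60, ONE_CALENDAR_MINUTE))  # each 15-minute interval
print(holds_throughout(2, ONE_CALENDAR_MINUTE))        # each 2-second interval
```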
57
Relations Between an Event and its Participants
Over 400 more.
58
"In" in Our Geospatial Ontology
  • We started in 1984 with just one binary
    predicate, "in".
  • in(X,Y) means the inner object X is spatially
    located in the region defined by the outer object
    Y.
  • If I just tell you in(X,Y), and you aren't told
    what X and Y are, then you (and Cyc) can't answer
    questions like these:
  • From the outside of Y, can I see any part of X?
  • If I turn Y over and shake it, will X fall out?
  • Is there room to put more things in Y?
  • Is X actually a part of Y?
  • Such failures led to our introducing new, more
    precise, more specialized versions of "in". By
    now there are over 75 such predicates, organized
    in a graphical taxonomy.

59
Propositional Attitudes: Relations Between Agents
and Propositions
  • goals
  • intends
  • desires
  • hopes
  • expects
  • believes
  • opinesThat
  • knowsThat
  • remembersThat
  • perceivesThat
  • seesThat
  • fearsThat

Most of these are modal; assertions using them go
beyond 1st-order logic
60
Handcrafted Cyc KB
  • Represented in
  • First Order Logic
  • Higher Order Logic
  • Context Logic
  • Microtheories

Cyc contains:
15,000 Predicates
500,000 Concepts
5,200,000 Assertions
The pump has been primed. Use it as an inductive
bias to power more automatic knowledge acquisition
Real World Domain Knowledge
Specific cases, facts, details,
61
AKA by Shallow Fishing
Automated Knowledge Acquisition
  • Abu Sayyaf was founded in ___
  • Al Harakat Islamiya, established in ___
  • ASG was established in ___

Search Strings
(foundingDate AbuSayyaf ?X)
"Abu Sayyaf was founded in the early 1990s"
→ Parse: (foundingDate AbuSayyaf (EarlyPartFn
(DecadeFn 199)))
62
AKA by Shallow Fishing
Automated Knowledge Acquisition
  • The height of the Eiffel Tower is ___
  • The Eiffel Tower is ___ tall

Search Strings
(height EiffelTower ?x)
"The height of the Eiffel Tower is 36 feet"
"The height of the Eiffel Tower is 984 feet"
→ Parse: (height EiffelTower (Foot 36))
(height EiffelTower (Foot 984))
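
The fishing loop above can be sketched as: turn a query into fill-in-the-blank search strings, then parse whatever fills the blank back into an assertion. The functions `search_strings` and `parse_heights`, the regular expression, and the string form of the output are illustrative assumptions; the sample text is the pair of snippets from the slide.

```python
import re


def search_strings(phrase: str) -> list:
    """Fill-in-the-blank strings to send to a search engine."""
    return [f"The height of the {phrase} is ___",
            f"The {phrase} is ___ tall"]


def parse_heights(text: str, phrase: str, term: str) -> list:
    """Extract every '<n> feet' value and emit a candidate assertion for it."""
    pattern = re.compile(
        rf"The height of the {re.escape(phrase)} is (\d+) feet")
    return [f"(height {term} (Foot {n}))" for n in pattern.findall(text)]


SNIPPETS = ("The height of the Eiffel Tower is 36 feet "
            "The height of the Eiffel Tower is 984 feet")
print(search_strings("Eiffel Tower"))
print(parse_heights(SNIPPETS, "Eiffel Tower", "EiffelTower"))
```

Both candidates parse cleanly, so choosing between 36 and 984 feet has to come from somewhere other than the parser, such as corroboration across sources or background knowledge about towers.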
63
WWW.CYC.COM
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
Recent/Future AKB Directions
CYC
  • Make it comprehensive (13 → 100); apply it to
    other domains
  • Make it easier for SMEs to enter/vet/modify
    info.
  • Improve the automatic acquisition (parsing / fishing
    from unstructured texts; SKSI to structured
    sources, incl. SPARQL)
  • Make it easier for end users to pose questions
  • Automatically select (a small superset of) the
    relevant fragments
  • Use semantic constraints (argIsa, disjointness,
    domain knowledge) to combine the
    relevant fragments into a meaningful logical
    query
  • Make justifications more terse and more
    compelling
  • Speed up inference (in general and for AKB entry
    and AKB query-answering)
  • Graceful degradation: halfway between QA and
    Google, falling back on semantic search of
    automatically tagged documents (tagged with Cyc
    terms)

68
Developing a Cyc App.
  • Extend Cyc's KB
  • Augment its ontology
  • New assertions involving those new terms
  • New Heuristic Level modules
  • Identify the need(s) for them
  • Design, build, and debug them
  • New interface modules
  • For manual entry; for SKSI mapping; for end users
  • Domain-specific interfaces (e.g., sketching
    military unit movements; drawing chemical
    formulae; etc.)

69
OpenCyc: Open Source release of most of the
Cyc Ontology, Simple Relns., Inference Engine
ResearchCyc: Almost All of Cyc (for free for R&D
purposes)
70
The Ontology
Pre-existing general medical knowledge
framework: Prior to the CCF project, Cyc's KB
had 184 specializations of MedicalCareEvent

MedicalCareEvent Ablation Ligation
CoronaryArteryBypassGraft Biopsy-SurgicalProcedure
TrephiningSomeone Prostatectomy RoboticSurgery
OutpatientSurgery InpatientSurgery
LiposuctionSurgery RemovalOfUniqueBodyPart
Appendectomy
Tonsillectomy GumSurgery SurgicalTreatment
TransplantSurgery HeartTransplantSurgery
GeneralSurgery MajorSurgery OpenHeartSurgery
RootCanalSurgery VaccinationEvent
BoosterVaccinationEvent
AnthraxMilitaryVaccinationScript MedicalTesting
71
The Ontology
Pre-existing general medical knowledge
framework: Prior to the CCF project, Cyc's KB had
350 specializations of AilmentCondition

AttentionDeficitDisorder Glaucoma SpinalStenosis
SleepDeprivation Ache-AilmentCondition Migraine
Hemorrhaging-TheCondition Jaundice
ParasiticAilment BacillaryAngiomatosis
Cryptosporidiosis Rickettsiosis
EpidemicTyphus-NAmerica ArthropodInfestation
ExternalArthropodInfestation
InternalArthropodInfestation
Trichinosis Schistosomiasis Ascariasis
BladderFlukeInfestation
Atherosclerosis MultiplePersonalityDisorder
Adenomyosis Scabies AmyotrophicLateralSclerosis
Scoliosis Hypoglycemia
TemproMandibularJointSyndrome
AcetylcholinePoisoning CadmiumPoisoning
CarbonMonoxidePoisoning FoodborneBotulism
InhalationalBotulism WoundBotulism InfantBotulism
Endometriosis Neuralgia Sciatica Diverticulitis
Gout MacularDegeneration
72
The Ontology
Pre-existing general medical knowledge
framework: Prior to the CCF project, Cyc's KB had
200 specializations of Bacterium

StreptococcusPneumoniae StreptococcusPyogenes
Bacillaceae-Family Bacillus-Genus
BacillusCereus-Species Monotrichous
Bacterium-Monotrichous Peritrichous
Bacterium-Peritrichous Amphitrichous
Bacterium-Amphitrichous Tenericutes-Division
Mollicutes-Class Anaeroplasmataceae-Family
Asteroplasma-Genus Acholeplasmatales-Order
Acholeplasmataceae-Family Acholeplasma-Genus
Phytoplasma-Genus Eperythrozoon-Genus
Mycoplasmatales-Order Mycoplasmataceae-Family
Mycoplasma-Genus MycoplasmaPneumoniae-Species
Spirillales-Order Vibrionaceae-Family
Vibrio-Genus VibrioCholerae-Species
73
The Ontology
Hundreds of pre-existing relevant
relationships
Medical domain specific relations:
infectionCausedByOrganism infectingPathogen
patientTreated deviceTypeTreatsConditionType
causeOfDeathTypeOfType formOfDisease
ailmentTypeAffects ailmentEpidemicType
ailmentAcquiredBy ailmentTypicallyAcquiredBy
indicatedDrug mortalityRiskForCondition
survivalRate riskOfInfectionFromTypeToType

General Role Predicates:
objectActedOn eventOccursAt dateOfEvent
objectPlaced objectRemoved deviceUsed
74
The Ontology
  • Methodology
  • Establish bridging (translation) rules
  • Define rules that allow users to associate
    patients, dates, locations, etc. with the various
    events, e.g., define patientTreated as a
    relationship between a medical event and a
    patient.
  • Define rules that allow users to easily express
    complicated logical conditions, e.g., the
    defining rules for PrimarySurgery,
    isolatedProcedureOfType, concomitantProcedures,
    etc.
  • Define concise vocabulary for constructions that
    are complicated or difficult to express, e.g.,
    aortic valve replacement is represented as a
    single non-atomic term. This allows the user to
    specify this very common procedure with a single
    fragment instead of three distinct fragments in
    the CCF ontology (which in turn came about due to
    there not being an explicit functional term
    composition construct in the CCF representation).

75
Typical Query for outcomes study
  • The examples in this presentation were short,
    simple, Medical English queries; the ones being
    focused on while building the application, and
    now that it is actually being used at CCF, are
    much larger ones, e.g.
  • IDENTIFY PATIENT POPULATION
  • FIND all native aortic valve replacements
    performed at CCF between January 1, 2000 and
    December 31, 2004 with a pre-operative diagnosis,
    as determined by echocardiogram, of moderately
    severe or severe aortic stenosis and moderate to
    severe left ventricular impairment.
  • INCLUDE operations in which concomitant primary
    CABG or concomitant mitral or tricuspid valve
    repair was performed.
  • EXCLUDE all patients with any prior valve repair
    or replacement or with concomitant pulmonary
    valve repair or with concomitant mitral,
    tricuspid, or pulmonary valve replacement or
    with aortic regurgitation greater than moderate
    degree.

76
Researchers and clinicians sometimes ask the same
queries
  • Are there cases in the last decade where
    patients had pericardial aortic valves inserted
    in the reverse position, to serve as mitral valve
    replacements, and how often in such cases did
    endocarditis or tricuspid valve infection
    develop, and how long after the procedure?

77

Applying (i.e., Using) Cyc
  • Get a large set of use-cases (CCF task: the last
    900 queries)
  • Arrange them into maximally mutually-dissimilar
    classes
  • Manually represent a couple from each of those
    buckets
  • Reveals most of the necessary new predicates
    (interfaces)
  • Now go through each of the use-cases, trolling
    for new domain-specific terms to add to the
    ontology
  • Can be done manually, but we are beginning to
    rely more on semi-automatic methods where the
    system itself helps with that process
  • As appropriate, lexify the terms and/or align
    them to existing standards
  • Run exemplars from each bucket (i.e., to
    completion)
  • Tracer bullets to reveal necessary new rules,
    reasoning modules (interfaces)
  • Replace the largest bucket by 2-4 specializations
    and recur (i.e., repeat the preceding 3 steps, and
    this one, again) until there is no new gain

78

Applying (i.e., Using) Cyc
  • Test the system on previously-unseen use-cases
    (or at least ones which were not among those
    previously-selected from their bucket)
  • Have users try to use the system, and watch them
    (their results, of course, but also to the extent
    possible their time-feature trajectory)
  • Which features did they rarely or never use (to
    good effect)?
  • Which features did they make heavy use of?
  • Independent of this, ask them for their feedback
    and suggestions
  • Try to identify classes of users which will
    translate into classes of documentation and
    training materials/regimes/interface specifics
  • All along, identify what elements of the ontology
    (if any) are proprietary, and assimilate
    everything else into future versions of OpenCyc
    and ResearchCyc

79
(No Transcript)
80
(implies
  (and (cCFhasLeftAtriumDiameter ?EVT ?D)
       (greaterThan ?D ((Centi Meter) 3.8))
       (patientTreated ?EVT ?PAT)
       (patientSex ?PAT FemaleHuman)
       (rdf-type ?EVT ?TYPE)
       (genls ?TYPE CCF-Evaluation))
  (isa ?EVT EvaluationThatIndicates-LeftAtrialEnlargement))
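
To make the rule's reading concrete, here is a small Python sketch that applies the same conditions to a hand-made fact base. It is only an illustration and not how the Cyc inference engine works. The event, patient, and type identifiers (`Echo-001`, `Patient-A`, `CCF-Echocardiogram`, `MaleHuman`) and the function names are invented. The sketch also checks only directly asserted genls links and treats the diameter as a plain number in centimeters.

```python
# Hypothetical facts as (predicate, arg1, arg2) triples. The predicate names and
# FemaleHuman / CCF-Evaluation come from the rule; everything else is invented.
FACTS = {
    ("cCFhasLeftAtriumDiameter", "Echo-001", 4.2),  # diameters in centimeters
    ("cCFhasLeftAtriumDiameter", "Echo-002", 3.5),
    ("cCFhasLeftAtriumDiameter", "Echo-003", 4.5),
    ("patientTreated", "Echo-001", "Patient-A"),
    ("patientTreated", "Echo-002", "Patient-A"),
    ("patientTreated", "Echo-003", "Patient-B"),
    ("patientSex", "Patient-A", "FemaleHuman"),
    ("patientSex", "Patient-B", "MaleHuman"),
    ("rdf-type", "Echo-001", "CCF-Echocardiogram"),
    ("rdf-type", "Echo-002", "CCF-Echocardiogram"),
    ("rdf-type", "Echo-003", "CCF-Echocardiogram"),
    ("genls", "CCF-Echocardiogram", "CCF-Evaluation"),
}

def lookup(facts, predicate, arg1):
    """Return every second argument asserted for (predicate arg1 ?X)."""
    return [f[2] for f in facts if f[0] == predicate and f[1] == arg1]

def events_indicating_left_atrial_enlargement(facts, threshold_cm=3.8):
    """Return the events that satisfy every conjunct of the rule's antecedent."""
    matches = set()
    for predicate, event, diameter in facts:
        if predicate != "cCFhasLeftAtriumDiameter" or not diameter > threshold_cm:
            continue
        female_patient = any(
            "FemaleHuman" in lookup(facts, "patientSex", patient)
            for patient in lookup(facts, "patientTreated", event)
        )
        is_evaluation = any(
            "CCF-Evaluation" in lookup(facts, "genls", event_type)
            for event_type in lookup(facts, "rdf-type", event)
        )
        if female_patient and is_evaluation:
            matches.add(event)
    return matches
```

With these facts, only `Echo-001` satisfies every conjunct: `Echo-002` is below the 3.8 cm threshold and `Echo-003` belongs to a patient who is not asserted to be female.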

81
1784 pieces of pre-existing (prior to this project) Cyc KB knowledge used while handling a typical query. E.g.:
Inferred Disjointness constraints:
  (disjointWith PericardialWindow-SurgicalProcedure MedicalPatient)
Justification (we are counting each of these assertions in the total):
  • (genls PericardialWindow-SurgicalProcedure PericardialProcedure-Surgical) in UniversalVocabularyMt
  • (genls PericardialProcedure-Surgical CardiacProcedure-Surgical) in UniversalVocabularyMt
  • (genls CardiacProcedure-Surgical SurgicalProcedure) in UniversalVocabularyMt
  • (genls SurgicalProcedure MedicalCareEvent) in BaseKB
  • (genls MedicalCareEvent PhysicalSituation) in BaseKB
  • (genls PhysicalSituation Situation-Localized) in UniversalVocabularyMt
  • (genls Situation-Localized Situation) in UniversalVocabularyMt
  • (disjointWith SpatialThing-NonSituational Situation) in BaseKB
  • (genls EnduringThing-Localized SpatialThing-NonSituational) in UniversalVocabularyMt
  • (genls Agent-NonGeographical EnduringThing-Localized) in UniversalVocabularyMt
  • (genls EmbodiedAgent Agent-NonGeographical) in UniversalVocabularyMt
  • (genls PerceptualAgent-Embodied EmbodiedAgent) in UniversalVocabularyMt
  • (genls Animal PerceptualAgent-Embodied) in UniversalVocabularyMt
  • (genls MedicalPatient Animal) in UniversalVocabularyMt
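
The justification chain above can be replayed mechanically: follow genls links upward from each collection, then check whether any asserted disjointWith pair separates the two sets of generalizations. The sketch below does that in Python using exactly the assertions listed on this slide. It is a simplified illustration, not Cyc's inference engine: the function names are invented, and the microtheories (UniversalVocabularyMt, BaseKB) are ignored.

```python
from collections import deque

# genls assertions from the slide, as (specialization, generalization) pairs.
GENLS = [
    ("PericardialWindow-SurgicalProcedure", "PericardialProcedure-Surgical"),
    ("PericardialProcedure-Surgical", "CardiacProcedure-Surgical"),
    ("CardiacProcedure-Surgical", "SurgicalProcedure"),
    ("SurgicalProcedure", "MedicalCareEvent"),
    ("MedicalCareEvent", "PhysicalSituation"),
    ("PhysicalSituation", "Situation-Localized"),
    ("Situation-Localized", "Situation"),
    ("EnduringThing-Localized", "SpatialThing-NonSituational"),
    ("Agent-NonGeographical", "EnduringThing-Localized"),
    ("EmbodiedAgent", "Agent-NonGeographical"),
    ("PerceptualAgent-Embodied", "EmbodiedAgent"),
    ("Animal", "PerceptualAgent-Embodied"),
    ("MedicalPatient", "Animal"),
]

# The single directly asserted disjointness constraint from the slide.
DISJOINT = [("SpatialThing-NonSituational", "Situation")]

def generalizations(term):
    """Return the term itself plus everything reachable via genls links."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for spec, gen in GENLS:
            if spec == current and gen not in seen:
                seen.add(gen)
                queue.append(gen)
    return seen

def inferred_disjoint(a, b):
    """True if some generalization of a is asserted disjoint with one of b."""
    ups_a, ups_b = generalizations(a), generalizations(b)
    return any(
        (x in ups_a and y in ups_b) or (x in ups_b and y in ups_a)
        for x, y in DISJOINT
    )

print(inferred_disjoint("PericardialWindow-SurgicalProcedure", "MedicalPatient"))
```

The single inferred constraint rests on 14 stored assertions (13 genls links and 1 disjointWith), which is how one query comes to draw on so many pieces of pre-existing knowledge.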
82
Ideas for NLM Grand Challenges
  • Comprehensive Ontology of Medicine
  • Ties to terminological standards (Snomed, ICD),
    lexical ones (WordNet), conceptual ones (Cyc)
  • Knowledge about/involving the concepts
  • Contextualized for time, source, level of
    detail, etc.
  • Sample sub-project: multicultural
    English-to-English translation
  • English-to-English translation
  • Using the above ontology of medicine, and models
    of discourse, models of classes of users (by age,
    occupation, etc.), models of individual users
    (built up over time and stored HIPAA-securely)
  • Translate articles, web pages, medicine bottle
    labels, etc. into comprehensible form for that
    user
  • In some cases this means literally writing more
    text (expanding its length) or paring it down
    (eliminating what the user already knows)
  • In less clear cases (where the user might or
    might not already know some piece of
    information), the best way to expand the original
    text might be to add footnotes containing the
    borderline information, and to pare down the
    original text by relegating borderline material
    to footnote form
  • The translations needn't just be static: they can
    sync with the user's calendars, cell phones,
    computers, etc., to provide reminders,
    proactively send them relevant news articles or
    new warnings, and so on
  • Automated Clinical/Biomedical Discovery
  • Hypothesis formation, Experiment design, Data
    gathering, Analysis, New terms/hypotheses