Feasting on Brains


1
Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again
A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences
Mark Wilkinson (markw_at_illuminae.com)
Assistant Professor, Medical Genetics, University of British Columbia
Heart and Lung Research Institute at St. Paul's Hospital
2
Benjamin Good (He's a Creep!)
3
Approach
Bioinformatics is a broad field and suffers SEVERE interoperability problems
Bioinformaticians tend to be specialists in a particular domain of computational analysis
As a group, the brains of all bioinformaticians contain all (known) bioinformatics
Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?
4
Human Computation (Luis von Ahn)
5
Ontology Spectrum
From least to most formal:
Catalog/ID
Terms/glossary
Thesauri ("narrower term" relation)
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, ...)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
6
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Primate, Lemur, Human, Zombie, Hair, Brains, Chips, Shoots, linked by is_a, has and eats relations]
Classes, instances, properties, relationships
7
Classes
Animal
Mammal
Hair
Primate
Lemur
Human
Zombie
Brains
Chips
Shoots
8
instances
9
Properties
has
is_a
eats
10
relations
has
is_a
eats
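The classes and relations listed on the preceding slides can be written down as subject-relation-object triples. The following is only a minimal sketch: the specific edges (which class links to which) are assumptions made for illustration, since the arrows of the original diagram are not recoverable from the transcript, and the helper names `ancestors` and `inherited` are hypothetical.

```python
# Toy ontology as (subject, relation, object) triples.
# The edges are illustrative assumptions, not read off the original diagram.
TRIPLES = [
    ("Mammal", "is_a", "Animal"),
    ("Primate", "is_a", "Mammal"),
    ("Lemur", "is_a", "Primate"),
    ("Human", "is_a", "Primate"),
    ("Mammal", "has", "Hair"),
    ("Zombie", "eats", "Brains"),
]


def ancestors(cls, triples=TRIPLES):
    """Return every class reachable from cls by following is_a links."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in triples:
            if subj == current and rel == "is_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def inherited(cls, relation, triples=TRIPLES):
    """Collect objects of `relation` asserted on cls or on any of its ancestors."""
    lineage = [cls] + ancestors(cls, triples)
    return [o for s, r, o in triples if r == relation and s in lineage]
```

Under these assumed edges, `inherited("Human", "has")` finds Hair because Human is_a Primate is_a Mammal, which is the kind of inference that formal is_a links make possible.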
11
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Primate, Lemur, Human, Zombie, Hair, Brains, Chips, Shoots, linked by is_a, has and eats relations]
Classes, instances, properties, relationships
12
Web Service?

A software tool that is accessible over the Web
Web Services are intended to be accessed by
machines, not people.

13
Interoperability?

The ability of two Web Services to exchange
information, and use that information correctly
This generally requires Semantics in the form of
Ontologies

14
Mmmm... Brains!!

BioMoby
Eating brains to enable Web Service
Interoperability

15
What does BioMoby do?
16

The BioMoby Plan

Create an ontology of bioinformatics data-types
Define an ontology of bioinformatics operations
Open these ontologies for community input
Define Web Services in terms of these two ontologies
A Machine can find an appropriate service
A Machine can execute that service unattended
Ontology is community-extensible
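How a machine might find an appropriate service from these two ontologies can be sketched as follows. This is a toy illustration, not the BioMoby API: the data-type names, service names and the functions `is_kind_of` and `find_services` are all hypothetical.

```python
# Hypothetical data-type ontology: child type -> parent type.
DATATYPE_PARENT = {
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": "Object",
}

# Hypothetical registry: each service declares the type it consumes,
# the type it produces, and the operation it performs.
SERVICES = [
    {"name": "runBlast", "consumes": "Sequence",
     "produces": "BlastHit", "operation": "Alignment"},
    {"name": "translate", "consumes": "DNASequence",
     "produces": "ProteinSequence", "operation": "Translation"},
]


def is_kind_of(datatype, target):
    """True if datatype equals target or inherits from it in the ontology."""
    while datatype is not None:
        if datatype == target:
            return True
        datatype = DATATYPE_PARENT.get(datatype)
    return False


def find_services(datatype_in_hand):
    """Names of services whose declared input accepts the data in hand."""
    return [s["name"] for s in SERVICES
            if is_kind_of(datatype_in_hand, s["consumes"])]
```

Because service inputs are declared against the shared ontology, a service registered for the general type Sequence is discovered for a DNASequence as well, without the two providers coordinating.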
17
Overview of BioMoby Semantic Interoperability
18
Why couldn't we do this before?
19
Interoperability is HARD!
20
Interoperability through Human Computation

BioMoby Data Type Ontology: an explicit list of all biological data-types, and the relationships between them.
Ontology built, brain by brain, by informaticians!
We achieve interoperability simply because informaticians donate their brain-power
HUMAN COMPUTATION

21
A portion of the BioMoby Ontology built from
the brains of the community!
22
so what can I do with it?
39
Analytical workflow Discovery

No explicit coordination between providers
Run-time discovery of appropriate tools
Automated execution of those tools
The machine understands the data you have
in-hand, and assists you in choosing the next
step in your analysis.

40
Interoperability through Human Computation

Individuals contributed their knowledge about
bioinformatics data-types to a central ontology
Their combined knowledge enabled the construction
of an interoperable framework

41
Who uses BioMoby?

42
Usage Statistics

15 Nations
60 independent institutions
1600 interoperable Bioinformatics Resources
500,000 requests for brokering each month

43
What have we learned?

We can consume
the brains of a large community
to generate something complex, yet organized

44
Open Kimono

The BioMoby ontology is actually quite messy
Communal brains can build useful ontologies, but the problem is...

45
Ontologies are HARD!
46
How are ontologies usually constructed?
47
By small, hard-working, dedicated groups with
lots of money!

Gene Ontology code
Curated: 5 full-time staff
$25 Million (Lewis, S., personal communication)
NCI Metathesaurus code
Curated: 12 full-time staff
$15 Million (Peter K., estimate)
Health Level 7 (HL7)
Curated
Lots! Some claim as much as $15 Billion (Smith, Barry, KBB Workshop, Montreal, 2005)

To build the global Semantic Web for Systems Biology we need to encode knowledge from EVERY domain of biology, from barley root apex structure and function to HIV clinical-trials outcomes, and this knowledge is constantly changing!
At $15M each, can we afford the Semantic Web???

49
Mmmm... Need MORE Brains!!

iCAPTURer
experiment

50
Dr. Bruce McManus with a human heart in his hands
He knows his hearts, but he doesn't know how to build an ontology
51
What we need
52
The Problem
53
The Solution?
54
The Solution?
55
So how do we do it?
56
Remember what we learned from Moby: communities CAN build ontologies!
57
Building Systems Biology Ontologies through Human Computation
58
iCAPTURer

Benjamin Good
Ph.D. Student, UBC Bioinformatics
Genome BC Better Biomarkers in Transplantation project, St. Paul's Hospital iCAPTURE Centre

59
Old Way

KE (knowledge engineer) drills the brain of one or a very few experts.
Painful, expensive, and time-consuming

60
New Way? the iCAPTURer

KE creates a clever interface
No direct interaction with expert
Thousands of experts
Cheap Cheap Cheap!

61
iCAPTURer 1.0

Go to a scientific conference
Text-mine conference abstracts
Auto-extract concepts
Put concepts into a series of question templates
A web interface presents questions about these concepts to conference attendees
Give points for every question they answer
Give a prize to the highest point winner
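The template-and-points loop described above can be sketched in a few lines. This is an illustrative reconstruction, not the iCAPTURer code: the two templates follow the question forms reported in the talk, while the function names, the answer-log layout and the one-point-per-answer scoring are assumptions.

```python
from collections import Counter

# Templates follow question forms reported in the talk; everything else is invented.
TEMPLATES = [
    "Is {concept} a meaningful term?",
    "What is a synonym for {concept}?",
]


def make_questions(concepts):
    """Fill every template with every extracted concept."""
    return [t.format(concept=c) for c in concepts for t in TEMPLATES]


def leaderboard(answer_log):
    """Award one point per answered question; highest score first.

    answer_log is a list of (player, question, answer) tuples.
    """
    points = Counter(player for player, _question, _answer in answer_log)
    return points.most_common()
```

The prize goes to the first entry of `leaderboard(...)`; how ties are broken is left open here.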

62
Results

Is _____ a meaningful term?
Yes, No, I don't know buttons
What is a synonym for ______
Text entry box
Where does _____ fit in the following tree of
related terms?
Clickable tree

63
Observations

Yes/No questions work well
Text entry is less effective
Adding to a tree is a disaster!
Competition is a great motivator for human computation!

64
COST?

69
iCAPTURer 1.5
70
Start with hypothetical concept tree
Put concept-concept relations into a series of true/false questions
Make a web interface to ask questions
If a relationship is false, then re-start at the root of the concept tree
Give points for every question they answer
Give a prize to the highest point winner
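One possible reading of the "re-start at the root" step is sketched below. The transcript does not spell out the procedure, so this is an assumption: each hypothesised concept-parent link is put to the expert as a true/false question, and a rejected concept is re-placed by walking down from the root along branches the expert confirms. The names `children_of`, `place`, `review` and the `ask` callback are hypothetical.

```python
def children_of(tree, parent):
    """Concepts whose hypothesised parent is `parent`."""
    return [c for c, p in tree.items() if p == parent]


def place(concept, tree, root, ask):
    """Walk down from the root, following branches the expert confirms."""
    current = root
    moved = True
    while moved:
        moved = False
        for child in children_of(tree, current):
            if child != concept and ask(concept, child):
                current = child
                moved = True
                break
    return current


def review(tree, root, ask):
    """Check each hypothesised concept-parent link; re-place rejected ones.

    tree maps concept -> hypothesised parent; ask(concept, candidate)
    returns True when the expert agrees that concept is a kind of candidate.
    """
    revised = dict(tree)
    for concept, parent in tree.items():
        if not ask(concept, parent):
            revised[concept] = place(concept, tree, root, ask)
    return revised
```

Every call to `ask` corresponds to one true/false question shown to a participant, which keeps the interaction within the Yes/No format that worked well in iCAPTURer 1.0.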
71
Chatterbot

I've heard that a cardiac myocyte is a type of cardiac cell. Is this true?
I've heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?
How do you feel about your mother?

72
Results

Knowledge capture in 3 days
11,000 Concepts

73
COST
0
74
Full details of this experiment are available
in Proceedings of the Pacific Symposium on
Biocomputing, 2006
75
Ontology Quality?
76
Potential Ontology Evaluation Metrics

Domain independent
philosophical desiderata: Manual, subjective
graphical structure: Auto, questionable value
satisfiability: Auto, useful, not enough
Domain specific
Fit to text: Auto, dependent on NLP
Similarity to a gold standard: Auto/Manual, gold standard must exist!
Task-based: Optimal! Auto/Manual, but not generalizable
77
Good???
78
What do we mean by Good?
"Ontology construction is motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances" - Barry Smith
Reality should be the benchmark for the goodness of an ontology
79
ontology evaluation based on referents in
reality
80
Chosen Philosophical Principle: Epistemology Precedes Ontology

A Class should refer to an invariant pattern of
properties common among all its instances
Mammals have mammary glands and hair
Humans are an instance of the class Mammal
Therefore
If class-instances are mapped into an ontology
Each instance has properties or qualities
These properties or qualities SHOULD segregate
into different classes if the ontology is any good

81
Philosophical Desiderata

Non-vagueness
at least one instance can exist with the Class pattern
Vague class: mammalian cell wall
Non-ambiguity
no more than one common pattern per Class
Ambiguous class: cell (e.g. cell phone, jail cell)
Non-redundancy
within the same level of granularity, no other class refers to the same common properties
Redundant classes: human, homo sapiens

Cimino, J, 1998
82
Realist Evaluation Step 1: Table of Instance-Properties
[Table residue: column labels A, B, C; instance rows I.1, I.2, I.3, I.4]
(Test one class at a time)
83
Realist Evaluation Step 2: Machine Learning
If char1 = Y Then Class = X (100%)
Pattern
Class B score for this pattern
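To make the two steps concrete, the sketch below learns a single "if property = value then class" rule from a table of instance properties and scores it. It is a deliberately simple stand-in, written in plain Python rather than with WEKA's rule learners, and the scoring (fraction of instances on which the rule agrees with class membership) is an assumption, not necessarily the score used in the talk.

```python
def best_rule(instances, target_class):
    """Find the single property=value test that best separates target_class.

    instances: list of (properties_dict, class_label) pairs.
    Returns (property, value, score), where score is the fraction of
    instances for which "has property=value" agrees with "is in target_class".
    """
    candidates = {(p, v) for props, _ in instances for p, v in props.items()}
    best = (None, None, 0.0)
    for prop, value in sorted(candidates):
        agree = sum(
            (props.get(prop) == value) == (label == target_class)
            for props, label in instances
        )
        score = agree / len(instances)
        if score > best[2]:
            best = (prop, value, score)
    return best
```

A class whose best rule scores close to 1.0 has a clear pattern of properties among its instances; a class with no rule much better than chance does not.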
84
WEKA

Produced by the University of Waikato in New Zealand
An open source library containing implementations
of hundreds of machine learning algorithms
(rule learners, LDA, SVM, neural networks... )

85
Realist Evaluation
[Figure: class score for each class; example scores shown: 0.35, 0.1, 0.92]
86
Realist Evaluation - positive control

Identify an ontology that already has logical constraints on the properties of its classes.
Assemble instances that have those properties
Classify the instances with a reasoner
Remove class restrictions from the ontology, but
keep instances assigned to their classes
Look for patterns of instance properties
If successful, patterns should be detected
The higher the pattern score, the gooder the
ontology is

87
Positive Control: Phosphabase

An ontology describing different classes of
phosphatase enzymes.
Given the domain composition of a protein,
phosphatase class can be inferred automatically.

Wolstencroft et al (2006) Protein classification using ontology classification. Bioinformatics, Vol. 22 no. 14, pages e530-e538
88
Remove the Logical Rules

Remove the defining rules for each class
Maintain the classified instances
Execute the realist evaluation
Can we re-discover the patterns that the logical
class-rules used to dictate?

89
Realist Evaluation Positive Control

25 classes from Phosphabase tested on 700 simulated protein instances
21: pattern correctly identified for 100% of instances
For 4 others, patterns identified covering 99%, 92%, 82%, 82% of instances respectively.

90
Realist Evaluation Positive Control

So the Phosphabase ontology is good
We can detect strong patterns of properties in
its instances that follow the philosophical
desiderata
This is unsurprising, since we knew that it was
good in the first place

91
Evaluation of Gene Ontology is ongoing
92
Interesting side effect

Class-defining rules are generated by the realist
evaluation
Most existing bio-ontologies lack formal
class-definitions
This evaluation could be used to create such rules -> automatic classifiers
Can also detect what TYPE of property is best
classified by current bio-ontologies

93
Is Realist Evaluation a Valid metric?

The realist evaluation measures the success of an ontology in classifying a specific set of properties
We claim that this is a metric relating to the quality of that ontology
Is this metric any better than other metrics like graph complexity, or fit-to-text?

94
Evaluating metrics
95
OntoLoki: Making mischief with Ontologies

Take an ontology that we claim is good
Make it worse by mischievously adding changes
Measure the degree of mischief
Run the evaluation metric of interest
-> Metric score should correlate with the amount of mischief added
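The OntoLoki procedure can be illustrated with a small noise-injection loop. This is a sketch under assumptions, not the OntoLoki implementation: "mischief" is modelled here as moving a fraction of instances to a wrong class, and `purity` is a toy stand-in for whatever metric is being evaluated. All function names are hypothetical.

```python
import random


def add_mischief(assignments, fraction, rng):
    """Return a copy with `fraction` of instances moved to a different class."""
    classes = sorted(set(assignments.values()))
    victims = rng.sample(sorted(assignments), round(fraction * len(assignments)))
    noisy = dict(assignments)
    for instance in victims:
        wrong = [c for c in classes if c != assignments[instance]]
        noisy[instance] = rng.choice(wrong)
    return noisy


def purity(assignments, trait):
    """Toy metric: share of instances whose trait matches the most common
    trait of the class they are assigned to."""
    by_class = {}
    for instance, cls in assignments.items():
        by_class.setdefault(cls, []).append(trait[instance])
    hits = sum(max(vals.count(v) for v in set(vals)) for vals in by_class.values())
    return hits / len(assignments)


def mischief_curve(assignments, fractions, metric, seed=0):
    """Score the metric at each noise level, starting each time from the clean ontology."""
    rng = random.Random(seed)
    return [(f, metric(add_mischief(assignments, f, rng))) for f in fractions]
```

A metric worth keeping should produce a curve that falls steadily as the fraction of mischief rises; a metric whose score barely moves is not measuring the damage.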

96
Comparison of ontology quality metrics
[Plot: measured ontology quality (y-axis) against amount of noise added, i.e. ontology quality decreasing (x-axis)]
97
Is Realist Evaluation a good metric?
98
Let's OntoLoki it to find out!
99
OntoLoki test of Realist Evaluation Metric
[Plot: average class score (y-axis) against noise added, a measure of nodes affected (x-axis)]
100
Conclusion

Human computation can collect significant amounts
of knowledge in an organized way

OntoLoki seems to be effective at evaluating the evaluation metrics
Realist evaluation is an interesting new metric for testing ontologies
101
Subjective iCAPTURer Observations

Humans had an EXTREMELY difficult time
classifying concepts into pre-existing categories
Humans had an EXTREMELY difficult time defining
new categories and placing them into the existing
classification system

102
Classification is HARD!
103
Abandoning Classification
104
(briefly)
105
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Primate, Lemur, Human, Gorilla, Hair, linked by is_a and has relations; a has_size relation with values Big, Medium, Small]
Classes, instances, properties, relationships
106
AN ontology is ONE representation of knowledge
[Diagram, labelled "Ontology of Anatomy": nodes Animal, Mammal, Primate, Lemur, Human, Gorilla, Hair, linked by is_a and has relations; a has_size relation with values Big, Medium, Small]
107
AN ontology is ONE representation of knowledge
[Diagram, labelled "Ontology of Habitat": nodes Animal, African_animal, Southern_African_animal, Africa, Aquatic, plains, mountain; relations is_a and lives]
Also might want Odour, digits, bone density, friendliness, cuteness...
108
Clay Shirky: "Ontology is Overrated"

Attempts to predict the future
"Soviet Union" used to be a category in the Library of Congress
Attempts mind-reading
Size, location, odour... Authors must predict what users are interested in
Great minds don't think alike...
No two people are likely to create the same ontology

http://www.shirky.com/writings/ontology_overrated.html
109
Categories
Properties
110
BRAINS!! MORE BRAINS!!

Mass Collaborative Tagging

111
Mass Open Social Tagging

A rapidly growing trend on the Web
Unstructured
Mass-collaboration
Anyone can say anything about anything using any
words they wish

112
Connotea: Scientific Tagging (Connotea is a product of Nature Publishing Group)
113
Connotea Growth
114
Tagging is EASY!
115
The Tagged World

Tagging is easy!
Tagging costs nothing
Tagging empowers all viewpoints
Tagging is happening!!!!!!

116
Lexical Comparison of Tagging with Formal Indexing Systems and Ontologies
117
Ontology (FMA)
118
Ontology (GO Molecular Function)
119
Ontology (GO Biological Process)
120
Tagging (Bibsonomy)
121
Tagging (CiteULike)
122
Tagging (Connotea)
123
Ontologies and Folksonomies are fundamentally
different!
124
Problem??

Folksonomies and ontologies are fundamentally
different!
It may not be possible to derive one from the
other accurately
Nevertheless, we would like to take advantage of
tagging behaviour while gaining the power of
controlled vocabularies/Ontologies

125
E.D.: The Entity Describer
126
Connotea tagging
User types in all tags
Type-ahead displays previously used tags
127
Connotea E.D. Tagging
128
Leveraging Tagging?

Tagging effectively assigns properties to
entities
ED Tagging constrains those properties to a
controlled vocabulary or ontology
Can we discover patterns in those properties that
indicate a natural classification system?
Can a realist-evaluation generate logical rules
that define classes based on patterns of tags?
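The idea of constraining tags to a controlled vocabulary can be sketched as follows. This is not the Entity Describer's implementation, which the transcript does not describe: the vocabulary, the prefix-matching type-ahead and the function names are assumptions made for illustration.

```python
# Hypothetical controlled vocabulary; a real deployment would load ontology terms.
VOCABULARY = ["myocardial infarction", "myocyte", "phosphatase"]


def suggest(prefix, vocabulary=VOCABULARY):
    """Type-ahead: vocabulary terms that start with what the user has typed."""
    needle = prefix.strip().lower()
    return [term for term in vocabulary if needle and term.startswith(needle)]


def tag(entity_tags, entity, term, vocabulary=VOCABULARY):
    """Attach a tag to an entity only if the term is in the vocabulary."""
    if term not in vocabulary:
        raise ValueError(f"{term!r} is not a controlled term")
    entity_tags.setdefault(entity, set()).add(term)
    return entity_tags
```

The resulting entity-to-tags mapping is a table of instance properties, the same kind of input that a pattern-finding evaluation would need.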

129
Final Thoughts

Ontologies are important, but hard to build
iCAPTURer: formal, template-based, cost-free consumption of biologists' brains seems to work!
Informal annotation (tagging) is cheap, easy, and scalable,
and is HAPPENING
Can we leverage tagging to create ontology-like structures? Maybe... Maybe not!

130
My journey back to Web Services
131
Why do I care about WS so passionately?
133
The Deep Web

All the data and knowledge only accessible
through Web Forms
Estimated to be orders of magnitude greater than the surface Web
- 91,000 Terabytes in the Deep Web
- 167 Terabytes in the Surface Web
Much of the Deep Web CANNOT be represented on the
Semantic Web since it DOES NOT EXIST until the
Web Form is accessed

134
Moby 2.0 and CardioSHARE: Merging the Deep Web and the Semantic Web
135
What Web Services do
Sequence Data -> BLAST SERVICE -> Blast Hit
136
What BioMoby does
??
Sequence Data
Want Blast
MOBY BLAST SERVICE
137
The implied relationship between input and output
Sequence Data -- givesBlastResult --> Blast Hit
Not Biologically Meaningful
138
The implied biological relationship between input and output
Sequence Data -- hasHomologyTo --> Blast Hit
...looks a lot like an RDF statement
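For comparison, a subject-predicate-object statement of this kind can be written out in N-Triples syntax. The sketch below is only an illustration: the `example.org` IRIs are placeholders, and the `triple` helper is hypothetical rather than part of Moby or of any RDF library.

```python
def triple(subject, predicate, obj):
    """Format one statement in N-Triples syntax (all three terms as IRIs)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical IRIs; the example.org names are placeholders, not real identifiers.
statement = triple(
    "http://example.org/sequence/A",
    "http://example.org/vocab#hasHomologyTo",
    "http://example.org/sequence/B",
)
```

The input of the service becomes the subject, its output becomes the object, and the biologically meaningful relationship becomes the predicate.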
139
To merge Web Services and the Semantic Web: simply assert the relationship and let Moby do the rest!
140
Start with a partial Triple
URI rdf:type Sequence
hasHomologyTo
141
What Moby 2.0 Does
??
URI rdf:type Sequence
hasHomologyTo
MOBY BLAST SERVICE
Moby 2.0: hasHomologyTo property provided by BLAST services
142
Moby 2.0 Query

FIND SERVICES THAT

Consume Sequence Data
Provide hasHomologyTo Property
Attached to other Sequence Data
143
Moby 2.0 extends SPARQL

SPARQL queries contain concepts and relationships
of interest
Map RDF predicates onto Moby services capable of
generating them
Registry query: which Moby service consumes the subject and generates the predicate relationship type?
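The registry query can be pictured as a lookup keyed on the subject's type and the requested predicate. This is a toy sketch, not the Moby 2.0 registry interface: the `REGISTRY` contents, the service name and the `plan` function are hypothetical, and the triple patterns are given as plain tuples rather than parsed from SPARQL.

```python
# Hypothetical registry: which service consumes a type and generates a predicate.
REGISTRY = {
    ("Sequence", "hasHomologyTo"): "MobyBlastService",
}


def plan(triple_patterns, subject_types):
    """For each (subject, predicate, object) pattern, name a service able to
    generate that predicate from the subject's type, or None if none is registered."""
    steps = []
    for subject, predicate, _obj in triple_patterns:
        service = REGISTRY.get((subject_types.get(subject), predicate))
        steps.append((subject, predicate, service))
    return steps
```

A pattern such as `("?seq", "hasHomologyTo", "?hit")` with `?seq` typed as Sequence resolves to the registered BLAST service, which would then be invoked to generate the missing triples.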

144
But wait, there's more!
145
CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query
146
CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query
This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0
147
What do Moby 2.0 and CardioSHARE achieve?

Makes the Deep Web transparently accessible as if
it were a Semantic Web Resource
Allows SPARQL to do truly semantic queries!
Reduces the requirement of Biologists to know
how/where to get their data of interest
Simplifies construction of complex analytical
pipelines by automating much of the
discovery/execution tasking

148
Ontology Spectrum
From least to most formal:
Catalog/ID
Terms/glossary
Thesauri ("narrower term" relation)
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, ...)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
149
Fin
