Title: Feasting on Brains


1
Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again
A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences
Mark Wilkinson (markw_at_illuminae.com)
Assistant Professor, Medical Genetics, University of British Columbia
Heart and Lung Research Institute at St. Paul's Hospital
2
Benjamin Good (He's a Creep!)
3
approach
Bioinformatics is a broad field and suffers SEVERE interoperability problems
Bioinformaticians tend to be specialists in a particular domain of computational analysis
As a group, the brains of all bioinformaticians contain all (known) bioinformatics
Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?
4
Human Computation (Luis von Ahn)
5
Ontology Spectrum
Catalog/ID
Terms/glossary
Thesauri "narrower term" relation
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, ...)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
6
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Hair, Primate, Lemur, Human, Zombie, Brains, Chips, Shoots; edge labels has, is_a, eats]
Classes, instances, properties, relationships
7
Classes
Animal
Mammal
Hair
Primate
Lemur
Human
Zombie
Brains
Chips
Shoots
8
instances
9
Properties
has
is_a
eats
10
relations
has
is_a
eats
11
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Hair, Primate, Lemur, Human, Zombie, Brains, Chips, Shoots; edge labels has, is_a, eats]
Classes, instances, properties, relationships
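A small ontology like the one on this slide is often held as subject-predicate-object triples. The sketch below is illustrative only: the transcript keeps the node and edge labels but not which nodes each edge connects, so every edge listed here is an assumption, and the helper names `ancestors` and `inherited` are made up for the example.

```python
# Illustrative only: the edges are assumed, not taken from the slide.
TRIPLES = [
    ("Mammal", "is_a", "Animal"),
    ("Primate", "is_a", "Mammal"),
    ("Lemur", "is_a", "Primate"),
    ("Human", "is_a", "Primate"),
    ("Zombie", "is_a", "Human"),
    ("Mammal", "has", "Hair"),
    ("Zombie", "eats", "Brains"),
]


def ancestors(cls, triples=TRIPLES):
    """Return every class reachable from cls by following is_a edges."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is_a" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def inherited(cls, prop, triples=TRIPLES):
    """Values of prop asserted on cls or on any of its is_a ancestors."""
    scope = [cls] + ancestors(cls, triples)
    return sorted({o for s, p, o in triples if p == prop and s in scope})


print(ancestors("Zombie"))
print(inherited("Lemur", "has"))
```

Because `has Hair` is asserted once on Mammal, every subclass picks it up through the is_a chain, which is the practical payoff of separating classes from properties and relations.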
12
Web Service?
  • A software tool that is accessible over the Web
  • Web Services are intended to be accessed by
    machines, not people.

13
Interoperability?
  • The ability of two Web Services to exchange
    information, and use that information correctly
  • This generally requires Semantics in the form of
    Ontologies

14
Mmmm Brains!!
  • BioMoby
  • Eating brains to enable Web Service
    Interoperability

15
What does BioMoby do?
16
The BioMoby Plan
  • Create an ontology of bioinformatics data-types
  • Define an ontology of bioinformatics operations
  • Open these ontologies for community input
  • Define Web Services with respect to these two ontologies
  • A Machine can find an appropriate service
  • A Machine can execute that service unattended
  • Ontology is community-extensible

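The plan above can be sketched as a registry lookup that uses the data-type ontology for matching. This is not the BioMoby API: the data types, service names, and the `lineage` and `find_services` helpers are all hypothetical, chosen only to show how a machine could find every service able to consume the data it has in hand.

```python
# Hypothetical data-type ontology: child -> parent (is_a).
DATATYPE_PARENT = {
    "Sequence": "Object",
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "BlastReport": "Object",
}

# Hypothetical registry entries: (service name, input type, output type, operation).
REGISTRY = [
    ("runBlast", "Sequence", "BlastReport", "Alignment"),
    ("translateDNA", "DNASequence", "ProteinSequence", "Translation"),
    ("describeObject", "Object", "Object", "Annotation"),
]


def lineage(datatype):
    """The data type followed by all of its ancestors, most specific first."""
    chain = [datatype]
    while chain[-1] in DATATYPE_PARENT:
        chain.append(DATATYPE_PARENT[chain[-1]])
    return chain


def find_services(datatype):
    """Names of services whose declared input is datatype or one of its ancestors."""
    accepted = set(lineage(datatype))
    return [name for name, consumes, _, _ in REGISTRY if consumes in accepted]


print(find_services("DNASequence"))
print(find_services("ProteinSequence"))
```

A service registered against a general type is offered for every more specific type, so providers never need to coordinate with each other directly.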
17
Overview of BioMoby Semantic Interoperability
18
Why couldn't we do this before?
19
Interoperability is HARD!
20
Interoperability through Human Computation
  • BioMoby Data Type Ontology: an explicit list of
    all biological data-types, and the relationships
    between them.
  • Ontology built, brain by brain, by
    informaticians!
  • We achieve interoperability simply because
    informaticians donate their brain-power
  • HUMAN COMPUTATION

21
A portion of the BioMoby Ontology built from
the brains of the community!
22
so what can I do with it?
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
Analytical workflow Discovery
  • No explicit coordination between providers
  • Run-time discovery of appropriate tools
  • Automated execution of those tools
  • The machine understands the data you have
    in-hand, and assists you in choosing the next
    step in your analysis.

40
Interoperability through Human Computation
  • Individuals contributed their knowledge about
    bioinformatics data-types to a central ontology
  • Their combined knowledge enabled the construction
    of an interoperable framework

41
  • Who uses BioMoby?

42
Usage Statistics
  • 15 Nations
  • 60 independent institutions
  • 1600 interoperable Bioinformatics Resources
  • 500,000 requests for brokering each month

43
What have we learned?
  • We can consume the brains of a large community
    to generate something complex, yet organized

44
Open Kimono
  • The BioMoby ontology is actually quite messy
  • Communal brains can build useful ontologies, but
    the problem is...

45
Ontologies are HARD!
46
How are ontologies usually constructed?
47
By small, hard-working, dedicated groups with
lots of money!
  • Gene Ontology code
  • Curated: 5 full-time staff
  • 25 Million (Lewis, S., personal communication)
  • NCI Metathesaurus code
  • Curated: 12 full-time staff
  • 15 Million (Peter K., estimate)
  • Health Level 7 (HL7)
  • Curated
  • Lots! Some claim as much as 15 Billion
    (Smith, Barry, KBB Workshop, Montreal, 2005)

48
  • To build the global Semantic Web for Systems
    Biology we need to encode knowledge from EVERY
    domain of biology, from barley root apex
    structure and function to HIV clinical-trials
    outcomes, and this knowledge is constantly
    changing!
  • At 15M each, can we afford the Semantic Web???

49
Mmmm Need MORE Brains!!
  • iCAPTURer
  • experiment

50
Dr. Bruce McManus with a human heart in his hands
He knows his hearts, but he doesn't know how to build an ontology
51
What we need
52
The Problem
53
The Solution?
54
The Solution?
55
So how do we do it?
56
Remember what we learned from Moby: communities
CAN build ontologies!
57
Building Systems Biology Ontologies through
Human Computation
58
iCAPTURer
  • Benjamin Good
  • Ph.D. Student, UBC Bioinformatics
  • Genome BC Better Biomarkers in Transplantation
    project, St. Paul's Hospital iCAPTURE Centre

59
Old Way
  • Knowledge engineer (KE) drills the brain of one
    or a very few experts.
  • Painful, expensive, and time-consuming

60
New Way? the iCAPTURer
  • KE creates a clever interface
  • No direct interaction with expert
  • Thousands of experts
  • Cheap Cheap Cheap!

61
iCAPTURer 1.0
  • Go to a scientific conference
  • Text-mine conference abstracts
  • Auto-Extract concepts
  • Put concepts into a series of question
    templates
  • A web interface presents questions about these
    concepts to conference attendees
  • Give points for every question they answer
  • Give a prize to the highest point winner

62
Results
  • Is _____ a meaningful term?
  • Yes, No, I don't know buttons
  • What is a synonym for ______?
  • Text entry box
  • Where does _____ fit in the following tree of
    related terms?
  • Clickable tree
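
The template-and-points mechanism can be sketched in a few lines. The two template strings follow the wording on the slide; the concept list, the attendee names, the one-point-per-answer value, and the function names are invented for illustration and are not taken from the iCAPTURer implementation.

```python
from collections import Counter

# Template wording follows the slide; everything else here is invented.
TEMPLATES = {
    "meaningful": "Is {term} a meaningful term?",
    "synonym": "What is a synonym for {term}?",
}


def make_questions(concepts):
    """Pair every extracted concept with every question template."""
    return [
        {"kind": kind, "term": term, "text": text.format(term=term)}
        for term in concepts
        for kind, text in TEMPLATES.items()
    ]


def leaderboard(answers, points_per_answer=1):
    """answers: iterable of (attendee, question kind, response). Highest score first."""
    scores = Counter()
    for attendee, _kind, _response in answers:
        scores[attendee] += points_per_answer
    return scores.most_common()


questions = make_questions(["cardiac myocyte", "STEMI"])
answers = [
    ("alice", "meaningful", "yes"),
    ("bob", "meaningful", "I don't know"),
    ("alice", "synonym", "ST Elevated Myocardial Infarction"),
]
print(questions[0]["text"])
print(leaderboard(answers))
```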

63
Observations
  • Yes/No questions work well
  • Text entry is less effective
  • Adding to a tree is a disaster!
  • Competition is a great motivator for human
    computation!

64
COST?
65
COST?
66
COST?
67
COST?
68
COST?

69
iCAPTURer 1.5
70
  • Start with hypothetical concept tree
  • Put concept-concept relations into a series of
    true/false questions
  • Make a web interface to ask questions
  • If a relationship is false, then re-start at the
    root of the concept tree
  • Give points for every question they answer
  • Give a prize to the highest point winner
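
One possible reading of this procedure, sketched under assumptions: each parent-child edge of the hypothetical tree becomes a true/false question, and a "false" answer records the rejection and restarts the walk at the root. The tree contents, the `review_tree` name, and the stand-in `expert` function are hypothetical; the slide does not say how restarts were actually handled.

```python
# Hypothetical starting tree: parent -> children.
TREE = {
    "cell": ["cardiac cell", "cell phone"],
    "cardiac cell": ["cardiac myocyte"],
}


def review_tree(tree, root, is_true):
    """Ask is_true(child, parent) for each edge, depth first.

    A rejected edge is recorded and the walk restarts from the root,
    skipping every edge that has already been judged.
    Returns (accepted edges, rejected edges, questions asked).
    """
    accepted, rejected, asked = [], [], 0
    restart = True
    while restart:
        restart = False
        stack = [root]
        while stack and not restart:
            parent = stack.pop()
            for child in tree.get(parent, []):
                edge = (child, parent)
                if edge in rejected:
                    continue
                if edge not in accepted:
                    asked += 1
                    if is_true(child, parent):
                        accepted.append(edge)
                    else:
                        rejected.append(edge)
                        restart = True
                        break
                stack.append(child)
    return accepted, rejected, asked


def expert(child, parent):
    """Stand-in for a human answer: rejects only 'cell phone is a type of cell'."""
    return (child, parent) != ("cell phone", "cell")


accepted, rejected, asked = review_tree(TREE, "cell", expert)
print(accepted, rejected, asked)
```

The `asked` count doubles as the number of points the participant would earn under the one-point-per-question scheme.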
71
Chatterbot
  • I've heard that a cardiac myocyte is a type of
    cardiac cell. Is this true?
  • I've heard that STEMI means the same thing as ST
    Elevated Myocardial Infarction. Is that
    nonsense, or is it correct?
  • How do you feel about your mother?

72
Results
  • Knowledge capture in 3 days
  • 11,000 Concepts

73
COST
0
74
Full details of this experiment are available
in Proceedings of the Pacific Symposium on
Biocomputing, 2006
75
Ontology Quality?
76
Potential Ontology Evaluation Metrics
  • Domain independent
  • philosophical desiderata: manual, subjective
  • graphical structure: auto, questionable value
  • satisfiability: auto, useful, not enough
  • Domain specific
  • Fit to text: auto, dependent on NLP
  • Similarity to a gold standard: auto/manual, gold
    standard must exist!
  • Task-based: optimal! Auto/manual, but not
    generalizable

77
Good???
78
What do we mean by Good?
"Ontology construction is motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances" - Barry Smith
Reality should be the benchmark for the goodness of an ontology
79
ontology evaluation based on referents in
reality
80
Chosen Philosophical Principle: Epistemology
Precedes Ontology
  • A Class should refer to an invariant pattern of
    properties common among all its instances
  • Mammals have mammary glands and hair
  • Humans are an instance of the class Mammal
  • Therefore
  • If class-instances are mapped into an ontology
  • Each instance has properties or qualities
  • These properties or qualities SHOULD segregate
    into different classes if the ontology is any good

81
Philosophical Desiderata
  • Non-vagueness: at least one instance can exist
    with the Class pattern
  • Vague class: mammalian cell wall
  • Non-ambiguity: no more than one common pattern
    per Class
  • Ambiguous class: cell (e.g. cell phone, jail
    cell)
  • Non-redundancy: within the same level of
    granularity, no other class refers to the same
    common properties
  • Redundant classes: human, Homo sapiens

Cimino, J, 1998
82
Realist Evaluation Step 1: Table of Instance-Properties
[Table: instances I.1 to I.4 against properties A, B, C; cell values not transcribed]
(Test one class at a time)
83
Realist Evaluation Step 2: Machine Learning
If char1 = Y Then Class = X (100%)
Pattern
Class B score for this pattern
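The two steps can be sketched as a table of instance properties plus a very small rule learner. This is a stand-in, not the method used in the talk: the talk used WEKA learners, whereas the sketch tries only single "property == value" rules and scores them by F1, which is an assumed scoring choice. The instance data and the `class_score` name are invented.

```python
def class_score(instances, target):
    """Best F1 of any single 'property == value' rule at picking out target.

    instances: list of (class label, {property: value}) pairs.
    Returns (score, (property, value)); (0.0, None) if no rule matches.
    """
    candidates = {(p, v) for _, props in instances for p, v in props.items()}
    best_score, best_rule = 0.0, None
    for prop, value in sorted(candidates, key=repr):
        tp = fp = fn = 0
        for label, props in instances:
            fires = props.get(prop) == value
            if fires and label == target:
                tp += 1
            elif fires:
                fp += 1
            elif label == target:
                fn += 1
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_score:
            best_score, best_rule = f1, (prop, value)
    return best_score, best_rule


# Invented instance-property table, one row per instance.
INSTANCES = [
    ("Mammal", {"hair": "yes", "legs": "4"}),
    ("Mammal", {"hair": "yes", "legs": "2"}),
    ("Bird", {"hair": "no", "legs": "2"}),
    ("Bird", {"hair": "no", "legs": "2"}),
]
print(class_score(INSTANCES, "Mammal"))
```

A class whose instances share a clean pattern scores near 1, while a class with no distinguishing property scores low, which is the intuition behind testing one class at a time.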
84
WEKA
  • Produced by the University of Waikato in New Zealand
  • An open source library containing implementations
    of hundreds of machine learning algorithms
  • (rule learners, LDA, SVM, neural networks... )

85
Realist Evaluation
[Figure: Class Score for Each Class; example scores shown: 0.35, 0.1, 0.92]
86
Realist Evaluation - positive control
  • Identify an ontology that already has logical
    constraints on the properties of its classes.
  • Assemble instances that have those properties
  • Classify the instances with a reasoner
  • Remove class restrictions from the ontology, but
    keep instances assigned to their classes
  • Look for patterns of instance properties
  • If successful, patterns should be detected
  • The higher the pattern score, the gooder the
    ontology is

87
Positive Control: Phosphabase
  • An ontology describing different classes of
    phosphatase enzymes.
  • Given the domain composition of a protein,
    phosphatase class can be inferred automatically.

Wolstencroft et al. (2006) Protein classification
using ontology classification. Bioinformatics,
Vol. 22 no. 14, pages e530-e538
88
Remove the Logical Rules
  • Remove the defining rules for each class
  • Maintain the classified instances
  • Execute the realist evaluation
  • Can we re-discover the patterns that the logical
    class-rules used to dictate?

89
Realist Evaluation Positive Control
  • 25 classes from phosphabase tested on 700
    simulated protein instances
  • 21: pattern correctly identified for 100% of
    instances
  • For 4 others, patterns identified covering 99%,
    92%, 82%, and 82% of instances respectively.

90
Realist Evaluation Positive Control
  • So the Phosphabase ontology is good
  • We can detect strong patterns of properties in
    its instances that follow the philosophical
    desiderata
  • This is unsurprising, since we knew that it was
    good in the first place

91
Evaluation of Gene Ontology is ongoing
92
Interesting side effect
  • Class-defining rules are generated by the realist
    evaluation
  • Most existing bio-ontologies lack formal
    class-definitions
  • This evaluation could be used to create such
    rules → automatic classifiers
  • Can also detect what TYPE of property is best
    classified by current bio-ontologies

93
Is Realist Evaluation a Valid metric?
  • The realist evaluation measures the success of an
    ontology in classifying a specific set of
    properties
  • We claim that this is a metric relating to the
    quality of that ontology
  • Is this metric any better than other metrics, like
    graph complexity or fit-to-text?

94
Evaluating metrics
95
OntoLoki: Making mischief with Ontologies
  • Take an ontology that we claim is good
  • Make it worse by mischievously adding changes
  • Measure the degree of mischief
  • Run the evaluation metric of interest
  • → Metric score should correlate with the amount
    of mischief added
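
The OntoLoki idea can be sketched as: start from a clean set of class assignments, move a controlled fraction of instances into the wrong class, and watch how a metric responds. Both functions below are toy stand-ins with invented names; `quality` is a simple purity measure chosen for the example, not one of the metrics compared in the talk.

```python
import random
from collections import Counter


def quality(assignments):
    """Toy metric: mean, over classes, of the share of members showing the
    class's most common property value. assignments: list of (class, value)."""
    by_class = {}
    for cls, value in assignments:
        by_class.setdefault(cls, []).append(value)
    shares = [
        Counter(vals).most_common(1)[0][1] / len(vals)
        for vals in by_class.values()
    ]
    return sum(shares) / len(shares)


def add_mischief(assignments, fraction, rng):
    """Move a fraction of instances to a different, randomly chosen class."""
    classes = sorted({cls for cls, _ in assignments})
    noisy = list(assignments)
    victims = rng.sample(range(len(noisy)), round(fraction * len(noisy)))
    for i in victims:
        cls, value = noisy[i]
        noisy[i] = (rng.choice([c for c in classes if c != cls]), value)
    return noisy


# A "good" starting point: the property value tracks the class perfectly.
GOOD = [(cls, cls.lower()) for cls in ("A", "B", "C") for _ in range(10)]

rng = random.Random(0)
for fraction in (0.0, 0.25, 0.5):
    print(fraction, round(quality(add_mischief(GOOD, fraction, rng)), 2))
```

A metric worth trusting should fall as the fraction rises; one that stays flat under this kind of damage is not measuring what the mischief breaks.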

96
Comparison of ontology quality metrics
[Plot: y-axis Measured Ontology Quality; x-axis Amount of noise added (ontology quality decreasing)]
97
Is Reality Evaluation a good metric?
98
Let's OntoLoki it to find out!
99
OntoLoki test of Realist Evaluation Metric
[Plot: y-axis Average Class Score; x-axis Noise Added (a measure of nodes affected)]
100
Conclusion
  • Human computation can collect significant amounts
    of knowledge in an organized way
  • OntoLoki seems to be effective at evaluating
    the evaluation metrics
  • Reality evaluation is an interesting new
    metric for testing ontologies

101
Subjective iCAPTURer Observations
  • Humans had an EXTREMELY difficult time
    classifying concepts into pre-existing categories
  • Humans had an EXTREMELY difficult time defining
    new categories and placing them into the existing
    classification system

102
Classification is HARD!
103
Abandoning Classification
104
(briefly)
105
An ontology is a representation of knowledge
[Diagram: nodes Animal, Mammal, Hair, Primate, Lemur, Human, Gorilla; edge labels has, is_a, has_size; size values Big, Medium, Small]
Classes, instances, properties, relationships
106
AN ontology is ONE representation of knowledge
Ontology of Anatomy
[Diagram: nodes Animal, Mammal, Hair, Primate, Lemur, Human, Gorilla; edge labels has, is_a, has_size; size values Big, Medium, Small]
107
AN ontology is ONE representation of knowledge
Ontology of Habitat
[Diagram: nodes Animal, African_animal, Southern_African_animal, Africa; edge labels lives, is_a; habitat values Aquatic, plains, mountain]
Also might want Odour, digits, bone density, friendliness, cuteness...
108
Clay Shirky: Ontology is Overrated
  • Attempts to predict the future
  • "Soviet Union" used to be a category in the
    Library of Congress
  • Attempts mind-reading
  • Size, location, odour... Authors must predict what
    users are interested in
  • Great minds don't think alike...
  • No two people are likely to create the same
    ontology

http://www.shirky.com/writings/ontology_overrated.html
109
Categories
Properties
110
BRAINS!! MORE BRAINS!!
  • Mass Collaborative Tagging

111
Mass Open Social Tagging
  • A rapidly growing trend on the Web
  • Unstructured
  • Mass-collaboration
  • Anyone can say anything about anything using any
    words they wish

112
Connotea: Scientific Tagging (Connotea is a
product of Nature Publishing Group)
113
Connotea Growth
114
Tagging is EASY!
115
The Tagged World
  • Tagging is easy!
  • Tagging costs nothing
  • Tagging empowers all viewpoints
  • Tagging is happening!!!!!!

116
Lexical Comparison of Tagging with Formal
Indexing Systems and Ontologies
117
Ontology (FMA)
118
Ontology (GO Molecular Function)
119
Ontology (GO Biological Process)
120
Tagging (Bibsonomy)
121
Tagging (CiteULike)
122
Tagging (Connotea)
123
Ontologies and Folksonomies are fundamentally
different!
124
Problem??
  • Folksonomies and ontologies are fundamentally
    different!
  • It may not be possible to derive one from the
    other accurately
  • Nevertheless, we would like to take advantage of
    tagging behaviour while gaining the power of
    controlled vocabularies/Ontologies

125
E.D.: The Entity Describer
126
Connotea tagging
User types in all tags
Type-ahead displays previously used tags
127
Connotea E.D. Tagging
128
Leveraging Tagging?
  • Tagging effectively assigns properties to
    entities
  • ED Tagging constrains those properties to a
    controlled vocabulary or ontology
  • Can we discover patterns in those properties that
    indicate a natural classification system?
  • Can a realist-evaluation generate logical rules
    that define classes based on patterns of tags?

129
Final Thoughts
  • Ontologies are important, but hard to build
  • iCAPTURer: formal, template-based, cost-free
    consumption of biologists' brains seems to work!
  • Informal annotation (tagging) is cheap, easy,
    and scalable,
  • and is HAPPENING
  • Can we leverage tagging to create ontology-like
    structures? Maybe... Maybe not!

130
My journey back to Web Services
131
Why do I care about WS so passionately?
132
(No Transcript)
133
The Deep Web
  • All the data and knowledge only accessible
    through Web Forms
  • Estimated to be orders of magnitude greater than
    the Surface Web: 91,000 Terabytes in the Deep
    Web vs. 167 Terabytes in the Surface Web
  • Much of the Deep Web CANNOT be represented on the
    Semantic Web since it DOES NOT EXIST until the
    Web Form is accessed

134
Moby 2.0 and CardioSHARE: Merging the Deep
Web and the Semantic Web
135
What Web Services do
Sequence Data → BLAST SERVICE → Blast Hit
136
What BioMoby does
??
Sequence Data
Want Blast
MOBY BLAST SERVICE
137
The implied relationship between input and output
Sequence Data --givesBlastResult--> Blast Hit
Not Biologically Meaningful
138
The implied biological relationship between input
and output
Sequence Data --hasHomologyTo--> Blast Hit
looks a lot like the RDF statement
139
To merge Web Services and the Semantic Web:
Simply assert the relationship and let
Moby do the rest!
140
Start with a partial Triple
URI rdf:type Sequence
hasHomologyTo
141
What Moby 2.0 Does
??
URI rdf:type Sequence
hasHomologyTo
MOBY BLAST SERVICE
Moby 2.0: hasHomologyTo property provided
by BLAST services
142
Moby 2.0 Query
  • FIND SERVICES THAT
  • Consume Sequence Data
  • Provide hasHomologyTo Property
  • Attached to other Sequence Data
143
Moby 2.0 extends SPARQL
  • SPARQL queries contain concepts and relationships
    of interest
  • Map RDF predicates onto Moby services capable of
    generating them
  • Registry query: What Moby service consumes the
    subject type and generates the predicate
    relationship type?
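
The registry query in the last bullet can be sketched as a lookup keyed on subject type and predicate. Nothing here is the Moby 2.0 interface: the registry layout, the service names, the `hasStructure` predicate, and the `services_for` helper are hypothetical; only `hasHomologyTo` and the BLAST pairing come from the slides.

```python
# Hypothetical registry: each service declares the subject type it consumes,
# the predicate it can assert, and the type of object it produces.
SERVICES = [
    {"name": "MobyBlast", "consumes": "Sequence",
     "predicate": "hasHomologyTo", "produces": "Sequence"},
    {"name": "MobyFold", "consumes": "Sequence",
     "predicate": "hasStructure", "produces": "Structure"},
]


def services_for(pattern, subject_types):
    """Find services able to generate the triples a query pattern asks for.

    pattern: (subject variable, predicate, object variable) from a query.
    subject_types: known rdf:type of each variable, e.g. {"?s": "Sequence"}.
    """
    subject, predicate, _obj = pattern
    wanted_type = subject_types.get(subject)
    return [
        s["name"]
        for s in SERVICES
        if s["predicate"] == predicate and s["consumes"] == wanted_type
    ]


print(services_for(("?s", "hasHomologyTo", "?hit"), {"?s": "Sequence"}))
```

With a lookup like this, a triple pattern that matches nothing in the stored data can still be answered by calling the service that generates it.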

144
But wait, there's more!
145
CardioSHARE: Exploit knowledge in OWL-DL
ontologies to enhance query
146
CardioSHARE: Exploit knowledge in OWL-DL
ontologies to enhance query
This SPARQL query could be posed on a database
of RAW, UNANNOTATED protein sequences, and be
answered by Moby 2.0
147
What do Moby 2.0 and CardioSHARE achieve?
  • Makes the Deep Web transparently accessible as if
    it were a Semantic Web Resource
  • Allows SPARQL to do truly semantic queries!
  • Reduces the requirement of Biologists to know
    how/where to get their data of interest
  • Simplifies construction of complex analytical
    pipelines by automating much of the
    discovery/execution tasking

148
Ontology Spectrum
Catalog/ID
Terms/glossary
Thesauri "narrower term" relation
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, ...)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
149
Fin