Title: Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
1 Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
2 Contents
- Motivation
- Beyond the web: Rules, Reasoning, Semantics, Ontologies
- Semantics of Deduction Rules
- Argumentation Semantics
- Fuzzy Reasoning
- Reaction rules
- Vivid Agents
- Prova
- Applications in Bioinformatics
3 The Web
- A great success story, but
- it's the web for humans, not machines
- Many areas, such as biology, have fully embraced the web
- The Human Genome Project is only the tip of the iceberg
- More than 500 tools and databases online
4 Example: PubMed
- >12,000,000 literature abstracts
- Great resource if one knows what one is looking for
- Kox1 has 17 hits
- But diabetes will produce >200,000
- Often need to automatically process abstracts
5 Results of PubMed
- Lorenz P. Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
- Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
- ...
However, to a machine things look different!
Solution: tag data (XML)
7 Results of PubMed
- <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year>
- <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year>
- ...
However, to a machine things look different!
Solution: use ontologies (Semantic Web)
9 GeneOntology
- Biologists have recognised the problem of semantic inter-operability between disparate information sources
- GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology
- GO has >10,000 terms in three branches: function, process, localisation
10 GeneOntology
- Has 13 levels
- Width broadens to level 6 (3885 terms wide), then shrinks
- Number of leaves per level broadens to level 6 (1223 leaves), then shrinks
- Average term has 4 words
- Maximal term has 29 words: "Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors"
[Figure: Breadth of GO]
11 Motivation: Summary
- Web in the old days
  - HTML (for humans)
- Web these days
  - HTML
  - XML, ontologies (for machines)
- Web of the future
  - HTML
  - XML, ontologies
  - rules, reasoning, semantics
  - access to computational resources (à la grid computing)
12 Open Problems
- Part I: Theory of rules and reasoning on the web
  - Knowledge representation: Which level of expressiveness?
  - Semantics: How to guarantee inter-operability?
  - Reasoning: Fuzzy reasoning and unification
  - Reactivity: Vivid agents
- Part II: Applications of rules and reasoning on the web
  - Integration and querying of information sources
    - Integration of transmembrane prediction tools
    - Integration of protein structure DB and structure classification
  - Consistency checking
    - Ontology: If A is B and B is C, then the ontology should not explicitly state that A is C, as it is already implicit
    - Annotation: Do different tools agree or disagree?
13 The wider picture: www.RuleML.org
- Goal: develop a Web language for rules
  - using XML markup,
  - formal semantics, and
  - efficient implementations.
- Rules: derivation rules, transformation rules, and reaction rules.
- RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents.
- Currently, some 30 international members and close collaboration with W3C
14 The wider picture: REWERSE
- Reasoning on the Web with Rules and Semantics
- FP6 Network of Excellence with nearly 30 partners
- Working groups on Infrastructure and Applications
- Composition
- Typing
- Policies
- Querying
- Reactivity and evolution
- Personalised Web sites
- Calendar systems
- Bioinformatics
15 Part I: Theory
- Motivation: Expressive Knowledge Representation
- Part I.a: Argumentation as LP semantics
  - Notions of attack and justified arguments
  - Hierarchy of semantics
  - Proof procedure
- Part I.b: Fuzzy unification and argumentation
  - Fuzzy negation
  - Fuzzy argumentation
  - Fuzzy unification
- Part I.c: Vivid Agents
16 Part I.a: A Hierarchy of Semantics
- RuleML caters for different degrees of knowledge representation
- A hierarchy of semantics is required to guarantee inter-operation.
- Analogy: In HTML, <b>Michael</b> will be interpreted differently in Netscape (bold "Michael") and in the text-based browser Lynx (plain "Michael").
- Problem: How can we guarantee inter-operability between different interpretations of rules?
17 Knowledge representation
- Pete earns 500,000 p.a.
  - earns(pete,500000).
- Cross the street if there are no cars
  - cross ← not car
  - cross ← ¬car
- The fridge is quite cheap
  - cheap(fridge) : 70%
- Does Mike live in Londn?
  - address(mike,london) = address(mike,londn) : 95%
18 Knowledge System Cube
- r: relational
- f: fuzzy
- d: deductive
- DB: database
- FB: factbase
19 Part I.a: Argumentation as semantics for Extended Logic Programs
[Figure: knowledge system cube; corners rDB, rFB, dDB, dFB, fDB, fFB, fdDB, fdFB; axes fuzzy, deductive, negation]
- f: fuzzy
- d: deductive
- DB: database
- FB: factbase
20 Extended Logic Programming
- Logic programming with 2 negations
- Default negation
  - not p: true if all attempts to prove p fail.
- Explicit negation
  - ¬p: falsehood of a literal may be stated explicitly.
- Coherence principle
  - ¬p → not p
21 Argumentation
- Interaction between agents in order to
  - gain knowledge
  - revise existing knowledge
  - convince the opponent
  - solve conflicts
- Elegant way to define semantics for (extended) logic programming
  - Dung
  - Kowalski, Toni, Sadri
  - Prakken & Sartor
  - etc.
22 Arguments
- An argument is a partial proof, with implicitly negated literals as assumptions.
- Argument = sequence of rules
23 Attacking arguments
- Two fundamental kinds of attack
  - A undercuts B: A invalidates a premise of B
    - P: Let's go to the lake, as it is not snowing anymore
    - O: Hang on, it is snowing
  - A rebuts B: A contradicts B
    - P: Let's go to the lake, as it is not snowing
    - O: Let's not, as I've got to prepare my talk
- Derived notions of attack used in the literature (writing A u B for "A undercuts B" and A r B for "A rebuts B")
  - A attacks B: A u B or A r B
  - A defeats B: A u B or (A r B and not B u A)
  - A strongly attacks B: A a B and not B u A
  - A strongly undercuts B: A u B and not B u A
24 Proposition: Hierarchy of attacks
Attacks: $a = u \cup r$
Defeats: $d = u \cup (r - u^{-1})$
Undercuts: $u$
Strongly attacks: $sa = (u \cup r) - u^{-1}$
Strongly undercuts: $su = u - u^{-1}$
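The derived notions of attack can be computed directly from the two fundamental relations. The following is a minimal sketch, assuming that undercut and rebut are given as finite sets of (attacker, attacked) pairs; the function names and the example arguments A to D are illustrative and not part of the original slides.

```python
def inverse(relation):
    """Return the inverse relation: (b, a) for every (a, b)."""
    return {(b, a) for (a, b) in relation}


def derived_attacks(undercuts, rebuts):
    """Build the five notions of attack from undercut (u) and rebut (r)."""
    u, r = set(undercuts), set(rebuts)
    u_inv = inverse(u)
    return {
        "a": u | r,             # attacks
        "d": u | (r - u_inv),   # defeats
        "u": u,                 # undercuts
        "sa": (u | r) - u_inv,  # strongly attacks
        "su": u - u_inv,        # strongly undercuts
    }


# Hypothetical example: A and B undercut each other, C undercuts D,
# A and C rebut each other, C and D rebut each other.
example = derived_attacks(
    undercuts={("A", "B"), ("B", "A"), ("C", "D")},
    rebuts={("A", "C"), ("C", "A"), ("C", "D"), ("D", "C")},
)
```

On this example the subset relations su ⊆ u ⊆ d ⊆ a and su ⊆ sa ⊆ d can be checked with Python's set comparison operators.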
25 Fixpoint Semantics
- Argumentation
  - game between proponent and opponent
  - argument A is acceptable if the opponent's x-attack is countered by the proponent's y-attack, using an argument the proponent already accepted earlier.
- Acceptable
  - Let x, y be notions of attack.
  - An argument A is x/y-acceptable w.r.t. a set of arguments S iff for every argument B such that $(B,A) \in x$, there is a $C \in S$ such that $(C,B) \in y$
- Fixpoint semantics
  - $F_{x/y}(S) = \{A \mid A \text{ is } x/y\text{-acceptable w.r.t. } S\}$
  - x/y-justified arguments: least fixpoint of $F_{x/y}$.
  - x/y-overruled arguments: x-attacked by a justified argument.
  - x/y-defensible: iff neither justified nor overruled
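For a finite set of arguments, the least fixpoint can be reached by iterating the operator from the empty set. The sketch below assumes arguments are opaque labels and that the attack notions x and y are sets of (attacker, attacked) pairs; all names are illustrative, and this naive bottom-up iteration is not the top-down proof procedure discussed later.

```python
def acceptable(arg, accepted, arguments, x, y):
    """arg is x/y-acceptable w.r.t. accepted: every x-attacker of arg
    is y-attacked by some already accepted argument."""
    return all(
        any((c, b) in y for c in accepted)
        for b in arguments
        if (b, arg) in x
    )


def justified(arguments, x, y):
    """Least fixpoint of F_{x/y}, computed by iteration from the empty set."""
    current = set()
    while True:
        following = {a for a in arguments if acceptable(a, current, arguments, x, y)}
        if following == current:
            return current
        current = following


def classify(arguments, x, y):
    """Split arguments into justified, overruled, and defensible ones."""
    just = justified(arguments, x, y)
    overruled = {a for a in arguments if any((j, a) in x for j in just)}
    defensible = set(arguments) - just - overruled
    return {"justified": just, "overruled": overruled, "defensible": defensible}


# Hypothetical example: A attacks B, B attacks C, D and E attack each other.
arguments = {"A", "B", "C", "D", "E"}
attack = {("A", "B"), ("B", "C"), ("D", "E"), ("E", "D")}
result = classify(arguments, attack, attack)
```

Here A is unattacked and therefore justified, A defends C against B, and the mutually attacking D and E are neither justified nor overruled.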
26 Theorem: Relationship of semantics
- Weakening the opponent or strengthening the proponent increases the set of justified arguments
- Different notions of acceptability give rise to different argumentation semantics
[Figure: hierarchy of x/y semantics; notions joined by "=" yield the same semantics]
- su/a = su/d
- su/u
- su/sa
- sa/u = sa/d = sa/a
- su/su
- u/a = u/d = u/sa (WFSX)
- sa/su = sa/sa
- u/su = u/u
- d/su = d/u = d/a = d/d = d/sa (Prakken and Sartor's semantics w/o priorities)
- a/su = a/u = a/a = a/d = a/sa (Dung's grounded argumentation semantics)
Annotations in the figure:
- If the opponent is allowed to attack, the type of defense does not matter
- If the opponent is allowed to defeat, the type of defense does not matter
- If the opponent is allowed to undercut, defense with (a, d, sa) or without (su, u) rebut makes a difference
27 Proof procedure
- Dialogues
  - An x/y-dialogue is a sequence of moves such that
    - Proponent and Opponent alternate
    - Players cannot repeat arguments
    - Opponent x-attacks Proponent's last argument
    - Proponent y-attacks Opponent's last argument
  - A player wins a dialogue if the other player cannot move
  - Argument A is provably justified if the proponent wins all branches of the dialogue tree with root A
- Concrete implementation: SLXA
  - Since u/a = u/d = u/sa = WFSX, justified arguments can be computed with the top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira]
  - SLXA can be adapted for other notions
28 Part I.b: Fuzzy unification and argumentation
[Figure: knowledge system cube; corners rDB, rFB, dDB, dFB, fDB, fFB, fdDB, fdFB; axes fuzzy, deductive, negation]
- r: relational
- f: fuzzy
- d: deductive
- DB: database
- FB: factbase
29 Classical Fuzzy Logic
- Solution
  - Truth values in $[0,1]$ instead of $\{0,1\}$.
- Assertions
  - $p : V$ (p a formula, V a truth value).
- Conjunction
  - $p : V,\ q : W \Rightarrow p \wedge q : \min(V,W)$
- Disjunction
  - $p : V,\ q : W \Rightarrow p \vee q : \max(V,W)$
- Inference
  - $p \leftarrow q_1, \ldots, q_n;\ \ q_1 : V_1, \ldots, q_n : V_n \Rightarrow p : \min(V_1, \ldots, V_n)$
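The inference rule above can be illustrated with a small bottom-up evaluator. This is a sketch under stated assumptions: rules are propositional with non-empty bodies and no negation, and when several rules derive the same head the maximum is taken (treating alternative rules as a disjunction, which the slide does not state explicitly). The predicate names are invented for the example.

```python
def fuzzy_consequences(facts, rules):
    """Propagate truth values: a rule head gets the minimum of its body values.

    facts: dict mapping atom -> truth value in [0, 1]
    rules: list of (head, [body atoms]) pairs with non-empty bodies
    """
    values = dict(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(atom in values for atom in body):
                value = min(values[atom] for atom in body)
                # Assumption: alternative derivations combine by maximum.
                if value > values.get(head, 0.0):
                    values[head] = value
                    changed = True
    return values


# Hypothetical knowledge base.
facts = {"cheap": 0.7, "reliable": 0.9, "small": 0.4}
rules = [("buy", ["cheap", "reliable"]), ("buy", ["small"])]
values = fuzzy_consequences(facts, rules)
```

In this example `buy` is derived with value 0.7: the first rule yields min(0.7, 0.9) and the second only 0.4.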
30 Fuzzy Negation
- Classical fuzzy negation
  - $L : V \Rightarrow \neg L : 1 - V$ (Zadeh)
- Our setting (fuzzy adaptation of WFSX)
  - $L : V$ and $\neg L : V'$ with $V' \neq 1 - V$ possible
  - $L$ and $\neg L$ not directly related.
31 Fuzzy Coherence Principle
- If $\neg L : V$ with $V > 0$, and $\text{not } L : V'$, then $V' \geq V$.
  - If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.
- If $\neg L : V$ with $V > 0$, then $\text{not } L : 1$.
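32 Law of excluded contradiction and law of excluded middle
- Excluded middle
  - $\text{not } p \vee p : V \Rightarrow V > 0$
  - $\neg p \vee p : V \Rightarrow V = 0$ possible $\Rightarrow$ p is unknown
- Excluded contradiction
  - $\neg p \wedge p : V \Rightarrow V > 0$ possible: contradictory programs!
  - $\text{not } p \wedge p : V \Rightarrow V > 0$ possible, by the coherence principle! $\Rightarrow$ contradiction removal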
33 Strength of an argument
- Strength of an argument
  - Fact: value is given
  - Rule: minimum of body literals
  - Argument: value of its conclusion, i.e. the least fuzzy value of the facts contributing to the argument.
34 Theorems
- Theorem (Soundness and Completeness)
  - There is a justified argument of strength V for L iff there is a successful T-tree of truth value V for L
- Theorem (Conservative Extension)
  - Argumentation semantics is a conservative extension of WFSX.
35 Application: Fuzzy unification
- Open systems
  - knowledge and ontologies may not match
  - interaction with humans
  - Does Mike live in Londn?
- Approach
  - address(mike,london) = address(mike,londn) : 95%
  - adapt the unification algorithm (normalised edit distance over trees, net)
  - embed into the argumentation framework
36 Finding mismatches: Edit distance
- Edit distance between strings A and B
  - minimal number of delete, add, replace operations to convert A into B.
  - efficient implementation with dynamic programming
- Example
  - $e(\text{address}, \text{adresse}) = 2$, $e(\text{007}, \text{aa7}) = 2$
- Normalise
  - $ne(A,B) = e(A,B) / \max\{|A|, |B|\}$
- Trees
  - net: sum of all mismatches divided by sum of all max lengths
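The three distances above can be sketched as follows. The string part is the standard dynamic-programming edit distance; the tree part is an assumption about how "sum of all mismatches divided by sum of all max lengths" is applied: terms are modelled as nested tuples `(functor, arg1, ...)` with strings at the leaves, and only terms of the same shape are handled. Function names are illustrative.

```python
def edit_distance(a, b):
    """Minimal number of delete, add, replace operations turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete from a
                current[j - 1] + 1,                     # add to a
                previous[j - 1] + (char_a != char_b),   # replace or keep
            ))
        previous = current
    return previous[-1]


def normalised_edit_distance(a, b):
    """ne(A, B) = e(A, B) / max(|A|, |B|); 0.0 for two empty strings."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def _mismatches_and_lengths(t1, t2):
    if isinstance(t1, str) and isinstance(t2, str):
        return edit_distance(t1, t2), max(len(t1), len(t2))
    if isinstance(t1, str) or isinstance(t2, str) or len(t1) != len(t2):
        # Assumption: differently shaped terms are outside this sketch.
        raise ValueError("terms must have the same shape")
    mismatches = lengths = 0
    for sub1, sub2 in zip(t1, t2):
        m, n = _mismatches_and_lengths(sub1, sub2)
        mismatches += m
        lengths += n
    return mismatches, lengths


def net(t1, t2):
    """Normalised edit distance over trees (same-shape terms only)."""
    mismatches, lengths = _mismatches_and_lengths(t1, t2)
    return mismatches / lengths if lengths else 0.0


distance = net(("address", "mike", "london"), ("address", "mike", "londn"))
```

For the running example the only mismatch is the missing "o", so `distance` is 1 divided by the summed maximal lengths 7 + 4 + 6 = 17.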
37 Fuzzy unification and arguments
- net is a conservative extension of MGU (most general unifier)
  - net(t,t) ? ne(t,t)
- Adapt the definition of argument for fuzzy unification
  - V-argument: for every literal L in a body, there is a head L' such that $net(L,L') \leq 1 - V$
  - A V-undercuts B if A contains not L and B's head is L' and $net(L,L') \leq 1 - V$
  - A V-rebuts B if A's head is L and B's head is ¬L' and $net(L,L') \leq 1 - V$
  - Adapt previous definitions accordingly
38 Comparison: Argumentation
- Our framework allows us to relate existing and new argumentation semantics
  - Dung: a/su = a/u = a/a = a/d = a/sa
  - Prakken & Sartor: d/su = d/u = d/a = d/d = d/sa
  - WFSX: u/a = u/d = u/sa
  - Dung $\subseteq$ Prakken & Sartor $\subseteq$ WFSX
- Proof theory and top-down proof procedure adapted from Alferes, Damasio, Pereira's SLXA
39 Comparison: Fuzzy argumentation
- Wagner
  - Scale -1 to 1
  - Unlike WFSX, he relates $\neg F$ and $F$: $\neg F : -V$ iff $F : V$
  - We adopted his interpretation for not: $\text{not } F : 1$ if $\neg F : V$, $V > 0$
  - Relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes, Pereira]
- Our approach conservatively extends WFSX, hence we can adapt the proof procedure SLXA
40 Comparison: Fuzzy unification
- Arcelli, Formato, Gerla
  - define an abstract fuzzy unification/resolution framework
  - cannot deal with missing parameters (common problem, Fung et al.)
  - no conservative extension of classical unification
- We use a concrete distance: edit distance
- Evaluated the idea on a bioinformatics DB
41 Conclusion
- A database needs two kinds of negation (Wagner)
- Argumentation is an elegant way of defining semantics
- Our framework allows classification of various new and existing semantics
- Efficient top-down proof procedure for justified arguments
- Argumentation as basis for belief revision (REVISE)
- We cover the whole knowledge system cube, including fuzzy argumentation
- Defined fuzzy unification, which is useful in open systems
42 Part I.c: Vivid Agent
- A vivid agent is a software-controlled system,
  - whose state is represented by a knowledge base and
  - whose behaviour is represented by
    - action rules and
    - reaction rules
- Actions are planned and executed to achieve a goal
- Reactions are triggered by events
  - Epistemic RR: Effect <- Event, Cond
  - Physical RR: Action, Effect <- Event, Cond
  - Interaction RR: Msg, Effect <- Event, Cond
43 Vivid Agent
[Figure: vivid agent architecture with the labels Interface, Events, Reaction Rules, Perception-Reaction Cycle, Beliefs/Updates, KB]
44 Agent State and Transition Semantics
- Agent state
  - Event queue, plan queue, goal queue, knowledge base
- Transition semantics
  - Perception
    - Add event to the agent's event queue
  - Reaction
    - Pop event from event queue, execute reactions including update of knowledge base
  - Plan execution
    - Execute action of plan in plan queue
  - Replanning
    - If action fails, replan
  - Planning
    - Pop goal from goal queue and generate plan
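The transitions listed above can be mirrored in a toy agent class. This is only a sketch of the control flow, assuming epistemic reaction rules given as (event, condition, effect) triples, a knowledge base modelled as a Python set, and a planner passed in as a function; none of these representations come from the original PVM-Prolog or Prova implementations.

```python
from collections import deque


class VividAgent:
    """Toy agent state: event queue, plan queue, goal queue, knowledge base."""

    def __init__(self, reaction_rules, planner):
        self.events = deque()
        self.plans = deque()
        self.goals = deque()
        self.kb = set()
        self.reaction_rules = reaction_rules  # (event, condition, effect) triples
        self.planner = planner                # (goal, kb) -> list of actions

    def perceive(self, event):
        """Perception: add the event to the event queue."""
        self.events.append(event)

    def react(self):
        """Reaction: pop one event and fire all matching reaction rules."""
        if not self.events:
            return
        event = self.events.popleft()
        for trigger, condition, effect in self.reaction_rules:
            if trigger == event and condition(self.kb):
                effect(self.kb)

    def plan(self):
        """Planning: pop a goal and queue a plan for it."""
        if self.goals:
            goal = self.goals.popleft()
            self.plans.append((goal, deque(self.planner(goal, self.kb))))

    def execute(self):
        """Plan execution: run one action; replan if the action fails."""
        if not self.plans:
            return
        goal, actions = self.plans.popleft()
        if not actions:
            return  # plan already completed
        action = actions.popleft()
        if action(self.kb):
            self.plans.appendleft((goal, actions))
        else:
            self.plans.append((goal, deque(self.planner(goal, self.kb))))


# Hypothetical usage: an epistemic reaction rule and a one-step planner.
rules = [("rain", lambda kb: "outside" in kb, lambda kb: kb.add("wet"))]
agent = VividAgent(rules, lambda goal, kb: [lambda kb: kb.add(goal) is None])
agent.kb.add("outside")
agent.perceive("rain")
agent.react()
agent.goals.append("dry")
agent.plan()
agent.execute()
```

Actions return a truth value so that a failing action triggers replanning; a real agent would interleave these four steps in its perception-reaction cycle.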
45 Implementation in Prova
- Original implementation in PVM-Prolog
  - Coarse-grained parallelism (PVM) for each agent and Prolog threads for an agent's components
- Currently: Prova
  - is a Java-based rule engine
  - easy integration of all kinds of data sources, e.g., databases, web services, etc.
46 Part II: Application to Bioinformatics
- An NSF and EU strategic research workshop found that bioinformatics could play the role for the semantic web that physics played for the web.
- Why?
  - Masses of information
  - Masses of publicly accessible online information (e.g. 8000 abstracts per month and over 500 tools)
  - Data (more and more often) published in XML
  - Data standards are accepted and actively developed
  - Much valuable information scattered (as production is cheap and hence not centralised)
  - Systems integration and interoperation are a prime concern (e.g. GeneOntology)
47 Example: Information Agents for
- Protein interactions
  - PDB, SCOP
- Protein annotation
  - TOPPred, HMMTOP, ...
- Information source
- Wrapper
- Mediator
- Facilitator
[Figure: agent architecture with the labels Facilitator, Mediator, Wrappers, Source]
48 Example 1: Protein Interaction
- PDB: Protein structures
- SCOP: Structure classification
49 Example 1: PSIMAP, Structural Interactions
50 Example 1: Protein Interaction, how it is currently done
- PDB: 15 gigabytes in flat files
- SCOP: 3 flat files
- How?
  - Download PDB, SCOP files
  - Think up DB schema and populate MySQL DB
  - Run some Perl scripts on various machines that grind through the data and analyse it
  - Run some Java to visualise results
- Problem: Business logic not separated
51 How our Prova system can execute it
- Declarative and executable specifications
  Interaction(Superfamily1, Superfamily2) if
    PDB(Protein),
    Domain(Protein, Domain1),
    Domain(Protein, Domain2),
    SCOP Superfamily(Domain1, Superfamily1),
    SCOP Superfamily(Domain2, Superfamily2),
    InteractionDD(Domain1, Domain2, 5 Ang, 5 Residues)
  - The data might be held locally in a file, remotely in a DB, behind a web service, on the grid, etc.
- Separation of information integration workflow
  - Easier to maintain
- Platform independence, because of Java
- Flexible, optimized execution
  - Query optimization and load-balancing of computations
  - Local or remote computation.
52 Actual Prova Code
% ACTUAL PROVA CODE
% Given the open database connection DB
% and a unique protein identifier in Protein
% Data Bank PDB_ID, test whether the provided
% domains with IDs PXA and PXB interact
% (have at least 5 atoms within 5 angstroms)
scop_dom2dom(DB,PDB_ID,PXA,PXB) :-
    access_data(pdb,PDB_ID,Protein),
    scop_dom_atoms(DB,Protein,PXA,DomainA),
    scop_dom_atoms(DB,Protein,PXB,DomainB),
    DomainA.interacts(DomainB).
53 Caching
% Two alternative rules for either retrieving data
% from the cache or accessing the data from its
% original location and caching it.
access_data(Type,ID,Data,CacheData) :-
    % Attempt to retrieve the data
    Data=CacheData.get(ID),
    % Success, Data (whatever object it is) is returned
    !.
access_data(Type,ID,Data,CacheData) :-
    % Retrieve the data from its location and update the cache
    retrieve_data_general(Type,ID,Data),
    update_cache(Type,ID,Data,CacheData).
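The two-rule caching pattern (try the cache first, otherwise retrieve and store) can be restated outside Prova. The following Python sketch is an analogy only: the cache is assumed to be a plain dictionary and `retrieve` stands in for `retrieve_data_general`; it is not a translation of the Prova engine's behaviour.

```python
def access_data(kind, ident, cache, retrieve):
    """Return the data for (kind, ident), consulting the cache first."""
    key = (kind, ident)
    try:
        # First alternative: the data is already cached.
        return cache[key]
    except KeyError:
        # Second alternative: fetch from the original location, then cache.
        data = retrieve(kind, ident)
        cache[key] = data
        return data


# Hypothetical usage with a retrieval function that records its calls.
calls = []

def fake_retrieve(kind, ident):
    calls.append((kind, ident))
    return f"{kind}:{ident}"

cache = {}
first = access_data("pdb", "1abc", cache, fake_retrieve)
second = access_data("pdb", "1abc", cache, fake_retrieve)
```

The second call is served from the cache, so `fake_retrieve` runs only once; this plays the role of the cut (`!`) that stops Prova from trying the second rule after a cache hit.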
54 Example 2: GoPubmed
55 Consistency of GO
- Simple example
  - Parsimony: if A is-a C is explicitly stated in the ontology, it should not also be derivable implicitly
  - I.e. don't state A is-a C if you already have A is-a B and B is-a C
- Done with Prova
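The parsimony check amounts to finding explicit is-a edges that are already implied by a longer path. The slides state that this was done with Prova; the sketch below is an independent Python illustration, assuming the ontology is given as (child, parent) pairs, with invented term names.

```python
def ancestors(node, parents):
    """All terms reachable from node via one or more is-a edges."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def redundant_isa_edges(edges):
    """Explicit edges (A, C) that also follow from A is-a B and B is-a ... C."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
    redundant = set()
    for child, parent in edges:
        for other in parents[child] - {parent}:
            if parent in ancestors(other, parents):
                redundant.add((child, parent))
                break
    return redundant


# Hypothetical ontology fragment: A is-a C is implied by A is-a B, B is-a C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
violations = redundant_isa_edges(edges)
```

On this fragment only the edge ("A", "C") is reported.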
56 Towards functional annotation through GoPubmed
Protein Name/Enzyme activity hydrolase kinase transferase lyase isomerase one other
Pyruvate kinase M1 isozyme X X X X X oxidoreductase
cAMP-dependent protein kinase type II regulatory chain X X X X cyclase
Galactokinase X X X X X
Tropomyosin beta chain X X X X
HnRNP DO X X X X helicase
57 Example 3: Consistent Integration of Protein Annotation
58 Conflicts
59 Example: Edit2TrEMBL
- EditToTrEMBL (Steffen Möller, EBI): automates annotation of DNA sequences by combining results of various tools and databases that are online
[Figure: Edit2TrEMBL architecture with the labels Dispatcher, Analyser, Host, Info object]
60 Challenge
- Uncertain, incomplete, vague, contradictory information
- Wrappers' domains overlap: how can the mediator resolve conflicts?
- How can the mediator integrate information consistently?
- How can the mediator improve info quality using overlapping info and inconsistencies?
- Mediator contains a conflict resolution component
- Semantic conflict resolution requires domain knowledge to identify conflicts
- We use extended logic programming
[Figure: agent architecture with the labels Facilitator, Mediator, Wrappers, Source]
Common problem: Overlapping information can lead to inconsistencies
Solution: Semantic consistency checking
61 Modelling domain knowledge
- Facts, Rules, Assumptions, Integrity Constraints. For example:
- The length of transmembrane regions is limited
  false if ft(AccNo,transmembrane,From,To), To-From > 25
  false if ft(AccNo,transmembrane,From,To), To-From < 15
- Maximal difference in membrane borders
  false if ft(Agent1,Acc,transmembrane,From1,To1),
    ft(Agent2,Acc,transmembrane,From2,To2),
    (From1>From2,From1<To2 ; To1>From2,To1<To2),
    (abs(From2-From1)>4 ; abs(To2-To1)>4).
- Assessment of predictions
  probability(ft(tmhmm,p12345,transmem,6,26), 0.5)
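The two integrity constraints can be checked procedurally as well. The sketch below assumes each feature is a tuple (agent, accession, kind, start, end) and reads the garbled disjunctions in the border rule as "or"; the tmhmm prediction is the one from the slide, while the hmmtop and toppred predictions are invented for illustration.

```python
from itertools import combinations


def length_violations(features, shortest=15, longest=25):
    """Transmembrane features whose length To-From is below 15 or above 25."""
    return [
        f for f in features
        if f[2] == "transmembrane" and not shortest <= f[4] - f[3] <= longest
    ]


def _border_rule(f1, f2, tolerance):
    """One direction of the 'maximal difference in membrane borders' rule."""
    _, acc1, kind1, from1, to1 = f1
    _, acc2, kind2, from2, to2 = f2
    if acc1 != acc2 or kind1 != "transmembrane" or kind2 != "transmembrane":
        return False
    overlapping = from2 < from1 < to2 or from2 < to1 < to2
    differing = abs(from2 - from1) > tolerance or abs(to2 - to1) > tolerance
    return overlapping and differing


def border_conflicts(features, tolerance=4):
    """Pairs of overlapping predictions whose borders differ by more than 4."""
    return [
        (f1, f2)
        for f1, f2 in combinations(features, 2)
        if _border_rule(f1, f2, tolerance) or _border_rule(f2, f1, tolerance)
    ]


features = [
    ("tmhmm", "p12345", "transmembrane", 6, 26),     # from the slide
    ("hmmtop", "p12345", "transmembrane", 12, 30),   # hypothetical
    ("toppred", "p12345", "transmembrane", 40, 70),  # hypothetical
]
too_long_or_short = length_violations(features)
conflicts = border_conflicts(features)
```

Here the toppred region violates the length constraint (70 - 40 = 30), and the tmhmm and hmmtop regions overlap with start positions that differ by 6, so they form a border conflict.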
62 REVISE
- REVISE detects conflicting arguments and computes a minimal set of assumptions whose removal resolves the conflict
- Dropping these assumptions yields a minimal consistent annotation of all predictions
- Minimality is based on probabilities given as part of the predictions
  - alternatives: cardinality, set-inclusion
63 Vision: A semantic Grid for Bioinformatics
64 Conclusion
- Advanced applications on the web will require rules and reasoning
- Part I
  - Argumentation is an elegant way of defining semantics
  - Classification of various new and existing semantics
  - Fuzzy reasoning and unification
  - Reactivity with vivid agents and Prova
- Part II
  - Bioinformatics requires a semantic web and the semantic web requires bioinformatics
65 Acknowledgment
- Ralf Schweimeier (Argumentation semantics)
- Panos Dafas, Dan Bolser (PSIMAP)
- Steffen Moeller (Edit2Trembl)
- David Gilbert (Fuzzy Unification)
- Ralph Delfs, Alexander Kozlenkov (Go, Prova)
- Carlos Damasio (REVISE)
- More information at comas.soi.city.ac.uk
- Email ms_at_mpi-cbg.de