Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics - PowerPoint PPT Presentation

Slides: 66
Provided by: Michael3067


1
Reasoning on the Web: Theory, Challenges, and
Applications in Bioinformatics
2
Contents
  • Motivation
  • Beyond the web: Rules, Reasoning, Semantics,
    Ontologies
  • Semantics of Deduction Rules
  • Argumentation Semantics
  • Fuzzy Reasoning
  • Reaction rules
  • Vivid Agents
  • Prova
  • Applications in Bioinformatics

3
The Web
  • A great success story, but
  • it's the web for humans, not for machines
  • Many areas, such as biology, have fully embraced
    the web
  • The Human Genome Project is only the tip of the iceberg
  • More than 500 tools and databases online

4
Example: PubMed
  • > 12,000,000 literature abstracts
  • Great resource if one knows what one is looking
    for
  • A search for Kox1 returns 17 hits
  • But a search for diabetes returns > 200,000
  • Abstracts often need to be processed automatically

5
Results of PubMed
  • Lorenz P, Transcriptional repression mediated by
    the KRAB domain of the human C2H2 zinc finger
    protein Kox1/ZNF10 does not require histone
    deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
  • Fredericks WJ. An engineered PAX3-KRAB
    transcriptional repressor inhibits the malignant
    phenotype of alveolar rhabdomyosarcoma cells
    harboring the endogenous PAX3-FKHR oncogene. Mol
    Cell Biol. 2000 Jul;20(14):5019-31. ...

However, to a machine things look different!
6
Results of PubMed

Solution: tag data (XML)
7
Results of PubMed
  • <author>Lorenz P</author><title>Transcriptional
    repression mediated by the KRAB domain of the
    human C2H2 zinc finger protein Kox1/ZNF10 does
    not require histone deacetylation.
    </title><journal>Biol Chem</journal><year>2001</year>
  • ...

However, to a machine things look different!
8
Results of PubMed

Solution: use ontologies (Semantic Web)
9
GeneOntology
  • Biologists have recognised the problem of
    semantic inter-operability between disparate
    information sources
  • GeneOntology (GO) is an effort to provide a common
    vocabulary for molecular biology
  • GO has > 10,000 terms in three branches:
    function, process, localisation

10
GeneOntology
  • Has 13 levels
  • Width broadens to level 6 (3885 terms wide), then
    shrinks
  • Number of leaves per level broadens to level 6
    (1223 leaves), then shrinks
  • Average term has 4 words
  • Maximal term has 29 words

Oxidoreductase activity, acting on paired donors,
with incorporation or reduction of molecular
oxygen, 2-oxoglutarate as one donor, and
incorporation of one atom each of oxygen into
both donors
Breadth of GO
11
Motivation: Summary
  • Web in the old days: HTML (for humans)
  • Web these days: HTML; XML, ontologies (for
    machines)
  • Web of the future: HTML; XML, ontologies; rules,
    reasoning, semantics; access to computational
    resources (à la grid computing)

12
Open Problems
  • Part I: Theory of rules and reasoning on the web
  • Knowledge representation: which level of
    expressiveness?
  • Semantics: how to guarantee inter-operability?
  • Reasoning: fuzzy reasoning and unification
  • Reactivity: vivid agents
  • Part II: Applications of rules and reasoning on
    the web
  • Integration and querying of information sources
  • Integration of transmembrane prediction tools
  • Integration of a protein structure DB and a
    structure classification
  • Consistency checking
  • Ontology: if A is-a B and B is-a C, then the
    ontology should not explicitly state A is-a C, as
    it is already implicit
  • Annotation: do different tools agree or
    disagree?

13
The wider picture: www.RuleML.org
  • Goal: develop a Web language for rules
  • using XML markup,
  • formal semantics, and
  • efficient implementations.
  • Rules: derivation rules, transformation rules,
    and reaction rules.
  • RuleML can thus specify queries and inferences in
    Web ontologies, mappings between Web ontologies,
    and dynamic Web behaviors of workflows, services,
    and agents.
  • Currently some 30 international members, in
    close collaboration with the W3C

14
The wider picture: REWERSE
  • Reasoning on the Web with Rules and Semantics
  • FP6 Network of Excellence with nearly 30 partners
  • Working groups on infrastructure and applications:
  • Composition
  • Typing
  • Policies
  • Querying
  • Reactivity and evolution
  • Personalised Web sites
  • Calendar systems
  • Bioinformatics

15
Part I: Theory
  • Motivation: expressive knowledge representation
  • Part I.a: Argumentation as LP semantics
  • Notions of attack and justified arguments
  • Hierarchy of semantics
  • Proof procedure
  • Part I.b: Fuzzy unification and argumentation
  • Fuzzy negation
  • Fuzzy argumentation
  • Fuzzy unification
  • Part I.c: Vivid Agents

16
Part I.a: A Hierarchy of Semantics
  • RuleML caters for different degrees of knowledge
    representation
  • A hierarchy of semantics is required to guarantee
    inter-operation.
  • Analogy: in HTML, <b>Michael</b> will be
    rendered differently in Netscape and in the
    text-based browser Lynx.
  • Problem: how can we guarantee inter-operability
    between different interpretations of rules?

17
Knowledge representation
  • Pete earns 500,000 p.a.
  • earns(pete,500000).
  • Cross the street if there are no cars
  • $cross \leftarrow \mathit{not}\ car$
  • $cross \leftarrow \neg car$
  • The fridge is quite cheap
  • cheap(fridge): 70%
  • Does Mike live in Londn?
  • address(mike,london) vs. address(mike,londn): 95%

18
Knowledge System Cube
  • r relational
  • f fuzzy
  • d deductive
  • DB database
  • FB factbase

19
Part I.a: Argumentation as semantics for
Extended Logic Programs
[Figure: knowledge system cube with the vertices rFB, rDB, fFB, fDB, dFB, dDB, fdFB, and fdDB, and the axes fuzzy, deductive, and negation]
20
Extended Logic Programming
  • Logic programming with 2 negations
  • Default negation: $\mathit{not}\ p$ is true if all
    attempts to prove $p$ fail.
  • Explicit negation: $\neg p$; the falsehood of a
    literal may be stated explicitly.
  • Coherence principle: $\neg p \Rightarrow \mathit{not}\ p$

21
Argumentation
  • Interaction between agents in order to
  • gain knowledge
  • revise existing knowledge
  • convince the opponent
  • solve conflicts
  • Elegant way to define semantics for (extended)
    logic programming
  • Dung
  • Kowalski, Toni, Sadri
  • Prakken, Sartor
  • Etc.

22
Arguments
  • An argument is a partial proof, with implicitly
    negated literals as assumptions.
  • Argument: a sequence of rules

23
Attacking arguments
  • Two fundamental kinds of attack
  • A undercuts B: A invalidates a premise of B
  • P: Let's go to the lake, as it is not snowing
    anymore
  • O: Hang on, it is snowing
  • A rebuts B: A contradicts B
  • P: Let's go to the lake, as it is not snowing
  • O: Let's not, as I've got to prepare my talk
  • Derived notions of attack used in the literature
  • A attacks B: A u B or A r B
  • A defeats B: A u B or (A r B and not B u A)
  • A strongly attacks B: A a B and not B u A
  • A strongly undercuts B: A u B and not B u A

24
Proposition: Hierarchy of attacks
  • Attacks: $a = u \cup r$
  • Defeats: $d = u \cup (r - u^{-1})$
  • Undercuts: $u$
  • Strongly attacks: $sa = (u \cup r) - u^{-1}$
  • Strongly undercuts: $su = u - u^{-1}$
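The proposition can be checked mechanically on small examples. The sketch below assumes an argumentation framework is given as finite sets of (attacker, target) pairs for undercuts `u` and rebuts `r`; the function names `inverse` and `derived_attacks` are illustrative only and not part of any existing library.

```python
def inverse(rel):
    """Return the inverse relation {(B, A) | (A, B) in rel}."""
    return {(b, a) for (a, b) in rel}


def derived_attacks(u, r):
    """Derive the notions of attack from undercuts u and rebuts r."""
    u_inv = inverse(u)
    return {
        "a": u | r,             # attacks
        "d": u | (r - u_inv),   # defeats
        "u": set(u),            # undercuts
        "sa": (u | r) - u_inv,  # strongly attacks
        "su": u - u_inv,        # strongly undercuts
    }


# A and B undercut each other, C undercuts D, A and C rebut each other.
u = {("A", "B"), ("B", "A"), ("C", "D")}
r = {("A", "C"), ("C", "A")}
rels = derived_attacks(u, r)
print(sorted(rels["su"]))  # [('C', 'D')]
print(sorted(rels["sa"]))  # [('A', 'C'), ('C', 'A'), ('C', 'D')]
```

The mutual undercut between A and B disappears from `sa` and `su`, as the pointwise definitions require.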
25
Fixpoint Semantics
  • Argumentation
  • game between proponent and opponent
  • argument A is acceptable if every x-attack of the
    opponent is countered by a y-attack of the
    proponent, using an argument the proponent has
    already accepted.
  • Acceptability
  • Let x, y be notions of attack.
  • An argument A is x/y-acceptable w.r.t. a set of
    arguments S iff
  • for every argument B such that $(B,A) \in x$, there
    is a $C \in S$ such that $(C,B) \in y$
  • Fixpoint semantics
  • $F_{x/y}(S) = \{A \mid A \text{ is } x/y\text{-acceptable w.r.t. } S\}$
  • x/y-justified arguments: the least fixpoint of $F_{x/y}$
  • x/y-overruled arguments: those x-attacked by a
    justified argument
  • x/y-defensible: neither justified nor overruled
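For a finite set of arguments, the least fixpoint can be reached by iterating $F_{x/y}$ from the empty set, because $F_{x/y}$ is monotone. The following is a minimal sketch under that assumption; the attack relations are sets of (attacker, target) pairs, and the helper names are hypothetical.

```python
def attackers(target, rel):
    """All arguments that attack `target` according to relation `rel`."""
    return {a for (a, b) in rel if b == target}


def acceptable(arg, s, x, y):
    """arg is x/y-acceptable w.r.t. s: every x-attacker is y-attacked from s."""
    return all(attackers(b, y) & s for b in attackers(arg, x))


def justified(args, x, y):
    """Least fixpoint of F_{x/y}, computed by iteration from the empty set."""
    s = set()
    while True:
        nxt = {a for a in args if acceptable(a, s, x, y)}
        if nxt == s:
            return s
        s = nxt


def classify(args, x, y):
    """Split args into justified, overruled and defensible arguments."""
    just = justified(args, x, y)
    overruled = {b for (a, b) in x if a in just}
    defensible = set(args) - just - overruled
    return just, overruled, defensible


# C attacks B, B attacks A; D and E attack each other.
att = {("B", "A"), ("C", "B"), ("D", "E"), ("E", "D")}
just, over, dfs = classify("ABCDE", att, att)
print(sorted(just), sorted(over), sorted(dfs))  # ['A', 'C'] ['B'] ['D', 'E']
```

Here C is unattacked, so it is justified first; it then defends A against B. The mutually attacking D and E stay defensible.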

26
Theorem: Relationship of semantics
  • Weakening the opponent or strengthening the
    proponent increases the set of justified arguments
  • Different notions of acceptability give rise to
    different argumentation semantics
  • If the opponent is allowed to attack, the type of
    defense does not matter: a/su = a/u = a/a = a/d =
    a/sa (Dung's grounded argumentation semantics)
  • If the opponent is allowed to defeat, the type of
    defense does not matter: d/su = d/u = d/a = d/d =
    d/sa (Prakken and Sartor's semantics without
    priorities)
  • If the opponent is allowed to undercut, defense
    with rebuts (a, d, sa) or without rebuts (su, u)
    makes a difference: u/a = u/d = u/sa (WFSX) versus
    u/su = u/u
  • Further semantics in the hierarchy: su/a = su/d,
    su/u, su/sa, su/su, sa/u = sa/d = sa/a,
    sa/su = sa/sa
27
Proof procedure
  • Dialogues
  • An x/y-dialogue is a sequence of moves such that
  • Proponent and Opponent alternate
  • Players cannot repeat arguments
  • Opponent x-attacks Proponent's last argument
  • Proponent y-attacks Opponent's last argument
  • A player wins the dialogue if the other player
    cannot move
  • Argument A is provably justified if the proponent
    wins all branches of the dialogue tree with root A
  • Concrete implementation: SLXA
  • Since u/a = u/d = u/sa = WFSX, we can compute
    justified arguments with the top-down proof
    procedure SLXA for WFSX [Alferes, Damasio, Pereira]
  • SLXA can be adapted for other notions
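The dialogue game can be sketched as a recursive search: every opponent move must be answerable by some proponent reply that again wins. This is a hypothetical illustration, not SLXA. It assumes "players cannot repeat arguments" means neither player reuses an argument it has already played in the current branch, and that attacks are sets of (attacker, target) pairs.

```python
def attackers(target, rel):
    """All arguments that attack `target` according to relation `rel`."""
    return {a for (a, b) in rel if b == target}


def proponent_wins(last, x, y, pro_used, opp_used):
    """Proponent has just played `last`; does it win every branch from here?"""
    for b in attackers(last, x):          # each possible opponent x-attack
        if b in opp_used:
            continue                      # the opponent may not repeat arguments
        replies = [c for c in attackers(b, y) if c not in pro_used]
        if not any(proponent_wins(c, x, y, pro_used | {c}, opp_used | {b})
                   for c in replies):
            return False                  # this opponent move cannot be answered
    return True                           # every opponent move has been answered


def provably_justified(arg, x, y):
    """arg is provably justified if the proponent wins all dialogue branches."""
    return proponent_wins(arg, x, y, frozenset({arg}), frozenset())


# C attacks B, B attacks A; D and E attack each other.
att = {("B", "A"), ("C", "B"), ("D", "E"), ("E", "D")}
print([a for a in "ABCDE" if provably_justified(a, att, att)])  # ['A', 'C']
```

On this small example the dialogue game agrees with the fixpoint characterisation of justified arguments.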

28
Part I.b: Fuzzy unification and argumentation
[Figure: knowledge system cube with the vertices rFB, rDB, fFB, fDB, dFB, dDB, fdFB, and fdDB, and the axes fuzzy, deductive, and negation]
29
Classical Fuzzy Logic
  • Solution
  • Truth values in $[0,1]$ instead of $\{0,1\}$.
  • Assertions
  • $p : V$ ($p$ a formula, $V$ a truth value).
  • Conjunction
  • $p : V,\ q : W \vdash p \land q : \min(V,W)$
  • Disjunction
  • $p : V,\ q : W \vdash p \lor q : \max(V,W)$
  • Inference
  • $p \leftarrow q_1, \ldots, q_n;\ q_1 : V_1, \ldots, q_n : V_n \vdash p : \min(V_1, \ldots, V_n)$
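The min/max rules above can be turned into a naive bottom-up evaluator for definite fuzzy rules. This is a sketch under one extra assumption not stated on the slide: when several rules derive the same head, the maximum of their values is kept. The names `f_and`, `f_or` and `infer` are hypothetical.

```python
def f_and(*values):
    """Fuzzy conjunction: minimum of the truth values."""
    return min(values)


def f_or(*values):
    """Fuzzy disjunction: maximum of the truth values."""
    return max(values)


def infer(facts, rules):
    """Naive bottom-up evaluation of definite fuzzy rules.

    facts: dict mapping an atom to its truth value in [0, 1]
    rules: list of (head, body) pairs, body being a list of atoms
    Assumption: if several rules derive the same head, the maximum is kept.
    """
    values = dict(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(b in values for b in body):
                # rule value = minimum of the body values (1.0 for an empty body)
                v = min((values[b] for b in body), default=1.0)
                if head not in values or v > values[head]:
                    values[head] = v
                    changed = True
    return values


facts = {"q1": 0.8, "q2": 0.6}
rules = [("p", ["q1", "q2"]), ("s", ["p"])]
print(infer(facts, rules))  # {'q1': 0.8, 'q2': 0.6, 'p': 0.6, 's': 0.6}
```

Values only ever increase and are drawn from a finite set, so the loop terminates.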

30
Fuzzy Negation
  • Classical fuzzy negation
  • $L : V \Rightarrow \neg L : 1-V$ (Zadeh)
  • Our setting (fuzzy adaptation of WFSX)
  • $L : V$ and $\neg L : V'$ with $V' \neq 1-V$ possible
  • $L$ and $\neg L$ are not directly related.

31
Fuzzy Coherence Principle
  • If $\neg L : V$ with $V > 0$, and $\mathit{not}\ L : V'$,
  • then $V' \geq V$.
  • If there is some explicit evidence that L is
    false, then there is at least the same evidence
    that L is false by default.
  • If $\neg L : V$ and $V > 0$,
  • then $\mathit{not}\ L : 1$.

32
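Law of excluded contradiction and law of excluded middle
  • Excluded middle, default negation:
    $\mathit{not}\ p \lor p : V \Rightarrow V > 0$
  • Excluded middle, explicit negation:
    $\neg p \lor p : V \Rightarrow V = 0$ possible, i.e. $p$ is unknown
  • Excluded contradiction, explicit negation:
    $\neg p \land p : V \Rightarrow V > 0$ possible: contradictory programs!
  • Excluded contradiction, default negation:
    $\mathit{not}\ p \land p : V \Rightarrow V > 0$ possible, by the
    coherence principle! Hence contradiction removal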

33
Strength of an argument
  • Strength of an argument
  • Fact: the value is given
  • Rule: minimum of the body literals
  • Argument conclusion: the least fuzzy value of the
    facts contributing to the argument.

34
Theorems
  • Theorem (Soundness and Completeness)
  • There is a justified argument of strength V for
    L
  • iff
  • There is a successful T-tree of truth value V
    for L
  • Theorem (Conservative Extension)
  • Argumentation semantics is a conservative
    extension of WFSX.

35
Application: Fuzzy unification
  • Open systems
  • knowledge and ontologies may not match
  • interaction with humans
  • Does Mike live in Londn?
  • Approach
  • address(mike,london) vs. address(mike,londn): 95%
  • adapt the unification algorithm (normalised edit
    distance over trees, net)
  • embed into argumentation framework

36
Finding Mismatches: Edit Distance
  • Edit distance between strings A and B:
  • minimal number of delete, add, replace operations
    to convert A into B.
  • efficient implementation with dynamic programming
  • Example
  • e(address, adresse) = 2, e(007, aa7) = 2
  • Normalise
  • $ne(A,B) = e(A,B) / \max(|A|, |B|)$
  • Trees
  • net: the sum of all mismatches divided by the sum
    of all maximal lengths
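The distances above can be sketched with the standard dynamic-programming (Levenshtein) algorithm. For `net`, the sketch represents a term as a nested tuple `(functor, arg1, ...)` with string atoms; how missing arguments and atom-versus-compound mismatches are scored (counted as fully mismatched) is an assumption made here, not taken from the slides, and all names are hypothetical.

```python
from itertools import zip_longest


def edit_distance(a, b):
    """Levenshtein distance: minimal number of delete, add and replace operations."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # add cb
                           prev[j - 1] + (ca != cb)))   # replace (free if equal)
        prev = cur
    return prev[-1]


def ne(a, b):
    """Normalised edit distance e(A,B) / max(|A|, |B|)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def size(t):
    """Total number of characters in the atoms of a term."""
    return len(t) if isinstance(t, str) else sum(size(s) for s in t)


def _mismatches(s, t):
    """Return (sum of mismatches, sum of max lengths) for two terms."""
    if isinstance(s, str) and isinstance(t, str):
        return edit_distance(s, t), max(len(s), len(t))
    if isinstance(s, tuple) and isinstance(t, tuple):
        m, n = _mismatches(s[0], t[0])                  # compare functors
        for a, b in zip_longest(s[1:], t[1:]):
            if a is None or b is None:                  # missing argument:
                k = size(a if b is None else b)         # count it as fully
                m, n = m + k, n + k                     # mismatched
            else:
                dm, dn = _mismatches(a, b)
                m, n = m + dm, n + dn
        return m, n
    k = max(size(s), size(t))                           # atom vs compound term
    return k, k


def net(s, t):
    """Normalised edit distance over trees."""
    m, n = _mismatches(s, t)
    return m / n if n else 0.0


print(edit_distance("address", "adresse"), edit_distance("007", "aa7"))  # 2 2
print(net(("address", "mike", "london"), ("address", "mike", "londn")))  # 1/17
```

Identical ground terms get distance 0, mirroring the fact that they unify classically.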

37
Fuzzy unification and arguments
  • net is a conservative extension of the MGU (most
    general unifier)
  • net(t,t) ? ne(t,t)
  • Adapt the definition of argument for fuzzy
    unification
  • V-argument: for every L in a body, there is an L'
    in a head such that $net(L,L') \leq 1-V$
  • A V-undercuts B if A contains $\mathit{not}\ L$, B's
    head is $L'$, and $net(L,L') \leq 1-V$
  • A V-rebuts B if A's head is $L$, B's head is
    $\neg L'$, and $net(L,L') \leq 1-V$
  • Adapt previous definitions accordingly

38
Comparison: Argumentation
  • Our framework allows us to relate existing and
    new argumentation semantics
  • Dung: a/su = a/u = a/a = a/d = a/sa
  • Prakken and Sartor: d/su = d/u = d/a = d/d = d/sa
  • WFSX: u/a = u/d = u/sa
  • Dung $\subseteq$ Prakken and Sartor $\subseteq$ WFSX
  • Proof theory and top-down proof procedure adapted
    from Alferes, Damasio, Pereira's SLXA

39
Comparison: Fuzzy Argumentation
  • Wagner
  • Scale from -1 to 1
  • Unlike WFSX, he relates $\neg F$ and $F$:
    $\neg F : -V$ iff $F : V$
  • We adopted his interpretation for not:
    $\mathit{not}\ F : 1$ if $\neg F : V$, $V > 0$
  • Relates his work to stable models, but there is
    no top-down proof procedure for stable models
    [Alferes, Pereira]
  • Our approach conservatively extends WFSX, hence
    we can adapt the proof procedure SLXA

40
Comparison: Fuzzy unification
  • Arcelli, Formato, Gerla
  • define an abstract fuzzy unification/resolution
    framework
  • cannot deal with missing parameters (a common
    problem; Fung et al.)
  • no conservative extension of classical
    unification
  • we use a concrete distance: edit distance
  • Evaluated the idea on a bioinformatics DB

41
Conclusion
  • A database needs two kinds of negation (Wagner)
  • Argumentation is an elegant way of defining
    semantics
  • Our framework allows classification of various
    new and existing semantics
  • Efficient top-down proof procedure for justified
    arguments
  • Argumentation as basis for belief revision
    (REVISE)
  • We cover the whole knowledge system cube
    including fuzzy argumentation
  • Defined fuzzy unification, which is useful in
    open systems

42
Part I.c: Vivid Agent
  • A vivid agent is a software-controlled system,
  • whose state is represented by a knowledge base
    and
  • whose behaviour is represented by
  • action- and
  • reaction rules
  • Actions are planned and executed to achieve a
    goal
  • Reactions are triggered by events
  • Epistemic RR: Effect <- Event, Cond
  • Physical RR: Action, Effect <- Event, Cond
  • Interaction RR: Msg, Effect <- Event, Cond

43
Vivid Agent
[Figure: vivid agent with interface, events, reaction rules, perception-reaction cycle, and beliefs/updates of the KB]
44
Agent State and Transition Semantics
  • Agent state: event queue, plan queue, goal queue,
    knowledge base
  • Transition semantics
  • Perception: add an event to the agent's event
    queue
  • Reaction: pop an event from the event queue and
    execute the reactions, including updates of the
    knowledge base
  • Plan execution: execute an action of a plan in the
    plan queue
  • Replanning: if an action fails, replan
  • Planning: pop a goal from the goal queue and
    generate a plan
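The agent state and its transitions can be sketched as a small class with three queues and a knowledge base. This is a hypothetical illustration, not the PVM-Prolog or Prova implementation: it assumes the KB is a set of facts, reaction rules are functions returning facts to add (only epistemic reactions), the planner maps a goal to a list of actions, and an action reports success as a boolean. The priority order in `step` is also an assumption.

```python
from collections import deque


class VividAgent:
    """Agent state: event queue, plan queue, goal queue and knowledge base."""

    def __init__(self, reaction_rules, planner):
        self.events, self.plans, self.goals = deque(), deque(), deque()
        self.kb = set()
        self.reaction_rules = reaction_rules  # (event, kb) -> set of facts to add
        self.planner = planner                # (goal, kb) -> list of actions

    def perceive(self, event):
        """Perception: add the event to the event queue."""
        self.events.append(event)

    def react(self):
        """Reaction: pop an event and apply all reaction rules to the KB."""
        event = self.events.popleft()
        for rule in self.reaction_rules:
            self.kb |= rule(event, self.kb)

    def plan(self):
        """Planning: pop a goal and generate a plan for it."""
        goal = self.goals.popleft()
        steps = deque(self.planner(goal, self.kb))
        if steps:
            self.plans.append((goal, steps))

    def execute(self):
        """Plan execution; on failure the goal is re-queued for replanning."""
        goal, steps = self.plans[0]
        action = steps.popleft()
        if not action(self.kb):
            self.plans.popleft()
            self.goals.appendleft(goal)   # replanning
        elif not steps:
            self.plans.popleft()          # plan completed

    def step(self):
        """One transition; events are handled before plans and goals."""
        if self.events:
            self.react()
        elif self.plans:
            self.execute()
        elif self.goals:
            self.plan()


def hot_when_above_30(event, kb):
    kind, value = event
    return {("hot",)} if kind == "temperature" and value > 30 else set()


def open_window_plan(goal, kb):
    def open_window(kb):
        kb.add(goal)
        return True
    return [open_window]


agent = VividAgent([hot_when_above_30], open_window_plan)
agent.perceive(("temperature", 35))
agent.goals.append(("window_open",))
for _ in range(3):
    agent.step()
print(sorted(agent.kb))  # [('hot',), ('window_open',)]
```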

45
Implementation in Prova
  • Original implementation in PVM-Prolog
  • Coarse-grain parallelism (PVM) for each agent and
    Prolog threads for an agent's components
  • Currently Prova
  • is a Java-based rule engine
  • easy integration of all kinds of data sources,
    e.g., databases, web services, etc.

46
Part II: Application to Bioinformatics
  • The NSF/EU strategic research workshop found
    that bioinformatics could play the role for the
    semantic web that physics played for the web.
  • Why?
  • Masses of information
  • Masses of publicly accessible online information
  • (e.g. 8000 abstracts per month and over 500
    tools)
  • Data (more and more often) published in XML
  • Data standards are accepted and actively
    developed
  • Much valuable information scattered (as
    production cheap and hence not centralised)
  • Systems integration and interoperation are a prime
    concern (e.g. GeneOntology)

47
Example: Information Agents for
  • Protein interactions: PDB, SCOP
  • Protein annotation: TOPPred, HMMTOP, ...
  • Information source, wrapper, mediator,
    facilitator

[Figure: architecture with a facilitator, a mediator, three wrappers, and a source]
48
Example 1: Protein Interaction
  • PDB: protein structures
  • SCOP: structure classification

49
Example 1: PSIMAP Structural Interactions
50
Example 1: Protein Interaction, How It Is
Currently Done
  • PDB: 15 gigabytes in flat files
  • SCOP: 3 flat files
  • How?
  • Download the PDB and SCOP files
  • Think up a DB schema and populate a MySQL DB
  • Run some Perl scripts on various machines that
    grind through the data and analyse it
  • Run some Java to visualise the results
  • Problem: business logic is not separated

51
How our Prova system can execute
Data might be held locally in a file, remotely in a
DB, behind a web service, on the grid, etc.
  • Declarative and executable specifications
  • Interaction(Superfamily1, Superfamily2) if
  • PDB(Protein),
  • Domain(Protein,Domain1),
  • Domain(Protein,Domain2),
  • SCOP Superfamily(Domain1, Superfamily1),
  • SCOP Superfamily(Domain2, Superfamily2),
  • InteractionDD(Domain1,Domain2, 5 Ang, 5 Residues)
  • Separation of information integration workflow
  • Easier to maintain
  • Platform independence, because of Java
  • Flexible, optimized execution
  • Query optimization and load-balancing of
    computations

Local or remote computation.
52
Actual Prova Code
    % Given the open database connection DB
    % and a unique protein identifier in Protein
    % Data Bank PDB_ID, test whether the provided
    % domains with IDs PXA and PXB interact
    % (have at least 5 atoms within 5 angstroms)
    scop_dom2dom(DB,PDB_ID,PXA,PXB) :-
        access_data(pdb,PDB_ID,Protein),
        scop_dom_atoms(DB,Protein,PXA,DomainA),
        scop_dom_atoms(DB,Protein,PXB,DomainB),
        DomainA.interacts(DomainB).
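To make the interaction test concrete, here is a hypothetical Python analogue of `DomainA.interacts(DomainB)` together with the superfamily rule from the previous slide. It assumes domains are lists of (x, y, z) atom coordinates in angstroms and reads "at least 5 atoms within 5 angstroms" as at least five atom pairs at distance of at most 5 Å; the PDB identifier `1abc` and the domain and superfamily names are made up for illustration.

```python
import math
from itertools import combinations


def interacts(domain_a, domain_b, cutoff=5.0, min_contacts=5):
    """True if at least `min_contacts` atom pairs lie within `cutoff` angstroms."""
    contacts = 0
    for atom_a in domain_a:
        for atom_b in domain_b:
            if math.dist(atom_a, atom_b) <= cutoff:
                contacts += 1
                if contacts >= min_contacts:
                    return True
    return False


def superfamily_interactions(domains_of, superfamily_of, atoms_of):
    """Pairs of SCOP superfamilies whose domains interact within one protein.

    domains_of: protein id -> list of domain ids
    superfamily_of: domain id -> superfamily id
    atoms_of: domain id -> list of atom coordinates
    """
    pairs = set()
    for domains in domains_of.values():
        for d1, d2 in combinations(domains, 2):
            if interacts(atoms_of[d1], atoms_of[d2]):
                pairs.add(tuple(sorted((superfamily_of[d1], superfamily_of[d2]))))
    return pairs


atoms = {
    "d1": [(0.0, 0.0, float(i)) for i in range(5)],
    "d2": [(1.0, 0.0, float(i)) for i in range(5)],
    "d3": [(100.0, 0.0, float(i)) for i in range(5)],
}
sf = {"d1": "SF_A", "d2": "SF_B", "d3": "SF_C"}
print(superfamily_interactions({"1abc": ["d1", "d2", "d3"]}, sf, atoms))  # {('SF_A', 'SF_B')}
```

The early return mirrors the declarative reading: the test succeeds as soon as enough contacts are found.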

53
Caching
    % Two alternative rules for either retrieving data
    % from the cache or accessing the data from its
    % original location and caching it.
    access_data(Type,ID,Data,CacheData) :-
        % Attempt to retrieve the data
        Data=CacheData.get(ID),
        % Success, Data (whatever object it is) is returned
        !.
    access_data(Type,ID,Data,CacheData) :-
        % Retrieve the data from its location and update the cache
        retrieve_data_general(Type,ID,Data),
        update_cache(Type,ID,Data,CacheData).
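The same two-alternative pattern (try the cache, otherwise fetch and store) can be sketched in Python as a closure. Here `retrieve_data_general` is a stand-in argument for whatever fetches the data from its original location, and keying the cache by `(type, id)` rather than by `ID` alone is an assumption of this sketch.

```python
def make_access_data(retrieve_data_general):
    """Return an access_data function with its own in-memory cache."""
    cache = {}

    def access_data(type_, id_):
        key = (type_, id_)
        if key in cache:                          # first rule: already cached
            return cache[key]
        data = retrieve_data_general(type_, id_)  # second rule: fetch the data ...
        cache[key] = data                         # ... and update the cache
        return data

    return access_data


calls = []


def fetch(type_, id_):
    calls.append((type_, id_))
    return f"{type_}:{id_}"


access_data = make_access_data(fetch)
access_data("pdb", "1abc")
access_data("pdb", "1abc")
print(calls)  # [('pdb', '1abc')]
```

The early `return` plays the role of the cut: once the cache lookup succeeds, the second alternative is never tried.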

54
Example 2: GoPubmed
55
Consistency of GO
  • Simple example
  • Parsimony: if A is-a C is explicitly stated in
    the ontology, it should not be possible to derive
    it implicitly
  • I.e., don't state A is-a C if you already have A
    is-a B and B is-a C
  • Done with Prova
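The parsimony check amounts to finding explicit is-a edges that are also reachable through another parent. A minimal Python sketch follows; the ontology is assumed to be a set of (child, parent) pairs, and the term names are placeholders rather than real GO identifiers.

```python
from collections import defaultdict


def redundant_is_a(edges):
    """Return explicit is-a edges (A, C) that are also derivable via another parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def reachable(start, target):
        # depth-first search upwards along is-a edges
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return {(a, c) for (a, c) in edges
            if any(reachable(b, c) for b in parents[a] if b != c)}


edges = {("A", "B"), ("B", "C"), ("A", "C"), ("D", "C")}
print(redundant_is_a(edges))  # {('A', 'C')}
```

The `seen` set keeps the search terminating even if the ontology accidentally contains a cycle.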

56
Towards functional annotation through GoPubmed
Protein Name/Enzyme activity hydrolase  kinase transferase lyase isomerase one other
Pyruvate kinase M1 isozyme X X X X X oxidoreductase
cAMP-dependent protein kinase type II regulatory chain X X X X cyclase
Galactokinase X X X X X
Tropomyosin beta chain X X X X
HnRNP DO X X X X helicase
57
Example 3: Consistent Integration of Protein
Annotation
58
Conflicts
59

Example: Edit2TrEMBL
  • EditToTrEMBL (Steffen Möller, EBI) automates the
    annotation of DNA sequences by combining the
    results of various online tools and databases

[Figure: EditToTrEMBL architecture with a dispatcher, analysers, info objects, and several hosts]
60
Challenge
  • Uncertain, incomplete, vague, contradictory
    information
  • Wrappers' domains overlap: how can the mediator
    resolve conflicts?
  • How can the mediator integrate information
    consistently?
  • How can the mediator improve information quality
    using overlapping information and inconsistencies?
  • The mediator contains a conflict resolution
    component
  • Semantic conflict resolution requires domain
    knowledge to identify conflicts
  • We use extended logic programming

[Figure: architecture with a facilitator, a mediator, three wrappers, and a source]
Common problem: overlapping information can lead
to inconsistencies
Solution: semantic consistency checking
61
Modelling domain knowledge
  • Facts, rules, assumptions, integrity
    constraints. For example:
  • The length of transmembrane regions is limited:
    false if ft(AccNo,transmembrane,From,To), To-From > 25
    false if ft(AccNo,transmembrane,From,To), To-From < 15
  • Maximal difference in membrane borders:
    false if ft(Agent1,Acc,transmembrane,From1,To1),
    ft(Agent2,Acc,transmembrane,From2,To2),
    (From1>From2, From1<To2 ; To1>From2, To1<To2),
    (abs(From2-From1)>4 ; abs(To2-To1)>4).
  • Assessment of predictions:
    probability(ft(tmhmm,p12345,transmem,6,26), 0.5)
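The two integrity constraints can be read as checks that report violating predictions. The Python sketch below is one possible rendering under stated assumptions: each prediction is a hypothetical `Feature` record mirroring `ft(Agent, Acc, Kind, From, To)`, the thresholds (15, 25, 4) are taken from the constraints above, and the example predictions are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A predicted sequence feature, mirroring ft(Agent, Acc, Kind, From, To)."""
    agent: str
    acc: str
    kind: str
    start: int  # From
    end: int    # To


def length_violations(features, min_len=15, max_len=25):
    """Transmembrane regions with To - From < 15 or To - From > 25."""
    return [f for f in features
            if f.kind == "transmembrane"
            and not (min_len <= f.end - f.start <= max_len)]


def border_conflicts(features, tolerance=4):
    """Overlapping transmembrane predictions for the same protein whose
    borders differ by more than `tolerance` residues."""
    tms = [f for f in features if f.kind == "transmembrane"]
    conflicts = []
    for f1 in tms:
        for f2 in tms:
            overlap = (f2.start < f1.start < f2.end) or (f2.start < f1.end < f2.end)
            far = (abs(f2.start - f1.start) > tolerance
                   or abs(f2.end - f1.end) > tolerance)
            if f1.acc == f2.acc and overlap and far:
                conflicts.append((f1, f2))
    return conflicts


preds = [
    Feature("tmhmm", "p12345", "transmembrane", 6, 26),
    Feature("hmmtop", "p12345", "transmembrane", 12, 30),
    Feature("hmmtop", "p12345", "transmembrane", 40, 50),
]
print(len(length_violations(preds)), len(border_conflicts(preds)))  # 1 2
```

The overlapping TMHMM and HMMTOP segments conflict in both directions, and the 10-residue segment violates the length constraint.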

62
REVISE
  • REVISE detects conflicting arguments and computes
    a minimal set of assumptions whose removal
    resolves the conflict
  • Dropping these assumptions yields a minimal
    consistent annotation of all predictions
  • Minimality is based on the probabilities given as
    part of the predictions
  • Alternatives: cardinality, set inclusion
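The idea of dropping a cheapest set of assumptions can be illustrated by brute force. This is not REVISE's algorithm. It assumes conflicts are given as sets of assumptions that cannot all hold together, and that probability-based minimality means minimising the summed probabilities of the dropped assumptions (a hypothetical choice); passing `cost=len` gives cardinality instead. Set-inclusion minimality is not covered, and the assumption names are made up.

```python
from itertools import combinations


def minimal_revision(assumptions, conflicts, cost=None):
    """Cheapest set of assumptions whose removal breaks every conflict.

    assumptions: dict mapping an assumption name to its probability
    conflicts: list of sets of assumptions that cannot all hold together
    cost: function of the dropped set; defaults to the sum of probabilities
    """
    if cost is None:
        def cost(dropped):
            return sum(assumptions[a] for a in dropped)
    best, best_cost = None, float("inf")
    names = sorted(assumptions)
    for k in range(len(names) + 1):          # brute force: exponential in len(names)
        for dropped in map(set, combinations(names, k)):
            if all(conflict & dropped for conflict in conflicts):
                c = cost(dropped)
                if c < best_cost:
                    best, best_cost = dropped, c
    return best


probs = {"tmhmm_6_26": 0.3, "hmmtop_12_30": 0.9, "toppred_7_27": 0.4}
conflicts = [{"tmhmm_6_26", "hmmtop_12_30"}, {"hmmtop_12_30", "toppred_7_27"}]
print(sorted(minimal_revision(probs, conflicts)))            # ['tmhmm_6_26', 'toppred_7_27']
print(sorted(minimal_revision(probs, conflicts, cost=len)))  # ['hmmtop_12_30']
```

The two criteria disagree here: by probability it is cheaper to drop the two weak predictions, by cardinality it is cheaper to drop the single strong one.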

63
Vision: A Semantic Grid for Bioinformatics
64
Conclusion
  • Advanced applications on the web will require
    rules and reasoning
  • Part I
  • Argumentation is an elegant way of defining
    semantics
  • Classification of various new and existing
    semantics
  • Fuzzy reasoning and unification
  • Reactivity with vivid agents and Prova
  • Part II
  • Bioinformatics requires a semantic web and the
    semantic web requires bioinformatics

65
Acknowledgment
  • Ralf Schweimeier (Argumentation semantics)
  • Panos Dafas, Dan Bolser (PSIMAP)
  • Steffen Moeller (Edit2Trembl)
  • David Gilbert (Fuzzy Unification)
  • Ralph Delfs, Alexander Kozlenkov (Go, Prova)
  • Carlos Damasio (REVISE)
  • More information at comas.soi.city.ac.uk
  • Email ms_at_mpi-cbg.de