BioMAS: A Multi-Agent System for Automated Genomic Annotation presentation

Title: BioMAS: A Multi-Agent System for Automated Genomic Annotation

1
BioMAS A Multi-Agent System for Automated
Genomic Annotation

Keith Decker
Department of Computer and Information
Sciences, University of Delaware

Salim Khan, Ravi Makkena, Gang Situ (Computer and
Information Sciences)
Dr. Carl Schmidt, Heebal Kim (Animal and Food
Sciences)
2
Outline

General class of problems and MAS solution
approach
BioMAS: Automated Genomic Annotation
HVDB: HerpesVirus Database
ChickDB: Gallus gallus Database
GOFigure!
CoPrDom
Signal Transduction Pathway Discovery

3
What problems are we addressing?

Huge, dynamic Primary Source Databases
Highly distributed, overlapping
Heterogeneous content, structure, curation
Multitude of analysis algorithms
Different interfaces, output formats
Create contingent process plans chaining many
analyses together
Individual PIs, working on non-model organisms
Learn, then hand-navigate sea of DBs and analysis
tools
Easily overwhelmed by new sequence and EST data
Struggle to make results available usefully to
others

4
Approach: Multi-Agent Information Gathering

Software agents for information retrieval,
filtering, integration, analysis, and display
Embody heterogeneous database technology
(wrappers, mediators, ...)
Deal with dynamic data and changing data sources
Efficient and robust distributed computation (for
both info retrieval and analysis)
Deal with issues of data organization and
ownership
Natural approach to providing integrated
information
To humans via web
To other agents via semantic markup XML/OIL/DAML

5
Example Multi-Agent System for Automated
Herpesvirus Annotation

Input: raw sequence data
Output: an annotated database that allows fairly
complex queries
BLAST homologs
Motifs
Protein domains: ProDom records
PSORT sub-cellular location predictions
GO (Gene Ontology) electronic annotation
Show me all the genes in Marek's Disease virus
with a tyrosine phosphorylation motif and a
transmembrane domain value 2

6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
How does this help?

Automates collection of information from various
primary source databases
If the information changes, it can be updated
automatically, and the PI can be notified.
Allows various analyses to be done automatically
Can encode complex (contingent) sequences of info
retrieval and linked analyses, report interesting
results only
New data sources, annotation, analyses can be
applied as they are developed, automatically
(open system)
Results can be made available on the internet to
others, or kept as private data
Much more sophisticated queries than keyword
search
Dynamic menu of keys
Concept hierarchies (ontology) allow more
concise queries
Query planning (e.g., time, resource usage)
Can search across multiple databases (i.e., from
other researchers)
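The point about concept hierarchies allowing more concise queries can be illustrated with a small sketch: a query for a general term also returns genes annotated only to more specific terms beneath it. This is a minimal sketch, not BioMAS code; the three-term "is_a" excerpt is simplified, and the gene names, `IS_A`, `ANNOTATIONS`, `descendants`, and `genes_with` are all hypothetical.

```python
from collections import defaultdict

# Simplified, hypothetical is_a excerpt: child term -> parent terms.
IS_A = {
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": [],
}

# Hypothetical direct annotations: gene -> terms assigned to it.
ANNOTATIONS = {
    "geneA": {"protein tyrosine kinase activity"},
    "geneB": {"protein kinase activity"},
    "geneC": {"kinase activity"},
    "geneD": set(),
}

def descendants(term: str) -> set[str]:
    """Return `term` plus every term that reaches it through is_a links."""
    children = defaultdict(list)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(children[t])
    return found

def genes_with(term: str) -> set[str]:
    """Genes annotated to `term` or to any more specific term below it."""
    wanted = descendants(term)
    return {gene for gene, terms in ANNOTATIONS.items() if terms & wanted}

print(sorted(genes_with("protein kinase activity")))  # ['geneA', 'geneB']
```

A single query on the general term replaces an explicit list of every more specific term, which is what makes ontology-aware queries shorter than keyword search.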

10
How does it work?
RETSINA-style Multi-Agent Organization
11
DECAF: A multi-agent system toolkit

Focus on programming agents, not designing
internal architecture
Programming at the multi-agent level
Value-added architecture
Support for persistent, flexible, robust actions

12
DECAF

Focus on programming agents, not designing
internal architecture
Avoiding the API approach
DECAF as an agent operating system: programmers
have strictly limited access
Communication, planning, scheduling,
coordination, execution
Graphical dataflow plan editor

13
DECAF

Programming at the multi-agent level
Standardized, domain-independent, reusable
middle agents
Agent Name Server (white pages)
Matchmaker (yellow pages/directory service)
Brokers (managers)
Information extraction (learning: STALKER;
knowledge base: PARKA)
Proxy (web interfaces)
Agent Management Agent (debugging, demos,
external control)
Note: heterogeneous architectures are OK!
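The difference between the white-pages Agent Name Server and the yellow-pages Matchmaker can be sketched in a few lines. This is an in-process illustration under stated assumptions, not the DECAF implementation: real DECAF agents exchange KQML/FIPA messages rather than calling methods, and the class names, agent names, hosts, ports, and capability strings below are hypothetical.

```python
from collections import defaultdict

class AgentNameServer:
    """White pages: agent name -> network address."""
    def __init__(self):
        self._addresses = {}

    def register(self, name, address):
        self._addresses[name] = address

    def unregister(self, name):
        self._addresses.pop(name, None)

    def lookup(self, name):
        return self._addresses.get(name)

class Matchmaker:
    """Yellow pages: advertised capability -> names of agents offering it."""
    def __init__(self):
        self._providers = defaultdict(set)

    def advertise(self, name, capability):
        self._providers[capability].add(name)

    def unadvertise(self, name):
        for providers in self._providers.values():
            providers.discard(name)

    def recommend(self, capability):
        return sorted(self._providers.get(capability, set()))

ans, mm = AgentNameServer(), Matchmaker()
ans.register("blast-agent", ("host-a", 5001))
mm.advertise("blast-agent", "blast-search")
ans.register("psort-agent", ("host-b", 5002))
mm.advertise("psort-agent", "location-prediction")

# A requester first asks "who can do this?", then "where is that agent?".
for name in mm.recommend("blast-search"):
    print(name, ans.lookup(name))  # blast-agent ('host-a', 5001)
```

Separating the two lookups means an agent can move to a new address without re-advertising its capabilities, and several agents can offer the same capability.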

14
DECAF

Value-added architecture
Taking care of details (social/individual)
ANS registration/deregistration (eventually MM)
Standard behaviors (AMA, error, FIPA, libraries)
Message dispatching (ontology, conversation)
Coordination (GPGP)
Efficient use of computational resources
Highly threaded internally (domain actions run concurrently)
Memory efficient (ran systems for weeks, hundreds
of thousands of messages)

15
DECAF

Support for persistent, flexible, robust actions
HTN-style programming
Task alternatives and contingencies
RETSINA-style dataflow
Provisions/Parameters determine task activation
Multiple outcomes, Loops
TÆMS-style task network annotations
Dynamic overall utility: quality, cost, duration
task characteristics
Explicit representation of non-local tasks
Example: time/quality tradeoff
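The time/quality tradeoff can be made concrete with a small sketch. It assumes a task whose alternative methods are combined with a max-style quality accumulation function (the task's quality is that of the best method executed), and it simply picks the highest-quality alternative that finishes by a deadline, breaking ties on cost. The `Method` class, method names, and all numbers are hypothetical; a real TÆMS scheduler reasons over whole task networks, not a single choice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Method:
    name: str
    quality: float
    cost: float
    duration: float

def best_alternative(methods, deadline):
    """Highest-quality method that finishes by `deadline`; ties go to the cheaper one."""
    feasible = [m for m in methods if m.duration <= deadline]
    if not feasible:
        return None
    return max(feasible, key=lambda m: (m.quality, -m.cost))

alternatives = [
    Method("full-search", quality=10.0, cost=5.0, duration=60.0),
    Method("quick-search", quality=6.0, cost=1.0, duration=10.0),
]

print(best_alternative(alternatives, deadline=30.0).name)   # quick-search
print(best_alternative(alternatives, deadline=120.0).name)  # full-search
```

With a tight deadline the agent settles for the faster, lower-quality method; with more time it switches to the slower, higher-quality one.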

16
DECAF Architecture (diagram components)
Inputs: incoming KQML/FIPA messages; plan file
Modules: Agent Initialization, Dispatcher, Planner,
Scheduler, Executor (concurrent)
Queues: Incoming Message Queue, Objectives Queue,
Task Queue, Agenda Queue, Pending Action Queue,
Action Results Queue
Stores: Task Templates Hash Table; Domain Facts
and Beliefs
Output: outgoing KQML/FIPA messages
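The transcript lists the architecture's components but not the arrows between them. The sketch below assumes the commonly described DECAF flow (incoming messages to the Dispatcher, objectives to the Planner, tasks to the Scheduler, agenda items to the Executor) and runs each stage once, in order, on plain deques. Real DECAF runs these stages as concurrent threads, executes actions concurrently, and uses a pending-action queue and a belief store that are omitted here; the template, action names, and message fields are hypothetical.

```python
from collections import deque

# Hypothetical task template table: objective -> ordered primitive actions.
TASK_TEMPLATES = {"annotate": ["run_blast", "run_psort", "store_results"]}

# Hypothetical primitive actions: name -> function(argument) -> result.
ACTIONS = {
    "run_blast": lambda seq: f"blast hits for {seq}",
    "run_psort": lambda seq: f"location prediction for {seq}",
    "store_results": lambda seq: f"stored {seq}",
}

incoming, objectives, tasks, agenda, results = (deque() for _ in range(5))

def dispatcher():
    # Turn each incoming message into a new objective.
    while incoming:
        msg = incoming.popleft()
        objectives.append((msg["content"], msg["arg"]))

def planner():
    # Expand each objective into tasks using the task template table.
    while objectives:
        goal, arg = objectives.popleft()
        for action in TASK_TEMPLATES[goal]:
            tasks.append((action, arg))

def scheduler():
    # Trivial FIFO scheduler; a real one would weigh deadlines and utility.
    while tasks:
        agenda.append(tasks.popleft())

def executor():
    # Run each scheduled action and record its result.
    while agenda:
        action, arg = agenda.popleft()
        results.append(ACTIONS[action](arg))

incoming.append({"performative": "achieve", "content": "annotate", "arg": "seq42"})
for stage in (dispatcher, planner, scheduler, executor):
    stage()
print(list(results))
```

Keeping every stage behind a queue is what lets the real system run the stages in separate threads without the domain programmer writing any synchronization code.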
17
Plan Editor
18
Expanding the Genomic Annotation System
19
Functional Annotation Suborganization
Gene Ontology Consortium (www.geneontology.org)
Biological Process, Molecular Function,
Cellular Component
20
(No Transcript)
21
(No Transcript)
22
Co-present Domain Networks (CoPrDom)

Proteins can be viewed as conserved sets of
domains
Vertex: domain; edge: co-present in some
protein; edge weight: number of proteins the
two domains are co-present in
Network constructed from InterPro domain markup
of proteins in 10 species (among them human,
Drosophila, C. elegans, S. cerevisiae)
Functional characterization via InterPro-to-GO
mapping
Network constructed per organism per functional
group, e.g., apoptosis regulation in human
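The network definition above (vertices are domains, edge weight is the number of proteins containing both domains) can be sketched directly. The protein-to-domain table here is hypothetical, with placeholder domain IDs rather than real InterPro accessions; building a network per organism and functional group would mean filtering this table to the relevant proteins before calling the function.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: protein -> domains it contains (placeholder IDs).
PROTEIN_DOMAINS = {
    "prot1": {"D1", "D2", "D3"},
    "prot2": {"D1", "D2"},
    "prot3": {"D2", "D4"},
    "prot4": {"D5"},
}

def copresence_network(protein_domains):
    """Vertices are domains; edge (a, b) weight = number of proteins containing both."""
    vertices = set().union(*protein_domains.values())
    edges = Counter()
    for domains in protein_domains.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(domains), 2):
            edges[(a, b)] += 1
    return vertices, edges

vertices, edges = copresence_network(PROTEIN_DOMAINS)
print(edges[("D1", "D2")])  # 2 (prot1 and prot2)
```

Single-domain proteins such as `prot4` contribute a vertex but no edges.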

23
(No Transcript)
24
Uses for CoPrDom

Functional characterization of unknown domains
Identification of core domains/groups in a
functional group
Tracking domain evolution through species
evolution
Predicting protein-protein interaction by
identifying evolutionary merging of domain groups
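The slides do not say how core domains are identified. One simple, commonly used heuristic is to rank domains by weighted degree (the sum of their edge weights), shown below as an assumption rather than as the CoPrDom method. The edge weights are hypothetical.

```python
from collections import Counter

def weighted_degree(edges):
    """Sum of incident edge weights for each domain in a co-presence network."""
    strength = Counter()
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return strength

# Hypothetical edge weights: (domain, domain) -> number of proteins sharing both.
edges = {("D1", "D2"): 2, ("D1", "D3"): 1, ("D2", "D3"): 1, ("D2", "D4"): 1}

ranking = weighted_degree(edges).most_common()
print(ranking[0])  # ('D2', 4)
```

A domain that co-occurs with many partners in many proteins rises to the top; more refined analyses might use clustering or centrality measures instead.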

25
Biological Pathway Discovery thru AI Planning
Techniques

AI planning is a computational method for
constructing complex plans of action from a
representation of the initial state, the actions
that manipulate states, and the goal states to be
achieved.
Initial states: the initial state representation
of objects in the "plan world"
Actions: logical descriptions of preconditions
and effects
Goals: the end states desired
HTN (Hierarchical Task Network) planning proceeds
by task decomposition of networks, and a
successful plan is one that satisfies a task network.
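The three ingredients above (initial state, actions with preconditions and effects, goals) are enough for a basic planner. The sketch below is plain breadth-first forward state-space search, not HTN planning and not O-Plan; states are sets of facts, and the fact and action names form a heavily simplified, hypothetical fragment of the RTK-MAPK activation of Ras.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional, Set

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts made true
    delete: FrozenSet[str]  # facts made false

def plan(initial: Set[str], goal: Set[str], actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest action sequence or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None

# Hypothetical, simplified facts and actions (illustrative only).
ACTIONS = [
    Action("bind EGF to receptor", frozenset({"EGF free", "receptor free"}),
           frozenset({"receptor active"}), frozenset({"EGF free", "receptor free"})),
    Action("recruit GRB2", frozenset({"receptor active"}), frozenset({"GRB2 bound"}), frozenset()),
    Action("recruit SOS", frozenset({"GRB2 bound"}), frozenset({"SOS bound"}), frozenset()),
    Action("activate Ras", frozenset({"SOS bound"}), frozenset({"Ras active"}), frozenset()),
]

print(plan({"EGF free", "receptor free"}, {"Ras active"}, ACTIONS))
```

The printed plan is the four actions in causal order; removing `"receptor free"` from the initial state makes the goal unreachable and the function returns `None`.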

26
Uses of the Signal Transduction Planner

To produce computer interpretable plans capturing
relevant qualitative information regarding signal
transduction pathways.
To produce testable hypotheses regarding gaps in
knowledge of the pathway, and drive future signal
transduction research in an ordered manner.
To identify key nodes, where many pathways are
regulated through a single functional protein
serving as a critical checkpoint.
To perform in silico experiments of hyper-
expression and deletion mutations.
To enable pathway visualization tools by
providing human- and machine-readable pathway
descriptions.
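In silico deletion and checkpoint detection can be illustrated on a plain directed graph rather than on full plans: delete one node and check whether a signal can still get from source to target. The chain below is a simplified EGF-to-ERK cascade; node "X" is a made-up bypass of Raf added only to show how an alternative route removes a checkpoint. Hyper-expression is not modeled here.

```python
from collections import deque

# Simplified signalling graph (edge = "activates"); "X" is hypothetical.
PATHWAY = {
    "EGF": ["EGFR"],
    "EGFR": ["GRB2"],
    "GRB2": ["SOS"],
    "SOS": ["Ras"],
    "Ras": ["Raf", "X"],
    "Raf": ["MEK"],
    "X": ["MEK"],
    "MEK": ["ERK"],
    "ERK": [],
}

def reaches(graph, source, target, deleted=frozenset()):
    """True if a signal can travel from source to target avoiding deleted nodes."""
    if source in deleted or target in deleted:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in deleted and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def checkpoints(graph, source, target):
    """Nodes whose single deletion blocks every path from source to target."""
    return [n for n in graph
            if n not in (source, target)
            and not reaches(graph, source, target, frozenset({n}))]

print(checkpoints(PATHWAY, "EGF", "ERK"))  # ['EGFR', 'GRB2', 'SOS', 'Ras', 'MEK']
```

Deleting Raf alone leaves the signal intact through X, so Raf is not reported as a checkpoint; deleting both Raf and X blocks it.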

27
Advantages of Planning

Operator schemas: abstracted axiomatic definitions
of sub-cellular processes, understandable to both
humans and computers
Task abstraction: decomposition of a complex task
into simpler, interchangeable actions
Reduces search space and conflicts
Modeling of pathways at different levels of
biochemical detail
Search conducted in plan space: most planners
perform bi-directional search (vs. Pathway Tools,
Prolog implementations, etc.)
Partial-order planning: a succinct representation
of multiple pathways helps identify key causal
relationships
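Task abstraction with interchangeable decompositions can be sketched as a tiny HTN-style decomposer: compound tasks have alternative methods (ordered subtask lists), and decomposition backtracks to the next method when a subtask cannot be achieved. This is a simplification under stated assumptions: real HTN planners check preconditions against a world state, while here a set of "available" primitives stands in for that. All task names, including "direct-activation", are hypothetical.

```python
# Hypothetical methods: compound task -> alternative ordered subtask lists.
METHODS = {
    "transduce-signal": [["activate-receptor", "relay-to-ras"]],
    "relay-to-ras": [["recruit-adaptor", "activate-gef"], ["direct-activation"]],
}
PRIMITIVE = {"activate-receptor", "recruit-adaptor", "activate-gef", "direct-activation"}

def decompose(task, available):
    """Primitive action sequence for `task`, trying methods in order; None if none works."""
    if task in PRIMITIVE:
        return [task] if task in available else None
    for subtasks in METHODS.get(task, []):
        steps = []
        for sub in subtasks:
            part = decompose(sub, available)
            if part is None:
                break  # this method fails; try the next alternative
            steps.extend(part)
        else:
            return steps
    return None

print(decompose("transduce-signal", PRIMITIVE))
print(decompose("transduce-signal", PRIMITIVE - {"recruit-adaptor"}))
```

The first call uses the adaptor route; the second falls back to the alternative method for the same abstract task, which is the interchangeability the slide refers to.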

28
Advantages of Planning (contd.)

Conditional effects: can be used to model special
cases ("exceptions") when applying operator
schemas
Resource utilization: can be used to model
quantitative aspects such as amplification of a
signal, feedback and feed-forward loops
Plan re-use: old plans can be inserted into new
ones (if initial and final conditions are met)
without additional computation

29
(ontologically driven) Operator Schema Example
Transport

(:action transport
  :parameters (?mol - macromolecule
               ?compfrom ?compto - compartment)
  :precondition (and (in ?mol ?compfrom)
                     (open ?compfrom ?compto))
  :effect (and (in ?mol ?compto)
               (not (in ?mol ?compfrom))))
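A planner uses a schema like `transport` by grounding it: binding the typed parameters to concrete objects, keeping only bindings whose precondition holds, and applying the add and delete effects. The sketch below does this by hand for one schema; the object names (`mol1`, `cytoplasm`, `nucleus`) and the initial facts are hypothetical.

```python
from itertools import permutations

# Hypothetical typed objects and initial facts.
MACROMOLECULES = ["mol1"]
COMPARTMENTS = ["cytoplasm", "nucleus"]
state = {("in", "mol1", "cytoplasm"), ("open", "cytoplasm", "nucleus")}

def transport(state, mol, comp_from, comp_to):
    """Apply the transport schema if its precondition holds; otherwise return None."""
    pre = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not pre <= state:
        return None
    # Effect: delete (in ?mol ?compfrom), add (in ?mol ?compto).
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}

def applicable_groundings(state):
    """All (mol, from, to) bindings whose precondition holds in `state`."""
    return [(m, a, b) for m in MACROMOLECULES
            for a, b in permutations(COMPARTMENTS, 2)
            if transport(state, m, a, b) is not None]

print(applicable_groundings(state))  # [('mol1', 'cytoplasm', 'nucleus')]
new_state = transport(state, "mol1", "cytoplasm", "nucleus")
print(("in", "mol1", "nucleus") in new_state)  # True
```

Because the schema is written over types rather than specific molecules, the same definition covers every macromolecule and compartment pair declared in the ontology.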

30
RTK-MAPK pathway
Activation of Ras following binding of a hormone
(e.g., EGF) to a receptor
31
RTK-MAPK pathway step: O-Plan output
Phosphorylation of GRB2 at domain SH2 by the RTK
receptor
32
Summary

Bioinformatics has many features amenable to
multi-agent information gathering approach
BioMAS: automated analysis, from EST processing to
functional annotation ontologies
DECAF / RETSINA / TÆMS
GOFigure! and electronic GO annotation
CoPrDom: Co-Present Domain analysis
Signal Transduction Pathway Discovery

33
BioMAS Future Work

Sophisticated queries are possible, but how do we
make them available to biologists?
Show me all glycoproteins in Marek's Disease
virus with a tyrosine phosphorylation motif and a
transmembrane domain value 2 that are expressed
in feather follicles
Robustness, efficiency, scale, data
materialization issues
Automating and integrating more complex analysis
processes (using existing software!)
Estimating physical location of genes by synteny
Integrate new data sources
Microarray and other gene expression data
And thus more analyses: QTL mapping, metabolic
pathway learning
New off-site organism databases and analysis
agents

http://udgenome.ags.udel.edu/
http://www.cis.udel.edu/decaf/