WP6 Part 1: Bioinformatics

1
WP6 Part 1: Bioinformatics
Presenters: Xueping Quan, Marco Schorlemmer, Dave Robertson
  • First results passed peer review
  • Working on more extensive proteomics knowledge
    sharing
  • Library of existing services collated
  • Library of LCC experiment protocols underway

2
OK From an Experimenter's Viewpoint
  • Interaction model = Experiment design
  • Experimental roles allocated to peers
  • Constraints prescribe methods on peers
  • Message passing synchronises tasks
  • Formal model gives
      • Automation, extending experiment repertoire
      • Repeatability, because we preserve state
      • Scrutiny, for reviewers

3
P2P Proteomics
  • The proteome is the protein equivalent of the genome
  • Proteomics studies the quantitative changes
    occurring in a proteome and their application to
      • disease diagnostics
      • therapy
      • drug development

4
Peer-to-Peer Experimentation in Protein Structure
Prediction: an Architecture, Experiment and
Initial Results
5
Experiment - Consistency Checking
  • Taking a non-expert user's perspective
  • Applied Bioinformatics - Whom to believe?
  • Note: This scenario needs to allow for passive
    peers, to incorporate knowledge from the large
    number of traditional bioinformatics resources
    (databases etc.)

Comparison of server results for
consistency typically increases confidence in the
result.
6
Experiment - Consistency Checking
Step 1: One proxy per service allows data retrieval
from passive peers. Each query is related to the
appropriate service.
[Diagram: a query (input: keyword, ID, sequence, etc.) passes through proxies (wrappers) and interfaces (WSDL, etc.) to an application, database or web server, which returns data relating to the input.]
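The proxy-per-service idea in Step 1 can be sketched as follows. This is a minimal illustration, not the wrappers used in the project: the `ServiceProxy` class, the `route_query` helper, the in-memory `_FAKE_DB` and its single record are all hypothetical stand-ins for a real passive peer.

```python
from typing import Callable, Dict, List


class ServiceProxy:
    """Wraps one passive peer (database, web server, application)
    behind a uniform query method."""

    def __init__(self, name: str, fetch: Callable[[str], List[dict]]):
        self.name = name
        self._fetch = fetch  # hypothetical service-specific retrieval function

    def query(self, identifier: str) -> List[dict]:
        # Tag every record with its source so provenance survives collation.
        return [dict(record, source=self.name) for record in self._fetch(identifier)]


def route_query(proxies: Dict[str, ServiceProxy], service: str, identifier: str) -> List[dict]:
    """Send the query to the proxy registered for the requested service."""
    if service not in proxies:
        raise KeyError(f"no proxy registered for service {service!r}")
    return proxies[service].query(identifier)


# Stand-in for a real passive peer: an in-memory table keyed by an ID.
# The record below is invented purely for illustration.
_FAKE_DB = {"YAL001C": [{"model_id": "m1", "residues": (10, 120)}]}

proxies = {"fake_db": ServiceProxy("fake_db", lambda i: _FAKE_DB.get(i, []))}
hits = route_query(proxies, "fake_db", "YAL001C")
```

Keeping the source name on every record is a design choice made here so that the later collation step can attribute each answer to a peer.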
7
Experiment - Consistency Checking
Step 2: Automated harvesting of results for
targets, and collation to allow easy comparison
of answers. The scientist logs a local opinion on
the relative quality of other (passive) peers for
each target and caches the most important
positive and/or negative results.
[Diagram: polling multiple sites; local database of trusted results with provenance]
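Step 2 could look roughly like the sketch below, which polls several sites for one target, stores each answer with its originating site, and lets the scientist log an opinion per site. The table layout, the function names and the two toy sites are assumptions made for illustration; only Python's standard `sqlite3` module is used.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


def harvest(target: str, sites: Dict[str, Callable[[str], List[str]]]) -> List[Tuple[str, str, str]]:
    """Poll every site for one target and return (target, site, result) rows."""
    rows = []
    for site, fetch in sites.items():
        for result in fetch(target):
            rows.append((target, site, result))
    return rows


def open_cache() -> sqlite3.Connection:
    """Create an in-memory local database of results with provenance (the site)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (target TEXT, site TEXT, result TEXT, opinion INTEGER DEFAULT 0)"
    )
    return conn


def store(conn: sqlite3.Connection, rows: List[Tuple[str, str, str]]) -> None:
    conn.executemany("INSERT INTO results (target, site, result) VALUES (?, ?, ?)", rows)


def log_opinion(conn: sqlite3.Connection, target: str, site: str, opinion: int) -> None:
    """Record the scientist's opinion of one site for one target (+1 trusted, -1 distrusted)."""
    conn.execute(
        "UPDATE results SET opinion = ? WHERE target = ? AND site = ?",
        (opinion, target, site),
    )


def collate(conn: sqlite3.Connection, target: str) -> Dict[str, List[Tuple[str, int]]]:
    """Group results for a target by site so answers can be compared side by side."""
    table: Dict[str, List[Tuple[str, int]]] = {}
    query = "SELECT site, result, opinion FROM results WHERE target = ? ORDER BY site"
    for site, result, opinion in conn.execute(query, (target,)):
        table.setdefault(site, []).append((result, opinion))
    return table


# Two toy sites that return made-up answers for any target.
sites = {"site_a": lambda t: [f"{t}:model_a"], "site_b": lambda t: [f"{t}:model_b"]}
conn = open_cache()
store(conn, harvest("T1", sites))
log_opinion(conn, "T1", "site_a", 1)
table = collate(conn, "T1")
```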
8
Experiment - Specific Task
  • Extend structural knowledge through modelling
  • Find fragments of 3D-models of S. cerevisiae
    (yeast) proteins that can be trusted
      • 6604 yeast protein sequences (some predicted)
      • currently 330 known 3D-structures (in PDB)

(Popular strategy, typically accomplished with
the help of a meta-WWW-server today.)
9
Databases of pre-computed 3D-models
SWISS: restrictive; non-redundant, high-quality models only (SWISSMODEL)
SAM yeast models: complete (at least one model per ID); redundant raw models (SAM-T06 / UNDERTAKER)
ModBase: permissive; highly redundant; pre-filtered before the task (PSI-BLAST / MODELLER)
10
Complications: True and False Redundancy
Example 1: highly redundant set
Example 2: multi-domain proteins, non-redundant
sets (< 90% overlap)
11
Databases of pre-computed 3D-models
SWISS: 769 models
SAM yeast models: 2211 models (selected top model if E-value < 10^-3)
ModBase: 2546 models (pre-filtered: sequence-id > 20%, score > 0.7, E-value < 10^-6)
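The selection rules quoted for the SAM and ModBase sets can be written down as simple predicates. The sketch below is an illustration under stated assumptions: the `Model` record and its field names are hypothetical, "top model" is taken to mean the lowest E-value per protein ID, sequence identity is assumed to be a percentage, and the example records are invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Model:
    protein_id: str
    e_value: float
    seq_identity: Optional[float] = None  # percent (assumed unit)
    score: Optional[float] = None


def select_top_sam(models: Iterable[Model]) -> List[Model]:
    """Keep the best model per protein, and only if its E-value is below 1e-3."""
    best: Dict[str, Model] = {}
    for m in models:
        if m.protein_id not in best or m.e_value < best[m.protein_id].e_value:
            best[m.protein_id] = m
    return [m for m in best.values() if m.e_value < 1e-3]


def prefilter_modbase(models: Iterable[Model]) -> List[Model]:
    """Keep models with sequence identity > 20, score > 0.7 and E-value < 1e-6."""
    return [
        m for m in models
        if m.seq_identity is not None and m.score is not None
        and m.seq_identity > 20 and m.score > 0.7 and m.e_value < 1e-6
    ]


# Invented example records, not data from the experiment.
sam = [Model("P1", 1e-5), Model("P1", 1e-9), Model("P2", 0.5)]
modbase = [
    Model("P1", 1e-8, 35.0, 0.9),
    Model("P1", 1e-8, 15.0, 0.9),   # fails the identity threshold
    Model("P3", 1e-4, 40.0, 0.95),  # fails the E-value threshold
]
kept_sam = select_top_sam(sam)
kept_modbase = prefilter_modbase(modbase)
```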
12
Implementation using LCC interpreter
  • multi-agent interaction coordination through
    service composition
  • LCC interpreter
      • loosely based on electronic societies (of peers)
      • uses WSDL as standard
  • For more information please refer to
    Xueping Quan, Chris Walton, Dietlind L
    Gerloff, Joanna L Sharman and Dave Robertson,
    GCCB2006.
  • to be superseded by (more flexible) OK-kernel

13
Implementation using LCC Interpreter
14
LCC Protocol
a(data_collator, X) ::
    data_request(Is) <= a(experimenter, E) then
    a(data_collector(Is,Sp,Sd),X) <- yeast_id(Is) and source(Sp) then
    filter(Is,Sp,Sd) => a(data_filter(Is,Sp,Sd),F) then
    filtered(Is,Sp,S) <= a(data_filter(Is,Sp,Sd),F) then
    filtered(Is,Sp,S) => a(data_comparer,C) then
    data_compared(Is,SF) <= a(data_comparer,C) then
    data_compared(Is,SF) => a(experimenter,E) then
    data_compared(Is,SF) => a(data_publisher,PU)

a(experimenter, E) ::
    data_request(Is) => a(data_collator, X) then
    data_compared(Is,SF) <= a(data_collator, X)

a(data_collector(Is,Sp,Sd),X) ::
    ( null <- Sp = [] and Sd = [] ) or
    ( a(data_retriever(I,P,D),X) <- (Sp = [P|Rp] and Sd = [D|Rd] and Is = [I|Ri]) then
      a(data_collector(Ri,Rp,Rd),X) )

a(data_retriever(I,P,D),X) ::
    data_request(I) => a(data_source,P) then
    data_report(I,D) <= a(data_source,P)

a(data_filter(I,Sp,Sd),F) ::
    filter(I,Sp,Sd) <= a(data_collator,X) then
    filtered(I,Sp,S) => a(data_collator,X) <- apply_filter(Sd,S)

a(data_source,P) ::
    data_request(I) <= a(data_retriever(I,P,D),X) then
    data_report(I,D) => a(data_retriever(I,P,D),X) <- lookup(I,D)

a(data_comparer,C) ::
    filtered(Is,Sp,S) <= a(data_collator,X) then
    data_compared(Is,SF) => a(data_collator,X) <- consistency_check(S,SF)
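To make the data flow of the protocol easier to follow, the sketch below replays it as plain function calls: collect one item per (ID, source) pair, filter, then compare. It is not an LCC interpreter. The names `lookup`, `apply_filter` and `consistency_check` are taken from the protocol's constraints, but their bodies are placeholder assumptions (the comparison here is a simple "reported at least twice" vote, not the actual comparison method), and the toy data is invented.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def collect(ids: List[str], sources: List[str],
            lookup: Callable[[str, str], Optional[str]]) -> List[Optional[str]]:
    """data_collector / data_retriever: one lookup per (ID, source) pair.
    The recursion over list heads in the protocol is replaced by a loop."""
    return [lookup(source, item) for item, source in zip(ids, sources)]


def apply_filter(data: List[Optional[str]]) -> List[str]:
    """data_filter: placeholder rule that drops sources with no answer."""
    return [d for d in data if d is not None]


def consistency_check(filtered: List[str]) -> List[str]:
    """data_comparer: placeholder rule keeping answers given by at least two sources."""
    counts = Counter(filtered)
    return sorted(value for value, n in counts.items() if n >= 2)


def run_collator(ids: List[str], sources: List[str],
                 lookup: Callable[[str, str], Optional[str]]) -> List[str]:
    """data_collator: chains the three stages and returns the compared result."""
    collected = collect(ids, sources, lookup)
    return consistency_check(apply_filter(collected))


# Invented toy sources: two agree, one has nothing to report.
TOY: Dict[Tuple[str, str], Optional[str]] = {
    ("swiss", "P1"): "fold_A",
    ("sam", "P1"): "fold_A",
    ("modbase", "P1"): None,
}
result = run_collator(
    ["P1", "P1", "P1"],
    ["swiss", "sam", "modbase"],
    lambda source, item: TOY.get((source, item)),
)
```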
15
MaxSub - Examples
  • pair-wise, sequence-dependent
  • finds common substructure (shown in blue)

16
Results
  • CYSP
      • Comparison of Yeast 3D Structure Predictions
  • 578 three-way supported
    MaxSub-substructures > 45 aa
    from 545 proteins
  • (Linked from www.openk.org)

Pair-wise MaxSub Comparisons
           SWISS        ModBase        SAM
SWISS      769 (717)    649 (594)      585 (559)
ModBase                 2546 (2280)    620 (594)
SAM                                    2211 (2211)
17
Proteomic Analysis
  • Expression Proteomics
      • proteins are extracted from cells and tissues
      • proteins are separated
          • two-dimensional gel electrophoresis
          • liquid chromatography
      • proteins are digested and identified
          • various mass spectrometry methods
  • Bioinformatic Analysis
      • primary, secondary, tertiary structures
      • sequence alignment and homology
      • motifs and domains
      • protein interactions and networks
  • Functional Proteomics

18
Expression Proteomics
19
Expression Proteomics
20
Peptide/Protein Identification
  • Sequencing information that does not produce
    clear identifications stays in archives and is
    rarely accessible to other groups
      • most of it will never be reflected in
        protein DBs
      • information is trashed
  • This information is of high importance for other
    groups analysing the sequence/function of
    homologous proteins
      • it contains sequences with post-translational
        modifications not to be found in current
        protein DBs
  • Spectra and sequence tags generated in one lab
    could be used by other labs to evaluate the
    confidence of experimental or predicted sequences

21
Information Overflow
  • Proteomic analysis is currently an inhumane task
      • LC-MS analysis produces > 10,000 spectra
      • each spectrum yields (after sequencing and DB
        search) several peptide or peptide tag candidates
      • each step produces an identification score whose
        final evaluation is performed manually (using
        probability data)
  • Many proteomics labs are involved in the
    characterization of proteomes, protein complexes
    and networks
      • => the rate of information production is
        increasing very fast

22
Expression Proteomics
23
P2P Proteomics with OK
24
Sequence Identification Scenario
  • An investigator asks an identifier to match a
    sequence against proteomics labs' repositories.
  • The identifier acts as a searcher, querying each
    known proteomics lab to retrieve hits for the
    given input sequence; it collects the results and
    then sends them back to the investigator.
  • The queried proteomics lab could store high
    scoring queries to increase the reliability of
    the matching sequences.
  • The end-point process of sequence data-mining
    done by the proteomics lab is performed by Blast
    engines local to each peer.
  • The first prototype only matches input sequences;
    the next release could also directly accept mass
    spectra as input. This task will use an OMSSA
    engine capable of matching spectra against the
    same sequence database used by the Blast engine.
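The scenario above can be sketched in a few lines: an identifier polls every known lab, each lab searches its own repository, and a lab keeps a record of high-scoring queries. This is only an illustration. The `ProteomicsLab` class, the `store_threshold` parameter and the toy sequences are hypothetical, and the exact-substring match with a coverage score is a stand-in for a real Blast search, not an emulation of it.

```python
from typing import Dict, List, Tuple


class ProteomicsLab:
    """One peer holding a local sequence repository."""

    def __init__(self, name: str, repository: Dict[str, str], store_threshold: float = 0.5):
        self.name = name
        self.repository = repository            # entry ID -> sequence
        self.store_threshold = store_threshold  # hypothetical cut-off for "high scoring"
        self.stored_queries: List[str] = []

    def find_hit(self, seq: str) -> List[Tuple[str, float]]:
        """Toy stand-in for a local Blast search: exact substring match,
        scored by the fraction of the repository sequence that is covered."""
        hits = [
            (entry, len(seq) / len(target))
            for entry, target in self.repository.items()
            if seq in target
        ]
        # Side effect described in the scenario: remember high-scoring queries.
        if any(score >= self.store_threshold for _, score in hits):
            self.stored_queries.append(seq)
        return hits


def identify(seq: str, labs: List[ProteomicsLab]) -> List[Tuple[str, str, float]]:
    """Identifier acting as searcher: query each lab in turn and collect
    (lab, entry, score) triples into one result set."""
    result_set = []
    for lab in labs:
        for entry, score in lab.find_hit(seq):
            result_set.append((lab.name, entry, score))
    return result_set


# Invented repositories for illustration.
labs = [
    ProteomicsLab("lab_1", {"e1": "MKTAYIAK"}),
    ProteomicsLab("lab_2", {"e2": "GGGG"}),
]
results = identify("MKTA", labs)
```

Keeping the lab name in each result mirrors the (Result, L) pairs collected in the interaction model, so the investigator can see which peer produced each hit.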

25
Sequence Identification IM in LCC
a(investigator,A) ::
    identify(Seqs,P) => a(identifier,B) <- get_sequences(Seqs,P) then
    visualise(Result_set) <- answer(Result_set) <= a(identifier,B)

a(identifier,B) ::
    identify(Seqs,P) <= a(investigator,A) then
    a(searcher(Seqs,P,Ls,Result_set),B) <- lab_list(Ls) then
    answer(Result_set) => a(investigator,A) then
    a(identifier,B)

a(searcher(Seqs,P,Ls,Result_set),B) ::
    ( query(Seqs,P) => a(proteomics_lab,L) <- Ls = [L|RLs] then
      Result_set = [(Result,L)|RSs] <- answer(Result) <= a(proteomics_lab,L) then
      a(searcher(Seqs,P,RLs,RSs),B) ) or
    null <- Ls = [] and Result_set = []

a(proteomics_lab,L) ::
    query(Seqs,P) <= a(searcher(_,_,_,_),B) then
    answer(Result) => a(searcher(_,_,_,_),B) <- find_hit(Seqs,P,Result) then
    a(proteomics_lab,L)

26
Step by Step
Legend: peer, message, constraint
  • An investigator uses a GUI to get input
    sequences and a set of parameters P
  • investigator sends message identify(Seqs, P) to
    an identifier
  • identifier retrieves a list of known proteomics
    labs
  • identifier becomes searcher and sends a query to
    the first proteomics_lab of the list
  • proteomics_lab resolves the find_hit constraint and
    sends back an answer with the result (i.e. a URL
    for an XML file)
  • searcher loops the queries over the list of
    proteomics_labs and collects results in a
    result_set
  • searcher returns to the role identifier and sends
    the result_set back to investigator
  • investigator receives the result_set and displays
    it on a GUI
  • The find_hit() constraint also starts a process
    inside the proteomics_lab peer which stores high
    scoring queries
[Diagram: investigator sends identify(Seqs, P) to identifier; identifier, as searcher, sends query(Seqs, P) to proteomics_lab, which replies with answer(result); identifier returns answer(result_set) to investigator]