Combining rapid word searches with segment-to-segment alignment for sensitive similarity detection, domain identification and structural modelling

Description:

0 IENQMYLDR HENQSYLAR At1g20840 putative sugar transporter protein 860340 261 3. 0 IENQMYLDR IENQRSLRR At1g30510 putative ferredoxin NADP reductase 1228095 15 3 ...

Transcript and Presenter's Notes

1
Combining rapid word searches with segment-to-segment alignment for sensitive similarity detection, domain identification and structural modelling.
Matej LEXA and Giorgio VALLE
Masaryk University, Brno, Czech Republic
University of Padova, Padova, Italy
2
PEPTIMEX 1.0
Output of PEPTIMEX local similarity search
showing all approximate occurrences of the
sequence MARNFLLVLL in the Arabidopsis thaliana
proteome.
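A minimal sketch of what finding "all approximate occurrences" of a peptide could look like, assuming a plain mismatch count (Hamming distance) over a sliding window. The slide does not describe the scoring PEPTIMEX actually uses, so the function name, the `max_mismatches` threshold and the toy sequences below are hypothetical.

```python
def approximate_occurrences(query, proteome, max_mismatches=3):
    """Yield (protein_id, position, word, mismatches) for every window of
    len(query) residues that differs from the query in at most
    max_mismatches positions. Positions are 1-based."""
    k = len(query)
    for prot_id, seq in proteome.items():
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                yield prot_id, start + 1, window, mismatches


# Toy data: made-up sequences, not real Arabidopsis proteins.
toy_proteome = {
    "protA": "MARNFLLVLLAAS",
    "protB": "GGMAKNFLIVLLKK",
    "protC": "QQQQQQQQQQQQ",
}

hits = list(approximate_occurrences("MARNFLLVLL", toy_proteome, max_mismatches=2))
for hit in hits:
    print(hit)
```

This brute-force scan is only meant to make the idea concrete; a real proteome-wide search would need an index to be fast.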
3
PEPTIMEX 1.0
Number of non-redundant proteins at the 30-40% level
PDBselect: 3639 (out of 24000)
Arabidopsis: >15000 (out of 27288)
SwissProt - nr: 561808 (out of 1217288)
KELLLRYMVKTNQNQLPSPSPQ---SHFYNGNGYWFMSN ...
HRLVLRYILCE--PDEGVHNPLFRH-GLWNSQGYWFQMN
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
PROTEIN SIMILARITY RECORDS
mysql> SELECT * FROM wordHits WHERE queryProtID="At5g07210" AND hitProtID="15220515";
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
| wordHitID | query    | queryWord | queryProtID | queryPos | hitCount | hitNo | hitWord  | hitProtID | hitProtPos | hitScore |
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
|   5016931 | SHLQKYRI | SHLQKYRI  | At5g07210   |      275 |       51 |     6 | SHLQKYRL | 15220515  |        239 |        3 |
|   5016532 | ASHLQKYR | ASHLQKYR  | At5g07210   |      274 |       49 |     6 | ASHLQKYR | 15220515  |        238 |        0 |
|   5016069 | VASHLQKY | VASHLQKY  | At5g07210   |      273 |       96 |    10 | VASHLQKY | 15220515  |        237 |        0 |
|   5015604 | NVASHLQK | NVASHLQK  | At5g07210   |      272 |      145 |    13 | NVASHLQK | 15220515  |        236 |        0 |
|   5015278 | ENVASHLQ | ENVASHLQ  | At5g07210   |      271 |      109 |     9 | ENVASHLQ | 15220515  |        235 |        0 |
|   5014874 | RENVASHL | RENVASHL  | At5g07210   |      270 |      143 |    18 | RENVASHL | 15220515  |        234 |        0 |
|   5014637 | TRENVASH | TRENVASH  | At5g07210   |      269 |       82 |     7 | TRENVASH | 15220515  |        233 |        0 |
|   5014285 | LTRENVAS | LTRENVAS  | At5g07210   |      268 |      157 |    14 | LTRENVAS | 15220515  |        232 |        0 |
|   5014174 | YLTRENVA | YLTRENVA  | At5g07210   |      267 |      101 |     8 | WLTRENVA | 15220515  |        231 |        4 |
|   5014017 | PYLTRENV | PYLTRENV  | At5g07210   |      266 |       36 |     4 | PWLTRENV | 15220515  |        230 |        4 |
|   5013952 | VPYLTREN | VPYLTREN  | At5g07210   |      265 |       55 |     5 | VPWLTREN | 15220515  |        229 |        4 |
|   5011844 | KAVPKKIL | KAVPKKIL  | At5g07210   |      253 |       88 |     9 | KAGPKKIL | 15220515  |        217 |        4 |
|   4916599 | NVMVVDDD | NVMVVDDD  | At5g07210   |       16 |      574 |    46 | RVLVVDDD | 15220515  |         11 |        5 |
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
13 rows in set (0.10 sec)

P(A) = 145/580000 = 1/4000
145 proteins in SwissProt/Trembl share the NVASHLQK matches
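The P(A) arithmetic on this slide can be reproduced as a quick check. This sketch assumes that `hitCount` is the number of database proteins matching the word and that the database holds about 580000 proteins, as the slide's fraction suggests; the helper name `word_probability` is hypothetical.

```python
from fractions import Fraction

DB_SIZE = 580_000  # database size used on the slide (SwissProt/TrEMBL)


def word_probability(hit_count, db_size=DB_SIZE):
    """Fraction of database proteins containing a match to the word."""
    return Fraction(hit_count, db_size)


# hitCount values copied from the result table on this slide
hit_counts = {"NVASHLQK": 145, "PYLTRENV": 36, "LTRENVAS": 157}

for word, count in hit_counts.items():
    p = word_probability(count)
    print(f"{word}: P = {p} (about 1 in {round(1 / p)})")
```

Using exact fractions avoids rounding noise, so the NVASHLQK case reduces to exactly 1/4000.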



8
ANNOTATION EXAMPLE
9
ANNOTATION EXAMPLE
10
PROTEIN SIMILARITY RECORDS
mysql> SELECT * FROM wordHits WHERE queryProtID="At5g07210" AND hitProtID="15220515";
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
| wordHitID | query    | queryWord | queryProtID | queryPos | hitCount | hitNo | hitWord  | hitProtID | hitProtPos | hitScore |
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
|   5016931 | SHLQKYRI | SHLQKYRI  | At5g07210   |      275 |       51 |     6 | SHLQKYRL | 15220515  |        239 |        3 |
|   5016532 | ASHLQKYR | ASHLQKYR  | At5g07210   |      274 |       49 |     6 | ASHLQKYR | 15220515  |        238 |        0 |
|   5016069 | VASHLQKY | VASHLQKY  | At5g07210   |      273 |       96 |    10 | VASHLQKY | 15220515  |        237 |        0 |
|   5015604 | NVASHLQK | NVASHLQK  | At5g07210   |      272 |      145 |    13 | NVASHLQK | 15220515  |        236 |        0 |
|   5015278 | ENVASHLQ | ENVASHLQ  | At5g07210   |      271 |      109 |     9 | ENVASHLQ | 15220515  |        235 |        0 |
|   5014874 | RENVASHL | RENVASHL  | At5g07210   |      270 |      143 |    18 | RENVASHL | 15220515  |        234 |        0 |
|   5014637 | TRENVASH | TRENVASH  | At5g07210   |      269 |       82 |     7 | TRENVASH | 15220515  |        233 |        0 |
|   5014285 | LTRENVAS | LTRENVAS  | At5g07210   |      268 |      157 |    14 | LTRENVAS | 15220515  |        232 |        0 |
|   5014174 | YLTRENVA | YLTRENVA  | At5g07210   |      267 |      101 |     8 | WLTRENVA | 15220515  |        231 |        4 |
|   5014017 | PYLTRENV | PYLTRENV  | At5g07210   |      266 |       36 |     4 | PWLTRENV | 15220515  |        230 |        4 |
|   5013952 | VPYLTREN | VPYLTREN  | At5g07210   |      265 |       55 |     5 | VPWLTREN | 15220515  |        229 |        4 |
|   5011844 | KAVPKKIL | KAVPKKIL  | At5g07210   |      253 |       88 |     9 | KAGPKKIL | 15220515  |        217 |        4 |
|   4916599 | NVMVVDDD | NVMVVDDD  | At5g07210   |       16 |      574 |    46 | RVLVVDDD | 15220515  |         11 |        5 |
+-----------+----------+-----------+-------------+----------+----------+-------+----------+-----------+------------+----------+
13 rows in set (0.10 sec)

P(A,B) = P(A)P(B) = 145/580000 × 574/580000 ≈ 1/4000000
8 proteins in SwissProt/Trembl share the NVMVVDDD and NVASHLQK matches
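The reasoning on this slide can be sketched numerically: if the two word matches were independent, the joint probability would be the product of the individual ones, and multiplying by the database size gives the number of proteins expected to carry both matches by chance. The database size of 580000 and the counts are taken from the slide; treating the two words as independent events is the slide's own assumption, and the variable names are hypothetical.

```python
from fractions import Fraction

DB_SIZE = 580_000  # database size used on the slide (SwissProt/TrEMBL)

p_a = Fraction(145, DB_SIZE)  # proteins matching NVASHLQK
p_b = Fraction(574, DB_SIZE)  # proteins matching NVMVVDDD

# Joint probability under the independence assumption P(A,B) = P(A)P(B)
p_ab = p_a * p_b

# Number of proteins expected to contain both matches purely by chance
expected_by_chance = float(p_ab * DB_SIZE)
observed = 8  # proteins that actually share both matches, from the slide

print(f"P(A,B) is about 1 in {round(1 / p_ab):,}")
print(f"expected by chance: {expected_by_chance:.3f}, observed: {observed}")
print(f"enrichment: {observed / expected_by_chance:.1f}x")
```

Seeing 8 proteins where well under one is expected by chance is what makes the co-occurrence of the two words informative.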



11
DOMAIN SEARCH
12
DOMAIN SEARCH
[Figure legend: >512, 257-512, 129-256, 65-128, 33-64, 17-32, 9-16, 5-8, 2-4]
13
DOMAIN SEARCH
[Figure legend: >512, 257-512, 129-256, 65-128, 33-64, 17-32, 9-16, 5-8, 2-4]
14
(No Transcript)
15
CONCLUSIONS
- rapid short word searches in proteins can be used in several applications
  - search for biological patterns, repeats, annotation
  - similarity searches in protein databases and sequence alignment
  - to provide words for linguistic analysis of proteins
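As an illustration of a rapid short word search, here is a minimal sketch that indexes every overlapping 8-residue word of a set of proteins and then looks up the words of a query. It covers exact word matches only; the approximate matching and scoring shown in the presentation are not reproduced. The function names are hypothetical, and the two fragments are reconstructed from the overlapping words in the wordHits table of the presentation.

```python
from collections import defaultdict


def build_word_index(proteins, k=8):
    """Map every overlapping k-residue word to its (protein_id, position)
    occurrences. Positions are 1-based."""
    index = defaultdict(list)
    for prot_id, seq in proteins.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((prot_id, pos + 1))
    return index


def shared_words(query_seq, index, k=8):
    """Yield (query_pos, word, protein_id, hit_pos) for each exact word match."""
    for pos in range(len(query_seq) - k + 1):
        word = query_seq[pos:pos + k]
        for prot_id, hit_pos in index.get(word, ()):
            yield pos + 1, word, prot_id, hit_pos


# Fragments rebuilt from the overlapping queryWord / hitWord columns;
# positions here are relative to the fragments, not the full proteins.
query_fragment = "VPYLTRENVASHLQKYRI"
database = {"15220515": "VPWLTRENVASHLQKYRL"}

index = build_word_index(database)
matches = list(shared_words(query_fragment, index))
for match in matches:
    print(match)
```

Consecutive matches with a constant offset between query and hit positions, as in this example, are the kind of evidence that can be chained into an aligned segment.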