Provided by: tcof
Learn more at: http://www.tcoffee.org
1
An Introduction to Multiple Sequence Alignments
Cédric Notredame
2
(No Transcript)
3
Mangel M., Samaniego F.J., Abraham Wald's Work
on Aircraft Survivability, J. American
Statistical Association, 79, 259-270 (1984)
4
Our Scope
How Can I Use My Alignment?
How Does The Computer Align The Sequences?
How Can I Assemble a Mult. Aln?
What are the Difficulties?
5
Outline
-Why Do We Need Multiple Sequence Alignment ?
-The progressive Alignment Algorithm
-A possible Strategy
-Potential Difficulties
6
Pre-requisite
-How Do Sequences Evolve?
-How can We COMPARE Sequences ?
-How can We ALIGN Sequences ?
7
Why Do We Need Multiple Sequence Alignment ?
8
Sometimes Two Sequences Are Not Enough
9
What is A Multiple Sequence Alignment?
10
(No Transcript)
11
(No Transcript)
12
How Can I Use A Multiple Sequence Alignment?
BUT Conserved where it MATTERS
13
(No Transcript)
14
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
Prosite Patterns
15
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
P-K-R-[PA]-x(1)-[ST]
Prosite Patterns
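
The pattern extrapolated on this slide can be turned into a regular expression and searched against sequences. The sketch below is only an illustration: it assumes the pattern reads P-K-R-[PA]-x(1)-[ST] in PROSITE bracket notation, it handles only a small subset of the PROSITE syntax, and the helper name prosite_to_regex is made up for this example. The four sequences are the ungapped starts of the rows shown above.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supported subset (an assumption of this sketch): single residues,
    [ABC] alternatives, {ABC} exclusions, x wildcards and (n) / (n,m) repeats.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    pieces = []
    for element in pattern.split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # x means any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # {AB} means anything but A or B
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        pieces.append(residue)
    return "".join(pieces)

# Ungapped N-terminal parts of the four aligned sequences
sequences = {
    "chite": "ADKPKRPLSAYMLWLNSARESIKRENPDFK",
    "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPK",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRS",
    "mouse": "KPKRPRSAYNIYVSESFQ",
}

regex = prosite_to_regex("P-K-R-[PA]-x(1)-[ST]")
signature = re.compile(regex)

hits = {}
for name, seq in sequences.items():
    found = signature.search(seq)
    if found:
        hits[name] = found.group(0)

print(regex)
print(hits)
```

Scanning a database such as SwissProt with the compiled expression would follow the same loop, one entry at a time.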
16
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
Prosite Patterns
SwissProt
Uncharacterised Signature
Match?
17
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
Prosite Patterns
Profiles And HMMs
-More Sensitive
-More Specific
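
A profile summarises each column of the alignment instead of reducing it to a strict pattern. As a rough sketch, the code below builds per-column residue frequencies from the second block of the alignment above. It is deliberately simplified: real PROSITE profiles and HMMs add sequence weights, pseudocounts, log-odds scores and gap handling, none of which appear here, and the function name column_frequencies is an assumption of this example.

```python
from collections import Counter

# Second block of the chite / wheat / trybr / mouse alignment
alignment = {
    "chite": "AATAKQNYIRALQEYERNGG-",
    "wheat": "ANKLKGEYNKAIAAYNKGESA",
    "trybr": "AEKDKERYKREM---------",
    "mouse": "AKDDRIRYDNEMKSWEEQMAE",
}

def column_frequencies(aligned):
    """Return one {residue: frequency} dict per column, ignoring gaps."""
    rows = list(aligned.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        # An all-gap column simply yields an empty dict
        profile.append({aa: n / len(residues) for aa, n in counts.items()})
    return profile

profile = column_frequencies(alignment)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

A fully conserved column such as the Y at position 8 gets a single entry with frequency 1.0, while variable columns spread their weight over several residues.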
18
A PROSITE PROFILE
19
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
chite
wheat
Motifs/Patterns
trybr
mouse
Profiles
-Evolution
-Paralogy/Orthology
Phylogeny
20
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
21
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
PsiPred OR PHD For Secondary Structure
Prediction: 75% Accurate.
Motifs/Patterns
Profiles
Threading is improving but is not yet as good.
Phylogeny
Struc. Prediction
22
How Can I Use A Multiple Sequence Alignment?
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

23
(No Transcript)
24
Why Is It Difficult To Compute A multiple
Sequence Alignment?
A CROSSROAD PROBLEM
25
Why Is It Difficult To Compute A multiple
Sequence Alignment ?
BIOLOGY
COMPUTATION
CIRCULAR PROBLEM....
Good
Good
Alignment
Sequences
26
The Biological Problem.
Same as PairWise Alignment Problem
We do NOT know how Sequences Evolve.
We do NOT understand the Relation Between
Structures and Sequences.
We would NOT recognize the Correct Alignment if
we had it IN FRONT of our eyes
27
The Biological Problem. The Charlie Chaplin
Paradox
28
The Biological Problem. How to Evaluate an
Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
29
The COMPUTATIONAL Problem. Producing the Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
-An Alignment Algorithm
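
One common way to combine a substitution matrix with an evaluation function is the sum-of-pairs score: every pair of rows is scored column by column and the results are added up. The sketch below is an assumption-laden illustration, not the scheme used on the slides: it replaces a real matrix such as Blosum with a toy match/mismatch score, and it charges a flat cost per gapped position instead of separate gap opening and extension penalties.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy substitution score for one pair of aligned symbols."""
    if a == "-" and b == "-":
        return 0                      # gap against gap costs nothing
    if a == "-" or b == "-":
        return gap                    # residue against gap
    return match if a == b else mismatch

def sum_of_pairs(rows, **scoring):
    """Sum the pair scores over every column and every pair of rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(
        pair_score(a, b, **scoring)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )

toy_alignment = ["AC-T", "ACGT", "A-GT"]
print(sum_of_pairs(toy_alignment))
```

An alignment algorithm then searches for the gap placement that maximises this kind of score, which is where the computational cost comes from.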
30
HOW CAN I ALIGN MANY SEQUENCES
2 Globins > 1 Min
31
HOW CAN I ALIGN MANY SEQUENCES
3 Globins > 2 hours
32
HOW CAN I ALIGN MANY SEQUENCES
4 Globins > 10 days
33
HOW CAN I ALIGN MANY SEQUENCES
5 Globins > 3 years
34
HOW CAN I ALIGN MANY SEQUENCES
! DHEA Loaded
6 Globins > 300 years
35
HOW CAN I ALIGN MANY SEQUENCES
7 Globins > 30,000 years
Solidified Fossil, Old stuff
36
HOW CAN I ALIGN MANY SEQUENCES
8 Globins > 3 Million years
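
The running times on these slides grow so quickly because exact dynamic programming over N sequences of length L fills a lattice of (L+1)^N cells, and each cell looks back at 2^N - 1 neighbours. The sketch below only counts those elementary steps. The length of 150 residues is an assumed, roughly globin-sized value, and the counts are not meant to reproduce the timings quoted above.

```python
def dp_cells(n_sequences: int, length: int) -> int:
    """Cells in the N-dimensional dynamic programming lattice."""
    return (length + 1) ** n_sequences

def neighbours_per_cell(n_sequences: int) -> int:
    """Predecessor cells examined for each cell (every non-empty subset of sequences advances)."""
    return 2 ** n_sequences - 1

ASSUMED_LENGTH = 150  # hypothetical sequence length, chosen for illustration

for n in range(2, 9):
    work = dp_cells(n, ASSUMED_LENGTH) * neighbours_per_cell(n)
    print(f"{n} sequences: about {work:.2e} elementary steps")
```

Each extra sequence multiplies the work by roughly twice the sequence length, which is why exact methods stop being practical after a handful of sequences.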
37
The Progressive Multiple Alignment
Algorithm (Clustal W)
38
(No Transcript)
39
Making An Alignment
Any Exact Method would be TOO SLOW
We will use a Heuristic Algorithm.
Progressive Alignment Algorithm is the most
Popular
-ClustalW
40
Progressive Alignment
Feng and Doolittle, 1988; Taylor, 1989
Clustering
41
Progressive Alignment
42
Progressive Alignment
-Depends on the CHOICE of the sequences.
-Depends on the ORDER of the sequences (Tree).
-Depends on the PARAMETERS
  • Substitution Matrix.
  • Penalties (Gop, Gep).
  • Sequence Weight.
  • Tree making Algorithm.
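
The dependence on ORDER can be made concrete with a small sketch of how a guide tree fixes the order in which sequences are merged. This is not ClustalW's procedure, which derives distances from pairwise alignments and builds a neighbour-joining tree. As a stated simplification, the code below swaps in an alignment-free k-mer distance and average-linkage clustering, and the function names and toy sequences are invented for this example. Sequences are assumed to be at least k residues long.

```python
from itertools import combinations

def kmer_set(seq, k=2):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=2):
    """1 - Jaccard similarity of the k-mer sets: a crude distance between sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return 1 - len(ka & kb) / len(ka | kb)

def merge_order(sequences, k=2):
    """Return the successive joins of an average-linkage guide tree."""
    clusters = {name: (name,) for name in sequences}   # cluster label -> member names
    joins = []

    def distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        total = sum(kmer_distance(sequences[a], sequences[b], k) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters first, as a progressive aligner would
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        joins.append((clusters[c1], clusters[c2]))
        clusters[c1 + "+" + c2] = clusters.pop(c1) + clusters.pop(c2)
    return joins

toy = {"a": "ACDEFGHIK", "b": "ACDEFGHIR", "c": "WWYYPPNNQ"}
joins = merge_order(toy)
for left, right in joins:
    print("align", left, "with", right)
```

Because early joins are never revisited, any mistake made while aligning the first pair is carried through to the final alignment.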

43
Progressive Alignment: When Does It Work
Works Well When Phylogeny is Dense
No outlier Sequence.
Image: River Crossing
44
Progressive Alignment: When Doesn't It Work
45
(No Transcript)
46
Building the Right Multiple Sequence Alignment.
47
Recognizing The Right Sequences When you Meet
Them
48
Gathering Sequences: BLAST
49
Common Mistake: Sequences Too Closely Related
PRVA_MACFU  SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN  SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP  SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE  SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT    SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT  AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE

PRVA_MACFU  DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN  DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP  DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE  DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT    DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT  EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES

-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE
MULTIPLE SEQUENCE ALIGNMENT
-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY
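
A quick way to spot sequences that are too closely related is to compute pairwise percent identity and flag pairs above a cutoff. The sketch below uses the first 20 residues of three of the parvalbumins shown above. The 90% cutoff and the function names are assumptions made for this illustration, not values taken from the slides.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def redundant_pairs(aligned, threshold=90.0):
    """List the pairs of sequences whose identity reaches the threshold."""
    flagged = []
    for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
        identity = percent_identity(seq1, seq2)
        if identity >= threshold:
            flagged.append((name1, name2, identity))
    return flagged

# First 20 aligned residues of three sequences from the slide
aligned = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAFSA",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAFSA",
    "PRVA_RABIT": "AMTELLNAEDIKKAIGAFAA",
}

flagged = redundant_pairs(aligned)
print(flagged)
```

Keeping only one member of each flagged pair leaves room for more distant homologues, which is where the alignment gains information.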
50
(No Transcript)
51
Sequence Weighting Within ClustalW
52
Selecting Diverse Sequences (Opus II)
53
Respect Information!

PRVA_MACFU  ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN  ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP  ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE  ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT    ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT  ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE  MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN  VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP  IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE  IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT    IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT  IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
54
Selecting Diverse Sequences (Opus II)
55
Selecting Diverse Sequences (Opus II)

PRVB_CYPCA  -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO  -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA  MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH  -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES  -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU  -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU  --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE

PRVB_CYPCA  EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO  EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA  VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH  DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES  QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU  EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU  EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA

-A REASONABLE Model Now Exists.
-Going Further: Remote Homologues.
56
Aligning Remote Homologues

PRVA_MACFU  ------------------------------------------SMTDLLNA----EDIKKA
PRVA_ESOLU  -------------------------------------------AKDLLKA----DDIKKA
PRVB_CYPCA  ------------------------------------------AFAGVLND----ADIAAA
PRVB_BOACO  ------------------------------------------AFAGILSD----ADIAAG
PRV1_SALSA  -----------------------------------------MACAHLCKE----ADIKTA
PRVB_LATCH  ------------------------------------------AVAKLLAA----ADVTAA
PRVB_RANES  ------------------------------------------SITDIVSE----KDIDAA
TPCS_RABIT  -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG    -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE  MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU  LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA  LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO  LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA  LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH  LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES  LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT  IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG    IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM

PRVA_MACFU  LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU  LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA  LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO  LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA  LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH  LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES  LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT  FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG    FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE  LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
57
Some Guidelines
58
Do Not Use Too Many Sequences
59
Reading Your Alignment
60
(No Transcript)
61
Going Further

PRVA_MACFU  VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO  LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA  LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT  IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG    IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE  IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE   SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI

PRVA_MACFU  LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO  LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA  LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT  FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG    FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE  LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE   LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
62
WHAT MAKES A GOOD ALIGNMENT
-THE MORE DIVERGENT THE SEQUENCES, THE BETTER
-THE FEWER INDELS, THE BETTER
-NICE UNGAPPED BLOCKS SEPARATED WITH INDELS
-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK
  • Completely Conserved
  • Conserved For Size and Hydropathy
  • Conserved For Size or Hydropathy

-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL
JUDGEMENT AND KNOWLEDGE.
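
Two of these criteria, ungapped blocks and completely conserved columns, are easy to check automatically. The sketch below is a minimal illustration with assumed function names and an assumed minimum block length of 3 columns. It does not attempt the size and hydropathy classes, which would need a residue property table. The rows are the first block of the chite / wheat / trybr / mouse alignment used earlier in the slides.

```python
def ungapped_blocks(rows, min_length=3):
    """Return (start, end) column ranges, end exclusive, where no row has a gap."""
    blocks, start = [], None
    for i, column in enumerate(zip(*rows)):
        if "-" not in column:
            if start is None:
                start = i                      # a new block opens here
        else:
            if start is not None and i - start >= min_length:
                blocks.append((start, i))
            start = None
    width = len(rows[0])
    if start is not None and width - start >= min_length:
        blocks.append((start, width))          # block running to the last column
    return blocks

def conserved_columns(rows):
    """Return the indices of columns holding one identical residue in every row."""
    return [
        i for i, column in enumerate(zip(*rows))
        if column[0] != "-" and len(set(column)) == 1
    ]

rows = [
    "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",  # chite
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",  # wheat
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",  # trybr
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",  # mouse
]

print(ungapped_blocks(rows))
print(conserved_columns(rows))
```

Numbers like these only support the visual inspection recommended on the slide; they do not replace it.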
63
(No Transcript)
64
Potential Difficulties
65
DO NOT OVERTUNE!!!
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

DO NOT PLAY WITH PARAMETERS: IF YOU KNOW THE
ALIGNMENT YOU WANT, MAKE IT YOURSELF!
chite  ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

66
TUNING or NOT TUNING!!!
-PARAMETERS TO TUNE USUALLY INCLUDE
  • GOP/GEP
  • MATRIX
  • SENSITIVITY vs SPEED

Substitution Matrices (Etzold et al., 1993)
Gonnet    61.7
Blosum50  59.7
Pam250    59.2
-MOST METHODS ARE TUNED FOR WORKING WELL ON
AVERAGE
-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW
THE THEORY (i.e. Substitution Matrices).
-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. Changes
little).
-TUNE IF YOU WANT TO CONVINCE YOURSELF.
67
(No Transcript)
68
KEEP A BIOLOGICAL PERSPECTIVE
chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

DIFFERENT PARAMETERS
chite  AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat  -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr  -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse  ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
WRONG ALIGNMENT !!!
69
REPEATS
THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT
CONTAIN THE SAME NUMBER OF REPEATS
IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS
AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE
RECOGNIZED USING DOTTER
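
The idea behind a dot plot can be sketched in a few lines: compare a sequence with itself and report where short windows match, so that repeats show up as runs of dots away from the main diagonal. This is not Dotter, which scores windows with a substitution matrix. The sketch assumes exact matches over a window of 3 residues, and the sequence used is an invented toy example.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return (i, j) positions where a window of seq_a exactly matches a window of seq_b."""
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    return [
        (i, j)
        for i in range(len(seq_a) - window + 1)
        for j in index.get(seq_a[i:i + window], [])
    ]

def repeat_offsets(seq, window=3):
    """Self-comparison: keep only dots above the main diagonal and report their offsets."""
    dots = [(i, j) for i, j in dot_plot(seq, seq, window) if j > i]
    return dots, sorted({j - i for i, j in dots})

toy_sequence = "MKVLAAGMKVL"   # hypothetical sequence with the unit MKVL repeated
dots, offsets = repeat_offsets(toy_sequence)
print(dots)
print(offsets)
```

Once the offsets locate the repeat units, each unit can be cut out and the units aligned with one another, as the slide recommends.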
70
(No Transcript)
71
Naming Your Sequences The Right Way
72
What Are The Available Methods ???
73
Simultaneous Alignments MSA
1) Set Bounds on each pair of sequences (Carrillo
and Lipman)
2) Compute the Maln within the Hyperspace
-Few Small Closely Related Sequences.
-Memory and CPU hungry
-Do Well When They Can Run.
74
Simultaneous Alignments DCA
75
Dialign
76
3) Assemble the alignment according to the
segment pairs.
77
-May Align Too Few Residues
-No Gap Penalty
-Does well with ESTs
78
bibiserv.techfak.uni-bielefeld.de/dialign/submission.html
79
Muscle
80
Iterative Methods
7.16.1 Progressive
-HMMs, HMMER, SAM, MUSCLE
-Slow, Sometimes Inaccurate
-Good Profile Generators
81
MUSCLE
7.16.1 Progressive
82
MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
7.16.1 Progressive
83
MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
7.16.1 Progressive
84
T-Coffee
85
Mixing Local and Global Alignments
Local Alignment
Global Alignment
Extension
Multiple Sequence Alignment
86
Mixing Heterogeneous Data With T-Coffee
Local Alignment
Global Alignment
Multiple Alignment
Structural
Specialist
Multiple Sequence Alignment
87
Mixing Sequences and Structures with T-Coffee
Seq Vs Seq
Local Global
Seq Vs Struct
Struct Vs Struct
Thread
Superpose
Evaluation on HOMSTRAD
88
What is the Local Quality of my Alignment
I
II
89
T-Coffee
igs-server.cnrs-mrs.fr/Tcoffee/
90
DBClustal
91
DBClustal
BlastP
92
DBClustal
93
DBClustal
94
Expasy Blast
95
Expasy BLAST
www.expasy.org/tools/blast/
96
Expasy BLAST
97
Choosing the right method
98
Situation -> Solution
99
Priority -> Solution
Method Priority Trees Profile 2D Pred 3D-Pred Func-Pred
Accuracy
Speed
100
Purpose -> Solution
101
Conclusion
102
Multiple Alignment
103
Multiple Alignment
Know Your Problem: What do you want to do with
your MSA?
104
Addresses
MAFFT Progressive/iterative www.biophys.kyoto-u.jp/katoh
POA Progressive/Simultaneous www.bioinformatics.ucla.edu/poa
MUSCLE Progressive/Iterative www.drive5.com/muscle
105
BaliBase
What Is BaliBase
Source: BaliBase, Thompson et al., NAR, 1999
Description
PROBLEM
106
Which Method ?
What Is BaliBase
Source: BaliBase, Thompson et al., NAR, 1999
Strategy
Strategy
PROBLEM
107
Methods / Situations
1-Carrillo and Lipman
-MSA, DCA.
-Few Small Closely Related Sequences.
-Do Well When They Can Run.
2-Segment Based
-DIALIGN, MACAW.
-May Align Too Few Residues
-Good For Long Indels
3-Iterative
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
4-Progressive
-ClustalW, Pileup, Multalign
-Fast and Sensitive