# An Introduction to Multiple Sequence Alignments - PowerPoint PPT Presentation

Slides: 108
Provided by: tcof

1
An Introduction to Multiple Sequence Alignments
Cédric Notredame
2
(No Transcript)
3
Mangel M, Samaniego F.J., Abraham Wald's Work on Aircraft Survivability, J. American
Statistical Association, 79, 259-270 (1984)
4
Our Scope
How Can I Use My Alignment?
How Does The Computer Align The Sequences?
How Can I Assemble a Multiple Alignment?
What are the Difficulties?
5
Outline
-Why Do We Need Multiple Sequence Alignment?
-The progressive Alignment Algorithm
-A possible Strategy
-Potential Difficulties
6
Prerequisites
-How Do Sequences Evolve?
-How Can We COMPARE Sequences?
-How Can We ALIGN Sequences?
7
Why Do We Need Multiple Sequence Alignment?
8
Sometimes Two Sequences Are Not Enough
9
What is A Multiple Sequence Alignment?
10
(No Transcript)
11
(No Transcript)
12
How Can I Use A Multiple Sequence Alignment?
BUT Conserved where it MATTERS
13
(No Transcript)
14
How Can I Use A Multiple Sequence Alignment?
chite  ...KGGELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

Extrapolation
Prosite Patterns
15
How Can I Use A Multiple Sequence Alignment?

Extrapolation
P-K-R-[PA]-x(1)-[ST]
Prosite Patterns
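Written in full PROSITE syntax, the pattern read off the conserved columns is `P-K-R-[PA]-x(1)-[ST]`: elements are separated by `-`, `x` means any residue, `[PA]` lists the allowed residues and `(1)` is a repeat count. As a minimal sketch, assuming only the common PROSITE elements (`x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors) and using hypothetical helper names, such a pattern can be translated into a Python regular expression and scanned against the ungapped first block of the wheat, trybr and mouse rows from the alignment on slide 14:

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supported: '-' separators, 'x' (any residue), [..] (allowed residues),
    {..} (forbidden residues), repeat counts x(n) / x(n,m), '<' and '>' anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        starts = element.startswith("<")
        ends = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repeat count such as "(2)" or "(2,4)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # single residues and [..] are already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if starts else "") + regex + ("$" if ends else ""))
    return "".join(parts)


# Ungapped first block of wheat, trybr and mouse from the slide 14 alignment.
seqs = {
    "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRSKHSDLSIVEMSKAAGAAWKELGP",
    "mouse": "KPKRPRSAYNIYVSESFQEAKDDSAQGKLKLVNEAWKNLSP",
}

regex = prosite_to_regex("P-K-R-[PA]-x(1)-[ST]")
for name, seq in seqs.items():
    for hit in re.finditer(regex, seq):
        print(name, hit.start() + 1, hit.group())  # 1-based start position
```

Each of the three sequences matches exactly once (wheat at position 5, trybr at 7, mouse at 2). Searching SwissProt with such a regex is how an uncharacterised signature can be checked for matches in other proteins.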
16
How Can I Use A Multiple Sequence Alignment?

Extrapolation
Prosite Patterns
SwissProt
Uncharacterised Signature
Match?
17
How Can I Use A Multiple Sequence Alignment?

Extrapolation
Prosite Patterns
Profiles And HMMs
-More Sensitive
-More Specific
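Profiles are more sensitive than a single pattern because they score every residue at every position instead of giving a yes/no answer. The sketch below builds the simplest kind of profile, a log-odds position-specific scoring matrix, from the six alignment columns that cover the PKR motif (wheat `PKRAPS`, trybr `PKRAMT`, mouse `PKRPRS`). It is not an HMM and not PROSITE's profile format; the uniform 1/20 background, the pseudocount of 1 and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(rows, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    Assumes a uniform background frequency of 1/20 and skips gaps.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(rows[0])):
        counts = Counter(r[col] for r in rows if r[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_hit(profile, sequence):
    """Best-scoring ungapped window of the profile's width: (score, 0-based start)."""
    width = len(profile)
    return max(
        (sum(col.get(aa, 0.0) for col, aa in zip(profile, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    )

motif = ["PKRAPS", "PKRAMT", "PKRPRS"]  # wheat, trybr, mouse
profile = build_profile(motif)
score, start = best_hit(profile, "KKDSNAPKRAMTSFMFFSSDFRSKHSDLSIVEMSKAAGAAWKELGP")
print(f"best window starts at {start + 1}, score {score:.2f}")
```

Unlike the regex, a profile still gives a graded score to a window that breaks one position of the motif, which is where the extra sensitivity comes from.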
18
A PROSITE PROFILE
19
How Can I Use A Multiple Sequence Alignment?

Extrapolation
Motifs/Patterns
Profiles
Phylogeny
-Evolution
-Paralogy/Orthology
(Tree figure: chite, wheat, trybr, mouse)
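A phylogeny is usually built from pairwise distances measured on the alignment. The simplest measure is the p-distance: the fraction of differing residues over the columns where neither sequence has a gap. The sketch below computes it for the second block of the chite/wheat/trybr/mouse alignment on slide 14 (the only block that is complete for all four sequences in this transcript). Real phylogenetic work would normally add an evolutionary correction and a tree-building method on top; this sketch omits both, and the function names are illustrative.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of mismatches over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("the two rows share no ungapped column")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aln):
    """p-distance for every unordered pair of named, aligned rows."""
    return {(m, n): p_distance(aln[m], aln[n]) for m, n in combinations(aln, 2)}

# Second block of the chite/wheat/trybr/mouse alignment.
aln = {
    "chite": "AATAKQNYIRALQEYERNGG-",
    "wheat": "ANKLKGEYNKAIAAYNKGESA",
    "trybr": "AEKDKERYKREM---------",
    "mouse": "AKDDRIRYDNEMKSWEEQMAE",
}
for (m, n), d in distance_matrix(aln).items():
    print(f"{m} {n} {d:.2f}")
```

Note that trybr is compared with the others over only 12 columns, because the rest of its row is gaps; short overlaps make such distances noisy.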
20
How Can I Use A Multiple Sequence Alignment?

Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
21
How Can I Use A Multiple Sequence Alignment?

Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
-PsiPred or PhD for secondary structure prediction: 75% accurate.
-Threading is improving but is not yet as good.
22
How Can I Use A Multiple Sequence Alignment?

23
(No Transcript)
24
Why Is It Difficult To Compute A Multiple Sequence Alignment?
25
Why Is It Difficult To Compute A Multiple Sequence Alignment?
BIOLOGY
COMPUTATION
CIRCULAR PROBLEM: Good Sequences <-> Good Alignment
26
The Biological Problem.
Same as the Pairwise Alignment Problem:
We do NOT know how Sequences Evolve.
We do NOT understand the Relation Between Structures and Sequences.
We would NOT recognize the Correct Alignment if we had it IN FRONT of our eyes.
27
The Biological Problem. The Charlie Chaplin
28
The Biological Problem: How to Evaluate an Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
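A common evaluation function is the sum-of-pairs score: every pair of rows is scored column by column with the substitution matrix, and each residue facing a gap pays a penalty. In the minimal sketch below, a toy match/mismatch function stands in for Blosum and the gap penalty is linear; both values are arbitrary assumptions, and real tools use a full matrix and affine gap costs.

```python
from itertools import combinations

def substitution_score(a, b, match=5, mismatch=-1):
    """Toy stand-in for a Blosum lookup: one match and one mismatch value."""
    return match if a == b else mismatch

def sum_of_pairs(alignment, gap=-4):
    """Sum-of-pairs score of equal-length rows with a linear gap penalty."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        for a, b in combinations([row[col] for row in alignment], 2):
            if a == "-" and b == "-":
                continue  # gap facing gap is ignored
            total += gap if "-" in (a, b) else substitution_score(a, b)
    return total

print(sum_of_pairs(["AC-", "AC-", "A-T"]))
```

Two candidate alignments of the same sequences can then be ranked by this number, which is exactly what makes the choice of matrix and gap values matter.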
29
The COMPUTATIONAL Problem. Producing the Alignment
-Substitution Matrix (Blosum)
-An Evaluation Function
-An Alignment Algorithm
30
HOW CAN I ALIGN MANY SEQUENCES
2 Globins > 1 Min
31
HOW CAN I ALIGN MANY SEQUENCES
3 Globins > 2 hours
32
HOW CAN I ALIGN MANY SEQUENCES
4 Globins > 10 days
33
HOW CAN I ALIGN MANY SEQUENCES
5 Globins > 3 years
34
HOW CAN I ALIGN MANY SEQUENCES
6 Globins > 300 years
35
HOW CAN I ALIGN MANY SEQUENCES
7 Globins > 30,000 years
Solidified Fossil, Old stuff
36
HOW CAN I ALIGN MANY SEQUENCES
8 Globins > 3 Million years
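The timings on these slides are the author's; the back-of-the-envelope count below only shows why they explode. An exact dynamic-programming alignment of N sequences of length L fills an N-dimensional table of (L+1)^N cells, and each cell takes the best of its 2^N - 1 predecessor cells. The globin length of 150 residues used here is an assumption for illustration.

```python
def dp_operations(length: int, n_seqs: int) -> int:
    """Rough work for exact N-sequence dynamic programming.

    (length + 1) ** n_seqs cells, each combining 2 ** n_seqs - 1 predecessors.
    """
    cells = (length + 1) ** n_seqs
    return cells * (2 ** n_seqs - 1)

GLOBIN_LENGTH = 150  # assumed typical globin length

for n in range(2, 9):
    ops = dp_operations(GLOBIN_LENGTH, n)
    print(f"{n} sequences: {ops:.1e} operations")
```

Under this model each extra sequence multiplies the work by roughly 2(L+1), a few hundred here, so every added globin pushes the run time up by about two orders of magnitude.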
37
The Progressive Multiple Alignment Algorithm (Clustal W)
38
(No Transcript)
39
Making An Alignment
Any Exact Method would be TOO SLOW
We will use a Heuristic Algorithm.
The Progressive Alignment Algorithm is the most Popular
-ClustalW
40
Progressive Alignment
Feng and Doolittle, 1988; Taylor, 1989
Clustering
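The clustering step turns pairwise distances into a guide tree that fixes the order in which sequences and profiles are merged. ClustalW builds this tree with neighbour-joining; the sketch below uses UPGMA instead, only because it is shorter. It is fed p-distances (fraction of differing residues over ungapped columns) computed by hand from the second block of the chite/wheat/trybr/mouse alignment on slide 14; function and variable names are illustrative.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster leaves with UPGMA; return the guide tree as nested tuples.

    dist maps each unordered pair of names, given once as (a, b), to a distance.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    active = list(names)                # current clusters, oldest first
    size = {name: 1 for name in names}  # number of leaves under each cluster
    while len(active) > 1:
        # Closest pair of clusters; ties go to the pair listed first.
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        active.remove(a)
        active.remove(b)
        for c in active:
            # UPGMA: size-weighted average of the two merged clusters' distances.
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        active.append(merged)
    return active[0]

names = ["chite", "wheat", "trybr", "mouse"]
dist = {
    ("chite", "wheat"): 15 / 20, ("chite", "trybr"): 8 / 12,
    ("chite", "mouse"): 17 / 20, ("wheat", "trybr"): 8 / 12,
    ("wheat", "mouse"): 19 / 21, ("trybr", "mouse"): 6 / 12,
}
print(upgma(names, dist))
```

Read as a progressive alignment order, the tree says: align trybr with mouse first, then chite with wheat, and finally align the two resulting profiles.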
41
Progressive Alignment
42
Progressive Alignment
-Depends on the CHOICE of the sequences.
-Depends on the ORDER of the sequences (Tree).
-Depends on the PARAMETERS:
• Substitution Matrix.
• Penalties (Gop, Gep).
• Sequence Weight.
• Tree making Algorithm.
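How strongly the result depends on the parameters is easy to see even on a pairwise global alignment (Needleman-Wunsch). This sketch uses a single linear gap cost rather than separate gap opening (GOP) and gap extension (GEP) penalties, and a match/mismatch score instead of a substitution matrix; the default values are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# Same pair of sequences, two parameter settings.
print(needleman_wunsch("AC", "CA", mismatch=-5, gap=-1))  # cheap gaps
print(needleman_wunsch("AC", "CA", mismatch=-1, gap=-5))  # expensive gaps
```

With cheap gaps the best alignment is `-AC` over `CA-` (score 0); with expensive gaps it becomes `AC` over `CA` (score -2). The sequences are unchanged; only the parameters decided which residues end up in the same column.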

43
Progressive Alignment: When Does It Work?
Works Well When the Phylogeny is Dense
No Outlier Sequence.
(Image: River Crossing)
44
Progressive Alignment: When Doesn't It Work?
45
(No Transcript)
46
Building the Right Multiple Sequence Alignment.
47
Recognizing The Right Sequences When You Meet Them
48
Gathering Sequences: BLAST
49
Common Mistake: Sequences Too Closely Related
PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE

PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT   DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT EELGFILKGFSPDARDLSVKETKT
-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE MULTIPLE SEQUENCE ALIGNMENT.
-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY.
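One simple way to drop sequences that add no information is a greedy identity filter: walk through the sequences and keep one only if it is less than a chosen percent identical to every sequence already kept. The 90% threshold and the function names are assumptions; the data are the second block of the parvalbumin rows on slide 49 (RABIT left out because its row is truncated in this transcript).

```python
def percent_identity(a, b):
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def drop_redundant(aligned, max_identity=90.0):
    """Greedily keep rows that are < max_identity % identical to all kept rows."""
    kept = {}
    for name, row in aligned.items():
        if all(percent_identity(row, other) < max_identity for other in kept.values()):
            kept[name] = row
    return kept

prva = {
    "PRVA_MACFU": "DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES",
    "PRVA_HUMAN": "DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES",
    "PRVA_GERSP": "DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES",
    "PRVA_MOUSE": "DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES",
    "PRVA_RAT":   "DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES",
}
print(list(drop_redundant(prva)))
```

Every mammalian row is more than 90% identical to PRVA_MACFU, so only that one survives: the other four would add almost nothing to a multiple alignment.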
50
(No Transcript)
51
Sequence Weighting Within ClustalW
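Sequence weighting reduces the influence of groups of near-identical sequences so that they do not dominate the alignment. ClustalW derives its weights from the branch lengths of the guide tree; the sketch below uses a different and simpler scheme, the position-based weights of Henikoff and Henikoff (1994), only to illustrate the idea. Treating gaps as an ordinary symbol is an assumption of this sketch.

```python
from collections import Counter

def henikoff_weights(alignment):
    """Position-based sequence weights (Henikoff & Henikoff, 1994), summing to 1.

    In each column a row earns 1 / (r * s): r = number of distinct symbols in
    the column, s = number of rows sharing this row's symbol. Gaps count as a symbol.
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [row[col] for row in alignment]
        counts = Counter(column)
        for k, symbol in enumerate(column):
            weights[k] += 1.0 / (len(counts) * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]

# Two identical rows share the weight that the single distinct row gets alone.
print(henikoff_weights(["AAAA", "AAAA", "CCCC"]))
```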
52
Selecting Diverse Sequences (Opus II)
53
Respect Information!

PRVA_MACFU ------------------------------------------SMTDLLN----AEDIKKA
PRVA_HUMAN ------------------------------------------SMTDLLN----AEDIKKA
PRVA_GERSP ------------------------------------------SMTDLLS----AEDIKKA
PRVA_MOUSE ------------------------------------------SMTDVLS----AEDIKKA
PRVA_RAT   ------------------------------------------SMTDLLS----AEDIKKA
PRVA_RABIT ------------------------------------------AMTELLN----AEDIKKA
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMK
54
Selecting Diverse Sequences (Opus II)
55
Selecting Diverse Sequences (Opus II)

LDQDKSGFIE PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFN
FLKAGDSDGDGKIGVDEFTALVKA- PRVB_BOACO
EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
MIGIDEFAVLVKQ- PRVB_LATCH DEELELFLQNFSAGARTLTKTE
TETFLKAGDSDGDGKIGVDEFQKLVKA- PRVB_RANES
QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDG
-A REASONABLE Model Now Exists.
-Going Further: Remote Homologues.
56
Aligning Remote Homologues

PRVA_MACFU -------------------------------------
-----SMTDLLNA----EDIKKA PRVA_ESOLU
-------------------------------------------AKDLLKA
----DDIKKA PRVB_CYPCA --------------------------
------------------------------------------AFAGILSD
------------------------------------------AVAKLLAA
----------------SITDIVSE----KDIDAA TPCS_RABIT
TPTKEELDAI TPCS_PIG -TDQQAEARSYLSEEMIAEFKAAFDM
MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQ
NPTPEELQEM
PRVA_MACFU
SGFIEEDELGFI PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG--
--LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF PRV1_SALSA
--LAKKSNEELEAIFKILDQDKSGFIEDEELELF PRVB_RANES
FIEQDELGLF TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMK
IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDG
YIDAEELAEI TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMK
. . PRVA_MACFU LKGFSPDARDLSAKETKTLM
AAGDKDGDGKIGVDEFSTLVAES- PRVA_ESOLU
VKA-- PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGD
GKIGVEEFVVLVTKG- PRV1_SALSA
LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKL
VKA-- PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGD
GKIGVEEFQALVKA-- TPCS_RABIT
FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ TPCS_
PIG FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKM
MEGVQ TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNND
GRIDYDEFLEFMKGVE
57
Some Guidelines
58
Do Not Use Too Many Sequences
59
60
(No Transcript)
61
Going Further

VKKVFHILDKDKSGFIEEDELGFI PRVB_BOACO
YIDAEELAEI TPCS_PIG IEEVDEDGSGTIDFEEFLVMMVRQMK
EDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI TPCC_MOUSE
YIDLDELKMM TPC_PATYE SDEMDEEATGRLNCDAWIQLFER---
KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI
. . PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKD
GDGKIGVDEFSTLVAES-- PRVB_BOACO
LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-- PRV1
Q--- TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRID
FDEFLKMMEGVQ- TPCS_PIG FR---ASGEHVTDEEIESIMKDG
DKNNDGRIDFDEFLKMMEGVQ- TPCC_MOUSE
LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE- TPC_
PATYE LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMM
SSDA . ..
62
WHAT MAKES A GOOD ALIGNMENT
-THE MORE DIVERGENT THE SEQUENCES, THE BETTER
-THE FEWER INDELS, THE BETTER
-NICE UNGAPPED BLOCKS SEPARATED BY INDELS
-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK:
• Completely Conserved
• Conserved For Size and Hydropathy
• Conserved For Size or Hydropathy

-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL JUDGEMENT AND KNOWLEDGE.
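Several of these criteria can be measured directly. The sketch below counts indels (gap openings), lists the ungapped blocks and finds completely conserved columns; it covers only the first residue class above (complete conservation), not size or hydropathy. The rows are wheat, trybr and mouse from the first block of the slide 14 alignment (chite is left out because its row is truncated in this transcript); the minimum block length of 3 and all function names are assumptions.

```python
def count_indels(alignment):
    """Number of gap openings (runs of '-') summed over all rows."""
    total = 0
    for row in alignment:
        previous = ""
        for ch in row:
            if ch == "-" and previous != "-":
                total += 1
            previous = ch
    return total

def ungapped_blocks(alignment, min_length=3):
    """(start, end) column ranges, 0-based and end-exclusive, gap-free in every row."""
    length = len(alignment[0])
    blocks, start = [], None
    for col in range(length + 1):  # the extra step closes a block at the end
        gap_free = col < length and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks

def conserved_columns(alignment):
    """0-based columns where every row has the same residue (no gaps)."""
    return [col for col in range(len(alignment[0]))
            if alignment[0][col] != "-"
            and all(row[col] == alignment[0][col] for row in alignment)]

rows = [
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",  # wheat
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",  # trybr
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",  # mouse
]
print(count_indels(rows), ungapped_blocks(rows), conserved_columns(rows))
```

Here the conserved columns fall inside the ungapped blocks (the PKR motif and the W/K/L near the end), which is the pattern a good alignment is expected to show.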
63
(No Transcript)
64
Potential Difficulties
65
DO NOT OVERTUNE!!!
chite  ...ELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

DO NOT PLAY WITH PARAMETERS: IF YOU KNOW THE ALIGNMENT YOU WANT, MAKE IT YOURSELF!
chite  ...GELWRGLKD
wheat  --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP

chite  AATAKQNYIRALQEYERNGG-
wheat  ANKLKGEYNKAIAAYNKGESA
trybr  AEKDKERYKREM---------
mouse  AKDDRIRYDNEMKSWEEQMAE

66
TUNING or NOT TUNING!!!
-PARAMETERS TO TUNE USUALLY INCLUDE:
• GOP/GEP
• MATRIX
• SENSITIVITY Vs SPEED

Substitution Matrices (Etzold et al. 1993): Gonnet 61.7, Blosum50 59.7, Pam250 59.2
-MOST METHODS ARE TUNED TO WORK WELL ON AVERAGE.
-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW THE THEORY (e.g. Substitution Matrices).
-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. it changes little).
-TUNE IF YOU WANT TO CONVINCE YOURSELF.
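One way to check that an alignment is robust is to realign with different parameters and measure how many aligned residue pairs the two versions share. The sketch below does this for two alignments of the same sequences given in the same order; it is close in spirit to the sum-of-pairs accuracy used with benchmarks such as BaliBase, but the function names and details are assumptions, and the two tiny alignments stand in for the same sequences aligned under two parameter settings.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column.

    Each pair is (row_i, residue_index_i, row_j, residue_index_j), with
    residue indices counted along the ungapped sequence.
    """
    next_index = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for k, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((k, next_index[k]))
                next_index[k] += 1
        for (i, p), (j, q) in combinations(residues, 2):
            pairs.add((i, p, j, q))
    return pairs

def pair_agreement(reference, test):
    """Fraction of the reference's aligned pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0

before = ["AC-GT", "ACCGT"]
after = ["ACG-T", "ACCGT"]
print(pair_agreement(before, after))
```

Moving one gap changed a quarter of the aligned pairs here; a score close to 1.0 across several parameter settings is the kind of stability the slide calls robust.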
67
(No Transcript)
68
KEEP A BIOLOGICAL PERSPECTIVE
chite  ...GELWRGLKD
wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
DIFFERENT PARAMETERS
chite  ...AKKGGELWRGL-
wheat  -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr  -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse  ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
WRONG ALIGNMENT !!!
69
REPEATS
THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT CONTAIN THE SAME NUMBER OF REPEATS.
IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE RECOGNIZED USING DOTTER.
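Dotter draws a dot plot: when a sequence is compared against itself, internal repeats show up as diagonals parallel to the main one. The sketch below is not Dotter, only a much cruder text version of the same idea that marks a `*` wherever two words of `word` residues are identical; the word length, the helper names and the toy sequence are assumptions.

```python
def dot_plot(a, b, word=3):
    """Rows of '*' / '.': '*' where a[i:i+word] == b[j:j+word]."""
    return [
        "".join("*" if a[i:i + word] == b[j:j + word] else "."
                for j in range(len(b) - word + 1))
        for i in range(len(a) - word + 1)
    ]

def repeat_offsets(seq, word=3):
    """Offsets of off-diagonals (above the main one) that contain matches."""
    plot = dot_plot(seq, seq, word)
    return sorted({j - i for i, row in enumerate(plot)
                   for j, ch in enumerate(row) if ch == "*" and j > i})

seq = "ACDEFACDEF"  # toy sequence made of two copies of ACDEF
print("\n".join(dot_plot(seq, seq)))
print(repeat_offsets(seq))
```

The single off-diagonal at offset 5 gives the repeat period; each copy could then be cut out and aligned separately, as the slide recommends.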
70
(No Transcript)
71
Naming Your Sequences The Right Way
72
What Are The Available Methods ???
73
Simultaneous Alignments MSA
1) Set Bounds on each pair of sequences (Carrillo and Lipman)
2) Compute the Multiple Alignment within the Hyperspace
-Few, Small, Closely Related Sequences.
-Memory and CPU Hungry.
-Do Well When They Can Run.
74
Simultaneous Alignments DCA
75
Dialign
76
3) Assemble the alignment according to the segment pairs.
77
-May Align Too Few Residues
-No Gap Penalty
-Does Well with ESTs
78
bibiserv.techfak.uni-bielefeld.de/dialign/submission.html
79
Muscle
80
Iterative Methods
7.16.1 Progressive
-HMMs, HMMER, SAM, MUSCLE
-Slow, Sometimes Inaccurate
-Good Profile Generators
81
MUSCLE
82
MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
83
MUSCLE
phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
84
T-Coffee
85
Mixing Local and Global Alignments
Local Alignment
Global Alignment
Extension
Multiple Sequence Alignment
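T-Coffee first collects residue pairs from local and global pairwise alignments into a weighted library and then extends it: a pair of residues gains extra support for every residue of a third sequence that is aligned to both of them. The sketch below shows only that consistency idea (each pair x, y collects min(w(x, z), w(z, y)) over shared partners z); it is not T-Coffee's code, and the residue labels and weights are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def extend_library(library):
    """Consistency-extend a residue-pair library (sketch of the T-Coffee idea).

    library: {(x, y): weight}, each unordered pair listed once, where a residue
    is a (sequence_name, position) tuple and x, y come from different sequences.
    """
    def key(x, y):
        return tuple(sorted((x, y)))

    neighbours = defaultdict(dict)  # residue -> {aligned residue: weight}
    for (x, y), weight in library.items():
        neighbours[x][y] = weight
        neighbours[y][x] = weight

    extended = defaultdict(float)
    for (x, y), weight in library.items():
        extended[key(x, y)] += weight  # direct support
    for z, partners in neighbours.items():
        for x, y in combinations(partners, 2):
            if x[0] != y[0]:
                # x and y are both aligned to z: add indirect support.
                extended[key(x, y)] += min(partners[x], partners[y])
    return dict(extended)

library = {
    (("A", 1), ("B", 1)): 100,
    (("B", 1), ("C", 1)): 80,
    (("A", 1), ("C", 2)): 50,
}
for pair, weight in sorted(extend_library(library).items()):
    print(pair, weight)
```

Residue A1 was never aligned to C1 directly, yet the pair receives 80 through B1, more than the direct A1-C2 pair (50), so an aligner using the extended weights would favour the consistent choice.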
86
Mixing Heterogeneous Data With T-Coffee
Local Alignment
Global Alignment
Multiple Alignment
Structural
Specialist
Multiple Sequence Alignment
87
Mixing Sequences and Structures with T-Coffee
Seq Vs Seq
Local Global
Seq Vs Struct
Struct Vs Struct
Superpose
88
What is the Local Quality of my Alignment?
I
II
89
T-Coffee
igs-server.cnrs-mrs.fr/Tcoffee/
90
DBClustal
91
DBClustal
BlastP
92
DBClustal
93
DBClustal
94
Expasy Blast
95
Expasy BLAST
www.expasy.org/tools/blast/
96
Expasy BLAST
97
Choosing the right method
98
Situation → Solution
99
Priority → Solution
Method Priority Trees Profile 2D Pred 3D-Pred Func-Pred
Accuracy
Speed
100
Purpose → Solution
101
Conclusion
102
Multiple Alignment
103
Multiple Alignment
Know Your Problem What do you want to do with
104
MAFFT Progressive/iterative www.biophys.kyoto-u.jp/katoh
POA Progressive/Simultaneous www.bioinformatics.ucla.edu/poa
MUSCLE Progressive/Iterative www.drive5.com/muscle
105
BaliBase
What Is BaliBase
Source: BaliBase, Thompson et al., NAR, 1999
Description
PROBLEM
106
Which Method ?
What Is BaliBase
Source: BaliBase, Thompson et al., NAR, 1999
Strategy
Strategy
PROBLEM
107
Methods / Situations
1-Carrillo and Lipman
-MSA, DCA.
-Few, Small, Closely Related Sequences.
-Do Well When They Can Run.
2-Segment Based
-DIALIGN, MACAW.
-May Align Too Few Residues
-Good For Long Indels
3-Iterative
-HMMs, HMMER, SAM.
-Slow, Sometimes Inaccurate
-Good Profile Generators
4-Progressive
-ClustalW, Pileup, Multalign
-Fast and Sensitive