Multiple alignment - PowerPoint PPT Presentation



1
Multiple alignment
Peter Højrup, Department of Biochemistry and
Molecular Biology, SDU, Odense University,
Denmark.
2
Multiple sequence alignment
One amino acid sequence plays coy; a pair of
homologous sequences whisper; many aligned
sequences shout out loud.
3
Main applications for MA
  • Extrapolation
      • Determine family relationship
  • Phylogenetic analysis
      • Reconstruct the history of the protein
  • Pattern identification
      • Identify structurally/functionally important residues
  • Domain identification
      • Construct a pattern to find other family members
  • Structure prediction
      • A multiple alignment greatly enhances prediction
        accuracy
  • PCR analysis
      • Find the less degenerate parts of a gene.

4
Definition
Consensus sequence
5
Multiple alignment parameters
  • AA substitution matrix
      • Usually one suited to global alignment: PAM250 or
        Gonnet
  • Gap parameters
      • Both the opening and the extension parameters are
        very important for the optimal alignment
  • Alignment order
      • ClustalX initially performs pairwise comparisons;
        the closest pair forms the initial alignment.
  • Sequence length
      • Try to align sequences of similar length; an initial
        dot plot can help to find the regions that align.
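The effect of separate gap opening and extension parameters can be seen in a minimal sketch that scores two already-aligned sequences. The function name, the match/mismatch scores (a stand-in for a real matrix such as PAM250 or Gonnet) and the penalty values are illustrative assumptions, not ClustalX defaults. A gap of length n is charged one opening penalty plus n-1 extension penalties.

```python
def score_alignment(a, b, match=2, mismatch=-1, gap_open=-10, gap_extend=-1):
    """Score two aligned sequences of equal length ('-' marks a gap).

    All numeric values are illustrative assumptions; a real program would
    look residue pairs up in a substitution matrix instead.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_in = None  # which sequence ('a' or 'b') the current gap run is in
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column empty in both sequences: ignore it
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # First column of a gap run pays gap_open, the rest gap_extend.
            score += gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += match if x == y else mismatch
            gap_in = None
    return score


one_long_gap = score_alignment("ACDEFGHIK", "ACD---HIK")
three_short_gaps = score_alignment("ACDEFGHIK", "A-D-F-HIK")
print(one_long_gap, three_short_gaps)
```

Both alignments contain three gapped columns and six matches, yet the single long gap scores much better than three scattered ones. This is why the choice of both parameters shapes the final alignment.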

6
Multiple sequence alignment
Rat_CALRETICULIN    ----MLLSVPLLLGLLGLAAAD----------------------------------PAIYFKEQFLDGDAWTNR---------WVESKHKSD--FGKFVL
Human_CALRETICULIN  ----MLLSVPLLLGLLGLAVAE----------------------------------PAVYFKEQFLDGDGWTSR---------WIESKHKSD--FGKFVL
RAT_CALNEXIN        MEGKWLLCLLLVLGTAAIQAHDGHDDDMIDIEDDLDDVIEEVEDSKSKSDTSTPPSPKVTYKAPVPTGEVYFADSFDRGSLSGWILSKAKKDDTDDEIAK
Human_CALNEXIN      MEGKWLLCMLLVLGTAIVEAHDGHDDDVIDIEDDLDDVIEEVEDSKPDT-TAPPSSPKVTYKAPVPTGEVYFADSFDRGTLSGWILSKAKKDDTDDEIAK
Prim.cons.          MEGK2LL2V2L2LG22GLAA2DGHDDD2IDIEDDLDDVIEEVEDSK222DT22P2SP2V22K22222G2V22A2SFDRG2LSGWI2SK2K2DDT222222

Rat_CALRETICULIN    SSGKFYGDQEK------DKGLQTSQDARFYALSARF-EPFSNKGQTLVVQFTVKHEQNIDCGGGYVKLFPGG--LDQKDMHGDSEYNIMFGPDICGPGTK
Human_CALRETICULIN  SSGKFYGDEEK------DKGLQTSQDARFYALSASF-EPFSNKGQTLVVQFTVKHEQNIDCGGGYVKLFPNS--LDQTDMHGDSEYNIMFGPDICGPGTK
RAT_CALNEXIN        YDGKWEVDEMKETKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPLIVQYEVNFQNGIECGGAYVKLLSKTSELNLDQFHDKTPYTIMFGPDKCG-EDY
Human_CALNEXIN      YDGKWEVEEMKESKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPLIVQYEVNFQNGIECGGAYVKLLSKTPELNLDQFHDKTPYTIMFGPDKCG-EDY
Prim.cons.          22GK222DE2KE2KLPGDKGL22222A222A2SAK2N2PF222222L2VQ22V22222I2CGG2YVKL22KT2EL22D22H2222Y2IMFGPD2CGP222

Rat_CALRETICULIN    KVHVIFNYKGKNVLINKDIRCK----------DDEFTHLYTLIVRPDNTYEVKIDNSQVESGSLEDDWD--FLPPKKIKDPDAAKPEDWDERAKIDDPTD
Human_CALRETICULIN  KVHVIFNYKGKNVLINKDIRCK----------DDEFTHLYTLIVRPDNTYEVKIDNSQVESGSLEDDWD--FLPPKKIKDPDASKPEDWDERAKIDDPTD
RAT_CALNEXIN        KLHFIFRHKNPKTGVYEEKHAKRPDADLKTYFTDKKTHLYTLILNPDNSFEILVDQSVVNSGNLLNDMTPPVNPSREIEDPEDRKPEDWDERPKIADPDA
Human_CALNEXIN      KLHFIFRHKNPKTGIYEEKHAKRPDADLKTYFTDKKTHLYTLILNPDNSFEILVDQSVVNSGNLLNDMTPPVNPSREIEDPEDRKPEDWDERPKIPDPEA
Prim.cons.          K2H2IF22K22222I222222KRPDADLKTYF2D22THLYTLI22PDN22E222D2S2V2SG2L22D22PP22P222I2DP22RKPEDWDER2KIDDPT2

Rat_CALRETICULIN    SKPEDWDK---------------------PEHIPDPDAKKPEDWDEEMDGEWEP-------------------PVIQNPEYKGEWKPRQIDNPDYKGTWI
Human_CALRETICULIN  SKPEDWDK---------------------PEHIPDPDAKKPEDWDEEMDGEWEP-------------------PVIQNPEYKGEWKPRQIDNPDYKGTWI
RAT_CALNEXIN        VKPDDWDEDAPSKIPDEEATKPEGWLDDEPEYIPDPDAEKPEDWDEDMDGEWEAPQIANPKCESAPGCGVWQRPMIDNPNYKGKWKPPMIDNPNYQGIWK
Human_CALNEXIN      VKPDDWDEDAPAKIPDEEATKPEGWLDDEPEYVPDPDAEKPEDWDEDMDGEWEAPQIANPRCESAPGCGVWQRPVIDNPNYKGKWKPPMIDNPSYQGIWK
Prim.cons.          2KP2DWD2DAP2KIPDEEATKPEGWLDDEPE2IPDPDA2KPEDWDE2MDGEWE2PQIANP2CESAPGCGVWQRPVI2NP2YKG2WKP22IDNPDY2G2W2

7
Nomenclature
Rat_CALRETICULIN    SSGKFYGDQEK------DKGLQTSQDARFYALSARF-EPFSNKGQTL
Human_CALRETICULIN  SSGKFYGDEEK------DKGLQTSQDARFYALSASF-EPFSNKGQTL
RAT_CALNEXIN        YDGKWEVDEMKETKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL
Human_CALNEXIN      YDGKWEVEEMKESKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL
Prim.cons.          22GK222DE2KE2KLPGDKGL22222A222A2SAK2N2PF222222L
Consensus sequence
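The "Prim.cons." line can be reproduced with a short sketch. The rule used here is inferred from the alignment shown on this slide and is an assumption, not a documented specification of the program that produced it: gaps are ignored, a residue is printed when it is the single most frequent one in its column, and otherwise the number of different residues in the column is printed. The function name is hypothetical.

```python
from collections import Counter


def primary_consensus(rows):
    """Consensus string for equal-length aligned rows ('-' marks a gap).

    Rule inferred from the slide (an assumption): print the residue if it
    is the unique most frequent one, else the count of distinct residues.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        if not counts:
            out.append("-")  # column contains only gaps
            continue
        ranked = counts.most_common()
        top = ranked[0][1]
        winners = [res for res, n in ranked if n == top]
        out.append(winners[0] if len(winners) == 1 else str(len(counts)))
    return "".join(out)


rows = [
    "SSGKFYGDQEK------DKGLQTSQDARFYALSARF-EPFSNKGQTL",  # Rat_CALRETICULIN
    "SSGKFYGDEEK------DKGLQTSQDARFYALSASF-EPFSNKGQTL",  # Human_CALRETICULIN
    "YDGKWEVDEMKETKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL",  # RAT_CALNEXIN
    "YDGKWEVEEMKESKLPGDKGLVLMSRAKHHAISAKLNKPFLFDTKPL",  # Human_CALNEXIN
]
print(primary_consensus(rows))
```

For these four rows the sketch gives the same string as the Prim.cons. line above. Columns with ten or more different residues would need a different symbol than a single digit.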
8
Mind the gap!
                           250       260       270       280       290       300
Papain              DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG
Staphopain          --------I---AILGSRV-E-----S----------RNGMHAGHAMAVVGN--AKLNNG
Prim.cons.          DGVRQVQP2NEGA2L2S22N2PVSVV2EAAGKDFQLYR2G222G22222V22AVA2222G
Never have islands (widows)
Gaps should be in-frame
Papain              YTTTELSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRT
CathLx2             PRKGKVFQEPLFYEA----PRSVDWREKGYVTPVKNQGQCGSCWAFSATGALEGQMFRKT
CathBx3             PPQRVMFTEDLKLPAS--FDAREQWPQCPTIKEIRDQGSCGSCWAFGAVEAISDRICIHT
Staphopain          ---------------------ETQGNN-------------GWCAGYTMSALLN-------
Prim.cons.          P33333F3E3L333A2VN2P34V2WRQKG3VTPVKNQGSCGSCWAFSAV4A2EG3I3I3T
9
Structural inferences
  • The most highly conserved regions are likely to
    correspond to the active site.
  • Regions rich in insertions and deletions probably
    correspond to surface loops.
  • A position containing a conserved Gly or Pro
    probably corresponds to a turn.
  • A conserved pattern of hydrophobicity with
    spacing 2 (that is, every second residue), with
    the intervening residues more variable and
    including hydrophilic residues, suggests a
    β-strand on the surface.
  • A conserved pattern of hydrophobicity with
    spacing ~4 suggests a (surface) α-helix.
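The last two rules can be illustrated with a hydrophobic moment calculation. This is a minimal sketch on a single sequence, swapped in for inspection of conserved alignment columns. It uses the Kyte-Doolittle hydropathy scale and idealised turn angles of 100° per residue for an α-helix (3.6 residues per turn) and 180° for a β-strand (every second residue on the same side). The function names and the simple "larger moment wins" decision are assumptions made for illustration.

```python
import math

# Kyte-Doolittle hydropathy values
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg):
    """Mean hydrophobic moment for a given turn angle per residue."""
    angle = math.radians(angle_deg)
    x = sum(KD[r] * math.cos(i * angle) for i, r in enumerate(seq))
    y = sum(KD[r] * math.sin(i * angle) for i, r in enumerate(seq))
    return math.hypot(x, y) / len(seq)


def guess_periodicity(seq):
    """Compare helix-like (100 degrees) and strand-like (180 degrees) periodicity."""
    helix = hydrophobic_moment(seq, 100.0)
    strand = hydrophobic_moment(seq, 180.0)
    return "helix-like" if helix > strand else "strand-like"


print(guess_periodicity("VKVKVKVK"))          # hydrophobic at every 2nd position
print(guess_periodicity("LKKLLKKLLKKLLKKL"))  # hydrophobic at spacing 3-4
```

For an alignment, the same calculation could be run on the average hydropathy of each column instead of on one sequence.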

10
ClustalW/ClustalX
  • Multiple alignment takes place in three steps
      • Pairwise alignment of all sequences
      • Calculating a guide tree
      • Progressive alignment.
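The first two steps can be sketched in a few lines. This is a simplified illustration, not the ClustalW implementation. Pairwise scores come from a plain Needleman-Wunsch recursion with a linear gap penalty and match/mismatch scores (all values are assumptions). The guide tree is built with UPGMA (average linkage), swapped in here for brevity in place of ClustalW's neighbour-joining. The third step, aligning sequences and groups in the order given by the tree, is omitted.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def distance(a, b):
    """Crude distance: 0 for identical sequences, larger when less similar."""
    return 1.0 - nw_score(a, b) / max(len(a), len(b))


def guide_tree(names, seqs):
    """UPGMA join order as nested tuples of sequence names."""
    d = [[distance(a, b) for b in seqs] for a in seqs]
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average distance between all members of the two clusters.
                avg = sum(d[p][q] for p in mi for q in mj) / (len(mi) * len(mj))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Ungapped fragments taken from the calreticulin/calnexin alignment above.
names = ["Rat_CALRETICULIN", "Human_CALRETICULIN", "RAT_CALNEXIN", "Human_CALNEXIN"]
seqs = [
    "SSGKFYGDQEKDKGLQTSQD",
    "SSGKFYGDEEKDKGLQTSQD",
    "YDGKWEVDEMKETKLPGDKGLVLMSRA",
    "YDGKWEVEEMKESKLPGDKGLVLMSRA",
]
tree = guide_tree(names, seqs)
print(tree)
```

The tree groups the two calreticulins and the two calnexins before joining the pairs, which is the order in which a progressive aligner would merge them.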

11
Guide tree of hexokinases
12
Hexokinase alignment
13
3D aspects - thioredoxins
[Figure labels: Loop (×4), Active site, a, b]
14
Thioredoxin
[Figure labels: Loop (×4), Active site, a, b]
15
Sequence logo from alignment
16
Naming conventions in multiple alignments
  • ClustalW only uses the first word of a name (i.e.
    never use white space in a name)
  • Do not use special symbols; use an underscore (_)
    to connect words
  • The protein should be identifiable from the first 15
    characters (longer names are truncated)
  • Each protein to be aligned needs a unique
    name.
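These rules are easy to apply automatically before an alignment is run. The sketch below is a hypothetical helper, not part of ClustalW. It replaces white space and special symbols with underscores, truncates to 15 characters, and appends a counter when two names would otherwise collide. The fallback name "seq" is also an assumption.

```python
import re


def clustal_safe_names(names, max_len=15):
    """Return unique names without white space or special symbols."""
    used = set()
    result = []
    for name in names:
        # Runs of anything other than letters, digits or '_' become one '_'.
        clean = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_")
        clean = clean[:max_len].rstrip("_") or "seq"
        candidate, n = clean, 1
        while candidate in used:
            # Duplicate after truncation: add a counter within the limit.
            n += 1
            suffix = f"_{n}"
            candidate = clean[: max_len - len(suffix)] + suffix
        used.add(candidate)
        result.append(candidate)
    return result


print(clustal_safe_names(
    ["Rat calreticulin", "Rat calreticulin", "Human calnexin (CANX)"]
))
```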

17
PSI-BLAST
  • Position-Specific Iterated BLAST
  • For each search round, the aligned results are
    used as the basis for calculating a new
    position-specific substitution (scoring) matrix.
  • New iterations can be carried out as long as new
    hits are found.
  • If a normal BLAST search finds no hits,
    PSI-BLAST will not help.
  • Check the results carefully!
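How aligned hits become a new scoring matrix can be sketched as follows. This is a simplified illustration, not the PSI-BLAST algorithm: it assumes a uniform background frequency for the 20 amino acids, a fixed pseudocount, and no sequence weighting, all of which the real program handles differently. The function names are hypothetical. The example rows are the ungapped region around the conserved CGSCWAF motif in the papain, CathLx2 and CathBx3 rows shown earlier.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(rows, pseudocount=1.0):
    """Log-odds score per column and residue from aligned rows ('-' = gap)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of the position-specific scores of an ungapped segment."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


rows = ["GSCGSCWAFSA", "GQCGSCWAFSA", "GSCGSCWAFGA"]
pssm = build_pssm(rows)
print(round(score(pssm, "GSCGSCWAFSA"), 2), round(score(pssm, "AAAAAAAAAAA"), 2))
```

A segment that matches the conserved columns scores far higher than an unrelated one. Because each round's hits feed the next matrix, a false hit accepted in one round can pull in more false hits later, so the results need careful checking.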

18
PSI-BLAST of HIT protein
19
First search: 103 hits
20
First PSI iteration
21
Second PSI iteration