Multiple Sequence Alignment - PowerPoint PPT Presentation

Slides: 32
Provided by: Niv76
Transcript and Presenter's Notes

1
Multiple Sequence Alignment
  • ClustalW
  • TCoffee
  • Ka, Ks, and Ka/Ks
  • Anchored alignment

2
ClustalW
  • http://www.ebi.ac.uk/clustalw/

3
ClustalW
4
ClustalW
Paste your sequences
Run
5
Results
Press "Start Jalview" for an interactive view of
the alignment
6
ClustalW output format
Guide Tree Cladogram
7
Exercise
  • HomoloGene is a system for automated detection of
    homologs among the annotated genes of several
    completely sequenced eukaryotic genomes.
  • Download the FASTA sequences of HomoloGene:5276
    and align them with ClustalW

9
Result
10
TCoffee
  • http://www.tcoffee.org/

TCoffee computes its multiple alignment by combining
a collection of smaller (pairwise) alignments
11
Main features
  • Multiple Sequence Alignment
  • Structure based Multiple Sequence Alignment
  • Combining the output of several multiple sequence
    alignment packages
  • Combining two (or more) multiple sequence
    alignments into a single one
  • Turning amino acid alignments into CDS nucleotide
    alignments

12
Exercise
  • The 18-kDa protein plays an important role in
    fertilization of several abalone species
  • Build a multiple sequence alignment using the
    following sequences

13
Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
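
Before pasting sequences into an alignment server, it can help to check that every FASTA record is read the way you expect. The following is a minimal sketch, not part of ClustalW or TCoffee; the function name parse_fasta and the two demo records are made up for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical demo input: two short records, the first split over two lines.
example = """>seqA demo record
MRSLV
LLCVL
>seqB demo record
MRFLL
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Sequence lines are simply concatenated, so records wrapped at any width are handled the same way.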

14
Choose TCoffee Regular, paste the sequences in
the data box, and press submit
16
Estimating the rate of evolution
  • In order to study selection patterns, you will
    need to have the corresponding DNA alignment
  • Using PROTOGENE (Protein-to-Gene) in TCoffee, the
    amino-acid alignment is transformed into the
    corresponding DNA alignment. The underlying search
    step is tBLASTn, which compares a protein query
    against translated nucleotide sequences.
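
Once the coding sequence of each protein is known, turning a protein alignment into a DNA alignment amounts to replacing every residue with its codon and every gap with three gap characters. The sketch below illustrates only that threading step and assumes the coding sequence is already in hand; the name thread_codons is hypothetical and this is not PROTOGENE's own code.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto one gapped protein row."""
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    out = []
    pos = 0  # index of the next unused nucleotide in cds
    for symbol in aligned_protein:
        if symbol == "-":
            out.append("---")          # one protein gap = three DNA gaps
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Illustrative input: the first four codons of an abalone sequence (M R S L)
# with a gap placed arbitrarily after the second residue.
print(thread_codons("MR-SL", "ATGAGGTCTTTG"))
```

Because gaps are always inserted three at a time, the result stays codon-aligned, which is what the downstream rate calculations require.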

18
Results
19
In case it takes too long
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC

20
SNAP - Ds/Dn Calculation tool
  • http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
  • Calculates synonymous and non-synonymous
    substitution rates for codon-aligned nucleotide
    sequences according to the method of Nei and
    Gojobori (1986).
  • This program will only yield valid results if the
    input alignment is codon-aligned
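
To make the Nei and Gojobori (1986) calculation concrete, here is a simplified sketch for a single pair of codon-aligned sequences. It is not SNAP's code. It assumes the standard genetic code and upper-case input without in-frame stop codons, skips codons that contain gaps or ambiguous bases, counts changes to a stop codon as non-synonymous when counting sites, and ignores mutational paths that pass through a stop codon. All function names are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def codon_diffs(a, b):
    """Synonymous and non-synonymous differences, averaged over all paths."""
    positions = [i for i in range(3) if a[i] != b[i]]
    paths = []
    for order in permutations(positions):
        current, syn, non = a, 0, 0
        for i in order:
            nxt = current[:i] + b[i] + current[i + 1:]
            if CODE[nxt] == "*":
                break  # discard paths that pass through a stop codon
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:
            paths.append((syn, non))
    if not paths:
        return 0.0, 0.0
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple substitutions."""
    return -0.75 * log(1 - 4 * p / 3) if p < 0.75 else float("nan")


def ng86(seq1, seq2):
    """Return (dS, dN) for two codon-aligned sequences of equal length."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, len(seq1) - 2, 3):
        a, b = seq1[i:i + 3], seq2[i:i + 3]
        if a not in CODE or b not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        S += (syn_sites(a) + syn_sites(b)) / 2
        sd, nd = codon_diffs(a, b)
        Sd += sd
        Nd += nd
        n_codons += 1
    N = 3 * n_codons - S
    p_s = Sd / S if S else float("nan")
    p_n = Nd / N if N else float("nan")
    return jukes_cantor(p_s), jukes_cantor(p_n)


# Toy input: four valine codons with one synonymous third-position change.
print(ng86("GTTGTTGTTGTT", "GTAGTTGTTGTT"))
```

The sketch also shows why codon alignment matters: the loop reads the two sequences in frame, three bases at a time, so a single misplaced gap would shift every later codon.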

21
SNAP - Ds/Dn Calculation tool
  • Using the alignment we obtained previously.
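  • Averages of all pairwise comparisons:
  • ds = 0.3510, dn = 0.3535, ds/dn = 0.8241
  • A ds/dn ratio below 1 means that dn exceeds ds
    (dn/ds above 1), the pattern expected under
    positive selection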
  • The positive selection in sperm protein genes
    from abalone (genus Haliotis) is assumed to be
    the result of species-specific interaction with
    egg surface proteins during fertilization
    (Swanson and Vacquier 1998).

22
Distmat
http://sbcr.bii.a-star.edu.sg/cgi-bin/emboss/menu/distmat
Distmat calculates the evolutionary distances
between every pair of sequences in a multiple
alignment. The distances are expressed as the
number of substitutions per 100 nucleotides or
the number of replacements per 100 amino acids
23
Distmat
  • Feed the DNA alignment of the 18-kDa protein into
    distmat.
  • Calculate separately the distances between the
    sequences for codon positions 1 and 2, and for
    codon position 3.
  • Are the results in agreement with those from the
    dn/ds analysis?
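
The comparison by codon position can also be sketched by hand. The function below counts uncorrected differences per 100 compared sites at the chosen codon positions, skipping gapped columns. It is an illustration only, not distmat's implementation; distmat can additionally apply multiple-substitution corrections, and the name distance_per_100 is hypothetical.

```python
def distance_per_100(seq1, seq2, codon_positions):
    """Uncorrected differences per 100 compared sites at the given codon positions."""
    compared = mismatches = 0
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if (i % 3) + 1 not in codon_positions:
            continue  # not one of the requested codon positions
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared


# First four codons of the H. assimilis and H. corrugata rows of the DNA alignment.
a = "ATGAGGTCTTTG"
b = "ATGAGGTTTTTG"
print(distance_per_100(a, b, {1, 2}))  # positions 1 and 2 (mostly non-synonymous)
print(distance_per_100(a, b, {3}))     # position 3 (mostly synonymous)
```

Changes at codon positions 1 and 2 usually alter the amino acid, while many third-position changes do not, so the two distances give a rough counterpart to dn and ds.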

24
Distmat
25
http://dialign.gobics.de/anchor/submission.php
User manual: http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file
dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
26
Results
  • DIALIGN composes alignments from fragments
  • Lower-case letters denote residues not belonging
    to any of these selected fragments. They are not
    considered to be aligned.

27
Results
  • Numbers below the alignment roughly reflect the
    degree of local similarity among the sequences

28
Anchored alignment
  • Now, let us assume that the user has some expert
    knowledge concerning a certain domain that is
    present in all the input sequences
  • The domains marked in red in the three sequences
    are thought to be homologous to one another

>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
29
  • Therefore, the user wants to define this domain
    as an anchor and have the rest of the sequences
    aligned automatically.
  • Anchors are specified as a set of anchor points.
    Each anchor point is an equal-length pair of
    segments taken from two of the input sequences.

30
  • first sequence involved
  • second sequence involved
  • start of anchor in first sequence
  • start of anchor in second sequence
  • length of anchor
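
The five fields listed above can be modelled as a small record, which makes it easy to check an anchor point before submitting it. This is a sketch under stated assumptions: sequence numbers and start positions are taken to be 1-based, the class and function names are hypothetical, and the example anchor (the shared segment KAAY in seq1 and seq3) is chosen only for illustration and is not necessarily the domain highlighted in the slides. The exact file syntax the server expects should be taken from the user manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    seq1: int    # first sequence involved (1-based, assumed)
    seq2: int    # second sequence involved (1-based, assumed)
    start1: int  # start of anchor in first sequence (1-based, assumed)
    start2: int  # start of anchor in second sequence (1-based, assumed)
    length: int  # length of anchor


def anchored_segments(anchor, sequences):
    """Return the two equal-length segments that an anchor point ties together."""
    first = sequences[anchor.seq1 - 1]
    second = sequences[anchor.seq2 - 1]
    if anchor.length < 1 or min(anchor.start1, anchor.start2) < 1:
        raise ValueError("positions and length must be positive")
    end1 = anchor.start1 - 1 + anchor.length
    end2 = anchor.start2 - 1 + anchor.length
    if end1 > len(first) or end2 > len(second):
        raise ValueError("anchor runs past the end of a sequence")
    return first[anchor.start1 - 1:end1], second[anchor.start2 - 1:end2]


sequences = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
# Illustrative anchor: 4 residues starting at position 17 of seq1 and 16 of seq3.
print(anchored_segments(Anchor(1, 3, 17, 16, 4), sequences))
```

Extracting both segments this way catches off-by-one mistakes in the start positions before the alignment is run.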

31
Results
  • The specified domain is aligned and the remainder
    of the sequences is aligned automatically
    respecting the constraints given by the anchor
    points