Building phylogenetic trees - PowerPoint PPT Presentation

1
Building phylogenetic trees
  • Topics in Computational Biology
  • 25.2.2004
  • Pia Laine

2
Contents
  • Phylogeny
  • Phylogenetic trees
  • How to make a phylogenetic tree from pairwise
    distances
  • UPGMA method ( an example)
  • Neighbor-Joining method ( an example)
  • Comparison of methods
  • Conclusion

3
Phylogeny
  • Phylogeny is the evolutionary history of related
    species/genes
  • Phylogenetic tree: a diagram showing the
    evolutionary lineages of species/genes
  • The history of genes and the history of species
    may be very different
  • Genes that resemble each other can be either
    homologous or analogous
  • Homologous sequences can be divided into two
    types
  • Orthologous sequences diverged by speciation
    from a common ancestor
  • Paralogous sequences evolved by gene duplication
    within a species
  • Analogous sequences may appear and function very
    similarly, but they do not have a common ancestor
  • WHEN WE WANT TO EXPLORE EVOLUTIONARY
    RELATIONSHIPS, WE NEED TO HANDLE ORTHOLOGOUS
    SEQUENCES

4
Phylogenetic trees
  • WHY construct a phylogenetic tree?
  • to understand lineage of various species
  • to understand how various functions evolved
  • to inform multiple alignments
  • Trees can be rooted (a common ancestor is known)
    or unrooted
  • Leaves are the terminal nodes that correspond to
    the observed sequences of genes or species (A, B,
    C, D)
  • Internal nodes are hypothetical ancestral nodes
  • All trees will be assumed to be binary, meaning
    that an edge that branches splits into two
    daughter edges
  • Each edge has a certain amount of evolutionary
    divergence associated with it, defined by some
    measure of distance between sequences or derived
    from a model of residue substitution over the
    course of evolution
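
A minimal sketch of how the rooted binary tree described above could be held in code. The `Node` class and the `to_newick` and `leaves` helpers are hypothetical names introduced for illustration, and the branch lengths in the usage lines are made-up values, not data from the presentation. Newick is a common text notation for trees.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted binary tree (hypothetical helper type)."""
    name: Optional[str] = None      # leaves carry the name of an observed sequence
    length: float = 0.0             # length of the edge leading to the parent
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def to_newick(node: Node) -> str:
    """Serialize the subtree below `node` in Newick notation (without the final ';')."""
    if node.is_leaf():
        label = node.name or ""
    else:
        label = "(" + ",".join(to_newick(child) for child in node.children) + ")"
    return f"{label}:{node.length:g}"


def leaves(node: Node) -> List[str]:
    """Names of the terminal nodes, left to right."""
    if node.is_leaf():
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


# Made-up example: two internal (hypothetical ancestor) nodes below the root.
tree = Node(children=[
    Node(length=1.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node(length=0.5, children=[Node("C", 1.5), Node("D", 1.5)]),
])
print(to_newick(tree) + ";")  # ((A:1,B:1):1,(C:1.5,D:1.5):0.5):0;
print(leaves(tree))           # ['A', 'B', 'C', 'D']
```

Every internal node here has exactly two children, which is the binary assumption stated above.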

5
Phylogenetic trees
  • Different ways to represent a phylogenetic tree
    (illustrated by Treeview)

6
Different algorithms used to infer phylogeny from
sequence data
  • Distance methods
  • Parsimony
  • Likelihood
  • Probabilistic methods
  • Phylogenetic invariants

7
Route from the molecular sequences to the
phylogenetic tree
  • Distance methods
  • Select a set of related (orthologous) nucleotide
    or amino acid sequences
  • Perform multiple sequence alignment (Clustal
    series widely used)
  • Calculate the pairwise distances between the
    sequences using the chosen evolutionary model of
    substitution (distances between sequences
    describe their evolution: the smaller the
    distance, the more closely they are related)
  • Select the most suitable algorithm to infer
    phylogeny
  • View the tree with a certain program (Treeview,
    NJPlot,..)
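
The pairwise-distance step of this route can be sketched as follows. This is only an illustration under stated assumptions: it uses the uncorrected proportion of differing sites (the p-distance) rather than any particular substitution model, it skips alignment columns that contain a gap, and the three aligned sequences are invented. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Map every unordered pair of sequence names to its p-distance."""
    return {
        frozenset((i, j)): p_distance(alignment[i], alignment[j])
        for i, j in combinations(alignment, 2)
    }


# Invented toy alignment, already aligned to equal length.
alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTTC",
    "s3": "ACG-ACGAAC",
}
for pair, value in distance_matrix(alignment).items():
    print(sorted(pair), round(value, 3))
```

The resulting matrix is the input that the distance methods on the following slides work from.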

8
Making a tree from pairwise distances
  • Distances $d_{ij}$ between each pair of sequences
    i and j are calculated in the given dataset
  • Different ways of defining distances
  • For nucleotide sequences
  • Jukes-Cantor, Kimura 2-parameter (K2P), HKY
    (Hasegawa-Kishino-Yano), F84, Tamura-Nei, general
    time-reversible model, general 12-parameter model
  • For amino acid sequences
  • PAM-matrices, BLOSUM-matrices
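
As an illustration of the two simplest nucleotide models in the list, the sketch below applies the standard Jukes-Cantor and Kimura 2-parameter corrections to observed proportions of differences. The function names and the input values are assumptions made for this example; `p` is the observed proportion of differing sites, `P` the proportion of transitions and `Q` the proportion of transversions.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """Kimura 2-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return -0.5 * math.log(a * math.sqrt(b))


# Invented input: 10% of sites differ, 6% transitions and 4% transversions.
print(round(jukes_cantor(0.10), 4))
print(round(kimura_2p(0.06, 0.04), 4))
```

Both corrections give a distance larger than the raw proportion of differences, because they account for multiple substitutions at the same site.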

9
Distance matrix methods
  • UPGMA
  • Algorithm introduced by Sokal and Michener 1958
  • Neighbor-Joining
  • Algorithm introduced by Saitou and Nei 1987
  • Modified by Studier and Keppler 1988

10
Clustering method UPGMA
  • UPGMA: Unweighted Pair Group Method using
    Arithmetic averages
  • Simple method
  • It works by clustering the sequences, at each
    stage joining two clusters and creating a new
    node on the tree
  • Method assumes an equal rate of evolutionary
    change along all branches → molecular clock
    assumption

11
UPGMA
[Figure: rooted UPGMA tree with leaves A, B, C and D]
  • UPGMA produces a rooted tree
  • Branch lengths satisfy a molecular clock
  • → The divergence of sequences is assumed to occur
    at the same constant rate at all points in the
    tree
  • Trees that are clocklike are rooted, and the total
    branch length from the root to any leaf is equal
  • Such trees are often referred to as ultrametric
  • Distances are ultrametric if, for any triplet of
    sequences i, j, k, either all three distances are
    equal, $d_{ij} = d_{ik} = d_{jk}$, or two of them
    are equal and one is smaller,
    $d_{jk} < d_{ij} = d_{ik}$
  • → UPGMA is guaranteed to build the correct tree
    if distances are ultrametric
  • Method can be used for reconstructing phylogenies
    only if evolutionary rates are assumed to be the
    same in all lineages → criticism in the phylogeny
    literature
  • Suitable for closely related species
  • Running time $O(n^2)$
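
The ultrametric condition on triplets can be checked mechanically. The sketch below is one way to do it, assuming distances are stored in a dictionary keyed by unordered pairs; `is_ultrametric` and `pairs` are hypothetical helper names and both small matrices are illustrative values.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """Three-point condition: in every triplet the two largest distances are equal."""
    for i, j, k in combinations(names, 3):
        _, middle, largest = sorted([
            dist[frozenset((i, j))],
            dist[frozenset((i, k))],
            dist[frozenset((j, k))],
        ])
        if largest - middle > tol:
            return False
    return True


def pairs(table):
    """Turn {("A", "B"): 2.0, ...} into a lookup keyed by unordered pair."""
    return {frozenset(key): value for key, value in table.items()}


# Clocklike case: A and B are equally far from C.
clocklike = pairs({("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0})
# Non-clocklike case: all three distances differ.
skewed = pairs({("A", "B"): 8.0, ("A", "C"): 7.0, ("B", "C"): 9.0})

print(is_ultrametric("ABC", clocklike))  # True
print(is_ultrametric("ABC", skewed))     # False
```

Sorting the three distances covers both cases of the definition at once: if the two largest are equal, then either all three are equal or the third is smaller.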

12
Algorithm UPGMA
  • Initialisation
  • Assign each sequence i in dataset to its own
    cluster
  • Define one leaf of T for each sequence, and
    place at height zero
  • Iteration
  • Find the two clusters i and j for which $d_{ij}$
    is the smallest (pick randomly if several
    distances are equal)
  • Define a new cluster ij by
    $C_{ij} = C_i \cup C_j$. Cluster ij has
    $n_{ij} = n_i + n_j$ members (initially
    $n_i = 1$)
  • Connect i and j on the tree to a new node v
  • The new node v is placed at height $d_{ij}/2$;
    the branch lengths from v to i and j are
    $d_{ij}/2$ minus the heights of i and j

13
Algorithm UPGMA (cont.)
  • Iteration (cont.)
  • Compute the distances between the new cluster
    and each remaining cluster k by using
    $d_{(ij),k} = \frac{n_i d_{ik} + n_j d_{jk}}{n_i + n_j}$
  • Add ij to the current clusters and remove i and
    j
  • Termination
  • When only two clusters i and j remain, place the
    root at height $d_{ij}/2$
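
The initialisation, iteration and termination steps can be sketched as below. This is an illustration, not the presenter's code. Two assumptions are worth stating loudly: ties are broken by taking the first closest pair instead of a random one, and the distance matrix is not taken from the slides (it did not survive in the transcript) but was chosen so that the result agrees with the branch lengths shown in the worked example (3.5, 0.75, 4.25, 1.92 and 6.17).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by UPGMA.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (newick, root_height, node_heights).
    """
    # cluster id -> (Newick text, number of members, height of the node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        # Closest pair; ties go to the first pair found (assumption, not random).
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (text_i, n_i, h_i), (text_j, n_j, h_j) = clusters.pop(i), clusters.pop(j)
        h = d[frozenset((i, j))] / 2      # height of the new node
        heights.append(h)
        # Size-weighted average distance to every remaining cluster.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = (
            f"({text_i}:{h - h_i:.2f},{text_j}:{h - h_j:.2f})", n_i + n_j, h)
        next_id += 1
    (newick, _, root_height), = clusters.values()
    return newick + ";", root_height, heights


# Assumed matrix, consistent with the branch lengths of the worked example.
table = {("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12,
         ("B", "C"): 9, ("B", "D"): 14, ("C", "D"): 11}
dist = {frozenset(k): v for k, v in table.items()}
newick, root_height, heights = upgma(["A", "B", "C", "D"], dist)
print(newick)  # (D:6.17,(B:4.25,(A:3.50,C:3.50):0.75):1.92);
```

The last cluster merge plays the role of the termination step: the node it creates is the root.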

14
An example UPGMA (1)
  • Distance matrix (arbitrary)
  • for four items (sequences)
  • A, B, C and D
  • Actually the distances are not ultrametric,
    because there are triplets i, j, k for which
    neither $d_{ij} = d_{ik} = d_{jk}$ nor
    $d_{jk} < d_{ij} = d_{ik}$ holds

Step 1. Find the smallest distance, $d_{ij}$, between
two clusters → A and C, where $d_{AC} = 7$
15
An example UPGMA (2)
  • Step 2. Define a new cluster ij, which has
    $n_{ij} = n_i + n_j$ members (initially
    $n_i = 1$)
  • New cluster → A and C
  • $n_{AC} = n_A + n_C = 2$
  • Step 3. Connect A and C on the tree to a new
    node v1
  • Step 4. Compute the branch lengths from the new
    node v1 to A and C: $d_{AC}/2 = 7/2 = 3.5$ each

[Figure: node v1 joined to leaves A and C, each branch of length 3.5]
16
An example UPGMA (3)
  • Step 5. Compute the distances between the new
    cluster AC and the remaining clusters (B and D)
  • Step 6. Delete the columns and rows of the
    distance matrix that correspond to clusters A and
    C, and add a column and a row for cluster AC

→ New distance matrix
17
An example UPGMA (4)
  • 2nd iteration
  • Step 1. Find the two clusters i and j for which
    $d_{ij}$ is the smallest (pick randomly if several
    distances are equal)
  • → AC and B
  • Step 2. Define a new cluster ij, which has
    $n_{ij} = n_i + n_j$ members (initially
    $n_i = 1$). New cluster → AC and B
  • $n_{ACB} = n_{AC} + n_B = 2 + 1 = 3$
  • Step 3. Connect AC and B on the tree to a new
    node v2
  • Step 4. Compute the branch lengths from the new
    node v2 to AC and B
  • → v2 lies at height 4.25, so the branch to B is
    4.25 and the branch to v1 is 4.25 - 3.5 = 0.75

[Figure: v1 joins A and C (branches 3.5 each); v2 joins v1 and B (branch to B 4.25)]
18
An example UPGMA (5)
  • Step 5. Compute the distances between the new
    cluster and the remaining cluster (D)
  • Step 6. Delete the columns and rows of the
    distance matrix that correspond to clusters AC
    and B, and add a column and a row for cluster ACB

→ New distance matrix
19
An example UPGMA (6)
  • Termination
  • Only two clusters (ACB and D) remain
  • Place the root at height $d_{(ACB),D}/2 = 6.17$

Original distance matrix and final phylogenetic
tree (including the branch lengths)
[Figure: rooted UPGMA tree. Branch lengths: A 3.5,
C 3.5, v1 to v2 0.75, B 4.25, v2 to root 1.92,
D 6.17]
20
Neighbor-Joining (N-J)
[Figure: unrooted N-J tree with leaves A, B, C and D]
  • Another algorithm that works by clustering the
    sequences
  • Does not assume molecular clock
  • N-J trees are unrooted
  • N-J assumes additivity
  • Def. Edge lengths are said to be additive if the
    distance between any pair of leaves is the sum of
    lengths of the edges on the path connecting them
  • Method uses an approximate algorithm, where the
    tree is built by finding a pair of neighboring
    leaves i and j that minimize the length of the
    tree. Finally neighboring leaves are joined.
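  • Running time $O(n^3)$ for the standard algorithm
    (n joining steps, each scanning the $O(n^2)$
    entries of the distance matrix)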
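
Additivity is defined above in terms of edge lengths, but it can also be tested directly on a distance matrix with the four-point condition: for every four leaves, the two largest of the three pairwise sums must be equal. The sketch below assumes distances are keyed by unordered pairs; `is_additive` is a hypothetical name, and the matrix is an assumed one chosen to be consistent with the edge lengths of the N-J example (the original matrix is not in the transcript).

```python
from itertools import combinations


def is_additive(names, dist, tol=1e-9):
    """Four-point condition on a distance lookup keyed by frozenset pairs."""
    for i, j, k, l in combinations(names, 4):
        sums = sorted([
            dist[frozenset((i, j))] + dist[frozenset((k, l))],
            dist[frozenset((i, k))] + dist[frozenset((j, l))],
            dist[frozenset((i, l))] + dist[frozenset((j, k))],
        ])
        # The two largest sums must coincide for a tree metric.
        if sums[2] - sums[1] > tol:
            return False
    return True


# Assumed matrix, consistent with the edge lengths of the N-J example.
table = {("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12,
         ("B", "C"): 9, ("B", "D"): 14, ("C", "D"): 11}
dist = {frozenset(k): v for k, v in table.items()}
print(is_additive("ABCD", dist))  # True

# Changing a single entry breaks additivity.
dist[frozenset(("A", "D"))] = 13
print(is_additive("ABCD", dist))  # False
```

For the assumed matrix the three sums are 19, 21 and 21, so the condition holds even though the same distances are not ultrametric.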

21
Algorithm Neighbor-Joining
  • Initialisation
  • Define T to be the set of leaf nodes, one for
    each given sequence
  • Iteration
  • Compute for each sequence i
    $u_i = \sum_{k \neq i} \frac{d_{ik}}{n - 2}$,
    where n is the number of sequences in the
    distance matrix
  • Pick a pair i and j for which
    $d_{ij} - u_i - u_j$ is the smallest (pick
    randomly if several are equal)
  • Join items i and j with a new node v
  • Compute the branch lengths from the new node v to
    items i and j:
    $d_{iv} = \frac{1}{2}(d_{ij} + u_i - u_j)$,
    $d_{jv} = \frac{1}{2}(d_{ij} + u_j - u_i)$
  • Compute the distances between the new node v and
    each remaining item k:
    $d_{vk} = \frac{1}{2}(d_{ik} + d_{jk} - d_{ij})$
  • Remove i and j from the distance matrix and
    replace them by the new node v
  • Termination
  • When only two items i and j remain, add the
    remaining edge between i and j, with length
    $d_{ij}$
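
A compact sketch of the neighbor-joining loop, written in the common formulation that subtracts the row averages $u_i$ and $u_j$ from $d_{ij}$. It is an illustration under assumptions: ties are broken by taking the first minimal pair rather than a random one, new nodes are named v1, v2, ... (so leaves must not use those names), and the input matrix is an assumed one chosen to agree with the edge lengths of the worked example, since the original matrix is not in the transcript.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the edges (node_a, node_b, length) of the unrooted N-J tree.

    `dist` maps frozenset({a, b}) to the distance between leaves a and b.
    """
    d = {frozenset(k): float(v) for k, v in dist.items()}
    nodes = list(names)
    edges = []
    count = 0
    while len(nodes) > 2:
        n = len(nodes)
        # u_i: sum of distances from i to every other item, divided by n - 2.
        u = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) / (n - 2)
             for i in nodes}
        # Pair minimizing d_ij - u_i - u_j; first minimal pair wins ties.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[frozenset(p)] - u[p[0]] - u[p[1]])
        count += 1
        v = f"v{count}"
        d_ij = d[frozenset((i, j))]
        edges.append((i, v, (d_ij + u[i] - u[j]) / 2))
        edges.append((j, v, (d_ij + u[j] - u[i]) / 2))
        rest = [k for k in nodes if k not in (i, j)]
        # Distance from the new node to each remaining item.
        for k in rest:
            d[frozenset((v, k))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2
        nodes = [v] + rest
    i, j = nodes
    edges.append((i, j, d[frozenset((i, j))]))
    return edges


# Assumed matrix, consistent with the edge lengths of the worked example.
table = {("A", "B"): 8, ("A", "C"): 7, ("A", "D"): 12,
         ("B", "C"): 9, ("B", "D"): 14, ("C", "D"): 11}
dist = {frozenset(k): v for k, v in table.items()}
for a, b, length in neighbor_joining(["A", "B", "C", "D"], dist):
    print(a, b, length)
```

With four sequences several pairs tie for the smallest value, which is why the tie-breaking rule decides the order of joins; the resulting unrooted tree is the same either way.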

22
An example N-J (1)
Step 1. Compute $u_i$ for each row in the distance
matrix
Step 2. Compute $d_{ij} - u_i - u_j$ (the
lower-diagonal matrix) and choose the smallest (most
negative) value
23
An example N-J (2)
  • Step 3. Join A and B together with a new node
    v1. Compute the edge lengths from A to node v1
    and from B to node v1
  • Step 4. Compute distances between the new node
    v1 and remaining items (C and D)

[Figure: node v1 joined to A (edge length 3) and B (edge length 5)]
24
An example N-J (3)
New reduced distance matrix
  • Step 5. Delete A and B from the distance matrix
    and replace them by the new item AB (node v1)
  • Step 6. Continue from step 1, because more than
    two items remain
  • Step 1. Compute $u_i$ for each row in the
    distance matrix
  • Step 2. Compute $d_{ij} - u_i - u_j$ (the
    lower-diagonal matrix) and choose the smallest

25
An example N-J (4)
  • Step 3. Join v1 and C together with a new node
    v2. Compute the edge lengths from v1 to node v2
    and from C to node v2
  • Step 4. Compute distances between the new node
    v2 and the remaining item (D)

[Figure: A and B joined at v1, v1 and C joined at v2. Edge lengths: A 3, B 5, v1 to v2 1, C 3]
26
An example N-J (5)
  • Step 5. Delete AB and C from the distance matrix
    and replace them by ABC
  • Step 6. Only two nodes remain → connect them

Original distance matrix and final phylogenetic
tree (including the edge lengths)
[Figure: final unrooted N-J tree. Edge lengths: A 3,
B 5, C 3, D 8, internal edge v1 to v2 1]
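
One way to see what additivity means for the final tree is to sum the edge lengths along the path between two leaves. The sketch below does that with a depth-first search. The edge list is a reading of the figure for this example: assigning length 1 to the internal edge between v1 and v2 and length 3 to the edge leading to C is an assumption, and `leaf_distance` is a hypothetical helper name.

```python
from collections import defaultdict


def leaf_distance(edges, a, b):
    """Sum of edge lengths on the unique path from node a to node b."""
    graph = defaultdict(list)
    for x, y, w in edges:
        graph[x].append((y, w))
        graph[y].append((x, w))
    # Depth-first search; a tree has exactly one path between two nodes.
    stack = [(a, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == b:
            return total
        for neighbor, w in graph[node]:
            if neighbor != parent:
                stack.append((neighbor, node, total + w))
    raise ValueError(f"no path between {a} and {b}")


# Edge lengths as read from the final N-J tree (assignment partly assumed).
edges = [("A", "v1", 3), ("B", "v1", 5), ("v1", "v2", 1),
         ("C", "v2", 3), ("D", "v2", 8)]
print(leaf_distance(edges, "A", "B"))  # 8.0
print(leaf_distance(edges, "A", "C"))  # 7.0
```

The path from A to C sums to 7, the same value quoted for the A and C pair in the UPGMA example.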
27
Comparison
  • Neighbor-joining
  • Unrooted tree, where the direction of evolution
    is unknown
  • Suitable for datasets with largely varying rates
    of evolution
  • Suitable for large datasets
  • UPGMA
  • The total branch length from the root up to any
    leaf is equal
  • Produces a rooted tree, where the root is the
    hypothesized ancestor of the sequences in the
    tree
  • Suitable for closely related sequences
  • Can be used to infer phylogenies if one can
    assume that evolutionary rates are the same in
    all lineages

[Figure: the unrooted N-J tree and the rooted UPGMA tree from the examples, shown side by side with their branch lengths]
28
Conclusion
  • UPGMA method constructs a rooted phylogenetic
    tree correctly if there is a molecular clock with
    a constant rate of mutation
  • UPGMA method is rarely used, because the molecular
    clock assumption is not generally true: selection
    pressures vary across time periods, organisms,
    genes within organisms, and regions within a gene
  • N-J method produces an unrooted tree without
    the molecular clock hypothesis
  • N-J method is one of the most popular methods and
    is widely used by molecular evolutionists
  • Distance methods are strongly dependent on the
    model of evolution used
  • Sequence information is reduced when transforming
    sequence data into distances
  • Distance methods are computationally fast

29
Reference
  • Durbin, R., Eddy, S., Krogh, A., Mitchison, G.
    2003. Biological Sequence Analysis: Probabilistic
    Models of Proteins and Nucleic Acids. Cambridge
    University Press.
  • Li, W. 1997. Molecular Evolution. Sinauer
    Associates, Sunderland, MA. p. 108
  • Felsenstein, J. 2003. Inferring Phylogenies.
    Sinauer Associates, Sunderland, MA. p. 147-170

30
Examples of phylogeny programs
  • Multiple sequence alignment
  • Clustal series (W, V) (free,
    http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html)
  • Phylogeny packages
  • PAUP (http://paup.csit.fsu.edu/)
  • Phylip (free, http://evolution.gs.washington.edu)
  • MEGA (free, http://www.megasoftware.net)
  • Viewing/plotting phylogenetic trees
  • Treeview (free,
    http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
  • NJPlot (free,
    http://pbil.univ-lyon1.fr/software/njplot.html)

31
Further reading
  • N-J: Saitou, N. and M. Nei. 1987. The
    neighbor-joining method: a new method for
    reconstructing phylogenetic trees. Mol Biol Evol
    4(4): 406-25.
  • N-J: Studier, J. A. and K. J. Keppler. 1988.
    A note on the neighbor-joining algorithm of
    Saitou and Nei. Mol Biol Evol 5(6): 729-31.
  • UPGMA: Michener, C. D., and R. R. Sokal. 1957. A
    quantitative approach to a problem in
    classification. Evolution 11: 130-162.
  • Clustal: Thompson, J. D., T. J. Gibson, et al.
    1997. The CLUSTAL_X windows interface: flexible
    strategies for multiple sequence alignment aided
    by quality analysis tools. Nucleic Acids Res
    25(24): 4876-82.