Phylogenetic Trees - PowerPoint PPT Presentation

Phylogenetic Trees

Provided by: dav1167
1
Phylogenetic Trees
  • What?
  • How?
  • Why?
  • Methods

2
What is a phylogenetic tree?
  • A phylogenetic tree is a data structure that
    stores information regarding the relationship of
    several sequences
  • Given an appropriate scoring function, a given
    sequence can always be found to be more related
    to one sequence than to another, non-identical
    sequence. In other words, these relationships
    can be approximated by a binary tree.

3
What relationships are stored in the tree
structure?
  • The relationship represented in a phylogenetic
    tree is a measure of homology.
  • Homology is defined biologically (and the precise
    definition is itself a matter of some debate),
    but computationally it is usually treated as an
    identity or similarity score d_{i,j} between
    two entities (taxa, sequences, etc.).
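One simple way to obtain such a score is the fraction of identical columns in a pairwise alignment. The sketch below is only an illustration under stated assumptions: the function names, the "-" gap character, and the choice to count a residue-versus-gap column as a mismatch are not from the slides.

```python
def identity_score(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of aligned columns in which both sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap (assumption of this sketch).
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A residue paired with a gap counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b)
    return matches / len(columns)


def score_matrix(aligned: dict) -> dict:
    """Pairwise scores d[(i, j)] for every pair of named, aligned sequences."""
    return {
        (i, j): identity_score(aligned[i], aligned[j])
        for i in aligned
        for j in aligned
    }


d = score_matrix({"s1": "ACGT", "s2": "ACGA", "s3": "AC-T"})
```

Here d[("s1", "s2")] plays the role of d_{i,j} for sequences s1 and s2. A real analysis would more likely use a substitution-matrix-based similarity than raw identity.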

4
How is this homology represented in a tree?
  • As seen before, sequence identity/similarity is
    calculated by aligning two or more sequences in a
    multiple sequence alignment.
  • Thus, a phylogenetic tree is simply an
    arrangement of the data inherent within a
    multiple sequence alignment into a tree.

5
Why use phylogenetic trees?
  • This arrangement is useful to biologists because
    it organizes the sequences into their projected
    evolutionary history.
  • Due to the construction method of a phylogenetic
    tree, each node represents a common ancestor.
  • The distance from the leaves to this common
    ancestor is a measure of the evolutionary
    distance between the leaves.
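The idea that internal nodes are common ancestors and that branch lengths encode evolutionary distance can be sketched with a small tree structure. This is a minimal sketch, not a standard library API: the Node class, its field names, and the example branch lengths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    branch_length: float = 0.0          # distance from this node to its parent
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> List[Node]:
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def leaf_distance(a: Node, b: Node) -> float:
    """Sum of branch lengths from each leaf up to their most recent common ancestor."""
    ancestors_b = {id(n) for n in path_to_root(b)}
    # The first node on a's path that also lies on b's path is the common ancestor.
    mrca = next(n for n in path_to_root(a) if id(n) in ancestors_b)

    def climb(node: Node) -> float:
        total = 0.0
        while node is not mrca:
            total += node.branch_length
            node = node.parent
        return total

    return climb(a) + climb(b)


# Hypothetical example: sequences 1 and 2 share a recent ancestor; 3 is more distant.
root = Node("root")
anc12 = root.add(Node("ancestor of 1 and 2", 1.0))
s1 = anc12.add(Node("sequence 1", 2.0))
s2 = anc12.add(Node("sequence 2", 2.0))
s3 = root.add(Node("sequence 3", 4.0))
```

With these made-up branch lengths, sequences 1 and 2 are separated by less total branch length than sequences 1 and 3, which is how the tree expresses "more closely related".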

6
Example Phylogenetic trees
[Figure: four leaves, each marked "?"]
Problem: given several sequences, determine how
to arrange them according to evolutionary
distance.
7
Example Phylogenetic trees
[Figure: two example trees over sequences 1-4, with
branch labels x1, y1, z1 in the first tree and
x2, y2, z2 in the second]
The first arrangement differs from the second
only in that sequence 4 is more closely related
to the others (z1 < z2). The evolutionary
distances between sequences 1, 2, and 3 remain the
same, as evidenced by the horizontal distance from
each leaf to the respective node connecting them
(x1 = x2 and y1 = y2).
8
Why use phylogenetic trees?
  • Thus, the tree structure provides a context for
    the evolutionary information inherent in a
    sequence alignment.
  • This context represents the divergent
    relationship of the sequences.
  • Furthermore, this contextual information is
    weighted according to evolutionary distance.

9
Why use phylogenetic trees? Summary
  • The tree structure shows that two sequences are
    related, how they are related in the context of
    other sequences, and how distantly they are
    related.

10
How is a phylogenetic tree constructed?
  • While the weighted score d_{i,j} is known from the
    multiple sequence alignment, the rate of
    evolutionary change is not known.
  • The rate of change depends on mutation rates,
    which are not constant.

11
How is a phylogenetic tree constructed?
  • Mutations can occur in several ways:
  • forward mutations in one sequence from the
    original sequence
  • backward mutations in one sequence towards the
    original sequence
  • parallel mutations in two or more sequences
  • insertions in one or more sequences
  • deletions in one or more sequences

12
How is a phylogenetic tree constructed?
  • To reiterate from the sequence alignment
    discussions: not all mutations are alike.
  • Mutation rates vary between organisms.
  • Mutation rates vary with amino acid type.
  • Mutation rates vary with the environment the
    amino acids are in (mutations in the core of the
    protein are less likely than those at the
    surface).

13
How is a phylogenetic tree constructed?
  • Mutation rates vary with mutation type
    (substitutions more likely than
    insertions/deletions).
  • Mutations that conserve properties of the
    original amino acid (such as charge, size, etc.)
    are more likely than those that modify or invert
    those properties (such as changing a
    positively-charged amino acid to a neutral or
    negatively-charged one).

14
Methods to calculate rate of evolutionary change
  • Poisson process model
  • Define Unit Evolutionary Time (average time to
    produce one substitution per 100 amino acids):
    T_u = 1/(100\lambda); solve for \lambda.
  • Find probability of a substitution and
    approximate rate of change from theoretical
    considerations
  • Not used because this model assumes that rate is
    independent of residue position and amino acid
    type
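The Poisson model can be illustrated numerically. The sketch below assumes the relation on this slide reads T_u = 1/(100 λ), with λ the per-site substitution rate (the symbol is lost in the transcript, so this reading is an assumption), and uses the standard Poisson result that the probability of at least one event in time t is 1 - exp(-λt). Function names are hypothetical.

```python
import math


def rate_from_unit_time(t_u: float) -> float:
    """Solve T_u = 1 / (100 * lam) for the per-site rate lam."""
    return 1.0 / (100.0 * t_u)


def prob_at_least_one_substitution(lam: float, t: float) -> float:
    """Poisson process: P(one or more substitutions at a site within time t)."""
    return 1.0 - math.exp(-lam * t)


lam = rate_from_unit_time(1.0)               # measure time in units of T_u
p_one_unit = prob_at_least_one_substitution(lam, 1.0)
```

After one unit of evolutionary time, the per-site probability is just under 1 in 100, because a few sites are hit more than once. The model applies the same λ to every position and residue type, which is the limitation the slide points out.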

15
Methods to calculate rate of evolutionary change
  • Amino acid substitution matrix
  • Use an empirically determined matrix, obtained by
    comparing many protein sequences, to estimate the
    probability p_{i,j} that during one evolutionary
    time unit T_u, amino acid i will be substituted by
    residue j.
  • The resulting substitution matrix M = (p_{i,j}) is
    called the PAM1 matrix (one point accepted
    mutation per 100 residues).

16
Methods to calculate rate of evolutionary change
  • Since the matrix was calculated using proteins
    with small evolutionary distances (close
    homologs), the matrix is then scaled to
    approximate the probabilities for proteins with
    larger evolutionary distances (more remote
    homologs).
  • PAM250 is one commonly used matrix (corresponding
    to M^{250}, the 250th power of the PAM1 matrix)
  • (One problem with this is that zero to the 250-th
    power is still zero)
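Scaling a one-step matrix to larger distances amounts to a matrix power. The sketch below uses a made-up 3-state transition matrix in place of the real 20 x 20 PAM1 matrix; TOY_PAM1 and the helper names are hypothetical, and only the exponentiation step is meant to carry over.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """m raised to a non-negative integer power, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Toy stand-in for PAM1: each row sums to 1, with about 1% change per step.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
```

Each row of toy_pam250 is still a probability distribution, but the diagonal entries are much smaller than 0.99, reflecting the accumulated chance of change over 250 steps.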

17
Methods to calculate rate of evolutionary change
  • Using this model, the probability p that a
    substitution at a given site (at either position
    i in sequence X or position j in sequence Y) has
    occurred during t time units is
  • p = 1 - \sum_{i=1}^{20} p_{i,i}^{(2t)} \pi_i
  • where p_{i,i}^{(2t)} is the (i, i) entry of
    M^{2t} and \pi is a column vector
    (\pi_1, ..., \pi_{20})^T representing the amino
    acid composition frequency of a given polypeptide
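The formula on this slide can be evaluated directly once the transition matrix and composition vector are given. This sketch assumes the summand is the diagonal entry of the 2t-th power of the one-step matrix, weighted by the composition frequency of residue i (the transcript is garbled here, so that reading is an assumption). The 2-state matrix and all names are hypothetical stand-ins for the 20-state case.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """m raised to a non-negative integer power, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def substitution_probability(m, pi, t):
    """p = 1 - sum_i pi[i] * (m^(2t))[i][i] for an integer number of time units t."""
    m_2t = mat_pow(m, 2 * t)
    return 1.0 - sum(pi[i] * m_2t[i][i] for i in range(len(pi)))


# Toy 2-state example with equal composition frequencies.
TOY_M = [[0.99, 0.01],
         [0.01, 0.99]]
TOY_PI = [0.5, 0.5]

p_after_one_unit = substitution_probability(TOY_M, TOY_PI, 1)
```

The exponent is 2t rather than t because both sequences have been diverging from their common ancestor for t time units each.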

18
Methods to calculate rate of evolutionary change
  • Currently, NCBI's BLAST search tool offers PAM30
    and PAM70 as well as BLOSUM62, BLOSUM80, and
    BLOSUM45. The default choice (and thus the most
    commonly used substitution matrix) is BLOSUM62.
  • BLOSUM was created in a manner similar to that of
    the PAM matrix, but using a more diverse set of
    sequences.
  • Thus it is (arguably) more accurate when
    comparing more diverse proteins with less
    sequence homology.
  • Since the function of BLAST is to identify
    (possibly remote) homologs, BLOSUM is well suited
    to BLAST searches.

19
Nucleotide Sequences
  • Nucleotide sequences are handled differently due
    to their unique properties:
  • there is redundancy in the genetic code (multiple
    nucleotide codons specify a given amino acid)
  • nucleotide substitutions don't always translate
    to amino acid substitutions, for example at:
  • many third positions in the codon
  • introns and other non-coding regions
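The redundancy at the third codon position can be seen in a small excerpt of the standard genetic code. The sketch below is illustrative only: CODON_SUBSET covers just three amino acids, and the function names are hypothetical.

```python
# Excerpt of the standard genetic code (DNA codons), not the full 64-entry table.
CODON_SUBSET = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}


def is_synonymous(codon_a: str, codon_b: str, table=CODON_SUBSET) -> bool:
    """True if both codons encode the same amino acid."""
    return table[codon_a] == table[codon_b]


def third_position_changes(codon: str, table=CODON_SUBSET) -> dict:
    """Map each third-position mutant found in the table to whether it is synonymous."""
    results = {}
    for base in "ACGT":
        if base == codon[2]:
            continue
        mutant = codon[:2] + base
        if mutant in table:
            results[mutant] = is_synonymous(codon, mutant, table)
    return results


alanine_changes = third_position_changes("GCT")
aspartate_changes = third_position_changes("GAT")
```

Every third-position substitution in GCT still encodes alanine, while for GAT only the change to GAC is silent; the others switch aspartate to glutamate, which keeps the negative charge.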

20
Nucleotide Sequences
  • some substitutions alter the protein sequence at
    more than a single position:
  • creation of stop codon
  • frameshift mutation
  • altered promoter/operator/splice site, etc.:
  • may totally destroy protein functionality
  • may instead alter only amount of expression
  • other sites (such as poly-A binding) may allow
    normal protein to be expressed, but the mRNA is
    rapidly degraded, not exported from the
    nucleus, or sequestered in some cellular organelle

21
Nucleotide Sequences
  • Computer simulations have found that the genetic
    code is optimized such that many nucleotide
    substitutions change an amino acid to a related
    type, thus preserving properties such as
    hydrophobicity or charge.

22
Nucleotide Sequences
  • Other features are shared with amino acid
    sequences:
  • substitution rate is species-dependent
  • forward/backward/parallel mutations
  • insertions/deletions - although they must occur
    in multiples of 3 nucleotides to avoid a
    frameshift mutation

23
Nucleotide Sequences
  • The PAM matrices assume a discrete Markov chain
    where:
  • the PAM1 matrix is the transition matrix of the
    Markov chain
  • the parameters are estimated from close homologs
    using local sequence alignment
  • it is assumed that the two sequences being
    compared are generated using one application of
    the transition matrix, and thus that multiple
    substitutions did not occur

24
Nucleotide Sequences
  • it is also assumed that the evolutionary distance
    of more distantly related sequences can be
    modeled by n-fold iteration of the Markov chain,
    although this allows only evolutionary
    distances that are multiples of the evolutionary
    distance used for setting up the PAM matrix
  • again, the only multiple of zero is zero...

25
Nucleotide Sequences
  • A continuous-time Markov process can be used
    instead of a discrete Markov chain to avoid the
    assumptions of no multiple substitutions and
    discrete evolutionary time
  • The long formal definition is omitted here, but
    it is basically analogous to the Markov chain,
    except that the transition matrix is replaced by
    a matrix of transition probability functions
    that depend on a time parameter t
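As a concrete instance of transition probability functions that depend on t, the sketch below uses the Jukes-Cantor nucleotide model. That model is not named in the slides and is swapped in here only because it is the simplest example; alpha (the rate of change to each of the three other nucleotides) and the function name are assumptions of this sketch.

```python
import math


def jc69_transition_probability(same: bool, t: float, alpha: float) -> float:
    """Jukes-Cantor probability of ending at the same or one specific other nucleotide.

    alpha is the substitution rate towards each of the three other nucleotides.
    """
    decay = math.exp(-4.0 * alpha * t)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# Any non-negative real t is allowed, not only multiples of a fixed step.
p_same = jc69_transition_probability(True, 0.37, alpha=0.1)
p_other = jc69_transition_probability(False, 0.37, alpha=0.1)
```

At t = 0 nothing has changed, and as t grows every entry approaches 1/4. Because t is continuous, multiple substitutions at one site are accounted for and distances are not restricted to multiples of a unit step.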