Evolution and Scoring Rules - PowerPoint PPT Presentation

Provided by: sch19

1
Evolution and Scoring Rules
  • Example Score (linear gap penalty)
  • 5 x (# matches) + (-4) x (# mismatches)
    + (-7) x (total length of all gaps)
  • Example Score (affine gap penalty)
  • 5 x (# matches) + (-4) x (# mismatches)
    + (-5) x (# gap openings) + (-2) x (total
    length of all gaps)
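
The two example scores can be checked on a concrete alignment. The sketch below is only an illustration: the function name `score_alignment`, its parameter names, and the two aligned rows are assumptions made here, not part of the slides. It applies the weights from the bullets above, with '-' marking a gap column.

```python
def score_alignment(row1, row2, match=5, mismatch=-4, gap_open=0, gap_extend=-7):
    """Score a gapped pairwise alignment given as two equal-length rows.

    Every gap column costs gap_extend; the first column of each run of
    gaps additionally costs gap_open (0 reproduces the first rule).
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    score = 0
    prev_gap_row = None  # which row held the gap in the previous column
    for x, y in zip(row1, row2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot hold two gaps")
        if x == "-" or y == "-":
            gap_row = 1 if x == "-" else 2
            if gap_row != prev_gap_row:
                score += gap_open  # a new gap starts here
            score += gap_extend
            prev_gap_row = gap_row
        else:
            score += match if x == y else mismatch
            prev_gap_row = None
    return score


# Hypothetical alignment: 4 matches, 1 mismatch, one gap of length 2.
linear = score_alignment("ACGT--A", "ACCTGGA")
affine = score_alignment("ACGT--A", "ACCTGGA", gap_open=-5, gap_extend=-2)
print(linear, affine)  # 2 7
```

The same alignment scores 2 under the first rule and 7 under the second, because the second rule charges one long gap less than two independent gap positions.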

2
(No Transcript)
3
(No Transcript)
4
Scoring Matrices
5
Scoring Rules vs. Scoring Matrices
  • Nucleotide vs. Amino Acid Sequence
  • The choice of a scoring rule can strongly
    influence the outcome of sequence analysis
  • Scoring matrices implicitly represent a
    particular theory of evolution
  • Elements of the matrices specify the similarity
    of one residue to another

6
Translation - Protein Synthesis: every 3
nucleotides (one codon) are translated into one
amino acid
DNA (A, T, G, C) -- 1:1 --> RNA (A, U, G, C)
-- 3:1 --> Protein (20 amino acids)
Replication (DNA -> DNA)
Transcription (DNA -> RNA)
Translation (RNA -> Protein)
7
Nucleotide sequence determines the amino acid
sequence
8
Translation - Protein Synthesis
RNA: 5' -> 3'
Protein: N-term -> C-term
9
(No Transcript)
10
(No Transcript)
11
Log Likelihoods used as Scoring Matrices
PAM - Accepted Mutations: 1500 changes in 71
groups w/ > 85% similarity
BLOSUM - Blocks Substitution Matrix: 2000 blocks
from 500 families
12
Log Likelihoods used as Scoring Matrices: BLOSUM
13
Likelihood Ratio for Aligning a Single Pair of
Residues
  • Numerator (above): the probability that two
    residues are aligned by evolutionary descent
  • Denominator (below): the probability that they
    are aligned by chance
  • Pi, Pj are the frequencies of residues i and j
    in all protein sequences (abundance)
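
The ratio described by these bullets can be written out as follows. The symbol q_ij for the probability of seeing i aligned with j by descent is an assumption introduced here; Pi and Pj are the abundances named above, and the score of the pair is the logarithm of the ratio.

```latex
\[
R_{ij} = \frac{q_{ij}}{P_i \, P_j},
\qquad
s_{ij} = \log R_{ij}
\]
```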

14
Likelihood Ratio of Aligning Two Sequences
15
  • The score of an alignment of two sequences is
    the log likelihood ratio of the alignment under
    two models
  • Common ancestry
  • By chance

16
  • PAM and BLOSUM matrices are all log likelihood
    matrices
  • More specifically
  • An alignment that scores 6 means that the
    alignment by common ancestry is 2^(6/2) = 8
    times as likely as expected by chance.
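
The arithmetic in this example corresponds to scores expressed in half-bit units, i.e. score = 2 x log2(likelihood ratio). A minimal sketch of the conversion in both directions, assuming that unit (the function names and the `units_per_bit` parameter are illustrative, not from the slides):

```python
import math


def odds_from_score(score, units_per_bit=2):
    """Likelihood ratio implied by a log-odds score (default: half-bit units)."""
    return 2 ** (score / units_per_bit)


def score_from_odds(odds, units_per_bit=2):
    """Rounded log-odds score for a given likelihood ratio."""
    return round(units_per_bit * math.log2(odds))


print(odds_from_score(6))  # 8.0
print(score_from_odds(8))  # 6
```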

17
BLOSUM matrices for Protein
  • S. Henikoff and J. Henikoff (1992). Amino acid
    substitution matrices from protein blocks. PNAS
    89: 10915-10919
  • Training data: 2000 conserved blocks from the
    BLOCKS database (ungapped, aligned protein
    segments). Each block represents a conserved
    region of a protein family

18
Constructing BLOSUM Matrices of Specific
Similarities
  • Sets of sequences have widely varying similarity.
    Sequences whose similarity exceeds a threshold
    are clustered.
  • If the clustering threshold is 62%, the final
    matrix is BLOSUM62

19
  • A toy example of constructing a BLOSUM matrix
    from 4 training sequences

20
Constructing a BLOSUM matrix: 1. Counting mutations
21
Constructing a BLOSUM matrix: 2. Tallying mutation
frequencies
22
Constructing a BLOSUM matrix: 3. Matrix of mutation
probabilities
23
4. Calculate abundance of each residue (Marginal
prob)
24
5. Obtaining a BLOSUM matrix
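
The five steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the block of four sequences is hypothetical (the slides' own toy sequences were not transcribed), the similarity clustering step is skipped, and scores are rounded to half-bit units. The name `toy_blosum` is made up for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def toy_blosum(block, scale=2):
    """Log-odds scores from one ungapped block of equal-length sequences."""
    # Steps 1-2: count every unordered residue pair within each column.
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())

    # Step 3: observed pair probabilities q_ij.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Step 4: marginal probability (abundance) p_i of each residue.
    p = Counter()
    for (x, y), q_xy in q.items():
        if x == y:
            p[x] += q_xy
        else:
            p[x] += q_xy / 2
            p[y] += q_xy / 2

    # Step 5: rounded log-odds score in units of 1/scale bits.
    scores = {}
    for (x, y), q_xy in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(q_xy / expected))
    return scores


# Hypothetical training block: 4 sequences, 3 columns, 2-letter alphabet.
scores = toy_blosum(["ACA", "ACA", "ACA", "ACC"])
print(scores)
```

For this block the A-C substitution is seen less often than chance predicts, so it receives a negative score, while both identities score positive. Pairs that never occur in the block get no entry at all in this sketch.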
25
  • Constructing the real BLOSUM62 Matrix

26
1, 2, 3. Mutation Frequency Table
27
4. Calculate Amino Acid Abundance
28
5. Obtaining BLOSUM62 Matrix
29
(No Transcript)
30
PAM Matrices (Point Accepted Mutations)
  • Mutations accepted by natural selection

31
PAM Matrices
  • Accepted Point Mutation
  • M.O. Dayhoff, ed. (1978). Atlas of Protein
    Sequence and Structure, Suppl. 3. National
    Biomedical Research Foundation, 1
  • Based on evolutionary principles

32
Constructing PAM Matrix: Training Data
33
PAM: Phylogenetic Tree
34
PAM: Accepted Point Mutation
35
Mutability
36
Total Mutation Rate
is the total mutation rate of all amino acids
37
Normalize Total Mutation Rate
38
Mutation Probability Matrix Normalized Such that
the Total Mutation Rate is 1%
39
Mutation Probability Matrix (transposed), M x 10000
40
From the PAM1 mutation prob. matrix to PAM2
PAM2 Mutation Probability Matrix: mutations that
happen in twice the evolution period of that for
a PAM1
41
PAM Matrix Assumptions
42
In two PAM1 periods, A -> R can occur as
  • A -> A and A -> R, or
  • A -> N and N -> R, or
  • A -> D and D -> R, or
  • ..., or
  • A -> V and V -> R

43
Entries in a PAM-2 Mut. Prob. Matr.
44
PAM-k Mutation Prob. Matrix
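
Summing over every possible intermediate residue across two PAM1 periods is a matrix product, so the PAM-k mutation probability matrix is the k-th power of the PAM1 matrix. The sketch below uses plain Python lists and a made-up 3-letter alphabet. The numbers in `pam1` are hypothetical, and the row convention (row i holds the probabilities of i -> j) is an assumption of this example.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM1-like matrix: each row sums to 1, and most of the
# probability stays on the diagonal (no change in one period).
pam1 = [
    [0.990, 0.006, 0.004],
    [0.003, 0.990, 0.007],
    [0.005, 0.005, 0.990],
]
pam2 = mat_pow(pam1, 2)

# Entry (0, 1) sums the three two-step paths 0->0->1, 0->1->1, 0->2->1.
print(round(pam2[0][1], 6))  # 0.0119
```

Repeated multiplication is enough for small k. For something like k = 250 on a 20 x 20 matrix it is still cheap, although exponentiation by squaring would need fewer products.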
45
PAM-1 log likelihood matrix
46
PAM-k log likelihood matrix
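
One common way to write the PAM-k log likelihood score is shown below. The notation is an assumption of this note rather than a transcription of the slide: M_ij is taken as the probability that residue i is replaced by j in one PAM1 period, f_j is the background frequency of residue j, and the factor 10 with a base-10 logarithm follows the convention used for Dayhoff's published matrices.

```latex
\[
S^{(k)}_{ij} = 10 \, \log_{10} \frac{\left(M^{k}\right)_{ij}}{f_j}
\]
```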
47
PAM-250
48
  • PAM60: 60%, PAM80: 50%,
  • PAM120: 40% similarity
  • PAM-250 matrix provides a better-scoring
    alignment than lower-numbered PAM matrices for
    proteins of 14-27% similarity

49
Sources of Error in PAM
50
Comparing Scoring Matrices
  • PAM
  • Based on extrapolation from a small evol. period
  • Tracks evolutionary origins
  • Homologous seqs. during evolution
  • BLOSUM
  • Based on a range of evol. periods
  • Conserved blocks
  • Finds conserved domains

51
Choice of Scoring Matrix
52
Global Alignment with Affine Gaps
  • Complex Dynamic Programming

53
Problem w/ Independent Gap Penalties
  • The occurrence of x consecutive
    deletions/insertions is more likely than the
    occurrence of x isolated mutations
  • We should penalize a gap of length x less than
    x times the penalty for one gap

54
Affine Gap Penalty
  • w2 is the penalty for each gap position
  • w1 is the _extra_ penalty for the 1st position
    of a gap

55
Scoring Rule not Additive!
  • We need to know if the current gap is a new gap
    or the continuation of an existing gap
  • Use three Dynamic Programming matrices to keep
    track of the previous step

56
  • S1 is the vertical sequence
  • S2 is the horizontal sequence
  • (From Diagonal) a(i,j) current position is a
    match
  • (From Left) b(i,j) current position is a gap in
    S1
  • (From Above) c(i,j) current position is a gap in
    S2
  • Filling the next element in each matrix depends
    on the previous step, which is stored in the
    three matrices.
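
The three-matrix bookkeeping can be sketched as follows. This is a minimal score-only version under stated assumptions: the matrix names a, b, c and the roles of S1 and S2 follow the bullets above, w1 and w2 are taken as negative score contributions (-5 and -2, as in the example score on slide 1), and the function name and the test sequences are made up for illustration. Traceback is omitted.

```python
NEG_INF = float("-inf")


def affine_global_score(s1, s2, match=5, mismatch=-4, w1=-5, w2=-2):
    """Best global alignment score with gap cost w1 + (length * w2)."""
    n, m = len(s1), len(s2)  # s1 indexes rows (vertical), s2 columns
    a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a match/mismatch
    b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in S1
    c = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in S2
    a[0][0] = 0
    for j in range(1, m + 1):
        b[0][j] = w1 + j * w2  # leading gap in S1
    for i in range(1, n + 1):
        c[i][0] = w1 + i * w2  # leading gap in S2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            # From diagonal: any previous state may precede a match.
            a[i][j] = s + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            # From left: continue a gap in S1 (w2) or open a new one (w1 + w2).
            b[i][j] = max(a[i][j - 1] + w1 + w2,
                          b[i][j - 1] + w2,
                          c[i][j - 1] + w1 + w2)
            # From above: continue a gap in S2 (w2) or open a new one (w1 + w2).
            c[i][j] = max(a[i - 1][j] + w1 + w2,
                          c[i - 1][j] + w2,
                          b[i - 1][j] + w1 + w2)
    return max(a[n][m], b[n][m], c[n][m])


print(affine_global_score("ACGTA", "ACCTGGA"))  # 7
```

Keeping one matrix per "last step" is what makes the non-additive rule tractable: each cell only needs to know which of the three states it is extending, not the whole history of the alignment.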

57
(No Transcript)
58
Last step a match
a gap in S2
a gap in S1
new gap in S2
a continued gap in S2
a gap in S2 following a gap in S1
59
(No Transcript)
60
(No Transcript)
61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
Decisions in Seq. Alignment
  • Local or global alignment?
  • Which program to use
  • Type of scoring matrix
  • Value of gap penalty

66
Aij10
67
PAM-k log-likelihood matrix
68
(No Transcript)