Using physical-chemical properties of amino acids to model site-specific substitution propensities.

Slides: 33
Provided by: DerekD45
Learn more at: http://www.cs.cmu.edu

1
Using physical-chemical properties of amino
acids to model site-specific substitution
propensities.
  • Rose Hoberman
  • Roni Rosenfeld
  • Judith Klein-Seetharaman

2
Heterogeneity Across Sites
  • Substitution rate varies across sites
    • rate parameter assumed to follow a gamma
      distribution
    • mathematically convenient
    • little biological justification
    • provides little explanation

3
Heterogeneity Across Sites
  • Rate of substitution varies across sites
    • rate parameter distributed according to a gamma
      distribution
    • mathematically convenient
    • little biological justification
    • provides little explanation
  • Substitution propensities vary across sites
    • leads to an explosion of parameters (400 per
      site)
    • still no biological explanation

4
Explaining Why Substitution Propensities Vary
  • Differing substitution propensities are a result
    of different amino acid preferences (Halpern &
    Bruno; Koshi & Goldstein)
    • e.g. substitutions to deleterious amino acids are
      unlikely
  • Learning amino acid preferences at each site (20
    vs 400 parameters)
    • still too many parameters to estimate accurately
    • still not biologically informative

5
Our Modeling Assumption
  • Amino acid preferences are based on which
    physical and chemical properties are important at
    each site to the function or structure of the
    protein
    • restricts the parameter space (3-5)
    • provides more explanation

6
A New Statistical Model of Site-Specific
Molecular Evolution
  • Learn which properties are important at each site
  • Model amino acid preferences as a function of
    their properties
  • Determine a mapping from amino acid preferences
    to substitution propensities
  • Combine property-based substitution propensities
    with other factors that affect substitutions
    • nucleotide mutation processes
    • different distances between codons

7
A New Statistical Model of Site-Specific
Molecular Evolution
  • Learn which properties are important at each site
    • don't rely on structural knowledge about the
      protein
    • do not artificially restrict to a few preselected
      physical features
  • Model amino acid preferences as a function of
    their properties
  • Determine a mapping from amino acid preferences
    to substitution propensities
  • Combine substitution propensities with codon
    distance and nucleotide mutation rates
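One candidate mapping from site-specific amino acid preferences to substitution propensities is the population-genetic fixation scaling used by Halpern & Bruno (cited on slide 4). The sketch below is only an illustration under loud assumptions: a single symmetric mutation rate stands in for the codon-distance and nucleotide-mutation terms, the slides do not say which mapping the authors adopt, and `fixation_scaled_rate`, `substitution_matrix`, and the preference values are hypothetical.

```python
import math

def fixation_scaled_rate(pi_i, pi_j, mutation_rate=1.0):
    """Rate of replacing amino acid i by j at one site.

    Assumes a symmetric mutation rate, so the Halpern & Bruno factor reduces
    to ln(pi_j / pi_i) / (1 - pi_i / pi_j). Preferences must be positive.
    """
    if math.isclose(pi_i, pi_j):
        return mutation_rate  # limit of the factor as pi_j -> pi_i
    ratio = pi_j / pi_i
    return mutation_rate * math.log(ratio) / (1.0 - 1.0 / ratio)

def substitution_matrix(prefs, mutation_rate=1.0):
    """Rate matrix (dict of dicts) from one site's preferences; rows sum to zero."""
    q = {}
    for a in prefs:
        row = {b: fixation_scaled_rate(prefs[a], prefs[b], mutation_rate)
               for b in prefs if b != a}
        row[a] = -sum(row.values())  # diagonal makes the row sum to zero
        q[a] = row
    return q

# Hypothetical preferences at one site, over a reduced alphabet for brevity.
prefs = {"L": 0.60, "I": 0.25, "V": 0.10, "A": 0.05}
Q = substitution_matrix(prefs)
print(round(Q["A"]["L"], 3), round(Q["L"]["A"], 3))  # toward L is faster than away
```

Because this factor satisfies pi_i * Q_ij = pi_j * Q_ji, the resulting process has the site preferences as its stationary distribution, which is what lets 20 preference values (or a few property parameters) stand in for 400 free propensities.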

8
250 Amino Acid Properties
1 Hydrophobicity
2 Volume
3 Net charge
4 Transfer free energy
...
248 Average flexibility
249 Alpha-helix propensity
250 Number of surrounding residues

(Downloaded from http://www.scsb.utmb.edu/comp_biol.html/venkat/prop.html)
9
250 Amino Acid Properties

  A     C     D     E     F     G     H     I     K     L
  0.66  2.87  0.10  0.87  3.15  1.64  2.17  1.67  0.09  2.77
  M     N     P     Q     R     S     T     V     W     Y
  0.67  0.87  1.52  0.00  0.85  0.07  0.07  1.87  3.77  2.67
10
Visualizing the Amino Acid Distribution
  FAMLR...
  LAMLR...
  IAMLR...
  P-EL-...
  GAELR...
  PGEIR...
  L-ELY...
  L-EVR...
  I-MLK...
  WAELR...
  HAELY...
  YAILY...
  WAML-...

11
Variance
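The per-column variance statistic can be sketched as follows. This assumes population variance over non-gap residues; the slides do not say how gaps are handled or which variance estimator is used. The scale is the example table on slide 9 (its property is not named there), the columns are the five visible positions of the slide 10 alignment, and `column` and `property_variance` are illustrative names.

```python
from statistics import pvariance

# Example property scale from slide 9 (property name not given there).
SCALE = {
    "A": 0.66, "C": 2.87, "D": 0.10, "E": 0.87, "F": 3.15,
    "G": 1.64, "H": 2.17, "I": 1.67, "K": 0.09, "L": 2.77,
    "M": 0.67, "N": 0.87, "P": 1.52, "Q": 0.00, "R": 0.85,
    "S": 0.07, "T": 0.07, "V": 1.87, "W": 3.77, "Y": 2.67,
}

# The visible part of the slide 10 alignment ('-' marks a gap).
ALIGNMENT = ["FAMLR", "LAMLR", "IAMLR", "P-EL-", "GAELR", "PGEIR", "L-ELY",
             "L-EVR", "I-MLK", "WAELR", "HAELY", "YAILY", "WAML-"]

def column(alignment, j):
    """Residues at alignment position j (0-based), skipping gaps."""
    return [seq[j] for seq in alignment if seq[j] != "-"]

def property_variance(residues, scale):
    """Population variance of the scale values of the residues in one column."""
    return pvariance([scale[aa] for aa in residues])

for j in range(5):
    residues = column(ALIGNMENT, j)
    print(j + 1, "".join(residues), round(property_variance(residues, SCALE), 3))
```

Column 4, which holds mostly L with one I and one V, gives a much smaller variance on this scale than the more varied column 1.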

12
Limitations of Variance
13
Limitations of Variance
Our assumption: when selection is based on a
single property, the amino acid distribution in
property space should be unimodal
14
Using Gaussian Goodness-of-Fit to Test for
Property Conservation
  • Fit a maximum-likelihood Gaussian to amino acid
    frequencies in property space
  • From the (discretized) Gaussian, calculate
    expected amino acid frequencies
  • Calculate goodness-of-fit to the learned
    Gaussian
    • identifies unimodal distributions
    • penalizes missing amino acids (holes)
  • Use a Monte-Carlo method to calculate significance
    • otherwise the false discovery rate is high
      when entropy is low
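A sketch of the goodness-of-fit statistic, under assumptions the slides leave open: the Gaussian is fitted by the maximum-likelihood (frequency-weighted) mean and variance of the residues' property values; it is discretized by weighting each of the 20 amino acids by the density at its property value and renormalizing; and the fit is scored with Pearson's chi-square, in which a hole (an amino acid the Gaussian expects but the column lacks) adds its full expected count. The variance floor `min_var` and all function names are hypothetical.

```python
import math
from collections import Counter

# Example property scale from slide 9.
SCALE = {
    "A": 0.66, "C": 2.87, "D": 0.10, "E": 0.87, "F": 3.15,
    "G": 1.64, "H": 2.17, "I": 1.67, "K": 0.09, "L": 2.77,
    "M": 0.67, "N": 0.87, "P": 1.52, "Q": 0.00, "R": 0.85,
    "S": 0.07, "T": 0.07, "V": 1.87, "W": 3.77, "Y": 2.67,
}

def fit_gaussian(residues, scale):
    """ML mean and variance of the property values of the observed residues."""
    xs = [scale[aa] for aa in residues]
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / len(xs)

def expected_frequencies(mu, var, scale, min_var=1e-3):
    """Discretized Gaussian: density at each amino acid's value, renormalized."""
    var = max(var, min_var)  # assumed floor so a fully conserved column stays finite
    w = {aa: math.exp(-(x - mu) ** 2 / (2 * var)) for aa, x in scale.items()}
    total = sum(w.values())
    return {aa: wi / total for aa, wi in w.items()}

def gaussian_misfit(residues, scale):
    """Pearson chi-square of observed vs. Gaussian-expected amino acid counts."""
    n = len(residues)
    observed = Counter(residues)
    expected = expected_frequencies(*fit_gaussian(residues, scale), scale)
    score = 0.0
    for aa, p in expected.items():
        e, o = n * p, observed[aa]
        if e > 0:
            score += (o - e) ** 2 / e  # a hole (o == 0) adds its expected count e
        elif o > 0:
            return math.inf  # residue present but given zero probability
    return score

mostly_leucine = "LLLLLILVLLLLL"  # column 4 of the slide 10 alignment
bimodal = "AAAAAAWWWWWW"          # hypothetical column with two separated modes
print(gaussian_misfit(mostly_leucine, SCALE), gaussian_misfit(bimodal, SCALE))
```

A lower score means a better Gaussian fit. As the slide notes, whether a score is significant would then be judged by a Monte-Carlo procedure rather than a chi-square table, because low-entropy columns fit some Gaussian easily.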

15
GPCR-A Family
  • Characterized by seven transmembrane (TM) segments
  • Responds to a large variety of ligands
  • Ligand binding allows binding and activation of a
    G protein
  • Diversity in sequences
  • Believed to share similar structure
  • Only known structure is for Rhodopsin

18
Results for GPCR
19
Estimating the False Discovery Rate (FDR)
Properties  Significance  Significant positions   FDR (%)
tested      threshold     expected    detected
5           0.0005        0.63        76          0.8
5           0.001         1.26        85          1.5
5           0.005         6.26        136         4.6
50          0.0005        6.25        103         6.1
50          0.001         12.34       130         9.5
240         0.0005        28.61       154         18.6
FDR = false positives / predicted positives
(estimated here as expected / detected significant positions)
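The FDR column follows from the two count columns: dividing the expected number of significant positions by the number detected, as a percentage, reproduces every reported value. A minimal check with the rows copied from the table; how the expected counts were obtained is not stated on the slide, so they are taken as given, and `fdr_percent` is an illustrative name.

```python
# Rows from the table: (properties tested, threshold, expected, detected, reported FDR %).
ROWS = [
    (5, 0.0005, 0.63, 76, 0.8),
    (5, 0.001, 1.26, 85, 1.5),
    (5, 0.005, 6.26, 136, 4.6),
    (50, 0.0005, 6.25, 103, 6.1),
    (50, 0.001, 12.34, 130, 9.5),
    (240, 0.0005, 28.61, 154, 18.6),
]

def fdr_percent(expected_false_positives, detected):
    """Estimated FDR = expected false positives / predicted positives, in percent."""
    return 100.0 * expected_false_positives / detected

for props, thr, exp_fp, det, reported in ROWS:
    print(f"{props:>4} {thr:<7} {fdr_percent(exp_fp, det):5.1f}% (reported {reported}%)")
```

The table also shows the multiple-testing cost: at the same 0.0005 threshold, testing 240 properties instead of 5 raises the estimated FDR from under 1% to about 19%.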
20
Initial Validation
  • Charge conserved at position 134
    • part of the (D/E)RY motif, important for binding
      and activation of the G protein
  • Size conserved at positions 54, 80, 87, 123, 132,
    153, and 299
    • at these positions the helix faces one or two
      other helices
  • Cluster of dynamics properties conserved in the
    third cytoplasmic loop
    • in rhodopsin this is the most flexible
      interhelical loop

21
Continuing Work
  • Use multivariate Gaussian to model selection
    pressure from multiple properties
  • Derive substitution propensities from amino acid
    preferences and combine these with codon distance
    effects and nucleotide mutation rates

22
Thank You
  • Roni Rosenfeld
  • Judith Klein-Seetharaman
  • NSF
23
Summary
  • Proposed a new approach for modeling
    heterogeneity of the evolutionary process across
    sites
  • Designed a test that is able to identify which
    properties are conserved at different sites
  • Promising approach for modeling site-specific
    substitution propensities in a biologically
    realistic and computationally tractable way

24
Significance
  • Problem: at positions with low entropy, every
    property will have low variance
  • very high false-positive rate: any combination of
    one or more properties can explain this!
  • the actual explanation may involve several
    properties
  • in this case, there are multiple property
    constraints
  • cannot determine which single property is
    conserved
  • Need to condition on entropy
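The low-entropy problem can be illustrated numerically. The sketch below uses two hypothetical columns and random scales in which every amino acid gets an independent uniform value (an assumption made only for illustration). It reports each column's Shannon entropy and its average variance across many random scales. The near-conserved column shows low variance under almost any scale, so low variance by itself says little about which property is conserved. `column_entropy` and `mean_random_variance` are illustrative names.

```python
import math
import random
from collections import Counter
from statistics import pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_entropy(residues):
    """Shannon entropy, in bits, of the amino acid composition of one column."""
    n = len(residues)
    return -sum(c / n * math.log2(c / n) for c in Counter(residues).values())

def mean_random_variance(residues, trials=2000, seed=0):
    """Average property variance of the column over random (uniform) scales."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        scale = {aa: rng.random() for aa in AMINO_ACIDS}
        total += pvariance([scale[aa] for aa in residues])
    return total / trials

low_entropy = "AAAAAAAAAAAG"   # hypothetical near-conserved column
high_entropy = "ACDEFGHIKLMN"  # hypothetical diverse column
for col in (low_entropy, high_entropy):
    print(col, round(column_entropy(col), 2), round(mean_random_variance(col), 4))
```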

25
Significance Testing
  • What is the probability of a property having low
    variance in this position purely by chance?
  • Generate a large set of random (shuffled)
    property scales
  • show examples of shuffling
  • Calculate variance for each random property
  • The distribution of this statistic can be used to
    calculate a significance threshold that keeps
    false positives at an acceptable rate
  • Show picture here? add error bars?
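A minimal sketch of the shuffling test, assuming a one-sided empirical p-value: the fraction of shuffled scales whose variance is at most the observed one, with the common +1 correction. The slides do not give these details. Shuffling permutes the 20 values of a real scale (here the slide 9 example) among the amino acids, so the column's composition, and therefore its entropy, is held fixed. `shuffled_scale` and `empirical_p_value` are hypothetical names.

```python
import random
from statistics import pvariance

# Example property scale from slide 9.
SCALE = {
    "A": 0.66, "C": 2.87, "D": 0.10, "E": 0.87, "F": 3.15,
    "G": 1.64, "H": 2.17, "I": 1.67, "K": 0.09, "L": 2.77,
    "M": 0.67, "N": 0.87, "P": 1.52, "Q": 0.00, "R": 0.85,
    "S": 0.07, "T": 0.07, "V": 1.87, "W": 3.77, "Y": 2.67,
}

def property_variance(residues, scale):
    """Population variance of the scale values of the residues in one column."""
    return pvariance([scale[aa] for aa in residues])

def shuffled_scale(scale, rng):
    """Randomly reassign the scale's 20 values to the 20 amino acids."""
    values = list(scale.values())
    rng.shuffle(values)
    return dict(zip(scale, values))

def empirical_p_value(residues, scale, n_shuffles=2000, seed=0):
    """P(shuffled-scale variance <= observed variance), with a +1 correction."""
    rng = random.Random(seed)
    observed = property_variance(residues, scale)
    hits = sum(
        property_variance(residues, shuffled_scale(scale, rng)) <= observed
        for _ in range(n_shuffles)
    )
    return (hits + 1) / (n_shuffles + 1)

print(empirical_p_value("LLLLLILVLLLLL", SCALE))  # column 4 of the slide 10 alignment
print(empirical_p_value("SSSSTTTT", SCALE))       # S and T share the value 0.07
print(empirical_p_value("LLLLLLLL", SCALE))       # fully conserved column
```

The fully conserved column always gets p = 1, because every shuffled scale also gives it zero variance; conditioning on composition in this way keeps low-entropy columns from being reported for every property.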

26
Gaussian Significance I
28
Related Work
Koshi & Goldstein 1998
Halpern & Bruno 1997
29
New Model
Model of One Fitness Class
Model of Multiple Sequences from One Protein Family
30
Abstract
  • Existing models of molecular evolution capture
    much of the variability in mutation rates across
    sites. More biologically realistic models also
    seek to explain site-specific differences in
    substitution propensities between residue pairs,
    leading to more accurate and informative models
    of evolutionary dynamics. Toward this end, we
    describe a procedure for systematically
    characterizing the conservation of each position
    in a multiple sequence alignment in terms of
    specific physical and chemical properties. We use
    a Monte-Carlo method to ascertain the statistical
    significance of the findings and to control the
    False Discovery Rate. We use our method to
    annotate the diverse GPCR-A family with a
    selection pressure profile. We demonstrate the
    computational and statistical significance of the
    properties we have identified, and discuss the
    biological significance of our findings. The
    latter include confirmation of experimentally
    determined properties as well as novel testable
    hypotheses.

31
Results
32
Novel Hypothesis
  • Positions 175 and 265 have highly similar
    conservation patterns
  • Both are tryptophans in rhodopsin
  • Trp265 is in direct contact with the retinal
    ligand, but on exposure to light the retinal
    crosslinks to Ala169 instead
  • Trp161 has been proposed to contribute to this
    process
  • The property conservation patterns suggest Trp175
    has a more significant role
  • This hypothesis can be tested experimentally