Inferring Selection Pressure from Positional Residue Conservation - PowerPoint PPT Presentation

Slides: 45
Provided by: DerekD45
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes



1
Inferring Selection Pressure from Positional
Residue Conservation
  • Rose Hoberman
  • Roni Rosenfeld
  • Judith Klein-Seetharaman

2
Sequence Conservation
  • Conserved positions in proteins are of functional
    and structural importance
  • Many quantitative measures of positional
    conservation exist
  • Applications
      • predict function/structure
      • model protein families/domains
      • evaluate and refine multiple sequence alignments

3
Objective
  • Given a multiple sequence alignment...
      • Which positions are conserved?
  • Identify the specific selective pressures at each
    position
      • assume amino acids are selected based on their
        underlying physical and chemical properties

4
Related Work
  • Entropy
      • most common measure of conservation
      • treats amino acids as independent symbols
  • Property-based
      • smallest set of properties
      • small number of (binary) properties
      • ignores amino acid frequency

5
Related Work
  • Mutation-based substitution matrices
      • average over many positions/proteins
  • Aggregate experimental scales
      • build a similarity matrix
      • prototypical properties via dimensionality
        reduction
      • average over many properties
  • By aggregating, these methods lose specificity

6
250 Properties

7
250 Properties

8
Multiple Sequence Alignment
  • FAMLR...
  • LAMLR...
  • IAMLR...
  • P-EL-...
  • GAELR...
  • PGEIR...
  • L-ELY...
  • L-EVR...
  • I-MLK...
  • WAELR...
  • HAELY...
  • YAILY...
  • WAML-...
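The five columns shown above can be tabulated position by position. A minimal Python sketch, assuming gaps (`-`) are simply left out of the counts (the slides do not say how gaps are handled); `ALIGNMENT`, `column_counts` and `column_frequencies` are illustrative names, not the authors' code.

```python
from collections import Counter

# First five columns of the 13 aligned sequences shown on the slide.
ALIGNMENT = [
    "FAMLR", "LAMLR", "IAMLR", "P-EL-", "GAELR", "PGEIR", "L-ELY",
    "L-EVR", "I-MLK", "WAELR", "HAELY", "YAILY", "WAML-",
]


def column_counts(alignment, col, gap="-"):
    """Count the residues in one alignment column, skipping gaps."""
    return Counter(seq[col] for seq in alignment if seq[col] != gap)


def column_frequencies(alignment, col, gap="-"):
    """Relative residue frequencies in one column (gaps excluded)."""
    counts = column_counts(alignment, col, gap)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    for col in range(len(ALIGNMENT[0])):
        print(col + 1, dict(column_counts(ALIGNMENT, col)))
```

Column 2, for example, holds eight alanines, one glycine and four gaps, while column 4 is almost entirely leucine.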

9
Multiple Sequence Alignment

10
Multiple Sequence Alignment

11
Limitations of our approach
  • Not modelling phylogenetic relationships
  • Cannot identify property conservation when
    entropy is low

12
Variance
13
Variance
  • If a property is conserved, the distribution of
    its values at that position should have lower variance
  • However, the test must condition on entropy: at
    low-entropy positions every property has low variance
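A minimal sketch of the per-position variance. It assumes the Kyte-Doolittle hydropathy scale as the example property (the 250 properties themselves are not listed in the slides) and weights each residue by its count at the position; `KD` and `property_variance` are illustrative names.

```python
# Kyte-Doolittle hydropathy values, used only as an example property scale.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def property_variance(counts, scale):
    """Count-weighted mean and variance of a property at one position."""
    total = sum(counts.values())
    mean = sum(n * scale[aa] for aa, n in counts.items()) / total
    var = sum(n * (scale[aa] - mean) ** 2 for aa, n in counts.items()) / total
    return mean, var


hydrophobic = {"I": 2, "L": 2, "V": 1}
mixed = {"L": 1, "D": 1, "K": 1, "G": 1}
print(property_variance(hydrophobic, KD))  # low variance
print(property_variance(mixed, KD))        # high variance
print(property_variance({"W": 5}, KD))     # zero variance, whatever the scale
```

The last line shows why conditioning on entropy is needed: a fully conserved position has zero variance for every property.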

14
Holes
15
Unimodality
  • When selection is based on a single property,
    fitness is unimodal in that property
  • Look for properties along which all observed amino
    acids are adjacent (no unobserved amino acid lies
    between them)
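The adjacency check can be sketched by listing the "holes": unobserved amino acids whose property value falls between the smallest and largest values observed at the position. This is one reading of the slides, assuming strict inequalities (amino acids tied with an observed endpoint are not holes) and the Kyte-Doolittle scale as the example property; `holes` is an illustrative name.

```python
# Kyte-Doolittle hydropathy values, used only as an example property scale.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def holes(observed, scale):
    """Unobserved amino acids whose value lies strictly between the
    smallest and largest values observed at the position."""
    values = [scale[aa] for aa in observed]
    lo, hi = min(values), max(values)
    return sorted(aa for aa, v in scale.items()
                  if aa not in observed and lo < v < hi)


print(holes({"I", "L", "V"}, KD))  # no holes: adjacent on this scale
print(holes({"L", "E"}, KD))       # many holes
```

An empty list means the observed residues are adjacent along the property axis.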

16
Adjacency Limitations

17
Adjacency Limitations
  • The definition of a hole is too strict and fails
    to model unimodality
  • Ignores distances between property values
  • Therefore, calculate goodness of fit to a
    Gaussian

18
Gaussian Goodness-of-Fit
  • Fit a maximum-likelihood Gaussian to amino acid
    frequencies in property space
  • Calculate goodness-of-fit to the learned Gaussian
      • discretize the Gaussian over the 20 amino acid
        property values
      • calculate the goodness-of-fit statistic
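A minimal sketch of the Gaussian fit. The slides do not name the goodness-of-fit statistic, so this assumes the Kullback-Leibler divergence from the observed frequencies to the Gaussian discretized over the 20 amino acid values. The Kyte-Doolittle example scale, the `min_sd` floor and all function names are assumptions.

```python
import math

# Kyte-Doolittle hydropathy values, used only as an example property scale.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def fit_gaussian(counts, scale, min_sd=1e-3):
    """Maximum-likelihood mean and standard deviation of the property,
    weighting each residue by its count (sd floored at min_sd)."""
    total = sum(counts.values())
    mu = sum(n * scale[aa] for aa, n in counts.items()) / total
    var = sum(n * (scale[aa] - mu) ** 2 for aa, n in counts.items()) / total
    return mu, max(math.sqrt(var), min_sd)


def discretized_gaussian(mu, sd, scale):
    """Evaluate the Gaussian at each amino acid's value and renormalize."""
    dens = {aa: math.exp(-0.5 * ((v - mu) / sd) ** 2) for aa, v in scale.items()}
    z = sum(dens.values())
    return {aa: d / z for aa, d in dens.items()}


def gaussian_misfit(counts, scale):
    """KL divergence (nats) from observed frequencies to the fitted,
    discretized Gaussian; smaller means a better unimodal fit."""
    total = sum(counts.values())
    q = discretized_gaussian(*fit_gaussian(counts, scale), scale)
    return sum((n / total) * math.log((n / total) / max(q[aa], 1e-300))
               for aa, n in counts.items())


print(gaussian_misfit({"I": 2, "L": 2, "V": 1}, KD))  # clustered: small
print(gaussian_misfit({"I": 2, "D": 2}, KD))          # bimodal: larger
```

Unlike the hole count, this score uses the distances between property values, so a bimodal position is penalized even when it contains only two residue types.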

19
Significance
  • High false discovery rate at low-entropy positions
  • Calculate significance using shuffled properties
      • for a given position, calculate the statistic
        for each shuffled property
      • this gives the likelihood of obtaining a
        particular value by chance
      • convert to a p-value
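A minimal sketch of the shuffling test. Because the position's own residue counts are reused with every shuffled scale, the null distribution automatically reflects that position's entropy. It assumes variance as the statistic (any per-position statistic, such as a Gaussian goodness-of-fit score, could be passed as `statistic`), the Kyte-Doolittle example scale, and a +1 correction; the function names are illustrative.

```python
import random

# Kyte-Doolittle hydropathy values, used only as an example property scale.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def property_variance(counts, scale):
    """Count-weighted variance of a property at one position."""
    total = sum(counts.values())
    mean = sum(n * scale[aa] for aa, n in counts.items()) / total
    return sum(n * (scale[aa] - mean) ** 2 for aa, n in counts.items()) / total


def shuffled_scale(scale, rng):
    """Randomly reassign the scale's values to the 20 amino acids."""
    values = list(scale.values())
    rng.shuffle(values)
    return dict(zip(scale, values))


def empirical_p_value(counts, scale, statistic=property_variance,
                      n_shuffles=1000, seed=0):
    """Fraction of shuffled scales whose statistic is at least as small
    (as conserved) as the real one, with a +1 correction."""
    rng = random.Random(seed)
    observed = statistic(counts, scale)
    hits = sum(statistic(counts, shuffled_scale(scale, rng)) <= observed
               for _ in range(n_shuffles))
    return (hits + 1) / (n_shuffles + 1)


print(empirical_p_value({"I": 2, "L": 2, "V": 1}, KD))
print(empirical_p_value({"L": 1, "D": 1, "K": 1, "G": 1}, KD))
```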

20
Gaussian Significance I
21
Gaussian Significance II
22
Gaussian Significance III
23
GPCR Family A
  • 7 TM segments
  • Cytoplasmic, TM, and extracellular domains
  • Responds to a large variety of ligands
  • Ligand binding allows binding and activation of a
    G protein
  • High sequence diversity
  • Believed to share a similar structure
  • Only known structure is that of rhodopsin

24
False Discovery Rate
25
Computational Validation
  • Predict the occurrence of amino acids not yet seen
    at a position of a multiple sequence alignment
  • Subsample 25, 50, 75, 100, and 200 sequences from
    the GPCR alignment
  • Baseline: logistic regression using only entropy
    and variability
  • New model incorporates the distance of the
    residue's value from the mean of the best-fitting
    property Gaussian
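A minimal logistic-regression sketch in plain Python. Presumably each example is a (position, unseen amino acid) pair from a subsample, labeled by whether that amino acid turns up in the full alignment; the baseline would use `[entropy, variability]` as features and the new model would add the distance feature. Standardizing the distance by the Gaussian's standard deviation, the learning rate, the epoch count and all names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gaussian_distance(value, mu, sd):
    """Distance of a residue's property value from the Gaussian mean,
    in standard deviations (the standardization is an assumption)."""
    return abs(value - mu) / sd


def predict(w, x):
    """Predicted probability that the amino acid occurs at the position."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w
```

Comparing held-out log-loss of a model trained on `[entropy, variability]` with one trained on `[entropy, variability, distance]` mirrors the baseline versus new-model comparison.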

26
Computational Validation
27
Results for GPCR
28
Biological Validation
  • Property conservation in all three domains of
    GPCRs (not just TM)
      • e.g. hydrophobicity conserved in interhelical
        loop regions
  • Charge conserved at 134
      • part of the D/E-R-Y motif, important for binding
        and activation of the G protein
  • Size conserved at 54, 80, 87, 123, 132, 153, 299
      • where the helix faces one or two other helices

29
Biological Validation
  • Helix-coil equilibrium constant conserved at 278
      • a helix-capping residue at the extracellular (EC)
        end of helix VI
  • Dynamic properties conserved
      • especially in the third cytoplasmic (CP) loop,
        which in rhodopsin is the most flexible
        interhelical loop
      • in TM at 215 and 269, in close proximity to the
        retinal ligand; the ligand-binding pocket is the
        most rigid part of the crystal structure

30
Clustered Positions
  • Positions with high property-conservation
    similarity are often close in space but distant
    in primary sequence
      • e.g. 129 and 226; 131 and 219
      • 162 and 297 both face outside of the bundle at a
        similar height with respect to the membrane
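The slides do not say how property-conservation similarity between positions is measured. One plausible sketch, assuming each position is summarized by a vector of per-property conservation scores (one entry per property scale) and pairs are compared by Pearson correlation; the profiles used in any example and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation of two equal-length score vectors
    (0.0 if either vector is constant)."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu_u) ** 2 for a in u))
    sv = math.sqrt(sum((b - mu_v) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def most_similar_pairs(profiles, top=5):
    """Rank position pairs by the correlation of their conservation profiles."""
    pairs = [(pearson(profiles[a], profiles[b]), a, b)
             for a, b in combinations(sorted(profiles), 2)]
    return sorted(pairs, reverse=True)[:top]
```

Highly correlated pairs that are far apart in sequence would then be checked against the structure.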

31
Novel Hypothesis
  • Positions 175 and 265 have highly similar
    conservation patterns
      • both are tryptophans in rhodopsin
  • Trp265 is in direct contact with the retinal
    ligand, but on exposure to light the retinal
    crosslinks to Ala169 instead
  • Trp161 has been proposed to contribute to this
    process
  • The property conservation patterns suggest that
    Trp175 plays a more significant role
  • This hypothesis can be tested experimentally

32
Results
33
Results
34
More Detailed Results
35
Future Work
  • Additional evaluation
  • Consider selection pressure from more than one
    property
      • use multivariate Gaussians
      • look for adjacency in 2-D property space
  • Test for unimodality directly
  • Model phylogenetic relationships between sequences

36
Conclusions
  • Method identifies positions with significant
    evidence that a particular property is conserved
  • More discriminatory than entropy
  • Framework is easily extendable to multiple
    properties

37
Thanks!
38
Entropy
  • Most commonly used measure of conservation
  • Based only on the frequency of each amino acid
  • Treats amino acids as independent symbols
  • But amino acids are not uniformly different from
    one another
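A minimal sketch of column entropy, assuming entropy in bits with gaps excluded. It illustrates the last point: an I/V column and an I/D column get the same score, even though I and V are far more alike than I and D.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residue distribution in one column."""
    counts = Counter(c for c in column if c != gap)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(column_entropy("IIVV"))  # 1.0
print(column_entropy("IIDD"))  # 1.0, despite very different residues
print(column_entropy("WWWW"))  # 0.0
```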

39
Property-based Measures
  • Based on Venn diagram of physical-chemical
    properties
  • Find the smallest set of properties that explains
    the amino acids observed at a position
  • Considers only a small number of property
    combinations
  • Only binary properties
  • Ignores amino acid frequency

Livingstone & Barton, CABIOS, 1993
40
Hybrid Methods
  • Stereochemically sensitive entropy scores
      • divide amino acids into clusters (show example?)
      • calculate entropy on the reduced alphabet
  • Clustering is ad hoc
  • Can only consider a limited number of properties
  • Still treats each amino acid in a cluster as
    uniformly distant from each amino acid in every
    other cluster
  • Only models a limited number of properties at a
    very coarse level
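A minimal sketch of the reduced-alphabet idea. The seven groups below are a hypothetical, hand-picked clustering (not one taken from a published score), which is exactly the kind of ad hoc choice criticized above; entropy is in bits with gaps excluded.

```python
import math
from collections import Counter

# Hypothetical clustering of the 20 amino acids into 7 groups.
GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P"]
CLUSTER = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def reduced_entropy(column, gap="-"):
    """Shannon entropy (bits) after mapping residues to their clusters."""
    counts = Counter(CLUSTER[c] for c in column if c != gap)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(reduced_entropy("IIVV"))  # 0.0: I and V share a cluster
print(reduced_entropy("IIDD"))  # 1.0
print(reduced_entropy("IIFF"))  # 1.0: I and F count as fully different
```

The last two lines show the coarseness: I vs D and I vs F are scored identically.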

41
More ...
  • Substitution methods
  • Physical-chemical distance matrices or aggregate
    properties
  • DNA, phylogeny, mutation rates

42
Adjacency Significance
  • Positions with a small number of distinct amino
    acids will have a high false discovery rate
  • Add equation and table
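One way to quantify this assumes a random property scale is a uniformly random ordering of the 20 amino acids with no ties. The k distinct amino acids observed at a position are all adjacent in (21 - k) · k! · (20 - k)! of the 20! orderings, so the chance probability is (21 - k) / C(20, k). For k = 2 this is 19/190 = 0.1. If the 250 properties behaved like independent random orderings, about 25 of them would look "conserved" at a two-residue position by chance alone. A short sketch (the function name is hypothetical):

```python
from math import comb


def p_all_adjacent(k, n=20):
    """Chance that k given amino acids occupy consecutive ranks in a
    uniformly random ordering of n amino acids (ties ignored)."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # (n - k + 1) block positions * k! * (n - k)! orderings, divided by n!
    return (n - k + 1) / comb(n, k)


for k in range(1, 7):
    print(k, round(p_all_adjacent(k), 5))
```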

43
Significance
  • Problem: at positions with low entropy, every
    property will have low variance
      • very high false positive rate: any combination
        of one or more properties can explain this!
  • The actual explanation may involve several
    properties
      • in this case, multiple properties are constrained
      • cannot determine which single property is
        conserved
  • Need to condition on entropy

44
Significance Testing
  • What is the probability of a property having low
    variance at this position purely by chance?
  • Generate a large set of random (shuffled)
    property scales
      • show examples of shuffling
  • Calculate variance for each random property
  • The distribution of this statistic can be used to
    calculate a threshold for an acceptable
    false-positive rate
      • Show picture here? add error bars?
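A minimal sketch of the threshold step. The position's own residue counts are reused with every shuffled scale, and a property would be accepted at the position only if its real variance falls below the returned threshold. The Kyte-Doolittle example scale, the 5% level, the number of shuffles and the function names are assumptions.

```python
import random

# Kyte-Doolittle hydropathy values, used only as an example property scale.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def property_variance(counts, scale):
    """Count-weighted variance of a property at one position."""
    total = sum(counts.values())
    mean = sum(n * scale[aa] for aa, n in counts.items()) / total
    return sum(n * (scale[aa] - mean) ** 2 for aa, n in counts.items()) / total


def variance_threshold(counts, scale, alpha=0.05, n_shuffles=1000, seed=0):
    """alpha-quantile of the variance under randomly shuffled scales."""
    rng = random.Random(seed)
    amino_acids, values = list(scale), list(scale.values())
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(values)  # one random (shuffled) property scale
        null.append(property_variance(counts, dict(zip(amino_acids, values))))
    null.sort()
    return null[int(alpha * n_shuffles)]


position = {"I": 2, "L": 2, "V": 1}
print(property_variance(position, KD) < variance_threshold(position, KD))
print(variance_threshold({"W": 5}, KD))  # ~0: a fully conserved position cannot pass
```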

45
Results