What do we do with these extra people Navigating the functional universe - PowerPoint PPT Presentation

Description:

Human genome project revealed remarkable genetic similarity. between ... Pedigree of proband's family. Can variants be classified based on proband pedigrees? ... – PowerPoint PPT presentation

Slides: 45
Provided by: rch68

Transcript and Presenter's Notes

1
Lecture 8: Research in the Karchin Lab
Annotating human genomic variation with
computational biology
Rachel Karchin, BME 580.688/580.488 and CS
600.488, Spring 2009
2
Programming Assignment 2 run times on Influenza
genome
Max: 367 s, Min: 13.1 s
3
Human genome project revealed remarkable genetic
similarity between individual humans
Image courtesy of genome.gov
  • Our DNA is 99.9% identical
  • The 0.1% that makes us different is a key to
    inter-individual disease susceptibilities and
    sensitivity to medications

4
In the future, many human frailties will be
linked to inherited genetic mutations
5
A single inherited amino acid change can result
in sickle cell anemia or fatal bleeding events
with low dosage of warfarin
  • Disease susceptibilities: hemoglobin E6V,
    sickle cell anemia
  • Sensitivity to medications: cytochrome P450
    R144C, warfarin-induced bleeding
6
There are many kinds of common genetic variants
in different regions of our DNA
12
Most known disease-associated human genetic
variants are missense mutants
R144C
13
There are many ways that missense mutants can
impact protein function
  • Protein aggregates and does not fold
  • Protein is destabilized and unfolds
  • Binding interfaces are disrupted
  • Active sites are disrupted

Williams and Glover, 2003
14
There are many ways that the functional impact of
a missense mutant can be assessed
  • Biochemical experiments
  • Physics
  • Epidemiology
  • Computational biology

15
Some computational biology methods for missense
mutant function prediction
Each entry lists the predictive features, then the
classification approach.
  • Ng and Henikoff, 2001 (SIFT): PSSM;
    substitution likelihood threshold
  • Sunyaev et al., 2001: evolution, structure;
    empirical rules
  • Chasman and Adams, 2001: evolution, structure;
    probabilistic
  • Saunders and Baker, 2002: PSSM, solvent
    accessibility; decision tree, logistic regression
  • Krishnan and Westhead, 2003: sequence,
    evolution, structure; decision tree, SVM
  • Cai et al., 2004: sequence, evolution,
    structure; Bayesian network
16
Properties of wild-type and mutant amino acids
are a simple predictive feature
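As a minimal sketch of what such a feature might look like, the snippet below computes the change in hydropathy and formal charge for a substitution such as R144C. The choice of the Kyte-Doolittle hydropathy scale, the simplified charge table, and the function names are assumptions made for illustration; the slide does not say which amino acid properties were actually used.

```python
# Kyte-Doolittle hydropathy index (assumed here as one possible property scale).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def parse_substitution(label):
    """Split a label such as 'R144C' into (wild type, position, mutant)."""
    return label[0], int(label[1:-1]), label[-1]


def property_features(label):
    """Return simple property-change features for one missense substitution."""
    wild_type, position, mutant = parse_substitution(label)
    return {
        "position": position,
        "hydropathy_change": round(KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type], 1),
        "charge_change": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
    }


features = property_features("R144C")
print(features)
```

For R144C the sketch reports a large increase in hydropathy and the loss of one positive charge, the kind of signal a downstream classifier could use.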
17
Predictive features can be computed from protein
tertiary structure
X-ray crystal structure of E. coli Lac repressor
  • Solvent accessibility (Å²)
  • Methyl(ene) groups within 6 Å
  • Backbone dihedral angles (phi, psi)
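One of these structural features, the backbone dihedral angle, can be sketched from four atom coordinates. Phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The function below is a generic geometric calculation with hypothetical names, not the lab's feature-extraction code, and the coordinates in the usage lines are made up.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees) about the p1-p2 bond for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


# Made-up coordinates: the outer atoms are a quarter turn apart about the z axis.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(angle)
```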
18
Predictive features can be computed from the
evolutionary history of an amino acid residue
19
Computational biology takes advantage of 25,000
annotated missense mutations in databases and
publications
.0092 CYSTIC FIBROSIS: CFTR, ARG352GLN. In a
systematic study of 133 CF individuals in
northern Italy, Gasparini et al. (1993)
identified an arg352-to-gln mutation.
20
Comprehensive biochemical mutation analysis
Legend: neutral (functional) mutant; - deleterious
(non-functional) mutant; ts temperature-sensitive
mutant
Markiewicz et al. 1994
  • Published studies available for human p53, HIV
    protease, E. coli lac repressor, bacteriophage T4
    lysozyme and others (Markiewicz 1992, Rennell
    1991, Loeb 1989, Kato 2003)

21
Annotated missense mutations can help us quantify
the information content of predictive features
that can be represented on a computer
  • The missense mutants allow us to test hypotheses
    that a predictive feature is informative.
  • We can measure the correlation between the
    predictive feature of interest and each mutation's
    annotated functional class: functional or
    nonfunctional.
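One way to quantify how informative a discrete feature is about the functional class is mutual information, sketched below. Treating mutual information as the measure is an assumption on my part, and the feature values and labels in the usage lines are invented toy data, not taken from any of the cited mutant sets.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in bits) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    total = 0.0
    for (value, label), count in joint.items():
        p_joint = count / n
        # p(x, y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
        ratio = count * n / (feature_counts[value] * label_counts[label])
        total += p_joint * math.log2(ratio)
    return total


# Invented toy data: burial perfectly separates the two classes,
# while the second feature carries no information about them.
labels = ["deleterious", "deleterious", "neutral", "neutral"]
informative = mutual_information(["buried", "buried", "exposed", "exposed"], labels)
uninformative = mutual_information(["a", "b", "a", "b"], labels)
print(informative, uninformative)
```

Mutual information cannot exceed the entropy of the class labels, so that entropy gives a natural upper bound against which a feature's score can be compared.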

22
Which predictors contain the most information?
Mutant dataset
p53 is a transcription factor that regulates the
cell cycle
398 deleterious missense mutants
220 neutral missense mutants
Kato et al. PNAS 100(14), 2003
[Figure: information (random, actual, maximum) for
three predictors: amino acid change, protein
structure, evolutionary conservation]
Karchin, Kelly and Sali, Pac. Symp. Biocomput.
2005:397-408.
23
Predictors are combined in supervised machine
learning
Y: Deleterious/Neutral
X1..Xk: Predictors
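The setup of predictors X1..Xk and a binary label Y can be illustrated with a very small classifier. The sketch below uses logistic regression trained by gradient descent, chosen only because it fits in a few lines of standard-library Python; it is a stand-in, not the method used in the lecture. The two features (a conservation score and a buried flag), the data values, and the function names are all hypothetical.

```python
import math

# Hypothetical toy data: each row is (conservation score, buried flag).
X = [(0.9, 1.0), (0.8, 1.0), (0.95, 0.0), (0.2, 0.0), (0.1, 1.0), (0.3, 0.0)]
y = [1, 1, 1, 0, 0, 0]  # 1 = deleterious, 0 = neutral


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) - label
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(row, weights, bias):
    """Return 1 (deleterious) or 0 (neutral) for one feature row."""
    score = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)
    return 1 if score >= 0.5 else 0


weights, bias = train_logistic(X, y)
print(predict((0.9, 1.0), weights, bias), predict((0.1, 0.0), weights, bias))
```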

24
SVMs trained on biochemical or clinical mutant
sets can predict whether missense mutants have
functional impact
SVM Performance
Cross-validation training/testing sets
  • Clinically characterized mutants and dbSNP
    polymorphisms, 1840 human proteins
  • Biochemically characterized mutants, human p53
    (Kato et al. 2003)
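Cross-validation repeatedly holds out part of the labeled mutants for testing and trains on the rest. The sketch below shows only the index bookkeeping for k folds; the fold count and the helper name are assumptions, and a real evaluation would normally shuffle the data and balance the classes across folds first.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        ]
        yield train, test


# Ten hypothetical mutants split into five folds of two.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), "training items; test items:", test)
```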
25
Clinically and biochemically characterized
mutants can be used to train classifiers
26
Relevance of deleterious BRCA1 or BRCA2 mutation
BRCA1
BRCA2
Eeles et al. JCO, 2000
27
A possible scenario of BRCA testing with
BRACAnalysis full sequencing
  • Mother diagnosed last month with breast cancer
  • Paternal aunt with breast cancer died at 45

28
  • BRCA1 wildtype
  • BRCA2 variant of undetermined significance
    (VUS) H2116R

29
Can variants be classified based on proband
pedigrees?
Pedigree of proband's family
30
SVM predictions for BRCA1 missense mutants are
highly correlated with results of biochemical
assays
  • Predictions by SVM trained on biochemically
    characterized P53 mutants are in agreement with
    transactivation assay for 27 out of 28 mutants in
    BRCT domains.

Monteiro TIBS 2000
BRCT DOMAINS
Karchin, Monteiro et al. PLoS Compbio 2007
32
Evaluation of classifiers by agreement with BRCA1
in vitro functional assay
  • Supervised learning algorithms using sequence
    and structure: Naïve Bayes, Support Vector
    Machine, Random Forest, Decision Tree
  • Evolutionary-based methods using only sequence
    analysis: Ancestral Sequence, AGVGD to sea
    urchin, AGVGD to Tetraodon, SIFT
  • Rule-based methods using sequence and
    structure: Rule-based decision tree
33
Predicted deleterious mutations provide a clue to
a previously uncharacterized BRCA1 binding site
34
BRCA1 interaction partners
BRCA1
image courtesy of Dr. Alvaro Monteiro
35
BRCA1 and CBP/p300 interaction?
According to Pao et al., two BRCA1 fragments
(1-303 and 1314-1863) interact with residues
451-721 of CBP. This region of CBP contains
coiled-coils and the KIX domain. However, the
best in our model was of BRCA1 BRCTs in complex
with the bromodomain of p300 (known to have
acetyl-transferase activity). PatchDock was
rerun using the CBP KIX domain (PDB 1kdx)
with all the candidate patches on the
BRCTs that are hotspots for
transactivation-deficient mutants in our assay.
The CBP KIX domain fits well
at the phosphopeptide binding site, although
interestingly, BRCA1-CBP binding has been
shown to be phosphorylation-independent.
37
Computational biology approach to missense mutant
impact can be scaled up to scan an entire genome
  • Prioritize variants to be genotyped and tested in
    a candidate-gene association study
  • Predict causal variant in a region identified by
    candidate gene study
  • Provide a resource that biochemists can use to
    develop hypotheses about molecular mechanisms
    underlying the deleterious effects of variants

38
A pipeline for large-scale non-synonymous SNP
annotation
dbSNP
UniProt Proteins
RefSeq mRNA
Predicted Genes
Hsu et al. 2006
Genomic DNA
SNPs
Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
39
nsSNPs are identified by projection mapping of
proteins to genome
Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
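The idea of projecting protein positions onto genomic coordinates can be sketched as coordinate arithmetic over the coding blocks of a transcript. The functions below are a simplified illustration under stated assumptions: forward strand only, 0-based half-open coding intervals, and no alignment gaps. The names and the example coordinates are hypothetical, and this is not the pipeline described by Ryan et al.

```python
def codon_genomic_positions(residue_index, cds_blocks):
    """Genomic positions of the three codon bases for a 1-based residue index.

    cds_blocks: (start, end) half-open coding intervals in transcript order.
    """
    first = (residue_index - 1) * 3
    wanted = (first, first + 1, first + 2)
    positions = []
    offset = 0
    for start, end in cds_blocks:
        length = end - start
        for cds_pos in wanted:
            if offset <= cds_pos < offset + length:
                positions.append(start + cds_pos - offset)
        offset += length
    if len(positions) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return positions


def residue_for_genomic_position(genomic_pos, cds_blocks):
    """Return (residue index, offset within codon), or None if non-coding."""
    offset = 0
    for start, end in cds_blocks:
        if start <= genomic_pos < end:
            cds_pos = offset + genomic_pos - start
            return cds_pos // 3 + 1, cds_pos % 3
        offset += end - start
    return None


# Hypothetical transcript with two coding blocks; residue 2 spans the intron.
blocks = [(100, 104), (200, 210)]
print(codon_genomic_positions(2, blocks))
print(residue_for_genomic_position(200, blocks))
```

A SNP whose genomic position maps to a residue in this way is a candidate nsSNP once the codon change is checked against the genetic code, a step this sketch leaves out.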
40
http://ls-snp.icm.jhu.edu/ls-snp-pdb/
42
Search criteria: Chr 11q13, moderate/radical
amino acid change, medium/high conservation,
domain interface proximity
  • nsSNP (D->A) in barrier-to-autointegration factor
    (BAF) protein (protects DNA from retrovirus
    integration)
  • D is a critical residue in a network of salt
    bridges, which connect two halves of the BAF
    homodimer
  • By decreasing stability of BAF, A could improve
    the resistance to viral infection in individuals
    with this variant.

Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
43
Limitations of large-scale classification of
variants
  • Black-box (not interpretable)
  • Tuned to decisions of the method's designers,
    not the users
  • Selection of important features
  • Choice of training and/or validation sets
  • Make user-configurable plug-in classifiers?

44
What next?
  • Integrating sequence variation information with
    measurements of:
      • mRNA transcript abundance
      • DNA methylation
      • miRNA abundance
      • exon expression levels
      • copy number variation