Title: What do we do with these extra people Navigating the functional universe
1 Lecture 8: Research in the Karchin Lab
Annotating human genomic variation with computational
biology
Rachel Karchin, BME 580.688/580.488 and CS
600.488, Spring 2009
2 Programming Assignment 2 run times on Influenza
genome
Max 367 s, Min 13.1 s
3 Human genome project revealed remarkable genetic
similarity between individual humans
Image courtesy of genome.gov
- Our DNA is 99.9% identical
- The 0.1% that makes us different is a key to
inter-individual disease susceptibilities and
sensitivity to medications
4 In the future, many human frailties will be
linked to inherited genetic mutations
5 A single inherited amino acid change can result
in sickle cell anemia or fatal bleeding events
with low dosage of warfarin
- Hemoglobin, E6V: sickle cell anemia
- Cytochrome P450, R144C: warfarin-induced bleeding
(sensitivity to medications)
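Labels such as R144C pack three facts into one token: the wild-type residue, its position in the protein, and the mutant residue. A minimal sketch of reading that notation follows; `parse_missense` and `Missense` are hypothetical names introduced here for illustration, not part of any tool mentioned in the lecture.

```python
import re
from typing import NamedTuple

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


class Missense(NamedTuple):
    wild_type: str  # residue in the reference protein
    position: int   # 1-based residue number
    mutant: str     # residue observed in the variant


def parse_missense(label: str) -> Missense:
    """Parse a label such as 'R144C' (hypothetical helper)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label.strip())
    if match is None:
        raise ValueError(f"not a missense label: {label!r}")
    wild_type, mutant = match.group(1), match.group(3)
    if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {label!r}")
    if wild_type == mutant:
        raise ValueError(f"{label!r} does not change the residue")
    return Missense(wild_type, int(match.group(2)), mutant)


print(parse_missense("R144C"))
```

The same helper reads H2116R as histidine at position 2116 replaced by arginine.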
6 There are many kinds of common genetic variants
in different regions of our DNA
12 Most known disease-associated human genetic
variants are missense mutants
R144C
13 There are many ways that missense mutants can
impact protein function
- Protein aggregates and does not fold
- Protein is destabilized and unfolds
- Binding interfaces are disrupted
- Active sites are disrupted
Williams and Glover, 2003
14 There are many ways that the functional impact of
a missense mutant can be assessed
- Biochemical experiments
- Physics
- Epidemiology
- Computational biology
15 Some computational biology methods for missense
mutant function prediction
(reference: features used; classification method)
- Ng and Henikoff, 2001 (SIFT): PSSM; substitution likelihood threshold
- Sunyaev et al., 2001: evolution, structure; empirical rules
- Chasman and Adams, 2001: evolution, structure; probabilistic
- Saunders and Baker, 2002: PSSM, solvent accessibility; decision tree, logistic regression
- Krishnan and Westhead, 2003: sequence, evolution, structure; decision tree, SVM
- Cai et al., 2004: sequence, evolution, structure; Bayesian network
16 Properties of wild-type and mutant amino acids
are a simple predictive feature
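As a rough illustration of such a feature, the sketch below compares the two residues of a substitution by hydropathy and formal charge. The Kyte-Doolittle scale and the choice of these two properties are assumptions made for the example; the lecture does not say which amino acid properties were actually used.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Formal side-chain charge near neutral pH.
# Assumption: histidine is treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def property_features(wild_type, mutant):
    """Return simple property differences (mutant minus wild type)."""
    return {
        "hydropathy_change": round(HYDROPATHY[mutant] - HYDROPATHY[wild_type], 1),
        "charge_change": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
    }


# R144C replaces a charged, hydrophilic arginine with a cysteine.
print(property_features("R", "C"))
```

For R to C this gives a hydropathy change of 7.0 and a charge change of -1, the kind of large shift a classifier could pick up on.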
17 Predictive features can be computed from protein
tertiary structure
X-ray crystal structure of E. coli Lac Repressor
- Solvent accessibility (Å²)
- Methyl(ene) groups within 6 Å
- Backbone dihedral angles (phi, psi)
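Of the structural features named on this slide, the backbone dihedral angles are the easiest to compute directly from atomic coordinates. The sketch below uses the standard atan2 formula for the signed angle defined by four points; phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1). The coordinates in the example are made up to give an easy-to-check angle and are not taken from the Lac repressor structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points in 3D."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the outer bonds are perpendicular when viewed
# along the central bond, so the angle is a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Passing the four backbone atoms of a residue in the order above yields its phi or psi value.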
18 Predictive features can be computed from the
evolutionary history of an amino acid residue
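One common way to turn evolutionary history into a number is to score how variable a residue position is across aligned homologous sequences. The sketch below computes the Shannon entropy of one alignment column; this is offered as an assumed example of an evolutionary feature, not necessarily the one used in the lecture, and the short alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column, ignoring gaps."""
    residues = [c for c in column if c not in "-."]
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Invented alignment of four homologs, one string per sequence.
alignment = ["MRKA", "MRRA", "MRKG", "MRR-"]
for position, column in enumerate(zip(*alignment), start=1):
    print(position, "".join(column), round(column_entropy(column), 3))
```

Low entropy means the position is conserved, so a substitution there is more likely to matter; high entropy means evolution has tolerated many residues at that site.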
19 Computational biology takes advantage of 25,000
annotated missense mutations in databases and
publications
.0092 CYSTIC FIBROSIS
CFTR, ARG352GLN
In a systematic study of 133 CF individuals in
northern Italy, Gasparini et al. (1993)
identified an arg352-to-gln mutation.
20 Comprehensive biochemical mutation analysis
Legend: neutral (functional) mutant; - deleterious
(non-functional) mutant; ts temperature-sensitive
mutant
Markiewicz et al. 1994
- Published studies available for human p53, HIV
protease, E. coli lac repressor, bacteriophage T4
lysozyme and others (Markiewicz 1992, Rennell
1991, Loeb 1989, Kato 2003)
21 Annotated missense mutations can help us quantify
the information content of predictive features
that can be represented on a computer
- The missense mutants allow us to test hypotheses
that a predictive feature is informative.
- We can measure the correlation between the predictive
feature of interest and each mutation's annotated
functional class (functional or nonfunctional).
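One way to put a number on "information content" is the mutual information between a discretized feature and the annotated class. The sketch below estimates it from paired observations; treating this as the measure of choice is an assumption, and the feature values and labels in the example are invented.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Estimate I(feature; class) in bits from paired discrete observations."""
    n = len(features)
    joint = Counter(zip(features, labels))
    feature_counts = Counter(features)
    label_counts = Counter(labels)
    total = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        total += (count / n) * math.log2(
            count * n / (feature_counts[x] * label_counts[y]))
    return total


# Invented toy data: a feature that separates the classes perfectly ...
labels = ["deleterious", "deleterious", "neutral", "neutral"]
print(mutual_information(["buried", "buried", "exposed", "exposed"], labels))
# ... and one that carries no information about the class.
print(mutual_information(["buried", "exposed", "buried", "exposed"], labels))
```

The value can never exceed the entropy of the class labels themselves, which gives a natural maximum; a feature shuffled at random against the labels gives a baseline near zero.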
22 Which predictors contain the most information?
Mutant dataset: p53, a transcription factor that regulates the
cell cycle
398 deleterious missense mutants, 220 neutral
missense mutants
Kato et al. PNAS 100(14), 2003
[Figure: information (random, actual, maximum) for three
predictor types: amino acid change, protein structure,
evolutionary conservation]
Karchin, Kelly and Sali, Pac. Symp. Biocomput.
2005: 397-408.
23 Predictors are combined in supervised machine
learning
Y: Deleterious/Neutral
X1..Xk: Predictors
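To make the Y and X1..Xk picture concrete, here is a minimal sketch of a linear support vector machine trained by subgradient descent on the regularized hinge loss. It is a teaching toy under stated assumptions: the two predictors, their values, and the hyperparameters are invented, and real work would use an established SVM library, typically with a kernel.

```python
def train_linear_svm(samples, labels, epochs=200, rate=0.1, reg=0.01):
    """Fit weights and bias for a linear SVM.

    labels must be +1 (deleterious) or -1 (neutral).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Regularization: shrink the weights a little on every step.
            weights = [w - rate * reg * w for w in weights]
            if margin < 1:
                # Inside the margin or misclassified: push the sample out.
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias


def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Invented predictors scaled to [0, 1]: X1 = conservation, X2 = buriedness.
X = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]
Y = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(X, Y)
print(predict(weights, bias, (0.95, 0.9)), predict(weights, bias, (0.05, 0.1)))
```

The learned weights show how much each predictor contributes to the decision, which is one reason a linear model is a convenient first look before moving to kernels.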
24 SVMs trained on biochemical or clinical mutant
sets can predict whether missense mutants have
functional impact
SVM Performance
Cross-validation training/testing set
- Clinically characterized mutants and dbSNP
polymorphisms, 1840 human proteins
- Biochemically characterized mutants, human p53
(Kato et al. 2003)
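Cross-validation holds out each part of the mutant set in turn, so performance is always measured on mutants the classifier did not see during training. A minimal sketch of the index bookkeeping follows; the fold count of 10 and the fixed shuffle seed are assumptions, since the slide does not give the protocol, and the split is not stratified by class.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


# 398 deleterious + 220 neutral p53 mutants = 618 items.
for train, test in k_fold_splits(618, 10):
    print(len(train), len(test))
```

Each mutant lands in exactly one test fold, so every prediction that enters the performance estimate comes from a model trained without that mutant.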
25 Clinically and biochemically characterized
mutants can be used to train classifiers
26 Relevance of deleterious BRCA1 or BRCA2 mutation
BRCA1
BRCA2
Eeles et al. JCO, 2000
27 A possible scenario of BRCA testing with
BRACAnalysis full sequencing
- Mother diagnosed last month with breast cancer
- Aunt with breast cancer was paternal, died at 45
28
- BRCA1 wildtype
- BRCA2 variant of undetermined significance
(VUS) H2116R
29 Can variants be classified based on proband
pedigrees?
Pedigree of proband's family
30 SVM predictions for BRCA1 missense mutants are
highly correlated with results of biochemical
assays
- Predictions by an SVM trained on biochemically
characterized p53 mutants agree with the
transactivation assay for 27 out of 28 mutants in
the BRCT domains.
Monteiro TIBS 2000
BRCT domains
Karchin, Monteiro et al. PLoS Comput Biol 2007
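The 27-of-28 figure is an agreement rate between predicted and assayed classes. A small sketch of that calculation follows; the label lists are placeholders built only to reproduce the count and do not reflect which BRCT mutant disagreed or what its true class was.

```python
def agreement(predictions, assay_results):
    """Return (matches, total, fraction) for two equal-length label lists."""
    if len(predictions) != len(assay_results):
        raise ValueError("label lists must have the same length")
    matches = sum(p == a for p, a in zip(predictions, assay_results))
    return matches, len(predictions), matches / len(predictions)


# Placeholder labels: 28 mutants with a single disagreement.
assay = ["deleterious"] * 28
predicted = assay[:27] + ["neutral"]
matches, total, fraction = agreement(predicted, assay)
print(f"{matches}/{total} = {fraction:.1%}")
```

With real per-mutant labels, the same function could compare several classifiers against one assay.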
32 Evaluation of classifiers by agreement with BRCA1
in vitro functional assay
- Supervised learning algorithms using sequence
and structure: Naïve Bayes, Support Vector Machine,
Random Forest, Decision Tree
- Evolutionary-based methods using only sequence
analysis: Ancestral Sequence, AGVGD to sea urchin,
AGVGD to Tetraodon, SIFT
- Rule-based methods using sequence and structure:
Rule-based decision tree
33 Predicted deleterious mutations provide clue to
previously uncharacterized BRCA1 binding site
34 BRCA1 interaction partners
BRCA1
image courtesy of Dr. Alvaro Monteiro
35 BRCA1 and CBP/P300 interaction?
According to Pao et al., two BRCA1 fragments
(1-303 and 1314-1863) interact with residues
451-721 of CBP. This region of CBP contains
coiled-coils and the KIX domain. However, the
best in our model was of BRCA1 BRCTs in complex
with the bromodomain of p300 (known to have
acetyl-transferase activity). PatchDock was rerun
using the CBP KIX domain (PDB 1kdx) with all the
candidate patches on the BRCTs that are hotspots
for transactivation-deficient mutants in our
assay. The CBP KIX fits well at the phosphopeptide
binding site, although, interestingly, the
BRCA1-CBP binding has been shown to be
phosphorylation-independent.
37 Computational biology approach to missense mutant
impact can be scaled up to scan an entire genome
- Prioritize variants to be genotyped and tested in
a candidate-gene association study
- Predict causal variant in a region identified by
candidate gene study
- Provide a resource that biochemists can use to
develop hypotheses about molecular mechanisms
underlying the deleterious effects of variants
38 A pipeline for large-scale non-synonymous SNP
annotation
[Figure: pipeline diagram; labels: dbSNP, UniProt proteins,
RefSeq mRNA, predicted genes (Hsu et al. 2006), genomic DNA,
SNPs]
Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
39 nsSNPs are identified by projection mapping of
proteins to genome
Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
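Once a SNP has been placed within a coding sequence, deciding whether it is non-synonymous comes down to translating the affected codon before and after the change. The sketch below shows only that final step under the standard genetic code; it assumes the SNP offset is already expressed in coding-sequence coordinates, does not attempt the projection mapping itself, and uses an invented nine-base sequence.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(cds, offset, alt_base):
    """Classify a SNP at a 0-based offset in an in-frame coding sequence.

    Returns (kind, label), e.g. ("missense", "E2V").
    """
    start = offset - offset % 3
    ref_codon = cds[start:start + 3]
    within = offset % 3
    alt_codon = ref_codon[:within] + alt_base + ref_codon[within + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}"
    if ref_aa == alt_aa:
        return "synonymous", label
    if alt_aa == "*":
        return "stop_gained", label
    if ref_aa == "*":
        return "stop_lost", label
    return "missense", label


cds = "ATGGAGGTT"  # invented sequence encoding Met-Glu-Val
print(classify_coding_snp(cds, 4, "T"))  # GAG -> GTG
print(classify_coding_snp(cds, 5, "A"))  # GAG -> GAA
```

Only changes classified as missense would go on to the structure- and conservation-based annotation discussed in the earlier slides.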
40 http://ls-snp.icm.jhu.edu/ls-snp-pdb/
41 http://ls-snp.icm.jhu.edu/ls-snp-pdb/
42 Search criteria: Chr 11 q13, moderate/radical
amino acid change, medium/high conservation,
domain interface proximity
- nsSNP (D->A) in barrier-to-autointegration factor
(BAF) protein (protects DNA from retrovirus
integration)
- D is a critical residue in a network of salt
bridges, which connect two halves of the BAF
homodimer
- By decreasing stability of BAF, A could improve
the resistance to viral infection in individuals
with this variant.
Ryan, Diekhans, Lien, Liu, Karchin
Bioinformatics 2009
43 Limitations of large-scale classification of
variants
- Black-box (not interpretable)
- Tuned to decisions of the methods' designers, not
the users
  - Selection of important features
  - Choice of training and/or validation sets
- Make user-configurable plug-in classifiers?
44 What next?
- Integrating sequence variation information with
measurements of
  - mRNA transcript abundance
  - DNA methylation
  - miRNA abundance
  - exon expression levels
  - copy number variation