SNP Discovery and Analysis: - PowerPoint PPT Presentation

Provided by: markjr
1
SNP Discovery and Analysis: Application to Association Studies
Mark J. Rieder, PhD; Dana Crawford, PhD; Deborah Nickerson, PhD
SeattleSNPs PGA, July 19-20, 2005
2
Practical Aspects of SNP Association Studies
  • SNP Discovery
  • Where do I find SNPs to use in my association
    studies?
  • (e.g. databases, direct resequencing)
  • SNP Selection
  • How do I choose SNPs that are informative?
  • (i.e. assessing SNP correlation - linkage
    disequilibrium)
  • SNP Associations
  • What analyses can I perform after genotyping
    these SNPs?
  • (e.g. single SNP data, haplotype data)
  • SNP Replication/Function
  • How is function predicted or assessed?
  • (e.g. nonsynonymous SNPs, conserved non-coding regions (CNS),
    transcription factor binding sites, gene expression)

3
SeattleSNPs Program for Genomic Applications: Overview
Aim 1: To establish a variation discovery resource capable of
comprehensive resequencing of candidate genes related to HLBS
(heart, lung, blood, and sleep disorders).

Biological focus: inflammation genes and pathways (coagulation,
complement, cytokines, interacting partners)
4
SNPs in Candidate Genes
Average gene size - 26.5 kb
Compare 2 haploid - 1 in 1,200 bp
130 SNPs (1 in 200 bp) - 15,000,000 SNPs
44 SNPs > 0.05 MAF (1 in 600 bp) - 6,000,000 SNPs
5
SeattleSNPs PGA Candidate Gene SNP Resource
  • 4.9 Mb in 47 individuals = 230 Mb total sequence
  • Define sequence diversity - catalogue all SNPs
  • Select optimal tagSNP sets
  • Determine haplotype structure
  • Provide necessary baseline data for association
    studies

6
Warfarin Pharmacogenetics
  • Background
  • Warfarin characteristics
  • Pharmacokinetics/Pharmacodynamics
  • Discovery of VKORC1
  • VKORC1 - SNP Discovery
  • VKORC1 - SNP Selection (tagSNPs)
  • VKORC1 - SNP Testing
  • SNP/Haplotype Inference
  • Haplotype Inference, Testing
  • VKORC1 - SNP Replication/Function

7
Pharmacogenomics as a Model for Association
Studies
  • Clear genotype-phenotype link: intervention, variable response
  • Pharmacokinetics - 5x variation
  • Quantitative intervention and response: drug dose, response time,
    metabolism rate, etc.
  • Target/metabolism of drug generally known: gene target that can be
    tested directly with response
  • Reduce variability and identify outliers
  • Prospective testing: personalized medicine
8
Warfarin Background
  • Commonly prescribed oral anti-coagulant
  • In 2003, 21.2 million prescriptions were written for warfarin
    (Coumadin®)
  • Prescribed following MI, atrial fibrillation, stroke, venous
    thrombosis, prosthetic heart valve replacement, and following
    major surgery
  • Difficult to determine effective dosage
  • Narrow therapeutic range
  • Monitoring of prothrombin time (INR): 2.0 - 3.0
  • Large inter-individual variation

9
[Figure: warfarin dose distribution]
Average 5.2 mg/d (n = 186, European-American)
30x dose variability
  • Patient/Clinical/Environmental Factors
  • Pharmacokinetic/Pharmacodynamic - Genetic

10
Warfarin inhibits the vitamin K cycle
Vitamin K-dependent clotting factors (FII, FVII,
FIX, FX, Protein C/S/Z)
11
Warfarin Metabolism (Pharmacokinetics)
  • Major pathway for termination of pharmacologic effect is through
    metabolism of S-warfarin in the liver by CYP2C9
  • CYP2C9 SNPs alter warfarin metabolism
  • CYP2C9*1 (WT) - normal
  • CYP2C9*2 (Arg144Cys) - low/intermediate
  • CYP2C9*3 (Ile359Leu) - low
  • CYP2C9 alleles occur at a significant minor allele frequency
  • European: *2 - 10.7%, *3 - 8.5%
  • Asian: *2 - 0%, *3 - 1-2%
  • African-American: *2 - 2.9%, *3 - 0.8%

12
Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes
(Higashi et al., JAMA 2002)
WARFARIN MAINTENANCE DOSE
mg warfarin/day
- Variant alleles have significant clinical impact
- Still large variability in warfarin dose (15-fold) in *1/*1 controls?
13
Analysis of Independent Predictors of Warfarin
Dose
Adapted from Gage et al., Thromb Haemost, 2004
Variable                               Change in Warfarin Dose   P value
Target INR, per 0.5 increase           21%                       <0.0005
BMI, per SD                            14%                       <0.0001
Ethnicity (African-American, Asian)    13%, 10-15%               0.003
Age, per decade                        13%                       <0.0001
Gender, Female                         12%                       <0.0001
Drugs (Amiodarone)                     24%                       0.007
CYP2C9*2, per allele                   19%                       <0.0001
CYP2C9*3, per allele                   30%                       <0.0001

30% of the variability in warfarin dose is explained by these factors
What other candidate genes are influencing
warfarin dosing?
14
Warfarin acts as a vitamin K antagonist
Pharmacodynamic
CYP2C9
Inactivation
Vitamin K-dependent clotting factors (FII, FVII,
FIX, FX, Protein C/S/Z)
15
New Target Protein for Warfarin
γ-Carboxylase (GGCX)
Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z)
Rost et al.; Li et al., Nature (2004)
5 kb - chr 16
16
Warfarin Resistance VKORC1 Polymorphisms
Rost et al., Nature (2004)
  • Rare non-synonymous mutations in VKORC1
    causative for warfarin resistance (15-35 mg/d)
  • NO non-synonymous mutations found in control
    chromosomes (n = 400)

17
Inter-Individual Variability in Warfarin Dose
Genetic Liabilities
SENSITIVITY: CYP2C9 coding SNPs - *3/*3
RESISTANCE: VKORC1 nonsynonymous coding SNPs
Frequency
Common VKORC1 non-coding SNPs?
Warfarin maintenance dose (mg/day)
18
SNP Discovery Resequencing VKORC1
  • PCR amplicons -> resequencing of the complete genomic region
  • 5 kb upstream plus each of the 3 exons and intronic segments: 11 kb
  • SeattleSNPs PGA - pga.gs.washington.edu (24 African-Am./23
    Europeans)
  • Warfarin-treated clinical patients (UWMC): 186 European
  • Other populations: 96 European, 96 African-Am., 120 Asian

19
SNP Discovery Resequencing Results
Summary of PGA samples (European, n = 23)
Total: 13 SNPs identified - 10 common / 3 rare (<5% MAF)
Clinical samples (European patients, n = 186)
Total: 28 SNPs identified - 10 common / 18 rare (<5% MAF)
15 - intronic/regulatory
7 - promoter SNPs
2 - 3' UTR SNPs
3 - synonymous SNPs
1 - nonsynonymous - single heterozygous individual - highest warfarin
dose, 15.5 mg/d
How does the comprehensive SNP discovery compare to what was known for
this gene?
20
SNP Discovery dbSNP database
dbSNP -NCBI SNP database
21
SNP Discovery dbSNP database (VKORC1)
  • SeattleSNPs resequencing
  • 28 SNPs -> 15 SNPs in gene region
  • 10 dbSNPs
  • 8/10 confirmations
  • 3 with frequency/genotype data
  • 7 new dbSNP entries generated by SeattleSNPs resequencing
  • 8 dbSNPs/15 SNPs (50%)

22
SNP Discovery dbSNP database
Mar 2005 - 5.0 million (validated - 1/600 bp)
5.0/10.0 = 50% of all common SNPs (validated)!
23
SNP discovery is dependent on your sample
population size

GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
2 chromosomes
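The two chromosomes on this slide differ at one base, and the slide's point is that the number of such sites you find depends on how many chromosomes you sequence. The following is a minimal Python sketch, not part of the presentation: the function names are illustrative, and the detection formula assumes a biallelic site with chromosomes sampled independently at the stated minor allele frequency.

```python
def find_mismatches(seq_a, seq_b):
    """Return (1-based position, base_a, base_b) where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def detection_probability(maf, n_chromosomes):
    """Chance that both alleles of a biallelic site appear among n sampled chromosomes.

    Assumes independent sampling; this is an illustrative model, not the
    presenters' calculation.
    """
    return 1.0 - (1.0 - maf) ** n_chromosomes - maf ** n_chromosomes


# The two chromosomes shown on the slide.
chrom_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
chrom_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

if __name__ == "__main__":
    print(find_mismatches(chrom_1, chrom_2))
    # 2 chromosomes, 23 individuals (46), 47 individuals (94)
    for n in (2, 46, 94):
        print(n, round(detection_probability(0.05, n), 3))
```

Under these assumptions, a site with 5% MAF shows up as polymorphic in only about 9.5% of two-chromosome comparisons, while 94 chromosomes (47 individuals) detect it more than 99% of the time.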
24
SNP Discovery dbSNP database
dbSNP (Perlegen/HapMap)
50
Minor Allele Freq. (MAF)
Rarer and population specific SNPs are found by
resequencing
25
dbSNP Increasing numbers of SNPs now have
genotype data
HapMap Phase II Perlegen
Perlegen Data
26
Current State of dbSNP
Many SNPs left to validate and characterize.
27
Development of a genome-wide SNP map: How many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
10 million common SNPs (> 1-5% MAF) - 1/300 bp
Mar 2005 - 5.0 million (validated - 1/600 bp)
5.0/10.0 = 50% of all common SNPs validated!
Coming soon! 5.0 million validated SNPs with genotypes!
28
SNP Discovery dbSNP database
dbSNP issues:
  • Not a comprehensive catalog (50% of SNPs)
  • Is the data confirmed? (50% are validated)
  • Information about allele frequency/population (50%)
  • No information about SNP correlations (linkage disequilibrium),
    genotyping efficiency
29
SNP Selection Using Linkage Disequilibrium
  • Common SNPs
  • VKORC1 - 28 total - 10 SNPs > 10% MAF
  • Evaluate linkage disequilibrium (non-random association of
    genotype data)

Does common variation in VKORC1 have a role in
determining warfarin dose?
30
SNP Selection Using Linkage Disequilibrium
              Site 1   Site 2
Maternal      C        A
Paternal      T        G
Allele frequencies: Site 1 - C 50%, T 50%; Site 2 - A 50%, G 50%

Possible 2-site comb.   Expected Freq.      Observed Freq.
C A                     0.5 x 0.5 = 0.25    0.50
C G                     0.5 x 0.5 = 0.25    0.01
T A                     0.5 x 0.5 = 0.25    0.01
T G                     0.5 x 0.5 = 0.25    0.48
Sites Correlated
31
SNP Selection Using Linkage Disequilibrium
  • SNP discovery data (i.e. population of samples
    with genotypes)
  • Find all correlated SNPs to minimize the total
    number of SNPs
  • Maintains genetic information (correlations) for
    that locus

LD_Select - SNP tagging/binning algorithm - based on LD (r²), not
haplotypes
Carlson et al., AJHG (2004)
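The binning idea above can be sketched in Python. This is a simplified greedy illustration in the spirit of r²-based binning, not the LD_Select implementation itself; it omits details of the published algorithm, such as choosing which SNPs within a bin qualify as tagSNPs. The five haplotype strings are the ones shown on the VKORC1 clade slide later in this deck. Three things are assumptions made for illustration: that the ten columns are the ten common SNPs in position order, the r² threshold of 0.8, and the chromosome counts, which are hypothetical.

```python
# Haplotypes as listed on the VKORC1 clade slide of this deck.
HAPLOTYPES = {
    "H1": "CCGATCTCTG",
    "H2": "CCGAGCTCTG",
    "H7": "TCGGTCCGCA",
    "H8": "TAGGTCCGCA",
    "H9": "TACGTTCGCG",
}
# ASSUMPTION: the ten haplotype columns are these ten sites in position order.
POSITIONS = [381, 861, 2653, 3673, 5808, 6009, 6484, 6853, 7566, 9041]
# ASSUMPTION: hypothetical chromosome counts, invented for illustration only.
COUNTS = {"H1": 10, "H2": 25, "H7": 35, "H8": 10, "H9": 20}


def r_squared(x, y):
    """r^2 between two biallelic sites, each coded 0/1 per chromosome."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a and b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    return (pxy - px * py) ** 2 / denom if denom > 0 else 0.0


def greedy_bins(positions, chromosomes, threshold=0.8):
    """Repeatedly bin the site with the most partners at r^2 >= threshold."""
    ref = chromosomes[0]
    # Code each site as 1 where a chromosome matches the first chromosome's allele.
    coded = [[int(c[i] == ref[i]) for c in chromosomes] for i in range(len(positions))]
    r2 = [[r_squared(a, b) for b in coded] for a in coded]
    remaining = list(range(len(positions)))
    bins = []
    while remaining:
        best = max(
            remaining,
            key=lambda i: sum(r2[i][j] >= threshold for j in remaining if j != i),
        )
        members = [j for j in remaining if j == best or r2[best][j] >= threshold]
        bins.append([positions[j] for j in members])
        remaining = [j for j in remaining if j not in members]
    return bins


# Expand the haplotype table into one string per chromosome.
chromosomes = [seq for name, seq in HAPLOTYPES.items() for _ in range(COUNTS[name])]
bins = greedy_bins(POSITIONS, chromosomes)

if __name__ == "__main__":
    for number, members in enumerate(bins, start=1):
        print(f"Bin {number}: {members}")
```

With these inputs the sketch groups the ten sites into five bins, which can be compared with the "Five Bins to Test" slide. Real data would use observed genotypes rather than invented counts.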
32
SNP Selection VG/LD_Select on the Web
pga.gs.washington.edu/VG2
33
SNP Selection tagSNP Data
34
SNP Selection VKORC1 tagSNPs
35
SNP Testing VKORC1 tagSNPs
  • Five Bins to Test
  • Bin 1: 381, 3673, 6484, 6853, 7566
  • Bin 2: 2653, 6009
  • Bin 3: 861
  • Bin 4: 5808
  • Bin 5: 9041

Bin 1 - p < 0.001
Bin 2 - p < 0.02
Bin 3 - p < 0.01
Bin 4 - p < 0.001
Bin 5 - p < 0.001
SNP x SNP interactions - haplotype analysis?
36
VKORC1 Summary SNP Discovery/SNP Selection
  • VKORC1 candidate gene for warfarin dose response
  • SNP discovery performed using PCR/resequencing to catalog
    common SNPs
  • 28 SNPs found
  • 10 common SNPs
  • SNP discovery using dbSNP
  • 8/10 dbSNPs confirmed
  • 7 new SNPs added
  • SNP Selection using linkage disequilibrium
  • 10 common SNPs (> 10% MAF)
  • 5 informative SNPs for genotyping

37
Haplotypes in Genetic Association Studies
Two main approaches with haplotypes
38
Haplotypes in Genetic Association Studies
  • How can you get haplotypes?
  • What information do you get from haplotypes?
  • How do you use haplotypes to find tagSNPs?
  • How do you use haplotypes to test for associations?

39
Haplotypes The Definition
"A unique combination of genetic markers present in a chromosome."
(p. 57 in Hartl & Clark, 1997)
40
Constructing Haplotypes
41
Constructing Haplotypes
Examples of Haplotype Inference Software
EM Algorithm
Haploview: http://www.broad.mit.edu/mpg/haploview/index.php
Arlequin: http://lgb.unige.ch/arlequin/
PHASE v2.1: http://www.stat.washington.edu/stephens/software.html
HAPLOTYPER: http://www.people.fas.harvard.edu/junliu/Haplo/docMain.htm
42
Haplotypes in SeattleSNPs
  • >200 genes re-sequenced in inflammation response
  • 2 populations: European- and African-Americans
  • PHASEv2.0 results posted on website
  • Interactive tool (VH1) to visualize and sort
    haplotypes

http://pga.gs.washington.edu
43
Haplotypes in SeattleSNPs
44
Haplotypes in SeattleSNPs
45
Haplotypes in SeattleSNPs
46
Haplotypes in SeattleSNPs
47
Haplotypes in SeattleSNPs
48
Haplotypes in SeattleSNPs
49
Haplotypes in SeattleSNPs
50
Haplotypes in SeattleSNPs
51
Haplotypes in SeattleSNPs
52
Haplotypes in SeattleSNPs
53
Haplotypes in SeattleSNPs
54
Haplotypes in Genetic Association Studies
Two main approaches with haplotypes
1. Haplotypes -> Pick tagSNPs -> Genotype samples
2. Pick tagSNPs -> Infer haplotypes -> Test for association
55
Measuring Pair-wise SNP Correlations
  • SNP correlation described by linkage disequilibrium (LD)
  • Pair-wise measures of LD: D' and r²
  • D = pAB - pA·pB; D' = D/Dmax (recombination)
  • r² = D² / [f(A1) f(A2) f(B1) f(B2)] (power)
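As a worked illustration of these measures, the sketch below applies them to the observed two-site frequencies from the earlier "Sites Correlated" example (C A 0.50, C G 0.01, T A 0.01, T G 0.48). The function name is illustrative, and D' is reported as |D|/Dmax, which is one common convention and an assumption here.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for allele A at site 1 and allele B at site 2."""
    d = p_ab - p_a * p_b
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Observed haplotype frequencies from the two-site example in this deck.
observed = {"CA": 0.50, "CG": 0.01, "TA": 0.01, "TG": 0.48}
p_c = observed["CA"] + observed["CG"]  # frequency of C at site 1
p_a = observed["CA"] + observed["TA"]  # frequency of A at site 2
d, d_prime, r2 = ld_measures(observed["CA"], p_c, p_a)

if __name__ == "__main__":
    print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

For these frequencies D is about 0.24, D' about 0.96, and r² about 0.92, consistent with the slide's conclusion that the two sites are strongly correlated.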

56
Using LD and Haplotypes to Pick tagSNPs
  • r² is inversely related to power: required sample size scales as 1/r²
  • 1,000 cases and 1,000 controls at r² = 1.0
  • 1,250 cases and 1,250 controls at r² = 0.80
  • D' is related to recombination history
  • D' = 1: no recombination
  • D' < 1: historical recombination

Example LDSelect
Example Haplotype blocks
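The 1/r² relationship on this slide can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function name is illustrative, and rounding to the nearest whole sample is an assumption.

```python
def required_sample_size(n_direct, r2):
    """Samples needed when a tagSNP with correlation r2 stands in for the causal SNP.

    n_direct is the sample size that would suffice if the causal SNP
    itself were genotyped (r2 = 1.0).
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return round(n_direct / r2)


if __name__ == "__main__":
    for r2 in (1.0, 0.80):
        n = required_sample_size(1000, r2)
        print(f"r^2 = {r2:.2f}: {n} cases and {n} controls")
```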
57
Haplotype Blocks
Daly et al., Nat. Genet. (2001)
58
Block Definitions
Daly et al., Nat. Genet. (2001)
D': Gabriel et al., Science (2002)
59
Block Definitions
60
Haplotype Blocks and tagSNPs
  • Identifying blocks and tagSNPs
  • Manually
  • Algorithms
  • Haploview

61
Haplotype Blocks and tagSNPs
IL1B: 19 SNPs (MAF > 5%), 4 common haplotypes
62
Haplotype Blocks and tagSNPs
  • Identifying blocks and tagSNPs
  • Manually
  • Algorithms
  • HaploView

63
(No Transcript)
64
LD and tagSNPs using Haploview
VKORC1 European-Americans PHASEv2.1 data
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
Minimal set of tagSNPs based on r²
69
(No Transcript)
70
Where to Find Tagging Software
HaploBlockFinder: http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi
Haploview: http://www.broad.mit.edu/personal/jcbarret/haplo/
LDSelect: http://droog.gs.washington.edu/ldSelect.html
SNPtagger: http://www.well.ox.ac.uk/xiayi/haplotype/index.html
TagIT: http://popgen.biol.ucl.ac.uk/software.html
tagSNPs: http://www-rcf.usc.edu/stram/tagSNPs.html
71
Haplotypes, TagSNPs, and Caveats
  • Haplotypes are inferred
  • Block-like structure assumed for some software
  • Different block definitions
  • Block boundaries sensitive to marker density
  • Genotype savings may not be great (recombination)

72
Haplotypes in Genetic Association Studies
Two main approaches with haplotypes
Haplotypes -> Pick tagSNPs -> Genotype samples
73
Multi-SNP testing Haplotypes
Five tagSNPs (10 total SNPs)
9 haplotypes / 5 common (> 5%)
74
Multi-SNP testing Haplotypes
Test for association between haplotype and warfarin dose using
multiple linear regression
Adjusted for all significant covariates: age, sex, amiodarone,
CYP2C9 genotype
75
Multi-SNP testing Haplotypes
Explore the evolutionary relationship across
haplotypes
[Figure: haplotype tree. Branch labels: 5808; (381, 3673, 6484, 6853,
7566); 861; 9041]
Clade A: CCGATCTCTG (H1), CCGAGCTCTG (H2)
Clade B: TCGGTCCGCA (H7), TAGGTCCGCA (H8), TACGTTCGCG (H9)
VKORC1 haplotypes cluster into divergent clades
Patients can be assigned a clade diplotype, e.g.
Patient 1 - H1/H2 = A/A
Patient 2 - H1/H7 = A/B
Patient 3 - H7/H9 = B/B
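The clade diplotype assignment can be written as a small lookup. This Python sketch uses the haplotype-to-clade grouping shown on this slide (H1 and H2 in clade A; H7, H8, and H9 in clade B); the function name and the choice to sort the two clade labels are illustrative.

```python
# Haplotype-to-clade grouping as shown on the slide.
CLADE = {"H1": "A", "H2": "A", "H7": "B", "H8": "B", "H9": "B"}


def clade_diplotype(hap_1, hap_2):
    """Map a pair of haplotype labels to a clade diplotype such as 'A/B'."""
    # Sorting makes the result independent of the order the haplotypes are given in.
    return "/".join(sorted((CLADE[hap_1], CLADE[hap_2])))


if __name__ == "__main__":
    patients = {
        "Patient 1": ("H1", "H2"),
        "Patient 2": ("H1", "H7"),
        "Patient 3": ("H7", "H9"),
    }
    for name, (h1, h2) in patients.items():
        print(f"{name} - {h1}/{h2} = {clade_diplotype(h1, h2)}")
```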
76
VKORC1 clade diplotypes show a strong association
with warfarin dose
Low
High
77
Multi-SNP testing Haplotypes
  • European - mean 5 mg/d
  • African-American - higher, 6.0-7.0 mg/d
  • Asian - lower, 3.0-3.5 mg/d
  • Hypothesis: VKORC1 haplotypes contribute to racial variability
    in warfarin dosing.
  • Control populations: 120 Europeans, 96 African-Americans, 120 Asian

78
Multi-SNP testing Haplotypes
Explore the evolutionary relationship across
populations
Clade A: Low; Clade B: High
79
Common Errors in Association Studies Bell and
Cardon (2001)
  • Small sample size
  • Subgroup analysis and multiple testing
  • Random error
  • Poorly matched control group
  • Failure to attempt study replication
  • Failure to detect LD with adjacent loci
  • Overinterpreting results and positive
    publication bias
  • Unwarranted candidate gene declaration after identifying
    association in arbitrary genetic region

80
SNP Replication VKORC1
Univ. of Washington, n = 185
21% variance in dose explained
81
SNP Function VKORC1 Expression
mechanism
No nonsynonymous SNPs
- mRNA expression in human liver cell lines
82
SNP Function VKORC1 Expression
Expression in human liver tissue (n = 53) shows a graded change in
expression.
83
VKORC1 SNP alters liver-specific binding site
84
SNP Discovery and Analysis Application to
Association Studies Summary
  • Databases and resources available for SNP
    discovery
  • Software for tagSNP selection available
  • Both single and multi-SNP analysis are useful
  • Replication required by several journals

85
SeattleSNPs Genotyping Service
  • Free genotyping (BeadArray or SNPlex)
  • Emphasis on young investigators
  • Research related to heart, lung, blood, or sleep
    disorders
  • Moderate to large population samples
  • Apply at pga.gs.washington.edu
  • Due October 15th, 2005

86
SNP Typing Formats
Scale
Microtiter Plates - Fluorescence
  • Low (e.g. TaqMan) - good for a few markers, lots of samples -
    PCR prior to genotyping
  • Medium (e.g. SNPlex) - intermediate multiplexing reduces costs -
    genotype directly on genomic DNA - new paradigm for high throughput
  • High (e.g. Illumina, ParAllele, Affymetrix) - highly multiplexed -
    1,500 SNPs and beyond (500K)
87
Taqman
Genotyping with fluorescence-based homogeneous assays (single-tube
assay)
1 SNP/tube
88
SNP Typing Formats (repeat of slide 86)
89
Technological leap: no advance PCR
  • Universal PCR after preparing multiple regions for analysis
  • Several are based on specific primers on genomic DNA, followed by
    PCR of the ligated products, with different strategies and
    different readouts (SNPlex, Illumina, ParAllele)
  • Also reduced representation (Affymetrix): cut with a restriction
    enzyme, ligate linkers, amplify from the linkers, then read out by
    chip hybridization
90
Detection
9. Characterize on Capillary Sequencer
SNP 1
SNP 2
91
SNP Typing Formats (repeat of slide 86)
92
Genotyping - Universal Tag Readouts
Multiplexed
[Figure labels: G, A, C, T; Locus 1 Specific Sequence; Locus 2 Specific
Sequence; cTag1 sequence; Tag1 sequence; cTag2 sequence; Tag2 sequence;
Substrate; Bead or Chip; Chip Array; Bead Array; Tag 1, Tag 2, Tag 3,
Tag 4]
ParAllele, Affymetrix
Multiplex 1,000 SNPs; not dependent on primary PCR
Illumina
93
Illumina Platform
  • 96 Multi-array Matrix matches standard microtiter
    plates
  • 1,500 SNPs typed per matrix for 96 samples

94
Affymetrix 100K Chip
Optimized for 250-2000 bp
http://www.affymetrix.com/products/arrays/specific/100k.affx
95
High Throughput Chip Formats
96
Defining the scale of the genotyping project is key to selecting an
approach

Project scale                                      Cost per SNP/genotype   1000 individuals
5 to 10 SNPs in a candidate gene - many
  approaches (expensive)                           $0.60                   $6,000
48 (to 96) SNPs in a handful of candidate genes    $0.25                   $12,000
384 to 1,536 SNPs                                  $0.15 - $0.08           $57,600-$122,880
10,000 cSNPs - defined format                      $0.05                   $500,000
100,000 genic SNPs - defined format                $0.005                  $500,000
500,000 SNPs - defined format                      $0.004                  $2,000,000
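The totals for 1000 individuals follow from multiplying the number of SNPs, the number of individuals, and the cost per SNP genotype. The Python sketch below reproduces that arithmetic from the per-genotype costs on this slide. The tier labels and the choice of which SNP count to use for each range (10 for "5 to 10", 48 for "48 to 96") are assumptions made to match the slide's totals.

```python
def project_cost(n_snps, n_individuals, cost_per_genotype):
    """Total cost = SNPs x individuals x cost per SNP genotype, rounded to cents."""
    return round(n_snps * n_individuals * cost_per_genotype, 2)


# (label, number of SNPs, cost per SNP genotype) as read from the slide.
TIERS = [
    ("5-10 SNPs", 10, 0.60),
    ("48 SNPs", 48, 0.25),
    ("384 SNPs", 384, 0.15),
    ("1,536 SNPs", 1536, 0.08),
    ("10,000 cSNPs", 10_000, 0.05),
    ("100,000 genic SNPs", 100_000, 0.005),
    ("500,000 SNPs", 500_000, 0.004),
]

# Cost of each tier for a study of 1000 individuals.
costs = {label: project_cost(n, 1000, unit) for label, n, unit in TIERS}

if __name__ == "__main__":
    for label, total in costs.items():
        print(f"{label}: {total:,.0f}")
```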
97
Acknowledgements
Allan Rettie, Medicinal Chemistry
Alex Reiner, Dave Veenstra, Dave Blough, Ken Thummel, Noel Hastings,
Maggie Ahearn, Josh Smith, Chris Baier, Peggy Dyer-Robertson
Washington University: Brian Gage, Howard McLeod, Charles Eby
Joyce You - Hong Kong
98
(No Transcript)