Title: Genetics for Epidemiologists Lecture 5: Analysis of Genetic Association Studies
1 Genetics for Epidemiologists, Lecture 5: Analysis of Genetic Association Studies
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics, and Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute, National Institutes of Health
U.S. Department of Health and Human Services
2 Topics to be Covered
- Discrete traits and quantitative traits
- Measures of association
- Detecting/correcting for false positives
- Genotyping quality control
- Quantile-quantile (Q-Q) plots
- Odds ratios: allelic and genotypic
- Models of genetic transmission
- Interactions: gene-gene, gene-environment
3 Larson, G. The Complete Far Side. 2003.
7 Quantitative Genetics
"concerned with the inheritance of those differences between individuals that are of degree rather than of kind"
Quantitative:
- Continuous gradation among individuals from one extreme to other
- Effects of genes are small
- Usually many genes
Qualitative:
- Sharply demarcated types with little connection by intermediates
- Effects of genes are large
- Single genes inherited in Mendelian ratios?
Falconer and Mackay, Quantitative Genetics 1996.
13 Inheritance Models in Single Gene Trait
Genotype groups: AA, Aa, aa
Models:
- A is Dominant
- A is Recessive
- A is Co-Dominant
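One common way to turn these three models into an analysis variable is to recode each genotype as a number. The sketch below is illustrative only: the function name `encode_genotype` and the model labels are assumptions, not part of the lecture.

```python
def encode_genotype(genotype, model):
    """Return a numeric predictor for one genotype under a given model.

    genotype: "AA", "Aa", or "aa"
    model: "dominant", "recessive", or "codominant" (labels are illustrative)
    """
    copies_of_A = genotype.count("A")  # 0, 1, or 2 copies of allele A
    if model == "dominant":
        # One copy of A is enough: AA and Aa fall in the same group.
        return 1 if copies_of_A >= 1 else 0
    if model == "recessive":
        # Two copies of A are required: only AA differs from the rest.
        return 1 if copies_of_A == 2 else 0
    if model == "codominant":
        # Each genotype forms its own group, indexed by copies of A.
        return copies_of_A
    raise ValueError(f"unknown model: {model}")


for model in ("dominant", "recessive", "codominant"):
    print(model, [encode_genotype(g, model) for g in ("AA", "Aa", "aa")])
```

The dominant and recessive codings collapse three genotype groups into two, while the co-dominant coding keeps all three distinct.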
19 Inheritance Models in Quantitative Trait
Scale: -x, 0 (Population Mean), x
Model                      Genotypes along the scale (left to right)
A is Completely Dominant   aa, AA, Aa
A is Partially Dominant    aa, Aa, AA
A is Not (Co-) Dominant    aa, Aa, AA
A is Over-Dominant         aa, AA, Aa
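The four models can be read as statements about where the heterozygote sits on the -x to x scale. The sketch below assumes the standard quantitative-genetics layout (aa at -x, AA at +x, heterozygote somewhere relative to them). The function name and the returned labels are illustrative assumptions.

```python
def classify_dominance(het_value, x):
    """Classify a model from the heterozygote's position on the trait scale.

    Assumes aa has value -x and AA has value +x (x > 0); het_value is the
    value of Aa on the same scale. Exact comparisons are used for clarity;
    real data would need a tolerance or a statistical test.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if het_value == 0:
        return "not dominant (co-dominant)"   # Aa exactly at the midpoint
    if 0 < het_value < x:
        return "partially dominant"           # Aa between midpoint and AA
    if het_value == x:
        return "completely dominant"          # Aa indistinguishable from AA
    if het_value > x:
        return "over-dominant"                # Aa beyond AA
    return "heterozygote below midpoint"      # Aa closer to aa than to AA


for d in (0.0, 0.4, 1.0, 1.5):
    print(d, classify_dominance(d, x=1.0))
```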
20 Quantitative Traits with Published GWA Studies (16 - 34)
- QT interval
- Lipids and lipoproteins
- Memory
- Nicotine dependence
- ORMDL3 expression
- YKL-40 levels
- Obesity, BMI, waist
- Insulin resistance
- Height
- Bone mineral density
- F-cell distribution
- Fetal hemoglobin levels
- C-Reactive protein
- 18 groups of Framingham traits
- Pigmentation
- Uric Acid Levels
- Recombination Rate
22 Association of Alleles and Genotypes of rs1333049 (3049) with Myocardial Infarction
Allele      C, N (%)        G, N (%)        Chi-squared (1 df)   P-value
Cases       2,132 (55.4)    1,716 (44.6)    55.1                 1.2 x 10^-13
Controls    2,783 (47.4)    3,089 (52.6)
Allelic Odds Ratio: 1.38

Genotype    CC, N (%)     CG, N (%)       GG, N (%)     Chi-squared (2 df)   P-value
Cases       586 (30.5)    960 (49.9)      378 (19.6)    59.7                 1.1 x 10^-14
Controls    676 (23.0)    1,431 (48.7)    829 (28.2)
Heterozygote Odds Ratio: 1.47
Homozygote Odds Ratio: 1.90
Samani N et al, N Engl J Med 2007; 357:443-453.
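The odds ratios on this slide can be recomputed from the genotype counts. The sketch below derives allele counts from genotype counts and then forms allelic, heterozygote, and homozygote odds ratios with GG as the reference group. Function and variable names are illustrative. The chi-squared statistics are not recomputed, because the slide does not state which test produced them.

```python
def odds_ratio(case_exposed, case_ref, control_exposed, control_ref):
    """Cross-product odds ratio for a 2 x 2 table."""
    return (case_exposed * control_ref) / (case_ref * control_exposed)


def allele_counts(genotypes):
    """Count C and G alleles from CC/CG/GG genotype counts."""
    c = 2 * genotypes["CC"] + genotypes["CG"]
    g = 2 * genotypes["GG"] + genotypes["CG"]
    return c, g


# Genotype counts for rs1333049 as shown on the slide.
cases = {"CC": 586, "CG": 960, "GG": 378}
controls = {"CC": 676, "CG": 1431, "GG": 829}

case_c, case_g = allele_counts(cases)
ctrl_c, ctrl_g = allele_counts(controls)

allelic_or = odds_ratio(case_c, case_g, ctrl_c, ctrl_g)
het_or = odds_ratio(cases["CG"], cases["GG"], controls["CG"], controls["GG"])
hom_or = odds_ratio(cases["CC"], cases["GG"], controls["CC"], controls["GG"])

print(round(allelic_or, 2), round(het_or, 2), round(hom_or, 2))
```

Rounded to two decimals, these reproduce the slide's 1.38, 1.47, and 1.90, and the derived allele counts match the allele table (2,132 and 1,716 in cases; 2,783 and 3,089 in controls).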
23 -Log10 P Values for SNP Associations with Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
24 Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
25 GWA Study of Serum Uric Acid Levels
- Linear regression of inverse normalized levels against number of alleles
- Additive model
- Sex, age, age^2 as covariates
Li S et al, PLoS Genet 2007; 3:e194.
26 Association of rs6855911 and Uric Acid Levels
                                Genotype Means (mg/dl)
Cohort      Additive Effect     AA            AG            GG
SardiNIA    -0.317              4.66 (1.51)   4.48 (1.59)   4.02 (1.63)
InCHIANTI   -0.397              5.27 (1.44)   4.94 (1.31)   4.33 (1.37)
Li S et al, PLoS Genet 2007; 3:e194.
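A minimal sketch of the additive-model analysis described for uric acid: a rank-based inverse normal transform of the trait, followed by a least-squares slope on allele count. This is a simplification under stated assumptions. The rank offset of 0.5 is one common choice, ties are not handled, the sex, age, and age^2 covariates are omitted, and the toy data are hypothetical. It is not the authors' pipeline.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Replace each value by the normal quantile of its rank (no tie handling)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - offset) / n)
    return z


def additive_slope(allele_counts, trait):
    """Least-squares slope of trait on allele count (0, 1, or 2 copies)."""
    n = len(trait)
    mean_x = sum(allele_counts) / n
    mean_y = sum(trait) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, trait))
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    return sxy / sxx


# Hypothetical toy data: uric acid (mg/dl) and copies of the G allele.
uric_acid = [5.1, 4.7, 4.9, 4.4, 4.0, 4.2]
g_copies = [0, 0, 1, 1, 2, 2]

z = inverse_normal_transform(uric_acid)
beta = additive_slope(g_copies, z)
print(round(beta, 3))  # per-allele effect on the transformed scale
```

A negative slope means each additional G allele is associated with lower transformed uric acid, the same direction as the additive effects reported on the slide.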
27 Association Methods for Quantitative Traits
- Linear regression of multivariable adjusted residual against number of alleles (Kathiresan, Nat Genet 2008; 40:189-97)
- Linear regression of log transformed or centralized BMI against genotype (Frayling, Science 2007; 316:889-94)
- Variance components based Z-score analysis of quantile normalized height (Sanna, Nat Genet 2008; 40:198-203)
28 Ways of Dealing with Multiple Testing
- Control family-wise error rate (FWER): Bonferroni (α' = α/n) or Šidák (α' = 1 - (1 - α)^(1/n))
- False discovery rate: proportion of significant associations that are actually false positives
- False positive report probability: probability that the null hypothesis is true, given a statistically significant finding
- Bayes factors: analysis avoids need for assessing genome-wide error rates but must identify reasonable alternative model
Hoggart CJ et al, Genet Epidemiol 2008; 32:179-85.
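The first three approaches reduce to short formulas. The sketch below shows the Bonferroni and Šidák per-test thresholds, a Benjamini-Hochberg step-up procedure as one common way to control the false discovery rate, and the usual expression for false positive report probability in terms of significance level, power, and prior probability. The choice of Benjamini-Hochberg, the function names, and the example inputs (500,000 tests, the toy p-values, power 0.8, prior 0.001) are assumptions for illustration.

```python
def bonferroni(alpha, n_tests):
    """Per-test threshold that bounds the family-wise error rate at alpha."""
    return alpha / n_tests


def sidak(alpha, n_tests):
    """Per-test threshold assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests declared significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank  # keep the largest rank that meets its threshold
    return sorted(order[:cutoff])


def fprp(alpha, power, prior):
    """Probability the null is true given a significant result."""
    false_pos = alpha * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


print(bonferroni(0.05, 500_000), sidak(0.05, 500_000))
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(toy_p))
print(round(fprp(alpha=0.05, power=0.8, prior=0.001), 3))
```

With a low prior probability of true association, a nominally significant result at 0.05 is still very likely to be a false positive, which is the motivation for much stricter genome-wide thresholds.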
29 Larson, G. The Complete Far Side. 2003.
30 Quality Control of SNP Genotyping: Samples
- Identity with forensic markers (Identifiler)
- Blind duplicates
- Gender checks
- Cryptic relatedness or unsuspected twinning
- Degradation/fragmentation
- Call rate (> 80-90%)
- Heterozygosity outliers
- Plate/batch calling effects
Chanock et al, Nature 2007; Manolio et al, Nat Genet 2007.
31 Quality Control of SNP Genotyping: SNPs
- Duplicate concordance (CEPH samples)
- Mendelian errors (typically < 1%)
- Hardy-Weinberg errors (often > 10^-5)
- Heterozygosity (outliers)
- Call rate (typically > 98%)
- Minor allele frequency (often > 1%)
- Validation of most critical results on independent genotyping platform
Chanock et al, Nature 2007; Manolio et al, Nat Genet 2007.
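Two of these SNP-level filters, call rate and minor allele frequency, are simple to compute per SNP. The sketch below assumes genotypes are stored as the number of copies of one allele (0, 1, or 2), with `None` for a failed call. The function name, data layout, and default thresholds (taken from the "typically" and "often" values on the slide) are illustrative, not a fixed standard.

```python
def snp_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Compute call rate and minor allele frequency for one SNP.

    genotypes: list with 0, 1, or 2 copies of an allele per sample,
               or None where the genotype could not be called.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if called:
        freq = sum(called) / (2 * len(called))  # frequency of the counted allele
        maf = min(freq, 1.0 - freq)             # minor allele is the rarer one
    else:
        maf = 0.0
    passed = call_rate > min_call_rate and maf > min_maf
    return {"call_rate": call_rate, "maf": maf, "pass": passed}


print(snp_qc([0] * 90 + [1] * 10))   # well-called, common SNP
print(snp_qc([0, 1, 2, 1, None]))    # fails on call rate
print(snp_qc([0] * 99 + [1]))        # fails on minor allele frequency
```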
32 Hardy-Weinberg Equilibrium
- Occurrence of two alleles of a SNP in the same individual are two independent events
- Ideal conditions:
  - random mating
  - no selection (equal survival)
  - no migration
  - no mutation
  - no inbreeding
  - large population sizes
  - gene frequencies equal in males and females
- If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are:
  Freq AA = p^2
  Freq Aa = 2p(1-p)
  Freq aa = (1-p)^2
After G. Thomas, NCI
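The expected genotype frequencies translate directly into a goodness-of-fit check. The sketch below estimates p from observed genotype counts, computes the expected counts under Hardy-Weinberg equilibrium, and compares the two with a 1-df chi-squared test. The function name and the example counts are illustrative. An exact test is often preferred for rare alleles and is not shown here.

```python
from math import erfc, sqrt


def hwe_test(n_AA, n_Aa, n_aa):
    """Chi-squared test of Hardy-Weinberg proportions for one SNP.

    Returns (expected_counts, chi2, p_value), with 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    expected = (n * p * p, n * 2 * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-squared with 1 df
    return expected, chi2, p_value


print(hwe_test(25, 50, 25))  # counts exactly in Hardy-Weinberg proportions
print(hwe_test(40, 20, 40))  # strong heterozygote deficit
```

A deficit or excess of heterozygotes relative to 2p(1-p) is a common signature of genotype-calling problems, which is why this test appears among the SNP quality control checks.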
33 Coverage, Call Rates, and Concordance of Perlegen and Affymetrix Platforms on HapMap Phase II
Metric                     Perlegen                        Affymetrix/Broad
Number of SNPs             480,744                         439,249
Coverage                   Single Marker   Multi-Marker    Single Marker   Multi-Marker
  CEU                      0.90            0.96            0.78            0.87
  CHB + JPT                0.87            0.93            0.78            0.86
  YRI                      0.64            0.78            0.63            0.75
Average call rate (%)      98.9                            99.3
Concordance (%)
  Homozygous genotypes     99.8                            99.9
  Heterozygous genotypes   99.8                            99.8
GAIN Collaborative Group, Nat Genet 2007; 39:1045-51.
35 Sample and SNP QC Metrics for Affymetrix 5.0 and 6.0 Platforms in GAIN
Metric                  5.0       % fail   6.0       % fail
Total Samples           1,829     --       2,289     --
  Passing QC            1,817     0.44     2,192     4.24
  > 98% call rate       1,815     0.55     2,257     1.40
Total SNPs              457,645   --       906,660   --
  Passing QC            429,309   6.19     845,814   6.70
  MAF > 1%              457,466   0.04     888,234   2.03
  > 98% call rate       419,810   8.27     821,942   9.34
  > 95% call rate       439,272   4.01     873,856   3.61
  HWE < 10^-6           455,899   0.38     904,275   0.26
  < 1 Mendel error      417,722   8.72     899,721   0.01
  < 1 Duplicate error   454,820   0.01     892,103   0.02
Courtesy, J Paschall, NCBI
36 Sample Heterozygosity in GAIN
Courtesy, J Paschall, NCBI
37 Sample Heterozygosity in GAIN
Courtesy, J Paschall, NCBI
38 Signal Intensity Plots for rs10801532 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
39 Signal Intensity Plots for rs4639796 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
40 Signal Intensity Plots for rs534399 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
41 Signal Intensity Plots for rs572515 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
42 Signal Intensity Plots for CD44 SNP rs9666607
Clayton DG et al, Nat Genet 2005; 37:1243-1246.
43 Principal Component Analysis of Structured Population: First to Third Components
Courtesy, G. Thomas, NCI
44 Principal Component Analysis of Structured Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
45 Influence of Relatedness on Principal Component Analysis
Courtesy, G. Thomas, NCI
46 Principal Component Analysis of Structured Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
47 Principal Component Analysis of Structured Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
48 Summary Points: Genotyping Quality Control
- Sample checks for identity, gender error, cryptic relatedness
- Sample handling differences can introduce artifacts but probably can be adjusted for
- Association analysis is often quickest way to find genotyping errors
- Low MAF SNPs are most difficult to call
- Inspection of genotyping cluster plots is crucial!
49 Quantile-Quantile Plot for Test Statistics, 390 Breast Cancer Cases, 364 Controls
205,586 SNPs; λ = 1.03
Easton D et al, Nature 2007; 447:1087-1093.
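The 1.03 on this slide is presumably the genomic inflation factor, conventionally computed as the median observed 1-df chi-squared statistic divided by the median expected under the null (about 0.455). The sketch below shows that calculation and the expected -log10 P values for one axis of a Q-Q plot. The function names, the (i - 0.5)/n plotting positions, and the toy statistics are assumptions for illustration.

```python
from math import log10
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 df: square of the 75th
# percentile of the standard normal, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Ratio of the observed median test statistic to its null expectation."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def expected_neg_log10_p(n):
    """Expected -log10(p) under the null for n tests, largest first."""
    return [-log10((i - 0.5) / n) for i in range(1, n + 1)]


toy_stats = [0.1, 1.03 * CHI2_1DF_MEDIAN, 5.0]  # hypothetical statistics
print(round(genomic_inflation(toy_stats), 2))
print([round(v, 3) for v in expected_neg_log10_p(4)])
```

Values of the inflation factor close to 1 suggest little systematic inflation from population structure or genotyping artifacts; a Q-Q plot then departs from the diagonal only in the extreme tail, where true associations are expected.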
50 Observed and Expected Associations after Stage 2 of Breast Cancer GWA
Significance      Observed   Observed Adjusted   Expected   Ratio
0.01 - 0.05       1,239      1,162               934        1.24
10^-3 - 10^-2     574        517                 348        1.49
10^-4 - 10^-3     112        88                  53         1.65
10^-5 - 10^-4     16         12                  7          1.71
< 10^-5           15         13                  1          13.5
All p < 0.05      1,956      1,792               1,343      1.33
Easton D et al, Nature 2007; 447:1087-93.
51 Q-Q Plot for Multiple Sclerosis: Effect of MHC
Hafler D et al, N Engl J Med 2007; 357:851-862.
52 Q-Q Plot for Prostate Cancer, all SNPs
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
53 Q-Q Plot for Prostate Cancer, excluding Chromosome 8
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
54 Q-Q Plot for Myocardial Infarction
[Figure: Q-Q plot; axes: observed chi-squared statistic (0 to 60) and expected chi-squared statistic (0 to 25)]
Samani N et al, N Engl J Med 2007; 357:443-453.
55 -Log10 P Values for SNP Associations with Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
56 -Log10 P Values for SNP Associations with Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
57 SNP Associations with 1,928 MI Cases and 2,938 Controls from UK
Samani N et al, N Engl J Med 2007; 357:443-453.
58 Association Signal for Coronary Artery Disease on Chromosome 9
[Figure label: 3049]
Samani N et al, N Engl J Med 2007; 357:443-453.
59 Winner's Curse: Odds Ratios for CHD Associated with LTA Genotypes in Multiple Studies
Clarke et al, PLoS Genet 2006; 2:e107.
60 Genome-Wide Scan for Alzheimer's Disease in 861 Cases and 550 Controls
Reiman E et al, Neuron 2007; 54:713-20.
61 Genome-Wide Scan for Alzheimer's Disease in APOE ε4 Carriers
Reiman E et al, Neuron 2007; 54:713-20.
62 LOAD Odds Ratios Associated with rs2373115 GG by APOE ε4 Status
APOE ε4 Group   APOE ε4 OR (95% CI)   rs2373115 OR (95% CI)
APOE ε4 -                             1.12 (0.82, 1.53)
APOE ε4 +                             2.88 (1.90, 4.36)
All             6.07 (4.63-7.95)      1.34 (1.06, 1.70)
Reiman et al, Neuron 2007; 54:713-720.
63 P Values of GWA Scan for Age-Related Macular Degeneration
Klein et al, Science 2005; 308:385-389.
64 Odds Ratios and Population Attributable Risks for AMD
Attribute (SNP)                        rs380390 (C/G)   rs1329428 (C/T)
Risk allele                            C                C
Allelic association chi-squared P      4.1 x 10^-8      1.4 x 10^-6
Odds ratio (dominant)                  4.6 (2.0-11)     4.7 (1.0-22)
  Frequency in HapMap CEU              0.70             0.82
  Population Attributable Risk (%)     70 (42-84)       80 (0-96)
Odds ratio (recessive)                 7.4 (2.9-19)     6.2 (2.9-13)
  Frequency in HapMap CEU              0.23             0.41
  Population Attributable Risk (%)     46 (31-57)       61 (43-73)
Klein et al, Science 2005; 308:385-389.
65 Risk of Developing AMD by CFH Y402H and Modifiable Risk Factors
                   CFH Y402H Genotype
Risk Factor        YY                 YH                 HH
BMI < 30 kg/m^2    1.00               1.95 (1.42-2.67)   3.96 (2.69-5.82)
BMI > 30 kg/m^2    1.98 (0.91-4.31)   2.19 (1.11-4.30)   12.28 (4.88-30.90)
Non-smoker         1.00               1.95 (1.41-2.71)   4.23 (2.86-6.27)
Current smoker     2.34 (1.20-4.55)   3.20 (1.85-5.55)   8.69 (3.86-19.57)
Schaumberg DA et al, Arch Ophthalmol 2007; 125:55-62.
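One way to read a table like this is to compare the observed joint effect with what the two single-factor effects would predict if there were no interaction. The sketch below computes the expected joint odds ratio under multiplicative and additive models, plus the relative excess risk due to interaction (RERI), using the point estimates for the HH genotype and BMI > 30. It ignores the wide confidence intervals, and the function names are illustrative. This is a standard epidemiologic calculation, not necessarily the analysis the cited authors performed.

```python
def expected_joint_or(or_gene, or_env):
    """Joint odds ratio expected if gene and environment do not interact."""
    return {
        "multiplicative": or_gene * or_env,
        "additive": or_gene + or_env - 1.0,
    }


def reri(or_joint, or_gene, or_env):
    """Relative excess risk due to interaction (departure from additivity)."""
    return or_joint - or_gene - or_env + 1.0


# Point estimates from the slide, relative to YY with BMI < 30.
or_hh_lean = 3.96    # HH genotype, BMI < 30
or_yy_obese = 1.98   # YY genotype, BMI > 30
or_hh_obese = 12.28  # HH genotype, BMI > 30

expected = expected_joint_or(or_hh_lean, or_yy_obese)
print({k: round(v, 2) for k, v in expected.items()})
print(round(reri(or_hh_obese, or_hh_lean, or_yy_obese), 2))
```

The observed joint odds ratio of 12.28 exceeds both the multiplicative and the additive expectation, which is the pattern usually described as a positive gene-environment interaction, subject to the uncertainty in each estimate.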
66 Interaction: Is LIPC Genotype Related to HDL-C?
[Figure: series labeled by genotype (CC, CT, TT)]
Ordovas et al, Circulation 2002; 106:2315-2321.
67 Inverse Relation between Endotoxin Exposure and Allergic Sensitization by CD14 Genotype
Simpson A et al, Am J Respir Crit Care Med 2006; 174:386-392.
68 Challenges in Studying Gene-Environment Interactions
Challenge                      Genes         Environment
Ease of measure                Pretty easy   Often hard
Variability over time          Low/none      High
Recall bias                    None          Possible
Temporal relation to disease   Easy          Hard
69 Larson, G. The Complete Far Side. 2003.