Title: Genetics for Epidemiologists Lecture 2: Measurement of Genetic Exposures
1. Genetics for Epidemiologists, Lecture 2: Measurement of Genetic Exposures
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics, and Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute
National Institutes of Health
U.S. Department of Health and Human Services
2. Topics to be Covered
- Measuring genetic variation
- Blood group markers
- Restriction fragment length polymorphisms (RFLPs)
- Variable number of tandem repeats (VNTRs, minisatellites and microsatellites)
- Single nucleotide polymorphisms (SNPs)
- Linkage disequilibrium (LD)
- Familial resemblance and family history
3. Larson, G. The Complete Far Side. 2003.
4. Measuring Genetic Variation: Blood Group and Enzymatic Markers
- RBC COMT activity measured in 5 large families with hypertension (total 518 individuals)
- Associations tested with 25 genetic markers: ABO, Rh, K, MNS, P, Fy, Jk, PGD, ADA, ACP1, PGM1, HBB, GPT, C3, HPA, TF, GC, OR, GM, KM, BF, ESD, GLO1, Le
- Lod score of 1.27 and estimated recombination fraction of 0.1 found for phosphogluconate dehydrogenase (PGD)
Am J Med Genet 1984; 19:525-32.
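A lod score compares the likelihood of the observed meioses at a given recombination fraction against the likelihood under free recombination (fraction 0.5). The sketch below covers only the simplest textbook case, phase-known meioses scored as recombinant or non-recombinant; the function name and the counts are hypothetical and are not taken from the 1984 study.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """Lod score for phase-known meioses at recombination fraction theta.

    Likelihood at theta:  theta**r * (1 - theta)**(n - r)
    Likelihood at 0.5:    0.5**n  (no linkage)
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    r = n_recombinants
    n = n_meioses
    likelihood_linked = theta ** r * (1.0 - theta) ** (n - r)
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)

# Hypothetical example: 10 informative meioses, 1 recombinant.
example = lod_score(10, 1, 0.1)
```

At a recombination fraction of 0.5 the two likelihoods coincide and the lod score is zero, which is a quick sanity check on any implementation.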
5. Restriction Fragment Length Polymorphisms (RFLPs)
- Define polymorphic marker loci that can be detected as differences in length of DNA fragments after digestion with DNA sequence-specific endonucleases
- Establish linkage relationships using pedigree analysis
Am J Hum Genet 1980; 32:314-331.
6. Restriction Fragment Length Polymorphisms (RFLPs)
- Since the RFLPs are being used simply as genetic markers, any trait segregating in a pedigree can be mapped. Such a procedure would not require any knowledge of the biochemical nature of the trait or of the nature of the alterations in the DNA responsible for the trait.
Am J Hum Genet 1980; 32:314-331.
7. RFLPs Used to Map Neurofibromatosis
- Linkage analysis of 15 Utah kindreds showed that a gene responsible for von Recklinghausen neurofibromatosis (NF) is located near the centromere on chromosome 17
Science 1987; 236:1100-1102.
8. RFLPs Used to Map Neurofibromatosis
- Cosegregation of NF with the A2 (1.9 kb) allele and not A1 (2.4 kb) in each of four affected offspring
Science 1987; 236:1100-1102.
9. Variable Numbers of Tandem Repeats (VNTRs): Minisatellites
- Repetition in tandem of a short (6- to 100-bp) motif spanning 0.5 kb to several kb
- Opened the way to DNA fingerprinting for individual identification
- Provided the first highly polymorphic, multiallelic markers for linkage studies
- Associated with many interesting features of human genome biology and evolution
- A well-known minisatellite is the 5.5 kb kringle IV repeat in apolipoprotein(a) and plasminogen
Vergnaud G and Denoeud F, Genome Res 2000; 10:899-907.
10. Kringle-IV Encoding Sequences of Human apo(a) cDNA: ApoA1 Alleles
Lackner et al, Hum Mol Genet 1993; 2:933-40.
11. Correlations of ApoA Molecular Weight with Lp(a) Levels and Number of Kringle-IV Repeats
Gavish et al, J Clin Invest 1989; 84:2021-27.
12. Simple Sequence Repeats (also VNTRs): Microsatellites
Repetition in tandem of a short (2- to 6-bp) motif from 5 to 5,000 times
- Most are di-, tri-, and tetra-nucleotide repeats repeated 20-50 times
- Most are highly polymorphic, making them enormously useful for mapping and linkage
- Marshfield and similar maps placed 400 microsatellites across the genome and provided primers for analysis
- Could be highly automated: NHLBI and CIDR large-scale genotyping services
13. Multipoint LOD Scores for Long-term SBP and DBP on Chromosome 17
Levy et al, Hypertension 2000; 36:477-483.
14. Larson, G. The Complete Far Side. 2003.
15. Single Nucleotide Polymorphisms (SNPs)
GAAATAATTAATGTTTTCCTTCCTTCTCCTATTTTGTCCTTTACTTCAA
TTTATTTATTTATTATTAATATTATTATTTTTTGAGACGGAGTTTC/ACT
CTTGTTGCCAACCTGGAGTGCAGTGGCGTGATCTCAGCTCACTGCACACT
CCGCTTTCCTGGTTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCT
GGGACTACAGTCACACACCACCACGCCCGGCTAATTTTTGTATTTTTAGT
AGAGTTGGGGTTTCACCATGTTGGCCAGACTGGTCTCGAACTCCTGACCT
TGTGATCCGCCAGCCTCTGCCTCCCAAAGAGCTGGGATTACAGGCGTGAG
CCACCGCGCTCGGCCCTTTGCATCAATTTCTACAGCTTGTTTTCTTTGCC
TGGACTTTACAAGTCTTACCTTGTTCTGCC/TTCAGATATTTGTGTGGTC
TCATTCTGGTGTGCCAGTAGCTAAAAATCCATGATTTGCTCTCATCCCAC
TCCTGTTGTTCATCTCCTCTTATCTGGGGTCACA/CTATCTCTTCGTGAT
TGCATTCTGATCCCCAGTACTTAGCATGTGCGTAACAACTCTGCCTCTGC
TTTCCCAGGCTGTTGATGGGGTGCTGTTCATGCCTCAGAAAAATGCATTG
TAAGTTAAATTATTAAAGATTTTAAATATAGGAAAAAAGTAAGCAAACAT
AAGGAACAAAAAGGAAAGAACATGTATTCTAATCCATTATTTATTATACA
ATTAAGAAATTTGGAAACTTTAGATTACACTGCTTTTAGAGATGGAGATG
TAGTAAGTCTTTTACTCTTTACAAAATACATGTGTTAGCAATTTTGGGAA
GAATAGTAACTCACCCGAACAGTG/TAATGTGAATATGTCACTTACTAGA
GGAAAGAAGGCACTTGAAAAACATCTCTAAACCGTATAAAAACAATTACA
TCATAATGATGAAAACCCAAGGAATTTTTTTAGAAAACATTACCAGGGCT
AATAACAAAGTAGAGCCACATGTCATTTATCTTCCCTTTGTGTCTGTGTG
AGAATTCTAGAGTTATATTTGTACATAGCATGGAAAAATGAGAGGCTAGT
TTATCAACTAGTTCATTTTTAAAAGTCTAACACATCCTAGGTATAGGTGA
ACTGTCCTCCTGCCAATGTATTGCACATTTGTGCCCAGATCCAGCATAGG
GTATGTTTGCCATTTACAAACGTTTATGTCTTAAGAGAGGAAATATGAAG
AGCAAAACAGTGCATGCTGGAGAGAGAAAGCTGATACAAATATAAAT/GA
AACAATAATTGGAAAAATTGAGAAACTACTCATTTTCTAAATTACTCATG
TATTTTCCTAGAATTTAAGTCTTTTAATTTTTGATAAATCCCAATGTGAG
ACAAGATAAGTATTAGTGATGGTATGAGTAATTAATATCTGTTATATAAT
ATTCATTTTCATAGTGGAAGAAATAAAATAAAGGTTGTGATGATTGTTGA
TTATTTTTTCTAGAGGGGTTGTCAGGGAAAGAAATTGCTTTTT
SNPs: 1 per 300 bases, 10 million across the genome
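In the sequence above, each SNP appears as two bases separated by a slash (for example `C/A`). A minimal sketch of how such an annotated string could be parsed is given below; it assumes the slash notation marks the two alleles at a single position, and the function names are illustrative only.

```python
import re

# Two bases separated by a slash, e.g. "C/A", mark one biallelic SNP.
SNP_PATTERN = re.compile(r"([ACGT])/([ACGT])")

def find_snps(annotated):
    """Return (position, allele1, allele2) for each SNP.

    Positions are 0-based coordinates in the plain sequence, that is,
    the sequence with every "X/Y" collapsed to the single base X.
    """
    snps = []
    for k, match in enumerate(SNP_PATTERN.finditer(annotated)):
        # Each earlier SNP contributed two extra characters ("/Y").
        position = match.start() - 2 * k
        snps.append((position, match.group(1), match.group(2)))
    return snps

def reference_sequence(annotated):
    """Collapse every "X/Y" to its first allele X."""
    return SNP_PATTERN.sub(r"\1", annotated)

# Short fragment copied from the sequence on this slide.
fragment = "GAGACGGAGTTTC/ACT"
snps_in_fragment = find_snps(fragment)
```

Dividing the length of the plain sequence by the number of SNPs found gives the local SNP density, which can be compared with the 1-per-300-bases figure quoted on the slide.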
16. Mapping the Relationships Among SNPs
Christensen and Murray, N Engl J Med 2007; 356:1094-1097.
17. Chromosome 9p21 Region Associated with MI
Samani N et al, N Engl J Med 2007; 357:443-453.
18. Distances Among East Coast Cities
              Boston  Providence  New York  Philadelphia  Baltimore
Providence    59
New York      210     152
Philadelphia  320     237         86
Baltimore     430     325         173       87
Washington    450     358         206       120           34
19. Distances Among East Coast Cities
[Same distance table, with a key of distance ranges: < 100, 101-200, 201-300, 301-400, > 400]
21-22. Distances Among East Coast Cities
[Figure: axis labels Boston, Providence, New York, Philadelphia, Baltimore, Washington]
23. One Tag SNP May Serve as Proxy for Many
[Figure labels: Block 1, Block 2; SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, SNP7, SNP8]
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CGGATTGCTGCATGGATCGCATCTGTAAGCAC
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CAGATCGCTGGATGAATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAC
25. One Tag SNP May Serve as Proxy for Many
[Figure labels: Block 1, Block 2; SNP3, SNP5, SNP6, SNP7, SNP8]
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CGGATTGCTGCATGGATCGCATCTGTAAGCAC
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CAGATCGCTGGATGAATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAC
26. One Tag SNP May Serve as Proxy for Many
[Figure labels: Block 1, Block 2; SNP3, SNP6, SNP8]
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CGGATTGCTGCATGGATCGCATCTGTAAGCAC
- CAGATCGCTGGATGAATCGCATCTGTAAGCAT
- CAGATCGCTGGATGAATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAT
- CGGATTGCTGCATGGATCCCATCAGTACGCAC
27. One Tag SNP May Serve as Proxy for Many
Block 1  Block 2  Singleton  Frequency (%)
G        T        T          35
C        T        C          30
G        T        T          10
G        A        T          8
C        A        T          7
C        A        C          6
other haplotypes             4
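The tag-SNP idea in these slides can be checked directly on the six example sequences: sites whose alleles always travel together across the haplotypes carry the same information, so one site per group is enough. The sketch below is a simplified illustration that groups only perfectly correlated sites; the function names are hypothetical, and real tag selection works from pairwise r2 thresholds in much larger samples.

```python
# The six example haplotypes shown on the tag-SNP slides.
HAPLOTYPES = [
    "CAGATCGCTGGATGAATCGCATCTGTAAGCAT",
    "CGGATTGCTGCATGGATCGCATCTGTAAGCAC",
    "CAGATCGCTGGATGAATCGCATCTGTAAGCAT",
    "CAGATCGCTGGATGAATCCCATCAGTACGCAT",
    "CGGATTGCTGCATGGATCCCATCAGTACGCAT",
    "CGGATTGCTGCATGGATCCCATCAGTACGCAC",
]

def variable_sites(haps):
    """0-based positions where the haplotypes do not all share one base."""
    return [i for i in range(len(haps[0])) if len({h[i] for h in haps}) > 1]

def group_perfectly_correlated(haps):
    """Group variable sites that split the haplotypes in exactly the same way."""
    groups = {}
    for i in variable_sites(haps):
        first_allele = haps[0][i]
        # For a biallelic site this pattern records which haplotypes
        # carry the same allele as the first haplotype.
        pattern = tuple(h[i] == first_allele for h in haps)
        groups.setdefault(pattern, []).append(i)
    return list(groups.values())

def tag_haplotypes(haps, tag_sites):
    """Read each haplotype at the chosen tag sites only."""
    return ["".join(h[i] for i in tag_sites) for h in haps]

groups = group_perfectly_correlated(HAPLOTYPES)
# One site per group; these choices correspond to SNP3, SNP6 and SNP8.
tags = tag_haplotypes(HAPLOTYPES, [10, 23, 31])
```

Here eight variable sites collapse into three groups, and reading each sequence at one site per group yields the three-letter haplotypes listed with their frequencies.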
28. Pair-Wise Linkage Disequilibrium (LD) Measures
Name                       Symbol  Definition
"Lewontin's D"             D       $D = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$
"D prime"                  D'      $D' = D / \max(D)$
Correlation ("r-squared")  r²      $r^2 = D^2 / (p_A\,p_a\,p_B\,p_b)$
For a discussion and comparison of these LD measures, see Devlin B, Risch N, Genomics 1995; 29:311-22.
Courtesy K. Jacobs, NCI
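The three measures in the table can be computed from the four haplotype frequencies of a pair of biallelic SNPs. The sketch below is a minimal illustration: the function name is hypothetical, it reports D' as an absolute value so that it runs from 0 to 1, and it assumes both SNPs are polymorphic (otherwise r2 is undefined).

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) from the four haplotype frequencies.

    Assumes the frequencies sum to 1 and both SNPs are polymorphic.
    """
    # Allele frequencies are the margins of the 2x2 haplotype table.
    p_A = p_AB + p_Ab
    p_a = p_aB + p_ab
    p_B = p_AB + p_aB
    p_b = p_Ab + p_ab

    D = p_AB * p_ab - p_Ab * p_aB

    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, d_prime, r2

# Only two haplotypes exist and allele frequencies match: D' = 1 and r^2 = 1.
perfect = ld_measures(0.5, 0.0, 0.0, 0.5)
# Three haplotypes exist and allele frequencies differ: D' = 1 but r^2 < 1.
partial = ld_measures(0.4, 0.1, 0.0, 0.5)
```

The second call illustrates why the two measures can disagree: D' reaches 1 whenever one of the four haplotypes is absent, whereas r2 reaches 1 only when the two SNPs also have identical allele frequencies.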
29. Two Measures of LD: D' and r²
- D' varies from 0 (complete equilibrium) to 1 (complete disequilibrium)
- When D' = 0, typing one SNP provides no information on the other SNP
- D' does not adequately account for allele frequencies; r², the squared correlation between SNPs, is the preferred measure
- When r² = 1, two SNPs are in perfect LD: allele frequencies are identical for both SNPs, and typing one SNP provides complete information on the other
30. What can LD do for me?
- Knowledge of patterns of LD can be quite useful in the design and analysis of genetic data
- Design
  - Estimation of theoretical power to detect associations
  - Evaluation of degree of completeness of sampling of genetic variants
  - Choice of most informative genetic variants to genotype
  - Sample size must increase by a factor of 1/r² to achieve the same power to detect an association through SNP2 as through SNP1
Courtesy K. Jacobs, NCI
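The 1/r2 rule in the last bullet is simple arithmetic. The sketch below assumes a hypothetical study that would need `n_direct` subjects if the associated SNP itself were genotyped, and returns the number needed when a correlated proxy SNP is genotyped instead; the function name and the numbers are illustrative only.

```python
import math

def sample_size_with_proxy(n_direct, r2):
    """Subjects needed when genotyping a proxy SNP with the given r^2.

    n_direct is the sample size that would suffice if the associated
    SNP itself were typed; the requirement scales by 1 / r^2.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be greater than 0 and at most 1")
    return math.ceil(n_direct / r2)

# Hypothetical example: 1,000 subjects suffice for the directly typed SNP.
n_half = sample_size_with_proxy(1000, 0.5)      # proxy with r^2 = 0.5
n_quarter = sample_size_with_proxy(1000, 0.25)  # proxy with r^2 = 0.25
```

A proxy in perfect LD (r2 = 1) costs nothing in sample size, while a weak proxy quickly makes a study impractical.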
31. Association Signal for Coronary Artery Disease on Chromosome 9
Samani N et al, N Engl J Med 2007; 357:443-453.
32. Region of Chromosome 1 Showing Strong Association with Inflammatory Bowel Disease
Duerr R et al, Science 2006; 314:1461-63.
33. LD Patterns in TCF7L2 Association Region
Grant et al, Nat Genet 2006; 38:320-23.
34. LD in Three HapMap Populations
International HapMap Consortium, Nature 2005; 437:1299-1320.
35. A HapMap for More Efficient Association Studies: Goals
- Use just the density of SNPs needed to find associations between SNPs and diseases
- Do not miss chromosomal regions with disease association
- Produce a tool to assist in finding genes affecting health and disease
- Ancestral populations differ in their degree of LD: recent African ancestry populations are older and have shorter stretches of LD, so they need more SNPs for complete genome coverage
36. SNPs as Gateway to Genome-Wide Association (GWA) Studies
- SNPs are much more numerous than other markers and easier to assay
- Genome-wide studies attempt to capture the majority of genomic variation (10M SNPs!)
- Variation is inherited in groups, or blocks, so not all 10 million points have to be tested
- Blocks are shorter (so more points need to be tested) the less closely people are related
- SNP technology allows studies in unrelated persons, assuming 5 kb to 10 kb lengths in common (300,000 to 1,000,000 markers)
37. www.hapmap.org
International HapMap Consortium, Nature 2005; 437:1299-1320.
38. www.hapmap.org
International HapMap Consortium, Nature 2007; 449:851-861.
39. Progress in Genotyping Technology
[Figure: cost per genotype (cents, USD; axis ticks 1, 10, 10^2) against number of SNPs (axis ticks 1 to 10^6), 2001 to 2005. Platforms shown: ABI TaqMan, ABI SNPlex, Illumina Golden Gate, Affymetrix MegAllele, Affymetrix 10K, Illumina Infinium/Sentrix, Perlegen, Affymetrix 100K/500K]
Courtesy S. Chanock, NCI
40. Continued Progress in Genotyping Technology
[Figure: cost per person (USD), July 2005 to Oct 2006. Platforms shown: Affymetrix 500K, Illumina 550K, Illumina 650Y, Illumina 317K]
Courtesy S. Gabriel, Broad/MIT
41-49. Cost of a Genome-Wide Association Study in 2,000 People
Year  Number of SNPs  Cost/SNP    Cost/Study
2001  10,000,000      $1.00       $20 billion
2008  1,000,000       0.05 cents  $1 million
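The Cost/Study column is the product of number of SNPs, cost per SNP, and number of people. The sketch below reproduces both rows under one stated assumption about units: the 2001 cost per SNP is read as 1.00 dollars and the 2008 cost as 0.05 cents (0.0005 dollars), since those are the units under which the totals of 20 billion and 1 million dollars come out. The function name is hypothetical.

```python
from fractions import Fraction

def study_cost_usd(n_snps, cost_per_snp_usd, n_people=2000):
    """Total genotyping cost in dollars: SNPs x cost per SNP x people.

    cost_per_snp_usd is passed as a decimal string so the arithmetic
    stays exact (no floating-point rounding).
    """
    return n_snps * Fraction(cost_per_snp_usd) * n_people

# 2001: 10 million SNPs at 1.00 dollars each (assumed unit: dollars).
cost_2001 = study_cost_usd(10_000_000, "1.00")
# 2008: 1 million SNPs at 0.05 cents each, i.e. 0.0005 dollars (assumed unit: cents).
cost_2008 = study_cost_usd(1_000_000, "0.0005")
```

Per person, these assumptions correspond to 10 million dollars of genotyping in 2001 against 500 dollars in 2008.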
50. Coverage (% SNPs tagged at r² > 0.8) of Commercial Genotyping Platforms
                            HapMap population sample
Platform                    YRI  CEU  CHB+JPT
Affymetrix GeneChip 500K    46   68   67
Affymetrix SNP Array 6.0    66   82   81
Illumina HumanHap300        33   77   63
Illumina HumanHap550        55   88   83
Illumina HumanHap650Y       66   89   84
Perlegen 600K               47   92   84
Manolio et al, J Clin Invest 2008; 118:1590-605.
51. Following the Polymorphism Literature
- Sometimes named for
  - amino acid change (AGT M235T)
  - nucleotide sequence (AGTR1 A1166C)
  - promoter (AGT -6 G/A)
  - restriction enzyme site (XbaI, PvuII, HindIII)
  - gene product (APOE ε2)
  - legacy system (DRB1*0104)
  - reference SNP (rs709932) or submitted SNP (ss1487247)
- Good sources for information: OMIM, HUGO, dbSNP, UCSC Genome Browser
Courtesy S. Chanock, NCI
52. Other Genomic Technologies
- Sequencing: measure variation at every point in a gene or candidate region in dozens to hundreds of people to find functional variants
- Gene expression: measure changes in mRNA (transcribed) in cases and controls or in response to stimulation
- Epigenetics: measure DNA methylation or histone deacetylation that turns genes on and off
53. Sidney Harris, http://www.sciencecartoonsplus.com/gallery.htm.
54. Summary Points: Genotyping Methods
- Unbelievably rapid progress from a small number of blood group markers to >10M SNPs, CNVs, structural variants, sequence variants
- Technology will continue to change and will be a challenge to keep up with; it is difficult to know when it is ready to apply to population studies
- SNPs are currently the dominant technology (more to come in Lecture 4)
- Quality control is a major issue
55. Familial Resemblance?
http://en.wikipedia.org/wiki/Image:Kennedy_bros.jpg#file
56. Evidence for Genetic Influence on Disease or Trait from Family Data
- Familial resemblance: trait more similar among related than unrelated persons
- Familial clustering: risk of disease in relative of case > risk in relative of non-case or of general population (sibling relative risk, Risch's λS)
- Distributions of continuous trait: mixtures of distributions or commingling analysis
57. Sibling Relative Risk of Living to Age 90: Centenarians vs. Those Dying at Age 73
Perls TT et al, Lancet 1998; 351:1560.
58. Large Representative Pedigree Showing 69 Patients with Atrial Fibrillation
Arnar et al, Europ Heart J 2006; 27:708-12.
59. Strength of Extensive Genealogies
- Common diseases do not show Mendelian inheritance patterns
- Affected siblings are infrequent in common diseases, but many patients may have more distant relatives with the same disease
Degree of Relatives  Risk Ratio  95% CI      P-Value
1                    1.77        1.67, 1.88  < 0.001
2                    1.36        1.27, 1.44  < 0.001
3                    1.18        1.14, 1.23  < 0.001
4                    1.10        1.06, 1.13  < 0.001
5                    1.05        1.02, 1.07  < 0.001
Arnar et al, Europ Heart J 2006; 27:708-12.
60. Familial Correlations
- Phenotypic resemblance among relatives estimated by regression of one relative's value (offspring) on that of another (parent)
- $Y_o = \mu + \beta\,(Y_m + Y_f)/2 + e$
- Twice the parent-offspring correlation is an estimate of heritability
- If a trait is under genetic control, expect trait correlations among closer relatives to be greater than those among more distant relatives
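The regression of offspring value on the mean of the two parental values can be fitted by ordinary least squares. The sketch below is a minimal pure-Python version; the function names and all trait values are hypothetical (the offspring values were constructed to lie exactly on a line with slope 0.7), so it illustrates the mechanics only. The slope of an offspring-on-midparent regression is commonly read as a heritability estimate.

```python
def midparent_values(mothers, fathers):
    """Average of maternal and paternal trait values, (Ym + Yf) / 2."""
    return [(m + f) / 2 for m, f in zip(mothers, fathers)]

def least_squares_fit(x, y):
    """Return (mu, beta) for the line y = mu + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    mu = mean_y - beta * mean_x
    return mu, beta

# Hypothetical trait values for four families.
mothers = [160.0, 165.0, 170.0, 158.0]
fathers = [174.0, 181.0, 178.0, 170.0]
midparents = midparent_values(mothers, fathers)
# Offspring constructed as 50 + 0.7 * midparent, with no residual term e.
offspring = [50.0 + 0.7 * mp for mp in midparents]

mu, beta = least_squares_fit(midparents, offspring)
```

With real data the residual term e is not zero, so the fitted slope carries sampling error and should be reported with a confidence interval.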
61. Familial Correlations of Sex-Specific LV Mass, Multiply-Adjusted
Relative Pair     Pairs (n)  Correlation  Expected
Spouse            855        0.05         0
Parent-offspring  662        0.15         0.5
Sibling           1,486      0.16         0.5
Avuncular         369        0.06         0.25
after Post W et al, Hypertension 1997; 30:1025-1028.
62. Assessing Familial and Genetic Nature of a Phenotypic Trait: Heritability
- Often designated as H, h², or σ²G/σ²P
- Proportion of total inter-individual variation in the trait (σ²P), or phenotypic variation, attributable to genetic variation (σ²G)
- Population- and environment-specific parameter
- Its value, high or low, does not indicate the role of genes in any specific individual
- Does allow one to predict the expected degree of familial aggregation of a trait
- Traits with high heritability should prove fruitful in identifying trait-related genes
63. Genetic Basis of Familial Clustering of Plasma ACE Activity
                                 Major Gene Effect
Relative  N    Mean (u/L)  Mean (u/L)  Variance
Fathers   87   34.1        4.8         29
Mothers   87   30.7        4.0         29
Siblings  169  43.1        10.8        75
Cambien F, et al., Am J Hum Genet 1988; 43:774-780.
64. Estimated Heritability Explained by GWA Findings to Date
Trait      Estimated GWA σ²G  Estimated Total σ²G  Reference
Height     3                  90                   Weedon, Nat Genet 2008
T2DM       λs 1.07            λs 3.5               Zeggini/Scott, Science 2007
CRP        ? 10.5             30-50                Reiner/Ridker, Nat Genet 2008
Psoriasis  9 @ 1.3 OR         λs 4-11              Liu, PLoS Genet 2008
NHGRI GWA Catalog, www.genome.gov/GWAstudies
65. Hardy-Weinberg Equilibrium
- Occurrences of the two alleles of a SNP in the same individual are two independent events
- Ideal conditions
  - random mating
  - no selection (equal survival)
  - no migration
  - no mutation
  - no inbreeding
  - large population sizes
  - gene frequencies equal in males and females
- If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are
  Freq AA = $p^2$
  Freq Aa = $2p(1-p)$
  Freq aa = $(1-p)^2$
After G. Thomas, NCI
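The expected genotype frequencies follow directly from the allele frequency p. The sketch below computes them and, as an addition not shown on the slide, compares observed genotype counts with the expected counts using a one-degree-of-freedom chi-square statistic, a common way to check genotype data against Hardy-Weinberg proportions. The function names and genotype counts are hypothetical, and the test assumes all three expected counts are greater than zero.

```python
import math

def hwe_expected_freqs(p):
    """Expected frequencies of AA, Aa and aa when allele A has frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic (1 df) and p-value for departure from HWE."""
    n = n_AA + n_Aa + n_aa
    # Allele frequency estimated by counting alleles: each person carries two.
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [freq * n for freq in hwe_expected_freqs(p)]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts with too few heterozygotes for p = 0.5.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

In genotyping quality control, a marked departure from Hardy-Weinberg proportions among controls is often treated as a warning sign of genotyping error rather than as biology.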
66. Summary Points: Familial Clustering
- Indicator of possible genetic influence
- May over-estimate genetic component due to poor assessment of and adjustment for shared environment
- Methods include twin studies, parent-offspring correlation, relative risk among relatives, variance explained
- Genes currently identified for complex diseases explain only a tiny fraction of total heritability
67. Larson, G. The Complete Far Side. 2003.
69. Basic Definitions: Loci, Genes, Alleles
- Locus: Place on a chromosome where a specific gene or set of markers resides
- Quantitative trait locus (QTL): a genetic factor believed to influence a quantitative trait such as blood pressure, lipoprotein levels, etc.
- Gene: Contiguous piece of DNA that can contain information to make or modify expression of specific protein(s)
- Allele: A variant form of a DNA sequence at a particular locus on a chromosome
- Candidate gene: Gene believed to influence expression of complex phenotypes due to known biologic properties of its products
After S. Chanock, NCI
70. Basic Definitions: Parts of a Gene
- Exon: a DNA sequence that usually specifies the sequence of amino acids in translation
- Intron: an intervening DNA sequence that is removed from mRNA after transcription and thus does not encode protein in translation
- Splice site: Junction of intron and exon
- Promoter: region of DNA to which an RNA polymerase binds and initiates transcription; the promoter regulates gene expression by controlling the amount of mRNA transcribed
- Polymorphism: Variation in the sequence of DNA among individuals
After S. Chanock, NCI
71. SNPs and Function: We Know So Little
- Majority are silent
  - No known functional change
- Some alter gene expression/regulation
  - Promoter/enhancer/silencer
  - mRNA stability
  - Small RNAs
- Some alter function of gene product
  - Change sequence of protein
Courtesy S. Chanock, NCI
72. SNPs within Genes
- Coding SNPs (cSNPs)
  - Synonymous: no change in amino acid
    - previously termed silent, but...
    - Can alter mRNA stability: DRD2 (Duan et al 2002)
    - Can alter speed of translation and protein folding: MDR1 (Gottesman et al 2007)
  - Nonsynonymous: changes amino acid (codon)
    - conservative and radical
  - Nonsense: insertion of stop codon
  - Frameshift (insertion/deletion): disrupts codon sequence, rare but disruptive
After S. Chanock, NCI
73. SNPs Outside Genes
- Majority distributed throughout genome are silent (excellent as markers)
- Alter transcription
  - Promoter, enhancer, silencer
- Regulate expression
  - Locus control region, mRNA stability
- Most are assumed to be silent hitchhikers
  - No function by predictive models or analysis
Courtesy S. Chanock, NCI
74. Sample Collection and Processing
- Obtaining samples for DNA preparation
  - whole blood, buffy coat
  - sputum
  - buccal cells
  - serum, urine
  - pathology specimens
  - placenta, excreta, other
- Purifying and quantifying DNA
- Transformed lymphocytes
- Whole genome amplification (WGA)
- Barcode individual DNAs (QC)
After S. Chanock, NCI