Linkage Disequilibrium - PowerPoint PPT Presentation

1
Linkage Disequilibrium
  • Joe Mychaleckyj
  • Center for Public Health Genomics
  • 982-1107
  • jcm6t_at_virginia.edu

2
Today we'll cover
  • Haplotypes
  • Linkage Disequilibrium
  • Visualizing LD
  • HapMap

3
References
Principles of Population Genetics, Fourth Edition (Hardcover), by Daniel L. Hartl and Andrew G. Clark
Genetic Data Analysis II, by Bruce S. Weir
4
References
Statistical Genetics: Gene Mapping Through Linkage and Association. Eds. Benjamin M. Neale, Manuel A.R. Ferreira, Sarah E. Medland, Danielle Posthuma
5
Haplotype: specific combination of alleles occurring (cis) on the same chromosome (or segment of chromosome)
SNP1: A / T    SNP2: C / G    SNP3: A / G
Example haplotypes: A C G, A C A, T G G
N SNPs: how many haplotypes are possible?
2^N (i.e. very large diversity possible)
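The count of possible haplotypes can be checked by brute-force enumeration. This is a minimal sketch using the three SNPs named on the slide; the function name `possible_haplotypes` is illustrative only.

```python
from itertools import product

# Alleles at each SNP as listed on the slide: SNP1 A/T, SNP2 C/G, SNP3 A/G.
snp_alleles = [("A", "T"), ("C", "G"), ("A", "G")]


def possible_haplotypes(alleles_per_snp):
    """Return every allele combination across the given SNPs."""
    return ["".join(combo) for combo in product(*alleles_per_snp)]


haplotypes = possible_haplotypes(snp_alleles)
print(len(haplotypes))  # 8, i.e. 2 ** 3 for three biallelic SNPs
print(haplotypes)
```

Each added biallelic SNP doubles the count, so even a few dozen SNPs allow far more haplotypes than are actually observed.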
6
Terminology
  • Haplotype: specific combination (phasing) of
    alleles occurring (cis) on the same chromosomal
    segment
  • Linkage/Linked Markers: physical co-location of
    markers on the same chromosome
  • Diplotype: haplogenotype, i.e. a pair of phased
    haplotypes, one maternally and one paternally
    inherited

7
SNP1: A / a
SNP2: B / b
Major allele freq: p(A), p(B)
Minor allele freq: p(a), p(b)
Independently segregating SNPs: haplotype frequency p(ab) = p(a) x p(b)
LINKAGE EQUILIBRIUM (How many haplotypes in total?)
LINKAGE DISEQUILIBRIUM: haplotype frequency p(ab) ≠ p(a) x p(b)
8
Linkage Disequilibrium
  • Non-random assortment of alleles at 2 (or more)
    loci
  • The closer the markers, the stronger the LD since
    recombination will have occurred at a low rate
  • Markers co-segregate within and between families

9
LINKAGE EQUILIBRIUM
Not a Punnett Square!
                  SNP2 allele B    SNP2 allele b    Total
SNP1 allele A     p(A)p(B)         p(A)p(b)         p(A)
SNP1 allele a     p(a)p(B)         p(a)p(b)         p(a)
Total             p(B)             p(b)             1
Example: p(A)p(B) + p(a)p(B) = p(B) [p(A) + p(a)] = p(B)
10
SNP1: A / a
SNP2: B / b
Major allele freq: p(A), p(B)
Minor allele freq: p(a), p(b)
LINKAGE DISEQUILIBRIUM: haplotype frequency p(ab) = p(a)p(b) + D
(sign of D is generally arbitrary, unless comparing D values between populations or studies)
D = Lewontin's LD parameter (Lewontin 1960)
11
LINKAGE DISEQUILIBRIUM
                  SNP2 allele B    SNP2 allele b    Total
SNP1 allele A     p(A)p(B) + D     p(A)p(b) - D     p(A)
SNP1 allele a     p(a)p(B) - D     p(a)p(b) + D     p(a)
Total             p(B)             p(b)             1
Example: [p(A)p(B) + D] + [p(a)p(B) - D] = p(B) [p(A) + p(a)] = p(B)
12
What is the LD? D ≠ 0 if p(ab) ≠ p(a)p(b)
            b              B              Total
a           0.16           0.04           p(a) = 0.20
A           0.14           0.66           p(A) = 0.80
Total       p(b) = 0.30    p(B) = 0.70
p(ab) = p(a)p(b) + D
0.16 = 0.2 x 0.3 + D
D = 0.1
Since p(ab) = p(a)p(b) + D was used, D is +ve here, but the sign is arbitrary, e.g. can relabel alleles A, B as minor
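The same arithmetic can be done from the four haplotype frequencies. This is a minimal sketch under the slide's convention p(ab) = p(a)p(b) + D; the function name `d_from_haplotypes` is an assumption made for illustration.

```python
def d_from_haplotypes(p_AB, p_Ab, p_aB, p_ab):
    """Return D under the convention p(ab) = p(a)p(b) + D."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = p_aB + p_ab  # marginal frequency of allele a at SNP1
    p_b = p_Ab + p_ab  # marginal frequency of allele b at SNP2
    return p_ab - p_a * p_b


# Haplotype frequencies from this slide's 2 x 2 table.
d = d_from_haplotypes(p_AB=0.66, p_Ab=0.14, p_aB=0.04, p_ab=0.16)
print(f"D = {d:.2f}")  # D = 0.10
```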
13
Range of D values (-ve to +ve)
D has a minimum and maximum value that depends on the allele frequencies of the markers
Since haplotype frequencies cannot be -ve:
p(aB) = p(a)p(B) - D ≥ 0, so D ≤ p(a)p(B)
p(Ab) = p(A)p(b) - D ≥ 0, so D ≤ p(A)p(b)
Both conditions must hold, so D ≤ min( p(a)p(B), p(A)p(b) )
p(ab) = p(a)p(b) + D ≥ 0, so D ≥ -p(a)p(b)
p(AB) = p(A)p(B) + D ≥ 0, so D ≥ -p(A)p(B)
Both conditions must hold, so D ≥ max( -p(a)p(b), -p(A)p(B) )
Similar equations if we had defined p(ab) = p(a)p(b) - D
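The limits on D can be computed directly from the two minor allele frequencies. This is a small sketch assuming the convention p(ab) = p(a)p(b) + D; the helper name `d_bounds` is hypothetical.

```python
def d_bounds(p_a, p_b):
    """Return (lowest, highest) feasible D for allele frequencies p(a), p(b)."""
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    lowest = max(-p_a * p_b, -p_A * p_B)   # keeps p(ab) and p(AB) non-negative
    highest = min(p_a * p_B, p_A * p_b)    # keeps p(aB) and p(Ab) non-negative
    return lowest, highest


lowest, highest = d_bounds(p_a=0.2, p_b=0.3)
print(f"{lowest:.2f} <= D <= {highest:.2f}")  # -0.06 <= D <= 0.14
```

With p(a) = 0.2 and p(b) = 0.3, a value such as D = 0.1 falls inside this range.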
14
Limits of D LD Parameter
  • Limits of D are a function of allele frequencies
  • Standardize D by rescaling to a proportion of
    its maximal value for the given allele
    frequencies (D')
  • D' = D / Dmax

15
D' (Lewontin, 1964)
  • D' = D / Dmax
  • Dmax = min( p(A)p(B), p(a)p(b) ) if D < 0
  • Dmax = min( p(A)p(b), p(a)p(B) ) if D > 0
  • Again, sign of D' depends on definition
  • D' = 1 or -1 if one of the haplotype frequencies p(AB), p(Ab),
    p(aB), p(ab) = 0
  • Complete LD (i.e. only 3 haplotypes seen)
  • D' = 1 or -1 suggests that no recombination has
    taken place between markers
  • Beware rare markers - may not have enough
    power/sample size to detect 4th haplotype

16
D' Interpretation
Example 1:
            b              B              Total
a           0.06           0.14           p(a) = 0.20
A           0.24           0.56           p(A) = 0.80
Total       p(b) = 0.30    p(B) = 0.70
D = 0, Dmax undefined
Example 2:
            b              B              Total
a           0.2            0              p(a) = 0.20
A           0.1            0.7            p(A) = 0.80
Total       p(b) = 0.30    p(B) = 0.70
D = Dmax = 0.14, D' = 1
p(a) = 0.2, p(b) = 0.3
D' = 1 (perfect LD using the D' measure): no recombination between markers, only 3 haplotypes are seen
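Both examples on this slide can be reproduced with a few lines of code. This is a sketch only: the name `d_prime` is illustrative, and returning 0.0 when D = 0 is an assumption made here (the slide treats Dmax as undefined in that case).

```python
def d_prime(p_ab, p_a, p_b):
    """Return D' = D / Dmax, with D defined by p(ab) = p(a)p(b) + D."""
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    d = p_ab - p_a * p_b
    if abs(d) < 1e-12:
        # Assumption: report D' = 0 at linkage equilibrium.
        return 0.0
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    return d / d_max


print(f"{d_prime(p_ab=0.06, p_a=0.2, p_b=0.3):.2f}")  # 0.00 (no LD)
print(f"{d_prime(p_ab=0.20, p_a=0.2, p_b=0.3):.2f}")  # 1.00 (aB haplotype absent)
```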
17
Creation of LD
  • Easiest to understand when markers are physically
    linked
  • Creation of LD
  • Mutation
  • Founder effect
  • Admixture
  • Inbreeding / non-random mating
  • Selection
  • Population bottleneck or stratification
  • Epistatic interaction
  • LD can occur between unlinked markers
  • Gametic phase disequilibrium is a more general
    term

18
[Figure: haplotypes at SNP1 and SNP2 (alleles A/a and B/b), showing n = 2 and n = 3 haplotypes and the effect of recombination]
19
Destruction of LD
  • Main force is recombination
  • Gene conversion may also act at short distances
    (100-1,000 bases)
  • LD decays over time (generations of
    interbreeding)

20
SNP1, SNP2
Probability recombination occurs = θ
Probability recombination does not occur = 1 - θ
Initial LD between SNP1 and SNP2: D0
After 1 generation, preservation of LD: D1 = D0 (1 - θ)
After t generations: Dt = D0 (1 - θ)^t
NB: Overly simple model - does not account for allele frequency drift over time
21
Dt = D0 (1 - θ)^t
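The decay curve is easy to tabulate. This is a minimal sketch of the simple model on these slides, writing the per-generation recombination probability as `theta`; the function names and the example value theta = 0.01 are assumptions chosen for illustration.

```python
import math


def ld_after(d0, theta, t):
    """LD remaining after t generations: D_t = D_0 * (1 - theta) ** t."""
    return d0 * (1.0 - theta) ** t


def generations_to_halve(theta):
    """Generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - theta)


# Hypothetical inputs: D0 = 0.14 and a 1% recombination probability.
for t in (0, 10, 100, 1000):
    print(t, ld_after(d0=0.14, theta=0.01, t=t))
print(generations_to_halve(0.01))
```

Like the slide's formula, this ignores drift in allele frequencies, so it shows only the expected trend.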
22
r^2 LD Parameter (Hill & Robertson, 1968)
r^2 = D^2 / [ p(a)p(b)p(A)p(B) ]
  • Squared correlation coefficient varies 0 - 1
  • Frequency dependent
  • Better LD measure for allele correlation between
    markers - predictive power of SNP1 alleles for
    those at SNP2
  • Used extensively in disease gene or phenotype
    mapping through association testing

23
r^2 Interpretation
Example 1:
            b              B              Total
a           0.06           0.14           p(a) = 0.20
A           0.24           0.56           p(A) = 0.80
Total       p(b) = 0.30    p(B) = 0.70
D = 0, Dmax undefined
r^2 = 0
Example 2:
            b              B              Total
a           0.2            0              p(a) = 0.20
A           0.1            0.7            p(A) = 0.80
Total       p(b) = 0.30    p(B) = 0.70
D = Dmax = 0.14, D' = 1
r^2 = 0.14/0.24 = 0.58
p(a) = 0.2, p(b) = 0.3
r^2 ≠ 1: correlation is not perfect, even though D' = 1
r^2 = 1 if D' = 1 and p(a) = p(b) = 0.3
24
r^2 Interpretation
p(a) = 0.3
p(b) = 0.3
Only 2 haplotypes: r^2 = 1, correlation is perfect
D' = 1 (fewer than 4 haplotypes), p(a) = p(b) (= 0.3 in this example)
  • r^2 = 1 when there is perfect correlation between
    markers and one genotype predicts the other
    exactly
  • Only 2 haplotypes present
  • D' = 1 does not imply r^2 = 1
  • No recombination AND markers must have identical
    allele frequency
  • SNPs are of similar age
  • Corollary
  • Low r^2 values do not necessarily mean high
    recombination
  • Discrepant allele frequencies
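The three r^2 cases discussed on these slides can be checked numerically. This is a sketch using the Hill and Robertson definition quoted earlier; the function name `r_squared` is an illustrative choice.

```python
def r_squared(p_ab, p_a, p_b):
    """Return r^2 = D^2 / (p(a) p(b) p(A) p(B)), with D = p(ab) - p(a)p(b)."""
    p_A, p_B = 1.0 - p_a, 1.0 - p_b
    d = p_ab - p_a * p_b
    return d * d / (p_a * p_b * p_A * p_B)


# No LD: p(ab) equals p(a)p(b).
print(f"{r_squared(p_ab=0.06, p_a=0.2, p_b=0.3):.2f}")  # 0.00
# D' = 1 but unequal allele frequencies: correlation is not perfect.
print(f"{r_squared(p_ab=0.20, p_a=0.2, p_b=0.3):.2f}")  # 0.58
# D' = 1 and p(a) = p(b): only 2 haplotypes, perfect correlation.
print(f"{r_squared(p_ab=0.30, p_a=0.3, p_b=0.3):.2f}")  # 1.00
```

The middle case shows why a low r^2 does not by itself indicate recombination: the allele frequencies differ, which caps r^2 below 1.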

25
Common Measures of Linkage Disequilibrium
-1 ≤ D' ≤ 1 (recombination)
0 ≤ r^2 ≤ 1 (correlation)
Other LD measures exist but are less commonly used
26
Visualizing LD metrics
27
[Figure: pairwise D' between SNP1-SNP6, shaded on a scale from 0 to 1.0]
Not usually worried about sign of D'
28
(No Transcript)
29
Haploview: TCN2 (r^2)
30
http://www.hapmap.org
Launched October 2002
31
International HapMap Project
  • Initiated Oct 2002
  • Collaboration of scientists worldwide
  • Goal: describe common patterns of human DNA
    sequence variation
  • Identify LD and haplotype distributions
  • Populations of different ancestry (European,
    African, Asian)
  • Identify common haplotypes and population-specific
    differences
  • Has had major impact on
  • Understanding of human population history as
    reflected in genetic diversity and similarity
  • Design and analysis of genetic association studies

32
HapMap samples
  • 90 Yoruba individuals (30 parent-parent-offspring
    trios) from Ibadan, Nigeria (YRI)
  • 90 individuals (30 trios) of European descent
    from Utah (CEU)
  • 45 Han Chinese individuals from Beijing (CHB)
  • 44 Japanese individuals from Tokyo (JPT)

33
Project feasible because of
  • The availability of the human genome sequence
  • Databases of common SNPs (subsequently enriched
    by HapMap) from which genotyping assays could be
    designed
  • Development of inexpensive, accurate technologies
    for high-throughput SNP genotyping
  • Web-based tools for storing and sharing data
  • Frameworks to address associated ethical and
    cultural issues

34
HapMap goals
  • Define patterns of genetic variation across human
    genome
  • Guide selection of SNPs efficiently to tag
    common variants
  • Public release of all data (assays, genotypes)
  • Phase I: 1.3 M markers in 269 people
  • 1 SNP/5 kb (1.3 M markers)
  • Minor allele frequency (MAF) > 5%
  • Phase II: 2.8 M markers in 270 people

35
http://www.hapmap.org/
36
(No Transcript)
37
(No Transcript)
38
HapMap publications
  • The International HapMap Consortium. A Haplotype
    Map of the Human Genome. Nature 437, 1299-1320.
    2005.
  • The International HapMap Consortium. The
    International HapMap Project. Nature 426,
    789-796. 2003.
  • The International HapMap Consortium. Integrating
    Ethics and Science in the International HapMap
    Project. Nature Reviews Genetics 5, 467-475.
    2004.
  • Thorisson, G.A., Smith, A.V., Krishnan, L., and
    Stein, L.D. The International HapMap Project Web
    site. Genome Research 15, 1591-1593. 2005.

39
ENCODE project
  • Aim: to compare the genome-wide resource to a
    more complete database of common variation, one in
    which all common SNPs and many rarer ones have
    been discovered and tested
  • Selected a representative collection of ten
    regions, each 500 kb in length
  • Each 500-kb region was sequenced in 48
    individuals, and all SNPs in these regions
    (discovered or in dbSNP) were genotyped in the
    complete set of 269 DNA samples

40
Comparison of linkage disequilibrium and
recombination for two ENCODE regions
Nature 437, 1299-1320. 2005
41
LD in Human Populations
42
Haplotype Blocks
N SNPs: 2^N haplotypes possible, i.e. very large diversity possible
But we do not see the full extent of haplotype diversity in human populations
Extensive LD, especially at short distances, e.g. 20 kb
Haplotypes are broken into blocks of markers with high mutual LD, separated by recombination hotspots
Non-uniform LD across genome
43
Haplotype Blocks
  • Haplotype blocks: at least 80% of observed
    haplotypes with frequency > 5% could be grouped
    into common patterns

Whole Genome Patterns of Common DNA Variation in
Three Human Populations, Science 2005, Hinds et
al.
44
Length of LD spans
r2
45
  • Example: large block of LD on chromosome 17
  • Cluster of common (frequent) SNPs in high LD
  • 518 SNPs, spanning 800 kb
  • 25% in EUR, 9% in AFR, missing in CHN
  • Genes
  • Microtubule-associated protein tau
  • Mutations associated with a variety of
    neurodegenerative disorders
  • Gene coding for a protease similar to
    presenilins
  • Mutations result in Alzheimer's disease
  • Gene for corticotropin-releasing hormone
    receptor
  • Immune, endocrine, autonomic, behavioral response
    to stress

46
Chromosome 17 LD Region
Prevalent inversion in EUR human population (25%)