Title: Recombination, and haplotype structure
1Recombination, and haplotype structure
- Simon Myers, Gil McVean
- Department of Statistics, Oxford
2The starting point
- We have a genome's worth of data on genetic
variation
- We wish to understand why the haplotype structure
looks how it does
- Differences between regions, populations
3Where do haplotypes come from?
- In the absence of recombination, the most natural
way to think about haplotypes is in terms of the
genealogical tree representing the history of the
chromosomes
- Tree affects mutation patterns
- Mutation patterns give information on tree
4What determines the shape of the tree?
Present day
5Ancestry of current population
Present day
6Ancestry of sample
Present day
7The coalescent: a model of genealogies
[Figure: a coalescent genealogy. Labels: most recent common ancestor (MRCA), coalescence, ancestral lineages, present day; axis: time]
8Simulating histories with the coalescent
9Simulating data with the coalescent
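As an illustration of what "simulating histories" and "simulating data" can mean in practice, here is a minimal sketch of the standard coalescent without recombination. The function name `simulate_coalescent`, the parameter `theta`, and the scaling (mutations at rate theta/2 per branch, time in coalescent units) are assumptions made for this example and are not taken from the slides.

```python
import random


def simulate_coalescent(n, theta, seed=None):
    """Simulate one coalescent genealogy for n chromosomes and scatter mutations on it.

    Time is in coalescent units, so k lineages coalesce at rate k(k-1)/2.
    Mutations fall on each branch at rate theta/2 (an assumed scaling).
    Returns (haplotypes, tmrca): 0/1 strings, one per chromosome, and the
    time to the most recent common ancestor.
    """
    rng = random.Random(seed)
    # Each ancestral lineage is stored as the set of sampled chromosomes below it.
    lineages = [frozenset([i]) for i in range(n)]
    sites = []  # one entry per mutation: the set of chromosomes carrying it
    tmrca = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)
        tmrca += wait
        if theta > 0:
            for lineage in lineages:
                # Poisson process along this branch segment: exponential gaps.
                position = rng.expovariate(theta / 2)
                while position < wait:
                    sites.append(lineage)
                    position += rng.expovariate(theta / 2)
        # Pick two lineages uniformly at random and merge them.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [x for i, x in enumerate(lineages) if i not in (a, b)]
        lineages.append(merged)
    haplotypes = ["".join("1" if i in s else "0" for s in sites) for i in range(n)]
    return haplotypes, tmrca
```

Calling `simulate_coalescent(6, 5.0, seed=1)` returns six 0/1 haplotype strings and the tree height. Because every mutation sits on a single branch of a single tree, no pair of sites in the output shows all four allele combinations.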
10Haplotype structure in the absence of
recombination
- In the absence of recombination, the shape of the
tree and where mutations fall on it determine
patterns of haplotype structure
- Two mutations on the same branch will be in
complete association; mutations on different
branches will have lower, and often low, association
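The slides do not name a measure of association. As one common choice (an assumption here), the squared correlation r^2 between two biallelic sites can be computed directly from haplotypes. The sketch below uses a toy data set in which sites 0 and 1 are carried by the same chromosomes, as if they arose on the same branch, while site 2 arose on a different branch.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation between derived alleles at sites i and j.

    haplotypes: list of equal-length 0/1 strings; both sites must be polymorphic.
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # the usual disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


# Toy example (made up for illustration): sites 0 and 1 share a branch.
toy = ["110", "110", "001", "000"]
same_branch = r_squared(toy, 0, 1)       # complete association
different_branch = r_squared(toy, 0, 2)  # weaker association
```

In this toy example the same-branch pair gives r^2 equal to 1 and the different-branch pair gives a value of about 1/3.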
11Haplotypes when there is recombination
- When there is no recombination, haplotype
structure reflects the age distribution of
mutations and the shape of the underlying tree
- When there is some recombination, every
nucleotide position has a tree, but the tree
changes along the chromosome at a rate determined
by the local recombination landscape
- By using SNP information to inform us about the
trees, we can learn about how quickly the tree
changes
- This relates to the recombination rate
12A bit of recombination shuffles genetic
variation
13Lots of recombination does lots of shuffling
14Recombination and haplotype diversity
- Without recombination, a new mutation can create
at most one new haplotype
- Any two mutations delineate at most 3 haplotypes
in total (ancestral, plus two new types)
- With recombination, a mutation can spread onto
every existing haplotype background, creating the
potential for more haplotypes
- For a given number of SNPs, a region with
recombination will tend to have (in comparison to
a region with no recombination):
- More haplotypes
- Less variance in the pairwise differences between
haplotypes
- Less skewed haplotype frequencies
15The ancestral recombination graph
- The combined history of recombination, mutation
and coalescence is described by the ancestral
recombination graph
[Figure: an ancestral recombination graph. Events labelled, in order: coalescence, mutation, coalescence, coalescence, mutation, coalescence, recombination]
16In humans, recombination is not uniformly
distributed
- Most recombination occurs in recombination
hotspots: short (1-2kb) regions every 50-100kb
that occupy at most 3% of the genome but probably
account for 90% or more of the recombination
- This means that haplotype structure in humans is
an interesting hybrid between the no
recombination and lots of recombination situations
17Learning about recombination
- Just as there is a true genealogy underlying a
sample of sequences without recombination, there
is a true ARG underlying samples of sequences
with recombination
- We can consider nonparametric and parametric ways
of learning about recombination
- There are useful nonparametric ways of learning
about recombination which we will consider first
- These really only apply to species, such as
humans, where we can be fairly sure that most
SNPs are the result of a single ancestral
mutation event
18The signal of recombination?
Ancestral chromosome recombines
Recurrent mutation
Recombination
19Detecting recombination from DNA sequence data
- Look for all pairs of incompatible sites
- Find the minimum number of intervals in which
recombination events must have occurred: Rm
(Hudson and Kaplan 1985)
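The two steps above can be sketched as follows, assuming biallelic sites coded 0/1 and no recurrent mutation. A pair of sites is incompatible when all four allele combinations are present (the four-gamete test). Each incompatible pair defines an interval that must contain a recombination event, and Rm is the largest number of such intervals that do not overlap. The function names are chosen for this example.

```python
from itertools import combinations


def incompatible(haplotypes, i, j):
    """Four-gamete test: True if sites i and j show all four allele combinations."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def hudson_kaplan_rm(haplotypes):
    """Lower bound Rm on the number of recombination events.

    haplotypes: list of equal-length 0/1 strings, sites in chromosomal order.
    """
    n_sites = len(haplotypes[0])
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if incompatible(haplotypes, i, j)
    ]
    # Greedy interval scheduling: scan by right end, keep an interval only
    # if it starts at or after the end of the last one kept.
    intervals.sort(key=lambda iv: iv[1])
    rm, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm
```

For example, `hudson_kaplan_rm(["00", "01", "10", "11"])` gives 1, while data that fit a single tree give 0.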
20Improving the detection algorithm
- Rm greatly underestimates the amount of
recombination in the history of a set of
sequences
- Myers and Griffiths (2003) developed an improved
way of detecting recombination events
- Without recombination, every new mutation can
create only a single new haplotype
- With recombination, mutations can be shuffled
between haplotype backgrounds, generating
haplotype diversity
- Each recombination makes at most one new
haplotype
- If I see H haplotypes with S segregating sites,
at least H-S-1 recombination events must have
occurred
- This offers potential to identify many more
recombination events
- Carefully combine bounds from different
collections of sites
- Dynamic programming algorithm makes computation
extremely fast
- Better (sometimes slower) algorithms developed
recently
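A simplified sketch of this idea follows. It computes the H-S-1 bound on every contiguous run of sites and combines the local bounds by dynamic programming. This is a deliberate simplification for illustration: it considers only contiguous runs rather than general collections of sites, so it should not be read as the published Myers and Griffiths algorithm. The function names are assumptions.

```python
from itertools import product


def haplotype_bound(haplotypes, start, stop):
    """H - S - 1 for sites start..stop inclusive, floored at zero."""
    sub = [h[start:stop + 1] for h in haplotypes]
    n_haplotypes = len(set(sub))
    n_segregating = sum(
        len({row[c] for row in sub}) > 1 for c in range(stop - start + 1)
    )
    return max(0, n_haplotypes - n_segregating - 1)


def composite_bound(haplotypes):
    """Combine local bounds: best[k] = max over j < k of best[j] + bound(j, k).

    Assumes at least one site. Events counted between sites j and k are
    disjoint from those counted before site j, so the bounds can be added.
    """
    n_sites = len(haplotypes[0])
    best = [0] * n_sites
    for k in range(1, n_sites):
        best[k] = max(
            best[j] + haplotype_bound(haplotypes, j, k) for j in range(k)
        )
    return best[-1]


# All eight haplotypes over three sites: H = 8, S = 3, so at least 4 events.
all_eight = ["".join(bits) for bits in product("01", repeat=3)]
bound = composite_bound(all_eight)
```

For these eight haplotypes the bound is 4, whereas counting non-overlapping incompatible pairs on the same data certifies only 2 events.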
21Problems with counting recombination events
Tree-pairs where we cannot see recombination
events
A tree-pair where we could see recombination
events, but don't
22Modelling recombination
- Model-based approaches to learning about
recombination allow us to ask more detailed
questions than nonparametric approaches
- What is the rate of recombination (as opposed to
just the number of events)?
- Does gene A have a higher recombination rate than
gene B?
- Is the rate of recombination across a region
constant?
- Where are the recombination hotspots?
- We can use coalescent-model approaches
(approximations) to calculate the likelihood of
arbitrary recombination maps given observed data
23Fitting a variable recombination rate
- Use a reversible-jump MCMC approach (Green 1995)
[Figure: blocks of hot and cold recombination rate along the SNP positions. Moves: split blocks, merge blocks, change block size, change block rate]
24Acceptance rates
The acceptance ratio is the product of:
- Ratio of priors
- Composite likelihood ratio
- Hastings ratio
- Jacobian of partial derivatives relating changes
in dimension to sampled random numbers
- Include a prior on the number of change points
that encourages smoothing
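For reference, the four labelled terms can be assembled in the general reversible-jump form of Green (1995). The notation is an assumption made here, not taken from the slides: theta is the current rate map and theta' the proposal, pi is the prior, L_C the composite likelihood, j the probability of choosing a move type, g the density of the random numbers u used to build the proposal.

```latex
\alpha(\theta \to \theta') = \min\left\{ 1,\;
  \underbrace{\frac{\pi(\theta')}{\pi(\theta)}}_{\text{ratio of priors}}
  \times
  \underbrace{\frac{L_C(\theta')}{L_C(\theta)}}_{\text{composite likelihood ratio}}
  \times
  \underbrace{\frac{j(\theta')\, g'(u')}{j(\theta)\, g(u)}}_{\text{Hastings ratio}}
  \times
  \underbrace{\left| \frac{\partial(\theta', u')}{\partial(\theta, u)} \right|}_{\text{Jacobian}}
\right\}
```

A split move raises the dimension by one change point and a merge move lowers it, which is why the Jacobian term is needed for those two moves.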
25Strong concordance between fine-scale rate
estimates from sperm and genetic variation
Rates estimated from genetic variation: McVean et
al. (2004)
26Inferring hotspots
- We perform a statistical test for hotspot
presence
- Based on an approximation to the coalescent
similar to that used for rate estimation
- All previously identified hotspots are 1-2kb in
size
- At a position in the genome, consider where a 2kb
hotspot might be present
- Fit a model with hotspot
- Fit one without
- Compare in terms of (approximate) likelihood
ratio test
- Evaluate significance via simulation
- When p-value below threshold, declare a hotspot
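The last two steps can be sketched generically. The slides do not give the test statistic, the simulation scheme, or the threshold, so everything named below is hypothetical: `observed_stat` stands for the approximate likelihood ratio at a candidate position, `null_stats` for the same statistic computed on data simulated without a hotspot, and the add-one correction is a common convention rather than something stated in the source.

```python
def monte_carlo_p_value(observed_stat, null_stats):
    """Fraction of null simulations at least as extreme as the observation.

    The +1 in numerator and denominator counts the observed value itself,
    so the estimate is never exactly zero.
    """
    exceed = sum(stat >= observed_stat for stat in null_stats)
    return (exceed + 1) / (len(null_stats) + 1)


def declare_hotspot(observed_stat, null_stats, threshold):
    """True when the simulated p-value falls below the chosen threshold."""
    return monte_carlo_p_value(observed_stat, null_stats) < threshold
```

With an observed statistic of 5.0 and null values `[1, 2, 3, 6]`, one simulation is at least as large, so the p-value is 2/5 = 0.4 and no hotspot would be declared at any conventional threshold.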
27Rates and hotspots across the human genome
Hotspots throughout human genome (35,000
identified)
From Myers et al. (2005)
28Applications of recombination approaches to real
data
- Rates and hotspots across the human genome (Myers
et al. 2005)
- Previously, no understanding of why hotspots
localise where they do
- Can 35,000 hotspots, accounting for >50% of human
recombination, help?
- Comparison of recombination rates (Winckler et
al. 2005, Ptak et al. 2005)
- Between humans and chimpanzees
- At individual recombination hotspots
- Understanding genomic rearrangements (Myers et
al., submitted!)
- Cause a number of genomic disorders
- Relationship to recombination hotspots
29 32,996 Phase II HapMap hotspots
Estimated 50-70% of all human recombination
Hotspots on all chromosomes, including X
THE1B (LTR of retrotransposon)
20,000 hotspots localised to within 5kb
THE1B: found in 1196 hotspots versus 606
coldspots (p << 10^-20)
AluY: found in 3635 hotspots versus 3262
coldspots (p = 7 x 10^-5)
30THE1 consensus
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
CCTCCCTAGCCAC (n = 165)
CCNCCNTNNCCNC (n = 263)
CCTCCCCNNCCAT (n = 10,690)
3-4% of hotspots
L2 consensus
...TGTCACCTCCTCAGAGAGGCCTTCCCTGACCACCCTATCTAAAATWGCACACC...
CCTCCCTGACCAC (n = 157)
CCNCCNTNNCCNC (n = 6,901)
CTTCCCTNNCCAC (n = 1,211)
3-4% of hotspots
AluY, AluSc, AluSg consensus
...CTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAG...
CCGCCTTGGCCTC (n = 14,028)
CCNCCNTNNCCNC (n = 15,706)
CCGCCTCNNCCTC (n = 55,916)
3-4% of hotspots, including DNA3
31Human hotspot motifs
- In humans, specific words produce recombination
hotspot activity
- Hotspot motif CCTCCCTNNCCAC (p < 10^-33)
- Raises probability of a hotspot across genetic
backgrounds
- Degenerate versions CCNCCNTNNCCNC and truncated
CCTCCCT also raise probability, to lesser extent
- Motif explains 40% of human hotspots
- Operates in both sexes
- We don't know, very clearly, which hotspots
- On THE1 background, hotspot 70-80% of the time!
- Biology not clearly understood
- We identified a second, different hotspot motif
(the best 9bp motif), CCCCACCCC, also by
comparison of hot and cold regions of the genome
32Variation in individual hotspots
Sequence variation affects recombination at DNA2
(Jeffreys and Neumann, Nature Genetics 2002)
33SNPs disrupting hotspots disrupt motifs!
Jeffreys and Neumann (Nature Genetics 2002, Hum
Mol Genet 2005)
AAAAGACAGCCTCCCTGTTGCTGC
Hot
AAAAGACAGCCCCCCTGTTGCTGC
Cold
Hot
CACCCCCCACCCCACCCCAACATA
CACCTCCCACCCCACCCCAACATA
Cold
34SNPs disrupting hotspots disrupt motifs
Jeffreys and Neumann (Nature Genetics 2002, Hum
Mol Genet 2005)
AAAAGACAGCCTCCCTGTTGCTGC
Hot
AAAAGACAGCCCCCCTGTTGCTGC
Cold
Disruption of CCTCCCT, best 7bp motif
Hot
CACCCCCCACCCCACCCCAACATA
CACCTCCCACCCCACCCCAACATA
Cold
Disruption of CCCCACCCC, best 9bp motif
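The disruption can be checked mechanically on the four sequence fragments shown on this slide. The sketch below treats N as any base and counts overlapping matches on the given strand only. Searching a single strand and the helper names are assumptions of this example.

```python
import re


def motif_regex(motif):
    """Compile a motif (N = any base) so that overlapping matches are found."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in motif)
    # A lookahead matches at every start position without consuming bases.
    return re.compile("(?=" + pattern + ")")


def count_motif(sequence, motif):
    """Number of (possibly overlapping) motif occurrences in sequence."""
    return len(motif_regex(motif).findall(sequence))


# Fragments as transcribed on the slide.
hot_7bp = "AAAAGACAGCCTCCCTGTTGCTGC"
cold_7bp = "AAAAGACAGCCCCCCTGTTGCTGC"
hot_9bp = "CACCCCCCACCCCACCCCAACATA"
cold_9bp = "CACCTCCCACCCCACCCCAACATA"

counts_7bp = (count_motif(hot_7bp, "CCTCCCT"), count_motif(cold_7bp, "CCTCCCT"))
counts_9bp = (count_motif(hot_9bp, "CCCCACCCC"), count_motif(cold_9bp, "CCCCACCCC"))
```

In these fragments the first SNP removes the only copy of CCTCCCT (one match in the hot allele, none in the cold), and the second removes one of two overlapping copies of CCCCACCCC (two matches versus one). The same helper accepts the degenerate form, for example `count_motif("CCTCCCTAGCCAC", "CCNCCNTNNCCNC")` finds one match.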
35Role of motif in X-linked ichthyosis
VCX2
1/5000 births
- The 1kb deletion hotspot contains 25 repeats of
CCTCCCTNNCCAC
- Highest motif density in any LCR in entire
genome
- Strongly implicates motif in producing hotspot
- Points to a link between deletion-causing and
normal recombination
36A more general link?
- Many other diseases are caused by
recombination-mediated deletions and duplications
(NAHR)
- Smith-Magenis syndrome (hotspot)
- CMT1A (hotspot)
- NF1 microdeletion syndrome (hotspot)
- DiGeorge syndrome
- Two recent studies suggest normal hotspots and
hotspots of disease-causing deletion may coincide
- de Raedt, Stephens et al. (Nature Genetics, 2006)
- Two NF1 deletion hotspots both likely to coincide
with crossover hotspots
37Other major NAHR hotspots
CCNCCNTNNCCNC overrepresented in hotspots
(p = 0.0006)
38Evolution of recombination: human vs. chimps
LDhot hotspots
Human
Chimp
LDhat rate estimates
No significant correlation in hotspot positions
between species (Winckler et al. Science 2005,
Ptak et al. Nature Genetics 2005)
39Reading
- Haplotype structure and recombination
- The International HapMap Consortium. A haplotype
map of the human genome. Nature 2005,
437:1299-1320.
- McVean G, Spencer CCA, Chaix R. Perspectives on
human genetic variation from the International
HapMap Project. PLoS Genetics 2005, 1:e54.
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly
P. A fine-scale map of recombination rates and
hotspots across the human genome.
Science 2005, 310:321-324.
- The coalescent
- Nordborg M. Coalescent Theory. In The Handbook
of Statistical Genetics (eds Balding, Bishop and
Cannings), 2001. Wiley & Sons.
- Hudson RR. Gene genealogies and the coalescent
process. In Oxford Surveys in Evolutionary
Biology (eds Futuyma and Antonovics) 1990,
7:1-44. Oxford University Press.
40Selected references
- Jeffreys, A.J., L. Kauppi, and R. Neumann.
2001. Intensely punctate meiotic recombination in
the class II region of the major
histocompatibility complex. Nat Genet 29:
217-222.
- Jeffreys, A.J. and R. Neumann. 2002. Reciprocal
crossover asymmetry and meiotic drive in a human
recombination hot spot. Nat Genet 31: 267-271.
- Jeffreys, A.J. and R. Neumann. 2005. Factors
influencing recombination frequency and
distribution in a human meiotic crossover
hotspot. Hum Mol Genet 14: 2277-2287.
- Myers, S., L. Bottolo, C. Freeman, G. McVean,
and P. Donnelly. 2005. A fine-scale map of
recombination rates and hotspots across the human
genome. Science 310: 321-324.
- Ptak, S.E., D.A. Hinds, K. Koehler, B. Nickel,
N. Patil, D.G. Ballinger, M. Przeworski, K.A.
Frazer, and S. Paabo. 2005. Fine-scale
recombination patterns differ between chimpanzees
and humans. Nat Genet 37: 429-434.
- The International HapMap Consortium. 2005. A
haplotype map of the human genome. Nature 437:
1299-1320.
- The International HapMap Consortium. 2007. The
Phase II HapMap. Nature.
- Winckler, W., S.R. Myers, D.J. Richter, R.C.
Onofrio, G.J. McDonald, R.E. Bontrop, G.A.
McVean, S.B. Gabriel, D. Reich, P. Donnelly et
al. 2005. Comparison of fine-scale recombination
rates in humans and chimpanzees. Science 308:
107-111.