Recombination, and haplotype structure

1
Recombination, and haplotype structure
  • Simon Myers, Gil McVean
  • Department of Statistics, Oxford

2
The starting point
  • We have a genome's worth of data on genetic
    variation
  • We wish to understand why the haplotype structure
    looks how it does
  • Differences between regions, populations

3
Where do haplotypes come from?
  • In the absence of recombination, the most natural
    way to think about haplotypes is in terms of the
    genealogical tree representing the history of the
    chromosomes
  • Tree affects mutation patterns
  • Mutation patterns give information on tree

4
What determines the shape of the tree?
Present day
5
Ancestry of current population
Present day
6
Ancestry of sample
Present day
7
The coalescent: a model of genealogies
[Figure: a coalescent genealogy. Ancestral lineages are traced back in time from the present day, through successive coalescence events, to the most recent common ancestor (MRCA).]
8
Simulating histories with the coalescent
9
Simulating data with the coalescent
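A minimal sketch of how such histories might be simulated, assuming the standard neutral coalescent with constant population size (the slides do not state which variant was used). While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. The function names are illustrative, not taken from the presentation.

```python
import random

def simulate_coalescent_times(n, rng):
    """Waiting times while n, n-1, ..., 2 lineages remain (coalescent units)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2          # number of lineage pairs that could coalesce
        times.append(rng.expovariate(rate))
    return times

def tmrca(times):
    """Time to the most recent common ancestor: the sum of all waiting times."""
    return sum(times)

def total_branch_length(n, times):
    """While k lineages remain, k branches each grow by that interval's length."""
    return sum(k * t for k, t in zip(range(n, 1, -1), times))

rng = random.Random(1)
n = 10
reps = 20000
mean_tmrca = sum(tmrca(simulate_coalescent_times(n, rng)) for _ in range(reps)) / reps
expected = 2 * (1 - 1 / n)              # standard result: E[TMRCA] = 2(1 - 1/n)
print(f"simulated mean TMRCA {mean_tmrca:.3f}, theory {expected:.3f}")
```

Data could then be simulated by dropping mutations on the branches as a Poisson process with rate proportional to total branch length; that step is omitted here.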
10
Haplotype structure in the absence of
recombination
  • In the absence of recombination, the shape of the
    tree and where mutations fall on it determine
    patterns of haplotype structure
  • Two mutations on the same branch will be in
    complete association; mutations on different
    branches will have lower, and often low, association
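The point about association can be illustrated with a small sketch. It uses r-squared as the measure of association, which is an assumption: the slide does not name a statistic. Alleles are coded 0 (ancestral) and 1 (derived), and the toy data are invented for illustration.

```python
def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites coded as 0/1."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Two mutations on the same branch are carried by exactly the same chromosomes.
same_branch = r_squared([1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0])

# A mutation on a deep branch and one on a tip branch nested beneath it.
different_branches = r_squared([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0])

print(same_branch, different_branches)
```

In this toy example the same-branch pair gives r-squared of 1, and the nested pair gives 0.2.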

11
Haplotypes when there is recombination
  • When there is no recombination, haplotype
    structure reflects the age distribution of
    mutations and the shape of the underlying tree
  • When there is some recombination, every
    nucleotide position has a tree, but the tree
    changes along the chromosome at a rate determined
    by the local recombination landscape
  • By using SNP information to inform us about the
    trees, we can learn about how quickly the trees
    change
  • This relates to the recombination rate

12
A bit of recombination shuffles genetic
variation
13
Lots of recombination does lots of shuffling
14
Recombination and haplotype diversity
  • Without recombination, a new mutation can create
    at most one new haplotype
  • Any two mutations delineate at most 3 haplotypes
    in total (ancestral, plus two new types)
  • With recombination, this mutation can spread onto
    every existing haplotype background, creating the
    potential for more haplotypes
  • For a given number of SNPs, a region with
    recombination will tend to have (in comparison to
    a region with no recombination):
  • More haplotypes
  • Less variance in the pairwise differences between
    haplotypes
  • Less skewed haplotype frequencies

15
The ancestral recombination graph
  • The combined history of recombination, mutation
    and coalescence is described by the ancestral
    recombination graph

[Figure: an example ancestral recombination graph, with each event labelled as a coalescence, a mutation or a recombination.]
16
In humans, recombination is not uniformly
distributed
  • Most recombination occurs in recombination
    hotspots: short (1-2kb) regions every 50-100kb
    that occupy at most 3% of the genome but probably
    account for 90% or more of the recombination
  • This means that haplotype structure in humans is
    an interesting hybrid between the "no
    recombination" and "lots of recombination" situations

17
Learning about recombination
  • Just like there is a true genealogy underlying a
    sample of sequences without recombination, there
    is a true ARG underlying samples of sequences
    with recombination
  • We can consider nonparametric and parametric ways
    of learning about recombination
  • There are useful nonparametric ways of learning
    about recombination which we will consider first
  • These really only apply to species, such as
    humans, where we can be fairly sure that most
    SNPs are the result of a single ancestral
    mutation event

18
The signal of recombination?
Ancestral chromosome recombines
Recurrent mutation
Recombination
19
Detecting recombination from DNA sequence data
  • Look for all pairs of incompatible sites (pairs
    at which all four two-site haplotypes are observed)
  • Find the minimum number of intervals in which
    recombination events must have occurred: Rm
    (Hudson and Kaplan 1985)
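The two steps above can be sketched as follows, assuming biallelic sites coded as 0/1 strings and at most one mutation per site. Each incompatible pair (i, j) defines an interval that must contain a recombination event; the smallest number of events that covers every interval equals the largest number of non-overlapping intervals, which a greedy sweep by right endpoint finds. The function names and toy data are illustrative.

```python
from itertools import combinations

def incompatible(col_a, col_b):
    """Four-gamete test: True if 00, 01, 10 and 11 are all observed."""
    return len(set(zip(col_a, col_b))) == 4

def hudson_kaplan_rm(haplotypes):
    """Lower bound Rm on the number of recombination events."""
    n_sites = len(haplotypes[0])
    cols = [[h[s] for h in haplotypes] for s in range(n_sites)]
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if incompatible(cols[i], cols[j])]
    intervals.sort(key=lambda ij: ij[1])      # sweep by right endpoint
    rm, last_right = 0, 0
    for left, right in intervals:
        if left >= last_right:                # disjoint from intervals already counted
            rm += 1
            last_right = right
    return rm

sample = ["0000", "0100", "1000", "1100", "0001", "0010", "0011"]
print(hudson_kaplan_rm(sample))               # sites (0,1) and (2,3) are incompatible
```

Intervals that share only an endpoint site, such as (0,1) and (1,2), are treated as disjoint because recombination falls between sites.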

20
Improving the detection algorithm
  • Rm greatly underestimates the amount of
    recombination in the history of a set of
    sequences
  • Myers and Griffiths (2003) developed an improved
    way of detecting recombination events
  • Without recombination, every new mutation can
    create only a single new haplotype
  • With recombination, mutations can be shuffled
    between haplotype backgrounds, generating
    haplotype diversity
  • Each recombination makes at most one new
    haplotype
  • If I see H haplotypes with S segregating sites,
    at least H-S-1 recombination events must have
    occurred
  • This offers potential to identify many more
    recombination events
  • Carefully combine bounds from different
    collections of sites
  • Dynamic programming algorithm makes computation
    extremely fast
  • Better (sometimes slower) algorithms developed
    recently
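As a rough illustration of the counting argument above, the sketch below computes the single bound H - S - 1 for one set of sites. It does not implement the combination of bounds across collections of sites or the dynamic programming step, and the function name is hypothetical.

```python
from itertools import product

def haplotype_bound(haplotypes):
    """Lower bound H - S - 1 on recombination events for one set of sites."""
    distinct = set(haplotypes)                          # H distinct haplotypes
    n_sites = len(haplotypes[0])
    segregating = sum(                                  # S segregating sites
        1 for s in range(n_sites) if len({h[s] for h in distinct}) > 1
    )
    return max(0, len(distinct) - segregating - 1)

# All eight haplotypes over three biallelic sites: H = 8, S = 3.
all_eight = ["".join(bits) for bits in product("01", repeat=3)]
print(haplotype_bound(all_eight))
```

For the eight-haplotype example the bound is 8 - 3 - 1 = 4, whereas counting disjoint incompatible intervals over three sites can give at most 2, which is the sense in which this approach can detect more events.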

21
Problems with counting recombination events
Tree-pairs where we cannot see recombination
events
A tree-pair where we could see recombination
events, but don't
22
Modelling recombination
  • Model-based approaches to learning about
    recombination allow us to ask more detailed
    questions than nonparametric approaches
  • What is the rate of recombination (as opposed to
    just the number of events)
  • Does gene A have a higher recombination rate than
    gene B?
  • Is the rate of recombination across a region
    constant?
  • Where are the recombination hotspots?
  • We can use coalescent-based approaches
    (approximations) to calculate the likelihood of
    arbitrary recombination maps given the observed data

23
Fitting a variable recombination rate
  • Use a reversible-jump MCMC approach (Green 1995)

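[Figure: SNP positions along a region divided into blocks of constant recombination rate, some hot and some cold. Proposed moves: split blocks, merge blocks, change block size, change block rate.]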
24
Acceptance rates
Acceptance probability = min(1, composite likelihood ratio x ratio of priors x Hastings ratio x |Jacobian|)
The Jacobian is that of the partial derivatives relating changes
in dimension to sampled random numbers
  • Include a prior on the number of change points
    that encourages smoothing
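A minimal sketch of how the four acceptance terms might be combined in practice, working on the log scale to avoid underflow. It assumes the standard reversible-jump form of Green (1995); the function and argument names are illustrative, and the terms themselves would come from the composite likelihood and the proposal, which are not shown.

```python
import math
import random

def log_acceptance(log_cl_ratio, log_prior_ratio, log_hastings_ratio, log_abs_jacobian):
    """Log of min(1, likelihood ratio x prior ratio x Hastings ratio x |Jacobian|)."""
    return min(0.0, log_cl_ratio + log_prior_ratio + log_hastings_ratio + log_abs_jacobian)

def accept(log_alpha, rng):
    """Accept the proposed move with probability exp(log_alpha)."""
    return rng.random() < math.exp(log_alpha)

rng = random.Random(0)
# Hypothetical split move: the likelihood improves, the prior penalises the extra change point.
log_alpha = log_acceptance(log_cl_ratio=1.2, log_prior_ratio=-2.0,
                           log_hastings_ratio=0.0, log_abs_jacobian=0.3)
print(math.exp(log_alpha), accept(log_alpha, rng))
```

For moves that keep the dimension fixed (changing a block's size or rate), the Jacobian term is typically 1, so its log is 0.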

25
Strong concordance between fine-scale rate
estimates from sperm and genetic variation
Rates estimated from genetic variation: McVean et
al. (2004)
26
Inferring hotspots
  • We perform a statistical test for hotspot
    presence
  • Based on an approximation to the coalescent
    similar to that used for rate estimation
  • All previously identified hotspots are 1-2kb in
    size
  • At a given position in the genome, consider where
    a 2kb hotspot might be present
  • Fit a model with hotspot
  • Fit one without
  • Compare in terms of (approximate) likelihood
    ratio test
  • Evaluate significance via simulation
  • When p-value below threshold, declare a hotspot
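The last two steps can be sketched as a Monte Carlo p-value. The sketch assumes the test statistic is an (approximate) log likelihood ratio and that values of it have been simulated under the no-hotspot model; the simulated values below are placeholders, not output from a coalescent simulation, and the add-one correction and threshold are common choices rather than something stated in the slides.

```python
import random

def monte_carlo_p_value(observed_stat, null_stats):
    """Fraction of null simulations at least as extreme as the observed statistic."""
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (exceed + 1) / (len(null_stats) + 1)   # add-one keeps the estimate above zero

rng = random.Random(42)
# Placeholder null distribution standing in for simulations without a hotspot.
null_stats = [rng.expovariate(1.0) for _ in range(999)]
observed = 12.0                                   # hypothetical observed statistic
p_value = monte_carlo_p_value(observed, null_stats)
threshold = 0.01                                  # hypothetical significance threshold
print(p_value, "hotspot" if p_value < threshold else "no hotspot")
```

With B simulations the smallest attainable p-value is 1/(B+1), so the number of simulations limits how stringent a threshold can be applied.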

27
Rates and hotspots across the human genome
Hotspots throughout human genome (35,000
identified)
From Myers et al. (2005)
28
Applications of recombination approaches to real
data
  • Rates and hotspots across the human genome (Myers
    et al. 2005)
  • Previously, no understanding of why hotspots
    localise where they do
  • Can 35,000 hotspots, accounting for >50% of human
    recombination, help?
  • Comparison of recombination rates (Winckler et
    al. 2005, Ptak et al. 2005)
  • Between humans and chimpanzees
  • At individual recombination hotspots
  • Understanding genomic rearrangements (Myers et
    al., submitted!)
  • Cause a number of genomic disorders
  • Relationship to recombination hotspots

29
32,996 Phase II HapMap hotspots
Estimated 50-70% of all human recombination
Hotspots on all chromosomes, including X
20,000 hotspots localised to within 5kb
THE1B (LTR of retrotransposon): found in 1196 hotspots versus 606
coldspots (p << 10^-20)
AluY: found in 3635 hotspots versus 3262 coldspots (p = 7x10^-5)
30
THE1 consensus
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
CCTCCCTAGCCAC (n=165)
CCNCCNTNNCCNC (n=263)
CCTCCCCNNCCAT (n=10,690)
3-4% of hotspots
L2 consensus
...TGTCACCTCCTCAGAGAGGCCTTCCCTGACCACCCTATCTAAAATWGCACACC...
CCTCCCTGACCAC (n=157)
CCNCCNTNNCCNC (n=6,901)
CTTCCCTNNCCAC (n=1,211)
3-4% of hotspots
AluY, AluSc, AluSg consensus
...CTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAG...
CCGCCTTGGCCTC (n=14,028)
CCNCCNTNNCCNC (n=15,706)
CCGCCTCNNCCTC (n=55,916)
3-4% of hotspots, including DNA3
31
Human hotspot motifs
  • In humans, specific words produce recombination
    hotspot activity
  • Hotspot motif CCTCCCTNNCCAC (p < 10^-33)
  • Raises probability of a hotspot across genetic
    backgrounds
  • Degenerate versions CCNCCNTNNCCNC and truncated
    CCTCCCT also raise probability, to lesser extent
  • Motif explains 40% of human hotspots
  • Operates in both sexes
  • We don't know, very clearly, which hotspots
  • On THE1 background, hotspot 70-80% of the time!
  • Biology not clearly understood
  • We identified a second, different hotspot motif
    (the best 9bp motif), CCCCACCCC, also by
    comparison of hot and cold regions of the genome
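A small sketch of how occurrences of these motifs could be located in a sequence, treating N as any base. It scans the forward strand only (a fuller scan would also search the reverse complement), and the helper names are illustrative. The example sequences are the two DNA2 alleles listed on the "SNPs disrupting hotspots disrupt motifs" slide.

```python
import re

def motif_to_regex(motif):
    """Compile a motif with N wildcards; the lookahead reports overlapping hits."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile("(?=(" + body + "))")

def find_motif(sequence, motif):
    """Return 0-based start positions of the motif on the forward strand."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]

dna2_hot = "AAAAGACAGCCTCCCTGTTGCTGC"
dna2_cold = "AAAAGACAGCCCCCCTGTTGCTGC"

print(find_motif(dna2_hot, "CCTCCCT"))     # truncated 7bp motif present
print(find_motif(dna2_cold, "CCTCCCT"))    # the SNP removes the match
print(find_motif("CCTCCCTGACCAC", "CCNCCNTNNCCNC"))
```

The hot DNA2 allele contains CCTCCCT starting at position 9, and the single T-to-C change in the cold allele removes the match.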

32
Variation in individual hotspots
Sequence variation affects recombination at DNA2
(Jeffreys and Neumann, Nature Genetics 2002)
33
SNPs disrupting hotspots disrupt motifs!
  • DNA2
    Hot:  AAAAGACAGCCTCCCTGTTGCTGC
    Cold: AAAAGACAGCCCCCCTGTTGCTGC
  • NID1
    Hot:  CACCCCCCACCCCACCCCAACATA
    Cold: CACCTCCCACCCCACCCCAACATA

Jeffreys and Neumann (Nature Genetics 2002, Hum
Mol Genet 2005)
34
SNPs disrupting hotspots disrupt motifs
  • DNA2
    Hot:  AAAAGACAGCCTCCCTGTTGCTGC
    Cold: AAAAGACAGCCCCCCTGTTGCTGC
    Disruption of CCTCCCT, best 7bp motif
  • NID1
    Hot:  CACCCCCCACCCCACCCCAACATA
    Cold: CACCTCCCACCCCACCCCAACATA
    Disruption of CCCCACCCC, best 9bp motif

Jeffreys and Neumann (Nature Genetics 2002, Hum
Mol Genet 2005)
35
Role of motif in X-linked ichthyosis
VCX2
1/5000 births
  • The 1kb deletion hotspot contains 25 repeats of
    CCTCCCTNNCCAC
  • Highest motif density in any LCR in entire
    genome
  • Strongly implicates motif in producing hotspot
  • Points to a link between deletion-causing and
    normal recombination

36
A more general link?
  • Many other diseases are caused by
    recombination-mediated deletions and duplications
    (NAHR)
  • Smith-Magenis syndrome (hotspot)
  • CMT1A (hotspot)
  • NF1 microdeletion syndrome (hotspot)
  • DiGeorge syndrome.
  • Two recent studies suggest normal hotspots and
    hotspots of disease-causing deletion may coincide
  • de Raedt, Stephens et al. (Nature Genetics, 2006)
  • Two NF1 deletion hotspots both likely to coincide
    with crossover hotspots

37
Other major NAHR hotspots
CCNCCNTNNCCNC overrepresented in hotspots
(p = 0.0006)
38
Evolution of recombination: human vs. chimps
LDhot hotspots
Human
Chimp
LDhat rate estimates
No significant correlation in hotspot positions
between species (Winckler et al. Science 2005,
Ptak et al. Nature Genetics 2005)
39
Reading
  • Haplotype structure and recombination
  • The International HapMap Consortium. A haplotype
    map of the human genome. Nature 2005,
    437:1299-1320.
  • McVean G, Spencer CCA, Chaix R. Perspectives on
    human genetic variation from the International
    HapMap Project. PLoS Genetics 2005, 1:e54.
  • Myers S, Bottolo L, Freeman C, McVean G, Donnelly
    P. A fine-scale map of recombination rates and
    hotspots across the human genome.
    Science 2005, 310:321-324.
  • The coalescent
  • Nordborg M. Coalescent Theory. In The Handbook
    of Statistical Genetics (eds Balding, Bishop and
    Cannings), 2001. Wiley & Sons.
  • Hudson RR. Gene genealogies and the coalescent
    process. In Oxford Surveys in Evolutionary
    Biology (eds Futuyma and Antonovics) 1990,
    7:1-44. Oxford University Press.

40
Selected references
  • Jeffreys, A.J., L. Kauppi, and R. Neumann.
    2001. Intensely punctate meiotic recombination in
    the class II region of the major
    histocompatibility complex. Nat Genet 29:
    217-222.
  • Jeffreys, A.J. and R. Neumann. 2002. Reciprocal
    crossover asymmetry and meiotic drive in a human
    recombination hot spot. Nat Genet 31: 267-271.
  • Jeffreys, A.J. and R. Neumann. 2005. Factors
    influencing recombination frequency and
    distribution in a human meiotic crossover
    hotspot. Hum Mol Genet 14: 2277-2287.
  • Myers, S., L. Bottolo, C. Freeman, G. McVean,
    and P. Donnelly. 2005. A fine-scale map of
    recombination rates and hotspots across the human
    genome. Science 310: 321-324.
  • Ptak, S.E., D.A. Hinds, K. Koehler, B. Nickel,
    N. Patil, D.G. Ballinger, M. Przeworski, K.A.
    Frazer, and S. Paabo. 2005. Fine-scale
    recombination patterns differ between chimpanzees
    and humans. Nat Genet 37: 429-434.
  • The International HapMap Consortium. 2005. A
    haplotype map of the human genome. Nature 437:
    1299-1320.
  • The International HapMap Consortium. 2007. The
    Phase II HapMap. Nature
  • Winckler, W., S.R. Myers, D.J. Richter, R.C.
    Onofrio, G.J. McDonald, R.E. Bontrop, G.A.
    McVean, S.B. Gabriel, D. Reich, P. Donnelly et
    al. 2005. Comparison of fine-scale recombination
    rates in humans and chimpanzees. Science 308:
    107-111.