Fine Scale Mapping and the Coalescent

Slides: 44
Provided by: Jotun

1
Fine Scale Mapping and the Coalescent
  • The Fundamental Problem
  • The Data
  • Genotype to Phenotype Functions
  • Types of Mapping
  • Population Set-up Measures of Dependency
  • The Calculations
  • Practical Considerations

2
Genotype and Phenotype Covariation: Gene Mapping
Sampling Genotypes and Phenotypes
Result: The Mapping Function
A set of characters. Binary decision
(0,1). Quantitative character.
3
Pedigree Analysis & Association Mapping
[Figure label: 2N generations]
              Pedigree Analysis       Association Mapping
Pedigree      known                   unknown
Meioses       few (max 100s)          many (> 10^4)
Resolution    cMorgans (Mbases)       10^-5 Morgans (Kbases)
Adapted from McVean and others
4
Causes of linkage disequilibrium
[Figure: time axis from "Time t ago" to "Now"]
Creates LD        Breaks down LD
Drift             Recombination
Selection         Gene conversion
Admixture
5
Significance of a Single Association
[Figure: disease locus and marker locus]
Test for independence in a 2 x 2 contingency table:
          A          a          Total
B         X_{A,B}    X_{a,B}    X_{.,B}
b         X_{A,b}    X_{a,b}    X_{.,b}
Total     X_{A,.}    X_{a,.}    X_{.,.}
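The independence test on this table can be sketched as a Pearson chi-square test with one degree of freedom. This is a minimal illustration, not the presenter's code: the function name `chi_square_2x2` and the example counts are assumptions, and the p-value uses the identity that a chi-square variable with 1 df is a squared standard normal.

```python
import math

def chi_square_2x2(x_AB, x_aB, x_Ab, x_ab):
    """Pearson chi-square test of independence for a 2 x 2 table of counts.

    Arguments follow the table layout: first index is the disease-locus
    allele (A/a), second is the marker allele (B/b). All margins must be > 0.
    """
    row_B, row_b = x_AB + x_aB, x_Ab + x_ab   # marker margins X_{.,B}, X_{.,b}
    col_A, col_a = x_AB + x_Ab, x_aB + x_ab   # disease margins X_{A,.}, X_{a,.}
    total = row_B + row_b                     # X_{.,.}
    observed = [x_AB, x_aB, x_Ab, x_ab]
    expected = [row_B * col_A / total, row_B * col_a / total,
                row_b * col_A / total, row_b * col_a / total]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, for illustration only.
stat, p_value = chi_square_2x2(30, 10, 10, 30)
```

A small p-value indicates that allele counts at the two loci are not independent in the sample.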
6
Measuring Linkage Disequilibrium between 2 Loci
with 2 Alleles (remade from McVean)
$D_{A,B} = f_{A,B} - f_A f_B = -D_{a,B} = -D_{A,b} = D_{a,b}$
Correlation coefficient measure, range [0,1]: Hill &
Robertson (1968)
Range constrained by allele frequencies, [0,1]:
Lewontin (1964)
Odds-ratio formulation: Devlin & Risch (1995)
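The formulas for these measures did not survive in the transcript. As a hedged sketch, the code below computes D together with the standard textbook forms of the two [0,1] measures: r^2 = D^2 / (f_A f_a f_B f_b) for the correlation measure and |D'| = |D| / D_max for Lewontin's measure. The function name `ld_measures` and the example frequencies are assumptions.

```python
def ld_measures(f_AB, f_A, f_B):
    """Return (D, r2, D_prime) for two biallelic loci.

    f_AB is the frequency of the A-B haplotype; f_A and f_B are allele
    frequencies. Both loci must be polymorphic (0 < f_A, f_B < 1).
    """
    f_a, f_b = 1.0 - f_A, 1.0 - f_B
    D = f_AB - f_A * f_B
    # Squared correlation coefficient between the two loci.
    r2 = D * D / (f_A * f_a * f_B * f_b)
    # Largest |D| attainable given the allele frequencies.
    if D >= 0:
        d_max = min(f_A * f_b, f_a * f_B)
    else:
        d_max = min(f_A * f_B, f_a * f_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    return D, r2, d_prime

# Hypothetical frequencies: allele B only ever occurs on A haplotypes.
D, r2, d_prime = ld_measures(f_AB=0.2, f_A=0.5, f_B=0.2)
```

In this example D' reaches 1 while r^2 is only 0.25, which illustrates why the two measures can rank marker pairs differently.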
7
Examples of Associations: Pairwise, Triple, ...
Combine Single (Pairwise) to Multiple Tests
Bonferroni. Sharper bounds using linkage
information.
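A minimal sketch of the Bonferroni step, assuming m single-marker p-values are already available: each test is compared against alpha / m. The function name and p-values are illustrative; the sharper, linkage-aware bounds mentioned on the slide are not implemented here.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)  # per-test level for family-wise level alpha
    return [p <= threshold for p in p_values]

# Hypothetical p-values from three single-marker tests.
flags = bonferroni_significant([0.001, 0.02, 0.04])
```

Because nearby markers are correlated through LD, this correction is conservative, which is the motivation for bounds that use linkage information.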
8
ApoE and Alzheimer's Disease
Causative SNP
6 markers with low association
Martin et al. (2000)
9
The coalescent with recombination or gene
conversion
Adapted from Hudson 1990
Recombination
Gene Conversion
10
Local trees for recombination and gene conversion
[Figure: local trees on leaves 1-4 along the sequence.
Gene conversion: Tree 1, Tree 2, Tree 1.
Recombination: Tree 1, Tree 2, Tree 3.]
11
Measures of tree similarity
[Figure: target tree on leaves 1-5 and the region around the
target with no recombination, compared with trees at other positions]
Measures, in decreasing strictness:
  • Same tree as target
  • Same topology as target
  • Same MRCA as target
12
Local trees of the target and other positions
Sample size 20
Only recombination, r = 2.
Also gene conversion, g/r = 4
From Mikkel Schierup
13
Probability that the largest segment does not include the target
Recombination/gene conversion rate    R=2, G=0    R=2, G=8
segments with same tree               1.02        1.8
P(target segment not largest)         0.2         14
segments same topology                1.02        2.1
P(target segment not largest)         0.3         20
segments same TMRCA                   1.1         2.9
P(target segment not largest)         1.5         25
From Mikkel Schierup
14
Quantifying the mosaicism caused by Gene Conversion
A and B are the most distant markers in
significant LD with target
What is the proportion of markers between these
also in significant LD?
          G=0    G=16
Rho=4     56     33
From Mikkel Schierup
15
Development of multi-locus association methods
  • Single Marker Methods
  • Kaplan et al. (1995), Rannala & Slatkin (1998)
  • Problem: difficult to combine markers.
  • Haplotype methods with star-shaped genealogies
  • Terwilliger (1995), Graham & Thompson (1998),
    McPeek & Strahs (1999), Morris et al. (2000)
  • Problem: wrong genealogy, gives overconfidence in
    result.
  • Haplotype methods based on the coalescent
  • Rannala & Reeve (2001), Morris et al. (2002),
    Larribe et al. (2003).
  • Problem: computationally intensive

Based on Morris et al. 2002
16
Probability of Data I
3 step approach
I. Probability of data given topology and branch
lengths: Felsenstein (1981) for each column, multiply for all
columns
TCAGCCT
TCAGCAT
GCAGGTT
II. Integrate over branch lengths
III. Sum over topologies
Conclusion: exact calculation is computationally
intractable!
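Step I can be sketched for the three sequences above with Felsenstein's pruning recursion. The following is an illustration under stated assumptions rather than the presenter's calculation: the substitution model (Jukes-Cantor), the topology ((seq1, seq2), seq3), the branch lengths, and all function names are hypothetical choices.

```python
import math

BASES = "ACGT"

def jc69(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {(x, y): (same if x == y else diff) for x in BASES for y in BASES}

def partial(node):
    """P(data below node | base at node) for each base, by pruning.

    A node is a leaf base such as 'A', or a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    left, t_left, right, t_right = node
    p_left, p_right = jc69(t_left), jc69(t_right)
    l_left, l_right = partial(left), partial(right)
    return {
        b: sum(p_left[(b, c)] * l_left[c] for c in BASES)
           * sum(p_right[(b, c)] * l_right[c] for c in BASES)
        for b in BASES
    }

def column_likelihood(tree):
    """Likelihood of one alignment column, uniform base frequencies at the root."""
    root = partial(tree)
    return sum(0.25 * root[b] for b in BASES)

def alignment_likelihood(seqs, t_leaf, t_inner, t_out):
    """Multiply column likelihoods for the topology ((s1, s2), s3)."""
    like = 1.0
    for c1, c2, c3 in zip(*seqs):
        tree = ((c1, t_leaf, c2, t_leaf), t_inner, c3, t_out)
        like *= column_likelihood(tree)
    return like

seqs = ["TCAGCCT", "TCAGCAT", "GCAGGTT"]
# Branch lengths are arbitrary illustrative values.
like = alignment_likelihood(seqs, t_leaf=0.05, t_inner=0.1, t_out=0.2)
```

Steps II and III would wrap this function in an integral over branch lengths and a sum over topologies, which is where the cost explodes as the sample grows.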
17
Probability of Data II: Griffiths & Tavaré,
TPB 46.2.131-159
q(n) determined by equilibrium distribution.
ACCTAGGAT TCCTAGGAT
393 mutations
(1,2) coalescence
ACCTAGGAT TCCTAGGAT TCCTAGGAT
n
18
Griffiths-Ethier-Tavare Recursions
Griffiths-Marjoram (1996) included recombination
in the equations.
19
Example Solving Linear System
[Figure: network of states with unknown values q( ) connected by
coefficients r( , )]
20
Example Solving Linear System
Construct Markov transition function, A(x,y),
with the following properties: i) A(x,y) > 0
when r(x,y) > 0; ii) the chain visits A with
certainty.
  • Introduced in coalescence theory by Griffiths &
    Tavaré (1994)
  • Griffiths & Marjoram (1996) included
    recombination
  • Donnelly-Stephens-Fearnhead (2000-) accelerated
    these algorithms
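The idea of solving a linear system q(x) = sum over y of r(x,y) q(y) by simulating a Markov chain can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the two unknown states, the coefficients, the single state with known value, and the choice of proposal chain proportional to r. Each path is weighted by the product of r(x,y) / P(x,y) along the way.

```python
import random

# Hypothetical toy system: q(a) = 0.3 q(b) + 0.2 q(end), q(b) = 0.1 q(a) + 0.4 q(end).
R = {
    "a": {"b": 0.3, "end": 0.2},
    "b": {"a": 0.1, "end": 0.4},
}
KNOWN = {"end": 1.0}  # states whose value q is known; the chain stops here

def estimate_q(start, n_paths, rng):
    """Monte Carlo estimate of q(start) by weighted random paths."""
    total = 0.0
    for _ in range(n_paths):
        state, weight = start, 1.0
        while state not in KNOWN:
            row = R[state]
            row_sum = sum(row.values())
            # Proposal chain P(x, y) = r(x, y) / row_sum: positive wherever r is.
            state = rng.choices(list(row), weights=list(row.values()))[0]
            # Importance weight r(x, y) / P(x, y), which equals row_sum here.
            weight *= row_sum
        total += weight * KNOWN[state]
    return total / n_paths

rng = random.Random(1)
q_a = estimate_q("a", 20000, rng)
exact_q_a = 0.32 / 0.97  # from solving the two equations by hand
```

Other proposal chains satisfying property i) give the same expectation but different variance, which is the lever the later acceleration work pulls on.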

21
The position of the marker locus is missing
data: Larribe and Lessard (2002)
Data:
haplotype
phenotype
multiplicity: 15 3 6 2 1 2 1
Where is the disease-causing locus?
Likelihood as function of disease locus position
22
Bayesian approach to LD mapping
Continuous version of Bayes formula:
$f(P \mid D) = \frac{f(P)\,L(P)}{\int f(P)\,L(P)\,dP}$
  • f(parameters): prior distribution of parameters
  • P(data | parameters) = L(parameters): likelihood function
  • f(P | D): posterior distribution of parameters given data
The evolutionary parameter (e.g. disease location) is
considered to have a prior distribution (any prior
knowledge we may have) and we learn about
parameters through data.
Advantage: f(parameters | data) is the full distribution of
the parameters of interest given data, e.g.
confidence intervals
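A small numerical sketch of the continuous Bayes formula, assuming a one-dimensional parameter (a disease location on a unit interval) evaluated on a grid. The flat prior, the Gaussian-shaped likelihood peaked at 0.3, and the function names are hypothetical; a real analysis would plug in the model likelihood.

```python
import math

def grid_posterior(grid, prior, likelihood):
    """Normalise prior * likelihood on an evenly spaced grid."""
    unnorm = [prior(p) * likelihood(p) for p in grid]
    step = grid[1] - grid[0]
    norm = sum(unnorm) * step  # Riemann-sum approximation of the integral
    return [u / norm for u in unnorm]

def central_interval(grid, density, mass=0.95):
    """Central interval holding `mass` of the posterior probability."""
    step = grid[1] - grid[0]
    tail = (1.0 - mass) / 2.0
    cum, lo, hi, lo_found = 0.0, grid[0], grid[-1], False
    for x, d in zip(grid, density):
        cum += d * step
        if not lo_found and cum >= tail:
            lo, lo_found = x, True
        if cum >= 1.0 - tail:
            hi = x
            break
    return lo, hi

# Hypothetical toy problem: flat prior, likelihood peaked at location 0.3.
grid = [i / 1000 for i in range(1001)]
prior = lambda x: 1.0
likelihood = lambda x: math.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)

post = grid_posterior(grid, prior, likelihood)
lo, hi = central_interval(grid, post)
```

Having the whole posterior on the grid is what makes interval statements about the disease location a matter of summing probability mass.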
23
The basic equation
Marginal posterior distribution of disease
position
24
Parameters in Shattered Coalescent Model: Morris,
Whittaker and Balding (2001,,2003,2004..
$P(x,h,W,T,z,N,r \mid A,U) \propto L(A,U \mid x,h,W,T,z,N)\, p(W,T,z \mid r)\, p(r)$
$p(r) = 2r$; $p(W,T,z \mid r)$: prior
distribution of genealogies (coalescent like)
  • x: location of disease locus
  • h: population marker-haplotype proportions
  • W: branch lengths of genealogical tree
  • T: topology (branching pattern)
  • z: parental status
  • N: effective population size
  • r: shattering parameter
  • A, U: cases, controls
Probability of haplotypes associated with mutant
At recombination, markers are incorporated from
the population distribution.
25
Morris et al The Shattered Coalescent
Advantages: allows for multiple origins of the
disease mutant and sporadic occurrences of the
disease without the mutation
[Figure: coalescent tree]
Morris, Whittaker & Balding (2002)
26
Monte-Carlo (Metropolis) sampling and
integration: Metropolis et al. (1953)
  • Evaluate the function at the current point p,
    f(p) = x
  • Suggest a new point, p'
  • Evaluate the function at this point, f(p') = y
  • If x < y, go to point p'
  • If x > y, go to point p' with probability y/x;
    otherwise stay at p

Due to Jesper Nymann
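The steps above can be sketched as a one-dimensional random-walk Metropolis sampler. The target function (an unnormalised standard normal density), the uniform proposal, the step size, and the function name are assumptions chosen for illustration.

```python
import math
import random

def metropolis(f, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampling from a density proportional to f."""
    p, x = start, f(start)          # current point and f(p) = x
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: uniform step around the current point.
        p_new = p + rng.uniform(-step_size, step_size)
        y = f(p_new)                # f(p') = y
        # Move if y >= x; otherwise move with probability y / x.
        if y >= x or rng.random() < y / x:
            p, x = p_new, y
        samples.append(p)           # a rejected move repeats the current point
    return samples

# Hypothetical target: unnormalised standard normal density.
target = lambda p: math.exp(-0.5 * p * p)
samples = metropolis(target, start=0.0, n_steps=50000, step_size=2.0,
                     rng=random.Random(0))
mean = sum(samples) / len(samples)
```

Only ratios y/x enter the rule, so the normalising constant of the posterior never has to be computed.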
27
Monte-Carlo (Metropolis)
Projection on one axis is equivalent to integration
over the remaining parameters
[Figure: sampled points projected onto one axis]
Due to Jesper Nymann
28
Example 1 - Cystic fibrosis
11
19
Morris et al. (2002).
Due to Jesper Nymann
29
Example 2 - BRCA2
Iceland Genomics Corporation
1132 Cases, 54 with known mutation
758 Controls
Due to Jesper Nymann
30
Example 2 - BRCA2 continued
[Figure: two panels over marker positions 1-15, with the true location marked]
Panel 1: Multipoint calculation for the full BRCA2 dataset
Panel 2: Multipoint calculation where the 54 known
mutation cases have been removed.
Due to Jesper Nymann
31
The Basic Setup
Simulation Parameters
  • Recombination rate: 50
  • Number of leaf nodes: 1000
  • Number of markers: 10
  • Diseased haplotype fraction: 0.08 - 0.12
  • No heterogeneity
  • Simulated under the assumption of constant population size
  • Diplotypes (phase known)
Type of simulation        50% quantile
Basic (red curve)         0.044
Due to Jesper Nymann
32
The effect of marker density
Type of simulation                                        50% quantile
19 markers (blue curve)                                   0.0292
19 markers and recombination rate 100 (yellow curve)      0.02321
Basic (red curve)                                         0.044
Due to Jesper Nymann
33
The effect of knowing phase
Due to Jesper Nymann
34
The Effect of knowing gene genealogy
Type of simulation                     50% quantile
With known genealogy (blue curve)      0.03516
Basic (red curve)                      0.044
Due to Jesper Nymann
35
The effect of disease fraction
Type of simulation                            50% quantile
Disease fraction 12 - 14 (blue curve)         0.0353
Disease fraction 18 - 22 (yellow curve)       0.03229
Basic (red curve)                             0.044
Due to Jesper Nymann
36
The effect of Heterogeneity
Type of simulation                    50% quantile
With heterogeneity (blue curve)       0.065587
Basic (red curve)                     0.044
Due to Jesper Nymann
37
The effect of Impurity of cases and controls
Cases
Controls
33 cases are moved to the controls and a
similar number of controls are moved to the cases
Type of simulation                          50% quantile
With mixed cases/controls (blue curve)      0.1518
Basic (red curve)                           0.044
Due to Jesper Nymann
38
LD in background population
Gene Pool
Type of simulation                 50% quantile
LD in background (blue curve)      0.0419
Basic (red curve)                  0.044
Due to Jesper Nymann
39
Comparing the different scenarios
Due to Jesper Nymann
40
Summary
  • The Fundamental Problem
  • The Data
  • Genotype to Phenotype Functions
  • Types of Mapping
  • Population Set-up Measures of Dependency
  • Methods: Pure Coalescent Based, The Shattered Coalescent
  • Factors influencing mapping error.
41
Articles I
  • M. A. Beaumont and B. Rannala (2004) The
    Bayesian Revolution in genetics, Nature Reviews,
    Genetics vol. 5. 251
  • Botstein D, Risch N. (2003) Discovering genotypes
    underlying human phenotypes: past successes for
    mendelian disease, future approaches for complex
    disease. Nat Genet. 33 Suppl, 228-237.
  • Cardon, L. and J. Bell (2001) Association Study Designs for
    Complex Diseases. Nature Reviews Genetics
  • Daly, M. J., Rioux, J. D., Schaner, S. F.,
    Hudson, T. J. Lander, E. S. (2001),
    High-resolution haplotype structure in the human
    genome, Nat Genet 29(2), 229-232.
  • Devlin, B. Roeder, K. (1999), Genomic control
    for association studies, Biometrics 55(4),
    997-1004.
  • Frisse, L et al.(2001) Gene Conversion and
    Different Population Histories May Explain the
    Contrast between Polymorphisms and LD Levels.
    AJHG 69..?-?
  • Gabriel, S. B. et al. (2002), The structure of
    haplotype blocks in the human genome, Science
    296(5576), 2225-2229.
  • Griffiths, R. and S. Tavaré (1994) Simulating
    probability distributions in the coalescent.
    Theor.Pop.Biol. 46.2.131-159
  • Griffiths, R. and P. Marjoram (1996) Ancestral
    inference from samples of DNA sequences with
    recombination. J.Compu.Biol.
  • Hudson, R. R. (1990) Gene genealogies and the
    coalescent process, Oxford Surveys in
    Evolutionary Biology (D. Futuyma and J.
    Antonovics, Eds.) Vol 7, pp. 1-44, Oxford Univ.
    Press, Oxford, UK
  • B. Kerem, J. M. Rommens, J. A. Buchanan, D.
    Markiewicz, T. K. Cox, A. Chakravarti, M.
    Buchwald and L. C. Tsui (1989) Identification of the
    Cystic Fibrosis Gene: Genetic Analysis. Science
    245, 1073-1080.
  • Kong A, et al. (2002) A high-resolution
    recombination map of the human genome. Nat Genet.
    31,241-7.
  • Laitinen et al. (2004) Characterization of a
    common susceptibility locus for Asthma-related
    traits. Science 304, 300-304.
  • Martin, E. R., et al. (2000), SNPing away at
    complex diseases analysis of single-nucleotide
    polymorphisms around APOE in Alzheimer disease,
    Am J Hum Genet 67, 383-394.
  • Larribe, F., S. Lessard and N. Schork (2002) Gene
    Mapping via the Ancestral Recombination Graph.
    Theor. Pop.Biol. 62.215-229.
  • Liu, J. et al. (2001) Bayesian Analysis of
    Haplotypes for Linkage Disequilibrium Mapping.
    Genome Research 11.1716-24.
  • N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H.
    Teller, E. Teller (1953) Equation of state
    calculations by fast computing machines, J. Chem.
    Phys. 21, 1087-1092
  • McVean,G.(2002) A Genealogical Interpretation of
    Linkage Disequilibrium Genetics 162.987-991
  • Morris, A., JC Whittaker and D. Balding (2002)
    Fine-Scale Mapping of Disease Loci via Shattered
    Coalescent Modeling of Genealogies. AJHG
    70.686-707.

42
Articles II
  • McVean GA, Myers SR, Hunt S, Deloukas P, Bentley
    DR, Donnelly P. (2004) The fine-scale structure
    of recombination rate variation in the human
    genome. Science 304, 581-584.
  • Patil, N. et al. (2001) Blocks of limited haplotype
    diversity revealed by high-resolution scanning of human
    chromosome 21. Science 294, 1719-1723.
  • Reich, D. E. et al. (2001), Linkage disequilibrium in the
    human genome, Nature 411(6834), 199-204.
  • Reich D. E. and Lander, E. On the allelic spectrum of
    human diseases. Trends in Genetics 19, 502-510.
  • Reich, D. E. et al. (2002), Human genome
    sequence variation and the influence of gene
    history, mutation and recombination, Nat Genet
    32(1), 135-142.
  • Risch, N. and Merikangas, K. (1996) The future of
    genetic studies of complex human diseases.
    Science 273, 1516-1517.
  • Pritchard, J. K., Stephens, M., Rosenberg, N. A. &
    Donnelly, P. (2000), Association mapping in
    structured populations, Am J Hum Genet 67(1),
    170-181.
  • Stefansson, H. et al. (2003), Association of
    neuregulin 1 with schizophrenia confirmed in a
    Scottish population, Am J Hum Genet 72(1), 83-87.
  • Stephens JC et al. (2001) Haplotype variation and
    linkage disequilibrium in 313 human genes.
    Science 293(5529), 489-93.
  • Strachan, T. & Read, A. P. (2003) Human
    Molecular Genetics 3, BIOS Scientific Publishers
    Ltd, Wiley, New York.
  • Spielman R S and W J Ewens (1996) The TDT and other
    family-based tests for linkage disequilibrium and
    association. Am. J. Hum. Gen. 59, 983-989
  • The International HapMap Consortium (2003) The
    International HapMap Project. Nature 426, 789-795.
  • Weiss, KM and Clark, AG (2002) Linkage
    disequilibrium and the mapping of complex human
    traits. Trends in Genetics 18, 19-24.
  • Pritchard, J and M. Przeworski (2000) Linkage
    Disequilibrium in Humans: Models and Data.
    AJHG 69.1-14.
  • Pritchard and Cox (2002) The allelic
    architecture of human disease genes: common
    disease-common variant or not. Human Molecular
    Genetics 11.20.2417-2
  • Rannala, B and JP Reeve (2001) High-Resolution
    Multipoint Linkage-Disequilibrium Mapping in the
    Context of a Human Genome Sequence. AMJHG 69.159-178.
  • Tabor, Risch and Myers (2002) Candidate-gene
    approaches for studying complex genetic traits:
    practical considerations. Nature Reviews Genetics
    3.May.1-7
  • Terwilliger, JD et al. (2002) A bias-ed
    assessment of the use of SNPs in human complex
    traits. Curr.Opin. Genetics & Development
    12.726-34
  • Weiss, K and Terwilliger, J (2000) How
    many diseases does it take to map a gene with
    SNPs? Nature Genetics vol. 26 Oct.
43
Books Www-sites
Books
  • Encyclopedia of the Human Genome (2003)
    Nature Publishing Group
  • Liu, J. (2001) Monte Carlo Strategies in
    Scientific Computing. Springer Verlag
  • Ott, J. (1999) Analysis of Human Genetic Linkage,
    3rd edition. Publisher: Johns Hopkins
  • Strachan & Read (2004) Human Molecular
    Genetics III. Publisher: Biosciences
  • Weiss, K. (1993) Genetic Variation and Human
    Disease. Cambridge University Press.
Web-sites
  • www.stats.ox.ac.uk/mcvean
  • Jeff Reeve and Bruce Rannala: a multipoint linkage
    disequilibrium disease mapping program (DMLE) that
    allows genotype data to be used directly and allows
    estimation of allele ages. http://dmle.org/
  • Liu, J.S., Sabatti, C., Teng, J., Keats, B.J.B. and
    N. Risch (version upgraded by Xin Lu, June/9/2002):
    software for the Bayesian haplotype analysis method
    developed by Liu, J.S., Sabatti, C., Teng, J., Keats, B.J.B.
    and N. Risch in the article Bayesian Analysis of
    Haplotypes for Linkage Disequilibrium Mapping,
    Genome Research 11:1716, 2001.
    http://www.people.fas.harvard.edu/junliu/TechRept/03folder/bladev2.tar
  • J. N. Madsen, M.H. Schierup, C. Storm, L.
    Schauser and T. Mailund: CoaSim is a tool for
    simulating the coalescent process with
    recombination and gene conversion under the
    assumption of exponential population growth.
    http://www.birc.dk/Software/CoaSim/