Inferring human demographic history from DNA sequence data presentation

1
Inferring human demographic history from DNA
sequence data

Apr. 28, 2009
J. Wall
Institute for Human Genetics, UCSF

2
Standard model of human evolution
3
Standard model of human evolution (Origin and
spread of genus Homo)
2-2.5 Mya
4
Standard model of human evolution (Origin and
spread of genus Homo)
1.6-1.8 Mya
5
Standard model of human evolution (Origin and
spread of genus Homo)
0.8-1.0 Mya
6
Standard model of human evolution: Origin and
spread of modern humans
150-200 Kya
7
Standard model of human evolution: Origin and
spread of modern humans
100 Kya
8
Standard model of human evolution: Origin and
spread of modern humans
40-60 Kya
9
Standard model of human evolution: Origin and
spread of modern humans
15-30 Kya
10
Estimating demographic parameters

How can we turn this qualitative scenario into an
explicit, quantitative model?
How can we choose a model that is both
biologically feasible and computationally
tractable?
How do we estimate parameters and quantify the
uncertainty in those estimates?

11
Estimating demographic parameters

Calculating full likelihoods (under realistic
models including recombination) is
computationally infeasible
So, compromises need to be made if one is
interested in parameter estimation

12
African populations
10 populations, 229 individuals
13
African populations
Mandenka (bantu)
Biaka (pygmies)
San (bushmen)
61 autosomal loci, 350 Kb sequence data
14
A simple model of African population history
(Figure labels: split time T, growth parameters g1 and g2, migration rate m; populations Biaka (or San) and Mandenka)
15
Estimation method

We use a composite-likelihood method (cf. Plagnol
and Wall 2006) that uses information from the
joint frequency spectrum, such as:
Numbers of segregating sites
Numbers of shared and fixed differences
Tajima's D
FST
Fu and Li's D
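
Two of the listed statistics can be computed directly from aligned sequences. The following is a minimal sketch, not the authors' code: it counts segregating sites and evaluates Tajima's D with the standard textbook formula. The input is the toy alignment from the "Patterns in DNA sequence data" slide, used here only as convenient data; the function names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Toy input: the seven aligned sequences shown on the
# "Patterns in DNA sequence data" slide.
SEQS = [
    "ATCCACAGCTG",
    "AGCCACGGCTG",
    "TGCGGTAACCT",
    "AGCCACAGCTG",
    "TGTGGTAACCT",
    "AGCCATAGATG",
    "AGCCATAGATG",
]

def segregating_sites(haps):
    """Number of alignment columns carrying more than one allele."""
    return sum(1 for column in zip(*haps) if len(set(column)) > 1)

def mean_pairwise_differences(haps):
    """Average number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(haps, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def tajimas_d(haps):
    """Tajima's D: scaled difference between pi and Watterson's estimator."""
    n = len(haps)
    s = segregating_sites(haps)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = mean_pairwise_differences(haps)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

num_segregating = segregating_sites(SEQS)
d_value = tajimas_d(SEQS)
```

In a two-population analysis the same functions would be applied to each population's sequences separately, giving one value of each statistic per population.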

20
Estimating likelihoods
Pop 1 private polymorphisms
Pop 2 private polymorphisms
Shared polymorphisms
(Figure labels: Pop1, Pop2)
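
The site classes named on this slide can be tallied from two samples of aligned sequences. The sketch below assumes the usual definitions: a site is private to a population if it is polymorphic only there, shared if it is polymorphic in both, and a fixed difference if each population is monomorphic for a different allele. The sequences are hypothetical toy data, not the study data.

```python
from collections import Counter

def classify_sites(pop1, pop2):
    """Tally site classes for two samples of aligned sequences."""
    counts = Counter()
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        alleles1, alleles2 = set(col1), set(col2)
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            counts["shared"] += 1
        elif poly1:
            counts["private_pop1"] += 1
        elif poly2:
            counts["private_pop2"] += 1
        elif alleles1 != alleles2:
            counts["fixed_difference"] += 1
        else:
            counts["invariant"] += 1
    return counts

# Hypothetical toy samples, three sequences of six sites each.
toy_pop1 = ["ACGTAC", "ACGTTC", "GCGTAC"]
toy_pop2 = ["ATGCAC", "ATGCTC", "ATGCAG"]
site_counts = classify_sites(toy_pop1, toy_pop2)
```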
22
Estimating likelihoods

We assume these other statistics are multivariate
normal.
Then, we run simulations to estimate the means
and the covariance matrix.
This accounts (in a crude way) for dependencies
across different summary statistics.
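
As a rough illustration of this step, the sketch below estimates the mean vector and covariance matrix from simulated summary statistics and then evaluates a multivariate normal log-density at the observed values. It is a sketch under stated assumptions, not the authors' implementation: the "simulations" are stand-in random draws, the observed vector is made up, and all function names are illustrative.

```python
import math
import random

def mean_and_cov(rows):
    """Sample mean vector and unbiased covariance matrix of simulated statistics."""
    n, k = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(k)] for a in range(k)]
    return mean, cov

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    k = len(matrix)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            partial = sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                low[i][j] = (matrix[i][j] - partial) / low[j][j]
    return low

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at x."""
    k = len(x)
    low = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(k)]
    z = []
    for i in range(k):  # forward substitution: solve low * z = diff
        z.append((diff[i] - sum(low[i][t] * z[t] for t in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

# Stand-in for coalescent simulations at one parameter value: two
# correlated statistics (hypothetical values, for illustration only).
random.seed(1)
simulated = []
for _ in range(500):
    common = random.gauss(0.0, 1.0)
    simulated.append([common + random.gauss(-0.5, 0.3),
                      0.1 + 0.2 * common + random.gauss(0.0, 0.1)])

mean, cov = mean_and_cov(simulated)
observed = [-0.4, 0.12]  # hypothetical observed statistics
log_likelihood = mvn_logpdf(observed, mean, cov)
```

The off-diagonal entries of the covariance matrix are what capture the dependencies between statistics; repeating the calculation at each parameter value gives one log-likelihood per grid point.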

23
Composite likelihood

We form a composite likelihood by assuming these
two classes of summary statistics are independent
of each other.
We estimate the (composite) likelihood over a
grid of values of g1, g2, T and M and tabulate
the MLE.
We also use standard asymptotic assumptions to
estimate confidence intervals.
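
The grid search can be sketched for a single parameter as follows. Two assumptions are made that the slide does not spell out: the two components are combined by adding their log-likelihoods (which is what independence implies), and the confidence interval uses the usual likelihood-ratio cutoff of 1.92 log-likelihood units (half the 95% chi-square quantile with one degree of freedom). Both component functions are hypothetical placeholders.

```python
def loglik_frequency_classes(t):
    """Hypothetical log-likelihood from the private/shared site counts."""
    return -0.5 * ((t - 440.0) / 100.0) ** 2

def loglik_other_statistics(t):
    """Hypothetical multivariate-normal log-likelihood of the other statistics."""
    return -0.5 * ((t - 460.0) / 100.0) ** 2

def composite_loglik(t):
    """Independence assumption: the two log-likelihoods simply add."""
    return loglik_frequency_classes(t) + loglik_other_statistics(t)

def grid_mle_with_ci(grid, loglik_fn, drop=1.92):
    """Grid maximum plus the range of grid values within `drop` of the maximum."""
    scored = [(loglik_fn(value), value) for value in grid]
    best_loglik, best_value = max(scored)
    inside = [value for loglik, value in scored if best_loglik - loglik <= drop]
    return best_value, (min(inside), max(inside))

# Grid of split times in thousands of years (illustrative range and step).
t_grid = range(100, 801, 10)
t_hat, t_interval = grid_mle_with_ci(t_grid, composite_loglik)
```

With four parameters the same idea applies to a four-dimensional grid, with the interval for each parameter taken from the profile over the other three.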

24
Estimates (with 95% CIs)

Parameter    Man-Bia          Man-San
g1 (000s)    0 (0-3.8)        0 (0-3.8)
g2 (000s)    4 (0-7.9)        2 (0-11)
T (000s)     450 (300-640)    100 (77-550)
M (= 4Nm)    10 (8.4-12)      3 (2.2-4)

26
Fit of the null model

How well does the demographic null model fit the
patterns of genetic variation found in the actual
data?
Quite well. The model accurately reproduces both
the statistics used in the original fitting (e.g.,
Tajima's D in each population) and other
aspects of the data (e.g., estimates of ρ = 4Nr).

29
Population growth
(Figure axes: population size, time)
Spread of agriculture and animal husbandry?
31
Ancestral structure in Africa

At face value, these results suggest that
population structure within Africa is old, and
predates the migration of modern humans out of
Africa.
Is there any evidence for additional (unknown)
ancient population structure within Africa?

32
Model of ancestral structure
(Figure labels: Archaic human population; T, g1, g2, m; Biaka (or San), Mandenka)
33
Standard model of human evolution: Origin and
spread of modern humans
100 Kya
34
Admixture mapping
Modern human DNA
Neandertal DNA
38
Admixture mapping
Modern human DNA
Neandertal DNA
Orange chunks are 10-100 Kb in length
39
Genealogy with archaic ancestry
(Figure labels: time, present; Modern humans, Archaic humans)
40
Genealogy without archaic ancestry
(Figure labels: time, present; Modern humans, Archaic humans)
41
Our main questions

What pattern does archaic ancestry produce in DNA
sequence polymorphism data (from extant humans)?
How can we use data to estimate the contribution
of archaic humans to the modern gene pool (c),
and to test whether c > 0?

42
Genealogy with archaic ancestry (Mutations added)
(Figure labels: time, present; Modern humans, Archaic humans)
46
Patterns in DNA sequence data

Sequence 1 A T C C A C A G C T G
Sequence 2 A G C C A C G G C T G
Sequence 3 T G C G G T A A C C T
Sequence 4 A G C C A C A G C T G
Sequence 5 T G T G G T A A C C T
Sequence 6 A G C C A T A G A T G
Sequence 7 A G C C A T A G A T G

We call the sites in red congruent sites: these
are sites inferred to be on the same branch of an
unrooted tree.
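
The red highlighting is lost in this transcript, but congruent sites can be recovered from the alignment itself. The sketch below assumes that two biallelic sites lie on the same branch of an unrooted tree exactly when they split the sample into the same two groups; it then groups the sites of the alignment above by that split. Function names are illustrative.

```python
from collections import defaultdict

# The seven sequences from the slide, with spaces removed.
SEQS = [
    "ATCCACAGCTG",
    "AGCCACGGCTG",
    "TGCGGTAACCT",
    "AGCCACAGCTG",
    "TGTGGTAACCT",
    "AGCCATAGATG",
    "AGCCATAGATG",
]

def split_pattern(column):
    """Encode which sequences carry the same allele as the first sequence.

    Anchoring on the first sequence makes the encoding independent of
    which allele is ancestral, as required for an unrooted tree.
    """
    return tuple(allele == column[0] for allele in column)

def congruent_groups(seqs):
    """Groups of 1-based site positions that induce the same split."""
    groups = defaultdict(list)
    for position, column in enumerate(zip(*seqs), start=1):
        if len(set(column)) == 2:  # biallelic sites only
            groups[split_pattern(column)].append(position)
    return [sites for sites in groups.values() if len(sites) > 1]

groups = congruent_groups(SEQS)
```

For this alignment the only group with more than one member is sites 1, 4, 5, 8, 10 and 11, all of which separate sequences 3 and 5 from the rest.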
47
Linkage disequilibrium (LD)

LD is the nonrandom association of alleles at
different sites.
Low LD    High LD
A C       A C
A T       A C
A C       A C
A T       A C
G C       G T
G T       G T
G C       G T
G T       G T

High recombination    Low recombination
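
The contrast between the two panels can be made quantitative with r squared, a common pairwise measure of LD (the slide itself does not name a measure, so this choice is an assumption). The sketch below computes it for the two sets of two-site haplotypes shown above.

```python
def r_squared(haplotypes):
    """Squared allelic correlation between two biallelic sites.

    Both sites must be polymorphic in the sample, otherwise the
    denominator is zero.
    """
    n = len(haplotypes)
    ref_first, ref_second = haplotypes[0]  # reference allele at each site
    p_first = sum(h[0] == ref_first for h in haplotypes) / n
    p_second = sum(h[1] == ref_second for h in haplotypes) / n
    p_both = sum(h[0] == ref_first and h[1] == ref_second for h in haplotypes) / n
    d = p_both - p_first * p_second  # the LD coefficient D
    return d * d / (p_first * (1 - p_first) * p_second * (1 - p_second))

low_ld = ["AC", "AT", "AC", "AT", "GC", "GT", "GC", "GT"]
high_ld = ["AC", "AC", "AC", "AC", "GT", "GT", "GT", "GT"]

r2_low = r_squared(low_ld)    # alleles at the two sites are independent
r2_high = r_squared(high_ld)  # allele at one site predicts the other
```

The low-LD panel gives r squared of 0 and the high-LD panel gives 1, the two extremes of the measure.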
48
Measuring congruence

To measure the level of congruence in SNP data
from larger regions, we define a score function S
over sets of sites (i_1, ..., i_k), where each
pairwise term S(i_j, i_{j+1}) is a function of both
congruence (or near congruence) and the physical
distance between sites i_j and i_{j+1}.
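
The formulas on this slide did not survive in the transcript, so the following is only a sketch of one way such a score could be computed, under loudly stated assumptions: the score of a set of sites is taken to be the sum of pairwise scores between consecutive sites, the statistic is the maximum of that sum over all ordered subsets, and the pairwise score (a bonus of 5000 plus the distance in bp for congruent sites, a penalty of 10000 otherwise) is a hypothetical placeholder. Site positions are also made up.

```python
# Sequences from the "Patterns in DNA sequence data" slide.
SEQS = [
    "ATCCACAGCTG",
    "AGCCACGGCTG",
    "TGCGGTAACCT",
    "AGCCACAGCTG",
    "TGTGGTAACCT",
    "AGCCATAGATG",
    "AGCCATAGATG",
]

def split_pattern(column):
    """Which sequences share the first sequence's allele at this site."""
    return tuple(allele == column[0] for allele in column)

def pair_score(col_a, col_b, distance_bp):
    """Hypothetical pairwise score: reward congruent pairs, more so when far apart."""
    if split_pattern(col_a) == split_pattern(col_b):
        return 5000 + distance_bp
    return -10000

def max_subset_score(columns, positions):
    """Maximum, over ordered subsets of sites, of summed consecutive pair scores.

    Dynamic programming: best_ending_at[j] is the best score of any
    subset whose last site is j (a single site scores 0).
    """
    best_ending_at = [0] * len(columns)
    for j in range(len(columns)):
        for i in range(j):
            step = pair_score(columns[i], columns[j], positions[j] - positions[i])
            best_ending_at[j] = max(best_ending_at[j], best_ending_at[i] + step)
    return max(best_ending_at, default=0)

columns = list(zip(*SEQS))
positions = [100 * (k + 1) for k in range(len(columns))]  # hypothetical bp positions
score = max_subset_score(columns, positions)
```

Under these placeholder scores the best subset is the chain of six congruent sites, which is the behaviour the slide describes: long runs of congruent sites spread over a large physical distance give a high S.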

49
An example
50
An example (CHRNA4)
52
An example (CHRNA4)
How often is S from simulations greater than or
equal to the S value from the actual data?
p = 0.025
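
The question on this slide defines an empirical p-value: the fraction of simulated S values at least as large as the observed one. A minimal sketch follows; the simulated values are a hypothetical stand-in chosen only to make the arithmetic easy to follow, not output from the study.

```python
def empirical_p_value(observed, simulated):
    """Fraction of simulated values greater than or equal to the observed value."""
    return sum(value >= observed for value in simulated) / len(simulated)

# Hypothetical stand-in for S values from 1000 null-model simulations.
simulated_scores = list(range(1000))
p_value = empirical_p_value(975, simulated_scores)
```

Here 25 of the 1000 stand-in values are at least 975, so the function returns 0.025.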
53
S is sensitive to ancient admixture
54
General approach

We use the model parameters estimated before
(growth rates, migration rate, split time) as a
demographic null model.
Is our null model sufficient to explain the
patterns of LD in the data?
We test this by comparing the observed S values
with the distribution of S values calculated
from data simulated under the null model.

56
Distribution of p-values (Mandenka and San)
(Figure axes: frequency, p-value)
Global p-value = 2.5 × 10^-5
57
Estimating ancient admixture rates
The global p-values for S are highly significant
in every population that we've studied! If we
estimate the ancient admixture rate in our
(composite) likelihood framework, we can reject
the hypothesis of no ancient admixture for all
populations studied.
58
A region on chromosome 4
59
A region on chromosome 4
19 mutations (from 6 Kb of sequence) separate 3
Biaka sequences from all of the other sequences
in our sample. Simulations suggest this cannot
be caused by recent population structure
(p < 10^-3). This corresponds to isolation
lasting 1.5 million years!
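
A back-of-the-envelope check of the 1.5 million year figure is possible under one assumption the slide does not state: a mutation rate of about 1e-9 per bp per year. With differences accumulating along both diverging lineages, the time is the per-site divergence divided by twice the mutation rate. This is a rough sketch, not the calculation used in the talk.

```python
def divergence_time_years(num_differences, sequence_length_bp, mu_per_bp_per_year):
    """Time since two lineages split, from the differences between them.

    Differences build up on both lineages, hence the factor of 2.
    """
    per_site_divergence = num_differences / sequence_length_bp
    return per_site_divergence / (2.0 * mu_per_bp_per_year)

ASSUMED_MU = 1e-9  # assumed mutation rate per bp per year (not given on the slide)
years = divergence_time_years(19, 6000, ASSUMED_MU)
```

Under this assumed rate the result is close to 1.6 million years, the same order as the figure quoted on the slide; a different assumed rate scales the answer proportionally.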
60
Possible explanations

Isolation followed by later mixing is a recurrent
feature of human population history
Mixing between archaic humans and modern humans
happened at least once prior to the exodus of
modern humans out of Africa
Some other feature of population structure is
unaccounted for in our simple models

61
Acknowledgments

Collaborators
Mike Hammer (U. of Arizona)
Vincent Plagnol (Cambridge University)
Samples
Fondation Jean Dausset (CEPH)
Y Chromosome Consortium (YCC)
Funding
National Science Foundation
National Institutes of Health
