Inference on population structure using multilocus genotype data: STRUCTURE V2.1, Pritchard, J.K., and Wen, W. (2004)

JFD-CIMMYT-UDELAR-Uruguay


Slides: 38

Provided by: jfr73


1
Inference on population structure using
multi-locus genotype data: STRUCTURE V2.1
Pritchard, J.K., and Wen, W. (2004)
2
Model based cluster analysis

We assume a statistical distribution (a probability
model) for the observations on each individual

3
The mixture of normal distributions model (a
very simple case)
4
Two normal distributions: N(0.098, 0.67) and N(5.098, 1.69)
[Figure: dot plot of observations drawn from the two distributions]
5
The probability density functions (p.d.f.s) are
6
or, as a mixture (each individual follows the
mixture distribution)
7
$f(y) = 0.7\,N(0.098, 0.67) + 0.3\,N(5.098, 1.69)$
[Figure: dot plot of observations drawn from the mixture]
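As a minimal sketch of the mixture above, the Python below evaluates f(y) and checks numerically that it integrates to about 1. It assumes the second parameter of N(·,·) is the variance (the slides do not say whether it is a variance or a standard deviation); the function names are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    # Density of a normal distribution; `var` is ASSUMED to be the variance
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Mixing weights and (mean, variance) pairs taken from the slide
WEIGHTS = (0.7, 0.3)
COMPONENTS = ((0.098, 0.67), (5.098, 1.69))

def mixture_pdf(y):
    # f(y) = 0.7 N(0.098, 0.67) + 0.3 N(5.098, 1.69)
    return sum(w * normal_pdf(y, m, v) for w, (m, v) in zip(WEIGHTS, COMPONENTS))

def integrate(f, lo, hi, steps=20000):
    # Midpoint rule, used only to check that the mixture density integrates to ~1
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

if __name__ == "__main__":
    print(round(integrate(mixture_pdf, -10.0, 15.0), 6))
```

Because the weights sum to 1 and each component is a proper density, the mixture is itself a proper density.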
9
Membership probability

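$P(y_i \in \pi_1) + P(y_i \in \pi_2) = 1$, where $\pi_1, \pi_2$ denote the two components
$i = 1, 2, \ldots, n$ (individuals)
If $P(y_i \in \pi_1) > P(y_i \in \pi_2)$, then $y_i$ is assigned to $\pi_1$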
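The membership rule can be sketched with Bayes' rule: the probability that y_i came from component k is proportional to the mixing weight times that component's density, normalised so the two probabilities sum to 1. This is a hypothetical illustration that reuses the slide's mixture and again ASSUMES the second parameter is a variance.

```python
import math

def normal_pdf(y, mean, var):
    # `var` is ASSUMED to be the variance of the component
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

WEIGHTS = (0.7, 0.3)                          # mixing proportions from the slide
COMPONENTS = ((0.098, 0.67), (5.098, 1.69))   # (mean, variance) per component

def membership(y):
    # P(y in component k | y) is proportional to weight_k * f_k(y)
    joint = [w * normal_pdf(y, m, v) for w, (m, v) in zip(WEIGHTS, COMPONENTS)]
    total = sum(joint)
    return [j / total for j in joint]

def assign(y):
    # Assign y to the component with the larger membership probability (1-based)
    probs = membership(y)
    return probs.index(max(probs)) + 1

if __name__ == "__main__":
    for y in (0.0, 2.5, 5.0):
        p1, p2 = membership(y)
        print(f"y={y}: P1={p1:.3f}, P2={p2:.3f} -> component {assign(y)}")
```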

10
Two variables X1, X2
11
Mixture of three bivariate normal distributions
12
with p.d.f.
13
Where
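The slide's own formula and parameter values were not preserved. As a sketch of the standard form, $f(x) = \sum_{k=1}^{3} w_k\,\phi(x; \mu_k, \Sigma_k)$ with $\phi$ the bivariate normal density, the Python below evaluates such a mixture and the membership probabilities of a point. All weights, means, and covariances are HYPOTHETICAL.

```python
import math

def bivariate_normal_pdf(x, mean, cov):
    # Density of a bivariate normal; cov = ((s11, s12), (s12, s22))
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) using the explicit 2x2 inverse
    quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# HYPOTHETICAL parameters: the slide's values were not preserved
WEIGHTS = (0.5, 0.3, 0.2)
MEANS = ((0.0, 0.0), (4.0, 0.0), (2.0, 4.0))
COVS = (((1.0, 0.3), (0.3, 1.0)),
        ((1.0, 0.0), (0.0, 0.5)),
        ((0.8, -0.2), (-0.2, 0.8)))

def mixture_pdf(x):
    # f(x) = sum_k w_k * phi(x; mu_k, Sigma_k)
    return sum(w * bivariate_normal_pdf(x, m, c) for w, m, c in zip(WEIGHTS, MEANS, COVS))

def membership(x):
    # Posterior probability that x belongs to each of the three components
    joint = [w * bivariate_normal_pdf(x, m, c) for w, m, c in zip(WEIGHTS, MEANS, COVS)]
    total = sum(joint)
    return [j / total for j in joint]

if __name__ == "__main__":
    print([round(p, 3) for p in membership((4.0, 0.0))])
```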
14
Inference on population structure using
multi-locus genotype data: STRUCTURE V2.1
Pritchard, J.K., and Wen, W. (2004)

Pritchard, Stephens, and Donnelly (2000)
Falush, Stephens, and Pritchard (2003)

15
Main objective

Assign individuals to populations on the basis of
their genotypes, while simultaneously estimating
population allele frequencies

16
Other objectives

Start from a set of predefined populations and
classify individuals of unknown origin
Identify the extent of admixture of individuals
Infer the origin of particular loci in the
sampled individuals

17
STRUCTURE is a model-based clustering method

(we must make assumptions about many parameters
and distributions)

18
Four basic models

Model without admixture
each individual is assumed to originate in exactly
one of the K populations
Model with admixture
each individual is assumed to have inherited some
proportion of its ancestry from each of K
populations

19
Four basic models

Linkage model
Chunks of chromosome are inherited as intact
units from one or another of the K populations, and all
allele copies on the same chunk derive from the
same population.
The model accounts for the resulting correlations in
ancestry between linked loci

20
Four basic models

F model
The populations all diverged from a common
ancestral population at the same time, but the model
allows each population to have experienced a
different amount of drift since the divergence
event

21
Assumptions

Our main modeling assumptions are
Hardy-Weinberg equilibrium (HWE) within populations and
complete linkage equilibrium between loci within
populations.
Loosely speaking, the idea here is that the
model accounts for the presence of Hardy-Weinberg
disequilibrium (HWD) or linkage disequilibrium (LD) by
introducing population structure, and attempts to
find population groupings that (as far as
possible) are not in disequilibrium
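Why hidden structure shows up as HWD can be illustrated with a small calculation (the Wahlund effect): pooling two populations that are each in HWE, but have different allele frequencies, yields fewer heterozygotes than HWE predicts for the pooled sample. The allele frequencies and sample sizes below are HYPOTHETICAL.

```python
def hwe_genotype_freqs(p):
    # Expected genotype frequencies under HWE for a biallelic locus (allele A at frequency p)
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

def pooled_genotype_freqs(freqs, sizes):
    # Observed genotype frequencies after pooling populations that are each in HWE
    total = sum(sizes)
    pooled = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for p, n in zip(freqs, sizes):
        for genotype, f in hwe_genotype_freqs(p).items():
            pooled[genotype] += f * n / total
    return pooled

# HYPOTHETICAL example: two equally sized populations with very different frequencies of A
freqs, sizes = (0.9, 0.1), (100, 100)
observed = pooled_genotype_freqs(freqs, sizes)
p_pooled = observed["AA"] + observed["Aa"] / 2.0   # allele frequency in the pooled sample
expected = hwe_genotype_freqs(p_pooled)

print(f"observed Aa = {observed['Aa']:.2f}, HWE-expected Aa = {expected['Aa']:.2f}")
```

Splitting the pooled sample back into its two source populations removes the heterozygote deficit, which is the kind of grouping the model searches for.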

22
Data

Consider a sample of N individuals, each
genotyped at L loci.
Assume that the individuals represent a mixture
of K unobserved populations (K unknown).
If diploid, we have an N × 2L data matrix X.
If n-ploid, X is N × nL.
Each entry at locus l is one of J_l alleles, where J_l is the number of
alleles at the lth locus
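A minimal sketch of the N × 2L layout for diploid data, with one column per allele copy, and of counting J_l as the number of distinct alleles observed at each locus. The genotypes and helper names are HYPOTHETICAL.

```python
# HYPOTHETICAL diploid data: N = 3 individuals, L = 2 loci.
# Row i holds 2L allele labels: (locus 1 copy 1, locus 1 copy 2, locus 2 copy 1, locus 2 copy 2)
X = [
    ["A", "B", "101", "105"],
    ["A", "A", "105", "105"],
    ["B", "C", "101", "109"],
]

def n_loci(row, ploidy=2):
    # Each locus occupies `ploidy` consecutive columns
    return len(row) // ploidy

def alleles_per_locus(data, ploidy=2):
    # J_l: number of distinct alleles observed at locus l across all individuals
    L = n_loci(data[0], ploidy)
    J = []
    for l in range(L):
        cols = range(l * ploidy, (l + 1) * ploidy)
        J.append(len({row[c] for row in data for c in cols}))
    return J

N, L = len(X), n_loci(X[0])
print(f"X is {N} x {2 * L}; J_l = {alleles_per_locus(X)}")
```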

23
[Table: layout of X, one row per individual i = 1, …, N, with two allele-copy columns (j1, j2) for each locus l = 1, …, L]
X is N × 2L
24
Example
25
Model without admixture

each individual is assumed to originate in exactly
one of the K populations

26
P-matrix (allele frequencies by population)
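[Table: layout of P, one row per population k = 1, …, K, with allele columns (j1, j2) for each locus l = 1, …, L]
$p_{klj}$ is the frequency of the jth allele at the
lth locus in the kth population, k = 1, 2, …, K;
l = 1, 2, …, L; j = 1, 2 (two alleles per locus in this example; in general j = 1, …, J_l)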
27
z-vector (membership of the ith individual in the
kth population)

If the ith individual is a member of the kth
population, then z(i) = k

P(z(i) = k) is the membership
probability
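For the model without admixture, and holding the allele frequencies P fixed, the membership probability follows from Bayes' rule: under HWE and linkage equilibrium, the likelihood of individual i's genotype given z(i) = k is the product of $p_{klj}$ over all loci and allele copies. This is only a sketch with HYPOTHETICAL frequencies and a uniform prior; STRUCTURE itself estimates P and z jointly by MCMC rather than conditioning on a known P.

```python
import math

# HYPOTHETICAL allele frequencies for K = 2 populations and L = 2 biallelic loci.
# P[k][l][j] = p_klj, frequency of allele j at locus l in population k
P = [
    [[0.9, 0.1], [0.8, 0.2]],   # population 1
    [[0.2, 0.8], [0.3, 0.7]],   # population 2
]

def genotype_likelihood(genotype, P_k):
    # P(X_i | z(i) = k, P): product over loci l and allele copies of p_{k l x}
    log_lik = 0.0
    for l, copies in enumerate(genotype):
        for allele in copies:
            log_lik += math.log(P_k[l][allele])
    return math.exp(log_lik)

def membership_probs(genotype, P, prior=None):
    # P(z(i) = k | X_i, P) by Bayes' rule; uniform prior over the K populations by default
    K = len(P)
    prior = prior or [1.0 / K] * K
    joint = [prior[k] * genotype_likelihood(genotype, P[k]) for k in range(K)]
    total = sum(joint)
    return [j / total for j in joint]

# Genotype (allele indices, 0-based): (0, 0) at locus 1 and (0, 1) at locus 2
x_i = [(0, 0), (0, 1)]
print([round(p, 3) for p in membership_probs(x_i, P)])
```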

28
Model with admixture

each individual is assumed to have inherited some
proportion of its ancestry from each of K
populations

29
The P-matrix is the same as in the model without admixture
Q-matrix: $q_{ik}$ is the proportion of the genome of
the ith individual inherited from the kth
population, i = 1, 2, …, N; k = 1, 2, …, K
30
Z-matrix
$z_l^{(i,j)} = k$ if the jth allele copy at the
lth locus in the ith individual originated
in the kth population, k = 1, 2, …, K; l = 1, 2, …, L;
j = 1, 2 (diploid)
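The admixture model can be read generatively: for each allele copy, first draw its source population $z_l^{(i,j)}$ from the individual's row of Q, then draw the allele from that population's frequencies in P. Summing over the hidden source gives $P(x = j) = \sum_k q_{ik}\, p_{klj}$. The sketch below uses HYPOTHETICAL parameters and helper names.

```python
import random

# HYPOTHETICAL parameters for one individual i: K = 2, L = 3 biallelic loci, diploid
q_i = [0.7, 0.3]                       # Q-matrix row: ancestry proportions, sum to 1
P = [                                  # P[k][l][j] = p_klj
    [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]],
    [[0.1, 0.9], [0.3, 0.7], [0.5, 0.5]],
]

def simulate_individual(q, P, n_loci, ploidy=2, rng=random):
    # For each allele copy: draw its source population z_l^(i,j) from q,
    # then draw the allele from that population's frequencies at locus l
    Z, X = [], []
    for l in range(n_loci):
        z_l = rng.choices(range(len(q)), weights=q, k=ploidy)
        x_l = [rng.choices(range(len(P[k][l])), weights=P[k][l])[0] for k in z_l]
        Z.append(z_l)
        X.append(x_l)
    return Z, X

def allele_prob(q, P, l, j):
    # Marginal over the hidden source population: P(x = j) = sum_k q_ik * p_klj
    return sum(q[k] * P[k][l][j] for k in range(len(q)))

rng = random.Random(1)
Z, X = simulate_individual(q_i, P, n_loci=3, rng=rng)
print("Z =", Z)
print("X =", X)
```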
31
F Model

The populations all diverged from a common
ancestral population at the same time, but the model
allows each population to have experienced a
different amount of drift since the divergence
event

32
Ancestral population (diploid)
Conditional on $P_A$ (the ancestral allele frequencies)
33
F-model
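$F_k$ is the drift rate of the kth population, and
it is related to Wright's $F_{ST}$
$p_{kl\cdot} \sim \mathcal{D}\left(p_{Al1}\frac{1-F_k}{F_k}, p_{Al2}\frac{1-F_k}{F_k}, \ldots, p_{AlJ_l}\frac{1-F_k}{F_k}\right)$
where $p_{Alj}$ is the frequency of the jth allele at the lth locus in the ancestral population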
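A minimal sketch of the F-model draw at a single locus, using the standard construction of a Dirichlet sample from normalised Gamma draws. Under this parameterisation each population frequency has mean $p_{Alj}$ and variance $F_k\,p_{Alj}(1 - p_{Alj})$, which is why $F_k$ behaves like an $F_{ST}$-type drift parameter. The ancestral frequencies and function names are HYPOTHETICAL.

```python
import random

def sample_dirichlet(alpha, rng=random):
    # Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_j, 1) draws
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def drift_population_freqs(p_A, F_k, rng=random):
    # F-model sketch: p_kl ~ Dirichlet(p_Alj * (1 - F_k) / F_k), assuming 0 < F_k < 1
    scale = (1.0 - F_k) / F_k
    return sample_dirichlet([p * scale for p in p_A], rng)

# HYPOTHETICAL ancestral frequencies at one locus with three alleles
p_A = [0.5, 0.3, 0.2]
rng = random.Random(42)
for F_k in (0.01, 0.3):
    draws = [drift_population_freqs(p_A, F_k, rng) for _ in range(2000)]
    mean_first = sum(d[0] for d in draws) / len(draws)
    var_first = sum((d[0] - mean_first) ** 2 for d in draws) / len(draws)
    print(f"F_k={F_k}: mean p_k1 ~ {mean_first:.3f}, var ~ {var_first:.4f}")
```

Small $F_k$ keeps population frequencies close to the ancestral ones; larger $F_k$ lets them drift further.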
34
Interpreting FST

$F_{ST}$ can range from 0 (no genetic differentiation) to
1 (fixation of alternative alleles).
Wright's guidelines
0 to 0.05: little differentiation
0.05 to 0.15: moderate
0.15 to 0.25: great
> 0.25: very great
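Wright's guidelines above amount to a simple lookup. In this sketch the function name is hypothetical, and treating each upper boundary as inclusive is an ASSUMPTION, since the ranges on the slide share their endpoints.

```python
def wright_fst_category(fst):
    # Wright's qualitative guidelines for F_ST, as listed on the slide
    # ASSUMPTION: a value exactly on a boundary falls in the lower category
    if not 0.0 <= fst <= 1.0:
        raise ValueError("F_ST is expected to lie between 0 and 1")
    if fst <= 0.05:
        return "little differentiation"
    if fst <= 0.15:
        return "moderate differentiation"
    if fst <= 0.25:
        return "great differentiation"
    return "very great differentiation"

if __name__ == "__main__":
    for value in (0.02, 0.10, 0.20, 0.40):
        print(value, wright_fst_category(value))
```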

35
The Dirichlet distribution

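The probability density of the Dirichlet
distribution for variables
$p = (p_1, p_2, \ldots, p_n)$
with parameters $u = (u_1, u_2, \ldots, u_n)$
is defined by
$f(p; u) = \frac{\Gamma\left(\sum_{i=1}^{n} u_i\right)}{\prod_{i=1}^{n} \Gamma(u_i)} \prod_{i=1}^{n} p_i^{u_i - 1}$, with $p_i > 0$ and $\sum_{i=1}^{n} p_i = 1$
The parameters $u_i$ can be interpreted as "prior
observation counts"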
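The Dirichlet density can be evaluated directly from its definition. Working on the log scale with `math.lgamma` avoids overflow of the Gamma function for large $u_i$. This is a minimal sketch; the function names are hypothetical.

```python
import math

def dirichlet_logpdf(p, u):
    # log f(p; u) = log Gamma(sum u) - sum log Gamma(u_i) + sum (u_i - 1) log p_i
    if any(x <= 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must lie strictly inside the probability simplex")
    if any(a <= 0 for a in u):
        raise ValueError("all parameters u_i must be positive")
    return (math.lgamma(sum(u)) - sum(math.lgamma(a) for a in u)
            + sum((a - 1.0) * math.log(x) for a, x in zip(u, p)))

def dirichlet_pdf(p, u):
    return math.exp(dirichlet_logpdf(p, u))

if __name__ == "__main__":
    # With u = (1, 1, 1) the density is flat over the simplex
    print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
```

With all $u_i = 1$ the density is constant, so no frequency vector is favoured; larger $u_i$ concentrate the mass around $u_i / \sum_j u_j$, consistent with reading them as prior counts.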

37
Kullback-Leibler

The Kullback-Leibler divergence is non-negative
and equals 0 only when the two
distributions are identical. It is a
measure of the discriminative
power between the probability distributions of
two classes
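For discrete distributions the divergence is $D_{KL}(P \| Q) = \sum_i p_i \log(p_i / q_i)$. The sketch below computes it in nats with the usual conventions for zero probabilities; the example distributions are HYPOTHETICAL. Note that it is not symmetric, so it is a divergence rather than a distance.

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    # Terms with p_i = 0 contribute 0; q_i = 0 where p_i > 0 makes the divergence infinite.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

# HYPOTHETICAL distributions over three categories
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p), kl_divergence(p, p))
```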
