Inference on population structure using multilocus genotype data: STRUCTURE V2.1. Pritchard, J.K., and Wen, W. (2004)

JFD-CIMMYT-UDELAR-Uruguay

1
Inference on population structure using
multi-locus genotype data: STRUCTURE V2.1
Pritchard, J.K., and Wen, W. (2004)
2
Model based cluster analysis
  • We assume a statistical distribution for each
    individual

3
The mixture of normal distributions model (a
very simple case)
4
N(0.098, 0.67) and N(5.098, 1.69)
(Figure: sample points from the two normal distributions)
5
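The probability density functions (p.d.f.) are
$f_k(y) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\right), \quad k = 1, 2$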
6
or the mixture (each individual follows the
mixture distribution)
7
f(y) = 0.7 N(0.098, 0.67) + 0.3 N(5.098, 1.69)
(Figure: sample points from the mixture)
8
(No Transcript)
9
Membership probability
  • P(y_i ∈ Π_1) + P(y_i ∈ Π_2) = 1
  • i = 1, 2, ..., n (individuals)
  • if P(y_i ∈ Π_1) > P(y_i ∈ Π_2) then y_i ∈ Π_1
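
The membership rule can be illustrated with a minimal sketch for the two-component mixture used in the earlier slides. The function names are hypothetical, and reading the second parameter of each N(·, ·) as a standard deviation is an assumption; the slides do not say whether it is a variance or a standard deviation.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probabilities(y, weights, means, sds):
    """Probability that observation y belongs to each mixture component (Bayes' rule)."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

# Mixture from the slides: 0.7 N(0.098, 0.67) + 0.3 N(5.098, 1.69).
# Assumption: the second parameter is treated as a standard deviation.
WEIGHTS = [0.7, 0.3]
MEANS = [0.098, 5.098]
SDS = [0.67, 1.69]

for y in (0.0, 2.0, 5.0):
    probs = membership_probabilities(y, WEIGHTS, MEANS, SDS)
    group = 1 + probs.index(max(probs))  # assign to the most probable group
    print(y, [round(p, 3) for p in probs], "-> group", group)
```

The two probabilities always sum to 1, and each observation is assigned to the group with the larger probability.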

10
Two variables X1, X2
11
Mixture of three bivariate normal distributions
12
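with p.d.f.
$f(x) = \sum_{k=1}^{3} w_k \, \phi(x; \mu_k, \Sigma_k), \quad x = (X_1, X_2)^\top$
13
where
$\phi(x; \mu_k, \Sigma_k) = \frac{1}{2\pi |\Sigma_k|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k)\right)$, with mixing weights $w_k \ge 0$, $\sum_{k} w_k = 1$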
14
Inference on population structure using
multi-locus genotype data: STRUCTURE V2.1
Pritchard, J.K., and Wen, W. (2004)
  • Pritchard, Stephens, and Donnelly (2000)
  • Falush, Stephens, and Pritchard (2003)

15
Main objective
  • Assign individuals to populations on the basis of
    their genotypes, while simultaneously estimating
    population allele frequencies

16
Other objectives
  • Begin with a set of predefined populations and
    classify individuals of unknown origin
  • Identify the extent of admixture of individuals
  • Infer the origin of particular loci in the
    sampled individuals

17
STRUCTURE is a model-based method of clustering
  • (we must make assumptions about many parameters
    and distributions)

18
Four basic models
  • Model without admixture
  • each individual is assumed to originate in one
    (only one) of K populations
  • Model with admixture
  • each individual is assumed to have inherited some
    proportion of its ancestry from each of K
    populations

19
Four basic models
  • Linkage model
  • Chunks of chromosomes are derived as intact
    units from one or another of the K populations,
    and all allele copies on the same chunk derive
    from the same population.
  • The model accounts for the resulting correlations
    in ancestry

20
Four basic models
  • F model
  • The populations are assumed to have all diverged
    from a common ancestral population at the same
    time, but the model allows the populations to
    have experienced different amounts of drift
    since the divergence event

21
Assumptions
  • Our main modeling assumptions are
    Hardy-Weinberg equilibrium within populations and
    complete linkage equilibrium between loci within
    populations
  • Loosely speaking, the idea here is that the
    model accounts for the presence of Hardy-Weinberg
    disequilibrium (HWD) or linkage disequilibrium
    (LD) by introducing population structure, and
    attempts to find population groupings that (as
    far as possible) are not in disequilibrium

22
Data
  • Consider a sample of N individuals each one
    genotyped at L loci
  • Assume that the individuals represent a mixture
    of K unobserved populations (K unknown)
  • If diploid, we have an N × 2L data matrix X
  • If n-ploid, X is N
  • where J_l is the number of alleles at the l-th
    locus

23
(Table layout: columns are loci l = 1, ..., L, each with two allele copies j = 1, j = 2)
X is N × 2L
24
Example
25
Model without admixture
  • each individual is assumed to originate in one
    (only one) of K populations

26
P-matrix (allele frequencies by population)
(Table layout: columns are loci l = 1, ..., L, each with alleles j = 1, j = 2)
p_klj is the frequency of the j-th allele at the
l-th locus in the k-th population; k = 1, 2, ..., K;
l = 1, 2, ..., L; j = 1, 2 (diploid)
27
z-vector (membership of the i-th individual in the
k-th population)
  • If the i-th individual is a member of the k-th
    population then z(i) = k
  • P(z(i) = k) is the membership probability
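
A minimal sketch of how the membership probability could be computed in the model without admixture, assuming the allele frequencies are known. Under Hardy-Weinberg and linkage equilibrium within populations, the likelihood of a genotype given z(i) = k is the product of the frequencies p_klj of its allele copies. The function name, the uniform prior over populations, and the toy frequencies are assumptions made for illustration; STRUCTURE itself estimates P and z jointly by MCMC.

```python
import math

def membership_posterior(genotype, allele_freqs, prior=None):
    """Posterior P(z(i) = k | genotype, P) for one individual.

    genotype[l]        -> tuple of allele indices observed at locus l
    allele_freqs[k][l] -> list of allele frequencies p_klj for population k
    prior              -> P(z(i) = k); uniform if omitted (an assumption)
    All frequencies used must be strictly positive.
    """
    K = len(allele_freqs)
    if prior is None:
        prior = [1.0 / K] * K
    log_post = []
    for k in range(K):
        lp = math.log(prior[k])
        for l, copies in enumerate(genotype):
            for j in copies:
                lp += math.log(allele_freqs[k][l][j])
        log_post.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example (hypothetical values): K = 2 populations, L = 2 biallelic loci.
P = [
    [[0.9, 0.1], [0.8, 0.2]],  # population 1
    [[0.2, 0.8], [0.3, 0.7]],  # population 2
]
individual = [(0, 0), (0, 1)]  # diploid genotype at each locus
print(membership_posterior(individual, P))
```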

28
Model with admixture
  • each individual is assumed to have inherited some
    proportion of its ancestry from each of K
    populations

29
The P-matrix is the same as in the model without admixture
Q-matrix: q_k(i) is the proportion of the genome of
the i-th individual inherited from the k-th
population
i = 1, 2, ..., N; k = 1, 2, ..., K
30
Z-matrix
z_l(i,j) is equal to k if the j-th allele copy at the
l-th locus of the i-th individual originated
from the k-th population; k = 1, 2, ..., K;
l = 1, 2, ..., L; j = 1, 2 (diploid)
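
As a sketch of how P, Q, and Z fit together in the admixture model: given the allele frequencies and an individual's admixture proportions, the probability that an observed allele copy originated in population k is proportional to q_k(i) times the frequency of that allele in population k. The function name and the toy values below are assumptions made for illustration.

```python
import random

def allele_origin_probabilities(allele, locus, q_i, allele_freqs):
    """P(z_l(i,j) = k | observed allele, P, Q) for a single allele copy.

    q_i[k]                -> admixture proportion of individual i from population k
    allele_freqs[k][l][a] -> frequency of allele a at locus l in population k
    """
    weights = [q_i[k] * allele_freqs[k][locus][allele] for k in range(len(q_i))]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example (hypothetical values): K = 2 populations, 2 biallelic loci.
P = [
    [[0.9, 0.1], [0.8, 0.2]],  # population 1
    [[0.2, 0.8], [0.3, 0.7]],  # population 2
]
q_i = [0.5, 0.5]  # individual with equal ancestry from both populations

probs = allele_origin_probabilities(allele=0, locus=0, q_i=q_i, allele_freqs=P)
print([round(p, 3) for p in probs])

# One draw of the population of origin, as a Gibbs-style update would do.
rng = random.Random(0)
z = rng.choices(range(len(q_i)), weights=probs)[0]
print("sampled population index:", z)
```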
31
F Model
  • The populations are assumed to have all diverged
    from a common ancestral population at the same
    time, but the model allows the populations to
    have experienced different amounts of drift
    since the divergence event

32
Ancestral population (diploid)
Conditional on P_A
33
F-model
F_k is the drift rate of the k-th population and
is related to Wright's F_ST
(Diagram: ancestral allele frequencies p_Alj, drift F_k, population allele frequencies p_klj)
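
A minimal simulation sketch of the F model, assuming the parameterization of Falush, Stephens, and Pritchard (2003), in which the allele frequencies of population k at a locus are drawn from a Dirichlet distribution with parameters p_Alj (1 - F_k) / F_k. The function name and the toy ancestral frequencies are hypothetical. Under this parameterization the variance of p_klj is F_k p_Alj (1 - p_Alj), so a larger F_k means more drift away from the ancestral frequencies.

```python
import random

def sample_population_freqs(ancestral_freqs, F_k, rng):
    """Draw allele frequencies at one locus for a population with drift F_k.

    Samples Dirichlet(p_A * (1 - F_k) / F_k) by normalizing gamma draws.
    Requires 0 < F_k < 1 and strictly positive ancestral frequencies.
    """
    scale = (1.0 - F_k) / F_k
    draws = [rng.gammavariate(p * scale, 1.0) for p in ancestral_freqs]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)
p_A = [0.6, 0.3, 0.1]  # hypothetical ancestral frequencies at one locus

for F_k in (0.01, 0.2, 0.5):
    freqs = sample_population_freqs(p_A, F_k, rng)
    print(F_k, [round(f, 3) for f in freqs])
```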
34
Interpreting FST
  • Can range from 0 (no genetic differentiation) to
    1 (fixation of alternative alleles).
  • Wright's guidelines
  • 0 - 0.05, little differentiation
  • 0.05 - 0.15, moderate
  • 0.15 - 0.25, great
  • > 0.25, very great
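
The guidelines above can be written as a small lookup, shown here as a sketch. The function name is hypothetical, and the handling of values that fall exactly on a boundary (0.05, 0.15, 0.25) is an assumption, since the slide does not specify it.

```python
def interpret_fst(fst):
    """Map an F_ST value to the qualitative labels from the slide.

    Boundary handling is an assumption: 0.05 counts as moderate,
    0.15 as great, and 0.25 as great.
    """
    if not 0.0 <= fst <= 1.0:
        raise ValueError("F_ST must lie between 0 and 1")
    if fst < 0.05:
        return "little differentiation"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:
        return "great"
    return "very great"

for value in (0.02, 0.10, 0.20, 0.30):
    print(value, "->", interpret_fst(value))
```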

35
The Dirichlet distribution
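  • The probability density of the Dirichlet
    distribution for variables
    p = (p_1, p_2, ..., p_n)
  • with parameters u = (u_1, u_2, ..., u_n)
  • is defined by
    $f(p; u) = \frac{\Gamma\left(\sum_{i=1}^{n} u_i\right)}{\prod_{i=1}^{n} \Gamma(u_i)} \prod_{i=1}^{n} p_i^{u_i - 1}$, with $p_i \ge 0$, $\sum_{i} p_i = 1$, $u_i > 0$
  • The parameters u_i can be interpreted as "prior
    observation counts"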
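
A short sketch of the Dirichlet density and of the "prior observation counts" reading: when allele counts are observed, the posterior over the frequencies is again Dirichlet, with the observed counts added to the parameters u_i. The function names and the example counts are hypothetical.

```python
import math

def dirichlet_log_pdf(p, u):
    """Log density of Dirichlet(u) at p (p_i > 0, sum p_i = 1, u_i > 0)."""
    log_norm = math.lgamma(sum(u)) - sum(math.lgamma(ui) for ui in u)
    return log_norm + sum((ui - 1.0) * math.log(pi) for pi, ui in zip(p, u))

def posterior_parameters(u, counts):
    """Dirichlet parameters after observing the given allele counts."""
    return [ui + ci for ui, ci in zip(u, counts)]

prior = [1, 1, 1]          # uniform prior over three allele frequencies
counts = [3, 0, 2]         # hypothetical observed allele counts
posterior = posterior_parameters(prior, counts)
print("posterior parameters:", posterior)

# Posterior mean of each frequency is u_i / sum(u).
total = sum(posterior)
print("posterior mean:", [round(ui / total, 3) for ui in posterior])
print("log density at (0.5, 0.1, 0.4):",
      round(dirichlet_log_pdf([0.5, 0.1, 0.4], posterior), 4))
```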

36
(No Transcript)
37
Kullback-Leibler
  • The Kullback-Leibler divergence is non-negative
    and equals 0 only when the two
    distributions are identical. It measures the
    discriminative power between the probability
    distributions of the two classes
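
For discrete distributions, such as allele frequencies at a locus in two populations, the divergence can be computed as in the sketch below. The function name and the example frequencies are hypothetical, and the code assumes q_i > 0 wherever p_i > 0.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Terms with p_i = 0 contribute nothing. Assumes q_i > 0 whenever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pop_1 = [0.5, 0.5]  # hypothetical allele frequencies in population 1
pop_2 = [0.9, 0.1]  # hypothetical allele frequencies in population 2

print("D(pop_1 || pop_1) =", kl_divergence(pop_1, pop_1))
print("D(pop_1 || pop_2) =", round(kl_divergence(pop_1, pop_2), 4))
print("D(pop_2 || pop_1) =", round(kl_divergence(pop_2, pop_1), 4))
```

The last two values differ because the divergence is not symmetric in its arguments.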