Title: mStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations
1. mStruct: A New Admixture Model for Inference of
Population Structure in Light of Both Genetic
Admixing and Allele Mutations
- Suyash Shringarpure and Eric Xing
- School of Computer Science
- Carnegie Mellon University
- ICML 2008
Presented by Haojun Chen
2. Outline
- Background
- Structure Model
- mStruct Model
- Experiment Results
- Summary
3. Background
- Allele: one member of a pair or series of different forms of a gene
- Population structure analysis: aims to shed light on the evolutionary history of modern human populations
- Microsatellite and single nucleotide polymorphism (SNP) data: the basis of population structure analysis
- State-of-the-art method: Structure
4. Structure Model
- x: microsatellite alleles
- Unique set of population-specific multinomial distributions
- Vector of multinomial parameters, a.k.a. allele frequency profile (AP), of the allele distribution at locus i in ancestral population k
- Total number of observed marker alleles at locus i
- Total number of marker loci
- Total number of individuals
- Individual-specific admixing coefficient vector
5. Pitfall of Structure
- There is no mutation model for modern individual alleles with respect to common prototypes in the modern populations
- Every unique allele in the modern population is assumed to have a distinct ancestral frequency, rather than allowing the possibility of it just being a descendant of some common ancestral allele
6. mStruct Model
- Set of ancestral alleles
- Mutation parameter associated with each locus
- Frequencies of the ancestral alleles
- Total number of ancestral alleles
- Microsatellite mutation model
- SNP mutation model
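The two mutation models named above can be illustrated with a minimal sketch. The slide's formulas did not survive, so the functional forms here are assumptions: the microsatellite model is taken to be a stepwise model whose probability decays geometrically with the repeat-count distance from the ancestral allele, and the SNP model is taken to keep the ancestral allele with a fixed probability and otherwise mutate uniformly. The function names and the parameters `delta` and `fidelity` are hypothetical.

```python
def microsatellite_mutation_pmf(ancestor, alleles, delta):
    """P(observed allele | ancestral allele) over a finite set of repeat counts.

    Assumed form (not taken from the slides): the weight of allele b is
    delta ** |b - ancestor|, normalized over `alleles`.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    weights = [delta ** abs(b - ancestor) for b in alleles]
    total = sum(weights)
    return {b: w / total for b, w in zip(alleles, weights)}


def snp_mutation_pmf(ancestor, alleles, fidelity):
    """P(observed allele | ancestral allele) for a SNP-like marker.

    Assumed form (not taken from the slides): the ancestral allele is kept
    with probability `fidelity`; the remaining mass is split evenly among
    the other alleles.
    """
    if len(alleles) < 2:
        raise ValueError("need at least two alleles")
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must lie in [0, 1]")
    share = (1.0 - fidelity) / (len(alleles) - 1)
    return {b: (fidelity if b == ancestor else share) for b in alleles}
```

Under these assumptions, a small `delta` keeps observed microsatellite alleles close to their ancestral allele, while a larger `delta` spreads probability over more distant repeat counts.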
7. Generative Process
- Generative process for Structure
  - [sampling steps, shown as equations on the original slide, are missing here]
- Generative process for mStruct
  - step 2.2 above is replaced by [equation missing here]
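Since the slide's equations are not available, the following is only a sketch of how the two generative processes are commonly described for admixture models; the step numbering in the comments is an assumption chosen to match the "step 2.2" reference. It simplifies to one allele copy per locus, uses the same allele set for ancestral and observed alleles, and reuses an assumed geometric stepwise mutation kernel. All names (`alpha`, `beta`, `delta`, `sample_individual`) are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def mutate(ancestor, alleles, delta, rng):
    """Assumed stepwise kernel: weight delta ** |b - ancestor| for allele b."""
    weights = [delta ** abs(b - ancestor) for b in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]


def sample_individual(alpha, beta, alleles, rng, delta=None):
    """Sample one individual's alleles.

    beta[k][i]  : allele frequencies at locus i in ancestral population k
    alleles[i]  : allele values (e.g. repeat counts) at locus i
    delta=None  : Structure-style sampling (no mutation)
    delta=float : mStruct-style sampling (ancestral allele, then mutation)
    """
    theta = sample_dirichlet(alpha, rng)            # step 1: admixing vector
    genotype = []
    for i, locus_alleles in enumerate(alleles):
        z = sample_index(theta, rng)                # step 2.1: ancestral population
        allele = locus_alleles[sample_index(beta[z][i], rng)]  # step 2.2
        if delta is not None:
            # mStruct replacement for step 2.2: the draw above is treated as
            # the ancestral allele, and the observed allele is a mutated copy.
            allele = mutate(allele, locus_alleles, delta, rng)
        genotype.append(allele)
    return theta, genotype
```

Calling `sample_individual` with `delta=None` gives the Structure-style draw; passing a value such as `delta=0.1` adds the mutation step that distinguishes mStruct in this sketch.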
8. mStruct Model Inference
- MCMC: slow
- Variational inference for hidden variables
- Variational EM for hyperparameters
9. Synthetic Data
Twenty microsatellite genotype datasets with 100
individuals from 3 ancestral populations at 50
genotype loci
10. HGDP Microsatellite Data
- Model selection by BIC (Bayesian Information
Criterion) score
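As a reminder of how BIC-based model selection works, here is a small sketch using the standard definition BIC = -2 log L + p log n, where lower is better. The slides do not say which sign convention was used or how parameters were counted, so both are assumptions here, and the function names are hypothetical.

```python
import math


def bic_score(log_likelihood, num_free_params, num_observations):
    """Standard BIC: -2 * log-likelihood + (number of parameters) * log(n)."""
    return -2.0 * log_likelihood + num_free_params * math.log(num_observations)


def select_num_populations(candidates):
    """Pick the number of ancestral populations K with the lowest BIC.

    candidates maps K -> (log_likelihood, num_free_params, num_observations).
    Returns the selected K and the full dictionary of scores.
    """
    scores = {k: bic_score(*stats) for k, stats in candidates.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

The penalty term grows with the number of free parameters, so adding ancestral populations is only favored when the gain in log-likelihood outweighs the added complexity.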
11. HGDP Microsatellite Data
1056 individuals from 52 populations at 377
autosomal microsatellite loci
- am-spectrum: spectra of different ancestral populations
- gm-spectrum: spectra of different geographical populations
12. Contour of Mutation Rates
13. Summary
- mStruct takes into account both genetic admixture and allele mutation effects
- mStruct extends LDA to allow noisy observations
- A variational inference algorithm that allows tractable inference was developed for mStruct
- Other applications: images, text, and so on