Title: An introduction to genetic analysis software - genotype data analysis
1An introduction to genetic analysis
software - genotype data analysis
2Content of population genetics
- genetic diversity
- genetic distance
- F-statistics
- population structure
- detection of new immigrants
- Population size
3Software for population genetics
- Multi-purpose packages
- Arlequin, DnaSP, FSTAT, Genepop, GDA, MEGA, MSA, SPAGeDi, GENETIX, Popgene, GeneStrut, TFPGA, GenAlEx
- Individual-centred programs
- BayesAss, BAPS, GeneClass, Geneland, NewHybrids, Structure
- Specialized programs
- BATWING, COLONISE, FDIST2, Hickory, IM, LAMARC, Migrate, MSVAR, Bottleneck
- Tree construction programs
- Phylip, PAUP, Dispan
- AFLP-specific programs
- AFLPOP, AFLPdat, AFLP-SURV
4How to find and choose software
- Related research papers (methods)
- Software reviews
- e.g. Excoffier, L. and Heckel, G. (2006). Computer programs for population genetics data analysis: a survival guide. Nature Reviews Genetics 7: 745-758.
8- http://evolution.genetics.washington.edu/phylip/software.html
11Content of data tests
- Tests for loci
- Null alleles (SSR)
- linkage disequilibrium
- selective neutrality tests
- Tests for populations
- Hardy-Weinberg equilibrium
12Null alleles (SSR)
- Methods for estimating null allele frequencies
- Chakraborty (Chakraborty et al., 1992) and Brookfield (Brookfield, 1996)
- Software
- Micro-Checker, Genepop
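The two estimators named on this slide can be sketched in a few lines. This is a minimal illustration, assuming the commonly quoted forms r = (He - Ho) / (He + Ho) for Chakraborty et al. (1992) and r = (He - Ho) / (1 + He) for Brookfield's first estimator (1996); the function name and the example heterozygosities are hypothetical and do not come from the slides.

```python
def null_allele_frequency(h_exp, h_obs):
    """Return (Chakraborty, Brookfield 1) null allele frequency estimates.

    h_exp: expected heterozygosity at the locus
    h_obs: observed heterozygosity at the locus
    Both estimators assume the heterozygote deficit is caused by null alleles.
    """
    deficit = h_exp - h_obs
    chakraborty = deficit / (h_exp + h_obs)   # Chakraborty et al. (1992)
    brookfield1 = deficit / (1.0 + h_exp)     # Brookfield (1996), estimator 1
    return chakraborty, brookfield1


# Hypothetical locus with a clear heterozygote deficit.
chak, brook = null_allele_frequency(h_exp=0.80, h_obs=0.60)
print(round(chak, 4), round(brook, 4))
```

A deficit can also come from inbreeding or population substructure, so an estimate like this is a screening value rather than proof that null alleles are present.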
13linkage disequilibrium
- Exact tests for linkage disequilibrium (Guo and Thompson, 1992)
- GDA, Genepop, FSTAT
- Likelihood ratio test (genotypic data) (Slatkin and Excoffier, 1996)
- Arlequin, Genepop
- Popgene: χ² tests for significance (Weir, 1979)
- D-statistics (Ohta, 1982) for multiple populations
14selective neutrality tests
- Tests based on the infinite-alleles model
- Ewens-Watterson (Ewens, 1972; Watterson, 1978)
- Popgene, Arlequin
- Ewens-Watterson-Slatkin (Slatkin, 1994b, 1996)
- Arlequin
- Chakraborty's test (Chakraborty, 1990)
- Arlequin
- FDIST2 can also test for neutrality, but under which model?
15Hardy-Weinberg equilibrium
- Chi-square tests
- Popgene
- G-tests (Levene, 1949)
- Popgene
- Exact tests (Guo and Thompson, 1992)
- Genepop, GDA, Arlequin, FSTAT
- Genepop: tests for heterozygote excess or deficiency
- (Rousset and Raymond, 1995)
- Exact tests are generally more conservative than traditional chi-square tests and G-tests
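As an illustration of the chi-square approach listed above, the sketch below tests one biallelic locus against Hardy-Weinberg proportions. It is a simplified textbook version, not the procedure of any particular package; the function name and genotype counts are hypothetical, and it assumes both alleles are present so that no expected count is zero.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at a biallelic locus.

    Returns (chi2, p_value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical sample of 100 individuals with a heterozygote deficit.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(chi2, p_value)
```

With small samples or many rare alleles, expected counts become small and the chi-square approximation is unreliable, which is one reason the exact tests listed above are preferred for microsatellite data.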
16Content of genetic data analysis
- genetic diversity
- genetic distance
- genetic distance tree, isolation by distance
- F-statistics
- population differentiation, migration rate
- population structure
- AMOVA analysis, clustering patterns
- immigrants
- detection of new immigrants, hybridization
17genetic diversity
- Percentage of polymorphic loci
- Popgene, Arlequin, GDA
- Expected and observed heterozygosity
- almost all multi-purpose packages
- Number of alleles
- Popgene, MSA, FSTAT, GDA, SPAGeDi
- Allele and/or genotype frequencies
- Popgene, Genepop, MSA, FSTAT, SPAGeDi
- Allelic richness (FSTAT, MSA)
- Private alleles (GDA)
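The basic diversity measures listed above are simple to compute by hand for a single locus. The sketch below is a minimal illustration with hypothetical microsatellite genotypes; it uses the uncorrected expected heterozygosity He = 1 - Σ p_i², whereas packages often apply a small-sample correction that is not shown here.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one locus from diploid genotypes given as (allele, allele) tuples."""
    allele_counts = Counter(a for genotype in genotypes for a in genotype)
    n_genes = sum(allele_counts.values())
    freqs = {allele: count / n_genes for allele, count in allele_counts.items()}
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    h_exp = 1.0 - sum(f * f for f in freqs.values())   # uncorrected estimate
    return {"n_alleles": len(freqs), "freqs": freqs, "Ho": h_obs, "He": h_exp}


# Hypothetical SSR genotypes (allele sizes in base pairs) for four individuals.
sample = [(100, 100), (100, 104), (104, 108), (100, 104)]
print(locus_diversity(sample))
```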
18genetic distances and F-statistics
19genetic distances
- Standard genetic distance (Nei, 1978)
jX: the probability of identity of two randomly chosen genes in population X
jY: the probability of identity of two randomly chosen genes in population Y
jXY: the probability of identity of a gene from X and a gene from Y
JX, JY and JXY are the arithmetic means of jX, jY and jXY over all loci
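The quantities defined on this slide combine into Nei's standard distance as D = -ln(JXY / sqrt(JX · JY)). The sketch below assumes that uncorrected form, with jX = Σ x_i², jY = Σ y_i² and jXY = Σ x_i y_i per locus; the 1978 estimator cited on the slide additionally corrects JX and JY for sample size, which is not shown. The function name and data layout (one allele-frequency dict per locus) are hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance from per-locus allele frequencies.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    """
    r = len(pop_x)   # number of loci
    j_x = sum(sum(f * f for f in locus.values()) for locus in pop_x) / r
    j_y = sum(sum(f * f for f in locus.values()) for locus in pop_y) / r
    j_xy = sum(
        sum(fx * locus_y.get(allele, 0.0) for allele, fx in locus_x.items())
        for locus_x, locus_y in zip(pop_x, pop_y)
    ) / r
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical allele frequencies at two loci in two populations.
pop_x = [{"A": 0.7, "B": 0.3}, {"C": 1.0}]
pop_y = [{"A": 0.4, "B": 0.6}, {"C": 0.5, "D": 0.5}]
print(nei_standard_distance(pop_x, pop_y))
```

The distance is undefined when the two populations share no alleles at any locus, because JXY is then zero.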
20genetic distances
- Chord distance, Dc (Cavalli-Sforza and Edwards, 1967)
xi: the frequency of the ith allele in one population
yi: the frequency of the ith allele in the other population
m: number of alleles at each locus
r: number of loci
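With the symbols defined on this slide, the chord distance is commonly written as Dc = (2 / (π r)) Σ over loci of sqrt(2 (1 - Σ_i sqrt(x_i y_i))). The sketch below assumes that form; the function name and the dict-per-locus data layout are hypothetical.

```python
import math


def chord_distance(pop_x, pop_y):
    """Cavalli-Sforza and Edwards chord distance from per-locus allele frequencies.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    """
    r = len(pop_x)   # number of loci
    total = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        overlap = sum(
            math.sqrt(fx * locus_y.get(allele, 0.0)) for allele, fx in locus_x.items()
        )
        # Guard against tiny negative values caused by floating-point rounding.
        total += math.sqrt(2.0 * max(0.0, 1.0 - overlap))
    return 2.0 * total / (math.pi * r)


# Hypothetical allele frequencies at two loci in two populations.
pop_x = [{"A": 0.7, "B": 0.3}, {"C": 1.0}]
pop_y = [{"A": 0.4, "B": 0.6}, {"C": 0.5, "D": 0.5}]
print(chord_distance(pop_x, pop_y))
```

Unlike Nei's standard distance, this measure stays finite when two populations share no alleles, which is one reason it is often used for tree building.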
21genetic distances
- (δμ)², Ddm (Goldstein et al., 1995a)
- D1, ASD, average squared distance (Goldstein et al., 1995b; Slatkin, 1995)
- Dps, proportion of shared alleles (Bowcock et al., 1994)
- Dfs, fuzzy set similarity (Dubois and Prade, 1980)
- Dkf, kinship coefficient (Cavalli-Sforza and Bodmer, 1971)
- Dad, absolute difference
- Coancestry (Reynolds et al., 1983)
22F-statistics
- θ, f, F: Weir BS, Cockerham CC (1984) Evolution, 38, 1358-1370.
- RST, RIT, RIS: Slatkin M (1995) Genetics, 139, 457-462.
- NST: Pons O, Petit RJ (1996) Genetics, 144, 1237-1245.
- ρST: Rousset F (1996) Genetics, 142, 1357-1362.
23Distance-based phylogenetic tree
Genetic distance matrix
- UPGMA (unweighted pair group method with arithmetic averages)
- NJ (the neighbor-joining method)
- Phylip, PAUP, Dispan
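To show how a distance matrix becomes a tree, here is a minimal UPGMA sketch. It is a teaching illustration rather than the PHYLIP, PAUP or Dispan implementation; the function name, the nested-tuple tree representation and the example matrix are all hypothetical.

```python
def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix.

    names: list of population labels
    dist:  square matrix (list of lists) of pairwise distances
    Returns (tree, heights): a nested tuple and the node height of each merge.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}   # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    heights = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance to the new cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        heights.append(d_ab / 2)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights


# Hypothetical distances among three populations.
tree, heights = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree, heights)
```

UPGMA assumes a constant rate of change along all branches; neighbor-joining drops that assumption, which is why both methods are usually offered side by side.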
24Genetic distance tree
UPGMA tree
(Muir, Fleming and Schlotterer 2000, Nature)
25Genetic distance tree
NJ tree: dendrogram of 7 Quercus mongolica
populations based on standard genetic distance
(Nei, 1978), computed using the NJ approach in PHYLIP.
Numbers are bootstrap support values.
26Isolation by distance
Geographic distance matrix (km)
genetic distance matrix
- Compute the correlation between the two distance matrices
- Mantel test (Mantel, 1967; Smouse et al., 1986)
27Mantel test based on 37 Quercus mongolica
populations
[Figure: scatter plot with axes genetic distance and geographic distance]
FSTAT, Arlequin, SPAGeDi, Genepop
28Population structure
- AMOVA (Analysis of molecular variance)
- Population genetic structure inferred by analysis of variance
- Arlequin, AMOVA, GDA
29migration rate
- Gene flow based on island model
Wright, S. 1978. Evolution and the genetics of
populations. Chicago University Press, Chicago.
Hartl, D. L., A. G. Clark. 1989. Principles of
population genetics, 2nd edition. Sinauer
Associates, Sunderland.
Popgene, Genepop (private allele method)
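Under Wright's island model, the usual approximation is FST ≈ 1 / (4Nm + 1), which rearranges to Nm = (1 - FST) / (4 FST). The sketch below assumes this standard relationship for diploid nuclear markers; the function name and the example value are hypothetical.

```python
def nm_from_fst(fst):
    """Effective number of migrants per generation under the island model.

    Uses Nm = (1 - Fst) / (4 * Fst), i.e. Fst ~ 1 / (4Nm + 1).
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# Hypothetical Fst estimate.
print(nm_from_fst(0.2))
```

The estimate relies on the island model's assumptions (equal population sizes, symmetric migration, equilibrium, neutral markers), so it is best read as a rough index of gene flow rather than a measured migration rate.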
30migration rate
- Immigrants detected from individual genetic distances
Neighbor-joining tree of individuals in the T.
helleri data set. Pritchard et al. (2000), Genetics.
The pairwise distance matrix was computed as
follows (Mountain and Cavalli-Sforza, 1997)
31Detecting recent immigrants
- Bayesian clustering approach
- Markov chain Monte Carlo
- NewHybrids, Geneland, IM, BAPS, BayesAss, Structure
34Data conversion software
- SSR data
- CONVERT (Glaubitz, J.C. 2004, Molecular Ecology Notes 4: 309-310)
- GDA
- GENEPOP
- ARLEQUIN
- POPGENE
- MICROSAT
- PHYLIP
- STRUCTURE
- Table of allele frequencies
35Data conversion software
- 2. Formatomatic
- Genepop
- Arlequin
- IMMANC
- MSA
- Msvar
- Arlequin
- Migrate
- IM
36Data conversion software
- Arlequin
- GenePop ver. 3.0
- Biosys ver. 1.0
- Phylip ver. 3.5
- Mega ver. 1.0
- Win Amova ver. 1.55
37Flow chart of possible data exchange between
different population genetics programs
39Data conversion software
- AFLP data
- AFLPdat (Ehrich, D. 2006, Mol. Ecol. Notes, 6, 603-604)
- ARLEQUIN
- STRUCTURE (version 2.1)
- TREECON
- PAUP
- HICKORY
40Software introduced
- MSA (Microsatellite analysis)
- Genepop
- Structure
42STRUCTURE: inference of population structure
using multilocus genotype data
43Applications
- Demonstrate the presence of population structure
- Assign individuals to populations
- Identifying migrants and admixed individuals
- Markers: microsatellites (SSR), SNPs, RFLP,
sequence and AFLP
44models
- Main modeling assumptions
- Hardy-Weinberg equilibrium within populations
- Complete linkage equilibrium between loci within
populations
45Inference for the number of populations (K)
Probability for K = 2
Simplifies to
46Easy to Infer K
K = 3
47Difficult to determine K
Fig. 3-B,F Log probability of data L(K) as a
function of K, from Evanno, G., Regnaut, S., and
Goudet, J. (2005)
48ΔK to determine K
- Based on the second-order rate of change of the likelihood function (ΔK)
- $L'(K) = L(K) - L(K-1)$
- $|L''(K)| = |L'(K+1) - L'(K)|$
- $\Delta K = m(|L''(K)|) / s[L(K)]$
- $\Delta K = m(|L(K+1) - 2L(K) + L(K-1)|) / s[L(K)]$
- m is the mean over replicate runs and s[L(K)] is the standard deviation of L(K)
- Evanno, G., Regnaut, S., and Goudet, J. (2005).
Detecting the number of clusters of individuals
using the software STRUCTURE: a simulation study.
Mol. Ecol., 14: 2611-2620.
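The ΔK statistic of Evanno et al. (2005) can be computed from the log probabilities reported by replicate STRUCTURE runs. The sketch below is a simplified illustration: it assumes the mean L(K) over runs is taken first and the absolute second difference is computed from those means, then divided by the standard deviation of L(K). The function name, the input layout (a dict from K to a list of L(K) values) and the numbers are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    log_likelihoods: dict mapping K -> list of L(K) values from replicate runs.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    delta = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue   # delta K is undefined at the smallest and largest K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / stdev(log_likelihoods[k])
    return delta


# Hypothetical L(K) values from three replicate runs per K.
runs = {
    1: [-1001, -1000, -999],
    2: [-951, -950, -949],
    3: [-701, -700, -699],
    4: [-691, -690, -689],
    5: [-686, -685, -684],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbours of K, it cannot be evaluated at K = 1, so this approach cannot by itself show that a data set has no structure.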
50Similarity coefficients to determine K
- Similarity coefficients between runs of the same
K value
- The coefficient $C(Q_1, Q_2) = 1 - \left(\min_P \lVert Q_1 - P(Q_2) \rVert_F\right) / \lVert Q_1 - 1/K \rVert_F$
- Rosenberg et al., 2002. Genetic Structure of
Human Populations. Science 298: 2381-2385