Coalescent - PowerPoint PPT Presentation

Provided by: hao9
Transcript and Presenter's Notes

1
Coalescent
  • Introduction

2
Classical Population Genetics
  • How do evolutionary forces affect future changes
    in allele and genotype frequencies?
  • Mutation
  • Natural selection
  • Random genetic drift
  • Migration and population growth

3
Coalescent Theory
  • What past evolutionary forces have acted to give
    us the scope and distribution of genetic
    variation we see today?
  • The major difference:
  • Classical population genetics focuses on
    properties of the entire population,
  • while coalescent theory focuses on those of a
    sample from the population.

4
[Figure: divergence and coalescent]
5
Terminology
  • Gene genealogy, or simply genealogy
  • The phylogeny of a sample from a population.
  • The Most Recent Common Ancestor (MRCA)
  • The root of the genealogy
  • Coalescent event
  • When two alleles coalesce, it is called a
    coalescent event.
  • The Coalescent time
  • The number of generations between successive
    coalescent events
  • The Infinite Allele Model
  • Each mutation creates an allele that has never
    been present in the population.

6
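The Wright-Fisher model
The genes of the next generation are a random sample, drawn with
replacement, from the gene pool of the current generation. A given
gene in the current generation is the parent of a given gene in the
next generation with probability $P = 1/(2N)$.
[Figure: current generation and next generation, $P = 1/(2N)$]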
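The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the allele labels "A" and "a", and the values 2N = 20 and 30 generations are all hypothetical choices.

```python
import random

def next_generation(parents, rng):
    """Draw the next generation by sampling gene copies with replacement."""
    # Each offspring copy picks its parent uniformly at random,
    # i.e. each parent is chosen with probability 1/len(parents) = 1/(2N).
    return [rng.choice(parents) for _ in parents]

def simulate_allele_count(two_n, initial_count, generations, seed=0):
    """Track the number of 'A' copies among two_n gene copies over time."""
    rng = random.Random(seed)
    pool = ["A"] * initial_count + ["a"] * (two_n - initial_count)
    counts = [initial_count]
    for _ in range(generations):
        pool = next_generation(pool, rng)
        counts.append(pool.count("A"))
    return counts

# Hypothetical example: 2N = 20 gene copies, half of them 'A'.
counts = simulate_allele_count(two_n=20, initial_count=10, generations=30)
print(counts)
```

The count drifts from generation to generation purely through sampling, which is the random genetic drift listed among the evolutionary forces.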
7
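The coalescence of two sequences
Generation t-1 (2N sequences)
$P = 1/(2N)$: the two sequences share a parent (they coalesce)
$P = 1 - 1/(2N)$: the two sequences have different parents
Generation t (2N sequences)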
8
The probability that the two sequences came from a
single ancestral sequence $t_2 = t + 1$ generations ago is
$$(1-p_2)(1-p_2)\cdots(1-p_2)\,p_2 = (1-p_2)^t\,p_2 = \frac{1}{2N}\left(1-\frac{1}{2N}\right)^t,$$
where $p_2 = 1/(2N)$.
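As a quick numerical check, the probability above can be read as a geometric distribution, $(1-p)^t p$ with $p = 1/(2N)$. The sketch below sums it over a long range of $t$ to confirm that it behaves like a proper distribution whose mean waiting time is close to $2N$ generations. The function name and the value 2N = 100 are illustrative assumptions.

```python
def coalescence_pmf(t, two_n):
    """P(two sequences first share an ancestor exactly t + 1 generations ago)."""
    p = 1.0 / two_n              # chance of sharing a parent in one generation
    return (1.0 - p) ** t * p    # t generations apart, then one coalescence

two_n = 100  # hypothetical population with 2N = 100 gene copies
probs = [coalescence_pmf(t, two_n) for t in range(20000)]

total = sum(probs)                                             # should be close to 1
mean_generations = sum((t + 1) * q for t, q in enumerate(probs))  # close to 2N
print(total, mean_generations)
```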
9
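Continuous approximation
Advantage: it makes the mathematics simpler and simulation faster.
Note that $e^{-x} \approx 1 - x$ when $x$ is small. Therefore the
distribution of $t_2$ can be approximated by an
exponential distribution:
$$f(t_2) = \frac{1}{2N}\, e^{-t_2/(2N)}$$
Properties:
$$E(t_2) = 2N, \qquad \mathrm{Var}(t_2) = (2N)^2$$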
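To see how good the continuous approximation is, the sketch below compares the discrete (geometric) probability of coalescing exactly $t_2$ generations ago with an exponential density of mean $2N$. The function names and the value 2N = 2000 are assumptions made for illustration.

```python
import math

def geometric_pmf(t2, two_n):
    """Discrete model: no coalescence for t2 - 1 generations, then one."""
    p = 1.0 / two_n
    return (1.0 - p) ** (t2 - 1) * p

def exponential_density(t2, two_n):
    """Continuous approximation with mean 2N generations."""
    return math.exp(-t2 / two_n) / two_n

two_n = 2000  # hypothetical population with 2N = 2000 gene copies

# Largest relative discrepancy over the first 2N generations.
max_rel_error = max(
    abs(geometric_pmf(t, two_n) / exponential_density(t, two_n) - 1.0)
    for t in range(1, two_n + 1)
)
print(max_rel_error)
```

For a population of this size the two curves agree to well within a tenth of a percent over this range, which is why the exponential form is usually preferred for both derivations and simulation.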
10
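Generalization: the i-th coalescent time
With $i$ lineages, a coalescent event occurs in a given generation
with probability approximately $\binom{i}{2}\frac{1}{2N} = \frac{i(i-1)}{4N}$.
The distribution of $t_i$:
$$f(t_i) = \frac{i(i-1)}{4N}\, e^{-\frac{i(i-1)}{4N}\, t_i}$$
Properties:
$$E(t_i) = \frac{4N}{i(i-1)}, \qquad \mathrm{Var}(t_i) = \left(\frac{4N}{i(i-1)}\right)^2$$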
11
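The expected total length of a sample genealogy
While $i$ lineages remain, each of them extends for $t_i$
generations, so the total length is
$$T = \sum_{i=2}^{n} i\, t_i .$$
Using $E(t_i) = 4N/(i(i-1))$, its expectation is
$$E(T) = \sum_{i=2}^{n} i\, E(t_i) = \sum_{i=2}^{n} \frac{4N}{i-1} = 4N a_n,$$
where
$$a_n = \sum_{i=1}^{n-1} \frac{1}{i}.$$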
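The expected total length can be checked with exact arithmetic. The sketch below assumes the standard neutral Wright-Fisher result $E(t_i) = 4N/(i(i-1))$ and verifies that summing $i \cdot E(t_i)$ over $i = 2, \dots, n$ equals $4N \sum_{i=1}^{n-1} 1/i$. The function names and the values n = 10, N = 1000 are hypothetical.

```python
from fractions import Fraction

def expected_coalescent_time(i, n_pop):
    """Assumed E(t_i) = 4N / (i (i - 1)) generations while i lineages remain."""
    return Fraction(4 * n_pop, i * (i - 1))

def expected_total_length(n, n_pop):
    """Each of the i lineages spans t_i, so sum i * E(t_i) for i = 2..n."""
    return sum(i * expected_coalescent_time(i, n_pop) for i in range(2, n + 1))

def harmonic_a(n):
    """a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    return sum(Fraction(1, i) for i in range(1, n))

n, n_pop = 10, 1000  # hypothetical sample size and population size
# The two expressions agree exactly, term by term.
assert expected_total_length(n, n_pop) == 4 * n_pop * harmonic_a(n)
print(float(expected_total_length(n, n_pop)))
```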
12
Mutations in a genealogy
Let $\mu$ be the mutation rate per sequence per generation. The
probability that the number of mutations on a branch of length $l$
is $k$ is
$$P(k \mid l) = \frac{(\mu l)^k}{k!}\, e^{-\mu l}$$
13
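The number of segregating sites
Let $K$ be the total number of mutations that occur in a
genealogy. Then, given that the total tree length
is $T$, $K$ is a Poisson variable with mean equal to
$T\mu$. That is,
$$P(K = k \mid T) = \frac{(T\mu)^k}{k!}\, e^{-T\mu}$$
Considering all possible values of $T$, we have
$$E(K) = \mu\, E(T) = 4N\mu \sum_{i=1}^{n-1} \frac{1}{i}$$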
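The two-stage description above (first a random tree length $T$, then a Poisson number of mutations with mean $T\mu$) translates directly into a simulation. The sketch below is a rough illustration under stated assumptions: coalescent times are drawn as exponentials with rate $i(i-1)/(4N)$, the Poisson draw is implemented by counting unit-rate arrivals, and the parameter values and function names are hypothetical.

```python
import random

def poisson_sample(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals before time `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_segregating_sites(n, n_pop, mu, rng):
    """One draw of K: sample the tree length T, then mutations given T."""
    total_length = 0.0
    for i in range(n, 1, -1):
        rate = i * (i - 1) / (4.0 * n_pop)          # assumed coalescence rate
        total_length += i * rng.expovariate(rate)   # i lineages for time t_i
    return poisson_sample(mu * total_length, rng)

rng = random.Random(42)
n, n_pop, mu = 10, 1000, 0.001  # hypothetical values, so 4*N*mu = 4
draws = [simulate_segregating_sites(n, n_pop, mu, rng) for _ in range(20000)]

simulated_mean = sum(draws) / len(draws)
expected_mean = 4 * n_pop * mu * sum(1.0 / i for i in range(1, n))
print(simulated_mean, expected_mean)
```

With enough replicates the simulated mean should settle near $\mu E(T)$, although individual draws of $K$ vary widely because both the tree and the mutations are random.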
14
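The age of the most recent common ancestor
Since the age of the MRCA is the sum of the coalescent times,
$$T_{\mathrm{MRCA}} = t_2 + t_3 + \cdots + t_n,$$
we have
$$E(T_{\mathrm{MRCA}}) = \sum_{i=2}^{n} \frac{4N}{i(i-1)} = 4N\left(1 - \frac{1}{n}\right).$$
Note that when the sample size ($n$) is large, $E(T_{\mathrm{MRCA}})$ is
approximately equal to $4N$.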
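The approach to $4N$ can be seen numerically. The sketch below assumes $E(t_i) = 4N/(i(i-1))$ for each coalescent time and sums these expectations exactly; the function name is hypothetical, and $N$ is set to 1 so the result is expressed in units of $N$ generations.

```python
from fractions import Fraction

def expected_mrca_age(n, n_pop):
    """Sum of the assumed E(t_i) = 4N / (i (i - 1)) for i = 2..n."""
    return sum(Fraction(4 * n_pop, i * (i - 1)) for i in range(2, n + 1))

# Expected age of the MRCA in units of N generations (n_pop = 1).
for n in (2, 10, 100):
    print(n, float(expected_mrca_age(n, 1)))
```

Even a sample of two sequences already accounts for half of the limiting value, so enlarging the sample adds comparatively little depth to the genealogy.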
15
Major population genetics models that have been
studied
  • The neutral Wright-Fisher model
  • Recombination
  • Selection
  • Population subdivision and migration
  • Population growth
  • Partial selfing

16
Traditional summary statistics
  • Heterozygosity
  • The probability that two randomly selected
    alleles are different.
  • Number of alleles
  • The number of distinct alleles in a sample.
  • Proportion of polymorphic loci
  • The proportion of loci that are polymorphic.
    Traditionally, a locus is said to be polymorphic
    if the frequency of its most common allele is
    less than, say, 90% in the sample.
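For a single locus, the statistics above can be sketched as follows. This is an illustration under assumptions: heterozygosity is computed over pairs of distinct gene copies in the sample, the threshold in the slide is read as 90%, and the function names and the sample of six alleles are hypothetical.

```python
from collections import Counter

def heterozygosity(alleles):
    """Probability that two distinct sampled copies carry different alleles."""
    n = len(alleles)
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(alleles).values())
    total_pairs = n * (n - 1) // 2
    return 1.0 - same_pairs / total_pairs

def number_of_alleles(alleles):
    """Number of distinct alleles in the sample."""
    return len(set(alleles))

def is_polymorphic(alleles, threshold=0.90):
    """True if the most common allele is rarer than the threshold."""
    most_common_count = Counter(alleles).most_common(1)[0][1]
    return most_common_count / len(alleles) < threshold

sample = ["A", "A", "A", "B", "B", "C"]  # hypothetical sample at one locus
print(heterozygosity(sample), number_of_alleles(sample), is_polymorphic(sample))
```

The proportion of polymorphic loci is then the fraction of loci for which `is_polymorphic` returns true.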

17
  • Measures more suitable for DNA polymorphism
  • The number of segregating sites K
  • The number of nucleotide sites that are variable.
  • The mean number of nucleotide differences $\pi$
  • The average number of nucleotide differences
    between a pair of sequences is intuitive and has
    interesting statistical properties
  • Frequency spectrum
  • Mutations in a genealogy can be partitioned into
    different categories.

18
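The mean nucleotide difference between two
sequences
In a sample of $n$ sequences,
$$\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} d_{ij},$$
where $d_{ij}$ is the number of differences between
sequences $i$ and $j$.
Seq1 AAGCTTTCC
Seq2 AAGCATTCC
Seq3 AACCATTCC
$d_{12} = 1$, $d_{13} = 2$, $d_{23} = 1$, so $\pi = (1 + 2 + 1)/3 = 4/3$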
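The three-sequence example can be reproduced with a short script. The sketch assumes aligned sequences of equal length and averages the pairwise differences over all pairs; the function names are hypothetical.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

def mean_pairwise_difference(seqs):
    """Average of d_ij over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

seqs = ["AAGCTTTCC", "AAGCATTCC", "AACCATTCC"]  # Seq1, Seq2, Seq3 from the slide
d12 = pairwise_differences(seqs[0], seqs[1])
d13 = pairwise_differences(seqs[0], seqs[2])
d23 = pairwise_differences(seqs[1], seqs[2])
print(d12, d13, d23, mean_pairwise_difference(seqs))
```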
19
The mean value of $d_{ij}$ is the same as the expected
number of segregating sites in a sample of two
sequences. So
$$E(\pi) = E(d_{ij}) = 4N\mu$$
20
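Estimating $\theta$
The quantity $\theta = 4N\mu$ is the most important
parameter for the evolution of a DNA region in a
population. Watterson's estimator is defined as
$$\hat{\theta}_W = \frac{K}{a_n}, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i},$$
where $K$ is the number of segregating sites in a sample of $n$ sequences.
Tajima's estimator is defined as
$$\hat{\theta}_T = \pi,$$
the mean number of nucleotide differences between two sequences.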
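Both estimators can be computed from an alignment in a few lines. The sketch below uses the standard definitions, Watterson's estimator as $K / a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$ and Tajima's estimator as the mean pairwise difference, and applies them to the three example sequences from the earlier slide. The function names are hypothetical.

```python
from itertools import combinations

def segregating_sites(seqs):
    """K: number of aligned columns containing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def watterson_estimator(seqs):
    """K divided by a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n

def tajima_estimator(seqs):
    """Mean number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)

seqs = ["AAGCTTTCC", "AAGCATTCC", "AACCATTCC"]  # example sequences from the deck
print(segregating_sites(seqs), watterson_estimator(seqs), tajima_estimator(seqs))
```

On this small example the two estimates coincide; in real data they generally differ, because they weight mutations in different parts of the genealogy differently.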