Combinatorics - PowerPoint PPT Presentation

Slides: 34
Provided by: Jotun
Transcript and Presenter's Notes

1
Combinatorics & the Coalescent (26.2.02)
Tree Counting & Tree Properties. Basic
Combinatorics. Allele distribution. Polya
Urns & Stirling Numbers. Number of ancestral
lineages after time t. Inclusion-Exclusion
Principle.
2
A set of realisations (from Felsenstein)
3
Binomial Numbers
[Figure: n items labelled 1, 2, 3, 4, 5, ..., n]
Binomial Expansion
Special Cases
4


[Figure labels: n-1, n-r-1, r, n-r]
Binomial coefficients for n = 0..7 and k = 0..7, with row sums:
n   k: 0    1    2    3    4    5    6    7    Sum
0      1                                       1
1      1    1                                  2
2      1    2    1                             4
3      1    3    3    1                        8
4      1    4    6    4    1                   16
5      1    5   10   10    5    1              32
6      1    6   15   20   15    6    1         64
7      1    7   21   35   35   21    7    1    128
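The table above can be regenerated with a short sketch. It assumes the standard recursion C(n,k) = C(n-1,k-1) + C(n-1,k), which the slide does not state explicitly; the function name `pascal_rows` is illustrative only.

```python
def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle.

    Assumption (not stated on the slide): each entry is built from the
    standard recursion C(n, k) = C(n-1, k-1) + C(n-1, k).
    """
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1]
        rows.append(row)
    return rows


for n, row in enumerate(pascal_rows(7)):
    # Each row sum should equal 2**n, the last column of the table.
    print(n, row, sum(row))
```

The row sums are the special case a = b = 1 of the binomial expansion.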
5
The Exponential Distribution
$X \sim \mathrm{Exp}(a)$. Density: $f(t) = a e^{-at}$, $P(X > t) = e^{-at}$.
Properties, for $X \sim \mathrm{Exp}(a)$ and $Y \sim \mathrm{Exp}(b)$ independent:
i. $P(X > t_2 \mid X > t_1) = P(X > t_2 - t_1)$ $(t_2 > t_1)$.
ii. $E(X) = 1/a$.
iii. $P(X < Y) = a/(a + b)$.
iv. $\min(X, Y) \sim \mathrm{Exp}(a + b)$.
v. The sum of k iid $X_i$ is $\Gamma(k, a)$ distributed.
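Properties iii and iv can be checked numerically. The following is a minimal simulation sketch, not a proof; the rates a = 2 and b = 3, the seed and the number of replicates are arbitrary choices made for illustration.

```python
import random


def simulate(a, b, reps=100_000, seed=1):
    """Estimate P(X < Y) and E[min(X, Y)] for independent X ~ Exp(a), Y ~ Exp(b)."""
    rng = random.Random(seed)
    x_smaller = 0
    min_total = 0.0
    for _ in range(reps):
        x = rng.expovariate(a)  # rate parameter a, mean 1/a
        y = rng.expovariate(b)
        x_smaller += x < y
        min_total += min(x, y)
    return x_smaller / reps, min_total / reps


p_hat, mean_min = simulate(2.0, 3.0)
print("P(X < Y) estimate:", p_hat, "theory:", 2.0 / (2.0 + 3.0))
print("E[min] estimate:  ", mean_min, "theory:", 1.0 / (2.0 + 3.0))
```

The mean of the minimum is compared with 1/(a + b), which is what property iv combined with property ii predicts.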
6
The Standard Coalescent
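Two independent processes. Continuous:
exponential waiting times. Discrete:
choosing pairs to coalesce.
[Figure: example with 5 lineages, alternating waiting and coalescing steps, read back in time]
{1}{2}{3}{4}{5}
4--5 coalesce: {1}{2}{3}{4,5}
1--2 coalesce: {1,2}{3}{4,5}
3--(4,5) coalesce: {1,2}{3,4,5}
(1,2)--(3,(4,5)) coalesce: {1,2,3,4,5}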
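The two processes can be sketched as a small simulation. It assumes time is scaled so that each pair of lineages coalesces at rate 1, giving a waiting time with rate k(k-1)/2 while k lineages remain; the slide does not state the scaling. The function name `simulate_coalescent` is illustrative.

```python
import random


def simulate_coalescent(n, seed=None):
    """Simulate one realisation of the standard coalescent for n leaves.

    Assumed time scale: each pair coalesces at rate 1, so with k lineages
    the waiting time is exponential with rate k*(k-1)/2.
    Returns a list of (time, merged_block) events, oldest last.
    """
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    events = []
    while len(blocks) > 1:
        k = len(blocks)
        # Continuous process: exponential waiting time.
        t += rng.expovariate(k * (k - 1) / 2)
        # Discrete process: a uniformly chosen pair coalesces.
        i, j = rng.sample(range(k), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        events.append((t, merged))
    return events


for time, block in simulate_coalescent(5, seed=0):
    print(round(time, 3), sorted(block))
```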
7
Tree Counting
Tree: connected undirected graph without cycles.
k nodes (vertices), k-1 edges. Nodes with one
edge are leaves (tips); the rest are internal.
Labels of internal nodes are permutable without
change of biological interpretation. If labels at
leaves are ignored, we have the shape of a tree.
Ignoring the root and branch lengths gives the unrooted tree
topology.
If the age ordering of internal nodes is retained,
this gives the coalescent topology.
Most biological trees are bifurcating: valency 3
(number of edges touching each internal node) if made
unrooted. Such unrooted trees with n leaves have n-2 internal
nodes and 2n-3 edges.
8
Counting by Bijection
Bijection to a decision series
$N = k_1 k_2 \cdots k_L$
[Figure labels: 1, 3, 2, N]
9
Trees: rooted, bifurcating, nodes time-ranked.
Recursion: $T_k = \binom{k}{2} T_{k-1}$
Initialisation: $T_1 = T_2 = 1$
k     3   4    5     6      7          8          9          10         15          20
T_k   3   18   180   2700   5.7×10^4   1.5×10^6   5.7×10^7   2.5×10^9   6.9×10^18   5.6×10^29
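The counts of time-ranked rooted trees can be reproduced exactly with a short sketch. It assumes the recursion multiplies by the number of pairs, C(k,2), at each step, which is the factor that reproduces the tabulated values 3, 18, 180, 2700; the function name is illustrative.

```python
from math import comb


def ranked_rooted_trees(k):
    """Number of rooted, bifurcating, time-ranked trees on k labelled leaves.

    Assumed recursion: T_k = C(k, 2) * T_{k-1}, with T_1 = T_2 = 1
    (choose which pair coalesces first, then recurse on k-1 lineages).
    """
    count = 1
    for m in range(3, k + 1):
        count *= comb(m, 2)
    return count


for k in (3, 4, 5, 6, 7, 8, 9, 10):
    print(k, ranked_rooted_trees(k))
```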
10
Trees: unrooted, valency 3.
Recursion: $T_n = (2n-5)\, T_{n-1}$
Initialisation: $T_1 = T_2 = T_3 = 1$
n     4   5    6     7     8       9          10         15          20
T_n   3   15   105   945   10395   1.4×10^5   2.0×10^6   7.9×10^12   2.2×10^20
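The recursion for unrooted trees is easy to evaluate exactly. This sketch simply iterates $T_n = (2n-5) T_{n-1}$ from $T_3 = 1$; the function name is illustrative.

```python
def unrooted_trees(n):
    """Number of unrooted bifurcating (valency 3) trees on n labelled leaves.

    Uses the recursion T_n = (2n - 5) * T_{n-1} with T_1 = T_2 = T_3 = 1:
    the n-th leaf can be attached to any of the 2(n-1) - 3 = 2n - 5 edges
    of a tree on n-1 leaves.
    """
    count = 1
    for m in range(4, n + 1):
        count *= 2 * m - 5
    return count


for n in (4, 5, 6, 7, 8, 9, 10):
    print(n, unrooted_trees(n))
```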
11
Coalescent versus unrooted tree topologies
4 leaves: 3 unrooted trees, 18 coalescent
topologies.
1 unrooted tree topology contains 6 coalescent
topologies.
[Figure: tree diagrams on leaves 1, 2, 3, 4]
12
Inner & outer branches: Fu & Li (1993)
External (e) versus Internal (i) Branches.
$E(e) = 2$   $E(i) =$ [expression missing in the transcript]
Red: external. Others: internal. Except for the
green branch, internal/external corresponds to
singlet/non-singlet segregating sites if only one
mutation can happen per position.
[Figure: aligned sequences with singlet (s) and non-singlet (n) sites marked]
ACTTGTACGA
ACTTGTACGA
ACTTGTACGA
TCTTATACGA
ACTTATACGA
Let $l_{n,i}$ be the length of the i-th external branch in an
n-tree. Obviously $E(e) = n E(l_{n,i})$ (any i).
$l_{n,i} = l_{n-1,j} + t_n$ with Pr $1 - 2/n$
$l_{n,i} = t_n$ with Pr $2/n$
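Taking expectations in the recursion for the external branch length gives a quick check of $E(e) = 2$. The sketch assumes time is scaled so that $t_n$, the waiting time while n lineages remain, has mean $1/\binom{n}{2}$; the slide does not state this scaling.

```python
from fractions import Fraction


def expected_external_branch(n):
    """E(l_{n,i}) from the recursion
        l_{n,i} = t_n + l_{n-1,j}  with probability 1 - 2/n
        l_{n,i} = t_n              with probability 2/n.

    Assumption: E(t_m) = 2 / (m (m - 1)), i.e. pairs coalesce at rate 1.
    """
    expected = Fraction(1)  # n = 2: the branch length is t_2, with mean 1
    for m in range(3, n + 1):
        e_t = Fraction(2, m * (m - 1))
        expected = e_t + (1 - Fraction(2, m)) * expected
    return expected


for n in (2, 5, 10, 50):
    # n * E(l_{n,i}) is the expected total external branch length E(e).
    print(n, expected_external_branch(n), n * expected_external_branch(n))
```

Under this assumption the recursion solves to $E(l_{n,i}) = 2/n$, so the total external length has expectation 2 for every n.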
13
Probability of hanging Sub-trees. Kingman (1982b)
For a coalescent with n leaves at time 0, with k
ancestors at time $t_1$, let ? be the groups of
leaves of the k subtrees hanging from time $t_1$.
Let $l_1, l_2, \ldots, l_k$ be the number of leaves of
these sub-trees.
Example: n = 8, k = 3. Classes observed: 4, 3, 1.
The basal division splits the leaves into (k, n-k)
sets with probability 1/(n-1).
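The uniform basal split can be checked exactly with a small dynamic program. The sketch reads the coalescent forward in time from the root and assumes that, at each step, the lineage that splits is chosen uniformly among the current lineages (a Polya-urn scheme on the two sides of the root). That forward description is an assumption of this sketch, not something stated on the slide.

```python
from fractions import Fraction


def basal_split_distribution(n):
    """Distribution of the number of leaves on the 'left' side of the root.

    Assumed forward model: start with 2 lineages (one per side); at each
    step one of the current lineages, chosen uniformly, splits in two.
    Returns a dict {left_size: probability} for a tree with n leaves.
    """
    dist = {1: Fraction(1)}
    for total in range(2, n):  # total = current number of lineages
        new = {}
        for left, p in dist.items():
            # A left lineage splits with probability left / total.
            new[left + 1] = new.get(left + 1, 0) + p * Fraction(left, total)
            new[left] = new.get(left, 0) + p * Fraction(total - left, total)
        dist = new
    return dist


print(basal_split_distribution(8))
```

For n = 8 every split (k, 8-k), k = 1..7, comes out with probability 1/7.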
14
Nested subsamples (Saunders et al.(1986)
Adv.Appl.Prob.16.471-91.)
[Figure: transitions between t = 0 and t = t1 in a population of size 2N. With i lineages in the sample and j in the sub-sample, state (i, j) moves to (i-1, j) or (i-1, j-1). The states (i, j) with 1 <= j <= i are drawn from (1,1) to (9,9). Nesting: sub-sample within sample within population.]
15
Nested subsamples (Saunders et al.(1986)
Adv.Appl.Prob.16.471-91.)
Pr{MRCA(sub-sample) = MRCA(sample)}
Pr{MRCA(sub-sample) = MRCA(population)}
16
Age of a Mutation: Wiuf & Donnelly (1999), Wiuf
(2000), Matthews (2000)
[Figure: timeline with segments labelled Exp(?) and Exp(1)]
17
Polya Urns & Infinite Allele Model (Donnelly, 1986;
Hoppe, 1984, 1987)
The only observation made in the infinite allele
model is identity/non-identity among all pairs
of alleles, i.e. the central observation is a
series of classes and their sizes.
Expected number of mutations in unit interval
(2N) is θ.
This model gives rise to distributions on
partitions of {1,2,..,n} like {1,4,7}{2,3}{5}{6}.
Since the labelling is arbitrary, only the
information about the sizes of these groups is
essential, for instance represented as $1^2 2^1 3^1$.
What is the next event: a duplication of an
existing type or the introduction of a new allele?

18
Classical Polya Urns: Feller I.
Let $X_0$ be the initial configuration of the
urn. A step: take a random ball from the urn
and put it back together with an extra ball of the
same colour. Let $X_k$ be the content after the k-th
step. Let $Y_k$ be the colour of the k-th picked
ball.
i. $P\{Y_k = j\} = P\{Y_1 = j\}$.
ii. Sequences $Y_1, \ldots, Y_k$ resulting in the same $X_k$ have the same
probability.
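Property i can be verified exactly for a small urn by conditioning on the first draw. The sketch below uses exact rational arithmetic; the initial urn (2 red, 1 blue) is an arbitrary example and not taken from the slides.

```python
from fractions import Fraction


def colour_probability(urn, k, colour):
    """Exact P(Y_k = colour) for a classical Polya urn.

    `urn` maps colour -> number of balls. At each step the drawn ball is
    returned together with one extra ball of the same colour.
    """
    total = sum(urn.values())
    if k == 1:
        return Fraction(urn[colour], total)
    prob = Fraction(0)
    for c, count in urn.items():
        after_draw = dict(urn)
        after_draw[c] += 1  # the drawn ball plus one extra of the same colour
        prob += Fraction(count, total) * colour_probability(after_draw, k - 1, colour)
    return prob


start = {"red": 2, "blue": 1}  # arbitrary example urn
for k in range(1, 6):
    print(k, colour_probability(start, k, "red"))
```

Every line should show the same value as the first draw, illustrating $P\{Y_k = j\} = P\{Y_1 = j\}$.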
19
Labelling, Polya Urns & Age of Alleles (Donnelly, 1986;
Hoppe, 1984, 1987)
Labelling schemes: as they come, by size, by age.
A ball is picked proportionally to its weight.
Ordinary balls have weight 1. If the initial
θ-size ball is picked, it is replaced together
with a ball of a completely new type. If an ordinary ball
is picked, it is replaced together with a copy of
itself.
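The urn scheme just described can be sketched as a short simulation. The function name `hoppe_urn`, the sample size and the value θ = 0.5 are illustrative choices; alleles are labelled in the order in which they appear ("as they come").

```python
import random


def hoppe_urn(n, theta, seed=None):
    """Simulate n draws from the urn with one special ball of weight theta.

    Returns counts[j] = number of copies of the j-th allele type, with
    types labelled in order of appearance. Requires theta > 0.
    """
    rng = random.Random(seed)
    counts = []
    for drawn in range(n):
        # Total weight: theta for the special ball, 1 per ordinary ball.
        if rng.random() < theta / (theta + drawn):
            counts.append(1)  # special ball picked: a completely new type
        else:
            # Ordinary ball picked: each type is chosen proportionally to
            # its current number of copies, then gains one copy.
            j = rng.choices(range(len(counts)), weights=counts)[0]
            counts[j] += 1
    return counts


print(hoppe_urn(20, 0.5, seed=3))
```

The first draw always creates a new type, because only the special ball is in the urn at that point.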
[Figure: an urn, with balls labelled θ, 1, 2 and θ, 1, 1]
There is a simple relationship between the two
labellings: the distribution of the alleles labelled by age
ranking is the same as that of the alleles labelled by
size ranking.
20
Ewens' formula. (1972 TPB 3.87-112)
$P_5(2,0,1,0,0)$ is the probability of seeing 2
singles and one allele in 3 copies in a sample of
5. Obviously, $a_1 + 2a_2 + \cdots + i a_i + \cdots + n a_n = n$.
$P_n(a_1, a_2, \ldots, a_n)$
$E_n(k \text{ types})$
$P_n(a_1, a_2, \ldots, a_n \mid k)$
k is a minimal sufficient statistic for θ:
the probability of the data conditioned on k is
θ-free, and there is no simpler such statistic.
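The formula itself did not survive in this transcript. The sketch below uses the standard statement of Ewens' sampling formula, $P_n(a_1,\ldots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_j \frac{\theta^{a_j}}{j^{a_j} a_j!}$, on the assumption that this is what the slide displayed. The value θ = 1/2 is an illustrative choice, and the helper `configurations` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def ewens(a, theta):
    """Ewens sampling probability of configuration a, where a[j-1] is the
    number of alleles observed exactly j times (standard form, assumed)."""
    n = sum(j * aj for j, aj in enumerate(a, start=1))
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    p = Fraction(factorial(n)) / rising
    for j, aj in enumerate(a, start=1):
        p *= Fraction(theta) ** aj / (j ** aj * factorial(aj))
    return p


def configurations(n):
    """All (a_1, ..., a_n) with a_1 + 2 a_2 + ... + n a_n = n."""
    def rec(remaining, j):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for aj in range(remaining // j + 1):
            for rest in rec(remaining - j * aj, j + 1):
                yield (aj,) + rest
    return rec(n, 1)


theta = Fraction(1, 2)
print(ewens((2, 0, 1, 0, 0), theta))
print(sum(ewens(a, theta) for a in configurations(5)))
```

Summing over all configurations of a sample of 5 is a useful sanity check that the probabilities add up to one.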
21
Stirling Numbers
Partitioning into k sets: Stirling Numbers (of the
second kind), $S_{n,k}$: k unlabelled bins, all non-empty.
Bell Numbers, $B_n$: partitioning into any number of
sets.
n   k: 1    2     3     4     5    6    7    B_n
1      1                                     1
2      1    1                                2
3      1    3     1                          5
4      1    7     6     1                    15
5      1    15    25    10    1              52
6      1    31    90    65    15   1         203
7      1    63    301   350   140  21   1    877
Obviously $B_n = \sum_{k=1}^{n} S_{n,k}$.
22
Stirling Numbers
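[Figure: item n is either added to one of the k classes of a partition of n-1 items into k classes, or forms a new class next to a partition of n-1 items into k-1 classes; both give a partition of n items into k classes.]
Basic Recursion: $S_{n,k} = k S_{n-1,k} + S_{n-1,k-1}$
Initialisation: $S_{n,1} = S_{n,n} = 1$.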
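The recursion translates directly into code. In this sketch the Bell numbers are obtained by summing a row of Stirling numbers; the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via
    S(n, k) = k * S(n-1, k) + S(n-1, k-1), with S(n, 1) = S(n, n) = 1."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Bell number: partitions of n items into any number of sets."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


for n in range(1, 8):
    print(n, [stirling2(n, k) for k in range(1, n + 1)], bell(n))
```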
23
Ewens' formula - example. (1972 TPB 3.87-112)
Assume
has been observed and that 0.5 mutation is
expected per unit (2N) time.
24
Ancestors to Ancestors: Griffiths (1980),
Tavaré (1984)
$h_{i,j}(t)$: probability that i individuals have j
ancestors after time t.
$i_{[k]} = i(i-1)\cdots(i-k+1)$, $i_{(k)} = i(i+1)\cdots(i+k-1)$
Example: Disappearance of 7 lineages.
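The expression for $h_{i,j}(t)$ is missing from this transcript. The sketch below uses the form usually attributed to Tavaré (1984), $h_{i,j}(t) = \sum_{k=j}^{i} e^{-k(k-1)t/2} \frac{(2k-1)(-1)^{k-j}\, j_{(k-1)}\, i_{[k]}}{j!\,(k-j)!\, i_{(k)}}$, assuming this is what the slide showed and that time is scaled so each pair coalesces at rate 1. Function names are illustrative.

```python
from math import exp, factorial


def falling(x, k):
    """x (x-1) ... (x-k+1), written i_[k] above."""
    out = 1
    for m in range(k):
        out *= x - m
    return out


def rising(x, k):
    """x (x+1) ... (x+k-1), written i_(k) above."""
    out = 1
    for m in range(k):
        out *= x + m
    return out


def h(i, j, t):
    """Probability that i lineages have exactly j ancestors a time t ago
    (assumed standard form; pairs coalesce at rate 1)."""
    total = 0.0
    for k in range(j, i + 1):
        total += (exp(-k * (k - 1) * t / 2) * (2 * k - 1) * (-1) ** (k - j)
                  * rising(j, k - 1) * falling(i, k)
                  / (factorial(j) * factorial(k - j) * rising(i, k)))
    return total


t = 0.3
print([round(h(7, j, t), 4) for j in range(1, 8)])  # 7 lineages disappearing
print(sum(h(7, j, t) for j in range(1, 8)))         # should be close to 1
```

For i = 3, j = 2 the sum has two terms and reduces to $(3/2)(e^{-t} - e^{-3t})$.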
25
Number of Ancestors at time t.
  • 3 methods of solution
  • i. Sum of different independent exponential
    distributions
  • ii. Distribution in Markov chain
    [Figure: chain of states i, i-1, ..., j+1, j, j-1, ..., 1]
  • iii. Combination of known probabilities
    a. Probability that i alleles have i or fewer ancestors.
    b. This probability is the same for all i-sets.
    c. No coalescence within a set implies no coalescence within all
    subsets.
26
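3 Ancestors to 2 Ancestors: $(3/2)(e^{-t} - e^{-3t})$
[Figure: Venn diagram of three sets labelled 1,2; 1,3; 2,3, each with probability $e^{-t}$. The central region 1,2,3 has probability $e^{-3t}$; the regions labelled (1,2), (1,3), (2,3) are each marked ?.]
? = $(e^{-t} - e^{-3t})/2$
Exactly one coalescence: $3\left(e^{-t} - (e^{-t} - e^{-3t})/2 - e^{-3t}\right)$
Jordan's Sieve: $A_1 - 2A_2 + 3A_3$, with the terms shown as $3e^{-t}$, $2\,((e^{-t} + e^{-3t})/2)$ and $3e^{-3t}$.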
27
The exclusion-inclusion principle.
Venn Diagrams
$|I \cup II| = |I| + |II| - |I,II|$   (III = 0)
$|I \cup II \cup III| = |I| + |II| + |III| - (|I,II| + |I,III| + |II,III|) + |I,II,III|$
28
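Exclusion-inclusion & Jordan's Sieve
$S_j$, $j = 1, \ldots, r$: the given sets. $A_k$: sum of the sizes of the
intersections of k sets.
Total number (in some set): $\sum_{k=1}^{r} (-1)^{k-1} A_k$
In exactly m sets (Jordan's Sieve): $\sum_{k=m}^{r} (-1)^{k-m} \binom{k}{m} A_k$
Example, for r = 4:
in exactly 1 set: $A_1 - 2A_2 + 3A_3 - 4A_4$ (Jordan's Sieve)
in exactly 2 sets: $A_2 - 3A_3 + 6A_4$
in exactly 3 sets: $A_3 - 4A_4$
in exactly 4 sets: $A_4$
in some set: $A_1 - A_2 + A_3 - A_4$ (exclusion-inclusion)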
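The sieve can be checked against a brute-force count on small explicit sets. The four example sets below are arbitrary and not taken from the slides; the coefficient $\binom{k}{m}$ is the one implied by the worked example (1, 2, 3, 4 for m = 1; 1, 3, 6 for m = 2).

```python
from itertools import combinations
from math import comb


def jordan_sieve(sets, m):
    """Number of elements lying in exactly m of the given sets (m >= 1),
    computed as sum_{k >= m} (-1)^(k-m) C(k, m) A_k."""
    r = len(sets)
    total = 0
    for k in range(m, r + 1):
        # A_k: sum of the sizes of all intersections of k of the sets.
        a_k = sum(len(set.intersection(*chosen))
                  for chosen in combinations(sets, k))
        total += (-1) ** (k - m) * comb(k, m) * a_k
    return total


def brute_force(sets, m):
    """Direct count of elements that belong to exactly m of the sets."""
    universe = set().union(*sets)
    return sum(1 for x in universe if sum(x in s for s in sets) == m)


example = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {1, 4, 7}]  # arbitrary sets
for m in range(1, 5):
    print(m, jordan_sieve(example, m), brute_force(example, m))
```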
29
Surviving Lineages
Which probability statements can be made? Let s
be a subset of {1,2,..,i} and S(s) be the event
that no coalescence has happened within s.
Additionally, if s' is a subset of s, then S(s)
implies S(s').
Size   Number           Sets
i      1                {1,2,..,i}
i-1    i                {1,2,..,i-1}, {2,..,i}, {1,3,..,i}, ...
j      $\binom{i}{j}$   ...
2      $\binom{i}{2}$   {1,2}, ..., {i-1,i}   (each with probability $e^{-t}$)
30
Surviving Lineages
There are
sets. We want events that are members of only one of them,
where
the summation is over all k-subsets of {1,..,r} and the
intersection is between the k sets chosen.
31
$P_k(t_1) = h_{i,k}(t_1)\, h_{k,j}(t - t_1) / h_{i,j}(t)$
Example: 7 --> 4 lineages.
32
Summary
Tree Counting & Tree Properties. Basic
Combinatorics. Allele distribution. Polya
Urns & Stirling Numbers. Number of ancestral
lineages after time t. Inclusion-Exclusion
Principle.
33
Recommended Literature
Bender (1974) Asymptotic Methods in Enumeration. SIAM Review 16.4.485-
Donnelly (1986) Theor.Pop.Biol.
Ewens (1972) Theor.Pop.Biol.
Ewens (1989) Population Genetics Theory - The Past and the Future
Feller (1968, 1971) Probability Theory and its Applications I & II. Wiley
Fu & Li (1993) Statistical Tests of Neutrality of Mutations. Genetics 133.693-709.
Griffiths (1980)
Griffiths & Tavaré (1998) The Age of a mutation on a general coalescent tree.
Griffiths & Tavaré (1999) The ages of mutations in gene trees
Griffiths & Tavaré (2001) The genealogy of a neutral mutation
Hoppe (1984) Polya-like urns and the Ewens sampling formula. J.Math.Biol. 20.91-94
Kingman (1982) On the Genealogy of Large Populations. 27-43.
Kingman (1982) The Coalescent. Stochastic Processes and their Applications 13.235-248.
Kingman (1982)
Matthews, S. (1999) Times on Trees, and the Age of an Allele. Theor.Pop.Biol. 58.61-75.
Möhle; Pitman; Schweinsberg
Simonsen & Churchill (1997)
Saunders et al. (1986) On the genealogy of nested subsamples from a haploid population. Adv.Appl.Prob. 16.471-91.
Tajima (1983) Evolutionary Relationships of DNA Sequences in Finite Populations. Genetics 105.437-60.
Tavaré (1984) Line-of-Descent and Genealogical Processes, and Their Application in Population Genetics Models. Theor.Pop.Biol. 26.119-164.
Thompson, R. (1998) Ages of mutations on a coalescent tree. Math.Bios. 153.41-61.
van Lint & Wilson (1991) A Course in Combinatorics. Cambridge
Wiuf (2000) On the Genealogy of a Sample of Neutral Rare Alleles. Theor.Pop.Biol. 58.61-75.
Wiuf & Donnelly (1999) Conditional Genealogies and the Age of a Mutant. Theor.Pop.Biol. 56.183-201.