Phylogenetic networks: recent questions and results or: constructing a level2 phylogenetic network f - PowerPoint PPT Presentation


1
Phylogenetic networks: recent questions and
results (or: constructing a level-2 phylogenetic
network from a dense set of input triplets in
polynomial time)

Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2),
Leen Stougie (1, 2)
(1) Technische Universiteit Eindhoven (TU/e)
(2) Centrum voor Wiskunde en Informatica (CWI),
Amsterdam
Email: S.M.Kelk_at_cwi.nl
Web: http://homepages.cwi.nl/kelk

2
Part 1: Context
3
Phylogenetic tree reconstruction
Phylogenetic tree reconstruction is essentially
the science of efficiently inferring and
constructing plausible evolutionary trees when we
only have limited input data about the species
concerned. It sits at the intersection of biology,
bioinformatics, computer science and
mathematics.
[Figure: example tree on the leaves Orangutan, Gorilla, Chimpanzee, Human.
(This tree borrowed from a presentation by Tandy
Warnow)]
4
Dominant methods in phylogenetic reconstruction

Character-based methods
  Maximum Parsimony (related to Minimum Steiner Tree)
  Maximum Likelihood
  Bayesian methods (Markov Chain Monte Carlo)
Distance-based methods
  Neighbour Joining
  UPGMA
Quartet/triplet-based methods

5
Triplet-based methods (1)

Quartet-based methods are used for constructing
unrooted evolutionary trees: there is no root (= most
distant ancestor) and edges have no direction
(e.g. an edge between species X and Y does not say
whether X evolved into Y, or vice versa).
Triplet-based methods are used for constructing
rooted evolutionary trees: there is a root and
edges are directed.
The central idea: build a single, big
evolutionary tree for a set L of species by
combining smaller evolutionary trees on subsets
of L, such that the big tree respects the
structure of the smaller trees.
In triplet-based methods, the small input trees
are always defined on size-3 subsets of the
species set L (and are called rooted triplets).

6
Triplet-based methods (2)

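For example: suppose I want to reconstruct a
plausible evolution for the species set
{w, x, y, z}.
I am given a set of rooted triplets zw|x, yx|w,
xy|z, wz|y, where ab|c means that a and b are more
closely related to each other than either is to c.
(Note: zw|x = wz|x.)

[Figure: diagram with the labels "solution" and "algorithm" and the leaves w, z, x, y.]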
7
Triplet-based methods (2)

[Figure, next animation step of the same example: labels "solution" (leaves z, w, x) and "algorithm" (leaves w, z, x, y).]
8
Triplet-based methods (2)

[Figure, further animation step of the same example: labels "solution" and "algorithm", with additional leaf labels x, y, z.]
9
From trees to networks

The algorithm of Aho et al. (1981) can be used
to construct trees from rooted triplets.
But what if the algorithm fails? Why might the
algorithm fail?
Possible reason 1: the underlying evolution is
tree-like, but the input triplets contain errors.
Possible reason 2: the triplets are correct, but
the underlying evolution is not tree-like.
Biological phenomena such as hybridization,
horizontal gene transfer, recombination and gene
duplication can lead to evolutionary scenarios
that are not tree-like!
Response: try to construct not phylogenetic
trees, but phylogenetic networks.
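
The tree-building step mentioned above can be sketched as follows. This is a minimal Python sketch in the spirit of the algorithm of Aho et al., not the presenters' code: the function name build, the encoding of the triplet xy|z as the tuple (x, y, z) and the nested-tuple output are all assumptions made for illustration. For every triplet xy|z on the current leaf set it joins x and y, recurses on the resulting groups, and reports failure (None) when all leaves end up in a single group.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree consistent with all triplets, or None.

    A triplet xy|z is encoded (by assumption) as the tuple (x, y, z).
    """
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]

    # Union-find over the current leaf set.
    parent = {v: v for v in leaves}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Only triplets whose three leaves all lie in the current set matter.
    inside = set(leaves)
    relevant = [t for t in triplets if set(t) <= inside]
    for x, y, _ in relevant:
        parent[find(x)] = find(y)  # x and y must stay together

    # Group the leaves by their union-find representative.
    blocks = {}
    for v in leaves:
        blocks.setdefault(find(v), []).append(v)
    if len(blocks) == 1:
        return None  # everything is glued together: no tree exists

    subtrees = []
    for block in blocks.values():
        sub = build(block, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

On the four triplets zw|x, yx|w, xy|z, wz|y from the earlier example, this sketch returns the tree (("w", "z"), ("x", "y")); on the conflicting pair xy|z, xz|y it returns None, which is the failure case discussed above.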

10
From trees to networks (2)

For example, suppose the input is xy|z, xz|y.

[Figure: diagram on the leaves z, y, x.]
(Note that there are cases where, even if there is
at most one triplet per 3 species, a tree is not
possible.)
11
From trees to networks (2)

[Figure, next animation step of the same example: two diagrams, on the leaves x, y, z and z, y, x.]
12
From trees to networks (2)

[Figure, final animation step of the same example: two diagrams, each on the leaves z, y, x.]
13
Level-k phylogenetic networks
A level-k phylogenetic network is a rooted,
directed acyclic graph where every biconnected
component (in the underlying undirected graph)
contains at most k recombination vertices. The
network shown here is a very simple example of a
level-1 network. In a level-1 network, the
cycles are vertex-disjoint, hence the
alternative name "galled tree".
[Figure: a level-1 network on the leaves x, y, z, with the labels "root (only one!)", "split-vertex", "leaf-vertex" and "recombination-vertex".]
15
What Jansson & Sung (& Nguyen) did

A set of input triplets is dense iff, for every
subset of 3 species, there is at least one
triplet corresponding to those 3 species.
A dense set of input triplets for n species
thus contains O(n^3) triplets.
Jansson & Sung (2006) showed the following:

Given a dense set of triplets T for a set L of
species, it is possible to determine in
polynomial time whether a level-1 phylogenetic
network N exists such that all the triplets in T
are consistent with N. (And if so, to construct
such a network.)

They later showed, together with Nguyen, how to
do this in time linear in |T|. They also showed
that, in the non-dense case, the problem is
NP-hard.
But what about level-2 networks, and higher?
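
The density condition defined above can be checked directly. The following is a minimal Python sketch, assuming (for illustration only) that a triplet xy|z is encoded as the tuple (x, y, z); the name is_dense is hypothetical and does not come from the presentation.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True iff every 3-subset of leaves is covered by at least one triplet.

    A triplet xy|z is encoded (by assumption) as the tuple (x, y, z).
    """
    # The leaf set of each well-formed triplet (three distinct leaves).
    covered = {frozenset(t) for t in triplets if len(set(t)) == 3}
    return all(frozenset(c) in covered for c in combinations(leaves, 3))
```

For the species set {w, x, y, z}, the four triplets zw|x, yx|w, xy|z, wz|y cover all four 3-subsets, so that set is dense; dropping any one of them makes it non-dense.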

16
Main result: Given a dense set of triplets T for
a set L of species, it is possible to determine
in time O(|T|^3) whether a level-2 phylogenetic
network N exists such that all the triplets in T
are consistent with N. (And if so, to construct
such a network.)
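
To relate this bound to the number of species n = |L|: a dense set contains at least one and at most three triplets for each 3-subset of L (the only possible triplets on leaves x, y, z are xy|z, xz|y and yz|x). A short worked derivation, not taken from the slides, is:

```latex
\binom{n}{3} \;\le\; |T| \;\le\; 3\binom{n}{3}
\quad\Longrightarrow\quad
|T| = \Theta(n^{3})
\quad\Longrightarrow\quad
O(|T|^{3}) = O(n^{9}).
```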
17
Part 2: The algorithm
18
Algorithm, high-level idea

The algorithm is conceptually (fairly) simple,
but the proof of correctness and the technical
details are rather complex.
The high-level idea is as follows:
1. PARTITION the set of leaves (i.e. species) into
a correct partition P
2. INDUCE a new set of triplets T' where every
block of the partition P becomes a single leaf (a
kind of meta-leaf, if you like)
3. SOLVE a simpler version of the problem for T' to
get a network N'
4. RECURSE inside each leaf of N'

Step 3 is the critical part of the algorithm. It
brings together two issues:
(a) why is it sufficient to only solve a simpler
version of the problem?
(b) how do we solve this simpler version of the
problem?

19
Definition: inducing new triplet sets from
partitions of the leaf set

Suppose I have a partition P = {P1, P2, ..., Pt}
of the leaf set L.
Suppose I have a dense set of triplets T on the
leaf set L.
Let T' be a new triplet set on leaf set {q1,
q2, ..., qt}, defined as follows:
qi qj | qk is in T' if and only if i, j and k are
pairwise distinct and there
exists a triplet xy|z in T such that x is in Pi,
y is in Pj and z is in Pk.
Then we say that T' is the triplet set induced
by the partition P of L.
Critically: if T is dense, then T' is also
dense.
In some sense this can be perceived as a
coarsening of the input set.
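
The definition above translates almost line by line into code. The following is a minimal Python sketch under stated assumptions: a triplet xy|z is encoded as the tuple (x, y, z), the partition is a list of sets, and each meta-leaf is represented by the index of its block in that list. The function name induce is hypothetical.

```python
def induce(triplets, partition):
    """Return the triplet set induced by a partition of the leaf set.

    triplets:  iterable of (x, y, z) tuples, each meaning xy|z (assumed encoding)
    partition: list of disjoint sets of leaves; block i becomes meta-leaf i
    """
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for x, y, z in triplets:
        i, j, k = block_of[x], block_of[y], block_of[z]
        # Keep the triplet only if its leaves lie in three different blocks.
        if len({i, j, k}) == 3:
            a, b = sorted((i, j))  # xy|z and yx|z are the same triplet
            induced.add((a, b, k))
    return induced
```

For example, with the triplets zw|x, yx|w, xy|z, wz|y and the partition [{w, z}, {x}, {y}], the only induced triplet is (1, 2, 0): the meta-leaves for x and y sit together, with the meta-leaf for {w, z} as the odd one out.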

20
Definition: simple level-2 networks
21
An example of a simple level-2 network
22
Definition: SN-set

Jansson & Sung introduced the idea of the SN-set.
SN-sets are special subsets of the leaves L, and
are defined with respect to triplet sets.
All sets containing just a single leaf are
SN-sets.
More generally, an SN-set is any subset of leaves
obtained by taking the closure of the following
operation on some subset S of the leaves L:

[Figure: leaves z, x, y and a region labelled "some subset S of the leaves".]
23
Definition: SN-set

In other words, if there is some pair of leaves
x, z in the set S such that xy|z is a triplet and
y is not in the set S, add y to S, and repeat
until no more leaves can be added. An SN-set is
any set that can be constructed this way.
[Figure: leaves z, x, y.]
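
The closure operation described above can be sketched in a few lines. This is a minimal Python illustration, assuming a triplet xy|z is encoded as the tuple (x, y, z); the name sn_closure is hypothetical. Because xy|z and yx|z denote the same triplet, the rule is applied symmetrically to the first two entries.

```python
def sn_closure(seed, triplets):
    """Close a set of leaves under the SN-set operation.

    Rule: if the triplet xy|z has z in S and exactly one of x, y in S,
    add the missing one. Repeat until nothing changes.
    A triplet xy|z is encoded (by assumption) as the tuple (x, y, z).
    """
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for x, y, z in triplets:
            if z in s and (x in s) != (y in s):
                s.update((x, y))
                changed = True
    return s
```

With the triplets zw|x, yx|w, xy|z, wz|y, the seed {w, z} is already closed, whereas the seed {w, x} grows to the whole leaf set {w, x, y, z}.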
24
Definition: maximal SN-set

The SN-set that is equal to the total leaf set L
is called the trivial SN-set.
An SN-set that is non-trivial, and is not a
strict subset of any other non-trivial SN-set, is
called a maximal SN-set.
Jansson and Sung proved that the maximal
SN-sets partition the leaf set L. So no two
maximal SN-sets overlap, and they completely
cover the set of input leaves.
All the SN-sets, and all the maximal SN-sets,
can be found in polynomial time.
Jansson & Sung solved the level-1 problem by
observing that they could treat the maximal
SN-sets like meta-leaves, thus reducing the
problem to recursively solving the problem on the
triplets induced by the maximal SN-sets.
Our idea is similar, but SN-sets in level-2
networks are (unfortunately) rather more complex
creatures than in level-1 networks.
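
One simple (not necessarily the fastest) way to compute candidate maximal SN-sets is sketched below in Python. It is an illustration under stated assumptions, not the presenters' method: triplets xy|z are encoded as tuples (x, y, z), the function names are hypothetical, and the sketch only enumerates the SN-sets generated by single leaves and by pairs of leaves, on the assumption that this is enough for a dense input.

```python
from itertools import combinations


def sn_closure(seed, triplets):
    """Close a set of leaves under the SN-set operation (see the definition above)."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for x, y, z in triplets:
            if z in s and (x in s) != (y in s):
                s.update((x, y))
                changed = True
    return s


def maximal_sn_sets(leaves, triplets):
    """Return the maximal non-trivial SN-sets among singleton- and pair-generated ones."""
    leaves = set(leaves)
    # Every singleton is an SN-set; pairs are closed under the operation.
    candidates = {frozenset([v]) for v in leaves}
    for a, b in combinations(sorted(leaves), 2):
        candidates.add(frozenset(sn_closure({a, b}, triplets)))
    # Discard the trivial SN-set (the whole leaf set).
    nontrivial = [s for s in candidates if s != leaves]
    # Keep the sets that are not strictly contained in another candidate.
    return [s for s in nontrivial if not any(s < t for t in nontrivial)]
```

On the triplets zw|x, yx|w, xy|z, wz|y this yields the two sets {w, z} and {x, y}, which together partition the leaf set, as the statement above predicts.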

25
Definition: (highest) cut-edges

In a phylogenetic network N, a cut-edge (x,y) is
an edge whose removal disconnects the
(underlying) graph.
A cut-edge (x,y) is said to be a trivial
cut-edge iff y is a leaf.
A cut-edge (x,y) is said to be highest iff there
is no cut-edge (p,q) such that there is a
directed path from q to x in N.
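
These definitions can be checked by brute force on small networks. The Python sketch below is an illustration only: the network is assumed to be given as a list of directed edges (parent, child) without parallel edges, the function names are hypothetical, and a path of length zero (q = x) is counted as a directed path. Each edge is removed in turn and connectivity of the underlying undirected graph is tested.

```python
def cut_edges(edges):
    """Edges whose removal disconnects the underlying undirected graph."""
    nodes = {v for e in edges for v in e}

    def connected(edge_list):
        adj = {v: set() for v in nodes}
        for u, v in edge_list:
            adj[u].add(v)
            adj[v].add(u)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen == nodes

    return [e for e in edges if not connected([f for f in edges if f != e])]


def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) such that q reaches x."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)

    def reachable(src):
        # All vertices reachable from src along directed edges, src included.
        seen, stack = {src}, [src]
        while stack:
            for w in children.get(stack.pop(), []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    cuts = cut_edges(edges)
    return [(x, y) for (x, y) in cuts
            if not any(x in reachable(q) for (_, q) in cuts)]
```

For instance, in a small network with a cycle r, a, h, b (edges r→a, r→b, a→h, b→h), leaves x below a and y below h, and a subtree b→c with leaves z and u below c, the cut-edges are (a,x), (h,y), (b,c), (c,z), (c,u), and the highest ones are (a,x), (h,y) and (b,c).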

26
So each maximal SN-set can be expressed as the
union of the leaves reachable from one or more
highest cut-edges.
27
A first attempt at reducing the problem to simple
level-2 networks

Now, suppose we have a dense set of triplets T
and there exists a level-2 network N such that
all the triplets in T are consistent with N. (Of
course we don't know what N is yet.)
Suppose we construct a partition P of L as
follows. The blocks of P are the sets of leaves
reachable from highest cut-edges in N. (Each
maximal SN-set of N thus corresponds to one or
more blocks in P.)
Let T' be the new set of triplets induced by the
partition P. In other words, if we collapse the
sets of leaves below highest cut-edges into
meta-leaves, T' is the new set of triplets we
get. (Nice property: the maximal SN-sets of T'
are in 1:1 correspondence with the maximal SN-sets
of T.)
Critical fact 1: the only level-2 networks where
all cut-edges are trivial are simple level-2
networks.
Critical fact 2: there exists some simple
level-2 network N' such that the triplets in T'
are consistent with N'. Furthermore, if we find
such an N', and then recursively construct
networks within each meta-leaf, we obtain a
network consistent with T!

28
But... that's a non-deterministic argument

So, it looks like we can indeed reduce the
problem in some sense to finding simple
level-2 networks.
But that analysis was based on knowing where the
highest cut-edges are in a hypothetical solution
N. And we don't know N: this is precisely what
we're looking for!
We can, however, compute the maximal SN-sets of
the input triplet set T.
We need to be able to say something more about
how maximal SN-sets of T relate to highest
cut-edges in hypothetical solutions. Then we can
base the recursion on maximal SN-sets, instead of
highest cut-edges.

29
Central Theorem (simplified). Suppose there is a
dense triplet set T consistent with some simple
level-2 network N. Then there exists a level-2
network N' (not necessarily simple) such that,
with the exception of perhaps one maximal SN-set
with respect to T, every maximal SN-set appears
below a single cut-edge in N'. The remaining,
odd-one-out maximal SN-set (if it exists) will
be equal to the union of leaves below two
cut-edges.
30
Observe how SN-set {C, G, F} has been pushed
below a single cut-edge.
31
An existence argument

If some solution N exists for T, then a simple
level-2 solution N' exists for T' (induced by the
highest cut-edges of N) where the maximal SN-sets
of T' are tightly correlated with the maximal
SN-sets of T. Finding N' gives the starting point
for a solution to T.
But by the Central Theorem, all (except maybe
one) of the maximal SN-sets of N' can be pushed
below highest cut-edges to give a solution N''
for T'.
If we re-expand all the meta-leaves of N'', we
obtain a new solution N* for T. Crucially, all
(except maybe one) of the maximal SN-sets of T
will be beneath single cut-edges in N*. The
odd-one-out will be beneath two cut-edges.
So if we substitute N* as N in the first step,
we come to the following conclusion:
We can find a solution for T by finding a simple
level-2 solution for the set of triplets induced
by the maximal SN-sets of T, and recursing. We
need to correctly guess the odd-one-out maximal
SN-set, however, and split that into two
meta-leaves. Fortunately we can just try
splitting each maximal SN-set in turn.

32
subnetwork below highest cut-edge
37
whole maximal SN-set is now below a cut-edge!
38
Finding simple level-2 networks

So we know that, if we analyse the maximal
SN-sets carefully, and construct an appropriate
new set of triplets, we can recursively reduce
the entire problem to finding simple level-2
networks.
But how do we algorithmically construct a simple
level-2 network that is consistent with a given
dense set of triplets?

46
Conclusions & open problems

So we know how to efficiently construct level-2
networks from dense triplet sets. What's next?
Applicability: how useful is it?
Initial implementation: programming and
fine-tuning
Improving running time: in the spirit of the
SN-tree of JSN
Complexity: what about level-3 and higher?
Bounds: worst-case, best-case scenarios
Building all networks
Properties of output networks as function of
input
Different triplet restrictions
Confidence: how good are the solutions?
Exponential-time exact algorithms for NP-hard
problems
