Phylogenetic networks: recent questions and results or: constructing a level2 phylogenetic network f - PowerPoint PPT Presentation

1
Phylogenetic networks: recent questions and
results (or: constructing a level-2 phylogenetic
network from a dense set of input triplets in
polynomial time)
  • Leo van Iersel (1), Judith Keijsper (1), Steven
    Kelk (2), Leen Stougie (1,2)
  • (1) Technische Universiteit Eindhoven (TU/e)
  • (2) Centrum voor Wiskunde en Informatica (CWI),
    Amsterdam
  • Email: S.M.Kelk_at_cwi.nl
  • Web: http://homepages.cwi.nl/kelk

2
Part 1: Context
3
Phylogenetic tree reconstruction
Phylogenetic tree reconstruction is essentially
the science of efficiently inferring and
constructing plausible evolutionary trees when we
only have limited input data about the species
concerned. It sits at the intersection of biology,
bioinformatics, computer science and
mathematics.
[Figure: evolutionary tree on the leaves Orangutan,
Gorilla, Chimpanzee and Human. This tree is
borrowed from a presentation by Tandy Warnow.]
4
Dominant methods in phylogenetic reconstruction
  • Character-based methods: Maximum Parsimony
    (Minimum Steiner Tree), Maximum Likelihood,
    Bayesian methods (Markov Chain Monte Carlo)
  • Distance-based methods: Neighbour Joining,
    UPGMA
  • Quartet/triplet-based methods


5
Triplet-based methods (1)
  • Quartet-based methods are used for constructing
    unrooted evolutionary trees: there is no root
    (the most distant ancestor) and edges have no
    direction (e.g. an edge between species X and Y
    does not say whether X evolved into Y, or
    vice versa).
  • Triplet-based methods are used for constructing
    rooted evolutionary trees: there is a root and
    edges are directed.
  • The central idea: build a single, big
    evolutionary tree for a set L of species by
    combining smaller evolutionary trees on subsets
    of L, such that the big tree respects the
    structure of the smaller trees.
  • In triplet-based methods, the small input trees
    are always defined on size-3 subsets of the
    species set L (and are called rooted triplets).

6
Triplet-based methods (2)
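  • For example, suppose I want to reconstruct a
    plausible evolution for the species set
    {W, X, Y, Z}.
  • I am given a set of rooted triplets zw|x, yx|w,
    xy|z, wz|y, where ab|c means that a and b are
    more closely related to each other than either
    is to c. (Note: zw|x = wz|x.)

[Figure: the algorithm turns the input triplets
into a solution tree on the leaves w, z, x, y.]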
9
From trees to networks
  • The algorithm of Aho et al. (1981) can be used
    to construct trees from rooted triplets.
  • But what if the algorithm fails? Why might the
    algorithm fail?
  • Possible reason 1: the underlying evolution is
    tree-like, but the input triplets contain errors.
  • Possible reason 2: the triplets are correct, but
    the underlying evolution is not tree-like.
    Biological phenomena such as hybridization,
    horizontal gene transfer, recombination and gene
    duplication can lead to evolutionary scenarios
    that are not tree-like!
  • Response: try to construct not phylogenetic
    trees, but phylogenetic networks.
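
The tree-building step attributed above to Aho et al. (1981) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `build`, the tuple encoding `(a, b, c)` for a triplet ab|c, and the nested-tuple output format are all assumptions made for this example. The idea is to merge a and b for every triplet ab|c, split the leaves into the resulting blocks, and recurse; if everything ends up in one block, no tree exists.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree consistent with all triplets, or None.

    A triplet (a, b, c) stands for ab|c. All triplet leaves are assumed
    to belong to `leaves`.
    """
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) == 2:
        return (leaves[0], leaves[1])

    # Union-find: for every triplet ab|c, a and b must end up in the
    # same child subtree of the root.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _c in triplets:
        parent[find(a)] = find(b)

    blocks = {}
    for x in leaves:
        blocks.setdefault(find(x), []).append(x)
    if len(blocks) == 1:
        return None  # the leaves cannot be split: no tree exists

    subtrees = []
    for block in blocks.values():
        inside = [t for t in triplets if all(x in block for x in t)]
        sub = build(block, inside)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


# The four example triplets zw|x, yx|w, xy|z, wz|y.
example = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
tree = build("wxyz", example)
# A conflicting pair such as xy|z and xz|y admits no tree.
no_tree = build("xyz", [("x", "y", "z"), ("x", "z", "y")])
```

A `None` result is exactly the failure case discussed in the bullets above: either the input contains errors or the evolution is not tree-like.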

10
From trees to networks (2)
  • For example, suppose the input is xy|z, xz|y.

[Figure: a network on the leaves x, y, z.]
(Note that there are cases where a tree is not
possible even if there is at most one triplet per
3 species.)
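
To make the example concrete, here is a small sketch that checks whether a triplet is consistent with a network. It assumes the usual characterisation that a triplet is consistent with a network exactly when it appears in one of the trees obtained by keeping a single incoming edge at each recombination vertex. The function name `is_consistent`, the edge-list encoding and the vertex names r, u, v, h are hypothetical choices for this illustration, and the brute-force enumeration is only sensible for tiny networks.

```python
from itertools import product


def is_consistent(edges, root, triplet):
    """Check whether triplet ab|c is consistent with a rooted network.

    `edges` is a list of directed (parent, child) pairs. Every way of
    keeping one incoming edge per vertex gives a tree; ab|c holds in a
    tree when lca(a, b) lies strictly below lca(a, c).
    """
    a, b, c = triplet
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    nodes = list(parents)

    for choice in product(*(parents[v] for v in nodes)):
        up = dict(zip(nodes, choice))

        def path_to_root(x):
            path = [x]
            while x != root:
                x = up[x]
                path.append(x)
            return path

        pa, pb, pc = path_to_root(a), path_to_root(b), path_to_root(c)
        lca_ab = next(v for v in pa if v in pb)
        lca_ac = next(v for v in pa if v in pc)
        # Both ancestors lie on a's path; a smaller index means deeper.
        if pa.index(lca_ab) < pa.index(lca_ac):
            return True
    return False


# A network in which x sits below a recombination vertex h whose two
# parents are the parents of y and z respectively.
network = [("r", "u"), ("r", "v"), ("u", "y"), ("u", "h"),
           ("v", "z"), ("v", "h"), ("h", "x")]
both = (is_consistent(network, "r", ("x", "y", "z")),
        is_consistent(network, "r", ("x", "z", "y")))
```

With this network both input triplets are satisfied at once, which no tree can do, while yz|x is not.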
13
Level-k phylogenetic networks
A level-k phylogenetic network is a rooted,
directed acyclic graph where every biconnected
component (in the underlying undirected graph)
contains at most k recombination vertices. The
network shown here is a very simple example of a
level-1 network. In a level-1 network, the
cycles are vertex-disjoint, hence the
alternative name galled tree.
[Figure: a level-1 network on the leaves x, y, z,
with labels: root (only one!), split-vertex,
recombination-vertex, leaf-vertex.]
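
The definition can be turned into a small computation: find the biconnected components of the underlying undirected graph and count, in each, the vertices with at least two incoming edges inside that component. The sketch below uses a standard depth-first search with an edge stack. The names `biconnected_edge_components` and `level` and the edge-list encoding are assumptions for this example, and the code assumes there are no parallel edges.

```python
def biconnected_edge_components(edges):
    """Group the edges of the underlying undirected graph into
    biconnected components (edge-stack depth-first search)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    index, low, stack, comps = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for w in adj[u]:
            if w == parent:
                continue
            if w not in index:
                stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:
                    # u separates the edges pushed since (u, w)
                    comp = []
                    while True:
                        e = stack.pop()
                        comp.append(e)
                        if e == (u, w):
                            break
                    comps.append(comp)
            elif index[w] < index[u]:
                stack.append((u, w))  # back edge to an ancestor
                low[u] = min(low[u], index[w])

    for s in adj:
        if s not in index:
            dfs(s, None)
    return comps


def level(edges):
    """Smallest k such that the network given by directed
    (parent, child) edges is a level-k network."""
    best = 0
    for comp in biconnected_edge_components(edges):
        undirected = {frozenset(e) for e in comp}
        indeg = {}
        for p, c in edges:
            if frozenset((p, c)) in undirected:
                indeg[c] = indeg.get(c, 0) + 1
        best = max(best, sum(1 for d in indeg.values() if d >= 2))
    return best


galled = [("r", "u"), ("r", "v"), ("u", "y"), ("u", "h"),
          ("v", "z"), ("v", "h"), ("h", "x")]
two_hybrids = [("r", "a"), ("r", "b"), ("a", "c"), ("a", "h1"),
               ("b", "h1"), ("b", "h2"), ("c", "h2"), ("c", "x"),
               ("h1", "y"), ("h2", "z")]
levels = (level(galled), level(two_hybrids))
```

The first example network has a single cycle with one recombination vertex; in the second, two recombination vertices share one biconnected component.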
15
What Jansson & Sung (& Nguyen) did
  • A set of input triplets is dense iff, for every
    subset of 3 species, there is at least one
    triplet corresponding to those 3 species.
  • A dense set of input triplets for n species
    thus contains O(n^3) triplets.
  • Jansson & Sung (2006) showed the following:

Given a dense set of triplets T for a set L of
species, it is possible to determine in
polynomial time whether a level-1 phylogenetic
network N exists such that all the triplets in T
are consistent with N (and if so, to construct
such a network).
  • They later showed, together with Nguyen, how to
    do this in time linear in |T|. They also showed
    that, in the non-dense case, the problem is
    NP-hard.
  • But what about level-2 networks, and higher?
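
The density condition is easy to check directly. The sketch below is illustrative only; the name `is_dense` and the tuple encoding `(a, b, c)` for a triplet ab|c are assumptions of this example.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True iff every 3-subset of the leaves carries at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(s) in covered for s in combinations(leaves, 3))


# The example triplets zw|x, yx|w, xy|z, wz|y on the leaves w, x, y, z.
example = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
dense = is_dense("wxyz", example)
sparse = is_dense("wxyz", example[:3])
```

For n leaves there are n(n-1)(n-2)/6 subsets of size 3, and each carries at most three distinct triplets, which is where the cubic bound comes from.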

16
Main result: Given a dense set of triplets T for
a set L of species, it is possible to determine
in time O(|T|^3) whether a level-2 phylogenetic
network N exists such that all the triplets in T
are consistent with N (and if so, to construct
such a network).
17
Part 2: The algorithm
18
Algorithm, high-level idea
  • The algorithm is conceptually (fairly) simple,
    but the proof of correctness and the technical
    details are rather complex.
  • The high-level idea is as follows:
  • (1) PARTITION the set of leaves (i.e. species)
    into a correct partition P
  • (2) INDUCE a new set of triplets T' where every
    block of the partition P becomes a single leaf
    (a kind of meta-leaf, if you like)
  • (3) SOLVE a simpler version of the problem for
    T' to get a network N'
  • (4) RECURSE inside each leaf of N'
  • Step 3 is the critical part of the algorithm. It
    brings together two issues:
  • (a) why is it sufficient to only solve a simpler
    version of the problem?
  • (b) how do we solve this simpler version of the
    problem?

19
Definition: inducing new triplet sets from
partitions of the leaf set
  • Suppose I have a partition P = {P1, P2, ..., Pt}
    of the leaf set L.
  • Suppose I have a dense set of triplets T on the
    leaf set L.
  • Let T' be a new triplet set on the leaf set
    {q1, q2, ..., qt}, defined as follows:
  • qiqj|qk is in T' if and only if i, j and k are
    pairwise distinct and there exists a triplet
    xy|z in T such that x is in Pi, y is in Pj and
    z is in Pk.
  • Then we say that T' is the triplet set induced
    by the partition P of L.
  • Critically: if T is dense, then T' is also
    dense.
  • In some sense this can be perceived as a
    coarsening of the input set.
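
The induced triplet set can be computed in one pass over T. In this sketch the function name `induce`, the use of block indices 0, 1, 2, ... as meta-leaves, and the tuple encoding `(a, b, c)` for ab|c are assumptions made for illustration. Since ab|c and ba|c are the same triplet, the first two entries are stored in sorted order.

```python
def induce(partition, triplets):
    """Return the triplet set induced by a partition of the leaves.

    `partition` is a list of blocks; block number i plays the role of
    the meta-leaf q_i. A triplet survives only if its three leaves lie
    in three different blocks.
    """
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for a, b, c in triplets:
        i, j, k = block_of[a], block_of[b], block_of[c]
        if len({i, j, k}) == 3:
            induced.add((min(i, j), max(i, j), k))
    return induced


# The example triplets zw|x, yx|w, xy|z, wz|y, with w and z merged
# into a single meta-leaf.
example = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
coarse = induce([{"w", "z"}, {"x"}, {"y"}], example)
```

Here the two triplets that touch all three blocks collapse to the same induced triplet, and the other two disappear because w and z share a block.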

20
Definition: simple level-2 networks
21
An example of a simple level-2 network
23
Definition: SN-set
  • Jansson & Sung introduced the idea of the SN-set.
  • SN-sets are special subsets of the leaves L, and
    are defined with respect to triplet sets.
  • All sets containing just a single leaf are
    SN-sets.
  • More generally, an SN-set is any subset of leaves
    obtained by taking the closure of the following
    operation on some subset S of the leaves L:

If there is some pair of leaves x, z in the set S
such that xy|z is a triplet and y is not in the
set S, add y to S, and repeat until no more leaves
can be added. An SN-set is any set that can be
constructed this way.
[Figure: a triplet on the leaves x, y, z and some
subset S of the leaves.]
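
The closure operation can be written as a simple fixed-point loop. This is a naive sketch rather than an efficient implementation; the name `sn_set` and the tuple encoding `(a, b, c)` for a triplet ab|c are assumptions of this example. Because ab|c and ba|c denote the same triplet, both orientations are tried.

```python
def sn_set(seed, triplets):
    """Close `seed` under the rule: if xy|z is a triplet with x and z
    in the set and y outside it, add y."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            for x, y, z in ((a, b, c), (b, a, c)):
                if x in s and z in s and y not in s:
                    s.add(y)
                    changed = True
    return s


# Four triplets on the leaves w, x, y, z: zw|x, yx|w, xy|z, wz|y.
example = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
closed_wz = sn_set({"w", "z"}, example)
closed_wx = sn_set({"w", "x"}, example)
```

Starting from {w, z} nothing is added, whereas starting from {w, x} the closure grows to the whole leaf set.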
24
Definition: maximal SN-set
  • The SN-set that is equal to the total leaf set L
    is called the trivial SN-set.
  • An SN-set that is non-trivial, and is not a
    strict subset of any other non-trivial SN-set, is
    called a maximal SN-set.
  • Jansson and Sung proved that the maximal SN-sets
    partition the leaf set L. So no two maximal
    SN-sets overlap, and together they completely
    cover the set of input leaves.
  • All the SN-sets, and all the maximal SN-sets,
    can be found in polynomial time.
  • Jansson & Sung solved the level-1 problem by
    observing that they could treat the maximal
    SN-sets like meta-leaves, thus reducing the
    problem to recursively solving the problem on the
    triplets induced by the maximal SN-sets.
  • Our idea is similar, but SN-sets in level-2
    networks are (unfortunately) rather more complex
    creatures than in level-1 networks.
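
One simple (not the fastest) way to list the maximal SN-sets is sketched below. It rests on an assumption that should be checked against Jansson & Sung's paper: that for a dense triplet set every SN-set with more than one leaf is the closure of some pair of leaves, so it suffices to close all singletons and pairs. The names `sn_set` and `maximal_sn_sets` and the tuple encoding `(a, b, c)` for ab|c are hypothetical.

```python
from itertools import combinations


def sn_set(seed, triplets):
    """Closure rule: if xy|z is a triplet with x, z inside and y outside, add y."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:
            for x, y, z in ((a, b, c), (b, a, c)):
                if x in s and z in s and y not in s:
                    s.add(y)
                    changed = True
    return s


def maximal_sn_sets(leaves, triplets):
    """Non-trivial SN-sets that are not strictly inside another one.

    Assumes a dense triplet set, so that closing singletons and pairs
    is enough to find every SN-set.
    """
    leaves = set(leaves)
    candidates = {frozenset([x]) for x in leaves}
    for x, y in combinations(sorted(leaves), 2):
        candidates.add(frozenset(sn_set({x, y}, triplets)))
    nontrivial = {s for s in candidates if s != leaves}
    return {s for s in nontrivial if not any(s < t for t in nontrivial)}


# Four triplets on the leaves w, x, y, z: zw|x, yx|w, xy|z, wz|y.
example = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
maximal = maximal_sn_sets("wxyz", example)
```

For these triplets the maximal SN-sets are {w, z} and {x, y}, which do partition the leaf set as the slide states.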

25
Definition: (highest) cut-edges
  • In a phylogenetic network N, a cut-edge (x,y) is
    an edge whose removal disconnects the
    (underlying) graph.
  • A cut-edge (x,y) is said to be a trivial
    cut-edge iff y is a leaf.
  • A cut-edge (x,y) is said to be highest iff there
    is no cut-edge (p,q) such that there is a
    directed path from q to x in N.
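
These definitions translate directly into a brute-force check, sketched below for small networks. The function names `reachable`, `cut_edges` and `highest_cut_edges`, the edge-list encoding and the example network are assumptions made for this illustration; a production implementation would use a linear-time bridge-finding algorithm instead of deleting each edge in turn.

```python
def reachable(start, edges, directed=True):
    """Vertices reachable from `start`, following edges forwards only
    (directed) or in both directions (underlying undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        if not directed:
            adj.setdefault(v, set()).add(u)
    seen, todo = {start}, [start]
    while todo:
        u = todo.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def cut_edges(edges):
    """Edges whose removal disconnects the underlying undirected graph."""
    nodes = {v for e in edges for v in e}
    result = []
    for e in edges:
        rest = [f for f in edges if f != e]
        if reachable(e[0], rest, directed=False) != nodes:
            result.append(e)
    return result


def highest_cut_edges(edges):
    """Cut-edges (x, y) with no cut-edge (p, q) from whose head q
    there is a directed path to x."""
    cuts = cut_edges(edges)
    return [(x, y) for (x, y) in cuts
            if not any(x in reachable(q, edges) for (_p, q) in cuts)]


# One cycle (r, u, v, h) with x below the recombination vertex h,
# a small subtree (c; y1, y2) hanging off u, and the leaf z below v.
network = [("r", "u"), ("r", "v"), ("u", "h"), ("v", "h"), ("h", "x"),
           ("u", "c"), ("c", "y1"), ("c", "y2"), ("v", "z")]
highest = highest_cut_edges(network)
leaves = {"x", "y1", "y2", "z"}
blocks = [sorted(reachable(y, network) & leaves) for (_x, y) in highest]
```

The last line collects, for each highest cut-edge, the leaves below it; these are the blocks that get collapsed into meta-leaves.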

26
So each maximal SN-set can be expressed as the
union of the leaves reachable from one or more
highest cut-edges.
27
A first attempt at reducing the problem to simple
level-2 networks
  • Now, suppose we have a dense set of triplets T
    and there exists a level-2 network N such that
    all the triplets in T are consistent with N. (Of
    course we don't know what N is yet.)
  • Suppose we construct a partition P of L as
    follows. The blocks of P are the sets of leaves
    reachable from highest cut-edges in N. (Each
    maximal SN-set of N thus corresponds to one or
    more blocks in P.)
  • Let T' be the new set of triplets induced by the
    partition P. In other words, if we collapse the
    set of leaves below highest cut-edges into
    meta-leaves, T' is the new set of triplets we
    get. (Nice property: the maximal SN-sets of T'
    are in 1:1 correspondence with the maximal
    SN-sets of T.)
  • Critical fact 1: the only level-2 networks where
    all cut-edges are trivial are simple level-2
    networks.
  • Critical fact 2: there exists some simple
    level-2 network N' such that the triplets in T'
    are consistent with N'. Furthermore, if we find
    such an N', and then recursively construct
    networks within each meta-leaf, we obtain a
    network consistent with T!

28
But... that's a non-deterministic argument
  • So, it looks like we can indeed reduce the
    problem in some sense to finding simple
    level-2 networks.
  • But that analysis was based on knowing where the
    highest cut-edges are in a hypothetical solution
    N. And we don't know N: this is precisely what
    we're looking for!
  • We can, however, compute the maximal SN-sets of
    the input triplet set T.
  • We need to be able to say something more about
    how maximal SN-sets of T relate to highest
    cut-edges in hypothetical solutions. Then we can
    base the recursion on maximal SN-sets, instead of
    highest cut-edges.

29
Central Theorem (simplified). Suppose there is a
dense triplet set T consistent with some simple
level-2 network N. Then there exists a level-2
network N' (not necessarily simple) such that,
with the exception of perhaps one maximal SN-set
with respect to T, every maximal SN-set appears
below a single cut-edge in N'. The remaining,
odd-one-out maximal SN-set (if it exists) will
be equal to the union of leaves below two
cut-edges.
30
Observe how the SN-set {C, G, F} has been pushed
below a single cut-edge.
31
An existence argument
  • If some solution N exists for T, then a simple
    level-2 solution N' exists for T' (induced by the
    highest cut-edges of N) where the maximal SN-sets
    of T' are tightly correlated with the maximal
    SN-sets of T. Finding N' gives the starting point
    for a solution to T.
  • But by the Central Theorem, all (except maybe
    one) of the maximal SN-sets of N' can be pushed
    below highest cut-edges to give a solution N''
    for T'.
  • If we re-expand all the meta-leaves of N'', we
    obtain a new solution N''' for T. Crucially, all
    (except maybe one) of the maximal SN-sets of T
    will be beneath single cut-edges in N'''. The
    odd-one-out will be beneath two cut-edges.
  • So if we substitute N''' as N in the first step,
    we come to the following conclusion:
  • We can find a solution for T by finding a simple
    level-2 solution for the set of triplets induced
    by the maximal SN-sets of T, and recursing. We
    need to correctly guess the odd-one-out maximal
    SN-set, however, and split that into two
    meta-leaves. Fortunately we can just try
    splitting each maximal SN-set in turn.

32
subnetwork below highest cut-edge
37
whole maximal SN-set is now below a cut-edge!
38
Finding simple level-2 networks
  • So we know that, if we analyse the maximal
    SN-sets carefully, and construct an appropriate
    new set of triplets, we can recursively reduce
    the entire problem to finding simple level-2
    networks.
  • But how do we algorithmically construct a simple
    level-2 network that is consistent with a given
    dense set of triplets?

46
Conclusions & open problems
  • So we know how to efficiently construct level-2
    networks from dense triplet sets. What's next?
  • Applicability: how useful is it?
  • Initial implementation: programming and
    fine-tuning
  • Improving running time: in the spirit of the
    SN-tree of JSN
  • Complexity: what about level-3 and higher?
  • Bounds: worst-case, best-case scenarios
  • Building all networks
  • Properties of output networks as function of
    input
  • Different triplet restrictions
  • Confidence: how good are the solutions?
  • Exponential-time exact algorithms for NP-hard
    problems

Thank you for your attention!