# On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities


Plgw03, 17/12/07


1
On the Hardness of Inferring Phylogenies from
Triplet-Dissimilarities
• Ilan Gronau, Shlomo Moran
• Technion, Israel Institute of Technology
• Haifa, Israel

2
Pairwise-Distance Based Reconstruction
[Figure: a tree T over the taxa B, E, G, H, L, M and its tree metric $D_T$]
3
Optimization Criteria
We wish the tree-metric $D_T$ to approximate simultaneously all the pairwise distances in $D$, i.e. $D_T$ should be close to $D$.
Two closeness measures are studied here:
• Maximal Difference ($\ell_\infty$)
• Maximal Distortion

4
Maximal Difference ($\ell_\infty$) vs. Maximal Distortion
[Figure: the dissimilarity matrix $D$ and the tree metric $D_T$, both over the taxa B, E, G, H, L, M]
Goal: find an optimal tree T, which minimizes the maximal difference/distortion between $D$ and $D_T$.
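The two closeness measures can be sketched in a few lines. This is a minimal illustration, not the authors' code: the maximal difference is the largest absolute gap between corresponding entries, and the maximal distortion is assumed here to be the usual product of the largest expansion ratio and the largest contraction ratio, with the convention 0/0 = 1 that the deck's backup slide on optimization criteria mentions. The matrices `D` and `DT` hold hypothetical values on three of the taxa.

```python
def ratio(x, y):
    """x / y with the convention 0/0 = 1; a positive value over 0 counts as infinite."""
    if y == 0:
        return 1.0 if x == 0 else float("inf")
    return x / y


def max_difference(D, DT):
    """l_infinity distance: largest absolute gap between corresponding entries."""
    return max(abs(D[pair] - DT[pair]) for pair in D)


def max_distortion(D, DT):
    """Assumed definition: (largest DT/D ratio) * (largest D/DT ratio)."""
    expansion = max(ratio(DT[pair], D[pair]) for pair in D)
    contraction = max(ratio(D[pair], DT[pair]) for pair in D)
    return expansion * contraction


# Hypothetical input dissimilarities and a hypothetical tree metric.
D = {("B", "E"): 4.0, ("B", "G"): 6.0, ("E", "G"): 8.0}
DT = {("B", "E"): 5.0, ("B", "G"): 6.0, ("E", "G"): 7.0}

print(max_difference(D, DT))   # largest |D - DT| over all pairs
print(max_distortion(D, DT))   # expansion * contraction
```

Here the largest gap is 1, the largest expansion is 5/4 and the largest contraction is 8/7, so the distortion is 10/7.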
5
Previous Works on Approximating Dissimilarities by Tree Distances
• Negative results (NP-hardness)
  • Closest tree-metric (even ultrametric) to a dissimilarity matrix under $\ell_1$, $\ell_2$ [Day 87]
  • Closest tree-metric to a dissimilarity matrix under $\ell_\infty$ [ABFPT99]
    • Hard to approximate better than 1.125
    • Implicit: hard to approximate the closest MaxDist tree within any constant factor
• Positive results
  • Closest ultrametric to a dissimilarity matrix under $\ell_\infty$ [Krivanek 88]
  • 3-approximation of the closest additive metric to a given metric [ABFPT99]
    • (implicit: 6-approximation for general dissimilarity matrices)

6
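This Work: Triplet-Distances = Distances to Triplets' Midpoints
[Figure: taxa i, j, k in the tree T with their midpoint C(i,j,k); $\tau_T(i;jk)$ is the distance from i to C(i,j,k)]
• $\tau_T(i;jk) = \tau_T(i;kj)$
• $\tau_T(i;ij) = 0$
• $\tau_T(i;jj) = D_T(i,j)$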
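The three properties above follow directly once the triplet-distance is written in terms of pairwise distances, using the formula that the next slide states, $\tau(i;jk) = \tfrac{1}{2}[D(i,j) + D(i,k) - D(j,k)]$. A short check, assuming only $D_T(i,i) = 0$ and symmetry of $D_T$:

```latex
\begin{align*}
\tau_T(i;kj) &= \tfrac{1}{2}\bigl[D_T(i,k) + D_T(i,j) - D_T(k,j)\bigr] = \tau_T(i;jk), \\
\tau_T(i;ij) &= \tfrac{1}{2}\bigl[D_T(i,i) + D_T(i,j) - D_T(i,j)\bigr] = 0, \\
\tau_T(i;jj) &= \tfrac{1}{2}\bigl[D_T(i,j) + D_T(i,j) - D_T(j,j)\bigr] = D_T(i,j).
\end{align*}
```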
7
Triplet-Distances Defined by 2-Distances
• Each distance matrix D defines 3-trees:

$\tau(i;jk) = \tfrac{1}{2}\,[D(i,j) + D(i,k) - D(j,k)]$
[Figure: any metric on 3 taxa i, j, k (figure labels: 8, 9, 7)]
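A minimal sketch of how a full triplet-distance table can be derived from a pairwise distance matrix with the formula above. The function name and the dictionary layout are illustrative choices; the values 8, 9 and 7 echo the figure labels, but which pair each one belongs to is an assumption made for this example.

```python
def triplet_distances(D):
    """Return tau[(i, j, k)] = 0.5 * (D[i][j] + D[i][k] - D[j][k]) for all i, j, k."""
    taxa = list(D)
    return {
        (i, j, k): 0.5 * (D[i][j] + D[i][k] - D[j][k])
        for i in taxa
        for j in taxa
        for k in taxa
    }


# Hypothetical metric on three taxa (assignment of 8, 9, 7 to pairs is assumed).
D = {
    "i": {"i": 0, "j": 8, "k": 9},
    "j": {"i": 8, "j": 0, "k": 7},
    "k": {"i": 9, "j": 7, "k": 0},
}

tau = triplet_distances(D)
print(tau[("i", "j", "k")])  # distance from i to the midpoint of the 3-tree
print(tau[("j", "i", "k")])  # distance from j to the midpoint
print(tau[("k", "i", "j")])  # distance from k to the midpoint
```

With these values the three edge lengths of the 3-tree are 5, 3 and 4, and any two of them add up to the corresponding pairwise distance (for example 5 + 3 = 8 for the pair i, j).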
8
Triplet-Distance Based Reconstruction
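$\tau(i;jk) = \tfrac{1}{2}\,[D(i,j) + D(i,k) - D(j,k)]$
[Figure: a triplet-distance table indexed by the taxa B, E, G, H, L, M and the taxon pairs BB, BE, BG, ..., LL, LM, MM; which tree should be reconstructed from it?]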
9
Why use Triplet-Distances?
1. They enable more accurate estimations of 2-distances.
2. They are used (de facto) by known reconstruction algorithms.
10
Improved Estimations of Pairwise Distances
Information Loss
D
Calculate D(H,E)
11
Improved Estimations (cont)
• Estimate D(H,E) by calculating all the 3-trees on {H, E, X}, X ≠ H, E
• (Or calculate just one 3-tree, for a trusted 3rd taxon X)
• V. Ranwez, O. Gascuel, Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets, Mol. Biol. Evol. 19(11), pp. 1952-1963 (2002)
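The estimation idea can be sketched as follows. In a 3-tree on {H, E, X}, the distance between H and E is the sum of their two distances to the midpoint, $\tau(H;EX) + \tau(E;HX)$. The slide does not say how the estimates from different third taxa are combined; plain averaging is an assumption made here, and all names and numbers are hypothetical.

```python
def estimate_pair_distance(tau, a, b, third_taxa):
    """Estimate D(a, b) from 3-trees on {a, b, x}, one per third taxon x.

    tau[(i, j, k)] is the distance from i to the midpoint of the 3-tree on
    {i, j, k}. Each 3-tree yields the estimate tau(a; b x) + tau(b; a x);
    the estimates are averaged (an assumed combination rule).
    """
    estimates = [tau[(a, b, x)] + tau[(b, a, x)] for x in third_taxa]
    return sum(estimates) / len(estimates)


# Hypothetical triplet-distances, e.g. taken from independently estimated 3-trees.
tau = {
    ("H", "E", "X1"): 2.0, ("E", "H", "X1"): 3.0,
    ("H", "E", "X2"): 2.5, ("E", "H", "X2"): 3.5,
}

print(estimate_pair_distance(tau, "H", "E", ["X1", "X2"]))  # all third taxa
print(estimate_pair_distance(tau, "H", "E", ["X1"]))        # one trusted taxon
```

Passing a single third taxon corresponds to the "one trusted 3rd taxon" variant on the slide.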

12
(Implicit) use of Triplet-Distances in
2-Distance Reconstruction Algorithms
$\tau(i;jk) = \tfrac{1}{2}\,[D(i,j) + D(i,k) - D(j,k)]$
13
1st use: Triplet Distances from a Single Source
• Fix a taxon r, and construct a tree T which
minimizes
• Optimal solution is doable in $O(n^2)$ time, and is used e.g. in:
• (FKW95) Optimal approximation of distances by ultrametric trees.
• (ABFPT99) The best known approximation of distances by general trees.
• (BB99) Fast construction of Buneman trees.

14
2nd use: Saitou & Nei Neighbour Joining
The neighbor-selection criterion of NJ selects a taxon-pair i, j which maximizes the sum
[Figure: the taxon pair i, j together with the remaining taxa r]
15
Previous Works on Triplet-Dissimilarities/Distances
• I. Gronau, S. Moran, Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances, Journal of Computational Biology 14(1), pp. 1-15 (2007).
• Works which use the total weights of 3-trees:
• S. Joly, G. Le Calvé, Three-Way Distances, Journal of Classification 12, pp. 191-205 (1995).
• L. Pachter, D. Speyer, Reconstructing Trees from Subtree Weights, Applied Mathematics Letters 17, pp. 615-621 (2004).
• D. Levy, R. Yoshida, L. Pachter, Beyond Pairwise Distances: Neighbor-Joining with Phylogenetic Diversity Estimates, Mol. Biol. Evol. 23(3), pp. 491-498 (2006).

16
Summary of Results
• Results for Maximal Difference ($\ell_\infty$)
  • Decision problem is NP-hard
    • Is there a tree T s.t. $\|\tau, \tau_T\|_\infty \le \Delta$?
  • Hardness of approximation of the optimization problem
    • Finding a tree T s.t. $\|\tau, \tau_T\|_\infty \le 1.4\,\|\tau, \tau_{OPT}\|_\infty$
  • A 15-approximation algorithm
    • Using the 6-approximation algorithm for 2-dissimilarities from ABFPT99
• Result for Maximal Distortion
  • Hardness of approximation within any constant factor

17
NP Hardness of the Decision Problem
We use a reduction from 3SAT (the problem of determining whether a 3CNF formula is satisfiable).
We show:
If one can determine for $(\tau, \Delta)$ whether there exists a tree T s.t. $\|\tau, \tau_T\|_\infty \le \Delta$, then one can determine for every 3CNF formula $\varphi$ whether it is satisfiable.
18
The Reduction
Given a 3CNF formula $\varphi$ we define triplet distances $\tau$ and an error bound $\Delta$ which force the output tree to imply a satisfying assignment to $\varphi$.
• The set of taxa:
• Taxa T, F.
• A taxon for every literal ($x_i$, $\bar{x}_i$).
• 3 taxa for every clause $C_j$ ($y_{j1}$, $y_{j2}$, $y_{j3}$).
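As a small illustration of the taxa set above, the sketch below lists the taxa for a 3CNF formula with n variables and m clauses: two special taxa, one per literal, and three per clause, for 2 + 2n + 3m in total. The clause encoding (signed integers) and the taxon naming scheme are assumptions made for this example; the triplet distances themselves are not built here.

```python
def reduction_taxa(num_vars, clauses):
    """List the taxa of the reduction for a 3CNF formula.

    clauses: list of 3-tuples of non-zero ints, where k stands for x_k and
    -k for its negation (an assumed, DIMACS-like encoding).
    """
    taxa = ["T", "F"]                       # the two special taxa
    for i in range(1, num_vars + 1):        # one taxon per literal
        taxa += [f"x{i}", f"~x{i}"]
    for j in range(1, len(clauses) + 1):    # three taxa per clause C_j
        taxa += [f"y{j}_1", f"y{j}_2", f"y{j}_3"]
    return taxa


# Hypothetical formula: (x1 v ~x2 v x3) ^ (~x1 v x2 v x3)
clauses = [(1, -2, 3), (-1, 2, 3)]
taxa = reduction_taxa(3, clauses)
print(len(taxa))  # 2 + 2*3 + 3*2
print(taxa)
```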

19
Properties Enforced by the Input $(\tau, \Delta)$
• One of the following can be enforced on each taxa triplet (u,v,w):
• taxon u is close to Path(v,w), or
• taxon u is far from Path(v,w)

[Figure: taxon u and Path(v,w)]
20
Enforcing Truth Assignment
• A truth assignment to $\varphi$ is implied by the following:
• T is far from F
• For each i, $x_i$ is far from $\bar{x}_i$, and both $x_i$ and $\bar{x}_i$ are close to Path(T,F)

Thus we set $x_i = T$ iff $x_i$ is close to T.
21
Enforcing Clauses-Satisfaction
A clause $C = (l_1 \lor l_2 \lor l_3)$ is satisfied iff at least one literal $l_i$ is true, i.e. is close to T.
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff it is not like this:
[Figure: a configuration in which none of $l_1, l_2, l_3$ is close to T]
We need to guarantee, by the close/far relations, that all clauses avoid the above.
22
Clauses-Satisfaction (cont)
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff, out of the three paths Path($l_1, l_2$), Path($l_1, l_3$), Path($l_2, l_3$), at least two paths are close to T.
[Figure: the literals $l_1, l_2, l_3$ placed relative to T and F]
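The equivalence on this slide can be checked exhaustively at the Boolean level. The sketch below abstracts the geometry away: it assumes that a literal is "close to T" exactly when it is true, and that Path($l_a, l_b$) is close to T exactly when at least one of its two endpoints is. Under that assumed abstraction, a clause is satisfied iff at least two of the three paths are close to T.

```python
from itertools import combinations, product


def clause_satisfied(literals):
    """A clause is satisfied iff at least one of its literals is true."""
    return any(literals)


def paths_close_to_T(literals):
    """Count the paths Path(l_a, l_b) with at least one endpoint close to T.

    Assumed abstraction: a literal is close to T iff it is true, and a path
    is close to T iff one of its endpoints is.
    """
    return sum(1 for a, b in combinations(literals, 2) if a or b)


# Check all 8 truth assignments to (l1, l2, l3).
for literals in product([False, True], repeat=3):
    assert clause_satisfied(literals) == (paths_close_to_T(literals) >= 2)

print("equivalence holds for all 8 assignments")
```

With exactly one true literal, the two paths through it are close to T; with none, no path is; with two or more, all three are.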
23
Clauses-Satisfaction (cont)
We attach a taxon to each such path:
• $y_1$ is close to Path($l_2, l_3$)
• $y_2$ is close to Path($l_1, l_3$)
• $y_3$ is close to Path($l_1, l_2$)
$(l_1 \lor l_2 \lor l_3)$ is satisfied iff at least two $y_i$'s can be located close to T.
24
Clauses-Satisfaction (end)
And at least two of the $y_i$'s can be located close to T iff Path($y_2, y_3$), Path($y_1, y_3$), Path($y_1, y_2$) are all close to T.
So $(l_1 \lor l_2 \lor l_3)$ is satisfied iff all the above paths are close to T.
25
Construction Example
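$\varphi$ is satisfiable $\Leftrightarrow$ there is a tree T which satisfies all bounds:
A1: $\tau_T(T, F) \ge 2\alpha + 2\beta$
A2: for $i = 1..n$: $\tau_T(T; x_i\bar{x}_i) \le \alpha$, $\tau_T(F; x_i\bar{x}_i) \le \alpha$
B1: for $j = 1..m$: $\tau_T(y_{j1}; l_{j2}l_{j3}) \le \alpha$, $\tau_T(y_{j2}; l_{j1}l_{j3}) \le \alpha$, $\tau_T(y_{j3}; l_{j1}l_{j2}) \le \alpha$
B2: for $j = 1..m$: $\tau_T(y_{j1}; TF) \ge \alpha$, $\tau_T(y_{j2}; TF) \ge \alpha$, $\tau_T(y_{j3}; TF) \ge \alpha$
B3: for $j = 1..m$: $\tau_T(T; y_{j2}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j2}) \le \alpha$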
26
Hardness of Approximation Results
By stretching the close/far restrictions, the following problems are also shown NP-hard:
• Approximating Maximal Difference
  • Finding a tree T s.t. $\|\tau, \tau_T\|_\infty \le 1.4\,\|\tau, \tau_{OPT}\|_\infty$
• Approximating Maximal Distortion
  • Finding a tree T s.t. $\mathrm{MaxDist}(\tau, \tau_T) \le C \cdot \mathrm{MaxDist}(\tau, \tau_{OPT})$ for any constant C

Details in: I. Gronau and S. Moran, On the Hardness of Inferring Phylogenies from Triplet-Dissimilarities, Theoretical Computer Science 389(1-2), December 2007, pp. 44-55.
27
Open Problems/Further Research
• Extending hardness results to 3-diss tables induced by 2-diss matrices
  • ($\tau(i;jk) = \tfrac{1}{2}\,[D(i,j) + D(i,k) - D(j,k)]$)
• Extending hardness results to naturally looking trees
  • (binary trees with constant-bounded edge weights)
• Check the performance of NJ when the neighbor-selection formula is computed from real 3-distances.
• Devise algorithms which use 3-distances as input.
• Does optimization of 3-diss lead to good topological accuracy (under accepted models of sequence evolution)?
  • (It is known that optimization of 2-diss doesn't.)

28
Thank You
29
Distance-Based Phylogenetic Reconstruction
• Compute distances between all taxon-pairs
• Find a tree (edge-weighted) best-describing the
distances

30
Optimization Criteria
• Known measures of closeness
• $\ell_\infty$ -
• $\ell_p$ -
• MaxDist -

(where $0/0 = 1$)
31
The Reduction
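[Diagram: a 3CNF formula $\varphi$ is mapped to an input $(\tau, \Delta)$; there is a tree T s.t. $\|\tau, \tau_T\|_\infty \le \Delta$ iff $\varphi$ is satisfiable]
If one can determine for $(\tau, \Delta)$ whether there exists a tree T s.t. $\|\tau, \tau_T\|_\infty \le \Delta$, then one can determine for every 3CNF formula $\varphi$ whether it is satisfiable.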
32
The Reduction
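Define a set of lower and upper bounds:
A1: $\tau_T(T, F) \ge 2\alpha + 2\beta$
A2: for $i = 1..n$: $\tau_T(T; x_i\bar{x}_i) \le \alpha$, $\tau_T(F; x_i\bar{x}_i) \le \alpha$
B1: for $j = 1..m$: $\tau_T(y_{j1}; l_{j2}l_{j3}) \le \alpha$, $\tau_T(y_{j2}; l_{j1}l_{j3}) \le \alpha$, $\tau_T(y_{j3}; l_{j1}l_{j2}) \le \alpha$
B2: for $j = 1..m$: $\tau_T(y_{j1}; TF) \ge \alpha$, $\tau_T(y_{j2}; TF) \ge \alpha$, $\tau_T(y_{j3}; TF) \ge \alpha$
B3: for $j = 1..m$: $\tau_T(T; y_{j2}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j2}) \le \alpha$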
33
The Reduction
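[Diagram: a 3CNF formula $\varphi$ is mapped to lower and upper bounds $\tau_l$, $\tau_u$ (figure label: $2\Delta$); there is a tree T s.t. $\tau_l \le \tau_T \le \tau_u$ iff $\varphi$ is satisfiable]
If one can determine for $(\tau, \Delta)$ whether there exists a tree T s.t. $\|\tau, \tau_T\|_\infty \le \Delta$, then one can determine for every 3CNF formula $\varphi$ whether it is satisfiable.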
34
The Reduction
• Define the set of taxa.
• Define a set of lower and upper bounds on some entries of $\tau_T$.
• $\varphi$ is satisfiable $\Leftrightarrow$ there is a tree T which satisfies all bounds
• Define ? according to the slackness required for
the proof of ?.

35
The Reduction
• Define the set of taxa
• Taxa T , F.
• A taxon for every literal ($x_i$, $\bar{x}_i$).
• 3 taxa for every clause ($y_{j1}$, $y_{j2}$, $y_{j3}$).

36
The Analysis
A1: $\tau_T(T, F) \ge 2\alpha + 2\beta$
A2: for $i = 1..n$: $\tau_T(T; x_i\bar{x}_i) \le \alpha$, $\tau_T(F; x_i\bar{x}_i) \le \alpha$
• Trees satisfying A1 and A2 imply a truth-assignment to $x_1, \ldots, x_n$.

37
The Analysis
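B1: for $j = 1..m$: $\tau_T(y_{j1}; l_{j2}l_{j3}) \le \alpha$, $\tau_T(y_{j2}; l_{j1}l_{j3}) \le \alpha$, $\tau_T(y_{j3}; l_{j1}l_{j2}) \le \alpha$
B2: for $j = 1..m$: $\tau_T(y_{j1}; TF) \ge \alpha$, $\tau_T(y_{j2}; TF) \ge \alpha$, $\tau_T(y_{j3}; TF) \ge \alpha$
B3: for $j = 1..m$: $\tau_T(T; y_{j2}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j2}) \le \alpha$
There is a tree T which satisfies all bounds $\Rightarrow$ $\varphi$ is satisfiable
• B1 and B2 imply that $y_{ja} \Rightarrow l_{jb} \lor l_{jc}$ for $\{a,b,c\} = \{1,2,3\}$.
• B3 implies that at least two of $y_{j1}$, $y_{j2}$, $y_{j3}$ are satisfied.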

38
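The Reduction $\tau(\varphi)$
A1: $\tau_T(T, F) \ge 2\alpha + 2\beta$
A2: for $i = 1..n$: $\tau_T(T; x_i\bar{x}_i) \le \alpha$, $\tau_T(F; x_i\bar{x}_i) \le \alpha$
B1: for $j = 1..m$: $\tau_T(y_{j1}; l_{j2}l_{j3}) \le \alpha$, $\tau_T(y_{j2}; l_{j1}l_{j3}) \le \alpha$, $\tau_T(y_{j3}; l_{j1}l_{j2}) \le \alpha$
B2: for $j = 1..m$: $\tau_T(y_{j1}; TF) \ge \alpha$, $\tau_T(y_{j2}; TF) \ge \alpha$, $\tau_T(y_{j3}; TF) \ge \alpha$
B3: for $j = 1..m$: $\tau_T(T; y_{j2}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j3}) \le \alpha$, $\tau_T(T; y_{j1}y_{j2}) \le \alpha$
• In our constructed tree:
• All 2-distances are in $[2\alpha, 2\alpha + 2\beta]$.
• All 3-distances are in $[\alpha, \alpha + 2\beta]$.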
• ? ?ß.

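A1: $\tau(T, F) = 2\alpha + 3\beta$
A2: for $i = 1..n$: $\tau(T; x_i\bar{x}_i) = \alpha - \beta$, $\tau(F; x_i\bar{x}_i) = \alpha - \beta$
B1: for $j = 1..m$: $\tau(y_{j1}; l_{j2}l_{j3}) = \alpha - \beta$, $\tau(y_{j2}; l_{j1}l_{j3}) = \alpha - \beta$, $\tau(y_{j3}; l_{j1}l_{j2}) = \alpha - \beta$
B2: for $j = 1..m$: $\tau(y_{j1}; TF) = \alpha + \beta$, $\tau(y_{j2}; TF) = \alpha + \beta$, $\tau(y_{j3}; TF) = \alpha + \beta$
B3: for $j = 1..m$: $\tau(T; y_{j2}y_{j3}) = \alpha - \beta$, $\tau(T; y_{j1}y_{j3}) = \alpha - \beta$, $\tau(T; y_{j1}y_{j2}) = \alpha - \beta$
Other 2-distances: $\tau(s, t) = 2\alpha + 2\beta$
Other 3-distances: $\tau(s; tu) = \alpha + 2\beta$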