Species Trees

Species Trees & Constraint Programming: recent progress and new challenges. By Patrick Prosser ... Ornithology' 119:88-108 2002. 7 trees of seabirds (A through G) ...

1
Species Trees & Constraint Programming: recent
progress and new challenges
  • By Patrick Prosser
  • Presented by Chris Unsworth

2
Outline
  • Tree of life (what's that then?)
  • Previous work (conventional and CP model)
  • What's new? (enhanced model, new problems)
  • Conclusions (what have I told you!?)
  • Future work (will this never end?)

3
Tree of life
  • A central goal of systematics: construct the tree of life
  • a tree that represents the relationships between
    all living things
  • including constraint programmers
  • The leaf nodes of the tree are species
  • The interior nodes are hypothesized species
    (extinct, where species diverged)

4
(No Transcript)
5
Not to be confused with this
6
Not to be confused with this
7
Not to be confused with this either
8
Something like this
9
(No Transcript)
10
To date, biologists have cataloged about 1.7
million species, yet estimates of the total number
of species range from 4 to 100 million.
Of the 1.7 million species identified, only about
80,000 species have been placed in the tree of
life. E. Pennisi, "Modernizing the Tree of
Life", Science 300:1692-1697, 2003
11
Properties of a Species Tree
  • We have a set of leaf nodes, each labelled with a
    species
  • the interior nodes have no labels (maybe)
  • each interior node has 2 children and one parent
    (maybe)
  • a bifurcating tree
  • Note: recently there have been requirements that
  • interior nodes have divergence dates
  • leaf nodes correspond to other trees (such as a
    leaf "cats")
  • trees might not bifurcate

12
Super Trees
  • We are given two trees, T1 and T2
  • S1 and S2 are the sets of leaves for T1 and T2
    respectively
  • remember, leaves are species!
  • S1 and S2 have a non-empty intersection
  • some species appear in both trees
  • We want to combine T1 and T2
  • form a super tree

13
superTree
combine
14
Overlap is highlighted in the trees and the
superTree
15
A simple wee example
16
Most Recent Common Ancestors (mrca)
mrca(a,c) = mrca(b,c)
mrca(a,b)
We have 3 species, a, b, and c
Species a and b are more closely related to each
other than they are to c
mrca(a,b) > mrca(a,c), mrca(a,b) > mrca(b,c),
mrca(a,c) = mrca(b,c)
(here x > y means x is further from the root than y)
The most recent common ancestor of a and b is
further from the root than the most recent
common ancestor of a and c (and b and c)
NOTE: mrca(x,y) = mrca(y,x)
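The depth comparison described above can be tried out on a toy tree. This is a minimal sketch, not code from the talk: it assumes a tree is written as nested Python tuples with species names at the leaves, and that the root has depth 0. The helper names `leaf_paths` and `mrca_depth` are made up for illustration.

```python
def leaf_paths(tree, path=()):
    """Map each leaf to the tuple of child indices that leads to it from the root."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def mrca_depth(paths, x, y):
    """Depth of mrca(x, y): the length of the common prefix of the two root-to-leaf paths."""
    depth = 0
    for step_x, step_y in zip(paths[x], paths[y]):
        if step_x != step_y:
            break
        depth += 1
    return depth


# Species a and b are more closely related to each other than to c.
tree = (("a", "b"), "c")
paths = leaf_paths(tree)
for x, y in [("a", "b"), ("a", "c"), ("b", "c")]:
    print(f"depth of mrca({x},{y}) =", mrca_depth(paths, x, y))
```

Here mrca(a,b) sits one level below the root, while mrca(a,c) and mrca(b,c) are both the root itself, and the function is symmetric in its two species arguments.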
17
Most Recent Common Ancestors (mrca)
mrca(a,c) = mrca(b,c)
mrca(a,b)
mrca(a,b) > mrca(a,c), mrca(a,b) > mrca(b,c),
mrca(a,c) = mrca(b,c)
Note: this defines that
18
Ultrametric relationship
Given 3 leaf nodes labelled a, b, and c there
are only 4 possible situations
19
(No Transcript)
20
(No Transcript)
21
That's all that there can be, for 3 leaves
22
Ultrametric relationship
Given 3 leaf nodes labelled a, b, and c there
are only 4 possible situations
We can represent this using primitive constraints,
where D[i,j] is a constrained integer variable
representing the depth in the tree of the most
recent common ancestor of the ith and jth species
23
Ultrametric constraint
Therefore the ultrametric constraint is as follows
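The formula on this slide did not survive in the transcript. As a hedged reconstruction from the surrounding slides (4 possible situations for 3 leaves, with D holding mrca depths), the constraint over three species i, j, k can be sketched as a plain Python predicate. The function name `ultrametric` and the argument names are assumptions for illustration; the real model posts this as a constraint in a solver rather than testing fixed values.

```python
def ultrametric(d_ij, d_ik, d_jk):
    """True if the three mrca depths match one of the 4 situations for 3 leaves."""
    return (
        d_ij > d_ik == d_jk        # i and j are closest: mrca(i,j) is deepest
        or d_ik > d_ij == d_jk     # i and k are closest
        or d_jk > d_ij == d_ik     # j and k are closest
        or d_ij == d_ik == d_jk    # a fan: all three share the same mrca
    )


print(ultrametric(2, 1, 1))  # a triple with i and j closest
print(ultrametric(1, 1, 1))  # a fan
print(ultrametric(2, 2, 1))  # not a tree: the two smallest depths must be equal
```

In words: of the three depths, the two smallest must be equal, which is exactly what rules out value combinations that no tree can produce.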
24
How it goes (part 1)
Conventional technology (circa 1981)
  • Take 2 species trees T1 and T2
  • Use the breakUp algorithm (Ng & Wormald 1996)
    on T1 then T2
    - This produces a set of triples and fans
  • Use the oneTree algorithm (Ng & Wormald 1996)
    - Generates a superTree or fails

This is the conventional (non-CP) approach
There are different versions of oneTree and breakUp from
Semple and Steel (I think) that treat fans
differently (ignore them)
oneTree is essentially the algorithm of Aho,
Sagiv, Szymanski and Ullman in SIAM J. Comput. 1981
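The conventional pipeline can be sketched in a few lines. This is not Ng & Wormald's breakUp or oneTree: `rooted_triples` is a naive stand-in that lists every rooted triple of a tree and ignores fans, and `build` follows the general shape of the Aho, Sagiv, Szymanski and Ullman algorithm (connect the two close species of each triple, split into connected components, recurse). The nested-tuple tree format and all function names are assumptions made for illustration.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def mrca_depth(paths, x, y):
    """Depth of mrca(x, y) as the common prefix length of the two paths."""
    depth = 0
    for step_x, step_y in zip(paths[x], paths[y]):
        if step_x != step_y:
            break
        depth += 1
    return depth


def rooted_triples(tree):
    """All triples ((x, y), z), meaning x and y are closer to each other than to z."""
    paths = leaf_paths(tree)
    triples = []
    for a, b, c in combinations(sorted(paths), 3):
        dab, dac, dbc = (mrca_depth(paths, a, b), mrca_depth(paths, a, c),
                         mrca_depth(paths, b, c))
        if dab > dac:
            triples.append(((a, b), c))
        elif dac > dab:
            triples.append(((a, c), b))
        elif dbc > dab:
            triples.append(((b, c), a))
        # otherwise the three leaves form a fan, which this sketch ignores
    return triples


def build(leaves, triples):
    """Return a tree (nested tuples) consistent with all triples, or None if none exists."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    leaf_set = set(leaves)
    relevant = [t for t in triples if {t[0][0], t[0][1], t[1]} <= leaf_set]
    parent = {x: x for x in leaves}  # union-find over the leaves

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), _ in relevant:
        parent[find(a)] = find(b)  # a and b must end up under the same child
    groups = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # everything is forced together: the triples are incompatible
    children = [build(members, relevant) for members in groups.values()]
    return None if any(child is None for child in children) else tuple(children)


t1 = (("a", "b"), "c")
t2 = (("b", "c"), "d")
triples = rooted_triples(t1) + rooted_triples(t2)
supertree = build(set(leaf_paths(t1)) | set(leaf_paths(t2)), triples)
print(supertree)
```

With two conflicting triples such as ((a,b),c) and ((a,c),b), every species lands in one component and `build` returns None, which corresponds to the "or fails" case above.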
25
How it goes (part 2)
CP approach (circa 2003)
  • Generate an n by n array of constrained integer
    variables
  • For all 0 < i < j < k < n post the ultrametric constraint
    - Yes, we have a cubic number of constraints
    - Yes, we have a quadratic number of variables
    - This gives us an ultrametric matrix
  • Use breakUp on trees T1 and T2 to produce
    triples and fans
  • Post the triples and fans as constraints,
    breaking disjunctions
  • Find a first solution
  • Convert the ultrametric matrix to an ultrametric
    tree
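The shape of the CP model can be shown without a solver. The sketch below replaces constraint propagation and search with brute-force enumeration, so it only works for a handful of species; the real model was built in a constraint toolkit. The domain 1..n-1 for each D variable, the triple format ((x, y), z), and the names `ultrametric` and `first_solution` are assumptions made for illustration, and fans are not posted.

```python
from itertools import combinations, product


def ultrametric(d_ij, d_ik, d_jk):
    """The two smallest of the three mrca depths must be equal."""
    return (d_ij > d_ik == d_jk or d_ik > d_ij == d_jk
            or d_jk > d_ij == d_ik or d_ij == d_ik == d_jk)


def first_solution(species, triples):
    """Return the first depth matrix satisfying every constraint, or None."""
    n = len(species)
    pairs = list(combinations(species, 2))          # quadratic number of variables
    for values in product(range(1, n), repeat=len(pairs)):
        D = {}
        for (x, y), value in zip(pairs, values):
            D[x, y] = D[y, x] = value               # the matrix is symmetric
        # cubic number of ultrametric constraints
        if not all(ultrametric(D[i, j], D[i, k], D[j, k])
                   for i, j, k in combinations(species, 3)):
            continue
        # each input triple ((a, b), c) fixes which disjunct must hold
        if all(D[a, b] > D[a, c] == D[b, c] for (a, b), c in triples):
            return {pair: D[pair] for pair in pairs}
    return None


species = ["a", "b", "c", "d"]
triples = [(("a", "b"), "c"), (("b", "c"), "d")]
solution = first_solution(species, triples)
print(solution)
```

For these two triples the depth of mrca(a,b) comes out largest and every pair involving d gets the smallest depth, matching the tree (((a,b),c),d).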

The algorithm for converting an ultrametric matrix to an
ultrametric tree is given by Dan Gusfield
This is the CP approach proposed by Gent,
Prosser, Smith & Wei in CP03 (a great great
paper, go read it :-))
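Turning a solved matrix back into a tree can be sketched as a recursive partition. This is not Gusfield's algorithm as published, just a simple illustration under the depth convention used for the D variables (larger value means further from the root): at each node, species whose mrca is strictly deeper than the shallowest value in the current set stay together in one child. The function name `matrix_to_tree` and the dictionary format are assumptions.

```python
from itertools import combinations


def matrix_to_tree(leaves, D):
    """Build a nested-tuple tree from an ultrametric depth matrix D keyed by species pairs."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]

    def depth(x, y):
        return D[x, y] if (x, y) in D else D[y, x]

    # The shallowest mrca among the current leaves is the current node.
    shallowest = min(depth(x, y) for x, y in combinations(leaves, 2))
    groups = []
    for x in leaves:
        for group in groups:
            # For an ultrametric matrix, "mrca deeper than this node" is an
            # equivalence relation, so comparing with one member is enough.
            if depth(group[0], x) > shallowest:
                group.append(x)
                break
        else:
            groups.append([x])
    return tuple(matrix_to_tree(group, D) for group in groups)


D = {("a", "b"): 3, ("a", "c"): 2, ("a", "d"): 1,
     ("b", "c"): 2, ("b", "d"): 1, ("c", "d"): 1}
tree = matrix_to_tree(["a", "b", "c", "d"], D)
print(tree)
```

The input must already be ultrametric; the sketch does not check this, and on a non-ultrametric matrix the grouping step would give meaningless results.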
26
A min ultrametric tree and its min ultrametric
matrix
Matrix value is the value of the most recent
common ancestor of two leaf nodes
As we go down a branch values on interior nodes
decrease
Don't worry about it :-)
27
The state of play in 2003
  • Coded up in Claire & Choco
  • more a proof of concept than a useful tool
  • small data sets only

28
Two species trees of sea birds from the CP03 paper
29
Resultant superTree On the left by oneTree and on
the right by CP model
30
What's new
2006
  • Reimplemented in Java & JChoco (so faster)
  • More robust (thanks to Pierre Flener's help)
  • Can now deal with larger trees (about 70 species)
  • Can generate all solutions up to symmetry
  • Can handle divergence dates on interior nodes
  • Reimplemented breakUp & oneTree in Java
  • All code available on the web

31
(No Transcript)
32
Bigger Trees
Attempted to reconstruct the supertree in Kennedy
& Page's "Seabird supertrees: Combining partial
estimates of procellariiform phylogeny" in The
Auk: A Quarterly Journal of Ornithology
119:88-108, 2002
  • 7 trees of seabirds (A through G)
  • Varying in size from 14 to 90 species

33
From the paper
Table shows on the diagonal the size of each tree, A through G
A table entry is the size of the combined tree
A table entry is in () if trees are incompatible
A table entry is marked separately if trees are too big for CP model
The only compatible trees are A, B, D and F
The resultant supertree has 69 species
This takes 20 seconds to produce
34
(No Transcript)
35
A lifted representation
Rather than instantiate the D variables why
not just break the disjunctions?
Now the decision variables are P[i,j,k]
And yes, we have a cubic number of P variables
36
A lifted representation
Rather than instantiate the D variables why
not just break the disjunctions?
Now the decision variables are P[i,j,k]
  • Now we can
  • Enumerate all solutions eliminating value
    symmetries
  • Allow ranges of values on interior nodes of trees
    - input and output!
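Why deciding on P rather than D removes value symmetries can be seen by counting. The sketch below is a brute-force illustration, not the lifted model itself: it assumes P[i,j,k] can be encoded as an integer 1..4 naming which of the four ultrametric situations holds, assumes a domain of 1..n-1 for each D variable, and uses made-up function names.

```python
from itertools import combinations, product


def situation(d_ij, d_ik, d_jk):
    """Which of the 4 ultrametric situations holds, or 0 if none does."""
    if d_ij > d_ik == d_jk:
        return 1
    if d_ik > d_ij == d_jk:
        return 2
    if d_jk > d_ij == d_ik:
        return 3
    if d_ij == d_ik == d_jk:
        return 4
    return 0


def count_solutions(n):
    """Count ultrametric D assignments and the distinct P signatures they induce."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    d_solutions = 0
    p_signatures = set()
    for values in product(range(1, n), repeat=len(pairs)):
        D = dict(zip(pairs, values))
        P = tuple(situation(D[i, j], D[i, k], D[j, k]) for i, j, k in triples)
        if 0 not in P:
            d_solutions += 1
            p_signatures.add(P)
    return d_solutions, len(p_signatures)


print(count_solutions(3))
```

For 3 species this gives 5 assignments of D but only 4 signatures of P: the fan is counted twice when instantiating D (all depths 1, or all depths 2), even though it is the same tree.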

37
Ranked Trees
A new problem where input trees have ancestral
divergence dates on interior nodes
A new conventional technique is the RANKED TREE
algorithm
38
Ranked Trees using lifted CP model
A new problem where input trees have ancestral
divergence dates on interior nodes
We do this in the lifted model by merely
1. reading in divergence dates for pairs of species
   and posting these as constraints on the D variables
2. then solving using the disjunction-breaking P variables
3. interior nodes retain range values
4. in addition, we can enumerate all solutions,
   eliminating value symmetries
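The four steps can be mimicked on a toy instance by brute force. This sketch is an illustration only, under several assumptions: ranks are integers from a small domain with larger values further from the root, a known divergence rank is posted by fixing the D value of a species pair, P is encoded as an integer 1..4 per species triple, and the range on each pair is taken as the span of D values seen across all assignments sharing one P signature. None of the names come from the talk.

```python
from itertools import combinations, product


def situation(d_ij, d_ik, d_jk):
    """Which of the 4 ultrametric situations holds, or 0 if none does."""
    if d_ij > d_ik == d_jk:
        return 1
    if d_ik > d_ij == d_jk:
        return 2
    if d_jk > d_ij == d_ik:
        return 3
    if d_ij == d_ik == d_jk:
        return 4
    return 0


def ranked_solutions(species, fixed, domain):
    """Group valid D assignments by P signature and record a (low, high) range per pair."""
    pairs = list(combinations(species, 2))
    triples = list(combinations(species, 3))
    ranges = {}
    for values in product(domain, repeat=len(pairs)):
        D = dict(zip(pairs, values))
        if any(D[pair] != rank for pair, rank in fixed.items()):
            continue  # step 1: posted divergence ranks must hold
        P = tuple(situation(D[i, j], D[i, k], D[j, k]) for i, j, k in triples)
        if 0 in P:
            continue  # step 2: every triple must pick one of the 4 disjuncts
        seen = ranges.setdefault(P, {pair: (D[pair], D[pair]) for pair in pairs})
        for pair in pairs:
            low, high = seen[pair]
            seen[pair] = (min(low, D[pair]), max(high, D[pair]))  # step 3
    return ranges  # step 4: one entry per solution, value symmetries removed


result = ranked_solutions(["a", "b", "c"], {("a", "b"): 4}, range(1, 5))
for signature, pair_ranges in result.items():
    print(signature, pair_ranges)
```

With mrca(a,b) fixed at rank 4, the tree in which a and b are closest keeps a range of 1..3 on the other two pairs, while the fan forces every pair to rank 4.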
39
Two trees of cats. Ranks (divergence information)
on interior nodes. Common species in boxes.
40
Two ranked cat trees on the left, and on the right
one of the ranked supertrees
NOTE: range of values 6..9 on mrca(PTE,LTI)
41
7 of the 17 solutions have ranges on interior
nodes
Without the lifted representation we get
30 solutions (some redundant)
42
Is this a 1st?
  • We think so (or at least Patrick thinks so)
  • enumerate all solutions for ranked supertrees
  • remove value symmetries

43
What next?
Reduce the size of the model.
Improve propagation of the ultrametric constraint.
Identify common features (backbone) of all supertrees.
Already underway with Neil Moore.
44
Conclusion
  • presented a new (non-conventional) way of
    addressing the supertree problem
  • constraint model has been shown to be versatile
  • enumerate all solutions removing symmetries
  • address divergence dates on interior nodes
  • again enumerate all solutions for ranked trees
  • however, model is bulky/large
  • we are working on this
  • future extensions
  • find the backbone of forest of supertrees
  • address nested taxa

45
Thanks for helping
  • Pierre Flener
  • Xavier Lorca
  • Rod Page
  • Mike Steel
  • Charles Semple
  • Chris Unsworth
  • Neil Moore
  • Christine Wu Wei
  • Barbara Smith
  • Ian Gent

46
Any questions?