Transcript and Presenter's Notes

Title: Parallel Bayesian Phylogenetic Inference


1
Parallel Bayesian Phylogenetic Inference
  • Xizhou Feng
  • Directed by Dr. Duncan Buell
  • Department of Computer Science and Engineering
  • University of South Carolina, Columbia
  • 2002-10-11

2
Topics
  • Background
  • Bayesian phylogenetic inference and MCMC
  • Serial implementation
  • Parallel implementation
  • Numerical result
  • Future research

3
Background
  • Darwin: species are related through a history of
    common descent, and
  • this history can be organized as a tree structure
    (a phylogeny).
  • Modern species are placed on the leaf nodes.
  • Ancestral species are placed on the internal nodes.
  • The time since divergence is described by the
    branch lengths.
  • A clade is a group of organisms whose members
    share homologous features derived from a common
    ancestor.

4
Phylogenetic tree
[Figure: an example phylogenetic tree, labeling internal nodes (ancestral
species), leaf nodes (current species), a clade, branches, and branch lengths]
5
Applications
  • A phylogenetic tree is
  • fundamental to understanding evolution and diversity
  • a principle for organizing biological data
  • central to comparisons between organisms
  • Practical examples
  • Resolving the dispute over bacteria-to-human gene
    transfers (Nature, 2001)
  • Tracing routes of infectious disease transmission
  • Identifying new pathogens
  • Mapping the phylogenetic distribution of biochemical
    pathways

6
Use DNA data for phylogenetic inference
7
Objectives of phylogenetic inference
[Figure: the input (aligned sequences) and output (a phylogenetic tree) of the
inference]
  • The major objectives are to
  • estimate the tree topology,
  • estimate the branch lengths, and
  • describe the credibility of the result.

8
Phylogenetic inference methods
  • Algorithmic methods
  • Define a sequence of specific steps that leads
    to the determination of a tree
  • e.g. UPGMA (unweighted pair group method with
    arithmetic mean) and Neighbor-Joining
  • Optimality criterion-based methods
  • 1) Define a criterion; 2) search for the tree with the
    best value
  • Maximum parsimony (minimize the total tree
    length)
  • Maximum likelihood (maximize the likelihood)
  • Maximum posterior probability (find the tree with the
    highest probability of being the true tree)

9
Commonly used phylogeny methods
[Diagram: overview relating data sets (distance matrix, character data), method
classes (algorithmic methods such as UPGMA, Neighbor-Joining and
Fitch-Margoliash; optimality criterion-based methods such as Maximum Parsimony,
Maximum Likelihood, and statistically supported Bayesian methods), and search
strategies (exact search: exhaustive, branch and bound; greedy search: stepwise
addition, global rearrangement, star decomposition; divide and conquer: DCM,
HGT, Quartet; optimization methods; stochastic search: GA, SA, MCMC)]
10
Aspects of phylogenetic methods
  • Accuracy
  • Is the constructed tree the true tree?
  • If not, what percentage of the edges are wrong?
  • Complexity
  • Neighbor-Joining: O(n³)
  • Maximum parsimony: provably NP-hard
  • Maximum likelihood: conjectured NP-hard
  • Scalability
  • Good for small trees; how does it behave on large trees?
  • Robustness
  • If the model, the assumptions, or the data are not
    exactly correct, how is the result affected?
  • Convergence rate
  • How long a sequence is needed to recover the true
    tree?
  • Statistical support
  • With what probability is the computed tree the
    true tree?

11
The computational challenge
  • >1.7 million known species
  • The number of trees increases exponentially as new
    species are added
  • The complexity of evolution
  • Data collection and computational systems

Computing the tree of life. Source:
http://www.npaci.edu/envision/v16.3/hillis.html
12
Topics
  • Background
  • Bayesian phylogenetic inference and MCMC
  • Serial implementation
  • Parallel implementation
  • Numerical result
  • Future research

13
Bayesian Inference-1
  • Both the observed data and the parameters of the model
    are treated as random variables
  • Set up the joint distribution P(D, T, β, θ), where T is the
    tree topology, β the branch lengths, and θ the parameters
    of the evolutionary model
  • When the data D are known, Bayes' theorem gives

    P(T, β, θ | D) = P(D | T, β, θ) · P(T, β, θ) / P(D)

    where P(D | T, β, θ) is the likelihood, P(T, β, θ) the prior
    probability, P(T, β, θ | D) the posterior probability, and
    P(D) the unconditional probability of the data.
14
Bayesian Inference-2
  • Having the posterior probability distribution,
    we can compute the marginal probability of T as
    P(T | D) = ∫∫ P(T, β, θ | D) dβ dθ
  • P(T | D) can be interpreted as the probability that
    the tree is correct
  • We need to do at least two things
  • Approximate the posterior probability
    distribution
  • Evaluate the integral for P(T | D)
  • Both can be done via the Markov chain Monte Carlo
    method (see the sketch below)
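In practice, P(T | D) is approximated by the relative frequency of topology T
among the MCMC samples after burn-in. A minimal sketch, assuming a hypothetical
representation of topologies as canonical Newick strings (not the author's
implementation):

```python
from collections import Counter

def posterior_tree_probs(sampled_topologies):
    """Approximate P(T|D) by the relative frequency of each topology
    among the MCMC samples (after discarding burn-in)."""
    counts = Counter(sampled_topologies)
    total = len(sampled_topologies)
    return {topology: n / total for topology, n in counts.items()}

# Usage: any hashable canonical form of a topology works, e.g. Newick strings.
samples = ["((A,B),(C,D));", "((A,C),(B,D));", "((A,B),(C,D));"]
print(posterior_tree_probs(samples))
```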

15
Markov chain Monte Carlo (MCMC)
  • The basic idea of MCMC is
  • To construct a Markov chain such that
  • the parameters form the state space, and
  • the stationary distribution is the posterior
    probability distribution of the parameters
  • Simulate the chain
  • Treat the realization as a sample from the
    posterior probability distribution
  • MCMC sampling can be viewed as a continuing
    stochastic search

16
Markov chain
  • A Markov chain is a sequence of random variables
    X0, X1, X2, … whose transition kernel T(Xt, Xt+1)
    depends only on the value of Xt (t ≥ 0).
  • Stationary distribution: π(x) = Σx′ π(x′) T(x′, x)
    is invariant
  • Ergodic property: pn(x) converges to π(x) as n → ∞
  • A homogeneous Markov chain is ergodic if
    minx,x′ T(x, x′)/π(x′) > 0

17
Metropolis-Hastings algorithm-1
  • The Metropolis algorithm (Metropolis et al., 1953) is the
    cornerstone of all MCMC methods
  • Hastings proposed a generalized version in 1970
  • The key point is how to define the acceptance
    probability
  • Metropolis (symmetric proposal):
    α = min(1, π(x′)/π(x))
  • Hastings:
    α = min(1, [π(x′) q(x | x′)] / [π(x) q(x′ | x)])

The proposal probability q(x′ | x) can take any form such that
the resulting chain still converges to the target distribution.
18
Metropolis-Hastings algorithm-2
  • Initialize x0, set t = 0
  • Repeat
  • Sample x′ from the proposal T(xt, x′)
  • Draw U ~ uniform(0, 1)
  • Update: if U < α(xt, x′), set xt+1 = x′, else xt+1 = xt
    (see the sketch below)
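A minimal Python sketch of this loop, assuming a symmetric proposal so the
Hastings correction cancels; the standard-normal target here is only a stand-in
for the posterior:

```python
import math
import random

def metropolis_hastings(log_target, propose, x0, n_steps):
    """Generic Metropolis sampler with a symmetric proposal:
    accept x' with probability min(1, target(x')/target(x))."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x)                              # sample x' from T(x_t, x')
        log_r = log_target(x_new) - log_target(x)
        if random.random() < math.exp(min(0.0, log_r)):  # accept with prob min(1, r)
            x = x_new
        samples.append(x)
    return samples

# Usage: sample from a standard normal with a random-walk proposal.
samples = metropolis_hastings(lambda x: -0.5 * x * x,
                              lambda x: x + random.uniform(-1.0, 1.0),
                              x0=0.0, n_steps=10000)
```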

19
Problems of the MH algorithm and improvements
  • Problems
  • The mixing rate is slow when
  • small steps → slow movement
  • large steps → low acceptance
  • The chain gets stuck at local optima
  • The dimension of the state space may vary
  • Improvements
  • Metropolis-coupled MCMC
  • Multipoint MCMC
  • Population-based MCMC
  • Reversible-jump MCMC

20
Metropolis-coupled MCMC (Geyer 1991)
  • Run several MCMC chains with different
    distributions πi(x) (i = 1..m) in parallel
  • π1(x) is used for sampling
  • πi(x) (i = 2..m) are used to improve mixing
  • For example, πi(x) = π(x)^(1/(1 + λ(i−1)))
  • After each iteration, attempt to swap the states
    of two chains using a Metropolis-Hastings step
    with acceptance probability
    r = min(1, [πi(xj) πj(xi)] / [πi(xi) πj(xj)])
    (sketched below)
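A minimal sketch of the swap step, assuming each chain is represented as a dict
holding its state, its unheated log posterior, and its heat β (the exponent
1/(1 + λ(i−1)) above); this representation is an assumption for illustration:

```python
import math
import random

def try_swap(chains, i, j):
    """Metropolis-coupled MCMC swap: with tempered targets pi_i = pi^beta_i,
    the acceptance ratio [pi_i(x_j) pi_j(x_i)] / [pi_i(x_i) pi_j(x_j)]
    reduces to exp((beta_i - beta_j) * (log pi(x_j) - log pi(x_i)))."""
    log_p_i, beta_i = chains[i]["log_post"], chains[i]["beta"]
    log_p_j, beta_j = chains[j]["log_post"], chains[j]["beta"]
    log_r = (beta_i - beta_j) * (log_p_j - log_p_i)
    if random.random() < math.exp(min(0.0, log_r)):
        # exchange the states (the heats stay attached to the chain indexes)
        chains[i]["state"], chains[j]["state"] = chains[j]["state"], chains[i]["state"]
        chains[i]["log_post"], chains[j]["log_post"] = log_p_j, log_p_i
```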

21
Illustration of Metropolis-coupled MCMC
[Figure: four chains with target distributions π1(x), π2(x), π3(x), π4(x) at
heat levels T1 = 0, T2 = 2, T3 = 4, T4 = 8]
Metropolis-coupled MCMC is also called parallel tempering.
22
Multiple-Try Metropolis (Liu et al., 2000)
23
Population-based MCMC
  • Metropolis-coupled MCMC uses only minimal
    interaction between multiple chains; why not more
    active interaction?
  • Evolutionary Monte Carlo (Liang et al., 2000)
  • Combines a genetic algorithm with MCMC
  • Used to simulate protein folding
  • Conjugate Gradient Monte Carlo (Liu et al., 2000)
  • Uses local optimization for adaptation
  • An improvement of ADS (Adaptive Direction
    Sampling)

24
Topics
  • Background
  • Bayesian phylogenetic inference and MCMC
  • Serial implementation
  • Choose the evolutionary model
  • Compute the likelihood
  • Design proposal mechanisms
  • Parallel implementation
  • Numerical result
  • Future research

25
DNA substitution rate matrix
[Figure: 4x4 DNA substitution rate matrix, distinguishing purines from
pyrimidines and transitions from transversions]
  • Considering the inference of unrooted trees and the
    computational complications, some simplified
    models are used (see next slide)

26
GTR-family of substitution models
  • GTR: the general time-reversible model, corresponding
    to a symmetric rate matrix.
[Diagram: the GTR family of nested substitution models (GTR, SYM, TN93, HKY85,
F84, K3ST, K2P, F81, JC69), obtained by successively restricting GTR: assuming
equal base frequencies, three substitution types (1 transversion, 2 transition),
two substitution types (transition vs. transversion), or a single substitution
type. JC69 is the simplest model, with a single substitution type and equal
base frequencies.]
27
More complex models
  • Substitution rates vary across sites
  • Invariable-sites models, gamma-distributed rates
  • Correlation between the rates at adjacent sites
  • Codon models: a 61 x 61 instantaneous rate matrix
  • Secondary-structure models

28
Computing the conditional probability along a branch
  • Given the substitution rate matrix Q, how do we compute
    p(b | a, t), the probability that a is substituted by b
    after time t?
  • P(t) = exp(Qt), evaluated through the eigenvalue
    decomposition of Q (see the sketch below).
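A sketch of computing P(t) = exp(Qt) by eigendecomposition. The Jukes-Cantor
(JC69) rate matrix is used here only as a placeholder; any GTR-family matrix
works the same way:

```python
import numpy as np

def transition_probs(Q, t):
    """Compute P(t) = exp(Qt) via the eigendecomposition Q = U diag(lam) U^-1,
    so that exp(Qt) = U diag(exp(lam * t)) U^-1."""
    lam, U = np.linalg.eig(Q)
    return (U @ np.diag(np.exp(lam * t)) @ np.linalg.inv(U)).real

# JC69 rate matrix, rows/columns ordered A, C, G, T
mu = 1.0
Q = mu / 4.0 * (np.ones((4, 4)) - 4.0 * np.eye(4))
P = transition_probs(Q, t=0.5)   # P[a, b] = p(b | a, t)
print(P.sum(axis=1))             # each row sums to 1
```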
29
Likelihood of a phylogenetic tree for one site
When the ancestral states x4 and x5 are known, the site likelihood is the
product of the transition probabilities along each branch, times the prior
probability of the root state. When x4 and x5 are unknown, we sum this product
over all possible assignments of x4 and x5.
30
Likelihood calculation (Felsenstein 1981)
  • Given a rooted tree with n leaf nodes (species),
    each leaf node represented by a sequence xi of length N,
    the likelihood of the rooted tree is
    L = P(x1, …, xn | T) = Πu P(x1(u), …, xn(u) | T),
    the product over sites u of the per-site likelihoods
    (assuming sites evolve independently).

31
Likelihood calculation-2
  • Felsenstein's algorithm for likelihood
    calculation (1981)
  • Initialization
  • Set k = 2n − 1 (the root)
  • Recursion: compute P(Lk | a) for all states a as follows
  • If k is a leaf node
  • set P(Lk | a) = 1 if a equals the observed character at k,
  • set P(Lk | a) = 0 otherwise.
  • If k is not a leaf node
  • compute P(Li | a), P(Lj | a) for all a at its children
    nodes i, j,
  • and set P(Lk | a) = [Σb p(b | a, ti) P(Li | b)] ·
    [Σc p(c | a, tj) P(Lj | c)]
  • Termination: the likelihood at site u is
    L(u) = Σa P(L2n−1 | a) qa, with qa the equilibrium
    frequency of state a

Note: algorithm modified from Durbin et al. (1998); a sketch of the
recursion follows below.
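A minimal one-site sketch of the pruning recursion. The dict-based node
representation is an assumption for illustration, and P is a function returning
the transition matrix for a branch length t, as in the previous sketch:

```python
import numpy as np

def conditional_likelihood(node, P, n_states=4):
    """Return the vector L_k[a] = P(leaf data below node k | state a at k),
    following Felsenstein's pruning recursion."""
    if node["children"] is None:                 # leaf: indicator on observed state
        L = np.zeros(n_states)
        L[node["state"]] = 1.0
        return L
    L = np.ones(n_states)
    for child, t in node["children"]:            # children with their branch lengths
        Lc = conditional_likelihood(child, P, n_states)
        L *= P(t) @ Lc                           # sum_b p(b|a,t) * L_child[b]
    return L

def site_likelihood(root, P, pi):
    """Site likelihood: sum over root states weighted by equilibrium frequencies pi."""
    return float(np.dot(pi, conditional_likelihood(root, P)))
```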
32
Likelihood calculation-3
[Figure: the table of conditional likelihoods, with one row per taxon
(Taxa-1 … Taxa-n) and one column per site (Site 1 … Site M)]
  • The likelihood calculation requires filling an
    N x M x S x R table
  • N: number of sequences; M: number of sites
  • S: number of character states; R: number of
    rate categories

33
Local update of the likelihood
  • If we change the topology and branch lengths
    of the tree only locally, we only need to refresh the
    table at the affected nodes.
  • In the following example, only the nodes marked in
    red need to recompute their conditional likelihoods.

[Figure: original tree and proposed tree, with the affected nodes highlighted]
34
Proposal mechanism for trees
  • Stochastic nearest-neighbor interchange (NNI)
  • Larget et al. (1999); Huelsenbeck (2000)

35
Proposal mechanisms for parameters
Larget et al. (1999)
  • Independent parameters
  • e.g. the transition/transversion ratio κ
  • A set of parameters constrained to sum to a
    constant
  • e.g. the base frequency distribution
  • Draw a sample from a Dirichlet distribution
    (see the sketch below)
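A sketch of the constrained-parameter proposal: perturb the current base
frequencies by drawing from a Dirichlet centred on them. The concentration
alpha is a tuning constant and an assumption here:

```python
import numpy as np

def propose_frequencies(freqs, alpha=300.0, rng=np.random.default_rng()):
    """Propose new base frequencies that still sum to 1 by sampling from a
    Dirichlet distribution centred on the current values; a larger alpha
    gives a smaller proposal step."""
    return rng.dirichlet(alpha * np.asarray(freqs))

new_freqs = propose_frequencies([0.25, 0.25, 0.25, 0.25])
print(new_freqs, new_freqs.sum())   # sums to 1
```

Since this proposal is not symmetric, a full Metropolis-Hastings step would also
include the ratio of the two Dirichlet proposal densities, omitted in the sketch.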

36
Bayesian phylogenetic inference
[Diagram: overview of Bayesian phylogenetic inference. The DNA data and the
evolutionary model determine the likelihood; combined with the prior probability
on phylogenetic trees, this defines the posterior probability. Starting from an
initial tree, MCMC repeatedly proposes changes and produces a sequence of
samples that approximates the posterior distribution, from which the inference
is drawn.]
37
Topics
  • Background
  • Bayesian phylogenetic inference and MCMC
  • Serial implementation
  • Parallel implementation
  • Challenges of serial computation
  • Difficulties
  • MCMC is a serial algorithm
  • Multiple chains need to be synchronized
  • Choosing an appropriate grid topology
  • Synchronizing using random numbers
  • Numerical result
  • Future research

38
Computational challenge
  • Computing the global likelihood needs O(NMRS²)
    multiplications
  • Locally updating the topology and branch lengths needs
    O(MRS² log N)
  • Updating a model parameter needs O(NMRS²)
  • A local update needs all the required data in memory
  • Given N = 1000 species, each sequence with M = 5000
    sites, R = 5 rate categories, and the DNA nucleotide
    model (S = 4)
  • Running 5 chains, each 100 million generations long,
  • needs about 400 days (assuming 1% global updates and 99%
    local updates)
  • and about N·M·R·S·L × 2 × 2 × 8 bytes ≈ 32 gigabytes of memory
  • So until more advanced algorithms are
    developed, parallel computation is the direct
    solution.
  • Using 32 processors with 1 gigabyte of memory each, we can
    compute this problem in about 2 weeks

39
Characteristics of good parallel algorithms
  • Balancing the workload
  • Concurrency: identification, management, and granularity
  • Reducing communication
  • Communication-to-computation ratio
  • Frequency, volume, balance
  • Reducing extra work
  • Computing the assignment
  • Redundant work

40
Single-chain MCMC algorithm
41
Multiple-chain MCMC algorithm
[Flowchart: four chains run side by side. Each chain i generates an initial
state Si(0) at t = 0, then repeatedly proposes and updates its state Si(t).
After each iteration two chains are chosen to swap; R and U are computed, and
if U < R the states of the two selected chains are swapped. Then every chain
advances t = t + 1.]
42
Task Concurrency
  • In one likelihood evaluation, the task can be
    divided into n_chain x n_site x n_rate concurrent
    tasks

43
Task decomposition and assignment
[Figure: mapping of the concurrent tasks onto the processors' virtual topology]
  • We organize the processor pool into an nx x ny x
    nz 3D grid
  • Each processor handles n_chain/ny chains, n_site/nx sites
    per chain, and n_rate/nz rate categories per site. (The
    current implementation doesn't consider the rate
    dimension, so the topology is a 2D grid.)
  • Choose nx, ny, and nz such that each processor
    has an equal load (see the sketch below)
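A sketch of building the 2D processor grid, using Python/mpi4py purely for
illustration (the original implementation presumably uses C and MPI directly);
the value of nx and the row/column layout are assumptions:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
nx = 4                                   # assumed number of columns (site blocks)
ny = comm.Get_size() // nx               # assumes the process count is a multiple of nx

# 2D Cartesian virtual topology: rows hold chains, columns hold sequence blocks
cart = comm.Create_cart(dims=[ny, nx], periods=[False, False], reorder=True)
row_comm = cart.Sub([False, True])       # communicator along a row (same chain, different sites)
col_comm = cart.Sub([True, False])       # communicator along a column (different chains)

my_row, my_col = cart.Get_coords(cart.Get_rank())
```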

44
Parallel configuration modes
  • Chain-level parallel configuration (Mode I)
  • Each chain runs on one row
  • Number of chains = number of rows
  • Site-level parallel configuration (Mode II)
  • All chains run on the same row
  • Different subsequences run on different columns
  • Number of subsequences = number of columns
  • Hybrid parallel configuration (Mode III)
  • Combines the first two configurations
  • Several chains may run on the same row, and
  • the whole sequence is also divided into segments

45
Orchestration
  • The data are distributed at the chain level and the
    subsequence level, so spatial locality is always
    preserved. At the same time, each process only
    needs 1/(nx × ny) of the original memory.
  • At each step, each processor needs one collective
    communication along its row and at most 2 (or 1)
    direct communications if it belongs to one of the
    chains selected for swapping.
  • Processors in the same row are easily
    synchronized because they use the same random
    numbers.

46
Communication between different rows
  • Processors in different rows need a trick to
    synchronize without communication.
  • The trick is to use two random number generators:
  • the first one generates the proposals;
    processors in different rows use different seeds
    for this generator.
  • The second one synchronizes the processors in different
    rows and uses the same seed everywhere, so every
    processor knows what is going on in the whole
    grid (see the sketch below).
  • To reduce the message size, the selected
    processors swap in two steps:
  • swap the current likelihood and temperature;
  • compute the acceptance ratio r and draw a random
    number U; if U < r, swap the tree and the
    parameters, else continue to the next step.
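A sketch of the two-generator trick: one stream is seeded per row for proposals,
the other has a shared seed so every processor independently makes the same
global decisions (which chains to swap, the swap uniform draw) without extra
messages. The seeds and the row index are assumptions for illustration:

```python
import random

def make_generators(my_row, base_seed=12345):
    """Rand_1 drives proposals and differs between rows; Rand_2 uses the same
    seed everywhere, so all processors draw the same 'global' random numbers
    and stay synchronized without communication."""
    rand_proposal = random.Random(base_seed + my_row)   # Rand_1: per-row stream
    rand_global = random.Random(base_seed)              # Rand_2: shared stream
    return rand_proposal, rand_global

rand1, rand2 = make_generators(my_row=2)
i, j = rand2.sample(range(4), 2)    # every processor picks the same pair of chains
u = rand2.random()                  # and the same uniform draw for the swap decision
```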

47
Symmetric parallel algorithm-1
  • 1. Build the grid topology
  • 2. Initialize
  • Set seeds for the random number generators Rand_1 (for
    proposals) and Rand_2 (for synchronization)
  • For each chain
  • t = 0
  • Build a random tree T0
  • Choose initial parameters θ0
  • Set the initial state to x0
  • 3. While (t < max_generation) do
  • 3.1 In-chain update
  • For each chain
  • Choose the proposal type using Rand_2
  • Generate the proposal x′
  • Evaluate the log likelihood locally
  • Collect the sum of the local log likelihoods
  • For each chain
  • Compute r using the Hastings acceptance ratio
  • Draw a sample u from the uniform distribution using
    Rand_1
  • If u < r, set x = x′ (see the sketch below)
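The key collective step is summing the local log likelihoods along a row. A
sketch with mpi4py, using row_comm from the earlier grid sketch;
local_loglik_new/old are placeholders for the values computed on this
processor's site block, and the prior and Hastings terms are omitted:

```python
import math
from mpi4py import MPI

def in_chain_accept(row_comm, local_loglik_new, local_loglik_old, rand_proposal):
    """Sum the per-processor log likelihoods over the row, then let every
    processor in the row make the same accept/reject decision (they share
    the Rand_1 stream, so the uniform draw is identical)."""
    new_total = row_comm.allreduce(local_loglik_new, op=MPI.SUM)
    old_total = row_comm.allreduce(local_loglik_old, op=MPI.SUM)
    log_r = new_total - old_total          # plus prior and Hastings terms in the full algorithm
    u = rand_proposal.random()             # same Rand_1 value on every processor in the row
    return u < math.exp(min(0.0, log_r))
```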

48
Symmetric parallel algorithm-2
  • 3.2 Inter-chain swap
  • Draw two chain indexes I and J using Rand_2
  • If I = J, go to 3.3
  • Map I, J to the processor rows row_I and
    row_J
  • If row_I = row_J
  • Compute the acceptance ratio r
  • Draw u from the uniform distribution using Rand_2
  • If u < r, swap the state x and the temperature of
    chain I and chain J
  • Go to 3.3
  • If row_I != row_J
  • Swap the log likelihood and temperature of chain I
    and chain J between the processors
    on rows row_I and row_J
  • Compute the acceptance ratio r
  • Draw u from the uniform distribution using
    Rand_2
  • If u < r, swap the state x of chain I and chain J
    between the processors
    on rows row_I and row_J
  • 3.3 t = t + 1
  • 3.4 If t mod sample_cycle = 0, output the state x

49
Load Balancing
  • For simplicity, from now on we consider only the 2D grid
    (x-y).
  • Along the x-axis, the processors are synchronized
    since they evaluate the same topology, the same
    branch lengths, and the same local update step.
  • But along the y-axis, different chains propose
    different topologies, so their local likelihood
    evaluations take different amounts of time and
    imbalance occurs.
  • In the parameter update step we need a global
    likelihood evaluation, so balancing is not a problem there.

50
Asymmetric parallel algorithm
  • The symmetric algorithm assumes each processor
    has the same performance and deals with the
    imbalance between different rows by waiting and
    synchronization.
  • An improvement is to use an asymmetric algorithm.
    The basic idea is:
  • Introduce one processor as a mediator
  • Each row sends its state to the mediator after it
    finishes the in-chain update
  • The mediator keeps the current states of the
    different rows and decides which two chains
    should swap
  • The selected chains get their new states from the
    mediator
  • The asymmetric algorithm solves the imbalance
    problem, but assumes that different chains don't
    need to synchronize.

51
Topics
  • Background
  • Bayesian phylogenetic inference and MCMC
  • Serial implementation
  • Parallel implementation
  • Numerical result
  • Future research

52
The likelihood evolution of a single chain
53
The likelihood evolution of 4 chains
54
Numerical results: Case 1 run time
Dataset: 7 x 2439; number of chains = 4; generations = 10000;
parallel Mode I
55
Numerical results: Case 1 speedup
Dataset: 7 x 2439; number of chains = 4; generations = 10000;
parallel Mode I
56
Speedup obtained by Mode II
Note: uses the same dataset as the previous case,
but parallelized only at the sequence level.
57
Comparison of two parallel modes
58
Conclusion
  • Bayesian phylogenetic inference
  • Using MCMC to sample the posterior probability distribution
  • Efficient parallel strategies
  • Stochastic algorithms vs. NP-hard problems

59
Future research directions
  • Improved parallel strategies
  • The connection between Bayesian and bootstrap methods
  • Performance of different Bayesian methods
  • Alternative MCMC methods
  • MCMC for other computational biology problems
  • The power of stochastic algorithms
  • Supertree construction