Binary Encoding and Gene Rearrangement Analysis

1
Binary Encoding and Gene Rearrangement Analysis

Jijun Tang
Tianjin University
University of South Carolina
jtang_at_cse.sc.edu
(803) 777-8923

2
Outline

Background
Maximum Likelihood Methods for Phylogenetic Reconstruction
Maximum Likelihood Methods for Ancestral Genome Inference
Conclusions

3
Phylogenetic Reconstruction
4
Data Type

Sequence Data
DNA/RNA/Protein Sequences
Strings over an alphabet of 4 (nucleotides) or 20 (amino acids) characters
Gene-Order Data

6
Simple Rearrangements
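Inversions, one of the rearrangements listed in the model of evolution, are the simplest to illustrate on a single linear chromosome written as a signed permutation: the segment is reversed and every gene in it changes orientation. The Python sketch below is illustrative only; the function name `invert` and the 0-based, inclusive segment bounds are assumptions, not part of the talk.

```python
from typing import List


def invert(genome: List[int], i: int, j: int) -> List[int]:
    """Return a copy of a signed gene order with positions i..j (inclusive) inverted.

    An inversion reverses the order of the genes in the segment and flips
    the sign (orientation) of each of them.
    """
    if not 0 <= i <= j < len(genome):
        raise ValueError("segment bounds out of range")
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


print(invert([1, 2, 3, 4, 5], 1, 3))  # [1, -4, -3, -2, 5]
```

Applying the same inversion twice restores the original order, which is why an inversion can be undone by a single event.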
7
Rearrangement Phylogeny
10
Median Problem
Goal: find M such that $D_{AM} + D_{BM} + D_{CM}$ is minimized; NP-hard for most metric distances
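To make the objective concrete, a brute-force sketch for tiny genomes: enumerate every signed permutation M of n genes and keep the one minimizing $D_{AM} + D_{BM} + D_{CM}$. Breakpoint distance on framed signed permutations is used here only as an example metric (an assumption; the slide does not fix the distance), and the helper names are hypothetical. The n! * 2^n search is exactly what NP-hardness rules out for realistic n.

```python
from itertools import permutations, product
from typing import List, Sequence, Set, Tuple


def adjacencies(genome: Sequence[int]) -> Set[Tuple[int, int]]:
    """Signed adjacencies of a linear genome framed by 0 and n + 1.

    (a, b) and (-b, -a) describe the same adjacency read in opposite
    directions, so both forms are stored.
    """
    framed = [0, *genome, len(genome) + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of adjacencies of genome a that are missing from genome b."""
    framed = [0, *a, len(a) + 1]
    adj_b = adjacencies(b)
    return sum((x, y) not in adj_b for x, y in zip(framed, framed[1:]))


def brute_force_median(a, b, c) -> Tuple[List[int], int]:
    """Try all n! * 2**n signed permutations M; keep the first minimizer."""
    n = len(a)
    best, best_score = None, None
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            m = [s * g for s, g in zip(signs, perm)]
            score = sum(breakpoint_distance(g, m) for g in (a, b, c))
            if best_score is None or score < best_score:
                best, best_score = m, score
    return best, best_score


A = [1, 2, 3, 4, 5]
B = [1, -3, -2, 4, 5]   # A with genes 2..3 inverted
C = [1, 2, 3, -5, -4]   # A with genes 4..5 inverted
print(brute_force_median(A, B, C))  # ([1, 2, 3, 4, 5], 4)
```

Here the pairwise distances are 2, 2 and 3, so by the triangle inequality no median can score below 3.5, and A itself attains the optimum of 4.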
11
Binary Encoding
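A minimal sketch of the adjacency part of the binary encoding (Step 1 of the ancestral inference): each gene has a tail and a head extremity, each adjacency is an unordered pair of extremities, and every adjacency seen in any genome becomes one binary column (1 = present, 0 = absent). The extremity labels `"t"`/`"h"`, the helper names, and the decision to omit telomere adjacencies and gene-content columns are assumptions made for brevity.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]        # (gene, "t") = tail, (gene, "h") = head
Adjacency = FrozenSet[Extremity]   # unordered pair of extremities


def ends(g: int) -> Tuple[Extremity, Extremity]:
    """Left and right extremity of signed gene g when read left to right."""
    gene = abs(g)
    return ((gene, "t"), (gene, "h")) if g > 0 else ((gene, "h"), (gene, "t"))


def genome_adjacencies(genome: List[List[int]]) -> Set[Adjacency]:
    """Adjacencies between consecutive genes on each linear chromosome."""
    adj: Set[Adjacency] = set()
    for chrom in genome:
        for left, right in zip(chrom, chrom[1:]):
            adj.add(frozenset({ends(left)[1], ends(right)[0]}))
    return adj


def encode(genomes: Dict[str, List[List[int]]]):
    """One binary column per adjacency found in any genome: 1 = present, 0 = absent."""
    per_genome = {name: genome_adjacencies(g) for name, g in genomes.items()}
    columns = sorted(set().union(*per_genome.values()), key=sorted)
    rows = {name: "".join("1" if col in adj else "0" for col in columns)
            for name, adj in per_genome.items()}
    return columns, rows


columns, rows = encode({"A": [[1, 2, 3]], "B": [[1, -2, 3]], "C": [[1, 2], [3]]})
print(rows)  # {'A': '0110', 'B': '1001', 'C': '0100'}
```

Once genomes are rows of 0/1 characters, standard likelihood machinery for binary sequence data can be applied column by column.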
12
Biased Model

Model of evolution
Duplications, insertions and deletions of syntenic blocks
Rearrangements: inversions, translocations, fusions, fissions
Binary sequences: 1 (presence) vs. 0 (absence)
Adjacency: Pr(1 → 0) vs. Pr(0 → 1)
Gene content: Pr(1 → 0) vs. Pr(0 → 1)
Strong bias
Pr(1 → 0) ≫ Pr(0 → 1) for adjacencies
Lose an existing adjacency: Pr(1 → 0) ≈ 1/O(n)
Gain a new adjacency: Pr(0 → 1) ≈ 1/O(n^2)
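One way to realize this bias is a two-state continuous-time Markov chain with separate loss (1 → 0) and gain (0 → 1) rates. The sketch assumes illustrative rates of 1/n and 1/n^2, mirroring the orders of magnitude on the slide (an inversion breaks only 2 of the roughly n adjacencies, but a specific absent adjacency is just one of O(n^2) possible extremity pairs); they are not fitted parameters from the talk, and `transition_matrix` is a hypothetical helper name.

```python
import math
from typing import List


def transition_matrix(t: float, loss: float, gain: float) -> List[List[float]]:
    """P(t) for a two-state chain: state 0 = absent, 1 = present.

    loss is the rate of 1 -> 0 and gain the rate of 0 -> 1; entry [i][j]
    is Pr(state j after time t | state i at time 0).
    """
    total = loss + gain
    decay = 1.0 - math.exp(-total * t)
    p01 = gain / total * decay
    p10 = loss / total * decay
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]


n = 100                                   # number of genes (illustrative)
P = transition_matrix(t=10.0, loss=1.0 / n, gain=1.0 / n ** 2)
print(f"Pr(1 -> 0) = {P[1][0]:.4f}, Pr(0 -> 1) = {P[0][1]:.6f}")
```

For every t the ratio Pr(1 → 0) / Pr(0 → 1) equals loss / gain, here n, which is the strong bias in favor of losing adjacencies.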

13
ML Phylogenetic Reconstruction
14
Simulated Results
15
Ancestral Inference

Step 1. Encode gene orders into binary sequences.
Step 2. Set up the biased transition model.
Step 3. Reroot the tree at the target ancestor, and calculate the probabilities of the character states for each character at the root.
Step 4. Build the adjacency graph and use a greedy heuristic to assemble the adjacencies into a valid gene order for the target ancestor.

16
Step 3 Root Tree

Probabilities are calculated in a bottom-up, recursive manner, and only the root combines information from all leaves, so the target ancestor is placed at the root to prevent information loss.
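Rerooting only changes the orientation of the tree, not its branch lengths. A minimal sketch, assuming an unrooted tree stored as a neighbour map: a depth-first walk from the target ancestor orients every edge away from it. The helpers `from_edges` and `reroot`, the node names and the branch lengths are hypothetical.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, Dict[str, float]]  # unrooted: node -> {neighbour: branch length}


def from_edges(edges: List[Tuple[str, str, float]]) -> Tree:
    tree: Tree = {}
    for a, b, length in edges:
        tree.setdefault(a, {})[b] = length
        tree.setdefault(b, {})[a] = length
    return tree


def reroot(tree: Tree, root: str) -> Dict[str, List[Tuple[str, float]]]:
    """Orient every edge away from `root`; returns node -> [(child, length)]."""
    children: Dict[str, List[Tuple[str, float]]] = {root: []}
    stack = [root]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in children:        # first visit, so nbr hangs below node
                children[nbr] = []
                children[node].append((nbr, length))
                stack.append(nbr)
    return children


# Hypothetical unrooted tree; u and v are ancestors, u is the target.
edges = [("W", "u", 0.1), ("X", "u", 0.2), ("u", "v", 0.3),
         ("Y", "v", 0.1), ("Z", "v", 0.2)]
rooted = reroot(from_edges(edges), "u")
print(rooted["u"])  # [('W', 0.1), ('X', 0.2), ('v', 0.3)]
```

A post-order pass over `rooted` finishes at u, so the probabilities computed there draw on every leaf rather than only on the subtree below u.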

17
Step 3 Probabilities of Adjacencies

The likelihood of a tree given the sequence data at its leaves can be computed with Felsenstein's pruning algorithm (Felsenstein1981).

[Figure: pick one tree and one site; the leaves W, X, Y, Z are in states 0, 1, 1, 0, and each internal node may be in state 0 or 1]
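A minimal sketch of Felsenstein's pruning algorithm for one binary site under a two-state loss/gain model, with the root weighted by the stationary frequencies. Only the leaf states 0, 1, 1, 0 come from the slide; the topology ((W, X)u, (Y, Z)v)r, the branch length 0.1 and the rates `loss=1.0`, `gain=0.5` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Children = Dict[str, List[Tuple[str, float]]]   # node -> [(child, branch length)]


def transition_matrix(t: float, loss: float, gain: float) -> List[List[float]]:
    """Two-state P(t); loss = rate of 1 -> 0, gain = rate of 0 -> 1."""
    total = loss + gain
    decay = 1.0 - math.exp(-total * t)
    p01, p10 = gain / total * decay, loss / total * decay
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def partial_likelihood(node: str, children: Children, leaf_state: Dict[str, int],
                       loss: float, gain: float) -> List[float]:
    """L[s] = Pr(leaf states below node | node is in state s), computed bottom-up."""
    if node in leaf_state:
        return [1.0 if s == leaf_state[node] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, t in children[node]:
        P = transition_matrix(t, loss, gain)
        child_L = partial_likelihood(child, children, leaf_state, loss, gain)
        for s in (0, 1):
            result[s] *= sum(P[s][c] * child_L[c] for c in (0, 1))
    return result


def site_likelihood(root: str, children: Children, leaf_state: Dict[str, int],
                    loss: float, gain: float) -> float:
    """Pr(one site's leaf states), weighting the root by the stationary distribution."""
    total = loss + gain
    prior = [loss / total, gain / total]
    L = partial_likelihood(root, children, leaf_state, loss, gain)
    return sum(prior[s] * L[s] for s in (0, 1))


# Leaf states 0, 1, 1, 0 at W, X, Y, Z, as in the figure; the topology
# ((W, X)u, (Y, Z)v)r, the branch length 0.1 and the rates are assumptions.
children = {"r": [("u", 0.1), ("v", 0.1)],
            "u": [("W", 0.1), ("X", 0.1)],
            "v": [("Y", 0.1), ("Z", 0.1)]}
leaf_state = {"W": 0, "X": 1, "Y": 1, "Z": 0}
print(site_likelihood("r", children, leaf_state, loss=1.0, gain=0.5))
```

With sites treated as independent, the likelihood of a whole binary matrix is the product of these per-site values.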

18
Step 3 Probabilities of Adjacencies

The posterior probabilities of the character states (0 and 1) at the root can be calculated following Yang (Yang1995), by summing over the states of all ancestral nodes other than the root.

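[Figure: all 8 histories of the internal-node states, split into 4 histories with the root in state 0 and 4 histories with the root in state 1]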
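The split into 8 histories can be reproduced by direct enumeration. A sketch on the assumed topology ((W, X)u, (Y, Z)v)r with a common branch length t and hypothetical rates: sum the joint probability of the 4 histories with the root in state 0 and of the 4 with the root in state 1, then normalize by the total over all 8.

```python
import math
from itertools import product
from typing import Dict, List


def transition_matrix(t: float, loss: float, gain: float) -> List[List[float]]:
    """Two-state P(t); loss = rate of 1 -> 0, gain = rate of 0 -> 1."""
    total = loss + gain
    decay = 1.0 - math.exp(-total * t)
    p01, p10 = gain / total * decay, loss / total * decay
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def root_posterior(leaf_state: Dict[str, int], t: float,
                   loss: float, gain: float) -> List[float]:
    """Pr(root = 0), Pr(root = 1) given the leaves of ((W, X)u, (Y, Z)v)r."""
    P = transition_matrix(t, loss, gain)
    prior = [loss / (loss + gain), gain / (loss + gain)]   # stationary frequencies
    w, x, y, z = (leaf_state[k] for k in "WXYZ")
    joint = [0.0, 0.0]   # joint[s]: sum over the 4 histories with root state s
    for r, u, v in product((0, 1), repeat=3):              # all 8 histories
        joint[r] += (prior[r] * P[r][u] * P[r][v]
                     * P[u][w] * P[u][x] * P[v][y] * P[v][z])
    likelihood = joint[0] + joint[1]
    return [j / likelihood for j in joint]


print(root_posterior({"W": 0, "X": 1, "Y": 1, "Z": 0}, t=0.1, loss=1.0, gain=0.5))
```

Enumeration costs 2^k terms for k internal nodes; Felsenstein's pruning gives the same root posteriors in linear time by normalizing prior[s] times the root's partial likelihood L[s].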
19
Step 4 Assemble Adjacencies

Independent adjacencies are assembled into valid gene-order permutations by a greedy heuristic proposed by Jian Ma (Ma2007).
Sort the edges of the adjacency graph by weight (the adjacency probabilities from Step 3).
Add the current heaviest edge to the path until a cycle is formed, then repeat the process until all vertices are traversed.
Remove the lightest edge in each cycle.

(1 -4 -3 5 2)
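A simplified reading of the greedy assembly, not Ma's (Ma2007) implementation: vertices are gene extremities, each extremity keeps at most one adjacency (taken heaviest first), and every resulting cycle is cut at its lightest adjacency before the linear chromosomes are read off. The function names, extremity labels and edge weights are hypothetical; the weights were chosen so the output matches the example order on the slide.

```python
from typing import Dict, FrozenSet, List, Tuple

Ext = Tuple[int, str]  # gene extremity: (gene, "t") = tail, (gene, "h") = head


def other_end(e: Ext) -> Ext:
    gene, side = e
    return (gene, "h" if side == "t" else "t")


def assemble(n: int, edges: List[Tuple[Ext, Ext, float]]) -> List[List[int]]:
    # 1. Take adjacencies from heaviest to lightest; an extremity joins at most one.
    partner: Dict[Ext, Ext] = {}
    weight: Dict[FrozenSet[Ext], float] = {}
    for a, b, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if a != b and a not in partner and b not in partner:
            partner[a], partner[b] = b, a
            weight[frozenset((a, b))] = w

    # 2. Walk from each gene; if the walk returns to its start it is a cycle,
    #    so drop the lightest adjacency on it.
    seen = set()
    for gene in range(1, n + 1):
        if gene in seen:
            continue
        start, e, cycle = (gene, "t"), (gene, "t"), []
        while True:
            seen.add(e[0])
            x = other_end(e)
            if x not in partner:           # reached a telomere: a path, not a cycle
                cycle = []
                break
            e = partner[x]
            cycle.append(frozenset((x, e)))
            if e == start:
                break
        if cycle:
            a, b = tuple(min(cycle, key=weight.__getitem__))
            del partner[a], partner[b]

    # 3. Read each linear chromosome from one of its telomeres.
    chromosomes, used = [], set()
    for gene in range(1, n + 1):
        for start in ((gene, "t"), (gene, "h")):
            if start in partner or gene in used:
                continue
            chrom, e = [], start
            while True:
                used.add(e[0])
                chrom.append(e[0] if e[1] == "t" else -e[0])  # entered at tail: +gene
                x = other_end(e)
                if x not in partner:
                    break
                e = partner[x]
            chromosomes.append(chrom)
    return chromosomes


# Hypothetical weights: the 0.6 edge loses out and the 0.4 edge closes a cycle.
edges = [((3, "t"), (5, "t"), 0.95), ((1, "h"), (4, "h"), 0.9),
         ((4, "t"), (3, "h"), 0.8), ((5, "h"), (2, "t"), 0.7),
         ((1, "h"), (3, "h"), 0.6), ((2, "h"), (1, "t"), 0.4)]
print(assemble(5, edges))  # [[1, -4, -3, 5, 2]]
```

Restricting each extremity to one adjacency is what guarantees the output is a set of valid signed gene orders; the cycle cut turns any circular component into a linear chromosome.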
20
Simulation Result

The biased transition model and the rerooting procedure are both necessary.

21
Results-2

PMAG was compared with InferCarsPro (Ma2011) and GRAPPA_DCJ (Xu2008).

22
Tests on Large Scale Dataset
Method  Genomes  Genes  Tree diameters
PMAG    20       10000  1n, 2n, 3n, 4n
23
Conclusions

ML on binary encoding is more accurate and thousands of times faster than other methods
Binary encoding reduces the complexity and allows us to use existing methods for sequence data
The biased transition model and the rerooting procedure are very useful
Future work
Extend PMAG to handle a more general model of
evolution, including gene indel and duplication
Missing Adjacencies?

24
Thank You!
