Binary Encoding and Gene Rearrangement Analysis

1
Binary Encoding and Gene Rearrangement Analysis
  • Jijun Tang
  • Tianjin University
  • University of South Carolina
  • jtang_at_cse.sc.edu
  • (803) 777-8923

2
Outline
  • Background
  • Maximum Likelihood Methods for Phylogenetic
    Reconstruction
  • Maximum Likelihood Methods for Ancestral Genome
    Inference
  • Conclusions

3
Phylogenetic Reconstruction
4
Data Type
  • Sequence Data
  • DNA/RNA/Protein Sequences
  • Strings over an alphabet of 4 (DNA/RNA) or 20 (protein) characters
  • Gene-Order Data

6
Simple Rearrangements
7
Rearrangement Phylogeny
10
Median Problem
  • Goal: find M so that D_AM + D_BM + D_CM is minimized
  • NP-hard for most metric distances
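
The median objective can be made concrete with a small sketch. The slide does not name a distance, so the breakpoint distance below is an assumption chosen only because it is short to write; the function names are illustrative, genomes are taken to be single linear chromosomes with identical gene content, and chromosome ends are ignored.

```python
def adjacencies(genome):
    """Adjacencies of a signed linear gene order, independent of reading direction."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read forward is the same adjacency as (-b, -a) read backward,
        # so keep one canonical representative of the two.
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are absent from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def median_score(m, a, b, c):
    """D_AM + D_BM + D_CM for a candidate median m (assumed distance: breakpoints)."""
    return sum(breakpoint_distance(g, m) for g in (a, b, c))


# Hypothetical input genomes, not taken from the slides.
A = [1, 2, 3, 4, 5]
B = [1, -3, -2, 4, 5]
C = [1, 2, 3, -5, -4]
score_if_median_is_A = median_score(A, A, B, C)
```

Scoring a candidate is cheap; the hard part stated on the slide is searching the space of all gene orders for the M with the lowest score.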
11
Binary Encoding
12
Biased Model
  • Model of evolution
  • Duplications, insertions and deletions of
    syntenic blocks
  • Rearrangements: inversions, translocations,
    fusions, fissions
  • Binary sequences: 1 (presence) vs. 0 (absence)
  • Adjacency: Pr(1 -> 0) vs. Pr(0 -> 1)
  • Gene content: Pr(1 -> 0) vs. Pr(0 -> 1)
  • Strong bias
  • Pr(1 -> 0) >> Pr(0 -> 1) for adjacency
  • Lose an existing adjacency: Pr(1 -> 0) ≈ 1/O(n)
  • Gain a new adjacency: Pr(0 -> 1) ≈ 1/O(n^2)
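
One way to express such a bias is a two-state continuous-time Markov chain whose loss rate is larger than its gain rate. The sketch below uses the standard closed form for a two-state chain; the parameter names and the choice of a loss/gain ratio equal to n are assumptions made for illustration, not values given in the slides.

```python
import math


def transition_matrix(t, gain_rate, loss_rate):
    """P[i][j] = Pr(state j after time t | state i) for a two-state chain.

    gain_rate is the 0 -> 1 rate, loss_rate is the 1 -> 0 rate.
    """
    total = gain_rate + loss_rate
    changed = 1.0 - math.exp(-total * t)
    p01 = gain_rate / total * changed
    p10 = loss_rate / total * changed
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]


# Assumed illustration: with n genes, make losing an adjacency n times
# more likely than gaining one, matching the 1/O(n) vs. 1/O(n^2) ratio.
n = 100
P = transition_matrix(t=0.05, gain_rate=1.0, loss_rate=float(n))
```

With these rates the ratio Pr(1 -> 0) / Pr(0 -> 1) stays equal to n for every branch length t, which is the property the "strong bias" bullets describe.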

13
ML Phylogenetic Reconstruction
14
Simulated Results
15
Ancestral Inference
  • Step 1. Encode the gene orders as binary
    sequences.
  • Step 2. Set up the biased transition model.
  • Step 3. Reroot the tree at the target ancestor and
    calculate the probabilities of the character states
    for each character at the root.
  • Step 4. Build the adjacency graph and use a
    greedy heuristic to assemble adjacencies into a
    valid gene order for the target ancestor.
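
Step 1 can be sketched as follows, under stated assumptions: each genome is one linear chromosome of signed genes, an adjacency is the unordered pair of gene ends (tail "t" or head "h") that touch, chromosome ends are ignored, and gene-content characters are left out. All names are illustrative.

```python
def ends(gene):
    """(left_end, right_end) of a signed gene in reading direction."""
    tail, head = (abs(gene), "t"), (abs(gene), "h")
    return (tail, head) if gene > 0 else (head, tail)


def adjacency_set(order):
    """Unordered pairs of touching gene ends along one linear chromosome."""
    return {frozenset((ends(a)[1], ends(b)[0])) for a, b in zip(order, order[1:])}


def binary_encode(genomes):
    """Return (columns, matrix): one 0/1 character per adjacency seen in any genome."""
    sets = {name: adjacency_set(order) for name, order in genomes.items()}
    columns = sorted(set().union(*sets.values()), key=sorted)
    matrix = {name: [int(col in s) for col in columns] for name, s in sets.items()}
    return columns, matrix


# Hypothetical input, not taken from the slides.
genomes = {"G1": [1, 2, 3], "G2": [1, -2, 3]}
columns, matrix = binary_encode(genomes)
```

Each row of `matrix` is then an ordinary binary sequence, so it can be fed to sequence-style likelihood machinery column by column.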

16
Step 3: Root Tree
  • Probabilities are calculated in a bottom-up
    recursive manner, so the target ancestor is
    placed at the root to prevent information loss.

17
Step 3: Probabilities of Adjacencies
  • The likelihood of a tree given sequence data at
    the leaves can be computed (Felsenstein 1981)

[Figure: pick one tree and one site. Leaf labels W, X, Y, Z; leaf states 0, 1, 1, 0; each of the three internal nodes can take state 0 or 1.]
 
18
Step 3: Probabilities of Adjacencies
  • Posterior probabilities of the character states (0
    and 1) can be calculated according to Yang
    (Yang 1995).
  • For each root state, this is calculated by summing
    over the states of all ancestral nodes other than the root
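
The summation can be illustrated by brute force on a four-leaf tree with three internal nodes, which gives 2^3 = 8 histories in total and 4 for each root state. This is a sketch under assumptions: the node names R, U, V, the branch lengths, the rates, and the stationary root prior are made up for the example, and a real implementation would use the pruning recursion rather than enumeration.

```python
import itertools
import math

GAIN, LOSS = 1.0, 10.0   # assumed 0 -> 1 and 1 -> 0 rates


def transition(t):
    """Two-state transition probabilities over a branch of length t."""
    total = GAIN + LOSS
    changed = 1.0 - math.exp(-total * t)
    p01, p10 = GAIN / total * changed, LOSS / total * changed
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rooted tree: R -> (U, V), U -> (W, X), V -> (Y, Z).
EDGES = [("R", "U", 0.2), ("R", "V", 0.2),
         ("U", "W", 0.1), ("U", "X", 0.1),
         ("V", "Y", 0.1), ("V", "Z", 0.1)]
INTERNAL = ["R", "U", "V"]


def root_posterior(leaf_states):
    """Return ([Pr(root=0 | data), Pr(root=1 | data)], number of histories summed)."""
    prior = [LOSS / (GAIN + LOSS), GAIN / (GAIN + LOSS)]
    weight, histories = [0.0, 0.0], 0
    for assignment in itertools.product((0, 1), repeat=len(INTERNAL)):
        state = dict(leaf_states, **dict(zip(INTERNAL, assignment)))
        prob = prior[state["R"]]
        for parent, child, t in EDGES:
            prob *= transition(t)[state[parent]][state[child]]
        weight[state["R"]] += prob       # sum over all non-root ancestral states
        histories += 1
    total = weight[0] + weight[1]
    return [w / total for w in weight], histories


posterior, n_histories = root_posterior({"W": 0, "X": 1, "Y": 1, "Z": 0})
```

For an adjacency character, `posterior[1]` is the probability that the adjacency was present in the ancestor placed at the root.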

 
 
 
[Figure: 8 histories in total, split into two groups of 4 histories.]
19
Step 4: Assemble Adjacencies
  • Independent adjacencies are assembled into valid
    gene order permutations by a greedy heuristic
    proposed by Jian Ma (Ma 2007).
  • Sort the edges by weight.
  • Add the current heaviest edge to the path until a
    cycle is formed, then repeat the process until
    all vertices are traversed.
  • Remove the lightest edge in each cycle.
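
The sketch below is a simplified greedy variant, not Ma's exact procedure: instead of forming cycles and then removing their lightest edge, it scans edges from heaviest to lightest and skips any edge that reuses a gene end or would close a cycle, which leaves only linear paths. The input format, function names, and example weights are assumptions.

```python
def greedy_assemble(weighted_adjacencies):
    """weighted_adjacencies: (weight, (gene, end), (gene, end)) with end in {"t", "h"}.

    Returns a dict mapping each joined gene end to its partner end.
    """
    partner, parent = {}, {}

    def find(g):                       # union-find over genes, used to detect cycles
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for _, a, b in sorted(weighted_adjacencies, reverse=True):
        ra, rb = find(a[0]), find(b[0])
        if a in partner or b in partner or ra == rb:
            continue                   # end already used, or edge would close a cycle
        partner[a], partner[b] = b, a
        parent[ra] = rb
    return partner


def read_chromosomes(partner, genes):
    """Walk each path from a free gene end and emit signed gene orders."""
    chromosomes, seen = [], set()
    for g in genes:
        for end in ("t", "h"):
            if g in seen or (g, end) in partner:
                continue
            order, cur = [], (g, end)
            while cur is not None:
                gene, entry = cur
                seen.add(gene)
                order.append(gene if entry == "t" else -gene)   # entering at tail = forward
                cur = partner.get((gene, "h" if entry == "t" else "t"))
            chromosomes.append(order)
    return chromosomes


# Hypothetical weights; the last two edges conflict with heavier ones.
EXAMPLE = [(0.9, (1, "h"), (4, "h")), (0.8, (4, "t"), (3, "h")),
           (0.7, (3, "t"), (5, "t")), (0.6, (5, "h"), (2, "t")),
           (0.3, (1, "h"), (2, "t")), (0.2, (2, "h"), (1, "t"))]
result = read_chromosomes(greedy_assemble(EXAMPLE), [1, 2, 3, 4, 5])
```

In this setting the weights would be the posterior probabilities of the adjacencies at the root, so the heaviest edges are the adjacencies most likely to have existed in the ancestor.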

Example gene order: (1 -4 -3 5 2)
20
Simulation Result
  • The biased transition model and the rerooting
    procedure are both necessary

21
Results-2
  • PMAG was compared with InferCarsPro (Ma 2011) and
    GRAPPA_DCJ (Xu 2008)

22
Tests on Large Scale Dataset
        Genome  Gene    Tree Diameter
                        1n   2n   3n   4n
PMAG    20      10000   (no values in transcript)
23
Conclusions
  • ML on binary encoding is more accurate and
    thousands of times faster than the methods it was
    compared against
  • Binary encoding reduces the complexity and allows
    us to use existing methods for sequence data
  • The biased transition model and the rerooting
    procedure are very useful
  • Future work
  • Extend PMAG to handle a more general model of
    evolution, including gene insertions, deletions,
    and duplications
  • Missing adjacencies?

24
Thank You!