GRAPPA: Large-scale whole genome phylogenies based upon gene order evolution - PowerPoint PPT Presentation

Loading...

PPT – GRAPPA: Large-scale whole genome phylogenies based upon gene order evolution PowerPoint presentation | free to download - id: c344e-ZDc1Z



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

GRAPPA: Large-scale whole genome phylogenies based upon gene order evolution

Description:

GRAPPA: Large-scale whole genome phylogenies based upon gene order evolution ... Center for Computational Biology and Bioinformatics. Whole-Genome Phylogenetics. A. B ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 31
Provided by: tandyw
Learn more at: http://www.cs.utexas.edu
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: GRAPPA: Large-scale whole genome phylogenies based upon gene order evolution


1
GRAPPA Large-scale whole genome phylogenies
based upon gene order evolution
  • Tandy Warnow, UT-Austin
  • Department of Computer Sciences
  • Institute for Cellular and Molecular Biology
  • Program in Evolution, Ecology, and Behavior
  • Center for Computational Biology and
    Bioinformatics

2
Whole-Genome Phylogenetics
3
Genomes As Signed Permutations
1 5 3 4 -2 -6or6 2 -4 3 5 1 etc.
4
Genomes Evolve by Rearrangements
1 2 3 4 5 6 7 8 9 10
  • Inversion (Reversal)
  • Transposition
  • Inverted Transposition

5
Genome Rearrangement Has A Huge State Space
  • DNA sequences 4 states per site
  • Signed circular genomes with n genes
    states, 1
    site
  • Circular genomes (1 site)
  • with 37 genes
    states
  • with 120 genes
    states

6
Why use gene orders?
  • Rare genomic changes huge state space and
    relative infrequency of events (compared to site
    substitutions) could make the inference of deep
    evolution easier, or more accurate.
  • Our research shows this is true, but accurate
    analysis of gene order data is computationally
    very intensive!

7
Phylogeny reconstruction from gene orders
  • Distance-based reconstruction estimate pairwise
    distances, and apply methods like
    Neighbor-Joining or Weighbor
  • Maximum Parsimony find tree with the minimum
    length (inversions, transpositions, or other edit
    distances)
  • Maximum Likelihood find tree and parameters of
    evolution most likely to generate the observed
    data

8
This talk
  • Distance-based methods we show how to estimate
    genomic distances appropriately for the
    Generalized Nadeau-Taylor model
  • Parsimony-style methods we can find very good
    solutions to NP-hard problems (inversion and
    breakpoint phylogeny) quite quickly
  • Validation of these approaches on real and
    simulated data

9
Distance-based methods
10
Genomic Distance Estimators
  • Standard
  • Breakpoint distance
  • (Minimum) Inversion distance
  • Our estimators We attempt to estimate the actual
    number of events (the true evolutionary
    distance)
  • EDE Moret et al, ISMB01
  • Approx-IEBP Wang and Warnow, STOC01
  • Exact-IEBP Wang, WABI01

11
Breakpoint Distance
  • Breakpoint distance5

1 2 3 4 5 6 7 8 9 10
1 3 2 4 5 9 6 7 8 10
12
Minimum Inversion Distance
  • Inversion distance3

1 2 3 4 5 6 7 8 9 10
1 2 3 8 7 6 5 4 9 10
1 8 3 2 7 6 5 4 9 10
1 8 3 7 2 6 5 4 9 10
13
Measured Distance vs. Actual Number of Events
Breakpoint Distance
Inversion Distance
120 genes, inversion-only evolution
14
Generalized Nadeau-Taylor Model
  • Three types of events
  • Inversions
  • Transpositions
  • Inverted Transpositions
  • Events of the same type are equiprobable
  • Probability of the three types have fixed ratio
  • Inv Trp Inv.Trp (1-a-b)ab

15
Estimating True Evolutionary Distances for Genomes
  • Given fixed probabilities for each type of
    event, we estimate the expected breakpoint
    distance after k random events, or the expected
    inversion distance after k random events.
    Inverting these functions gives us a better
    estimate of true evolutionary distances.
  • Approx-IEBP Wang and Warnow 2001
  • Exact-IEBP Wang 2001
  • EDE Moret et al, ISMB 2001 (inversion based)

16
Estimating True Evolutionary Distances for
Genomes (cont.)
  • Estimating the expected Inversion distance
  • EDE Moret, Wang, Warnow, Wyman 2001
  • Closed-form formula based upon an empirical
    estimation of the expected inversion distance
    after k random events (based upon 120 genes and
    inversion only, but robust to errors in the
    model) .
  • Polynomial time, fastest of the three.

17
Absolute Difference
  • 120 genes
  • Inversion only evolution
  • (Similar relative
  • performance under
  • other models)

18
Accuracy of Neighbor Joining Using Distance
Estimators
  • 120 genes
  • All three event types equiprobable
  • 10, 20, 40, 80, and 160 genomes
  • Similar relative
  • performance under
  • other models

19
Summary of Distance-based Reconstruction Methods
  • Statistically-based estimation of genomic
    distances improves NJ analyses.
  • Our IEBP estimators assume knowledge of the
    probabilities of each type of event, but are
    robust to model violations.
  • EDE is based upon an inversion-only evolutionary
    model, but is robust.
  • Best performing method Weighbor(EDE) second
    best is NJ(EDE) both are robust to model
    violations.
  • Worst performing is NJ(BP).
  • Accuracy is very good, except when very close to
    saturation.

20
Maximum Parsimony on Rearranged Genomes (MPRG)
  • The leaves are rearranged genomes.
  • Find the tree that minimizes the total number of
    rearrangement events

21
Optimization problems for gene order phylogeny
  • Breakpoint phylogeny find the phylogeny which
    minimizes the total number of breakpoints
    (NP-hard, even to find the median of three
    genomes)
  • Inversion phylogeny find the phylogeny which
    minimizes the sum of inversion distances on the
    edges (NP-hard, even to find the median of three
    genomes)

22
Inversion and Breakpoint phylogenies
  • When the data are close to saturated, even
    Weighbor(EDE) analyses are insufficiently
    accurate. In these cases, our initial
    investigations suggest that the inversion and
    breakpoint phylogeny approaches may be superior.
  • Problem finding the best trees is enormously
    hard, since even the point estimation problem
    is hard (worse than estimating branch lengths in
    ML).

Local optimum
MP score
Global optimum
Phylogenetic trees
23
GRAPPA (Genome Rearrangement Analysis under
Parsimony and other Phylogenetic Algorithms)
  • http//www.cs.unm.edu/moret/GRAPPA/
  • Heuristics for NP-hard optimization problems
  • Fast polynomial time distance-based methods
  • Contributors U. New Mexico,U. Texas at Austin,
    Universitá di Bologna, Italy
  • Freely available in source code at this site.
  • Project leader Bernard Moret (UNM)
    (moret_at_cs.unm.edu)

24
Benchmark gene order dataset Campanulaceae
  • 12 genomes 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems breakpoint and
    inversion phylogenies (techniques score every
    tree)
  • 1997 BPAnalysis (Blanchette and Sankoff) 200
    years (est.)

25
Benchmark gene order dataset Campanulaceae
  • 12 genomes 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems breakpoint and
    inversion phylogenies (techniques score every
    tree)
  • 1997 BPAnalysis (Blanchette and Sankoff) 200
    years (est.)
  • 2000 Using GRAPPA v1.1 on the 512-processor Los
    Lobos Supercluster machine 2 minutes
    (200,000-fold speedup per processor)

26
Benchmark gene order dataset Campanulaceae
  • 12 genomes 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems breakpoint and
    inversion phylogenies (techniques score every
    tree)
  • 1997 BPAnalysis (Blanchette and Sankoff) 200
    years (est.)
  • 2000 Using GRAPPA v1.1 on the 512-processor Los
    Lobos Supercluster machine 2 minutes
    (200,000-fold speedup per processor)
  • 2003 Using latest version of GRAPPA 2 minutes
    on a single processor (1-billion-fold speedup per
    processor)

27
Summary
  • Weighbor(EDE) and NJ(EDE) are highly accurate
    polynomial time distance-based reconstructions,
    except when datasets are close to saturated
  • GRAPPA (inversion phylogeny or breakpoint
    phylogeny) produces highly accurate estimates of
    trees, and even of ancestral gene orders,
    acceptably fast

28
DCM-boosting MP and ML
  • Idea it may be faster to run a computationally
    expensive method on a few overlapping subproblems
    of somewhat smaller size
  • Challenge how to pick the best decomposition?

29
Addressing the accuracy/time issues
Disk-Covering Methods
DCM1 decomposition lots of small diameter
subproblems. (Used for NJ.)
DCM2 decomposition Very few subproblems, each
somewhat smaller. (Used for MP or ML.)
30
DCM-boosting Speeding up MP/ML heuristics
Fake study
Performance of hill-climbing heuristic
MP score of best trees
Desired Performance
Time
31
The DCM2 technique for speeding up MP/ML searches
32
GRAPPA (Genome Rearrangement Analysis under
Parsimony and other Phylogenetic Algorithms)
  • http//www.cs.unm.edu/moret/GRAPPA/
  • Heuristics for NP-hard optimization problems
  • Fast polynomial time distance-based methods
  • Contributors U. New Mexico,U. Texas at Austin,
    Universitá di Bologna, Italy
  • Freely available in source code
  • Project leader Bernard Moret (UNM)
    (moret_at_cs.unm.edu)

33
Limitations and ongoing research
  • Current methods limited to single chromosomes
    with equal gene content (or very small amounts of
    deletions and duplications) -- we are working on
    developing reliable techniques for genomes with
    unequal gene content

34
Acknowledgements
  • Funding
  • The David and Lucile Packard Foundation, and
  • The National Science Foundation.
  • Collaborators
  • Bernard Moret (UNM), David Bader (UNM), Bob
    Jansen (UT), Linda Raubeson (CWU)
  • Students Li-San Wang (now postdoc of Junhyong
    Kim at Penn), Jijun Tang (UNM), and others at UNM

35
Phylolab, U. Texas
Please visit us at http//www.cs.utexas.edu/users/
phylo/
About PowerShow.com