RNAsim/CRIMSON Algorithm Benchmark Suite - PowerPoint PPT Presentation

Description:

U Penn: Junhyong Kim, Sampath Kannan, Susan Davidson, Steve Fisher, Sheng Guo ... Display trees with Walrus 3D Viewer. Cyberinfrastructure for Phylogenetic Research ... – PowerPoint PPT presentation

Slides: 24
Provided by: tri5290
Learn more at: http://www.phylo.org

Transcript and Presenter's Notes


1
RNAsim/CRIMSON Algorithm Benchmark Suite
  • U Penn: Junhyong Kim, Sampath Kannan, Susan
    Davidson, Steve Fisher, Sheng Guo
  • U Texas: David Hillis, Lauren Meyers, Tracey
    Heath, Derrick Zwickl
  • NC State: Spencer Muse
  • Florida State: Mark Holder
  • Yale: Paul Turner

2
Goal: Develop validated datasets of sufficient
complexity and scale to realistically benchmark
the latest tree algorithms
3
Benchmark Infrastructure
Model Characterization
Simulators
Character Evolution Simulators
Taxon Sampling
Database
Tree Topology Simulators
Data Subset with Associated Subtree
  • Others
  • Tree/Char Combined
  • Experimental Evolution
  • Virtual Cell
  • etc

Model Sampling
Format Translators
RNAsim
CRIMSON
PAUP, etc
4
Benchmark Scheme
  • Generate a very large dataset (> 10^6 positions)
    over a very large tree (> 10^6 taxa) using various
    models of evolution
  • Store the data in a database
  • Retrieve subsets of the data by various sampling
    schemes
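
The generate, store, retrieve scheme above can be sketched at toy scale. This is a minimal illustration only, not the CRIMSON schema: the table names (`node`, `sequence`), the column names, and the 4-leaf tree are all hypothetical, and an in-memory SQLite database stands in for the shared database.

```python
import random
import sqlite3

def build_demo_db():
    """Create an in-memory database holding a tiny tree and its leaf sequences."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per tree node, one row per leaf sequence.
    db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, is_leaf INTEGER)")
    db.execute("CREATE TABLE sequence (leaf_id INTEGER, replicate INTEGER, residues TEXT)")
    # Made-up tree: root 0 -> internal nodes 1, 2 -> leaves 3..6.
    nodes = [(0, None, 0), (1, 0, 0), (2, 0, 0),
             (3, 1, 1), (4, 1, 1), (5, 2, 1), (6, 2, 1)]
    db.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
    seqs = [(3, 1, "ACGU"), (4, 1, "ACGA"), (5, 1, "UCGA"), (6, 1, "UCGG")]
    db.executemany("INSERT INTO sequence VALUES (?, ?, ?)", seqs)
    return db

def sample_leaves(db, k, replicate, rng):
    """Retrieve the sequences of k uniformly sampled leaves for one data replicate."""
    leaf_ids = [row[0] for row in
                db.execute("SELECT id FROM node WHERE is_leaf = 1 ORDER BY id")]
    chosen = sorted(rng.sample(leaf_ids, k))
    marks = ",".join("?" * len(chosen))  # one placeholder per chosen leaf
    rows = db.execute(
        "SELECT leaf_id, residues FROM sequence "
        f"WHERE replicate = ? AND leaf_id IN ({marks}) ORDER BY leaf_id",
        [replicate, *chosen],
    ).fetchall()
    return dict(rows)

db = build_demo_db()
subset = sample_leaves(db, 2, 1, random.Random(0))
```

Uniform random sampling is only one of the possible sampling schemes; others would replace the `rng.sample` call.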

5
  • RNA macro-evolution simulation (Sheng Guo, Lisan
    Wang)
  • Incorporates secondary structure constraints and
    indels, using a simulator based on edit
    mutations. A set of edit operators is
    implemented (for example, stem edit), each of
    which operates on the evolving strings with a
    characteristic wait time. The ancestral molecule
    is based on a known rRNA gene with a putative
    secondary structure. Evolution of the secondary
    structure is tracked.
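
The idea of edit operators that each fire after a characteristic wait time can be sketched as a continuous-time event simulation. This is a simplified illustration, not RNAsim itself: only the three base-level operators are included (the stem operators and all structure constraints are omitted), the rates in `OPERATORS` are invented, and exponentially distributed wait times are an assumption.

```python
import random

BASES = "ACGU"

def change_base(seq, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    new = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def insert_base(seq, rng):
    """Insert a random base at a random position."""
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(BASES) + seq[i:]

def delete_base(seq, rng):
    """Delete one random base, never emptying the sequence."""
    if len(seq) <= 1:
        return seq
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]

# Hypothetical rates (events per unit time); mean wait time of an operator is 1/rate.
OPERATORS = [(change_base, 1.0), (insert_base, 0.1), (delete_base, 0.1)]

def evolve(seq, duration, rng):
    """Apply edit operators along one branch of length `duration`."""
    funcs = [f for f, _ in OPERATORS]
    rates = [r for _, r in OPERATORS]
    total = sum(rates)
    events = 0
    t = rng.expovariate(total)  # time of the first event
    while t < duration:
        op = rng.choices(funcs, weights=rates)[0]  # pick operator by relative rate
        seq = op(seq, rng)
        events += 1
        t += rng.expovariate(total)
    return seq, events

descendant, n_events = evolve("GGGAAACCC", 5.0, random.Random(1))
```

Drawing one wait time from the summed rate and then choosing an operator in proportion to its rate is equivalent to racing independent exponential clocks, one per operator.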

[Figure: edit operators applied between ancestor (anc) and descendant (desc): delete stem pair, change base, initiate new stem, insert base, delete base, add stem pair]
6
Fixation probability as a function of fitness
Parameters: Ne = effective population size; μ =
neutral mutation rate; s = fitness change
Mutation classes: Neutral; Advantageous (s > 0) /
Deleterious (s < 0); Compensatory Mutation
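
The slide does not state which fixation formula is used. As one common choice, the sketch below uses Kimura's diffusion approximation for a single new mutant in a haploid population of effective size Ne, P = (1 - e^(-2s)) / (1 - e^(-2 Ne s)), which tends to 1/Ne as s goes to 0. The formula choice, the function name, and the overflow guard are assumptions made for illustration.

```python
import math

def fixation_probability(s, ne):
    """Kimura-style fixation probability of one new mutant (haploid, size ne).

    s  -- fitness change of the mutant (s > 0 advantageous, s < 0 deleterious)
    ne -- effective population size
    """
    if abs(s) < 1e-12:
        return 1.0 / ne  # neutral limit
    exponent = -2.0 * ne * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: avoid overflow, probability is ~0
    # expm1(x) = e^x - 1; the two sign flips cancel in the ratio.
    return math.expm1(-2.0 * s) / math.expm1(exponent)

neutral = fixation_probability(0.0, 1000)
advantageous = fixation_probability(0.01, 1000)
deleterious = fixation_probability(-0.01, 1000)
```

With Ne = 1000 the advantageous mutant fixes with probability close to 2s, while the deleterious one fixes far less often than the neutral 1/Ne.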
7
One-step mutation ensemble of a RNA
8
Weaker Selection
9
Calibration on Empirical Data
Simulated RNA
100 Eukaryotic ssRNA
10
Example Pairwise Similarity of 1000 locally
optimal ML trees (MDS plot)
Empirical Data
RNAsim
ROSE
SeqGen
11
CPU Time to reach local optimum (PAUP ML, TBR)
12
1 Million Leaves (Tracey Heath: Birth-Death Model
with variable rates)
20 Data Replicate Partitions Simulated and Stored at SDSC
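
A birth-death tree simulator can be sketched in a few lines. This version is deliberately simpler than the model named on the slide: it uses constant birth and death rates rather than variable rates, it keeps extinct lineages in the parent map instead of pruning them, and the function name and parameters are illustrative.

```python
import random

def simulate_birth_death(n_leaves, birth, death, rng):
    """Grow a tree forward in time until n_leaves lineages are alive.

    Returns (parent, alive, t): a child -> parent map, the ids of the
    living lineages, and the elapsed time. Restarts if everything dies out.
    """
    while True:
        parent = {0: None}
        alive = [0]
        next_id = 1
        t = 0.0
        while 0 < len(alive) < n_leaves:
            # Wait for the next event among all living lineages.
            t += rng.expovariate(len(alive) * (birth + death))
            node = alive.pop(rng.randrange(len(alive)))
            if rng.random() < birth / (birth + death):
                # Speciation: the lineage is replaced by two daughters.
                for _ in range(2):
                    parent[next_id] = node
                    alive.append(next_id)
                    next_id += 1
            # Extinction: the lineage is simply not put back into `alive`.
        if alive:
            return parent, alive, t

parent, leaves, elapsed = simulate_birth_death(50, 1.0, 0.2, random.Random(3))
```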
13
Crimson (Stephen Fisher, Susan Davidson, Junhyong
Kim)
  • Facilitates the extraction of sub-trees from very
    large phylogenetic trees.
  • Trees loaded into a shared database (Oracle or
    MySQL)
  • Extensive tree sampling options
  • Save query output to NEXUS or PHYLIP files
  • Include PAUP commands in query output files
  • Comprehensive graphical dialogs
  • Command line interface allowing Python-like
    scripting
  • Display trees with Walrus 3D Viewer

14
Query Options
  • Species Selection
    • Select All
    • Random Selection
    • Select By Temporal Depth
      • Same number of samples per sub-tree
      • Weight sampling of sub-trees by number of leaves
    • Select By Species Level
      • Same number of samples per sub-tree
      • Weight sampling of sub-trees by number of leaves
    • Manual Selection
  • Sequence Selection
    • Select All
    • Random Selection
    • Manual Selection
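
The two sub-tree sampling modes listed above can be contrasted with a short sketch. It is one possible reading of the option names, not Crimson's implementation: sub-trees are assumed to be given already as a mapping from a sub-tree label to its leaves, and the function names are hypothetical.

```python
import random

def sample_equal(subtrees, per_subtree, rng):
    """Same number of samples per sub-tree (fewer if a sub-tree is too small)."""
    picked = []
    for leaves in subtrees.values():
        k = min(per_subtree, len(leaves))
        picked.extend(rng.sample(leaves, k))
    return picked

def sample_weighted(subtrees, total, rng):
    """Draw `total` leaves, choosing each sub-tree in proportion to its leaf count."""
    remaining = {name: list(leaves) for name, leaves in subtrees.items()}
    picked = []
    for _ in range(total):
        names = [name for name, leaves in remaining.items() if leaves]
        if not names:
            break  # every leaf has already been taken
        weights = [len(remaining[name]) for name in names]
        name = rng.choices(names, weights=weights)[0]
        leaves = remaining[name]
        picked.append(leaves.pop(rng.randrange(len(leaves))))
    return picked

subtrees = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2"]}
equal = sample_equal(subtrees, 2, random.Random(0))
weighted = sample_weighted(subtrees, 3, random.Random(0))
```

Equal sampling over-represents small sub-trees relative to their size; weighting by leaf count keeps the sample proportional to the full tree.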

15
Depth Threshold Distribution
16
Crimson Interface
17
Current Benchmarking Effort
  • Sample 1
    • 10 leaves per sampled tree
    • Repeat taxon sampling 40 times per replicate data
      partition
  • Sample 2
    • 100 leaves per sampled tree
    • Repeat taxon sampling 30 times per replicate data
      partition
  • Sample 3
    • 1,000 leaves per sampled tree
    • Repeat taxon sampling 20 times per replicate data
      partition
  • Sample 4
    • 10,000 leaves per sampled tree
    • Repeat taxon sampling 10 times per replicate data
      partition
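
The size of this sampling plan can be tallied directly. The sketch assumes that every sample size is applied to each of the 20 data replicate partitions mentioned on the 1-million-leaf slide; the variable names are illustrative.

```python
REPLICATE_PARTITIONS = 20  # data replicate partitions stored for the large tree

# leaves per sampled tree -> taxon-sampling repeats per replicate partition
PLAN = {10: 40, 100: 30, 1_000: 20, 10_000: 10}

def count_datasets(plan, partitions):
    """Number of sampled datasets per sample size across all partitions."""
    return {size: repeats * partitions for size, repeats in plan.items()}

counts = count_datasets(PLAN, REPLICATE_PARTITIONS)
total = sum(counts.values())
print(counts)  # {10: 800, 100: 600, 1000: 400, 10000: 200}
print(total)   # 2000
```

Under that assumption, each algorithm is run on 2,000 sampled datasets.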

18
Algorithms (to be expanded)
  • Neighbor Joining (PAUP)
    • breakties=random
  • Parsimony (PAUP)
    • set maxtrees=200 increase=no
    • hsearch timelimit=432000
    • contree all / strict=no majrule=yes
  • RAxML (raxmlHPC)
    • -f a
    • -# 100
    • -m GTRGAMMA

19
Benchmarking Stats
20
Distribution of False Positive Edges
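
A false positive edge is usually taken to mean an internal edge (bipartition of the leaves) that appears in the inferred tree but not in the true tree. The sketch below counts such edges under that assumed definition, with trees written as nested tuples of leaf names; the representation and function names are illustrative, not part of the benchmark suite.

```python
def leaf_set(tree):
    """All leaf names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of an unrooted reading of the tree.

    Each split is stored as the side that does not contain the
    alphabetically smallest leaf, so equal splits compare equal.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            found.add(side)
        return clade

    visit(tree)
    return found

def false_positive_edges(inferred, true):
    """Splits present in the inferred tree but absent from the true tree."""
    return splits(inferred) - splits(true)

true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
fp = false_positive_edges(inferred_tree, true_tree)
```

Here the inferred tree groups A with C instead of A with B, so exactly one of its two internal edges is a false positive.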
21
Computational Difficulty of Dataset Versus
Accuracy
[Plot: panel time axes in sec, hr, hr]
22
RAxML Computation Time (Heuristic) Over 30 Random
100-taxon Trees
[Plot axis label: Replicates]
23
  • Thanks to
    • Davidson, Susan
    • Fisher, Steve
    • Guo, Sheng
    • Hillis, David
    • Heath, Tracey
    • Wang, Lisan
    • Zhang, Yifeng
    • Zwickl, Derrick
  • Please Ask and Talk to
    • Steve Fisher
    • Sheng Guo
    • Lisan Wang

Please See CRIMSON Demo by Steve Fisher