A Study of GeneWise with the Drosophila Adh Region - PowerPoint PPT Presentation

Provided by: Csu48
Learn more at: http://www.cs.umd.edu

1
A Study of GeneWise with the Drosophila Adh Region
  • Asta Gindulyte
  • CMSC 838 Presentation
  • Authors: Yi Mo, Moira Regelson, and Mike Sievers
  • Paracel Inc., Pasadena, CA

2
Motivation
  • Genome annotation
      • Extraction of biologically relevant knowledge
        from raw genomic sequence data
  • Need faster genome annotation methods
      • DNA sequences are very long (millions of
        nucleotides)
      • Current methods are computationally too expensive
  • Approach/Solution
      • GeneMatcher2 hardware acceleration of GeneWise

3
Outline
  • Motivation
      • Genome annotation
  • GeneMatcher2
      • Design
      • ASIC hardware
  • Comparison
      • GeneWise algorithm
      • HalfWise algorithm
      • Performance (time, precision)
  • Observations
      • Performance improvement
      • Cost effectiveness

4
Approach
  • Problem: make GeneWise run faster
      • Embarrassingly parallel algorithm
      • Computationally too expensive when run in
        parallel on PCs
  • Paracel's solution: hardware acceleration
      • Don't change the algorithm
      • Produce an implementation on the GeneMatcher2
        supercomputer that works as much like the
        original software as possible
      • 6LITE algorithm, now also in Wise2
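A minimal Python sketch of why the search is embarrassingly parallel: every (HMM, genomic window) comparison is independent, so jobs can be handed out with no communication between them. The window size, the overlap, and `score_job` are hypothetical placeholders, not Paracel's or GeneWise's actual decomposition. The overlap is an added assumption so that a feature no longer than the overlap is never split across a window boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def chunk_bounds(seq_len, chunk_size, overlap):
    """Split [0, seq_len) into windows of `chunk_size` bases that overlap by
    `overlap` bases, so any feature no longer than `overlap` lies entirely
    inside at least one window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    bounds, start = [], 0
    while True:
        end = min(start + chunk_size, seq_len)
        bounds.append((start, end))
        if end == seq_len:
            return bounds
        start += step


def score_job(job):
    """Hypothetical stand-in for one HMM-vs-window comparison.
    It only reports the window instead of running a real alignment."""
    hmm_id, (start, end) = job
    return hmm_id, start, end - start


def run_all(hmm_ids, seq_len, chunk_size, overlap, workers=4):
    # Every (HMM, window) pair is an independent job: no job needs another's result.
    jobs = list(product(hmm_ids, chunk_bounds(seq_len, chunk_size, overlap)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_job, jobs))  # results come back in job order


if __name__ == "__main__":
    for row in run_all(["hmmA", "hmmB"], seq_len=10, chunk_size=4, overlap=1):
        print(row)
```

ThreadPoolExecutor only keeps the sketch self-contained. CPU-bound work in CPython would need separate processes or machines to actually run in parallel, such as the PC cluster the slides compare against.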

5
GeneMatcher Architecture
6
ASIC Hardware
  • ASIC: application-specific integrated circuit
  • Designed to speed up dynamic programming
    algorithms
      • Could also be used for Smith-Waterman
  • Each ASIC board has 3072 processors
  • System has up to 9 boards
  • Cost per board around $40K
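The slides do not say how the ASIC maps dynamic programming onto its processors. One common idea behind DP accelerators is used here as an assumption, not as Paracel's design: all cells on the same anti-diagonal of the DP matrix are independent, so many processing elements can fill one diagonal at the same time. The following minimal Python sketch computes linear-gap Smith-Waterman local-alignment scores in that wavefront order. The match, mismatch, and gap values are arbitrary.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score, filling the DP matrix one anti-diagonal
    at a time. Cell (i, j) on diagonal d = i + j depends only on diagonals
    d - 1 and d - 2, so every cell of one diagonal could be computed
    simultaneously by separate processing elements."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # All (i, j) with i + j == d, 1 <= i <= n, 1 <= j <= m.
        # This inner loop is the part hardware would run in parallel.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match/mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_wavefront("GATTACA", "TTAC"))  # → 8
```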

7
GeneWise Algorithm
  • Perform a search of genomic DNA sequence data
    using a protein HMM
      • Build HMMs from protein families
      • Scan genome using HMM
          • Look for start codon
          • GT signals a possible 5′ (donor) splice site
          • AG signals a possible 3′ (acceptor) splice site
  • Dynamic programming used in the scanning process
      • Obtain the probability of the most likely path
        through the HMM generating the sequence
      • Obtain alignment by backtracking
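The dynamic-programming step described above is Viterbi decoding. The Python sketch below is a generic log-space Viterbi with traceback over a small, hypothetical discrete HMM. It is not GeneWise's model, which combines a protein profile HMM with a gene model for introns, splice sites, and frameshifts. The function returns the log-probability of the most likely state path and then recovers the path itself by backtracking, which is the analogue of the alignment.

```python
import math


def _log(p):
    # log(0) is treated as -infinity, so impossible steps never win.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state path for a non-empty observation sequence.
    Returns (log-probability of that path, list of states)."""
    # V[t][s]: best log-probability of any path emitting obs[:t + 1] and ending in s.
    V = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] + _log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Backtracking: start at the best final state and follow the pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path


if __name__ == "__main__":
    # Toy two-state HMM; states and probabilities are made up for illustration.
    states = ("X", "Y")
    start = {"X": 0.5, "Y": 0.5}
    trans = {"X": {"X": 0.9, "Y": 0.1}, "Y": {"X": 0.1, "Y": 0.9}}
    emit = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}
    score, path = viterbi("aaabbb", states, start, trans, emit)
    print(path)  # → ['X', 'X', 'X', 'Y', 'Y', 'Y']
```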

8
GeneWise model on GeneMatcher2
9
HalfWise Algorithm
  • Reduce cost by running BLAST to select HMMs with
    possible hits
  • Run only these HMMs through the GeneWise database
    search and sequence alignment algorithm
  • May miss some genes due to BLAST misses
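A minimal Python sketch of the HalfWise-style two-stage search, assuming two hypothetical callables: `prefilter` stands in for the BLAST step and `full_search` for the GeneWise comparison. It shows where both the cost saving and the possible misses come from. Only prefilter survivors reach the expensive stage, and an HMM the prefilter rejects is never reconsidered.

```python
def two_stage_search(hmm_ids, genome, prefilter, full_search):
    """HalfWise-style search (sketch).

    prefilter(hmm_id, genome) -> bool           cheap screen (BLAST in HalfWise)
    full_search(hmm_id, genome) -> hit or None  expensive comparison (GeneWise)
    Returns the HMMs that passed the screen and the hits found among them."""
    # Stage 1: keep only HMMs with a possible hit.
    candidates = [h for h in hmm_ids if prefilter(h, genome)]
    # Stage 2: the expensive search runs on survivors only; a rejected HMM
    # is never examined, even if it would have produced a hit.
    hits = {}
    for h in candidates:
        result = full_search(h, genome)
        if result is not None:
            hits[h] = result
    return candidates, hits


if __name__ == "__main__":
    def screen(h, g):
        return h != "hmmC"  # toy screen that wrongly drops hmmC

    def full(h, g):
        return 42.0 if h in ("hmmA", "hmmC") else None  # toy scores

    print(two_stage_search(["hmmA", "hmmB", "hmmC"], "ACGT", screen, full))
    # → (['hmmA', 'hmmB'], {'hmmA': 42.0})
```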

10
Evaluation
  • Test data set
      • A genomic DNA sequence contig of about 2.9 Mb
        from the Drosophila Adh region
  • Focus on finding all Pfam (Protein families
    database of alignments and HMMs) protein
    profile-HMMs that occur in the Adh genomic
    sequence

11
Evaluation: Speed
12
Evaluation: Score
13
Evaluation: Sensitivity and Specificity
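The slide gives only the title, not the definitions. This sketch assumes a common convention in gene-prediction evaluation: sensitivity Sn = TP/(TP+FN) and specificity Sp = TP/(TP+FP). Under this convention Sp is what other fields call precision. The classical TN/(TN+FP) stays close to 1 when true negatives vastly outnumber everything else. What counts as one predicted or reference item, for example a Pfam family hit at a given locus, is also an assumption in this Python sketch.

```python
def sensitivity_specificity(predicted, reference):
    """Gene-finding style accuracy over sets of items.

    Sn = TP / (TP + FN): fraction of reference items that were found.
    Sp = TP / (TP + FP): fraction of predictions that are correct.
    Returns (Sn, Sp); an empty denominator gives 0.0."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    fn = len(reference - predicted)
    fp = len(predicted - reference)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


if __name__ == "__main__":
    # Hypothetical (family, locus) items, not data from the study.
    found = {("famA", 1), ("famB", 2), ("famC", 3)}
    truth = {("famA", 1), ("famB", 2), ("famD", 4), ("famE", 5)}
    print(sensitivity_specificity(found, truth))  # → (0.5, 0.6666666666666666)
```
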
14
Observations
  • Performance improvement
      • The speedup is several orders of magnitude
      • Makes real target applications possible
      • Accuracy might be improved over the HalfWise
        algorithm
  • Cost effectiveness
      • System used costs around $500K
      • $500K worth of Linux PCs (500 processors at $1K
        each) would run about 10 times slower
  • Weaknesses
      • Cannot modify the algorithm
      • Not enough data to assess scalability
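The cost comparison can be made explicit with a little arithmetic. The slide's weaknesses caution against assuming scalability, yet this sketch assumes PC throughput scales linearly with the number of PCs. The inputs are the slide's approximate figures: a $500K system, $1K per PC, and about 10 times slower.

```python
def pc_equivalent_cost(system_cost, pc_cost, slowdown):
    """Compare an accelerator with a PC cluster bought for the same money.

    slowdown: how many times slower the equal-cost cluster runs.
    Assumes cluster throughput scales linearly with the number of PCs."""
    n_pcs = system_cost / pc_cost           # PCs the same budget buys
    pcs_to_match = n_pcs * slowdown         # PCs needed to match the accelerator
    cost_to_match = pcs_to_match * pc_cost  # price of that larger cluster
    return n_pcs, pcs_to_match, cost_to_match


print(pc_equivalent_cost(500_000, 1_000, 10))  # → (500.0, 5000.0, 5000000.0)
```

Under the linear-scaling assumption, matching the accelerator would take roughly 5,000 PCs, or about $5M, ten times the system's price.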