Genehackers - PowerPoint PPT Presentation

Title:

Genehackers

Slides: 36
Provided by: ece9
Transcript and Presenter's Notes

Title: Genehackers


1
(No Transcript)
2
  • DESIGN REVIEW I

3
Design Overview
  • Virus detection system
  • Receives nucleotide sequence from DNA sequencer
  • Analyzes sequence and compares to known viral
    pathogens
  • Informs user of matches
  • Additional alerts for viruses considered
    particularly dangerous

4
Design Use Case
  • Aid worker in the field
  • Needs access to large amount of data
  • Has limited resources
  • Far from base station
  • Portable devices
  • Must be small/light
  • Do not want to sacrifice function and ability

5
Design Specialization
  • Compress virus database
  • We will be compressing the virus database,
    allowing us to store more of it in less space.
  • Will likely be using a combination of compression
    algorithms, and decompressing over multiple
    hardware components

6
Design Exploration Architecture
  • Store Compressed Data on the Flash
  • Data compressed on the PC and stored to Flash
    with separate hardware/software
  • Store compressed data on USB stick
  • Data compressed on PC, communicate with USB stick
    using FPGA or ARM
  • Allow in-system data compression
  • Makes most sense with Flash

7
Design Exploration Architecture
  • Decompression
  • Could be done on ARM, FPGA, or both
  • ARM would decompress in software
  • FPGA would decompress in hardware
  • BLASTP
  • Could be done in software and/or hardware
  • Optimize certain algorithms in software
  • Custom hardware in FPGA for fast calculations

8
Design Exploration Architecture
9
Design Exploration Architecture
10
Design Exploration Architecture
  • Mixed hardware/software decompression seems best
  • In-system compression to Flash makes most sense
    (allows for large database and future updates)
  • Loss of a team member forced us to simplify our goals

11
Design Exploration Compression
  • LZW
  • Huffman
  • More Complex/Effective Algorithms

12
Design Exploration Compression
  • LZW Algorithm
  • LZW compression replaces strings of characters
    with single codes.
  • Adds every new string of characters it sees to a
    table of strings.
  • Compression occurs when a single code is output
    instead of a string of characters.
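
The table-building steps above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the function names `lzw_compress` and `lzw_decompress` are hypothetical, the table is assumed to start with all 256 single-byte strings, and codes are kept as plain integers rather than packed into a fixed bit width.

```python
def lzw_compress(data):
    """Return a list of integer codes for the bytes in `data`."""
    # Assumption: the table is seeded with every single-byte string.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the known string
        else:
            codes.append(table[current])     # emit one code for the whole string
            table[candidate] = len(table)    # add the new string to the table
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes by regrowing the same table."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the string that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

On a repetitive input such as `b"ACGTACGTACGTACGT"`, the compressor emits fewer codes than there are input bytes, because later repeats are covered by multi-character table entries.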

13
Design Exploration Compression
  • LZW Compression Perks
  • Great for repetitive data where a certain chunk
    is repeated in multiple places
  • Can achieve anywhere from 50% to 90% reduction in
    file size on standard text
  • Very fast
  • LZW Compression Dangers
  • If data is not repeated often, the compressed file
    can remain large, or even exceed the original size

14
Design Exploration Compression
  • Huffman Algorithm
  • Data is read from input file, and characters are
    stored in tree organized by frequency of
    occurrence
  • Requires two reads of the data
  • one to construct the Huffman tree
  • the other to map the data to codes and write the output
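
The two passes described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the names `huffman_codes`, `huffman_encode`, and `huffman_decode` are hypothetical, codes are kept as strings of `"0"`/`"1"` for readability rather than packed bits, and the tree is represented implicitly by merging per-symbol code dictionaries.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Pass 1: count symbol frequencies and derive a prefix code."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}   # a lone symbol still needs one bit
    # Heap entries: (count, tie_breaker, {symbol: code_so_far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(data, codes):
    """Pass 2: map every symbol to its code."""
    return "".join(codes[s] for s in data)


def huffman_decode(bits, codes):
    """Invert the mapping; works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For the input `"AAAAAAAACCGT"`, the frequent symbol `A` receives a one-bit code and the whole string encodes to 18 bits, compared with 24 bits at a fixed two bits per symbol (before counting the code table that must be stored alongside).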

15
Design Exploration Compression
  • Huffman Compression Perks
  • Data where certain codes are substantially more
    frequent than others will display excellent
    compression
  • Huffman Compression Dangers
  • Resulting file may be larger than the original if
    most codes occur with similar frequency
  • Requires a table of the codes to be stored with
    the data, which adds overhead
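
One quick way to tell which of these cases a data set falls into is to measure its symbol entropy before compressing. The sketch below is an assumption-laden illustration (the function name `entropy_bits_per_symbol` is hypothetical): entropy close to log2 of the alphabet size means the symbols occur with similar frequency and Huffman coding has little to gain, while a much lower value indicates a skewed distribution.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform = "ACGT" * 100            # every symbol equally frequent
skewed = "A" * 97 + "CGT"         # one symbol dominates

uniform_entropy = entropy_bits_per_symbol(uniform)   # 2.0 bits, no room to gain
skewed_entropy = entropy_bits_per_symbol(skewed)     # well below 2 bits
```

Because Huffman codes use a whole number of bits per symbol, the uniform four-symbol case cannot do better than two bits per symbol, and the stored code table then makes the output larger than a plain two-bit packing.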

16
Design Exploration Compression
  • Other Algorithms
  • Adaptive Huffman
  • Requires only one pass over the data; can be done
    dynamically
  • Tree is continually updated and reshaped each
    time a new character is read. This allows different
    parts of the file to have different encodings
    depending on the frequency of each character: the
    tree adapts to the data

17
Design Exploration Compression
  • Other Algorithms
  • Deflate
  • Algorithm used in gzip/other ZIP variants
  • Combines LZ77 (a dictionary method related to LZW)
    and Huffman encoding
  • Compressed data set consists of series of blocks,
    corresponding to successive blocks of input data.
  • Each block consists of two parts: Huffman code
    trees that describe the compressed data, and the
    compressed data itself.
  • Compressed data consists of a series of elements of
    two types: literal bytes (of strings that have not
    been detected as duplicated within the previous set
    limit of bytes) and pointers to duplicated
    strings.
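
For experimenting on the PC side, Python's standard `zlib` module provides a Deflate implementation (wrapped in a small zlib header and checksum). The sketch below is only a measurement aid, not a proposal for the embedded decoder; the helper name `deflate_roundtrip` and the sample sequence are made up for illustration.

```python
import zlib


def deflate_roundtrip(data, level=9):
    """Compress with Deflate, verify the round trip, and return the ratio."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    if restored != data:
        raise RuntimeError("round trip failed")
    return compressed, len(compressed) / len(data)


# Hypothetical, highly repetitive nucleotide-like input.
sequence = b"ATGGCGTACGCTAGCTAGGCTA" * 200
compressed, ratio = deflate_roundtrip(sequence)
```

Running the same helper over real database files would give a first estimate of how much Deflate saves relative to the simpler algorithms.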

18
Design Exploration Compression
  • Compression Analysis
  • Comparisons among the algorithms still need to be
    done.
  • We know that Deflate can achieve about 10% better
    compression than LZW, but can be about 10X
    slower.
  • Data needs to be analyzed to see what type of
    compression needs to be used as the base, then we
    can see if that initial compressed data can be
    compressed more efficiently with another
    algorithm on top.

19
Design Exploration Compression
  • Compression Analysis
  • Performing compression in hardware is a
    complicated task.
  • Need to determine how to abstract working with the
    huge tables in hardware. All of the compression
    algorithms involve storing data in tables. In
    hardware, this means sharing the memory on the
    FPGA between BLASTP, the database, and the
    tables.
  • Also need to look at ways of searching the
    tables: how to map hashing/quick-search
    methods into hardware.

20
Matlab Profile on BLASTP
OPTIMIZATION NEEDED!!!
21
Searching Algorithm on BLASTP
  • Divide and Conquer
  • Cooperation between Hardware and Software
  • Software looks for possible hits and extends them
    for further matching with the virus database
  • Hardware looks for highest-scoring pairs
  • Possible Improvements to BLASTP: Choices
  • Implementing Hash table
  • Adaptive neighborhood word sizes
  • Correlation

22
Searching Algorithm on BLASTP
  • Implementing Hash table
  • Table 1: To determine the optimal Neighborhood
    Word Size for query sequences based on database
  • Table 2: To determine the Neighborhood words from
    the computed neighborhood word size
  • Implemented in DSP
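
As a rough illustration of why a hash table helps here, the sketch below indexes every fixed-length word of the database so that query words can be looked up in constant time. It is a simplification under stated assumptions: it matches exact words only, whereas BLASTP neighborhood words also include similar words that score above a threshold, and the names `build_word_index` and `find_seeds` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_word_index(database, word_size):
    """Map each word of length `word_size` to the places it occurs."""
    index = defaultdict(list)
    for name, sequence in database.items():
        for pos in range(len(sequence) - word_size + 1):
            index[sequence[pos:pos + word_size]].append((name, pos))
    return index


def find_seeds(query, index, word_size):
    """Return (query_pos, sequence_name, database_pos) for every word hit."""
    seeds = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for name, dpos in index.get(word, []):
            seeds.append((qpos, name, dpos))
    return seeds


# Toy protein-like database; real entries would come from the virus database.
database = {"virus_a": "MKTAYIAKQR", "virus_b": "GGGMKTGGG"}
index = build_word_index(database, word_size=3)
seeds = find_seeds("AMKTA", index, word_size=3)
```

Each seed is a candidate position that a later stage could extend into a longer scored match.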

23
Searching Algorithm on BLASTP
  • Adaptive Neighborhood Word Size
  • Since Neighborhood Word Size controls the number of
    loop iterations, having an adaptive word size would
    speed up matching
  • Do statistical analyses on individual scores of
    the members of the sequence to determine a
    reasonable neighborhood word size for the query
    sequence
  • Store the neighborhood word sizes in a hash table
    keyed by an individual character or a sequence
    of characters
  • Implemented in NIOS (C)

24
Searching Algorithm on BLASTP
  • Correlation
  • To improve find_seeds and find_hsps
  • Convert and arrange the query sequence and
    database into binary (1-0) matrices
  • Correlate the 1-0 matrix templates (masks) with
    the database and get the highest scored matches
  • Implemented on FPGA (Verilog) for better/faster
    performance
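
The correlation idea can be prototyped in software before committing it to Verilog. The sketch below assumes a one-row-per-symbol 1-0 encoding and a plain sliding sum of products, which counts matching positions at each offset; the slides do not specify the exact matrix layout or scoring, so the function names, the default nucleotide alphabet, and the example sequences are illustrative assumptions.

```python
def one_hot(sequence, alphabet):
    """One row per alphabet symbol, one column per position, entries 1 or 0."""
    return [[1 if ch == sym else 0 for ch in sequence] for sym in alphabet]


def correlate(query, database, alphabet="ACGT"):
    """Slide the query mask along the database and sum elementwise products."""
    q = one_hot(query, alphabet)
    d = one_hot(database, alphabet)
    n = len(query)
    scores = []
    for offset in range(len(database) - n + 1):
        score = sum(q[r][j] * d[r][offset + j]
                    for r in range(len(alphabet))
                    for j in range(n))
        scores.append(score)          # equals the number of matching positions
    return scores


def best_match(query, database, alphabet="ACGT"):
    """Return (offset, score) of the highest-scoring alignment."""
    scores = correlate(query, database, alphabet)
    if not scores:
        raise ValueError("database is shorter than the query")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


scores = correlate("GATT", "ACGATTACA")
offset, score = best_match("GATT", "ACGATTACA")
```

Every product is a single AND of two bits and every score is a population count, which is the kind of operation that maps naturally onto parallel FPGA logic.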

25
Implementation Tasks
  • Communication between all devices
  • Ensuring that we can send a message through all
    required paths
  • Using a four-phase handshake to accommodate
    differences in clock domains
  • Analog/digital conversion on DSP
  • DSP will convert analog nucleotide signals
    into digital codons
  • Will involve translating to a five-bit codon representation
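
One plausible reading of the five-bit codon is that each nucleotide triplet is translated to its amino acid (or stop signal) and that symbol is then stored in five bits, since the 20 amino acids plus stop fit within 32 values. The sketch below follows that reading using the standard genetic code; the specific numbering in `FIVE_BIT` and the name `encode_codons` are assumptions, not the team's actual encoding.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical numbering: 20 amino acids plus '*' (stop) gives 21 symbols.
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY*"
FIVE_BIT = {symbol: code for code, symbol in enumerate(SYMBOLS)}


def encode_codons(dna):
    """Translate complete codons in `dna` to five-bit integer codes."""
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    return [FIVE_BIT[CODON_TABLE[dna[i:i + 3]]] for i in range(0, usable, 3)]


codes = encode_codons("ATGGCTTAA")        # Met, Ala, stop
```

Every code is below 32, so a stream of them can be packed five bits at a time before compression or comparison.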

26
Implementation Tasks
  • BLASTP algorithm on the FPGA
  • Examine different search algorithms using MATLAB
    to obtain the fastest matching search algorithm
  • Implementing search algorithm in software and
    hardware
  • Aiming to speed up certain calculations by using
    hardware, and optimizing the more complicated
    functions through software

27
Implementation Tasks
  • Reading in database to PC
  • Proper identification through parsing
  • Compressing Data on the PC
  • Will apply LZW encoding on the codons
  • Will encode result with Huffman compression
  • Huffman Decompression on ARM
  • Will be written in C

28
Implementation Tasks
  • LZW Decompression on FPGA
  • Will be written in Verilog
  • Testing, Integration
  • Making sure everything works

29
Verification Testing
  • DSP Verification
  • Verification by comparison with Matlab outputs
  • FPGA BLASTP Verification
  • Will double check against Matlab searches
  • Compression Testing
  • Will check against written C code
  • Each team member's code verified by someone else

30
Division of Labor
  • Chris Thomas
  • General architecture concerns
  • Communication with flash
  • Huffman decoding on the ARM
  • Charles-Christopher Onyeama
  • LZW decoding on the FPGA
  • LZW and Huffman compression
  • Assist BLASTP
  • Mark Pimentel
  • Matlab code optimization
  • Translating analog signals to digital on the DSP
  • BLASTP implementation on the FPGA

31
Demo Deliverables
  • Demo 1
  • Some communication between processors
  • (Chris, Mark)
  • Compression analyses and sample code
  • (Charles)

32
Demo Deliverables
  • Demo 2
  • Huffman compression/decompression completed
  • (Charles, Chris)
  • Analog signals processed/translated
  • (Mark)
  • BLASTP implementation underway
  • (Mark)
  • Communication with storage/Storage management
  • (Chris, Charles)

33
Demo Deliverables
  • Demo 3
  • Everything completed and working.
  • (Team GeneHackers!)

34
Updated Schedule
35
  • Questions?