Efficient Implementation of a String Matching Algorithm for SRC and Cray Reconfigurable Computers

Learn more at: http://www.klabs.org

1
Efficient Implementation of a String Matching
Algorithm for SRC and Cray Reconfigurable
Computers
  • Esam El-Araby (1), Mohamed Taher (1), Tarek El-Ghazawi (1),
    Mohamed Abouellail (1), Nandakishore Sastry (2), and Kris Gaj (2)
  • (1) The George Washington University, (2) George Mason University

2
Outline
  • Introduction
  • SRC Hardware and Software
  • Cray XD1 Hardware and Software
  • String Matching Algorithms
  • Implementation Methodology
  • Results and Comparisons
  • Conclusions

3
Introduction
5
SRC Architecture (Hi-Bar™ Based Systems)
  • Hi-Bar sustains 1.4 GB/s per port with 180 ns
    latency per tier
  • Up to 256 input and 256 output ports with two
    tiers of switch
  • Common Memory (CM) has controller with DMA
    capability
  • Controller can perform other functions such as
    scatter/gather
  • Up to 8 GB DDR SDRAM supported per CM node
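
As a rough illustration of what the port figures above imply, the following Python sketch estimates the time to move a block of data through the switch as latency plus size divided by bandwidth. The function name, the simple additive model, and the reading of 1.4 GB/s as 1.4e9 bytes per second are assumptions made for illustration; they do not come from SRC documentation.

```python
def hibar_transfer_time_s(num_bytes, tiers=1,
                          bandwidth_bytes_per_s=1.4e9,
                          latency_per_tier_s=180e-9):
    """Estimate a transfer time through the Hi-Bar switch.

    Simplified model (an assumption): total time is the per-tier latency
    times the number of tiers, plus payload size over sustained bandwidth.
    """
    if tiers not in (1, 2):
        raise ValueError("the slide describes at most two tiers of switch")
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return tiers * latency_per_tier_s + num_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Small transfers are dominated by latency, large ones by bandwidth.
    for size in (64, 4096, 1 << 20):
        t = hibar_transfer_time_s(size, tiers=2)
        print(f"{size:>8} bytes over two tiers: {t * 1e6:.3f} us")
```

Under this model, small transfers are dominated by the 180 ns per-tier latency and large transfers by the 1.4 GB/s port rate.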

6
SRC Reconfigurable Processor
7
SRC Programming Environment
8
SRC Programming Environment (cont'd)
9
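SRC Programming Environment (cont'd)
[Figure: a program in C or Fortran and the FPGA contents after the Function_1 call]
  • Main program in C or Fortran, with Function_1 and Function_2
  • Function_1(a, d, e) uses Macro_1(a, b, c), Macro_2(b, d), Macro_2(c, e)
  • Function_2(d, e, f) uses Macro_3(s, t), Macro_1(n, b), Macro_4(t, k)
  • FPGA contents after the Function_1 call: Macro_1 and two instances
    of Macro_2, with input a, intermediate values b and c, and outputs
    d and e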

11
Cray XD1 System Architecture (One Chassis)
  • Compute
      • 12 AMD Opteron 32/64 bit, x86 processors
      • High Performance Linux
  • RapidArray Interconnect
      • 12 communications processors
      • 1 Tb/s switch fabric
  • Active Management
      • Dedicated processor
  • Application Acceleration
      • 6 co-processors

FPGA and 2nd RAP are on Expansion Module
12
Cray XD1 Application Acceleration Interfaces
  • XC2VP30-50 running at up to 200 MHz
  • 4 QDR II RAM with over 400 HSTL-I I/O at 200 MHz
    DDR (400 MTransfers/s)
  • 16 bit simplified HyperTransport I/F at 400 MHz
    DDR (800 MTransfers/s)
  • QDR and HT I/F take up <20% of XC2VP30. The
    rest is available for user applications
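
The transfer rates quoted above follow from double data rate (DDR) signalling, which moves data on both clock edges. The short Python sketch below reproduces that arithmetic and derives a peak figure for the 16-bit HyperTransport interface. The function names are hypothetical, and the result is a raw per-direction peak that ignores protocol overhead, which is an assumption rather than a figure from the slides.

```python
def ddr_transfers_per_s(clock_hz):
    """DDR signalling performs two transfers per clock cycle."""
    return 2 * clock_hz


def peak_bytes_per_s(width_bits, transfers_per_s):
    """Raw peak throughput of a parallel interface, ignoring overhead."""
    return width_bits * transfers_per_s / 8


if __name__ == "__main__":
    qdr_rate = ddr_transfers_per_s(200e6)  # QDR II I/O at 200 MHz DDR
    ht_rate = ddr_transfers_per_s(400e6)   # HyperTransport at 400 MHz DDR
    print(f"QDR II: {qdr_rate / 1e6:.0f} MTransfers/s")
    print(f"HT:     {ht_rate / 1e6:.0f} MTransfers/s")
    # 16-bit link at 800 MTransfers/s, per direction (assumed)
    print(f"HT raw peak: {peak_bytes_per_s(16, ht_rate) / 1e9:.1f} GB/s")
```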

13
Cray XD1 Development Flow
14
Cray XD1 Hardware Development Flow
15
Design Methodology using Cray XD1
  • Write application in C for system microprocessor
  • Identify computation-intensive routine(s)
  • Generate a bitstream using Cray cores (RT, QDR II) and
    language of choice
      • Create module in HDL (Verilog, VHDL)
      • Create module using high-level language tools
      • Validate module
      • Synthesize using XST, Leonardo, or Synplify Pro
      • Create bitstream using Xilinx place and route tools
  • Replace routines with Cray API calls
  • Run application

17
String Matching - Introduction
  • String matching: detecting the occurrence of a
    particular substring, called the pattern, in
    another string, called the text
  • Types of string matching
      • Exact string matching
      • Approximate string matching
  • Exact string matching
      • Matches the pattern only where it occurs
        completely, that is, unbroken and with no
        irrelevant data between any letters
      • Numerous applications: network intrusion
        detection systems (NIDS), text editing, etc.
  • Approximate string matching
      • Pattern rarely matches the text completely
      • Finds application in computational biology (DNA
        matching), image detection, handwriting
        recognition, etc.
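
The difference between the two types can be sketched in a few lines of Python. The exact matcher reports every position where the pattern occurs unbroken. The approximate matcher shown here is a deliberately simple variant that tolerates a bounded number of substituted characters (Hamming distance); it does not handle insertions or deletions, which alignment-based methods address. Function names and the sample strings are made up for illustration.

```python
def exact_matches(text, pattern):
    """Return start indices where pattern occurs unbroken in text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def approximate_matches(text, pattern, max_mismatches):
    """Return start indices where pattern matches a window of text with
    at most max_mismatches substituted characters (no gaps allowed)."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(exact_matches("GATTACAGATTA", "ATTA"))      # exact occurrences
    print(exact_matches("GATTACA", "ATTC"))           # no exact occurrence
    print(approximate_matches("GATTACA", "ATTC", 1))  # one substitution allowed
```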

18
DNA Matching Basics
  • Problem
      • Find the best pairwise alignment of GAATC and
        CATAC
  • Why align two protein or DNA sequences?
      • Determine whether they are descended from a
        common ancestor (homologous)
      • Infer a common function
      • Locate functional elements
      • Infer protein structure, if the structure of one
        of the sequences is known
  • We need a way to measure the quality of a
    candidate alignment
  • Alignment scores consist of two parts
      • Substitution matrix
      • Gap penalty

19
DNA Matching Basics (cont'd)
Scoring aligned bases
  • Purines: A, G
  • Pyrimidines: C, T
  • Transition (cheap): a substitution within the same class
  • Transversion (expensive): a substitution between a purine
    and a pyrimidine

    GAAT-C
    CA-TAC
    -5 + 10 + ? + 10 + ? + 10 = ?   (gap scores not yet defined)

Scoring gaps
  • Linear gap penalty: every gap receives a score of
    d

    GAAT-C   d = -4
    CA-TAC
    -5 + 10 - 4 + 10 - 4 + 10 = 17

  • Affine gap penalty: opening a gap receives a
    score of d; extending a gap receives a score of e

    G--AATC   d = -4
    CATA--C   e = -1
    -5 - 4 - 1 + 10 - 4 - 1 + 10 = 5
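
The two worked gap-scoring examples can be checked with a small Python sketch that scores a gapped alignment column by column. The match score of 10, the transversion score of -5, and the gap scores d and e are taken from the slide. The transition score is not given there, so the function refuses to guess it unless the caller supplies one; the function and parameter names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=10, transversion=-5, transition=None):
    """Score one aligned pair of bases."""
    if not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"unexpected base in pair {a!r}/{b!r}")
    if a == b:
        return match
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        # Transition: the slide gives no value, so one must be supplied.
        if transition is None:
            raise ValueError("transition score is not given on the slide")
        return transition
    return transversion


def score_alignment(top, bottom, gap_open=-4, gap_extend=None, **sub_scores):
    """Score two equal-length gapped strings ('-' marks a gap).

    With gap_extend=None every gap column scores gap_open (linear penalty).
    Otherwise the first column of a gap run scores gap_open and each
    following column of the same run scores gap_extend (affine penalty).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    if gap_extend is None:
        gap_extend = gap_open
    total = 0
    previous_gap_row = None  # row that had a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            total += gap_extend if gap_row == previous_gap_row else gap_open
            previous_gap_row = gap_row
        else:
            total += substitution_score(a, b, **sub_scores)
            previous_gap_row = None
    return total


if __name__ == "__main__":
    print(score_alignment("GAAT-C", "CA-TAC"))                    # linear, d = -4
    print(score_alignment("G--AATC", "CATA--C", gap_extend=-1))   # affine, e = -1
```

Run as a script, this prints 17 for the linear case and 5 for the affine case, matching the totals on the slide.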
20
Approximate String Matching Algorithm (Smith-Waterman Algorithm)
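
The slide body did not survive in this transcript, so as a reminder of what the algorithm computes, here is a minimal software sketch of the textbook Smith-Waterman recurrence with a linear gap penalty. It is not the authors' FPGA design. The match score of 10 and gap score of -4 reuse the values from the scoring slides; applying a single mismatch score of -5 to every substitution is a simplifying assumption, since the transition score is not given.

```python
def smith_waterman_score(query, subject, match=10, mismatch=-5, gap=-4):
    """Return the best local alignment score of two sequences.

    Each cell H[i][j] is the maximum of 0, the diagonal neighbour plus the
    substitution score, and the upper or left neighbour plus the gap score.
    Only two rows are kept in memory because the score alone is needed.
    """
    cols = len(subject)
    previous_row = [0] * (cols + 1)
    best = 0
    for i in range(1, len(query) + 1):
        current_row = [0] * (cols + 1)
        for j in range(1, cols + 1):
            sub = match if query[i - 1] == subject[j - 1] else mismatch
            current_row[j] = max(
                0,                          # start a new local alignment
                previous_row[j - 1] + sub,  # align query[i-1] with subject[j-1]
                previous_row[j] + gap,      # gap in subject
                current_row[j - 1] + gap,   # gap in query
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GAATC", "CATAC"))
```

Each evaluation of the inner expression is one cell update, the unit counted by the CUPS metric. A cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed in parallel, which is the property hardware arrays of processing elements typically exploit.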
22
Implementation Schemes in SRC
23
Operational Scenarios for Cray XD1
25
Performance Results
  • Rate = (FPGA freq.) × (cycles/cell) × (# SWPEs)
  • Opteron implementation (SSEARCH34)
      • 100 million cell updates per second (CUPS)
  • Cray Inc. implementation
      • Current unoptimized design
          • 80 MHz × 1 × 32 = 2.56 billion CUPS (GCUPS)
      • With optimization
          • 100 MHz × 1 × 50 = 5.0 GCUPS
      • With future Virtex 4 FPGA
          • 100 MHz × 1 × 150 = 15 GCUPS
      • 25x speedup vs. Opteron
  • Our implementation
      • SRC-6
          • Current unoptimized design
          • 100 MHz × 1 × (16×16) = 25.6 GCUPS
          • 10x speedup vs. Cray
          • 256x speedup vs. Opteron
      • Cray XD1
          • Current unoptimized design
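
The throughput figures and speedups above can be re-derived with a few lines of Python. The slide writes the rate as a product that includes a cycles-per-cell factor; the sketch below divides by cycles per cell instead, which is an assumption about the intended meaning and gives the same result for the value of 1 used in every entry. Function and variable names are hypothetical.

```python
def cell_updates_per_s(freq_hz, num_pes, cycles_per_cell=1):
    """Peak cell updates per second for an array of Smith-Waterman PEs."""
    return freq_hz * num_pes / cycles_per_cell


OPTERON_CUPS = 100e6  # SSEARCH34 on Opteron, as reported on the slide

DESIGNS = {
    "Cray Inc., current unoptimized": cell_updates_per_s(80e6, 32),
    "Cray Inc., with optimization": cell_updates_per_s(100e6, 50),
    "Cray Inc., future Virtex 4": cell_updates_per_s(100e6, 150),
    "SRC-6, current unoptimized": cell_updates_per_s(100e6, 16 * 16),
}

if __name__ == "__main__":
    for name, rate in DESIGNS.items():
        speedup = rate / OPTERON_CUPS
        print(f"{name}: {rate / 1e9:.2f} GCUPS, {speedup:.1f}x vs. Opteron")
    src = DESIGNS["SRC-6, current unoptimized"]
    cray = DESIGNS["Cray Inc., current unoptimized"]
    print(f"SRC-6 vs. Cray Inc. current design: {src / cray:.0f}x")
```

The 25x figure quoted against the Opteron corresponds to the current 2.56 GCUPS Cray Inc. design (2.56 GCUPS / 0.1 GCUPS = 25.6).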

CUG05, New Mexico, May 2005
26
Conclusions
  • The Smith-Waterman sequence alignment algorithm has
    been implemented on both SRC-6 and Cray XD1
    systems
  • Similarities and differences are highlighted with
    regard to
      • System hardware architecture
      • Ease of programming
      • Programming model
      • Development time
      • Hardware/software libraries
      • Performance
  • The speed-up vs. microprocessor is reported
  • Primary bottlenecks limiting the performance of
    both systems are recognized
  • The capability to share and port applications
    between the SRC and Cray systems is explored