Benchmark Results for Ultra-High Performance Scalable Processing Architecture for Embedded Signal and Image Processing Applications

1
Benchmark Results for Ultra-High Performance
Scalable Processing Architecture for Embedded
Signal and Image Processing Applications
Authors
  • Stewart Reddaway (sfr_at_wscapeinc.com)
  • Nigel Bond (nigel.bond_at_wscapeinc.com)
  • Rick Pancoast (rick.pancoast_at_lmco.com)
  • Justin Kidman (justin.kidman_at_wscapeinc.com)
  • Peter Rogina (peter.rogina_at_wscapeinc.com)
2
Scalable Processing Platform (SPP) Applications
  • Image Processing
  • Signal Processing
  • Compression/Decompression
  • Encryption/Decryption
  • Network Processing
  • Search Engine
  • Certain Supercomputing Applications

Wide-Ranging Applicability to DoE/DoD/DARPA and
Commercial Embedded HPC Processing Requirements
3
Scalable Processing Platform
50 GFLOPS 64-bit FP SIMD Processor Chip
300 GFLOPS 64-bit FP 6U VME Expansion board
4
CamArray Multi-Sensor Platform
Initial 30-camera system will process and store
over 1 billion pixels per second continuously
  • 6 cameras, Dual CameraLink (x3)
  • 42 cameras per cardcage (x7)
  • 6U host motherboard with SAN storage
5
30 Camera CamArray with sync
  • Universal Sync
  • Horita BSG-50 Black Burst Generator
  • Horita TG-50 SMPTE Generator/Inserter
  • 42 cameras max per cardcage (21 Duprees x 2
    cameras per Dupree)
  • SMPTE Level Shifter / Digital Distribution Amp
    (1 x 21 Distribution Amp)
  • 8 m CameraLink cables (2 cameras per Dupree board)
  • 1U SAN: 15 minutes uncompressed, 1.25 hrs at 5x
    compression, etc.
  • Fibre Channel from host cards to SAN
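
The two SAN recording times quoted on this slide can be checked against each other with a little arithmetic. The sketch below does that and also estimates the storage capacity the uncompressed figure would imply. The slides do not state the pixel depth stored on the SAN, so the 1 byte per pixel value is purely an assumption, and "over 1 billion pixels per second" is treated as exactly 1e9.

```python
# Figures taken from the slides.
PIXELS_PER_SECOND = 1e9        # "over 1 billion pixels per second", treated as exactly 1e9
UNCOMPRESSED_MINUTES = 15      # 1U SAN recording time, uncompressed
COMPRESSION_RATIO = 5          # "5x compression"

# Assumption (not stated in the slides): 1 byte stored per pixel.
BYTES_PER_PIXEL = 1


def recording_minutes(uncompressed_minutes, compression_ratio):
    """Recording time grows linearly with the compression ratio."""
    return uncompressed_minutes * compression_ratio


def implied_capacity_gb(pixels_per_second, bytes_per_pixel, minutes):
    """Storage needed for `minutes` of uncompressed video, in decimal GB."""
    return pixels_per_second * bytes_per_pixel * minutes * 60 / 1e9


compressed_hours = recording_minutes(UNCOMPRESSED_MINUTES, COMPRESSION_RATIO) / 60
capacity_gb = implied_capacity_gb(PIXELS_PER_SECOND, BYTES_PER_PIXEL, UNCOMPRESSED_MINUTES)

print(f"{compressed_hours:.2f} h at {COMPRESSION_RATIO}x compression")
print(f"about {capacity_gb:.0f} GB implied at {BYTES_PER_PIXEL} byte/pixel")
```

The compressed recording time agrees with the 1.25 hrs quoted on the slide. The capacity estimate scales directly with the assumed bytes per pixel, so it should be read as an order-of-magnitude figure only.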
6
HPEC 2006 Hardware Benchmark Performance Results
DRAM-to-DRAM demonstration using realistic Pulse
Compression data on actual CSX600 hardware
Note: Streaming implementation on Dupree in 4Q06
7
1024-point complex Pulse Compression
  • Per chip measurements
  • 96 PEs enable 12 sets in parallel
  • 8 PEs per PC
  • 9.39 usec DRAM to DRAM
  • 9.87 usec on-chip (i.e., without I/O from/to DRAM)
  • 11.55 GFLOPS with I/O
  • 11.02 GFLOPS without I/O

Overlapping of I/O and compute results in 95% of
cycles being used for computation
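
The relationship between the quoted times and GFLOPS figures can be sketched as follows. The slides do not say how operations were counted, so this assumes the conventional 5·N·log2(N) count per complex FFT and treats one pulse compression as a forward FFT, a point-wise complex multiply (6 real operations per point), and an inverse FFT. The function names are illustrative only.

```python
import math

N = 1024            # FFT length from the slide
PES_PER_CHIP = 96   # processing elements per chip
PES_PER_PC = 8      # PEs cooperating on one pulse compression (PC)


def pulse_compression_flops(n):
    """Assumed flop count: forward FFT + point-wise complex multiply + inverse FFT."""
    fft = 5 * n * math.log2(n)   # conventional count for one complex FFT
    multiply = 6 * n             # 4 multiplies + 2 adds per complex multiply
    return 2 * fft + multiply


def gflops(seconds_per_pc, n=N):
    """Sustained GFLOPS given the per-chip time for one pulse compression."""
    return pulse_compression_flops(n) / seconds_per_pc / 1e9


sets_in_parallel = PES_PER_CHIP // PES_PER_PC

print(sets_in_parallel)
print(round(gflops(9.39e-6), 2))
print(round(gflops(9.87e-6), 2))
```

Under these assumptions the two computed rates land within a fraction of a percent of the 11.55 and 11.02 GFLOPS quoted above, which suggests (but does not confirm) that a count of this kind was used.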
8
Pulse Compression Demonstration
Hardware Performance (per chip)
9
FFT/PC speedups
  • Code speedups totaling 11% are possible now
      • Workarounds due to problems with existing
        microcode: 5%
      • Stripping out unnecessary measurement and
        interface code: 2%
      • Use of new microcode with 0 degree twiddle: 4%
  • New microcoded instructions will enable more
    efficient butterfly pipelining (estimated 1.5x to
    1.8x overall improvement)
  • A family of begin, middle and end butterflies
    nearly doubles speed
  • A similar technique will speed up a sequence of
    complex multiplies
  • More special case butterflies can also be
    microcoded
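
To illustrate why a dedicated 0 degree twiddle butterfly saves work, here is a minimal sketch of a generic radix-2 butterfly next to its special case. It is not the CSX600 microcode, which the slides do not show, and the function names are hypothetical. A twiddle factor at 0 degrees equals 1, so the complex multiply (6 real operations) drops out and only the two complex adds (4 real operations) remain.

```python
import cmath


def butterfly(a, b, w):
    """General radix-2 butterfly: one complex multiply plus two complex adds."""
    t = w * b
    return a + t, a - t


def butterfly_w0(a, b):
    """Special case for a 0 degree twiddle (w == 1): no multiply needed."""
    return a + b, a - b


a, b = complex(1.0, 2.0), complex(3.0, -1.0)
w0 = cmath.exp(-2j * cmath.pi * 0 / 8)   # twiddle at 0 degrees is exactly 1

# Both forms give the same result when the twiddle is 1.
assert butterfly(a, b, w0) == butterfly_w0(a, b)
```

Other special case twiddles (for example 90 degrees, where the multiply reduces to swapping and negating components) can be handled in the same spirit.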

10
JPEG2000 Performance Estimates
For 30x compression of 1K x 1K, 8-bpp, RGB image
(per chip)
  • Optimizing the overlap of compute and I/O can
    boost performance to 48 fps (shown in brackets)
  • Grayscale images are 3x faster, supporting up to
    125 (146) fps
  • Bayer coded color images are 2.5x faster
  • 10/12-bit pixels can be handled, at
    proportionally lower speed

Analysis suggests route to substantially higher
performance
11
Summary
  • SPP offers greatly enhanced performance AND
    performance/watt
  • Scaled SIMD architecture very well suited for
    many embedded applications
  • Existing and emerging libraries will ease
    integration and product insertion
  • WorldScape partners uniquely qualified to
    deliver embedded application products
  • Ongoing hardware and library development will
    foster accelerated ability to achieve TRL-6
    status in FY07
  • First customer shipments of SPP Dupree card -
    1Q07
  • Beta relationships being established now