Frank Vahid, UCR 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Frank Vahid, UCR 1

Description:

Building Fake Body Parts: Digital Mockups. Frank Vahid, Univ. of California, Riverside; Chen Huang (UC Riverside, now Amazon); Bailey Miller (UC Riverside, intern at SpaceX). PowerPoint PPT presentation.

Slides: 30
Provided by: FV7
Learn more at: http://www.cs.ucr.edu

Transcript and Presenter's Notes



1
Building Fake Body Parts: Digital Mockups
  • Frank Vahid
  • Univ. of California, Riverside

Chen Huang (UC Riverside, now Amazon), Bailey Miller (UC Riverside, intern at SpaceX), Prof. Tony Givargis (UC Irvine), Ting-Shuo Chou (UC Irvine), others...
Support provided by NSF, SRC, Dept. of Educ.
Also CareFusion, Xilinx, METI
2
Bailey Miller, UCR 2
3
Models of the physical world that run in real time
Test cyber-physical systems
http://www.nhlbi.nih.gov/
4
Issue: real-time speed is achieved via inaccuracy
5
PC / GPU
[Chart: performance (ms) on the Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic models; off-scale bar labels: 1522, 1490, 1430, 1184]
Speedup vs. real time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x
  • Parallel computations
  • Neighbor communication

→ Seems like a great match for FPGAs
6
FPGAs: SW → circuits (parallel)
C code for FIR filter
    for (i = 0; i < 128; i++) y += c[i] * x[i];
  • 1000s of instructions
  • Several thousand cycles
Circuit for FIR filter
  • 7 cycles (though slower clock)
  • Speedup > 10x-100x
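A minimal C sketch of why the circuit needs so few cycles, assuming (the slide does not say so) that the hardware forms all 128 products in parallel and sums them in a binary adder tree: the tree depth is log2(128) = 7, which lines up with the 7-cycle figure. The function names are hypothetical.

```c
#include <assert.h>

#define TAPS 128  /* tap count from the slide's loop bound */

/* Software version: one multiply-accumulate per loop iteration,
   executed one after another on a processor. */
long fir_sequential(const int c[TAPS], const int x[TAPS]) {
    long y = 0;
    for (int i = 0; i < TAPS; i++)
        y += (long)c[i] * x[i];
    return y;
}

/* Circuit-style version (assumed structure): 128 multipliers form all
   products at once, then a binary adder tree sums them pairwise.
   Each pass of the while loop models one tree level; *levels returns
   the tree depth, log2(128) = 7. */
long fir_adder_tree(const int c[TAPS], const int x[TAPS], int *levels) {
    long v[TAPS];
    for (int i = 0; i < TAPS; i++)      /* parallel multipliers */
        v[i] = (long)c[i] * x[i];
    int n = TAPS;
    *levels = 0;
    while (n > 1) {                     /* one adder-tree level */
        assert(n % 2 == 0);             /* requires a power-of-two tap count */
        for (int i = 0; i < n / 2; i++)
            v[i] = v[2 * i] + v[2 * i + 1];
        n /= 2;
        (*levels)++;
    }
    return v[0];
}
```

Both functions return the same sum. The difference is that each adder-tree level could be one clock cycle in hardware, while the sequential loop needs one multiply-accumulate per iteration on a processor.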

7
FPGAs 101 (A Quick Intro)
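[Figure: an FPGA as an array of LUTs and switch matrices (SM). A LUT is a small memory, e.g. a 4x2 memory whose address inputs a1 a0 (rows 00, 01, 10, 11) select the stored output bits d1 d0. Example functions F and G of inputs a, b, c are stored as truth-table rows 1 1 1 0 1 1 0 0 and 0 0 0 0 0 0 1 0; outputs labeled D and E]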
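To illustrate the LUT idea, here is a small C sketch (hypothetical names and example function) of a 3-input LUT modeled as an 8x1 memory: configuring it means filling the memory with a function's truth table, and evaluating it is a single memory read indexed by the inputs.

```c
#include <assert.h>
#include <stdint.h>

/* A 3-input LUT modeled as an 8-entry, 1-bit-wide memory. */
typedef struct {
    uint8_t bits;   /* bit k holds the output for address k = (a<<2)|(b<<1)|c */
} Lut3;

/* "Program" the LUT by evaluating a 3-input function at all 8 addresses,
   much as FPGA configuration fills the LUT memory. */
Lut3 lut3_configure(int (*f)(int a, int b, int c)) {
    Lut3 lut = {0};
    for (int addr = 0; addr < 8; addr++) {
        int a = (addr >> 2) & 1, b = (addr >> 1) & 1, c = addr & 1;
        if (f(a, b, c))
            lut.bits |= (uint8_t)(1u << addr);
    }
    return lut;
}

/* Evaluating the LUT is just a memory read: no gates are built per function. */
int lut3_eval(Lut3 lut, int a, int b, int c) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1) && (c == 0 || c == 1));
    int addr = (a << 2) | (b << 1) | c;
    return (lut.bits >> addr) & 1;
}

/* Hypothetical example function: F = (a AND b) OR c. */
int example_f(int a, int b, int c) { return (a && b) || c; }
```

Reconfiguring the same LUT with a different function changes only the stored bits, which is what makes the fabric programmable.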
8
HLS
[Chart: performance (ms) on the Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic models; legend: PC(1), PC(4), GPU, HLS; off-scale bar labels: 1522, 1490, 1430, 1184]
Speedup vs. real time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS/FPGA 3.2x
High-level synthesis: a compiler that converts a program to circuits
9
Network of synchronized PEs on FPGAs
  • General Processing Element
  • Iterative ODE solver (Euler/RK4)
  • 0.1 ms / 0.01 ms timestep
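A sketch of the kind of per-timestep update such a PE iterates, assuming a scalar ODE and a hypothetical decay model dy/dt = -k*y; euler_step, rk4_step, and simulate are illustrative names, not the authors' PE design. With h = 1e-4 s (the 0.1 ms timestep), 1000 steps cover 0.1 s of simulated time, and real-time operation means finishing each step within 0.1 ms of wall-clock time.

```c
#include <assert.h>

/* Derivative function dy/dt = f(t, y); p carries model parameters. */
typedef double (*deriv_fn)(double t, double y, const void *p);

/* One forward-Euler step: one derivative evaluation, first-order accurate. */
double euler_step(deriv_fn f, double t, double y, double h, const void *p) {
    return y + h * f(t, y, p);
}

/* One classic RK4 step: four derivative evaluations, fourth-order accurate. */
double rk4_step(deriv_fn f, double t, double y, double h, const void *p) {
    double k1 = f(t, y, p);
    double k2 = f(t + h / 2, y + h / 2 * k1, p);
    double k3 = f(t + h / 2, y + h / 2 * k2, p);
    double k4 = f(t + h, y + h * k3, p);
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}

/* Hypothetical model: exponential decay dy/dt = -k*y, with k passed via p. */
double decay(double t, double y, const void *p) {
    (void)t;
    return -(*(const double *)p) * y;
}

/* Advance from t = 0 for `steps` fixed timesteps of size h (seconds). */
double simulate(deriv_fn f, double y0, double h, int steps, int use_rk4,
                const void *p) {
    assert(h > 0 && steps >= 0);
    double t = 0, y = y0;
    for (int i = 0; i < steps; i++) {
        y = use_rk4 ? rk4_step(f, t, y, h, p) : euler_step(f, t, y, h, p);
        t += h;
    }
    return y;
}
```

Euler costs one derivative evaluation per step versus four for RK4, the usual speed/accuracy trade-off between the two solvers.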

[Figure: digital mockup as a network of PEs on an FPGA; 1 PE: 300 MHz]
10
Synthesis tool
Phase 1: maps ODEs to virtual PEs using simulated annealing (10K iterations)
Phase 2: converts virtual PEs to physical circuits using FPGA place-and-route
11
General PEs
[Chart: performance (ms) on the Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic models; legend: PC(1), PC(4), GPU, HLS; off-scale bar labels: 1522, 1490, 1430, 1184]
Speedup vs. real time: PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PEs 4.9x
12
Problem: more PEs → lower frequency
[Chart: real ODEs/sec and ODEs/sec lost due to the frequency drop; 11-generation Weibel model, Virtex6 240T FPGA, general PEs]
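A back-of-the-envelope model of the trade-off in the chart, under the assumption (not from the slides) that each PE finishes one ODE every cycles_per_ode clock cycles: throughput grows with the PE count but scales with the achieved clock, so the ODEs/sec lost is the PE count times the frequency drop. The function names and all numbers used with them are hypothetical.

```c
#include <assert.h>

/* Hypothetical first-order throughput model: aggregate ODEs/sec scales
   with PE count times clock frequency. */
double odes_per_sec(int num_pes, double clock_hz, double cycles_per_ode) {
    assert(num_pes > 0 && clock_hz > 0 && cycles_per_ode > 0);
    return num_pes * clock_hz / cycles_per_ode;
}

/* ODEs/sec lost because the clock achieved with many PEs is lower than
   the clock a single PE reaches. */
double lost_odes_per_sec(int num_pes, double ideal_hz, double achieved_hz,
                         double cycles_per_ode) {
    return odes_per_sec(num_pes, ideal_hz, cycles_per_ode)
         - odes_per_sec(num_pes, achieved_hz, cycles_per_ode);
}
```

For instance, with hypothetical values of 100 PEs, 100 cycles per ODE, and a clock that falls from 300 MHz to 150 MHz, odes_per_sec gives 1.5e8 and lost_odes_per_sec reports another 1.5e8 lost.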
13
Use model structure to improve
Avoid using FPGA placement (Phase 2)
Graph embedding: map a guest graph to a host graph, minimizing the max wire length
Guest: virtual PEs
Host: physical PEs
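The quantity graph embedding tries to minimize can be computed directly. This sketch assumes the physical PEs form a 2D grid and measures each guest edge's wire length as the Manhattan distance (the hop count between grid positions), so the maximum over all edges is the embedding's dilation. Type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { int row, col; } GridPos;   /* physical PE location */
typedef struct { int u, v; } Edge;          /* guest edge between virtual PEs */

/* Wire length of one guest edge once both endpoints are placed:
   Manhattan distance, i.e., grid hops for nearest-neighbor links. */
static int wire_length(GridPos a, GridPos b) {
    int dr = a.row - b.row, dc = a.col - b.col;
    return (dr < 0 ? -dr : dr) + (dc < 0 ? -dc : dc);
}

/* Dilation of an embedding: the longest wire over all guest edges.
   place[i] is the host position of virtual PE i. */
int embedding_dilation(const Edge *edges, int num_edges, const GridPos *place) {
    assert(num_edges >= 0);
    int worst = 0;
    for (int e = 0; e < num_edges; e++) {
        int len = wire_length(place[edges[e].u], place[edges[e].v]);
        if (len > worst) worst = len;
    }
    return worst;
}
```

For a 4-PE chain on a 2x2 grid, snake order {(0,0),(0,1),(1,1),(1,0)} gives dilation 1, while row-major order gives 2 because PE 1 and PE 2 end up diagonal.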
14
Phase 2: map virtual PEs to physical PEs
Guest → embedding algorithm (H-tree embedding, linear embedding, or direct map embedding) → Host
1. Zienicke, P. 1990. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90).
2. Aleliunas, R., and Rosenberg, A.L. 1982. On Embedding Rectangular Grids in Square Grids. (Computers '82).
3. Berman, F., and Snyder, L. 1987. On Mapping Parallel Algorithms into Parallel Architectures. (PDC '87).
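As one example of a linear embedding, a chain of virtual PEs can be laid onto a rows x cols grid in snake (boustrophedon) order so that consecutive PEs are always grid neighbors. This standard construction is chosen for illustration; it is not necessarily the linear embedding used in the cited work, and snake_position is a hypothetical name.

```c
#include <assert.h>

typedef struct { int row, col; } GridPos;

/* Place virtual PE i of a chain on a grid with `cols` columns:
   even rows run left to right, odd rows right to left, so PE i and
   PE i+1 always differ by one grid hop (dilation 1). */
GridPos snake_position(int i, int cols) {
    assert(i >= 0 && cols > 0);
    GridPos p;
    p.row = i / cols;
    p.col = (p.row % 2 == 0) ? i % cols : cols - 1 - i % cols;
    return p;
}
```

With cols = 4, PE 3 lands at (0,3) and PE 4 directly below it at (1,3), so the row change never costs an extra hop.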
15
2D grid of physical PEs
Bypass FPGA placement
[Figure: PE grid on the FPGA]
(Phase 1 may require "graph folding" first to reduce the number of PEs)
16
Compare/backup: simulated annealing
Cost function: $C = w_1 \cdot \mathrm{Sum} + w_2 \cdot \mathrm{Max} + w_3 \cdot \mathrm{Gaps}$
  • Sum: sum of wire distances
  • Max: max wire length (Euclidean distance)
  • Gaps: wires across architectural features
Neighbor function: swap PEs based on distance to neighbors
[Figure: example swap of PEs P1 and P2]
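A compact C sketch of the simulated-annealing comparison, under stated simplifications: the cost keeps only the w1·Sum and w2·Max terms (Gaps is omitted), every wire length is Euclidean, and the neighbor function swaps two randomly chosen PEs rather than choosing by neighbor distance. The cooling schedule (T0 = 10, factor 0.999), the xorshift PRNG, and the libm-free sqrt/exp helpers are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int u, v; } Edge;   /* wire between virtual PEs u and v */

/* xorshift32 PRNG: deterministic and reproducible (seed must be nonzero). */
static uint32_t rng_next(uint32_t *s) {
    *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
    return *s;
}
static double rng_unit(uint32_t *s) { return rng_next(s) / 4294967296.0; }

/* libm-free helpers: Newton iteration for sqrt, and exp(-y) approximated
   by (1 - y/1024)^1024, accurate enough for acceptance decisions. */
static double sqrt_newton(double v) {
    if (v <= 0) return 0;
    double r = v > 1 ? v : 1;
    for (int i = 0; i < 40; i++) r = 0.5 * (r + v / r);
    return r;
}
static double exp_neg(double y) {
    if (y >= 1024) return 0;
    double b = 1 - y / 1024;
    for (int i = 0; i < 10; i++) b *= b;   /* b^(2^10) = b^1024 */
    return b;
}

/* C = w1*Sum + w2*Max with Euclidean wire lengths (Gaps omitted).
   slot[p] is PE p's grid slot: row = slot / cols, col = slot % cols. */
double placement_cost(const Edge *e, int ne, const int *slot, int cols,
                      double w1, double w2) {
    double sum = 0, max = 0;
    for (int k = 0; k < ne; k++) {
        int a = slot[e[k].u], b = slot[e[k].v];
        double dr = a / cols - b / cols, dc = a % cols - b % cols;
        double d = sqrt_newton(dr * dr + dc * dc);
        sum += d;
        if (d > max) max = d;
    }
    return w1 * sum + w2 * max;
}

/* Propose swapping two PEs' slots, always accept improvements, accept a
   worse cost with probability exp(-delta/T), and cool T geometrically.
   Returns the final cost; slot holds the final placement. */
double anneal(const Edge *e, int ne, int *slot, int npes, int cols,
              double w1, double w2, int iters, uint32_t seed) {
    assert(npes >= 2 && seed != 0);
    uint32_t s = seed;
    double T = 10.0, cost = placement_cost(e, ne, slot, cols, w1, w2);
    for (int it = 0; it < iters; it++, T *= 0.999) {
        int p = (int)(rng_next(&s) % (uint32_t)npes);
        int q = (int)(rng_next(&s) % (uint32_t)npes);
        if (p == q) continue;
        int tmp = slot[p]; slot[p] = slot[q]; slot[q] = tmp;   /* propose */
        double c = placement_cost(e, ne, slot, cols, w1, w2);
        if (c <= cost || rng_unit(&s) < exp_neg((c - cost) / T)) {
            cost = c;                                          /* accept */
        } else {
            tmp = slot[p]; slot[p] = slot[q]; slot[q] = tmp;   /* undo */
        }
    }
    return cost;
}
```

Undoing rejected swaps keeps the PE-to-slot map a valid permutation throughout, so every intermediate state is a legal placement.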
17
Results
[Figure: resulting placements for simulated annealing placement, no placement strategy, and embedding placement; panel notes: 4 generations shown, 5 generations shown, 5 generations shown]
18
Results
Not routable
2D Neuron model, 256 PEs, Xilinx Virtex6 (SA = simulated annealing, Embed = embedding placement)

Strategy   Total power (mW)   Dynamic power (mW)   Static power (mW)
None       15525              8744                 6481
SA         16604              10013                6590
Embed      19859              12999                6859

Strategy   LUTs    BRAM   DSP   Equivalent LUTs
None       58362   512    256   306682
SA         58567   512    256   306887
Embed      58569   512    256   306889

No impact on size
20% more power
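A quick arithmetic check of the two annotations against the table, using the total-power and LUT columns. The slide does not say which baseline the 20% refers to; the numbers fit the SA baseline (about 19.6%) more closely than None (about 27.9%), while the LUT increase over None is about 0.35%. percent_increase is a hypothetical helper.

```c
#include <assert.h>

/* Relative increase of `value` over `baseline`, in percent. */
double percent_increase(double value, double baseline) {
    assert(baseline > 0);
    return 100.0 * (value - baseline) / baseline;
}
```

For example, percent_increase(19859, 16604) compares Embed with SA total power, and percent_increase(58569, 58362) compares Embed with None LUT usage.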
19
Graph emb (Gen PEs)
Speedup vs. real time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x
Miller, B., F. Vahid, and T. Givargis.
Embedding-Based Placement of Processing element
Networks on FPGAs for Physical Model Simulation.
ACM Int. Symp. on FPGAs, 2013.
20
Custom Processing Element
  • Custom datapath to solve a specific type of equation

V F1 F2 F P1-P2-(FCR)CL
Custom PE for each ODE type
Modified synthesis tool to first create custom PEs for the given ODEs, then synthesize ODEs to PEs
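A software analogy of the general-vs-custom trade-off, purely illustrative and not the authors' PE architecture: the general PE interprets a tiny instruction list (any equation, but fetch/decode per operation), while the custom PE hard-wires one equation. The opcodes, names, and the ODE dV/dt = F1 - F2 are hypothetical.

```c
#include <assert.h>

typedef enum { OP_LOAD, OP_ADD, OP_SUB, OP_MUL } Opcode;
typedef struct { Opcode op; int src; } Instr;   /* src indexes the input vector */

/* General PE: an accumulator machine that can evaluate any right-hand
   side expressible in its instruction set, one instruction per step. */
double general_pe(const Instr *prog, int len, const double *in) {
    assert(len > 0 && prog[0].op == OP_LOAD);
    double acc = 0;
    for (int pc = 0; pc < len; pc++) {
        double v = in[prog[pc].src];
        switch (prog[pc].op) {
        case OP_LOAD: acc = v;  break;
        case OP_ADD:  acc += v; break;
        case OP_SUB:  acc -= v; break;
        case OP_MUL:  acc *= v; break;
        }
    }
    return acc;
}

/* Custom PE for one hypothetical ODE type, dV/dt = F1 - F2:
   the whole "datapath" is a single subtraction, no instruction overhead. */
double custom_pe_volume(double f1, double f2) { return f1 - f2; }
```

Both compute the same derivative; the custom version trades flexibility for fewer steps per evaluation, mirroring the fast/inflexible versus slow/flexible split on the next slide.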
21
Custom PEs
[Chart: performance (ms) on the Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic models; legend: PC(1), PC(4), GPU, HLS; off-scale bar labels: 1522, 1490, 1430, 1184]
Speedup vs. real time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x
Huang, Vahid, Givargis. Synthesis of networks of
custom processing elements for real-time physical
system emulation. Transactions on Design
Automation of Electronic Systems (TODAES), 2013
(to appear).
22
Networks of Heterogeneous PEs
  • General PE
      • Slow, flexible (can solve any type of ODE)
  • Custom PE
      • Fast, inflexible (solves only one type of ODE)
  • Multi-type PE
      • Combines multiple types of ODEs into a single custom PE
  • Huge solution space
      • How to choose types of PEs?
      • How many PEs to allocate?
      • How to bind ODEs to PEs?

Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES/ISSS 2012, Oct. 2012.
23
Automatic allocation and binding
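One simple way to bind ODEs to an already-allocated mix of PEs is greedy list scheduling: give each ODE to the compatible PE that would finish it earliest. This is a hypothetical heuristic for illustration, not the synthesis algorithm from the cited paper; the capability masks and cycle counts are made-up parameters, and allocation (how many PEs of each kind) is left to the caller, e.g. by trying several candidate mixes and keeping the smallest makespan.

```c
#include <assert.h>

#define NUM_TYPES 3   /* hypothetical number of ODE types */

/* A PE solves a set of ODE types (bitmask) at a per-type cycle cost.
   General PE: all types, slow. Custom PE: one type, fast. */
typedef struct {
    unsigned types;           /* bit t set => PE can solve ODE type t */
    int cycles[NUM_TYPES];    /* cycles per ODE of each type */
    int load;                 /* cycles assigned so far (output) */
} Pe;

/* Bind each ODE to the compatible PE that would finish it earliest.
   Returns the makespan (max PE load, i.e., cycles per timestep), or -1
   if some ODE has no compatible PE. binding[i] receives ODE i's PE. */
int bind_odes(const int *ode_type, int num_odes, Pe *pes, int num_pes,
              int *binding) {
    for (int p = 0; p < num_pes; p++) pes[p].load = 0;
    for (int i = 0; i < num_odes; i++) {
        int t = ode_type[i], best = -1;
        assert(t >= 0 && t < NUM_TYPES);
        for (int p = 0; p < num_pes; p++) {
            if (!(pes[p].types & (1u << t))) continue;   /* incompatible */
            if (best < 0 || pes[p].load + pes[p].cycles[t] <
                            pes[best].load + pes[best].cycles[t])
                best = p;
        }
        if (best < 0) return -1;
        pes[best].load += pes[best].cycles[t];
        binding[i] = best;
    }
    int makespan = 0;
    for (int p = 0; p < num_pes; p++)
        if (pes[p].load > makespan) makespan = pes[p].load;
    return makespan;
}
```

With one general PE (all types, 10 cycles each) and one custom PE (type 0 only, 2 cycles), the type-0 ODEs flow to the fast custom PE and the general PE absorbs the rest.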
24
Heterogeneous PEs
[Chart: performance (ms) on the Weibel, Neuron, Weibel gas, Weibel hemo, and Hemodynamic models; legend: PC(1), PC(4), GPU, HLS; off-scale bar labels: 1522, 1490, 1430, 1184]
Speedup vs. real time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x, Heterog. PE 34.5x
C. Huang, B. Miller, F. Vahid, T. Givargis.
Synthesis of Custom Networks of Heterogeneous
Processing Elements for Complex Physical System
Emulation. IEEE/ACM Conf on Hardware/Software
Codesign and System Synthesis (CODES/ISSS, part
of ESWEEK), Finland, Oct 2012.
25
Network of general/custom/heterogeneous PEs vs. HLS (regularity extraction)
[Chart: (Speed, Size) comparison; data labels: Heterogeneous PE (10x, 1.1x), HLS (7x, 0.85x), general PE (6x, 1.35x), custom PE]
26
Speedup / dollar
Heterogeneous PEs: 3x better than PC(4), 4.5x better than GPU
FPGA: easier to build custom interfaces
CPU (i7-950, Intel X58 board): $480
GPU (GTX460, i3-540, H55 board): $380
FPGA (Xilinx Virtex6 240T-2 board): $1800
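The speedup-per-dollar ratios can be reproduced from the average speedups on the earlier slides (Heterog. PE 34.5x, PC(4) 3.1x, GPU 1.6x) and the costs above, assuming each speedup is paired with the listed platform cost. speedup_per_dollar is a hypothetical helper.

```c
#include <assert.h>

/* Figure of merit: average speedup vs. real time per dollar of hardware. */
double speedup_per_dollar(double speedup, double cost_usd) {
    assert(cost_usd > 0);
    return speedup / cost_usd;
}
```

Dividing the FPGA figure (34.5 / 1800) by the PC(4) figure (3.1 / 480) gives about 2.97x, and by the GPU figure (1.6 / 380) about 4.55x, consistent with the 3x and 4.5x on the slide.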
27
Other projects
  • Assistive monitoring
      • www.cs.ucr.edu/vahid/assistivemonitoring/
      • http://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXs
  • Web-based learning
      • "Textbook is dead"
      • Multi-univ synergy
      • pcpp.zyante.com (C)
  • Embedded systems educ.
      • New prog. model, virtual lab, programmingembeddedsystems.com
      • Also riosscheduler.org
  • Drunk driving (DUI)
      • duicam.org
      • http://www.utsandiego.com/news/2013/feb/11/ucr-drunken-driving-app/

28
Summary
  • FPGAs: fastest cost-effective execution of physical models
      • http://www.youtube.com/watch?v=ThUKVhqoA3Q
  • Future
      • Manycore device
      • Beyond testing CPS
      • Implement end-products

Speedup vs. real time (avg): PC(1) 0.8x, PC(4) 3.1x, GPU 1.6x, HLS 3.2x, General PE 4.9x, Graph emb. (GPE) 11.2x, Custom PE 6.1x, Heterog. PE 34.5x (Graph emb. (HPE) 48.5x)
http://www.meti.com/
29
Questions?