Title: Power Efficient Rapid System Prototyping Using CoDeL: The 2D DWT Using Lifting
1Power Efficient Rapid System Prototyping Using CoDeL: The 2D DWT Using Lifting
- Nainesh Agarwal, Nikitas Dimopoulos
- University of Victoria, Canada
- PacRim, August, 2005
2Outline
- Motivation
- Power Dissipation
- Clock Gating
- Hardware Description Languages
- System Level Design Languages
- CoDeL
- Power Savings Analysis Framework
- Evaluation DWT
- Conclusion
3Motivation
- Increase in portable systems that run on batteries, such as cell phones, PDAs, digital cameras
- DSP techniques needed to process data, and transmit or display this data
- As processing algorithms become complex, power requirements increase
- Higher power requirements mean
- Shorter battery life
- Expensive cooling and packaging techniques, which may increase the size of the device
- Lower circuit density
- Shorter component life
4Motivation (contd.)
- Long design cycles for hardware architectures
- Can take up to a year for a team of engineers to develop an ASIC
- Emergence of System Level Design Languages (SLDLs)
- Do not address power dissipation
- Power efficient architecture design is tricky by hand and requires even longer lead times.
5Power Dissipation
- CMOS circuits
- Static Dissipation
- Steady state
- No Switching
- Dynamic Dissipation
- Switching
- Changes in digital state
6Static Dissipation
- Ideal static dissipation is zero
- Leakage through reverse-biased diodes at pn junctions
- Sub-threshold current when gate to source voltage is below the threshold
- Becoming significant
Source: Kursun and Friedman, Sleep Switch Dual Threshold Voltage Domino Logic with Reduced Standby Leakage Current. IEEE Trans. VLSI, Vol. 12, No. 5, May 2004.
7Dynamic Dissipation
- Short-circuit dissipation
- When both n- and p-type transistors are on for a brief moment, there is a short current pulse
- Not significant
- Current required to charge and discharge the capacitive load
- Significant
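The charge/discharge term is commonly modelled as P = a * C * Vdd^2 * f, where a is the switching activity. A minimal Python sketch of that standard model follows; the function name and all numeric values are illustrative assumptions, not figures from this talk.

```python
def dynamic_power(activity, c_load, vdd, f_clk):
    """Standard switching-power model: P = activity * C * Vdd^2 * f.

    activity : fraction of clock cycles in which the node toggles (0..1)
    c_load   : switched capacitance in farads
    vdd      : supply voltage in volts
    f_clk    : clock frequency in hertz
    """
    return activity * c_load * vdd ** 2 * f_clk


# Hypothetical node: 1 pF load, 1.5 V supply, 100 MHz clock.
always_switching = dynamic_power(1.0, 1e-12, 1.5, 100e6)
rarely_switching = dynamic_power(0.25, 1e-12, 1.5, 100e6)

# Power scales linearly with activity, which is the term clock gating targets.
ratio = rarely_switching / always_switching
```

Capacitance and frequency are usually fixed by the design, so lowering the activity of clocked nodes is the lever available at the architecture level.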
8Clock Gating
- Reduce dynamic power dissipation
- Reduce the clock switching activity
- Enable clock only when a useful write is needed
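The idea can be sketched with a small behavioural model in Python. It compares how many clock edges reach a register with and without gating; `run_register` and the sample data are hypothetical names and values chosen for illustration, not CoDeL output.

```python
def run_register(data, write_enable, gated):
    """Simulate one register over len(data) clock cycles.

    Returns (final_value, clock_edges_seen_by_register).
    Ungated: every clock edge reaches the register, which reloads its
    old value when no write is requested.
    Gated: the clock edge is only passed on in cycles with a useful write.
    """
    value = 0
    edges = 0
    for d, we in zip(data, write_enable):
        if gated:
            if we:
                edges += 1
                value = d
        else:
            edges += 1
            if we:
                value = d
    return value, edges


data = [1, 2, 3, 4, 5, 6]
write_enable = [False, True, False, False, True, False]

plain_value, plain_edges = run_register(data, write_enable, gated=False)
gated_value, gated_edges = run_register(data, write_enable, gated=True)
```

Both runs end with the same stored value, but the gated register sees only the edges that accompany a write.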
9Hardware Description Languages
- Describe the temporal and spatial behaviour of a circuit
- Common targets: ASIC and FPGA
- VHDL and Verilog
- Design at Register Transfer Level (RTL)
- Abstraction level too low
10System Level Design Languages
- Started late 1990s
- Provide a high level of abstraction for system development
- Categories
- Extend existing HDLs: SystemVerilog
- Extend existing software languages: SystemC, SpecC, Handel-C, JHDL
- Newly created languages: Rosetta, CoDeL
- Algorithmic level design
- Only CoDeL and Handel-C
[Figure: HDL (RTL) shown alongside Assembly Language, and SLDL alongside High Level Languages (C, Java). Labels: higher abstraction, fast development, easy to learn, platform independence]
11CoDeL - Overview
- CoDeL (Controller Description Language) targets specification and design at the behavioral level.
- Order of the statements implicitly represents the sequence of activities.
- Extracts the data and control flow from the program automatically, assigns the necessary hardware blocks and exploits inherent parallelism.
- Similar to the C language, so easy to learn.
- Includes a library of I/O protocols that simplify (sub)system interaction.
- Compiler produces synthesizable VHDL code which can be targeted to any technology including FPGA or ASIC.
12CoDeL Ports and Protocols
- CoDeL abstracts module interaction through ports and protocols.
- Protocols define the sequence of events necessary to transfer information from one module to another.
13CoDeL Simple Counter
14CoDeL Clock Gating
- Example shows write in state x
- Gate turned on in state x-1, off in state x+1
[Timing diagram: Clk, Enable, GClk and Data Latched across State x-1, State x and State x+1]
15Power Savings Analysis Framework
- Power saved
- Power saved in avoiding useless switching
- Power saved in avoiding clock switching
- Power required for clock gating (overhead)
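One plausible reading of the terms above, assuming the two savings add and the gating overhead is subtracted (the expression itself did not survive in this transcript, and the symbol names are assumptions):

```latex
P_{\text{saved}} = P_{\text{useless switching}} + P_{\text{clock switching}} - P_{\text{gating overhead}}
```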
16Evaluation 2D DWT
- Key component in JPEG2000 image compression
- Lossy compression using MIT 9/7 wavelet
- Lossless compression using Le Gall 5/3 integer-to-integer wavelet
- Integer to integer mapping
- No quantization needed
- Exact recovery of input signal
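The integer-to-integer property can be illustrated with a short Python sketch of the reversible 5/3 transform in lifting form. It assumes an even-length signal, whole-sample symmetric extension at the borders and the rounding used by the JPEG2000 reversible transform; the function names are hypothetical, and this is not the CoDeL implementation from the talk.

```python
def forward_53(x):
    """One level of the reversible Le Gall 5/3 transform on a list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    # Detail: odd sample minus the floored mean of its even neighbours.
    # At the right border x[n] is mirrored to x[n - 2].
    d = []
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        d.append(x[2 * k + 1] - ((x[2 * k] + right) // 2))
    # Approximation: even sample plus a rounded quarter of the neighbouring
    # details. At the left border d[-1] is mirrored to d[0].
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) // 4) for k in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the same steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) // 4)
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + ((x[2 * k] + right) // 2)
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]   # eight integer samples
low, high = forward_53(signal)
restored = inverse_53(low, high)     # equals signal exactly
```

Because the inverse subtracts exactly the rounded quantities the forward pass added, no quantization step is needed and the input is recovered bit for bit.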
17DWT Structure
- Successive pair of low-pass and high-pass filters, followed by factor 2 down-sampling
- Analysis stage decomposes, while synthesis reconstructs
- h0 is the low-pass filter and h1 is the high-pass filter
- Low-pass signal recursively decomposed for full, dyadic transform
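[Figure: two-channel filter bank. Analysis Filter Bank: x(n) passes through h0 and h1, each followed by down-sampling by 2. Synthesis Filter Bank: up-sampling by 2 followed by g0 and g1, combined to give x(n)]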
18DWT - Lifting
- Reduction in memory and computational complexity
- In-place computation of the wavelet coefficients
- Output is identical to a direct filter bank
convolution
[Figure: lifting scheme with Predict and Update steps]
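The equivalence with direct convolution can be checked numerically. The Python sketch below assumes the Le Gall 5/3 analysis filters for h0 and h1 (the slides do not list coefficients), omits integer rounding, uses whole-sample symmetric extension, and expects an even-length input; all function names are hypothetical.

```python
H0 = [-1 / 8, 2 / 8, 6 / 8, 2 / 8, -1 / 8]   # low-pass, centred on even samples
H1 = [-1 / 2, 1.0, -1 / 2]                   # high-pass, centred on odd samples


def mirror(i, n):
    """Whole-sample symmetric extension of an index into range(n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def analysis_by_convolution(x):
    """Filter with h0 and h1, keeping every second output (down-sampling by 2)."""
    n = len(x)
    low = [sum(H0[j] * x[mirror(2 * k + j - 2, n)] for j in range(5))
           for k in range(n // 2)]
    high = [sum(H1[j] * x[mirror(2 * k + j, n)] for j in range(3))
            for k in range(n // 2)]
    return low, high


def analysis_by_lifting(x):
    """Same decomposition as a predict step followed by an update step."""
    n = len(x)
    half = n // 2
    # Predict: odd samples from the mean of their even neighbours.
    d = [x[2 * k + 1] - 0.5 * (x[2 * k] + x[mirror(2 * k + 2, n)])
         for k in range(half)]
    # Update: even samples corrected by a quarter of the adjacent details.
    s = [x[2 * k] + 0.25 * (d[max(k - 1, 0)] + d[k]) for k in range(half)]
    return s, d


x = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
conv_low, conv_high = analysis_by_convolution(x)
lift_low, lift_high = analysis_by_lifting(x)
```

The lifting version reads each input sample in place and needs two multiplications per output pair, whereas the convolution evaluates a 5-tap and a 3-tap filter separately.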
19Implementation
20Code Complexity
- Analysis and synthesis filter bank modules
- 120 lines of CoDeL code each
- Generate about 1000 lines of VHDL code each
- DWT module
- 110 lines of CoDeL code
- Generates 560 lines of VHDL
- Synthesized on a Xilinx 2v2000ff896-4 FPGA
- About 7% area used
- Maximum clock frequency of 103 MHz
- Eight element DWT takes 3.9µs
21Power Savings Estimation
- No useless switching found
- Analysis and synthesis filter bank modules
- 85% area
- 17% power saved
- DWT modules
- 15% area
- 8% power saved
- Use area complexity as an approximation for power complexity
- 16% total power saved
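The total can be reproduced as an area-weighted average of the per-module savings. The Python sketch below uses the figures listed above; the function name is a hypothetical helper, and weighting by area is the approximation stated on this slide.

```python
def area_weighted_saving(modules):
    """modules: list of (area_percent, power_saved_percent) pairs."""
    total_area = sum(area for area, _ in modules)
    return sum(area * saving for area, saving in modules) / total_area


modules = [
    (85.0, 17.0),   # analysis and synthesis filter bank modules
    (15.0, 8.0),    # DWT modules
]
total = area_weighted_saving(modules)
print(f"{total:.2f}% total power saved")   # prints 15.65% total power saved
```

Rounded to the nearest percent this gives the 16% quoted above.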
22Future Work
- Clock gating
- Verify analytical framework using simulation and ASIC implementation
- Efficient clock gating mechanism
- CoDeL compiler
- Automated clock gating
- Register and state reuse
- Allow explicit parallelism (similar to technique used in OpenMP and Handel-C)
23Questions