Low-Energy Multi-User VSELP Vocoder for a Domain Specific Reconfigurable DSP Architecture

Slides: 21

Provided by: royas


1
Low-Energy Multi-User VSELP Vocoder for a Domain Specific Reconfigurable DSP Architecture
Roy A. Sutton
University of California, Berkeley
CS 294-1, Spring 1998
2
Typical Digital Cellular Phone
[Block diagram; labels: ADC, Encode, Modulate, Duplex, Controller, Demodulate, Decode, DAC]
Use a CODEC to compress the data stream!
3
Typical Digital Base Station
[Block diagram; labels: PBX, CH, R/F, ...]
Can we replace multiple CODECs with a single one and save space? Cost? Energy?
4
How to Save Energy?

Eliminate idle time [Weiser et al., OSDI-94]
Reduce energy per operation [Chandrakasan et al., ISLPED-96]
Work just fast enough, and use the most efficient processing for the most repetitious work!

5
CODEC Data Stream Observations

Users don't talk continuously (data is sporadic and bursty)
During silence, very little processing is required
There is no benefit from processing data faster than required
Data arrives in packets, which need to be processed just fast enough for steady state

6
Which CODEC? Answer: VSELP

Compression factor: 8x (64 kbps to 8 kbps)
Frame size: 160 bits

Output rate: 20 ms per frame (160 bits / 8 kbps)
Each frame is divided into 4 sub-frames
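
The frame arithmetic above can be checked with a few lines of Python. This is only a sketch of the slide's numbers; the constant names are illustrative, and the per-sub-frame values are derived by even division, which is an assumption.

```python
# Frame timing for the 8 kbps VSELP stream, using the figures from the slide.
INPUT_RATE_BPS = 64_000    # uncompressed speech rate
OUTPUT_RATE_BPS = 8_000    # compressed rate
FRAME_BITS = 160           # bits per compressed frame
SUBFRAMES_PER_FRAME = 4    # sub-frames per frame

compression_factor = INPUT_RATE_BPS / OUTPUT_RATE_BPS         # 64k / 8k
frame_period_ms = 1000 * FRAME_BITS / OUTPUT_RATE_BPS         # 160 bits at 8 kbps
# Assumption: sub-frames split the frame evenly.
subframe_bits = FRAME_BITS // SUBFRAMES_PER_FRAME
subframe_period_ms = frame_period_ms / SUBFRAMES_PER_FRAME

print(compression_factor, frame_period_ms, subframe_bits, subframe_period_ms)
```

Under the even-split assumption, each sub-frame carries 40 bits and must be produced every 5 ms, which is the deadline the rest of the deck works against.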

7
What is the Repetitious Part?
8
VSELP Repetitious Part (cont.)

5% of the code runs 70% of the time and accounts for 92% of hits (ignoring theta)
This 5% is found in two functions: 1) dot_product 2) iiRfilter
Make sure these two computations are efficient!
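
For readers unfamiliar with the two kernels, here is a minimal reference sketch in Python. The slide gives only the function names; the signatures, the direct-form I structure, and the convention a[0] == 1 are assumptions, not the vocoder's actual code.

```python
def dot_product(a, b):
    """Sum of element-wise products: one multiply-accumulate per element."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def iir_filter(b, a, x):
    """Direct-form I IIR filter (assumed structure, a[0] == 1):

    y[n] = sum_k b[k] * x[n-k]  -  sum_{k>=1} a[k] * y[n-k]
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):          # feed-forward taps
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):       # feedback taps
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


print(dot_product([1, 2, 3], [4, 5, 6]))
print(iir_filter([1.0], [1.0, -0.5], [1, 0, 0, 0]))
```

Both inner loops are chains of multiply-accumulate steps over regularly addressed data, which is why they are natural candidates for dedicated hardware.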

9
HW Implementation Strategy

Run the repetitious part (the 5%) on special hardware optimized for these two functions
Run the remaining part (the 95%) on a general-purpose processor
Use the Pleiades architecture template as a guide to select an architecture instance [Pleiades]

10
HW for dot_product and iiRfilter

Both can be implemented using address generators (Add Gen), memory elements, and MAC units
dot_product: 2 Add Gen, 2 Memory, 1 MAC
iiRfilter: 3 Add Gen, 3 Memory, 1 MAC
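
To make the dot_product mapping concrete, the following Python sketch models two address generators streaming operands out of two memories into one MAC unit. The class names and interfaces are hypothetical and purely behavioral; they are not the Pleiades satellite interfaces.

```python
class AddressGenerator:
    """Hypothetical satellite: emits start, start + stride, ... (count values)."""
    def __init__(self, start, stride, count):
        self.start, self.stride, self.count = start, stride, count

    def __iter__(self):
        return (self.start + i * self.stride for i in range(self.count))


class Memory:
    """Hypothetical satellite: a word-addressed read-only store."""
    def __init__(self, words):
        self.words = list(words)

    def read(self, addr):
        return self.words[addr]


class MAC:
    """Hypothetical satellite: multiply-accumulate with a running sum."""
    def __init__(self):
        self.acc = 0

    def step(self, x, y):
        self.acc += x * y


def dot_product_on_satellites(mem_a, mem_b, length):
    # 2 Add Gen + 2 Memory + 1 MAC, as listed on the slide.
    gen_a = AddressGenerator(0, 1, length)
    gen_b = AddressGenerator(0, 1, length)
    mac = MAC()
    for addr_a, addr_b in zip(gen_a, gen_b):
        mac.step(mem_a.read(addr_a), mem_b.read(addr_b))
    return mac.acc


print(dot_product_on_satellites(Memory([1, 2, 3]), Memory([4, 5, 6]), 3))
```

The iiRfilter mapping would add a third address generator and memory, plausibly for the output history, though the slide does not say how the three are assigned.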

11
Architecture Instance
[Block diagram: 1 CPU, 5 Memory, 5 Add Gen, and 2 MAC blocks attached to an Interconnect Network]
1 CPU, 12 satellites, networked: a Domain Specific Reconfigurable DSP Architecture!
12
HW Simplifying Assumptions

Satellite and network configuration time is ignored
The network is configured once at startup (statically)
Satellites may be configured at run time (dynamically)
Hardware performance tracks a 12-gate ring oscillator over supply voltage (1 to 3 volts)
Satellites can be accessed by only one thread at a time
The network consumes no energy!

13
Processing 1 Stream

The stream is processed by a single thread
Monitor the stream's input buffer level
Adjust the task priority and hardware throughput as required
The thread uses satellite processors for repetitive code

[Diagram labels: Q, scheduler, bl1, tp, p1, Q]
14
Processing 4 Streams
[Diagram labels: Q, hw]
Threads compete for satellites; priority is now important
15
Thread Scheduling

Existing
Use preemptive multithreaded scheduling
Dispatch the highest-priority thread for the next time slice
Extensions
Dynamically adjust each thread's priority based on its workload
Dynamically adjust total hardware throughput based on aggregate workload
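
A minimal sketch of the dispatch rule with workload-driven priorities might look like the Python below. It assumes that "workload" means the number of bits waiting in a stream's input queue and that one time slice drains 40 bits; both are illustrative choices, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamThread:
    name: str
    queue_level: int = 0   # bits waiting in this stream's input buffer
    priority: int = 0


def update_priorities(threads):
    """Extension: priority follows workload (assumed here: the queue level)."""
    for t in threads:
        t.priority = t.queue_level


def dispatch(threads):
    """Pick the highest-priority thread; ties go to the earliest in the list."""
    return max(threads, key=lambda t: t.priority)


def run_time_slices(threads, slices, bits_per_slice=40):
    """Re-evaluate priorities every slice, so a thread can be preempted."""
    order = []
    for _ in range(slices):
        update_priorities(threads)
        t = dispatch(threads)
        t.queue_level = max(0, t.queue_level - bits_per_slice)
        order.append(t.name)
    return order


threads = [StreamThread("a", 120), StreamThread("b", 80), StreamThread("c", 0)]
print(run_time_slices(threads, 4))
```

Because priorities are recomputed each slice, a stream whose queue is filling overtakes one that has been drained, without any fixed priority assignment.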

16
Priority and Throughput Adaptation

4 performance levels (TP)
For a given required f_sample, pick the voltage via a lookup table (LUT)
Sub-frames mark the queue levels (0, 40, 80, 120, 160)
Adjust processor throughput and task priority by viewing queue levels
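
One way to read this slide is that the sub-frame marks split the queue into four bands, one per performance level, and each level indexes a voltage table. The Python sketch below follows that reading. The band boundaries' inclusive/exclusive handling and the voltage values are assumptions: the voltages are placeholders chosen inside the 1 to 3 volt range mentioned earlier, not measured data.

```python
import bisect

# Interior sub-frame marks; 0 and 160 are the empty and full ends of the queue.
QUEUE_MARKS = [40, 80, 120]

# Hypothetical LUT: performance level -> supply voltage (placeholder values).
VOLTAGE_LUT = [1.0, 1.7, 2.3, 3.0]


def performance_level(queue_bits):
    """Map a queue level (0..160 bits) to one of 4 performance levels (0..3)."""
    if not 0 <= queue_bits <= 160:
        raise ValueError("queue level must be between 0 and 160 bits")
    return bisect.bisect_right(QUEUE_MARKS, queue_bits)


def supply_voltage(queue_bits):
    """Look up the supply voltage for the current queue level."""
    return VOLTAGE_LUT[performance_level(queue_bits)]


for level in (0, 40, 119, 160):
    print(level, performance_level(level), supply_voltage(level))
```

A fuller queue selects a higher level and therefore a higher voltage, so the hardware speeds up only when the stream is at risk of falling behind.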

17
Satellite Access Simulation Trace
18
Results
19
Conclusions

A sporadic data stream with fixed throughput can be computed with reduced energy by stretching computation in time
Using specialized processors for repetitious computation can reduce time and energy
Multiple sporadic data streams can be viewed as a single stream with an aggregate duty cycle, at slight overhead
Always compute using the maximum time allowable (and adjust processor throughput) to minimize energy
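
The energy argument can be illustrated with the standard first-order CMOS model, in which dynamic switching energy per operation scales with the square of the supply voltage. The Python sketch below assumes that model, ignores leakage and idle energy, and uses illustrative voltages; it also assumes the lower voltage is still fast enough to finish within the frame period.

```python
def dynamic_energy(n_ops, voltage, capacitance=1.0):
    """First-order model: each operation costs capacitance * voltage**2."""
    return n_ops * capacitance * voltage ** 2


def relative_energy(stretched_voltage, fast_voltage, n_ops=1000):
    """Energy of the stretched run as a fraction of the fast run.

    Both runs perform the same number of operations; only the voltage differs.
    """
    return (dynamic_energy(n_ops, stretched_voltage)
            / dynamic_energy(n_ops, fast_voltage))


# Illustrative: finish early at 3.0 V and idle, versus fill the frame at 1.5 V.
print(relative_energy(1.5, 3.0))
```

In this model, halving the voltage cuts the energy for the same work to one quarter, which is why finishing early and idling is the more expensive option.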

20
Future Work

Account for interconnect network energy consumption
Investigate critical instance adaptation behavior and adaptation transients
Account for satellite and network configuration time
Hide the configuration time of satellites by setting up the next operation during the current one
Consider different satellite selections / configurations
When do energy reduction returns diminish?
What about sensitivity to the thread time slice?