Low-Energy Multi-User VSELP Vocoder for a Domain Specific Reconfigurable DSP Architecture

1
Low-Energy Multi-User VSELP Vocoder for a Domain
Specific Reconfigurable DSP Architecture
Roy A. Sutton University of California, Berkeley
CS 294-1 Spring 1998
2
Typical Digital Cellular phone
[Block diagram; labels: ADC, Encode (x2), Modulate, Duplex, Controller, Demodulate, Decode, DAC]
Use CODEC to compress the data stream!
3
Typical Digital Base Station
[Block diagram; labels: PBX, CH, R/F, ...]
Can we replace multiple CODECs with a single one and
save space? Cost? Energy?
4
How to Save Energy?
  • Eliminate idle time [Weiser et al., OSDI-94]
  • Reduce energy per op [Chandrakasan et al.,
    ISLPED-96]
  • Work just fast enough, and use the most efficient
    processing for the most repetitious work!
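
As a rough illustration of "work just fast enough", the sketch below assumes that dynamic CMOS energy per operation scales with the square of the supply voltage, and it uses a purely hypothetical set of operating points (the voltages and relative throughputs are illustrative assumptions, not data from the presentation). It picks the lowest voltage that still meets a required throughput and reports the energy per operation relative to full voltage.

```python
# Hypothetical operating points: supply voltage (V) -> relative throughput.
# These numbers are illustrative assumptions, not measured data.
OPERATING_POINTS = {3.0: 1.0, 2.0: 0.5, 1.5: 0.25}


def relative_energy(voltage, v_max=3.0):
    """Energy per operation relative to v_max, assuming energy ~ V^2."""
    return (voltage / v_max) ** 2


def slowest_sufficient_voltage(required_throughput):
    """Return the lowest voltage whose throughput still meets the demand."""
    candidates = [v for v, tp in OPERATING_POINTS.items()
                  if tp >= required_throughput]
    return min(candidates)


if __name__ == "__main__":
    v = slowest_sufficient_voltage(0.5)
    print(v, round(relative_energy(v), 3))
```

Under these assumed numbers, a workload that needs only half of peak throughput can run at 2.0 V and spend roughly 44% of the full-voltage energy per operation, instead of finishing early at 3.0 V and idling.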

5
CODEC Data Stream Observations
  • Users don't talk continuously (data is sporadic
    and bursty)
  • During silence, very little processing is
    required
  • No benefit from processing data faster than
    required
  • Data arrives in packets, which need to be
    processed just fast enough for steady state

6
Which CODEC? Answer: VSELP
  • Compression factor: 8x (64 kbps to 8 kbps)
  • Frame size: 160 bits
  • Output rate: 20 ms / frame (160 b / 8 kbps)
  • Each frame is divided into 4 sub-frames
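
The figures on this slide can be cross-checked with a few lines of arithmetic. The constants come from the slide; the variable names, and the assumption that the four sub-frames are equal in size, are only for illustration.

```python
INPUT_RATE_BPS = 64_000    # uncompressed speech rate, from the slide
OUTPUT_RATE_BPS = 8_000    # VSELP output rate, from the slide
FRAME_BITS = 160           # compressed frame size, from the slide
SUBFRAMES = 4              # sub-frames per frame, from the slide

compression = INPUT_RATE_BPS / OUTPUT_RATE_BPS
frame_period_ms = 1000 * FRAME_BITS / OUTPUT_RATE_BPS
subframe_bits = FRAME_BITS // SUBFRAMES          # assumes equal sub-frames
subframe_period_ms = frame_period_ms / SUBFRAMES
input_bits_per_frame = INPUT_RATE_BPS * frame_period_ms / 1000

if __name__ == "__main__":
    print(compression, frame_period_ms, subframe_bits,
          subframe_period_ms, input_bits_per_frame)
```

With equal sub-frames, each one carries 40 bits and spans 5 ms, which matches the queue-level marks (0, 40, 80, 120, 160) used on slide 16.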

7
What is the Repetitious Part?
8
VSELP Repetitious Part (cont)
  • 5% of code runs 70% of time and 92% of hits
    (ignoring theta)
  • The 5% is found in two functions: 1) dot_product,
    2) iiRfilter
  • Make sure these two computations are efficient!

9
HW Implementation Strategy
  • Run the repetitious part (the 5%) on special
    hardware optimized for these two functions
  • Run the remaining part (the 95%) on a general
    purpose processor
  • Use the Pleiades architecture template as a guide
    to select an architecture instance [Pleiades]
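
A back-of-the-envelope bound on what this split can buy follows from Amdahl's law, using the 70% run-time share reported on slide 8. The acceleration factor of the special hardware is not given in the presentation, so `accel_factor` below is a hypothetical parameter.

```python
def overall_speedup(accel_fraction, accel_factor):
    """Amdahl's law: speedup when accel_fraction of the run time
    is made accel_factor times faster and the rest is unchanged."""
    remaining = 1.0 - accel_fraction
    return 1.0 / (remaining + accel_fraction / accel_factor)


if __name__ == "__main__":
    # 70% of run time is in dot_product and iiRfilter (slide 8);
    # the factor of 10 is an assumed value for illustration only.
    print(round(overall_speedup(0.70, 10.0), 2))
```

Even with arbitrarily fast satellites, the 30% of run time left on the general purpose processor caps the overall speedup at 1 / 0.3, or about 3.3x.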

10
HW for dot_product and iiRfilter
  • Can implement using address generators, memory
    elements, and MAC units
  • dot_product: 2 Add Gen, 2 Memory, 1 MAC
  • iiRfilter: 3 Add Gen, 3 Memory, 1 MAC
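
To show why both kernels reduce to multiply-accumulate (MAC) loops over addressed memory, here is a minimal software sketch. The presentation does not show the VSELP source, so the signatures, the coefficient sign convention, and the direct-form structure of `iir_filter` are assumptions for illustration, not the original `dot_product` and `iiRfilter` routines.

```python
def dot_product(a, b):
    """Sum of element-wise products: one MAC per element pair."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y                      # multiply-accumulate
    return acc


def iir_filter(x, b, a):
    """Direct-form IIR filter (assumed convention):
    y[n] = sum_k b[k] * x[n-k] + sum_k a[k] * y[n-1-k]
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):        # feed-forward taps
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a):        # feedback taps
            if n - 1 - k >= 0:
                acc += ak * y[n - 1 - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
    print(iir_filter([1.0, 0.0, 0.0, 0.0], [1.0], [0.5]))
```

In both functions the inner statement is a single multiply-accumulate, and the remaining work is index arithmetic and array reads and writes, which is the work the address generators and memory elements would take over from the CPU.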

11
Architecture Instance
[Block diagram: one CPU plus 5 Memory, 5 Add Gen, and 2 MAC satellites, all attached to an Interconnect Network]
1 CPU, 12 Satellites, Networked: A Domain Specific
Reconfigurable DSP Architecture!
12
HW Simplifying Assumptions
  • Satellite and network configuration time is
    ignored
  • Network is configured once at startup
    (statically)
  • Satellites may be run-time configured
    (dynamically)
  • Hardware performance tracks a 12-gate ring
    oscillator over voltage (1 to 3 volts)
  • Satellites can be accessed by one thread at a
    time
  • The network consumes no energy!

13
Processing 1 Stream
  • Stream processed by a single thread
  • Monitor the stream input buffer level
  • Adjust the task priority and hardware throughput
    as required
  • Thread uses Satellite processors for repetitive
    code

[Diagram; labels: Q, scheduler, bl1, tp, p1, Q]
14
Processing 4 Streams
[Diagram; labels: Q (x2), hw (x6)]
Threads compete for Satellites: priority is now
important
Thread Scheduling
  • Exists
    • Use preemptive multithreaded scheduling
    • Dispatch the highest priority thread for the
      next time slice
  • Extensions
    • Dynamically adjust each thread's priority based
      on its workload
    • Dynamically adjust total hardware throughput
      based on aggregate workload
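
The first extension can be sketched as a small priority policy. The presentation does not give the actual rule, so the mapping from backlog to priority, the `StreamThread` type, and the function names below are hypothetical; the only values taken from the slides are the 40-bit sub-frame size and the 160-bit frame.

```python
from dataclasses import dataclass

SUBFRAME_BITS = 40    # 160-bit frame divided into 4 sub-frames
MAX_PRIORITY = 4


@dataclass
class StreamThread:
    name: str
    queue_bits: int       # bits waiting in this stream's input buffer
    priority: int = 0


def update_priority(thread):
    """Assumed policy: one priority step per buffered sub-frame."""
    thread.priority = min(thread.queue_bits // SUBFRAME_BITS, MAX_PRIORITY)


def dispatch(threads):
    """Refresh priorities, then pick the highest for the next time slice."""
    for t in threads:
        update_priority(t)
    return max(threads, key=lambda t: t.priority)


if __name__ == "__main__":
    streams = [StreamThread("s1", 10), StreamThread("s2", 130),
               StreamThread("s3", 85)]
    print(dispatch(streams).name)
```

Tying priority to backlog means a stream that is close to overflowing its buffer wins the next slice, while streams in silence drop to the lowest priority and stay out of the way.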

16
Priority and Throughput Adaptation
  • 4 Performance Levels (TP)
  • For the required f_sample, pick the voltage via LUT
  • Sub-frames mark queue levels (0, 40, 80, 120,
    160)
  • Adjust processor throughput and task priority by
    viewing queue levels
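
One way to read this slide is as a two-step lookup: queue level to performance level, then performance level to supply voltage. The sketch below follows that reading. The queue marks are from the slide; the voltage values, the choice to drive the level from the fullest queue, and all names are assumptions, since the presentation does not list the LUT contents.

```python
QUEUE_MARKS = (0, 40, 80, 120, 160)    # sub-frame boundaries, in bits

# Hypothetical LUT: performance level -> supply voltage in volts,
# chosen inside the 1 to 3 volt range mentioned on slide 12.
VOLTAGE_LUT = {1: 1.0, 2: 1.5, 3: 2.0, 4: 3.0}


def performance_level(queue_bits):
    """Map a queue fill (bits) to one of 4 performance levels."""
    queue_bits = max(QUEUE_MARKS[0], min(queue_bits, QUEUE_MARKS[-1]))
    for level in range(1, len(QUEUE_MARKS)):
        if queue_bits <= QUEUE_MARKS[level]:
            return level
    return len(QUEUE_MARKS) - 1


def select_voltage(queue_fills):
    """Assumed policy: the fullest queue sets the throughput level."""
    level = max(performance_level(q) for q in queue_fills)
    return level, VOLTAGE_LUT[level]


if __name__ == "__main__":
    print(select_voltage([10, 90, 30]))
```

Using the fullest queue is a conservative choice; an average or a sum over streams would track aggregate workload more closely but could let a single busy stream fall behind.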

17
Satellite Access Simulation Trace
18
Results
19
Conclusions
  • A sporadic data stream with fixed throughput can
    be computed with reduced energy by stretching
    computation in time
  • Using specialized processors for repetitive
    computation can reduce time and energy
  • Multiple sporadic data streams can be viewed as a
    single stream with aggregate duty, at slight
    overhead
  • Always compute using the maximum time allowable
    (and adjust processor throughput) to minimize
    energy

20
Future Work
  • Account for interconnect network energy
    consumption
  • Investigate critical instance adaptation behavior
    and adaptation transients
  • Account for satellite and network configuration
    time
  • Hide configuration time of satellites by setting
    up the next operation during the current one
  • Consider different satellite selection /
    configuration
  • When do energy reduction returns diminish?
  • What about sensitivity for thread time slice?