Design of a High-Throughput Low-Power IS95 Viterbi Decoder (PowerPoint presentation)


1
Design of a High-Throughput Low-Power IS95
Viterbi Decoder
  • Xun Liu, Marios C. Papaefthymiou
  • Advanced Computer Architecture Laboratory
  • Electrical Engineering and Computer Science Department
  • University of Michigan

2
3
4
IS95 Convolutional Encoding
  • Used in the reverse link of the IS95 CDMA system
  • 256 states (8 memory elements, constraint length K = 9)
  • Rate 1/3
  • Maximum-free-distance code
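A minimal Python sketch of this encoder is given below. The generator polynomials 557, 663 and 711 (octal) are the values commonly quoted for the IS95 reverse-link rate-1/3 code, and the bit ordering (newest input bit in the most significant tap position) is an assumption; verify both against the IS95 specification before relying on bit-exact output. The encoder starts in the all-zero state and appends 8 zero tail bits so that it also ends there.

```python
# Sketch of a rate-1/3, constraint-length-9 (256-state) convolutional encoder.
# ASSUMPTIONS: generators 557, 663, 711 (octal) as commonly cited for the IS95
# reverse link; newest input bit occupies the MSB of the 9-bit window.
K = 9                                   # constraint length -> 2**(K-1) = 256 states
GENERATORS = (0o557, 0o663, 0o711)


def parity(x: int) -> int:
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode(bits, generators=GENERATORS, k=K):
    """Encode a bit sequence; each input bit yields len(generators) code bits.

    Starts in the all-zero state and appends k-1 zero tail bits so the
    encoder also ends in the all-zero state.
    """
    reg = 0                                      # k-bit window, newest bit in the MSB
    out = []
    for b in list(bits) + [0] * (k - 1):
        reg = (b << (k - 1)) | (reg >> 1)        # shift the new bit in
        out.extend(parity(reg & g) for g in generators)
    return out


print(len(encode([1, 0, 1, 1])), sum(encode([1])))   # prints 36 18
```

Each input bit yields three code bits, so 4 message bits plus 8 tail bits give 36 code bits. A single 1 produces the impulse response, whose weight equals the total number of generator taps (7 + 6 + 5 = 18 here).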

5
Viterbi Decoding (VD)
  • VD is optimal for convolutional codes:
  • Maximum-likelihood decoding scheme.
  • Minimum sequence error probability on an additive white Gaussian noise (AWGN) channel.
  • VD procedure:
  • Construct a complex graph called the trellis.
  • Compute the shortest path through the trellis.
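The two steps above can be sketched as a small hard-decision decoder in Python. This is an illustrative sketch, not the decoder described in these slides: it uses Hamming-distance branch metrics on hard bits (optimality on the AWGN channel requires soft metrics), keeps the full survivor history instead of a fixed traceback window, and assumes the same generators (557, 663, 711 octal) and bit ordering as a simple shift-register encoder, which is included so the block runs on its own.

```python
# Hard-decision Viterbi decoder sketch for a 256-state, rate-1/3 code.
K = 9
GENERATORS = (0o557, 0o663, 0o711)   # ASSUMED (commonly cited IS95 reverse link)


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode(bits, gens=GENERATORS, k=K):
    """Shift-register encoder: zero start state, k-1 zero tail bits."""
    reg, out = 0, []
    for b in list(bits) + [0] * (k - 1):
        reg = (b << (k - 1)) | (reg >> 1)        # newest bit in the MSB
        out.extend(parity(reg & g) for g in gens)
    return out


def viterbi_decode(received, gens=GENERATORS, k=K):
    """Find the shortest trellis path using Hamming-distance edge weights."""
    n_states = 1 << (k - 1)                      # trellis nodes per layer
    r = len(gens)                                # code bits per trellis edge
    INF = float("inf")
    metric = [0] + [INF] * (n_states - 1)        # encoder starts in state 0
    history = []                                 # survivor predecessors per layer
    for t in range(0, len(received), r):         # one trellis layer per symbol
        sym = received[t:t + r]
        new_metric, pred = [INF] * n_states, [0] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << (k - 1)) | s         # window = new bit + state
                ns = reg >> 1                    # successor state
                bm = sum(parity(reg & g) != y for g, y in zip(gens, sym))
                if metric[s] + bm < new_metric[ns]:   # add, compare, select
                    new_metric[ns], pred[ns] = metric[s] + bm, s
        metric = new_metric
        history.append(pred)
    state, bits = 0, []                          # zero tail: path ends in state 0
    for pred in reversed(history):               # trace the shortest path back
        bits.append(state >> (k - 2))            # newest input bit = state MSB
        state = pred[state]
    bits.reverse()
    return bits[:len(bits) - (k - 1)]            # drop the tail bits


msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
noisy = encode(msg)
for i in (2, 30, 55):                            # flip three code bits
    noisy[i] ^= 1
print(viterbi_decode(noisy) == msg)              # prints True
```

Three isolated bit errors are well within the correction capability of this code, so the maximum-likelihood path is the transmitted one.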

6

7
Challenge of Large-State VD Designs
  • High computational complexity:
  • VDs with hundreds of states require a throughput of several Gops when symbol rates reach the Mbps range.
  • Parallel processing is therefore needed, which leads to:
  • High interconnect power dissipation.
  • Complex routing among the processors.

For large-state VDs, global data transfer and interconnect issues must be considered carefully.
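The scale of the problem can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes one trellis layer per decoded bit, counts the 2N branch-metric computations and N ACS operations per layer listed in the VD summary, and charges a hypothetical 3 primitive operations per ACS. Real designs share many branch-metric computations, so treat the totals as orders of magnitude only.

```python
def vd_ops_per_second(n_states, bit_rate, ops_per_acs=3):
    """Rough operation rate of an N-state Viterbi decoder.

    ASSUMPTIONS: one trellis layer per decoded bit; 2N branch-metric
    computations and N ACS operations per layer; ops_per_acs is a
    hypothetical cost (e.g. two additions plus one compare-select).
    """
    acs_rate = n_states * bit_rate
    branch_rate = 2 * n_states * bit_rate
    return acs_rate, branch_rate + ops_per_acs * acs_rate


for rate in (1e6, 20e6):          # 1 Mbps and the 20 Mbps reported for the chip
    acs, total = vd_ops_per_second(256, rate)
    print(f"{rate / 1e6:.0f} Mbps: {acs / 1e9:.3f} G ACS/s, {total / 1e9:.2f} Gops")
# 1 Mbps: 0.256 G ACS/s, 1.28 Gops
# 20 Mbps: 5.120 G ACS/s, 25.60 Gops
```

Even at 1 Mbps a 256-state decoder needs more than a giga-operation per second, which is why the work must be spread over parallel processors.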
8
Viterbi Decoder Designs
9
Presentation Outline
  • Viterbi decoding overview
  • Our contributions
  • Data-transfer-oriented hierarchical inter-processor optimization
  • Intra-processor power optimization
  • Chip data

10
Encoding Example
11
Viterbi Decoding

12
13
14
VD Summary
  • Each decoded symbol requires a layer of similar computations:
  • 2N edge-weight (branch-metric) computations, where N is the number of states.
  • N add-compare-select (ACS) operations.
  • Operations within each layer are independent.
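One layer of these computations can be sketched in Python as below. The trellis wiring (state j reached from states 2j mod N and 2j+1 mod N, as in a shift-register code) and the random metrics and edge weights are assumptions for illustration. Processing the N states in a shuffled order gives the same result, which is the independence property that lets the ACS operations be spread over parallel processors.

```python
import random


def acs_layer(metrics, branch_metric, order=None):
    """Perform the N add-compare-select (ACS) operations of one trellis layer.

    metrics[s]            path metric of state s before the layer
    branch_metric(p, j)   weight of edge p -> j (2N edges per layer)
    order                 processing order of the N states (default 0..N-1)

    ASSUMED wiring: state j is reached from states 2j mod N and 2j+1 mod N.
    """
    n = len(metrics)
    new_metrics, decisions = [0] * n, [0] * n
    for j in (range(n) if order is None else order):
        p0, p1 = (2 * j) % n, (2 * j + 1) % n
        m0 = metrics[p0] + branch_metric(p0, j)      # add
        m1 = metrics[p1] + branch_metric(p1, j)      # add
        if m0 <= m1:                                 # compare
            new_metrics[j], decisions[j] = m0, 0     # select
        else:
            new_metrics[j], decisions[j] = m1, 1
    return new_metrics, decisions


rng = random.Random(0)
N = 256
metrics = [rng.randint(0, 50) for _ in range(N)]
# Hypothetical weights for the 2N edges: state p feeds p // 2 and p // 2 + N // 2.
weights = {(p, p // 2 + b * (N // 2)): rng.randint(0, 6)
           for p in range(N) for b in (0, 1)}
ref = acs_layer(metrics, lambda p, j: weights[(p, j)])
shuffled = list(range(N))
rng.shuffle(shuffled)
print(len(weights), acs_layer(metrics, lambda p, j: weights[(p, j)], shuffled) == ref)
# prints 512 True
```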

15
Viterbi Decoder Architectures
Design space: number of processors used
16
Viterbi Decoder Architectures
Design space: number of processors used
Intermediate solutions
17
Key Issues
  • How many ACS processors?
  • Which ACS operations are executed in each
    processor?
  • Which ACS operations can be executed
    concurrently?
  • In what order are the operations executed?
  • Can processors be pipelined?

18
Q: Which operations are executed in each ACS processor?
A: Operation partitioning for global data-transfer reduction
19
Operation Partitioning Example
20
Operation Partitioning Results
  • Solutions obtained by iterative bi-partitioning (Kernighan-Lin, KL).
  • For 64 partitions, more than 50% of the data transfers are global.
  • Largest absolute reduction in the range of 4 to 32 partitions.
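The slides obtain their partitions with iterative Kernighan-Lin bi-partitioning. The Python sketch below only illustrates the cost being minimized: it counts the fraction of state-to-state transfers that cross processor boundaries for an assumed shift-register wiring of the 256-state trellis, and improves one bisection with a plain greedy pairwise-swap search (a simplified stand-in, not full KL). It does not reproduce the figures above. Under this wiring, the naive contiguous assignment of states to k processors makes a fraction 1 - 1/k of the transfers global.

```python
from collections import defaultdict


def successors(p, n):
    """ASSUMED shift-register trellis: state p feeds states p//2 and p//2 + n/2."""
    return (p // 2, p // 2 + n // 2)


def global_fraction(part, n):
    """Fraction of the 2n state-to-state transfers that cross partitions."""
    crossing = sum(part[p] != part[q] for p in range(n) for q in successors(p, n))
    return crossing / (2 * n)


def contiguous_partition(n, k):
    """Baseline: state s is assigned to processor s // (n/k)."""
    return [s // (n // k) for s in range(n)]


def greedy_bisect(n):
    """Improve a contiguous bisection by greedy pairwise swaps.

    A simplified stand-in for Kernighan-Lin: it applies the best
    positive-gain swap until none is left, without KL's node locking.
    """
    nbr = [defaultdict(int) for _ in range(n)]
    for p in range(n):
        for q in successors(p, n):
            if p != q:                           # self-loops never cross a cut
                nbr[p][q] += 1
                nbr[q][p] += 1
    part = contiguous_partition(n, 2)
    while True:
        # D(v) = external minus internal edge weight of node v
        d = [sum(w if part[q] != part[v] else -w for q, w in nbr[v].items())
             for v in range(n)]
        side0 = [v for v in range(n) if part[v] == 0]
        side1 = [v for v in range(n) if part[v] == 1]
        gain, a, b = max((d[a] + d[b] - 2 * nbr[a].get(b, 0), a, b)
                         for a in side0 for b in side1)
        if gain <= 0:
            return part
        part[a], part[b] = 1, 0                  # swap keeps the halves balanced


for k in (2, 4, 64):
    print(k, global_fraction(contiguous_partition(256, k), 256))
# prints 2 0.5, then 4 0.75, then 64 0.984375
print("greedy bisection (64 states):", global_fraction(greedy_bisect(64), 64))
```

`greedy_bisect(256)` works the same way but takes longer in pure Python; a real flow would apply KL recursively to reach 4 to 64 partitions.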

21
Q: Which operations are executed simultaneously?
A: Operation packing for global bus minimization
22
Operation Packing Example
23
Operation Packing
  • Packing procedure for global bus minimization:
  • Each slice contains one operation from each partition.
  • Global data transfers within a slice are performed simultaneously.
  • Bus cost: the number of ACS units connected to a bus.
  • Our heuristic:
  • Distribute global transfers evenly across all slices.
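A hypothetical Python sketch of the "distribute evenly" idea follows. Each slice receives one operation from every partition, and each partition hands out its operations heaviest first to the slices carrying the fewest global transfers so far. As a simplification, the cost reported is the peak number of simultaneous global transfers in a slice, not the bus cost defined above; the operation names and transfer counts are invented for illustration.

```python
def naive_packing(partitions):
    """Slice i takes the i-th operation of every partition."""
    return [[ops[i] for ops in partitions] for i in range(len(partitions[0]))]


def balanced_packing(partitions):
    """Greedy even spreading: each partition hands its operations, heaviest
    first, to the slices with the fewest global transfers so far."""
    n_slices = len(partitions[0])
    slices = [[] for _ in range(n_slices)]
    load = [0] * n_slices
    for ops in partitions:
        lightest_first = sorted(range(n_slices), key=lambda i: load[i])
        heaviest_first = sorted(ops, key=lambda op: -op[1])
        for i, op in zip(lightest_first, heaviest_first):
            slices[i].append(op)                 # one op per slice per partition
            load[i] += op[1]
    return slices


def peak_global(slices):
    """Largest number of simultaneous global transfers in any slice."""
    return max(sum(g for _, g in s) for s in slices)


# Hypothetical data: 3 partitions x 4 operations, each (name, global transfers).
partitions = [
    [("a0", 2), ("a1", 2), ("a2", 0), ("a3", 0)],
    [("b0", 2), ("b1", 1), ("b2", 0), ("b3", 0)],
    [("c0", 1), ("c1", 2), ("c2", 1), ("c3", 0)],
]
print(peak_global(naive_packing(partitions)))     # prints 5
print(peak_global(balanced_packing(partitions)))  # prints 3
```

With 11 global transfers over 4 slices, no packing can do better than a peak of 3, so the greedy result is optimal for this small example.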

24
Operation Packing Results
  • Baseline for comparison: one bus between every pair of ACS processors.
  • Global buses reduced by 31% on average.
  • Most effective range: 8 to 32 partitions.

25
Q: In what order should operations be executed?
Q: Can the ACS units be pipelined?
A: Non-forwarding scheduling
26
Non-forwarding Scheduling
27
Non-forwarding Scheduling Results
  • Greedy heuristic:
  • Pick the slice with the fewest dependencies first.
  • Iteratively pick the next slice that maximizes the upper bound on the non-forwarding pipeline depth implied by the slices chosen so far.
  • Architectures with 16 or more parallel processors allow only a very limited non-forwarding pipeline depth.
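One possible reading of this greedy heuristic is sketched below in Python. It assumes every trellis layer issues its slices in the same order, one slice per cycle, and that without forwarding a result becomes usable only `depth` cycles after its slice issues; the dependency list is hypothetical. `depth_bound` returns the exact maximum depth for a complete order and an upper bound for a partial one.

```python
def depth_bound(order, deps, n_slices):
    """Upper bound on the non-forwarding pipeline depth for a (partial) order.

    ASSUMPTIONS: slice i of the next layer issues n_slices + pos[i] cycles
    after this layer starts, so each dependency j of i requires
    depth <= n_slices + pos[i] - pos[j]. Unscheduled slices are placed at
    the earliest free position, which keeps the result an upper bound.
    """
    pos = {s: p for p, s in enumerate(order)}
    k = len(order)
    bound = float("inf")
    for i in order:
        for j in deps[i]:
            bound = min(bound, n_slices + pos[i] - pos.get(j, k))
    return bound


def greedy_order(deps):
    """Fewest dependencies first, then repeatedly the slice that keeps the
    depth upper bound largest (ties go to the lowest index)."""
    n = len(deps)
    order = [min(range(n), key=lambda s: len(deps[s]))]
    while len(order) < n:
        rest = [s for s in range(n) if s not in order]
        order.append(max(rest, key=lambda s: depth_bound(order + [s], deps, n)))
    return order


# Hypothetical dependencies: slice i reads results of slice (i - 1) mod 4.
deps = [[3], [0], [1], [2]]
print(depth_bound([0, 1, 2, 3], deps, 4))        # prints 1 (naive order)
best = greedy_order(deps)
print(best, depth_bound(best, deps, 4))          # prints [0, 3, 2, 1] 3
```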

28
Q: How many ACS processors should be used?
29
Viterbi Decoder Architecture
30
Processor Internal Architecture
  • 16-bit datapath
  • 8 pipeline stages

31
Processor Level Power Reduction
  • Combine precomputation with saturation arithmetic.
  • If one or both operands overflow, the ACS unit is partially shut off.
  • No significant degradation in decoding performance.
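The Python sketch below shows one interpretation of combining precomputation with saturation arithmetic; it is not the chip's circuit. It assumes unsigned 16-bit path metrics (matching the 16-bit datapath) that clamp at their maximum value. When an incoming path metric is already saturated, its candidate sum is known to be the maximum before any addition, so that adder can stay idle; with both saturated, the compare is skipped as well. Results match a plain saturating ACS, with ties going to path 0.

```python
MAX = (1 << 16) - 1          # ASSUMED: unsigned 16-bit path metrics


def sat_add(a, b):
    """Saturating addition: clamp at MAX instead of wrapping around."""
    return min(a + b, MAX)


def acs_precomputed(pm0, bm0, pm1, bm1):
    """ACS with a precomputation step; returns (metric, decision, adders_used).

    A saturated incoming path metric makes its candidate sum known in
    advance (MAX), so its adder can stay idle; ties go to path 0.
    """
    sat0, sat1 = pm0 == MAX, pm1 == MAX          # precomputation stage
    if sat0 and sat1:                            # both known: no add, no compare
        return MAX, 0, 0
    if sat0:                                     # path 0 sum is MAX: one adder
        m1 = sat_add(pm1, bm1)
        return m1, int(m1 < MAX), 1
    if sat1:                                     # path 1 sum is MAX: path 0 wins
        return sat_add(pm0, bm0), 0, 1
    m0, m1 = sat_add(pm0, bm0), sat_add(pm1, bm1)
    return (m0, 0, 2) if m0 <= m1 else (m1, 1, 2)


print(acs_precomputed(MAX, 3, 10, 5))     # prints (15, 1, 1): one adder idle
print(acs_precomputed(MAX, 3, MAX, 5))    # prints (65535, 0, 0): both idle
```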

32
Chip Implementation
  • Design: RTL Verilog
  • Synthesis: Design Analyzer
  • Placement: manual floorplan
  • Routing: Silicon Ensemble
  • Verification: gate-level Verilog
  • Power estimation: PrimePower

33
Chip Summary
34
Conclusion
  • Design case study of a 256-state IS95 VD
  • Hierarchical optimization methodology:
  • Global data transfer minimization
  • Global bus reduction
  • Non-forwarding scheduling
  • Precomputation and saturation arithmetic
  • Viterbi decoder:
  • 8 pipelined processors
  • 4 global buses
  • Throughput: 20 Mbps
  • Power dissipation: 450 mW
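As a rough figure of merit, the reported numbers imply the energy per decoded bit below, assuming the 20 Mbps throughput is the decoded bit rate and the 450 mW is the average power at that rate. With one 256-state trellis layer per decoded bit, this averages to under 90 pJ per ACS operation, with all other on-chip overhead included in that average.

```latex
E_{\text{bit}} = \frac{P}{R}
  = \frac{450\,\text{mW}}{20\,\text{Mbit/s}}
  = \frac{0.45\,\text{J/s}}{2 \times 10^{7}\,\text{bit/s}}
  = 2.25 \times 10^{-8}\,\text{J/bit} = 22.5\,\text{nJ/bit},
\qquad
E_{\text{ACS}} \approx \frac{22.5\,\text{nJ}}{256} \approx 87.9\,\text{pJ}
```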