Techniques for Low Power Turbo Coding in Software Radio

Slides: 41

Provided by: chr7164

Learn more at: http://www.ann.ece.ufl.edu

Transcript and Presenter's Notes

Title: Techniques for Low Power Turbo Coding in Software Radio

1
Techniques for Low Power Turbo Coding in Software Radio

Joe Antoon
Adam Barnett

2
Software Defined Radio

Single transmitter for many protocols
Protocols completely specified in memory
Implementation
Microprocessors
Field programmable logic

3
Why Use Software Radio?

Wireless protocols are constantly reinvented
5 Wi-Fi protocols
7 Bluetooth protocols
Proprietary mice and keyboard protocols
Mobile phone protocol alphabet soup
Custom DSP logic for each protocol is costly

4
So Why Not Use Software Radio?

Requires high performance processors
Consumes more power

5
Turbo Coding

Channel coding technique
Throughput nears theoretical limit
Great for bandwidth limited applications
CDMA2000
WiMAX
NASA's MESSENGER probe

6
Turbo Coding Considerations

Presents a design trade-off
Turbo coding is computationally expensive
But it reduces cost in other areas
Bandwidth
Transmission power

7
Reducing Power in Turbo Decoders

FPGA turbo decoders
Use dynamic reconfiguration
General processor turbo decoders
Use a logarithmic number system

8
Generic Turbo Encoder
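
[Figure: turbo encoder block diagram. The data stream is sent unchanged as the systematic output s. One component encoder produces parity p1 from the data stream; a second component encoder, fed through the interleaver, produces parity p2.]
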
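The encoder structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' design: the generator taps (feedback 1 + D + D^2, feedforward 1 + D^2), the two-register memory, and the names `rsc_encode`, `turbo_encode`, and `perm` are all assumptions chosen for the example.

```python
def rsc_encode(bits):
    """Recursive systematic component encoder with two 1-bit registers.

    Assumed taps: feedback 1 + D + D^2, feedforward 1 + D^2.
    Returns only the parity stream; the systematic bits are the input itself.
    """
    m1 = m2 = 0
    parity = []
    for u in bits:
        fb = u ^ m1 ^ m2        # feedback bit entering the first register
        parity.append(fb ^ m2)  # feedforward taps 1 + D^2
        m1, m2 = fb, m1         # shift the registers
    return parity


def turbo_encode(bits, perm):
    """Return (s, p1, p2): systematic bits and the two parity streams."""
    s = list(bits)
    p1 = rsc_encode(s)                        # parity from the data stream
    p2 = rsc_encode([s[i] for i in perm])     # parity from the interleaved stream
    return s, p1, p2


data = [1, 0, 1, 1, 0, 0, 1, 0]
perm = [3, 7, 0, 5, 2, 6, 1, 4]   # hypothetical interleaver permutation
s, p1, p2 = turbo_encode(data, perm)
```

Each input bit yields three output bits (s, p1, p2), so this sketch has rate 1/3 before any puncturing.
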
9
Generic Turbo Decoder

[Figure: turbo decoder block diagram. Two decoders connected through an interleaver; signal labels r, q1, and q2.]

10
Decoder Design Options

Multiple algorithms used to decode
Maximum A-Posteriori (MAP)
Most accurate estimate possible
Complex computations required
Soft-Output Viterbi Algorithm
Less accurate
Simpler calculations

11
FPGA Design Options

Goal: make an adaptive decoder

[Figure: decoder block labeled with received data, parity, and original sequence.]

12
Component Encoder

[Figure: component encoder built from two M blocks and a generator function.]

M blocks are 1-bit registers
Memory provides encoder state

13
Encoder State
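
[Figure: trellis diagram of the encoder states 00, 01, 10, and 11 over time, with transitions labeled by the bits 0 and 1 and determined by the generator function (GF).]
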
14
Viterbi's Algorithm

Determine most likely output
Simulate encoder state given received values

[Figure: encoder state paths plotted against time.]

15
Viterbi's Algorithm

Write: compute branch metric (likelihood)
Traceback: compute path metric, output data
Update: compute distance between paths
Rank paths by path metric and choose best
For N memory
Must calculate 2N-1 paths for each state
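
The branch-metric, path-metric, and path-selection steps can be illustrated with a small hard-decision Viterbi decoder. This is a sketch under stated assumptions, not the decoder from the presentation: it uses a rate-1/2 feedforward code with generators (7, 5) octal, a Hamming-distance branch metric (lower is better), and stores each survivor as a bit list instead of performing a separate traceback. The names `conv_encode` and `viterbi_decode` are hypothetical.

```python
def conv_encode(bits):
    """Rate-1/2 feedforward encoder, generators (7, 5) octal, two registers."""
    m1 = m2 = 0
    out = []
    for u in bits:
        out.append((u ^ m1 ^ m2, u ^ m2))
        m1, m2 = u, m1
    return out


def viterbi_decode(received):
    """Return the most likely input bits for a list of received bit pairs."""
    inf = float("inf")
    metrics = {(0, 0): 0}    # path metric per state; encoder starts in state 00
    paths = {(0, 0): []}     # survivor input bits per state
    for r in received:
        new_metrics, new_paths = {}, {}
        for (m1, m2), pm in metrics.items():
            for u in (0, 1):                       # consider both next inputs
                expected = (u ^ m1 ^ m2, u ^ m2)
                # Branch metric: Hamming distance to the received pair.
                bm = (expected[0] != r[0]) + (expected[1] != r[1])
                nxt = (u, m1)
                cand = pm + bm                     # add
                if cand < new_metrics.get(nxt, inf):   # compare, select
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[(m1, m2)] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(metrics, key=metrics.get)           # rank paths, choose best
    return paths[best]


msg = [1, 0, 1, 1, 0, 0]
coded = conv_encode(msg)
coded[0] = (coded[0][0] ^ 1, coded[0][1])   # inject one bit error
decoded = viterbi_decode(coded)
```

Keeping one survivor per state is what bounds the work at each time step; the adaptive variant discussed next trims the survivor set further.
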

16
Adaptive SOVA

SOVA: inflexible path system scales poorly
Adaptive SOVA: heuristic
Limit to M paths max
Discard if path metric below threshold T
Discard all but top M paths when too many paths
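
The pruning heuristic above might look like the following sketch. It assumes a larger path metric means a more likely path and that T is an absolute threshold; the slides do not say whether the real design uses an absolute or a relative threshold, and the name `prune_paths` is hypothetical.

```python
def prune_paths(paths, max_paths, threshold):
    """Apply the two discard rules to a list of (metric, bits) candidates.

    Assumption: a larger metric means a more likely path.
    """
    # Rule 1: discard paths whose metric falls below the threshold T.
    kept = [p for p in paths if p[0] >= threshold]
    # Rule 2: if too many remain, keep only the top M by metric.
    kept.sort(key=lambda p: p[0], reverse=True)
    return kept[:max_paths]


candidates = [(0.9, [1, 0]), (0.2, [0, 0]), (0.7, [1, 1]), (0.6, [0, 1])]
survivors = prune_paths(candidates, max_paths=2, threshold=0.5)
```
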

17
Implementing in Hardware

[Figure: hardware block diagram with a control unit, branch metric unit, add-compare-select unit, and survivor memory; signal labels r and q.]

18
Implementing in Hardware

Add, Compare, Select
Append path metric
Discard paths
Survivor Memory
Store / discard path bits

Controller
Control memory
select paths
Branch Metric Unit
Compute likelihood
Consider all possible next states

19
Implementing in Hardware

Add, Compare, Select Unit

[Figure: add-compare-select unit. Labels: present state path values, branch values, compute and compare paths, next state path values, path distance, threshold (> T).]

20
Dynamic Reconfiguration

Bit Error Rate (BER)
Changes with signal strength
Changes with number of paths used
Change hardware at runtime
Weak signal: use many paths, preserve accuracy
Strong signal: use few paths, save power
Sample SNR every 250k bits, then reconfigure
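
The reconfiguration policy can be sketched as a lookup from sampled SNR to a path budget. Only the 250k-bit sampling interval comes from the slide; the SNR breakpoints, the path counts, and the name `choose_max_paths` are hypothetical values picked for illustration.

```python
SAMPLE_INTERVAL_BITS = 250_000   # sampling interval from the slide

# Hypothetical table: (SNR upper bound in dB, number of paths to keep).
PATH_TABLE = ((2.0, 16), (4.0, 8), (6.0, 4))
MIN_PATHS = 2                    # hypothetical budget for strong signals


def choose_max_paths(snr_db):
    """Weaker signal -> more paths (accuracy); stronger -> fewer (power)."""
    for limit_db, paths in PATH_TABLE:
        if snr_db < limit_db:
            return paths
    return MIN_PATHS


def plan_reconfigurations(snr_samples_db):
    """Return (bit_offset, max_paths) for each point where the budget changes."""
    plan, current = [], None
    for i, snr in enumerate(snr_samples_db):
        budget = choose_max_paths(snr)
        if budget != current:    # reconfigure only when the budget changes
            plan.append((i * SAMPLE_INTERVAL_BITS, budget))
            current = budget
    return plan


plan = plan_reconfigurations([1.0, 1.5, 5.0, 9.0, 9.5])
```
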

21
Dynamic Reconfiguration
22
Experimental Results
K (number of encoder bits) is proportional to average speed and power
23
Experimental Results

FPGA decoding has a much higher throughput
Due to parallelism

24
Experimental Results

ASOVA performs worse than commercial cores
However, in other metrics it is much better
Power
Memory usage
Complexity

25
Future Work

Use present reconfiguration means to design
Partial reconfiguration
Dynamic voltage scaling
Compare to power efficient software methods

26
Power-Efficient Implementation of a Turbo Decoder in SDR System

Turbo coding systems are created by using one of three general processor types
Fixed Point (FXP)
Cheapest, simplest to implement, fastest
Floating Point (FLP)
More precision than fixed point
Logarithmic Numbering System (LNS)
Simplifies complex operations
Complicates simple add/subtract operations

27
Logarithmic Numbering System

X = {s, x}, where $x = \log_b |X|$
s is the sign bit; the remaining bits are used for the number value
Example
Let b = 2
Then the decimal number 8 would be represented as $\log_2 8 = 3$
Numbers are stored in computer memory in 2's complement form (3 is stored as 01111101; sign bit = 0)

28
Why use Logarithmic System?

Greatly simplifies multiplication, division, roots, and exponents
Multiplication simplifies to addition
E.g. 8 × 4 = 32; LNS => 3 + 2 = 5 ($2^5 = 32$)
Division simplifies to subtraction
E.g. 8 / 4 = 2; LNS => 3 - 2 = 1 ($2^1 = 2$)
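
The two examples above can be checked with a short sketch. It assumes base 2 and represents each number as a (sign, log) pair held in a Python float; real LNS hardware would use a fixed-width code instead. The helper names `to_lns`, `from_lns`, `lns_mul`, and `lns_div` are hypothetical.

```python
import math


def to_lns(value):
    """Encode a nonzero number as (sign bit, log2 of magnitude)."""
    return (0 if value > 0 else 1, math.log2(abs(value)))


def from_lns(num):
    """Decode a (sign bit, log2 of magnitude) pair back to a float."""
    sign, log = num
    return (-1.0 if sign else 1.0) * 2.0 ** log


def lns_mul(a, b):
    """Multiply: XOR the signs, add the logs."""
    return (a[0] ^ b[0], a[1] + b[1])


def lns_div(a, b):
    """Divide: XOR the signs, subtract the logs."""
    return (a[0] ^ b[0], a[1] - b[1])


product = from_lns(lns_mul(to_lns(8), to_lns(4)))    # logs: 3 + 2 = 5
quotient = from_lns(lns_div(to_lns(8), to_lns(4)))   # logs: 3 - 2 = 1
```
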

29
Why use Logarithmic System?

Roots are done as right shifts
E.g. sqrt(16) = 4; LNS => 4 shifted right = 2 ($2^2 = 4$)
Exponents are done as left shifts
E.g. $8^2 = 64$; LNS => 3 shifted left = 6 ($2^6 = 64$)
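
A sketch of the shift trick, assuming base 2, positive values, and a fixed-point log with 8 fractional bits (the bit width and helper names are assumptions, not taken from the slides). A one-bit right shift halves the log, which is a square root; a one-bit left shift doubles it, which is a square.

```python
import math

FRAC_BITS = 8   # assumed number of fractional bits in the stored log


def to_fixed_log(value):
    """Encode a positive number as an integer holding log2(value) in fixed point."""
    return round(math.log2(value) * (1 << FRAC_BITS))


def from_fixed_log(code):
    """Decode a fixed-point log back to a float."""
    return 2.0 ** (code / (1 << FRAC_BITS))


def lns_sqrt(code):
    """Square root: halve the log with a one-bit right shift."""
    return code >> 1


def lns_square(code):
    """Square: double the log with a one-bit left shift."""
    return code << 1


root = from_fixed_log(lns_sqrt(to_fixed_log(16)))     # log 4 -> log 2
square = from_fixed_log(lns_square(to_fixed_log(8)))  # log 3 -> log 6
```
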

30
So why not use LNS for all processors?

Unfortunately, addition and subtraction are greatly complicated in LNS.
Addition: $\log_b(|X| + |Y|) = x + \log_b(1 + b^z)$
Subtraction: $\log_b(|X| - |Y|) = x + \log_b(1 - b^z)$
where $z = y - x$, with $x = \log_b |X|$ and $y = \log_b |Y|$
Turbo coding/decoding is computationally intense, requiring more multiplications, divisions, roots, and exponents than additions or subtractions
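
The addition and subtraction rules can be verified numerically. This sketch assumes base 2, positive operands, and floating-point logs; x and y are the logs of the two magnitudes and z = y - x. The function names are hypothetical. In hardware, the `log2(1 ± 2**z)` term is the part that typically needs a lookup table.

```python
import math


def lns_add(x, y):
    """Return log2(X + Y) given x = log2(X) and y = log2(Y)."""
    z = y - x
    return x + math.log2(1 + 2.0 ** z)


def lns_sub(x, y):
    """Return log2(X - Y) given x = log2(X) and y = log2(Y); requires X > Y."""
    z = y - x
    return x + math.log2(1 - 2.0 ** z)


total = 2.0 ** lns_add(3, 2)        # 8 + 4
difference = 2.0 ** lns_sub(3, 2)   # 8 - 4
```
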

31
Turbo Decoder block diagram

Each bit decision requires a subtraction, a table lookup, and an addition

32
Proposed new block diagram

As the difference between $e^a$ and $e^b$ becomes larger, the error between the value stored in the lookup table and the computed value becomes negligible.
For this simulation, a difference of > 5 was used

33
How it works

For d > 5
New mux (on right) ignores the SRAM input and simply adds 0 to the MAX result.
When d > 5, pre-decoder circuitry disables the SRAM for power conservation.
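
The bypass can be modeled in software as follows. This is a behavioral sketch only: the cutoff of 5 comes from the slides, while the table step size, the Python list standing in for the SRAM, and the name `max_star` are assumptions. The function also reports whether the table was read, to mirror the SRAM enable signal.

```python
import math

D_CUTOFF = 5.0   # difference beyond which the table is skipped (from the slides)
STEP = 0.125     # assumed table resolution

# Stand-in for the SRAM: correction term ln(1 + e^-d) sampled every STEP.
TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(int(D_CUTOFF / STEP) + 1)]


def max_star(a, b):
    """Approximate ln(e^a + e^b); return (value, table_was_read)."""
    d = abs(a - b)                       # subtraction
    if d > D_CUTOFF:
        return max(a, b) + 0.0, False    # mux adds 0; table stays disabled
    correction = TABLE[int(d / STEP)]    # table lookup
    return max(a, b) + correction, True  # addition


near, near_read = max_star(0.0, 0.0)
far, far_read = max_star(6.0, 0.0)
```

For the bypassed case the skipped correction is at most ln(1 + e^-5), roughly 0.0067, which is the sense in which the error becomes negligible.
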

34
Comparing the 3 simulations

Comparisons were done between a 16-bit fixed point microcontroller, a 16-bit floating point processor, and a 20-bit LNS processor.
11 bits would be sufficient for FXP and FLP, but 16-bit processors are much more common
Similarly, 17 bits would suffice for the LNS processor, but 20-bit is a common type

35
Power Consumption
36
Latency

Recall: Max(a, b) = $\ln(e^a + e^b)$
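
For reference, the standard identity behind this operation separates it into a plain maximum plus a correction term that depends only on the difference d = |a - b|. The symbol d is used here as on the earlier slides; the derivation itself is textbook algebra, not taken from the presentation.

```latex
\begin{align*}
\ln\left(e^{a} + e^{b}\right)
  &= \ln\left(e^{\max(a,b)}\left(1 + e^{-|a-b|}\right)\right) \\
  &= \max(a,b) + \ln\left(1 + e^{-d}\right), \qquad d = |a-b| .
\end{align*}
% The correction term shrinks quickly with d:
% at d = 5, \ln(1 + e^{-5}) \approx 0.0067.
```

The correction term is the quantity that would be read from a lookup table, and it is the only part that can be dropped when d is large.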

37
Power savings

Pre-decoder circuitry adds 11.4% power consumption compared to an SRAM read.
So when an SRAM read is required, we use 111.4% of the power compared to the unmodified system.
However, when the SRAM is blocked, we use only 11.4% of the power we used before.
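
These two cases combine into a simple weighted average. The sketch below assumes the only power costs are the SRAM read and the pre-decoder overhead, and treats the fraction of Max operations that skip the SRAM as a free parameter, since the slides do not state it; `relative_max_power` is a hypothetical name.

```python
PREDECODER_OVERHEAD = 0.114   # from the slide, relative to one SRAM read


def relative_max_power(blocked_fraction):
    """Power per Max operation relative to the unmodified system (1.0).

    blocked_fraction: share of operations where the SRAM read is skipped.
    """
    with_read = 1.0 + PREDECODER_OVERHEAD   # 111.4% when the SRAM is read
    without_read = PREDECODER_OVERHEAD      # 11.4% when the SRAM is blocked
    return (1.0 - blocked_fraction) * with_read + blocked_fraction * without_read


always_read = relative_max_power(0.0)
never_read = relative_max_power(1.0)
break_even = relative_max_power(PREDECODER_OVERHEAD)
```

Under this model the modification pays off once more than 11.4% of the Max operations skip the SRAM read.
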

38
Power savings

The CACTI simulations for the system reported that the Max operation accounted for 40% of all operations in the decoder
The Max operations for the modified system required 69% of the power when compared to the unmodified system.
This leads to an overall power savings of 69% × 40% = 27.6%

39
Conclusion

Turbo codes are computationally intense, requiring more complex operations than simple ones
LNS processors simplify complex operations at the expense of making adding and subtracting more difficult

40
Conclusion

Using an LNS processor with slight modifications can reduce power consumption by 27.6%
Overall latency is also reduced due to the ease of complex operations in an LNS processor when compared to FXP or FLP processors.
