Techniques for Low Power Turbo Coding in Software Radio (presentation transcript)

1
Techniques for Low Power Turbo Coding in Software
Radio
  • Joe Antoon
  • Adam Barnett

2
Software Defined Radio
  • Single transmitter for many protocols
  • Protocols completely specified in memory
  • Implementation
  • Microprocessors
  • Field programmable logic

3
Why Use Software Radio?
  • Wireless protocols are constantly reinvented
  • 5 Wi-Fi protocols
  • 7 Bluetooth protocols
  • Proprietary mouse and keyboard protocols
  • Mobile phone protocol alphabet soup
  • Custom DSP logic for each protocol is costly

4
So Why Not Use Software Radio?
  • Requires high performance processors
  • Consumes more power

5
Turbo Coding
  • Channel coding technique
  • Throughput approaches the theoretical (Shannon) limit
  • Great for bandwidth limited applications
  • CDMA2000
  • WiMAX
  • NASA's MESSENGER probe

6
Turbo Coding Considerations
  • Presents a design trade-off
  • Turbo coding is computationally expensive
  • But it reduces cost in other areas
  • Bandwidth
  • Transmission power

7
Reducing Power in Turbo Decoders
  • FPGA turbo decoders
  • Use dynamic reconfiguration
  • General processor turbo decoders
  • Use a logarithmic number system

8
Generic Turbo Encoder
(Diagram: the data stream s feeds two component encoders, the second through an interleaver, producing parity streams p1 and p2)
9
Generic Turbo Decoder
(Diagram: the received stream r feeds two decoders that exchange extrinsic values q1 and q2 through an interleaver)
10
Decoder Design Options
  • Multiple algorithms can be used to decode
  • Maximum A-Posteriori (MAP)
  • Most accurate estimate possible
  • Complex computations required
  • Soft-Output Viterbi Algorithm (SOVA)
  • Less accurate
  • Simpler calculations

11
FPGA Design Options
  • Goal: make an adaptive decoder

(Diagram: received data enters the decoder, which recovers the original sequence and parity)
12
Component Encoder
(Diagram: two 1-bit memory registers M feed the generator function)
  • M blocks are 1-bit registers
  • Memory provides encoder state

13
Encoder State
(Trellis diagram: the four encoder states 00, 01, 10, 11 and the transitions between them on inputs 0 and 1 through the generator function GF, plotted over time)
14
Viterbi's Algorithm
  • Determine the most likely output
  • Simulate encoder state given received values
15
Viterbi's Algorithm
  • Write: compute branch metric (likelihood)
  • Traceback: compute path metric, output data
  • Update: compute distance between paths
  • Rank paths by path metric and choose the best
  • For N memory bits, 2^(N-1) paths must be calculated for each state
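The write/traceback/update steps above can be sketched in miniature. The 2-bit encoder model, Hamming branch metric, and received sequence below are illustrative assumptions, not the paper's design:

```python
# Toy trellis step for a 2-bit-memory encoder: extend each surviving path by
# both possible input bits, score the branch with a Hamming metric against
# the received bit, and keep the lowest-metric path entering each state.
def extend_paths(paths, received_bit):
    """paths: dict mapping state -> (path_metric, decided_bits)."""
    nxt = {}
    for state, (metric, bits) in paths.items():
        for branch_bit in (0, 1):
            bm = 0 if branch_bit == received_bit else 1  # branch metric
            new_state = ((state << 1) | branch_bit) & 0b11
            cand = (metric + bm, bits + [branch_bit])
            if new_state not in nxt or cand[0] < nxt[new_state][0]:
                nxt[new_state] = cand  # survivor for this state
    return nxt

paths = {0: (0, [])}          # start in the all-zero state
for r in [1, 0, 1]:           # received hard decisions
    paths = extend_paths(paths, r)
best = min(paths.values())    # lowest cumulative metric is most likely
```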

16
Adaptive SOVA
  • SOVA: the fixed path system scales poorly
  • Adaptive SOVA heuristic:
  • Limit to at most M paths
  • Discard a path if its path metric falls below a threshold T
  • When too many paths survive, discard all but the top M
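The pruning heuristic can be sketched directly; the metric values and path labels below are made up, and a higher path metric is assumed to mean a more likely path:

```python
# Adaptive-SOVA pruning sketch: keep only paths whose metric is at least
# the threshold T, then cap the survivors at the top-M by metric.
def prune_paths(paths, M, T):
    kept = [p for p in paths if p[0] >= T]        # threshold test
    kept.sort(key=lambda p: p[0], reverse=True)   # rank by path metric
    return kept[:M]                               # at most M survivors

paths = [(9.1, "00"), (3.2, "01"), (7.5, "10"), (0.4, "11")]
print(prune_paths(paths, M=2, T=1.0))  # → [(9.1, '00'), (7.5, '10')]
```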

17
Implementing in Hardware
(Block diagram: control unit, branch metric unit, add-compare-select unit, and survivor memory; input r, output q)
18
Implementing in Hardware
  • Add, Compare, Select
  • Append path metric
  • Discard paths
  • Survivor Memory
  • Store / discard path bits
  • Controller
  • Control memory
  • Select paths
  • Branch Metric Unit
  • Compute likelihood
  • Consider all possible next states

19
Implementing in Hardware
  • Add, Compare, Select Unit

(Datapath diagram: present-state path values are combined with branch values to compute next-state path values; each path distance is compared against the threshold T, and paths not > T are discarded)
20
Dynamic Reconfiguration
  • Bit Error Rate (BER)
  • Changes with signal strength
  • Changes with number of paths used
  • Change hardware at runtime
  • Weak signal: use many paths to preserve accuracy
  • Strong signal: use few paths to save power
  • Sample the SNR every 250k bits, then reconfigure
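A minimal sketch of this runtime policy. The 250k-bit window comes from the slide; the SNR thresholds and path counts below are hypothetical:

```python
# Runtime reconfiguration policy sketch: sample the SNR once per window
# and choose a path count for the next window.
BITS_PER_WINDOW = 250_000

def choose_num_paths(snr_db, weak=2.0, strong=8.0):
    if snr_db < weak:
        return 16  # weak signal: many paths, preserve accuracy
    if snr_db > strong:
        return 2   # strong signal: few paths, save power
    return 8

def decode_stream(snr_samples):
    """Yield (window_start_bit, num_paths) reconfiguration decisions."""
    for i, snr in enumerate(snr_samples):
        yield i * BITS_PER_WINDOW, choose_num_paths(snr)

print(list(decode_stream([1.0, 5.0, 10.0])))
# → [(0, 16), (250000, 8), (500000, 2)]
```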

21
Dynamic Reconfiguration
22
Experimental Results
K (the number of encoder bits) is proportional to average speed and power
23
Experimental Results
  • FPGA decoding has a much higher throughput
  • Due to parallelism

24
Experimental Results
  • ASOVA performs worse than commercial cores
  • However, in other metrics it is much better
  • Power
  • Memory usage
  • Complexity

25
Future Work
  • Apply the existing reconfiguration mechanisms to the design
  • Partial reconfiguration
  • Dynamic voltage scaling
  • Compare to power-efficient software methods

26
Power-Efficient Implementation of a Turbo Decoder
in SDR System
  • Turbo coding systems are built using one of three general processor
    types
  • Fixed Point (FXP)
  • Cheapest, simplest to implement, fastest
  • Floating Point (FLP)
  • More precision than fixed point
  • Logarithmic Number System (LNS)
  • Simplifies complex operations
  • Complicates simple add/subtract operations

27
Logarithmic Number System
  • X → (s, x), where x = log_b(X)
  • s is the sign bit; the remaining bits hold the number's value
  • Example
  • Let b = 2
  • Then the decimal number 8 is represented as log_2(8) = 3
  • Numbers are stored in computer memory in two's complement form
    (e.g. 3 → 01111101, sign bit 0)
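A minimal sketch of the (sign, log) representation, using a Python float in place of the fixed-point two's-complement word the slide describes:

```python
import math

# (sign, log) encoding sketch for base b = 2; X must be nonzero,
# since log(0) is undefined.
def lns_encode(X, b=2):
    sign = 0 if X >= 0 else 1
    return sign, math.log(abs(X), b)

def lns_decode(sign, x, b=2):
    return (-1) ** sign * b ** x

sign, x = lns_encode(8)        # 8 -> (0, 3.0), since log2(8) = 3
value = lns_decode(sign, x)    # back to 8.0
```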

28
Why Use a Logarithmic System?
  • Greatly simplifies multiplication, division, roots, and exponents
  • Multiplication simplifies to addition
  • E.g. 8 × 4 = 32; in LNS, 3 + 2 = 5 (2^5 = 32)
  • Division simplifies to subtraction
  • E.g. 8 / 4 = 2; in LNS, 3 − 2 = 1 (2^1 = 2)
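The 8 × 4 and 8 / 4 examples above can be checked directly; this is an illustrative calculation, not the processor's datapath:

```python
import math

# In LNS, multiply becomes add and divide becomes subtract on the exponents.
x, y = math.log2(8), math.log2(4)   # 3.0 and 2.0
product = 2 ** (x + y)              # 2^(3 + 2) = 2^5 = 32
quotient = 2 ** (x - y)             # 2^(3 - 2) = 2^1 = 2
print(product, quotient)  # → 32.0 2.0
```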

29
Why Use a Logarithmic System?
  • Roots are done as right shifts
  • E.g. sqrt(16) = 4; in LNS, 4 shifted right by one bit gives 2 (2^2 = 4)
  • Exponents are done as left shifts
  • E.g. 8^2 = 64; in LNS, 3 shifted left by one bit gives 6 (2^6 = 64)
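With integer exponents, the shifts above can be shown literally (illustrative only):

```python
# With integer LNS exponents, square root halves the exponent (right shift
# by one bit) and squaring doubles it (left shift by one bit).
e16 = 4           # log2(16)
root = e16 >> 1   # 2, and 2^2 = 4 = sqrt(16)
e8 = 3            # log2(8)
square = e8 << 1  # 6, and 2^6 = 64 = 8^2
print(root, square)  # → 2 6
```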

30
So Why Not Use LNS for All Processors?
  • Unfortunately, addition and subtraction are greatly complicated in LNS
  • Addition: log_b(X + Y) = x + log_b(1 + b^z)
  • Subtraction: log_b(X − Y) = x + log_b(1 − b^z)
  • where z = y − x
  • Turbo coding/decoding is computationally intense, requiring more
    multiplies, divides, roots, and exponentiations than adds or subtracts
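The addition identity can be verified numerically. A real LNS processor would replace the log_b(1 + b^z) term with a lookup table; here it is computed directly:

```python
import math

# LNS addition: with x = log_b(X), y = log_b(Y), and z = y - x,
# log_b(X + Y) = x + log_b(1 + b**z).
def lns_add(x, y, b=2):
    z = y - x
    return x + math.log(1 + b ** z, b)

s = lns_add(3.0, 2.0)   # inputs are log2(8) and log2(4)
print(2 ** s)           # ≈ 12.0, i.e. 8 + 4
```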

31
Turbo Decoder Block Diagram
  • Each bit decision requires a subtraction, a table lookup, and an
    addition

32
Proposed New Block Diagram
  • As the difference between e^a and e^b becomes larger, the error between
    the value stored in the lookup table and the exact computation becomes
    negligible
  • For this simulation, a difference of > 5 was used

33
How It Works
  • For d > 5, the new mux (on the right) ignores the SRAM input and simply
    adds 0 to the MAX result
  • For d > 5, pre-decoder circuitry disables the SRAM to conserve power
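The behavior can be modeled in a few lines: the max* correction term stands in for the SRAM lookup, and it is skipped when the difference exceeds the threshold, mirroring the disabled SRAM. This is a behavioral sketch, not the hardware:

```python
import math

D_THRESHOLD = 5  # the simulation's cutoff on the difference d

def max_star(a, b):
    """max*(a, b) = ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|)).
    When d = |a - b| exceeds the threshold, the correction term is
    negligible, so the lookup (modeled here by the log computation)
    is skipped entirely, just as the disabled SRAM would be."""
    d = abs(a - b)
    if d > D_THRESHOLD:
        return max(a, b)  # mux passes 0; SRAM stays off
    return max(a, b) + math.log(1 + math.exp(-d))

print(max_star(10.0, 1.0))  # → 10.0 (correction skipped)
```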

34
Comparing the 3 Simulations
  • Comparisons were done between a 16-bit fixed-point microcontroller, a
    16-bit floating-point processor, and a 20-bit LNS processor
  • 11 bits would be sufficient for FXP and FLP, but 16-bit processors are
    much more common
  • Similarly, 17 bits would suffice for the LNS processor, but 20-bit is a
    common type

35
Power Consumption
36
Latency
  • Recall: max*(a, b) = ln(e^a + e^b)

37
Power Savings
  • The pre-decoder circuitry adds 11.4% power consumption compared to an
    SRAM read
  • So when an SRAM read is required, we use 111.4% of the power of the
    unmodified system
  • However, when the SRAM is blocked, we use only 11.4% of the power used
    before

38
Power Savings
  • CACTI simulations of the system reported that the Max operation
    accounted for 40% of all operations in the decoder
  • The Max operations in the modified system required 69% of the power of
    the unmodified system
  • This leads to an overall power savings of 69% × 40% = 27.6%
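The slide's arithmetic, reproduced as stated:

```python
# Max operations are 40% of decoder operations and the modified Max uses
# 69% of its original power, giving the reported 0.69 * 0.40 = 27.6% figure.
max_share = 0.40
modified_power = 0.69
savings = modified_power * max_share
print(round(savings, 3))  # → 0.276
```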

39
Conclusion
  • Turbo codes are computationally intense,
    requiring more complex operations than simple
    ones
  • LNS processors simplify complex operations at the
    expense of making adding and subtracting more
    difficult

40
Conclusion
  • Using an LNS processor with slight modifications can reduce power
    consumption by 27.6%
  • Overall latency is also reduced, owing to the ease of complex operations
    in an LNS processor compared to FXP or FLP processors