Title: From SODA to Scotch: The Evolution of a Wireless Baseband Processor
1 From SODA to Scotch: The Evolution of a Wireless Baseband Processor
- Mark Woh (University of Michigan - Ann Arbor)
- Yuan Lin (University of Michigan - Ann Arbor)
- Sangwon Seo (University of Michigan - Ann Arbor)
- Scott Mahlke (University of Michigan - Ann Arbor)
- Trevor Mudge (University of Michigan - Ann Arbor)
- Chaitali Chakrabarti (Arizona State University)
- Richard Bruce (ARM Ltd.)
- Danny Kershaw (ARM Ltd.)
- Alastair Reid (ARM Ltd.)
- Mladen Wilder (ARM Ltd.)
- Krisztian Flautner (ARM Ltd.)
2 From SODA to Scotch: What is this talk about?
- Is a fully programmable 3G baseband processor commercially viable?
- The SODA processor was the first full research design (ISCA '06)
- ARM R&D developed the Ardbeg SDR commercial prototype
- What we will present
  - Comparison study between SODA and Ardbeg
  - Lessons learned in the evolution
3 Mobile Computing
- In 2007, world-wide mobile telephone subscriptions reached 3.3 billion [1]
  - Half of the world's population
  - Some countries have mobile penetration over 100%
- Largest consumer electronic device in terms of volume
- Wireless multimedia anywhere at any time
[1] Global cellphone penetration reaches 50 pct, Reuters, Nov. 29th, 2007
4 Wireless Communication
5 Software Defined Radio
6 Software Defined Radio
7 Software Defined Radio
8 Advantages of Soft Radio
- Design factor
  - Protocol complexity
  - Multi-mode operation
  - Prototyping and bug fixes
- Cost factor
  - Time-to-market
  - Silicon area
  - Higher volume
  - Longevity of platform
9 Mobile SDR Design Challenges
- SDR design objectives for 3G and WiFi
  - Throughput requirements
    - 40Gops peak throughput
  - Power budget
    - 100mW to 500mW peak power
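Taken together, the two targets on this slide imply an energy-efficiency target. As a back-of-the-envelope sketch (the function and constant names are illustrative and not from the talk), dividing the 40Gops peak throughput by each end of the 100mW to 500mW power budget gives the required operations per milliwatt:

```python
def required_efficiency_mops_per_mw(throughput_gops: float, power_mw: float) -> float:
    """Efficiency (Mops per mW) needed to deliver a throughput within a power budget."""
    # 1 Gops = 1000 Mops, so Gops * 1000 / mW gives Mops/mW.
    return throughput_gops * 1000.0 / power_mw


PEAK_THROUGHPUT_GOPS = 40.0        # figure taken from the slide
POWER_BUDGET_MW = (100.0, 500.0)   # range taken from the slide

for budget_mw in POWER_BUDGET_MW:
    eff = required_efficiency_mops_per_mw(PEAK_THROUGHPUT_GOPS, budget_mw)
    print(f"{budget_mw:.0f} mW budget -> {eff:.0f} Mops/mW")
```

This works out to 400 Mops/mW at the 100mW end of the budget and 80 Mops/mW at the 500mW end (1 Mops/mW is the same as 1 Gops/W).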
10 First-Generation SDR Processor: SODA
- Our first attempt was the SODA processor
- Designed in 180nm technology
- Built with WCDMA and 802.11a in mind
- Sub-500mW operation estimated at 90nm
11 SODA
- System
  - Heterogeneous multi-core architecture
  - Multi-level scratchpad memories
- PE
  - SIMD/Scalar/AGU LIW
  - 32-lane 16-bit SIMD
  - 16-bit scalar datapath
  - Scalar-to-SIMD
  - SIMD-to-scalar
  - Iterative Perfect Shuffle Network
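The slide names an iterative perfect shuffle network but does not describe its control or exchange options. As a minimal sketch of the underlying permutation only (the function names are hypothetical, and this is not a model of the SODA hardware), a perfect shuffle interleaves the lower and upper halves of the vector, and more complex permutations are built by applying it repeatedly:

```python
def perfect_shuffle(lanes):
    """One shuffle pass: interleave the lower half with the upper half."""
    half = len(lanes) // 2
    out = []
    for lo, hi in zip(lanes[:half], lanes[half:]):
        out.extend((lo, hi))
    return out


def iterate_shuffle(lanes, passes):
    """Apply the perfect shuffle `passes` times in a row."""
    for _ in range(passes):
        lanes = perfect_shuffle(lanes)
    return lanes


vec = list(range(32))            # 32 lanes, matching the SIMD width on the slide
print(perfect_shuffle(vec)[:8])  # [0, 16, 1, 17, 2, 18, 3, 19]

# Each pass rotates the 5-bit lane index left by one bit,
# so 5 passes on 32 lanes restore the original order.
assert iterate_shuffle(vec, 5) == vec
```

Because one pass is a fixed wiring pattern, the hardware for it is simple; the cost is that a general permutation can take several passes.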
12 SODA Summary
[Figure: SODA (180nm and 90nm) plotted against the mobile SDR requirements and other processors. Point labels: Picochip 130nm; Sandbridge 90nm; TI C6x 90nm; NXP EVP 90nm, req. ASICs.]
13 Ardbeg SDR Processor
- Sparse Connected VLIW
- Application Specific Hardware
- Block Floating Point
- 3 Read/2 Write RF for VLIW
- 8, 16, 32 bit fixed point support
- Fused Permute ALU operations
- Combined Scalar/Vector Memory
- 128-lane 8-bit Banyan Network
- Multiple Data Address Accesses
14 Evolution to Ardbeg: Lessons Learned
- Ardbeg achieved a 3x speedup overall at 30% lower power than SODA
- Many lessons were learned from the studies done to get these improvements
- We will present a few of these studies
  - 1) Benefit of Wide SIMD
  - 2) VLIW on SIMD support
  - 3) Support for Complex Shuffle Network
  - 4) Application Specific Hardware
15 1) Benefiting from Wide SIMD
- Increasing SIMD width is still a good idea for SDR
- But area becomes a big concern
- 32-wide 16-bit SIMD at 90nm seems a good fit
16 2) VLIW Support for Wide SIMD
- VLIW execution on top of the SIMD datapath
- 3 read ports, 2 write ports
- Shared between SIMD units
- 2-issue SIMD LIW
- Only supports the most frequently used SIMD op pairs
[Figure: PE pipeline diagram showing the AGUs, 32-lane SIMD ALU, SIMD RF, 128-lane SSN, SIMD-scalar trans. unit, scalar RF and 16-bit scalar ALU, each with EX and WB stages, joined by interconnects.]
17 2) VLIW on SIMD Support
- There is a distinct set of instructions that frequently execute at the same time
- We want to take advantage of this to reduce the complexity of the VLIW
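One way to picture this selection is to count which operations are issued together and keep only the pairs that cover most of the cases. The sketch below is illustrative only: the trace, the operation names, and the 90% coverage threshold are made up for the example and are not data from the talk.

```python
from collections import Counter


def frequent_pairs(issue_trace, coverage=0.9):
    """Return the most frequent op pairs whose combined share reaches `coverage`.

    `issue_trace` is a list of (op_a, op_b) tuples, one per cycle in which
    two SIMD operations were issued together. Pair order is ignored.
    """
    counts = Counter(tuple(sorted(pair)) for pair in issue_trace)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for pair, n in counts.most_common():
        if covered / total >= coverage:
            break
        chosen.append(pair)
        covered += n
    return chosen, covered / total


# Hypothetical co-issue trace (assumption, not measured data).
trace = (
    [("mul", "add")] * 50
    + [("add", "shuffle")] * 30
    + [("load", "mul")] * 15
    + [("shift", "sub")] * 3
    + [("cmp", "load")] * 2
)
pairs, share = frequent_pairs(trace, coverage=0.9)
print(pairs, share)
```

In this made-up trace, three of the five pairs account for 95% of the co-issued cycles, so a design that supports only those three pairs would lose little parallelism.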
18 2) VLIW on SIMD Support
- In most cases, a 3 read/2 write register file gives the best overall design point
19 3) Support for Shuffle Network
- 7-stage single-cycle SSN
- Banyan network
- 128-lane 8-bit (64-lane 16-bit)
[Figure: 2 stage 16-lane Banyan network]
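The 7-stage figure is consistent with a network built from 2x2 switches, since 128 lanes need log2(128) = 7 stages for every input to reach every output. The slide does not give the exact topology, so the sketch below assumes a butterfly arrangement (one member of the Banyan family) in which stage s pairs lane j with lane j XOR 2^s; the function names and the switch encoding are hypothetical.

```python
def num_stages(lanes: int) -> int:
    """Stages of 2x2 switches needed for a power-of-two lane count."""
    assert lanes > 0 and lanes & (lanes - 1) == 0, "lane count must be a power of two"
    return lanes.bit_length() - 1


def butterfly_route(data, switches):
    """Pass `data` through a butterfly network in a single sweep.

    switches[s][k] is True when the k-th 2x2 switch of stage s swaps its pair.
    Stage s pairs lane j with lane j ^ (1 << s).
    """
    n = len(data)
    out = list(data)
    for s in range(num_stages(n)):
        bit = 1 << s
        nxt = list(out)
        k = 0
        for j in range(n):
            if (j & bit) == 0:          # visit each pair once, from its lower lane
                if switches[s][k]:
                    nxt[j], nxt[j | bit] = out[j | bit], out[j]
                k += 1
        out = nxt
    return out


print(num_stages(128))  # 7

# With every switch crossed, each lane index has all of its bits flipped,
# which reverses the vector.
lanes = list(range(8))
all_crossed = [[True] * 4 for _ in range(num_stages(8))]
print(butterfly_route(lanes, all_crossed))  # [7, 6, 5, 4, 3, 2, 1, 0]
```

Unlike an iterative network, all stages here are traversed in one pass, but a network of this kind has only one path between each input and output, so it cannot realize every permutation the way a full crossbar can.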
20 3) Support for Shuffle Network
- A 64-wide Banyan network gives crossbar-like performance at close to the energy of a simple iterative interconnect
21 4) Application Specific Optimizations
- Application specific hardware
  - Turbo coprocessor
  - Block-floating point support
  - Fused Permute-ALU operations
  - Interleaving support
- Trade off programmability for performance
  - Less soft than SODA
  - But more energy efficient for common operations
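The slide lists block-floating point support without describing how Ardbeg implements it. As a general illustration of the technique (not the Ardbeg hardware), a block of samples shares a single exponent while each sample keeps its own fixed-point mantissa. The function names and the 16-bit mantissa width are assumptions chosen for the example.

```python
def to_block_floating_point(samples, mantissa_bits=16):
    """Quantize a block of floats to signed integer mantissas with one shared exponent."""
    limit = (1 << (mantissa_bits - 1)) - 1       # largest signed mantissa, 32767 for 16 bits
    peak = max(abs(x) for x in samples)
    exponent = 0
    if peak != 0:
        # Raise the exponent until the largest sample fits in the mantissa...
        while peak / (2.0 ** exponent) > limit:
            exponent += 1
        # ...then lower it as far as possible to keep the most precision.
        while peak / (2.0 ** (exponent - 1)) <= limit:
            exponent -= 1
    mantissas = [int(round(x / 2.0 ** exponent)) for x in samples]
    return mantissas, exponent


def from_block_floating_point(mantissas, exponent):
    """Reconstruct approximate sample values from mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


block = [0.5, -0.25, 0.125]
mantissas, exponent = to_block_floating_point(block)
print(mantissas, exponent)  # [16384, -8192, 4096] -15
print(from_block_floating_point(mantissas, exponent))
```

The appeal for a fixed-point SIMD datapath is that the per-sample arithmetic stays integer-only, while the shared exponent tracks the dynamic range of the whole block.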
22 4) Application Specific Optimizations
- Some kernels are common among many different protocols
  - Many protocols use the same error correction algorithms
  - Turbo coding, handled by the Turbo coprocessor, is one of them
- Tradeoff between programmable and ASIC
  - An ASIC implementation is around 5x more efficient than a programmable implementation
  - SODA PE: 2Mbps at 111mW in 90nm
  - ASIC: 2Mbps at 21mW in 90nm
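The "around 5x" figure follows directly from the two data points on this slide. A short check, using only the slide's numbers (the function name is illustrative), converts each to energy per bit and takes the ratio:

```python
def energy_per_bit_nj(power_mw: float, throughput_mbps: float) -> float:
    """Energy per bit in nanojoules; mW divided by Mbps is nJ/bit."""
    return power_mw / throughput_mbps


soda_nj = energy_per_bit_nj(power_mw=111.0, throughput_mbps=2.0)  # SODA PE, 90nm
asic_nj = energy_per_bit_nj(power_mw=21.0, throughput_mbps=2.0)   # ASIC, 90nm
ratio = soda_nj / asic_nj

print(f"SODA PE: {soda_nj:.1f} nJ/bit")
print(f"ASIC:    {asic_nj:.1f} nJ/bit")
print(f"ASIC advantage: {ratio:.2f}x")
```

At equal throughput the ratio reduces to 111/21, or roughly 5.3x.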
23 Overall Improvements
- Achieves a 1.5-7x speedup over SODA for wireless algorithms
24 Summary of Ardbeg
- Power vs. throughput for protocols on different processors
25 Summary of Ardbeg
- Ardbeg is lower power at the same throughput
- We are getting closer to ASICs
26 Conclusion
- SODA → Ardbeg
  - Overall 1.5-7x improvement across multiple wireless algorithms
  - 30% less power than SODA (with turbo also in software)
- A fully programmable research design evolved into a commercial design that is less soft
- It is feasible to design programmable solutions that start to approach ASIC efficiency
  - ASICs are locally optimal for single kernels, but combined they create an inefficient system
  - Programmability allows time multiplexing of hardware: less hardware, same amount of work
27 Questions?