Flexible wireless communication architectures

Provided by: Srid
Learn more at: http://www.ece.rice.edu

1
Flexible wireless communication architectures
  • Sridhar Rajagopal
  • Department of Electrical and Computer Engineering
  • Rice University, Houston TX
  • Faculty Candidate Seminar, Southern Methodist
    University
  • April 23, 2003

This work has been supported in part by NSF,
Nokia and Texas Instruments
2
Future wireless devices demand flexibility
  • Multiple algorithms and environments supported in
    same device
  • High data rate mobile devices with multimedia
  • Flexible algorithms: multiple antennas, complex
    signal processing
  • Flexible architectures: high performance (Mbps),
    low power (mW)
  • Fast design with structured exploration

3
Flexibility needed in different layers
Application Layer
Puppeteer project at Rice
http://www.cs.rice.edu/CS/Systems/Puppeteer/
Network Layer
MAC Layer
Physical Layer
Analog RF
4
Research vision: Attain flexibility
  • Algorithms
  • Flexibility: support a variety of sophisticated
    algorithms
  • Architectures
  • Flexibility: adapt hardware to algorithms
  • Fast, structured design exploration

5
Contributions: Algorithms
  • Multi-user channel estimation [Jnl. of VLSI Sig.
    Proc. '02, ASAP '00]
  • Matrix inversions
  • Numerical techniques
  • Conjugate-gradient descent for complexity
    reduction
  • Multi-user detection [ISCAS '01]
  • Block-based computation to streaming computations
  • Pipelining, lower memory requirements
  • Parallel, fixed-point, streaming VLSI
    implementations [IEEE Trans. Wireless Comm. '02]
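
The conjugate-gradient idea listed above can be sketched as follows. This is only an illustration of the general technique (solving A x = b iteratively instead of forming an explicit matrix inverse), not the estimator from the cited papers. It assumes a real, symmetric positive-definite matrix, whereas channel estimation would typically work with complex matrices; the names cg_solve, dot and MAX_N are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 16 /* largest system this sketch handles (assumed) */

static double dot(const double *u, const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += u[i] * v[i];
    return s;
}

/* Solve A x = b for a symmetric positive-definite A (row-major, n x n)
 * without computing the inverse of A. Returns the iterations used. */
int cg_solve(const double *A, const double *b, double *x,
             size_t n, int max_iter, double tol) {
    double r[MAX_N], p[MAX_N], Ap[MAX_N];
    assert(n > 0 && n <= MAX_N);

    /* Start from x = 0, so the residual r = b - A x equals b. */
    for (size_t i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; p[i] = b[i]; }
    double rs = dot(r, r, n);

    int it;
    for (it = 0; it < max_iter && rs > tol * tol; ++it) {
        /* One matrix-vector product per iteration. */
        for (size_t i = 0; i < n; ++i) {
            Ap[i] = 0.0;
            for (size_t j = 0; j < n; ++j) Ap[i] += A[i * n + j] * p[j];
        }
        double alpha = rs / dot(p, Ap, n);       /* step length */
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rs_new = dot(r, r, n);
        double beta = rs_new / rs;               /* direction update */
        for (size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rs = rs_new;
    }
    return it;
}
```

Each iteration costs one matrix-vector product, so stopping after a few iterations is where the complexity reduction over a direct inversion would come from.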

6
Contributions: Architectures
  • Heterogeneous DSP-FPGA system designs
    [ICSPAT '00]
  • Computer arithmetic [Symp. on Comp. Arith. '01]
  • Dynamic truncation in ASICs using on-line
    arithmetic with Most Significant Digit First
    computation
  • Ph.D. Thesis
  • Scalable Wireless Application-specific Processors
    (SWAPs)
  • Rapid, structured architectures with
    flexibility-performance tradeoffs

7
Scalable Wireless Application-specific Processors
  • Family of flexible programmable processors
  • Clusters of ALUs
  • High performance by supporting 100s of ALUs
  • Can provide customization for various algorithms
  • Adapts (swaps) architecture dynamically for
    power

Scale ALUs
Scale Clusters
8
Rapid, structured design for SWAPs
[Diagram: low-complexity, parallel, fixed-point algorithms feed architecture exploration; ASIC design and DSP design techniques are both applied to SWAPs.]
9
Research vision summary
  • Provide a structured framework to rapidly
    explore
  • flexible, high performance, low power
    architectures (SWAPs)
  • Efficient algorithm design for mapping to SWAPs
  • Understanding of algorithms, DSPs and ASICs used
  • Flexibility-performance trade-offs
  • Inter-disciplinary research
  • Wireless communications, VLSI Signal Processing,
    Computer architecture, Computer arithmetic,
    Circuits, CAD, Compilers

10
Talk Outline
  • Research vision
  • SWAPs - Background
  • Algorithm design for SWAPs
  • Architecture design for SWAPs
  • Current and Future Research Goals

11
SWAPs borrow from DSPs
  • DSPs use Instruction Level Parallelism (ILP) and
    Subword Parallelism (MMX)
  • Not enough ALUs for GOPs of computation: need
    100s
  • TI C6x has 8 ALUs
  • Why not more ALUs?
  • Cannot support more registers (area, ports)
  • Difficult to find ILP as ALUs increase

12
SWAPs borrow from ASICs
  • Exploit data parallelism (DP)
  • Available in many wireless algorithms
  • This is what ASICs do!
    int i, a[N], b[N], sum[N];        // 32 bits
    short int c[N], d[N], diff[N];    // 16 bits, packed
    for (i = 0; i < 1024; i++) {
        sum[i] = a[i] + b[i];
        diff[i] = c[i] - d[i];
    }

DP
ILP
Subword
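
To make the "Subword" label concrete, here is a minimal sketch of subword parallelism in portable C: two packed 16-bit values are subtracted with a single 32-bit operation, which is roughly what a packed subtract instruction does in hardware. The helper names (pack16x2, sub16x2, diff_packed, lane_hi, lane_lo) are hypothetical and do not come from the slides or from any DSP intrinsic set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANE_MSB 0x80008000u /* top bit of each 16-bit lane */

/* Pack two signed 16-bit values into one 32-bit word. */
uint32_t pack16x2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

int16_t lane_hi(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
int16_t lane_lo(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Lane-wise c - d, wrapping modulo 2^16 in each lane.
 * Setting the top bit of every lane of c and clearing it in d keeps a
 * borrow from crossing the lane boundary; the final XOR then restores
 * the correct top bit of each lane. */
uint32_t sub16x2(uint32_t c, uint32_t d) {
    uint32_t diff = (c | LANE_MSB) - (d & ~LANE_MSB);
    return diff ^ ((c ^ ~d) & LANE_MSB);
}

/* The diff loop from the slide, two 16-bit elements per iteration. */
void diff_packed(const uint32_t *c, const uint32_t *d,
                 uint32_t *out, size_t n_words) {
    assert(c != NULL && d != NULL && out != NULL);
    for (size_t i = 0; i < n_words; ++i) out[i] = sub16x2(c[i], d[i]);
}
```

With packing, the 1024-element `diff` loop needs 512 word operations. Data parallelism then splits those iterations across clusters, and ILP lets the independent `sum` and `diff` statements issue together.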
13
SWAPs borrow from stream processors
  • Kernels (computation) and streams (communication)
  • Use local data in clusters providing GOPs support
  • Imagine stream processor at Stanford [Rixner '01]

Scott Rixner. Stream Processor Architecture,
Kluwer Academic Publishers Boston, MA, 2001.
14
SWAPs are multi-cluster DSPs
[Figure labels: Memory; Stream Register File (SRF); ILP (within a cluster); DP (across clusters); DSP (1 cluster)]
SWAPs adapt clusters to DP: identical clusters, same operations. Power down unused FUs and clusters.
15
Arithmetic clusters in SWAPs
[Figure: arithmetic cluster with Distributed Register Files (supports more ALUs), a Cross Point switch, From/To SRF and Intercluster Network connections, a Comm. Unit, and a Scratchpad (indexed accesses)]
16
Talk Outline
  • Research vision
  • SWAPs - Background
  • Algorithm design for SWAPs
  • Architecture design for SWAPs
  • Current and Future Research Goals

17
SWAPs: Physical layer algorithms
Antenna
Baseband processing
Detection
Decoding
Higher (MAC/Network/OS) Layers
RF Front-end
Channel estimation
Complex signal processing algorithms with GOPs of
computation
18
SWAP mapping example: Viterbi decoding
  • Multiple antenna systems (MIMO systems)
  • Complexity exponential with transmit x receive
    antennas
  • Estimation: linear MMSE, blind, conjugate
    gradient
  • Detection: FFT, (blind) interference
    cancellation
  • Decoding: Viterbi, Turbo, LDPC; joint schemes
  • SWAP flexibility lets you use the best algorithms
    for the situation
  • Example for concept demonstration: Viterbi
    decoding

19
Parallel Viterbi Decoding for SWAPs
[Figure: Detected bits -> ACS Unit -> Traceback Unit -> Decoded bits]
  • Add-Compare-Select (ACS): trellis interconnect
    computations
  • Parallelism depends on constraint length
    (states)
  • Traceback: searching
  • Conventional: sequential (no DP) with dynamic
    branching
  • Difficult to implement in parallel architecture
  • Use Register Exchange (RE): parallel solution

20
Parallel Viterbi needs re-ordering for SWAPs
  • Exploiting Viterbi DP in SWAPs
  • Use RE instead of regular traceback
  • Re-order ACS, RE
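
A minimal sketch of Viterbi decoding with register exchange follows. It is not the SWAP mapping from the talk: it assumes a small rate-1/2 code (K = 3, generators 7 and 5 octal), hard decisions, and packets of at most 64 bits so that each survivor fits in one 64-bit register. All names are hypothetical. The point is that the loop over new states does the same ACS work for every state and needs no traceback branching, which is the data parallelism a clustered machine can exploit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define K 3                     /* constraint length (assumed for the sketch) */
#define NSTATES (1u << (K - 1)) /* trellis states = 2^(K-1) */
#define G0 07u                  /* generator polynomials, octal (assumed) */
#define G1 05u
#define BIG 1000000u            /* metric of a not-yet-reachable state */

static unsigned parity(unsigned v) {
    unsigned p = 0;
    while (v) { p ^= v & 1u; v >>= 1; }
    return p;
}

/* Two coded bits emitted when `bit` enters the encoder in `state`. */
static unsigned branch_output(unsigned state, unsigned bit) {
    unsigned reg = (bit << (K - 1)) | state; /* newest bit in the MSB */
    return (parity(reg & G0) << 1) | parity(reg & G1);
}

void conv_encode(const uint8_t *bits, int n, uint8_t *symbols) {
    unsigned state = 0;
    for (int t = 0; t < n; ++t) {
        symbols[t] = (uint8_t)branch_output(state, bits[t]);
        state = ((bits[t] << (K - 1)) | state) >> 1;
    }
}

void viterbi_decode_re(const uint8_t *symbols, int n, uint8_t *decoded) {
    unsigned metric[NSTATES], new_metric[NSTATES];
    uint64_t surv[NSTATES] = {0}, new_surv[NSTATES];
    assert(n > 0 && n <= 64);
    for (unsigned s = 0; s < NSTATES; ++s) metric[s] = (s == 0) ? 0u : BIG;

    for (int t = 0; t < n; ++t) {
        /* Every new state is independent of the others at this stage. */
        for (unsigned ns = 0; ns < NSTATES; ++ns) {
            unsigned bit = ns >> (K - 2); /* input bit that leads to ns */
            unsigned best = 2u * BIG;
            uint64_t best_surv = 0;
            for (unsigned j = 0; j < 2; ++j) { /* two predecessor states */
                unsigned ps = ((ns << 1) & (NSTATES - 1u)) | j;
                unsigned diff = branch_output(ps, bit) ^ symbols[t];
                unsigned m = metric[ps] + (diff & 1u) + (diff >> 1); /* add */
                if (m < best) { best = m; best_surv = surv[ps]; }    /* compare, select */
            }
            new_metric[ns] = best;
            /* Register exchange: copy the winner's history, append the bit. */
            new_surv[ns] = (best_surv << 1) | bit;
        }
        memcpy(metric, new_metric, sizeof metric);
        memcpy(surv, new_surv, sizeof surv);
    }

    unsigned best_state = 0;
    for (unsigned s = 1; s < NSTATES; ++s)
        if (metric[s] < metric[best_state]) best_state = s;
    for (int t = 0; t < n; ++t)
        decoded[t] = (uint8_t)((surv[best_state] >> (n - 1 - t)) & 1u);
}
```

Register exchange stores more than traceback does, since every state carries a full survivor register, but it avoids the sequential walk backwards through the trellis.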

21
Talk Outline
  • Research vision
  • SWAPs - Background
  • Algorithm design for SWAPs
  • Architecture design for SWAPs
  • Current and Future Research Goals

22
SWAP architecture design
  • More clusters better than more ALUs per cluster
    (if clusters > 2)
  • Decide how many clusters
  • Exploit DP
  • Decide what to put within each cluster
  • Maximize ILP with high functional unit efficiency
  • Search design space with explore tool
  • Time-power-area characterization

[Figure: design space with unknown ("?") cluster contents; ILP within a cluster, DP across clusters]
23
Design a SWAP cluster: Explore
Auto-exploration of adders and multipliers for
ACS
(Adder util, Multiplier util)
24
Explore tool benefits
  • Instruction count vs. ALU efficiency
  • What goes inside each cluster
  • Design customized application-specific units
  • Better performance with increased ALU utilization
  • Explore multiple algorithms
  • turn off functional units not in use for given
    kernel
  • Vdd-gating, clock gating techniques

25
Example for SWAP architecture design
  • Explore Algorithm 1: 3 adders, 3 multipliers, 32
    clusters
  • Explore Algorithm 2: 4 adders, 1 multiplier, 64
    clusters
  • Explore Algorithm 3: 2 adders, 2 multipliers, 64
    clusters
  • Explore Algorithm 4: 2 adders, 2 multipliers, 16
    clusters
  • Chosen Architecture: 4 adders, 3 multipliers, 64
    clusters

[Figure axes: DP (clusters), ILP (adders and multipliers per cluster)]
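
In this example the chosen architecture equals the per-resource maximum over the four explored algorithms. The slide does not state that rule explicitly, so the sketch below treats it as an assumption; the type swap_config and the function cover_all are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Per-algorithm result of the explore step. */
typedef struct {
    int adders;      /* adders per cluster */
    int multipliers; /* multipliers per cluster */
    int clusters;    /* number of clusters */
} swap_config;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest configuration that satisfies every algorithm's requirement,
 * assuming each resource can be sized independently. */
swap_config cover_all(const swap_config *req, size_t n) {
    assert(req != NULL && n > 0);
    swap_config chosen = req[0];
    for (size_t i = 1; i < n; ++i) {
        chosen.adders = max_int(chosen.adders, req[i].adders);
        chosen.multipliers = max_int(chosen.multipliers, req[i].multipliers);
        chosen.clusters = max_int(chosen.clusters, req[i].clusters);
    }
    return chosen;
}
```

Applied to the four requirement sets on the slide, this gives 4 adders, 3 multipliers and 64 clusters. Any algorithm that needs less leaves units and clusters that can be switched off.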
26
SWAP flexibility provides power savings
  • Multiple algorithms
  • Different ALU, cluster requirements
  • Turning off ALUs (add, mul compiler options)
  • Use the right ALUs from explore tool
  • Turning off clusters
  • Data across SRF of all clusters
  • Cluster only has access to its own SRF
  • Next kernel may need data from SRF of other
    clusters
  • Reconfiguration support needs to be provided

27
SWAPs provide cluster reconfiguration
SRF
Mux-Demux Network With Stream buffers
Clusters
Additional latency (few cycles) due to
microcontroller stalls - Minimal loss in
performance
28
Cluster reconfiguration for Viterbi
DP
Can be turned OFF
Packet 1: constraint length 7 (16 clusters)
Packet 2: constraint length 9 (64 clusters)
Packet 3: constraint length 5 (4 clusters)
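
The three cluster counts are consistent with a Viterbi trellis of 2^(K-1) states mapped at 4 states per cluster (64/16, 256/64 and 16/4). The sketch below assumes that ratio and a 64-cluster machine; neither constant is stated as a rule in the slides, and the function names are hypothetical.

```c
#include <assert.h>

#define MAX_CLUSTERS 64      /* largest configuration shown in the slides */
#define STATES_PER_CLUSTER 4 /* assumed: ratio implied by the three packets */

/* Number of trellis states for constraint length K. */
int trellis_states(int K) {
    assert(K >= 2 && K <= 16);
    return 1 << (K - 1);
}

/* Clusters kept on for a packet with constraint length K. */
int active_clusters(int K) {
    int clusters = trellis_states(K) / STATES_PER_CLUSTER;
    if (clusters < 1) clusters = 1;
    /* Beyond the machine size the ratio can no longer be held constant. */
    if (clusters > MAX_CLUSTERS) clusters = MAX_CLUSTERS;
    return clusters;
}

/* Clusters that can be turned off for that packet. */
int idle_clusters(int K) {
    return MAX_CLUSTERS - active_clusters(K);
}
```

Under these assumptions a K = 5 packet keeps 4 clusters active and leaves 60 that can be turned off.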
29
SWAPs provide flexibility at negligible overhead
[Figure: execution time (cycles) of clusters and memory for three 64-bit, rate 1/2 packets: Packet 1 (K = 7), Packet 2 (K = 9), Packet 3 (K = 5). Labels: Kernels (Computation); No Data Memory accesses.]
30
SWAP exploration for Viterbi decoding
[Figure: log-log plot of frequency needed to attain real-time (in MHz, 1 to 1000) versus number of clusters (1 to 100) for K = 5, K = 7 and K = 9. Marked: DSP; Max DP; different SWAPs (without reconfiguration); same SWAP (with reconfiguration).]
Ideal C64x (w/o co-proc) needs 200 MHz for
real-time
31
SWAPs: Salient features
  • 1-2 orders of magnitude better than a DSP
  • Any constraint length ? 10 MHz at 128 Kbps
  • Same code for all constraint lengths
  • no need to re-compile or load another code
  • as long as parallelism/cluster ratio is constant
  • Power savings due to dynamic cluster scaling

32
Expected SWAP power consumption
  • Power model based on [Khailany '03]
  • 64 clusters and 1 multiplier per cluster
  • 0.13 micron, 1.2 V
  • Peak active power: 9 mW at 1 MHz (DSP: 1 mW)
  • Area: 53.7 mm²
  • 10 MHz, 128 Kbps with reconfiguration

DSP, K 9
1
200 mW
Brucek Khailany et al. Exploring the VLSI Scalability of Stream
Processors, Proceedings of the Ninth Symposium on High
Performance Computer Architecture, February 8-12, 2003.
33
Multiuser Estimation-Detection-Decoding
Real-time target 128 Kbps per user
Fading scenarios
Ideal C64x (w/o co-proc) needs 15 GHz for
real-time
34
Expected SWAP power: base-station
  • 32-user base-station with 3 multipliers per cluster
    and 64 clusters
  • 0.13 micron, 1.2 V
  • Peak active power: 18.19 mW at 1 MHz
    (more multipliers)
  • Area: 93.4 mm²
  • Total peak base-station power consumption:
    18.19 W at 1 GHz for 32 users at 128 Kbps/user
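
The power figures in the talk are quoted per MHz and then scaled to the operating clock: 18.19 mW at 1 MHz becomes 18.19 W at 1 GHz. A minimal sketch of that arithmetic, assuming peak active power grows linearly with clock frequency at a fixed supply voltage; the function name peak_power_mw is hypothetical.

```c
#include <assert.h>

/* Peak active power in mW at a given clock, from a per-MHz figure.
 * Assumes linear scaling with frequency at fixed supply voltage. */
double peak_power_mw(double mw_per_mhz, double clock_mhz) {
    assert(mw_per_mhz >= 0.0 && clock_mhz >= 0.0);
    return mw_per_mhz * clock_mhz;
}
```

Under that assumption the Viterbi SWAP (9 mW per MHz) at its 10 MHz real-time clock draws 90 mW, the DSP (1 mW per MHz) at 200 MHz draws 200 mW, and the base-station SWAP (18.19 mW per MHz) at 1 GHz draws 18,190 mW, which is 18.19 W.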

35
Talk Outline
  • Research vision
  • SWAPs - Background
  • Algorithm design for SWAPs
  • Architecture design for SWAPs
  • Current and Future Research Goals

36
Current research: Flexibility vs. performance
  • SWAPs: 128 Kbps at 10-100 mW for Viterbi
  • Borrow DP from ASICs!
  • suitable for base-stations
  • Flexibility more important than power
  • suitable for mobile devices
  • Power constraints tighter
  • can be customized for further power savings
  • Handset SWAPs (H-SWAPs)
  • Borrow Task pipelining from ASICs!
  • Application-specific units and specialized comm.
    network

37
Handset SWAPs: H-SWAPs
  • Trade Data Parallelism for Task Pipelining

SWAPs (max. clusters and reconfigure)
38
Sample points in architecture exploration
Programmable solutions with increased customization:
  • DSPs (1 cluster): ILP, Subword
  • SWAPs (multiple clusters): ILP, Subword, DP
  • H-SWAPs (optimized for handsets): ILP, Subword, DP,
    Task Pipelining, Custom ALUs
Performance, power benefits (with decreasing flexibility)
39
Future: Efficient algorithms and mapping
Multiple antenna systems with 1-2
orders-of-magnitude higher complexity
40
Future research: Architectures
  • Generalized and structured framework and tools
  • Joint algorithm-architecture exploration
  • Area-time-power-flexibility tradeoffs
  • Potential applications: embedded systems
  • Image and video processing
  • Cameras: variety of compression algorithms
  • Biomedical applications
  • Hearing aids: DSP running on body heat
  • Sensor networks
  • Compression of data before transmission

Quote: Gene Frantz, TI Fellow
41
SWAPs: Flexibility, Performance, Power
  • Need flexibility in future wireless devices
  • Algorithms and Architectures
  • Rapid Exploration for Scalable, Wireless
    Application-specific Processors
  • Structured approach with flexibility-performance
    trade-offs
  • SWAPs - flexibility, high performance and low
    power
  • Exploit data parallelism like ASICs
  • 1-2 orders of magnitude better performance than DSPs
  • Turn off unused clusters and unused ALUs for low
    power