Title: Weekly Group Meeting 20080710: Programmable Baseband Processors (Chapter 5)
1 Weekly Group Meeting 20080710
Title: Programmable Baseband Processors (Chapter 5)
2 Radio System Overview
- A typical wireless communication system
3 Digital Baseband Processor
- Transmitter performs
  - Channel coding
  - Modulation
  - Symbol shaping
- Receiver performs
  - Filtering, synchronization, gain control
  - Demodulation, channel estimation, and compensation
  - Forward error correction
4 Baseband Processing Challenges
- Multipath propagation (fading)
  - Data transported through the air is affected by the surrounding environment
  - Multiple propagation paths
  - Delayed multipath signal components add at the receiver
  - Some frequencies add constructively, others destructively
  - Thus distorting the original signal
  - Causing inter-symbol interference
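The constructive and destructive addition described above can be illustrated with a two-path channel. This is a minimal sketch, assuming one direct path plus a single echo of equal amplitude; the 1 microsecond delay and the function name `two_path_gain` are hypothetical values chosen for illustration, not taken from the slides.

```python
import cmath
import math


def two_path_gain(freq_hz, delay_s, echo_amplitude=1.0):
    """Magnitude of H(f) = 1 + a * exp(-j*2*pi*f*tau): direct path plus one echo."""
    return abs(1 + echo_amplitude * cmath.exp(-2j * math.pi * freq_hz * delay_s))


ECHO_DELAY_S = 1e-6  # hypothetical echo delay (assumption)

if __name__ == "__main__":
    # Frequencies where f * tau is an integer add constructively;
    # frequencies where f * tau is an odd multiple of 1/2 cancel.
    for f in (0.0, 0.25e6, 0.5e6, 0.75e6, 1.0e6):
        print(f"{f / 1e6:4.2f} MHz -> gain {two_path_gain(f, ECHO_DELAY_S):.3f}")
```

With these assumed values the channel passes 0 Hz and 1 MHz at double amplitude while nulling 0.5 MHz, which is the frequency-selective behaviour the slide refers to.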
5 Baseband Processing Challenges
- Timing and frequency offset
  - Different reference oscillators in the transmitter and receiver
  - Causing a slight discrepancy between the transmitter and receiver in
    - Carrier frequency
    - Sample rate
  - If uncorrected, this limits the useful data rate of a system
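To give a feel for the size of the effect, the sketch below converts an oscillator tolerance in ppm into a carrier frequency offset and the phase rotation it accumulates. All numbers (2.4 GHz carrier, 20 ppm oscillators, 4 microsecond interval) are hypothetical example values, not figures from the slides.

```python
import math


def carrier_offset_hz(carrier_hz, tx_ppm, rx_ppm):
    """Worst-case offset: both oscillators at full tolerance, in opposite directions."""
    return carrier_hz * (tx_ppm + rx_ppm) * 1e-6


def phase_drift_rad(offset_hz, duration_s):
    """Phase rotation accumulated by an uncorrected offset over duration_s."""
    return 2 * math.pi * offset_hz * duration_s


# Hypothetical example values (assumptions).
offset = carrier_offset_hz(2.4e9, 20, 20)   # roughly 96 kHz
drift = phase_drift_rad(offset, 4e-6)       # rotation over 4 microseconds

if __name__ == "__main__":
    print(f"offset = {offset / 1e3:.1f} kHz, drift = {math.degrees(drift):.1f} degrees")
```

A rotation of this size within a single symbol period would move constellation points well away from their decision regions, which is why the offset has to be estimated and removed.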
6 Baseband Processing Challenges
- Mobility
  - Fast fading: rapid changes of the channel
  - Doppler spread further increases the frequency offset
7 Baseband Processing Challenges
- Noise and burst interference
  - Signal degradation
  - Bit errors
- FEC techniques are used to increase the reliability of the wireless link
- Popular FEC codes and algorithms include convolutional codes (decoded with the Viterbi algorithm), Turbo codes, and Reed-Solomon codes
- Interleaving is used to spread out the bit errors caused by burst interference or frequency-selective fading
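The effect of interleaving on a burst can be sketched with a simple block interleaver. This is an illustrative row/column scheme, assuming a 4 x 4 block; actual standards define their own interleaver structures, and the helper names here are hypothetical.

```python
def interleave(bits, rows, cols):
    """Write the block row by row, read it out column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave: restores the original row-by-row order."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


def burst_error_positions(rows, cols, burst_start, burst_len):
    """Original indices hit by a burst of consecutive errors on the interleaved stream."""
    order = interleave(list(range(rows * cols)), rows, cols)
    return sorted(order[burst_start:burst_start + burst_len])


if __name__ == "__main__":
    # A burst of 4 consecutive channel errors lands on 4 widely separated bits.
    print(burst_error_positions(4, 4, burst_start=4, burst_len=4))
```

After de-interleaving, each row of the block sees at most one error from this burst, which is a pattern a per-row FEC code is far more likely to correct than four adjacent errors.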
8 Baseband Processing Challenges
- Dynamic range
  - Fading and signals from other surrounding equipment increase the dynamic range
  - A dynamic range of 60-100 dB is not uncommon
  - It is not practical to design systems with such a large dynamic range
  - Instead, AGC (automatic gain control) circuits are used
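The sketch below puts numbers on the dynamic range figures and shows the idea behind a feedback AGC. It assumes the usual rule of thumb of about 6.02 dB per bit of word length; the target level, loop step, and function names are hypothetical choices for illustration, not the design used in the source.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit


def bits_for_dynamic_range(range_db):
    """Word length needed to cover range_db without any gain control."""
    return math.ceil(range_db / DB_PER_BIT)


def agc_gain_db(input_levels_db, target_db=-12.0, step=0.5):
    """Log-domain feedback AGC: nudge the gain toward the target output level."""
    gain_db = 0.0
    history = []
    for level_db in input_levels_db:
        error_db = target_db - (level_db + gain_db)
        gain_db += step * error_db
        history.append(gain_db)
    return history


if __name__ == "__main__":
    print(bits_for_dynamic_range(60), bits_for_dynamic_range(100))
    print(agc_gain_db([-60.0] * 10)[-1])
```

Under these assumptions, covering 60 to 100 dB directly would take roughly 10 to 17 bits throughout the data path, whereas the AGC loop settles on a gain that brings a weak input up to the target level so a narrower data path suffices.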
9 Baseband Processing Challenges
- Processing latency
  - Baseband processing is a strict hard real-time procedure
  - Heavy peak workload for the processor during computationally demanding tasks, such as
    - Channel decoding
    - Channel estimation
    - Gain control calculations
  - Hardware must be able to handle the peak workload
  - Even though it occurs less than 1 percent of the time
10 Programmable Baseband Processors
- Traditionally, fixed-function hardware has been used, since baseband processing is computationally very heavy
- Two disadvantages of fixed-function hardware
  - Low flexibility
  - Short product lifetime
- Programmable solutions, by contrast, need only a software update
11 Programmable Baseband Processors
- Multimode systems
  - A high-end cellular telephone will support a number of standards, such as
    - GSM/GPRS, EDGE, UMTS, WLAN, WiMAX, UWB, Bluetooth, GPS, DVB-H
  - One way is to integrate many separate baseband processing modules
  - Its drawbacks
    - Large silicon area
    - Lack of hardware reuse
12 Programmable Baseband Processors
- Dynamic MIPS allocation
  - Redistribute the resources dynamically
  - Focus on either mobility management or high data rate
  - During severe fading, advanced channel tracking and compensation algorithms are run for reliable communication
  - In good channels, more resources can be allocated to symbol processing for high throughput
13 Programmable Baseband Processors
14 Programmable Baseband Processors
- Hardware multiplexing through programmability
  - Most wireless communication schemes use multiplexing, which can be divided into three classes
    - OFDM, CDMA, single-carrier modulation
  - By carefully selecting the functional blocks, hardware reuse between different standards can be achieved
15 Programmable Baseband Processors
- Hardware multiplexing on LeoCore DSPs
16 Spectrum of an OFDM (orthogonal frequency division multiplexing) communication system
[16] National Instruments, Orthogonal Frequency Division Multiplexing, available at www.ni.com, 2004.
17 Cyclic Prefix Insertion
[16] National Instruments, Orthogonal Frequency Division Multiplexing, available at www.ni.com, 2004.
18 OFDM symbol time structure showing insertion of Cyclic Prefix
[4] WiMAX Forum, Mobile WiMAX Part 1: A Technical Overview and Performance Evaluation, 2006.
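Cyclic prefix insertion can be sketched in a few lines: the tail of each OFDM symbol is copied to its front. The example also checks the property that makes the prefix useful, namely that a channel shorter than the prefix then acts as a circular convolution on the symbol. The 8-sample symbol, 2-sample prefix, and 2-tap channel are hypothetical illustration values.

```python
def add_cyclic_prefix(symbol, cp_len):
    """Prepend the last cp_len samples of the OFDM symbol to its front."""
    if not 0 < cp_len <= len(symbol):
        raise ValueError("cp_len must be between 1 and the symbol length")
    return symbol[-cp_len:] + symbol


def remove_cyclic_prefix(samples, cp_len):
    """Drop the prefix at the receiver."""
    return samples[cp_len:]


def convolve(x, h):
    """Linear convolution, modelling a multipath channel with taps h."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def circular_convolve(x, h):
    """Circular convolution of x with taps h, over the length of x."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]


if __name__ == "__main__":
    symbol = [1, 2, 3, 4, 5, 6, 7, 8]
    channel = [1, 2]  # hypothetical 2-tap channel, shorter than the prefix
    tx = add_cyclic_prefix(symbol, 2)
    rx = convolve(tx, channel)[:len(tx)]
    print(remove_cyclic_prefix(rx, 2) == circular_convolve(symbol, channel))
```

Because the received symbol equals a circular convolution, the channel reduces to one complex gain per subcarrier after the FFT, which is what allows simple per-subcarrier compensation.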
19 OFDM Processing Flow
20 Job overview
- FFT computation complexity for different communication standards
21 Hardware Considerations for Programmable OFDM Processing
- Flexibility
  - Multiple FFT sizes must be supported for different standards
  - As a bonus, other transforms such as cosine and Walsh transforms can also be supported
- Hardware reuse
  - In many cases, it results in a smaller total silicon area than a corresponding fixed-function solution
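The flexibility requirement can be illustrated in software: one routine serves every power-of-two FFT size, and the work grows as (N/2) log2 N radix-2 butterflies. This is a plain recursive radix-2 sketch for illustration only; it is not the LeoCore implementation, and the sizes printed are arbitrary examples rather than figures from the slides.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def radix2_butterflies(n):
    """Butterfly count of a radix-2 FFT of size n: (n / 2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)


if __name__ == "__main__":
    for size in (64, 256, 2048):
        print(size, radix2_butterflies(size))
```

The same butterfly kernel is reused for every size; only the loop bounds and twiddle factors change, which is the kind of reuse a programmable data path exploits.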
22 Hardware Considerations for Programmable OFDM Processing
23 Code Division Multiple Access (CDMA)
- Concurrent transmission in the same spectrum using orthogonal spreading codes
- In a CDMA transmitter
  - Binary data is mapped onto complex-valued symbols, which are then multiplied (spread) with a code from a set of orthogonal codes
  - The length of the code is called the spreading factor
24 Code Division Multiple Access (CDMA)
- In a CDMA receiver
  - Data is recovered by calculating a dot product (de-spread) between the received data and the assigned code
  - Because the spreading codes are selected from a set of orthogonal codes, the dot product is zero for every code except the assigned one
- WCDMA
  - Can scale the bandwidth of a user by assigning multiple spreading codes to that user
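Spreading and de-spreading can be sketched with Walsh (Hadamard) codes, one common family of orthogonal codes. This is a minimal illustration assuming a spreading factor of 4, two users, and an ideal noiseless channel; the symbol values and helper names are hypothetical.

```python
def walsh_codes(order):
    """Rows of a 2**order x 2**order Hadamard matrix, used as orthogonal codes."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes


def spread(symbols, code):
    """Multiply every symbol by each chip of the code."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Dot product of each code-length block with the code, normalised by its length."""
    sf = len(code)  # spreading factor
    return [sum(chips[i + k] * code[k] for k in range(sf)) / sf
            for i in range(0, len(chips), sf)]


if __name__ == "__main__":
    codes = walsh_codes(2)                      # four codes, spreading factor 4
    user_a = [1 + 1j, -1 - 1j]
    user_b = [-1 + 1j, 1 + 1j]
    # Both users transmit at the same time in the same spectrum.
    channel = [a + b for a, b in zip(spread(user_a, codes[1]), spread(user_b, codes[2]))]
    print(despread(channel, codes[1]))          # recovers user A
    print(despread(channel, codes[3]))          # unused code: zeros
```

Assigning a second code to the same user, as mentioned for WCDMA, would simply add another spread stream to the sum and double that user's symbol rate.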
25 Job Overview
- Signal processing in WCDMA and HSDPA can be divided into
  - Chip-rate processing
  - Symbol-rate processing
- A chip is one complex element of the spreading code
- Synchronization, channel estimation, and channel equalization are performed at chip rate
- Additional channel equalization is performed at symbol rate
26 Job Overview
- Synchronization
  - Responsible for finding the start of the data frame and identifying the base station parameters
  - This is done by correlating the received data with a 256-chip-long synchronization code
  - The chip rate of WCDMA/HSDPA is 3.84 Mchips/s
  - The main operation in this step is complex multiplication and accumulation (complex dot product)
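The correlation step can be sketched as a sliding complex dot product. The code below uses a randomly generated 256-chip code as a stand-in, since the actual WCDMA synchronization code is not given in the slides; the noiseless received signal, the offset of 37 chips, and the helper names are likewise hypothetical.

```python
import random

CHIP_RATE = 3.84e6                                # chips per second, from the slide
SYNC_CODE_LEN = 256                               # chips, from the slide
SYNC_CODE_DURATION_S = SYNC_CODE_LEN / CHIP_RATE  # about 66.7 microseconds


def make_code(length, seed=0):
    """Hypothetical stand-in code with chips drawn from +/-1 +/- j."""
    rng = random.Random(seed)
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(length)]


def correlate_at(received, code, offset):
    """Complex dot product of the received window with the conjugated code."""
    return sum(received[offset + k] * code[k].conjugate() for k in range(len(code)))


def find_frame_start(received, code):
    """Offset whose correlation magnitude is largest."""
    offsets = range(len(received) - len(code) + 1)
    return max(offsets, key=lambda d: abs(correlate_at(received, code, d)))


if __name__ == "__main__":
    code = make_code(SYNC_CODE_LEN)
    received = [0j] * 37 + code + [0j] * 50   # idealised: no noise, no fading
    print(find_frame_start(received, code))
```

Every candidate offset costs 256 complex multiply-accumulates, and at 3.84 Mchips/s a new offset arrives roughly every 0.26 microseconds, which indicates why this step is dominated by complex MAC throughput.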
27 Job Overview
- Channel equalization
  - Two-step procedure in WCDMA
    - The strongest multipath components are identified (using the data from the synchronizer)
    - The components are aligned in time and added constructively (using maximum ratio combining). This is known as a Rake receiver.
  - In HSDPA (which uses up to 16-QAM), additional equalization is necessary
    - The resulting complex-valued symbols (after de-spreading) are equalized by a second linear equalizer
    - It uses training symbols inserted in the middle of the data slot (midamble)
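The combining step of a Rake receiver can be sketched as follows. The example assumes the fingers have already been aligned in time and de-spread, that the channel gain of each path is known exactly, and that there is no noise; the three path gains are hypothetical values.

```python
def rake_combine(finger_symbols, channel_gains):
    """Maximum ratio combining: weight each finger by the conjugate of its gain and sum."""
    num_symbols = len(finger_symbols[0])
    total_power = sum(abs(h) ** 2 for h in channel_gains)
    return [
        sum(h.conjugate() * finger[n]
            for h, finger in zip(channel_gains, finger_symbols)) / total_power
        for n in range(num_symbols)
    ]


if __name__ == "__main__":
    sent = [1 + 1j, -1 + 1j]
    gains = [0.8 + 0.2j, -0.3 + 0.5j, 0.1 - 0.4j]      # hypothetical path gains
    fingers = [[h * s for s in sent] for h in gains]   # one time-aligned stream per path
    print(rake_combine(fingers, gains))
```

Multiplying by the conjugate gain removes each path's phase rotation so the paths add in phase, and it weights strong paths more heavily than weak ones.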
28 Job Overview
29 Hardware considerations for a WCDMA Processor
- All chip- and symbol-related operations are performed on complex-valued data (true for OFDM also)
  - A programmable baseband processor needs to perform complex-valued computing efficiently
- Fairly short symbols with high data rate
  - Minimal loop overhead
  - Which means wider execution units for processing efficiency
- In WCDMA, HSDPA, and other CDMA systems
  - Complex spreading codes have a constant envelope
  - Therefore, the de-spread operation can be performed in the complex ALU instead of entirely in a complex MAC unit
- Addressing support for Rake addressing
  - Implemented as function-level accelerators in memory blocks
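Why a constant-envelope code lets de-spreading run in an ALU can be seen by writing out the multiplication. The sketch assumes code chips of the form sr + j*si with sr and si each +1 or -1, so multiplying by the conjugated chip only selects signs; the function name and data values are hypothetical.

```python
def despread_add_only(chips, code_signs):
    """De-spread with conj(code), where each code chip is sr + j*si, sr and si in {+1, -1}.

    (a + jb) * (sr - j*si) = (sr*a + si*b) + j*(sr*b - si*a), so every term is a
    sign-selected add or subtract of a and b; no multiplier is needed.
    """
    acc_re = acc_im = 0
    for (a, b), (sr, si) in zip(chips, code_signs):
        acc_re += (a if sr > 0 else -a) + (b if si > 0 else -b)
        acc_im += (b if sr > 0 else -b) - (a if si > 0 else -a)
    return acc_re, acc_im


if __name__ == "__main__":
    chips = [(3, -2), (1, 4), (-5, 0)]     # received (real, imag) pairs
    signs = [(1, 1), (-1, 1), (1, -1)]     # code chips as sign pairs
    reference = sum(complex(a, b) * complex(sr, si).conjugate()
                    for (a, b), (sr, si) in zip(chips, signs))
    print(despread_add_only(chips, signs), reference)
```

Each chip costs four additions or subtractions instead of four real multiplications, which is the saving the slide attributes to using the complex ALU.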
30 Multi-standard Processor Design
- A processor architecture suitable for OFDM, CDMA, and single-carrier based standards is as follows
- Requirements for such a processor
  - Efficient instruction set suited for baseband processing. Use of both natively complex computing and integer computing.
  - Efficient hardware reuse through instruction-level acceleration.
  - Wide execution units to increase processing parallelism.
  - High memory bandwidth to support parallel execution.
  - Low overhead in processing.
  - Balance between configurable accelerators and execution units.
31 Complex Computing
- A very large part of the processing (FFTs, frequency/timing offset estimation, synchronization, and channel estimation) employs convolution-based functions.
- Such operations can be performed efficiently in DSPs using a CMAC unit, optimized memory and bus architecture, and addressing modes.
- In baseband processing all operations are complex-valued
- Therefore, complex computing should be supported throughout the architecture
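What a complex MAC (CMAC) does can be written out in real arithmetic, which also shows why convolution maps onto it so directly. The sketch below represents complex values as (real, imag) tuples to make the four real multiplications visible; the function names and the tiny FIR example are hypothetical.

```python
def cmac(acc, a, b):
    """One complex multiply-accumulate, acc + a * b, written out in real arithmetic."""
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    return (acc_re + a_re * b_re - a_im * b_im,
            acc_im + a_re * b_im + a_im * b_re)


def complex_fir(samples, taps):
    """Convolution outputs (fully overlapped part only), one CMAC per tap."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = (0, 0)
        for k, tap in enumerate(taps):
            acc = cmac(acc, samples[n - k], tap)
        out.append(acc)
    return out


if __name__ == "__main__":
    print(cmac((0, 0), (1, 2), (3, 4)))   # (1 + 2j) * (3 + 4j)
    print(complex_fir([(1, 0), (0, 1), (1, 1)], [(1, 0), (0, 1)]))
```

On a processor with only real-valued MAC units, each `cmac` call would expand into four multiplies and four additions plus the data shuffling between them; a native CMAC unit performs it as one operation.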
32 Complex Computing
33 LeoCore Processor Architecture
- Two main parts
  - Natively complex part, which operates on vectors of complex numbers
  - Natively integer part, which operates on integers and single bits
34 LeoCore Processor Architecture
- Purpose of the two main parts
  - The complex part is used to extract soft data symbols that can be de-mapped into bits
  - The integer part is used for FEC and bit manipulation
35 LeoCore Processor Architecture
- Execution units
  - To do complex tasks in an efficient manner
  - DSP controller core, multi-lane complex MAC, ALU SIMD data paths
  - Execution units range from CMAC units capable of executing a radix-4 FFT butterfly in one clock cycle to complex ALUs used by CDMA-based standards
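The arithmetic inside a radix-4 butterfly can be sketched as below. The sketch covers only the 4-point DFT core, which needs nothing but additions and multiplications by +/-1 and +/-j; in a full FFT stage, three of the inputs would first be multiplied by twiddle factors, which is omitted here. It illustrates the operation only and makes no claim about how the LeoCore CMAC unit implements it.

```python
def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT core: only additions and multiplications by +/-1 and +/-j."""
    s02, d02 = x0 + x2, x0 - x2
    s13, d13 = x1 + x3, x1 - x3
    jd13 = complex(-d13.imag, d13.real)  # j * d13: swap parts, negate the new real part
    return (s02 + s13, d02 - jd13, s02 - s13, d02 + jd13)


if __name__ == "__main__":
    print(radix4_butterfly(1 + 2j, 3 - 1j, -2 + 0.5j, -4j))
```

One radix-4 butterfly replaces four radix-2 butterflies across two stages, so executing it in a single clock cycle requires several complex adders and multipliers working in parallel, consistent with the multi-lane data paths listed above.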
36 LeoCore Processor Architecture
- Memory subsystems
  - Memory is connected through an on-chip network
  - The on-chip network allows any memory to be connected to any execution unit
  - The amount of memory needed is small, but the required memory bandwidth is very large (several hundred Msamples/s)
  - Each sample consists of two parts (real and imaginary)
37 LeoCore Processor Architecture
- It uses vector instructions
  - E.g., a single instruction that triggers a complete vector operation, such as a complex 128-sample dot product
  - This means that the execution unit must be able to process large data chunks without any intervention from the processor core
  - Which in turn means the execution unit and memory subsystem must have
    - Automatic address generation
    - Efficient load/store subsystems
  - Therefore, the base architecture uses decentralized memories and memory addressing together with vector execution units
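The idea of a vector instruction backed by automatic address generation can be modelled in software. This is a conceptual sketch only: the `AddressGenerator` class, its start/stride/modulo parameters, and `vector_cdot` are hypothetical names, and conjugating the second operand is an assumption about what "complex dot product" means here.

```python
class AddressGenerator:
    """Hypothetical address generator: start address, stride, optional modulo wrap."""

    def __init__(self, start, stride=1, modulo=None):
        self.addr, self.stride, self.modulo = start, stride, modulo

    def next(self):
        """Return the current address and step to the following one."""
        current = self.addr
        self.addr += self.stride
        if self.modulo is not None:
            self.addr %= self.modulo
        return current


def vector_cdot(mem_a, agu_a, mem_b, agu_b, length=128):
    """Models one vector instruction: the full dot product runs with no core involvement."""
    acc = 0j
    for _ in range(length):
        acc += mem_a[agu_a.next()] * mem_b[agu_b.next()].conjugate()
    return acc


if __name__ == "__main__":
    mem_a = [complex(i, 1) for i in range(128)]   # two separate memories,
    mem_b = [1 + 0j] * 128                        # one per operand stream
    print(vector_cdot(mem_a, AddressGenerator(0), mem_b, AddressGenerator(0)))
```

The controller's only job is to set up the two address generators and issue the operation; all 128 operand fetches are produced by the generators, which is why each operand stream needs its own memory and addressing logic.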
38 LeoCore Processor Architecture
- HW acceleration
  - Accelerators are also attached to the network
  - To improve efficiency, function-level accelerators can be used
  - A function-level accelerator is a configurable piece of hardware that performs a specific task without support from the processor core.
  - How to decide which functions to accelerate? Consider the following
    - MIPS cost, reuse, circuit area (considerable reduction of clock frequency and power)
39 LeoCore Processor Architecture
- Typical accelerators
  - Front-end acceleration (filtering/decimation)
  - Re-sampling
  - Rotor (an NCO and a complex multiplier)
  - Packet detector
  - Shaping filter
  - Forward error correction (FEC)
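Of the accelerators listed, the rotor is the simplest to sketch: a numerically controlled oscillator (NCO) generates a rotating phasor and a complex multiplier applies it to each sample, which removes a known frequency offset. The function name, the 1 kHz offset, and the 48 kHz sample rate are hypothetical; a hardware NCO would typically use a phase accumulator and lookup table rather than `cmath.exp`.

```python
import cmath
import math


def rotor(samples, freq_offset_hz, sample_rate_hz):
    """Multiply each sample by an NCO phasor rotating at -freq_offset_hz."""
    phase_step = -2 * math.pi * freq_offset_hz / sample_rate_hz
    return [s * cmath.exp(1j * phase_step * n) for n, s in enumerate(samples)]


if __name__ == "__main__":
    clean = [1 + 1j] * 8
    # Passing a negative offset applies a +1 kHz rotation, emulating the impairment.
    received = rotor(clean, -1000.0, 48000.0)
    corrected = rotor(received, 1000.0, 48000.0)
    print(max(abs(c - s) for c, s in zip(corrected, clean)))
```

Because the operation runs on every incoming sample at the full sample rate, placing it in a dedicated accelerator near the ADC interface keeps that constant load off the programmable core.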
40 Conclusion
- Multi-standard baseband processing can be implemented in programmable hardware
- Its main features should be
  - Support for complex-valued computing
  - Instruction-level acceleration of FFT, convolution, and similar kernel functions
  - Small total memory, but with an optimized architecture meeting
    - High bandwidth and real-time requirements
  - Function-level accelerators for channel coding and for general tasks close to the ADC/DAC interface
41 Reference
- M. Ismail, D. Gonzalez, Radio Design in Nanometer Technologies, Springer, 2006.
42 Thank You.