Title: Implementation Issues of the Multiuser Detection in CDMA Communication Systems
1 Implementation Issues of the Multiuser Detection in CDMA Communication Systems
- Gang Xu
- April 28, 1999
- Electrical and Computer Engineering Department,
- Rice University, Houston, TX.
2 Outline
- Introduction
- Differencing multistage detection
- DSP implementation
- ASIC implementation
- Summary and future work
3 Wireless Communication Systems
[Figure: two base stations, each with RF and baseband signal processing blocks, connected by network switching]
4 CDMA Technology
- Code-Division Multiple Access
- each user occupies all the time/frequency resource
- users are distinguished by their unique codes
- Benefits
- capacity increase
- high quality over wireless links with burst interference and multipath fading
[Figure: Codes 1 to 4 overlaid on the time/frequency plane, with a burst interference marked]
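As a minimal sketch of how unique codes separate users that share the same time and frequency, the example below spreads one bit per user, sums the transmissions, and recovers each bit by correlation. The length-4 Walsh codes, the names `CODES`, `spread`, and `despread`, and the noise-free synchronous channel are illustrative assumptions, not taken from the slides.

```python
# Illustrative only: four users, one BPSK bit each, synchronous and noise-free.
# The length-4 Walsh codes below are an assumption chosen for the example.
CODES = [
    [1, 1, 1, 1],    # Code 1
    [1, -1, 1, -1],  # Code 2
    [1, 1, -1, -1],  # Code 3
    [1, -1, -1, 1],  # Code 4
]


def spread(bit, code):
    """Multiply one +/-1 bit by every chip of the user's code."""
    return [bit * chip for chip in code]


def despread(received, code):
    """Correlate the received chips with one code and take the sign."""
    corr = sum(r * chip for r, chip in zip(received, code))
    return 1 if corr >= 0 else -1


bits = [1, -1, -1, 1]
# All users transmit at the same time in the same band, so the chips add up.
spread_signals = [spread(b, c) for b, c in zip(bits, CODES)]
received = [sum(chips) for chips in zip(*spread_signals)]
detected = [despread(received, c) for c in CODES]
```

With orthogonal codes the correlation removes the other users completely. The uplink codes considered in the rest of the talk are not assumed to be orthogonal, which is where multiple access interference comes from.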
5 CDMA Wireless Uplink
[Figure: uplink block diagram. Mobile user (K users): users' data bits, source coder, channel coder, spreading, modulator, transmitter. Channel: noise. Base station receiver: demodulator, detector, decoder, channel estimator. Output: detected bits of K users.]
6 Challenges
- Multiple access interference
- High algorithmic complexity
- Real-time implementations
- Cost, speed and power consumption trade-off
7 Contributions of this Thesis
- Algorithmic complexity reduction
- differencing multistage detection
- fixed-point analysis
- DSP implementation
- real-time processing
- multi-step optimizations
- ASIC implementation
- application specific design
- hardware optimizations and scalable design
8 CDMA Uplink Signal Model
- Received signal
- K: number of users; N: number of bits
- received amplitude and delay of each user; n(t): AWGN
- d_k: transmitted bits (±1); s_k: spreading code
- Modeling assumptions
- direct sequence
- asynchronous multiuser, multi-bit
- AWGN
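For reference, a received-signal model consistent with the definitions above is commonly written as follows. The symbols A_k (received amplitude), \tau_k (delay), T (bit duration), and the bit index i are assumed notation, introduced here because the slide's own symbols for these quantities are not available.

```latex
r(t) = \sum_{i=1}^{N} \sum_{k=1}^{K} A_k \, d_k^{(i)} \, s_k(t - iT - \tau_k) + n(t)
```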
9 Code Matched Filter
- Signal matched filter - integrate and dump at T
- Cross-correlation
10 Matrix Notation
- Signal model after the matched filter
- where y: matched filter output; R: cross-correlation matrix (NK×NK)
- d: transmitted bits; A: amplitude matrix (NK×NK)
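A small numerical check of the matrix form, under simplifying assumptions: synchronous users, a single bit interval, and no noise, so that the matched filter output reduces to y = R A d with R of size K x K. The codes, amplitudes, bits, and helper names are made up for illustration.

```python
# Assumed example: K = 3 synchronous users, length-8 codes that are not
# all orthogonal, one bit interval, no noise.
CODES = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, 1],
    [1, 1, -1, -1, 1, 1, 1, -1],
]
AMPS = [1.0, 2.0, 0.5]  # received amplitudes (diagonal of A)
BITS = [1, -1, 1]       # transmitted bits d
K = len(CODES)
N_CHIPS = len(CODES[0])


def cross_correlation(codes):
    """Normalized cross-correlation matrix R of the spreading codes."""
    n = len(codes[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in codes] for ci in codes]


def matched_filter(received, codes):
    """Correlate the received chips with each user's code (integrate and dump)."""
    n = len(codes[0])
    return [sum(r * c for r, c in zip(received, code)) / n for code in codes]


# Chip-level received signal: sum over users of amplitude * bit * code chip.
received = [
    sum(AMPS[k] * BITS[k] * CODES[k][j] for k in range(K)) for j in range(N_CHIPS)
]

y = matched_filter(received, CODES)
R = cross_correlation(CODES)
y_model = [sum(R[k][j] * AMPS[j] * BITS[j] for j in range(K)) for k in range(K)]
```

Here `y` and `y_model` agree, and the off-diagonal entries of `R` are what carry the interference from the other users into each matched filter output.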
11 Multistage Detection [Varanasi-Aazhang '90]
- Multistage detection
- parallel interference cancellation scheme
- iterative method
- non-linear algorithm
- Algorithm
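The parallel interference cancellation idea can be sketched as below for the simplified case of synchronous users and one bit interval. Each stage subtracts the interference predicted from the previous stage's hard decisions and takes the sign again. The function name, the example matrix `R`, and the amplitudes are assumptions for illustration and are not the author's implementation.

```python
def sign(x):
    """Hard decision for BPSK."""
    return 1 if x >= 0 else -1


def multistage_detect(y, R, amps, stages):
    """Return the hard decisions of stage 0 (matched filter) and each later stage."""
    K = len(y)
    d = [sign(v) for v in y]  # stage 0: matched filter hard decisions
    history = [d]
    for _ in range(stages):
        # Cancel, in parallel for every user, the interference estimated
        # from the previous stage's decisions of all other users.
        d = [
            sign(y[k] - sum(R[k][j] * amps[j] * d[j] for j in range(K) if j != k))
            for k in range(K)
        ]
        history.append(d)
    return history


# Assumed near-far example: user 2 is four times stronger than users 1 and 3.
R = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.25], [0.25, 0.25, 1.0]]
amps = [1.0, 4.0, 1.0]
bits = [1, -1, 1]
y = [sum(R[k][j] * amps[j] * bits[j] for j in range(3)) for k in range(3)]

history = multistage_detect(y, R, amps, stages=2)
```

In this example the matched filter gets user 1 wrong because of the strong user 2, and the first cancellation stage corrects it.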
12 Differencing Multistage Detector
- Observations
- characteristic of the iterative method: convergence
- BPSK modulation
- Features
- number of computations decreases stage after stage
- near-far resistance
- repeat with the same structure
- no multiplication operations
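A minimal sketch of the differencing idea, again assuming synchronous users and one bit interval: after the first stage, only the bits that changed since the previous stage contribute to the update. Because each change is +2 or -2, the update adds or subtracts a doubled stored value. The function name, the `updates` counter, and the example data are illustrative assumptions.

```python
def sign(x):
    """Hard decision for BPSK."""
    return 1 if x >= 0 else -1


def differencing_detect(y, H_off, stages):
    """H_off[k][j] holds R[k][j] * amps[j] for j != k and 0 on the diagonal."""
    K = len(y)
    d_prev = [sign(v) for v in y]  # stage 0: matched filter decisions
    # Stage 1 is computed in full; the decisions are +/-1, so in hardware
    # these products are sign changes.
    z = [y[k] - sum(H_off[k][j] * d_prev[j] for j in range(K)) for k in range(K)]
    d = [sign(v) for v in z]
    history = [d_prev, d]
    updates = 0  # number of column updates actually performed
    for _ in range(stages - 1):
        x = [new - old for new, old in zip(d, d_prev)]  # entries in {-2, 0, +2}
        for j, xj in enumerate(x):
            if xj == 0:
                continue  # this bit has converged: no work
            updates += 1
            for k in range(K):
                two_h = H_off[k][j] + H_off[k][j]  # doubling, a shift in fixed point
                z[k] = z[k] - two_h if xj > 0 else z[k] + two_h
        d_prev, d = d, [sign(v) for v in z]
        history.append(d)
    return history, updates


# Same assumed near-far example: user 2 is four times stronger than the others.
R = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.25], [0.25, 0.25, 1.0]]
amps = [1.0, 4.0, 1.0]
bits = [1, -1, 1]
y = [sum(R[k][j] * amps[j] * bits[j] for j in range(3)) for k in range(3)]
H_off = [[0.0 if j == k else R[k][j] * amps[j] for j in range(3)] for k in range(3)]

history, updates = differencing_detect(y, H_off, stages=3)
```

Only one bit changes between stage 0 and stage 1 here, so stages 2 and 3 together need a single column update, and no further work is done once the decisions stop changing.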
13 Data Flow of Differencing Method
[Figure: data flow through Stage 1, Stage 2, and Differencing Stage 2]
hard decision: 1 1 -1 1 -1 -1 1 -1
updated hard decision: 1 -1 -1 1 1 -1 1 -1
differencing vector: 0 -2 0 0 2 0 0 0
updated hard decision (stage 2): 1 -1 -1 1 1 -1 1 -1
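The numbers on this slide can be checked directly. Taking the first vector as the hard decision and the second as the updated hard decision (an inference from the figure labels), their difference reproduces the differencing vector shown, and only its non-zero positions need further processing.

```python
# Vectors copied from the slide; which label belongs to which vector is inferred.
hard_decision = [1, 1, -1, 1, -1, -1, 1, -1]
updated_hard_decision = [1, -1, -1, 1, 1, -1, 1, -1]

# Differencing vector: new decision minus old decision, entry by entry.
differencing_vector = [
    new - old for old, new in zip(hard_decision, updated_hard_decision)
]

# Positions of the bits that changed between the two stages.
changed = [i for i, v in enumerate(differencing_vector) if v != 0]
```

Two of the eight bits changed, so the next stage only has to account for those two entries.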
14 Comparison of Algorithms
Conventional multistage detector
Differencing multistage detector
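In matrix form, the two recursions can be written as below. The notation is introduced here and is not taken from the slides: H = RA, \tilde{H} = H - \mathrm{diag}(H), \hat{d}^{(l)} is the hard decision at stage l, and z^{(l)} is the soft output from which it is taken.

```latex
\begin{aligned}
\text{Conventional:}\quad
  & \hat{d}^{(0)} = \operatorname{sign}(y), \qquad
    \hat{d}^{(l)} = \operatorname{sign}\!\left(y - \tilde{H}\,\hat{d}^{(l-1)}\right) \\
\text{Differencing:}\quad
  & z^{(1)} = y - \tilde{H}\,\hat{d}^{(0)}, \qquad
    x^{(l)} = \hat{d}^{(l)} - \hat{d}^{(l-1)}, \qquad
    z^{(l+1)} = z^{(l)} - \tilde{H}\,x^{(l)}, \qquad
    \hat{d}^{(l)} = \operatorname{sign}\!\left(z^{(l)}\right)
\end{aligned}
```

Since y - \tilde{H}\hat{d}^{(l)} = \left(y - \tilde{H}\hat{d}^{(l-1)}\right) - \tilde{H}\left(\hat{d}^{(l)} - \hat{d}^{(l-1)}\right), both recursions give the same decisions. The entries of x^{(l)} are 0 or ±2, and the zero entries need no computation.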
15 Flops Count
[Figure: number of flops (×10^4) vs. total number of iterations (1 to 8), conventional method and differencing method; K = 15 users, SNR = 6 dB]
2X speedup for a three-stage detector
16 Convergence Pattern
[Figure: percentage of NOT converged bits vs. number of stages and SNR (dB); two panels: 10 users, MAI = 10 dB, and 20 users, MAI = 10 dB]
Higher SNR: faster convergence. Fewer users: faster convergence.
17 Joint Synchronization and Detection
Performance of the joint synchronization and detection (MAI = 0 dB)
[Figure: bit error rate vs. Eb/N0 (0 to 6 dB), 10 users, MAI = 0 dB; curves: matched filter, joint syn. decorrelating, joint syn. multistage, single user bound]
18 Implementations
- Methods
- software implementation: DSP (TI TMS320C6201)
- hardware implementation: ASIC (MOSIS chip)
- Tools
- DSP C/Assembly benchmark tools
- Magic layout tool
- Goals
- effective multiuser interference cancellation
- real-time performance
- cost-effective design: fixed-point design
19 Dynamic Range Analysis (Matched Filter Output)
[Figure: two panels, bits vs. users and SNR, and bits vs. users and MAI; arrows mark increasing number of users and increasing SNR]
10 bits/user for 20 users, MAI = 12 dB, SNR = 6 dB
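One simple way to turn an observed dynamic range into a fixed-point word length is sketched below. The quantization step of 1/64, the sample values, and the function names are arbitrary assumptions. The 10-bit figure above comes from the author's own analysis, not from this sketch.

```python
def signed_bits_needed(max_abs_int):
    """Smallest two's-complement word length that can hold +/- max_abs_int."""
    return max(max_abs_int, 1).bit_length() + 1


def word_length(samples, step):
    """Word length needed when samples are quantized to integer multiples of step."""
    max_abs = max(abs(round(s / step)) for s in samples)
    return signed_bits_needed(max_abs)


# Hypothetical matched filter outputs and resolution, for illustration only.
samples = [1.375, -2.25, 0.25, -3.5]
bits = word_length(samples, step=1 / 64)
```

The largest magnitude here is 3.5, which is 224 steps of 1/64 and therefore needs 8 magnitude bits plus a sign bit. A larger number of users or a stronger interferer widens the range of the matched filter output and raises the required word length.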
20 Software Optimization Techniques
- Algorithmic level optimization
- properties of the cross-correlation matrix
- dot product representation
- using the known addresses of the non-zero elements
row oriented scanning
column update
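The two access patterns named above can be read as two loop orders for the same sparse matrix-vector product. The sketch below is one such reading, assuming the non-zero element addresses are stored per row and per column. The matrix, the address lists, and the function names are hypothetical.

```python
def row_scan(matrix, nonzero_cols, vec):
    """Row-oriented: one dot product per row over its stored non-zero columns."""
    return [
        sum(matrix[r][c] * vec[c] for c in cols)
        for r, cols in enumerate(nonzero_cols)
    ]


def column_update(matrix, nonzero_rows, vec):
    """Column-oriented: each non-zero vector entry updates only the rows it touches."""
    out = [0] * len(matrix)
    for c, v in enumerate(vec):
        if v == 0:
            continue  # zero entries of the vector cost nothing
        for r in nonzero_rows[c]:
            out[r] += matrix[r][c] * v
    return out


# Hypothetical banded, symmetric matrix with a zero diagonal.
M = [
    [0, 3, 0, 0],
    [3, 0, -1, 0],
    [0, -1, 0, 2],
    [0, 0, 2, 0],
]
nonzero_cols = [[1], [0, 2], [1, 3], [2]]  # non-zero column addresses per row
nonzero_rows = [[1], [0, 2], [1, 3], [2]]  # non-zero row addresses per column
vec = [0, -2, 0, 2]                        # e.g. a differencing vector

by_rows = row_scan(M, nonzero_cols, vec)
by_columns = column_update(M, nonzero_rows, vec)
```

Both orders give the same result. The column form skips work for every zero entry of the vector, which matters when most bits have already converged.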
21 Software Optimization Techniques (cont'd)
- Optimizations for C62
- in-line assembly code for dot product
- software pipelining
- intrinsic instructions
- approximately 2 MACs (multiply-accumulates) per cycle for 16-bit fixed-point or single-precision floating-point data types
- full usage of on-chip memory (14 users maximum)
- Result
- 12 asynchronous users, 150 kb/s/user
22 Detector Performance on the C6x
[Figure: maximum bit rate per user (kb/s) vs. number of users (8 to 14), conventional method and differencing method; SNR = 10 dB, window size = 12; marked point: 12 users, 150 kb/s]
Real-time capability by C62 DSP
23 Simulation Environment
- Hardware
- PC host
- TI's C62/C67 EVM
- Software
- TI's Code Composer
- DSP control, profiling and BER display
- Matlab
- Data collection, probability density display
- CDMA parameters (by MS-Visual Basic)
- Parameters setup
24 [Figure: screenshots of the parameters setup window, the Matlab probability density function (PDF) plots, and the Code Composer window]
25 ASIC Implementation
- MOSIS Tiny-Chip (40-pin DIP)
- 8 synchronous users
- 12-bit fixed-point implementation
- 6000 transistors
- 1.2 µm CMOS technology
- 190 kb/s for each user (at 12.5 MHz)
- 3-stage cascade delay < 15 µs
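As simple arithmetic on the figures above, and assuming the 12.5 MHz clock applies to both, the per-user bit rate and the cascade delay correspond to the following clock-cycle budgets:

```latex
\frac{12.5\ \text{MHz}}{190\ \text{kb/s}} \approx 65.8\ \text{clock cycles per bit},
\qquad
15\ \mu\text{s} \times 12.5\ \text{MHz} = 187.5\ \text{clock cycles}
```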
26 Advantages of ASICs
- Highly parallel instructions
- 4 RISC IPC (instructions per cycle)
- accumulating while shifting, loading and storing
- recoding while loading
- Application specific architecture
- faster I/O
- smaller on chip memory
- smaller ALU
27 Chip (Single Stage) Architecture
[Figure: single-stage chip architecture, distinguishing internal signals from external signals]
28 Chip Layout
[Figure: chip layout with a 2.0 mm dimension label; labeled blocks: soft decisions, recoding logic, cross-correlation, 12-bit ALU]
29 3-stage Cascade Mode
30 System Timing
[Figure: timing diagram with phases Load R, 1st Stage, 2nd Stage, 3rd Stage, and Final Output]
31 Scalable ASIC Design
Xilinx FPGA XC4000, 500k gates, 96 MHz
32 DSP-ASIC Implementation Summary
- TI's C54xx general-purpose DSP core ASIC
33 Conclusion
- Development of a differencing multistage detection algorithm
- 2X speedup in three stages
- Real-time implementation of multistage detector by DSPs
- multi-step optimizations
- 150 kb/s/user for 12 users
- Real-time implementation of multistage detector by ASICs
- application specific design
- 190 kb/s/user for 8 users
34 Future Work
- Joint synchronization and detection on DSPs
- Multiple DSPs for further speedup
- Reduce the complexity of the first stage
- VHDL descriptions of the multistage detector and FPGA/ASIC implementations
35 Other issues
36 Detector Performance
[Figure: bit error rate for a multistage detector]
37 Optimization Methods Comparison
[Figure: bar chart of achieved data rate (kb/s/user) by optimization method: w/o opt., global opt., software pipelining, assembly opt.; bar labels: 10.3, 20.9, 109.2, 156.2]
Final optimization result 7 units/cycle
38 Detector Architecture
[Figure: differencing multistage detector for K synchronous users]
39 Real Time Statistical Result
[Figure: PDF plots for MAI = 0 dB, SNR = 6 dB and for MAI = 12 dB, SNR = 6 dB]
40 Interference Cancellation