Survey of Digital Signal Processors

1
Survey of Digital Signal Processors
  • Michael Warner
  • ECD VLSI Communication Systems

2
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

3
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

4
Moore's Law Drives Processor Development
But what if energy-delay had to be reduced every
generation by an order of magnitude?
Doubling the number of transistors every 18-24 months at the
same price point drives significant product
opportunities, especially if you have little
regard for power
5
Gene's Law Drives DSP Development
Gene's Law will have its challenges to hold the
line!
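Gene's Law is usually attributed to TI's Gene Frantz and stated as DSP power dissipation per unit of performance halving roughly every 18 months. The slide does not give the rate, so the 18-month halving period, the mW/MMACS unit, and the 1.0 mW/MMACS starting point below are assumptions. A minimal Python sketch of that projection:

```python
def genes_law_projection(p0, months, halving_months=18.0):
    """Projected power per unit of performance (e.g. mW/MMACS) after `months`,
    assuming it halves every `halving_months` months (18 is an assumption)."""
    return p0 * 0.5 ** (months / halving_months)


# Hypothetical starting point: 1.0 mW/MMACS today, projected 0, 3 and 6 years out.
for years in (0, 3, 6):
    print(f"year {years}: {genes_law_projection(1.0, years * 12):.4f} mW/MMACS")
```

This prints 1.0000, 0.2500 and 0.0625 mW/MMACS, i.e. a 4x reduction every three years under these assumptions.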
6
What's Driving Gene's Law?
7
DSP Design Constraints
[Figure: device capabilities]
8
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

9
What Makes a DSP a DSP?
  • Single-Cycle MAC
  • Multiple Execution Units
  • High Bandwidth (Flat) Memory Sub-Systems
  • Efficient Zero-Overhead Looping
  • Short Pipeline
  • High Bandwidth I/O
  • Specialized Instruction Sets
  • Sophisticated DMA
  • Little to No Speculation
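Zero-overhead looping matters because a software loop spends cycles on the counter update and the branch in every iteration, while a hardware repeat (for example the C54x RPT/RPTB instructions) pays a small setup cost once. A back-of-the-envelope Python sketch; the 3-cycle per-iteration overhead, 2-cycle setup, and 64-tap filter are hypothetical numbers, not measured figures for any device:

```python
def loop_cycles(iterations, body_cycles, overhead_per_iter=0, setup_cycles=0):
    """Total cycles for a loop: one-time setup plus per-iteration body and overhead."""
    return setup_cycles + iterations * (body_cycles + overhead_per_iter)


def mac_utilization(taps, total_cycles):
    """Fraction of cycles in which the single MAC unit does useful work (1 MAC per tap)."""
    return taps / total_cycles


taps = 64  # hypothetical 64-tap FIR, one single-cycle MAC per tap
software = loop_cycles(taps, body_cycles=1, overhead_per_iter=3)  # decrement, compare, branch
hardware = loop_cycles(taps, body_cycles=1, setup_cycles=2)       # repeat-count setup only
print(software, hardware)  # 256 66
print(f"{mac_utilization(taps, software):.2f} {mac_utilization(taps, hardware):.2f}")  # 0.25 0.97
```

With a one-cycle loop body, per-iteration overhead dominates, which is why DSPs put loop control in hardware.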

10
Single Cycle MAC
  • MACs Typically Determine DSP Performance and the
    Execute-Stage (EX) Pipeline Length
  • Most DSPs Have 2-8 MAC Units
  • MACs Typically Operate in Both a Scalar and
    Vector Mode
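The FIR filter is the canonical MAC workload. A minimal Python sketch of Q15 fixed-point MACs, assuming a 40-bit saturating accumulator (as on the C54x; the slide does not specify a width). The second function shows one way two MAC units in vector mode can compute adjacent outputs from a shared coefficient stream; it is an illustration, not any particular device's datapath:

```python
Q15_MIN, Q15_MAX = -(1 << 15), (1 << 15) - 1
ACC_MIN, ACC_MAX = -(1 << 39), (1 << 39) - 1  # assumed 40-bit accumulator (8 guard bits)


def sat(value, lo, hi):
    """Clamp to [lo, hi], as DSP saturation logic does."""
    return max(lo, min(hi, value))


def mac(acc, a, b):
    """One multiply-accumulate: acc + a*b, saturated to the accumulator width."""
    return sat(acc + a * b, ACC_MIN, ACC_MAX)


def fir_q15(x, h):
    """Scalar mode: one MAC per tap for each output y[n], n = len(h)-1 .. len(x)-1."""
    y = []
    for n in range(len(h) - 1, len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            acc = mac(acc, coeff, x[n - k])
        y.append(sat(acc >> 15, Q15_MIN, Q15_MAX))  # Q30 sum back to Q15
    return y


def fir_q15_dual_mac(x, h):
    """Vector mode with two MAC units: outputs n and n+1 share each coefficient fetch."""
    y = []
    for n in range(len(h) - 1, len(x), 2):
        has_pair = n + 1 < len(x)
        acc0 = acc1 = 0
        for k, coeff in enumerate(h):
            acc0 = mac(acc0, coeff, x[n - k])          # MAC unit 0
            if has_pair:
                acc1 = mac(acc1, coeff, x[n + 1 - k])  # MAC unit 1
        y.append(sat(acc0 >> 15, Q15_MIN, Q15_MAX))
        if has_pair:
            y.append(sat(acc1 >> 15, Q15_MIN, Q15_MAX))
    return y


h = [16384, 16384]  # two taps of 0.5 in Q15
x = [1000, 2000, 3000]
print(fir_q15(x, h))  # [1500, 2500]
```

Both versions perform the same MACs; the dual-MAC form halves the number of coefficient reads per output, which is part of what makes the memory bandwidth on the following slides necessary.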

11
Multiple Instruction Units
  • VLIW Architectures Driving ILP
  • Typical Instruction Units
      • M-Unit - MAC
      • S-Unit - Shift
      • L-Unit - ALU
      • D-Unit - Load/Store
  • Industry Has Converged on an ILP of 8

[Figure: C6x data path with register files A0-A15 and B0-B15, functional units L1, S1, M1, D1 and L2, S2, M2, D2, cross paths 1X and 2X, and load-data buses DDATA_I1 and DDATA_I2]
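To make "ILP of 8" concrete, here is a toy Python list scheduler that packs operations into 8-wide execute packets using the unit names L1, S1, M1, D1, L2, S2, M2, D2 shown on this slide. It is a hypothetical sketch, not TI's compiler: it ignores latencies and cross-path limits and only enforces one op per unit and no same-packet dependences:

```python
UNITS = ("L1", "S1", "M1", "D1", "L2", "S2", "M2", "D2")


def pack(ops):
    """Greedy, in-order packing of ops into VLIW execute packets.

    ops: list of (name, kind, srcs, dst), kind in "LSMD".
    An op joins the current packet only if a unit of its kind is still free and
    it neither reads nor rewrites a register produced in that packet.
    Latencies are ignored; this is a toy, not a real compiler's scheduler.
    """
    packets, current, used, written = [], [], set(), set()
    for name, kind, srcs, dst in ops:
        free = [u for u in UNITS if u[0] == kind and u not in used]
        if not free or written & set(srcs) or dst in written:
            packets.append(current)  # close the packet
            current, used, written = [], set(), set()
            free = [u for u in UNITS if u[0] == kind]
        unit = free[0]
        used.add(unit)
        written.add(dst)
        current.append((unit, name))
    if current:
        packets.append(current)
    return packets


# One step of a dot product: two loads, a multiply, an accumulate.
ops = [
    ("LDH x", "D", ["px"], "x"),
    ("LDH h", "D", ["ph"], "h"),
    ("MPY", "M", ["x", "h"], "p"),
    ("ADD", "L", ["acc", "p"], "acc"),
]
for i, packet in enumerate(pack(ops), 1):
    print(i, packet)
```

A single dot-product step is a dependency chain, so it fills at most two of the eight slots per packet; reaching an ILP of 8 requires unrolling or software pipelining independent iterations.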
12
High Bandwidth Memory Sub-Systems
  • Multiple Load-Store Units Required to Feed Data
    Path
  • Tightly Coupled Memory is Typically Dual Ported
  • Harvard Architecture is Heavily Banked

[Figure: DSP core block diagram with PC, CNTL, ARs, buses P, C, D and E with multiplexers, internal and external memory, and a central arithmetic logic unit (MAC, ALU, shifter) with outputs A and B]
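Banking lets two load/store units access memory in the same cycle as long as they hit different banks. A minimal Python sketch assuming low-order interleaving, 8 single-ported banks of 16-bit words, and a one-cycle stall per conflict; all of these parameters, and the array addresses, are hypothetical:

```python
def bank_of(addr, n_banks=8, word_bytes=2):
    """Low-order interleaving: consecutive words map to consecutive banks."""
    return (addr // word_bytes) % n_banks


def conflicts(addr_a, addr_b, n_banks=8, word_bytes=2, dual_ported=False):
    """True if two same-cycle accesses (one per load/store unit) must serialize."""
    if dual_ported:
        return False  # assumed: each bank can serve two accesses per cycle
    return bank_of(addr_a, n_banks, word_bytes) == bank_of(addr_b, n_banks, word_bytes)


def stall_cycles(access_pairs, **bank_params):
    """One extra cycle for every same-cycle pair that hits the same single-ported bank."""
    return sum(1 for a, b in access_pairs if conflicts(a, b, **bank_params))


# Hypothetical FIR inner loop: x[] at 0x1000 and h[] at 0x2000, both 16-bit,
# fetched together each cycle. Both arrays start in bank 0, so every pair collides.
same_offset = [(0x1000 + 2 * k, 0x2000 + 2 * k) for k in range(16)]
skewed = [(0x1000 + 2 * k, 0x2002 + 2 * k) for k in range(16)]  # h[] moved one bank over
print(stall_cycles(same_offset), stall_cycles(skewed))  # 16 0
```

The same effect can be obtained in hardware with dual-ported memory (`dual_ported=True` above) or in software by placing the two operand streams in different banks.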
13
Specialized Instruction Sets
  • Base RISC ISA Plus CISC ISA Driven by End
    Application
      • MAC
      • SAD
      • LMS
      • FIRS
      • Viterbi
  • Support For Both Scalar and Vector Instructions
  • Support For 8, 16 and 32-Bit Instructions
  • Instructions are Highly Orthogonal
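As a reference for what several of these application-driven instructions compute, a minimal Python sketch of SAD, a symmetric FIR (the computation the FIRS instruction accelerates), and one LMS step. These are the underlying formulas only, not TI's exact instruction definitions, which also fix operand addressing, rounding, and saturation; Viterbi add-compare-select is omitted for brevity:

```python
def sad(block_a, block_b):
    """Sum of absolute differences, the motion-estimation kernel behind SAD."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def fir_symmetric(x, h_half, n):
    """y[n] for an even-length N-tap filter with h[k] == h[N-1-k] (FIRS-style).

    Only N/2 coefficients are stored; each pre-add x[n-k] + x[n-(N-1-k)]
    halves the number of multiplies. Requires n >= N-1.
    """
    taps = 2 * len(h_half)
    return sum(c * (x[n - k] + x[n - (taps - 1 - k)]) for k, c in enumerate(h_half))


def lms_step(w, x_window, d, mu):
    """One LMS iteration: filter output, error, and weight update.

    x_window holds x[n], x[n-1], ... aligned with the weights w.
    """
    y = sum(wi * xi for wi, xi in zip(w, x_window))
    e = d - y
    return y, e, [wi + mu * e * xi for wi, xi in zip(w, x_window)]


print(sad([1, 2, 3], [3, 2, 1]))                    # 4
print(fir_symmetric([1, 2, 3, 4], [1, 2], 3))       # 15, same as h = [1, 2, 2, 1]
print(lms_step([0.0, 0.0], [1.0, 2.0], 1.0, 0.5))   # (0.0, 1.0, [0.5, 1.0])
```

Each kernel is a tight loop of multiply/add/compare operations over streaming data, which is why folding it into one instruction pays off on a DSP.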

14
Scalar (55x) vs VLIW (64x)
  • Scalar DSPs Tend to be More CISC-Like
      • Hurts Compiler Performance
      • Improves Energy-Delay
      • Improves Code Density
      • Limits Top-End Performance
  • VLIW DSPs Tend to be More RISC-Like
      • General-Purpose Registers and an Orthogonal ISA
        Make For a Good C Compiler Target
      • Assembly Code Is Challenging to Write
      • RISC ISA Allows for Higher Frequencies
      • Load-Store Architecture Hurts Energy-Delay

15
TMS320C54x
16
TMS320C54x Protected Pipeline
  • Prefetch: Calculate address of instruction
  • Fetch: Collect instruction
  • Decode: Interpret instruction
  • Access: Collect address of operand
  • Read: Collect operand
  • Execute: Perform operation
[Figure: fully loaded pipeline over successive cycles, P1 through X6]
Note: Protected Pipeline Limits Micro-Architectural Flexibility and Performance
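The "fully loaded pipeline" figure can be reproduced with a short Python sketch that prints which stage each instruction occupies in each cycle, using the six C54x stages listed above. It assumes one instruction enters per cycle and no stalls; the hardware interlocks that make the pipeline "protected" are not modeled:

```python
STAGES = ("P", "F", "D", "A", "R", "X")  # Prefetch, Fetch, Decode, Access, Read, Execute


def pipeline_table(n_instr, n_cycles):
    """Row c lists the stage each instruction occupies in cycle c+1, e.g. 'R2'
    means instruction 2 is in Read; '--' means not in the pipeline.
    Assumes one instruction enters per cycle and no stalls."""
    table = []
    for c in range(n_cycles):
        row = []
        for i in range(n_instr):
            s = c - i  # instruction i entered Prefetch in cycle i
            row.append(f"{STAGES[s]}{i + 1}" if 0 <= s < len(STAGES) else "--")
        table.append(row)
    return table


for cycle, row in enumerate(pipeline_table(6, 11), 1):
    print(f"{cycle:2d}: " + " ".join(row))
```

Cycle 6 prints `X1 R2 A3 D4 F5 P6`: every stage is busy, which is the fully loaded state; instruction 6 reaches Execute (X6) in cycle 11.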
17
TMS320C6xx
[Figure: C6xx CPU core with program fetch, instruction dispatch, instruction decode, control registers, control logic, test, emulation, and interrupts; data path 1 (A register file; units L1, S1, M1, D1) and data path 2 (B register file; units L2, S2, M2, D2). Legend: L = arithmetic logic unit, S = auxiliary logic unit, M = multiplier unit]
18
TMS320C6xx Exposed Pipeline
  • Fetch
      • PG: Program Address Generate
      • PS: Program Address Send
      • PW: Program Access Ready Wait
      • PR: Program Fetch Packet Receive
  • Decode
      • DP: Instruction Dispatch
      • DC: Instruction Decode
  • Execute
      • E1-E5: Execute 1 through Execute 5

[Figure: execute packets 1 through 7 enter the pipeline on successive cycles, each passing through PG, PS, PW, PR, DP, DC, and E1-E5]
Note: Exposed Pipeline Adds Risk to the Programming
Model
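On an exposed pipeline the programmer or compiler must respect each instruction's delay slots; on the C62x, loads have four delay slots (data returns in E5) and 16x16 multiplies have one. The toy Python model below, not a cycle-accurate simulator, issues one instruction per cycle and contrasts an exposed pipeline, where a consumer issued too early silently reads the old value, with an interlocked (protected) one that stalls. Register names and memory contents are hypothetical:

```python
# Delay slots per opcode, using C62x-style values (loads 4, 16x16 multiply 1, add 0).
DELAY_SLOTS = {"LDW": 4, "MPY": 1, "ADD": 0}


def _commit(regs, pending, cycle):
    """Write back every result whose delay slots have elapsed by `cycle`."""
    due = sorted((p for p in pending if p[0] <= cycle), key=lambda p: p[0])
    for _, dst, value in due:
        regs[dst] = value
    pending[:] = [p for p in pending if p[0] > cycle]


def run(program, regs, memory, interlocked=False):
    """Issue one instruction per cycle; return (final registers, issue cycles).

    program: list of (op, dst, srcs). LDW reads memory[srcs[0]]; NOP has no dst.
    interlocked=False models an exposed pipeline; interlocked=True stalls issue
    until every source register's newest value is visible.
    """
    regs, pending, ready_at, cycle = dict(regs), [], {}, 0
    for op, dst, srcs in program:
        if interlocked:
            cycle = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        _commit(regs, pending, cycle)
        vals = [regs[s] for s in srcs]
        if op == "LDW":
            result = memory[vals[0]]
        elif op == "MPY":
            result = vals[0] * vals[1]
        elif op == "ADD":
            result = vals[0] + vals[1]
        else:  # NOP: occupies an issue cycle, produces nothing
            cycle += 1
            continue
        ready = cycle + 1 + DELAY_SLOTS[op]  # first issue cycle that sees the result
        pending.append((ready, dst, result))
        ready_at[dst] = ready
        cycle += 1
    _commit(regs, pending, float("inf"))  # drain the pipeline
    return regs, cycle


regs0, mem = {"A1": 100, "A2": 0, "A3": 3}, {100: 7}
load_then_add = [("LDW", "A2", ("A1",)), ("ADD", "A4", ("A2", "A3"))]
print(run(load_then_add, regs0, mem)[0]["A4"])                    # 3: stale A2 used
print(run(load_then_add, regs0, mem, interlocked=True)[0]["A4"])  # 10
padded = [load_then_add[0]] + [("NOP", None, ())] * 4 + [load_then_add[1]]
print(run(padded, regs0, mem)[0]["A4"])                           # 10
```

The exposed pipeline gives the wrong answer with no warning unless four NOPs (or useful independent work) fill the load's delay slots, which is the programming-model risk the slide refers to.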
19
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

20
Micro-Architectural Challenges
  • Accessing (Flat) On-Chip Memory At Speed Within
    2-3 Cycles
  • Feeding Multiple Functional Units From a Single
    Register File
  • Running at 600 MHz with a 7-9 Stage Pipeline
  • Linking Multiple Functional Units with Result
    Forwarding
  • Implementing a CISC Data-Path to Meet Area and
    Performance Goals
  • Achieving ARM-Like Code Density
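The register-file and forwarding challenges can be sized with a first-order count. Assuming each of N = 8 functional units reads R = 2 operands and writes W = 1 result per cycle, ignoring cross paths, and using the common approximation that register-file area grows with the square of the port count (a rough model, not a figure from this talk):

```latex
\begin{aligned}
P_{\text{unified}} &= N\,(R + W) = 8\,(2 + 1) = 24 \text{ ports} \\
P_{\text{split}} &= \tfrac{N}{2}\,(R + W) = 4\,(2 + 1) = 12 \text{ ports per file} \\
\frac{A_{\text{split}}}{A_{\text{unified}}} &\approx \frac{2 \cdot 12^2}{24^2} = \frac{288}{576} = \frac{1}{2} \\
B_{\text{bypass}} &= N \cdot 2N = 8 \cdot 16 = 128 \text{ forwarding paths per bypass stage}
\end{aligned}
```

Under this model, splitting the file roughly halves its area, which is consistent with split register files appearing in the "Do" list, while full result forwarding grows quadratically with the number of units.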

21
What Does and Doesn't Work?
  • Do
      • Banked Memory
      • Dual Access Memory
      • Full Custom Register Files
      • Split/Multiple Register Files
      • Custom/Semi-Custom Data-Paths
      • Variable Length Instructions
      • CISC ISA
      • Co-Processors
      • Multi-Core
  • Don't
      • Multi-Level Caches
      • Super-Scalar
      • VLIW Packet Descriptors
      • Speculative Branching
      • Full Synthesis
      • Dynamic Logic
  • Consider
      • Multi-Threading

22
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

23
DSP Systems
24
VoIP Platform
  • TNETV3010 Features
      • 6 C55x DSPs @ 300 MHz
      • Shared Instruction Memory
      • Broadcast DMA
      • 24 Mbits of On-Chip SRAM
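The slide does not describe how the TNETV3010's broadcast DMA works, so the Python sketch below only illustrates the general idea under assumed semantics: one descriptor, one read per source word, and one write per destination, for example to give each of the six C55x cores its own copy of a buffer. The descriptor fields, flat memory, and addresses are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class BroadcastDescriptor:
    """Hypothetical DMA descriptor: one source block, several destinations."""
    src: int
    dsts: list
    length: int


def broadcast_dma(memory, desc):
    """Copy memory[src:src+length] to every destination; return source reads.

    Each source word is read once and written once per destination, so the
    source side sees `length` reads instead of `length * len(dsts)`.
    """
    reads = 0
    for i in range(desc.length):
        word = memory[desc.src + i]
        reads += 1
        for base in desc.dsts:
            memory[base + i] = word
    return reads


memory = list(range(8)) + [0] * 56  # 64-word toy address space
desc = BroadcastDescriptor(src=0, dsts=[8, 16, 24, 32, 40, 48], length=8)  # one buffer per core
print(broadcast_dma(memory, desc), memory[48:56])  # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

Six separate unicast transfers would read the source 48 times; the broadcast reads it 8 times.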

25
DaVinci Platform
26
OMAP Platform
  • OMAP2420 Features
      • ARM1136 @ 330 MHz, VFP (Vector Floating Point),
        32K/32K I/D cache
      • DSP @ 220 MHz
      • 2D/3D graphics accelerator
      • IVA supports still images of more than 4 Mpixels
        and 30 fps VGA video decode
      • Output to TV for gaming and video playback
      • Encryption hardware for DRM and security

[Figure: OMAP2420 block diagram with ARM11 and VFP, TMS320C55x DSP, Imaging Video Accelerator (IVA), 2D/3D graphics accelerator, L3 and L4 interconnects, memory controller, internal SRAM, LCD interface and video out, camera interface, security, and peripherals]