Title: Using Multichannel DRAM Subsystems to Create Scalable Architecture for Video SOCs
2. Video SoCs Growing Fast in Complexity
- Video SoCs face growing complexity and need much more memory bandwidth
- More and more features
  - Advanced trick mode, 2D/3D GFX, security (DRM)
  - HD is now the standard resolution
- Latest and greatest algorithms
  - State-of-the-art video compression standards: H.264, VC-1, AVS
  - Image quality improvements: multi-scaling, noise reduction, alpha blending, multi-plane video composing
- Features and performance place a heavy burden on memory subsystems
- Increasing software burden requires more platform stability across architecture generations, product lines and product derivatives
3. Example of a Video SOC (current generation)
[Block diagram: basic software stack. Blocks: host CPU, transport demux, H.264 MP@L3 decoder, OSD, video back-end and peripherals, connected through interconnects to a single memory subsystem. Bandwidth labels: 1.5 GB/s, 2 GB/s]
4. Example of a Video SOC (next generation)
[Block diagram: dual HD stream decoding, full software stack. Blocks: host CPU, audio DSP, transport demux, two H.264 HiP@L4.1 decoders, 2D/3D GFX, display processing, video out and peripherals, connected through interconnects to memory subsystem 1 and memory subsystem 2. Bandwidth labels: 8 GB/s, 11 GB/s]
5. Concurrency in Video SoCs
- Video SoCs process lots of data in parallel, but the processing blocks still communicate
[Diagram: transport demux, graphic engines, audio decode, H.264 decode, video out]
6. Concurrency in Video SoCs
[Diagram: DRAM shared by transport demux, graphic engines, audio decode, H.264 decode, video out]
7. Concurrency in Video SoCs
[Diagram: DRAM shared by transport demux, audio decode, H.264 decode, video out]
8. DRAM Evolution: DRAM Burst Sizes
[Chart, 2003 to 2009. Series: DDR1 BL, DDR2 BL, DDR3 BL, DDR width (bytes), DRAM burst. Axes: DRAM words (BL) or DDR width (bytes), scaled 0 to 10; minimum DRAM burst (bytes), scaled 0 to 70. Generation markers: DDR, DDR2, DDR3. Annotated burst values: 8 bytes and 64 bytes]
- Optimal DDR3 burst size exceeds 32 bytes
- DDR3 transition reduces DRAM efficiency
9. Multichannel Optimizes DRAM Efficiency
From Single to Multichannel

                     DDR2   DDR3   DDR3
Channels             1      1      2
Data Width (Bytes)   4      4      2
Effective BW         100    84     100

Re-gain lost efficiency
Source: Customer (HDTV) System Dataflow
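The effect in the table can be illustrated with simple arithmetic: the minimum DRAM burst in bytes is the burst length in words times the data width in bytes. The sketch below is an illustration, not the source of the figures above. The burst lengths (2, 4 and 8 words for DDR, DDR2 and DDR3) and the 16-byte request size are assumptions, and the function names are hypothetical.

```python
# Minimum DRAM burst (bytes) = burst length (words) x data width (bytes per word).
# Assumed minimum burst lengths per generation; these are not given on the slide.
MIN_BURST_LENGTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}


def min_burst_bytes(generation: str, data_width_bytes: int) -> int:
    """Smallest amount of data one DRAM access can move."""
    return MIN_BURST_LENGTH[generation] * data_width_bytes


def burst_efficiency(request_bytes: int, generation: str, data_width_bytes: int) -> float:
    """Useful bytes divided by bytes actually transferred in whole bursts."""
    burst = min_burst_bytes(generation, data_width_bytes)
    bursts_needed = -(-request_bytes // burst)  # ceiling division
    return request_bytes / (bursts_needed * burst)


# The three configurations from the table: (generation, channels, width in bytes).
configs = [("DDR2", 1, 4), ("DDR3", 1, 4), ("DDR3", 2, 2)]
for generation, channels, width in configs:
    burst = min_burst_bytes(generation, width)
    eff = burst_efficiency(16, generation, width)  # hypothetical 16-byte request
    print(f"{generation} x{channels}, {width}-byte width: "
          f"min burst {burst} bytes, efficiency {eff:.2f}")
```

Under these assumptions, halving the channel width while doubling the channel count brings the DDR3 minimum burst back to the DDR2 value, which is the intuition behind regaining the lost efficiency.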
10. Multichannel Is Not Easy!
- Major issues
  - Load balancing
    - Must balance memory traffic evenly among channels
  - Maintaining throughput
    - Multiple channels cause throughput/ordering problems for pipelined memories
- → This means software and IP cores must manage multiple memory regions and be multi-channel-aware
[Diagram: address space in the application view (regions 1 to 3, separated by holes 1 and 2) mapped onto 2 channels with no interleave, 2 channels interleaved, and 4 channels interleaved]
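The two mappings in the diagram can be sketched as address decoders. This is a minimal illustration under assumed parameters: the 64-byte interleave granule, the channel size and both function names are hypothetical and do not come from the slides.

```python
def decode_regions(addr: int, channel_size: int) -> tuple[int, int]:
    """No interleave: each channel owns one contiguous address range."""
    return addr // channel_size, addr % channel_size


def decode_interleaved(addr: int, num_channels: int, granule: int) -> tuple[int, int]:
    """Interleaved: consecutive granules rotate across the channels."""
    block = addr // granule                 # index of the granule in the flat space
    channel = block % num_channels          # granules are dealt out round-robin
    local = (block // num_channels) * granule + addr % granule
    return channel, local


# Hypothetical setup: 256 MiB per channel, 64-byte granule.
CHANNEL_SIZE = 256 << 20
GRANULE = 64

for addr in (0, 64, 128, 192):
    print(hex(addr),
          "regions ->", decode_regions(addr, CHANNEL_SIZE),
          "| 2 ch ->", decode_interleaved(addr, 2, GRANULE),
          "| 4 ch ->", decode_interleaved(addr, 4, GRANULE))
```

Without interleaving, software picks the channel by picking the region. With interleaving, the channel follows from the low-order address bits, so a linear buffer spreads across all channels.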
11. Architecture Challenges
- Maximum memory efficiency and memory performance can be achieved with symmetric and balanced memory channels
- Asymmetric and/or unbalanced channels often lead to overdesign in order to achieve the performance requirements
- Slight architecture modifications require rebalancing of channels
  - Software, address map, product specification changes
- Developing new applications means load re-balancing
  - Time consuming and risky
- → A shared/balanced memory resource avoids overdesign
12. Seamless Multichannel Transition
[Diagram: application view and physical organization]
14. Automatic Load Balancing with High Efficiency
Well-balanced channels deliver high memory performance
Automatic load balancing achieved with Sonics IMT
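Why interleaving balances load without software effort can be shown by counting accesses per channel. The sketch below is a generic illustration and not a model of Sonics IMT. The 1 MiB frame buffer, the 64-byte granule, the 256 MiB channel size and all names are assumptions.

```python
from collections import Counter

GRANULE = 64               # hypothetical interleave granule in bytes
CHANNEL_SIZE = 256 << 20   # hypothetical 256 MiB per channel
NUM_CHANNELS = 2


def region_channel(addr: int) -> int:
    """No interleave: the channel is fixed by the region the address falls in."""
    return addr // CHANNEL_SIZE


def interleaved_channel(addr: int) -> int:
    """Interleaved: consecutive granules alternate between channels."""
    return (addr // GRANULE) % NUM_CHANNELS


def channel_load(addresses, to_channel) -> list[int]:
    """Count how many accesses land on each channel."""
    counts = Counter(to_channel(a) for a in addresses)
    return [counts.get(ch, 0) for ch in range(NUM_CHANNELS)]


# One client streaming through a 1 MiB frame buffer at the start of memory.
frame_buffer = range(0, 1 << 20, GRANULE)

region_load = channel_load(frame_buffer, region_channel)
interleaved_load = channel_load(frame_buffer, interleaved_channel)

print("no interleave:", region_load)
print("interleaved:  ", interleaved_load)
```

In the region-based case the whole stream lands on one channel unless software places buffers carefully. In the interleaved case the same stream splits evenly.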
17. 2D Bursts, Address Tiling and Multichannel
- Two-dimensional block bursts
  - 2D transaction using a single read/write command
  - Popular for HD video and graphics
- Address tiling
  - Rearrange DRAM address organization to exploit 2D locality
  - Avoids page misses
- Channels divide buffer into columns
- SonicsSX splits 2D bursts that cross channel edges
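A minimal sketch of the two ideas on this slide: a tiled address calculation, and splitting a 2D burst where it crosses a channel column. It is not the SonicsSX implementation. The tile size (64 x 32 bytes), the 64-byte column width and the function names are assumptions.

```python
def tiled_address(x: int, y: int, tile_w: int, tile_h: int, tiles_per_row: int) -> int:
    """Byte address of pixel byte (x, y) when the buffer is stored tile by tile."""
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    offset_in_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + offset_in_tile


def split_2d_burst(x: int, y: int, width: int, height: int,
                   column_width: int, num_channels: int) -> list[tuple[int, int, int, int, int]]:
    """Cut a 2D burst at channel column edges.

    Returns (channel, x, y, width, height) pieces, each inside one column.
    """
    pieces = []
    end = x + width
    while x < end:
        column = x // column_width
        next_edge = (column + 1) * column_width
        piece_width = min(end, next_edge) - x
        pieces.append((column % num_channels, x, y, piece_width, height))
        x += piece_width
    return pieces


# Hypothetical 1920-byte-wide buffer with 64 x 32-byte tiles: 30 tiles per row.
print(tiled_address(0, 1, 64, 32, 30))        # vertical neighbour stays in the same tile
print(split_2d_burst(48, 10, 32, 8, 64, 2))   # burst straddling a column edge
```

With tiling, the byte directly below (0, 0) sits 64 bytes away instead of a full line pitch away, so a small 2D block tends to stay within one DRAM page. A burst that straddles a column edge becomes one piece per channel.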
18. Multichannel Interleaving
- Interleaving support requires splitting traffic and delivery to the proper channel
- Option 1: Splitting in memory scheduler/controller
  - Creates performance bottleneck
  - Hard to scale past two channels
- Option 2: Splitting in the interconnect (Sonics IMT approach)
  - Fully-distributed architecture enables scalability
  - Network overlaps channel accesses to maximize throughput
  - Optimized protocols eliminate reorder buffer area
  - Isolating channels from IP cores makes it transparent to software and other hardware
- → High Performance, Area Optimized and Scalable
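The splitting step named in the first bullet can be sketched for a linear burst: cut the request at every interleave boundary and tag each chunk with its channel. This only illustrates the idea. It does not model where the split happens (controller or interconnect), nor the Sonics protocols. The 64-byte granule and the function name are assumptions.

```python
def split_linear_burst(addr: int, length: int,
                       num_channels: int, granule: int) -> list[tuple[int, int, int]]:
    """Cut a burst at interleave boundaries.

    Returns (channel, start_address, size) chunks in address order.
    """
    chunks = []
    end = addr + length
    while addr < end:
        next_boundary = (addr // granule + 1) * granule
        size = min(end, next_boundary) - addr
        channel = (addr // granule) % num_channels
        chunks.append((channel, addr, size))
        addr += size
    return chunks


# A 128-byte burst starting mid-granule, two channels, 64-byte granule.
for channel, start, size in split_linear_burst(32, 128, 2, 64):
    print(f"channel {channel}: {size} bytes at {start:#x}")
```

Chunks aimed at different channels are independent, so they can be issued back to back and serviced in parallel. Responses still have to return to the initiator in request order.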
19. Ideal Solution to Memory Problem for Video SoCs
- Must be built on an architecture that provides predictability, guarantees QoS, leverages multithreading and exploits concurrency
- Automatic load balancing and channel management to provide scalable memory performance
  - This approach works for any number of channels of DRAM
- Solution should be transparent to hardware and software
  - Decoupling of IP cores and software from the memory subsystem configuration
- → SonicsSX w/IMT and MemMax memory scheduler
20. Thank you!
alex@sonicsinc.com