Using Multichannel DRAM Subsystems to Create Scalable Architecture for Video SOCs


Title: Using Multichannel DRAM Subsystems to Create Scalable Architecture for Video SOCs


1
Using Multichannel DRAM Subsystems to Create
Scalable Architecture for Video SOCs
  • Alex Chao
  • March 18, 2009

2
Video SoCs Growing Fast in Complexity
  • Video SoCs face growing complexity and need much
    more memory bandwidth
  • More and more features
  • Advanced trick mode, 2D/3D GFX, security (DRM)
  • HD is now the standard resolution
  • Latest and greatest algorithms
  • State-of-the-art video compression standards:
    H.264, VC-1, AVS
  • Image quality improvements: multi-scaling, noise
    reduction, alpha blending, multi-plane video
    compositing
  • Features and performance place a heavy burden on
    memory subsystems
  • Increasing software burden requires more platform
    stability across architecture generations,
    product lines and product derivatives

3
Example of a Video SOC (current generation)
[Block diagram. Blocks: basic software stack, host
CPU, OSD, transport demux, video back-end, H.264
MP@L3 decoder, interconnect, peripherals, memory
subsystem. Bandwidth labels: 1.5 GB/s, 2 GB/s]
4
Example of a Video SOC (next generation)
Dual HD stream decoding
[Block diagram. Blocks: full software stack, host
CPU, two H.264 HiP@L4.1 decoders, 2D/3D GFX,
transport demux, display processing, audio DSP,
video out, interconnect, peripherals, memory
subsystem 1, memory subsystem 2. Bandwidth labels:
8 GB/s, 11 GB/s]
5
Concurrency in Video SoCs
  • Video SoCs process lots of data in parallel, but
    communicate

[Diagram blocks: transport demux, graphic engines,
audio decode, H.264 decode, video out]
6
Concurrency in Video SoCs
[Diagram blocks: DRAM, transport demux, graphic
engines, audio decode, H.264 decode, video out]
7
Concurrency in Video SoCs
[Diagram blocks: DRAM, transport demux, audio
decode, H.264 decode, video out]
8
DRAM Evolution: DRAM Burst Sizes
[Chart, 2003 to 2009. Left axis: DRAM Words (BL) or
DDR Width (Bytes), 0 to 10. Right axis: Minimum DRAM
Burst (Bytes), 0 to 70. Series: DDR1 BL, DDR2 BL,
DDR3 BL, DDR Width (Bytes), DRAM Burst. Generations
marked: DDR, DDR2, DDR3. Callouts: 8 Bytes, 64 Bytes]
Optimal DDR3 burst size exceeds 32 bytes
DDR3 Transition Reduces DRAM Efficiency
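
The burst-size trend on this slide can be sketched numerically. The minimum DRAM burst in bytes is the burst length (in data words) multiplied by the interface width. The burst lengths below (2, 4 and 8 for DDR, DDR2 and DDR3) are the commonly cited minimums and are an assumption here, since the chart values did not survive in the transcript; the function name is hypothetical.

```python
# Assumed minimum burst length (in data words) per DRAM generation.
MIN_BURST_LENGTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}


def min_burst_bytes(generation: str, width_bytes: int) -> int:
    """Smallest transfer one DRAM access can make: burst length x bus width."""
    return MIN_BURST_LENGTH[generation] * width_bytes


if __name__ == "__main__":
    # A 4-byte (32-bit) interface, as an illustrative width.
    for gen in MIN_BURST_LENGTH:
        print(gen, min_burst_bytes(gen, width_bytes=4), "bytes")
```

With a 4-byte interface, moving from DDR2 to DDR3 doubles the smallest possible access, which is the efficiency concern the slide raises.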
9
Multichannel Optimizes DRAM Efficiency
From Single to Multichannel
                      DDR2   DDR3   DDR3
Channels                 1      1      2
Data Width (Bytes)       4      4      2
Effective BW           100     84    100
Regain lost efficiency
Source: Customer (HDTV) System Dataflow
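
A rough way to see why narrower channels regain efficiency: a request smaller than the minimum burst still occupies a full burst on the DRAM bus. The sketch below is illustrative only. It assumes burst lengths of 4 (DDR2) and 8 (DDR3) and a hypothetical 16-byte request; it does not reproduce the 84 figure, which comes from a customer dataflow with its own mix of request sizes.

```python
import math


def transfer_efficiency(request_bytes: int, burst_bytes: int) -> float:
    """Useful bytes divided by bytes actually moved on the DRAM bus."""
    bursts = math.ceil(request_bytes / burst_bytes)
    return request_bytes / (bursts * burst_bytes)


# (label, assumed burst length, data width in bytes per channel)
CONFIGS = [
    ("DDR2, 1 channel", 4, 4),
    ("DDR3, 1 channel", 8, 4),
    ("DDR3, 2 channels", 8, 2),
]

if __name__ == "__main__":
    request = 16  # hypothetical request size in bytes
    for label, burst_length, width in CONFIGS:
        burst = burst_length * width
        print(label, burst, "byte burst,", transfer_efficiency(request, burst))
```

Under these assumptions the single-channel DDR3 case wastes half of each burst, while two 2-byte channels bring the per-channel burst back to the DDR2 size.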
10
Multichannel Is Not Easy!
  • Major issues
      • Load balancing
          • Must balance memory traffic evenly among
            channels
      • Maintaining throughput
          • Multiple channels cause throughput/ordering
            problems for pipelined memories
  → This means software and IP cores must manage
    multiple memory regions and be multichannel-aware

[Diagram: application view of the address space
(Regions 1, 2 and 3, Holes 1 and 2) shown against
channel assignment in three configurations: 2
channels with no interleave, 2 channels interleaved,
4 channels interleaved]
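
The difference between the non-interleaved and interleaved layouts in the diagram can be sketched as two address-mapping functions. This is a minimal illustration under assumed parameters: the function names, the stripe size and the per-channel size are hypothetical, and holes in the address map are ignored.

```python
def map_no_interleave(addr: int, num_channels: int, channel_size: int):
    """Each channel owns one contiguous slice of the address space."""
    channel = addr // channel_size
    if channel >= num_channels:
        raise ValueError("address outside populated memory")
    return channel, addr % channel_size


def map_interleaved(addr: int, num_channels: int, stripe_bytes: int):
    """Consecutive stripes rotate round-robin across the channels."""
    stripe = addr // stripe_bytes
    channel = stripe % num_channels
    local = (stripe // num_channels) * stripe_bytes + addr % stripe_bytes
    return channel, local


if __name__ == "__main__":
    for addr in (0, 64, 128, 192, 256):
        print(addr, map_interleaved(addr, num_channels=2, stripe_bytes=64))
```

Without interleaving, software has to place buffers in the right region to spread load; with interleaving, any sufficiently large buffer touches every channel.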
11
Architecture Challenges
  • Maximum memory efficiency and memory performance
    can be achieved with symmetric and balanced
    memory channels
  • Asymmetric and/or unbalanced channels often lead
    to overdesign in order to meet the performance
    requirements
  • Slight architecture modifications require
    rebalancing of channels
      • Software, address map, product specification
        changes
  • Developing new applications means load
    rebalancing
      • Time consuming and risky
  → A shared/balanced memory resource avoids
    overdesign

12
Seamless Multichannel Transition
[Diagram panels: Application View, Physical
Organization]
13
Seamless Multichannel Transition
[Diagram panels: Application View, Physical
Organization]
14
Automatic Load Balancing with High Efficiency
Well-Balanced Channels Deliver High Memory
Performance
Automatic load balancing achieved with Sonics IMT
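
To illustrate what "well balanced" means, the sketch below counts bytes per channel for one hypothetical workload under a region-based mapping and under a simple fixed-stripe interleave. It is not a model of Sonics IMT; the channel count, sizes and workload are all assumptions chosen for illustration.

```python
from collections import Counter

CHANNELS = 2
CHANNEL_SIZE = 1 << 20  # hypothetical: 1 MiB per channel
STRIPE = 64             # hypothetical interleave granularity in bytes


def region_map(addr: int) -> int:
    """No interleave: the channel is chosen by the high address bits."""
    return addr // CHANNEL_SIZE


def interleaved_map(addr: int) -> int:
    """Interleave: the channel rotates every STRIPE bytes."""
    return (addr // STRIPE) % CHANNELS


def per_channel_bytes(accesses, channel_of) -> Counter:
    """Sum the bytes each channel serves for (address, size) accesses."""
    totals = Counter()
    for addr, size in accesses:
        totals[channel_of(addr)] += size
    return totals


# Hypothetical workload: 64-byte reads scanning a 256 KiB buffer.
workload = [(addr, 64) for addr in range(0, 256 * 1024, 64)]

if __name__ == "__main__":
    print("region-based:", dict(per_channel_bytes(workload, region_map)))
    print("interleaved: ", dict(per_channel_bytes(workload, interleaved_map)))
```

In the region-based case one channel carries the whole scan while the other sits idle; the interleaved case splits the same traffic evenly without the workload changing.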
15
2D Bursts, Address Tiling & Multichannel
  • Two-dimensional block bursts
  • 2D transaction using a single read/write command
  • Popular for HD video and graphics
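
A 2D block burst can be thought of as a compact description of several row accesses. The sketch below expands one such command into its per-row segments; the parameter names (base, width, height, stride) are assumptions for illustration and not taken from any specific protocol.

```python
def expand_2d_burst(base: int, width_bytes: int, height: int, stride: int):
    """Return the (address, length) row segments covered by one 2D burst."""
    return [(base + row * stride, width_bytes) for row in range(height)]


if __name__ == "__main__":
    # Hypothetical 16-byte by 3-row block in a buffer with a 1920-byte stride.
    for addr, length in expand_2d_burst(0x1000, 16, 3, 1920):
        print(hex(addr), length)
```

Issuing the block as one command lets the initiator describe a rectangle without sending a separate request per row.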

16
2D Bursts, Address Tiling & Multichannel
  • Two-dimensional block bursts
  • 2D transaction using a single read/write command
  • Popular for HD video and graphics
  • Address tiling
  • Rearrange DRAM address organization to exploit 2D
    locality
  • Avoids page misses
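
The page-miss benefit of address tiling can be sketched by counting how many DRAM pages a small pixel block touches under a linear layout versus a tiled one. The page size, stride and tile shape below are hypothetical values chosen so that one tile fills exactly one page.

```python
PAGE_BYTES = 2048        # hypothetical DRAM page size
STRIDE = 2048            # hypothetical bytes per buffer row
TILE_W, TILE_H = 64, 32  # hypothetical tile: 64 * 32 = one page


def linear_addr(x: int, y: int) -> int:
    """Raster layout: rows are stored one after another."""
    return y * STRIDE + x


def tiled_addr(x: int, y: int) -> int:
    """Tiled layout: each TILE_W x TILE_H block is stored contiguously."""
    tiles_per_row = STRIDE // TILE_W
    tile = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    return tile * (TILE_W * TILE_H) + (y % TILE_H) * TILE_W + (x % TILE_W)


def pages_touched(addr_fn, x0: int, y0: int, w: int, h: int) -> int:
    """Count distinct DRAM pages hit by a w x h block at (x0, y0)."""
    return len({addr_fn(x, y) // PAGE_BYTES
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w)})


if __name__ == "__main__":
    print("linear:", pages_touched(linear_addr, 0, 0, 16, 16))
    print("tiled: ", pages_touched(tiled_addr, 0, 0, 16, 16))
```

With these numbers every row of the block lands in a different page under the linear layout, while the tiled layout keeps the whole block inside one page.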

17
2D Bursts, Address Tiling & Multichannel
  • Two-dimensional block bursts
  • 2D transaction using a single read/write command
  • Popular for HD video and graphics
  • Address tiling
  • Rearrange DRAM address organization to exploit 2D
    locality
  • Avoids page misses
  • Channels divide buffer into columns
  • SonicsSX splits 2D bursts that cross channel edges
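
Splitting a 2D burst at channel edges can be sketched as follows. This is an illustration of the idea only, not SonicsSX behavior: the column width, the round-robin assignment of columns to channels and the tuple layout are all assumptions.

```python
def split_2d_burst(x0: int, y0: int, width: int, height: int,
                   column_width: int, num_channels: int):
    """Split a 2D burst wherever it crosses a channel column edge.

    Returns a list of (channel, x, y, width, height) sub-bursts.
    """
    pieces = []
    x, end = x0, x0 + width
    while x < end:
        column = x // column_width
        next_edge = (column + 1) * column_width
        piece_width = min(end, next_edge) - x
        pieces.append((column % num_channels, x, y0, piece_width, height))
        x += piece_width
    return pieces


if __name__ == "__main__":
    # Hypothetical 16 x 4 block straddling the edge at x = 128.
    print(split_2d_burst(120, 10, 16, 4, column_width=128, num_channels=2))
```

A burst that stays inside one column passes through unchanged, so only edge-crossing requests pay the cost of being split.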

18
Multichannel Interleaving
  • Interleaving support requires splitting traffic
    and delivering it to the proper channel
  • Option 1: Splitting in the memory
    scheduler/controller
      • Creates a performance bottleneck
      • Hard to scale past two channels
  • Option 2: Splitting in the interconnect (Sonics
    IMT approach)
      • Fully distributed architecture enables
        scalability
      • Network overlaps channel accesses to maximize
        throughput
      • Optimized protocols eliminate reorder buffer
        area
      • Isolating channels from IP cores makes it
        transparent to software and other hardware
  → High performance, area optimized and scalable
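
The traffic splitting that interleaving requires can be sketched for a linear burst. The code shows only the address arithmetic, under an assumed stripe size and channel count; it says nothing about where the split happens (scheduler or interconnect) or how responses are kept in order.

```python
def split_burst(addr: int, length: int, stripe_bytes: int, num_channels: int):
    """Split [addr, addr + length) at stripe boundaries.

    Returns a list of (channel, start_address, length) pieces.
    """
    pieces = []
    end = addr + length
    while addr < end:
        stripe = addr // stripe_bytes
        piece_len = min(end, (stripe + 1) * stripe_bytes) - addr
        pieces.append((stripe % num_channels, addr, piece_len))
        addr += piece_len
    return pieces


if __name__ == "__main__":
    # Hypothetical 64-byte burst starting 16 bytes before a stripe edge.
    print(split_burst(48, 64, stripe_bytes=64, num_channels=2))
```

Because the pieces target different channels, they can be serviced in parallel, which is where the throughput gain and the ordering problem both come from.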

19
Ideal Solution to Memory Problem for Video SoCs
  • Must be built on an architecture that provides
    predictability, guarantees QoS, leverages
    multithreading and exploits concurrency
  • Automatic load balancing and channel management
    to provide scalable memory performance
  • This approach works for any number of DRAM
    channels
  • Solution should be transparent to hardware and
    software
  • Decoupling of IP cores and software from the
    memory subsystem configuration
  → SonicsSX w/ IMT + MemMax memory scheduler

20
Thank you! alex_at_sonicsinc.com