Motivation - PowerPoint PPT Presentation

ALP: Energy Efficient Support for All Levels of Parallelism for Complex Media Applications
Ruchira Sasanka, Manlap Li, Sarita Adve (Univ. of Illinois), Yen-Kuang Chen, Eric Debes (Intel)
Motivation
  • Challenges of Complex Media Apps
    • Real-time performance
    • Energy efficiency
    • Programmability
  • Nature of DLP in Complex Media Apps
    • DLP interspersed with control
    • Exhibits different forms of DLP
      - sub-word, vectors, streams
  • Existing Vector/Stream Processors
    • Targeted at large amounts of DLP
    • Not ideal for code with control
    • New programming paradigms
    • Cost of new ISA, vector registers, bandwidth (BW)
    • Forward/backward compatibility
  • Opportunities
    • Lots of parallelism (DLP/TLP/ILP)
    • Existing support on general-purpose processors
      - ILP/TLP: CMP/SMT processors
      - DLP: SIMD (e.g., MMX, AltiVec)
    • Already multi-core (and SIMD multi-lane)

Results
[Figure: results for MPGenc, MPGdec, RayTrace, SpeechRec, FaceRec]
ALP (All Levels of Parallelism)
  • ALP
    • Based on CMP/SMT processors with SIMD
    • Uses indexed vectors (vectors of SIMD records)
    • Only a handful of new instructions
      - only vector loads use vector instructions
    • Vector data stored in the L1 cache
    • Supports both vectors and streams
    • Familiar SIMD programming and exception model
  • Indexed Vectors
    • Indexed Vector Registers (IVRs), e.g., V0, V1
    • Each IVR has a Current Record Pointer (CRP)
    • An instruction can access only the current record
    • CRPs are auto-incremented on use
    • Computation uses ordinary SIMD instructions/registers
    • CRPs allow scalar processing on vector data
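The indexed-vector rules above (access only the current record, CRP auto-incremented on use) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not ALP's hardware or ISA: the class `IndexedVectorRegister`, its methods `use`/`put`, the helper `simd_add`, and the 4-sub-word record width are all hypothetical names and choices made for illustration.

```python
from typing import List

Record = List[int]  # one SIMD record; 4 packed sub-words is an assumed width


class IndexedVectorRegister:
    """Toy model of an indexed vector register (IVR) such as V0 or V1.

    The vector is a list of SIMD records. A current record pointer (CRP)
    selects the only record an instruction may touch, and every access
    auto-increments the CRP.
    """

    def __init__(self, records: List[Record]):
        self.records = [list(r) for r in records]
        self.crp = 0

    def use(self) -> Record:
        """Read the current record, then advance the CRP (auto-increment on use)."""
        record = self.records[self.crp]
        self.crp += 1
        return record

    def put(self, record: Record) -> None:
        """Write the current record, then advance the CRP."""
        self.records[self.crp] = list(record)
        self.crp += 1


def simd_add(a: Record, b: Record) -> Record:
    # Lane-wise add, standing in for an ordinary SIMD instruction.
    return [x + y for x, y in zip(a, b)]


v0 = IndexedVectorRegister([[1, 2, 3, 4], [5, 6, 7, 8]])
v1 = IndexedVectorRegister([[10, 20, 30, 40], [50, 60, 70, 80]])
v3 = IndexedVectorRegister([[0] * 4, [0] * 4])

for _ in range(len(v0.records)):
    # Each step sees only the current record of each IVR; the CRPs then advance.
    v3.put(simd_add(v0.use(), v1.use()))

print(v3.records)  # [[11, 22, 33, 44], [55, 66, 77, 88]]
```

Because each step hands back one plain record, scalar code could also inspect or branch on it between SIMD operations, which is the sense in which CRPs allow scalar processing on vector data.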

[Figure: results for MPGenc, MPGdec, RayTrace, SpeechRec, FaceRec]
Legend: 1T, 4T, and 4x2T denote 1 thread, 4 threads (CMP), and 8 threads (CMP/SMT); suffix S means with SIMD and SV means with indexed vectors (ALP is 4x2TSV).
ALP over 1T: energy savings of 1.5X-15X, EDP (energy-delay product) savings of 7.3X-873X, and speedups of 5X-58X.
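Since EDP is energy multiplied by delay, the EDP savings for one application should equal its energy savings times its speedup. A quick consistency check, assuming (the poster does not say so) that the low ends of the three ranges come from the same application, and likewise the high ends:

```latex
\frac{\mathrm{EDP}_{1T}}{\mathrm{EDP}_{ALP}}
  = \frac{E_{1T}\,D_{1T}}{E_{ALP}\,D_{ALP}}
  = \underbrace{\frac{E_{1T}}{E_{ALP}}}_{\text{energy savings}}
    \times
    \underbrace{\frac{D_{1T}}{D_{ALP}}}_{\text{speedup}},
\qquad
1.5 \times 5 = 7.5 \approx 7.3,
\qquad
15 \times 58 = 870 \approx 873
```

Both products land close to the reported EDP bounds; the small gaps are presumably rounding in the per-application numbers.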
[Figure: indexed vector register V0 (IVR) holding Record 0 through Record N; each record is built from packed words (Packed Word 0, Packed Word 1, contiguous in memory) divided into sub-words, and the CRP for V0 currently points to Record 1.]
  • Benefits Over SIMD
    • Reduced load/store and overhead instructions
    • Increased exposed parallelism
    • Load latency tolerance and efficient use of L1
    • Energy-efficient IVR access (cf. cache accesses)
Programming Example: V2 = k * (V1 + V2) - 16 (k held in reg1)

Conventional Vector Code
  (A) VLD addr, stride, length → V0
  (B) VLD addr, stride, length → V1
  (C) VADD V0, V1 → V3
  (D) VMUL V3, reg1 → V4
  (E) VSUB V4, 16 → V2
  (F) VSTORE addr, stride, length, V2

Indexed Vector Code
  (1) VLD addr, stride, length → V0
  (2) VLD addr, stride, length → V1
  (3) VALLOCst addr, stride, length → V2
  do for all records in vector:
    (4) simd_add V0, V1 → simd_r0
    (5) simd_mul simd_r0, simd_r1 → simd_r2
    (6) simd_sub simd_r2, 16 → V2
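To check that the two listings compute the same thing, here is a small Python emulation of their semantics (not their timing or hardware behavior). Assumptions, none of which come from the poster: k = 3, 4 sub-words per record, and `simd_r1` holding k broadcast to every lane (the listing does not show how `simd_r1` is set). The function names `conventional_vector` and `indexed_vector` are hypothetical.

```python
K = 3             # hypothetical value of k (reg1 / simd_r1)
RECORD_WIDTH = 4  # assumed number of sub-words per SIMD record

# Source arrays already split into SIMD records, i.e. what the VLDs bring in.
a = [[1, 2, 3, 4], [5, 6, 7, 8]]          # array V1, loaded into register V0
b = [[10, 20, 30, 40], [50, 60, 70, 80]]  # array V2, loaded into register V1


def conventional_vector(a, b, k):
    # (C)-(E): each vector instruction sweeps the whole vector before the
    # next one runs, so full-length temporaries V3 and V4 are materialized.
    v3 = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    v4 = [[x * k for x in r] for r in v3]
    return [[x - 16 for x in r] for r in v4]  # V2, stored by (F)


def indexed_vector(a, b, k):
    # (1)-(3) set up V0, V1 and the output vector V2; the loop then works on
    # one current record at a time using ordinary SIMD registers.
    v2 = [None] * len(a)
    simd_r1 = [k] * RECORD_WIDTH               # k broadcast to all lanes
    for crp in range(len(a)):                  # "do for all records in vector"
        simd_r0 = [x + y for x, y in zip(a[crp], b[crp])]    # (4) simd_add
        simd_r2 = [x * y for x, y in zip(simd_r0, simd_r1)]  # (5) simd_mul
        v2[crp] = [x - 16 for x in simd_r2]                   # (6) simd_sub -> V2
    return v2


print(indexed_vector(a, b, K))  # [[17, 50, 83, 116], [149, 182, 215, 248]]
```

The explicit `crp` index stands in for the hardware CRPs, which in ALP advance automatically on each use. Note how the indexed version keeps intermediates in SIMD registers (`simd_r0`, `simd_r2`) instead of whole-vector temporaries like V3 and V4.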
  • Benefits/Drawbacks Over Vectors
    • Few new instructions
    • Easily handles control-intensive code without masks
    • Supports streams and while loops
    • Flexible scheduling and scalar exception model
    • Can be scaled back (e.g., for legacy support)
    - More dynamic instructions (drawback)
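Handling control without masks can be illustrated with a rough Python contrast. It is a conceptual sketch only: the per-record condition `max(r) > threshold` and the two paths `double`/`increment` are hypothetical, chosen for illustration, and neither function is ALP's or any vector ISA's actual mechanism.

```python
from typing import List

Record = List[int]  # one SIMD record of packed sub-words


def double(r: Record) -> Record:
    return [x * 2 for x in r]


def increment(r: Record) -> Record:
    return [x + 1 for x in r]


def masked_style(records: List[Record], threshold: int) -> List[Record]:
    """Vector-with-masks style: both paths run over the whole vector,
    then a per-record mask selects which result to keep."""
    mask = [max(r) > threshold for r in records]
    taken = [double(r) for r in records]
    not_taken = [increment(r) for r in records]
    return [t if m else n for m, t, n in zip(mask, taken, not_taken)]


def indexed_style(records: List[Record], threshold: int) -> List[Record]:
    """Indexed-vector style: walk the records one at a time (as the CRP does)
    and take an ordinary scalar branch on the current record."""
    result = []
    for r in records:
        if max(r) > threshold:
            result.append(double(r))
        else:
            result.append(increment(r))
    return result


records = [[1, 2, 3, 4], [9, 8, 7, 6], [0, 0, 0, 0]]
print(indexed_style(records, 5))  # [[2, 3, 4, 5], [18, 16, 14, 12], [1, 1, 1, 1]]
```

Both functions produce the same records. The indexed style needs no mask or select step, but it issues a loop iteration and a branch per record, which is one plausible source of the "more dynamic instructions" drawback listed above.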