Asynchronous SIMD - PowerPoint PPT Presentation

1
Asynchronous SIMD
  • Charles C. Weems
  • Computer Science Department
  • University of Massachusetts
  • Amherst, MA 01003-4610
  • weems@cs.umass.edu

2
SIMD is Attractive
  • Simple programming model
  • No multitasking or race conditions
  • Delivers high performance
  • Unbeatable for dense fine-grained computation
  • Scalable
  • Simple processors, easy to gang up into millions
  • So why has it failed?

3
Scales Up But Not Down
Cost Issues
  • Need many processor chips in a system
  • Individual chips give no boost in performance
  • Designed to work in groups of more than 16
  • Need to amortize cost of central controller
  • Results in high entry-level system cost
  • No viable mass market
  • Can't recover development cost or follow the
    technology curve

4
Synchronous
Performance Issues
  • Must control clock skew across many chips
  • Limits clock rate
  • Central control performs 20 ops to deliver each
    instruction
  • Throttles issue rate
  • Inter-chip communication
  • Requires high percentage of pin bandwidth
  • Intolerant of latency
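
The throttling effect of the central controller can be illustrated with a little arithmetic. The sketch below assumes, hypothetically, that the controller completes one op per clock cycle; the function name `max_issue_rate` and the clock rates are illustrative only and do not come from the slides. Only the figure of 20 ops per delivered instruction is taken from the slide.

```python
def max_issue_rate(controller_hz: float, ops_per_instruction: int = 20) -> float:
    """Upper bound on instructions per second a central controller can deliver.

    Assumption (not from the slides): the controller completes one op per
    clock cycle, so each delivered instruction costs ops_per_instruction cycles.
    """
    if ops_per_instruction <= 0:
        raise ValueError("ops_per_instruction must be positive")
    return controller_hz / ops_per_instruction


if __name__ == "__main__":
    # Illustrative controller clock rates only.
    for mhz in (25, 50, 100):
        rate = max_issue_rate(mhz * 1e6)
        print(f"{mhz} MHz controller: at most {rate / 1e6:.2f} M instructions/s")
```

Under this model the array never sees more than one twentieth of the controller's clock rate, however fast the processing elements themselves could run.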

5
Inefficient
Cost/Performance Issues
  • Never large enough
  • Virtualization adds unacceptable overhead
  • Rarely the right size
  • Usually waste a high percentage of processors
  • Poor memory bandwidth utilization
  • Bandwidth limited at low issue rates
  • Inactive processors still consume bandwidth

6
Inflexible
Usability Issues
  • One monolithic resource
  • High context-switch cost
  • Hard to partition, hard to share
  • Custom interface via dedicated host
  • Not integral to user system
  • I/O limited (except for dedicated devices)
  • Proprietary programming languages

7
Alternative Approach
  • Design a SIMD processor to scale down
  • Accept the implied high virtualization factor
  • Treat it as an opportunity
  • Internal virtual processor tiles provide
  • High communication latency tolerance and
  • Lower bandwidth requirement
  • Predictable access pattern in data memory
  • Can devote more bandwidth to memory
  • High degree of I-cache temporal locality
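
One way to picture internal virtual processor tiles is a small simulation. The sketch below is an assumption-laden model, not the design from the talk: a 1-D virtual array is laid out so that virtual PE `v` lives on physical PE `v % num_pes` at local address `v // num_pes`, and one instruction is replayed tile by tile. The names `run_virtualized`, `local`, and `trace` are hypothetical.

```python
from typing import Callable, List, Tuple


def run_virtualized(
    op: Callable[[int], int], data: List[int], num_pes: int
) -> Tuple[List[int], List[List[int]]]:
    """Apply one SIMD-style instruction to every virtual PE, tile by tile.

    Returns the results in virtual-PE order plus, for each physical PE,
    the sequence of local memory addresses it touched.
    """
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    num_tiles = -(-len(data) // num_pes)  # ceiling division

    # Assumed layout: virtual PE v -> physical PE v % num_pes, address v // num_pes.
    local = [[0] * num_tiles for _ in range(num_pes)]
    for v, x in enumerate(data):
        local[v % num_pes][v // num_pes] = x

    trace: List[List[int]] = [[] for _ in range(num_pes)]
    for tile in range(num_tiles):      # tiles are processed one after another
        for pe in range(num_pes):      # conceptually simultaneous across PEs
            if tile * num_pes + pe < len(data):
                local[pe][tile] = op(local[pe][tile])
                trace[pe].append(tile)

    result = [local[v % num_pes][v // num_pes] for v in range(len(data))]
    return result, trace


if __name__ == "__main__":
    result, trace = run_virtualized(lambda x: x * 2, list(range(10)), num_pes=4)
    print(result)
    print(trace)
```

In this model every physical PE walks its local memory at addresses 0, 1, 2, and so on, which is the kind of regular, predictable stream the slide refers to.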

8
Embracing Virtualization
  • More virtualization, better utilization
  • Oversize problems welcome!
  • Fewer unused processors
  • Asynchronous communication
  • No need to synchronize clocks
  • On-chip controller
  • Avoids costly separate device
  • Opens throttle on issue rate
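
The link between virtualization and utilization can be sketched with a simple model. It assumes that a problem of `problem_size` elements is processed in whole tiles of `num_pes` processors and that only the last tile can have idle processors; control and communication overhead are ignored. The function name and the sizes used are hypothetical.

```python
def utilization(problem_size: int, num_pes: int) -> float:
    """Fraction of processor slots doing useful work when a problem is run
    as ceil(problem_size / num_pes) full tiles (simplified model)."""
    if problem_size <= 0 or num_pes <= 0:
        raise ValueError("problem_size and num_pes must be positive")
    tiles = -(-problem_size // num_pes)  # ceiling division
    return problem_size / (tiles * num_pes)


if __name__ == "__main__":
    num_pes = 64  # illustrative array size
    for size in (40, 65, 6401):
        tiles = -(-size // num_pes)
        print(f"size={size:5d} tiles={tiles:3d} utilization={utilization(size, num_pes):.3f}")
```

In this model the waste is confined to the final tile, so its relative cost shrinks as the number of tiles grows.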

9
Asynchronous SIMD
  • High degree of fine-grained parallelism
  • High communication latency tolerance
  • Simple programming model like SIMD
  • Flexible support for multitasking environments
  • High clock rate (microprocessor speeds)
  • High single-chip performance with scalability

10
Exploiting Virtualization
Receive from Neighbor
Send to Neighbor
Process in send-to-receive order across internal virtualized array tiles

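
A possible reading of "send-to-receive order" is sketched below for a 1-D shift in which every virtual PE takes its left neighbor's value. This interpretation is an assumption: the tile on the sending edge is processed first, so the outgoing value can leave the chip early, and the tile on the receiving edge is processed last, so the incoming value has the longest time to arrive. The names `shift_right_on_chip`, `incoming`, and `outgoing` are hypothetical.

```python
from typing import List, Tuple


def shift_right_on_chip(
    values: List[int], num_pes: int, incoming: int
) -> Tuple[List[int], int]:
    """Shift one chip's block of virtual PE values right by one position.

    `incoming` is the value arriving from the neighboring chip on the left.
    Returns the shifted block and the value sent to the chip on the right.
    """
    if not values or num_pes <= 0:
        raise ValueError("need at least one value and one PE")
    cells = list(values)
    n = len(cells)
    num_tiles = -(-n // num_pes)  # ceiling division

    outgoing = cells[-1]  # sending edge: this value can leave the chip first

    # Walk tiles from the sending edge to the receiving edge. Updating in
    # place is safe because each cell is read before its left neighbor
    # is overwritten.
    for tile in reversed(range(num_tiles)):
        lo = tile * num_pes
        hi = min(lo + num_pes, n)
        for v in reversed(range(lo, hi)):
            cells[v] = cells[v - 1] if v > 0 else incoming  # needed only at the very end
    return cells, outgoing


if __name__ == "__main__":
    print(shift_right_on_chip([1, 2, 3, 4, 5], num_pes=2, incoming=0))
```

Only one of the `n` transfers in this sketch crosses the chip boundary; the rest stay inside the chip, which is consistent with the lower bandwidth requirement claimed for virtualized tiles.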
11
Implications
  • Most access is to local memory
  • Predictable well in advance: regular streams
  • Instruction blocks repeated many times
  • Can be cached predecoded
  • Can be prebroadcast by host
  • Communication is asynchronous
  • Low bandwidth requirement, latency tolerant
  • Can use existing standard mechanisms
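
The effect of repeating an instruction block once per tile can be sketched with a toy cache model. It assumes that each instruction is fetched from the host and decoded only on first use and is then served from an on-chip cache of predecoded entries. The function `replay_block` and the instruction mnemonics are hypothetical.

```python
from typing import Dict, List, Tuple


def replay_block(block: List[str], num_tiles: int) -> Tuple[int, int]:
    """Replay an instruction block once per tile.

    Returns (host_fetches, instructions_executed). Each instruction is
    fetched and decoded only the first time it is needed.
    """
    predecoded: Dict[int, Tuple[str, ...]] = {}
    host_fetches = 0
    executed = 0
    for _tile in range(num_tiles):
        for pc, text in enumerate(block):
            if pc not in predecoded:
                predecoded[pc] = tuple(text.split())  # stand-in for real decoding
                host_fetches += 1
            executed += 1
    return host_fetches, executed


if __name__ == "__main__":
    fetches, executed = replay_block(["load a", "add a b", "store b"], num_tiles=100)
    print(f"host fetches: {fetches}, instructions executed: {executed}")
```

In this model the host traffic depends only on the block length, while the work done grows with the number of tiles.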

12
More Implications
  • No global clock -- high local clock rate
  • Powerful single chip -- e.g., PC accelerator
  • Scales up to many chips -- supercomputing
  • As long as virtualization ratio remains high
  • Multichip system can be partitioned
  • Allocable system resource, multiple hosts
  • Low context switch cost for time sharing

13
Example Configuration
[Figure: several nodes, built from SMP and ASIMD units, connected by a Communication Fabric. Key: shading indicates allocation of ASIMD resources; color of type indicates node model.]
  • Heterogeneity

14
Chip Architecture
[Block diagram; labeled components and connections:]
  • Memory Management
  • Decode / Expand / Dispatch
  • Virtualization Management
  • From Host
  • I-Cache
  • Communication Buffer and Synchronization
  • Processing Element Array
  • Secondary Streaming Cache
  • Primary D-Cache
  • To Other Chips
  • Global Summary Unit
  • To Host, Other Chips

15
System Architecture
[Block diagram; labeled components and connections:]
  • Host Processor and Caches
  • Main Memory
  • System Bus
  • Host Commands, ASIMD Communication
  • Data Memory and Input/Output
  • Streaming Secondary Cache and Controller
  • ASIMD Processor Array Chip

16
Summary
  • ASIMD solves problems of traditional SIMD
  • High performance, low cost, highly usable
  • Retains simple programming model
  • Delivers flexible, scalable high performance
  • Suitable for mass markets
  • Plays well with others (PCs to Supercomputers)