Multithreaded Processors

1
Multithreaded Processors
Multi-Threaded Processor Architectures
The Tera MTA
Frank Casilio
Computer Engineering
May 15, 1997
2
Problems with Multiprocessors
  • Memory Latency
  • Context Switching Time
  • Communication/Synchronization Latency
  • Poor Programming Model

3
Motivation
  • Reduce/Tolerate Memory Latency
  • General Purpose Machine
  • Scalability
  • Shared Memory
  • Simpler Programming Model

4
Typical Ways To Reduce Latency
  • Fast Buses and Networks
  • Hardware Synchronization
  • Prefetching

5
Multi-Threading: The Concept
  • Support For Multiple Concurrent Hardware Contexts
  • Swap Contexts During Latencies
  • Tolerates Latency Instead of Reducing It

6
Parameters That Affect Efficiency
  • Number Of Contexts Supported
  • Switching Overhead
  • Run Length (Granularity)
  • Average Latency To Be Hidden
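
These four parameters are often combined in a simple deterministic utilization model. The sketch below is an illustration under stated assumptions, not something taken from the slides: every context runs for `run_length` cycles, then stalls for `latency` cycles, and each switch costs `switch_cost` cycles. All function and parameter names are hypothetical.

```python
import math


def utilization(n_contexts, run_length, switch_cost, latency):
    """Fraction of cycles doing useful work in a simple deterministic model.

    Assumption (not from the slides): every context runs for `run_length`
    cycles, then waits `latency` cycles; each switch costs `switch_cost`.
    """
    if n_contexts < 1 or run_length <= 0:
        raise ValueError("need at least one context and a positive run length")
    # With enough contexts the latency is fully hidden and only the
    # switching overhead is lost.
    saturated = run_length / (run_length + switch_cost)
    # With too few contexts the processor idles for part of each latency.
    linear = n_contexts * run_length / (run_length + switch_cost + latency)
    return min(saturated, linear)


def contexts_to_saturate(run_length, switch_cost, latency):
    """Smallest number of contexts that hides the whole latency."""
    busy = run_length + switch_cost
    return math.ceil((busy + latency) / busy)


if __name__ == "__main__":
    # Hypothetical numbers: switch every cycle, no overhead, 69-cycle latency.
    print(contexts_to_saturate(1, 0, 69))
    print(utilization(35, 1, 0, 69))
```

Under this model, utilization grows linearly with the number of contexts until the latency is fully covered, after which only the switching overhead limits efficiency.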

7
Switching Theory
  • Determines How Often Contexts Switch
  • Directly Related to Cost

8
Fine Grained Switching
  • Switches Contexts Every Cycle
  • Many Long-Latency Operations Tolerated
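
A toy simulation can make the every-cycle switching idea concrete. The sketch below is an assumption-laden illustration, not a model of any real pipeline: each cycle the processor issues one instruction from the next ready stream in round-robin order, and a stream that has issued is unavailable for a fixed `latency` cycles. The function name and parameters are hypothetical.

```python
def simulate_fine_grained(n_streams, latency, cycles):
    """Return the fraction of cycles in which some stream issued.

    Assumptions (illustrative only): one issue slot per cycle, round-robin
    selection among ready streams, and a fixed `latency` in cycles before
    a stream may issue again.
    """
    ready_at = [0] * n_streams  # first cycle at which each stream may issue
    issued = 0
    next_stream = 0
    for cycle in range(cycles):
        # Scan the streams once, starting after the last one that issued.
        for offset in range(n_streams):
            s = (next_stream + offset) % n_streams
            if ready_at[s] <= cycle:
                ready_at[s] = cycle + 1 + latency
                issued += 1
                next_stream = (s + 1) % n_streams
                break
        # If no stream was ready, the issue slot is wasted this cycle.
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, simulate_fine_grained(n, latency=7, cycles=800))
```

With a 7-cycle latency in this toy model, 8 streams keep the issue slot busy every cycle, while 4 streams fill only half of the slots.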

9
Coarse Grained Switching
  • Requires Fewer Contexts
  • Requires More Complex Processors

10
The TERA MTA
  • First Commercial Multithreaded Machine Since 1978
  • Uniform Shared Memory
  • Fine Grained Architecture

11
The Tera MTA Cont'd
  • Toroidal Interconnection
  • 16-256 Processor Versions
  • 12 Million Dollar Base System

12
Processor Characteristics
  • Support For 128 Threads
  • 16 Protection Domains
  • 0 Context Switching Overhead!!!
  • 1 GFLOP Peak Performance
  • 333 MHz Nominal Speed

13
Processor Characteristics Cont'd
  • 3 Operations Per Instruction
  • 31 64-bit GPRs
  • 6KW Of Power Dissipation Per Processor

14
Interconnection Network
  • 3-D Torus Contains p^(3/2) Nodes
  • Packet Switching
  • 3 Cycles of Latency Per Node
  • Messages Are Assigned Random Priorities
  • 2 HIPPI Channels / Processor For Net Connection
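
The per-node latency figure can be turned into a rough routing estimate. The sketch below assumes a cubic 3-D torus with wraparound links in every dimension, shortest-path routing, and that each hop costs the 3 cycles quoted on this slide; the side length of 16 and all names are hypothetical choices for illustration, and contention is ignored.

```python
from itertools import product

CYCLES_PER_NODE = 3  # per-node latency quoted on the slide


def torus_hops(src, dst, side):
    """Shortest hop count between two nodes of a side x side x side torus."""
    hops = 0
    for a, b in zip(src, dst):
        d = abs(a - b) % side
        hops += min(d, side - d)  # wraparound link may be the shorter way
    return hops


def route_latency(src, dst, side):
    """Uncontended network latency in cycles, assuming 3 cycles per hop."""
    return CYCLES_PER_NODE * torus_hops(src, dst, side)


def average_hops(side):
    """Mean shortest-path hop count from node (0, 0, 0) to every node."""
    origin = (0, 0, 0)
    nodes = list(product(range(side), repeat=3))
    return sum(torus_hops(origin, n, side) for n in nodes) / len(nodes)


if __name__ == "__main__":
    side = 16  # assumed torus side length, for illustration only
    print(route_latency((0, 0, 0), (8, 8, 8), side))
    print(average_hops(side) * CYCLES_PER_NODE)
```

Because of the wraparound links, the farthest node in each dimension is only half the side length away, which keeps the worst-case route short.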

15
Memory
  • Either 2p or 4p Units, Interleaved 64 Ways
  • 8, 16, 32 and 64 Bit Addressable
  • 4 Bits per Word Of Access State For Synchronization
  • Memory Units Equipped With Error Correcting Code
  • Memory Usage Is Randomized Across All Banks
  • 16 MB DRAM Chips
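
One common use of per-word access state is a full/empty bit: a write waits until the word is empty, and a read waits until it is full and then marks it empty again. The sketch below emulates that idea in software with a condition variable. It is only an analogy under that assumption, not a description of the hardware, and the class and method names are hypothetical.

```python
import threading


class SyncWord:
    """Software emulation of a memory word guarded by a full/empty bit."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Block until the word is empty, then store and mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, then load and mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def demo(n=5):
    """Pass n values from a producer thread to the caller through one word."""
    word = SyncWord()

    def produce():
        for i in range(n):
            word.write_when_empty(i)

    producer = threading.Thread(target=produce)
    producer.start()
    received = [word.read_when_full() for _ in range(n)]
    producer.join()
    return received


if __name__ == "__main__":
    print(demo())
```

The producer and consumer never touch a separate lock or queue; the state attached to the word itself orders the accesses, which is the point of keeping synchronization bits in memory.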

16
Input / Output
  • 20p MB/s In Each Direction
  • At Least p/16 Disk Arrays Are Required
  • System Capacity of 300p GB
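
The I/O figures on this slide all scale with the processor count p, so they can be tabulated for a given configuration. The sketch below simply evaluates the three expressions as written; the function name and the rounding up of the disk-array count are assumptions.

```python
import math


def io_scaling(p):
    """Evaluate the slide's I/O figures for a system with p processors."""
    if p < 1:
        raise ValueError("p must be a positive processor count")
    return {
        "bandwidth_mb_per_s_each_way": 20 * p,   # 20p MB/s in each direction
        "min_disk_arrays": math.ceil(p / 16),    # at least p/16, rounded up
        "capacity_gb": 300 * p,                  # 300p GB
    }


if __name__ == "__main__":
    for p in (16, 256):
        print(p, io_scaling(p))
```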

17
Operating System
  • Allows Systems To Run p Tasks Truly Parallel
  • Streams Are Dynamically Created w/o OS Intervention
  • Processes Are Broken Up Into Tasks By OS

18
Software / Languages
  • Implicit And Explicit Parallelism Is Allowed
  • High Degree of Cray Compatibility
  • Easy To Program b/c Of Architecture

19
System Performance
  • 3.84-12.8 Times Performance Of Cray T90/32
  • 1K x 1K Matrix Multiply in 50 ms
  • Integer Sort of 100M Keys in 36 ms

20
Conclusion
  • Proven Effectiveness
  • Logical Step For Multiprocessor Computers
  • Still Very Pricey
  • Allow General Purpose Workload
  • Scalable
  • Shared Memory

21
Questions?
22
Instruction Pipeline
23
Breakdown Of A Task
25
Deciding The Number Of Contexts