Hiding Synchronization Delays in a GALS Processor Microarchitecture

1
Hiding Synchronization Delays in a GALS Processor Microarchitecture
  • Greg Semeraro
  • David H. Albonesi
  • Grigorios Magklis
  • Michael L. Scott
  • Steven G. Dropsho
  • Sandhya Dwarkadas

2
Why GALS (Globally Asynchronous, Locally Synchronous)?
  • Simplified clock distribution network
  • Reduced clock power dissipation
  • Allows modular design of the processor
  • Can run each domain at its optimal frequency
  • Can use conventional design and testing methods
  • Fine-grained dynamic voltage and frequency scaling (DVS/DFS)

3
But there is a cost
  • Inter-domain synchronization can hurt performance
  • Synchronization circuits cost area and power
  • We have to be careful how we divide the processor

4
The MCD (Multiple Clock Domain) Microprocessor

5
Inter-domain Synchronization
  • Queue design based on Chelcea and Nowick (WVLSI '00)
    • Modified for Issue Queue configuration
  • Synchronization circuit based on Nyström and Martin (WCED '02)
    • Converted to single-rail logic
  • Timing analysis based on Sjogren and Myers (ARVLSI '97)
  • Skip a cycle rather than pause the clock

6
Synchronization via Queues
  • FIFO Queue
  • Issue Queue

7
Timing Analysis
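  • Source runs on CLK1; destination runs on CLK2
  • Source writes at edge 1
  • T is the time from edge 1 to the next CLK2 edge (edge 2); Ts is the synchronization time
  • If T > Ts, the data can be used at edge 2
  • If T < Ts, the data can be used at edge 3 (one CLK2 cycle later)
  • 25 < Ts < 35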
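The timing rule on this slide can be sketched as a small function. This is a minimal sketch, not the authors' code: it assumes a jitter-free CLK2 whose edges fall at `clk2_phase + k * clk2_period`, takes Ts in the same time units as the clock, and treats the boundary case T = Ts conservatively, since the slide states only the strict cases. The names `usable_edge`, `clk2_phase`, and `clk2_period` are hypothetical.

```python
import math


def usable_edge(t_write, clk2_period, clk2_phase, t_s):
    """Return (time of the CLK2 edge at which data written at t_write can be
    used, whether a destination cycle was lost).

    Hypothetical model: CLK2 edges occur at clk2_phase + k * clk2_period
    (no jitter); t_s is the synchronization time in the same units.
    """
    # Edge 2: the first CLK2 edge strictly after the write (edge 1).
    k = math.floor((t_write - clk2_phase) / clk2_period) + 1
    edge2 = clk2_phase + k * clk2_period
    t = edge2 - t_write  # T: separation between edge 1 and edge 2
    if t > t_s:
        return edge2, False  # T > Ts: usable at edge 2
    # T <= Ts: wait for edge 3, one CLK2 cycle later.
    # (T == Ts is treated conservatively; this is an assumption.)
    return edge2 + clk2_period, True


if __name__ == "__main__":
    print(usable_edge(2.0, 10.0, 0.0, 3.0))  # T = 8 > 3 -> (10.0, False)
    print(usable_edge(8.0, 10.0, 0.0, 3.0))  # T = 2 < 3 -> (20.0, True)
```

A write that lands just before a CLK2 edge therefore costs a full destination cycle, which is the one-cycle penalty charged in the simulations.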

8
Simulation Methodology
  • Two processor pipelines
    • Alpha 21264
    • StrongARM SA-1110
  • Synchronization penalty was measured against an identical synchronous design
  • 30 benchmarks
    • MediaBench, Olden, SPEC 2000

9
Simulation Methodology
  • SimpleScalar + Wattch + MCD
  • Independent clock for each domain
  • Independent jitter for each domain
  • Next edge based on period, last edge, and jitter
  • When source and destination clock edges are too close (T < Ts), a one-cycle penalty is assessed
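The clock model described on this slide (an independent clock per domain, each next edge computed from the period, the last edge, and a jitter term) can be approximated as follows. This sketch assumes uniformly distributed jitter, one source write per source edge, and a penalty whenever the next destination edge follows a write by no more than Ts. The class and function names are hypothetical, and this is not the MCD simulator's actual code.

```python
import random


class JitteryClock:
    """Hypothetical per-domain clock: next edge = last edge + period + jitter.

    Jitter is drawn uniformly from [-max_jitter, +max_jitter]; the uniform
    distribution is an assumption, not taken from the slides.
    """

    def __init__(self, period, max_jitter, rng, start=0.0):
        # Keep jitter smaller than the period so edges stay strictly increasing.
        if not 0.0 <= max_jitter < period:
            raise ValueError("need 0 <= max_jitter < period")
        self.period = period
        self.max_jitter = max_jitter
        self.rng = rng  # one random.Random per domain -> independent jitter
        self.last_edge = start

    def next_edge(self):
        jitter = self.rng.uniform(-self.max_jitter, self.max_jitter)
        self.last_edge += self.period + jitter
        return self.last_edge


def count_sync_penalties(src, dst, t_s, n_transfers):
    """Count source-to-destination transfers that pay a one-cycle penalty.

    Assumes the source writes one item on each of its edges and that a
    penalty is assessed when the next destination edge follows the write
    by no more than t_s.
    """
    penalties = 0
    dst_edge = dst.next_edge()
    for _ in range(n_transfers):
        t_write = src.next_edge()
        # Advance the destination clock to its first edge after the write.
        while dst_edge <= t_write:
            dst_edge = dst.next_edge()
        if dst_edge - t_write <= t_s:
            penalties += 1
    return penalties
```

For example, `count_sync_penalties(JitteryClock(1.0, 0.05, random.Random(1)), JitteryClock(1.3, 0.05, random.Random(2)), 0.3, 10_000)` estimates how often a transfer between two domains with illustrative periods of 1.0 and 1.3 pays the penalty. Giving each clock its own `random.Random` instance keeps the jitter of the two domains independent.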

10
Synchronization Analysis
  • Out-of-order (OoO) and superscalar capabilities removed from the Alpha

11
Synchronization Analysis
  • Out-of-order (OoO) and superscalar capabilities added to the StrongARM

12
What we have learned
  • A synchronization penalty doesn't necessarily mean a performance loss
    • Out-of-order execution allows useful work to be performed while instructions are delayed
    • Superscalar design means that synchronization penalties can be shared across multiple instructions
    • For the Alpha, 95% of the penalty is hidden
    • For the StrongARM, 63% of the penalty is hidden
  • We have to be careful
    • Cannot have too many domains
    • Careful where you split!
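One way to read "95% of the penalty hidden" is as the share of assessed synchronization penalty cycles that never show up as extra run time relative to the identical synchronous design. The slides do not give the formula, so the metric below is an assumed interpretation; the function name is hypothetical and the numbers in the usage line are illustrative only.

```python
def fraction_hidden(gals_cycles, sync_cycles, penalty_cycles):
    """Assumed metric: fraction of assessed synchronization penalty cycles
    that do not appear as extra execution cycles versus an identical,
    fully synchronous design."""
    if penalty_cycles <= 0:
        raise ValueError("penalty_cycles must be positive")
    lost = gals_cycles - sync_cycles  # cycles actually lost to synchronization
    return 1.0 - lost / penalty_cycles


if __name__ == "__main__":
    # Illustrative: 200 penalty cycles assessed, but the run is only 20 cycles longer.
    print(fraction_hidden(gals_cycles=1_020, sync_cycles=1_000, penalty_cycles=200))
```

Under this reading, the illustrative run hides 90% of its penalty: out-of-order and superscalar execution absorbed 180 of the 200 penalty cycles.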

13
Conclusions
  • GALS is a good idea for real processors
    • small IPC loss
    • clock network simplification
    • reduction in power dissipation
    • higher frequency
    • independent domain tuning