Exploring Core Designs for Chip Multiprocessors

1
Exploring Core Designs for Chip Multiprocessors
  • Allison Holloway
  • Matthew Allen

2
Outline
  • Motivation
  • Hypotheses
  • Methodology
  • Results
  • Conclusions

3
Motivation
  • What should the core of a CMP look like?
  • Workloads: commercial, scientific
  • OOO / wide-issue superscalar?
  • Tradeoffs: Performance, Power, Area, Complexity

4
Hypotheses
  • Commercial workloads will not benefit much from
    OOO / wide-issue
  • Scientific workloads will benefit significantly
    from OOO / wide-issue
  • OOO / wide-issue will be less beneficial for
    larger-scale systems
  • Augmenting an in-order processor with
    non-blocking caches will close OOO gap

5
Methodology
  • Simulator: Multifacet, Ruby, Opal (OOO)
  • In-order processor model
    • Looked at Simics: functional, not comparable
    • Restrict Opal to in-order issue
    • Register renaming not removed
  • Limitations
    • Can't recompile code for scheduling
    • Does not model UltraSPARC issue rules

6
Methodology
  • Workloads
    • Commercial: Apache, SPECjbb, OLTP, Zeus
    • Scientific: Barnes-Hut, Ocean
  • Issues
    • No 4-processor simulation
    • No cache warmup files

7
Methodology
  • Baseline configuration used
  • ROB, instruction window, and number of functional
    units halved for 2-wide processor
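
As a rough illustration of the halving rule on this slide, the sketch below derives a 2-wide configuration from a 4-wide baseline. The `CoreConfig` class, the `halve_for_narrow_core` helper, and every number in the baseline are hypothetical placeholders; the talk's actual baseline values are not given in this transcript.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical container for the core parameters named on the slide."""
    issue_width: int
    rob_entries: int
    window_entries: int
    functional_units: int


def halve_for_narrow_core(base: CoreConfig) -> CoreConfig:
    """Halve issue width, ROB, instruction window, and functional unit count."""
    return replace(
        base,
        issue_width=base.issue_width // 2,
        rob_entries=base.rob_entries // 2,
        window_entries=base.window_entries // 2,
        functional_units=base.functional_units // 2,
    )


# Placeholder 4-wide baseline (assumed values, not the talk's configuration).
BASELINE_4_WIDE = CoreConfig(
    issue_width=4, rob_entries=64, window_entries=32, functional_units=4
)
TWO_WIDE = halve_for_narrow_core(BASELINE_4_WIDE)
```

Scaling the buffers together with the issue width keeps the narrower core from being compared against an over-provisioned back end.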

8
Results
  • Moving from in-order to OOO provides more performance
    benefit than widening issue from 2 to 4
  • Tolerating cache misses is the key
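
To make the point about tolerating misses concrete, here is a toy cycle-count model. It is not the Ruby/Opal timing model. It assumes every hit costs one cycle, all instructions are independent, and a core with miss overlap can hide the next `window` instructions (including further misses) under a single miss latency, with `window` no larger than the latency. The trace and all parameter values are invented for illustration.

```python
def blocking_cycles(trace, miss_latency):
    """Stall-on-miss core: each miss costs the full latency, each hit one cycle."""
    return sum(miss_latency if is_miss else 1 for is_miss in trace)


def overlapped_cycles(trace, miss_latency, window):
    """Toy overlap model: a miss and the rest of its `window` instructions
    complete under a single miss latency."""
    cycles = 0
    i = 0
    while i < len(trace):
        if trace[i]:
            cycles += miss_latency
            i += window  # the miss plus the following window - 1 instructions
        else:
            cycles += 1
            i += 1
    return cycles


# True marks a cache miss, False a hit (made-up access pattern).
TRACE = [False, True, False, True, False, False, True, False]

stall_on_miss = blocking_cycles(TRACE, miss_latency=100)           # 305 cycles
with_overlap = overlapped_cycles(TRACE, miss_latency=100, window=4)  # 202 cycles
```

With `window=1` the overlap model degenerates to the blocking case, which is one way to see that the gain comes from overlapping misses rather than from issue width.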

9
(No Transcript)
10
Results
  • Hypothesis 1: Commercial workloads will not
    benefit much from OOO / wide-issue
    • 30% speedup
  • Hypothesis 2: Scientific workloads will benefit
    significantly from OOO / wide-issue
    • 60% speedup
  • Commercial workloads DO benefit from OOO, but not
    as much as scientific workloads.
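
The transcript does not say how the speedup figures above were computed. A common convention, assumed here, is the percentage improvement in runtime relative to the in-order baseline. The function name and the example runtimes are illustrative only.

```python
def percent_speedup(baseline_time, improved_time):
    """Percentage speedup of `improved_time` over `baseline_time`.

    Assumes both are runtimes (or cycle counts) for the same work.
    """
    if baseline_time <= 0 or improved_time <= 0:
        raise ValueError("times must be positive")
    return (baseline_time / improved_time - 1.0) * 100.0


# Made-up runtimes: a baseline of 130 units against 100 units is roughly 30%.
example = percent_speedup(130.0, 100.0)
```

Under this convention, a 60% speedup means the baseline takes 1.6 times as long as the improved design.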

11
(No Transcript)
12
Results
  • Hypothesis 3: OOO / wide-issue will be less
    beneficial for larger-scale systems
  • True, BUT
  • Workloads don't scale above 8 processors (except
    Apache)

13
(No Transcript)
14
(No Transcript)
15
(Non) Results
  • Hypothesis 4: Augmenting an in-order processor
    with non-blocking caches will close OOO gap
  • Simulations still running!

16
Future Work
  • Analyze performance trade-offs
    • vs. power? vs. area?
  • 4-processor runs (if possible)
  • Vary number of MSHRs
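
For readers unfamiliar with MSHRs (miss status holding registers), the sketch below shows why their count matters for a non-blocking cache: it caps how many distinct block misses can be outstanding at once. This is a generic textbook-style model, not the simulator's implementation; the class name, method names, and return strings are all invented for illustration.

```python
class MSHRFile:
    """Toy MSHR file: one entry per outstanding cache-block miss."""

    def __init__(self, num_entries):
        if num_entries < 1:
            raise ValueError("need at least one MSHR")
        self.num_entries = num_entries
        self.pending = {}  # block address -> list of waiting request ids

    def on_miss(self, block_addr, request_id):
        """Record a miss; returns 'merged', 'allocated', or 'stall'."""
        if block_addr in self.pending:
            # Secondary miss to a block already being fetched.
            self.pending[block_addr].append(request_id)
            return "merged"
        if len(self.pending) >= self.num_entries:
            # No free entry: the cache must block until a fill arrives.
            return "stall"
        self.pending[block_addr] = [request_id]
        return "allocated"

    def on_fill(self, block_addr):
        """Data arrived: free the entry and return the requests to wake up."""
        return self.pending.pop(block_addr, [])


mshrs = MSHRFile(num_entries=2)
outcomes = [
    mshrs.on_miss(0x40, 1),  # first miss to block 0x40
    mshrs.on_miss(0x40, 2),  # second request to the same block
    mshrs.on_miss(0x80, 3),  # miss to a different block
    mshrs.on_miss(0xC0, 4),  # third distinct block, but only two entries
]
```

With a single entry, this model allows only one outstanding block miss at a time, so sweeping `num_entries` upward is one way to explore how much miss overlap an in-order core can exploit.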

17
Conclusions
  • Out-of-order provides substantial benefit over
    in-order, even for commercial workloads
  • Other methods for tolerating/reducing cache
    misses may be effective
  • Diminishing returns for larger systems, but
    workloads don't scale well
  • Need to consider power and area constraints