Single-Chip Multiprocessor

1
Single-Chip Multiprocessor
  • Nirmal Andrews

2
Case for single-chip multiprocessors
  • Advances in integrated circuit processing
    technology:
  • - Gate density (more transistors per
    chip)
  • - Cost of wires
  • Several studies at Stanford University in the
    late 1990s argued that a CMP (single-chip
    multiprocessor) outperforms competing
    technologies.

3
Parallelism
  • Parallelism has become a necessity for improving
    performance.
  • Parallelism was made possible by dynamic
    scheduling, multiple instruction issue,
    speculative execution, non-blocking caches, etc.
    (late 1990s).
  • Classes of parallelism:
  • Instruction level
  • Loop level
  • Thread level (future trend)
  • Process level (future trend)

4
Loop-level parallelism
  • To increase the amount of parallelism, exploit
    parallelism among the iterations of a loop.
  • LLP is the ILP that results from data-independent
    loop iterations.
  • Requires that there be no circular dependencies
    between iterations. Loop unrolling can also help
    here (beyond the scope of this lecture).

5
LLP (Loop Level Parallelism)
  • 15 ILP extracted from a basic block in an
    integer programs 7 instructions.
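  • for (i = 1; i <= 100; i = i + 1) {
      a[i] = a[i] + b[i];         // s1
      b[i+1] = c[i] + d[i];       // s2
    }
  • s1 depends on s2: it reads b[i], which s2 wrote
    in the previous iteration. To extract LLP,
    rearrange:
  • a[1] = a[1] + b[1];
    for (i = 1; i <= 99; i = i + 1) {
      b[i+1] = c[i] + d[i];
      a[i+1] = a[i+1] + b[i+1];
    }
    b[101] = c[100] + d[100];
  • No loop-carried dependencies remain.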
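
As a hedged illustration, the sketch below writes both versions of the loop from this slide in Python and lets you check that they compute the same arrays. The function names `original_loop`, `rearranged_loop`, and `make_inputs`, as well as the test data, are assumptions made for this example. Arrays are indexed from 1 as on the slide, so index 0 is unused.

```python
def original_loop(a, b, c, d):
    """Loop as written: s1 reads b[i], which s2 wrote in the previous iteration."""
    a, b = a[:], b[:]                 # work on copies
    for i in range(1, 101):           # i = 1 .. 100
        a[i] = a[i] + b[i]            # s1
        b[i + 1] = c[i] + d[i]        # s2
    return a, b


def rearranged_loop(a, b, c, d):
    """Rearranged loop: each iteration only reads values it produced itself."""
    a, b = a[:], b[:]
    a[1] = a[1] + b[1]                # first s1, peeled out of the loop
    for i in range(1, 100):           # i = 1 .. 99
        b[i + 1] = c[i] + d[i]
        a[i + 1] = a[i + 1] + b[i + 1]
    b[101] = c[100] + d[100]          # last s2, peeled out of the loop
    return a, b


def make_inputs(n=102):
    """Hypothetical test data; indices 0 .. 101 so that b[101] exists."""
    a = [i for i in range(n)]
    b = [2 * i for i in range(n)]
    c = [3 * i for i in range(n)]
    d = [5 * i for i in range(n)]
    return a, b, c, d


if __name__ == "__main__":
    same = original_loop(*make_inputs()) == rearranged_loop(*make_inputs())
    print("results match:", same)
```

In the rearranged version, iteration i no longer needs anything written by iteration i-1, so the iterations could be handed to different processors.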

6
Competing technology - Superscalar
  • Executes multiple instructions in the same clock
    cycle.
  • Dynamic scheduling
  • Single processor
  • Redundant functional units on the processor
  • A mixture of a scalar and a vector processor

7
Competing technology - Superscalar

8
Wide-issue superscalar

9
Fetch phase
  • 3 phases: Fetch, Issue, Execution
  • Bottlenecks: Issue and Execution phases.
  • Fetch phase: provide a large and accurate window
    of decoded instructions. 3 issues: instruction
    misalignment, cache misses, mispredicted branches.
  • - Misprediction is reduced to under 5% using
    the branch predictor designed by McFarling.
  • - Instruction misalignment is reduced to under
    3% by dividing the cache into banks (Conte).
  • - Rosenblum et al. show that 60% of the
    latency caused by cache misses can be hidden.
  • S. McFarling, "Combining branch predictors," WRL
    Technical Note TN-36, Digital Equipment
    Corporation, 1993.
  • M. Rosenblum, E. Bugnion, S. Herrod, E.
    Witchel, and A. Gupta, "The impact of
    architectural trends on operating system
    performance," Proceedings of the 15th ACM
    Symposium on Operating Systems Principles,
    Colorado, December 1995.
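
To make the idea of a combining branch predictor more concrete, here is a minimal sketch in the spirit of the McFarling note cited above: a per-address (bimodal) table, a global-history table indexed by address XOR history, and a chooser that learns which of the two to trust. The class name, the table size, the initial counter values, and the helper `misprediction_rate` are all assumptions for illustration; this is not the configuration evaluated in the cited work.

```python
def _bump(counter, up):
    """Saturating 2-bit counter update (range 0..3)."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class CombiningPredictor:
    """Toy combining predictor: bimodal + global-history table + chooser."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        size = 1 << index_bits
        self.bimodal = [1] * size    # counter >= 2 means "predict taken"
        self.gshare = [1] * size     # same encoding, indexed by pc XOR history
        self.chooser = [1] * size    # counter >= 2 means "trust gshare"
        self.history = 0             # most recent branch outcomes, one bit each

    def _indices(self, pc):
        return pc & self.mask, (pc ^ self.history) & self.mask

    def predict(self, pc):
        bi, gi = self._indices(pc)
        if self.chooser[bi] >= 2:
            return self.gshare[gi] >= 2
        return self.bimodal[bi] >= 2

    def update(self, pc, taken):
        bi, gi = self._indices(pc)
        bimodal_ok = (self.bimodal[bi] >= 2) == taken
        gshare_ok = (self.gshare[gi] >= 2) == taken
        # Train the chooser only when the two component predictors disagree.
        if bimodal_ok != gshare_ok:
            self.chooser[bi] = _bump(self.chooser[bi], gshare_ok)
        self.bimodal[bi] = _bump(self.bimodal[bi], taken)
        self.gshare[gi] = _bump(self.gshare[gi], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def misprediction_rate(predictor, trace):
    """Fraction of (pc, taken) pairs in the trace that were mispredicted."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses / len(trace)
```

A strictly alternating branch is a useful test case: the bimodal table alone keeps flipping, while the history-indexed table learns the pattern and the chooser shifts toward it.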

10
Issue phase
  • Issue phase: register renaming.
  • 2 techniques for register renaming:
  • - Use a table to map architectural registers
    to physical registers. Ports required =
    operands per instruction × instruction
    window size.
  • - Use a reorder buffer. Comparators are required
    to determine which physical register should
    provide data to which instruction. A large number
    of comparators is required.
  • In the HP PA-8000, 20% of the die space is
    occupied by comparators.
  • The instruction queue registers grow
    quadratically with issue width.
  • The instruction queue uses broadcast to connect to
    the registers, which increases the number of
    wires and therefore delay and cost.
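
The table-based renaming technique listed above can be sketched as follows. This is a toy model under stated assumptions: the class name `RenameTable`, the register counts, and the simple free list are hypothetical, and the sketch never returns physical registers to the free list, which a real design must do.

```python
class RenameTable:
    """Toy map from architectural registers to physical registers."""

    def __init__(self, num_arch=8, num_phys=16):
        # Initially architectural register r lives in physical register r.
        self.mapping = list(range(num_arch))
        # Remaining physical registers are free for new results.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        """Rename one instruction of the form dest <- op(srcs)."""
        # Each source operand needs a read of the current mapping.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; issue would stall")
        # The destination gets a fresh physical register (a table write).
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


if __name__ == "__main__":
    table = RenameTable()
    print(table.rename(1, [2, 3]))   # r1 <- r2 op r3
    print(table.rename(1, [1, 4]))   # r1 <- r1 op r4, reads the renamed r1
```

Every operand of every instruction renamed in the same cycle needs its own access to the table, which is why the port count grows with the number of operands handled per cycle.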

11
Execute phase
  • The execution phase has similar issues.
  • Increasing the issue width increases the number
    of renamed registers, leading to a quadratic
    increase in register file complexity.
  • Increasing the number of execution units causes
    a quadratic increase in the complexity of the
    bypass logic.
  • Bottleneck: interconnect delay between
    execution units.
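
A small arithmetic sketch shows where the quadratic growth of the bypass logic comes from. It assumes a full bypass network in which the result of every execution unit can be forwarded to every operand input of every execution unit. The function name `bypass_paths` and the default of two operand inputs per unit are assumptions for illustration, not figures from the slides.

```python
def bypass_paths(num_units, inputs_per_unit=2):
    """Forwarding paths in a full bypass network (assumed model).

    Each of the num_units results may be needed at each of the
    inputs_per_unit operand inputs of each of the num_units units.
    """
    return num_units * num_units * inputs_per_unit


if __name__ == "__main__":
    for units in (2, 4, 8):
        print(units, "units ->", bypass_paths(units), "paths")
```

Under this model, doubling the number of execution units quadruples the number of forwarding paths, and each extra path adds wire length and multiplexer inputs on the critical path.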

12
Architecture - Superscalar

13
Competing technologies - Simultaneous Multi-threading
  • The simultaneous multi-threading (SMT)
    architecture is similar to that of the
    superscalar.
  • SMT processors extend wide superscalar
    processors with hardware support for executing
    instructions from multiple threads concurrently.
  • Provides latency tolerance.
  • Reduces to a conventional wide-issue superscalar
    when multiple threads are not available.
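
The effect of issuing from several threads can be illustrated with a toy issue-slot model. Everything in it is an assumption made for the example: the function names, the fixed thread priority, and the per-cycle counts of ready instructions. With a single thread the model behaves like a plain wide-issue superscalar.

```python
def issued_per_cycle(ready_by_thread, issue_width):
    """Fill up to issue_width slots in one cycle, threads in fixed priority order."""
    issued = []
    slots = issue_width
    for ready in ready_by_thread:
        take = min(ready, slots)     # a thread cannot issue more than it has ready
        issued.append(take)
        slots -= take
    return issued


def utilization(cycles, issue_width):
    """Fraction of issue slots used; cycles is a list of per-thread ready counts."""
    used = sum(sum(issued_per_cycle(ready, issue_width)) for ready in cycles)
    return used / (issue_width * len(cycles))


if __name__ == "__main__":
    width = 4
    one_thread = [[2], [0], [1], [3]]               # hypothetical ready counts
    two_threads = [[2, 1], [0, 3], [1, 2], [3, 0]]  # a second thread fills gaps
    print("one thread: ", utilization(one_thread, width))
    print("two threads:", utilization(two_threads, width))
```

Cycles in which one thread is stalled (for example on a cache miss) are covered by instructions from the other thread, which is the latency tolerance mentioned on the slide.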

14
Competing technologies - Simultaneous Multi-threading
  • SMT (Simultaneous Multi-threading)

15
Centralized architecture
  • Disadvantages of centralized architectures such
    as SMT and superscalar:
  • - Area increases quadratically with core
    complexity.
  • - Cycle time increases because of interconnect
    delays: wire delay dominates the delay of the
    critical path of the CPU. Simpler clusters are
    possible, but they result in a deeper pipeline
    and a higher branch misprediction penalty.
  • - Design verification cost is high, due to
    the complexity of the single processor.
  • - Large demand on the memory system.

16
Single-chip multiprocessor
  • A decentralized architecture is motivated by the
    disadvantages of the competing technologies.
  • Simple individual processors with a high clock
    rate.
  • Low interconnect latency.
  • Exploits thread-level and process-level
    parallelism.

17
Single-chip multiprocessor architecture

18
Performance comparison
  • Examples: the 8-core Cell processor in the PS3 and
    the 3-core Xenon processor in the Xbox 360.
  • Performance chart
  • Run for different benchmark programs.

19
Summary (CMP)
  • CMP (chip-level multiprocessor) provides superior
    performance with simpler hardware.
  • No parallelism: superscalar performance is 30%
    better than CMP.
  • Fine-grained thread-level parallelism:
    superscalar is 10% better in performance.
  • Coarse-grained thread-level parallelism: CMP
    is 50-100% better than superscalar.
  • Disadvantages: slow when there is no
    multithreading; requires a matching development
    effort in software.

20
References
  • K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson,
    and K. Chang, "The case for a single-chip
    multiprocessor," ACM SIGPLAN Notices, 1996.
  • L. Hammond, B. A. Nayfeh, and K. Olukotun, "A
    Single-Chip Multiprocessor," IEEE, Sept. 1997.
  • Wikipedia, "Superscalar," April 2008.
    http://en.wikipedia.org/wiki/Superscalar
  • Wikipedia, "Multi-core," April 2008.
    http://en.wikipedia.org/wiki/Multi-core_computing