12th Lecture: Technological Trends and Future Processor Alternatives

1
12th Lecture Technological Trends and Future
Processor Alternatives
  • Today's lecture
  • Microprocessors today
  • Trends and principles in the Giga Chip Era
  • Future Processor Alternatives

2
Microprocessors today (Y2K)
  • Chip technology 2000/01
  • 0.18-µm CMOS technology, 10 to 100 M transistors
    per chip, 600 MHz to 1.4 GHz cycle rate
  • Example processors
  • Intel Pentium III: 7.5 M transistors, 0.18 µm
    (Bi-)CMOS, up to 1 GHz
  • Intel Pentium 4: ?? transistors, 0.18 µm
    (Bi-)CMOS, up to 1.4 GHz, uses a trace cache!
  • Intel IA-64 Itanium (already announced for
    2000?): 0.18 µm CMOS, 800 MHz; successor
    McKinley (announced for 2001): > 1 GHz
  • Alpha 21364: 100 M transistors, 0.18 µm CMOS (1.5
    volt, 100 watt), ? MHz
  • HAL SPARC: uses trace cache and value prediction
  • Alpha 21464: will be 4-way simultaneous
    multithreaded
  • Sun MAJC: will be a two-processor chip, each
    processor a 4-way block-multithreaded VLIW

3
5. Future processors to use fine-grain parallelism
5.1 Trends and principles in the Giga Chip Era
  • Forecasting the effects of technology is hard:
  • "Everything that can be invented has been
    invented." (US Commissioner of Patents, 1899)
  • "I think there is a world market for about five
    computers." (Thomas J. Watson Sr., IBM founder,
    1943)

4
Microprocessors tomorrow (Y2K-2012)
  • Moore's Law: the number of transistors per chip
    doubles every two years
  • SIA (Semiconductor Industry Association)
    prognosis, 1998
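As a back-of-the-envelope sketch of the doubling rule, assuming the 10 M-transistor figure from the previous slide as the year-2000 baseline (an assumption, not a slide number):

```python
# Illustrative only: project transistor counts under Moore's Law as
# stated on the slide (doubling every two years), starting from an
# assumed 10 M transistors in the year 2000.
def transistors(year, base_year=2000, base_count=10_000_000):
    doublings = (year - base_year) // 2   # one doubling per two years
    return base_count * 2 ** doublings

for y in (2000, 2006, 2012):
    print(y, transistors(y))
# By 2012 this gives six doublings, i.e. 640 M transistors.
```

The 2012 value lands in the same order of magnitude as the billion-transistor chips the SIA roadmap forecast.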

5
Design Challenges
  • Performance depends on three factors:
  • increasing clock speed,
  • the amount of work that can be performed per
    cycle,
  • and the number of instructions needed to perform
    a task.
  • Today's general trend toward more complex designs
    is opposed by wire delay within the processor
    chip, the main technological problem.
  • Higher clock rates with sub-quarter-micron
    designs? On-chip interconnect wires cause a
    significant portion of the delay time in
    circuits.
  • Global interconnects within a processor chip in
    particular cause problems at higher clock rates.
  • Maintaining the integrity of a signal as it moves
    from one end of the chip to the other becomes
    more difficult.
  • Copper metallization is worth a 20 to 30%
    reduction in wiring delay.

6
Application and Economy-Related Trends
  • Applications
  • generally user interactions like video, audio,
    voice recognition, speech processing, and 3D
    graphics,
  • large data sets and huge databases,
  • large data-mining applications,
  • transaction processing,
  • huge EDA applications like CAD/CAM software,
  • virtual-reality computer games,
  • signal processing and real-time control.
  • Colwell (Intel): the real threat for processor
    designers is shipping 30 million CPUs only to
    discover they are imperfect and cause a recall.
  • Economies of scale
  • Fabrication plants now cost about $2 billion, a
    factor of ten more than a decade ago.
    Manufacturers can only sustain such development
    costs if larger markets with greater economies of
    scale emerge. → Workloads will concentrate on the
    human-computer interface. → Multimedia workloads
    will grow and influence architectures.

7
Architectural Challenges and Implications
  • Preserve object-code compatibility (may be
    avoided by a virtual machine that targets
    run-time ISAs)
  • It is necessary to find ways of expressing and
    exposing more parallelism to the processor. It is
    doubtful whether enough ILP is available.
  • Buses will probably scale; expect much wider
    buses in the future.
  • Memory bottleneck: memory latency may be solved
    by a combination of technological improvements in
    memory-chip technology and by applying advanced
    memory-hierarchy techniques (other authors
    disagree).
  • Power consumption for mobile computers and
    appliances.
  • Soft errors caused by cosmic rays or gamma
    radiation may be countered by fault-tolerant
    design throughout the chip.

8
Possible solutions
  • a focus of processor chips on particular market
    segments:
  • multimedia pushes desktop personal computers,
    while high-end microprocessors will serve
    specialized applications,
  • integrate functionality into systems-on-a-chip,
  • partition a microprocessor into a client chip
    part that focuses on general user interaction,
    enhanced by server chip parts tailored for
    special applications,
  • a CPU core that works like a large ASIC block and
    allows system developers to instantiate various
    devices on a chip with a simple CPU core,
  • and reconfigurable on-chip parts that adapt to
    application requirements.
  • Functional partitioning becomes more important!

9
Future Processor Architecture Principles
  • Speed-up of a single-threaded application –
    today's lecture:
  • Trace cache
  • Superspeculative
  • Advanced superscalar
  • Speed-up of multi-threaded applications – lecture
    13:
  • Chip multiprocessors (CMPs)
  • Simultaneous multithreading
  • Speed-up of a single-threaded application by
    multithreading – lecture 14:
  • Multiscalar processors
  • Trace processors
  • DataScalar
  • Exotics – lecture 15:
  • Processor-in-memory (PIM) or intelligent RAM
    (IRAM)
  • Reconfigurable
  • Asynchronous

10
Processor Techniques that Speed Up Single-threaded
Applications
  • Trace cache: tries to fetch from dynamic
    instruction sequences instead of the static code
    in the I-cache.
  • Advanced superscalar processors scale current
    designs up to an issue width of 16 or 32
    instructions per cycle.
  • Superspeculative processors enhance wide-issue
    superscalar performance by speculating
    aggressively at every point.

11
The Trace Cache
  • The trace cache is a new paradigm for caching
    instructions.
  • A trace cache is a special I-cache that captures
    dynamic instruction sequences, in contrast to the
    I-cache, which contains static instruction
    sequences.
  • Like the I-cache, the trace cache is accessed
    using the starting address of the next block of
    instructions.
  • Unlike the I-cache, it stores logically
    contiguous instructions in physically contiguous
    storage.
  • A trace cache line stores a segment of the
    dynamic instruction trace across multiple,
    potentially taken branches.
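The lookup described above can be sketched as follows. This is a minimal illustration, assuming (as in published trace cache proposals) that a trace is identified by its start address plus the predicted outcomes of the branches it contains; all names are illustrative:

```python
# Minimal sketch of a trace cache keyed by (start address, branch
# outcomes). A hit returns a whole dynamic instruction segment in
# one access; a miss would fall back to the conventional I-cache
# (not modeled here).
class TraceCache:
    def __init__(self, line_size=16):
        self.line_size = line_size   # max instructions per trace line
        self.lines = {}              # (start_addr, outcomes) -> trace

    def insert(self, start_addr, branch_outcomes, instructions):
        # Store a finalized dynamic segment, limited to the line size.
        key = (start_addr, tuple(branch_outcomes))
        self.lines[key] = instructions[: self.line_size]

    def fetch(self, start_addr, predicted_outcomes):
        # Hit only if both the start address and the predicted branch
        # path match the stored trace.
        return self.lines.get((start_addr, tuple(predicted_outcomes)))
```

Note how the same start address with a different predicted branch path misses, which is exactly what distinguishes a trace cache from an ordinary I-cache lookup.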

12
The Trace Cache (2)
  • Each line stores a snapshot, or trace, of the
    dynamic instruction stream.
  • Trace construction is off the critical path.
  • As a group of instructions is processed, it is
    latched into the fill unit.
  • The fill unit maximizes the size of the segment
    and finalizes a segment when it can be expanded
    no further.
  • The number of instructions within a trace is
    limited by the trace cache line size.
  • Finalized segments are written into the trace
    cache.
  • Instructions can be sent from the trace cache
    into the reservation stations (??) without having
    to undergo a large amount of processing and
    rerouting.
  • It is an open research question whether the
    instructions in the trace cache are
  • fetched but not yet decoded,
  • decoded but not yet renamed,
  • or decoded and partly renamed.
  • Trace cache placement in the microarchitecture
    depends on this decision.
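The fill-unit behavior described above can be sketched as a small state machine. This is a hedged illustration, not a faithful model of any specific design; the finalize conditions (line full, or a trace-ending instruction such as an indirect jump) are assumptions:

```python
# Sketch of a fill unit: latch retired instruction groups into a
# pending segment and finalize it when it can grow no further
# (line full, or a trace-ending instruction was latched).
class FillUnit:
    def __init__(self, line_size=16):
        self.line_size = line_size
        self.traces = {}        # start address -> finalized segment
        self.segment = []       # segment under construction
        self.start_addr = None  # address the segment starts at

    def latch(self, group, start_addr, trace_ending=False):
        if self.start_addr is None:
            self.start_addr = start_addr   # first group sets the key
        self.segment.extend(group)
        if len(self.segment) >= self.line_size or trace_ending:
            self.finalize()

    def finalize(self):
        # Write the finished segment into the trace cache and reset.
        if self.segment:
            self.traces[self.start_addr] = self.segment[: self.line_size]
        self.segment, self.start_addr = [], None
```

Because finalization happens at retirement, this construction work stays off the fetch critical path, as the slide notes.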

13
The Trace Cache – Performance
  • Three applications from the SPECint95 benchmarks
    are simulated on a 16-wide issue machine with
    perfect branch prediction
  • (see the Patt paper).

14
5.3 Superspeculative Processors
  • Idea: instructions generate many highly
    predictable result values in real programs →
    speculate on source operand values and begin
    execution without waiting for the result of the
    previous instruction. Speculate across true data
    dependences!
  • Reasons for the existence of value locality:
  • Due to register spill code the reuse distance of
    many shared values is very short in processor
    cycles. Many stores do not even make it out of
    the store queue before their values are needed
    again.
  • Input sets often contain data with little
    variation (e.g., sparse matrices or text files
    with white spaces).
  • A compiler often generates run-time constants due
    to error-checking, switch statement evaluation,
    and virtual function calls.
  • The compiler also often loads program constants
    from memory rather than using immediate
    operands.
  • See M. H. Lipasti, J. P. Shen: Superspeculative
    Microarchitecture for Beyond AD 2000. IEEE
    Computer, Sept. 1997, pp. 59–66.

15
Strong- vs. Weak-dependence Model
  • Strong-dependence model for program execution: a
    total instruction ordering of a sequential
    program.
  • Two instructions are identified as either
    dependent or independent; when in doubt,
    dependences are pessimistically assumed to exist.
  • Dependences are never allowed to be violated and
    are enforced during instruction processing.
  • To date, most machines enforce such dependences
    in a rigorous fashion.
  • This traditional model is overly rigorous and
    unnecessarily restricts available parallelism.
  • Weak-dependence model:
  • dependences can be temporarily violated during
    instruction execution as long as recovery can be
    performed before the permanent machine state is
    affected.
  • Advantage: the machine can speculate aggressively
    and temporarily violate the dependences. The
    machine can exceed the performance limit imposed
    by the strong-dependence model.
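The weak-dependence idea above reduces to a simple pattern: execute with a predicted operand, validate before anything reaches the permanent state, and re-execute on a mispredict. A minimal sketch, with all names hypothetical:

```python
# Illustrative sketch of the weak-dependence model: the dependence
# on the true operand is temporarily violated by using a predicted
# value; validation happens before the result may affect the
# permanent (architectural) state.
def execute_speculatively(op, predicted_operand, true_operand):
    speculative_result = op(predicted_operand)  # dependence violated
    if predicted_operand == true_operand:
        return speculative_result, False        # validated, no recovery
    # Misprediction: recover by re-executing with the true value.
    return op(true_operand), True

result, recovered = execute_speculatively(lambda x: x + 1, 41, 41)
# Correct prediction: the result is available without waiting for
# the producing instruction, which is where the speed-up comes from.
```

The strong-dependence model corresponds to always taking the slow path; the weak-dependence model wins whenever the prediction is right and loses only the recovery cost when it is wrong.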

16
Implementation of a Weak-dependence Model
  • The front-end engine assumes the weak-dependence
    model and is highly speculative, making
    predictions about instructions in order to
    speculate aggressively past them.
  • The back-end engine still uses the
    strong-dependence model to validate the
    speculations, recover from misspeculation, and
    provide history and guidance information to the
    speculative engine.

17
Superflow processor (Lipasti and Shen 1997)
  • The Superflow processor speculates on:
  • instruction flow: a two-phase branch predictor
    combined with a trace cache
  • register data flow: dependence prediction,
    i.e. predicting the register value dependences
    between instructions:
  • source operand value prediction
  • constant value prediction
  • value stride prediction: speculate on constant,
    incremental increases in operand values
  • dependence prediction: predicts
    inter-instruction dependences
  • memory data flow: prediction of load values and
    load addresses, and alias prediction
  • Superflow simulations: 7.3 IPC for the SPEC95
    integer suite, up to 9 instructions per cycle
    when 32 instructions are potentially issued per
    cycle
  • With dependence and value prediction, a
    three-cycle issue nearly matches the performance
    of a single-issue dispatch.
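The value stride prediction listed above can be sketched as a last-value table extended with a stride field. This is an illustrative structure, not the Superflow design itself; the table layout is an assumption:

```python
# Sketch of a stride value predictor: per static instruction (pc),
# remember the last produced value and the stride between the last
# two occurrences, and predict next = last + stride.
class StridePredictor:
    def __init__(self):
        self.last = {}     # pc -> last observed value
        self.stride = {}   # pc -> last observed stride

    def predict(self, pc):
        # No prediction until the instruction has been seen once.
        if pc not in self.last:
            return None
        return self.last[pc] + self.stride.get(pc, 0)

    def update(self, pc, value):
        # Called with the actual result to train the predictor.
        if pc in self.last:
            self.stride[pc] = value - self.last[pc]
        self.last[pc] = value
```

A loop induction variable (100, 104, 108, ...) is the classic case this captures: after two observations the predictor supplies each next value before the add executes.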

18
Superflow Processor Proposal