MIPS 64-bit processors Project MSCS 521 Computer - PowerPoint PPT Presentation

MIPS 64-bit processors Project, MSCS 521 Computer Architecture. MANAN SHAH (Block Diagram & its detailed explanation, Instruction set), CHINTAN SHIHORA (Overview ...


2
OVERVIEW
3
Overview
  • The MIPS Instruction Set Architecture (ISA) has
    evolved over time from the original MIPS I ISA,
    through the MIPS V ISA, to the current MIPS32 and
    MIPS64 Architectures. All extensions have been
    backward compatible with previous versions of the
    ISA.
  • The MIPS III level of the ISA added 64-bit
    integers and addresses to the instruction set,
    while the MIPS IV and MIPS V levels added
    improved floating-point operations, as well as a
    set of instructions intended to improve the
    efficiency of generated code and of data
    movement.

4
Overview (cont.)
  • The 64-bit MIPS Architecture is based on the
    MIPS V ISA and is backward compatible with the
    MIPS32 Architecture. Both the MIPS32 and MIPS64
    Architectures bring the privileged environment
    into the Architecture definition to address the
    needs of operating systems and other kernel
    software. The MIPS64 Architecture is intended
    to address the need for a high-performance but
    cost-sensitive MIPS instruction set.
  • It includes facilities for adding MIPS
    Application Specific Extensions, User Defined
    Instructions, and custom coprocessors to address
    specific application needs.

5
INTRODUCTION TO 64 BIT PROCESSOR
6
What does 64-bit refer to?
  • It refers to the number of bits that can be
    processed or transmitted in parallel. For a
    microprocessor, it indicates the width of the
    registers, a special high-speed storage area
    within the CPU.
  • 64-bit therefore refers to a processor with
    registers that store 64-bit numbers. A 64-bit
    architecture can handle twice as many bits per
    register operation as a 32-bit architecture.

7
Need of 64-bit processor
  • It is needed for applications that address
    large amounts of data and memory, such as
    high-performance servers, database management
    systems, CAD tools, and digital content creation
    tools.
  • One reason for needing 64-bit processors is
    their enlarged address space. 32-bit chips are
    limited to a maximum of 2 GB or 4 GB of RAM
    access, and a 4-GB limit can be a severe problem
    for server machines and machines running large
    databases. A 64-bit chip has none of these
    constraints, because a 64-bit address space is
    effectively unlimited for practical purposes:
    2^64 bytes of RAM.
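
The address-space limits quoted above follow directly from the address width. The short sketch below works out the numbers; the function name is made up for illustration, and the 31-bit case is only an assumption about where a "2 GB" limit could come from (one address bit reserved, for example for the kernel).

```python
# Sketch: size of a flat, byte-addressed space for a given address width.
GIB = 2 ** 30  # bytes per gibibyte


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with address_bits bits."""
    return 2 ** address_bits


# 31 usable bits is an assumption used to illustrate a "2 GB" limit.
sizes_in_gib = {bits: address_space_bytes(bits) // GIB for bits in (31, 32, 64)}
# sizes_in_gib == {31: 2, 32: 4, 64: 17179869184}
```

A full 64-bit space is 2^34 GiB (16 EiB), which is why it is treated as unlimited in practice.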

8
FEATURES OF MIPS 64-bit processors
9
Features
  • 64-bit virtual addresses.
  • 64-bit program counter (instruction pointer).
  • New RIP-relative data addressing mode.
  • Flat address space with single code, data, and
    stack space.
  • Dual-Issue 64-bit superscalar architecture
  • High-performance 64-bit integer unit.
  • High-throughput, fully pipelined 64-bit
    floating-point unit.
  • High performance SysAD interface.

10
Features (cont.)
  • 32-bit or 64-bit multiplexed system
    address/data bus for optimum price/performance.
  • Available with 32-bit or 64-bit external bus
    interface.
  • Supports fractional clock ratios.
  • JTAG boundary scan.
  • Integrated primary caches:
  • 32 KB instruction and 32 KB data caches, each
    2-way set associative.
  • Virtually indexed, physically tagged.

11
Features (cont.)
  • Write-back and write-through on per-page basis.
  • Indexed addressing mode (register + register).
  • Pipeline restart on first double word for data
    cache misses.
  • 64-bit MIPS instruction set architecture
  • Floating-point multiply-add instruction
    increases performance in signal processing and
    graphics applications.
  • Conditional moves to reduce branch frequency.

12
INSTRUCTION SET FOR MIPS 64-bit processors
13
MIPS 64-bit processors Instructions
14
BLOCK DIAGRAM FOR MIPS 64-bit processors
15
Block Diagram
It supports four floating-point
multiply-add/subtract instructions, which allow
two separate floating-point computations to be
performed with one instruction. The four
instructions are:
1) Multiply-add (MADD)
2) Multiply-subtract (MSUB)
3) Negative Multiply-add (NMADD)
4) Negative Multiply-subtract (NMSUB)
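
As a rough illustration of what the four instructions compute, the sketch below models each one as a plain Python function. The operand names fr, fs and ft are an assumption borrowed from the usual MIPS operand naming (the product is fs * ft, and fr is added or subtracted); rounding behaviour of the real hardware is not modelled.

```python
# Sketch of the arithmetic performed by the four multiply-add/subtract
# instructions. Operand naming (fr, fs, ft) is an assumption; Python floats
# stand in for the floating-point registers.

def madd(fr: float, fs: float, ft: float) -> float:
    """Multiply-add: (fs * ft) + fr."""
    return fs * ft + fr


def msub(fr: float, fs: float, ft: float) -> float:
    """Multiply-subtract: (fs * ft) - fr."""
    return fs * ft - fr


def nmadd(fr: float, fs: float, ft: float) -> float:
    """Negative multiply-add: -((fs * ft) + fr)."""
    return -(fs * ft + fr)


def nmsub(fr: float, fs: float, ft: float) -> float:
    """Negative multiply-subtract: -((fs * ft) - fr)."""
    return -(fs * ft - fr)
```

Each call stands for one instruction that replaces a separate multiply followed by an add or subtract.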
16
Detailed Explanation (For Block Diagram)
Index
1) Large On-chip Caches
2) Dual Entry TLB
3) Write Buffer
4) Pipelining
5) Dual-Issue Mechanism
6) Dedicated Integer and FP ALUs
7) Separate FP Execution Units
8) Scalable for Multiple Processors
9) Secondary Cache Support
10) Multiple Cache Sizes
11) Simultaneous Access
12) Flexible Clocking Mechanism
13) On-chip Clock Multiplication Circuitry
17
Large On-chip Caches (Detailed explanation -
Block diagram)
  • The MIPS 64-bit processor contains separate
    32 KB data and instruction caches.
  • Each cache is 2-way set associative, which helps
    to increase the hit rate over a direct-mapped
    implementation.
  • Cache lines may be classified as write-through
    or write-back on a per-page basis.
  • Both caches are virtually indexed and physically
    tagged.
  • a) A virtually indexed cache allows the cache
    access to begin as soon as the virtual address is
    generated, as opposed to waiting for the
    virtual-to-physical translation. The cache is
    accessed at the same time as the address
    translation is performed. The physical address is
    then compared against the corresponding
    instruction or data cache tag. If the tags match,
    the data retrieved from the cache is used. If
    they do not match, meaning that the requested
    address does not reside in the cache, the data is
    not used and a cache miss is generated.
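
The lookup described above can be sketched as a toy model. The 32 KB size and 2-way associativity come from the slides; the 32-byte line size, the 4 KB page size, the FIFO replacement policy and the dictionary page table are all assumptions made only so the example runs.

```python
# Toy model of a virtually indexed, physically tagged (VIPT) cache lookup.
CACHE_BYTES = 32 * 1024   # from the slides
WAYS = 2                  # from the slides
LINE_BYTES = 32           # assumption
PAGE_BYTES = 4096         # assumption
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def translate(vaddr: int, page_table: dict) -> int:
    """Virtual-to-physical translation through a hypothetical page table."""
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    return page_table[vpn] * PAGE_BYTES + offset


def set_index(vaddr: int) -> int:
    """The set is chosen from the virtual address, so no translation is needed."""
    return (vaddr // LINE_BYTES) % NUM_SETS


class VIPTCache:
    def __init__(self) -> None:
        # Each set holds up to WAYS physical tags.
        self.sets = [[] for _ in range(NUM_SETS)]

    def access(self, vaddr: int, page_table: dict) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        tags = self.sets[set_index(vaddr)]                   # indexing starts at once
        tag = translate(vaddr, page_table) // PAGE_BYTES     # done in parallel in hardware
        if tag in tags:
            return True
        if len(tags) == WAYS:
            tags.pop(0)   # FIFO eviction (assumption)
        tags.append(tag)
        return False
```

In hardware the set read and the translation overlap in time; the sequential Python code only shows which address feeds which step.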

18
Large On-chip Caches (cont.) (Detailed
explanation - Block diagram)
  • b) A physically tagged data cache allows for
    coherency between the primary and secondary
    caches in a system.
  • Having large primary caches allows more of the
    application to be executed on-chip, reducing
    accesses to slower secondary cache and main
    memory. This in turn reduces bus utilization and
    allows the application to run faster since fewer
    off-chip accesses are required.

19
Dual Entry TLB (Detailed explanation- Block
diagram)
  • The TLB of the MIPS 64-bit processor contains
    48 dual entries. This implementation is
    equivalent to a 96-entry TLB.
  • Each virtual page number (VPN) entry maps to
    two physical frame numbers (PFNs), one even and
    one odd.
  • The lowest bit of the virtual page number is
    used to determine whether the even or odd PFN
    will be used.
  • The TLB is fully associative.
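
A minimal sketch of the even/odd selection is shown below. The 48-entry capacity comes from the slides; the 4 KB page size, the class and method names, and the use of a dictionary in place of a fully associative compare are assumptions for illustration.

```python
# Sketch of a dual-entry TLB: each entry covers a pair of adjacent virtual
# pages and stores one PFN for the even page and one for the odd page.
PAGE_BYTES = 4096  # assumption


class DualEntryTLB:
    CAPACITY = 48  # dual entries, as stated in the slides

    def __init__(self) -> None:
        # Key: VPN with its lowest bit dropped. Value: (even_pfn, odd_pfn).
        # A dict lookup stands in for the fully associative tag compare.
        self.entries = {}

    def insert(self, vpn_pair: int, even_pfn: int, odd_pfn: int) -> None:
        if vpn_pair not in self.entries and len(self.entries) >= self.CAPACITY:
            raise RuntimeError("TLB full; real hardware would replace an entry")
        self.entries[vpn_pair] = (even_pfn, odd_pfn)

    def translate(self, vaddr: int):
        """Return the physical address, or None on a TLB miss."""
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        pair = self.entries.get(vpn >> 1)
        if pair is None:
            return None
        pfn = pair[vpn & 1]   # lowest VPN bit selects the even or odd PFN
        return pfn * PAGE_BYTES + offset
```

With 48 such entries, up to 96 pages can be mapped at once, which matches the "equivalent to a 96-entry TLB" statement.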

20
Write Buffer (Detailed explanation- Block
diagram)
  • The write buffer holds writes destined for
    external memory.
  • The write buffer holds up to four 64-bit
    address and data pairs, or one cache line to be
    written out.
  • Since data cache writebacks are typically
    performed on a line basis, an entire line can be
    written to the buffer, allowing the CPU to resume
    normal execution.
  • Without a write buffer, the CPU would have to
    write a single 64-bit doubleword, then wait until
    the memory operation completes before writing
    another.

21
Write Buffer (cont.) (Detailed explanation-
Block diagram)
  • The write buffer allows the CPU to write data
    into the buffer without accessing the system bus.
  • For uncached write cycles, the write buffer can
    significantly increase performance by allowing
    the pipelining of multiple writes.
  • With cacheable write cycles, the buffer allows
    the CPU to write data to the buffer and
    immediately begin processing the next write data.
  • Without the buffer, the CPU would output the
    write data, then be forced to wait until the
    uncached write operation has completed before
    processing the next write.
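
The behaviour described on the two write buffer slides can be modelled as a small FIFO. This is only a sketch: the four-entry depth comes from the slides, while the class name, method names and the dictionary standing in for external memory are hypothetical.

```python
# Sketch of a four-entry write buffer sitting between the CPU and the bus.
from collections import deque


class WriteBuffer:
    DEPTH = 4  # four 64-bit address/data pairs, as stated in the slides

    def __init__(self) -> None:
        self.pending = deque()

    def cpu_write(self, addr: int, data: int) -> bool:
        """Queue a write. True means the CPU continues; False means it must stall."""
        if len(self.pending) >= self.DEPTH:
            return False
        self.pending.append((addr, data))
        return True

    def bus_drain_one(self, memory: dict) -> None:
        """Complete the oldest pending write on the (hypothetical) system bus."""
        if self.pending:
            addr, data = self.pending.popleft()
            memory[addr] = data
```

The CPU only stalls when all four slots are occupied; otherwise a write costs it nothing beyond placing the pair in the buffer.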

22
Pipelined Writes (Detailed explanation- Block
diagram)
  • Write cycles can be performed back-to-back
    without any dead clocks between cycles.
  • In the original R4000 architecture there is a
    two clock delay between the generation of
    back-to-back addresses. This results in two dead
    clocks between back-to-back cycles.
  • The pipelined write protocol also uses the write
    buffer to allow pipelining of write cycles.
  • In the MIPS 64 bit processor, performance is
    significantly increased by eliminating the two
    null cycles between each write cycle.

23
Pipelining (Detailed explanation- Block diagram)
  • A pipeline is divided into four stages:
  • Fetch
  • Arithmetic operation
  • Memory access
  • Write back

[Figure: a non-pipelined execution]
[Figure: pipelined execution]
24
Pipelining (cont.)
  • In the non-pipelined example shown in the
    figure, each stage takes one processor clock
    cycle to complete.
  • Thus it takes four clock cycles (ignoring
    delays or stalls) for the instruction to
    complete. In this example, the execution rate of
    the pipeline is one instruction every four clock
    cycles.
  • Moreover, because the next instruction cannot
    be fetched until the current one completes, only
    one stage is active at any time.

25
Parallel Pipelining
  • Instead of waiting for an instruction to be
    completed before the next instruction can be
    fetched, a new instruction is fetched each clock
    cycle.
  • There are four stages in the pipeline, so
    four instructions can be executed simultaneously,
    one at each stage of the pipeline.
  • Instructions in the figure are executed at a
    rate four times that of the pipeline shown in the
    previous figure.
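
The four-times figure can be checked with a quick cycle count. The sketch below assumes an ideal four-stage pipeline with one clock per stage and no stalls, as in the slides; the function names are made up for illustration.

```python
# Cycle counts for n instructions on an ideal k-stage pipeline (no stalls).

def cycles_unpipelined(n: int, stages: int = 4) -> int:
    """Each instruction occupies all stages before the next one is fetched."""
    return n * stages


def cycles_pipelined(n: int, stages: int = 4) -> int:
    """The first instruction takes `stages` cycles; each later one adds one cycle."""
    return stages + (n - 1) if n > 0 else 0


speedup_100 = cycles_unpipelined(100) / cycles_pipelined(100)  # 400 / 103
```

For 100 instructions the counts are 400 and 103 cycles, so the speedup approaches, but never quite reaches, four as the instruction count grows.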

26
SuperPipeline
  • The figure below shows a superpipelined
    architecture.
  • Each stage is designed to take only a fraction
    of an external clock cycle; in this case, half a
    clock.
  • Therefore more than one instruction can be
    completed each cycle.

27
SuperScalar Pipeline
  • A superscalar architecture also allows more
    than one instruction to be completed each clock
    cycle, by issuing instructions to multiple
    execution units in parallel.

28
How Pipelining Works
  • The processor fetches and decodes four
    instructions per cycle and then appends them to
    one of the three instruction queues.
  • Each queue determines the execution order
    based on the availability of the required
    functional units (FUs).
  • Though instructions are initially fetched and
    decoded in order, they may execute out of order,
    which allows the processor to have up to 32
    instructions in various stages of execution.

29
How Pipelining Works (cont.)
  • Initially, instructions proceed through the
    instruction fetch pipeline, which consists of
    fetch, decode, and issue stages.
  • In the fetch stage, four instructions are
    fetched and aligned.
  • In the decode stage, the instructions are
    decoded, register renaming is performed, and
    branch instructions are predicted.
  • In the issue stage (first half), the
    instructions are written to one of three 16-entry
    instruction queues, and the availability of the
    operands is also determined.
  • (second half is on the next slide)

30
How Pipelining Works (cont.)
  • Depending on its type, the instruction
    proceeds to one of the five instruction
    pipelines.
  • There are two integer and two floating-point
    pipelines, and one load/store execution pipeline.
  • Each of these pipelines begins when a queue
    issues an instruction and continues as follows:
  • In the issue stage (second half), the
    processor reads operands from the register files.
  • Execution then begins and takes
  • a) one stage in the case of the integer pipelines
  • b) two stages in the case of the load/store
    pipeline
  • c) three stages in the case of the floating-point
    pipelines
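
The numbers from these slides can be gathered into a small model. The stage counts, pipeline counts, queue depth and fetch width come from the slides; the assumption that each of the three queues serves one instruction class, and the simplification that instructions which do not fit are simply returned, are mine and not from the source.

```python
# Sketch: dispatching a fetched group of instructions to per-class queues.
FETCH_WIDTH = 4    # instructions fetched and decoded per cycle
QUEUE_DEPTH = 16   # entries per instruction queue

# Execution stages per pipeline type, and number of pipelines of each type.
EXEC_STAGES = {"integer": 1, "load_store": 2, "floating_point": 3}
PIPELINES = {"integer": 2, "floating_point": 2, "load_store": 1}


def dispatch(group, queues):
    """Append (name, kind) pairs to the queue for their kind.

    Returns the pairs that did not fit because their queue was full.
    One queue per instruction class is an assumption of this sketch.
    """
    if len(group) > FETCH_WIDTH:
        raise ValueError("at most four instructions are fetched per cycle")
    stalled = []
    for name, kind in group:
        queue = queues[kind]
        if len(queue) < QUEUE_DEPTH:
            queue.append(name)
        else:
            stalled.append((name, kind))
    return stalled
```

A caller would create `queues = {kind: [] for kind in EXEC_STAGES}` and pass one fetched group per simulated cycle.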

31
Floating point Co-processor
  • Performance is gained on floating-point code
    by allowing the integer unit to execute the
    necessary loads and stores of floating-point
    values, as well as index register updates and
    branching.
  • The issue logic allows dual issue of an integer
    instruction and a floating-point instruction.

32
Dual Issue Mechanism (Detailed explanation-
Block diagram)
  • The dual-issue mechanism implemented in the
    64-bit MIPS processor allows a floating-point ALU
    instruction to be issued simultaneously with any
    other instruction type.
  • Whenever a floating-point ALU instruction is
    fetched with any non-FP-ALU instruction, both
    instructions can be issued in the same cycle.
  • Load and store instructions in one pipeline
    usually provide enough data bandwidth to permit a
    new instruction to be issued every cycle for a
    fixed period.
  • Well-structured code can take full advantage of
    this pipeline structure.
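
The pairing rule can be sketched as follows. This is a simplified model, assuming greedy in-order pairing and ignoring data dependencies and cache misses; the class label "fp_alu" and the function names are hypothetical.

```python
# Sketch of the dual-issue rule: one FP-ALU instruction may issue together
# with one instruction of any other type.

def can_dual_issue(first: str, second: str) -> bool:
    """True when exactly one of the two instructions is an FP-ALU instruction."""
    return (first == "fp_alu") != (second == "fp_alu")


def issue_cycles(stream) -> int:
    """Count issue cycles for an in-order stream, pairing greedily."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_dual_issue(stream[i], stream[i + 1]):
            i += 2   # both instructions issue in this cycle
        else:
            i += 1   # single issue
        cycles += 1
    return cycles
```

An alternating stream such as FP-ALU, load, FP-ALU, store issues in two cycles, while two integer instructions in a row still need two cycles, which is why code scheduling matters for this design.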

33
Dedicated Integer and FP ALUs (Detailed
explanation - Block diagram)
  • Separate integer and FP ALUs allow
    instructions of both types to be performed
    simultaneously.
  • Integer instructions are not stalled while
    long-latency floating-point operations are being
    executed.
  • This is useful when running CAD-type
    applications, which use both fixed-point and
    floating-point math calculations.

34
Scalable for Multiple Processors (Detailed
explanation - Block diagram)
  • The 64-bit MIPS processor incorporates 8
    external signals.
  • These signals allow for arbitration and data
    coherency between processors.
  • Therefore, symmetric multiprocessing systems
    implementing the full Modified Exclusive Shared
    Invalid (MESI) cache consistency protocol in both
    primary and secondary caches, as well as other
    styles of multiprocessing, are supported.

35
Separate FP Execution Units (Detailed
explanation- Block diagram)
  • In addition to the dual-issue mechanism, the 64
    bit MIPS processor also contains separate
    acceleration hardware for most floating-point ALU
    instructions.
  • This allows long-latency operations such as
    divide and square-root to be performed in a
    dedicated unit, thereby allowing other
    shorter-latency operations such as MADD and
    subtract to be overlapped while the divide or
    square-root operation is in progress.

36
Secondary Cache Support (Detailed explanation-
Block diagram)
  • The 64-bit MIPS processor contains a dedicated
    secondary cache interface.
  • Its signals provide an efficient interface
    between the processor, the secondary cache, and
    the secondary cache tag RAM.
  • All RAM interface signals, such as data and chip
    enables, output enable, address match, cache
    valid, line index, and word index, are provided
    by the processor.
  • The secondary cache also supports multiple
    cache sizes and both the write-through and
    write-back data transfer protocols.
  • Data transfers to the secondary cache share the
    64-bit system bus.

37
Multiple Cache Sizes (Detailed explanation-
Block diagram)
  • The secondary cache can be configured as 512
    KB, 1 MB, or 2 MB, allowing large
    applications to run within the secondary cache,
    reducing the number of accesses to slower main
    memory.
  • The secondary cache is accessed through the
    system bus.
  • Uncached bus cycles are not evaluated by the
    secondary cache control logic as they travel to
    the external agent.
  • Uncached operations such as video screen updates
    can be passed directly to the system
    logic responsible for routing the data to the
    screen without any delays from the secondary
    cache logic.

38
Simultaneous Access (Detailed explanation- Block
diagram)
  • To maximize data throughput, main memory
    accesses can be initiated while the secondary
    cache tag is being compared.
  • If the requested address is found in the
    secondary cache, the memory access is aborted.
    If the address is not found in the secondary
    cache, the main memory access is already under
    way, so the data can be retrieved more quickly.
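
The benefit of starting the memory access early can be shown with a simple latency comparison. The cycle counts used here are hypothetical placeholders, not figures from the slides, and the model ignores bus contention.

```python
# Sketch: secondary-cache miss latency with and without overlapping the
# tag compare and the main memory access. All cycle counts are hypothetical.

def miss_latency_sequential(tag_cycles: int, mem_cycles: int) -> int:
    """Memory access starts only after the tag compare reports a miss."""
    return tag_cycles + mem_cycles


def miss_latency_overlapped(tag_cycles: int, mem_cycles: int) -> int:
    """Memory access starts together with the tag compare."""
    return max(tag_cycles, mem_cycles)


saved = miss_latency_sequential(3, 20) - miss_latency_overlapped(3, 20)
```

With these placeholder values, a miss costs 20 cycles instead of 23, so the tag compare time is hidden whenever the access goes to main memory.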

39
Flexible Clocking Mechanism (Detailed
explanation- Block diagram)
  • The clocking mechanism in the 64-bit MIPS
    processor offers a number of pipeline frequencies
    based on the frequency of the input clock.
  • Single external clock signal: a single clock
    signal is used for the system interface, as
    opposed to three. The processor eliminates the
    Rclock, Tclock, and MasterOut clock signals that
    existed in the previous processors.
  • Having only one clock simplifies system
    design and reduces the circuit complexity of the
    internal clock mechanism.

40
On Chip Clock Multiplication Circuitry (Detailed
explanation- Block diagram)
  • The 64 bit processor includes on-chip clock
    frequency multiplication circuitry to support
    200-MHz internal operation from an external
    50-MHz clock.
  • The processor has the option of operating
    internally at 2, 3, or 4 times the frequency of
    the external clock.
  • Maximum bus speed of the system interface is
    100 MHz.
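
The available internal frequencies follow from the multiplier options above. The sketch below uses the 50-MHz external clock from the slides as the example input; the function name is made up for illustration.

```python
# Sketch: internal clock frequency for the supported multipliers.
ALLOWED_MULTIPLIERS = (2, 3, 4)   # from the slides
MAX_BUS_MHZ = 100                 # maximum system interface speed, from the slides


def internal_mhz(external_mhz: int, multiplier: int) -> int:
    """Internal pipeline frequency for a given external clock and multiplier."""
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError("multiplier must be 2, 3, or 4")
    if external_mhz > MAX_BUS_MHZ:
        raise ValueError("external clock exceeds the maximum bus speed")
    return external_mhz * multiplier


options = [internal_mhz(50, m) for m in ALLOWED_MULTIPLIERS]  # [100, 150, 200]
```

A 50-MHz external clock therefore gives 100, 150, or 200 MHz internally, with the 4x setting matching the 200-MHz operation mentioned above.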

41
PROS & CONS
42
Advantages
  • It can handle more memory and larger files. 
  • 64-bit architecture will allow systems to address
    up to 1 terabyte (1000GB) of memory
  • 64-bit machines also offer faster I/O speeds to
    things like hard disk drives and video cards.
    These features can greatly increase system
    performance.

43
Disadvantages
  • The same data can occupy more space in memory,
    because pointers and some other types are wider.
    This increases the memory requirements of a given
    process and can create problems for efficient
    processor cache utilization.
  • 64-bit systems sometimes lack equivalents to
    software that is written for 32-bit
    architectures. The most severe problem is
    incompatible device drivers. Although most
    software can run in a 32-bit compatibility mode,
    it is usually impossible to run a driver in that
    mode.
