MIPS 64-bit processors Project MSCS 521 Computer

MSCS 521 Computer Architecture. MANAN SHAH (Block Diagram & its detailed explanation, Instruction set), CHINTAN SHIHORA (Overview ...


OVERVIEW
Overview

The MIPS Instruction Set Architecture (ISA) has evolved over time from the original MIPS 1 ISA, through the MIPS 5 ISA, to the current MIPS32 and MIPS64 Architectures. All extensions have been backward compatible with previous versions of the ISA.
In the MIPS 3 level of the ISA, 64-bit integers and addresses were added to the instruction set. The MIPS 4 and MIPS 5 levels added improved floating-point operations, as well as a set of instructions intended to improve the efficiency of generated code and of data movement.

Overview (cont.)

The 64-bit MIPS Architecture (MIPS64) is based on the MIPS 5 ISA and is backward compatible with the MIPS32 Architecture. Both the MIPS32 and MIPS64 Architectures bring the privileged environment into the Architecture definition to address the needs of operating systems and other kernel software. The MIPS64 Architecture is intended to address the need for a high-performance but cost-sensitive MIPS instruction set.
It includes facilities for adding MIPS Application Specific Extensions (ASEs), User Defined Instructions (UDIs), and custom coprocessors to address specific application needs.

INTRODUCTION TO 64-BIT PROCESSORS
What does 64-bit refer to?

It refers to the number of bits that can be processed or transmitted in parallel. For a microprocessor, it indicates the width of the registers, the special high-speed storage areas within the CPU.
A 64-bit processor therefore has registers that hold 64-bit numbers. Compared with a 32-bit design, a 64-bit architecture doubles the amount of data a single integer register or instruction can handle, so 64-bit arithmetic that needs several instructions on a 32-bit CPU can be done in one.
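As a rough illustration of what register width buys, the Python sketch below models (it is not MIPS code) how a 32-bit CPU must add two 64-bit unsigned integers in two 32-bit halves with an explicit carry, while a CPU with 64-bit registers does it in one full-width add. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_on_32bit(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide pieces,
    as a 32-bit CPU has to: low halves first, then high halves plus carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo = a_lo + b_lo
    carry = lo >> 32                      # 1 if the low-half add overflowed 32 bits
    hi = (a_hi + b_hi + carry) & MASK32   # any carry out of bit 63 is discarded
    return (hi << 32) | (lo & MASK32)


def add64_native(a: int, b: int) -> int:
    """With 64-bit registers the same sum is a single full-width add."""
    return (a + b) & MASK64


# The carry from the low half ripples into the high half.
print(hex(add64_on_32bit(0x00000001FFFFFFFF, 1)))  # prints 0x200000000
```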

Need for 64-bit processors

They are needed by applications that address large amounts of data and memory, such as high-performance servers, database management systems, CAD tools, and digital content creation tools.
One reason 64-bit processors are needed is their enlarged address space. Thirty-two-bit chips are limited to a maximum of 2 GB or 4 GB of addressable RAM, and a 4 GB limit can be a severe problem for server machines and machines running large databases. A 64-bit chip does not have this constraint: a 64-bit address space covers 2^64 bytes (16 exbibytes) of memory, far more than any practical system contains.
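The address-space figures can be checked with a few lines of arithmetic. This is a minimal Python sketch; the helper names are hypothetical, and real 64-bit chips usually implement fewer physical address bits than the full 64.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a byte count with binary (IEC) prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024   # exact here because the inputs are powers of two
        i += 1
    return f"{n} {units[i]}"


for bits in (32, 64):
    print(f"{bits}-bit addresses: {human(addressable_bytes(bits))}")
```

This prints 4 GiB for 32-bit addresses and 16 EiB for 64-bit addresses; a 31-bit limit gives the 2 GB figure mentioned above.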

FEATURES OF MIPS 64-bit processors
Features

64-bit virtual addresses.
A 64-bit program counter (instruction pointer).
PC-relative addressing for branch targets.
Flat address space with a single code, data, and stack space.
Dual-issue 64-bit superscalar architecture.
High-performance 64-bit integer unit.
High-throughput, fully pipelined 64-bit floating-point unit.
High-performance SysAD (system address/data) interface.

Features (cont.)

32-bit or 64-bit multiplexed system address/data bus for optimum price/performance.
Available with a 32-bit or 64-bit external bus interface.
Supports fractional clock ratios.
JTAG boundary scan.
Integrated primary caches:
32 KB instruction and 32 KB data caches, each 2-way set associative.
Virtually indexed, physically tagged.

Features (cont.)

Write-back or write-through selectable on a per-page basis.
Indexed addressing mode (register + register).
Pipeline restart on the first doubleword for data cache misses.
64-bit MIPS instruction set architecture.
Floating-point multiply-add instruction, which increases performance in signal processing and graphics applications.
Conditional moves to reduce branch frequency.
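Conditional moves let a compiler replace a short if/else with straight-line code. The sketch below models, in Python, the register semantics of the MIPS conditional moves MOVN (move if not zero) and MOVZ (move if zero) together with SLT, and uses them to compute a maximum without a branch. The helper names are illustrative, and the Python conditionals only stand in for what the hardware does in a single instruction.

```python
def slt(rs: int, rt: int) -> int:
    """SLT: 1 if rs < rt (signed compare), else 0."""
    return 1 if rs < rt else 0


def movn(rd: int, rs: int, rt: int) -> int:
    """MOVN: rd <- rs if rt != 0; otherwise rd keeps its old value."""
    return rs if rt != 0 else rd


def movz(rd: int, rs: int, rt: int) -> int:
    """MOVZ: rd <- rs if rt == 0; otherwise rd keeps its old value."""
    return rs if rt == 0 else rd


def branchless_max(a: int, b: int) -> int:
    # Equivalent MIPS sequence, with no branch instruction:
    #   move  d, a       # d = a
    #   slt   t, a, b    # t = (a < b)
    #   movn  d, b, t    # if t != 0 then d = b
    d = a
    t = slt(a, b)
    return movn(d, b, t)


print(branchless_max(3, 7))
```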

INSTRUCTION SET FOR MIPS 64-bit processors
MIPS 64-bit processors Instructions
BLOCK DIAGRAM FOR MIPS 64-bit processors
Block Diagram
The processor supports four floating-point multiply-add/subtract instructions, which allow two separate floating-point computations (a multiply followed by an add or subtract) to be performed with one instruction. The four instructions are:
1. Multiply-add (MADD)
2. Multiply-subtract (MSUB)
3. Negative multiply-add (NMADD)
4. Negative multiply-subtract (NMSUB)
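The operand roles of the four instructions can be summarised as follows, using the MIPS IV operand naming fd, fr, fs, ft (for example, MADD.D fd, fr, fs, ft computes fd = fs * ft + fr). This Python sketch models only the arithmetic: ordinary Python floats round the product before the add, and whether a particular implementation rounds the intermediate product is not modeled. The dot helper is a hypothetical example of a signal-processing loop that benefits from MADD.

```python
def madd(fr: float, fs: float, ft: float) -> float:
    """MADD: fd = fs * ft + fr"""
    return fs * ft + fr


def msub(fr: float, fs: float, ft: float) -> float:
    """MSUB: fd = fs * ft - fr"""
    return fs * ft - fr


def nmadd(fr: float, fs: float, ft: float) -> float:
    """NMADD: fd = -(fs * ft + fr)"""
    return -(fs * ft + fr)


def nmsub(fr: float, fs: float, ft: float) -> float:
    """NMSUB: fd = -(fs * ft - fr)"""
    return -(fs * ft - fr)


def dot(xs: list[float], ys: list[float]) -> float:
    """Dot product with one multiply-add per element pair."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = madd(acc, x, y)
    return acc


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```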
Detailed Explanation (For Block Diagram)
Index:
1) Large On-chip Caches
2) Dual Entry TLB
3) Write Buffer
4) Pipelining
5) Dual-Issue Mechanism
6) Dedicated Integer and FP ALUs
7) Separate FP Execution Units
8) Scalable for Multiple Processors
9) Secondary Cache Support
10) Multiple Cache Sizes
11) Simultaneous Access
12) Flexible Clocking Mechanism
13) On-chip Clock Multiplication Circuitry
Large On-chip Caches (Detailed explanation - Block diagram)

The MIPS 64-bit processor contains separate 32 KB data and instruction caches.
Each cache is 2-way set associative, which helps to increase the hit rate over a direct-mapped implementation.
Cache lines may be classified as write-through or write-back on a per-page basis.
Both caches are virtually indexed and physically tagged.
a) A virtually indexed cache allows the cache access to begin as soon as the virtual address is generated, instead of waiting for the virtual-to-physical translation. The cache is accessed at the same time as the address translation is performed. The physical address is then compared against the corresponding instruction or data cache tag. If the tags match, the data that has been retrieved from the cache is used. If they do not match, meaning that the requested address does not reside in the cache, the data is not used and a cache miss is generated.
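To make the lookup order concrete, here is a toy Python model of a 32 KB, 2-way set-associative, virtually indexed, physically tagged cache. The 32-byte line size, the LRU replacement policy, and storing the full physical line number as the tag are assumptions made for the sketch; translate stands in for the TLB.

```python
from typing import Callable

LINE_BYTES = 32                              # assumed line size
WAYS = 2
CACHE_BYTES = 32 * 1024
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)    # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 5
INDEX_BITS = SETS.bit_length() - 1           # 9


class VIPTCache:
    """Toy 2-way set-associative, virtually indexed, physically tagged cache."""

    def __init__(self, translate: Callable[[int], int]) -> None:
        self.translate = translate                       # VA -> PA, stands in for the TLB
        self.tags = [[None] * WAYS for _ in range(SETS)]
        self.lru = [0] * SETS                            # way to replace next in each set

    def access(self, va: int) -> bool:
        # The set index comes from the virtual address, so the set can be
        # read while the translation is still in progress.
        index = (va >> OFFSET_BITS) & (SETS - 1)
        pa = self.translate(va)
        tag = pa >> OFFSET_BITS                          # compare against the physical tag
        ways = self.tags[index]
        for w in range(WAYS):
            if ways[w] == tag:
                self.lru[index] = 1 - w                  # the other way becomes least recent
                return True                              # hit: use the data already read
        victim = self.lru[index]                         # miss: refill the LRU way
        ways[victim] = tag
        self.lru[index] = 1 - victim
        return False
```

Because the tag is physical, two virtual addresses that map to the same physical line (and land in the same set) hit on the same entry.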

Large On-chip Caches (cont.) (Detailed explanation - Block diagram)

b) A physically tagged data cache allows coherency to be maintained between the primary and secondary caches in a system.
Having large primary caches allows more of the application to be executed on-chip, reducing accesses to the slower secondary cache and main memory. This in turn reduces bus utilization and allows the application to run faster, since fewer off-chip accesses are required.

Dual Entry TLB (Detailed explanation - Block diagram)

The TLB of the MIPS 64-bit processor contains 48 dual entries, which is equivalent to a 96-entry TLB.
Each virtual page number entry maps to two physical frame numbers (PFNs), one even and one odd.
The lowest bit of the virtual page number determines whether the even or the odd PFN is used.
The TLB is fully associative.
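A minimal Python sketch of the dual-entry lookup, assuming 4 KB pages (the page size is an assumption; the class and method names are hypothetical). An entry is matched on the virtual page number with its lowest bit dropped, and that lowest bit then selects the even or the odd PFN.

```python
PAGE_SHIFT = 12      # assumed 4 KB pages
NUM_ENTRIES = 48     # dual entries, i.e. 96 pages mapped


class DualEntryTLB:
    """Toy fully associative TLB; each entry maps an even/odd pair of virtual pages."""

    def __init__(self) -> None:
        # Each slot holds (vpn2, pfn_even, pfn_odd), or None if unused.
        self.slots = [None] * NUM_ENTRIES

    def write(self, index: int, vpn2: int, pfn_even: int, pfn_odd: int) -> None:
        # Software (the operating system) chooses which slot to fill.
        self.slots[index] = (vpn2, pfn_even, pfn_odd)

    def translate(self, va: int):
        vpn = va >> PAGE_SHIFT
        vpn2, odd = vpn >> 1, vpn & 1            # lowest VPN bit picks even or odd PFN
        offset = va & ((1 << PAGE_SHIFT) - 1)
        for slot in self.slots:                  # hardware compares all slots in parallel
            if slot is not None and slot[0] == vpn2:
                pfn = slot[2] if odd else slot[1]
                return (pfn << PAGE_SHIFT) | offset
        return None                              # miss: a TLB refill exception on real hardware


tlb = DualEntryTLB()
tlb.write(0, vpn2=0x5, pfn_even=0x100, pfn_odd=0x200)
print(hex(tlb.translate(0xA123)), hex(tlb.translate(0xB456)))
```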

Write Buffer (Detailed explanation - Block diagram)

The write buffer handles writes to external memory.
It holds up to four 64-bit address and data pairs, or one cache line to be written out.
Since data cache writebacks are typically performed a line at a time, an entire line can be written to the buffer, allowing the CPU to resume normal execution.
Without a write buffer, the CPU would have to write a single 64-bit doubleword, then wait until the memory operation completes before writing another.

Write Buffer (cont.) (Detailed explanation - Block diagram)

The write buffer allows the CPU to write data into the buffer without accessing the system bus.
For uncached write cycles, the write buffer can significantly increase performance by allowing multiple writes to be pipelined.
With cacheable write cycles, the buffer allows the CPU to write data to the buffer and immediately begin processing the next write.
Without the buffer, the CPU would output the write data, then be forced to wait until the write operation has completed before processing the next write.
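The benefit of the write buffer can be estimated with a small cycle-by-cycle model. In this Python sketch (the function name and the timing numbers are hypothetical), the CPU tries to hand off one write per cycle, memory completes one write every mem_cycles cycles, and depth=0 means there is no buffer, so the CPU waits for each write to finish.

```python
def cycles_until_cpu_free(n_writes: int, mem_cycles: int, depth: int) -> int:
    """Return the cycle at which the CPU has handed off all n_writes writes.

    depth is the number of write-buffer entries; depth=0 models no buffer."""
    if n_writes == 0:
        return 0
    queued = 0        # writes waiting in the buffer
    busy_until = 0    # memory is busy with a write until this cycle
    issued = 0
    t = 0
    while True:
        # Memory side: start the oldest buffered write when memory is free.
        if queued and t >= busy_until:
            queued -= 1
            busy_until = t + mem_cycles
        # CPU side: try to hand off the next write this cycle.
        if depth > 0:
            if queued < depth:
                queued += 1
                issued += 1
                if issued == n_writes:
                    return t + 1          # CPU continues on the next cycle
        elif t >= busy_until:
            busy_until = t + mem_cycles   # write goes straight to memory...
            issued += 1
            if issued == n_writes:
                return busy_until         # ...and the CPU waits for it to complete
        t += 1


print(cycles_until_cpu_free(4, mem_cycles=5, depth=4))  # with a 4-entry buffer
print(cycles_until_cpu_free(4, mem_cycles=5, depth=0))  # without a buffer
```

With four writes and a 5-cycle memory write, the CPU is free after 4 cycles with a 4-entry buffer but only after 20 cycles without one.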

Pipelined Writes (Detailed explanation - Block diagram)

Write cycles can be performed back-to-back without any dead clocks between cycles.
In the original R4000 architecture, there is a two-clock delay between the generation of back-to-back addresses, which results in two dead clocks between back-to-back cycles.
The pipelined write protocol also uses the write buffer to allow pipelining of write cycles.
In the MIPS 64-bit processor, performance is significantly increased by eliminating the two null cycles between write cycles.
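The saving from removing the dead clocks is simple arithmetic. The sketch assumes each write cycle occupies a fixed number of bus clocks (two here, for example one address and one data clock on the multiplexed bus, which is an assumption) and that the R4000-style protocol inserts two dead clocks between back-to-back writes. The function name is hypothetical.

```python
def bus_clocks(n_writes: int, clocks_per_write: int, dead_clocks: int) -> int:
    """Total bus clocks for n back-to-back writes with dead clocks between them."""
    if n_writes == 0:
        return 0
    return n_writes * clocks_per_write + (n_writes - 1) * dead_clocks


print(bus_clocks(4, clocks_per_write=2, dead_clocks=2))  # R4000-style, two dead clocks
print(bus_clocks(4, clocks_per_write=2, dead_clocks=0))  # pipelined writes
```

For four back-to-back writes this gives 14 bus clocks with the dead clocks and 8 without them.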

Pipelining (Detailed explanation - Block diagram)

A pipeline is divided into four stages:
Fetch
Arithmetic operation
Memory access
Write back

[Figure: A non-pipelined execution]
[Figure: Pipelined execution]
Pipelining (cont.)

In the non-pipelined example shown in the figure, each stage takes one processor clock cycle to complete. Thus it takes four clock cycles (ignoring delays or stalls) for an instruction to complete, and the execution rate of the pipeline is one instruction every four clock cycles.
Moreover, because the next instruction is not fetched until the current one completes, only one stage is active at any time.

Parallel Pipelining

Instead of waiting for an instruction to complete before the next instruction is fetched, a new instruction is fetched each clock cycle.
There are four stages in the pipeline, so four instructions can be executed simultaneously, one in each stage of the pipeline.
Once the pipeline is full, the instructions in the figure complete at a rate four times that of the pipeline shown in the previous figure.
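The cycle counts in the two figures follow from a simple formula: without pipelining, n instructions on a k-stage machine take n * k cycles; with pipelining, the first instruction takes k cycles and each later one completes one cycle after the previous, giving k + n - 1. A minimal Python sketch, ignoring stalls as the text does (the function names are hypothetical):

```python
def non_pipelined_cycles(n: int, stages: int = 4) -> int:
    # Each instruction passes through every stage before the next is fetched.
    return n * stages


def pipelined_cycles(n: int, stages: int = 4) -> int:
    # The first instruction fills the pipe; afterwards one completes per cycle.
    return 0 if n == 0 else stages + (n - 1)


for n in (1, 10, 1000):
    print(n, non_pipelined_cycles(n), pipelined_cycles(n))
```

For 10 instructions this gives 40 cycles versus 13, and as n grows the ratio approaches the factor of four stated above.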

SuperPipeline

The figure below shows a superpipelined architecture.
Each stage is designed to take only a fraction of an external clock cycle, in this case half a clock.
Therefore, more than one instruction can be completed each external clock cycle.

SuperScalar Pipeline

A superscalar architecture also allows more than one instruction to be completed each clock cycle, by issuing several instructions at once to parallel pipelines.
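The two approaches can be compared with the same kind of count. This Python sketch assumes, for illustration only, that the superpipelined design splits the same work into eight half-clock stages (so a single instruction still takes four external clocks) and that the superscalar design issues two instructions per clock into a four-stage pipeline with no hazards. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def superpipelined_clocks(n: int, stages: int = 8,
                          stage_time: Fraction = Fraction(1, 2)) -> Fraction:
    """External clocks for n instructions when each stage takes stage_time of a clock."""
    if n == 0:
        return Fraction(0)
    return (stages + n - 1) * stage_time


def superscalar_clocks(n: int, stages: int = 4, width: int = 2) -> int:
    """Clocks when `width` instructions enter the pipeline together each clock."""
    if n == 0:
        return 0
    return stages + ceil(n / width) - 1


print(superpipelined_clocks(8), superscalar_clocks(8))
```

For eight instructions this gives 15/2 (7.5) clocks for the superpipelined case and 7 clocks for the superscalar case, compared with 11 clocks for a simple four-stage pipeline.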

How Pipelining Works

The processor fetches and decodes four instructions per cycle and then appends each of them to one of three instruction queues.
Each queue determines the execution order based on the availability of the required functional units (FUs).
Although instructions are fetched and decoded in order, they can then execute out of order, which allows the processor to have up to 32 instructions in various stages of execution.

How Pipelining Works (cont.)

Initially, instructions proceed through the instruction fetch pipeline, which consists of the fetch, decode, and issue stages:
In the fetch stage, four instructions are fetched and aligned.
In the decode stage, the instructions are decoded, register renaming is performed, and branch instructions are predicted.
In the first half of the issue stage, the instructions are written to one of three 16-entry instruction queues, and the availability of their operands is determined.
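Register renaming, performed in the decode stage, can be illustrated with a toy map table. In this Python sketch, each instruction that writes a register is given a fresh physical register from a free list, which removes false (write-after-write and write-after-read) dependences while keeping true ones. The class name, the 64-entry physical register file, and the omission of freeing registers at retirement are all simplifying assumptions.

```python
class Renamer:
    """Toy register-renaming map table with a free list of physical registers."""

    def __init__(self, num_arch: int = 32, num_phys: int = 64) -> None:
        self.map = {r: r for r in range(num_arch)}       # architectural -> physical
        self.free = list(range(num_arch, num_phys))     # physical registers not in use

    def rename(self, dest: int, srcs: list[int]) -> tuple[int, list[int]]:
        # Sources read the current mapping, so true dependences are preserved.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: decode must stall")
        new = self.free.pop(0)
        self.map[dest] = new        # later readers of `dest` see the new copy
        return new, phys_srcs


rn = Renamer()
print(rn.rename(1, [2, 3]))   # r1 = r2 + r3
print(rn.rename(4, [1]))      # r4 = r1 * 2   (reads the renamed r1)
print(rn.rename(1, [5, 6]))   # r1 = r5 + r6  (new copy, no clash with the first r1)
```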

How Pipelining Works (cont.)

Depending on its type, each instruction proceeds to one of five execution pipelines.
There are two integer pipelines, two floating-point pipelines, and one load/store pipeline.
Each of these pipelines begins when a queue issues an instruction, and continues as follows:
In the second half of the issue stage, the processor reads the operands from the register files, and execution begins. Execution takes
a) one stage in the integer pipelines,
b) two stages in the load/store pipeline,
c) three stages in the floating-point pipelines.
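The effect of the queues issuing instructions as soon as their operands and a pipeline are available can be sketched as follows. This Python model uses the pipeline counts and execution latencies from the text (two integer, two floating-point and one load/store pipeline; one, three and two stages). It assumes every instruction is already in a queue, ignores queue capacity and fetch limits, and issues the oldest ready instructions first. The function name and instruction format are hypothetical.

```python
LATENCY = {"int": 1, "ls": 2, "fp": 3}    # execution stages per pipeline type
PIPELINES = {"int": 2, "ls": 1, "fp": 2}  # number of pipelines of each type


def issue_schedule(program):
    """program: list of (name, kind, deps) in program order; deps name earlier
    instructions. Returns {name: cycle in which the instruction issues}."""
    ready_at = {}        # name -> first cycle its result can be used
    issue_cycle = {}
    waiting = list(program)
    cycle = 0
    while waiting:
        used = {kind: 0 for kind in PIPELINES}
        still_waiting = []
        for name, kind, deps in waiting:          # oldest instructions get priority
            operands_ready = all(ready_at.get(d, float("inf")) <= cycle for d in deps)
            if operands_ready and used[kind] < PIPELINES[kind]:
                used[kind] += 1
                issue_cycle[name] = cycle
                ready_at[name] = cycle + LATENCY[kind]
            else:
                still_waiting.append((name, kind, deps))
        waiting = still_waiting
        cycle += 1
    return issue_cycle


prog = [
    ("ld a", "ls", []),
    ("add b", "int", ["ld a"]),
    ("mul.d c", "fp", []),
    ("add.d d", "fp", ["mul.d c"]),
    ("add e", "int", []),
]
print(issue_schedule(prog))
```

Here the independent "add e" issues in cycle 0, ahead of "add b", which waits until cycle 2 for the load result.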

Floating-Point Coprocessor

Performance on floating-point code is gained by allowing the integer unit to execute the necessary loads and stores of floating-point values, as well as index register updates and branching, while the floating-point unit computes.
The issue logic allows dual issue of an integer instruction and a floating-point instruction.

Dual Issue Mechanism (Detailed explanation - Block diagram)

The dual-issue mechanism implemented in the 64-bit MIPS processor allows a floating-point ALU instruction to be issued simultaneously with any other instruction type.
Whenever a floating-point ALU instruction is fetched together with any non-FP-ALU instruction, both instructions can be issued in the same cycle.
Load and store instructions in one pipeline usually provide enough data bandwidth to permit a new instruction to be issued every cycle for a fixed period.
Well-structured code can take full advantage of this pipeline structure.
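The pairing rule can be expressed as a small counting function. This Python sketch assumes instructions are considered in adjacent pairs in program order and ignores data dependences and cache misses; the class labels such as "fp_alu" and the function name are hypothetical.

```python
def issue_cycles(kinds: list[str]) -> int:
    """Count issue cycles when an FP ALU op may pair with one adjacent non-FP-ALU op."""
    cycles, i = 0, 0
    while i < len(kinds):
        pair = kinds[i:i + 2]
        if len(pair) == 2 and (pair[0] == "fp_alu") != (pair[1] == "fp_alu"):
            i += 2      # dual issue: one FP ALU op plus one other instruction
        else:
            i += 1      # single issue
        cycles += 1
    return cycles


# A loop body that alternates loads with FP arithmetic pairs up perfectly.
print(issue_cycles(["load", "fp_alu", "load", "fp_alu"]))
```

Two FP ALU operations, or two integer operations, in a row do not pair, which is why interleaving the instruction types (well-structured code) matters.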

Dedicated Integer and FP ALUs (Detailed explanation - Block diagram)

Separate integer and FP ALUs allow instructions of both types to be performed simultaneously.
Integer instructions are not stalled while long-latency floating-point operations are being executed.
This benefits applications such as CAD tools, which use both fixed-point and floating-point calculations.

Scalable for Multiple Processors (Detailed explanation - Block diagram)

The 64-bit MIPS processor incorporates eight external signals for multiprocessor support.
These signals allow for arbitration and data coherency between processors.
Therefore, symmetric multiprocessing systems implementing the full Modified, Exclusive, Shared, Invalid (MESI) cache consistency protocol in both primary and secondary caches, as well as other styles of multiprocessing, are supported.
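For reference, the state changes of the MESI protocol for a single cache line can be written as a transition function. This is a generic textbook MESI sketch in Python; the event and bus-action names are descriptive labels, not the processor's actual signal names.

```python
def mesi_next(state: str, event: str, others_have_copy: bool = False):
    """Return (next_state, bus_action) for one cache line.

    state: "M", "E", "S" or "I".
    event: "read" / "write" by this processor, or
           "snoop_read" / "snoop_write" observed from another processor."""
    if event == "read":
        if state == "I":
            return ("S" if others_have_copy else "E"), "bus read"
        return state, None                     # read hit in M, E or S
    if event == "write":
        if state in ("M", "E"):
            return "M", None                   # E -> M silently: no other copies exist
        if state == "S":
            return "M", "invalidate others"
        return "M", "read for ownership"       # I: fetch the line and invalidate others
    if event == "snoop_read":
        if state == "M":
            return "S", "write back"           # dirty data must be supplied/written back
        if state == "E":
            return "S", None
        return state, None                     # S stays S, I stays I
    if event == "snoop_write":
        if state == "M":
            return "I", "write back"
        return "I", None
    raise ValueError(f"unknown event: {event}")


print(mesi_next("I", "read"), mesi_next("E", "write"), mesi_next("M", "snoop_read"))
```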

Separate FP Execution Units (Detailed explanation - Block diagram)

In addition to the dual-issue mechanism, the 64-bit MIPS processor also contains separate acceleration hardware for most floating-point ALU instructions.
This allows long-latency operations such as divide and square root to be performed in a dedicated unit, so that shorter-latency operations such as MADD and subtract can be overlapped with a divide or square-root operation in progress.

Secondary Cache Support (Detailed explanation - Block diagram)

The 64-bit MIPS processor contains a dedicated secondary cache interface.
Its signals provide an efficient interface between the processor, the secondary cache, and the secondary cache tag RAM.
All RAM interface signals, such as data and chip enables, output enable, address match, cache valid, line index, and word index, are provided by the processor.
The secondary cache also supports multiple cache sizes and both the write-through and write-back data transfer protocols.
Data transfers to the secondary cache share the 64-bit system bus.

Multiple Cache Sizes (Detailed explanation - Block diagram)

The secondary cache can be configured as 512 KB, 1 MB, or 2 MB, allowing large applications to run within the secondary cache and reducing the number of accesses to slower main memory.
The secondary cache is accessed through the system bus.
Uncached bus cycles are not evaluated by the secondary cache control logic as they travel to the external agent.
Uncached operations such as video screen updates can therefore be passed directly to the system logic responsible for routing the data to the screen, without any delay from the secondary cache logic.
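The three sizes also change how many bits the line index must carry. This Python sketch assumes a direct-mapped secondary cache with 32-byte lines; both are assumptions for illustration, not figures from the text, and the function name is hypothetical.

```python
LINE_BYTES = 32   # assumed secondary-cache line size


def l2_geometry(size_bytes: int, line_bytes: int = LINE_BYTES, ways: int = 1):
    """Return (number of sets, line-index bits) for a power-of-two cache size."""
    sets = size_bytes // (line_bytes * ways)
    return sets, sets.bit_length() - 1


for kb in (512, 1024, 2048):
    sets, index_bits = l2_geometry(kb * 1024)
    print(f"{kb} KB: {sets} lines, {index_bits} index bits")
```

Under these assumptions each doubling of the cache adds one line-index bit (14, 15 and 16 bits).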

Simultaneous Access (Detailed explanation - Block diagram)

To maximize data throughput, a main memory access can be initiated while the secondary cache tag is being compared.
If the requested address is found in the secondary cache, the memory access is aborted. If the address is not found in the secondary cache, the main memory access has already been initiated, so the data can be retrieved more quickly.
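The gain from starting main memory early can be estimated with an average-latency formula. In this Python sketch, all latencies and the hit rate are hypothetical numbers and the function name is illustrative: on a secondary-cache hit the early memory access is simply abandoned, and on a miss the tag-compare time is hidden behind the memory access.

```python
def average_l2_latency(hit_rate: float, l2_hit: float, tag_check: float,
                       memory: float, overlap: bool) -> float:
    """Average cycles to service a primary-cache miss through the secondary cache."""
    if overlap:
        # Memory was started alongside the tag compare, so a miss costs
        # whichever takes longer (normally the memory access).
        miss_time = max(tag_check, memory)
    else:
        # Memory is started only after the tag compare reports a miss.
        miss_time = tag_check + memory
    return hit_rate * l2_hit + (1 - hit_rate) * miss_time


print(average_l2_latency(0.9, l2_hit=10, tag_check=4, memory=40, overlap=True))
print(average_l2_latency(0.9, l2_hit=10, tag_check=4, memory=40, overlap=False))
```

With a 90% hit rate, a 10-cycle secondary-cache hit, a 4-cycle tag compare and a 40-cycle memory access, the average is about 13.0 cycles with the overlapped start and about 13.4 cycles without it.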

Flexible Clocking Mechanism (Detailed explanation - Block diagram)

The clocking mechanism in the 64-bit MIPS processor offers a number of pipeline frequencies based on the frequency of the input clock.
Single External Clock Signal
A single clock signal is used for the system interface, as opposed to three. The processor eliminates the Rclock, Tclock, and MasterOut clock signals that existed in previous processors.
Having only one clock simplifies system design and reduces the circuit complexity of the internal clock mechanism.

On-Chip Clock Multiplication Circuitry (Detailed explanation - Block diagram)

The 64-bit processor includes on-chip clock frequency multiplication circuitry to support 200 MHz internal operation from an external 50 MHz clock.
The processor can operate internally at 2, 3, or 4 times the frequency of the external clock.
The maximum bus speed of the system interface is 100 MHz.
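The multiplier options translate into internal frequencies as follows. This Python sketch uses the 2x/3x/4x ratios and the 100 MHz interface limit from the text, and assumes the external clock runs at the system-interface speed; the fractional clock ratios mentioned in the feature list are not modeled, and the function name is hypothetical.

```python
BUS_MAX_MHZ = 100          # maximum system-interface speed (from the text)
MULTIPLIERS = (2, 3, 4)    # internal/external clock ratios (from the text)


def internal_mhz(external_mhz: int, multiplier: int) -> int:
    """Internal pipeline frequency for a given external clock and ratio."""
    if external_mhz > BUS_MAX_MHZ:
        raise ValueError("external clock exceeds the maximum system-interface speed")
    if multiplier not in MULTIPLIERS:
        raise ValueError("unsupported clock multiplier")
    return external_mhz * multiplier


print([internal_mhz(50, m) for m in MULTIPLIERS])
```

A 50 MHz input gives 100, 150 or 200 MHz internally.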

PROS & CONS
Advantages

It can handle more memory and larger files.
64-bit systems can typically address up to 1 terabyte (1000 GB) of physical memory, far beyond the 4 GB limit of 32-bit systems.
64-bit machines also offer faster I/O to devices such as hard disk drives and video cards.
These features can greatly increase system performance.

Disadvantages

The same data occupies more space in memory (for example, pointers grow from 4 to 8 bytes). This increases the memory requirements of a given process and can reduce the effectiveness of the processor caches.
64-bit systems sometimes lack 64-bit equivalents of software written for 32-bit architectures. The most severe problem is incompatible device drivers: although most application software can run in a 32-bit compatibility mode, it is usually impossible to run a driver in that mode.
