RISC Architecture and Super Computer - PowerPoint PPT Presentation

Title: RISC Architecture and Super Computer

1
RISC Architecture and Super Computer
CS147 Lecture 20

Prof. Sin-Min Lee
Department of Computer Science
San Jose State University

18
The Basis for RISC

Use of simple instructions
A key realization behind RISC was that a sequence
of simple instructions produces the same results
as a sequence of complex instructions, but can be
implemented with a simpler (and faster) hardware
design. Reduced Instruction Set Computers (RISC
machines) were the result.
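The claim above can be illustrated with a small sketch. The toy machine, the instruction names (ADDM, LOAD, ADD, STORE) and the memory labels are all hypothetical and chosen only for illustration; they do not come from the lecture or from any real instruction set.

```python
# Toy machine: memory and registers are plain dicts.
# All instruction names here are hypothetical, for illustration only.

def run_complex(memory):
    """One 'complex' memory-to-memory instruction: ADDM c, a, b."""
    mem = dict(memory)
    mem["c"] = mem["a"] + mem["b"]
    return mem

def run_simple(memory):
    """The same work as a sequence of simple load/add/store instructions."""
    mem = dict(memory)
    regs = {}
    program = [
        ("LOAD", "r1", "a"),        # r1 <- mem[a]
        ("LOAD", "r2", "b"),        # r2 <- mem[b]
        ("ADD", "r3", "r1", "r2"),  # r3 <- r1 + r2
        ("STORE", "r3", "c"),       # mem[c] <- r3
    ]
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
    return mem

start = {"a": 2, "b": 3, "c": 0}
print(run_complex(start) == run_simple(start))  # True
```

Each simple instruction touches either memory or the arithmetic unit, never both, which is what makes the simpler hardware possible.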

19
Addressing modes

Limited number of addressing modes
The effective address is computed in a single
clock cycle.

20
Instruction Pipeline

Similar to a manufacturing assembly line
Fetch an instruction
Decode the instruction
Execute the instruction
Store results
Each stage processes simultaneously (after
initial latency)
Execute one instruction per clock cycle

21
Pipeline Stages

Some processors use 3, 4, or 5 stages

24
RISC characteristics

Simple instruction set.
In a RISC machine, the instruction set contains
simple, basic instructions, from which more
complex instructions can be composed.
Same length instructions.

25
RISC characteristics

Each instruction is the same length, so that it
may be fetched in a single operation.
Single-machine-cycle instructions.
Most instructions complete in one machine
cycle, which allows the processor to handle
several instructions at the same time. This
pipelining is a key technique used to speed up
RISC machines.
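As a rough illustration of why same-length instructions are easy to fetch, the sketch below assumes 32-bit big-endian instruction words (an assumption; the slide does not give a width). The function names are made up for this example.

```python
import struct

WORD_BYTES = 4  # assumed fixed instruction length (32 bits)

def fetch_all(code: bytes):
    """Split a code image into fixed-length big-endian 32-bit words."""
    if len(code) % WORD_BYTES != 0:
        raise ValueError("code image is not a whole number of instructions")
    count = len(code) // WORD_BYTES
    return list(struct.unpack(f">{count}I", code))

def fetch(code: bytes, pc: int) -> int:
    """Fetch the instruction at byte address pc in a single read."""
    (word,) = struct.unpack_from(">I", code, pc)
    return word

image = bytes.fromhex("0000000100000002000000ff")
print(fetch_all(image))  # [1, 2, 255]
print(fetch(image, 8))   # 255
```

Because every instruction has the same size, the address of the next one is always pc + 4, and the fetch stage never has to decode an instruction to find out where the following one starts.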

26
Instruction Pipelines

The idea is to prepare the next instruction while the
current instruction is still executing.
A three-stage RISC pipeline is
Fetch instruction
Decode and select registers
Execute the instruction

Stage \ Clock   1   2   3   4   5   6   7
1               i1  i2  i3  i4  i5  i6  i7
2               -   i1  i2  i3  i4  i5  i6
3               -   -   i1  i2  i3  i4  i5
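The stage-by-clock table above follows a simple rule: at clock c, stage s holds instruction c - s + 1, if that number is at least 1. A minimal sketch, assuming an ideal pipeline with no stalls (the function name is made up for this example):

```python
def pipeline_diagram(num_stages, num_clocks):
    """Return rows of a stage-by-clock occupancy table for an ideal pipeline."""
    rows = []
    for stage in range(1, num_stages + 1):
        row = []
        for clock in range(1, num_clocks + 1):
            instr = clock - stage + 1  # instruction in this stage at this clock
            row.append(f"i{instr}" if instr >= 1 else "-")
        rows.append(row)
    return rows

for stage, row in enumerate(pipeline_diagram(3, 7), start=1):
    print(stage, " ".join(row))
```

Calling it with 3 stages and 7 clocks reproduces the three rows of the table.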
27
RISC vs. CISC

RISC processors have fewer and simpler instructions;
therefore, they are less complex and easier to
design. They also allow higher clock speeds than
CISC processors. However, when a high-level
language is compiled, a RISC CPU needs more
instructions than a CISC CPU.
CISC processors are complex, but that doesn't
necessarily increase the cost. CISC processors are
backward compatible.

28
Why RISC is better
The 80/20 rule: Analysis of the instruction mix
generated by CISC compilers shows that more than
80% of the instructions generated and executed
used only 20% of the instruction set. It was an
obvious conclusion that if this 20% of the
instruction set were sped up, the performance
benefits would be far greater.
Further analysis shows that these instructions
tend to perform the simpler operations and use
only the simpler addressing modes. For the CISC
machine, all the effort invested in processor
design to provide complex instructions and
thereby reduce the compiler workload was being
wasted.
29

Less cost: Since only the simpler instructions
are needed, the processor hardware required to
implement them could be reduced in complexity.
Therefore it should be possible to design a
higher-performance processor at lower cost.
Good performance: With a simpler instruction set,
it should be possible for a processor to execute
each instruction in a single clock cycle, so
higher performance can be achieved.

30
Pipelining: A key RISC technique
RISC designers are concerned primarily with
creating the fastest chip possible, and so they
use a number of techniques, including pipelining.
Pipelining is a design technique where the
computer's hardware processes more than one
instruction at a time, and doesn't wait for one
instruction to complete before starting the
next.
34
The advantages of RISC
Implementing a processor with a simplified
instruction set design provides several
advantages over implementing a comparable CISC
design:
(1) Speed. Since a simplified instruction set
allows for a pipelined, superscalar design, RISC
processors often achieve 2 to 4 times the
performance of CISC processors using comparable
semiconductor technology and the same clock
rates.
(2) Simpler hardware. Because the instruction set
of a RISC processor is so simple, it uses up much
less chip space; extra functions, such as memory
management units or floating point arithmetic
units, can also be placed on the same chip.
Smaller chips allow a semiconductor manufacturer
to place more parts on a single silicon wafer,
which can lower the per-chip cost dramatically.
(3) Shorter design cycle. Since RISC processors
are simpler than corresponding CISC processors,
they can be designed more quickly, and can take
advantage of other technological developments
sooner than corresponding CISC designs, leading
to greater leaps in performance between
generations.
35
Early RISC Machines
IBM 801 (1980):
120 instructions
No microcode
32-bit instructions
MSI technology
Berkeley RISC:
Coined RISC and CISC
Promoted architecture and implementation
innovations as RISC
Single VLSI chip implementation
Stanford MIPS:
Concentrated on compiler technology to improve
system performance
36
IBM 801
Put in hardware what:
Could not be moved to compile time
Could not be efficiently implemented in
executable code by a compiler
Could be implemented as random logic
Architecture:
32 32-bit registers
Separate data and instruction caches
Two-stage pipeline: decode-operand fetch-execute,
shift-set conditions-write
Delayed branches, branch with execute
Compilers:
No intent on letting end users program in
assembly
37
Berkeley RISC
Unlike IBM 801:
No heavy reliance on compiler technology
Single chip implementation
Argues that RISC is the best way to use scarce
silicon area
Influential because:
Introduced RISC and CISC terms
First single chip RISC processor
Introduced several innovations at once
Great marketing job
38
Current RISC
RISC -> SPARC
MIPS -> MIPS R2-4000
IBM 801 -> IBM RT -> IBM RS/6000
HP-PA RISC
ARM
M88000
PowerPC
i860
i960
48
Instruction Pipeline

An instruction pipeline is very similar to a
manufacturing assembly line. Imagine an assembly
line partitioned into four stages
1st stage receives some parts, performs its
assembly task, and passes the results to the
second stage
2nd stage takes the partially assembled product
from the first stage, performs its task, and
passes its work to the third stage
3rd stage does its work, passing the results to
the last stage, which completes the task and
outputs its results.

As the first piece moves from the first stage to
the second stage, a new set of parts for a new
piece enters the first stage. Ultimately, every
stage processes a piece simultaneously. This is
how time is saved. Each product requires the same
amount of time to be processed (actually slightly
more, to account for the transfers between
stages), but products are manufactured more
quickly because several are being created at the
same time.

50
An instruction pipeline processes an instruction
the way the assembly line processes a product.

1st stage fetches the instruction from
memory.
2nd stage decodes the instruction and fetches
any required operands.
3rd stage executes the instruction.
4th stage stores the result.

51
Consider a nonpipelined machine with 6 execution
stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50
ns, and 50 ns.
- Find the instruction latency on this machine.
- How much time does it take to execute 100 instructions?
Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
Time to execute 100 instructions = 100 × 320 = 32000 ns
52
Suppose we introduce pipelining on this machine.
Assume that when introducing pipelining, the
clock skew adds 5 ns of overhead to each execution
stage.
- What is the instruction latency on the pipelined machine?
- How much time does it take to execute 100 instructions?
Solution: Remember that in the pipelined
implementation, the length of the pipe stages
must all be the same, i.e., the speed of the
slowest stage plus overhead. With 5 ns overhead it
comes to
53
The length of a pipelined stage = MAX(lengths of
unpipelined stages) + overhead = 60 + 5 = 65 ns
Instruction latency = 6 × 65 ns = 390 ns
Time to execute 100 instructions
= 65 × 6 × 1 + 65 × 1 × 99 = 390 + 6435 = 6825 ns
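The arithmetic of this worked example can be checked with a short sketch. The function names are invented for this example; the stage lengths, the 5 ns overhead and the instruction count are the ones given on the slides, and the pipeline is assumed to run without stalls.

```python
def nonpipelined_time(stage_ns, n):
    """Every instruction passes through all stages before the next starts."""
    return sum(stage_ns) * n

def pipelined_time(stage_ns, n, overhead_ns):
    """All stages run at the pace of the slowest stage plus overhead.
    The first instruction needs k cycles, each later one adds one cycle."""
    cycle = max(stage_ns) + overhead_ns
    k = len(stage_ns)
    return cycle * (k + n - 1)

stages = [50, 50, 60, 60, 50, 50]
print(nonpipelined_time(stages, 100))  # 32000
print(pipelined_time(stages, 100, 5))  # 6825
```

Note that the latency of a single instruction gets worse (390 ns instead of 320 ns) while the total time for 100 instructions drops sharply.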
55
What is the speedup obtained from pipelining?
Solution: Speedup is the ratio of the average
instruction time without pipelining to the
average instruction time with pipelining.
Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns
Speedup = 320 / 65 = 4.92
57
(No Transcript)
58

This is one possible configuration of a RISC
pipeline, the pipeline implemented in the SPARC
MB86900 CPU. The IBM 801, the first RISC
computer, also uses a four-stage instruction
pipeline. Other processors, such as the RISC II,
use only three stages; they combine the execute
and store result operations into a single stage.

59
The MIPS processor uses a five-stage pipeline; it
decodes the instruction and selects the operand
registers in separate stages. These three
configurations are shown in the following figure.
60

Note that each stage has a register that latches
its data at the end of the stage to synchronize
data flow between stages. The flow of
instructions through each pipeline is shown in
the following Figure.

61
(No Transcript)
62
A Single Pipelined Control Unit Offers Several
Advantages

The primary advantage is the reduced hardware
requirements of the pipeline.
A second advantage of instruction pipelines is
the reduced complexity of the memory interface.

Many video game systems like the Sony PlayStation
and Nintendo consoles use small (about 34 MHz in
the PS1) RISC processors. These machines are
single-purpose machines and always run the same
types of programs, so small RISC processors give
excellent performance results on machines like
these.
Pocket PCs like the Palm Pilot and Compaq's
iPAQ series also use small RISC processors.
Again, a machine like this is basically single
purpose. Yes, you can do a lot of things with
them, but often you use a calendar, MP3 player,
and maybe a word processor.

64
So, why don't I have a RISC processor at home?
(Continued)

RISC-based PC processors are still quite a bit
more expensive than their CISC counterparts.
When you write code for a RISC-based machine,
you are writing code native to that particular
processor. Compatibility becomes an extreme issue:
another RISC processor using the same OS won't
be able to run software that you coded on the
previous machine.
The rather bright fellows at Intel have come up
with a solution for you. The current processor
you own (provided that it is a 486 or higher) is
a CRISC processor.

65
CRISC: I shouldn't have to tell you what this
stands for

Intel realized that while the x86 CISC set is
very large, there are a few instructions that are
quite common and only do one thing (e.g. JMP,
MOV, INC, etc.)
Intel decided to take those common instructions,
adjust them to be the same size, and then
hardwire them into the CPU's core so they could
be executed in a RISC-like fashion.
Yes, your Pentium III processor at home will
behave like a RISC processor, sometimes. This
helps gain more efficiency from the CPU while
remaining backwards compatible.

66
Why Use Pipelining?

Pipelining allows you to start the process of
executing one instruction before the previous one
has completed
Even if there are delays in any one stage of the
process for one instruction, it is still more
efficient than non-pipelined processors
In the Intel x86 line, pipelining was introduced
with the 486 processor

67
Review of the 6-stage execution process

FETCH: Instructions are fetched from a
microcode ROM (CISC)
DECODE: Instructions are decoded into simple
code that the CPU understands (often called
micro-ops)
ISSUE/SCHEDULE: Once instructions have been
decoded, they are placed into a pool and then
issued to a unit (Integer, FPU, MMX) for
execution
EXECUTE: The instruction is executed here
RETIRE: Results are analyzed and put back into
their proper order
WRITE BACK: The results of the instructions are
written to memory (committed to code)
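The RETIRE step, where results are put back into their proper order, can be sketched as follows. This is a simplified illustration and not a description of any particular Intel design; the function name and the idea of numbering instructions 0, 1, 2, ... in program order are assumptions made for the example.

```python
def retire_in_order(completion_order):
    """Instructions are numbered in program order (0, 1, 2, ...).
    Given the order in which they finish executing, return which
    instructions can retire after each completion."""
    finished = set()
    next_to_retire = 0
    batches = []
    for instr in completion_order:
        finished.add(instr)
        batch = []
        # Retire only while the oldest unretired instruction is done.
        while next_to_retire in finished:
            batch.append(next_to_retire)
            finished.remove(next_to_retire)
            next_to_retire += 1
        batches.append(batch)
    return batches

print(retire_in_order([1, 0, 3, 2]))  # [[], [0, 1], [], [2, 3]]
```

Instruction 1 finishes first but has to wait until instruction 0 is done, so the results still become visible in program order.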

70
Super Scalar

Put simply, a super scalar processor has two or
more integer execution units that run in parallel
(they can execute instructions simultaneously)
The Pentium processor is the first superscalar
x86 processor from Intel
The scheduling unit can issue instructions
simultaneously to different units to be executed
at the same time
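A toy model of simultaneous issue, under stated assumptions: instructions are issued in program order, at most two per cycle, and an instruction waits for the next cycle if it reads or overwrites a register written by an instruction issued in the same cycle. The register names and the program are hypothetical, and real schedulers are considerably more elaborate.

```python
def issue_schedule(program, width=2):
    """program: list of (dest, sources) tuples in program order.
    Returns a list of cycles, each a list of instruction indices."""
    cycles = []
    current, written = [], set()
    for index, (dest, sources) in enumerate(program):
        conflict = dest in written or any(s in written for s in sources)
        if len(current) == width or conflict:
            cycles.append(current)      # close the current issue group
            current, written = [], set()
        current.append(index)
        written.add(dest)
    if current:
        cycles.append(current)
    return cycles

program = [
    ("r1", ["r8", "r9"]),    # i0
    ("r2", ["r10", "r11"]),  # i1: independent of i0
    ("r3", ["r1", "r2"]),    # i2: needs i0 and i1
    ("r4", ["r3", "r8"]),    # i3: needs i2
    ("r5", ["r9", "r10"]),   # i4: independent of i3
]
print(issue_schedule(program))  # [[0, 1], [2], [3, 4]]
```

Five instructions issue in three cycles instead of five; the dependent pair i2 and i3 is what keeps the second unit idle for one cycle.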

71
Data Flow
72
Performance Improvement

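The speedup is the ratio of the time needed to
process n instructions using a non-pipelined
control unit to the time needed using a pipelined
control unit
Sn = (n × T1) / ((n + k - 1) × Tk)
where T1 is the time to process one instruction
without pipelining, k is the number of pipeline
stages, and Tk is the pipelined clock period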
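A short sketch of the speedup formula, reading T1 as the time for one instruction without pipelining, k as the number of stages, and Tk as the pipelined clock period (this reading of the symbols is an assumption; the slide does not define them). The numbers are taken from the six-stage example earlier in the lecture.

```python
def speedup(n, t1, k, tk):
    """S_n = n*T1 / ((n + k - 1) * Tk) for an ideal k-stage pipeline."""
    return (n * t1) / ((n + k - 1) * tk)

print(round(speedup(100, 320, 6, 65), 2))        # 4.69
print(round(speedup(1_000_000, 320, 6, 65), 2))  # 4.92
```

For 100 instructions the speedup is 32000 / 6825, about 4.69. As n grows, the k - 1 cycles needed to fill the pipeline matter less and the speedup approaches T1 / Tk = 320 / 65, about 4.92.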

73
Pipeline Problems

Memory access
An instruction must be fetched in one clock cycle
Remedy: include cache memory
Branch statements
After a taken branch, the instructions that are
already in the pipeline should not be there
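The cost of the branch problem can be estimated with a simple model: an ideal k-stage pipeline needs k + n - 1 cycles for n instructions, and every taken branch wastes a fixed number of cycles on instructions that have to be discarded. The 2-cycle penalty and the count of 20 taken branches below are hypothetical values chosen for illustration.

```python
def cycles_with_branches(n, k, taken_branches, penalty):
    """Ideal pipeline cycles plus `penalty` wasted cycles per taken branch."""
    return (k + n - 1) + taken_branches * penalty

print(cycles_with_branches(100, 4, 0, 2))   # 103
print(cycles_with_branches(100, 4, 20, 2))  # 143
```

In this model, 20 taken branches out of 100 instructions add 40 cycles, which is why techniques such as the delayed branch mentioned for the IBM 801 try to fill those slots with useful work.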

74
Register Windowing

More than 100 registers, not always accessible
Global registers are always accessible
The remaining registers are windowed, accessible
at specific times

75
SPARC Processor Register Windowing
76
Keeping Track

A window pointer register contains the number of
the window that is currently active
A window mask register contains 1 bit per window
and denotes which windows contain valid data.

77
Subroutine Calls

Register windows provide greatest benefit during
subroutine calls
During the calling process, the register window
is moved down one position.
CPU can pass parameters to the subroutine via the
registers that overlap
Same register can be used to return results to
the calling routine.
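The parameter passing described above can be sketched with a toy register file. The sizes (8 globals, and 8 in, 8 local and 8 out registers per window) and the number of windows are assumptions for illustration, the class and method names are invented, and overflow handling via the window mask is left out. Which direction the window pointer moves on a call is a modelling choice here.

```python
class WindowedRegisterFile:
    """Toy register file with overlapping windows (sizes are assumptions)."""
    GLOBALS, INS, LOCALS, OUTS = 8, 8, 8, 8

    def __init__(self, num_windows=4):
        self.num_windows = num_windows
        self.step = self.LOCALS + self.OUTS  # registers not shared with the caller
        self.globals = [0] * self.GLOBALS
        self.windowed = [0] * (self.step * num_windows)
        self.cwp = 0                         # current window pointer

    def _index(self, kind, n):
        base = self.cwp * self.step
        offset = {"in": 0, "local": self.INS, "out": self.INS + self.LOCALS}[kind]
        return (base + offset + n) % len(self.windowed)

    def read(self, kind, n):
        if kind == "global":
            return self.globals[n]
        return self.windowed[self._index(kind, n)]

    def write(self, kind, n, value):
        if kind == "global":
            self.globals[n] = value
        else:
            self.windowed[self._index(kind, n)] = value

    def call(self):
        self.cwp = (self.cwp + 1) % self.num_windows

    def ret(self):
        self.cwp = (self.cwp - 1) % self.num_windows

regs = WindowedRegisterFile()
regs.write("out", 0, 21)      # caller places an argument in out0
regs.call()                   # window moves; caller's outs are the callee's ins
arg = regs.read("in", 0)      # callee sees the argument as in0
regs.write("in", 0, arg * 2)  # callee returns a result in the same register
regs.ret()
print(regs.read("out", 0))    # 42
```

No register contents are copied on the call or the return; only the window pointer changes, which is where the benefit during subroutine calls comes from.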

78
Example
79
Example (cont)
80
RISC Advantages

RISC processors have fewer and simpler instructions.
Their control units are less complex and easier
to design
Run at higher clock frequencies
Reduced amount of space needed on the processor
chip -> more space for additional registers
Easier to incorporate parallelism
Compilers are less complex

81
CISC Advantages