Title: RISC Architecture and Super Computer
1RISC Architecture and Super Computer
CS147 Lecture 20
- Prof. Sin-Min Lee
- Department of Computer Science
- San Jose State University
18The Basis for RISC
- Use of simple instructions
- A key realization of the early RISC designers was that a sequence
of simple instructions produces the same results
as a sequence of complex instructions, but can be
implemented with a simpler (and faster) hardware
design. Reduced Instruction Set Computers (RISC
machines) were the result.
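The equivalence claimed on this slide can be sketched with a toy machine. This is only an illustration: the memory-to-memory add and the LOAD/ADD/STORE opcodes below are hypothetical names, not the instruction set of any real processor.

```python
# Toy machine: memory and registers are plain dicts.
# All opcode names are hypothetical, chosen only for illustration.

def run_complex(memory):
    """One complex instruction: a memory-to-memory add, c = a + b."""
    mem = dict(memory)
    mem["c"] = mem["a"] + mem["b"]
    return mem

def run_simple(memory):
    """The same work expressed as four simple register instructions."""
    mem = dict(memory)
    regs = {}
    program = [
        ("LOAD", "r1", "a"),
        ("LOAD", "r2", "b"),
        ("ADD", "r3", "r1", "r2"),
        ("STORE", "r3", "c"),
    ]
    for instr in program:
        op = instr[0]
        if op == "LOAD":
            _, reg, addr = instr
            regs[reg] = mem[addr]
        elif op == "ADD":
            _, dst, src1, src2 = instr
            regs[dst] = regs[src1] + regs[src2]
        elif op == "STORE":
            _, reg, addr = instr
            mem[addr] = regs[reg]
    return mem

start = {"a": 7, "b": 5, "c": 0}
complex_result = run_complex(start)
simple_result = run_simple(start)
print(complex_result == simple_result)
```

The simple version needs more instructions, but each one does a single small job, which is what makes the hardware easier to build.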
19Addressing modes
- Limited number of addressing modes
- The effective address is computed in a single
clock cycle.
20Instruction Pipeline
- Similar to a manufacturing assembly line
- Fetch an instruction
- Decode the instruction
- Execute the instruction
- Store results
- Each stage processes simultaneously (after
initial latency)
- Execute one instruction per clock cycle
21Pipeline Stages
- Some processors use 3, 4, or 5 stages
24RISC characteristics
- Simple instruction set.
- In a RISC machine, the instruction set contains
simple, basic instructions, from which more
complex instructions can be composed.
- Same length instructions.
25RISC characteristics
- Each instruction is the same length, so that it
may be fetched in a single operation.
- 1 machine-cycle instructions.
- Most instructions complete in one machine
cycle, which allows the processor to handle
several instructions at the same time. This
pipelining is a key technique used to speed up
RISC machines.
26Instruction Pipelines
- The idea is to prepare the next instruction while the
current instruction is still executing.
- A three-stage RISC pipeline is
- Fetch instruction
- Decode and select registers
- Execute the instruction
Clock     1   2   3   4   5   6   7
Stage 1   i1  i2  i3  i4  i5  i6  i7
Stage 2   -   i1  i2  i3  i4  i5  i6
Stage 3   -   -   i1  i2  i3  i4  i5
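The clock/stage table above can be generated mechanically. The following is a minimal sketch, assuming an ideal pipeline with no stalls; the function name `pipeline_table` is made up for this example.

```python
def pipeline_table(num_stages, num_cycles):
    """Return rows[stage][cycle]: the instruction label in that stage, or '-'."""
    rows = []
    for stage in range(num_stages):
        row = []
        for cycle in range(num_cycles):
            # Instruction number n (0-based) reaches stage s at cycle n + s.
            instr = cycle - stage
            row.append(f"i{instr + 1}" if instr >= 0 else "-")
        rows.append(row)
    return rows

for stage, row in enumerate(pipeline_table(3, 7), start=1):
    print(f"Stage {stage}: " + " ".join(f"{cell:>2}" for cell in row))
```

After the first two clock cycles every stage is busy, and one instruction leaves the last stage on every cycle.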
27RISC vs. CISC
- RISC processors have fewer and simpler instructions;
therefore, they are less complex and easier to
design. They also allow higher clock speeds than
CISC. However, when high-level language code is
compiled, a RISC CPU needs more instructions than
a CISC CPU.
- CISC processors are complex, but this doesn't necessarily
increase the cost. CISC processors are backward
compatible.
28Why RISC is better
The 80/20 rule: Analysis of
the instruction mix generated by CISC compilers
shows that more than 80% of the instructions
generated and executed used only 20% of an
instruction set. The obvious conclusion was
that if this 20% of the instruction set were sped up,
the performance benefits would be far greater.
Further analysis shows that these instructions
tend to perform the simpler operations and use
only the simpler addressing modes. For the CISC
machine, all the effort invested in processor
design to provide complex instructions and
thereby reduce the compiler workload was being
wasted.
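The kind of measurement behind the 80/20 rule can be sketched as follows. The trace below is invented purely to show the calculation; it is not data from any real compiler or program, and `coverage_of_top` is a hypothetical helper name.

```python
from collections import Counter

def coverage_of_top(trace, fraction):
    """Share of executed instructions covered by the most frequent
    `fraction` of the distinct opcodes that appear in the trace."""
    counts = Counter(trace)
    top_n = max(1, round(len(counts) * fraction))
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(trace)

# Invented trace: 10 distinct opcodes, 100 executed instructions.
toy_trace = (
    ["LOAD"] * 45 + ["ADD"] * 40 + ["STORE"] * 4 + ["BRANCH"] * 3
    + ["MUL"] * 2 + ["DIV"] * 2 + ["CALL", "RET", "SHIFT", "XOR"]
)
share = coverage_of_top(toy_trace, 0.20)
print(f"Top 20% of opcodes cover {share:.0%} of executed instructions")
```

Running the same function over a real instruction trace would show how skewed the mix actually is for a given program.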
29- Less cost: Since only the simpler instructions
are needed, the processor hardware required to
implement them can be reduced in complexity.
Therefore it should be possible to design a
higher-performance processor at lower cost.
- Good performance: With a simpler instruction set,
it should be possible for a processor to execute
each instruction in a single clock cycle, so higher
performance can be achieved.
30Pipelining: A key RISC technique
RISC designers
are concerned primarily with creating the
fastest chip possible, and so they use a number
of techniques, including pipelining.
Pipelining is a design technique where the
computer's hardware processes more than one
instruction at a time, and doesn't wait for one
instruction to complete before starting the
next.
34The advantages of RISC
Implementing a processor
with a simplified instruction set design
provides several advantages over implementing a
comparable CISC design:
(1) Speed. Since a
simplified instruction set allows for a
pipelined, superscalar design, RISC processors
often achieve 2 to 4 times the performance of
CISC processors using comparable semiconductor
technology and the same clock rates.
(2) Simpler
hardware. Because the instruction set of a RISC
processor is so simple, it uses up much less
chip space; extra functions, such as memory
management units or floating point arithmetic
units, can also be placed on the same chip.
Smaller chips allow a semiconductor manufacturer
to place more parts on a single silicon wafer,
which can lower the per-chip cost dramatically.
(3) Shorter design cycle. Since RISC processors
are simpler than corresponding CISC processors,
they can be designed more quickly, and can take
advantage of other technological developments
sooner than corresponding CISC designs, leading
to greater leaps in performance between
generations.
35Early RISC Machines
IBM 801, 1980
- 120 instructions
- No microcode
- 32 bit instructions
- MSI technology
Berkeley RISC
- Coined RISC and CISC
- Promoted architecture and
implementation innovations as RISC
- Single VLSI chip implementation
Stanford MIPS
- Concentrated on compiler
technology to improve system performance
36IBM 801
Put in hardware what
- Could not be moved to compile time
- Could not be efficiently implemented in
executable code by a compiler
- Could be implemented as random logic
Architecture
- 32 32-bit registers
- Separate data and instruction caches
- Two stage pipeline, decode-operand fetch-execute,
shift-set conditions-write
- Delayed branches, Branch with execute
Compilers
- No intent on letting end users program in
assembly
37Berkeley RISC
Unlike IBM 801
- No heavy reliance on compiler technology
- Single chip implementation
- Argues that
RISC is the best way to use scarce silicon area
Influential because
- Introduced RISC and CISC terms
- First single chip RISC processor
- Introduced several innovations at once
- Great marketing job
38Current RISC
- RISC -> SPARC
- MIPS -> MIPS R2-4000
- IBM 801 -> IBM RT -> IBM RS/6000
- HP-PA RISC
- ARM
- M88000
- PowerPC
- i860
- i960
48Instruction Pipeline
- An instruction pipeline is very similar to a
manufacturing assembly line. Imagine an assembly
line partitioned into four stages
- 1st stage receives some parts, performs its
assembly task, and passes the results to the
second stage
- 2nd stage takes the partially assembled product
from the first stage, performs its task, and
passes its work to the third stage
- 3rd stage does its work, passing the results to
the last stage, which completes the task and
outputs its results.
49- As the first piece moves from the first stage to
the second stage, a new set of parts for a new
piece enters the first stage. Ultimately, every
stage processes a piece simultaneously. This is
how time is saved. Each product requires the same
amount of time to be processed (actually slightly
more, to account for the transfers between
stages), but products are manufactured more
quickly because several are being created at the
same time.
50An instruction pipeline processes an instruction
the way the assembly line processes a product.
- 1st stage fetches the instruction from
memory.
- 2nd stage decodes the instruction and fetches
any required operands.
- 3rd stage executes the instruction.
- 4th stage stores the result.
51Consider a nonpipelined machine with 6 execution
stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50
ns, and 50 ns.
- Find the instruction latency
on this machine.
- How much time does it
take to execute 100 instructions?
Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
Time to execute 100 instructions = 100 x 320
= 32000 ns
52Suppose we introduce pipelining on this machine.
Assume that when introducing pipelining, the
clock skew adds 5 ns of overhead to each execution
stage.
- What is the instruction latency
on the pipelined machine?
- How much time
does it take to execute 100 instructions?
Solution: Remember that in the pipelined
implementation, the length of the pipe stages
must all be the same, i.e., the length of the
slowest stage plus overhead. With 5 ns overhead it
comes to
53The length of pipelined stage = MAX(lengths of
unpipelined stages) + overhead = 60 + 5 = 65 ns
Instruction latency = 6 x 65 ns = 390 ns
Time to execute 100 instructions = 65 x 6 x 1 + 65 x 1 x 99
= 390 + 6435 = 6825 ns
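The arithmetic of this worked problem can be checked with a short script. It is a sketch of the same calculation under the problem's assumptions (every pipelined stage takes the slowest stage's time plus overhead, with no stalls); the function names are made up for this example.

```python
def nonpipelined_time_ns(stage_ns, n):
    """Each instruction passes through every stage before the next one starts."""
    return n * sum(stage_ns)

def pipelined_time_ns(stage_ns, overhead_ns, n):
    """The first instruction takes k cycles; each later one finishes one cycle after."""
    cycle_ns = max(stage_ns) + overhead_ns
    k = len(stage_ns)
    return cycle_ns * k + cycle_ns * (n - 1)

stages = [50, 50, 60, 60, 50, 50]   # stage lengths in ns, from the problem
print(nonpipelined_time_ns(stages, 1))      # latency without pipelining
print(nonpipelined_time_ns(stages, 100))    # 100 instructions, no pipelining
print(pipelined_time_ns(stages, 5, 1))      # latency with pipelining
print(pipelined_time_ns(stages, 5, 100))    # 100 instructions, pipelined
```

Note that pipelining makes the latency of a single instruction worse (390 ns instead of 320 ns) while making the stream of 100 instructions much faster.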
55What is the speedup obtained from pipelining?
Solution: Speedup is the ratio of the average
instruction time without pipelining to the
average instruction time with pipelining.
Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns
Speedup = 320 / 65 = 4.92
58- This four-stage arrangement is one possible configuration of a RISC
pipeline; it is the pipeline implemented in the SPARC
MB86900 CPU. The IBM 801, the first RISC
computer, also uses a four-stage instruction
pipeline. Other processors, such as the RISC II,
use only three stages; they combine the execute
and store result operations into a single stage.
59The MIPS processor uses a five-stage pipeline; it
decodes the instruction and selects the operand
registers in separate stages. These three
configurations are shown in the following figure.
60- Note that each stage has a register that latches
its data at the end of the stage to synchronize
data flow between stages. The flow of
instructions through each pipeline is shown in
the following Figure.
62A Single Pipelined Control Unit Offers Several
Advantages
- The primary advantage is the reduced hardware
requirements of the pipeline.
- A second advantage of instruction pipelines is
the reduced complexity of the memory interface.
63- Many video game systems like the Sony PlayStation
and Nintendo use small (about 33 MHz in the PS1) RISC
processors. These machines are single-purpose
machines and always run the same types of
programs, so small RISC processors give excellent
performance results on machines like these.
- Pocket PCs like the Palm Pilot and Compaq's
iPAQ series also use small RISC processors.
Again, a machine like this is basically single
purpose. Yes, you can do a lot of things with
them, but often you use a calendar, MP3 player,
and maybe a word processor.
64So, why don't I have a RISC processor at home?
(Continued)
- RISC-based PC processors are still quite a bit
more expensive than their CISC counterparts.
- When you write code for a RISC-based machine,
you are writing code native to that particular
processor. Compatibility becomes a serious issue:
another RISC processor using the same OS won't
be able to run software that you coded on the
previous machine.
- The rather bright fellows at Intel have come up
with a solution for you. The current processor
you own (provided that it is a 486 or higher) is
a CRISC processor.
65CRISC: I shouldn't have to tell you what this
stands for
- Intel realized that while the x86 CISC set is
very large, there are a few instructions that are
quite common and only do one thing (e.g. JMP,
MOV, INC, etc.)
- Intel decided to take those common instructions,
adjust them to be the same size, and then
hardwire them into the CPU's core so they could
be executed in a RISC-like fashion.
- Yes, your Pentium III processor at home will
behave like a RISC processor, sometimes. This
helps gain more efficiency from the CPU while
remaining backwards compatible.
66Why Use Pipelining?
- Pipelining allows you to start the process of
executing one instruction before the previous one
has completed
- Even if there are delays in any one stage of the
process for one instruction, it is still more
efficient than non-pipelined processors
- In Intel's x86 line, pipelining was introduced with the 486 processor
67Review of 6-Stage execution process
- FETCH: Instructions are fetched from a
MICROCODE ROM (CISC)
- DECODE: Instructions are decoded into simple
code that the CPU understands (often called
Micro-ops)
- ISSUE/SCHEDULE: Once instructions have been
decoded, they are placed into a pool and then
issued to a unit (Integer, FPU, MMX) for
execution
- EXECUTE: The instruction is executed here
- RETIRE: Results are analyzed and put back into
their proper order
- WRITE BACK: The results of the instructions are
written to memory (committed to code)
70Super Scalar
- Put simply, a super scalar processor has two or
more integer execution units that run in parallel
(they can execute instructions simultaneously)
- The Pentium processor is the first Intel super
scalar processor
- The scheduling unit can issue instructions
simultaneously to different units to be executed
at the same time
71Data Flow
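72Performance Improvement
- The speedup is the ratio of the time needed to
process n instructions using a non-pipelined
control unit to the time needed using a pipelined
control unit
- S_n = n T_1 / ((n + k - 1) T_k)
where T_1 is the time per instruction without pipelining,
k is the number of pipeline stages, and T_k is the
pipelined clock period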
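As a quick check, the speedup formula can be evaluated with the numbers from the earlier 6-stage example (320 ns per instruction without pipelining, 6 stages, a 65 ns pipelined clock period). This is a sketch; the parameter names follow the symbols in the formula, with `t1` as the non-pipelined time per instruction and `tk` as the pipelined clock period.

```python
def speedup(n, t1, k, tk):
    """S_n = n * T_1 / ((n + k - 1) * T_k) for n instructions on a k-stage pipeline."""
    return (n * t1) / ((n + k - 1) * tk)

s_100 = speedup(100, 320, 6, 65)        # 32000 ns / 6825 ns
s_large = speedup(10**9, 320, 6, 65)    # a very long instruction stream
print(f"{s_100:.2f} {s_large:.2f}")
```

For 100 instructions the speedup is about 4.69; as n grows, the fill time of the pipeline matters less and the speedup approaches T_1 / T_k = 320 / 65, about 4.92, which is the figure obtained from the average instruction times.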
73Pipeline Problems
- Memory access
- Fetch an instruction in one clock cycle
- Include cache memory
- Branch statements
- After a branch is taken, the instructions already in the
pipeline should not be there
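The cost of the branch problem can be estimated with a simple cycle count. This is a rough sketch under a stated assumption, not a model of any particular processor: each taken branch is assumed to waste a fixed number of cycles (`penalty_cycles`, a hypothetical parameter) while the wrongly fetched instructions are discarded.

```python
def pipeline_cycles(n, k, taken_branches=0, penalty_cycles=0):
    """Cycles to run n instructions on a k-stage pipeline.

    Assumption: without branches, the first instruction needs k cycles and
    each later instruction completes one cycle after the previous one.
    Each taken branch adds `penalty_cycles` wasted cycles.
    """
    return (k + n - 1) + taken_branches * penalty_cycles

ideal = pipeline_cycles(100, 6)
with_branches = pipeline_cycles(100, 6, taken_branches=10, penalty_cycles=2)
print(ideal, with_branches)
```

With these illustrative numbers, 10 taken branches at 2 wasted cycles each raise the count from 105 to 125 cycles, which is why branch handling matters so much in pipelined designs.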
74Register Windowing
- More than 100 registers, not always accessible
- Global registers are always accessible
- The remaining registers are windowed, accessible
at specific times
75SPARC Processor Register Windowing
76Keeping Track
- A window pointer register contains the value of the
window that is currently active
- A window mask register contains 1 bit per window
and denotes which windows contain valid data.
77Subroutine Calls
- Register windows provide greatest benefit during
subroutine calls
- During the calling process, the register window
is moved down one position.
- CPU can pass parameters to the subroutine via the
registers that overlap
- Same register can be used to return results to
the calling routine.
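Parameter passing through overlapping windows can be sketched with a toy register file. This is an illustration only: the class name, the window count, and the register group sizes are assumptions, and the sketch leaves out the window mask register and any handling of running out of windows.

```python
class RegisterWindows:
    """Toy overlapping register windows; all sizes are illustrative assumptions."""

    def __init__(self, num_windows=4, overlap=8, local=8):
        self.num_windows = num_windows
        self.overlap = overlap            # registers shared between neighbors
        self.step = overlap + local       # physical registers added per window
        self.physical = [0] * (num_windows * self.step)
        self.cwp = 0                      # window pointer: the active window

    def _index(self, kind, n):
        # A window's "out" registers are the next-lower window's "in" registers.
        base = self.cwp * self.step
        offset = {"out": 0, "local": self.overlap, "in": self.step}[kind]
        return (base + offset + n) % len(self.physical)

    def read(self, kind, n):
        return self.physical[self._index(kind, n)]

    def write(self, kind, n, value):
        self.physical[self._index(kind, n)] = value

    def call(self):
        # Move the window down one position.
        self.cwp = (self.cwp - 1) % self.num_windows

    def ret(self):
        self.cwp = (self.cwp + 1) % self.num_windows

rw = RegisterWindows()
rw.write("out", 0, 21)        # caller places a parameter in an out register
rw.call()
arg = rw.read("in", 0)        # callee sees the same register as an in register
rw.write("in", 0, arg * 2)    # callee returns its result through that register
rw.ret()
result = rw.read("out", 0)    # caller reads the result
print(arg, result)
```

No data is copied on the call or the return; only the window pointer changes, which is where the speed benefit comes from.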
78Example
79Example (cont)
80RISC Advantages
- RISC processors have fewer and simpler instructions.
- Their control units are less complex and easier
to design
- Run at higher clock frequencies
- Reduced amount of space needed on the processor
chip -> more space for additional registers
- Easier to incorporate parallelism
- Compilers are less complex
81CISC Advantages
- New complex processors incorporate the design of
the previous designs.
- Backward compatibility with other processors in
their series.