Structure of Computer Systems - PowerPoint PPT Presentation

About This Presentation

Title:

Structure of Computer Systems



Slides: 29

Provided by: sebes4


Transcript and Presenter's Notes


1
Structure of Computer Systems

Course 4
The Central Processing Unit - CPU

2
CPU - Central Processing Unit

Classic (idyllic) view
Incorporates 2 of the 5 components of von Neumann's classical model:
ALU - Arithmetic and Logic Unit
CU - Control Unit
It is the brain (intelligent part) of a computer
Fetches (reads) an instruction, decodes/interprets it, reads the data, executes the instruction and stores the result
Does its job in a synchronized and sequential way: one thing at a time

3
CPU - Central Processing Unit

Today's view
Contains all kinds of computer components
Multiple CPUs: symmetric, asymmetric, multiple cores
Multiple ALUs, specialized ALUs (e.g. floating point, multimedia: MMX, SSE2)
Memory: multiple levels of cache memory (L0, L1, L2, trace cache)
Interfaces and peripheral devices (in the case of microcontrollers and DSPs):
serial channels
parallel interfaces
timers, counters
converters (ADC, DAC)
network interfaces
Interrupt system
Bus controller(s) and arbiter(s)
Memory management units
Executes instructions in parallel and in a speculative order
Intelligence may be distributed in memories and interfaces as well
Where is that nice idyllic image?

4
Starting from the beginning

A simple computer
Attributes: sequential, one (accumulator) register, one memory for both instructions and data

Legend: CG - clock generator, PhG - phase generator, PC - program counter, IR - instruction register, Acc - accumulator
5
A simple computer

How does it work?
4 phases:
IF - instruction fetch: read the instruction into IR
Dec - decode the instruction: generate the control signals
PreEx - prepare the execution: e.g. read the data from memory
Exe - execute: e.g. addition, subtraction

6
A simple computer

Example 1: ADD Acc, M[100h]
IF: Sel = 0 -> Address = PC; IR_ld impulse -> IR = ADD 100
Dec: Sel = 1 -> Address = IR_adr = 100; Inc = 1 -> increment PC
PreEx: Op_sel = code_add -> the ALU performs an addition
Exe: Acc_ld -> Acc = Acc + M[100]

7
A simple computer

Example 2: JMP 200h
IF: Sel = 0 -> Address = PC; IR_ld impulse -> IR = JMP 200
Dec: Inc = 1 -> increment PC
PreEx: PC_ld = 1 -> PC = IR_addr = 200
Exe: -
Example 3: SHR Acc
IF and Dec: the same as above
PreEx: -
Exe: Acc_shr = 1 -> shift the accumulator one position to the right
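The following is a minimal Python sketch of the simple computer running Examples 1-3. It rests on three assumptions that do not come from the slides: one memory word per instruction (so the PC is incremented by 1), instructions stored as (opcode, address) tuples, and a hypothetical HALT instruction that stops the run.

```python
def run(memory, pc=0, acc=0, max_instructions=100):
    """Execute instructions from a single memory (code and data) until HALT.

    Every instruction goes through the same four phases: IF, Dec, PreEx, Exe.
    HALT is a hypothetical stop instruction added for this sketch.
    """
    for _ in range(max_instructions):
        # IF: Sel = 0 -> Address = PC; the IR_ld impulse loads the word into IR
        opcode, address = memory[pc]
        # Dec: Inc = 1 -> increment the PC (one memory word per instruction, assumed)
        pc += 1
        if opcode == "HALT":
            break
        # PreEx: read the operand for ADD, or load the PC for JMP (PC_ld = 1)
        if opcode == "ADD":
            operand = memory[address]
        elif opcode == "JMP":
            pc = address
        # Exe: update the accumulator (JMP has nothing left to do here)
        if opcode == "ADD":
            acc = acc + operand      # Acc_ld -> Acc = Acc + M[address]
        elif opcode == "SHR":
            acc = acc >> 1           # Acc_shr = 1 -> shift right by one position
    return pc, acc


memory = {
    0x000: ("ADD", 0x100),   # Example 1: ADD Acc, M[100h]
    0x001: ("JMP", 0x200),   # Example 2: JMP 200h
    0x100: 12,               # a data word in the same memory
    0x200: ("SHR", None),    # Example 3: SHR Acc
    0x201: ("HALT", None),   # hypothetical stop instruction
}
print(run(memory))  # (514, 6): PC = 202h, Acc = 6
```

Starting from Acc = 0, ADD adds M[100h] = 12, JMP moves the PC to 200h and SHR halves the accumulator, so the run ends with Acc = 6.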

8
A simple computer

Homework: try to implement
MOV M[addr], Acc
MOV Acc, M[addr]
Conditional jump (e.g. if Acc = 0, Acc > 0, Acc < 0)
MOV Acc, 0

9
A simple computer

Issues
Every instruction is executed in a fixed number (4) of steps
Too many for simple instructions
Too few for complex instructions (e.g. multiply)
Only one internal register: hard to operate with data
No input and output devices
Limited number of possible operations: small instruction set
Possible improvements
Variable number of phases -> the phase generator should depend on the instruction code
Multiple internal registers -> 2 buses: input data, output data
Front panel with 7-segment LEDs and switches
Increase the number of instructions -> more complex decoder and command and control unit
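A short Python sketch can estimate the cost of the fixed 4-phase scheme. It assumes one clock cycle per phase. The per-instruction phase counts come from Examples 1-3: ADD uses all four phases, JMP leaves Exe empty and SHR leaves PreEx empty. A real variable-phase design could shorten other instructions too.

```python
FIXED_PHASES = 4

# Phases actually needed, read from Examples 1-3 (empty phases dropped).
NEEDED_PHASES = {
    "ADD": 4,   # IF, Dec, PreEx, Exe
    "JMP": 3,   # IF, Dec, PreEx (Exe is empty)
    "SHR": 3,   # IF, Dec, Exe (PreEx is empty)
}


def cycles_fixed(program):
    """Every instruction takes the same number of phases (one cycle each)."""
    return FIXED_PHASES * len(program)


def cycles_variable(program):
    """The phase generator stops as soon as the instruction is done."""
    return sum(NEEDED_PHASES[opcode] for opcode in program)


program = ["ADD", "JMP", "SHR", "ADD"]
print(cycles_fixed(program), cycles_variable(program))  # 16 14
```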

10
A more sophisticated computer, but still simple: the MIPS architecture

Attributes
Sequential
32 internal registers of 32 bits
Instructions: fixed length, variable content
Harvard memory architecture: separate instruction and data memory
An instruction is executed in 5 phases:
IF - instruction fetch
ID - decode the instruction and prepare (read) the data
Ex - execute the instruction
M - operation with the memory
Wb - write back: store the result
Instruction types
R - register, e.g. ADD RD, RS, RT
I - immediate, e.g. ADDI RT, RS, constant; LW RT, offset(RS)
J - jump, e.g. J target

11
MIPS architecture

Instruction formats
Fixed length (4 bytes), but multiple formats
R - register type instructions
<instr> rd, rs, rt
rd - destination register
rs - source register
rt - target register (second source)
Ex.: add $s1, $s2, $s3 -> $s1 = $s2 + $s3

12
MIPS architecture Instruction formats

I - immediate type instructions, with an immediate value (constant)
<instr> rt, rs, IMM
rs - source register
rt - target register
Ex.: addi $s1, $s2, 55 -> $s1 = $s2 + 55
J - jump type instructions
<instr> LABEL
Ex.: j et1 -> jump to label et1
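To make the three formats concrete, here is a Python sketch that packs instructions into 32-bit words. The field widths, the register numbers ($s1 = 17, $s2 = 18, $s3 = 19) and the opcodes follow the standard MIPS32 encoding rather than the slides: add uses op 0 with funct 20h, addi uses op 8 and j uses op 2. The address of label et1 is hypothetical.

```python
# Standard MIPS32 field layouts (widths in bits, most significant field first):
#   R: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
#   I: op(6) rs(5) rt(5) immediate(16)
#   J: op(6) target(26)
REG = {"$s1": 17, "$s2": 18, "$s3": 19}


def encode_r(rd, rs, rt, funct, shamt=0, op=0):
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rt, rs, imm):
    # the immediate is stored as a 16-bit two's-complement value
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, address):
    # the 26-bit field holds a word address (byte address / 4)
    return (op << 26) | ((address >> 2) & 0x3FFFFFF)


add_word = encode_r(REG["$s1"], REG["$s2"], REG["$s3"], funct=0x20)  # add $s1, $s2, $s3
addi_word = encode_i(0x08, REG["$s1"], REG["$s2"], 55)              # addi $s1, $s2, 55
j_word = encode_j(0x02, 0x00400020)   # j et1, with et1 assumed at 0x00400020
print(f"{add_word:08x} {addi_word:08x} {j_word:08x}")  # 02538820 22510037 08100008
```

All three words are 4 bytes long. Only the split of the 32 bits into fields changes from one format to the next.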

13
MIPS architecture

Address generation and instruction fetch

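[Figure: program counter (PC), adder (+4), multiplexers controlled by PC_MUX_Sel1 and PC_MUX_Sel2, program memory and instruction register (IR); control signals PC_ld, IR_ld; the IR supplies the op_code, the op_address and the jump address]
PC = PC + 4 - increment the PC
PC = Jump_Address - absolute jump
PC = PC + Jump_Address - relative jump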
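The next-PC selection can be sketched in Python as two multiplexers. This follows one possible reading of the figure, which is an assumption: PC_MUX_Sel2 chooses whether 0 or the PC is added to the jump address (absolute vs. relative jump), and PC_MUX_Sel1 chooses between PC + 4 and that sum. Real MIPS branches compute PC + 4 + 4*offset, which this simplified model does not show.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, pc_mux_sel1, pc_mux_sel2, jump_address):
    """Select the next PC value (simplified, assumed reading of the figure)."""
    # PC_MUX_Sel2: add 0 (absolute jump) or the PC (relative jump) to the jump address
    base = pc if pc_mux_sel2 == 1 else 0
    target = (base + jump_address) & MASK32
    # PC_MUX_Sel1: keep the sequential PC + 4 or take the jump target
    return target if pc_mux_sel1 == 1 else (pc + 4) & MASK32


print(hex(next_pc(0x100, 0, 0, 0x40)))  # 0x104: PC = PC + 4
print(hex(next_pc(0x100, 1, 0, 0x40)))  # 0x40:  PC = Jump_Address
print(hex(next_pc(0x100, 1, 1, 0x40)))  # 0x140: PC = PC + Jump_Address
```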
14
MIPS architecture

Decode and data preparation

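[Figure: the decoder (DEC) turns the op_code from the instruction register (IR) into execute, memory and write-back commands; the IR fields op1_ad and op2_ad select, through multiplexers, registers reg. 0 to reg. 31 of the register block onto the data outputs A and B; the IR also supplies the immediate value I]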
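Here is a Python sketch of decoding and data preparation. It assumes the standard MIPS32 field positions: the op_code drives the decoder, rs and rt play the role of op1_ad and op2_ad, and the low 16 bits are sign-extended to form the immediate value I. The register block is modelled as a plain list of 32 values.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word, registers):
    """Split a 32-bit instruction word and read the operands A, B and I."""
    fields = {
        "op_code": (word >> 26) & 0x3F,   # goes to the decoder (DEC)
        "rs": (word >> 21) & 0x1F,        # op1_ad: selects operand A
        "rt": (word >> 16) & 0x1F,        # op2_ad: selects operand B
        "rd": (word >> 11) & 0x1F,        # destination for R-type
        "funct": word & 0x3F,             # ALU operation for R-type
        "I": sign_extend16(word & 0xFFFF),
    }
    fields["A"] = registers[fields["rs"]]
    fields["B"] = registers[fields["rt"]]
    return fields


registers = [0] * 32
registers[18], registers[19] = 5, 7         # $s2 = 5, $s3 = 7
d = decode(0x02538820, registers)            # add $s1, $s2, $s3
print(d["op_code"], d["rd"], d["A"], d["B"], hex(d["funct"]))  # 0 17 5 7 0x20
```

Both register outputs and the immediate value are produced for every instruction. Choosing which of them to use is left to the commands the decoder generates for the later phases.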
15
MIPS architecture

Execute and memorize

[Figure: execute and memory stage; output: Data out]
16
MIPS architecture

Write back the result

17
MIPS architecture

The whole picture

[Figure: the whole sequential MIPS datapath: clock generator, phase generator, PC with +4 adder, instruction memory, IR, instruction decoder, register block, ALU, data memory]
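The whole sequential datapath can be mimicked by a short Python loop. Each instruction passes through IF, ID, Ex, M and Wb before the next one starts. The sketch makes several simplifying assumptions:

- It works on pre-decoded tuples (op, rt or rd, rs, rt or immediate) instead of 32-bit words.
- It supports only add, addi, lw and sw.
- It counts one clock cycle per phase.
- It keeps instruction memory and data memory separate, as in the Harvard organization.

```python
PHASES = ("IF", "ID", "Ex", "M", "Wb")


def run_sequential(instr_mem, data_mem, regs):
    """Run a program one instruction at a time; return the clock cycles used.

    Tuple layout (an assumption of this sketch):
      ("add",  rd, rs, rt)      rd = rs + rt
      ("addi", rt, rs, imm)     rt = rs + imm
      ("lw",   rt, rs, offset)  rt = M[rs + offset]
      ("sw",   rt, rs, offset)  M[rs + offset] = rt
    """
    pc = cycles = 0
    while pc // 4 < len(instr_mem):
        # IF: read from instruction memory, PC = PC + 4
        op, t, s, x = instr_mem[pc // 4]
        pc += 4
        # ID: read operand A from rs and operand B from rt or the immediate field
        a = regs[s]
        b = regs[x] if op == "add" else x
        # Ex: the ALU adds (a result for add/addi, an address for lw/sw)
        alu_out = a + b
        # M: only lw and sw access the data memory
        if op == "lw":
            result = data_mem[alu_out]
        elif op == "sw":
            data_mem[alu_out] = regs[t]
        else:
            result = alu_out
        # Wb: store the result in the register block ($zero stays 0)
        if op != "sw" and t != 0:
            regs[t] = result
        cycles += len(PHASES)   # one clock cycle per phase
    return cycles


regs = [0] * 32
data_mem = {8: 10}
program = [
    ("lw", 8, 0, 8),     # r8 = M[0 + 8]  = 10
    ("addi", 9, 8, 5),   # r9 = r8 + 5    = 15
    ("add", 10, 8, 9),   # r10 = r8 + r9  = 25
    ("sw", 10, 0, 12),   # M[0 + 12] = r10
]
print(run_sequential(program, data_mem, regs), data_mem[12])  # 20 25
```

Four instructions cost 20 cycles here. This is the CPI = 5 of sequential execution that the pipeline slides try to improve.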
18
Pipeline execution

What does it mean?
Work as an assembly line
idea: General Motors, around 1900
How to do it?
Specialized components (units) for every phase of instruction execution
Store the partial results in temporary buffers (pipeline registers)
What can we achieve?
Higher execution speed at the same clock frequency
CPI = 1 (ideally)

19
Sequential vs. pipeline execution

Sequential execution: CPI = 5
Pipeline execution: CPI = 1 (in the ideal case)

Sequential: i1 occupies T1-T5 (IF ID Ex M Wb), i2 occupies T6-T10
Pipeline:
      T1  T2  T3  T4  T5  T6  T7  T8  T9
i1    IF  ID  Ex  M   Wb
i2        IF  ID  Ex  M   Wb
i3            IF  ID  Ex  M   Wb
i4                IF  ID  Ex  M   Wb
i5                    IF  ID  Ex  M   Wb
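The cycle counts behind the diagram can be checked with a small Python sketch for a k-stage pipeline without hazards. Sequential execution needs n * k cycles. The pipeline needs k cycles for the first instruction plus one cycle for each of the remaining n - 1. The helper names are illustrative.

```python
STAGES = ("IF", "ID", "Ex", "M", "Wb")


def sequential_cycles(n, k=len(STAGES)):
    """Each instruction uses all k stages before the next one starts."""
    return n * k


def pipeline_cycles(n, k=len(STAGES)):
    """The first instruction needs k cycles; each later one finishes one cycle after its predecessor."""
    return k + n - 1


def pipeline_diagram(n):
    """Rows like the slide: instruction i enters IF in cycle T(i)."""
    width = pipeline_cycles(n)
    rows = []
    for i in range(n):
        cells = [""] * width
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(f"i{i + 1:<3}" + "".join(c.ljust(4) for c in cells).rstrip())
    return rows


for row in pipeline_diagram(5):
    print(row)
print(sequential_cycles(5), pipeline_cycles(5))   # 25 9
print(pipeline_cycles(100) / 100)                 # 1.04, i.e. CPI close to 1
```

CPI = 1 is reached only as n grows. For a short sequence, the k - 1 cycles needed to fill the pipeline are still visible.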
20
Superscalar and superpipeline architectures

Superscalar
Multiple pipelines
2 instructions are fetched every clock cycle
CPI = 1/2
Superpipeline
Phases require only half a clock period
CPI = 1/2

Superscalar:
           T1  T2  T3  T4  T5  T6
instr. i   IF  ID  Ex  M   Wb
instr. i+1 IF  ID  Ex  M   Wb
instr. i+2     IF  ID  Ex  M   Wb
instr. i+3     IF  ID  Ex  M   Wb
Superpipeline: each phase takes half a clock period, so a new instruction (i, i+1, i+2, i+3) starts every half period
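Both approaches reach CPI = 1/2 only for long instruction streams. The Python sketch below assumes a 5-phase pipeline without hazards, with times measured in clock periods:

- A superscalar machine of width 2 starts two instructions per clock period.
- A superpipeline of factor 2 starts one instruction per half period.

```python
import math
from fractions import Fraction

K = 5  # phases: IF, ID, Ex, M, Wb


def scalar_time(n, k=K):
    """Classic pipeline: one instruction starts per clock period."""
    return Fraction(k + n - 1)


def superscalar_time(n, k=K, width=2):
    """`width` pipelines: a group of `width` instructions starts per clock period."""
    return Fraction(k + math.ceil(n / width) - 1)


def superpipeline_time(n, k=K, factor=2):
    """Each phase lasts 1/factor of a clock period; one instruction starts per phase."""
    return Fraction(k + n - 1, factor)


for n in (4, 1000):
    print(n, scalar_time(n), superscalar_time(n), superpipeline_time(n))
# 4 8 6 4
# 1000 1004 504 502   -> CPI of about 0.5 for both long-stream cases
```

For four instructions the superscalar version finishes in T6, as in the diagram. The two designs differ in which resource they multiply: superscalar duplicates the pipelines, while superpipeline splits each phase.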
21
Pipelined MIPS architecture
22
Pipeline architecture

There is no free lunch!
Hazard cases
Data hazard
Data dependency between consecutive instructions
Control hazard
Jump/branch instructions change the normal (sequential) order of instruction execution
Structural hazard
Instructions in different phases use the same structural component (e.g. ALU, registers, memory, bus, etc.)
Result: reduced speed and efficiency of the pipeline architecture

23
Hazard cases in pipeline architectures

Data hazard
Data hazard types
RAW - read after write
Occurs very often; avoided through forwarding (see Common data bus)
WAR - write after read
Rare in a classic pipeline, more frequent in superscalar pipelines
WAW - write after write
RAR - read after read: not a hazard
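The hazard types can be expressed as a check on the registers that two instructions write and read. In this Python sketch, an instruction is a hypothetical (destination, set of sources) pair, and `first` precedes `second` in program order.

```python
def data_hazards(first, second):
    """Return the hazard types between two instructions.

    Each instruction is a (destination, set_of_sources) pair; `first`
    comes before `second` in program order. RAR is never reported.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 is not None and dest1 in srcs2:
        found.add("RAW")   # second reads a register that first writes
    if dest2 is not None and dest2 in srcs1:
        found.add("WAR")   # second writes a register that first reads
    if dest1 is not None and dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found


i1 = ("$s1", {"$s2", "$s3"})   # add $s1, $s2, $s3
i2 = ("$s4", {"$s1", "$s5"})   # add $s4, $s1, $s5
i3 = ("$s2", {"$s6", "$s7"})   # add $s2, $s6, $s7
print(data_hazards(i1, i2))   # {'RAW'}
print(data_hazards(i1, i3))   # {'WAR'}
```

Only RAW reflects a real data flow. WAR and WAW come from reusing a register name.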

24
Hazard cases in pipeline architectures

Data hazard (cont.)
Solutions
Detection and stall phases
the instruction with an unsolved data dependency waits in the instruction fetch stage until the data is available
the next instructions are also stalled
Register renaming
multiple copies of a register (see the alias registers of the Pentium Pro)
instructions with no logical dependency between them can get different copies of the same register
avoids artificial data dependencies caused by the limited number of internal registers
Forwarding (see Common data bus)
transfer a result in advance, before it is written to its final place (register or memory location)
Out-of-order execution
speculative execution (see the Pentium Pro architecture)
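Register renaming can be sketched in Python by giving every write a fresh physical register, so that only true (RAW) dependencies survive. Two simplifying assumptions apply: the physical register names p0, p1, ... are hypothetical, and the supply of copies is unlimited, whereas real hardware has only a finite number of copies.

```python
from itertools import count


def rename(program):
    """Map (dest, [sources]) over architectural registers to physical registers."""
    fresh = count()
    mapping = {}   # architectural register -> its current physical copy

    def current(reg):
        # a register read before any write gets an initial physical copy
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for dest, sources in program:
        srcs = [current(r) for r in sources]   # read with the old mapping
        mapping[dest] = f"p{next(fresh)}"      # a new copy for every result
        renamed.append((mapping[dest], srcs))
    return renamed


program = [
    ("r1", ["r2", "r3"]),   # i1: r1 = r2 + r3
    ("r4", ["r1", "r5"]),   # i2: r4 = r1 + r5  (RAW on r1, a true dependency)
    ("r1", ["r6", "r7"]),   # i3: r1 = r6 + r7  (WAW with i1, WAR with i2)
]
print(rename(program))
# [('p2', ['p0', 'p1']), ('p4', ['p2', 'p3']), ('p7', ['p5', 'p6'])]
```

After renaming, i3 writes p7 while i1 and i2 use p2. The artificial WAR and WAW dependencies disappear, so i3 could execute before i2.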

25
Hazard cases in pipeline architectures

Structural hazard
Solutions
Detection and stall phases
Redundant functional units: see Pentium processors
Harvard memory organization: separate code and data memory, see microcontrollers
Multiple buses: see DSPs
Out-of-order execution

26
Hazard cases in pipeline architectures

Control hazard
Solutions
Stall phases
Branch prediction
Out-of-order execution
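Branch prediction comes in many variants. One common scheme is a 2-bit saturating counter per branch, sketched here in Python. The loop-like outcome sequence is made up for illustration.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # move one step toward the actual outcome, saturating at 0 and 3
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)


# a loop branch: taken 9 times, then not taken at loop exit, executed twice
outcomes = ([True] * 9 + [False]) * 2
print(accuracy(outcomes, TwoBitPredictor()))  # 0.8
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2. As a result, the first branch of the next loop run is still predicted correctly.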

27
Pipeline architecture hazard cases

Solving hazard cases
Detect hazard cases and introduce stall phases
Rearrange instructions
rearrange instructions in order to reduce the dependencies between consecutive instructions
Methods
Static scheduling: made before program execution; optimization made by the compiler or the user
Dynamic scheduling: made during program execution; optimization made by the processor; out-of-order execution
Branch prediction techniques
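The classic a = b + c; d = e + f sequence shows the effect of rearranging instructions. This Python sketch rests on the following assumptions:

- The machine is a 5-stage pipeline with forwarding.
- The only remaining stall is one cycle, when an instruction uses the result of the load right before it.
- Instructions are hypothetical (op, destination, sources) tuples.
- The base registers of lw/sw are omitted.

```python
def load_use_stalls(program):
    """Count 1 stall each time an instruction reads the result of the lw just before it."""
    stalls = 0
    for (op, dest, _), (_, _, sources) in zip(program, program[1:]):
        if op == "lw" and dest in sources:
            stalls += 1
    return stalls


# a = b + c; d = e + f  as (op, destination, sources); lw/sw base registers omitted
original = [
    ("lw", "r1", ()),              # r1 = b
    ("lw", "r2", ()),              # r2 = c
    ("add", "r3", ("r1", "r2")),   # needs r2 from the previous lw -> stall
    ("sw", None, ("r3",)),         # a = r3
    ("lw", "r4", ()),              # r4 = e
    ("lw", "r5", ()),              # r5 = f
    ("add", "r6", ("r4", "r5")),   # needs r5 from the previous lw -> stall
    ("sw", None, ("r6",)),         # d = r6
]
# same instructions, loads moved earlier (compiler-style static scheduling)
reordered = [original[i] for i in (0, 1, 4, 2, 5, 3, 6, 7)]
print(load_use_stalls(original), load_use_stalls(reordered))  # 2 0
```

The reordered sequence keeps every true dependency: each add still follows the loads it needs, and each sw follows its add. Only the distance between a load and its first use grows.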

28
Static vs. dynamic scheduling

Static scheduling
The optimal order of instructions is established by the compiler, based on information about the structure of the pipeline
Advantage: it is done once and benefits every execution of the code
Drawback: the compiler must know the structure of the hardware (e.g. pipeline stages, phases of every instruction); the compiler must be changed when the processor version changes
Dynamic scheduling
The hardware can reorder instructions to avoid or reduce the effect of hazard cases
Advantage: the processor knows its own structure best; optimization can be matched more closely to the hardware; some dependencies are revealed only at run-time
Drawbacks: reordering decisions are made every time the code is executed; more complex hardware is needed
is needed