Lecture 4 Instruction Set Architecture

1
Lecture 4: Instruction Set Architecture
2
Instruction Set Architecture

1950s to 1960s Computer Architecture Course: Computer Arithmetic
1970 to mid 1980s Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
1990s Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors

3
Languages of Computers

Machine Language
Programs consist of machine instructions
Directly executable without preprocessing
Direct manipulation of machine registers
Efficient in view of machine resource utilization
Difficult to program
Assembly language
Improved version of machine language with
emphasis on user-friendliness
Symbolic machine language(symbols for operations
and addresses)
Assembler is needed to translate into a machine
language program
High-Level Language
Programs consist of statements, each of which can
be translated into several machine language
instructions
Need a compiler to translate into a machine
language program
Relatively easy to program compared to ML or AL
Hardware resource utilization may be inefficient

4
Semantic Gap Between ML and HLL

As Hardware cost goes down, Software cost goes up

Shortage of programmers
Unreliable Software => Unreliable Computers
Response: Keep the programming cost down
Develop powerful, complex user-friendly HLL
HLL programmers are easy to train
Greater Semantic Gap between HLL and Machine
Language
Execution inefficiency
Software complexity
Compiler complexity
To offset the semantic gap
Large instruction set
Variety of addressing modes
Hardware/Firmware implementation of HLL
primitives

5
Instruction Set

Boundary between designers (architects) and programmers
For designers: Specification of the function of the CPU
For programmers: A pool of functions from which they choose to use in the program

One would expect that human language should
directly reflect the characteristics of human
intellectual capabilities that language should be
a direct mirror of mind in ways which other
systems of knowledge and belief cannot. - Noam
Chomsky

Instruction Set
Language of a machine
Characterizes the machine's capability and
behavior
Performance Issues
Memory Bandwidth (MB) is used 1/2 for Instructions and
1/2 for Data
For efficient utilization of MB, instruction
representation must be as compact as possible whilst
still being compatible with data
von Neumann Bottleneck exists in MB

6
Memory Bandwidth Issue

Memory Bandwidth is used by CPU and I/O

Memory Bandwidth given to CPU is used for
Instruction Fetches and Operand Fetches or
Operand Stores
Consider an AC machine: ADD X, or LDA X
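
A rough way to see how instruction fetches and operand accesses share the memory bandwidth is to count both for a short AC-machine program. The sketch below assumes one-word instructions and a hypothetical table (TOUCHES_MEMORY) saying which mnemonics reference memory; neither assumption comes from the slides.

```python
# Assumption: every instruction costs one memory access to fetch, and a
# memory-reference instruction costs one more to read or write its operand.
TOUCHES_MEMORY = {"LDA": True, "ADD": True, "STA": True, "CLA": False}


def count_memory_accesses(program):
    """Return (instruction_fetches, operand_accesses) for a list of (op, addr)."""
    instruction_fetches = len(program)
    operand_accesses = sum(1 for op, _ in program if TOUCHES_MEMORY[op])
    return instruction_fetches, operand_accesses


program = [("LDA", "X"), ("ADD", "Y"), ("STA", "Z")]
fetches, operands = count_memory_accesses(program)
print(f"instruction fetches: {fetches}, operand accesses: {operands}")
```

With only memory-reference instructions such as ADD X or LDA X, the two counts come out equal, which matches the 1/2 instructions and 1/2 data split mentioned earlier.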

7
Machine Language

Machine Language
Vocabulary
Operations
Addressing Modes for operand addresses and the
next instruction address
Syntax
Methods of representing operation(OP-code),
operands, addresses in an instruction
Instruction format
Encoding of Instruction fields
Grammar
Rules of using instructions to make a program

8
Components of an Instruction

Operation Code(OP-code)
Format specifier
Long / Short
Field definition
Operation
Types of operands
Operand Address(es)
Operand itself
Addresses themselves (including abbreviated)
Address modification specification
Automatic indexing
Relative address
Sequencing

9
Instruction Set and Computer Architecture

Computer Architectures are classified into three
classes according to the Register Structures used for
operand storage

Stack Computer Architecture

AC Computer Architecture

General Purpose Register Computer Architecture

10
Stack Computer Architecture
Instruction                  Operation
PUSH X                       if F = 1, then S overflow
POP X                        if E = 1, then empty S
Unary Instr. (Shift Left)    if E = 1, then empty S
Binary Instr. (ADD)          if E = 1, then empty S
                             if SP(n-1),
11
Characteristics of the Stack Architecture

Instruction length is short
No need to represent the address(es) of
operand(s) in functional instructions
Instruction execution time is fast
Operand(s) access is fast because they are in the
stack(register)
Operand(s) must be stored in the stack before
operating on them
Inconvenient to prepare data in the stack
Frequent use of PUSH and POP instructions to
prepare data in the stack - memory access
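
The characteristics above can be illustrated with a toy stack machine. This is only a sketch: the class name, the stack depth of 4, and the dictionary standing in for memory are assumptions made for the example, not part of the lecture.

```python
class StackMachine:
    """Toy stack machine: operands must be pushed before they can be used."""

    def __init__(self, memory, depth=4):
        self.memory = memory          # name -> value, stands in for main memory
        self.stack = []               # the operand stack (registers)
        self.depth = depth            # assumed stack capacity
        self.memory_accesses = 0      # operand traffic caused by PUSH/POP

    def push(self, name):
        if len(self.stack) == self.depth:   # F = 1: stack is full
            raise OverflowError("stack overflow")
        self.stack.append(self.memory[name])
        self.memory_accesses += 1

    def pop(self, name):
        if not self.stack:                  # E = 1: stack is empty
            raise IndexError("stack empty")
        self.memory[name] = self.stack.pop()
        self.memory_accesses += 1

    def add(self):
        # Functional instruction: no operand address is needed.
        if len(self.stack) < 2:
            raise IndexError("ADD needs two operands on the stack")
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)


# A = B + C takes four instructions, three of which access memory.
memory = {"A": 0, "B": 2, "C": 3}
machine = StackMachine(memory)
machine.push("B")
machine.push("C")
machine.add()
machine.pop("A")
print(memory["A"], machine.memory_accesses)
```

The ADD itself carries no address, but the surrounding PUSH and POP instructions do, which is the inconvenience the slide points out.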

12
AC Computer Architecture
(CPA)
(ADD X)
Transfer Instruction

Characteristics
- Instruction execution time of binary instructions is slow
One of the operands must be read from memory
- Instruction length is longer than in the stack architecture
One operand's memory address must be specified in the instruction,
although AC (a data register) can be implied
- Frequency of LDA/STA instructions is high
There is only one data register

13
GPR Computer Architecture
Unary Instruction
Binary Instruction
Transfer Instruction
Characteristics
- Instruction length is short because register addresses are used for operands
- Instruction execution time is fast because all the operands are in the registers
- Frequency of using LD/ST instructions depends on the number of registers
- Opportunities for storing the results of operations in GPRs are high because there are many registers
14
Computer Architecture?

. . . the attributes of a computing system as
seen by the programmer, i.e. the conceptual
structure and functional behavior, as distinct
from the organization of the data flows and
controls, the logic design, and the physical
implementation.
Amdahl, Blaauw, and Brooks, 1964

SOFTWARE
15
Towards Evaluation of ISA and Organization
16
Interface Design

A Good Interface
Lasts through many implementations (portability,
compatibility)
Is used in many different ways (generality)
Provides convenient functionality to higher
levels
Permits an efficient implementation at lower
levels

17
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
18
Evolution of Instruction Sets

Major advances in computer architecture are
typically associated with landmark instruction
set designs
Ex: Stack (B1700) vs. GPR (System/360)
Design decisions must take into account
technology(component)
machine organization
programming languages
compiler technology
operating systems
And they in turn influence these

19
Design Space of ISA

Five Primary Dimensions
Number of explicit operands (0, 1, 2, 3)
Operand Storage: Where besides memory?
Effective Address: How is memory location specified?
Type and Size of Operands: byte, int, float, vector, . . . How is it specified?
Operations: add, sub, mul, . . . How is it specified?
Other Aspects
Successor: How is it specified?
Conditions: How are they determined?
Encoding: Fixed or variable? Wide?
Parallelism

20
Number of Explicit Operands

To optimize the memory bandwidth required by
instructions(for fetching from Memory), the
number of explicitly specified operands in the
instruction needs to be reduced
2 operands(GPR machine)
2 source operands(1 of the source operands is
destroyed after execution to store the result)
1 operand(AC machine)
1 of the operands is implied to a specific
hardware register called Accumulator(AC)(result
of the execution is also stored in this register)
0 operand(Stack machine)
Both of the operands and the result are implied
to a stack
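
To make the three cases concrete, the sketch below writes C = A + B for each machine class and counts the explicit operands of the ADD instruction. The instruction sequences are illustrative assumptions that reuse the LD/ST, LDA/STA and PUSH/POP mnemonics appearing elsewhere in these slides.

```python
# Hypothetical instruction sequences for C = A + B on each machine class.
PROGRAMS = {
    "GPR (2 operands)": ["LD R1, A", "ADD R1, B", "ST R1, C"],
    "AC (1 operand)": ["LDA A", "ADD B", "STA C"],
    "Stack (0 operands)": ["PUSH A", "PUSH B", "ADD", "POP C"],
}


def explicit_operands(instruction):
    """Count the comma-separated operands written after the mnemonic."""
    _, _, operand_text = instruction.partition(" ")
    if not operand_text.strip():
        return 0
    return len(operand_text.split(","))


def operands_of_add(program):
    """Explicit operand count of the ADD instruction in a program."""
    add = next(i for i in program if i.split()[0] == "ADD")
    return explicit_operands(add)


# (instruction count, explicit operands in ADD) per machine class
summary = {name: (len(p), operands_of_add(p)) for name, p in PROGRAMS.items()}
for name, (count, operands) in summary.items():
    print(f"{name}: {count} instructions, ADD names {operands} operand(s)")
```

Fewer explicit operands shorten each functional instruction, but the stack version needs one more instruction overall to move data in and out of the stack.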

21
Operand Storage

Storage
Memory
- Long memory addressing
- Need to represent the address with a few bits
Relative addressing with displacement
Page/Segment addressing
Register
- General purpose register
Short register addressing
- AC
Stack(register)
- No need for addresses

22
Address Space and Storage Space

Address Space
Consists of addresses that programmers can use
Storage Space
Consists of physical storage locations
For a simple low cost machine, the Address Space
and the Storage Space are identical
Programmers program with the actual storage
addresses
Modern computers provide the storage systems with
Independent Address and Storage Spaces
An Effective Address(EA) needs to be obtained
from the Address used in the program to access
the operand from the memory
Usually the Address Space is much larger than the
Storage Space
Virtual Storage System

23
Effective Address

Address and Physical Storage Location are two
different concepts.
Addresses of Operands are represented or
implied in the instruction.
An operand's address needs to be mapped into an
Effective Address of the physical
storage location

Basic Addressing Modes (A or R in instructions)
Mode          Operand / EA    Advantage           Disadvantage
Immediate     opd = A         no M refer          limited value
Direct        EA = A          simple              limited addr space
Indirect      EA = M[A]       large addr space    multiple M refer
Register      EA = R          no M refer          limited addr space
R Indirect    EA = [R]        large addr space    extra M refer
Displacement  EA = A + [R]    flexibility         complexity
Stack         opd = S[TOP]    no M refer          limited applications
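
The basic addressing modes can be sketched as a single operand-fetch function. This is an illustrative model only: the function name, the dictionaries standing in for memory and registers, and the sample contents are assumptions, and the stack mode is left out.

```python
def fetch_operand(mode, memory, registers, a=None, r=None):
    """Return the operand selected by one of the basic addressing modes."""
    if mode == "immediate":
        return a                          # opd = A, no memory reference
    if mode == "register":
        return registers[r]               # operand is in register R
    if mode == "direct":
        ea = a                            # EA = A
    elif mode == "indirect":
        ea = memory[a]                    # EA = M[A], one extra memory reference
    elif mode == "register_indirect":
        ea = registers[r]                 # EA = contents of R
    elif mode == "displacement":
        ea = a + registers[r]             # EA = A + contents of R
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return memory[ea]


# Sample (made-up) machine state.
memory = {10: 20, 20: 99, 24: 7}
registers = {"R1": 20, "R2": 4}

print(fetch_operand("immediate", memory, registers, a=10))
print(fetch_operand("direct", memory, registers, a=10))
print(fetch_operand("indirect", memory, registers, a=10))
print(fetch_operand("displacement", memory, registers, a=20, r="R2"))
```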
24
Specification of Type and Size of Operand

Specification of the Type of the operand
Usually different op-codes for different types of
operands
Specification of the Size of the operand
op-code represents the resolution of the operand
address
bit, byte, half word(upper/lower half) , word,
...
Length of operands
Implicit
Variable length
Specified explicitly in the instruction
Specified by a designated register
Specified by the delimiter marks in the operand
reserved-bit delimiter(field or word mark)
reserved-bit configuration(record or group mark)

25
(No Transcript)
26
Operation

Specification
Encoded to reduce the instruction length reason
Types
Minimal Instruction Set
Complex Instruction Set vs RISC

27
Four Types of Operations

Functional
ADD, AND, CPA, CPC, ROL, CLA, CLC, INC,
Transfer
LDA, STA(LD, ST),
Control
JMP, JNA, JZA, JZC(SMA, SZA, SZC),
Input/Output
INP, OUT,

28
Minimal Instruction Set
29
Why NOT Use a Minimal Instruction Set?
Inefficient
- Program size (M bandwidth)
- Large IC and CPI
Programming difficulty
30
Instruction Set Design: Operations to Include in
the Instruction Set

Trade-off: 3 E's (Elegance, Efficiency, Environment)
Elegance
Completeness (Even Bn instruction is complete)
Symmetry: AC <- f(AC, M[X]) and M[X] <- f(AC, M[X])
Flexibility, Generality
Efficiency
Space
Bit budget
Efficient specification of address
Fewer instructions require fewer bits to encode
OP-code
Frequency of use arguments
Bandwidth arguments(NOP simply waste memory
bandwidth)
Ratio of overheads non-functional to
functional
Environment
Multiprogramming(Relocation, Protection, Sharing)
Code generation by compilers (compilers favor only
a small portion of the instruction set)

31
ISA Metrics

Aesthetics
Orthogonal
No special registers, few special cases, all
operand modes available with any data type or
instruction type
Completeness
Support for a wide range of operations and target
applications
Regularity
No overloading for the meanings of instruction
fields
Streamlined
Resource needs easily determined
Ease of compilation (programming?)
Ease of implementation
Scalability

32
Powerful Instruction
Overhead for Execution(O)
(E)

Rich, Powerful Instruction
Instruction with longer Execution Time(E) to
balance the overhead penalty(O)
Instruction which has a large E/O

33
Powerful Instructions

Extended Arithmetic Function
Multiply, divide, Trigonometric Functions, etc
Automatic Indexing
BCT R1, addr (R1 <- R1 - 1; if R1 ≠ 0 then PC <- addr)
BXLE R1, R3, addr (R1 <- R1 + R3;
if R3 odd and R1 <= R3, PC <- addr;
if R3 even and R1 <= R3+1, PC <- addr)
Subroutine Linkage
JMS X (M[X] <- PC, PC <- X+1)
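
As a small sketch of automatic indexing, the function below models a branch-on-count instruction: it decrements the register and branches while the result is non-zero. The two-address program layout (loop body at address 0, BCT at address 1) and the register dictionary are assumptions made for the example.

```python
def bct(registers, reg, pc, target):
    """Branch on count: decrement reg, branch to target if the result is non-zero."""
    registers[reg] -= 1
    return target if registers[reg] != 0 else pc + 1


# Hypothetical two-instruction loop: address 0 is the body, address 1 is BCT R1, 0.
registers = {"R1": 3}
pc = 0
body_runs = 0
while pc < 2:
    if pc == 0:
        body_runs += 1        # the loop body executes once per pass
        pc = 1
    else:
        pc = bct(registers, "R1", pc, target=0)

print(body_runs, registers["R1"])
```

One instruction does the decrement, the test, and the branch, so the loop-control overhead per iteration is a single instruction fetch.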

34
Powerful Instructions

Process State Exchange(Context Switch)
Instructions required in the multiprogramming
environments

Otherwise:
LD R1, addr
LD R2, addr+1
. . .
LD R5, addr+4
XJ (Exchange Jump of CDC 6000 series)
35
Basic ISA Classes: Type of Internal Storage
36
Stack Machines

Instruction set
Arithmetic operators (+, -, *, /, . . .)
push A, pop A

37
The Case Against Stacks

Performance is derived from the existence of
several fast registers, not from the way they are
organized
Data does not always surface when needed
Constants, repeated operands, common
sub-expressions
so TOP and Swap instructions are required
Code density is about equal to that of GPR
instruction sets
Registers have short addresses
Keep things in registers and reuse them
Slightly simpler to write a poor compiler, but
not an optimizing compiler

38
(No Transcript)
39
GPR Machines
GPR(General Purpose Register)

Faster than memory
Easier for a compiler to use
Used to hold variables, intermediate operands
the memory traffic reduces
the code density improves
How many registers?
depends on how they are used by the compiler

40
How Many Registers in RF
6 algorithms from CALGO (ACM) written in 4
languages: ALGOL, BASIC, BLISS, FORTRAN
We need to try to keep the live registers in the
RF
41
GPR Machines

Maximum number of operands(O)
two or three operands
Number of memory addresses(M)
0,1,2,3

42
GPR Machines
Type: Register-register (0,3)
Advantages: Simple, fixed-length instr. encoding. Simple code generation model.
Disadvantages: Higher instruction count. Some instructions are short and bit encoding may be wasteful.

Type: Register-memory (1,2)
Advantages: Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density.
Disadvantages: A source operand is destroyed. Clocks per instruction varies by operand location.

Type: Memory-memory (3,3)
Advantages: Program becomes most compact. No waste of registers for temporaries.
Disadvantages: Large variation in instruction sizes and in work per instruction. Memory accesses create memory bottleneck.
43
R-R vs RM
A + B + C
RR instructions:
LD R1,A
LD R2,B
LD R3,C
ADD R4,R1,R2
ADD R5,R4,R3
RM instructions:
LD R1,A
ADD R1,B
ADD R1,C
RM instructions reduce IC
44
What About Actual Programs

Consider a GPR machine with a large register
file.
- Highly probable that the intermediate data can
be found in a register
- Thus, LD/ST instruction will be used less
frequently
- However, the frequency of LD/ST instructions
in computers that use RM instructions will
be reduced further

45
VAX-11
Variable format, 2- and 3-address instructions

32-bit word size, 16 GPR (4 reserved)
Rich set of addressing modes (apply to any
operand)
Rich set of operations
bit field, stack, call, case, loop, string,
poly, system
Rich set of data types (B, W, L, Q, O, F, D,
G, H)
Condition codes

46
Kinds of Addressing Modes
Addressing Mode        Operand is the value in

Register direct        Ri
Immediate (literal)    v
Direct (absolute)      M[v]
Register indirect      M[Ri]
Base+Displacement      M[Ri + v]
Base+Index             M[Ri + Rj]
Scaled Index           M[Ri + Rj*d + v], e.g. d = 8
Autoincrement          M[Ri], then Ri <- Ri + 1
Autodecrement          Ri <- Ri - 1, then M[Ri]
Memory Indirect        M[M[Ri]]
Indirection Chains

47
Memory Addressing Modes (VAX)
48
Operand Address Bits: Displacement Values

This value is related to the operand address
field when the address is represented by the
displacement from the base address
Wide distribution
The vast majority: positive
A majority of the large displacements: negative

49
Operand Address Bits: Immediate Addressing Mode
Percentage of operations that use immediates
50
Operand Address Bits: Immediate Addressing Mode
51
Operations in the Instr. Set
Operator type            Examples

Arithmetic and logical   Add, Subtract
Data transfer            Load, Store, Move
Control                  Branch, Jump, Procedure Call, Return, Trap
System                   Operating System Call, VMM instructions
Floating Point           Floating Point Add
Decimal                  Decimal Add, Decimal-to-Character Conversion
String                   String Move, String Compare, String Search
Graphics                 Pixel operations, Compress/Decompress op.
52
Operations in the Instr. Set
Rank    80x86 instruction     Integer average (% total executed)
1       load                  22
2       conditional branch    20
3       compare               16
4       store                 12
5       add                   8
6       and                   6
7       sub                   5
8       move reg-reg          4
9       call                  1
10      return                1
Total                         96
53
Control Flow Instructions
54
(No Transcript)
55
RISC
56
Instruction Execution Characteristics: Type of
Operations
Relative Dynamic Frequencies of statements in HLL
programs

What type of statements is most frequent?
Assignment statements dominate
Functional instructions and Transfer
instructions
Movements of data must be made simple, thus fast
Conditional Statements(if and loop together)
Instructions with Control function
Sequence control mechanism is important

57
Instruction Execution Characteristics: Time Consumed by Statements
Machine instruction weighted: Average No. of machine instr. per statement x Frequency of Occurrence
Memory reference weighted: Average No. of memory references per statement x Frequency of Occurrence
Most time-consuming statement is procedure CALL/RETURN
58
Instruction Execution Characteristics: Type of Operands

Majority of references are to scalars
80% are local to a procedure
References to arrays/structure require index or
pointer

Locations of operands(Average per instruction)
0.5 operands in memory
1.4 operands in registers

59
Instruction Execution Characteristics: Procedure
Calls

Two most significant aspects in implementing
procedure Call/Returns
Number of parameters
Depth of nesting
Statistics on Number of Parameters
98% of dynamically called procedures were passed
fewer than 6 parameters
92% of them used fewer than 6 local scalar
variables

60
Multiple Register Sets
Multiple register sets
- Assume that we have several sets of registers, so that each set can be used by a different procedure
- Saves some time in procedure CALL/RETURN simply by changing the R set pointer value
61
Instruction Execution Characteristics: Depth of Procedure Nesting
Procedure Nesting and Register Set Window
[Figure: procedure nesting depth plotted against time t]
Shifting the register set window: the information in one register set must be saved to memory so that the register set can be used by the new procedure
Statistics: A window depth of 8 will need to shift on less than 1% of calls and returns
62
RISC Philosophy (1): Make the Most Frequent Statements Execute Fast
Most frequent statements are Assignment-type statements, and each of them is translated by the
compiler into a set of Functional Instructions
and/or Transfer Instructions. Thus Functional and
Transfer Instructions need to be made to execute
fast.
Instruction Cycle of Functional Instruction or
Transfer Instruction
63
Assignment Statements

To make the Instruction Fetch fast
Short OP-code part: Small number of instructions
in the instruction set
Short Operand Address part: Keep the operands in
registers instead of M
To make the Instruction Preparation fast
Fixed length instruction
Fixed format instruction
Simple addressing modes
To make the Operand Fetch fast
Make the operands available from registers
instead of memory
Needs a large register file
To make the Instruction Execution fast
Multiple register set Overlapping MRS
Instruction execution pipeline

64
RISC Philosophy (2): Make the Most Time-Consuming
Statements Execute Fast

Methods of passing Parameters
Through memory
Parameters are stored in the memory locations
which are commonly accessible by both calling
and called procedures
Execution of CALL and RETURN instructions is
very slow due to the memory accesses, especially
when there are many parameters to pass
Through registers
Parameters are stored in the registers in CPU
Calling procedure needs to save the registers,
which are not used for passing parameters, in
the memory. This results in a lot of memory
accesses and makes the execution times of these
instructions slow.

65
CISC and RISC

RISC
A limited and simple instruction set
A large number of GPR(Register File)
An emphasis on optimizing the instruction
pipeline

66
Large Register File
Quick access to operands is desirable
- Assignment Statements rely on Functional and Transfer Instructions
- Functional Instructions heavily rely on registers
- Frequency of Transfer Instructions depends on the number of registers in the register file
If the number of registers is small, a strategy is needed to keep the most frequently accessed operands in registers to minimize Register-Memory traffic
- Software approach: Maximize register usage by compiler (requires sophisticated program analysis)
- Hardware approach: More registers in the register file
67
Register Window

Fact
Statistically, most operand references (80%) are to
local scalars
Variables local to a procedure cannot be accessed
by other procedure(s)
Problem
Local changes with each procedure CALL/RETURN
CALL/RETURN occurs frequently
Parameters need to be passed around
Observations
Statistically, there are only a few parameters (< 6) and local
variables (< 6)
Statistically, depth of procedure activation
fluctuates within a relatively narrow range (< 8)
Solution
Multiple small sets of registers
Each set is assigned to a different procedure
Windows for adjacent procedures overlap to allow
parameter passing
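
The overlap idea in the Solution above can be sketched as a circular register file in which the "out" registers of the caller are physically the "in" registers of the callee. The class name and the sizes used here (8 windows, 6 overlapping registers, 10 locals) are assumptions for illustration; window overflow (spilling to memory) is deliberately not modeled.

```python
class WindowedRegisterFile:
    """Circular register file with overlapping windows (no overflow handling)."""

    def __init__(self, windows=8, overlap=6, local=10):
        self.windows = windows
        self.overlap = overlap            # registers shared with the neighbor
        self.local = local                # registers private to one procedure
        self.stride = overlap + local     # distance between window bases
        self.registers = [0] * (windows * self.stride)
        self.cwp = 0                      # current window pointer

    def _index(self, kind, n):
        limit = self.local if kind == "local" else self.overlap
        if not 0 <= n < limit:
            raise IndexError("register number out of range")
        # "out" registers start where the next window's "in" registers start.
        offset = {"in": 0, "local": self.overlap, "out": self.stride}[kind] + n
        return (self.cwp * self.stride + offset) % len(self.registers)

    def read(self, kind, n):
        return self.registers[self._index(kind, n)]

    def write(self, kind, n, value):
        self.registers[self._index(kind, n)] = value

    def call(self):
        self.cwp = (self.cwp + 1) % self.windows   # only the pointer changes

    def ret(self):
        self.cwp = (self.cwp - 1) % self.windows


rf = WindowedRegisterFile()
rf.write("local", 0, 7)       # caller's private value
rf.write("out", 0, 42)        # parameter placed in the overlap area
rf.call()
parameter = rf.read("in", 0)  # callee sees the parameter without any copy
rf.write("local", 0, 99)      # callee's locals do not disturb the caller
rf.write("in", 0, parameter + 1)   # result returned through the same overlap
rf.ret()
print(rf.read("local", 0), rf.read("out", 0))
```

CALL and RETURN change only the window pointer, so no register contents are moved as long as the nesting depth stays within the available windows.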

68
Multiple Register Set
Each Register Set is assigned to a different procedure
- Size of a Register Set is equal to the size of a window
- Parameters need to be copied into the called/calling procedure's Register Set; however, there is no need to copy all the registers from the switched-off register set
- Requires register move instructions
69
Overlapping Register Window
When multiple Register Sets are implemented in a large Register File, each Register Set is called a Register Window. Multiple register sets still require copying the parameter values between register sets.
Overlapping Register Window
- Portions of register windows overlap for passing parameters
- At any time only one window is visible
- No need for moving information for parameter passing
How about global variables?
70
Global Variables

Global Variables are commonly accessible by all
the procedures
Assigned to memory locations by the compiler
Straightforward but inefficient for the
frequently accessed global variables because of
frequent memory accesses
Set aside a set of Global Variable registers
Available to all procedures
Unified register numbering system to simplify
instruction format
e.g. R0 - R7: Global
R8 - R13: Current window

71
Linear Organization of Register Windows
72
(No Transcript)
73
(No Transcript)
74
(No Transcript)
75
Code Size

Smaller programs
Program takes less memory space
Smaller program improves performance
Fewer instructions
Fewer bytes to fetch
In a paging environment, occupies fewer pages and
reduces page faults
CISC
Smaller number of instructions in the
program(program may be shorter but not
necessarily smaller space)

76
Example
CISC
Memory Traffic: Instruction 56 bits, Data 32 x 3 = 96 bits, Total MB used 56 + 96 = 152 bits
RISC
LD Rb, B
LD Rc, C
ADD Ra, Rb, Rc
ST Ra, A
Memory Traffic: Instruction 112 bits, Data 96 bits, Total MB used 112 + 96 = 208 bits
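
The traffic figures in this example are simple sums, and a short calculation makes the comparison explicit. The sketch takes the instruction sizes quoted on the slide (56 and 112 bits) as given and assumes three 32-bit data words in both cases; the function name is made up for the example. Summing the stated RISC components gives 208 bits.

```python
WORD_BITS = 32  # from "Data 32 x 3" on the slide


def memory_traffic(instruction_bits, data_words):
    """Return (instruction bits, data bits, total bits) moved over the memory bus."""
    data_bits = data_words * WORD_BITS
    return instruction_bits, data_bits, instruction_bits + data_bits


# One memory-to-memory instruction versus the four-instruction load/store sequence.
cisc = memory_traffic(instruction_bits=56, data_words=3)
risc = memory_traffic(instruction_bits=112, data_words=3)

print("CISC:", cisc)
print("RISC:", risc)
print("extra instruction traffic for RISC:", risc[0] - cisc[0], "bits")
```

The data traffic is identical; the whole difference comes from fetching more instruction bits in the load/store version.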
77
Characteristics of RISC (1, 2)

(1) 1 Instruction per cycle (memory cycle)
Machine cycle = IF + IP + time to fetch the operands from registers + perform operation + store the result in a register
RISC instruction <=> CISC micro-instruction
=> No need to microprogram (hardwired control)
(2) Register-to-Register operation
With only simple Load and Store operations for
accessing memory(Load/Store Arch.)
Simplifies the instruction set, and control unit

78
Characteristics of RISC (3, 4)

(3) Simple Addressing Modes - Shorten EA
generation time
Almost all instructions use register addressing
Relative addressing using PC, BAR, and Index
address
Other complex modes may be synthesized by software

(4) Simple Instruction Format - Shorten
instruction Decoding Time
Usually one format
Fixed length/align on word boundary
Fixed field length

79
Characteristics of RISC (5)

(5) Pipelining (We will learn this later in detail)
At this time, you just need to know that
- Instruction execution hardware can be made of a few interconnected independent sub-modules, called pipeline STAGEs
- An instruction's execution progresses through each pipeline stage in sequence
- When an instruction completes its execution at the i-th stage, the next instruction commences its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n is the number of pipeline stages
- Branch instructions make the pipelined execution inefficient
80
Pipelined Execution
[Figure: pipelined execution of 1 instruction (I0) and of a sequence of instructions I0 to I4, through stages ending at S3]
Completion times: I0 at 4t, I1 at 5t, I2 at 6t, I3 at 7t, I4 at 8t
n instructions complete at (n+3)t. When n is large, this becomes nt. Thus, 1 instruction in every t.
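
The completion times above follow a simple formula, sketched here under the assumption of an ideal four-stage pipeline (the last stage shown is S3) with no stalls and one stage time t per step. The function names are made up for the example.

```python
def completion_time(i, stages=4):
    """Time, in units of t, at which instruction Ii leaves the last stage."""
    return stages + i


def total_time(n_instructions, stages=4):
    """Time, in units of t, for n instructions to complete in an ideal pipeline."""
    return n_instructions + stages - 1


def speedup(n_instructions, stages=4):
    """Ratio of unpipelined time (stages * t per instruction) to pipelined time."""
    unpipelined = n_instructions * stages
    return unpipelined / total_time(n_instructions, stages)


print([completion_time(i) for i in range(5)])   # I0 .. I4
print(total_time(5))
print(round(speedup(1000), 2))
```

For a long instruction sequence the speedup approaches the number of stages, which is the "nearly n times" throughput gain described for pipelining.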
81
A "Typical" RISC

32-bit fixed format instruction (3 formats)
32 32-bit GPR (R0 contains zero, DP takes a pair)
3-address, R-R functional instruction
Single address mode for load/store: base +
displacement
no indirection
Simple branch conditions
Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860,
PARisc, DEC Alpha, Clipper, CDC
6600, CDC 7600, Cray-1, Cray-2, Cray-3
82
Branch Displacement
83
Implementation of Conditional Branch Instructions
Evaluating branch conditions

Name: Condition Code (CC)
How condition is tested: Test special bits set by ALU operations, possibly under program control.
Advantages: Sometimes condition is set for free; if not, 2 instrs for a branch.
Disadvantages: CC is an extra state. CCs constrain the ordering of instrs since they pass info from one instr to a branch.

Name: Condition Register
How condition is tested: Test arbitrary registers set by the result of a comparison.
Advantages: Simple. 2 instrs for a branch.
Disadvantages: Uses up a register.

Name: Compare and Branch
How condition is tested: Compare is part of the branch. Often compare is limited to subset.
Advantages: 1 instr. rather than 2 for a branch.
Disadvantages: May be too much work per instruction.
84
Putting It All Together DLX Architecture

Read Section 2.8 ---- MUST
DLX emphasizes
A simple load-store instruction set
Design for pipelining efficiency
A fixed instr. set encoding
Efficiency as a compiler target

85
Example MIPS
86
The Different Goals for VAX and MIPS

VAX - simple compilers and code density
powerful addressing modes
powerful instructions
efficient instruction encoding
few registers
MIPS - high performance via pipelining, ease of
HW implementation, compatibility with highly
optimizing compiler
simple instruction
simple addressing modes
fixed-length instruction formats
a large number of registers

87
VAX vs. MIPS
88
Fallacies and Pitfalls

Pitfall: Designing a high-level instruction set
feature specifically oriented to
supporting a high-level language structure.
Fallacy: There is such a thing as a typical
program.
Fallacy: An architecture with flaws cannot be
successful.
80x86 supports segmentation while others support paging
Extended AC for integer, while others use GPR
Stack for FP operations, while others abandoned stack
Fallacy: You can design a flawless architecture.
All architecture design involves trade-offs made
in the context of a set of HW and SW technologies.

89
Most Popular ISA of All Time: Intel 80x86

1971: Intel invents microprocessor 4004/8008, 8080 in 1975
1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades
Hired CS people in Oregon
Weren't ready in 1977 (CS people did 432 in 1980)
Started crash effort for 16-bit microcomputer
1978: 8086 dedicated registers, segmented address, 16-bit
8088: 8-bit external bus version of 8086, added as afterthought

90
Most Popular ISA of All Time: Intel 80x86

1980: IBM selects 8088 as basis for IBM PC
1980: 8087 floating point coprocessor adds 60 instructions using hybrid stack/register scheme
1982: 80286 24-bit address, protection, memory mapping
1985: 80386 32-bit address, 32-bit GP registers, paging
1989: 80486, Pentium in 1992: faster, MP, few instructions

91
80x86 Addressing/Protection
92
80x86 Instruction Format

8086 in blue 80386 extensions in red

93
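80x86 Instruction Encoding: Address Specifier Field (Mod, Reg, R/M)

reg   w=0   w=1 (16b)   w=1 (32b)
0     AL    AX          EAX
1     CL    CX          ECX
2     DL    DX          EDX
3     BL    BX          EBX
4     AH    SP          ESP
5     CH    BP          EBP
6     DH    SI          ESI
7     BH    DI          EDI

r/m   mod=0 (16b)   mod=0 (32b)   mod=1 (16b)       mod=1 (32b)       mod=2 (16b)        mod=2 (32b)        mod=3
0     addr=BX+SI    =EAX          mod=0 addr + d8   mod=0 addr + d8   mod=0 addr + d16   mod=0 addr + d32   same as reg field
1     addr=BX+DI    =ECX          mod=0 addr + d8   mod=0 addr + d8   mod=0 addr + d16   mod=0 addr + d32   same as reg field
2     addr=BP+SI    =EDX          mod=0 addr + d8   mod=0 addr + d8   mod=0 addr + d16   mod=0 addr + d32   same as reg field
3     addr=BP+DI    =EBX          mod=0 addr + d8   mod=0 addr + d8   mod=0 addr + d16   mod=0 addr + d32   same as reg field
4     addr=SI       =(sib)        SI+d8             (sib)+d8          SI+d16             (sib)+d32          same as reg field
5     addr=DI       =d32          DI+d8             EBP+d8            DI+d16             EBP+d32            same as reg field
6     addr=d16      =ESI          BP+d8             ESI+d8            BP+d16             ESI+d32            same as reg field
7     addr=BX       =EDI          BX+d8             EDI+d8            BX+d16             EDI+d32            same as reg field

Address Specifier: Reg = 3 bits, R/M = 3 bits, Mod = 2 bits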
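
The three fields of the address specifier byte can be pulled apart with shifts and masks. The sketch below relies on the standard 80x86 layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0); the function name and the sample bytes are chosen for illustration only.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte):
    """Split an address specifier byte into (mod, reg, r/m)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("address specifier must fit in one byte")
    mod = (byte >> 6) & 0b11     # 2 bits
    reg = (byte >> 3) & 0b111    # 3 bits
    rm = byte & 0b111            # 3 bits
    return mod, reg, rm


# 0xD8 = 11 011 000: mod=3 (r/m names a register), reg=EBX, r/m=EAX
mod, reg, rm = decode_modrm(0xD8)
print(mod, REG32[reg], REG32[rm])

# 0x44 = 01 000 100: mod=1 with r/m=4, so in 32-bit mode a SIB byte and d8 follow
print(decode_modrm(0x44))
```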
94
80x86 Instruction Encoding: Sc/Index/Base field
Base + Scaled Index Mode: used when mod = 0, 1, 2 in 32-bit mode and r/m = 4
2-bit Scale Field, 3-bit Index Field, 3-bit Base Field

      Index       Base
0     EAX         EAX
1     ECX         ECX
2     EDX         EDX
3     EBX         EBX
4     no index    ESP
5     EBP         if mod=0, d32; if mod<>0, EBP
6     ESI         ESI
7     EDI         EDI

95
80x86 Addressing Mode Usage for 32-bit Mode
(Five values per mode; the last column is the average of the first four)
Register indirect                  10   10    6    2    7
Base + 8-bit disp                  46   43   32    4   31
Base + 32-bit disp                  2    0   24   10    9
Indexed                             1    0    1    0    1
Based indexed + 8b disp             0    0    4    0    1
Based indexed + 32b disp            0    0    0    0    0
Base + Scaled Indexed              12   31    9    0   13
Base + Scaled Index + 8b disp       2    1    2    0    1
Base + Scaled Index + 32b disp      6    2    2   33   11
32-bit Direct                      19   12   20   51   26
96
80x86 Length Distribution
97
Instruction Counts 80x86 vs. DLX
Program     80x86             DLX               DLX / 80x86
gcc         3,771,327,742     3,892,063,460     1.03
espresso    2,216,423,413     2,801,294,286     1.26
spice       15,257,026,309    16,965,928,788    1.11
nasa7       15,603,040,963    6,118,740,321     0.39
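
The last figure on each row is the ratio of the two instruction counts, which can be checked directly. The sketch assumes the first count on each row is the 80x86 count and the second is the DLX count, following the order in the slide title.

```python
# (80x86 instruction count, DLX instruction count); column order is an assumption.
COUNTS = {
    "gcc": (3_771_327_742, 3_892_063_460),
    "espresso": (2_216_423_413, 2_801_294_286),
    "spice": (15_257_026_309, 16_965_928_788),
    "nasa7": (15_603_040_963, 6_118_740_321),
}

# Ratio of the second count to the first, rounded to two decimals.
ratios = {name: round(dlx / x86, 2) for name, (x86, dlx) in COUNTS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Three of the programs execute slightly more instructions on the second machine, while nasa7 executes far fewer.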
98
Intel Compiler vs. Compilers YOU Can Buy

66 MHz Pentium Comparison                                   SpecInt92   SpecFP92
Intel Internal Optimizing Compiler                          64.6        59.7
Best 486 Compiler (June 1993)                               57.6        39.9
Typical 486 Compiler in 1990, when Intel started project    41.0        32.5
Integer: Intel 1.1X faster, FP: 1.5X faster

486 Comparison                                              SpecInt92   SpecFP92
Intel Internal Optimizing Compiler                          35.5        17.5
Best 486 Compiler (June 1993)                               32.2        16.0
Typical 486 Compiler in 1990, when Intel started project    23.0        12.8
Integer: Intel 1.1X faster, FP: 1.1X faster

99
Intel Summary

Archeology: history of instruction design in a
single product
Address size: 16-bit vs. 32-bit
Protection: Segmentation vs. paged
Temp. storage: accumulator vs. stack vs.
registers
"Golden Handcuffs" of binary compatibility affect
design 20 years later, as Moore predicted
Not too difficult to make faster, as Intel has
shown
HP/Intel announcement of common future
instruction set by 2000 means end of 80x86???
Beauty is in the eye of the beholder
At 50M/year sold, it is a beautiful business