Chapter 2: Instruction Set Architecture



1
Chapter 2 Instruction Set Architecture

Yirng-An Chen
Dept. of CIS
Computer Architecture
Fall, 2000

2
Computer Architecture: Historical Perspective

1950s to 1960s: Computer Architecture Course: Computer Arithmetic
1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
1990s: Computer Architecture Course: Design of CPU, memory system, I/O system, multiprocessors

3
Instruction Set Architecture (ISA)
software
instruction set
hardware
4
Interface Design

A good interface
Lasts through many implementations (portability,
compatibility)
Is used in many different ways (generality)
Provides convenient functionality to higher
levels
Permits an efficient implementation at lower
levels

[Figure: a single interface persisting over time, with several uses above it and successive implementations imp 1, imp 2, imp 3 below it]
5
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
  High-level Language Based (B5000 1963)
  Concept of a Family (IBM 360 1964)
General Purpose Register Machines
  Complex Instruction Sets (Vax, Intel 432 1977-80)
  Load/Store Architecture (CDC 6600, Cray 1 1963-76)
CISC (Intel x86 1980-199x)
RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC 1987)
Mixed CISC & RISC? (IA-64 ... 1999)
6
Basic Issues in Instruction Set Design

What operations, and how many?
Load/store/increment/branch are sufficient to do any computation, but not useful (programs too long!).
How (many) operands are specified?
Most operations are dyadic (e.g., A ← B + C). Some are monadic (e.g., A ← ~B).
How to encode them into an instruction format?
Instructions should be multiples of bytes.
Typical Instruction Set
32-bit word
Basic operand addresses are 32 bits long.
Basic operands (like integers) are 32 bits long.
In general, an instruction could refer to 3 operands (A ← B + C).
Challenge: encode operations in a small number of bits.

7
What Must be Specified?

Instruction Format (encoding)
How is it decoded?
Location of operands and result
Where other than memory?
How many explicit operands?
How are memory operands located?
Data type and Size
Operations
What are supported?

8
Basic ISA Classes

Accumulator
1 address: add A (acc ← acc + Mem[A])
Stack
0 address: add (tos ← tos + second of stack)
General Purpose Register
2 addresses: add A, B (EA(A) ← EA(A) + EA(B))
3 addresses: add A, B, C (EA(A) ← EA(B) + EA(C))
Load/Store (register-register)
ALU operations: no memory reference.
3 addresses: add R1, R2, R3 (R1 ← R2 + R3)
load R1, R2 (R1 ← Mem[R2])
store R1, R2 (Mem[R1] ← R2)
Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?

9
Comparison of ISA Classes

Code sequence for C = A + B

Memory efficiency? Instruction access? Data
access?

10
Stack Machine

Instruction set: Push, Pop, +, -, *, /, etc.
Example: A*B - (A + B*C)
Push A
Push B
*
Push A
Push B
Push C
*
+
-
Drawbacks
Duplicate data accesses.
Not good for an optimizing compiler.
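
To make the zero-address model concrete, here is a minimal sketch of a toy stack machine in C. It assumes the slide's example expression reads A*B - (A + B*C); the opcode names, the struct layout, and the 16-slot stack limit are illustrative assumptions, not part of any real ISA.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy zero-address (stack) machine. */
enum op { OP_PUSH, OP_ADD, OP_SUB, OP_MUL };

struct insn {
    enum op op;
    int value;   /* operand, used only by OP_PUSH */
};

/* Run n instructions and return the value left on top of the stack.
   Assumes a well-formed program that never needs more than 16 slots. */
int run_stack_machine(const struct insn *prog, size_t n)
{
    int stack[16];
    int tos = 0;                      /* number of values on the stack */

    for (size_t k = 0; k < n; k++) {
        if (prog[k].op == OP_PUSH) {
            stack[tos++] = prog[k].value;
            continue;
        }
        int rhs = stack[--tos];       /* top of stack */
        int lhs = stack[--tos];       /* second of stack */
        switch (prog[k].op) {
        case OP_ADD: stack[tos++] = lhs + rhs; break;
        case OP_SUB: stack[tos++] = lhs - rhs; break;
        case OP_MUL: stack[tos++] = lhs * rhs; break;
        default: break;
        }
    }
    return stack[tos - 1];
}

/* A*B - (A + B*C) written as a stack program. */
int eval_example(int a, int b, int c)
{
    const struct insn prog[] = {
        {OP_PUSH, a}, {OP_PUSH, b}, {OP_MUL, 0},
        {OP_PUSH, a}, {OP_PUSH, b}, {OP_PUSH, c},
        {OP_MUL, 0}, {OP_ADD, 0}, {OP_SUB, 0},
    };
    return run_stack_machine(prog, sizeof prog / sizeof prog[0]);
}
```

Note that A and B are each pushed twice, which is the duplicate data access listed among the drawbacks.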

11
General Purpose Register

All machines use general purpose registers after
1975.
Advantages of registers
Registers are faster than memory.
Registers are easier for a compiler to use.
E.g., (A*B) - (C*D) - (E*F): the multiplications can be done
in any order with registers, but with a stack?
Registers can hold variables.
Memory traffic is reduced.
Code density is improved (since a register name needs
fewer bits than a memory address).

12
Examples of Register Usage

Typical ALU Instructions
MIPS: add Rd, Rs, Rt → (0,3)
80x86: ADD AL, [SI] → (1,2)
VAX: CMPB (R0), (R0) → (2,2)
Here (m,n) means m memory operands out of n total operands.

13
Pros and Cons

Register-Register (0,3)
Simple, fixed length instruction encoding.
Simple code-generation model.
Similar number of clocks to execute.
Higher instruction count.
Bit encoding may be wasteful.
Memory-memory (3,3)
Most compact.
Different Instruction size.
Memory access bottleneck.
Register-Memory (1,2)
Data access without loading first.
Easy to encode and yield good density.
One operand is destroyed.
Limited number of registers.

14
Byte Ordering

Idea
Bytes in long word numbered 0 to 3
Which is most (least) significant?
Can cause problems when exchanging binary data
between machines
Big Endian Byte 0 is most, 3 is least
IBM 360/370, Motorola 68K, Sparc.
Little Endian Byte 0 is least, 3 is most
Intel x86, VAX
Alpha
Chip can be configured to operate either way
DEC workstations are little endian
Cray T3E Alphas are big endian

15
Byte Ordering Example
union {
  unsigned char c[8];
  unsigned short s[4];
  unsigned int i[2];
  unsigned long l[1];
} dw;
16
Byte Ordering Example (Cont).
int j;
for (j = 0; j < 8; j++)
  dw.c[j] = 0xf0 + j;
printf("Characters 0-7 0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x\n",
       dw.c[0], dw.c[1], dw.c[2], dw.c[3],
       dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 0x%x,0x%x,0x%x,0x%x\n",
       dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 0x%x,0x%x\n", dw.i[0], dw.i[1]);
printf("Long 0 0x%lx\n", dw.l[0]);
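
The same experiment can be sketched without relying on the sizes of short, int, and long, which differ between machines. The version below is an illustration, not the presenter's code: the function names are hypothetical, it uses the fixed-width type uint16_t, and it copies bytes with memcpy instead of going through a union.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);        /* lowest-addressed byte of probe */
    return first == 0x02;
}

/* Fill eight bytes with f0..f7 and reinterpret them as four 16-bit values,
   which is what reading dw.s[0..3] does after writing dw.c[0..7]. */
void fill_shorts(uint16_t out[4])
{
    unsigned char c[8];
    for (int j = 0; j < 8; j++)
        c[j] = (unsigned char)(0xf0 + j);
    memcpy(out, c, sizeof c);
}
```

On a little-endian host the first short comes back as 0xf1f0; on a big-endian host it comes back as 0xf0f1.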
17
Byte Ordering on Alpha
Little Endian
[Figure: memory bytes c[0] through c[7] hold f0 through f7. In each of s[0] to s[3], i[0], i[1], and l[0], the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB.]
Output on Alpha:
Characters 0-7 0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7
Shorts 0-3 0xf1f0,0xf3f2,0xf5f4,0xf7f6
Ints 0-1 0xf3f2f1f0,0xf7f6f5f4
Long 0 0xf7f6f5f4f3f2f1f0
18
Byte Ordering on x86
Little Endian
[Figure: memory bytes c[0] through c[7] hold f0 through f7. In each of s[0] to s[3], i[0], i[1], and l[0], the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB. Here l[0] covers only the first four bytes.]
Output on Pentium:
Characters 0-7 0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7
Shorts 0-3 0xf1f0,0xf3f2,0xf5f4,0xf7f6
Ints 0-1 0xf3f2f1f0,0xf7f6f5f4
Long 0 0xf3f2f1f0
19
Byte Ordering on Sun
Big Endian
[Figure: memory bytes c[0] through c[7] hold f0 through f7. In each of s[0] to s[3], i[0], i[1], and l[0], the lowest-addressed byte is the MSB and the highest-addressed byte is the LSB. Here l[0] covers only the first four bytes.]
Output on Sun:
Characters 0-7 0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7
Shorts 0-3 0xf0f1,0xf2f3,0xf4f5,0xf6f7
Ints 0-1 0xf0f1f2f3,0xf4f5f6f7
Long 0 0xf0f1f2f3
20
Addressing Modes
21
Addressing Modes(Cont.)
Memory
22
Addressing Modes(Cont.)
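Scaled: Add R1, 100(R2)[R3]
Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
[Figure: the constant 100, register R2, and register R3 scaled by the operand size d combine to address the operand in memory]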
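
As a rough illustration of how an effective address is formed, the sketch below computes it for three modes: register indirect, displacement, and scaled, where scaled means 100(R2)[R3] addresses 100 + Regs[R2] + Regs[R3]*d. The struct machine type and the function names are assumptions made for this example only.

```c
#include <stdint.h>

/* Toy machine state; names and sizes are illustrative assumptions. */
struct machine {
    uint32_t regs[8];
    uint8_t mem[256];
};

/* Register indirect: the operand address is Regs[r]. */
uint32_t ea_register_indirect(const struct machine *m, int r)
{
    return m->regs[r];
}

/* Displacement: disp + Regs[base], as in 100(R2). */
uint32_t ea_displacement(const struct machine *m, uint32_t disp, int base)
{
    return disp + m->regs[base];
}

/* Scaled: disp + Regs[base] + Regs[index] * d, as in 100(R2)[R3],
   where d is the operand size in bytes. */
uint32_t ea_scaled(const struct machine *m, uint32_t disp,
                   int base, int index, uint32_t d)
{
    return disp + m->regs[base] + m->regs[index] * d;
}
```

With Regs[R2] = 20, Regs[R3] = 5, and 4-byte operands, the scaled mode addresses byte 100 + 20 + 5*4 = 140.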
23
Addressing Mode Usage

3 Programs from SPEC89 on VAX
Others ? 0.

24
Displacement Address Size

Average of 5 programs from SPECint92 and
SPECfp92.
X-axis is log2 of displacement.
1% of addresses > 16 bits.

Integer Average
FP Average
25
Immediate Addressing Mode

10 Programs from SPECInt92 and SPECfp92

26
Immediate Addressing Mode

50% to 60% fit within 8 bits
75% to 80% fit within 16 bits

gcc
spice
Tex
27
Addressing Mode Summary

Important data addressing modes
Displacement
Immediate
Register Indirect
Displacement size should be 12 to 16 bits.
Immediate size should be 8 to 16 bits.

28
Instruction Operations

Arithmetic and Logical
add, subtract, and, or, etc.
Data transfer
Load, Store, etc.
Control
Jump, branch, call, return, trap, etc.
Synchronization
Test & Set.
String
string move, compare, search.

29
Top-9 x86 Instructions

Simple instructions dominate instruction
frequency.

30
Methods of Testing Condition

Condition code: status bits are set by ALU
operations.
add r1, r2, r3 and bz label
Extra status bits
Condition register
cmp r1, r2, r3 and bgt r1, label
Simple, but uses up a register
Compare and branch
bgt r1, r2, label
One instruction
Too much work per instruction

31
Conditional Branch Distance

Short displacement fields often sufficient for
branch

FP Average
Integer Average
32
Conditional Branch Addressing

PC-relative, since most branch targets are close to the
current PC address
At least 8 bits.
Compare Equal/Not Equal are the most important for
integer programs.

33
Data Types and Usage

Byte, half word (16 bits), word (32 bits), double
word (64 bits).
Arithmetic
Decimal: 4 bits per digit.
Integers: 2's complement.
Floating-point: IEEE standard (single, double,
extended precision).

34
Instruction Format

Fixed
Operation, address specifier 1, address specifier
2, address specifier 3.
MIPS, SPARC, Power PC.
Variable
Operation & number of operands, address specifier 1, ...,
address specifier n.
VAX
Hybrid
Intel x86
operation, address specifier, address field.
Operation, address specifier 1, address specifier
2, address field.
Operation, address field, address specifier 1,
address specifier 2.
Summary
If code size is most important, use variable
format.
If performance is most important, use fixed
format.

35
Summary ISA

Use general purpose registers with a load-store
architecture.
Support these addressing modes displacement,
immediate, register indirect.
Support these simple instructions load, store,
add, subtract, move register, shift, compare
equal, compare not equal, branch, jump, call,
return.
Support these data sizes: 8-, 16-, 32-bit integers and
IEEE FP standard.
Provide at least 16 general purpose registers
plus separate FP registers and aim for a minimal
instruction set.

36
Sparc Processors

Reduced Instruction Set Computer (RISC)
Simple instructions with regular formats
Key Idea make the common case fast!
infrequent operations can be synthesized using
multiple instructions
Assumes compiler will do optimizations
e.g., scalar optimization, register allocation,
scheduling, etc.
ISA designed for compilers, not assembly language
programmers
A 2nd Generation RISC Instruction Set
Architecture
Designed for superscalar processors (i.e., > 1 instruction
per cycle)
Avoids some of the pitfalls of earlier RISC ISAs
(e.g., delay slots)
Reference books
SPARC Architecture, Assembly Language
Programming, and C, by Richard P. Paul
The SPARC Architecture Manual, by David L.
Weaver and Tom Germond

37
Translation Process
38
Abstract Machines
1) loops 2) conditionals 3) goto 4) Proc. call 5)
Proc. return
ASM
1) byte 2) 4-byte word 3) 8-byte word 4)
contiguous word allocation 5) address of initial
byte
3) branch/jump 4) jump & link
39
Basic Data Types

Integral
Stored & operated on in general registers
Signed vs. unsigned depends on instructions used
Sparc       Bytes   C
byte        1       [unsigned] char
half word   2       [unsigned] short
word        4       [unsigned] int
Floating Point
Stored & operated on in floating point registers
Special instructions for four different formats
(only 2 we care about)
UltraSparc  Bytes   C
S_floating  4       float
T_floating  8       double

40
Sparc Register Convention

General Purpose Registers: 32 total (32 or 64 bits each); store integers and pointers
Usage Conventions: established as part of the architecture;
used by all compilers, programs, and libraries;
assure object code compatibility

%g0            Always zero
%g1 to %g7     Global data
%o0 to %o5     Local data or arguments to called routine
%o6 (%sp)      Stack pointer
%o7            Call address
%l0 to %l7     Local data
%i0 to %i5     Integer arguments
%i6 (%fp)      Frame pointer
%i7            Caller's call address (used for return)
41
Floating Point Unit

Implemented as Separate Unit
Hardware to add, multiply, and divide
Floating point data registers
Various control status registers
Floating Point Formats
S_Floating (C float) 32 bits
T_Floating (C double) 64 bits
Floating Point Data Registers
32 registers, each 4 bytes
Labeled f0 to f31

[Figure: floating point data registers %f0 through %f31]
42
Instruction Formats
43
Program Representations

C Code

int test1(int x, int y)
{
  return (x+x+x) - (y+y+y);
}

Compiled to Assembly

        .align 4
        .global test1
        .type test1,#function
        .proc 04
test1:
        sll %o0,1,%g2       ! g2 = o0*2
        add %g2,%o0,%g2     ! g2 = o0+g2
        sll %o1,1,%o0       ! o0 = o1*2
        add %o0,%o1,%o0     ! o0 = o0+o1
        retl                ! return
        sub %g2,%o0,%o0     ! o0 = g2-o0

Obtain with command: gcc -O -S code.c
Produces file code.s
Place result in %o0
44
Prog. Representation (Cont.)
Object
0x0 <test1>: 0x852a2001 0x84008008
             0x912a6001 0x90020009
             0x81c3e008 0x90208008

Disassembled
0x0  <test1>:    sll %o0, 1, %g2
0x4  <test1+4>:  add %g2, %o0, %g2
0x8  <test1+8>:  sll %o1, 1, %o0
0xc  <test1+12>: add %o0, %o1, %o0
0x10 <test1+16>: retl
0x14 <test1+20>: sub %g2, %o0, %o0

Run gdb on object code
x/6 0x0
Print 6 words in hexadecimal starting at address 0x0
disassemble test1
Print disassembled version of procedure

45
Alternate Disassembly

Sparc program dis
/usr/ccs/bin/dis file.o
Prints disassembled version of object code file
Code not yet linked
Addresses of procedures and global data not yet
resolved

 0   85 2a 20 01   sll %o0, 1, %g2
 4   84 00 80 08   add %g2, %o0, %g2
 8   91 2a 60 01   sll %o1, 1, %o0
 c   90 02 00 09   add %o0, %o1, %o0
10   81 c3 e0 08   jmp %o7 + 8
14   90 20 80 08   sub %g2, %o0, %o0
46
Pointer Examples

C Code

int iaddp(int *xp, int *yp)
{
  int x = *xp;
  int y = *yp;
  return x + y;
}

Annotated Assembly

iaddp:
        ld [%o0],%g2        ! g2 = *xp
        ld [%o1],%o0        ! o0 = *yp
        retl
        add %g2,%o0,%o0     ! o0 = x+y

C Code

void incr(int *sum, int val)
{
  int old = *sum;
  int new = old+val;
  *sum = new;
}

Annotated Assembly

incr:
        ld [%o0],%g2        ! g2 = *sum
        add %g2,%o1,%g2     ! g2 = old+val
        retl
        st %g2,[%o0]        ! store g2 to *sum
47
Array Indexing

C Code

long int arefl(long int a[], long int i)
{
  return a[i];
}

Annotated Assembly

arefl:
        sll %o1,2,%o1       ! o1 = i*4
        retl
        ld [%o0+%o1],%o0    ! load a[i] to o0

C Code

long int garray[10];
long int gref(long int i)
{
  return garray[i];
}

Annotated Assembly

        .common garray,40,4
grefl:
        sethi %hi(garray),%g2
        or %g2,%lo(garray),%g2
        sll %o0,2,%o0       ! o0 = i*4
        retl
        ld [%o0+%g2],%o0    ! load garray[i] to o0
48
Structures & Pointers

struct rec {
  int i;
  int a[3];
  int *p;
};

C Code

void set_i(struct rec *r, int val)
{
  r->i = val;
}

Annotated Assembly

set_i:
        retl
        st %o1,[%o0]        ! r->i = val

C Code

int find_a(struct rec *r, int idx)
{
  return r->a[idx];
}

Annotated Assembly

find_a:
        sll %o1,2,%o1       ! o1 = idx*4
        add %o1,%o0,%o1     ! o1 += r
        retl
        ld [%o1+4],%o0      ! o0 = r->a[idx]

C Code

void set_p(struct rec *r, int *ptr)
{
  r->p = ptr;
}

Annotated Assembly

set_p:
        retl
        st %o1,[%o0+16]     ! r->p = ptr
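
The address arithmetic behind a field access such as r->a[idx] can be spelled out in C itself. The sketch below is an illustration with a hypothetical function name: it computes base + offset of field a + idx * element size by hand, which is the same sum the compiled code forms with a shift, an add, and a load at a constant offset.

```c
#include <stddef.h>

struct rec {
    int i;
    int a[3];
    int *p;
};

/* Same result as r->a[idx], spelled out as
   base address + offset of field a + idx * element size. */
int find_a_by_offset(const struct rec *r, int idx)
{
    const char *base = (const char *)r;
    const char *elem = base + offsetof(struct rec, a)
                            + (size_t)idx * sizeof(int);
    return *(const int *)elem;
}
```

Using offsetof and sizeof keeps the sketch independent of the target's actual int and pointer sizes.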
49
Branches

Unconditional Branches
ba label
Conditional Branches
cmp Ra, Rb
bCond label
The result of comparing Ra with Rb is recorded in the
flags: Z (is zero), N (is negative), V (is too large,
overflow)
Cond: branch condition, relative to zero
be   Equal                  Z = 1
bne  Not Equal              Z = 0
bl   Less Than              (N xor V) = 1
ble  Less Than or Equal     (Z or (N xor V)) = 1
bg   Greater Than           (Z or (N xor V)) = 0
bge  Greater Than or Equal  (N xor V) = 0
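
The flag logic can be checked with a small model. This is a minimal sketch, assuming cmp performs a 32-bit subtraction Ra - Rb and that the signed conditions combine the flags as Z, N xor V, and Z or (N xor V); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Flags produced by cmp a, b (a 32-bit subtraction a - b). */
struct icc { int z, n, v; };

struct icc cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                 /* wraps modulo 2^32 */
    struct icc f;
    f.z = (diff == 0);
    f.n = (int)(diff >> 31);                 /* sign bit of the result */
    /* Overflow: operands differ in sign and the result's sign
       differs from the sign of a. */
    f.v = (int)(((ua ^ ub) & (ua ^ diff)) >> 31);
    return f;
}

/* Each function returns 1 when the corresponding branch is taken. */
int taken_be(struct icc f)  { return f.z; }
int taken_bne(struct icc f) { return !f.z; }
int taken_bl(struct icc f)  { return f.n ^ f.v; }
int taken_ble(struct icc f) { return f.z | (f.n ^ f.v); }
int taken_bg(struct icc f)  { return !(f.z | (f.n ^ f.v)); }
int taken_bge(struct icc f) { return !(f.n ^ f.v); }
```

The xor with V is what keeps "less than" correct when the subtraction overflows, for example when comparing the most negative integer with 1.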

50
Conditional Branches
C Code

int condbr(int x, int y)
{
  int v = 0;
  if (x > y)
    v = x+x+x+y;
  return v;
}

Annotated Assembly

condbr:
        mov %o0,%g2         ! g2 = x
        cmp %g2,%o1         ! compare x and y
        ble .LL2            ! branch if x<=y
        mov 0,%o0           ! v = 0
        sll %g2,1,%o0       ! o0 = x*2
        add %o0,%g2,%o0     ! o0 = x+x+x
        add %o0,%o1,%o0     ! o0 = x+x+x+y
.LL2:
        retl
        nop
51
Do-While Loop Example
C Code

int fact(int x)
{
  int result = 1;
  do {
    result *= x--;
  } while (x > 1);
  return result;
}

Annotated Assembly

fact:
        mov 1,%g2           ! result = 1
        smul %g2,%o0,%g2    ! result *= x
.LL6:
        add %o0,-1,%o0      ! x--
        cmp %o0,1           ! if (x>1) then
        bg,a .LL6           ! continue looping
        smul %g2,%o0,%g2    ! result *= x
        retl                ! return
        mov %g2,%o0         ! copy result to o0
52
While Loop Example
C Code

int ifact(int x)
{
  int result = 1;
  while (x > 1)
    result *= x--;
  return result;
}

Annotated Assembly

ifact:
        cmp %o0,1           ! if (x<=1) then
        ble .LL9            ! branch to LL9
        mov 1,%g2           ! result = 1
        smul %g2,%o0,%g2    ! result *= x
.LL12:
        add %o0,-1,%o0      ! x--
        cmp %o0,1           ! if x>1 then
        bg,a .LL12          ! continue looping
        smul %g2,%o0,%g2    ! result *= x
.LL9:
        retl                ! return result
        mov %g2,%o0         ! copy result to o0
53
For Loops in C
for (init; test; update)
  body

Direct translation:

init;
while (test) {
  body
  update;
}
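
A small worked instance of that translation, with hypothetical function names: the same summation loop written first as a for loop and then in the init / while (test) / body / update form.

```c
/* Sum 1..n written as a for loop. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same loop after the direct translation. */
int sum_while(int n)
{
    int total = 0;
    int i = 1;              /* init */
    while (i <= n) {        /* test */
        total += i;         /* body */
        i++;                /* update */
    }
    return total;
}
```

The two forms agree as long as the body contains no continue statement; in a for loop, continue still runs the update, while in the translated while loop it would skip it.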
54
For Loop Example

C Code

/* Find max ele. in array */
int amax(int a[], int count)
{
  int i;
  int result = a[0];
  for (i = 1; i < count; i++)
    if (a[i] > result)
      result = a[i];
  return result;
}

Annotated Assembly

amax:
        mov %o0,%o2         ! copy a to o2
        mov 1,%g3           ! i = 1
        cmp %g3,%o1         ! if (i>=count),
        bge .LL15           ! branch to return
        ld [%o2],%o0        ! result = a[0]
        sll %g3,2,%g2       ! g2 = i*4
.LL20:
        ld [%o2+%g2],%g2    ! g2 = a[i]
        cmp %g2,%o0         ! compare a[i] with result
        bg,a .LL16          ! if a[i] > result, take the branch and
        mov %g2,%o0         ! result = a[i] (annulled otherwise)
.LL16:
        add %g3,1,%g3       ! i++
        cmp %g3,%o1         ! if (i < count),
        bl .LL20            ! continue looping
        sll %g3,2,%g2       ! g2 = i*4
.LL15:
        retl                ! return result
        nop

for (init; test; update)
  body

init;
while (test) {
  body
  update;
}
55
Jumps

Characteristics
transfer of control is unconditional
target address is specified by a register, or
constant
Format
jmpl address, rd
rd stores the return address
Synonyms for jmpl
jmp address   -> jmpl address, %g0
call address  -> jmpl address, %o7
ret           -> jmpl %i7+8, %g0
retl          -> jmpl %o7+8, %g0

56
Compiling Switch Statements
C Code

Implementation Options
Series of conditionals
Good if few cases
Slow if many
Jump Table
Lookup branch target
Avoids conditionals
Possible when cases are small integer constants
GCC
Picks one based on case structure

typedef enum {ADD, MULT, MINUS, DIV, MOD, BAD} op_type;

char unparse_symbol(op_type op)
{
  switch (op) {
  case ADD:   return '+';
  case MULT:  return '*';
  case MINUS: return '-';
  case DIV:   return '/';
  case MOD:   return '%';
  case BAD:   return '?';
  }
}
57
Switch Statement Example

C Code

typedef enum {ADD, MULT, MINUS, DIV, MOD, BAD} op_type;

char unparse_symbol(op_type op)
{
  switch (op) {
  case ADD:   return '+';
  case MULT:  return '*';
  case MINUS: return '-';
  case DIV:   return '/';
  case MOD:   return '%';
  case BAD:   return '?';
  }
}

Enumerated Values: ADD 0, MULT 1, MINUS 2, DIV 3, MOD 4, BAD 5

Assembly Setup

unparse_symbol:
        cmp %o0,5               ! if op > 5 then
        bgu .LL1                ! branch to return
        sethi %hi(.LL9),%g2     !
        or %g2,%lo(.LL9),%g2    ! g2 = address of jtab[0]
        sll %o0,2,%g3           ! g3 = op*4
        ld [%g3+%g2],%g2        ! g2 = jtab[op]
        jmp %g2                 ! jump to jtab code
        nop
58
Jump Table
Table Contents

.LL9:
        .word .LL3
        .word .LL4
        .word .LL5
        .word .LL6
        .word .LL7
        .word .LL8

Targets & Completion

.LL3:   b .LL1          ! return '+'
        mov 43,%o0
.LL4:   b .LL1          ! return '*'
        mov 42,%o0
.LL5:   b .LL1          ! return '-'
        mov 45,%o0
.LL6:   b .LL1          ! return '/'
        mov 47,%o0
.LL7:   b .LL1          ! return '%'
        mov 37,%o0
.LL8:   mov 63,%o0      ! return '?'
.LL1:   retl
        nop

Enumerated Values: ADD 0, MULT 1, MINUS 2, DIV 3, MOD 4, BAD 5
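
The jump-table idea can also be written directly in C with an array of function pointers. This is a sketch for illustration, not what GCC generates: the helper names and unparse_symbol_jtab are hypothetical, and returning '\0' for an out-of-range value is an assumption, since the slides leave that case unspecified.

```c
typedef enum { ADD, MULT, MINUS, DIV, MOD, BAD } op_type;

static char ret_add(void)   { return '+'; }
static char ret_mult(void)  { return '*'; }
static char ret_minus(void) { return '-'; }
static char ret_div(void)   { return '/'; }
static char ret_mod(void)   { return '%'; }
static char ret_bad(void)   { return '?'; }

/* One entry per enumerated value, in the same order as the enum. */
static char (*const jtab[])(void) = {
    ret_add, ret_mult, ret_minus, ret_div, ret_mod, ret_bad
};

/* Bounds check first (an unsigned compare, like cmp followed by bgu),
   then one indexed load and an indirect jump. */
char unparse_symbol_jtab(op_type op)
{
    unsigned idx = (unsigned)op;
    if (idx > 5)
        return '\0';   /* assumed behavior for out-of-range values */
    return jtab[idx]();
}
```

The single unsigned comparison rejects both values above 5 and negative values, since a negative value becomes a very large unsigned number.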
59
Procedure Calls & Returns

Maintain the return address in a special register
(%o7 or %i7)
Procedure call
call address: save return address in %o7, branch to
address
Procedure return
ret: jump to address in %i7+8
retl: jump to address in %o7+8 (leaf procedure)

C Code

int callee()
{
  return 5;
}
int caller()
{
  return callee();
}

Annotated Assembly

        ...
callee:
        retl                ! return to o7+8
        mov 5,%o0           ! copy 5 to o0
        ...
caller:
        ...
        call callee,0       ! save current address to o7
        nop
        ...
60
Sparc Register windows
61
Stack-Based Languages
Stack (grows down)

Languages that support recursion
e.g., C, Pascal
Stack Allocated in Frames
state for procedure invocation
return point, arguments, locals
Code Example

yoo() { who(); }
who() { amI(); }
amI() { amI(); }
[Figure: call chain yoo, who, amI, amI, amI]
62
Register Saving Conventions

When procedure yoo calls who
yoo is the caller, who is the callee
Caller Save Registers
not guaranteed to be preserved across procedure
calls
can be immediately overwritten by a procedure
without first saving
useful for storing local temporary values within
a procedure
if yoo wants to preserve a caller-save register
across a call to who
save it on the stack before calling who
restore after who returns
Callee Save Registers
must be preserved across procedure calls
if who wants to use a callee-save register
save current register value on stack upon
procedure entry
restore when returning

63
Register Saving Examples

Caller Save
Caller must save / restore if live across
procedure call

yoo:
        or $31, 17, $1
        stq $1, 8($sp)      # save $1
        bsr $26, who
        ldq $1, 8($sp)      # restore $1
        addq $1, 1, $0
        ret $31, ($26)
who:
        or $31, 6, $1       # overwrite $1
        ret $31, ($26)

Callee Save
Callee must save / restore if overwriting

yoo:
        or %l0, 17, %l1
        call who
        ret
        restore
who:
        save %sp,-112,%sp   ! save local regs
        or %g1, 6, %l1      ! overwrite l1
        ret
        restore             ! restore local regs

Sparc uses the callee-save approach
64
Sparc Stack Frame

Conventions
Agreed upon by all program/compiler writers
Allows linking between different compilers
Enables symbolic debugging tools
Run Time Stack
Save context
Registers (l and i)
Storage for local variables
Parameters to called functions
Required to support recursion

65
Stack Frame Requirements

Procedure Categories
Leaf procedures that do not use stack
Do not call other procedures
Can fit all temporaries in caller-save registers
Leaf procedures that use stack
Do not call other procedures
Need stack for temporaries
Non-leaf procedures
Must use stack.
Stack Frame Structure
Must be at least 8-byte aligned
pad the region for locals and temporaries as
needed

66
Stack Frame Example
Assembly

rfact:
        save %sp,-112,%sp   ! save regs to stack
        cmp %i0,1           ! if x<=1 then
        ble,a .LL2          ! branch to return
        mov 1,%i0           ! executed when jump
        call rfact,0        ! call rfact
        add %i0,-1,%o0      ! x-1 as argument
        smul %i0,%o0,%i0    ! multiplication
.LL2:
        ret                 ! return
        restore             ! restore from stack

C Code

/* Recursive factorial */
int rfact(int x)
{
  if (x <= 1) return 1;
  return x * rfact(x-1);
}

Stack frame: 112 bytes
Frame ptr @ %sp + 112
Save registers %l and %i
No floating pt. regs. used

[Figure: stack slots at %sp + 0, %sp + 4, %sp + 8, %sp + 12, ..., up to %sp + 112]
67
Stack Frame Example 2
C Code

void show_facts(void)
{
  int i;
  int vals[10];
  vals[0] = 1L;
  for (i = 1; i < 10; i++)
    vals[i] = vals[i-1] * i;
  for (i = 9; i >= 0; i--)
    printf("Fact(%d) = %d\n", i, vals[i]);
}

Stack frame: 152 bytes
Frame ptr @ %sp + 152
Local storage for vals:
%fp-20 to %fp-56

[Figure: stack slots at %sp + 0, %sp + 4, %sp + 8, %sp + 12, ..., %sp + 148, %sp + 152]

68
Stack Frame Example 2 (Cont.)
show_facts:
        save %sp,-152,%sp
        mov 1,%o0               ! o0 = 1
        st %o0,[%fp-56]         ! vals[0] = 1
        mov %o0,%l0             ! l0 = 1
        add %fp,-56,%o2         ! o2 = fp-56
.LL8:
        sll %l0,2,%o0           ! o0 = l0<<2
        add %l0,-1,%o1          ! o1 = l0-1 (i-1)
        sll %o1,2,%o1           ! o1 *= 4
        ld [%o2+%o1],%o1        ! o1 = vals[i-1]
        smul %l0,%o1,%o1        ! o1 *= l0
        add %l0,1,%l0           ! i++
        cmp %l0,9               ! if i<10 then
        ble .LL8                ! loop
        st %o1,[%o2+%o0]        ! store to vals[i]
        mov 9,%l0               ! l0 = 9
        sethi %hi(.LLC0),%l2    ! set l2 to print
        add %fp,-56,%l1         ! l1 = fp-56
        sll %l0,2,%o2           ! o2 = l0*4
.LL15:
        or %l2,%lo(.LLC0),%o0   ! print address
        mov %l0,%o1             ! o1 = i
        call printf,0           ! call printf
        ld [%l1+%o2],%o2        ! o2 = vals[i]
        addcc %l0,-1,%l0        ! i--
        bpos .LL15              ! loop
        sll %l0,2,%o2           ! o2 = l0*4
        ret
        restore
69
Stack Addrs as Procedure Args
rfact2:
        save %sp,-120,%sp   ! sp -= 120
        cmp %i0,1           ! if x<=1
        ble .LL19           ! jump to LL19
        mov 1,%o0           ! o0 = 1
        add %i0,-1,%o0      ! o0 = x-1
        call rfact2,0       ! call rfact2
        add %fp,-20,%o1     ! calculate &val
        mov %i0,%o0         ! o0 = x
        ld [%fp-20],%o1     ! load from val
        smul %o0,%o1,%o0    ! multiplication: x * val
.LL19:
        st %o0,[%i1]        ! store to *result
        ret                 ! return
        restore

C Code

void rfact2(int x, int *result)
{
  if (x <= 1)
    *result = 1;
  else {
    int val;
    rfact2(x-1, &val);
    *result = x * val;
  }
  return;
}

Stack frame: 120 bytes
val stored at %fp - 20
%fp - 20 passed as second argument (%o1) to
recursive call of rfact2

70
Floating Point Code Example

Compute Inner Product of Two Vectors
Single precision

float inner_prodF(float x[], float y[], int n)
{
  int i;
  float result = 0.0;
  for (i = 0; i < n; i++)
    result += x[i] * y[i];
  return result;
}

in_ProdF:
        sethi %hi(.LLC0),%o3
        or %o3,%lo(.LLC0),%o3
        mov 0,%g3               ! i = 0
        cmp %g3,%o2             ! if i>=n then
        bge .LL3                ! jump to return
        ld [%o3],%f0            ! f0 = 0.0
.LL5:
        sll %g3,2,%g2           ! g2 = i*4
        ld [%o0+%g2],%f2        ! f2 = x[i]
        ld [%o1+%g2],%f3        ! f3 = y[i]
        fmuls %f2,%f3,%f2       ! f2 = x[i]*y[i]
        add %g3,1,%g3           ! i++
        cmp %g3,%o2             ! if i<n then
        bl .LL5                 ! loop
        fadds %f0,%f2,%f0       ! result += x[i]*y[i]
.LL3:
        retl                    ! return
        nop
71
Double Precision
in_ProdD:
        sethi %hi(.LLC1),%o3
        ldd [%o3+%lo(.LLC1)],%f0    ! result = 0.0
        mov 0,%g3               ! i = 0
        cmp %g3,%o2             ! if i>=n then
        bge .LL9                ! branch to LL9
        nop
.LL11:
        sll %g3,3,%g2           ! g2 = i*8
        ldd [%o0+%g2],%f2       ! f2 = x[i]
        ldd [%o1+%g2],%f4       ! f4 = y[i]
        fmuld %f2,%f4,%f2       ! f2 = x[i]*y[i]
        add %g3,1,%g3           ! i++
        cmp %g3,%o2             ! if i<n then
        bl .LL11                ! looping
        faddd %f0,%f2,%f0       ! result += x[i]*y[i]
.LL9:
        retl                    ! return
        nop

double inner_prodD(double x[], double y[], int n)
{
  int i;
  double result = 0.0;
  for (i = 0; i < n; i++)
    result += x[i] * y[i];
  return result;
}
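
For readers who want to run the computation, here is a compact, self-contained version of both inner products. It is a sketch rather than the slide's exact source: the names inner_prod_f and inner_prod_d and the use of size_t for the length are choices made for this example.

```c
#include <stddef.h>

/* Single precision: each element is 4 bytes, so x[i] sits at byte offset i*4. */
float inner_prod_f(const float x[], const float y[], size_t n)
{
    float result = 0.0f;
    for (size_t i = 0; i < n; i++)
        result += x[i] * y[i];
    return result;
}

/* Double precision: each element is 8 bytes, so x[i] sits at byte offset i*8. */
double inner_prod_d(const double x[], const double y[], size_t n)
{
    double result = 0.0;
    for (size_t i = 0; i < n; i++)
        result += x[i] * y[i];
    return result;
}
```

The only difference the compiler sees between the two is the element size, which is why the index shift changes from 2 to 3 and the loads and arithmetic switch to their double-precision forms.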
72
Numeric Format Conversion

Between Floating Point and Integer Formats
Special conversion instructions: fdtos, fstod,
fstoi, fitos, ...
Convert source operand in one format to
destination in other
Both source & destination must be FP registers
Transfer to & from GP registers via stack
store/load

C Code

float double2float(double d)
{
  return (float) d;
}

Conversion Code

        fdtos %f2,%f0           ! convert double to float

C Code

float int2float(int i)
{
  return (float) i;
}

Conversion Code

        st %o0,[%sp+100]        ! pass through stack
        ld [%sp+100],%f2
        fitos %f2,%f0           ! and convert
73
Structure Allocation

Principles
Allocate space for structure elements
contiguously
Access fields by offsets from initial location
Offsets determined by compiler

typedef struct {
  char c;
  int i[2];
  double d;
} struct_ele, *struct_ptr;

Layout (byte offsets): c at 0, i[0] at 4, i[1] at 8, d at 16; total size 24
74
Alignment

Requirements
Primitive data type requires K bytes
Address must be multiple of K
Specific Cases
Word data address must be multiple of 4
Double word data address must be multiple of 8
Reason
Memory accessed by (aligned) words
Compiler
Inserts gaps within structure to ensure correct
alignment of fields
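
The gaps a compiler inserts can be observed with offsetof. The sketch below is illustrative: the helper names are hypothetical, it uses the C11 _Alignof operator, and it deliberately avoids hard-coding offsets because the exact alignment of int and double is set by each platform's ABI.

```c
#include <stddef.h>

typedef struct {
    char c;
    int i[2];
    double d;
} struct_ele;

/* Number of padding bytes the compiler inserted between c and i. */
size_t gap_after_c(void)
{
    return offsetof(struct_ele, i) - sizeof(char);
}

/* 1 if every field offset and the total size respect the alignment
   of the corresponding type on this platform. */
int layout_is_aligned(void)
{
    return offsetof(struct_ele, i) % _Alignof(int) == 0
        && offsetof(struct_ele, d) % _Alignof(double) == 0
        && sizeof(struct_ele) % _Alignof(struct_ele) == 0;
}
```

On a target where int is 4-byte aligned and double is 8-byte aligned, this gives 3 bytes of padding after c and places d at offset 16.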

75
Structure Access
C Code

int *struct_i(struct_ptr p)
{
  return p->i;
}

Result Computation

        add %o0,4,%o0           ! address of 4th byte

C Code

int struct_i1(struct_ptr p)
{
  return p->i[1];
}

Result Computation

        ld [%o0+8],%o0          ! word at 8th byte

C Code

double struct_d(struct_ptr p)
{
  return p->d;
}

Result Computation

        ldd [%o0+16],%f0        ! double at 16th byte

C Code

char struct_c(struct_ptr p)
{
  return p->c;
}

Result Computation

        ldsb [%o0],%o0          ! byte at 0th byte
76
Arrays vs. Pointers

Recall
Can access stored data either with pointer or
array notation
Differ in how storage allocated
Array declaration allocates space for array
elements
Pointer declaration allocates space for pointer
only

C Code for Allocation

typedef struct {
  char c;
  int *i;
  double d;
} pstruct_ele, *pstruct_ptr;

pstruct_ptr pstruct_alloc(void)
{
  pstruct_ptr result = (pstruct_ptr)
    malloc(sizeof(pstruct_ele));
  result->i = (int *) calloc(2, sizeof(int));
  return result;
}

Layout (byte offsets): c at 0, i at 4, d at 8; total size 16
77
Accessing Through Pointer
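C Code

int *pstruct_i(pstruct_ptr p)
{
  return p->i;
}

Result Computation

        ld [%o0+4],%o0          ! word at 4th byte

C Code

int pstruct_i1(pstruct_ptr p)
{
  return p->i[1];
}

Result Computation

        ld [%o0+4],%g2          ! i: pointer word at 4th byte from p
        ld [%g2+4],%o0          ! retrieve i[1]

Layout (byte offsets): c at 0, i at 4, d at 8; total size 16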
78
Arrays of Structures

Principles
Allocated by repeating allocation for array type
Accessed by computing address of element
Attempt to optimize
Minimize use of multiplication
Exploit values determined at compile time

C Code

/* Index into array of struct_ele's */
struct_ptr a_index(struct_ele a[], int idx)
{
  return &a[idx];
}

Address Computation

        sll %o1,1,%g2           ! g2 = idx*2
        add %g2,%o1,%g2         ! g2 = idx*3
        sll %g2,3,%g2           ! g2 = idx*24
        add %o0,%g2,%o0         ! a + idx*24
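
The multiplication by 24 (the element size) is done with two shifts and one add because 24 = 3 * 8. A minimal C rendering of that strength reduction, using a hypothetical function name:

```c
#include <stddef.h>

/* idx * 24 using two shifts and one add, mirroring the
   shift / add / shift sequence in the address computation. */
size_t scale_by_24(size_t idx)
{
    size_t t = idx << 1;   /* idx * 2 */
    t += idx;              /* idx * 3 */
    return t << 3;         /* idx * 24 */
}
```

This works only because the element size is known at compile time, which is the "exploit values determined at compile time" principle in action.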
79
Aligning Array Elements

Requirement
Must make sure alignment requirements met when
allocate array of structures
May require inserting unused space at end of
structure

typedef struct {
  double d;
  int i[2];
  char c;
} rev_ele, *rev_ptr;

rev_ele a[2];

The address of a must be a multiple of 8
Alignment OK
80
Nested Allocations

Principles
Can nest declarations of arrays and structures
Compiler keeps track of allocation and access
requirements

typedef struct {
  int x;
  int y;
} point_ele, *point_ptr;

typedef struct {
  point_ele ll;
  point_ele ur;
} rect_ele, *rect_ptr;
81
Nested Allocation (cont.)
C Code

int area(rect_ptr r)
{
  int width = r->ur.x - r->ll.x;
  int height = r->ur.y - r->ll.y;
  return width * height;
}

Computation

        ld [%i0+8],%o0          ! o0 = ur.x
        ld [%i0],%o1            ! o1 = ll.x
        sub %o0,%o1,%o0         ! o0 = width
        ld [%i0+12],%o2         ! o2 = ur.y
        ld [%i0+4],%o1          ! o1 = ll.y
        sub %o2,%o1,%o1         ! o1 = height
        smul %o0,%o1,%o0        ! o0 = area
82
Union Allocation

Principles
Overlay union elements
Allocate according to largest element
Programmer responsible for collision avoidance

typedef union {
  char c;
  int i[2];
  double d;
} union_ele, *union_ptr;

Layout (byte offsets): c, i[0], and d all start at 0; i[1] at 4; total size 8
83
Example Use of Union
typedef enum {CHAR, INT, DOUBLE} utype;

typedef struct {
  utype type;
  union_ele e;
} store_ele, *store_ptr;

Structure can hold 3 kinds of data
Never use 2 forms simultaneously
Identify particular kind with flag type

void print_store(store_ptr p)
{
  switch (p->type) {
  case CHAR:
    printf("Char = %c\n", p->e.c);
    break;
  case INT:
    printf("Int[0] = %d, Int[1] = %d\n",
           p->e.i[0], p->e.i[1]);
    break;
  case DOUBLE:
    printf("Double = %g\n", p->e.d);
  }
}
84
Using Union to Access Bit Patterns
typedef union {
  float f;
  unsigned u;
} bit_float_t;

float bit2float(unsigned u)
{
  bit_float_t arg;
  arg.u = u;
  return arg.f;
}

void show_parts(float f)
{
  int sign, exp, significand;
  bit_float_t arg;
  arg.f = f;
  /* Get bit 31 */
  sign = (arg.u >> 31) & 0x1;
  /* Get bits 30 .. 23 */
  exp = (arg.u >> 23) & 0xFF;
  /* Get bits 22 .. 0 */
  significand = arg.u & 0x7FFFFF;
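
A self-contained variant of the same field extraction is sketched below. It swaps the union for memcpy, which is another common way to copy a bit pattern without numeric conversion; the struct float_parts type and split_float name are hypothetical, and the code assumes float is the 32-bit IEEE single-precision format.

```c
#include <stdint.h>
#include <string.h>

struct float_parts {
    unsigned sign;         /* bit 31 */
    unsigned exp;          /* bits 30..23, biased by 127 */
    uint32_t significand;  /* bits 22..0 */
};

/* Split an IEEE single-precision value into its three bit fields. */
struct float_parts split_float(float f)
{
    uint32_t u;
    struct float_parts p;
    memcpy(&u, &f, sizeof u);          /* copy the bit pattern unchanged */
    p.sign = (u >> 31) & 0x1;
    p.exp = (u >> 23) & 0xFF;
    p.significand = u & 0x7FFFFF;
    return p;
}
```

For example, 1.0f splits into sign 0, exponent 127, significand 0, and -2.5f (which is -1.25 times 2) splits into sign 1, exponent 128, significand 0x200000.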