Title: Assembly Process
1Assembly Process
2Machine Code Generation
- Assembling a program entails translating the assembly language into binary machine code
- This requires more than simply mapping assembly instructions to machine instructions
- Each instruction is bound to an address
- Labels are bound to addresses
- Assembly instructions which refer to labels generate machine instructions which contain the label's address
- Pseudo-instructions are translated into one or more machine instructions
3Instruction Format
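addi $13,$7,50

  field    opcode    source ($7)   destination ($13)   immediate operand
  bits     0010 00   00111         01101               0000 0000 0011 0010
  width    6 bits    5 bits        5 bits              16 bits

add $13,$7,$8

  field    opcode    source ($7)   source ($8)   destination ($13)   extended opcode
  bits     0000 00   00111         01000         01101               000 0010 0000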
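As a minimal sketch of how the two encodings on this slide are put together, the Python below packs the opcode, register numbers, and immediate (or extended opcode) into a 32-bit word. The helper names `encode_i_type` and `encode_r_type` are illustrative assumptions, not part of any real assembler; the opcode and extended-opcode bit patterns are the ones shown on the slide.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode (6 bits), two registers (5 bits each), and a 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack a register-format word; the main opcode field is all zeros."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


ADDI_OPCODE = 0b001000   # opcode bits shown for addi
ADD_FUNCT = 0b100000     # low six bits of the extended opcode shown for add

addi_word = encode_i_type(ADDI_OPCODE, rs=7, rt=13, imm=50)         # addi $13,$7,50
add_word = encode_r_type(rs=7, rt=8, rd=13, shamt=0, funct=ADD_FUNCT)  # add $13,$7,$8

print(f"{addi_word:032b}")
print(f"{add_word:032b}")
```

Printing the words in binary makes it easy to compare them field by field with the bit strings on the slide.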
4The symbol table
- The assembler scans the source code and generates the appropriate bit string for each line encountered
- The assembler must remember
  - what memory locations have been allocated
  - to which address each label is bound
- A symbol table is a list of (label, address) pairs
- When the data and text segments have been generated, they are stored as an executable file
- The file is used by a program called the loader to initialize memory to the appropriate state before execution
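The bookkeeping described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data segment is assumed to start at 0x10000000, every item is one 4-byte word, and the function name `layout_data` and its input format are hypothetical.

```python
DATA_BASE = 0x10000000   # assumed start of the data segment
WORD_SIZE = 4


def layout_data(declarations, base=DATA_BASE):
    """Assign an address to each label and record the words to be stored.

    declarations: list of (label, [word values]) in source order.
    Returns (symbol_table, memory), where memory maps address -> word.
    """
    symbol_table = {}
    memory = {}
    address = base
    for label, values in declarations:
        symbol_table[label] = address      # bind the label to the next free address
        for value in values:
            memory[address] = value        # remember what has been allocated
            address += WORD_SIZE
    return symbol_table, memory


symbols, memory = layout_data([("a1", [3]), ("a2", [16, 16, 16, 16]), ("a3", [5])])
for label, address in symbols.items():
    print(f"{label}: 0x{address:08x}")
```

The dictionary plays the role of the (label, address) list, and the `memory` map stands in for the data segment that would be written to the executable file.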
5Instructions
- The .text directive tells the assembler that the lines which follow are instructions.
- By default, the text segment starts at 0x00400000
- In some cases, a symbol may not have an assigned address yet when the assembler scans a line that refers to it
- A second pass through the code can update instructions containing unresolved labels
- Maintain a list of addresses in which each unresolved label appears
- When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label
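The list-of-unresolved-locations idea can be sketched as follows. This is a simplified illustration, not a real assembler: the class name `SymbolTable`, its methods, and the `image` dictionary (which stores the resolved address directly instead of a full machine word) are all assumptions made for the example.

```python
class SymbolTable:
    """Label -> address map with a fix-up list for labels not yet defined."""

    def __init__(self):
        self.addresses = {}   # label -> address, once the label has been seen
        self.pending = {}     # label -> locations still waiting for the address

    def reference(self, label, location, image):
        """Record that the instruction at `location` uses `label`."""
        if label in self.addresses:
            image[location] = self.addresses[label]
        else:
            image[location] = None   # placeholder until the label is defined
            self.pending.setdefault(label, []).append(location)

    def define(self, label, address, image):
        """Bind `label` to `address` and patch every location waiting on it."""
        self.addresses[label] = address
        for location in self.pending.pop(label, []):
            image[location] = address


image = {}
table = SymbolTable()
table.reference("done", 0x00400004, image)   # forward reference, address unknown
table.reference("done", 0x0040000C, image)
table.define("done", 0x00400010, image)      # both earlier uses are patched here
table.reference("done", 0x00400014, image)   # resolved immediately
```

With this scheme a single scan of the source is enough, because each definition repairs the earlier uses as soon as it is encountered.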
6Branch offset in the MIPS R2000
- In machine code, the target address in a branch must be specified as an offset from the address of the branch.
- During execution, this offset is simply added to the program counter to fetch the next instruction
  - PC contains the address of the branch instruction
  - Offset is measured in words, not bytes
  - PC_NEW = offset × 4 + PC_OLD
- To calculate the offset, the assembler uses the formula
  - offset = (target instruction address - branch instruction address) / 4
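A small sketch of the two formulas on this slide, written in Python. It follows the convention used in these notes, where the offset is measured from the branch instruction's own address and the PC increment is not modeled; the function names are illustrative assumptions.

```python
def branch_word_offset(branch_address, target_address):
    """Return (target - branch) / 4, the word offset stored in the branch."""
    if branch_address % 4 or target_address % 4:
        raise ValueError("instructions must sit on word boundaries")
    return (target_address - branch_address) // 4


def next_pc(pc_old, offset):
    """Apply a taken branch: PC_NEW = offset * 4 + PC_OLD."""
    return offset * 4 + pc_old


offset = branch_word_offset(0x00400014, 0x00400008)
print(offset)
print(hex(next_pc(0x00400014, offset)))
```

Because both addresses are word-aligned, the division is always exact, and applying `next_pc` to the computed offset returns the original target.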
7Branch offset calculation
- The offset is stored in the instruction as a word offset rather than a byte offset.
  - Instructions are only stored at word boundaries
  - For both the target and the branch instruction, the two least significant bits of the address are zero
- An offset may be negative
  - If the target instruction precedes the branch instruction
- The offset is stored in the 16-bit immediate field
  - This means the branch can only jump about 2^15 instructions before or after the current address
  - 2^15 instructions (words) = 2^17 bytes
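To make the range concrete, the sketch below computes the lowest and highest addresses a branch can reach with a signed 16-bit word offset. It assumes the convention of these notes (offset measured from the branch address itself); `branch_reach` is a hypothetical helper name.

```python
def branch_reach(branch_address):
    """Return (lowest, highest) target addresses reachable from a branch."""
    min_offset = -(2 ** 15)      # most negative 16-bit two's-complement value
    max_offset = 2 ** 15 - 1     # most positive 16-bit two's-complement value
    return branch_address + 4 * min_offset, branch_address + 4 * max_offset


low, high = branch_reach(0x00400068)
print(hex(low), hex(high))
```

The backward reach is exactly 2^17 bytes, while the forward reach is one word short of that because the largest positive offset is 2^15 - 1.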
8Branch offset calculation
- An entry in the SPIM instruction list

  [0x00400068]  0x1440ffe6  bne $2, $0, -104 [__start-0x00400068]  ; 44: bnez $v0, __start

  - instruction address: 0x00400068
  - machine code: 0x1440ffe6
  - original assembly code: bnez $v0, __start
  - line number in source file: 44
  - offset in bytes: __start - 0x00400068 = 0x00400000 - 0x00400068 = -104
  - stored offset: 0xffe6 = -26 = -104/4
  - the offset calculation, in bytes, ignores the PC increment
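The SPIM entry on this slide can be checked by pulling the machine word apart. The sketch below is illustrative (the function name `decode_branch` is an assumption) and uses the same convention as the listing, in which the PC increment is ignored.

```python
def decode_branch(word):
    """Split a 32-bit word into opcode, two registers, and a signed 16-bit offset."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:      # sign bit set: interpret as a negative value
        imm -= 0x10000
    return opcode, rs, rt, imm


opcode, rs, rt, offset = decode_branch(0x1440FFE6)
target = 0x00400068 + 4 * offset   # no PC increment, as in the listing
print(opcode, rs, rt, offset, hex(target))
```

The decoded registers are $2 and $0, the stored word offset is -26, and multiplying by 4 gives the -104 byte offset that leads back to __start.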
9Jump target calculation
- The jump instruction has two forms
  - Pseudo-direct, for j and jal
  - Register direct, for jr and jalr
- jr and jalr specify a register containing the address to be loaded into the PC
- j and jal specify most of the address of the target within the instruction.
  - However, they have a range of at most one-sixteenth of the memory space
[Figure: the memory space divided into sixteen regions, labeled f through 0]
10Jump target calculation
- The target address is a 32-bit quantity
- Since all word addresses are multiples of 4, there is no need to store the last two bits
- The jump instruction format has 26 bits for the target address
- The remaining 6 bits of the instruction are used for the opcode
- The highest-order 4 bits of the target are taken from the address currently stored in the program counter
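The way the 32-bit target is rebuilt from the 26-bit field can be sketched as below. The function names are illustrative assumptions; the bit layout follows the description on this slide (4 bits from the program counter, 26 bits from the instruction, 2 implied zero bits).

```python
def jump_field(target_address):
    """Return the 26-bit field stored in a j or jal instruction."""
    return (target_address >> 2) & 0x03FFFFFF


def jump_target(pc, target_field):
    """Combine the PC's top 4 bits with the 26-bit field shifted left by 2."""
    return (pc & 0xF0000000) | ((target_field & 0x03FFFFFF) << 2)


field = jump_field(0x00400008)
print(hex(field))
print(hex(jump_target(0x00400014, field)))
```

Dropping the two low bits and the four high bits is what lets a 32-bit address fit next to the 6-bit opcode in a single word.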
11Jump Target Calculation
- Jump instructions have a range of 2^26 words, or 2^26 × 2^2 = 2^28 bytes
- This range is NOT symmetric about the jump instruction
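[Figure: the memory space divided into sixteen regions, labeled f through 0, with the values 0x80000080 (jump instruction address), 0x0fffff7c (reach forward), and -0x00000080 (reach backward)]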
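The asymmetry can be checked numerically. The sketch below assumes the three values shown on this slide describe a jump instruction at 0x80000080 and its reach in each direction; that reading, and the helper name `jump_reach`, are assumptions.

```python
def jump_reach(jump_address):
    """Return (bytes backward, bytes forward) reachable by j or jal."""
    region_start = jump_address & 0xF0000000    # top 4 bits come from the PC
    region_last = region_start | 0x0FFFFFFC     # last word of the 2^28-byte region
    return jump_address - region_start, region_last - jump_address


back, forward = jump_reach(0x80000080)
print(hex(back), hex(forward))
```

A jump near the start of its region can move far forward but only a short distance backward, because the target must stay inside the region selected by the top four bits of the PC.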
12Program relocation
- Program modules may be developed separately by individual programmers. When these modules are loaded into memory, they must not be assigned overlapping memory space.
- Thus, the modules have to be relocated
  - Relative addresses are relocatable
  - Any absolute references must be "fixed" by the loader
- Use a logical base address known at load time
  - Absolute addresses are stored as offsets from this TBD base
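A rough sketch of the fix-up step a loader might perform. It is deliberately simplified and rests on assumptions: each absolute reference is taken to occupy a whole word, the list of positions to fix (`absolute_slots`) is assumed to be supplied alongside the module, and the name `relocate` is hypothetical.

```python
def relocate(words, absolute_slots, load_base):
    """Return a copy of `words` with each absolute slot rebased to `load_base`.

    words: the module image, with absolute addresses stored as offsets from 0.
    absolute_slots: indices of the words that hold such offsets.
    """
    relocated = list(words)
    for index in absolute_slots:
        relocated[index] = (relocated[index] + load_base) & 0xFFFFFFFF
    return relocated


module = [0x00000008, 0x12345678, 0x00000000]   # words 0 and 2 hold offsets
loaded = relocate(module, absolute_slots=[0, 2], load_base=0x00400000)
print([hex(word) for word in loaded])
```

Words that are not listed as absolute references, such as relative branch offsets, are left untouched, which is why relative addresses need no work at load time.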
13From source to executable
[Figure: high-level source code -> compiler -> asm -> assembler -> obj; obj and lib files -> linker -> exe -> loader -> memory]
14Some examples of assembling code
        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la $6, a2
loop:
        lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall
15Some examples of assembling code
- Symbol Table

  symbol     address
  a1         1000 0000
  a2         1000 0004
  a3         1000 0014
  __start    0040 0000
  loop       0040 0008

- Memory map of data section

  address      contents
  1000 0000    0000 0003
  1000 0004    0000 0010
  1000 0008    0000 0010
  1000 000c    0000 0010
  1000 0010    0000 0010
  1000 0014    0000 0005

- Source code

        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la $6, a2
loop:
        lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall
16Translate pseudo-instructions
- Original code

        la $6, a2
loop:
        lw $7, 4($6)
        mul $9, $10, $7
        b loop
        li $v0, 10
        syscall

- After translating pseudo-instructions

        lui $6, 0x1000
        ori $6, $6, 0x0004
loop:
        lw $7, 4($6)
        mult $10, $7
        mflo $9
        b loop
        ori $v0, $0, 10
        syscall
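The three expansions used on this slide (la, mul, li) can be sketched as simple text rewriting. This is an illustration only: the function `expand`, its argument format, and the restriction of li to 16-bit constants are assumptions, and register names are written here with a `$` prefix.

```python
def expand(op, args, symbols):
    """Expand one pseudo-instruction into a list of real instructions (as text)."""
    if op == "la":                 # load address: upper half, then lower half
        reg, label = args
        address = symbols[label]
        return [f"lui {reg}, 0x{address >> 16:04x}",
                f"ori {reg}, {reg}, 0x{address & 0xFFFF:04x}"]
    if op == "li":                 # small constant: OR it with register zero
        reg, value = args
        if not 0 <= value <= 0xFFFF:
            raise ValueError("this sketch only handles 16-bit unsigned constants")
        return [f"ori {reg}, $0, {value}"]
    if op == "mul":                # multiply, then fetch the low word of the result
        rd, rs, rt = args
        return [f"mult {rs}, {rt}", f"mflo {rd}"]
    return [f"{op} {', '.join(str(a) for a in args)}"]   # already a real instruction


symbols = {"a2": 0x10000004}
program = [("la", ["$6", "a2"]), ("mul", ["$9", "$10", "$7"]), ("li", ["$v0", 10])]
expanded = [line for op, args in program for line in expand(op, args, symbols)]
for line in expanded:
    print(line)
```

Note that la needs the symbol table, so a pseudo-instruction that names a label can only be fully expanded once that label's address is known.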
17Translate to machine code
  address     contents     instruction
  00400000    3c06 1000    lui $6, 0x1000
  00400004    34c6 0004    ori $6, $6, 0x0004
  00400008    8cc7 0004    lw $7, 4($6)
  0040000c    0147 0018    mult $10, $7
  00400010    0000 4812    mflo $9
  00400014    1000 xxxx    b loop (beq, offset not yet resolved)
  00400018    3402 000a    ori $v0, $0, 10
  0040001c    0000 000c    syscall
18Resolve relative references
  address     contents     instruction
  00400000    3c06 1000    lui $6, 0x1000
  00400004    34c6 0004    ori $6, $6, 0x0004
  00400008    8cc7 0004    lw $7, 4($6)
  0040000c    0147 0018    mult $10, $7
  00400010    0000 4812    mflo $9
  00400014    1000 fffd    b loop (offset -3)
  00400018    3402 000a    ori $v0, $0, 10
  0040001c    0000 000c    syscall

offset = (0x400008 - 0x400014)/4 = -12/4 = -3 = 0xfffd
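The final step, writing the resolved offset into the branch word, can be sketched as follows. The function name `patch_branch` is an assumption, and the offset is computed with the convention used in these notes (measured from the branch address, without the PC increment).

```python
def patch_branch(word, branch_address, target_address):
    """Fill the low 16 bits of a branch word with the two's-complement word offset."""
    offset = (target_address - branch_address) // 4
    if not -(2 ** 15) <= offset < 2 ** 15:
        raise ValueError("branch target out of range")
    return (word & 0xFFFF0000) | (offset & 0xFFFF)


# The unresolved word 1000 xxxx is represented here with zeros in the low half.
patched = patch_branch(0x10000000, branch_address=0x00400014, target_address=0x00400008)
print(f"{patched:08x}")
```

Masking with 0xFFFF turns the negative offset -3 into its 16-bit two's-complement form, fffd, which is what appears in the contents column.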