CME212 Introduction to Large Scale Computing in Engineering - PowerPoint PPT Presentation

Title: CME212 Introduction to Large Scale Computing in Engineering


1
Representation II
  • Representing Composite Types, Disassembly,
    Function Calls, Stack, Heap

2
Instructions
  • The back end of the compiler generates the
    machine code from the intermediate representation
  • This code generation allows for many
    optimizations
  • Usually, the compiler is conservative.
  • Example: strict IEEE FP compliance,
    non-associative algebras (operations cannot be reordered)

3
CISC
  • Complex Instruction Set Computers
  • Specialized machine instructions for things like
    managing linked lists, evaluating polynomials
  • Created a large set of instructions
  • Decoding was complicated
  • Hard for compiler to utilize all instructions
  • Useful for hand coded assembly
  • x86 (IA-32) is usually thought of as a CISC
    instruction set

4
RISC
  • Reduced Instruction Set Computer
  • Uses only a small set of atomic operations that
    can be combined to form more complex ones
  • Easy to decode
  • Developed by Patterson (Berkeley) and Hennessy
    (Stanford) in the 80s
  • SPARC, MIPS, PowerPC, Alpha
  • The compiler had to do more work
  • Very cumbersome to do RISC assembly programming
    by hand

5
Today
  • RISC and CISC have converged into something
    in between
  • IA-32 instructions are CISC but decomposed into
    RISC-like micro-instructions internally
  • Many of the ideas of RISC have survived
  • Less debate today
  • Most ISAs today are 64-bit

6
Assembler Code
  • Human-readable machine code, called assembly
    code, can be produced by the compiler
  • gcc -S myfile.c
  • You can also reverse engineer assembly code from
    machine code using a disassembler
  • objdump -d file
  • where file can be an executable or object file

7
Compiling Into Assembly
  • C Code

int sum(int x, int y)
{
    int t = x + y;
    return t;
}

  • Generated Assembly

_sum:
    pushl %ebp
    movl  %esp,%ebp
    movl  12(%ebp),%eax
    addl  8(%ebp),%eax
    movl  %ebp,%esp
    popl  %ebp
    ret

  • Obtain with command gcc -O -S code.c
  • Produces file code.s

8
Basic Operation
  • Modern computers are of LOAD-STORE type
  • Other types are accumulators, stack machines
  • These machines store operands in a register
    file or simply registers
  • Small scratch memory very close to the
    arithmetical units
  • Data must be loaded from memory into a register,
    operated upon, and then stored back

9
LOAD-STORE
  • A regular calculator works like an accumulator
  • Early computers worked this way too
  • You have one register on which your arithmetic
    operations can work
  • In a LOAD-STORE machine you have several
    accumulators or memory locations where you can
    store temporaries

10
Registers
  • Registers are a scarce resource
  • Store words
  • Special registers for floating-point
  • The compiler tries to maximize the usage of the
    registers
  • Called register allocation
  • If you run out of registers, you must temporarily
    store results back in memory and then retrieve
    them again
  • Register spill
  • Degrades performance

11
Registers in C
  • There are two keywords that control register
    allocation from C
  • The register keyword asks the compiler to keep a
    variable in a register (a hint that the compiler
    may ignore)
  • Used for heavily accessed variables
  • Today, most compilers can figure this out
    themselves
  • You cannot take the address of something that is
    stored in a register
  • The volatile keyword forces the results to be
    written back to memory
  • Used in low-level and parallel programming

12
Example
  • register int a = 23;
  • volatile int b = 43;

13
Machine Instructions (x86,IA32)
14
Low-level control flow
  • High-level constructs are mapped onto conditional
    and unconditional jumps (branches)
  • Unconditional jump in C: the goto statement
  • Jump targets (a new PC location) can be
    considered as labels
  • You can define labels in C too

15
goto Example
goto version:

    if (x < y)
        goto less;
    val = x - y;
    goto done;
less:
    /* y is larger than x */
    val = y - x;
done:
    use(val);

Structured version:

    if (x < y)
        val = y - x;
    else
        val = x - y;
    . . .
    use(val);

16
Conditional Codes
  • There are special registers which hold condition
    codes
  • These registers are set by the test and compare
    instructions
  • Control flow instructions use the condition codes
    to see the results of a conditional
  • If (code) set register (setl, ...)
  • If (code) jump to label (jmpl, ...)

17
goto Example, Again
    if (x < y) goto less;
    val = x - y;
    goto done;
less:
    val = y - x;
done:

  • load x into reg0
  • load y into reg1
  • compare reg0, reg1
  • jump to less if reg1 is larger than reg0
  • reg2 = subtract(reg0, reg1)
  • jump to done
  • less:
  • reg2 = subtract(reg1, reg0)
  • done:

18
Loops
  • while, do and for loops are also transformed into
    conditional and unconditional jumps
  • A for loop contains a loop header, which controls
    the execution of the loop, and the loop body, which
    does the actual work

for (Init; Test; Update) Body
19
for Loops
for Version

    for (Init; Test; Update)
        Body

while Version

    Init;
    while (Test) {
        Body
        Update;
    }

do-while Version

    Init;
    if (!Test)
        goto done;
    do {
        Body
        Update;
    } while (Test);
done:

goto Version

    Init;
    if (!Test)
        goto done;
loop:
    Body
    Update;
    if (Test)
        goto loop;
done:

20
Calling Functions
  • A function typically has input and output
    arguments and local variables
  • As we use jumps we also need the return address
    to be able to get back after the call
  • Both of these problems can be solved using a stack

21
Stacks
  • Works like a stack of papers
  • Two operations
  • Push (place something on the top)
  • Pop (remove something from the top)
  • In algorithm language stacks are LIFO,
    last-in-first-out

22
Stacks and Calls
  • Most machines push the return address onto the
    stack before doing the call
  • After this the PC is set to the address of the
    subroutine
  • At the end of the subroutine, the return address
    can be popped from the stack

23
Arguments
  • Subroutines typically need many registers to be
    able to do stuff efficiently
  • Before a call the registers are spilled to
    memory, called a save
  • Typically these are pushed onto the stack
  • Next, we push the return address
  • And finally the arguments onto the stack
  • After we get back from the subroutine, we can pop
    the saved registers (called a restore) from the
    stack

24
Stack Example
  • save
  • push address to Return_label
  • push arguments
  • call my_routine
  • Return_label:
  • restore

Stack (top first)

    Arguments
    Return Address
    Saved registers

my_routine:
    pop arguments
    do stuff
    pop return address
    jmp Return_label

25
More on Calls
  • Passing arguments using the stack is slow
  • We would like to use registers
  • Complicates register allocation
  • Some processors have special input and output
    registers
  • Passes arguments through these
  • Limited amount
  • If the number of arguments is large, the stack is
    used

26
ABI
  • The scheme for subroutine calls is usually
    defined in the Application Binary Interface (ABI)
  • Lets different compilers generate compatible code
  • Linux Standard Base
  • http://www.linux-foundation.org/en/LSB
  • SPARC
  • http://www.sparc.org

27
Stacks and Recursion
  • Stacks are used to implement recursion in an elegant
    way
  • Fortran traditionally does not require a stack. To do
    recursion in Fortran you must declare the function as
    recursive
  • Intermediate values are deferred by pushing the
    return value onto the stack
  • Stack grows for each recursive call
  • You can get a stack overflow error

28
C and Stacks
  • Automatic variables are typically stored on the
    stack
  • When the function returns the arguments are
    popped
  • They can however still be stored in memory
  • Implementations of stacks usually use a stack
    pointer to know where we are
  • Old values might still be present in memory

29
Stack size
  • The address space sizes are controlled by the
    shell
  • The maximum stack size is given by the command
    ulimit
  • The shell also controls other things, such as the
    number of files, core files, maximum virtual
    memory
  • ulimit -a

30
Where are My Variables?
  • C variables will be allocated at different
    locations depending on the scope and extent
  • Global Variables
  • Automatic Variables
  • Memory from malloc(), calloc()

31
0xffffffff
    kernel virtual memory (code, data, heap, stack)   memory invisible to user code
0xc0000000
    user stack (created at runtime)                   automatic variables
    memory mapped region for shared libraries
0x40000000
    run-time heap (managed by malloc)                 dynamically allocated data
    read/write segment (.data, .bss)                  uninitialized data, pointers, global variables
    read-only segment (.init, .text, .rodata)         program code, read-only data
      (both segments are loaded from the executable file)
0x08048000
    unused
0
32
Memory Allocation
#include <stdio.h>

int main(void)
{
    int a, b, c, d;
    double e;
    char f[128];
    static int g;

    /* %p expects a void pointer, so each address is cast */
    printf("Address of a is %p\n", (void *)&a);
    printf("Address of b is %p\n", (void *)&b);
    printf("Address of c is %p\n", (void *)&c);
    printf("Address of d is %p\n", (void *)&d);
    printf("Address of e is %p\n", (void *)&e);
    printf("Address of f is %p\n", (void *)f);
    printf("Address of g is %p\n", (void *)&g);
    return 0;
}

Sample output:

Address of a is 0xffbff474
Address of b is 0xffbff470
Address of c is 0xffbff46c
Address of d is 0xffbff468
Address of e is 0xffbff460
Address of f is 0xffbff3e0
Address of g is 0x20b9c

33
Array Example
typedef int zip_dig[5];

zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

  • Notes
  • Declaration zip_dig cmu equivalent to int
    cmu[5]
  • Example arrays were allocated in successive 20
    byte blocks
  • Not guaranteed to happen in general

34
Array Accessing Example
  • Computation
  • Register reg0 contains starting address of array
  • Register reg1 contains array index
  • Desired digit at 4*reg1 + reg0

int get_digit(zip_dig z, int dig)
{
    return z[dig];
}

  • Memory Reference Code

    # reg0 = z
    # reg1 = dig
    store 4*reg1 in reg2
    add reg2 to reg0                       # reg0 = 4*reg1 + reg0
    store value at address reg0 in reg3    # Mem[4*reg1 + reg0]

35
Array Loop Example
  • Original Source

int zd2int(zip_dig z)
{
    int i;
    int zi = 0;
    for (i = 0; i < 5; i++) {
        zi = 10 * zi + z[i];
    }
    return zi;
}

  • Transformed Version
  • As generated by GCC
  • Eliminate loop variable i
  • Convert array code to pointer code
  • Express in do-while form
  • No need to test at entrance

int zd2int(zip_dig z)
{
    int zi = 0;
    int *zend = z + 4;
    do {
        zi = 10 * zi + *z;
        z++;
    } while (z <= zend);
    return zi;
}

36
Multidimensional arrays
  • Memory is one-dimensional
  • Multidimensional arrays need to be mapped onto a
    one-dimensional block of memory
  • In C, we have two alternatives
  • Nested arrays a[n*i+j]
  • Static or dynamic allocation
  • Multi-level arrays a[i][j]
  • Static or dynamic allocation

37
Nested Arrays
  • Dimensions are stacked consecutively using an
    index mapping
  • Consider a square two-dimensional array of size N

Array(i,j) -> Array[j + i*N]   (i is the row index, j is the column index)
38
Static Nested Array Example
#define PCOUNT 4
zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6},
     {1, 5, 2, 1, 3},
     {1, 5, 2, 1, 7},
     {1, 5, 2, 2, 1}};

  • Declaration zip_dig pgh[4] equivalent to int
    pgh[4][5]
  • Variable pgh denotes array of 4 elements
  • Allocated contiguously
  • Each element is an array of 5 ints
  • Allocated contiguously
  • Row-Major ordering of all elements guaranteed

39
Static Nested Array Element Access
  • Array Elements
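  • A[i][j] is element of type T
  • Address A + (i * C + j) * K, where K is the size of T in bytes

int A[R][C];

    Row A[0] starts at address A
    Row A[i] starts at address A + i*C*4
    Row A[R-1] starts at address A + (R-1)*C*4
    Element A[i][j] is at address A + (i*C + j)*4
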
40
Static Multi-Level Array Example
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};

  • Variable univ denotes array of 3 elements
  • Each element is a pointer
  • 4 bytes
  • Each pointer points to array of ints

41
Element Access in Multi-Level Array
  • Computation
  • Element access Mem[Mem[univ + 4*index] + 4*dig]
  • Must do two memory reads
  • First get pointer to row array
  • Then access element within array

int get_univ_digit(int index, int dig)
{
    return univ[index][dig];
}

    # reg0 = index
    # reg1 = dig
    store 4*reg0 to reg2                   # 4*index
    store addr of univ in reg3
    add reg2 to reg3                       # univ + 4*index
    load data at address reg3 into reg4    # Mem[univ + 4*index]
    store 4*reg1 to reg2                   # 4*dig
    add reg2 to reg4                       # Mem[univ + 4*index] + 4*dig
    load data at address reg4 into reg5    # Mem[Mem[univ + 4*index] + 4*dig]

42
Static Array Element Accesses
  • Similar C references
  • Nested Array
  • Element at
  • Mem[pgh + 20*index + 4*dig]
  • Different address computation
  • Multi-Level Array
  • Element at
  • Mem[Mem[univ + 4*index] + 4*dig]

int get_pgh_digit(int index, int dig)
{
    return pgh[index][dig];
}

int get_univ_digit(int index, int dig)
{
    return univ[index][dig];
}

43
Dynamic Nested Arrays in C
  • Strength
  • Can create matrix of arbitrary size
  • Can choose row or column major order
  • Programming
  • Must do index computation explicitly
  • Performance
  • Accessing single element costly
  • Must do multiplication by dimension

int get_element(int *a, int i, int j, int n)
{
    return a[i*n + j];
}

44
Dynamic Multi-level Arrays in C
  • Multi-level
  • Pointer-to-pointer, bracket indexing [i][j]
  • Same dual mem address calculations as for static
    multi-level arrays
  • Can be packed (contiguous storage), bracket
    indexing

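/* One allocation per row */
int **array1 = (int **)malloc(nrows * sizeof(int *));
for (i = 0; i < nrows; i++)
    array1[i] = (int *)malloc(ncolumns * sizeof(int));

/* Packed: one contiguous block plus a table of row pointers */
int **array2 = (int **)malloc(nrows * sizeof(int *));
array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for (i = 1; i < nrows; i++)
    array2[i] = array2[0] + i * ncolumns;
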
45
Structs
  • The individual components are laid out in memory
    in their declaration order
  • There might still be gaps due to alignment of
    data, i.e. to place data on addresses that are a
    multiple of 2, 4 or 8
  • Some ISAs require certain alignment of data to
    simplify the design
  • C has support for bitfields, which are tiny
    members of a struct using only a few bits each
  • Useful in low-level systems programming to pack
    data

46
Bitfield example
struct {
    /* field 4 bits wide */
    unsigned field1 : 4;
    /*
     * unnamed 3 bit field
     * unnamed fields allow for padding
     */
    unsigned : 3;
    /*
     * one-bit field
     * can only be 0 or -1 in two's complement!
     */
    signed field2 : 1;
    /* align next field on a storage unit */
    unsigned : 0;
    unsigned field3 : 6;
} full_of_fields;

47
Incomplete Array Type (C99)

struct s { int n; double d[]; };
struct s *p1, *p2;
size_t sz;

sz = sizeof(struct s);   // sz == offsetof(struct s, d)
p1 = malloc(sz + 8 * sizeof(double));
p2 = malloc(sz + 5 * sizeof(double));
/* p1 behaves now as if it had been declared as
       struct { int n; double d[8]; } *p1;
   p2 behaves now as if it had been declared as
       struct { int n; double d[5]; } *p2;
*/

48
The Heap
  • When you request memory using the standard C
    library functions, it will be placed in an area
    of the virtual address space called the heap
  • The heap can grow quite large, but is ultimately
    limited by the word size of the CPU
  • Parts of the address space are also reserved for
    the system and other parts of your program
  • The stack usually grows downwards towards the
    heap, which means that they can meet
  • Always check return values to see if you got any
    memory

49
0xffffffff
    kernel virtual memory (code, data, heap, stack)   memory invisible to user code
0xc0000000
    user stack (created at runtime)                   automatic variables
    memory mapped region for shared libraries
0x40000000
    run-time heap (managed by malloc)                 dynamically allocated data
    read/write segment (.data, .bss)                  uninitialized data, pointers, global variables
    read-only segment (.init, .text, .rodata)         program code, read-only data
      (both segments are loaded from the executable file)
0x08048000
    unused
0
50
The Inner Workings of the Heap
  • The top of the heap is given by a kernel variable
    called brk
  • To grow the heap you can call the UNIX function
    sbrk(2)
  • The standard C library uses this function
    internally
  • Memory on the heap can also be reused
  • If memory has been freed you may not need to
    increase the heap.
  • This memory can be reused
  • See Bryant/O'Hallaron 10.9-10.11

51
The Heap Puzzle
  • The standard C library keeps track of the chunks
    of memory you request
  • Start address plus size in bytes
  • The free() function marks chunks as reusable
  • Programs that do a lot of mallocs and frees can
    fragment the heap
  • You cannot move a chunk, as this would give a new
    start address, which means that all pointers
    storing this address would need to be updated
  • If memory is reused it must fit the new request
  • If you do not match each malloc() with a free()
    you have created a memory leak
  • Once a pointer has been overwritten there is no
    chance of calling free()

52
Garbage Collection
  • You can construct memory allocators that detect
    when chunks are available for reuse
  • You do not call free()
  • Java uses garbage collection
  • There are such allocators for C too
  • The Boehm-Demers-Weiser (BDW) GC
  • http://www.linuxjournal.com/article/6679
  • Garbage collection can degrade performance
  • The GC is activated in cycles to sweep for free
    chunks
  • However, it usually reduces or eliminates memory
    leaks

53
Discussion Section
  • Today's Location
  • Terman 102-104 (elaine Linux cluster)
  • Same time (4:15-5:30)
  • Topics bitwise operators, linking, command-line
    arguments