CME212 Introduction to Large Scale Computing in Engineering


Slides: 54
Provided by: henri3



Representation II
  • Representing Composite Types, Disassembly,
    Function Calls, Stack, Heap

  • The back end of the compiler generates the
    machine code from the intermediate representation
  • This code generation allows for many optimizations
  • Usually, the compiler is conservative
  • Example: strict IEEE FP compliance; floating-point
    arithmetic is non-associative, so operations cannot
    be reordered

  • Complex Instruction Set Computers
  • Specialized machine instructions for things like
    managing linked lists, evaluating polynomials
  • Created a large set of instructions
  • Decoding was complicated
  • Hard for compiler to utilize all instructions
  • Useful for hand coded assembly
  • x86 (IA-32) is usually thought of as a CISC
    instruction set

  • Reduced Instruction Set Computer
  • Uses only a small set of atomic operations that
    can be combined to form more complex ones
  • Easy to decode
  • Developed by Patterson (Berkeley) and Hennessy
    (Stanford) in the 80s
  • SPARC, MIPS, PowerPC, Alpha
  • The compiler had to do more work
  • Very cumbersome to do RISC assembly programming
    by hand

  • RISC and CISC have largely converged
  • IA-32 instructions are CISC but decomposed into
    RISC-like micro-instructions internally
  • Many of the ideas of RISC have survived
  • Less debate today
  • Most ISAs today are 64-bit

Assembler Code
  • Human-readable machine code, called assembly
    code, can be produced by the compiler
  • gcc -S myfile.c
  • You can also reverse engineer assembly code from
    machine code using a disassembler
  • objdump -d file
  • where file can be an executable or object file

Compiling Into Assembly
  • C Code

    int sum(int x, int y)
    {
        int t = x + y;
        return t;
    }

  • Generated Assembly

    _sum:
        pushl %ebp
        movl  %esp,%ebp
        movl  12(%ebp),%eax
        addl  8(%ebp),%eax
        movl  %ebp,%esp
        popl  %ebp
        ret

  • Obtain with the command gcc -O -S code.c
  • Produces file code.s

Basic Operation
  • Modern computers are of LOAD-STORE type
  • Other types are accumulators, stack machines
  • These machines store operands in a register
    file or simply registers
  • Small scratch memory very close to the
    arithmetical units
  • Data must be loaded from memory into a register,
    operated upon, and then stored back

  • A regular calculator works like an accumulator
  • Early computers worked this way too
  • You have one register on which your arithmetic
    operations can work
  • In a LOAD-STORE machine you have several
    accumulators or memory locations where you can
    store temporaries

  • Registers are a scarce resource
  • Store words
  • Special registers for floating-point
  • The compiler tries to maximize the usage of the
    registers
  • Called register allocation
  • If you run out of registers, you must temporarily
    store results back in memory and then retrieve
    them again
  • Register spill
  • Degrades performance

Registers in C
  • There are two keywords that control register
    allocation from C
  • The register keyword asks the compiler to keep a
    variable in a register (compilers may treat it
    as a hint)
  • Used for heavily accessed variables
  • Today, most compilers can figure this out
  • You cannot take the address of a variable
    declared register
  • The volatile keyword forces the results to be
    written back to memory
  • Used in low-level and parallel programming

  • register int a = 23;
  • volatile int b = 43;

Machine Instructions (x86,IA32)
Low-level control flow
  • High-level constructs are mapped onto conditional
    and unconditional jumps (branches)
  • Unconditional jump in C: the goto statement
  • Jump targets (a new PC location) can be
    considered as labels
  • You can define labels in C too

goto Example

    if (x < y)
        goto less;
    val = x - y;
    goto done;
  less:
    /* y is larger than x */
    val = y - x;
  done:
    use(val);

  • Equivalent structured code

    if (x < y)
        val = y - x;
    else
        val = x - y;
    . . .
    use(val);

Condition Codes
  • There are special registers which hold condition
    codes
  • These registers are set by the test and compare
    instructions
  • Control flow instructions use the condition codes
    to see the results of a conditional
  • If (code), set a register (setl, ...)
  • If (code), jump to a target (jmpl, …)

goto Example, Again

    if (x < y)
        goto less;
    val = x - y;
    goto done;
  less:
    val = y - x;
  done:

  • load x into reg0
  • load y into reg1
  • compare reg0, reg1
  • jump to less if reg1 is larger than reg0
  • reg2 = subtract(reg0, reg1)
  • jump to done
  • less:
  • reg2 = subtract(reg1, reg0)
  • done:

  • while, do and for loops are also transformed into
    conditional and unconditional jumps
  • A for loop contains a loop header, which controls
    the execution of the loop, and the loop body, which
    does the actual work

    for (Init; Test; Update)
        Body

for Loops
  • for Version

    for (Init; Test; Update)
        Body

  • while Version

    Init;
    while (Test) {
        Body
        Update;
    }

  • do-while Version

    Init;
    if (!Test)
        goto done;
    do {
        Body
        Update;
    } while (Test);
  done:

  • goto Version

    Init;
    if (!Test)
        goto done;
  loop:
    Body
    Update;
    if (Test)
        goto loop;
  done:

Calling Functions
  • A function typically has input and output
    arguments and local variables
  • As we use jumps we also need the return address
    to be able to get back after the call
  • Both of these problems can be solved using a stack

  • Works like a stack of papers
  • Two operations
  • Push (place something on the top)
  • Pop (remove something from the top)
  • In algorithmic terms, stacks are LIFO (last in,
    first out)

Stacks and Calls
  • Most machines push the return address onto the
    stack before doing the call
  • After this the PC is set to the address of the
    subroutine
  • At the end of the subroutine, the return address
    can be popped from the stack

  • Subroutines typically need many registers to be
    able to do stuff efficiently
  • Before a call the registers are spilled to
    memory, called a save
  • Typically these are pushed onto the stack
  • Next, we push the return address
  • And finally the arguments onto the stack
  • After we get back from the subroutine, we can pop
    the saved registers (called a restore) from the
    stack

Stack Example
  • save
  • push address to Return_label
  • push arguments
  • call my_routine
  • Return_label
  • restore

  • Stack (figure): Return Address, Saved registers

    my_routine:
        pop arguments
        do stuff
        pop return address
        jmp Return_label

More on Calls
  • Passing arguments using the stack is slow
  • We would like to use registers
  • Complicates register allocation
  • Some processors have special input and output
    registers
  • Passes arguments through these
  • Limited amount
  • If the number of arguments is large, the stack is
    used

  • The scheme for subroutine calls is usually
    defined in the Application Binary Interface (ABI)
  • Lets different compilers generate compatible code
  • Linux Standard Base
  • http//
  • http//

Stacks and Recursion
  • Stacks are used to implement recursion in an elegant
    way
  • Fortran does not use a stack. To do recursion in
    Fortran you must declare the function as
    recursive
  • Intermediate values are deferred by pushing the
    return value onto the stack
  • Stack grows for each recursive call
  • You can get a stack overflow error

C and Stacks
  • Automatic variables are typically stored on the
    stack
  • When the function returns, the arguments are
    popped
  • They can however still be stored in memory
  • Implementations of stacks usually use a stack
    pointer to know where we are
  • Old values might still be present in memory

Stack size
  • The address space sizes are controlled by the
    shell
  • The maximum stack size is given by a shell command
    (in bash, ulimit -s)
  • The shell also controls other things, such as the
    number of files, core files, maximum virtual
    memory
  • ulimit -a

Where are My Variables?
  • C variables will be allocated at different
    locations depending on the scope and extent
  • Global Variables
  • Automatic Variables
  • Memory from malloc(), calloc()

  • Process address space (figure, listed from the top down)
  • kernel virtual memory (code, data, heap, stack):
    memory invisible to user code
  • user stack (created at runtime): automatic variables
  • memory mapped region for shared libraries
  • run-time heap (managed by malloc): dynamically
    allocated data
  • read/write segment (.data, .bss): uninitialized data,
    pointers, global variables
  • read-only segment (.init, .text, .rodata): program
    code, read-only data
  • The read/write and read-only segments are loaded from
    the executable file

Memory Allocation

    #include <stdio.h>

    int main(void)
    {
        int a, b, c, d;
        double e;
        char f[128];
        static int g;

        /* %p with a void * argument is the portable way to print addresses */
        printf("Address of a is %p\n", (void *)&a);
        printf("Address of b is %p\n", (void *)&b);
        printf("Address of c is %p\n", (void *)&c);
        printf("Address of d is %p\n", (void *)&d);
        printf("Address of e is %p\n", (void *)&e);
        printf("Address of f is %p\n", (void *)f);
        printf("Address of g is %p\n", (void *)&g);
        return 0;
    }

  • Output

    Address of a is 0xffbff474
    Address of b is 0xffbff470
    Address of c is 0xffbff46c
    Address of d is 0xffbff468
    Address of e is 0xffbff460
    Address of f is 0xffbff3e0
    Address of g is 0x20b9c

Array Example

    typedef int zip_dig[5];

    zip_dig cmu = { 1, 5, 2, 1, 3 };
    zip_dig mit = { 0, 2, 1, 3, 9 };
    zip_dig ucb = { 9, 4, 7, 2, 0 };

  • Notes
  • Declaration zip_dig cmu is equivalent to int cmu[5]
  • Example arrays were allocated in successive 20
    byte blocks
  • Not guaranteed to happen in general

Array Accessing Example
  • Computation
  • Register reg0 contains starting address of array
  • Register reg1 contains array index
  • Desired digit at 4*reg1 + reg0

    int get_digit(zip_dig z, int dig)
    {
        return z[dig];
    }

  • Memory Reference Code

    reg0 = z
    reg1 = dig
    store 4*reg1 in reg2
    add reg2 to reg0                      # reg0 = 4*reg1 + reg0
    store value at address reg0 in reg3   # Mem[4*reg1 + reg0]

Array Loop Example
  • Original Source

    int zd2int(zip_dig z)
    {
        int i;
        int zi = 0;
        for (i = 0; i < 5; i++) {
            zi = 10 * zi + z[i];
        }
        return zi;
    }

  • Transformed Version
  • As generated by GCC
  • Eliminate loop variable i
  • Convert array code to pointer code
  • Express in do-while form
  • No need to test at entrance

    int zd2int(zip_dig z)
    {
        int zi = 0;
        int *zend = z + 4;
        do {
            zi = 10 * zi + *z;
            z++;
        } while (z <= zend);
        return zi;
    }

Multidimensional arrays
  • Memory is one-dimensional
  • Multidimensional arrays need to be mapped onto a
    one-dimensional block of memory
  • In C, we have two alternatives
  • Nested arrays a[n*i+j]
  • Static or dynamic allocation
  • Multi-level arrays a[i][j]
  • Static or dynamic allocation

Nested Arrays
  • Dimensions are stacked consecutively using an
    index mapping
  • Consider a square two-dimensional array of size N

    Array(i,j) -> Array[j + i*N]

Static Nested Array Example

    #define PCOUNT 4
    zip_dig pgh[PCOUNT] =
      {{1, 5, 2, 0, 6},
       {1, 5, 2, 1, 3},
       {1, 5, 2, 1, 7},
       {1, 5, 2, 2, 1}};

  • Declaration zip_dig pgh[4] is equivalent to int
    pgh[4][5]
  • Variable pgh denotes array of 4 elements
  • Allocated contiguously
  • Each element is an array of 5 ints
  • Allocated contiguously
  • Row-Major ordering of all elements guaranteed
Static Nested Array Element Access
  • Array Elements
  • A[i][j] is an element of type T
  • Address: A + (i * C + j) * K, where C is the number
    of columns and K is the size of T in bytes

    int A[R][C];

Static Multi-Level Array Example

    zip_dig cmu = { 1, 5, 2, 1, 3 };
    zip_dig mit = { 0, 2, 1, 3, 9 };
    zip_dig ucb = { 9, 4, 7, 2, 0 };

    #define UCOUNT 3
    int *univ[UCOUNT] = {mit, cmu, ucb};

  • Variable univ denotes array of 3 elements
  • Each element is a pointer
  • 4 bytes
  • Each pointer points to array of ints

Element Access in Multi-Level Array
  • Computation
  • Element access Mem[Mem[univ+4*index]+4*dig]
  • Must do two memory reads
  • First get pointer to row array
  • Then access element within array

    int get_univ_digit(int index, int dig)
    {
        return univ[index][dig];
    }

    reg0 = index
    reg1 = dig
    store 4*reg0 to reg2                  # 4*index
    store addr of univ in reg3
    add reg2 to reg3                      # univ + 4*index
    load data at address reg3 into reg4   # Mem[univ+4*index]
    store 4*reg1 to reg2                  # 4*dig
    add reg2 to reg4                      # Mem[univ+4*index]+4*dig
    load data at address reg4 into reg5   # Mem[Mem[univ+4*index]…

Static Array Element Accesses
  • Similar C references
  • Different address computation
  • Nested Array
  • Element at
  • Mem[pgh+20*index+4*dig]

    int get_pgh_digit(int index, int dig)
    {
        return pgh[index][dig];
    }

  • Multi-Level Array
  • Element at
  • Mem[Mem[univ+4*index]+4*dig]

    int get_univ_digit(int index, int dig)
    {
        return univ[index][dig];
    }

Dynamic Nested Arrays in C
  • Strength
  • Can create matrix of arbitrary size
  • Can choose row or column major order
  • Programming
  • Must do index computation explicitly
  • Performance
  • Accessing single element costly
  • Must do multiplication by dimension


    int get_element(int *a, int i, int j, int n)
    {
        return a[i*n+j];
    }

Dynamic Multi-level Arrays in C
  • Multi-level
  • Pointer-to-pointer, bracket indexing [i][j]
  • Same dual mem address calculations as for static
    multi-level arrays
  • Can be packed (contiguous storage), bracket
    indexing [i][j]

    /* One allocation per row */
    int **array1 = (int **)malloc(nrows * sizeof(int *));
    for (i = 0; i < nrows; i++)
        array1[i] = (int *)malloc(ncolumns * sizeof(int));

    /* Packed: one contiguous block for all the data */
    int **array2 = (int **)malloc(nrows * sizeof(int *));
    array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
    for (i = 1; i < nrows; i++)
        array2[i] = array2[0] + i * ncolumns;

  • The individual components are laid out in memory
    in their declaration order
  • There might still be gaps due to alignment of
    data, i.e. to place data on addresses that are a
    multiple of 2,4 or 8
  • Some ISAs require certain alignment of data to
    simplify the design
  • C has support for bitfields, which are tiny
    members of a struct using only a few bits each
  • Useful in low-level systems programming to pack
    data
Bitfield example

    struct {
        /* field 4 bits wide */
        unsigned field1 : 4;
        /*
         * unnamed 3 bit field
         * unnamed fields allow for padding
         */
        unsigned : 3;
        /*
         * one-bit field
         * can only be 0 or -1 in two's complement!
         */
        signed field2 : 1;
        /* align next field on a storage unit */
        unsigned : 0;
        unsigned field3 : 6;
    } full_of_fields;

Incomplete Array Type (C99)

    struct s { int n; double d[]; };
    struct s *p1, *p2;
    size_t sz;

    sz = sizeof(struct s);  // sz == offsetof(struct s, d)
    p1 = malloc(sz + 8 * sizeof(double));
    p2 = malloc(sz + 5 * sizeof(double));
    /*
     * p1 behaves now as if it had been declared as
     *     struct { int n; double d[8]; } *p1;
     * p2 behaves now as if it had been declared as
     *     struct { int n; double d[5]; } *p2;
     */

The Heap
  • When you request memory using the standard C
    library functions, it will be placed in an area
    of the virtual address space called the heap
  • The heap can grow quite large, but is ultimately
    limited by the word size of the CPU
  • Parts of the address space are also reserved for
    the system and other parts of your program
  • The stack usually grows downwards towards the
    heap, which means that they can meet
  • Always check return values to see if you got any
    memory

The Inner Workings of the Heap
  • The top of the heap is given by a kernel variable
    called brk
  • To grow the heap you can call the UNIX function
  • The standard C library uses this function
  • Memory on the heap can also be reused
  • If memory has been freed you may not need to
    increase the heap.
  • This memory can be reused
  • See Bryant/O'Hallaron 10.9-10.11

The Heap Puzzle
  • The standard C library keeps track of the chunks
    of memory you request
  • Start address plus size in bytes
  • The free() function marks chunks as reusable
  • Programs that do a lot of mallocs and frees can
    fragment the heap
  • You cannot move a chunk, as this would give it a new
    start address, which means that all pointers
    storing this address would need to be updated
  • If memory is reused it must fit the new request
  • If you do not match each malloc() with a free()
    you have created a memory leak
  • Once a pointer has been overwritten there is no
    chance of calling free()

Garbage Collection
  • You can construct memory allocators that detect
    when chunks are available for reuse
  • You do not call free()
  • Java uses garbage collection
  • There are such allocators for C too
  • The Boehm-Demers-Weiser (BDW) GC
  • http//
  • Garbage collection can degrade performance
  • The GC is activated in cycles to sweep for free
    chunks
  • However, it usually reduces or eliminates memory
    leaks

Discussion Section
  • Today's location
  • Terman 102-104 (elaine Linux cluster)
  • Same time (4:15-5:30)
  • Topics: bitwise operators, linking, command-line