Title: Memory Management


1
Memory Management
  • An important and expensive resource
  • Parkinson's law: "Programs expand to fill the
    memory available to hold them."
  • Ideally, programmers want memory that is
  • fast
  • non-volatile
  • large (if memory were cheap it would have been
    large and we wouldn't have to discuss its
    management)
  • Strong relation:
  • multiprogramming <-> memory management

2
Memory Management
  • Memory hierarchy
  • a small amount of fast, expensive cache memory -
    <1MB
  • some medium-speed, medium-price main memory (RAM) -
    512MB?
  • gigabytes of slow, cheap disk storage (a portion
    used for virtual memory) - 16GB?
  • Memory manager handles the memory hierarchy

3
Memory Management - Motivation
  • n processes, each spending a fraction p of its
    time waiting for I/O, give a probability of p^n
    that all processes are waiting for I/O simultaneously
  • CPU utilization = 1 - p^n

4
Utilizing Memory
  • Assume each process takes 200KB and so does the
    operating system
  • Assume there is 1MB of memory available and that
    p = 0.8
  • space for 4 processes -> 60% CPU utilization
  • Another 1MB enables 9 processes -> 87% CPU
    utilization (checked in the sketch below)
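
A quick check of these numbers - a minimal sketch; the C++ is
illustrative, the constants are the slide's:

  // CPU utilization under multiprogramming: u(n) = 1 - p^n
  #include <cmath>
  #include <cstdio>

  int main() {
      const double p = 0.8;  // fraction of time each process waits for I/O
      for (int n : {4, 9})   // 4 processes in 1MB, 9 processes in 2MB
          std::printf("n = %d: utilization = %.0f%%\n",
                      n, (1.0 - std::pow(p, n)) * 100.0);
      // prints ~59% for n = 4 and ~87% for n = 9, as on the slide
  }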

5
Types of memory managers
  • Those that move processes back and forth between
    main memory and disk
  • and those that don't
  • Simplest form: one process in memory at a time.
  • User types a command
  • System loads the program into main memory and
    executes it.
  • System reports when it's done.

6
Multiprogramming with Fixed Partitions
  • How to organize the memory?
  • How to assign jobs to partitions?
  • Separate queues vs. a single queue

7
Allocating memory - growing segments
8
Memory Allocation and Fragmentation
job queue:
  process  memory  time
  P1       600K    10
  P2       1000K   5
  P3       300K    20
  P4       700K    8
  P5       500K    15
9
Memory Allocation - Keeping Track (bitmaps and
linked lists)
10
Strategies for Allocation
  • First fit - do not search too much...
  • Next fit - start the search from the last location
  • Best fit - a drawback: generates small holes
  • Worst fit - solves the above problem, badly
  • Quick fit - several queues of different sizes
  • (Try allocating 2 on the previous slide)
  • Main problem of memory allocation -
    fragmentation:
  • internal - wasted parts of allocated space
  • external - wasted unallocated space
  • (a first-fit sketch follows below)
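
A minimal first-fit sketch; the Hole list layout and ownership are
assumptions, not from the slides:

  #include <cstddef>
  #include <cstdint>

  struct Hole { std::size_t start, size; Hole* next; };  // free-list node

  // Returns the start address of the allocation, or SIZE_MAX on failure.
  std::size_t first_fit(Hole*& head, std::size_t request) {
      for (Hole** pp = &head; *pp; pp = &(*pp)->next) {
          Hole* h = *pp;
          if (h->size < request) continue;    // hole too small - keep searching
          std::size_t addr = h->start;
          h->start += request;                // carve the request off the front
          h->size  -= request;
          if (h->size == 0) {                 // hole fully used - unlink it
              *pp = h->next;
              delete h;                       // assumes heap-allocated nodes
          }
          return addr;
      }
      return SIZE_MAX;  // no hole fits: external fragmentation in action
  }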

11
The Buddy System
  • An example of an elaborate scheme: the Buddy system
    (Knuth, 1973)
  • Separate lists of free holes whose sizes are powers
    of two
  • For any request, pick the first hole of the right
    size
  • Not very good memory utilization
  • Freed blocks can only be merged with their "buddy"
    neighbors of their own size (sketched below)
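
A compact sketch of the buddy bookkeeping; the sizes and names are
assumptions, and the XOR step is the standard way to locate a buddy:

  #include <cstdint>
  #include <list>

  constexpr int kMaxOrder = 20;                       // one 2^20-byte block in total
  std::list<std::uint32_t> freeLists[kMaxOrder + 1];  // free holes per power of two

  std::uint32_t buddy_alloc(int order) {              // request a 2^order-byte block
      int o = order;
      while (o <= kMaxOrder && freeLists[o].empty()) ++o;  // 1st big-enough hole
      if (o > kMaxOrder) return UINT32_MAX;           // out of memory
      std::uint32_t addr = freeLists[o].front();
      freeLists[o].pop_front();
      while (o > order) {                             // split: keep the lower half,
          --o;                                        // free the upper half
          freeLists[o].push_front(addr + (1u << o));
      }
      return addr;
  }

  void buddy_free(std::uint32_t addr, int order) {
      if (order < kMaxOrder) {
          std::uint32_t buddy = addr ^ (1u << order); // differs in one address bit
          for (auto it = freeLists[order].begin(); it != freeLists[order].end(); ++it)
              if (*it == buddy) {                     // buddy also free: merge upward
                  freeLists[order].erase(it);
                  buddy_free(addr & ~(1u << order), order + 1);
                  return;
              }
      }
      freeLists[order].push_front(addr);              // buddy busy: no merge
  }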

12
The Buddy System
13
Fragmentation
  • External Fragmentation - total memory space
    exists to satisfy a request, but it is not
    contiguous.
  • Internal Fragmentation - allocated memory may be
    slightly larger than requested memory; this size
    difference is memory internal to a partition, but
    not being used.
  • Reduce external fragmentation by compaction
  • Shuffle memory contents to place all free memory
    together in one large block.
  • Compaction is possible only if relocation is
    dynamic, and is done at execution time.
  • I/O problem
  • Latch job in memory while it is involved in I/O.
  • Do I/O only into OS buffers.

14
Memory Compaction
15
Swapping
  • A process can be swapped temporarily out of
    memory to a backing store, and then brought back
    into memory for continued execution.
  • Backing store - fast disk large enough to
    accommodate copies of all memory images for all
    users; must provide direct access to these memory
    images.
  • Roll out, roll in - swapping variant used for
    priority-based scheduling algorithms: a
    lower-priority process is swapped out so a
    higher-priority process can be loaded and
    executed.
  • The major part of swap time is transfer time; total
    transfer time is directly proportional to the
    amount of memory swapped.
  • Modified versions of swapping are found on many
    systems, e.g., UNIX, Linux, and Windows.

16
Schematic View of Swapping
17
Managing memory by Swapping
  • Processes from disk to memory and from memory to
    disk
  • Whenever there are too many jobs to fit in memory
  • Swapping can help solve fragmentation
  • Allocating memory
  • Freeing memory and holes
  • possible solution: swapping and memory compaction
  • since swapping is performed on whole processes, it
    results in noticeable response times
  • longer queues of blocked processes can lead to
    many swaps
  • Allocating swap space
  • Processes are swapped in/out from the same
    location
  • Allocate maximum space, or estimate the maximum
  • Don't allocate swap space for memory-resident
    processes (e.g. daemons)

18
Swapping in Unix
  • When? The kernel runs out of memory:
  • a fork system call - no space for the child process
  • a brk system call to expand the data segment
    (new?)
  • a stack becomes too large
  • Who?
  • a blocked process with highest priority
  • a process which consumed much CPU
  • How much space?
  • maximum
  • use holes and first/best fit (old Unix)

19
Issues - Relocation and Linking
  • Compile time - create absolute code
  • Load time - the linker lists relocatable
    instructions and the loader changes them (at
    each reload...)
  • Execution time - special hardware needed to
    support moving of processes during run time
  • Dynamic Linking - used with system libraries and
    includes only a stub in each user routine,
    indicating how to locate the memory-resident
    library function (or how to load it, if needed)

20
Binding of Instructions and Data to Memory
Address binding of instructions and data to
memory addresses can happen at three different
stages.
  • Compile time: if the memory location is known a
    priori, absolute code can be generated; must
    recompile the code if the starting location changes.
  • Load time: must generate relocatable code if the
    memory location is not known at compile time.
  • Execution time: binding is delayed until run time
    if the process can be moved during its execution
    from one memory segment to another. Needs
    hardware support for address maps (e.g., base and
    limit registers).

21
Dynamic Linking
  • Linking postponed until execution time.
  • Small piece of code, stub, used to locate the
    appropriate memory-resident library routine.
  • Stub replaces itself with the address of the
    routine, and executes the routine.
  • Operating system support is needed to check if the
    routine is in the process's memory space.
  • Dynamic linking is particularly useful for
    libraries.

22
Logical vs. Physical Address Space
  • The concept of a logical address space that is
    bound to a separate physical address space is
    central to proper memory management.
  • Logical address - generated by the CPU; also
    referred to as a virtual address.
  • Physical address - the address seen by the memory
    unit.
  • Logical and physical addresses are the same in
    compile-time and load-time address-binding
    schemes; logical (virtual) and physical addresses
    differ in the execution-time address-binding scheme.

23
Paging and Virtual Memory
  • enable an address space that is independent of
    physical memory
  • 2^32 addresses for a 32-bit (address bus) machine
    - virtual addresses
  • can be achieved by segmenting the executable
    (with segment registers...) or by dividing memory
    using another method
  • Paging - divide memory into fixed-size blocks
    (page frames)
  • Small enough blocks - many per process
  • Allocate to processes non-contiguous memory
    chunks - avoiding holes...

24
Memory-Management Unit (MMU)
  • Hardware device that maps virtual to physical
    address.
  • In MMU scheme, the value in the relocation
    register is added to every address generated by a
    user process at the time it is sent to memory.
  • The user program deals with logical addresses; it
    never sees the real physical addresses.

25
Paging
26
Memory Management Unit
27
MMU Operation - page fault if accessed page is
absent
28
Pages hold the data; page frames are the physical
memory locations
  • Page Table Entries (PTE) contain (per page)
  • Page frame number (physical address)
  • Present/absent bit (valid bit)
  • Dirty (modified) bit
  • Referenced (accessed) bit
  • Protection
  • Caching disable/enable

29
Page vs Page-table sizes -Tradeoffs
  • A logical address of 24 bits (16MB) (on a 32-bit
    machine with op-codes of 8 bits) can be divided
    into:
  • 1K pages and a 16K-entry table (16K × 8 bytes = 128KB)
  • 4K pages and a 4K-entry table (4K × 8 bytes = 32KB)
  • Large pages - fewer pages, but waste in the
    last page.
  • Small pages - larger tables (also a waste of space)
  • A logical address of 32 bits (4GB) can be
    divided into:
  • 1K pages and a 4M-entry table (4M × 8 bytes = 32MB!)
  • 4K pages and a 1M-entry table (1M × 8 bytes = 8MB)
  • Huge tables! What to do?

30
Two-Level Paging Example
  • A logical address (on 32-bit machine with 4K page
    size) is divided into
  • a page number consisting of 20 bits.
  • a page offset consisting of 12 bits.
  • Since the page table is paged, the page number is
    further divided into
  • a 10-bit page number.
  • a 10-bit page offset.
  • Thus, a logical address is as follows:
  • where p1 is an index into the outer page table,
    and p2 is the displacement within the page of the
    outer page table (field extraction sketched below).
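
A sketch of how such an address splits into its three fields; the
example address is arbitrary:

  #include <cstdint>
  #include <cstdio>

  int main() {
      std::uint32_t va = 0x01234ABC;              // arbitrary 32-bit virtual address
      std::uint32_t p1     = va >> 22;            // 10-bit index into the outer table
      std::uint32_t p2     = (va >> 12) & 0x3FF;  // 10-bit index into a 2nd-level table
      std::uint32_t offset = va & 0xFFF;          // 12-bit byte offset in a 4K page
      std::printf("p1 = %u, p2 = %u, offset = 0x%X\n", p1, p2, offset);
  }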

31
Two-Level Page-Table Scheme
32
Two-Level Paging Example - Vax
  • A logical address (on 32-bit machine) is divided
    into
  • a page number consisting of 23 bits.
  • a page offset consisting of 9 bits (page size
    1/2K!).
  • Since the page table is paged, the page number is
    further divided into
  • a 21-bit page number.
  • a 2-bit section index. (code, heap, stack,
    system)
  • Thus, a logical address is as follows:
  • where s is an index into the section table, and
    p points to the page table. Note: the section
    table is always in memory; page tables may be
    swapped. A page table's max size is 2M entries ×
    4 bytes = 8MB!

33
SPARC 3-level paging: Context table (MMU
hardware) - 1 entry per process
34
Page table considerations
  • Can be very large (1M pages for 32-bit addresses)
  • Must be fast (every instruction needs it)
  • One extreme: keep it all in hardware - fast
    registers that hold the page table and are loaded
    with each process; too expensive for the above
    size
  • The other extreme keeps it all in memory (using a
    page table base register (PTBR) to point to it) -
    each memory reference during instruction
    translation is doubled...
  • To avoid keeping complete page tables in memory,
    make them multilevel (and avoid the danger of
    accumulating memory references per instruction by
    caching)

35
Multilevel Paging and Performance
  • Since each level is stored as a separate table in
    memory, converting a logical address to a physical
    one may take four memory accesses.
  • Even though the time needed for one memory access is
    quintupled, caching permits performance to remain
    reasonable.
  • A cache hit rate of 98 percent yields effective
    access time 0.98 × 120 + 0.02 × 520 =
    128 nanoseconds, which is only a 28 percent
    slowdown in memory access time.

36
Inverted page tables
  • for very large memories (page tables) one can
    have an inverted page table, sorted by
    (physical) page frames
  • IBM RT, HP Spectrum (thinking of 64-bit
    memories)
  • to avoid a linear search for every virtual
    address of a process, use a hash table (one or a
    few memory references)
  • only one page table - the physical one - for all
    processes currently in memory
  • in addition to the hash table, associative
    memory registers are used to store recently used
    page table entries
  • the only way to deal with a 64-bit memory: with
    4K pages, two-level page tables can result in
    2^42 entries

37
Inverted Page Table Architecture
38
Shared Pages
39
Motivation for Virtual Memory
  • Unused code
  • Error routines
  • Rare functionality
  • Unused data
  • Array larger than needed
  • Garbage not collected

40
Demand Paging
  • Bring a page into memory only when it is needed.
  • Less I/O needed
  • Less memory needed
  • Faster response
  • More users
  • Page is needed -> reference it
  • invalid reference -> abort
  • not in memory -> bring it to memory

41
In-memory Bit
  • With each page table entry a valid-invalid bit is
    associated (1 -> in memory, 0 -> not in memory).
  • Initially, the valid-invalid bit is set to 0 on all
    entries.
  • Example of a page table snapshot.
  • During address translation, if the valid-invalid bit
    in a page table entry is 0 -> page fault.

42
Page Fault
  • If there is ever a reference to a page, the first
    reference will trap to the OS -> page fault
  • The OS looks at another table to decide:
  • invalid reference -> abort
  • just not in memory:
  • get an empty frame
  • swap the page into the frame
  • reset tables, validation bit = 1
  • restart the instruction - tricky cases:
  • block move
  • auto increment/decrement location

43
What Happens if there is no Free Frame
  • Page replacement - find some page in memory that is
    not really in use and swap it out
  • Algorithm?
  • Performance - want an algorithm which will result
    in a minimum number of page faults
  • The same page may be brought into memory several
    times

44
Page fault Handling
  • 1. trap to kernel, save PC on stack and
    (sometimes) partial state in registers (and/or
    stack)
  • 2. assembly routine saves volatile information
    and calls the operating system
  • 3. find requested virtual page
  • 4. check protection. If legal, find free page
    frame (or invoke page replacement algorithm)
  • 5. if replacing, check if modified and start
    write to disk. Mark frame busy. Call scheduler
    to block process until the write-to-disk process
    has completed.

45
Page fault Handling (cont'd.)
  • 6. transfer of requested page from disk
    (scheduler runs alternative processes)
  • 7. upon transfer completion, enter page table,
    mark new page as valid and update all other
    parameters
  • 8. back up the faulted instruction, which was in
    principle in mid-execution; now the PC can be
    set back to its initial value
  • 9. schedule faulting process, return from
    operating system
  • 10. restore state (i.e. all volatile information
    stored by the assembly routine) and return to
    user space for execution of faulted process

46
Problem - instruction backup
  • page faulting instructions trap to OS
  • OS must restart instruction
  • The page fault may originate at the op-code or
    any of the operands - PC value useless
  • the location of the instruction itself is lost
  • worse still, undoing of autoincrement or
    autodecrement - was it already performed ??
  • Hardware solutions
  • Register to store PC value of instruction and
    register to store changes to other registers
    (increment/decrement)
  • Micro-code dumps all information on the stack
  • Restart complete instruction and redo increments
    etc.
  • Do nothing - RISC ......

47
Memory access with page faults
  • P - probability of a page fault
  • MA - memory access time
  • PF - time to process a page fault
  • EMA - Effective Memory Access:
  • EMA = (1-P) × MA + P × PF
  • where
  • PF = page-fault interrupt service time
  • + read-in page time (maybe write-page too?)
  • + restart process time

48
Effective memory access
  • For MA = 100 nsec and PF = 25 msec:
  • if P = 0.001
  • -> EMA ≈ 100 + 25 × 10^6 / 10^3 = 25,100 nsec
  • if P = 10^-5
  • -> EMA ≈ 100 + 250 = 350 nsec
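
The same arithmetic as a runnable check; the constants are the slide's,
the C++ is illustrative:

  #include <cstdio>

  int main() {
      const double MA = 100.0;   // memory access time, nanoseconds
      const double PF = 25e6;    // page-fault service time: 25 msec in nanoseconds
      for (double p : {1e-3, 1e-5})
          std::printf("p = %g: EMA = %.0f nsec\n", p, (1 - p) * MA + p * PF);
      // p = 0.001 -> ~25100 nsec;  p = 1e-05 -> ~350 nsec
  }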

49
Associative Memory - content-addressable memory;
page insertion - complete entry from the page table;
page deletion - just the modified bit back to the
page table
50
Associative Memory - comments
  • With a large enough hit-ratio the average access
    time is close to 0
  • Only a complete virtual address (all levels) can
    be counted as a hit
  • with multiprogramming, the associative memory can be
    cleared on a context switch - wasteful...
  • Add a field to the associative memory to hold the
    process ID, and a special register for the current PID

51
Fundamental Concepts (1)
  • Virtual address space layout for 3 user processes
  • White areas are private per process
  • Shaded areas are shared among all processes

52
Fundamental Concepts (2)
  • Mapped regions with their shadow pages on disk
  • The lib.dll file is mapped into two address
    spaces simultaneously

53
Page Replacement Algorithms
  • Page fault forces choice
  • which page must be removed
  • make room for incoming page
  • Modified page must first be saved
  • unmodified just overwritten
  • Better not to choose an often used page
  • will probably need to be brought back in soon

54
Optimal page replacement
  • Demand comes in for pages (3 physical frames). The
    reference string:
  • 7, 5, 1, 0, 5, 4, 7, 0, 2, 1, 0, 7
  • an optimal algorithm faults on (page in, page out):
  • 7 5 1 (0,1) - (4,5) - - (2,4) (1,2) - -
  • altogether 4 page replacements
  • take FIFO for example:
  • 7 5 1 (0,7) - (4,5) (7,1) - (2,0) (1,4) (0,7) (7,2)
  • 3 additional page replacements

55
Good old FIFO
  • implemented as a queue
  • the usual drawback:
  • the oldest page may be a referenced (needed) page
  • second-chance FIFO:
  • if the reference bit is on - move to the end of the queue
  • Better implemented as a circular queue:
  • saves the overhead of movements on the queue

56
LRU - Least Recently Used
  • Approximates the optimal algorithm -
  • the most recently used pages are the most probable
    next references
  • Replace the page used furthest in the past
  • Not easy to implement - needs counting of
    references
  • Use a large counter (number of operations), saved
    into a page-table field on each page reference
    operation
  • Another option is to use a bit array of n×n bits
  • In both cases the page entry with the smallest
    number attached to it is selected for replacement

57
LRU vs. Optimal
  • reference string:
  • 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
  • page frames:
  • Figure 9.10: Optimal page-replacement
    algorithm
  • same reference string and page frames:
  • Figure 9.11: LRU page-replacement algorithm.

58
Second Chance Page Replacement Algorithm
  • Operation of second chance:
  • pages sorted in FIFO order
  • page list if a fault occurs at time 20 and A has its
    R bit set (numbers above pages are loading times)
  • When A moves forward, its R bit is cleared!

59
The Clock Page Replacement Algorithm
60
Page replacement NRU - Not Recently Used
  • There are 4 classes of pages, according to the
    reference and modification bits
  • Select a page at random from the least-needed
    class
  • Easy scheme to implement
  • Prefers removing an old modified page over a
    frequently referenced (not modified) page
  • The (not referenced, modified) class is interesting:
    it can only happen when a clock tick clears the
    reference bit...

61
LRU Realizing in Hardware
  • Use a large counter (64 bits), saved into a
    page-table field on each page reference
    operation. At page-fault time, find the minimum - how?
  • Another option is to use, for each page, a counter
    with shift. On each page reference, shift all
    counters and put a 1 for the referenced page.
    Select the page with the most zeroes from the
    left - too many counter shifts!
  • Another option is to use a bit array of n×n bits
    and use only TWO operations: set a row to 1s, set
    a column to 0s.
  • In all cases, too much overhead for the hardware
  • Needed: an (approximate) software solution

62
LRU with bit tables
Reference string: 0, 1, 2, 3, 2, 1, 0, 3, 2, 3
(a sketch of the bit-matrix scheme follows)
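
A sketch of the n×n bit-matrix scheme from slide 61 applied to this
example; the MSB-first column orientation is an assumed convention:

  #include <cstdint>
  #include <cstdio>

  constexpr int N = 4;     // 4 virtual pages, as in the example
  std::uint8_t row[N];     // row[i] holds the N matrix bits of page i

  void reference(int k) {
      row[k] = (1u << N) - 1;                // set row k to all 1s
      for (int i = 0; i < N; ++i)
          row[i] &= ~(1u << (N - 1 - k));    // then clear column k everywhere
  }

  int lru_victim() {       // the row with the smallest value is the LRU page
      int v = 0;
      for (int i = 1; i < N; ++i)
          if (row[i] < row[v]) v = i;
      return v;
  }

  int main() {
      for (int p : {0, 1, 2, 3, 2, 1, 0, 3, 2, 3}) reference(p);
      std::printf("LRU page: %d\n", lru_victim());  // page 1 after this string
  }
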
63
NFU - Not Frequently Used
  • In order to record frequently used pages, add a
    counter to all table entries, but don't update it on
    each memory reference - only on each clock tick!
  • At each clock tick, add the R bit to the counters
  • Select the page with the lowest counter for replacement
  • problem: remembers everything
  • remedy (an aging algorithm, sketched below):
  • shift-right the counter before adding the
    reference bit
  • add the reference bit at the left
  • Fewer operations than LRU; depends on the
    intervals used for updating
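
A sketch of the aging counters; the 8 pages and 8-bit counters are
assumptions for the example:

  #include <cstdint>

  constexpr int N = 8;
  std::uint8_t counter[N];  // one aging counter per page
  bool R[N];                // reference bits, set by "hardware" since last tick

  void clock_tick() {
      for (int i = 0; i < N; ++i) {
          // shift right, then insert the reference bit at the left
          counter[i] = (counter[i] >> 1) | (R[i] ? 0x80 : 0x00);
          R[i] = false;     // clear for the next interval
      }
  }

  int victim() {            // lowest counter = referenced longest ago
      int v = 0;
      for (int i = 1; i < N; ++i)
          if (counter[i] < counter[v]) v = i;
      return v;
  }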

64
NFU - the aging simulation version
65
Differences between LRU and NFU
  • If two pages have the same number of zeroes
    before the first 1, which one should be selected?
  • If two pages both have counters of all 0s, which
    one should be selected? (counter too short)
  • Therefore it's only an approximation!

66
Modelling (static) paging algorithms
  • Belady's anomaly
  • Example: FIFO with reference string
    1 2 3 4 1 2 5 1 2 3 4 5 (simulated below)
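
A small FIFO simulator that reproduces the anomaly on this reference
string; the C++ is illustrative:

  #include <algorithm>
  #include <cstdio>
  #include <deque>
  #include <vector>

  int fifo_faults(const std::vector<int>& refs, std::size_t frames) {
      std::deque<int> mem;   // front = oldest loaded page
      int faults = 0;
      for (int page : refs) {
          if (std::find(mem.begin(), mem.end(), page) != mem.end())
              continue;                              // hit
          ++faults;
          if (mem.size() == frames) mem.pop_front(); // evict the oldest page
          mem.push_back(page);
      }
      return faults;
  }

  int main() {
      std::vector<int> refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
      for (std::size_t m : {3u, 4u})
          std::printf("%zu frames: %d faults\n", m, fifo_faults(refs, m));
      // 3 frames -> 9 faults, but 4 frames -> 10 faults: Belady's anomaly
  }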

67
Characterizing page replacement
  • a Reference string (of requested pages)
  • number of virtual pages n
  • number of physical page frames m - static
  • a page replacement algorithm
  • can be represented by an array M of n rows

68
Stack Algorithms
  • Definition: the set of pages in physical memory with
    m page frames is a subset of the set of pages in
    physical memory with m+1 page frames (for every
    reference string)
  • Stack algorithms have no anomaly
  • Examples: LRU, optimal replacement
  • FIFO is not a stack algorithm
  • Useful definition:
  • Distance string - distance from the top of the stack

69
Predicting page fault number
  • Ci is the number of times that distance i appears
    in the distance string
  • the number of page faults with m frames is:
  • Fm = C(m+1) + C(m+2) + ... + Cn + C∞

70
The Distance String
  • Probability density functions for two
    hypothetical distance strings

71
Page Allocation Policies (2)
  • Page fault rate as a function of the number of
    page frames assigned

72
Page Frame Allocation
  • for a page-fault rate p, memory access time of
    100 nanosecs and page-fault service time of 25
    millisecs, the effective access time is (1-p) ×
    100 + p × 25,000,000
  • for p of 0.001, the effective access time is
    still larger than 100 nanosecs by a factor of ~250
  • for a goal of only a 10% degradation in access
    time we need p < 0.0000004
  • policies for page-frame allocation must allocate
    as much as possible to processes, to enhance
    performance - leave no unassigned page frame
  • difficult to know how many frames to allocate -
    processes differ in size, structure, priority

73
Allocation to multiprocesses
  • Fair share is not the best policy (it is static!!)
  • allocate according to process size - so, so
  • there must be a minimum for running a process...
74
Thrashing
  • If a process does not have enough pages, the
    page-fault rate is very high. This leads to:
  • low CPU utilization
  • the operating system thinks that it needs to increase
    the degree of multiprogramming
  • another process is added to the system
  • Thrashing - a process is busy swapping pages in
    and out.

75
Thrashing Diagram
  • Why does paging work? The locality model:
  • a process migrates from one locality to another
  • localities may overlap
  • Why does thrashing occur? Σ size of localities >
    total memory size

76
Working-Set Model
  • Δ = working-set window = a fixed number of page
    references. Example: 10,000 instructions
  • WSSi (working set of process Pi) = total number
    of pages referenced in the most recent Δ (varies
    in time)
  • If Δ is too small, it will not encompass the entire
    locality.
  • If Δ is too large, it will encompass several localities.
  • If Δ = ∞, it will encompass the entire program.
  • D = Σ WSSi = total demand for frames
  • If D > m -> thrashing
  • Policy: if D > m, then suspend one of the
    processes.

77
Working-Set Model
  • The working set is the set of pages used by the K
    most recent memory references
  • The function w(k,t) is the size of the working
    set at time t
  • How do we estimate w(k,t) WITHOUT an update on each
    memory reference?

78
Working set model
79
Dynamic Page Allocation - lookback ?
  • reference string: 0 2 1 3 5 4 6 3 7 5 7 3 3 5 6 4
  • with 5 page frames (LRU):
  • p p p p p p p - p - - - - - - -   (optimal)
  • with τ = 5 (and LRU):
  • p p p p p p p - p - - (4)(3) - p(4) p(4)
  • for a window of size 5 the allocated WS is
    decreasing after requests 12 and 14
  • the maximum page allocation is τ
  • extra page faults, because of the size of the WS
  • after the last request, page 4, the number of
    allocated page frames increases again (4)

80
Keeping track of the Working Set
  • Approximate with an interval timer and the
    reference bits.
  • Example: Δ = 10,000
  • Timer interrupts after every 5,000 time units.
  • Keep 2 history bits in memory for each page.
  • Whenever the timer interrupts, copy the reference
    bits into the history bits and set them all to 0.
  • If one of the bits in memory = 1 -> the page is in
    the working set.
  • Why is this not completely accurate?
  • Improvement: 10 bits and an interrupt every 1,000
    time units. (a sketch of the 2-bit version follows)
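
A sketch of this interval-timer approximation; the 64 pages and the
2-bit history match the two-intervals-per-Δ example, and the slide's
improvement would simply widen hist to 10 bits:

  #include <cstdint>

  constexpr int N = 64;
  bool R[N];                 // hardware reference bits
  std::uint8_t hist[N];      // 2 history bits per page

  void timer_interrupt() {   // fires every 5,000 time units in the example
      for (int i = 0; i < N; ++i) {
          hist[i] = ((hist[i] << 1) | (R[i] ? 1 : 0)) & 0x3;  // shift R bit in
          R[i] = false;                                       // and clear it
      }
  }

  bool in_working_set(int page) { return R[page] || hist[page] != 0; }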

81
Dynamic set Aging
  • the look-back window cannot be based on memory
    references - too expensive
  • one way to enlarge the time gap between updates
    is to use clock-tick triggering
  • reference bits are updated by the hardware
  • the algorithm clears reference bits, but also uses
    an additional data structure to store the
    current virtual time of the process - aging.
    The current virtual time is stored for each
    entry with R = 1; this is done on every clock
    interrupt.
  • At page-fault time, the table is scanned and the
    entry with R = 0 and the largest age (virtual time −
    stored time) is selected.
  • Why virtual time? Because we need to keep times
    independently per process.
  • This idea can be the basis for page replacement
    that selects the oldest page among the
    non-referenced ones

82
The Working Set Page Replacement Algorithm (2)
  • The working set algorithm

83
Dynamic set - Clock Algorithm
  • WSClock is a global clock algorithm - for pages
    held by all processes in memory
  • Circling the clock, the algorithm uses the
    reference bit and an additional data structure,
    ref(frame), which is set to the current virtual time
    of the process
  • WSClock uses an additional condition that
    measures elapsed (process) time and compares it
    to τ
  • replace a page when two conditions apply:
  • the reference bit is unset
  • Tp − ref(frame) > τ

84
The WSClock Page Replacement Algorithm
85
Dynamic set - WSClock Example
  • 3 processes: p0, p1 and p2
  • current (virtual) times of the 3 processes:
  • Tp0 = 50, Tp1 = 70, Tp2 = 90
  • WSClock: replace when Tp − ref(frame) > τ
  • the minimal distance (window size) is τ = 20
  • The clock hand is currently pointing to page
    frame 4
  • page frame:   0  1  2  3  4  5  6  7  8  9 10
  • ref. bit:     0  0  1  1  1  0  1  0  0  1  0
  • process ID:   0  1  0  1  2  1  0  0  1  2  2
  • last_ref:    10 30 42 65 81 57 31 37 31 47 55
  • scanning from the hand, the ages of the R = 0
    frames 5, 7 and 8 are 70−57 = 13, 50−37 = 13 and
    70−31 = 39; only 39 > 20, so frame 8 is replaced

86
Review of Page Replacement Algorithms
87
Comment - Page size analysis
  • To minimize wasted memory, with:
  • process size s
  • page size p
  • page table entry size e
  • Fragmentation overhead is p/2 (half of the last page)
  • Table space overhead is s·e/p
  • Total overhead is s·e/p + p/2
  • Minimizing the overhead gives p = √(2se)
    (derivation below)
  • Example: s = 128K, e = 8 bytes ->
  • optimal page size is ~1448 bytes... i.e. use
    1K or 2K or 4K
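
The minimization written out - a standard derivation consistent with
the overheads above:

  \[
  \mathrm{overhead}(p) = \frac{se}{p} + \frac{p}{2}, \qquad
  \frac{d}{dp}\,\mathrm{overhead}(p) = -\frac{se}{p^2} + \frac{1}{2} = 0
  \;\Rightarrow\; p = \sqrt{2se}
  \]

  With s = 128K = 2^17 bytes and e = 8 bytes:
  p = √(2 · 2^17 · 8) = √(2^21) ≈ 1448 bytes.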

88
Virtual Memory - Advantages
  • Programs use much smaller physical memory than
    their maximum requirements (much code or data is
    unused)
  • more programs can run concurrently in memory
  • Programs can use a much larger (virtual) memory
  • simplifies programming and enables using powerful
    software
  • swapping time is smaller
  • All physical memory can be used, whether
    consecutive or not.
  • More flexible memory protection

89
Virtual Memory - Disadvantages
  • Special hardware for address translation - some
    instructions may require 5-6 address
    translations!
  • Difficulties in restarting instructions
    (chip/microcode complexity)
  • Complexity of OS!
  • Overhead - a Page-fault is an expensive operation
    in terms of both CPU and I/O overhead.
  • Difficulty of optimizing memory utilization -
    e.g. Buffering in DBMSs. Dangers of Thrashing!

90
Additional issues - Locking and Sharing
  • An I/O channel/processor (DMA) transfers data
    independently
  • a page must not be replaced during a transfer
  • the OS can use a lock variable per page
  • Pages of an editor's code - shared among processes
  • swapping out, or terminating, process A (and its
    pages) may cause many page faults for process B,
    which shares them
  • looking up evicted pages in all page tables
    is impossible
  • solution: maintain special data structures for
    shared pages
  • nice idea: transfer a page from a (kernel) process
    sending data to the process receiving it

91
Handling the backing store
  • need to store non-resident pages on disk
  • the backing store (disk swap area) needs to be
    managed
  • allocate swap area to (whole) processes and
    address pages by offset from the swap address
  • processes grow during execution - assign separate
    swap areas to text, data, and stack
  • allocate disk blocks when needed - needs disk
    addresses in memory to keep track of swapped pages

92
Backing Store
  • (a) Paging to static swap area
  • (b) Backing up pages dynamically

93
Implementation Issues
  • Four times when the OS is involved with paging
  • Process creation
  • determine program size
  • create page table
  • Process execution
  • MMU reset for new process
  • TLB flushed
  • Page fault
  • determine virtual address causing fault
  • swap target page out, needed page in
  • Process termination
  • release page table, pages

94
Cleaning Policy
  • Need for a background process, paging daemon
  • periodically inspects state of memory
  • When too few frames are free
  • selects pages to evict using a replacement
    algorithm
  • It can use the same circular list (clock)
  • as the regular page replacement algorithm, but with
    a different pointer

95
Locking Pages in Memory
  • Virtual memory and I/O occasionally interact
  • Proc issues call for read from device into buffer
  • while waiting for I/O, another process starts
    up
  • has a page fault
  • buffer for the first proc may be chosen to be
    paged out
  • Need to specify some pages locked
  • exempted from being target pages

96
Separation of Policy and Mechanism
  • Page fault handling with an external pager
  • Example use: a DBMS!

97
Page Daemons - Unix
  • It is assumed useful to keep a number of free
    pages
  • freeing of page frames can be done by a page
    daemon - a process that sleeps most of the time
  • awakened periodically to inspect the state of
    memory - if there are too few free page frames
    then it frees page frames
  • yet another type of (global) dynamic page
    replacement policy
  • this strategy performs better than evicting pages
    only when needed (and writing the modified ones to
    disk in a hurry)
  • The net result is the use of all available
    memory as a page pool

98
Page replacement - Unix
  • The page daemon uses a two-handed clock
    algorithm
  • Any global clock algorithm either clears the
    reference bit or grabs the (unreferenced) page
    from its process. It is fast and uses just the
    reference bit
  • a two-handed clock algorithm clears the
    reference bit with its first hand and grabs with
    its second hand. Its parameter is the angle between
    the hands - a small angle leaves only busy pages
  • interesting idea on fork - let parent and offspring
    share the same pages and only copy-on-write (Linux)
  • another interesting idea (Linux): inspect user
    pages in virtual memory order (global clock) and
    in system order (first unused cache, second
    unused shared, third, unused pages of the heaviest
    user process)
  • bdflush - a daemon that flushes dirty pages

99
and in Windows 2000
  • Processes have working sets defined by two
    parameters - the minimal and maximal number of pages
  • the WS of a process is updated at the occurrence
    of each page fault (i.e. the data structure WS):
  • PF and WS < Min: add to WS
  • PF and WS > Max: remove from WS
  • Memory is managed by keeping a number of free
    pages, which is a complex function of memory use,
    at all times (at most one disk reference per PF)
  • when the balance-set-manager runs (every
    second) and needs to free pages:
  • surplus pages (beyond the WS) are removed from a
    process (large background before small
    foreground)
  • reference counters for pages are maintained
    (on a multiprocessor, reference bits don't work
    since they are local)

100
Memory Management System Calls
  • The principal Win32 API functions for mapping
    virtual memory in Windows 2000

101
Implementation of Memory Management
  • A page table entry for a mapped page on the
    Pentium

102
Physical Memory Management (1)
  • Various page lists and transitions between them

103
Segmentation
  • several logical address spaces per process
  • a compiler needs segments for
  • source text
  • symbol table
  • constants segment
  • stack
  • parse tree
  • compiler executable code
  • Most of these segments grow during execution

104
Segmentation - segment table
105
Sharing of segments
106
Segmentation vs. Paging
consideration                                          Paging   Segmentation
Need the program be aware of the technique?            no       yes
How many linear address spaces?                        1        many
Can the total address space exceed physical memory?    yes      yes
Can procedures and data be distinguished?              no       yes
Is sharing of procedures among users facilitated?      no       yes
Motivation for the technique: Paging - get a large
linear address space; Segmentation - keep programs
and data in logically independent address spaces
107
Segmentation Architecture
  • A logical address consists of a two-tuple:
  • <segment-number, offset>
  • The segment table maps two-dimensional logical
    addresses to physical addresses; each table entry has:
  • base - the starting physical address where the
    segment resides in memory
  • limit - the length of the segment
  • The segment-table base register (STBR) points to
    the segment table's location in memory.
  • The segment-table length register (STLR) indicates
    the number of segments used by a program;
  • segment number s is legal if
    s < STLR (translation sketched below).
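
A sketch of the resulting translation check; STBR/STLR follow the
slide, the concrete table contents are made up:

  #include <cstdint>
  #include <cstdio>

  struct SegmentEntry { std::uint32_t base, limit; };

  SegmentEntry table[] = {{0x8000, 0x400}, {0xA000, 0x100}};
  SegmentEntry* STBR = table;  // segment-table base register
  std::uint32_t STLR = 2;      // segment-table length register

  bool translate(std::uint32_t s, std::uint32_t offset, std::uint32_t& phys) {
      if (s >= STLR) return false;                // illegal segment number: trap
      if (offset >= STBR[s].limit) return false;  // offset beyond limit: trap
      phys = STBR[s].base + offset;               // dynamic relocation
      return true;
  }

  int main() {
      std::uint32_t phys;
      if (translate(1, 0x42, phys))
          std::printf("0x%X\n", phys);            // prints 0xA042
  }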

108
Segmentation Architecture (Cont.)
  • Protection: with each entry in the segment table
    associate:
  • validation bit = 0 -> illegal segment
  • read/write/execute privileges
  • Protection bits are associated with segments; code
    sharing occurs at the segment level.
  • Since segments vary in length, memory allocation
    is a dynamic storage-allocation problem (i.e. a
    fragmentation problem)

109
Segmentation with Paging
  • MULTICS combined segmentation and paging:
  • 2^18 segments of up to 64K words (36 bits each)
  • addresses of 34 bits -
  • 18-bit segment number
  • 16-bit in-segment address: page number (6 bits) +
    offset within page (10 bits)
  • Each process has a segment table (STBR)
  • The segment table is itself a segment and is paged
    (8-bit page number + 10-bit offset). The STBR is
    added to the 18-bit segment number
  • Each segment is a separate virtual memory with a
    page table (6 bits)
  • Segment tables contain segment descriptors: 18
    bits of page table address + 9 bits of segment
    length.

110
MULTICS segment descriptors
111
Segmentation - Memory reference procedure
  • 1. Use segment number to find segment descriptor
  • segment table is itself paged because it is
    large, so in actuality a STBR is used to locate
    page of descriptor
  • 2. Check if the segment's page table is in memory
  • if not, a segment fault occurs
  • if there is a protection violation, TRAP (fault)
  • 3. page table examined, a page fault may occur.
  • if page is in memory the address of start of page
    is extracted from page table
  • 4. offset is added to the page origin to
    construct main memory address
  • 5. perform read/store etc.

112
MULTICS Address Translation Scheme
113
segmentation and paging - locating addresses
114
Segmentation with Paging MULTICS
  • Simplified version of the MULTICS TLB
  • Existence of 2 page sizes makes actual TLB more
    complicated

115
Multics - Additional checks during Segment link
(call)
  • Since segments are mapped to files, ACLs
    (access-control lists) are checked on first
    access (open)
  • Protection rings are checked
  • Parameters may be passed via special gates
  • A most advanced Architecture!

116
Paged segmentation on the INTEL 80386
  • 16K segments, each up to 1G 32-bit words
  • 2 types of segment descriptor tables:
  • Local Descriptor Table (LDT), one per process
  • Global Descriptor Table (GDT) - system etc.
  • access by loading a 16-bit selector into one of
    the 6 segment registers (CS, DS, SS, ...), which
    holds the 16-bit selector during run time;
    selector 0 means not-in-use
  • The selector points to a segment descriptor (8 bytes)
  • Selector format: 13-bit index + 1 bit (0 = GDT /
    1 = LDT) + 2-bit privilege level (0-3)
117
80386 - segment descriptors
118
80386 - Forming the linear address
  • The segment descriptor is kept in an internal
    (microcode) register
  • If the selector is zero (TRAP) or the segment is
    paged out (TRAP)
  • The offset is checked against the limit field of
    the descriptor
  • The base field of the descriptor is added to the
    offset (4K page size)

119
80386 - paged segmentation (cont'd.)
  • Combine descriptor and offset into a linear address
  • If paging is disabled: pure segmentation (286
    compatibility); the linear address is the physical
    address
  • Paging is 2-level:
  • a page directory (1K entries) and page tables (1K
    entries each)
  • pages are 4K bytes each (12-bit offset)
  • The page directory is pointed to by a special
    register
  • PTEs have a 20-bit page frame and 12 bits of
    modified, accessed, protection, etc.
  • Small segments have just a few page tables
    (a walk is sketched below)
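
A sketch of the two-level walk this implies; the toy word-addressed
memory and the identity set-up in main are made up for illustration:

  #include <cstdint>
  #include <cstdio>

  std::uint32_t mem[1 << 20];  // simulated physical memory, word-addressed
  std::uint32_t dirBase = 0;   // word index of the page directory (CR3's role)

  std::uint32_t translate(std::uint32_t linear) {
      std::uint32_t pde = mem[dirBase + (linear >> 22)];  // 10-bit directory index
      std::uint32_t tblBase = (pde & 0xFFFFF000u) >> 2;   // table frame, as word index
      std::uint32_t pte = mem[tblBase + ((linear >> 12) & 0x3FFu)];  // 10-bit index
      // the low 12 bits of PDE/PTE carry present/accessed/dirty/protection flags
      return (pte & 0xFFFFF000u) | (linear & 0xFFFu);     // 20-bit frame + offset
  }

  int main() {
      mem[dirBase + 1] = 0x00001000u | 1;        // directory entry 1 -> table at 0x1000
      mem[(0x1000 >> 2) + 2] = 0x00ABC000u | 1;  // table entry 2 -> frame 0xABC000
      std::printf("0x%X\n", translate((1u << 22) | (2u << 12) | 0x123));  // 0xABC123
  }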

120
80386 - 2-level paging
121
Segmentation with Paging Pentium (4)
  • Mapping of a linear address onto a physical
    address

122
Intel 80386 address translation
123
The end
  • The end

124
Dynamic Loading
  • Routine is not loaded until it is called
  • Better memory-space utilization; an unused routine
    is never loaded.
  • Useful when large amounts of code are needed to
    handle infrequently occurring cases.
  • No special support from the operating system is
    required; implemented through program design.

125
Dynamic Linking
  • Linking postponed until execution time.
  • Small piece of code, stub, used to locate the
    appropriate memory-resident library routine.
  • Stub replaces itself with the address of the
    routine, and executes the routine.
  • Operating system support is needed to check if the
    routine is in the process's memory space.
  • Dynamic linking is particularly useful for
    libraries.

126
Memory Protection
  • Hardware
  • history: the IBM 360 had a 4-bit protection code in
    the PSW and memory in 2K partitions - the process
    code in the PSW must match the memory partition's
    code
  • Two registers - base + limit:
  • base is added by hardware without changing
    instructions - dynamic relocation
  • every request is checked against limit -
    runtime bounds checking
  • reminder: in the IBM PC there are segment
    registers (but no limit)

127
Modeling Multiprogramming
Degree of multiprogramming
  • CPU utilization as a function of number of
    processes in memory

128
No page tables - MIPS R2000
  • 64 entry associative memory for virtual pages
  • if not found, TRAP to the operating system
  • software uses some hardware registers to find the
    needed virtual page
  • a second trap may happen on a page fault...

129
Inverted page tables
  • for very large memories (page tables) one can
    have an inverted page table, sorted by
    (physical) page frames
  • IBM RT, HP Spectrum (thinking of 64-bit
    memories)
  • to avoid a linear search for every virtual
    address of a process, use a hash table (one or a
    few memory references)
  • only one page table - the physical one - for all
    processes currently in memory
  • in addition to the hash table, associative
    memory registers are used to store recently used
    page table entries
  • the only way to deal with a 64-bit memory: with
    4K pages, two-level page tables can result in
    2^42 entries

130
Inverted Page Table Architecture
131
Problem - instruction backup
  • page faulting instructions trap to OS
  • OS must restart instruction
  • The page fault may originate at the op-code or
    any of the operands - PC value useless
  • the location of the instruction itself is lost
  • worse still, undoing of autoincrement or
    autodecrement - was it already performed ??
  • Hardware solutions
  • Register to store PC value of instruction and
    register to store changes to other registers
    (increment/decrement)
  • Micro-code dumps all information on the stack
  • Restart complete instruction and redo increments
    etc.
  • Do nothing - RISC ......

132
Assignment 3 Virtual Memory
  • In your third assignment you will implement a
    virtual memory simulator.
  • The VM's goal is to give the user the ability to
    write programs without concern for the physical
    memory size of her computer.
  • The simulator will enable simulation of paging
    hardware and page-replacement software and
    testing of various page replacement strategies.

133
The main questions
  • Which page replacement algorithm to use?
  • how to maintain the page tables?
  • Before we can answer these questions we must
    review our hardware.

134
The main components
  • Swapper - very simple swapper device,
    simulating a paging disk. It reads/writes pages
    from/to a specific page address.
  • Fast memory - the physical memory plus some
    info. It has the ability to read/write a byte or
    a page from/to a specific address. For the same
    price, it also includes a table with the
    following info on each page: ID, dirty bit,
    reference bit
  • MMU - the hardware translator from logical to
    physical addresses. It has a limited amount of
    space to store information. When a page is not in
    physical memory, the MMU will trap to the page
    replacement manager.

135
And two more
  • Page Replacement Manager - acts as the OS in
    time of a trap from the MMU. When called to duty,
    it chooses a page from physical memory and
    replaces it with the requested page.
  • VM - the object that the user interfaces with.
    All other components are transparent to the user.
    It provides read/write from/to any address in the
    virtual address space, and can be asked for
    some statistical data (e.g. hit ratio).

136
Back to our questions
  • Which page replacement algorithm should be used?
  • Answer: you will have to design an LRU
    approximation algorithm with the given hardware
    in the fast memory.
  • For comparison, also implement a FIFO algorithm.

137
  • How should the page table be maintained?
  • Answer: use a 2-level page table. The first
    level is stored in the MMU cache memory. The
    2nd-level tables are each page-sized and are
    located in physical memory. Important: 2nd-level
    tables may be swapped in and out of memory.

138
A Typical configuration.
(figure: a first-level table held in the MMU, with entries of the form
(user page no, frame adr, V/I) - 1 -> 2 V, 2 I, 3 -> 1 V, 4 I - pointing
at page-sized second-level tables and user pages that live partly in
physical memory (kernel and user space) and partly on the swap device)
139
What happens if
  • The user wishes to write to user page no 6.
  • The user wishes to write to user page no 5, while
    the next candidate to be swapped out is user page
    no 6.
  • The user wishes to write to user page no 3, while
    the next candidate to be swapped out is user page
    no 7.

140
The scenario
  1. User wishes to Read/Write a character from/to
    address v_adr in the virtual memory that belongs
    to a virtual page number pg.
  2. The virtual memory queries the MMU for the
    physical address of v_adr .
  3. The MMU first checks (in the first level table)
    if the second level page (that contains the entry
    for pg) is in physical memory. If it is, go to 6.
  4. Notify the Page Replacement Manager that a page
    fault occurred and provide the required information.
  5. The Page Replacement Manager chooses a page p
    from the second level pages section in the
    physical memory and replaces the requested page
    with p. Then it updates both entries in the first
    level table. Go to 3.

141
  6. Look for the physical address of pg in the
    appropriate second-level page table entry. If it
    is in physical memory, then return the correct
    physical address of v_adr and go to 9.
  7. The MMU notifies the Page Replacement Manager
    that a page fault occurred.
  8. The Page Replacement Manager chooses page sp from
    the user pages section of the physical memory and
    replaces the requested page with sp. Then it
    updates both entries in the appropriate second-
    level pages. (But the second-level page containing
    the entry of sp might not be in physical memory;
    in that case we have another page fault that has
    to be taken care of.) Go to 6.
  9. The VM receives from the MMU the physical address
    of v_adr and reads/writes from/to that physical
    address.
142
For evaluating your assignment
  • virtual void pf_history() = 0; - for each page
    fault, displays on screen a record: serial
    number, type (kernel/user), page in, page out
  • virtual double hit_ratio() = 0;
  • virtual void showMemoryTable();
  • virtual void showPhysicalAddress(int adr) = 0;
  • virtual void showFirstLevelPageTable() = 0;
  • virtual void showSecondLevelPageTable(int i) = 0;
  • Important: these methods are for evaluation only
    and will not change the simulator's configuration.

143
Segmentation - Dynamic Linking
144
Fundamental Concepts (1)
  • Virtual address space layout for 3 user processes
  • White areas are private per process
  • Shaded areas are shared among all processes

145
Fundamental Concepts (2)
  • Mapped regions with their shadow pages on disk
  • The lib.dll file is mapped into two address
    spaces simultaneously