Chapter 3 Memory Management

Author: zx. Created: 1/5/2006 8:04:39 AM.

1
Chapter 3 Memory Management
Page Management

Li Wensheng
wenshli_at_bupt.edu.cn

2
Outline

Data Structure
Page Scanner Operation
Page-out Algorithm
Hardware Address Translation Layer

3
Pages: The Basic Unit of Solaris Memory

Physical memory is divided into pages.
A page's identity is its vnode/offset pair.
The hardware address translation (HAT) and
address space layers manage the mapping between
a physical page and its virtual address space.

4
The Page Structure
5
The Page Hash List

Global hash list: an array of pointers to
linked lists of pages
The VM system hashes pages with identity onto a
global hash list so that they can be located by
vnode/offset.
Three page functions search the global page hash
list:
page_find()
page_lookup()
page_lookup_nowait()
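
The lookup by identity can be pictured with a minimal Python sketch. This is an illustration of the idea only, not the kernel code: the names `Page`, `PageHash`, `insert`, and `find`, the bucket count, and the hash function are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    vnode: str    # stand-in for a vnode pointer (hypothetical)
    offset: int   # byte offset within the vnode


class PageHash:
    """Array of chains, indexed by a hash of the (vnode, offset) identity."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _index(self, vnode, offset):
        return hash((vnode, offset)) % len(self.buckets)

    def insert(self, page):
        self.buckets[self._index(page.vnode, page.offset)].append(page)

    def find(self, vnode, offset):
        # Walk only the one chain that this identity hashes to.
        for page in self.buckets[self._index(vnode, offset)]:
            if page.vnode == vnode and page.offset == offset:
                return page
        return None
```

Only the chain selected by the hash is searched, which is what makes location by vnode/offset cheap even with many pages in memory.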

6
Locating Pages by Vnode/Offset Identity
7
MMU-Specific Page Structures

The kernel needs to keep machine-specific data about every
page, e.g. the HAT information that describes
how the page is mapped by the MMU.
This data is held in struct machpage.
The contents of the machine-specific page
structure are hidden from the generic kernel.
Only the HAT machine-specific layer can see or
manipulate its contents.

8
Machine-Specific Page Structures: sun4u Example
9
Physical Page Lists

The system keeps a segmented global physical page list, consisting
of segments of contiguous physical memory.
Contiguous physical memory segments are added
during system boot.
Segments can also be added and deleted dynamically when
physical memory is added and removed while the
system is running.

10
Arrangement of the Physical Page Lists
11
Free List and Cache List

Both lists hold pages that are not mapped into any address
space and that have been freed by page_free().
Free list
Pages do not have a vnode/offset associated
Pages are put on the free list when a process exits
Is generally very small
Cache list
Pages still have a vnode/offset
Holds pages from seg_map free-behind and seg_vn executables and
libraries (for reuse)
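
The split between the two lists can be sketched in a few lines of Python. This is a simplified model, not the real page_free(): the function name `page_free_sketch` and the dictionary representation of a page are assumptions for illustration.

```python
def page_free_sketch(page, free_list, cache_list):
    """Route a freed page to the cache list or the free list."""
    if page.get("vnode") is not None:
        cache_list.append(page)   # identity kept: can be found again by vnode/offset
    else:
        free_list.append(page)    # no identity: available for any new use


free_list, cache_list = [], []
page_free_sketch({"vnode": "libc.so", "offset": 0}, free_list, cache_list)
page_free_sketch({"vnode": None, "offset": None}, free_list, cache_list)
print(len(free_list), len(cache_list))
```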

12
The Page-Level Interfaces
Method Description
page_create() Creates pages. Page coloring is based on a hash of the vnode offset. page_create() is provided for backward compatibility only. Don't use it if you don't have to. Instead, use the page_create_va() function so that pages are correctly colored.
page_create_va() Creates pages, taking into account the virtual address they will be mapped to. The address is used to calculate page coloring.
page_exists() Tests that a page for vnode/offset exists.
page_find() Searches the hash list for a page with the specified vnode and offset that is known to exist and is already locked.
page_first() Finds the first page on the global page hash list.
page_free() Frees a page. Pages with vnode/offset go onto the cache list; other pages go onto the free list.
page_isfree() Checks whether a page is on the free list.
page_ismod() Checks whether a page is modified. This function checks only the software bit in the page structure. To sync the MMU bits with the page structure, you may need to call hat_pagesync() before calling page_ismod().
13
The Page-Level Interfaces (Cont.)
Method Description
page_isref() Checks whether a page has been referenced. This function checks only the software bit in the page structure. To sync the MMU bits with the page structure, you may need to call hat_pagesync() before calling page_isref().
page_isshared() Checks whether a page is shared across more than one address space.
page_lookup() Finds a page representing the specified vnode/offset. If the page is found on a free list, then it will be removed from the free list.
page_lookup_nowait() Finds a page representing the specified vnode/offset that is not locked or on the free list.
page_needfree() Informs the VM system that we need some pages freed up. Calls to page_needfree() must be symmetric; that is, each call must be followed, after the task is complete, by another page_needfree() with the same amount of memory multiplied by -1.
page_next() Finds the next page on the global page hash list.
14
The Page Throttle

Implemented in the page_create() and
page_create_va() functions
Causes page creates that specify the PG_WAIT
flag to block when available memory is
less than the system global, throttlefree.
throttlefree is set to the same value as minfree.
Memory allocated through the kernel memory
allocator specifies PG_WAIT and is subject to the
page-create throttle.

15
Page Sizes
System | System Type | MMU Page Size Capability | Solaris 2.x Page Size
Early SPARC systems | sun4c | 4K | 4K
microSPARC-I, -II | sun4m | 4K | 4K
SuperSPARC-I, -II | sun4m | 4K, 4M | 4K, 4M
UltraSPARC-I, -II | sun4u | 4K, 64K, 512K, 4M | 8K, 4M
Intel x86 architecture | i86pc | 4K, 4M | 4K, 4M
16
Page Coloring

Page placement policy affects processor
performance
The optimal placement of pages often depends on
the memory access patterns of the application:
in a random order
in some sort of strided order
How can page placement affect performance?
The UltraSPARC-I and -II implementations
The L1 cache is 16 Kbytes
The L2 (external) cache can vary between 512
Kbytes and 8 Mbytes
The L2 cache is arranged in lines of 64 bytes,
and transfers are done to and from physical
memory in 64-byte units.

17
Page Coloring (Cont.)

Assume
we have a 32-Kbyte L2 cache
page size of 8 Kbytes
four page-sized slots on the L2 cache
The cache does not necessarily read and write
8-Kbyte units from memory; it does that in
64-byte chunks, so a 32-Kbyte cache has 512
addressable slots (32768 / 64).

18
Page Coloring (Cont.)
Offsets 0 and 32768 map to the same cache
line. If we were now to access these two
addresses alternately, a cache ping-pong effect occurs.
We program to virtual memory rather than physical
memory. The OS must provide a sensible mapping
between virtual memory and physical memory.
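
The collision can be checked with a short Python calculation. It is a sketch under the slide's assumptions (a 32-Kbyte cache with 64-byte lines) plus one more assumption, that the cache is direct-mapped; the function name `cache_line` is made up for this example.

```python
CACHE_SIZE = 32 * 1024   # 32-Kbyte L2 cache from the example
LINE_SIZE = 64           # 64-byte cache lines


def cache_line(addr):
    """Index of the cache line a physical address falls on (direct-mapped)."""
    return (addr % CACHE_SIZE) // LINE_SIZE


# Two addresses exactly one cache size apart land on the same line.
print(cache_line(0), cache_line(32768))
```

Any two physical addresses that differ by a multiple of the cache size compete for the same line, which is why the choice of physical page matters.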
19
Page Coloring (Cont.)

Physical pages are assigned to an address space
in the order they appear in the free list.
Page coloring algorithm
The free list of physical pages is organized into
specifically colored bins, one color bin for each
slot in the physical cache.
When a page is put on the free list, the
page_free() algorithms assign it to a color bin.
When a page is consumed from the free list
(by the page_create_va() function), the
virtual-to-physical algorithm takes the page from
a physical color bin.
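
A toy version of colored bins is sketched below in Python. It assumes the four-slot cache from the earlier example and a policy in which the physical color is matched to the color of the virtual address; the class `ColoredFreeList`, its methods, and the fallback to the next bin are hypothetical choices for illustration, not the Solaris implementation.

```python
from collections import deque

PAGE_SIZE = 8 * 1024
NUM_COLORS = 4            # 32-Kbyte cache / 8-Kbyte pages (example values)


def color_of(addr):
    """Color = which page-sized cache slot the address maps to."""
    return (addr // PAGE_SIZE) % NUM_COLORS


class ColoredFreeList:
    def __init__(self):
        self.bins = [deque() for _ in range(NUM_COLORS)]

    def free(self, paddr):
        # A freed page goes into the bin for its own color.
        self.bins[color_of(paddr)].append(paddr)

    def alloc(self, vaddr):
        """Prefer a physical page whose color matches the virtual address."""
        want = color_of(vaddr)
        for i in range(NUM_COLORS):
            candidates = self.bins[(want + i) % NUM_COLORS]
            if candidates:
                return candidates.popleft()
        return None           # no free pages in any bin
```

For example, after freeing the physical pages at 0, 8192, and 16384, an allocation for virtual address 5 * 8192 (color 1) returns 8192, and a second color-1 request falls through to the next non-empty bin.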

20
Page Coloring (Cont.)

The kernel supports a default algorithm and two
optional algorithms.
The default algorithm was chosen according to the
following criteria
Fairly consistent, repeatable results
Good overall performance for the majority of
applications
Acceptable performance across a wide range of
applications

21
Solaris Page Coloring Algorithms
No. | Name | Description | Solaris 2.5.1 | Solaris 2.6 | Solaris 7
0 | Hashed VA | The physical page color bin is chosen on a hashed algorithm to ensure even distribution of virtual addresses across the cache. | Default | Default | Default
1 | P.Addr = V.Addr | The physical page color is chosen so that physical addresses map directly to the virtual addresses (as in the example). | Yes | Yes | Yes
2 | Bin Hopping | Physical pages are allocated with a round-robin method. | Yes | Yes | Yes
6 | Kessler's Best Bin | Kessler best bin algorithm. Keeps a per-process history of used colors and chooses the least-used color; if there are several, uses the largest bin. | E10000 only (default) | E10000 only (default) | Not available
22
Outline

Data Structure
Page Scanner Operation
Page-out Algorithm
Hardware Address Translation Layer

23
Page Scanner

The memory management daemon that manages
system-wide physical memory
When there is a memory shortage, the page scanner
runs to steal memory from address spaces by
taking pages that haven't been used recently,
syncing them up with their backing store, and
freeing them
If paged-out virtual memory is required again, a
memory page fault occurs.

24
Page Scanner (Cont.)

The balancing of page stealing and page faults
determines which parts of virtual memory will be
backed by physical memory and which will be moved out to swap.
global page replacement / local page replacement
The subtleties of which pages are stolen govern
the memory allocation policies and can affect
different workloads in different ways.
Enhancements to minimize page stealing from
extensively shared libraries and executables
Priority paging to prevent application, shared
library, and executable paging on systems with
ample memory.

25
Page Scanner Operation

Tracks page usage by reading a per-page hardware
bit from the MMU for each page
Two bits for each page: the reference bit and the modify
bit
Awakened when the amount of memory on the
free-page list falls below a system threshold,
typically 1/64th of total physical memory
Scans through pages in physical page order,
looking for pages that haven't been used recently
to page out to the swap device and free

26
Two-handed Clock Algorithm

The front hand clears the referenced and modified
bits for each page
The back hand inspects the referenced and modified
bits some time later
Pages that haven't been referenced or modified are
swapped out and freed
The scan rate is controlled by the amount of free
memory on the system
The gap between the front and back hand is fixed
by a boot-time parameter, handspreadpages.
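
The two hands can be simulated with a small Python sketch. It is a simplified model under stated assumptions: only a single referenced bit is tracked, pages are plain dictionaries, and the function `clock_scan` and its `touch` callback (which simulates program activity) are hypothetical names for this example.

```python
def clock_scan(pages, handspread, nscan, touch=None):
    """Advance both hands nscan pages; return indices of pages chosen to free.

    pages: list of dicts, each with a boolean 'ref' bit.
    touch: optional callback(step) -> indices referenced during that step.
    """
    n = len(pages)
    front, back = handspread % n, 0
    victims = []
    for step in range(nscan):
        pages[front]["ref"] = False            # front hand clears the bit
        if touch:
            for i in touch(step):
                pages[i]["ref"] = True         # simulated program activity
        page = pages[back]
        if not page["ref"] and not page.get("free"):
            page["free"] = True                # back hand: still unreferenced
            victims.append(back)
        front = (front + 1) % n
        back = (back + 1) % n
    return victims
```

With eight referenced pages, a hand spread of 2, and one full revolution, the back hand frees every page the front hand has already cleared (pages 2 through 7). If page 3 is touched after the front hand passes it, that page survives.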

27
Outline

Data Structure
Page Scanner Operation
Page-out Algorithm
Hardware Address Translation Layer

28
Introduction to page-out algorithm

Steals pages when memory is lower than lotsfree
Scanner runs
Starts scanning at slowscan (pages/sec)
Four times/second when memory is short
Awoken by the page allocator if memory is very low
Puts memory out to backing store
Uses a Least Recently Used (LRU) policy
Kernel threads do the scanning

29
Page Scanner Parameters
Parameter | Description | Min | Default
lotsfree | Starts stealing anonymous memory pages | 512K | 1/64th of memory
desfree | Scanner is started at 100 times/second | minfree | ½ of lotsfree
minfree | Starts scanning every time a new page is created | - | ½ of desfree
throttlefree | page_create routine makes the caller wait until free pages are available | - | minfree
fastscan | Scan rate (pages per second) when free memory = minfree | slowscan | Minimum of 64 MB/s or ½ memory size
slowscan | Scan rate (pages per second) when free memory = lotsfree | - | 100
maxpgio | Maximum number of pages per second that the swap device can handle | 60 | 60 or 90 pages per spindle
handspreadpages | Number of pages between the front hand (clearing) and back hand (checking) | 1 | fastscan
min_percent_cpu | CPU usage when free memory is at lotsfree | - | 4% (1 clock tick) of a single CPU
30
Scan Rate Parameters (Assuming No Priority
Paging)
Starts scanning at slowscan
Scans faster as the amount of free memory
approaches 0
31
Scan Rate Parameters calculation

lotsfree is calculated at startup as 1/64th of
memory
The slowscan parameter is 100 by default on Solaris
systems
fastscan is set to total physical memory / 2, capped at 64 Mbytes/sec
If total physical memory is 1 Gbyte (131,072 pages of 8 Kbytes), then
lotsfree = 2048 pages, fastscan = 8192 pages/sec
If free memory falls to 12 Mbytes (1536 pages), the scan rate is
interpolated between slowscan and fastscan
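
The numbers in this example can be worked through with a short Python sketch. It assumes a straight-line interpolation from slowscan (at lotsfree) to fastscan (at zero free memory), which matches the shape described on the previous slide; the function name `scan_rate` and the integer rounding are choices made for this illustration.

```python
def scan_rate(freemem, lotsfree, slowscan, fastscan):
    """Pages/sec: slowscan at freemem == lotsfree, rising linearly to fastscan at 0."""
    if freemem > lotsfree:
        return 0                      # enough free memory, scanner idle
    return (fastscan * (lotsfree - freemem) + slowscan * freemem) // lotsfree


# 1-Gbyte system: lotsfree = 2048 pages, slowscan = 100, fastscan = 8192
print(scan_rate(1536, 2048, 100, 8192))   # 12 Mbytes free
```

Under these assumptions, 1536 free pages gives a rate of 2123 pages/sec: one quarter of the way from lotsfree to zero contributes 2048 pages/sec from fastscan, plus 75 from the remaining slowscan share.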

32
Not Recently Used Time

The time between the front hand and back hand
Short time: only the most active pages remain intact
Long time: only the largely unused pages are
stolen
Varies from just a few seconds to several
hours, according to
the number of pages between front and back hand
the scan rate
Example
Scan rate: 2000 pages/sec
Hand spread: 8192 pages
Clear/check time: 8192 / 2000, about 4 seconds

33
Shared Library Optimizations

prevents scanner from stealing pages from
extensively shared libraries
looks at the share reference count for each page
if the page is shared more than a certain amount,
then it is skipped during the page scan
operation.
Threshold parameter: po_share
Range 8 to 134217728; by default, starts at 8
A page shared by more than po_share processes
will be skipped
Each time around, it is decremented ?

34
The Priority Paging Algorithm

Purpose: overcome adverse behavior that results
from the memory pressure caused by the file
system.
Puts a higher priority on a process's pages:
its heap, stack, shared libraries, and
executables.
Permits the scanner to
pick only file system cache pages when ample
memory is available
steal application pages only when there is a true
memory shortage.

35
The Priority Paging Algorithm

a new paging parameter, cachefree
When the amount of free memory lies between
cachefree and lotsfree, the page scanner steals
only file system cache pages
scanner wakes up when memory falls below
cachefree rather than below lotsfree
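
The effect of cachefree can be summarized as a three-way decision, sketched here in Python. The function name `scanner_targets`, its return strings, and the handling of the exact boundary values are assumptions made for illustration.

```python
def scanner_targets(freemem, lotsfree, cachefree):
    """Which pages the scanner may steal at a given level of free memory."""
    if freemem >= cachefree:
        return "none"               # scanner is not woken
    if freemem >= lotsfree:
        return "file-cache-only"    # between cachefree and lotsfree
    return "all"                    # true shortage: application pages too


# Example values in pages (hypothetical): lotsfree = 2048, cachefree = 4096
print(scanner_targets(3000, 2048, 4096))
```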

36
Scan Rate Interpolation with the Priority Paging
Algorithm
37
Page Scanner CPU Utilization Clamp

Purpose: to prevent the page-out daemon from
using too much processor time
Two parameters
min_percent_cpu, default 4% of a single CPU
max_percent_cpu, default 80% of a single CPU
CPU time that can be used
Ranges from min_percent_cpu to max_percent_cpu
min_percent_cpu when free memory is at lotsfree
(cachefree with priority paging enabled)
max_percent_cpu if free memory were to fall to
zero
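
A sketch of the clamp in Python follows. The slide gives only the two endpoints, so the straight-line interpolation between them is an assumption, as are the function name `scanner_cpu_percent` and the clamping of free memory to the range 0 to lotsfree.

```python
def scanner_cpu_percent(freemem, lotsfree, min_pct=4, max_pct=80):
    """CPU share allowed to the scanner, as a percentage of one CPU."""
    freemem = max(0, min(freemem, lotsfree))   # clamp to the valid range
    return min_pct + (lotsfree - freemem) * (max_pct - min_pct) / lotsfree


print(scanner_cpu_percent(1024, 2048))   # halfway between the endpoints
```

With priority paging enabled, cachefree would take the place of lotsfree as the upper reference point.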

38
Parameters That Limit Pages Paged Out

Maxpgio
limits the rate at which I/O is queued to the
swap devices
defaults to 40 or 60 I/Os per second
Often set to 100 times the number of swap
spindles
Maxpgio can also indirectly affect file system
throughput

39
Page Scanner Implementation

implemented as two kernel threads
Page scanner thread scans pages
Page-out thread pushes the dirty pages queued
for I/O

40
Page Scanner Architecture
41
Scanner: schedpaging()

Woken up in three ways:
called four times per second by a callout
triggered by the clock() thread if memory falls
below minfree
triggered by the page allocator if memory falls
below throttlefree
calculates two setup parameters for the page
scanner thread
the number of pages to scan
the number of CPU ticks that the scanner thread
can consume
triggers the scanner through a condition variable

42
Page scanner thread

cycles through the physical page list
The front and back hand each have a page pointer
The front hand is incremented first to clear the
referenced and modified bits of the page it points to
The back hand is then incremented to check the status
of the page it points to (using the check_page() function)
If modified, placed in the dirty page queue
If not referenced, freed

43
Page-out thread

uses a preinitialized list of async buffer
headers as the queue for I/O requests
The number of entries is controlled by parameter
async_request_size, initialized with 256
Requests to queue more I/Os will be blocked
if the entire queue is full
if the rate of pages queued has exceeded the
maxpgio
removes I/O entries from the queue
initiates I/O by calling the vnode putpage()

44
The Memory Scheduler

Swaps out entire processes to conserve memory by
removing all of a process's thread structures and
private pages
setting flags in the process table to indicate
that this process has been swapped out
Not expensive but affects the process's performance
Launched at boot time
Does nothing unless memory is less than desfree
Looks for processes that it can completely swap
out
Soft swap-out / hard swap-out

45
Soft Swapping

takes place when the 30-second average for free
memory is below desfree
memory scheduler looks for processes that have
been inactive for at least maxslp seconds
If found
swaps out the thread structures for each thread
pages out all of the private pages of memory for
that process

46
Hard Swapping

takes place when all of the following are true
At least two processes are on the run queue,
waiting for CPU.
The average free memory over 30 seconds is
consistently less than desfree.
Excessive paging is going on,
determined to be true if page-out + page-in >
maxpgio
Use a much more aggressive approach to find
memory
First, the kernel is requested to unload all
modules and cache memory that are not currently
active
Then, processes are sequentially swapped out
until the desired amount of free memory is
returned
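
The soft and hard swapping conditions can be combined into one decision function, sketched here in Python. The function name `swap_mode`, its argument names, and the return strings are hypothetical; the thresholds follow the conditions listed on these two slides.

```python
def swap_mode(avg_free_30s, desfree, runq_len, pageouts, pageins, maxpgio):
    """Return 'none', 'soft', or 'hard' for the memory scheduler."""
    if avg_free_30s >= desfree:
        return "none"                          # no sustained shortage
    excessive_paging = pageouts + pageins > maxpgio
    if runq_len >= 2 and excessive_paging:
        return "hard"                          # all three hard-swap conditions hold
    return "soft"                              # only the free-memory condition holds


print(swap_mode(avg_free_30s=500, desfree=1024, runq_len=3,
                pageouts=50, pageins=30, maxpgio=60))
```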

47
Memory Scheduler Parameters
Parameter Effect on Memory Scheduler
desfree If the average amount of free memory falls below desfree for 30 seconds, then the memory scheduler is invoked.
maxslp When soft-swapping, the memory scheduler starts swapping processes that have slept for at least maxslp seconds. The default for maxslp is 20 seconds and is tunable.
maxpgio When the run queue is greater than 2, free memory is below desfree, and the paging rate is greater than maxpgio, then hard swapping occurs, unloading kernel modules and process memory.
48
Outline

Data Structure
Page Scanner Operation
Page-out Algorithm
Hardware Address Translation Layer

49
Introduction to HAT

Hardware Address Translation (HAT)
controls the hardware that manages mapping of
virtual to physical memory
provides interfaces that implement the creation
and destruction of mappings between virtual and
physical memory
provides a set of interfaces to probe and control
the MMU
implements all of the low-level trap handlers to
manage page faults and memory exceptions

50
Solaris Virtual Memory Layers
51
Solaris Memory Model
52
Address Space

Process Address Space
Process Text and Data
Stack (anon memory) and Libraries
Heap (anon memory)
Kernel Address Space
Kernel Text and Data
Kernel map Space (data structures, caches)
32-bit kernel map (64-bit kernels only)
Trap table
Critical virtual memory data structures
Mapping File System Cache (segmap)

53
The Address Space
54
Role of the HAT layer in virtual-to-physical
translation

Hides the platform-specific implementation
Used by the segment drivers to implement the
segment driver's view of virtual-to-physical
translation
Uses the hat structure to hold top-level translation information
The hat structure is platform specific
The hat structure is referenced by the address space structure
HAT-specific data structures in every
page represent the translation information at a
page level
The HAT layer is called when the segment drivers want
to manipulate the hardware MMU

55
Summary of HAT Functions
Function Description
hat_chgattr() Changes the protections for the supplied virtual address range.
hat_clrattr() Clears the protections for the supplied virtual address range.
hat_free_end() Informs the HAT layer that a process has exited.
hat_free_start() Informs the HAT layer that a process is exiting.
hat_get_mapped_size() Returns the number of bytes that have valid mappings.
hat_getattr() Gets the protections for the supplied virtual address range.
hat_memload() Creates a mapping for the supplied page at the supplied virtual address. Used to create mappings.
hat_setattr() Sets the protections for the supplied virtual address range.
hat_stats_disable() Finishes collecting stats on an address space.
hat_stats_enable() Starts collecting page reference and modification stats on an address space.
hat_swapin() Allocates resources for a process that is about to be swapped in.
hat_swapout() Frees resources for a process that is about to be swapped out.
hat_sync() Synchronizes the struct page software referenced and modified bits with the hardware MMU.
hat_unload() Unloads a mapping for the given page at the given address.
56
Virtual Memory Contexts and Address Spaces

A virtual memory context is a set of
virtual-to-physical translations that maps an
address space
contexts change when
scheduler wants to switch execution from one
process to another
a trap or interrupt from user mode to kernel
occurs
virtual memory context zero refers to kernel
context
HAT layer implements functions to create, delete,
and switch virtual memory contexts
Different hardware MMUs support different numbers
of concurrent virtual memory contexts

57
Hardware Translation Acceleration

translation lookaside buffer (TLB)
a hardware cache of recent translations
The number of entries in the TLB is typically 64
on SPARC systems
TLB fill
hardware
such as Intel and older SPARC implementations
software algorithms
like the UltraSPARC architecture

58
The UltraSPARC-I and -II HAT

The UltraSPARC-I and -II MMUs do the following:
Implement mapping between a 44-bit virtual
address and a 41-bit physical address
Support page sizes of 8 Kbytes, 64 Kbytes, 512
Kbytes, and 4 Mbytes

59
Virtual-to-Physical Translation
60
Translation Table Entry (TTE)

TTE is a translation map entry, one for each page
TTE contains a virtual address tag and the high
bits of the physical address
TTEs must be loaded into the TLB
When MMU finds the TTE entry that matches the
virtual page number and current context, it
retrieves the physical page information

61
Relationship of TLBs, TSBs, and TTEs
Translation Software Buffer (TSB):
a software cache of TTEs
a direct-mapped cache of the TLB
an array of TTEs in regular physical memory
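
How the TLB, TSB, and TTEs fit together can be sketched in Python. This is a toy model, not the UltraSPARC trap handlers: the class `Mmu`, its methods, the dictionary used as the TLB, the tiny TSB size, and the 8-Kbyte page shift are all assumptions for illustration.

```python
PAGE_SHIFT = 13   # 8-Kbyte pages (assumed)


class Mmu:
    def __init__(self, tsb_entries=8):
        self.tlb = {}                        # (context, vpn) -> ppn
        self.tsb = [None] * tsb_entries      # direct-mapped array of TTEs

    def load_tte(self, ctx, vpn, ppn):
        # A TTE carries the virtual tag (context + page number) and the physical page.
        self.tsb[vpn % len(self.tsb)] = (ctx, vpn, ppn)

    def translate(self, ctx, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        ppn = self.tlb.get((ctx, vpn))
        if ppn is None:                      # TLB miss: software consults the TSB
            tte = self.tsb[vpn % len(self.tsb)]
            if tte is None or tte[:2] != (ctx, vpn):
                raise LookupError("TSB miss")
            ppn = tte[2]
            self.tlb[(ctx, vpn)] = ppn       # fill the TLB for next time
        return (ppn << PAGE_SHIFT) | offset
```

A translation that misses in the TLB but hits in the TSB is loaded into the TLB, so the next access to the same page in the same context is resolved without the software lookup.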
62
TSB Size
Memory Size | Kernel TSB Entries | Kernel TSB Size | User TSB Entries | User TSB Size
< 32 Mbytes | 2048 | 128 Kbytes | |
32 Mbytes to 64 Mbytes | 4096 | 256 Kbytes | 8192 to 16383 | 512 Kbytes to 1 Mbyte
32 Mbytes to 2 Gbytes | 4096 to 262,144 | 512 Kbytes to 16 Mbytes | 16384 to 524,287 | 1 Mbyte to 32 Mbytes
2 Gbytes to 8 Gbytes | 262,144 | 16 Mbytes | 524,288 to 2,097,511 | 32 Mbytes to 128 Mbytes
8 Gbytes and above | 262,144 | 16 Mbytes | 2,097,512 | 128 Mbytes
63
Address Space Identifiers

describe the MMU mode and hardware used to access
pages
derived from the instruction being executed and
the current trap level
grouped into three different modes of physical
memory access
The MMU translation context used to index TLB
entries is derived from the ASI

ASI | Description | Derived Context
Primary | The default address translation, used for regular SPARC instructions | The address space translation is done through TLB entries that match the context number in the MMU primary context register
Secondary | A secondary address space context, used for accessing another address space context without requiring a context switch | The address space translation is done through TLB entries that match the context number in the MMU secondary context register
Nucleus | The address translation used for TLB miss handlers, system calls, and interrupts | The nucleus context is always zero (the kernel's context)
64
UltraSPARC-I and -II Watchpoint Implementation

watchpoint registers describe the address of
watchpoints for the address space
Virtual address / physical address
Watchpoint traps are generated when
watchpoints are enabled, and
the data MMU detects a load or store to the
virtual or physical address specified by the
virtual address data watchpoint register or the
physical data watchpoint register

65
UltraSPARC-I and -II Protection Modes
TTE in D-MMU | TTE in I-MMU | Writable Attribute Bit | Resultant Protection Mode
Yes | No | 0 | Read-only
No | Yes | Don't care | Execute-only
Yes | No | 1 | Read/Write
Yes | Yes | 0 | Read-only/Execute
Yes | Yes | 1 | Read/Write/Execute
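
The table can be expressed as a small Python function. This is just a restatement of the five rows for illustration; the function name `protection_mode` is made up, and returning `None` when the page has no TTE in either MMU is an assumption, since the table does not list that case.

```python
def protection_mode(in_dmmu, in_immu, writable):
    """Resultant protection mode for a page, following the table above."""
    if in_dmmu:
        mode = "Read/Write" if writable else "Read-only"
        return mode + "/Execute" if in_immu else mode
    # Without a data TTE the writable bit is irrelevant.
    return "Execute-only" if in_immu else None


print(protection_mode(True, True, 1))
```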
66
UltraSPARC-I and -II MMU-Generated Traps
Trap Description
Instruction_access_miss A TTE for the virtual address of an instruction was not found in the instruction TLB
Instruction_access_exception An instruction privilege violation or invalid instruction address occurred
Data_access_MMU_miss A TTE for the virtual address of a load was not found in the data TLB
Data_access_exception A data access privilege violation or invalid data address occurred
Data_access_protection A data write was attempted to a read-only page
Privileged_action An attempt was made to access a privileged address space
Watchpoint Watchpoints were enabled and the CPU attempted to load or store at the address equivalent to that stored in the watchpoint register
Mem_address_not_aligned An attempt was made to load or store from an address that is not correctly word aligned
67
TLB Performance and Large Pages

large pages
typically 4 Mbytes in size
optimize the effectiveness of the hardware TLB
memory performance is largely influenced by the
effectiveness of the TLB
because of the time spent servicing TLB misses
TLBs are limited in size
only 64 entries in UltraSPARC-I and -II

68
TLB reach

TLB reach: the amount of memory that the TLB can
address concurrently
TLB reach = TLB entries × page size
= 64 × 8 Kbytes, or 512 Kbytes
To increase TLB reach
Increase the number of entries in the TLB
Increase the page size that each entry reflects
A trade-off method: use two or more different
page sizes at the same time
8-Kbyte, 64-Kbyte, 512-Kbyte, or 4-Mbyte pages
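
The reach calculation is simple enough to check in Python. The function name `tlb_reach` is hypothetical; the 64-entry TLB and the page sizes come from the slides.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of memory the TLB can map at once."""
    return entries * page_size


for size in (8 * KB, 64 * KB, 512 * KB, 4 * MB):
    print(size // KB, "Kbyte pages:", tlb_reach(64, size) // KB, "Kbytes of reach")
```

With 64 entries, moving from 8-Kbyte to 4-Mbyte pages raises the reach from 512 Kbytes to 256 Mbytes, which is why large pages matter for large-memory workloads.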

69
Solaris Support for Large Pages

8 Kbytes
a good mix of performance across the range of
smaller machines to larger machines
hurts large-memory scientific applications and
large-memory databases
hurts kernel performance
4 Mbytes
speeds up the kernel code path
frees up valuable TLB slots for hungry
applications
accelerates graphics performance
Large-Page Database Performance Improvements