CHAPTER 7 LARGE AND FAST: EXPLOITING MEMORY HIERARCHY - PowerPoint PPT Presentation

1
CHAPTER 7: LARGE AND FAST: EXPLOITING MEMORY
HIERARCHY
  • Topics to be covered
  • Principle of locality
  • Memory hierarchy
  • Cache concepts and cache organization
  • Virtual memory concepts
  • Impact of caches and virtual memory on performance

2
PRINCIPLE OF LOCALITY
  • Two types of locality are inherent in programs:
  • Temporal locality (locality in time):
  • If an item is referenced, it will tend to be
    referenced again soon
  • Spatial locality (locality in space):
  • If an item is referenced, items whose
    addresses are close by will tend to be
    referenced soon
  • The principle of locality is what allows a
    memory hierarchy to be implemented.

3
MEMORY HIERARCHY
  • Consists of multiple levels of memory with
    different speeds and sizes.
  • Goal is to provide the user with memory at a low
    cost, while providing access at the speed offered
    by the fastest memory.

4
MEMORY HIERARCHY (Continued)
  • CPU (top of the hierarchy)
  • Level             Speed    Size      Cost/bit  Implemented using
  • Cache             Fastest  Smallest  Highest   SRAM
  • Main memory       -        -         -         DRAM
  • Secondary memory  Slowest  Biggest   Lowest    Magnetic disk
  • Memory hierarchy in a computer

5
CACHE MEMORY
  • Cache represents the level of the memory
    hierarchy between the main memory and the CPU.
  • Terms associated with cache:
  • Hit: the item requested by the processor is
    found in some block in the cache.
  • Miss: the item requested by the processor is not
    found in the cache.

6
Terms associated with cache (Continued)
  • Hit rate: the fraction of memory accesses
    found in the cache. Used as a measure of
    performance of the cache.
  • Hit rate = (Number of hits) / (Number of
    accesses)
  • = (Number of hits) / (hits +
    misses)
  • Miss rate: the fraction of memory accesses not
    found in the cache.
  • Miss rate = 1.0 - Hit rate

7
Terms associated with cache (Continued)
  • Hit time: time to access the cache memory
  • Includes the time needed to determine
    whether the access is a hit or a miss.
  • Miss penalty:
  • Time to replace a cache block with the
    corresponding block from the memory, plus the
    time to deliver this block to the processor

8
Cache Organizations
  • Three types of cache organization are available:
  • Direct-mapped cache
  • Set associative cache
  • Fully associative cache

9
DIRECT MAPPED CACHE
  • Each main memory block is mapped to exactly one
    location in the cache. (Assume for now that
    1 block = 1 word.)
  • For each block in the main memory, a
    corresponding cache location is assigned based
    on the address of the block in the main memory.
  • Mapping used in a direct-mapped cache:
  • Cache index = (Memory block address) modulo
    (Number of blocks in the cache)
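The modulo mapping can be sketched in a couple of lines; the 8-block cache size and the block addresses below are illustrative choices, not from the slides:

```python
def cache_index(block_address, num_cache_blocks):
    """Direct-mapped placement: each memory block maps to exactly one slot."""
    return block_address % num_cache_blocks

# With an 8-block cache, memory blocks 5, 13, and 21 all compete for slot 5.
print(cache_index(5, 8), cache_index(13, 8), cache_index(21, 8))  # 5 5 5
```

Blocks whose addresses differ by a multiple of the cache size collide, which is exactly why a tag is needed to tell them apart.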

10
Example of a Direct-Mapped Cache
  • Figure 7.5 A direct-mapped cache of 8
    blocks

11
Accessing a Cache Location and Identifying a Hit
  • We need to know:
  • Whether a cache block has valid information
    - done using the valid bit
  • and
  • Whether the cache block corresponds to the
    requested word - done using tags

12
CONTENTS OF A CACHE MEMORY BLOCK
  • A cache memory block consists of the data bits,
    the tag bits, and a valid (V) bit.
  • The V bit is set only if the cache block has
    valid information.
  • (Each cache entry: index | V | Tag | Data)

13
CACHE AND MAIN MEMORY STRUCTURE
  • [Figure: cache and main memory structure. Each
    cache entry (index 0 to K-1) holds a V bit, a
    Tag, and a data block of K words; main memory is
    an array of blocks of K words, each identified
    by its block address.]

14
CACHE CONCEPT
  • CPU <-> Cache: word transfer
  • Cache <-> Main memory: block transfer
  • (The processor exchanges words with the cache;
    the cache exchanges blocks with the main memory.)

15
IDENTIFYING A CACHE HIT
  • The index of a cache block and the tag contents
    of that block uniquely specify the memory address
    of the word contained in the cache block.
  • Example
  • Consider a 32-bit memory address and a cache
    block size of one word. The cache has 1024 words.
    Compute the following:
  • Cache index = ? bits
  • Byte offset = ? bits
  • Tag = ? bits
  • Actual cache size = ? bits
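One way to work the numbers, treating the definitions literally (byte-addressed memory with 4-byte words is an assumption of this sketch):

```python
address_bits = 32
cache_words  = 1024        # one word per block, so 1024 blocks
index_bits   = (cache_words - 1).bit_length()          # log2(1024) = 10
offset_bits  = 2                                       # 4 bytes per word
tag_bits     = address_bits - index_bits - offset_bits # 32 - 10 - 2 = 20
# Each entry stores the 32-bit word plus its tag and a valid bit.
total_bits   = cache_words * (32 + tag_bits + 1)
print(index_bits, offset_bits, tag_bits, total_bits)   # 10 2 20 54272
```

So the "actual" cache size (54,272 bits) is noticeably larger than the 32,768 bits of data it holds, because of the tag and valid-bit overhead.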

16
Example (Continued)
  • Figure 7.7 Identifying a hit on the cache
    block

17
A Cache Example
  • The series of memory address references given as
    word addresses are 22, 26, 22, 26, 16, 3, 16, and
    18. Assume a direct-mapped cache with 8 one-word
    blocks that is initially empty. Label each
    reference in the list as a hit or miss and show
    the contents of the cache after each reference.
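The exercise can be checked with a small simulation of a direct-mapped cache with 8 one-word blocks (word addresses, as in the slide):

```python
def simulate(refs, num_blocks=8):
    """Direct-mapped, one-word blocks: remember the tag held at each index."""
    cache = [None] * num_blocks          # None models an invalid (V=0) entry
    results = []
    for addr in refs:
        index, tag = addr % num_blocks, addr // num_blocks
        if cache[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag           # load (or replace) the block
    return results

print(simulate([22, 26, 22, 26, 16, 3, 16, 18]))
```

The final reference to 18 misses even though the cache has room: 18 and 26 share index 010, so 18 evicts 26.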

18
A Cache Example (Continued)
  • [Tables (a) and (b): Index (000-111) | V | Tag |
    Data, showing the initial (empty) state of the
    cache and its contents after the first reference.]

19
A Cache Example (Continued)
  • [Tables (c) and (d): Index (000-111) | V | Tag |
    Data, after subsequent references.]

20
A Cache Example (Continued)
  • [Tables (e) and (f): Index (000-111) | V | Tag |
    Data, after the remaining references.]

21
HANDLING CACHE MISSES
  • If the cache reports a miss, then the
    corresponding block has to be loaded into the
    cache from the main memory.
  • The requested word may be forwarded immediately
    to the processor as the block is being updated
  • or
  • The requested word may be delayed until the
    entire block has been stored in the cache

22
HANDLING CACHE MISSES FOR INSTRUCTIONS
  • Decrement the PC by 4, so it points to the
    instruction that missed
  • Fetch the block containing the missed instruction
    from the main memory
  • Write the instruction block into the data
    portion of the referenced cache block
  • Copy the upper bits of the referenced memory
    address into the tag field of the cache block
  • Turn the valid (V) bit on
  • Restart the fetch cycle - this will refetch the
    instruction, this time finding it
    in the cache

23
HANDLING CACHE MISSES FOR DATA
  • Read the block containing the missed data from
    the main memory and write the block into the
    cache
  • Write the data into the data portion of the
    referenced cache block
  • Copy the upper bits of the referenced memory
    address into the tag field of the cache memory
  • Turn the valid (V) bit on
  • The requested word may be forwarded immediately
    to the processor as the block is being updated
  • or
  • The requested word may be delayed until the
    entire block has been stored in the cache

24
CACHE WRITE POLICIES
  • Two techniques are used in handling a write to a
    cache block:
  • Write-through technique:
  • Updates both the cache and the main memory
    for each write
  • Write-back technique:
  • Writes to the cache only and postpones
    updating the main memory until the block is
    replaced in the cache

25
CACHE WRITE POLICIES (Continued)
  • The write-back strategy usually employs a dirty
    bit associated with each cache block, much like
    the valid bit.
  •  
  • The dirty bit is set the first time a value
    is written to the block.
  • When a block in the cache is to be replaced, its
    dirty bit is examined:
  • If the dirty bit has been set, the block is
    written back to the main memory;
    otherwise it is simply overwritten.
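The dirty-bit bookkeeping can be sketched with a minimal single-block model; using the tag alone as the memory address is a simplification of this sketch:

```python
class WriteBackBlock:
    """Minimal sketch of one cache block under a write-back policy."""
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = None

    def write(self, tag, data):
        # A write updates the cache only and marks the block dirty.
        self.valid, self.dirty = True, True
        self.tag, self.data = tag, data

    def evict(self, memory):
        # On replacement, only a valid, dirty block is copied back to memory;
        # a clean block is simply overwritten.
        if self.valid and self.dirty:
            memory[self.tag] = self.data
        self.valid = self.dirty = False

memory = {}
blk = WriteBackBlock()
blk.write(7, "value")
blk.evict(memory)
print(memory)   # the dirty block was written back: {7: 'value'}
```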

26
TAKING ADVANTAGE OF SPATIAL LOCALITY
  • To take advantage of spatial locality we should
    have a block that is larger than one word in
    length (a multiple-word block).
  • When a miss occurs, the entire block (consisting
    of multiple adjacent words) is fetched from the
    main memory and brought into the cache.
  • The total number of tags and valid bits, in a
    cache with multiword blocks, is smaller because
    each tag and valid bit serves several words.

27
Cache with Multiple-word Blocks - Example
  • Figure 7.9 A 16 KB cache using 16-word blocks

28
Identifying a Cache Block for a Given Memory
Address
  • For a given memory address, the corresponding
    cache block can be determined as follows:
  • Step 1: Identify the memory block that contains
    the given memory address
  • Memory block address = (Word address) div
    (Number of words in the block)
  • Memory block address = (Byte address) div
    (Number of bytes in the block)
  • (The memory block address is essentially the
    block number in the main memory.)
  • Step 2: Compute the cache index corresponding to
    the memory block
  • Cache index = (Memory block address)
    modulo (Number of blocks in the cache)
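The two steps can be sketched directly; the 64-byte blocks and 256-block cache used here are illustrative assumptions:

```python
def locate(byte_address, bytes_per_block=64, blocks_in_cache=256):
    """Step 1: memory block number (div); Step 2: cache index (modulo)."""
    memory_block = byte_address // bytes_per_block
    cache_index  = memory_block % blocks_in_cache
    return memory_block, cache_index

print(locate(75000))   # byte 75000 lives in memory block 1171, cache index 147
```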

29
HANDLING CACHE MISSES FOR A MULTIWORD BLOCK
  • For a cache read miss, the corresponding block is
    copied into the cache from the memory.
  •  
  • A cache write miss, in a multiword-block cache,
    is handled in two steps:
  • Step 1: Copy the corresponding block from memory
    into the cache
  • Step 2: Update the cache block with the requested
    word

30
EFFECT OF A LARGER BLOCK SIZE ON PERFORMANCE
  • In general, the miss rate falls when we increase
    the block size.
  • The miss rate may actually go up, if the block
    size is made very
  • large compared with the cache size, due to the
    following reasons
  • The number of blocks that can be held in the
    cache will become small, giving rise to a great
    deal of competition for these blocks.
  • A block may be bumped out of the cache before
    many of its words can be used.
  • Increasing the block size increases the cost of a
    miss (miss penalty)

31
DESIGNING MEMORY SYSTEMS TO SUPPORT CACHES
  • Three memory organizations are widely used
  • One-word-wide memory organization
  • Wide memory organization
  • Interleaved memory organization
  • Figure 7.11 Three options for designing the
    memory system

32
Figure 7.11 Three options for designing the
memory system
33
Example
  • Consider the following memory access times
  • 1 clock cycle to send the address
  • 10 clock cycles for each DRAM access
    initiated
  • 1 clock cycle to send a word of data
  • Assume we have a cache block of four words.
    Discuss
  • the impact of the different organizations on miss
    penalty
  • and bandwidth per miss.
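Working the example through (the formulas below assume the one-word-wide bus serializes all four accesses, the wide bus moves the whole block at once, and the interleaved design has four banks that overlap their access time):

```python
send_addr, dram_access, send_word, words_per_block = 1, 10, 1, 4

# One-word-wide: every word pays a full DRAM access and a bus transfer.
narrow = send_addr + words_per_block * dram_access + words_per_block * send_word

# Wide (4-word bus and memory): one access, one transfer for the whole block.
wide = send_addr + dram_access + send_word

# Interleaved: four banks access in parallel, words still sent one at a time.
interleaved = send_addr + dram_access + words_per_block * send_word

print(narrow, wide, interleaved)   # miss penalties: 45 12 15 cycles
```

Bandwidth per miss is then 16 bytes divided by each penalty, so the wide and interleaved designs deliver roughly 3-4x the bandwidth of the one-word-wide organization.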

34
MEASURING AND IMPROVING CACHE PERFORMANCE
  • Total cycles the CPU spends on a program =
  • (Clock cycles the CPU spends executing the
    program) + (Clock cycles the CPU spends waiting
    for the memory system)
  •  
  • Total CPU time = Total CPU cycles × Clock cycle
    time
  • = (CPU execution clock cycles +
    Memory-stall clock cycles) × Clock cycle
    time

35
MEASURING AND IMPROVING CACHE PERFORMANCE
(Continued)
  • Memory-stall clock cycles = Read-stall cycles +
    Write-stall cycles
  • Read-stall cycles = Number of reads × Read miss
    rate × Read miss penalty
  • Write-stall cycles = Number of writes × Write
    miss rate × Write miss penalty
  •  
  • Total memory accesses = Number of reads + Number
    of writes
  • Therefore, with a single miss rate and penalty,
  • Memory-stall cycles = Total memory accesses ×
    Miss rate × Miss penalty

36
MEASURING AND IMPROVING CACHE PERFORMANCE
  • Two ways of improving cache performance
  • Decreasing the cache miss rate
  • Decreasing the cache miss penalty

37
Example
  • Assume the following:
  • Instruction cache miss rate for gcc = 5%
  • Data cache miss rate for gcc = 10%
  • If a machine has a CPI of 4 without memory stalls
    and a miss penalty of 12 cycles for all misses,
    determine how much faster a machine would run
    with a perfect cache that never missed. The
    frequency of data transfer instructions in gcc is
    33%.
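Working the example with the stall-cycle formulas above (reading the slide's figures as 5%, 10%, and 33%, and counting one instruction fetch per instruction is an assumption of this sketch):

```python
cpi_base  = 4        # CPI with a perfect cache
penalty   = 12       # miss penalty in cycles
i_miss    = 0.05     # instruction cache miss rate
d_miss    = 0.10     # data cache miss rate
data_freq = 0.33     # fraction of instructions that access data

# Stall cycles per instruction: every instruction is fetched,
# only data-transfer instructions access the data cache.
stall_cpi = i_miss * penalty + data_freq * d_miss * penalty   # 0.6 + 0.396
cpi_real  = cpi_base + stall_cpi                              # 4.996
print(round(cpi_real / cpi_base, 3))   # perfect cache is ~1.249x faster
```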

38
REDUCING CACHE MISSES BY MORE FLEXIBLE PLACEMENT
OF BLOCKS
  • Direct-mapped cache: a block can go in
    exactly one place.
  • Set associative cache: there are a fixed number
    of locations where each block can be placed.
  • Each block in the memory maps to a unique set in
    the cache, given by the index field.
  • A block can be placed in any element of that set.
  • The set corresponding to a memory block is given
    by:
  • Cache set = (Block address) modulo (Number of
    sets in the cache)
  • A set associative cache with n possible locations
    for a block is called an n-way set associative
    cache.
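The set mapping can be sketched as follows; the 64-block, 4-way sizes are assumptions chosen for illustration:

```python
def set_index(block_address, num_blocks=64, ways=4):
    """n-way set associative: the block may go in any of `ways` slots of one set."""
    num_sets = num_blocks // ways          # 64 blocks / 4 ways = 16 sets
    return block_address % num_sets

# Direct-mapped is the special case ways=1;
# fully associative is ways=num_blocks (a single set).
print(set_index(100))                      # 100 mod 16 = 4
print(set_index(100, ways=64))             # fully associative: one set, index 0
```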

39
REDUCING CACHE MISSES BY MORE FLEXIBLE PLACEMENT
OF BLOCKS (Continued)
  • Fully associative cache: a block can be placed
    in any location in the cache.
  • To find a block in a fully associative cache, all
    the entries (blocks) in the cache must be
    searched.
  • Figure 7.14 Examples of direct-mapped, set
    associative and fully associative caches
  • The miss rate decreases as the degree of
    associativity increases.

40
Figure 7.14 Examples of direct-mapped, set
associative and fully associative caches

41
LOCATING A BLOCK IN THE CACHE
  • Fig. 7.17 Locating a block in a four-way set
    associative cache

42
CHOOSING WHICH BLOCK TO REPLACE
  • Direct-mapped Cache
  •  
  • When a miss occurs, the requested block can go
    in exactly one position. So the block occupying
    that position must be replaced.

43
CHOOSING WHICH BLOCK TO REPLACE (Continued)
  • Set associative or fully associative Cache
  •  
  • When a miss occurs we have a choice of where to
    place the requested block, and therefore a choice
    of which block to replace.
  • Set associative cache
  • All the blocks in the selected set are
    candidates for replacement.
  • Fully associative cache
  • All blocks in the cache are candidates for
    replacement.

44
Replacing a Block in Set Associative and Fully
Associative Caches
  • Strategies employed are:
  • First-in-first-out (FIFO)
  • The block replaced is the one that was brought
    in first
  • Least-frequently used (LFU)
  • The block replaced is the one that is least
    frequently used
  • Random: the block to be replaced is selected at
    random
  • Least recently used (LRU)
  • The block replaced is the one that has been
    unused for the longest time.
  • LRU is the most commonly used replacement
    technique.
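LRU replacement can be sketched with an ordered map; this models a small fully associative cache and the reference string is an invented example, not tied to the slides:

```python
from collections import OrderedDict

def lru_simulate(refs, capacity):
    """Fully associative cache with LRU replacement."""
    cache, results = OrderedDict(), []
    for block in refs:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
            results.append("hit")
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict the least recently used
            cache[block] = True
            results.append("miss")
    return results

print(lru_simulate([1, 2, 3, 1, 4, 2], capacity=3))
```

Note how the reference to block 1 refreshes its recency, so block 4 evicts block 2 rather than block 1.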

45
REDUCING THE MISS PENALTY USING MULTILEVEL CACHES
  • To further close the gap between the fast clock
    rates of modern
  • processors and the relatively long time required
    to access
  • DRAMs, high-performance microprocessors support
    an
  • additional level of caching.
  • This second-level cache (often off chip in a
    separate set of
  • SRAMs) will be accessed whenever a miss occurs in
    the
  • primary cache.
  •  
  • Since the access time for the second-level cache
    is significantly
  • less than the access time of the main memory, the
    miss penalty
  • of the primary cache is greatly reduced.

46
Evolution of Cache organization
  • 80386: No on-chip cache. Employs a
    direct-mapped external cache with a block size
    of 16 bytes (four 32-bit words). Employs the
    write-through technique.
  • 80486: Has a single on-chip cache of 8 KByte
    with a block size of 16 bytes and a 4-way set
    associative organization. Employs write-through
    technique and LRU replacement algorithm.
  • Pentium/Pentium Pro
  • Employs split instruction and data caches (2
    on-chip caches, one for data and one for
    instructions). Each cache is 8 KByte with a block
    size of 32 bytes and a 4-way set associative
    organization. Employs a write-back policy and
    the LRU replacement algorithm.
  • Supports the use of a 2-way set associative
    level 2 cache of 256 or 512 Kbytes with a block
    size of 32, 64, or 128 bytes. Employs a
    write-back policy and the LRU replacement
    algorithm. Can be dynamically configured to
    support write-through caching. In Pentium Pro,
    the secondary cache is on a separate die, but
    packaged together with the processor.

47
Evolution of Cache organization (Continued)
  • Pentium II: Employs split instruction and
    data caches. Each cache is 16 Kbytes.
    Supports the use of a level 2 cache of 512
    Kbytes.
  • PII Xeon: Employs split instruction and
    data caches. Each cache is 16 Kbytes.
    Supports the use of a level 2 cache of 1 Mbytes
    or 2 Mbytes.
  • Celeron: Employs split instruction and
    data caches. Each cache is 16 Kbytes.
    Supports the use of a level 2 cache of 128
    Kbytes.
  • Pentium III: Employs split instruction and
    data caches. Each cache is 16 Kbytes.
    Supports the use of a level 2 cache of 512
    Kbytes.

48
Evolution of Cache Organization (Continued)
  • Pentium 4: Employs split instruction and data
    caches (2 on-chip caches, one for data and one
    for instructions). Each cache is 8 KByte with a
    block size of 64 bytes and a 4-way set
    associative organization. Employs a write-back
    policy and the LRU replacement algorithm.
  • Supports the use of a 2-way set associative
    level 2 cache of 256 Kbytes with a block size
    of 128 bytes. Employs a write-back policy and
    the LRU replacement algorithm. Can be dynamically
    configured to support write-through caching.

49
Evolution of Cache organization (Continued)
  • Power PC
  • Model 601 has a single on-chip 32 Kbytes, 8-way
    set associative cache with a block size of 32
    bytes.
  • Model 603 has two on-chip 8 Kbytes, 2-way set
    associative caches with a block size of 32 bytes.
  • Model 604 has two on-chip 16 KByte, 4-way set
    associative caches with a block size of 32 bytes.
    Uses LRU replacement algorithm and write-through
    and write-back techniques. Employs a 2-way set
    associative level 2 cache of 256 or 512 Kbytes
    with a block size of 32 bytes.
  • Model 620 has two on-chip 32 Kbytes, 8-way set
    associative caches with a block size of 64 bytes.
  • Model G3 has two on-chip 32 Kbytes, 8-way set
    associative caches with a block size of 64 bytes.
  • Model G4 has two on-chip 32 Kbytes, 8-way set
    associative caches with a block size of 32 bytes

50
ELEMENTS OF CACHE DESIGN
  • The key elements that serve to classify and
    differentiate cache architectures are as follows:
  • Cache size
  • Mapping function
  • Direct
  • Set associative
  • Fully associative
  • Replacement algorithms (for set and fully
    associative)
  • Least-recently used (LRU)
  • First-in-first-out (FIFO)
  • Least-frequently used (LFU)
  • Random
  • Write policy
  • Write-through
  • Write-back
  • Block size
  • Number of caches
  • Single-level or two-level
  • Unified or split

51
  • CACHE SIZE
  • The size of the cache should be small enough so
    that the overall average cost per bit is close to
    that of the memory alone and large enough so that
    the overall average access time is close to that
    of the cache alone.
  •  
  • Large caches tend to be slightly slower than
    small ones (because of the additional gates
    involved).
  •  
  • Cache size is also limited by the available chip
    and board area.
  • Because the performance of the cache is very
    sensitive to the nature of the workload, it is
    almost impossible to arrive at an optimum cache
    size. But studies have suggested that cache sizes
    of between 1K and 512K words would be optimum.

52
  • MAPPING FUNCTION
  • The choice of the mapping function dictates how
    the cache is organized.
  •  
  • The direct mapping technique is simple and
    inexpensive to implement. The main disadvantage
    is that there is a fixed cache location for any
    given block. Thus, if for example a program
    happens to repeatedly reference words from two
    different blocks that map into the same cache
    location, then the blocks will be continually
    swapped in the cache, and the hit ratio will be
    very low.
  •  
  • With (fully) associative mapping, there is
    flexibility as to which block to replace when a
    new block is read into the cache. Replacement
    algorithms are designed to maximize the hit
    ratio. The principal disadvantage is the complex
    circuitry required to examine the tags of all
    cache locations in parallel.
  • Set associative mapping is a compromise that
    exhibits the strengths of both the direct and
    fully associative approaches without their
    disadvantages. The use of two blocks per set is
    the most common set associative organization. It
    significantly improves the hit ratio over direct
    mapping.

53
  • REPLACEMENT ALGORITHMS
  • For set associative and fully associative
    mapping, a replacement algorithm is required. To
    achieve high speed, such algorithms must be
    implemented in hardware.
  • WRITE POLICY
  • The write-through technique, even though simple
    to implement, has the disadvantage that it
    generates substantial memory traffic and may
    create a bottleneck. The write-back technique
    minimizes memory writes. The disadvantage is
    that portions of the main memory are invalid, and
    hence access by I/O modules can be allowed only
    through the cache. This calls for complex
    circuitry and a potential bottleneck.
  • BLOCK SIZE
  • Larger blocks reduce the number of blocks that
    fit into a cache. Because each block fetch
    overwrites older cache contents, a small number
    of cache blocks result in data being overwritten
    shortly after it is fetched. Also, as a block
    becomes larger, each additional word is farther
    from the requested word, therefore less likely to
    be needed in the near future.
  •  
  • The relationship between block size and hit
    ratio is complex, depending on the locality
    characteristics of a given program. Studies have
    shown that a block size of 4 to 8
    addressable units (words or bytes) is
    reasonably close to optimum.

54
  • NUMBER OF CACHES
  • Two aspects have to be considered here: the
    number of levels of caches and the use of unified
    versus split caches.
  •  
  • Single- versus two-level caches: As logic
    density has increased, it has become possible to
    have a cache on the same chip as the processor:
    the on-chip cache. The on-chip cache reduces the
    processor's external bus activity and therefore
    speeds up execution time and increases the
    overall system performance. If the system is
    also provided with an off-chip or external cache,
    then the system is said to have a two-level
    cache, with the on-chip cache designated as
    level 1 (L1) and the external cache designated
    as level 2 (L2). In the absence of an external
    cache, for every on-chip cache miss the processor
    has to access the DRAM. Due to the typically slow
    bus speed and slow DRAM access time, the overall
    performance of the system goes down. The
    potential savings due to the use of an L2 cache
    depend on the hit rates in both the L1 and L2
    caches. Studies have shown that, in general, the
    use of an L2 cache does improve performance.
  • Unified versus split cache: When the on-chip
    cache first made its appearance, many of the
    designs consisted of a single on-chip cache used
    to store both data and instructions. More
    recently, it has become common to split the
    on-chip cache into two: one dedicated to
    instructions and one dedicated to data.
  •  
  • A unified cache has the following advantages: For
    a given cache size, a unified cache has a higher
    hit rate than split caches because it balances
    the load between instruction and data fetches
    automatically. Also, only one cache needs to be
    designed and implemented.
  •  
  • The advantage of a split cache is that it
    eliminates contention for the cache between the
    instruction fetch unit and the execution unit.
    This is extremely important in any design that
    implements pipelining of instructions.

55
VIRTUAL MEMORY
  • Virtual memory permits each process to use the
    main memory as if it were the only user, and to
    extend the apparent size of accessible memory
    beyond its actual physical size.
  • The virtual address generated by the CPU is
    translated into a physical address, which in turn
    can be used to access the main memory. The
    process of translating the virtual address into a
    physical address is called memory mapping or
    address translation.
  • Page: a virtual memory block
  • Page fault: a virtual memory miss
  • Figure 7.19 The virtual addressed memory with
    pages mapped to the main memory
  • Figure7.20 Mapping from a virtual to a physical
    address

56
Figure 7.19 The virtual addressed memory with
pages mapped to the main memory
57
Figure7.20 Mapping from a virtual to a physical
address
58
PLACING A PAGE AND FINDING IT AGAIN
  • The operating system must maintain a page table.
  • Page table:
  • Maps virtual pages to physical pages, or else to
    locations in the secondary memory
  • Resides in memory
  • Indexed with the page number from the virtual
    address
  • Contains the physical page number of the
    corresponding virtual page
  • Each program has its own page table, which maps
    the virtual address space of that program to
    main memory.
  • No tags are required in the page table because
    the page table contains a mapping for every
    possible virtual page.
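The translation described above can be sketched as follows; the 4 KB page size and the dictionary standing in for a page table are assumptions of this sketch:

```python
PAGE_SIZE = 4096   # bytes per page (an assumed, typical size)

def translate(virtual_address, page_table):
    """Split the virtual address into page number and offset, then map the page."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    ppn = page_table.get(vpn)
    if ppn is None:
        # Models a page fault: the virtual page is not in main memory.
        raise LookupError("page fault: virtual page %d not resident" % vpn)
    return ppn * PAGE_SIZE + offset

page_table = {0: 5, 1: 2}            # virtual page -> physical page
print(translate(4100, page_table))   # page 1, offset 4 -> 2*4096 + 4 = 8196
```

The page offset passes through unchanged; only the page number is translated.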

59
PLACING A PAGE AND FINDING IT AGAIN (Continued)
  • Page table register
  • Indicates the location of the page table in the
    memory
  • Points to the start of the page table.
  •  
  • Figure 7.21 Mapping from a virtual to a physical
    address using the page table register and the
    page table

60
Figure 7.21 Mapping from a virtual to a physical
address using the page table register and
the page table
61
PAGE FAULTS
  • If the valid bit for a virtual page is off, a
    page fault occurs.
  • The operating system is given control at this
    point (via the exception mechanism)
  • The OS finds the page in the next level of the
    hierarchy (magnetic disk, for example)
  • The OS decides where to place the requested page
    in main memory
  • The OS also creates a data structure that tracks
    which processes and which virtual addresses use
    each physical page.
  • When a page fault occurs, if all the pages in
    the main memory are in use, the OS has to choose
    a page to replace. The algorithm usually employed
    is the LRU replacement algorithm.

62
WRITES
  • In a virtual memory system, writes to the disk
    take hundreds of
  • thousands of cycles. Hence write-through is
    impractical. The
  • strategy employed is called write-back (copy
    back).
  •  
  • Write-back technique
  • Individual writes are accumulated into a page
  • When the page is replaced in the memory, it is
    copied back into the disk.

63
MAKING ADDRESS TRANSLATION FAST THE
TRANSLATION-LOOKASIDE BUFFER (TLB)
  • If the CPU had to access a page table resident
    in memory to translate every memory access,
    virtual memory would have too much overhead.
    Instead, a special cache, the
    translation-lookaside buffer (TLB), holds
    recently used translations so that the page
    table is consulted only on a TLB miss.
  •  
  • Figure 7.23 TLB acts as a cache for page table
    references

64
Figure 7.23 TLB acts as a cache for page table
references
65
FIG. 7.24 INTEGRATING VIRTUAL MEMORY, TLBs,
CACHE