1
COMP 206: Computer Architecture and Implementation
  • Montek Singh
  • Wed., Nov. 6, 2002
  • Topics: 1. Virtual Memory; 2. Cache Coherence

2
Virtual Memory Access Time
  • Assume existence of TLB, physical cache, MM, disk
  • Processor issues VA
  • TLB hit
  • Send RA (real address) to cache
  • TLB miss
  • Exception: access page tables, update TLB, retry
  • Memory reference may involve accesses to
  • TLB
  • Page table in MM
  • Cache
  • Page in MM
  • Each of these can be a hit or a miss
  • 16 possible combinations
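
A minimal C sketch of this access flow, assuming a one-entry TLB and a
single-level page table (the sizes, names, and mapping below are
illustrative assumptions, not from the slides; the cache and page-fault
paths are elided):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS 12                      /* 4 KB pages, illustrative value */
    enum { NPAGES = 16 };

    static uint32_t page_table[NPAGES];       /* VPN -> PFN; lives in MM        */
    static struct { bool valid; uint32_t vpn, pfn; } tlb;   /* 1-entry TLB      */

    static uint32_t translate(uint32_t va)    /* VA -> RA, as on the slide      */
    {
        uint32_t vpn = va >> PAGE_BITS;
        uint32_t off = va & ((1u << PAGE_BITS) - 1);
        if (!(tlb.valid && tlb.vpn == vpn)) { /* TLB miss: take exception,      */
            tlb.valid = true;                 /* walk the page table in MM,     */
            tlb.vpn   = vpn;                  /* update the TLB, then retry     */
            tlb.pfn   = page_table[vpn];
        }
        return (tlb.pfn << PAGE_BITS) | off;  /* RA is sent to the physical cache */
    }

    int main(void)
    {
        for (uint32_t i = 0; i < NPAGES; i++)
            page_table[i] = NPAGES - 1 - i;   /* arbitrary VPN -> PFN mapping   */
        printf("VA 0x%05x -> RA 0x%05x\n",
               0x3ABCu, (unsigned)translate(0x3ABCu));
        return 0;
    }
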

3
Virtual Memory Access Time (2)
  • Constraints among these accesses
  • Hit in TLB ⇒ hit in page table in MM
  • Hit in cache ⇒ hit in page in MM
  • Hit in page in MM ⇒ hit in page table in MM
  • These constraints eliminate eleven combinations

4
Virtual Memory Access Time (3)
  • Number of MM accesses depends on page table
    organization
  • MIPS R2000/R4000 accomplishes table walking with
    CPU instructions (eight instructions per page
    table level)
  • Several CISC machines implement this in
    microcode, with MC88200 having dedicated hardware
    for this
  • RS/6000 implements this completely in hardware
  • TLB miss penalty dominated by having to go to
    main memory
  • Page tables may not be in cache
  • Further increase in miss penalty if page table
    organization is complex
  • TLB misses can have very damaging effect on
    physical caches
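
As a rough illustration of why the TLB miss penalty is dominated by
main-memory latency, here is a hedged sketch of a two-level table walk;
the layout, field widths, and names are assumptions for illustration,
not the MIPS, MC88200, or RS/6000 mechanism:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t pte_t;                        /* low 20 bits = PFN (illustrative) */

    static pte_t  level2[1024];                    /* one second-level table           */
    static pte_t *level1[1024] = { [0] = level2 }; /* first-level table                */

    static uint32_t walk(uint32_t va)              /* two MM accesses per TLB miss     */
    {
        uint32_t vpn1 = (va >> 22) & 0x3FF;        /* 10-bit VPN1                      */
        uint32_t vpn2 = (va >> 12) & 0x3FF;        /* 10-bit VPN2                      */
        pte_t   *second = level1[vpn1];            /* MM access #1                     */
        uint32_t pfn    = second[vpn2] & 0xFFFFF;  /* MM access #2                     */
        return (pfn << 12) | (va & 0xFFF);
    }

    int main(void)
    {
        level2[3] = 0x00042;                       /* map VPN (0, 3) to PFN 0x42       */
        printf("PA = 0x%08x\n", (unsigned)walk(0x00003ABCu));
        return 0;
    }
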

5
Page Size
  • Choices
  • Fixed at design time (most early VM systems)
  • Statically configurable
  • At any moment, only pages of the same size exist in
    the system
  • MC68030 allowed page sizes between 256 B and 32 KB
    in this way
  • Dynamically configurable
  • Pages of different sizes coexist in system
  • Alpha 21164, UltraSPARC: 8 KB, 64 KB, 512 KB, 4 MB
  • MIPS R10000, PA-8000: 4 KB, 16 KB, 64 KB, 256 KB,
    1 MB, 4 MB, 16 MB
  • All pages are aligned
  • Dynamic configuration is a sophisticated way to
    decrease the TLB miss rate
  • Increasing the number of TLB entries increases
    processor cycle time
  • Increasing size of VM page increases internal
    memory fragmentation
  • Needs fully associative TLBs
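
Back-of-the-envelope arithmetic behind this trade-off (the 64-entry TLB
and the page sizes chosen are assumptions for illustration): TLB reach
grows with page size, but so does the memory lost to internal
fragmentation, roughly half a page per mapped region:

    #include <stdio.h>

    int main(void)
    {
        const unsigned entries = 64;                        /* assumed TLB size     */
        const unsigned long sizes[] = { 4ul << 10, 64ul << 10, 4ul << 20 };
        for (int i = 0; i < 3; i++) {
            unsigned long reach = entries * sizes[i];       /* memory mapped by TLB */
            printf("page %8lu B: TLB reach %9lu KB, avg internal frag %8lu B\n",
                   sizes[i], reach >> 10, sizes[i] / 2);    /* ~half a page wasted  */
        }
        return 0;
    }
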

6
Segmentation and Paging
  • Paged segments: segments are made up of pages
  • Paging system has flat, linear address space
  • 32-bit VA (10-bit VPN1, 10-bit VPN2, 12-bit
    offset)
  • If, for a given VPN1, we reach the max value of
    VPN2 and add 1, we reach the next page, at address
    (VPN1+1, 0)
  • Segmented version has two-dimensional address
    space
  • 32-bit VA (10-bit segment number, 10-bit page
    number, 12-bit offset)
  • If, for a given segment, we reach the max page
    number and add 1, we get an undefined value
  • Segments are not contiguous
  • Segments do not need to have the same size
  • Size can even vary dynamically
  • Implemented by storing upper bound for each
    segment and checking every reference against it
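
A minimal sketch of the segmented bounds check described above; the
10/10/12 field split comes from the slide, while the per-segment limit
table and the function name are illustrative assumptions:

    #include <stdio.h>
    #include <stdint.h>

    static uint32_t seg_limit[1024];            /* max valid page number per segment */

    static int segmented_va_ok(uint32_t va)
    {
        uint32_t seg  = (va >> 22) & 0x3FF;     /* 10-bit segment number             */
        uint32_t page = (va >> 12) & 0x3FF;     /* 10-bit page number                */
        /* Unlike the flat paged space, running past the last page of a segment
           does not land in the "next" segment -- the reference is simply invalid. */
        return page <= seg_limit[seg];
    }

    int main(void)
    {
        seg_limit[1] = 3;                       /* segment 1 holds pages 0..3        */
        printf("%d %d\n", segmented_va_ok(0x00403000u),   /* seg 1, page 3: valid    */
                          segmented_va_ok(0x00404000u));  /* seg 1, page 4: fault    */
        return 0;
    }
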

7
Example 1: Alpha 21264 TLB
  • Figure 5.36

8
Example 2: Hypothetical Virtual Memory
  • Figure 5.37

9
Cache Coherence
  • Section 6.3 and Appendix I of HP3

10
Cache Coherence
  • Common problem with multiple copies of mutable
    information (in both hardware and software)
  • "If a datum is copied and the copy is to match
    the original at all times, then all changes to
    the original must cause the copy to be
    immediately updated or invalidated." (Richard L.
    Sites, co-architect of DEC Alpha)

[Slide diagram: timeline in which the copy first becomes stale and the
two copies then diverge, which is hard to recover from]
11
Example of Cache Coherence
  • I/O in uniprocessor with primary unified cache
  • MM copy and cache copy of memory block not always
    coherent
  • WT cache: MM copy stale while a write update to MM
    is in transit
  • WB cache: MM copy stale while the cache copy is
    Dirty
  • Inconsistency of no concern if no one
    reads/writes MM copy
  • If I/O is directed to main memory, coherence must
    be maintained (see the sketch below)
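
Seen from a device driver, the WB-cache case above requires explicit
reconciliation around DMA. The following sketch uses made-up placeholder
routines (cache_writeback_range, cache_invalidate_range, and the dma_*
calls are illustrative stand-ins, not a real API):

    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for cache-maintenance and DMA operations */
    static void cache_writeback_range(const void *p, size_t n)
    { (void)p; printf("write back %zu possibly-Dirty bytes to MM\n", n); }
    static void cache_invalidate_range(const void *p, size_t n)
    { (void)p; printf("invalidate %zu possibly-stale cached bytes\n", n); }
    static void dma_to_device(const void *p, size_t n)
    { (void)p; printf("device reads %zu bytes directly from MM\n", n); }
    static void dma_from_device(void *p, size_t n)
    { (void)p; printf("device writes %zu bytes directly into MM\n", n); }

    int main(void)
    {
        char buf[256];
        /* Outbound: with a WB cache the MM copy may be stale while the cache
           copy is Dirty, so push it to MM before the device reads MM.        */
        cache_writeback_range(buf, sizeof buf);
        dma_to_device(buf, sizeof buf);
        /* Inbound: after the device writes MM, any cached copy is stale, so
           drop it before the CPU reads the buffer.                           */
        dma_from_device(buf, sizeof buf);
        cache_invalidate_range(buf, sizeof buf);
        return 0;
    }
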

12
Example of Cache Coherence (contd)
  • Uniprocessor with a split primary cache
  • I-cache contains instructions
  • D-cache contains data
  • Often contents are disjoint
  • If self-modifying code is allowed, then same
    cache block may appear in both caches, and
    consistency must be enforced
  • MS-DOS allows self-modifying code
  • Strong motivation for unified caches in Intel
    i386 and i486
  • Pentium has a split primary cache, and supports
    self-modifying code by enforcing coherence between
    the I- and D-caches
  • Coordinating primary and secondary caches in
    uniprocessor
  • Shared memory multiprocessors

13
Two Snoopy Protocols
  • We will discuss two protocols
  • A simple three-state protocol
  • Section 6.3 and Appendix I of HP3
  • The MESI protocol
  • IEEE standard
  • Used by many machines, including Pentium and
    PowerPC 601
  • Snooping
  • individual caches monitor memory bus activity
  • and take actions based on this activity
  • introduces a fourth category of miss to the 3C
    model: coherence misses
  • First, we need some notation to discuss the
    protocols

14
Notation: Write-Through Cache
15
Notation: Write-Back Cache
16
Three-State Write-Invalidate Protocol
  • Minor modification of WB cache
  • Assumptions
  • Single bus and MM
  • Two or more CPUs, each with WB cache
  • Every cache block is in one of three states:
    Invalid, Clean, Dirty (called Invalid, Shared,
    Exclusive in Figure 6.10 of HP3)
  • MM copies of blocks have no state
  • At any moment, a single cache owns the bus (is bus
    master)
  • Bus master does not obey bus commands
  • All misses (reads or writes) are serviced by
  • MM, if all cache copies are Clean
  • otherwise, the only Dirty cache copy (which then
    ceases to be Dirty); in this case the MM copy is
    written instead of being read
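
A compact sketch of the per-block next-state logic these assumptions
imply, using the Invalid/Clean/Dirty names; bus transactions,
write-backs, and data movement are elided, so this is an illustration
rather than a transcription of Figure 6.10:

    #include <stdio.h>

    typedef enum { INVALID, CLEAN, DIRTY } state_t;
    typedef enum { CPU_READ, CPU_WRITE,               /* from local processor  */
                   BUS_READ_MISS, BUS_WRITE_MISS } event_t;  /* snooped on bus */

    static state_t next_state(state_t s, event_t e)
    {
        switch (e) {
        case CPU_READ:       return s == INVALID ? CLEAN : s; /* miss -> Clean */
        case CPU_WRITE:      return DIRTY;   /* hit or miss: invalidate others */
        case BUS_READ_MISS:  return s == DIRTY ? CLEAN : s;   /* supply block  */
        case BUS_WRITE_MISS: return INVALID; /* another cache will write it    */
        }
        return s;
    }

    int main(void)
    {
        static const char *name[] = { "Invalid", "Clean", "Dirty" };
        event_t trace[] = { CPU_READ, CPU_WRITE, BUS_READ_MISS, BUS_WRITE_MISS };
        state_t s = INVALID;
        for (int i = 0; i < 4; i++) {
            s = next_state(s, trace[i]);
            printf("event %d -> %s\n", i, name[s]);
        }
        return 0;
    }

Note that a CPU write to a Clean block still has to broadcast a Bus Write
Miss to invalidate other copies, which a later slide points out triggers
an unnecessary block transfer.
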

17
Understanding the Protocol
  • Only two global states
  • Most up-to-date copy is the MM copy, and all cache
    copies are Clean
  • Most up-to-date copy is a single unique cache copy
    in state Dirty
  • Bus owner Clean, another Clean copy exists
  • Can read without notifying other caches
  • Bus owner Dirty, no other cache copies
  • Can read or write without notifying other caches
  • Bus owner Clean, no other cache copies
  • Can read without notifying other caches

18
State Diagram of Cache Block (Part 1)
19
State Diagram of Cache Block (Part 2)
20
Comparison with Single WB Cache
  • Similarities
  • Read hit invisible on bus
  • All misses visible on bus
  • Differences
  • In a single WB cache, all misses are serviced by
    MM; in the three-state protocol, misses are
    serviced either by MM or by the unique cache
    holding the only Dirty copy
  • In a single WB cache, a write hit is invisible on
    the bus; in the three-state protocol, a write hit
    on a Clean block invalidates all other Clean copies
    by a Bus Write Miss (a necessary action)
  • But the Bus Write Miss also causes a completely
    unnecessary block transfer from MM to the cache
    (the block is then overwritten by the CPU)

21
Correctness of Three-State Protocol
  • Problem: state transitions of the FSM are supposed
    to be atomic, but they are not in this protocol,
    because of the bus
  • Example: CPU read miss in Dirty state
  • 1. CPU access to cache detects a miss
  • 2. Request bus
  • 3. Acquire bus, and change state of cache block
  • 4. Evict dirty block to MM
  • 5. Put Bus Read Miss on bus
  • 6. Receive requested block from MM or another cache
  • 7. Release bus, and read from cache block just
    received
  • Bus arbitration may cause a gap between steps 2 and
    3
  • The whole sequence of operations is no longer
    atomic
  • App. I.1 argues that the protocol will work
    correctly if steps 3-7 are atomic, i.e., if the bus
    is not a split-transaction bus (see the sketch
    below)
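
The requirement can be pictured by bracketing steps 3-7 with bus
ownership; modeling the bus as a lock is purely illustrative (a real
arbiter is hardware, not a mutex), but it shows why a gap between steps
2 and 3 is harmless while one inside steps 3-7 is not:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t bus = PTHREAD_MUTEX_INITIALIZER;  /* models bus ownership */

    static void cpu_read_miss_dirty(void)
    {
        /* steps 1-2: detect miss, request bus (an arbitration gap is allowed here) */
        pthread_mutex_lock(&bus);     /* step 3: acquire bus, change block state    */
        puts("step 4: evict dirty block to MM");
        puts("step 5: put Bus Read Miss on bus");
        puts("step 6: receive block from MM or another cache");
        pthread_mutex_unlock(&bus);   /* step 7: release bus, then read the block   */
    }

    int main(void) { cpu_read_miss_dirty(); return 0; }
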

22
Adding More Bits to Protocols
  • Add a third bit, called Shared, to the Valid and
    Dirty bits
  • Get five states (M, O, E, S, I)
  • Developed in the context of Futurebus, with the
    intention of explaining all snoopy protocols, all
    of which use 3, 4, or 5 states

23
MESI Protocol
  • Four-state, write-invalidate
  • Improved version of three-state protocol
  • Clean state split into Exclusive and Shared
    states
  • Dirty state equivalent to Modified state
  • Several slightly different versions of the MESI
    protocol exist
  • We will describe the version implemented by
    Futurebus
  • The PowerPC 601 MESI protocol does not support
    cache-to-cache transfer of blocks
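
For comparison with the earlier three-state sketch, here is a hedged
sketch of one common MESI formulation; the state names come from the
slide, while the 'shared' signal and the simplified transitions are
assumptions, and cache-to-cache transfer (which distinguishes the
Futurebus and PowerPC 601 versions) is elided:

    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { I, S, E, M } mesi_t;
    typedef enum { CPU_READ, CPU_WRITE, SNOOP_READ, SNOOP_WRITE } ev_t;

    /* 'shared' models the shared line: asserted if another cache held a copy
       when our read miss appeared on the bus.                                */
    static mesi_t mesi_next(mesi_t s, ev_t e, bool shared)
    {
        switch (e) {
        case CPU_READ:    return s == I ? (shared ? S : E) : s;
        case CPU_WRITE:   return M;   /* E->M silent; S->M sends only an invalidate */
        case SNOOP_READ:  return (s == M || s == E) ? S : s;  /* downgrade          */
        case SNOOP_WRITE: return I;   /* another cache is writing: invalidate       */
        }
        return s;
    }

    int main(void)
    {
        static const char *n[] = { "I", "S", "E", "M" };
        mesi_t s = I;
        s = mesi_next(s, CPU_READ,   false); printf("read miss, no sharers -> %s\n", n[s]);
        s = mesi_next(s, CPU_WRITE,  false); printf("write hit             -> %s\n", n[s]);
        s = mesi_next(s, SNOOP_READ, false); printf("snooped read          -> %s\n", n[s]);
        return 0;
    }

The improvement over the three-state protocol shows up in CPU_WRITE:
from Exclusive the transition to Modified is invisible on the bus, and
from Shared it needs only an invalidation signal, not a block transfer.
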

24
State Diag. of MESI Cache Block (Part 1)
25
State Diag. of MESI Cache Block (Part 2)
26
Comparison with Three-State Protocol
  • Similarities
  • Read hit invisible on bus
  • All misses handled the same way
  • Differences
  • Big improvement in handling write hits
  • Write hit in Exclusive state invisible on bus
  • Write hit in Shared state involves no block
    transfer, only a control signal
  • Exclusive state
  • Can be read or written
  • Shared state
  • Can be read only
  • Modified state
  • Can be read and written

27
Comments on Write-Invalidate Protocols
  • Performance
  • Processor can lose cache block through
    invalidation by another processor
  • Average memory access time goes up, since writes
    to shared blocks take more time (other copies
    have to be invalidated)
  • Implementation
  • Bus and CPU want to simultaneously access same
    cache
  • Either same block or different blocks, but
    conflict nonetheless
  • Three possible solutions
  • Use a single tag array, and accept structural
    hazards
  • Use two separate tag arrays for bus and CPU,
    which must now be kept coherent at all times
  • Use a multiported tag array (both Intel Pentium
    and PowerPC 601 use this solution)