A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories

1
A Low-Overhead Coherence Solution for
Multiprocessors with Private Cache Memories
  • Also known as Snoopy cache
  • Paper by Mark S. Papamarcos and Janak H. Patel
  • Presented by Cameron Mott 3/25/2005

2
Outline
  • Goals
  • Outline
  • Examples
  • Solutions
  • Details on this method
  • Results
  • Analysis
  • Success
  • Comments/Questions

3
Goals
  • Reduce bus traffic
  • Reduce bus wait
  • Increase possible number of processors before
    saturation of bus
  • Increase processor utilization
  • Low cost
  • Extensible
  • Long useful life for the strategy

4
Structure
  • The typical layout for a multi-processor machine

5
Difficulties
  • Bus speed and saturation limit processor
    utilization (there is a single time-shared bus
    with an arbitration mechanism).
  • This scheme suffers from the well-known data
    consistency or cache coherence problem, which
    arises when two processors hold the same writable
    data block in their private caches.

6
Coherence example
  • Process communication in shared-memory
    multiprocessors can be implemented by exchanging
    information through shared variables
  • This sharing can result in several copies of a
    shared block in one or more caches at the same
    time.

7
Enforcing Coherence Styles
  • Hardware based
  • Use a global table that keeps track of which
    memory blocks are held and where.
  • Snoopy cache
  • No need for centralized hardware
  • All processors share the same cache bus
  • Each cache snoops or listens to cache
    transactions from other processors
  • Used in CSM machines using a bus

8
  • Snoopy caches
  • To solve coherence, each processor can send out
    the address of the block that is being written in
    its cache. Every other cache that contains that
    entry then invalidates its local copy (called
    broadcast invalidate).
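
A minimal Python sketch of broadcast invalidate, assuming a toy model in which each cache is a dictionary from block address to data. The names `Cache`, `Bus`, and `write` are illustrative only and do not come from the paper.

```python
class Cache:
    """Toy private cache: maps block address -> data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_invalidate(self, addr):
        # Purge the block if this cache holds a copy.
        self.lines.pop(addr, None)


class Bus:
    """Shared bus that every cache listens to."""

    def __init__(self):
        self.caches = []
        self.broadcasts = 0

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, source):
        # Every cache except the writer checks for the address.
        self.broadcasts += 1
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(addr)


def write(bus, cache, addr, value):
    # Each write puts the block address on the bus, then updates locally.
    bus.broadcast_invalidate(addr, source=cache)
    cache.lines[addr] = value


bus = Bus()
a, b = Cache("A"), Cache("B")
bus.attach(a)
bus.attach(b)
a.lines[0x40] = 1
b.lines[0x40] = 1
write(bus, a, 0x40, 2)  # B's copy of block 0x40 is purged
```

Note that every write costs one bus transaction in this model, which is the overhead the later slides try to reduce.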

9
Other Snoopy Methods
  • Broadcast-Invalidate
  • Any write to a cache broadcasts the block address
    throughout the system. Other caches check their
    directories and purge the block if it exists
    locally. This does not require extra status
    bits, but it consumes a lot of bus time.
  • Improvement on the above
  • Introduce a bias filter. The bias filter is a
    small associative memory that stores the most
    recently invalidated blocks, so repeated requests
    to invalidate the same block are filtered out.
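
A rough Python sketch of how such a filter might behave. It assumes the filter remembers recently invalidated addresses and evicts the oldest entry when full; the capacity, the eviction policy, and the names `BiasFilter`, `should_forward`, and `block_loaded` are assumptions for illustration, not details from the paper.

```python
from collections import OrderedDict


class BiasFilter:
    """Small associative memory of recently invalidated block addresses."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed size, not taken from the paper
        self.recent = OrderedDict()

    def should_forward(self, addr):
        """Return True if the invalidate must reach the cache directory."""
        if addr in self.recent:
            self.recent.move_to_end(addr)  # refresh recency
            return False  # already invalidated: filter the request out
        self.recent[addr] = True
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evict the oldest entry
        return True

    def block_loaded(self, addr):
        # Once the cache reloads the block, later invalidates matter again.
        self.recent.pop(addr, None)


f = BiasFilter(capacity=2)
results = [f.should_forward(a) for a in (0x10, 0x10, 0x20, 0x10)]
```

Here only the first invalidate for each address reaches the cache directory; the repeats are absorbed by the filter.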

10
Goodman's Strategy
  • Goodman proposes his strategy for multiple
    processor systems with independent caches but a
    shared bus.
  • An invalidate is broadcast only when a block is
    written in cache for the first time (thus "write
    once"). This first write is also written through
    to main memory. If a block in cache is written
    more than once, the later writes stay in the
    cache, so the block must be written back to
    memory before it is replaced.

11
Write-Once
  • Combination of write-through and write-back.
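
A small Python sketch of the write-hit behavior under write-once. The four state names (Invalid, Valid, Reserved, Dirty) follow the usual description of Goodman's protocol and do not appear on these slides; the function names are hypothetical.

```python
from enum import Enum


class WOState(Enum):
    INVALID = "invalid"
    VALID = "valid"        # clean copy, possibly shared
    RESERVED = "reserved"  # written exactly once, memory is up to date
    DIRTY = "dirty"        # written more than once, memory is stale


def local_write(state):
    """Return (next_state, bus_action) for a processor write hit."""
    if state is WOState.VALID:
        # First write: write through to memory and invalidate other copies.
        return WOState.RESERVED, "write-through + invalidate"
    if state in (WOState.RESERVED, WOState.DIRTY):
        # Later writes stay in the cache (write-back behavior).
        return WOState.DIRTY, None
    raise ValueError("write miss: the block must be fetched first")


def needs_write_back(state):
    # Only a Dirty block differs from memory when it is replaced.
    return state is WOState.DIRTY


s1, action1 = local_write(WOState.VALID)
s2, action2 = local_write(s1)
```

The first write behaves like write-through and the later ones like write-back, which is the combination named on the slide.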

12
Example
  • Online example
  • http://www.cs.tcd.ie/Jeremy.Jones/vivio/caches/writeOnceHelp.htm
  • Note that the only browser that displayed this on
    my computer was IE

13
Details
  • Two bits in each block in the cache keep track of
    the status of that block.
  • Invalid: The data in this line is not present or
    is not valid.
  • Exclusive-Unmodified (Excl-Unmod): This is an
    exclusive cache line. The line is coherent with
    memory and is held unmodified only in one cache.
    The cache owns the line and can modify it without
    having to notify the rest of the system. No other
    caches in the system may have a copy of this
    line.
  • Shared-Unmodified (Shared-Unmod): This is a
    shared cache line. The line is coherent with
    memory and may be present in several caches.
    Caches must notify the rest of the system about
    any changes to this line. The main memory owns
    this cache line.
  • Exclusive-Modified (Excl-Mod): There is modified
    data in this cache line. The line is incoherent
    with memory, so the cache is said to own the
    line. No other caches in the system may have a
    copy of this line.
  • Other papers discuss MESI caches. How does this
    fit with Papamarcos and Patel's work?
  • M: Exclusive-Modified
  • E: Exclusive-Unmodified
  • S: Shared-Unmodified
  • I: Invalid
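
A short Python sketch of the four states as a two-bit field. The particular bit values and the names `LineState`, `coherent_with_memory`, and `may_write_without_bus_signal` are assumptions for illustration; the slides only say that two bits are used.

```python
from enum import IntEnum


class LineState(IntEnum):
    """Two status bits per cache block (bit values are an assumption)."""
    INVALID = 0b00
    SHARED_UNMOD = 0b01
    EXCL_UNMOD = 0b10
    EXCL_MOD = 0b11


# Correspondence with the MESI letters listed on the slide.
MESI = {
    LineState.EXCL_MOD: "M",
    LineState.EXCL_UNMOD: "E",
    LineState.SHARED_UNMOD: "S",
    LineState.INVALID: "I",
}


def coherent_with_memory(state):
    # Only the unmodified valid states match the copy in main memory.
    return state in (LineState.SHARED_UNMOD, LineState.EXCL_UNMOD)


def may_write_without_bus_signal(state):
    # Exclusive lines have no other copies that need to be invalidated.
    return state in (LineState.EXCL_UNMOD, LineState.EXCL_MOD)
```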

14
Details (cont.)
  • Snoopy cache actions
  • Read With Intent to Modify: This is the write
    cycle. If the address on the bus matches a
    Shared or Exclusive line, the line is
    invalidated. If a line is Modified, the cache
    must cause the bus action to abort, write the
    modified line back to memory, invalidate the
    line, and then allow the bus read to retry.
    Alternatively, the owning cache can supply the
    line directly to the requestor across the bus.
  • Read: If the address on the bus matches a Shared
    line there is no change. If the line is
    Exclusive, the state changes to Shared. If a line
    is Modified, the cache must cause the bus action
    to abort, write the modified line back to memory,
    change the line to Shared, and then allow the bus
    read to retry. Alternatively, the owning cache
    can supply the line to the requestor directly and
    change its state to Shared.
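
The snoop actions above can be sketched as a lookup table in Python. This is one reading of the slide, not the authors' implementation; the string labels and the function name `snoop` are illustrative.

```python
# (current state, observed bus operation) -> (next state, extra action)
SNOOP_TABLE = {
    ("Shared-Unmod", "Read"): ("Shared-Unmod", None),
    ("Excl-Unmod", "Read"): ("Shared-Unmod", None),
    ("Excl-Mod", "Read"): ("Shared-Unmod", "write back or supply line"),
    ("Shared-Unmod", "RWITM"): ("Invalid", None),
    ("Excl-Unmod", "RWITM"): ("Invalid", None),
    ("Excl-Mod", "RWITM"): ("Invalid", "write back or supply line"),
}


def snoop(state, bus_op):
    """Return (next_state, action) when a matching address is seen on the bus.

    bus_op is "Read" or "RWITM" (Read With Intent to Modify).
    """
    if bus_op not in ("Read", "RWITM"):
        raise ValueError("unknown bus operation: " + bus_op)
    if state == "Invalid":
        # No valid copy here, so the bus operation is ignored.
        return "Invalid", None
    return SNOOP_TABLE[(state, bus_op)]
```

For example, `snoop("Excl-Mod", "Read")` returns `("Shared-Unmod", "write back or supply line")`: the owning cache must make its modified data visible before the read can complete.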

15
Flow diagrams

16
  • On a read miss, another cache that holds the
    block can provide the requested data.
  • This changes the status bits to Shared-Unmod.
  • The block is also written back to memory if
    another cache had an Excl-Mod entry for that
    block. The status of that block is then changed
    to Shared-Unmod after being written and shared
    with the other processor.
  • Writes cause any other cache to set the
    corresponding entry to Invalid.
  • If memory provided the block, the status becomes
    Excl-Unmod.
  • On a write, no bus signal is necessary if the
    status is not Shared-Unmod.
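
A minimal Python sketch of the requesting cache's side of these rules. It is an interpretation of the bullets above; the function names and the signal labels "RWITM" and "invalidate" are illustrative.

```python
def read_miss(supplied_by_other_cache):
    """State of the newly loaded block after a read miss."""
    # A block supplied by another cache is shared; one from memory is exclusive.
    return "Shared-Unmod" if supplied_by_other_cache else "Excl-Unmod"


def write(state):
    """Return (next_state, bus_signal) for a processor write."""
    if state == "Invalid":
        # Write miss: fetch the block with a read-with-intent-to-modify.
        return "Excl-Mod", "RWITM"
    if state == "Shared-Unmod":
        # Other copies may exist, so they must be told to invalidate.
        return "Excl-Mod", "invalidate"
    if state in ("Excl-Unmod", "Excl-Mod"):
        # Exclusive copy: no bus signal is needed.
        return "Excl-Mod", None
    raise ValueError("unknown state: " + state)
```

The saving over broadcast invalidate comes from the last case: writes to exclusive blocks generate no bus traffic at all.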

17
Problems
  • What if a block is Shared-Unmodified and two
    caches attempt to change the block at the same
    time?
  • Depending on the implementation, the bus provides
    the synchronization mechanism. Only one processor
    can have control of the bus at any one time, so
    bus arbitration determines which processor wins.
  • This requires that the operation be indivisible.
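
A toy Python sketch of how bus arbitration could serialize two simultaneous writes to one Shared-Unmod block. Fixed-priority arbitration and the name `resolve_simultaneous_writes` are assumptions; the slides leave the arbitration scheme to the implementation.

```python
def resolve_simultaneous_writes(requesters, priority):
    """Serialize write requests to one block that starts out Shared-Unmod.

    requesters: cache names that want to write at the same time.
    priority:   arbitration order, highest priority first (assumed fixed).
    Returns (final_states, log of (cache, bus_signal) in grant order).
    """
    states = {cache: "Shared-Unmod" for cache in requesters}
    log = []
    # Only one cache owns the bus at a time; grant in priority order.
    for cache in sorted(requesters, key=priority.index):
        if states[cache] == "Shared-Unmod":
            signal = "invalidate"
        else:
            # The copy was invalidated by an earlier winner, so this is now
            # a write miss and the block must be fetched again.
            signal = "RWITM"
        for other in states:
            if other != cache:
                states[other] = "Invalid"
        states[cache] = "Excl-Mod"
        log.append((cache, signal))
    return states, log


states, log = resolve_simultaneous_writes(["P1", "P0"], ["P0", "P1"])
```

In this run P0 wins the bus and invalidates P1's copy; P1 then retries as a write miss, so the two writes are applied one after the other rather than to stale data.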

18
Results
  • Results were analyzed using an approximation
    algorithm.
  • Is this appropriate? Can an approximation be
    used to justify the algorithm?
  • Accuracy of the approximation: error rate of less
    than 5% in certain circumstances

19
Parameters

20
Miss Ratio

21
Miss Ratio (Cont)

22
Degree of Sharing

23
Write Back Probability

24
Block Transfer Time

25
Cost of implementing

26
Note
  • This algorithm and structure have a finite limit
    on the number of supported processors.
    Diminishing returns in performance are noted as
    the number of processors increases. Thus, as a
    rough estimate, this strategy should not be used
    in systems of 30 processors or more. The exact
    limit depends on the system parameters.
  • For a system with a modest number of processors,
    this strategy is very effective, and it is in
    use today.

27
References
  • A Low-Overhead Coherence Solution for
    Multiprocessors with Private Cache Memories, Mark
    S. Papamarcos and Janak H. Patel
  • Cache Coherence, Srini Devadas
    http://csg.csail.mit.edu/u/d/devadas/public_html/6.004/Lectures/lect23/sld001.htm
  • Dynamic Decentralized Cache Schemes for MIMD
    Parallel Processors, Tu Phan
    http://www.cs.nmsu.edu/pfeiffer/classes/573/sem/s03/presentations/Dynamic%20Decentralized%20Cache%20Schemes.ppt
  • HP 3rd. Edition, Mark Smotherman
    http://www.cs.clemson.edu/mark/464/hp3e6.html
  • Vivio Write Once cache coherency protocol, Jeremy
    Jones
    http://www.cs.tcd.ie/Jeremy.Jones/vivio/caches/writeOnceHelp.htm