Caches for Parallel Architectures (Coherence) - PowerPoint PPT Presentation

Slides: 60
Provided by: csl57

1
Caches for Parallel Architectures (Coherence)
  • Figures and examples from:
  • Parallel Computer Architecture: A
    Hardware/Software Approach, D. E. Culler, J. P.
    Singh, Morgan Kaufmann Publishers, Inc., 1999.
  • Transactional Memory, D. Wood, Lecture Notes in
    ACACES 2009

2
Processor Design
  • Moore's Law (1965)
  • The number of transistors per IC doubles every 2
    years (or 18 months)
  • In practice, processor performance doubles every
    2 years.
  • More and more problems
  • Memory wall
  • 1980: memory latency ~ 1 instruction
  • 2006: memory latency ~ 1000 instructions
  • Power and cooling walls
  • Increasing design and verification complexity
    (design and test complexity)
  • Limited room for further exploitation of ILP
  • → Parallel architectures

3
Parallel Architectures (1)
  • Multiprocessors have grown considerably since the
    1990s
  • Servers
  • Supercomputers, to achieve higher performance than
    a single processor can deliver
  • Today (CMPs)
  • Lower design cost through design reuse
    (replication)
  • Exploiting Thread-Level Parallelism (TLP) to
    address the memory wall
  • Lower per-core power, more cores.
  • Efficient use of multiprocessors (especially in
    servers) wherever thread-level parallelism
    exists
  • Growing interest in server design and server
    performance

4
Parallel Architectures (2)
  • All of this leads to a new era in which
    multiprocessors play the leading role
  • Desktop machines for every user with 2, 4, 6, 8,
    ... cores
  • "We are dedicating all of our future product
    development to multicore designs. We believe this
    is a key inflection point for the industry."
  • Intel CEO Paul Otellini, 2005

5
Classification of Parallel Architectures
  • Single Instruction stream, Single Data stream
    (SISD)
  • Single Instruction stream, Multiple Data streams
    (SIMD)
  • Multiple processors, same instructions,
    different data (data-level parallelism).
  • Multiple Instruction streams, Single Data stream
    (MISD)
  • To date no such system has appeared on the
    market (it is mainly suited to fault
    tolerance, e.g. computers that control aircraft
    flight).
  • Multiple Instruction streams, Multiple Data
    streams (MIMD)
  • Each processor executes its own instructions
    and processes its own data. Multiple
    parallel threads (thread-level parallelism).
  • We will deal mainly with MIMD systems.
  • Thread-level parallelism
  • Flexibility: they operate either as single-user
    multiprocessors focused on the performance of one
    application, or as multiprogrammed
    multiprocessors running multiple tasks
    simultaneously.
  • Cost-performance advantages from using
    off-the-shelf processors.

7
MIMD Systems (1)
  • Examples of MIMD systems
  • Clusters (commodity/custom clusters)
  • Multicore systems
  • Each processor executes a different process.
  • Process: a segment of code that can be executed
    independently. In a multiprogramming
    environment, the processors execute different
    tasks, so each process is independent
    of the others.
  • When multiple processes share code and
    address space, they are called
    threads.
  • Today the term thread is used to
    describe multiple executions in general, which
    may run on different
    processors regardless of whether or not they share
    the address space.
  • Multithreaded architectures
    allow the simultaneous execution of multiple
    processes with different address spaces, as well as
    multiple threads that share the same address
    space.

8
MIMD Systems (2)
  • Efficient use of a MIMD system with n
    processors requires at least n
    threads/processes.
  • Created by the programmer or the compiler
  • Grain size: the amount of computation
    in each thread
  • Fine-grain: a few tens of instructions (e.g. some
    iterations of a loop, instruction-level
    parallelism)
  • Coarse-grain: millions of instructions (thread-level
    parallelism)
  • MIMD systems fall into 2 categories
    according to how their memory hierarchy is organized.
  • Centralized shared-memory architectures
  • Distributed memory architectures (physically
    distributed memory)

9
Centralized Shared-Memory Architectures
  • Small number of processors (fewer than 100 in
    2006).
  • All processors share one central
    memory
  • Multiple banks
  • point-to-point connections, switches
  • Limited scalability
  • Symmetric multiprocessors (SMPs)
  • Memory has a symmetric relationship to all
    processors
  • Uniform access time (Uniform Memory
    Access, UMA)

10
Distributed Memory Architectures (1)
  • Memory is distributed locally to each processor.
  • Advantages
  • Higher memory bandwidth if the majority of
    accesses are local to each node.
  • Lower access time for data stored in the memory of
    each node.
  • Disadvantages
  • Complex data exchange between processors.
  • Software that exploits the increased memory
    bandwidth is harder to write.
  • Two communication models for data exchange
  • Shared Address space
  • Message Passing

11
Distributed Memory Architectures (2)
  • Shared address space
  • The physically distributed memories
    are used as a
    single, shared
    data space.
  • The same physical address on 2
    processors refers to the
    same location in the same part
    of physical
    memory.
  • Communication through the shared
    space (implicitly, using plain
    Loads and Stores to shared
    variables).
  • These multiprocessors
    are called Distributed
    Shared-Memory (DSM).
  • Access time depends
    on the location where
    the data reside → NUMA (Non-Uniform Memory
    Access).

12
Distributed Memory Architectures (3)
  • Private address space
  • Each processor has its own address
    space, which cannot be accessed by
    any other processor.
  • The same physical address on 2 processors
    refers to different locations in
    different parts of memory.
  • Communication happens explicitly through messages →
    Message-Passing Multiprocessors.
  • Each send-receive pair performs a
    pairwise synchronization
    as well as a memory-to-memory
    copy of the data
  • e.g. clusters

13
Shared Memory Architectures (1)
14
Shared Memory Architectures (2)
  • Fundamental property of the memory system
  • Every read of a location must
    return the last value written to
    it.
  • This property is fundamental for sequential programs
    as well as for parallel ones.
  • The property holds when multiple threads
    run on one processor, since they see
    the same memory hierarchy.
  • In multiprocessor systems, however, each
    processor has its own cache
    memory.
  • Potential problems
  • Copies of a variable may
    exist in more than one cache.
  • If a write is not visible to all
    processors, some of them may
    read the old value of the variable that is
    stored in their cache.
  • The Cache Coherence
    problem

15
A coherence problem in uniprocessor
systems?
16
Direct Memory Access
  • Both DMA and the CPU access memory
  • Solutions
  • a) HW: cache invalidation for DMA writes or cache
    flush for DMA reads
  • b) SW: the OS must ensure that the cache lines are
    flushed before an outgoing DMA transfer is
    started and invalidated before a memory range
    affected by an incoming DMA transfer is accessed
  • c) Non-cacheable DMAs

17
Example of the Cache Coherence Problem
  • The processors see different values for
    variable u after operation 3
  • With write-back caches, the value written
    back to memory depends on which cache evicts or
    copies the data, and when
  • Unacceptable, but it happens often!

18
Example
  • Two simultaneous withdrawals of 100 from the same
    account at 2 different ATMs.
  • Each transaction runs on a different processor.
  • The address is held in register r3.

19
Example (no caches)
  • No caches → no problem!

20
Example (Incoherence)
  • Write-back caches
  • 3 possible copies: memory, p0, p1
  • The system may become incoherent.

21
Example (Incoherence)
  • Write-through caches
  • Now 2 different copies!
  • Still a problem! (e.g. suppose p0 performs
    another withdrawal)
  • Write-through caches do not solve the problem!
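
The lost update in the ATM example can be reproduced with a small simulation. The sketch below is illustrative only: the `WriteBackCache` class, the starting balance of 500, and the order of the evictions are assumptions, not taken from the slides. It models two private write-back caches with no coherence mechanism at all.

```python
class WriteBackCache:
    """A private write-back cache with NO coherence mechanism (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory      # shared backing store: dict addr -> value
        self.lines = {}           # addr -> [value, dirty]

    def load(self, addr):
        if addr not in self.lines:                     # miss: fetch from memory
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def store(self, addr, value):
        self.load(addr)                                # write-allocate
        self.lines[addr] = [value, True]               # memory is not updated yet

    def flush(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory[addr] = value                  # write back on eviction


def withdraw(cache, addr, amount):
    cache.store(addr, cache.load(addr) - amount)


memory = {"balance": 500}                              # assumed starting balance
p0, p1 = WriteBackCache(memory), WriteBackCache(memory)

withdraw(p0, "balance", 100)   # p0 holds 400 (dirty); memory still says 500
withdraw(p1, "balance", 100)   # p1 misses, reads the stale 500, holds 400
p0.flush("balance")
p1.flush("balance")
final_balance = memory["balance"]
print(final_balance)           # 400, although 200 was withdrawn in total
```

One withdrawal is lost because p1 never learns that p0 holds a newer copy of the block.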

22
Cache Coherence (1)
  • Preserving the fundamental property
  • Every read of a location must
    return the LAST value written to
    it.
  • How is LAST defined?
  • Sequential programs
  • It is defined by the order imposed by
    the code.
  • Parallel programs
  • Two threads may write to the same address
    at the same instant.
  • A thread may read a variable
    right after another thread has written it,
    but because of propagation speed that write
    has not yet become visible.
  • The order imposed by the code is defined within
    a thread.
  • We also need to define an order that
    covers all threads (global ordering).

23
Cache Coherence (2)
  • Assume there is one central memory and no
    cache.
  • Every operation on a memory location accesses
    the same physical location.
  • Memory imposes a global order on the
    operations of all threads to that location.
  • The operations of each thread keep their
    program order.
  • Any ordering that preserves the order of the
    operations of the individual programs is
    acceptable / valid.
  • "Last" is defined as the most recent operation
    in a hypothetical sequence that preserves the
    properties above.
  • In a real system this global order cannot be
    constructed.
  • Use of caches.
  • Avoiding serialization.
  • The system must be built so that
    programs behave as if this
    global order existed.

24
Cache Coherence - Definition
  • A system is coherent if, for every
    execution, the results (the values
    returned by the read operations)
    are such that for every location we can
    construct a hypothetical serial order
    of all operations to that location that is
    consistent with the results of the execution and in
    which:
  • The operations of each thread occur in
    the order in which they were issued by that
    thread.
  • The value returned by a read
    operation is the value of the last write
    to that location according to the hypothetical
    serial order.
  • 3 conditions for a system to be coherent.

25
Cache Coherence - Conditions
  • 1. A read by processor P to a location X that
    follows a write by P to X, with no writes of X by
    another processor occurring between the write and
    the read by P, always returns the value written
    by P.
  • Preserves program order.
  • Also holds for uniprocessors.
  • 2. A read by a processor to location X that
    follows a write by another processor to X returns
    the written value if the read and write are
    sufficiently separated in time and no other
    writes to X occur between the two accesses.
  • Write propagation
  • A read operation cannot keep returning
    older values.
  • 3. Writes to the same location are serialized;
    that is, two writes to the same location by any
    two processors are seen in the same order by all
    processors (e.g. if values 1 and then 2 are
    written to a location, processors can never see
    the value of the location as 2 and then later
    read it as 1).
  • Write serialization. Do we need read
    serialization?
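
The third condition can be checked mechanically against a set of observations. The following is a minimal sketch: the function name, the input format, and the assumption that every written value is distinct are all hypothetical choices made for illustration.

```python
def respects_write_serialization(write_order, observations):
    """Check that no processor observes the writes to one location out of order.

    write_order  -- values written to the location, in their serialized order
                    (assumed distinct so a value identifies its write)
    observations -- dict: processor name -> list of values it read, in program order
    """
    position = {value: i for i, value in enumerate(write_order)}
    for reads in observations.values():
        last_seen = -1
        for value in reads:
            if position[value] < last_seen:   # an older write seen after a newer one
                return False
            last_seen = position[value]
    return True


# Values 1 and then 2 are written, as in the slide's example.
coherent = respects_write_serialization([1, 2], {"P1": [1, 2], "P2": [1, 1, 2]})
incoherent = respects_write_serialization([1, 2], {"P1": [1, 2], "P2": [2, 1]})
print(coherent, incoherent)
```

The second call fails the check because P2 reads 2 and later reads 1, which is exactly the behaviour the condition forbids.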

26
Bus Snooping Cache Coherence (1)
  • Use of a bus
  • Offers a simple and elegant implementation of cache
    coherence.
  • Scalability problems.
  • All devices connected to the
    bus can observe all bus
    transactions.
  • Three phases in every transaction
  • Arbitration: the bus arbiter decides which device
    has the right to use the bus
  • Command/address transmission: the selected device
    broadcasts the kind of command (read / write)
    together with the address of the location.
    Everyone observes and decides whether they
    are interested or not.
  • Data transfer

27
Bus Snooping Cache Coherence (2)
  • Exploiting the cache block state
  • Along with the tag and data, every cache also stores
    the state of the block (e.g.
    invalid, valid, dirty).
  • In effect, a finite state machine (FSM)
    operates for every block
  • Every access to a block, or to an address
    that maps to the same cache line as that
    block, causes a change of state, that is,
    a state transition in the FSM.
  • In multiprocessor systems the state of a block
    is a vector of length p, where p is the number of
    caches.
  • The same FSM governs the state transitions
    of all blocks in all caches.
  • The state of a block may differ from cache
    to cache.

28
Hardware for Cache Coherence
  • Coherence Controller (CC)
  • Monitors the traffic on the bus (addresses
    and data)
  • Executes the coherence
    protocol.
  • Decides what to do with the local copy
    based on what it sees being broadcast on the bus.

29
Bus Snooping Cache Coherence (3)
  • Protocol implementation
  • The cache controller receives input from 2 sides
  • Memory access requests from the processor.
  • The bus snooper reports the bus
    transactions performed by the other
    caches.
  • In either case it responds
  • It updates the state of the block according to the
    FSM.
  • It sends data.
  • It generates new bus transactions.
  • Every protocol consists of the following
    building blocks
  • The set of permitted states for each block
    in the caches.
  • The state transition diagram, which takes as input the
    state of the block and either the processor request or
    the observed bus transaction and gives as
    output the next permitted state for that
    block.
  • The actions that must be performed
    when the block changes state.

30
Simple Invalidation-based protocol (1)
  • write-through, write-no-allocate caches
  • 2 states for each block
  • Valid
  • Invalid
  • On a write to a block
  • Main memory is updated through
    a bus transaction.
  • Every bus snooper notifies its cache
    controller, which invalidates the
    local copy if one exists.
  • Multiple simultaneous reads are allowed
    (multiple readers). A write, however,
    invalidates them.
  • Is it coherent?
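
The two-state write-through protocol above can be sketched as a small simulation. Class and method names, and the `BusWr` label for the write transaction, are assumptions made for this illustration; the slides do not name them.

```python
class Bus:
    """Atomic shared bus: one transaction at a time, snooped by every cache."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []                        # bus order of all transactions

    def read(self, requester, addr):
        self.log.append(("BusRd", requester.name, addr))
        return self.memory[addr]

    def write(self, requester, addr, value):
        self.log.append(("BusWr", requester.name, addr))
        self.memory[addr] = value            # write-through: memory always updated
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(addr)      # every other cache snoops the write


class VICache:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.valid = {}                      # addr -> value, for blocks in state V
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.valid:           # read miss: I -> V via a bus read
            self.valid[addr] = self.bus.read(self, addr)
        return self.valid[addr]

    def store(self, addr, value):
        if addr in self.valid:               # write-no-allocate: update only on a hit
            self.valid[addr] = value
        self.bus.write(self, addr, value)    # every write appears on the bus

    def snoop_write(self, addr):
        self.valid.pop(addr, None)           # V -> I if a local copy exists


bus = Bus({"u": 5})
p0, p1 = VICache("p0", bus), VICache("p1", bus)
p0.load("u")
p1.load("u")
p1.store("u", 7)              # invalidates p0's copy
value_seen_by_p0 = p0.load("u")
print(value_seen_by_p0)       # 7: p0 misses and rereads from memory
```

Because every write goes through the bus, `bus.log` records a single order in which all caches observe the writes.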

31
Simple Invalidation-based protocol (2)
  • Can we construct a global order that
    satisfies program order and write
    serialization?
  • We assume atomic bus transactions and memory
    operations.
  • One transaction at a time on the bus.
  • Each processor waits for a memory access to
    complete before issuing a new one.
  • Writes (and invalidations) complete
    during the bus transaction.
  • All writes appear on the bus
    (write-through protocol).
  • Writes to a location are serialized according
    to the order in which they appear on the bus
    (bus order).
  • Invalidations are also performed in
    bus order.
  • How do we insert the reads into this order?
  • Reads do not necessarily
    cause a bus transaction and can
    be performed independently and simultaneously in the
    caches.

32
Simple Invalidation-based protocol (3)
  • Serializing reads
  • Read hit or read miss?
  • Read Miss
  • Satisfied through a bus transaction. It is therefore
    serialized together with the writes.
  • It will see the value of the last write according
    to bus order.
  • Read Hit
  • Satisfied by the value held in
    the cache.
  • It sees the value of the most recent write by
    the same processor or of the most recent
    read (read miss).
  • Both (write and read miss) are satisfied
    through bus transactions.
  • Therefore read hits also see values
    in bus order.

33
VI protocol - Example (write-back caches)
  • The ld of p1 generates a BusRd
  • p0 responds by writing back the modified block (WB)
    and invalidating it in its cache (transition to
    state I)

34
MSI Write-Back Invalidation Protocol (1)
  • The VI protocol is not efficient
  • VI → MSI
  • The V state is split into 2 states
  • 3 states
  • Modified (M)
  • Shared (S)
  • Invalid (I)
  • 2 types of processor requests
  • PrRd (read) and PrWr (write)
  • 3 bus transactions
  • BusRd: requests a copy with no intent to
    modify it
  • BusRdX: requests a copy in order to modify it
  • BusWB: updates memory

35
MSI Write-Back Invalidation Protocol (2)
  • State transition diagram
  • Transitions caused by operations of the local
    processor.
  • Transitions caused by observed bus
    transactions.
  • A/B: if the cache controller observes A,
    then in addition to the transition to the new
    state it also generates B.
  • -- : no action.
  • The transitions and actions that occur when a
    block is replaced in the cache are not
    included.
  • The higher a block sits in the diagram,
    the more tightly bound
    it is to the processor.

36
MSI protocol - Example (write-back caches)
  • The ld of p1 generates a BusRd
  • p0 responds by writing back the modified block (WB)
    and changing its copy to S
  • The st of p1 generates a BusRdX
  • p0 responds by invalidating its copy
    (transition to I)
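
The MSI state machine and the example above can be sketched as a transition table. This is a simplified illustration: the table layout and the `access` helper are assumptions, replacements are ignored (as in the slides' diagram), and only a single block is tracked.

```python
# (current state, event) -> (next state, bus action generated or None)
MSI = {
    ("I", "PrRd"): ("S", "BusRd"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("S", "PrRd"): ("S", None),
    ("S", "PrWr"): ("M", "BusRdX"),
    ("M", "PrRd"): ("M", None),
    ("M", "PrWr"): ("M", None),
    # Observed (snooped) bus transactions
    ("I", "BusRd"): ("I", None),
    ("I", "BusRdX"): ("I", None),
    ("S", "BusRd"): ("S", None),
    ("S", "BusRdX"): ("I", None),
    ("M", "BusRd"): ("S", "BusWB"),    # write the block back, keep a shared copy
    ("M", "BusRdX"): ("I", "BusWB"),   # write the block back, invalidate
}


def access(states, who, op):
    """Apply a processor request; the other caches snoop any bus transaction it causes."""
    states[who], bus = MSI[(states[who], op)]
    actions = [bus] if bus else []
    if bus:
        for other in states:
            if other != who:
                states[other], reply = MSI[(states[other], bus)]
                if reply:
                    actions.append(reply)
    return actions


states = {"p0": "I", "p1": "I"}
step1 = access(states, "p0", "PrWr")   # p0 writes first and ends up in M
step2 = access(states, "p1", "PrRd")   # the ld of p1
after_ld = dict(states)
step3 = access(states, "p1", "PrWr")   # the st of p1
print(step2, after_ld, step3, states)
```

The ld of p1 produces a BusRd followed by p0's write-back, and the st of p1 produces a BusRdX that invalidates p0's copy, matching the bullets above.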

37
MSI Coherence
  • Write propagation is evident.
  • Write serialization
  • All writes that appear on the bus
    (BusRdX) are ordered by it.
  • Reads that appear on the bus
    are ordered with respect to the writes.
  • For writes that do not appear on the
    bus
  • A sequence of such writes between 2 bus
    transactions for the same block must
    come from the same processor P.
  • In the serialization, the sequence appears between
    those 2 transactions.
  • Reads by P will see the writes
    in that order with respect to the other writes.
  • Reads by other processors
    are separated from the sequence by a bus
    transaction, which thereby places them in order
    with respect to the writes.
  • Reads by all processors see
    the writes in the same order.

38
MESI Write-Back Invalidation Protocol (1)
  • The problem with MSI
  • 2 transactions to read and then modify a
    block, even when nobody shares it.
  • 4 states
  • Modified (M)
  • Exclusive (E) → only this cache
    has a copy (unmodified).
  • Shared (S) → two or more
    caches have a copy.
  • Invalid (I)
  • If nobody has a copy of the block, then
    a PrRd results in the transition I → E.
  • The bus needs a "shared" signal as the
    response to a BusRd.
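
The saving that the E state buys can be counted directly. The sketch below is a deliberately small model with an assumed function name and parameters: it tracks one cache that reads and then writes a block, and returns the bus transactions it has to issue.

```python
def read_then_write(protocol, others_have_copy=False):
    """Bus transactions issued by one cache that reads, then writes, an uncached block.

    protocol          -- "MSI" or "MESI"
    others_have_copy  -- models the shared signal raised in response to the BusRd
    Returns (final state, list of bus transactions).
    """
    bus = ["BusRd"]                          # the read miss always needs a BusRd
    if protocol == "MESI" and not others_have_copy:
        state = "E"                          # shared signal not asserted: I -> E
    else:
        state = "S"                          # MSI has no E state: I -> S
    if state == "S":
        bus.append("BusRdX")                 # S -> M must announce the write
    state = "M"                              # E -> M needs no bus transaction
    return state, bus


msi_private = read_then_write("MSI")
mesi_private = read_then_write("MESI")
mesi_shared = read_then_write("MESI", others_have_copy=True)
print(msi_private, mesi_private, mesi_shared)
```

For a block that nobody else holds, MSI needs two transactions and MESI only one; when another cache does hold a copy, MESI behaves like MSI.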

39
MESI Write-Back Invalidation Protocol (2)
  • State transition diagram
  • Transitions caused by operations of the local
    processor.
  • Transitions caused by observed bus
    transactions.
  • A/B: if the cache controller observes A,
    then in addition to the transition to the new
    state it also generates B.
  • -- : no action.
  • A block can be in state S even though
    no other copies exist.
  • How?

40
Recap - Coherence Snooping Protocols
  • We keep the processor, the main memory and
    the caches.
  • We extend the cache controller and exploit the
    bus.
  • Write-back caches
  • Efficient use of the limited bus
    bandwidth.
  • Not every memory operation causes a bus
    transaction.
  • Coherence is harder to implement.
  • Use of the modified
    state
  • Exclusive ownership → no other valid
    copy exists.
  • Main memory may or may not hold
    a copy.
  • The cache is responsible for supplying the block to
    whoever requests it.
  • Exclusivity
  • The cache can modify the block without
    notifying anyone → no bus transaction
  • Before a write, it must obtain
    exclusivity.
  • Even if the block is valid → write miss

41
Invalidation Protocols
  • Write-miss
  • It causes a special transaction: read-exclusive
    (RdX)
  • It notifies the others that a write follows
    and obtains exclusive ownership.
  • Everyone holding a copy of the block
    discards it.
  • Only one RdX succeeds at a time. Multiple
    requests are serialized by the bus.
  • Eventually the new data are written to main memory
    when the block is evicted from the cache.
  • If a block has not been modified (it is not in the
    modified state), it does not need to be written to main
    memory when it is evicted from the cache.

42
Update Protocols
  • A write operation also updates any
    copies of the block in the other caches.
  • Advantages
  • Lower latency when the other caches access the
    block.
  • Everyone is updated with a single transaction.
  • Disadvantages
  • Multiple writes to the block by the same
    processor cause multiple update
    transactions.

43
Dragon Write-Back Update Protocol (1)
  • 4 states
  • Exclusive (E) → only this cache
    has a copy (unmodified). Main memory
    is up-to-date.
  • Shared-clean (Sc) → two or
    more caches have a copy. Main
    memory is not necessarily up-to-date.
  • Shared-modified (Sm)
    → two or more caches have a copy,
    main memory is not up-to-date, and this cache
    is responsible for updating main memory
    when it evicts the block.
  • Modified (M) → only this cache
    holds the modified block, and main memory
    is not up-to-date.
  • There is no Invalid state.
  • The protocol always keeps the blocks that
    reside in the caches up-to-date.
  • Two new processor requests: PrRdMiss,
    PrWrMiss
  • One new bus transaction: BusUpd

44
Dragon Write-Back Update Protocol (2)
45
Dragon Example
Processor action | State P1 | State P2 | State P3 | Bus action | Data supplied by
P1 reads u       | E        | ---      | ---      | BusRd      | Mem
P3 reads u       | Sc       | ---      | Sc       | BusRd      | Mem
P3 writes u      | Sc       | ---      | Sm       | BusUpd     | P3 Cache
P1 reads u       | Sc       | ---      | Sm       | ---        | ---
P2 reads u       | Sc       | Sc       | Sm       | BusRd      | P3 Cache
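
The rows of the Dragon example can be reproduced with a simplified model of the protocol for a single block. The function below is a sketch under stated assumptions: its name and interface are hypothetical, evictions are not modelled, and `None` stands for "block not cached".

```python
def dragon_step(states, who, op):
    """Advance a simplified Dragon protocol for one block.

    states -- dict: cache name -> "E", "Sc", "Sm", "M", or None (not cached)
    op     -- "read" or "write"
    Returns (bus actions, data supplier).
    """
    bus, supplier = [], None
    others = [c for c in states if c != who and states[c] is not None]

    if states[who] is None:                              # PrRdMiss / PrWrMiss
        bus.append("BusRd")
        owner = next((c for c in others if states[c] in ("M", "Sm")), None)
        supplier = f"{owner} Cache" if owner else "Mem"
        for c in others:                                 # snoopers lose exclusivity
            states[c] = {"E": "Sc", "M": "Sm"}.get(states[c], states[c])
        states[who] = "Sc" if others else "E"

    if op == "write":
        if states[who] in ("Sc", "Sm"):
            bus.append("BusUpd")                         # broadcast the new value
            supplier = f"{who} Cache"
            for c in others:
                states[c] = "Sc"                         # the writer becomes the owner
            states[who] = "Sm" if others else "M"
        else:                                            # E or M: silent upgrade
            states[who] = "M"
    return bus, supplier


states = {"P1": None, "P2": None, "P3": None}
trace = []
for who, op in [("P1", "read"), ("P3", "read"), ("P3", "write"),
                ("P1", "read"), ("P2", "read")]:
    bus, supplier = dragon_step(states, who, op)
    trace.append((states["P1"], states["P2"], states["P3"], bus, supplier))
for row in trace:
    print(row)
```

Each tuple in `trace` corresponds to one row of the table: the three cache states, the bus action, and who supplied the data.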
46
Invalidation vs. Update Protocols
  • A cache writes to a block.
    Before the next write to the same block, does
    anyone else want to read it?
  • Yes
  • Invalidation
  • Read miss → possibly multiple transactions
  • Update
  • Read hit if they already held copies → updated
    with a single transaction
  • No
  • Invalidation
  • Multiple writes without additional bus
    traffic
  • Copies that are not being used are cleaned up
  • Update
  • Multiple unnecessary updates (possibly even to
    dead copies)

47
Protocol Design Tradeoffs (1)
  • Designing multiprocessor systems is
    complex
  • Number of processors
  • Memory hierarchy (levels, size, associativity, block
    size, ...)
  • Bus
  • Memory System (interleaved banks, width of banks,
    ...)
  • I/O subsystem
  • Cache Coherence Protocol (Protocol class, states,
    actions, ...)
  • The protocol affects system characteristics
    such as latency and bandwidth.
  • The choice of protocol is influenced by the
    required performance and behaviour of the system
    as well as by the organization of the memory hierarchy
    and of the communication.

48
Protocol Design Tradeoffs (2)
  • Write-Update vs. Write-Invalidate
  • Write-run: a series of writes by one processor
    to a memory block, whose start and end
    are delimited by operations on that block
    by other processors.
  • W2, R1, W1, W1, R1, W1, R3
  • Write-run length = 3
  • Write-Invalidate: a write-run of any
    length generates a single coherence
    miss.
  • Write-Update: a write-run of length L
    causes L updates.
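
Write-runs can be extracted from a trace such as the one above with a short routine. This is a sketch: the function name and the string encoding of accesses are assumptions, and runs at the very start or end of the trace are reported even though only one of their boundaries is visible.

```python
def write_runs(trace):
    """Split a per-block access trace into write-runs.

    trace -- list such as ["W2", "R1", "W1"]: operation letter, then processor id
    Returns (processor, run length) pairs in trace order.
    """
    runs, writer, length = [], None, 0
    for access in trace:
        op, proc = access[0], access[1:]
        if proc != writer:                 # another processor touches the block
            if length:
                runs.append((writer, length))
            writer, length = None, 0
        if op == "W":
            writer, length = proc, length + 1
    if length:                             # run still open at the end of the trace
        runs.append((writer, length))
    return runs


runs = write_runs(["W2", "R1", "W1", "W1", "R1", "W1", "R3"])
update_transactions = sum(length for _, length in runs)   # write-update: L per run
print(runs, update_transactions)
```

Processor 1's three writes form the run of length 3 from the slide; its own read in the middle does not end the run, while the read by processor 3 does.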

49
4C Cache Misses Model
  • Compulsory misses (cold)
  • First access to a block.
  • Remedy: increase the block size.
  • Capacity misses
  • The block does not fit in the cache (even in a fully
    associative cache).
  • Remedy: increase the cache size.
  • Conflict misses
  • The block does not fit in the set it maps to.
  • Remedy: increase the associativity.
  • Coherence misses (communication)
  • True sharing: a data word is used
    by 2 or more processors.
  • False sharing: independent data words
    used by different processors
    belong to the same cache block.
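
The distinction between true and false sharing comes down to comparing block numbers rather than addresses. A minimal sketch, assuming byte addresses of two words accessed by different processors and a hypothetical helper name:

```python
def classify_sharing(addr_a, addr_b, block_size):
    """Classify two words accessed by different processors.

    addr_a, addr_b -- byte addresses of the two data words
    block_size     -- cache block size in bytes
    """
    if addr_a == addr_b:
        return "true sharing"              # the same data word
    if addr_a // block_size == addr_b // block_size:
        return "false sharing"             # different words, same cache block
    return "no sharing"


same_word = classify_sharing(0x100, 0x100, 64)
same_block = classify_sharing(0x100, 0x108, 64)
different_block = classify_sharing(0x100, 0x140, 64)
print(same_word, same_block, different_block)
```

With 64-byte blocks, addresses 0x100 and 0x108 fall in the same block, so writes to one invalidate the cached copy of the other even though the data are unrelated.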

51
Protocol Design Tradeoffs (3)
  • Cache Block Size
  • Increasing the block size can lead to
  • A lower miss rate (good spatial locality).
  • A higher miss penalty and possibly a higher hit cost.
  • A higher miss rate because of false sharing (poor
    spatial locality).
  • More bus traffic, because unnecessary data are
    transferred (mismatch between fetch and access
    size, false sharing).
  • There is a trend towards using larger
    cache blocks.
  • The cost of the bus transaction and of the
    memory access is amortized by transferring more
    data.
  • Hardware and software mechanisms for dealing with
    false sharing.

52
False sharing reduction
  • Improved data layout, so that independent
    data are not placed in the same
    block.
  • Data Padding
  • e.g. dummy variables between lock variables that
    are placed close to one another.
  • Tradeoff: locality vs. false sharing
  • Use an array of arrays to make sure that
    each submatrix is placed contiguously in
    memory.
  • Tradeoff: false sharing vs. instruction overhead
  • Partial-Block Invalidation
  • The block is split into sub-blocks, and state is
    kept for each of them.
  • On every miss we fetch all invalid sub-blocks.
  • We invalidate only the sub-block that contains
    the data to be modified.
  • Tradeoff: fewer false sharing misses vs. more
    invalidation messages
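
Data padding amounts to choosing a stride that is a multiple of the block size. The helper below is a sketch with an assumed name; it also assumes that the first item starts on a block boundary.

```python
def padded_offsets(count, item_size, block_size):
    """Byte offsets that place each of `count` items in its own cache block.

    Assumes the base address is aligned to a block boundary.
    """
    # Round item_size up to the next multiple of block_size.
    stride = -(-item_size // block_size) * block_size
    return [i * stride for i in range(count)]


lock_offsets = padded_offsets(3, item_size=4, block_size=64)
blocks_used = {offset // 64 for offset in lock_offsets}
wasted_bytes_per_lock = 64 - 4
print(lock_offsets, blocks_used, wasted_bytes_per_lock)
```

Three 4-byte lock variables end up in three different blocks, at the cost of 60 bytes of padding each, which is the locality vs. false sharing tradeoff noted above.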

53
Scalable Multiprocessor Systems
  • Systems that rely on a bus
    are not scalable.
  • All modules (cores, memories, etc.) are connected
    by one set of wires.
  • Limited bandwidth → it does not grow as
    more processors are added → Saturation.
  • Longer bus → higher latency.
  • A scalable system has to deal with these
    problems.
  • The total bandwidth should grow with the
    number of processors.
  • The time required for an action should
    not grow much (e.g. exponentially) with the
    size of the system.
  • It must be cost-effective.
  • We lose fundamental properties of the bus.
  • Unknown number of simultaneous transactions.
  • There is no global arbitration.
  • The effects (e.g. changes of state) become
    directly visible only to the nodes that
    take part in the transaction.

54
Scalable Cache Coherence
  • Interconnect
  • Replace the bus with scalable
    interconnects (point-to-point networks, e.g. mesh)
  • Processor snooping bandwidth
  • Until now the protocols used broadcast (spam
    everyone!)
  • A large fraction of snoops cause no
    transition
  • For loosely shared data, most likely
    only one processor has a copy
  • → Scalable Directory protocol
  • Notify only the processors that are
    interested in a given block (spam only
    those that care!)

55
Directory-Based Cache Coherence (1)
  • The cache block state can no longer be determined
    by observing the requests on the shared bus
    (implicit determination).
  • It is determined and maintained in one place
    (the directory), where requests can be
    directed to look it up (explicit
    determination).
  • Every memory block has a directory entry
  • Book-keeping (which nodes have copies, the
    state of the memory copy, ...)
  • All requests for the block go to the
    directory.
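
The book-keeping held in a directory entry can be sketched as a bit vector of sharers plus a flag for the memory copy. The full bit-vector organization, the class, and the helper names are assumptions chosen for illustration; the slides only state what information the entry must hold.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    sharers: int = 0        # bit i set => node i holds a copy of the block
    dirty: bool = False     # True => the memory copy is stale

    def add_sharer(self, node):
        self.sharers |= 1 << node

    def holders(self, num_nodes):
        return [n for n in range(num_nodes) if (self.sharers >> n) & 1]


def handle_write(entry, writer, num_nodes):
    """Return the nodes to invalidate, then record `writer` as the sole owner."""
    targets = [n for n in entry.holders(num_nodes) if n != writer]
    entry.sharers = 1 << writer
    entry.dirty = True
    return targets


entry = DirectoryEntry()
entry.add_sharer(0)
entry.add_sharer(2)
invalidated = handle_write(entry, writer=2, num_nodes=4)
print(invalidated, entry.holders(4), entry.dirty)
```

Only node 0 receives an invalidation; nodes 1 and 3, which never cached the block, see no traffic at all.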

56
Directory-Based Cache Coherence (2)
57
Directory-Based Cache Coherence (3)
58
Directory Protocol Taxonomy
59
Directory-Based Cache Coherence (4)
  • Directory Protocols
  • + Lower bandwidth consumption
  • - Higher latency
  • Two read miss cases
  • Unshared block → get data from memory
  • Bus: 2 hops (P0 → memory → P0)
  • Directory: 2 hops (P0 → memory → P0)
  • S/E block → get data from processor (P1)
  • Bus: 2 hops (P0 → P1 → P0) (assuming that
    cache-to-cache data transfer is allowed)
  • Directory: 3 hops (P0 → memory → P1 → P0)
  • The second case occurs quite often in
    multiprocessor systems
  • There is a high probability that one
    processor holds the block