Improving the performance of a Multi-core architecture with different Cache coherence protocols and its related work

Provided by: rupeshc
Transcript and Presenter's Notes

1
  • Improving the performance of a Multi-core
    architecture with different Cache coherence
    protocols and its related work

2
What is Multi-Core Architecture?
A multi-core architecture is a method of
embedding a number of cores on a single chip. A
multi-core architecture improves the performance
of a system by computing a number of tasks at the
same time.
5
  • Multi-core architecture

[Diagram: several cores (Core 1, Core 2, Core 3) on a single chip]
6
What is cache?
  • A cache is simply high-speed static RAM
    (SRAM). Every processor has a cache, which holds
    the data that will be used frequently. The cache
    is much faster at retrieving this frequently used
    data than dynamic RAM (main memory).

7
Cache coherence?
  • Cache coherence is a protocol that makes the
    caches of different processors work correctly and
    efficiently. Without it, caches may hold
    inconsistent data: one cache may contain one
    value for a location while another cache holds a
    different value, causing the data inconsistency
    problem.

8
Cache Coherence Problem
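The problem can be shown with a minimal sketch, assuming a hypothetical two-core model with private caches and no coherence protocol (names and values are illustrative, not from the slides):

```python
# Minimal sketch of the cache coherence problem: two cores cache the
# same location; one writes to its private copy, and the other still
# reads the stale value. (Toy model, not real hardware.)

memory = {"x": 0}

# Each core fills its private cache from memory.
cache0 = {"x": memory["x"]}
cache1 = {"x": memory["x"]}

# Core 0 writes only to its own cached copy (no coherence mechanism).
cache0["x"] = 42

stale = cache1["x"]   # Core 1 still sees the old value
fresh = cache0["x"]   # Core 0 sees the new value
```

Core 1 keeps reading 0 while Core 0 sees 42: exactly the inconsistency a coherence protocol must prevent.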
9
  • Cache Coherence Schemes
  • Software Cache Coherence Schemes
  • Software-based coherence takes a
    straightforward approach: shared data is simply
    not cached.
  • Hardware Cache Coherence Schemes
  • Hardware-based coherence is imposed by snoop
    devices attached to the cores and their caches.
    In this scheme shared data can be cached because
    the caches are guaranteed to be coherent, but the
    programmer must still deal with synchronization
    of shared data.

10
  • Write-invalidate: a processor gains exclusive
    access to a block before writing by invalidating
    all other copies
  • Write-update: when a processor writes, it
    updates all other shared copies of that block

11
  • Directory-based: a single location (the
    directory) keeps track of the sharing status of a
    block of memory
  • Snooping: every cache block is accompanied by
    the sharing status of that block; all cache
    controllers monitor the shared bus so they can
    update the sharing status of the block, if
    necessary

12
Snoopy based cache coherence
  • A snoopy based cache coherence protocol employs
    a bus connected to all L1 caches. In this
    mechanism, on every L1 cache miss a coherence
    message is placed on the bus, which also connects
    to the L2 cache holding the global state; all
    other L1 caches maintain their own cache states
    and respond to the message if it concerns a block
    they hold. The messages generally used are
    request messages, invalidation messages,
    intervention messages, data block transfers, etc.
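A toy model of the write-invalidate snooping described above, assuming for simplicity a write-through bus (my simplification; real protocols use write-back with cache-to-cache supply):

```python
# Sketch of snoopy write-invalidate coherence: every cache watches
# ("snoops") a shared bus; on a write, an invalidation message is
# broadcast and all other caches holding the block drop their copy.

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}   # addr -> (state, value); state in {"M", "S", "I"}
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        state, _ = self.lines.get(addr, ("I", None))
        if state == "I":  # read miss: fetch from memory, cache in Shared state
            self.lines[addr] = ("S", self.bus.memory[addr])
        return self.lines[addr][1]

    def write(self, addr, value):
        # Gain exclusive access by invalidating every other copy first.
        self.bus.broadcast_invalidate(addr, origin=self)
        self.lines[addr] = ("M", value)
        self.bus.memory[addr] = value  # write-through, to keep the sketch simple

    def snoop_invalidate(self, addr):
        if addr in self.lines:
            self.lines[addr] = ("I", None)

class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, origin):
        for cache in self.caches:
            if cache is not origin:
                cache.snoop_invalidate(addr)

bus = Bus(memory={0x10: 7})
c0, c1 = SnoopyCache(bus), SnoopyCache(bus)
c0.read(0x10); c1.read(0x10)   # both caches now share the block
c0.write(0x10, 99)             # the snoop invalidates c1's copy
```

After the write, `c1` must re-fetch the block on its next read instead of using its stale copy.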

13
[Diagram: processors P0 … Pn, each with a cache whose controller snoops the shared memory bus; memory modules (Mem) attached to the bus; a memory op from Pn is observed by every snooper]

  • The memory bus is a broadcast medium
  • Caches contain information on which addresses
    they store
  • The cache controller snoops all transactions on
    the bus
  • A transaction is a relevant transaction if it
    involves a cache block currently contained in
    this cache
  • Take action to ensure coherence: invalidate,
    update, or supply the value
14
  • Limits of Snoopy Coherence

Assume a 4 GHz processor:
  => 16 GB/s instruction fetch bandwidth per processor (32-bit instructions)
  => 9.6 GB/s data bandwidth at 30% load-stores of 8-byte elements
Suppose a 98% instruction hit rate and a 90% data hit rate:
  => 320 MB/s instruction miss bandwidth per processor
  => 960 MB/s data miss bandwidth
  => 1.28 GB/s combined bus bandwidth per processor
Assuming 10 GB/s bus bandwidth, 8 processors will
saturate the bus.

[Diagram: two processors (PROC), each with a cache at 25.6 GB/s to the core and 1.28 GB/s to the shared bus; memory (MEM) on the bus]
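The saturation figure can be re-derived with a short calculation, using the parameters assumed on the slide (4 GHz core, 32-bit instructions, 30% load/stores of 8-byte elements, 98%/90% hit rates, 10 GB/s bus):

```python
# Re-deriving the snoopy-bus saturation numbers from the slide.
import math

freq = 4e9                           # instructions per second (4 GHz)
inst_bw = freq * 4                   # 16 GB/s instruction fetch bandwidth
data_bw = freq * 0.30 * 8            # 9.6 GB/s data bandwidth

inst_miss_bw = inst_bw * (1 - 0.98)  # 320 MB/s of misses reach the bus
data_miss_bw = data_bw * (1 - 0.90)  # 960 MB/s of misses reach the bus
per_proc = inst_miss_bw + data_miss_bw  # 1.28 GB/s bus traffic per core

bus_bw = 10e9
procs_to_saturate = math.ceil(bus_bw / per_proc)  # 8 processors
```

With each core generating 1.28 GB/s of miss traffic, the eighth processor pushes demand past the 10 GB/s bus, matching the slide's claim.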
15
  • In cache coherence protocols such as snoopy
    based protocols, different messages have
    different latency and bandwidth needs. To exploit
    this, interconnects are designed from wires with
    different latency, bandwidth, and energy
    properties. Using certain techniques, cache
    coherence protocols can exploit these
    heterogeneous interconnect wires to improve
    processor performance and also reduce power
    consumption.

16
  • Techniques employed to improve snoopy based cache
    coherency protocols

Three wired-OR signals: in this technique, the
first signal is asserted when any cache other
than the requester has a copy of the block, and
the second signal is asserted when any cache has
an exclusive copy of the block. The third signal
is asserted when all snoop actions on the bus are
completed. [9] When the third signal is asserted,
the requesting L1 and the L2 can safely examine
the other two signals. Since all of these signals
are on the critical path, implementing them using
low-latency L-Wires can improve performance.
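The logic of the three signals can be sketched in Python (a toy model under my own naming, not the paper's implementation; in hardware the "all done" line is usually the inverse sense, with each cache releasing it when finished, which `all()` models here):

```python
# Sketch of the three wired-OR snoop signals. A wired-OR line reads as
# asserted if ANY attached cache drives it.

def wired_or_signals(snoop_responses):
    """snoop_responses: one dict per non-requesting cache."""
    shared    = any(r["has_copy"] for r in snoop_responses)   # signal 1
    exclusive = any(r["exclusive"] for r in snoop_responses)  # signal 2
    all_done  = all(r["done"] for r in snoop_responses)       # signal 3
    return shared, exclusive, all_done

# One cache holds a shared (non-exclusive) copy, and every snoop has
# completed, so the requester may now safely examine signals 1 and 2.
responses = [
    {"has_copy": True,  "exclusive": False, "done": True},
    {"has_copy": False, "exclusive": False, "done": True},
]
shared, exclusive, all_done = wired_or_signals(responses)
```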
17
  • Another technique used to improve snoopy based
    protocols with low latency is voting wires.
    Generally, cache-to-cache transfers supply data
    that is in the Modified state, in which case
    there is a single supplier. [10] In the MESI
    protocol, on the other hand, a block can be
    retrieved from another cache rather than from
    memory; when multiple caches share a copy, a
    voting mechanism is employed to choose the
    supplier. Implementing the voting wires with low
    latency therefore improves processor performance.

18
Directory based Protocol
  • In a directory based protocol, memory is
    distributed among the processors, and a directory
    is maintained for each such memory. L1 cache
    misses are sent to the L2 caches, and a directory
    maintained at each L2 cache stores the status of
    each block. When a request for data held in
    another cache comes from a requester node, the
    request goes to the home node where the original
    data is stored, to check whether it has the
    block; if it is not available there, the home
    node forwards the request to the remote node,
    first fetching the data from the remote node and
    then sending it to the requester node. In chip
    multiprocessors, and especially in recent designs
    such as the Core 2 Duo, a write-invalidate
    directory based protocol is employed.
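The home-node lookup above can be sketched as follows (node roles and message names are mine, chosen for illustration; state downgrades at the remote node are omitted):

```python
# Sketch of a directory-based read: the home node keeps, per block, the
# current owner and the set of sharers, and forwards requests to a
# remote owner when its own copy is stale.

class HomeNode:
    def __init__(self, memory):
        self.memory = memory       # addr -> value (home copy)
        self.directory = {}        # addr -> {"owner": id or None, "sharers": set()}

    def read_request(self, addr, requester, nodes):
        entry = self.directory.setdefault(addr, {"owner": None, "sharers": set()})
        if entry["owner"] is None:
            # Home copy is up to date: reply directly to the requester.
            value = self.memory[addr]
        else:
            # Block is dirty at a remote node: fetch it from there first,
            # then send it on to the requester.
            value = nodes[entry["owner"]].fetch(addr)
            self.memory[addr] = value
        entry["sharers"].add(requester)
        return value

class RemoteNode:
    def __init__(self, cache):
        self.cache = cache

    def fetch(self, addr):
        return self.cache[addr]

home = HomeNode(memory={0x20: 5})
nodes = {1: RemoteNode(cache={0x20: 11})}   # node 1 holds a modified copy
home.directory[0x20] = {"owner": 1, "sharers": {1}}
value = home.read_request(0x20, requester=2, nodes=nodes)
```

The requester receives the remote node's fresh value (11), not the home node's stale copy (5), and the directory records the new sharer.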

20
Techniques used to improve Directory based cache
coherency
  • Exclusive Read Request for a block in a shared
    state
  • Read request for block in exclusive state
  • Proximity Aware coherence Protocols

21
  • Exclusive Read Request for a block in a
    shared state
  • In this approach, the acknowledgment and reply
    messages are sent simultaneously through the
    corresponding low-latency L-Wires and low-power
    PW-Wires. This approach improves performance and
    decreases power consumption.

22
  • Read request for a block in exclusive state
  • This approach improves performance by sending
    the prioritized data through the L-Wires and the
    least prioritized data through the PW-Wires.

23
  • ACCELERATING COHERENCE VIA PROXIMITY AWARENESS
  • In this approach the requester sends a read
    request to the home node when the block is not
    found in its own L2 cache. The home node sends
    the data itself if it holds a copy; otherwise it
    forwards the request to the nearest node that
    contains the data, and that nearest remote node
    sends the data to the requester and an ACK to the
    home node. If the home node does not receive the
    ACK, it retries a few more times; if it still
    gets no ACK from the remote node, it sends the
    data to the requester directly from memory.
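The forwarding decision can be sketched as follows (the node layout and distance metric are assumptions for illustration, not from the paper):

```python
# Sketch of proximity-aware forwarding: the home node forwards a request
# to the sharer closest to the requester instead of serving it itself.

def nearest_sharer(requester, sharers, distance):
    """Pick the sharer with the smallest distance to the requester."""
    return min(sharers, key=lambda node: distance(requester, node))

# Example: nodes laid out on a line, with distance = absolute difference.
sharers = [0, 3, 7]
closest = nearest_sharer(requester=6, sharers=sharers,
                         distance=lambda a, b: abs(a - b))
```

With the requester at node 6, node 7 is chosen as the supplier, cutting the data-transfer hop count compared with always replying from the home node.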

24
Conclusion
  • In conclusion, this paper argues that
    multi-core processors are a better choice than
    multiprocessors because chip complexity is
    reduced, higher frequencies can be employed, and
    better performance is achieved with lower power
    consumption. However, cache coherence remains a
    key issue in multi-core processors. Using
    protocols such as snoopy based and directory
    based protocols, the cache coherence problem is
    eliminated, though at the cost of a trade-off
    between latency and bandwidth. In snoopy based
    protocols, wire implementation techniques such as
    the three wired-OR signals and voting wires
    improve latency at the expense of higher
    bandwidth. Directory based protocols are an
    alternative to snoopy based protocols that
    achieves low latency and high bandwidth, and such
    protocols are implemented in present-day
    processors such as the Core 2 Duo.