1
Cache
  • Cache model
  • Mapping algorithms
  • Replacement algorithms
  • Reads/writes
  • Mapping Functions
  • Direct Mapped cache
  • Associative Mapped cache
  • Block-set Associative cache
  • LRU

2
Cache Model
Cache is a fast-memory response to the von Neumann bottleneck, i.e., the bottleneck that occurs between the CPU and memory over the system bus. Cache is a fast memory inserted between the CPU and primary memory:

[Diagram: CPU -- Cache -- Memory Access Control and Data Paths -- Primary Memory]

In effect, cache sits between the CPU and memory, and is transparent to the CPU (which is not aware of its existence).
3
Cache Model
The arguments that justify cache are:
1. cache will be faster than conventional memory (perhaps by a factor of 10)
2. the principle of locality of reference holds, i.e., the likelihood that instructions are in sequential locations is high
3. much code consists of tight loops executed frequently
If such code can be placed into a fast buffer memory (cache), then performance can be improved.
Recall:
L1 cache -- an on-chip cache -- very fast, but small (256KB)
L2 cache -- an off-chip cache -- not quite so fast, but larger (1MB)
4
Cache Operation (Simplified)
Conceptually, a cache has a straightforward operation. CPU reads are the easiest (writes must take the replacement algorithm into consideration; see later):
1. the CPU issues a read request for a memory word at some address
2. the read request is forwarded through the cache circuitry
3. if the address is in cache, the cache passes the data to the CPU, but
4. if the address is NOT in cache, the cache loads a block of memory locations which contain (bracket) the memory location addressed
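The steps above can be captured in a short, self-contained sketch. It models a single cached block over a tiny "memory" array; the sizes and names here are illustrative, not taken from the slides.

  /* Minimal model of the read flow: check for the block, load it
     on a miss, then serve the word from the cache buffer. */
  #include <stdint.h>
  #include <stdio.h>

  #define BLOCK_WORDS 16
  #define MP_WORDS    256

  static uint16_t Mp[MP_WORDS];           /* primary memory          */
  static uint16_t block_buf[BLOCK_WORDS]; /* one cached block        */
  static int      cached_block = -1;      /* which Mp block is held  */

  static uint16_t cache_read(uint16_t addr)
  {
      int block = addr / BLOCK_WORDS;
      if (block != cached_block) {               /* step 4: a miss   */
          for (int w = 0; w < BLOCK_WORDS; w++)  /* bracket the word */
              block_buf[w] = Mp[block * BLOCK_WORDS + w];
          cached_block = block;
      }
      return block_buf[addr % BLOCK_WORDS];      /* step 3: a hit    */
  }

  int main(void)
  {
      for (int i = 0; i < MP_WORDS; i++) Mp[i] = (uint16_t)(2 * i);
      printf("%u\n", cache_read(37)); /* miss: loads block 2, prints 74 */
      printf("%u\n", cache_read(40)); /* hit: same block, prints 80     */
      return 0;
  }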
5
Cache confusions
There is one area where people can become confused quickly by cache: the distinction between a memory location being present in cache and the contents of that location being present in cache. Cache is smaller than primary memory (Mp), so the relationship between Mp locations and cache is not one-to-one (it has to be at least many-to-one, or perhaps many-to-many), or else why bother having Mp? So a cache has to keep track of two things:
1. whether the cache currently stores a given memory address (location), and
2. if it does, the contents (value) of that location.
Try to remember this as follows: the CPU issues a read from location k, which is intercepted by the cache. The cache asks itself, "Am I currently storing location k, and if I am, what is its contents?"
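In hardware, that pair of questions is typically answered by storing, alongside each cached block, a tag recording which Mp block is held (plus a valid bit for lines that hold nothing yet). A minimal sketch of that bookkeeping; the field names are illustrative:

  #include <stdint.h>
  #include <stdbool.h>

  #define BLOCK_WORDS 16

  /* One cache line: 'tag' answers "WHICH location (block) am I
     storing?", 'data' answers "what are its CONTENTS?". */
  struct cache_line {
      bool     valid;             /* does this line hold anything? */
      uint16_t tag;               /* which Mp block is stored here */
      uint16_t data[BLOCK_WORDS]; /* the contents of that block    */
  };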
6
Cache Mapping Functions
The critical point here is that cache has far fewer locations than Mp (say a 1MB cache vs. 128MB of Mp). So if a memory location is to be stored in cache, there must be a way to determine where that location goes in cache. This is called the mapping function. Usually, cache will map blocks of Mp into blocks (of equal size) in cache; block sizes (e.g., 2 words, 4 words, 8 words, etc.) are cache-dependent (i.e., system-dependent). Suppose a block of Mp contains memory addresses k to kb. Then if memory location ka (where k <= ka <= kb) is referenced by the CPU (in a read) AND that location ka is not currently in cache, then the cache mapping function will load the block of addresses (from k to kb) which contains ka into cache.
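With fixed-size, aligned blocks, the block that brackets a given address falls out of integer arithmetic. A sketch, assuming a 16-word block size (the size the later slides adopt):

  #include <stdio.h>

  int main(void)
  {
      unsigned B  = 16;         /* block size in words (assumed)  */
      unsigned ka = 100;        /* some referenced address        */
      unsigned q  = ka / B;     /* number of the block holding ka */
      unsigned k  = q * B;      /* first address in that block    */
      unsigned kb = k + B - 1;  /* last address in that block     */
      printf("block %u spans %u..%u\n", q, k, kb); /* 6 spans 96..111 */
      return 0;
  }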
7
Cache mapping
[Diagram: m Mp blocks folding onto n cache blocks]
A cache mapping function maps blocks of Mp into blocks of cache. Clearly the mapping cannot be 1-1 (else cache would be the same size as Mp; then why have Mp at all?)
8
Replacement algorithms
Suppose cache has n blocks (of 8 words each) and memory has m blocks (of 8 words each). Let f be the cache mapping function, so that if q is the label of a memory block, then f(q) ∈ {0, 1, ..., n-1} gives the label (number) of the block in cache that q maps to. Note that f is not necessarily many-to-one; it can be many-to-many. Suppose block 15 of Mp can be mapped to blocks 35 and 36 of cache, and suppose that both cache blocks are in use -- that is, some other Mp blocks are using cache blocks 35 and 36. One of these blocks must be replaced by Mp block 15. The decision as to which one is replaced is called the replacement algorithm.
9
Replacement Algorithm
[Diagram: Mp block q can map to cache blocks x and y, both currently occupied by other Mp blocks (shown in blue)]
Block q of memory can be mapped to blocks x and y of cache (shown), but blocks x and y are currently in use by the Mp blocks shown in blue. The decision as to which cache block Mp block q gets mapped to is called the replacement algorithm.
Evidently, if a cache block that q maps to is NOT in use, then that cache block is loaded with block q from Mp -- there is no replacement.
10
Cache Hits/Misses
Suppose the CPU requests (reads) data from Mp address ka. Suppose address ka is in Mp block q (which we'll say spans k to kb). Then there are two possibilities:
a) block q is in cache; thus the content of address ka is in cache; thus cache can pass the contents to the CPU. This is a cache hit.
b) block q is NOT in cache; thus the cache system has to load block q; thus the content of ka is not in cache, and must be read from Mp instead. This is a cache miss.
Cache performance is often measured in terms of the hit/miss ratio.
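One standard way to see why the hit ratio matters (a textbook first-order model, not from these slides): with hit ratio h, cache access time tc, and Mp access time tm, the average access costs roughly h*tc + (1-h)*tm. The timings below are made-up illustrative values:

  #include <stdio.h>

  int main(void)
  {
      double tc = 10.0;   /* cache access time, ns (assumed) */
      double tm = 100.0;  /* Mp access time, ns (assumed)    */
      for (double h = 0.80; h < 1.001; h += 0.05)
          printf("h = %.2f  t_eff = %5.1f ns\n",
                 h, h * tc + (1.0 - h) * tm);
      return 0;
  }

  /* Even going from h = 0.90 to h = 0.95 cuts the average
     access time from 19 ns to 14.5 ns in this model. */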
11
Load-through
Suppose the CPU issues a read on address ka (in Mp block q). If Mp block q is in cache, then the data is forwarded (at cache speed) to the CPU. However, if Mp block q is NOT in cache, the cache system will read Mp block q from Mp. The cache can pass the data of location ka (in block q) to the CPU in one of two ways:
1. the cache can wait until it has loaded block q in its entirety before it sends the contents of ka to the CPU, or
2. the cache can send the CPU the contents of address ka as soon as it loads that address from Mp. This is the better strategy, and is called load-through.
12
Writing Data
Writing data is a bit more complex. Suppose the CPU writes a value to location ka (in Mp block q). If Mp block q is NOT in cache, the CPU writes to Mp as always. But does it transfer the block to cache? The answer depends on the cache system; usually, the answer is NO. What happens if Mp block q IS in cache? There are two ways to proceed:
1. update memory simultaneously with the cache write (known as the write-through strategy), or
2. mark the fact that cache has been written to, but do not update memory UNTIL THE AFFECTED CACHE BLOCK GETS REPLACED (known as the write-back or copy-back strategy).
13
Write on Replace
There are good reasons to avoid updating Mp when a cache location is updated (written to). For example, consider the following code snippet:
for (i = 0; i < 10000; i++) { /* ...do something */ }
Suppose the variable i stays in cache during the execution of this code. Note that i gets updated 10000 times -- but is it worth writing these updates back to Mp? Probably not; it would be better to wait until the cache block containing the variable i is replaced, i.e., some other Mp block overwrites it. Then the replaced block can be written back to memory (or the affected words can be written back to memory). Cache can do this by keeping a spare bit associated with each word in the cache -- often called the dirty bit, it is used to signal that a cache word was updated and needs to be written back to Mp. (A sketch contrasting the two write policies follows.)
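A minimal sketch of the two policies on a single cached block. For brevity the dirty flag here is kept per block rather than per word, and a write that misses goes straight to Mp without loading the block; both are simplifications, and all names are illustrative.

  #include <stdint.h>
  #include <stdbool.h>

  #define BLOCK_WORDS 16
  #define MP_WORDS    256

  static uint16_t Mp[MP_WORDS];
  static uint16_t block_buf[BLOCK_WORDS];
  static int      cached_block = 0;       /* pretend block 0 is loaded */
  static bool     dirty = false;
  static bool     write_through = false;  /* flip to change policy     */

  static void cache_write(uint16_t addr, uint16_t value)
  {
      if ((int)(addr / BLOCK_WORDS) != cached_block) {
          Mp[addr] = value;        /* write miss: Mp only (simplified) */
          return;
      }
      block_buf[addr % BLOCK_WORDS] = value;
      if (write_through)
          Mp[addr] = value;        /* write-through: Mp on every write */
      else
          dirty = true;            /* write-back: just mark the block  */
  }

  static void replace_block(int new_block)
  {
      if (dirty) {                 /* copy-back the old block first    */
          for (int w = 0; w < BLOCK_WORDS; w++)
              Mp[cached_block * BLOCK_WORDS + w] = block_buf[w];
          dirty = false;
      }
      for (int w = 0; w < BLOCK_WORDS; w++)
          block_buf[w] = Mp[new_block * BLOCK_WORDS + w];
      cached_block = new_block;
  }

  int main(void)
  {
      cache_write(3, 42); /* write-back: Mp[3] is still stale here */
      replace_block(1);   /* replacement triggers the copy-back    */
      return Mp[3] == 42 ? 0 : 1;  /* now Mp has caught up         */
  }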
14
Cache Coherency
Consider a shared-memory computer system. This can be a single computer with multiple CPUs and a single memory system, or a distributed system consisting of multiple computers. An important thing to note if the write-back protocol is used: if a cache location is updated (written to), then the real location (in Mp) and the mirror copy (in cache) will no longer contain the same value. The locations will be out of step. Now suppose a different computer reads the location of that variable in memory -- that computer will have a wrong value for the variable (which is only correct in the cache). This is the basis of what is called the cache coherency problem, which affects shared-memory computers having distinct cache systems.
15
A simple cache/memory model
It helps to be consistent throughout the following. Therefore let us agree that the typical system we are exploring has the following properties:
  • addresses are word addresses, not byte addresses
  • word size is 16 bits
  • Mp has 64K (65536) locations (64KW)
  • cache has 2048 locations (2KW)
  • blocks are 16 words in size (very large for a cache)
Thus, cache has 128 blocks (128 × 16 = 2048) while Mp has 4096 blocks (4096 × 16 = 65536). So main memory is 32 times as large as the cache.
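The agreed model, expressed as compile-time constants; the static assertions check the derived figures rather than assuming them (C11 is assumed for static_assert):

  #include <assert.h>

  enum {
      WORD_BITS    = 16,
      MP_WORDS     = 65536,                     /* 64KW primary memory */
      CACHE_WORDS  = 2048,                      /* 2KW cache           */
      BLOCK_WORDS  = 16,                        /* words per block     */
      CACHE_BLOCKS = CACHE_WORDS / BLOCK_WORDS, /* 2048/16  = 128      */
      MP_BLOCKS    = MP_WORDS / BLOCK_WORDS     /* 65536/16 = 4096     */
  };

  static_assert(CACHE_BLOCKS == 128,          "128 cache blocks");
  static_assert(MP_BLOCKS == 4096,            "4096 Mp blocks");
  static_assert(MP_WORDS == 32 * CACHE_WORDS, "Mp is 32x the cache");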
16
Mapping Strategies I Direct Mapping
In the direct mapping strategy, Mp block q is mapped to cache block p, where
p = q mod N
and N is the number of blocks in cache. In our sample case, N = 128, so memory block q maps to cache block p under p = q mod 128. So:
memory block 0 maps to cache block 0
memory block 1 maps to cache block 1
.. ..
memory block 127 maps to cache block 127
memory block 128 maps to cache block 0
memory block 129 maps to cache block 1
.. ..
memory block 255 maps to cache block 127
memory block 256 maps to cache block 0
etc.
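The mapping is one line of C (N = 128 as in the model; block 2024 is included because it reappears in the worked example a few slides on):

  #include <stdio.h>

  #define N 128   /* number of cache blocks */

  int main(void)
  {
      unsigned q[] = { 0, 1, 127, 128, 129, 255, 256, 2024 };
      for (int i = 0; i < 8; i++)   /* p = q mod N */
          printf("Mp block %4u -> cache block %3u\n", q[i], q[i] % N);
      return 0;
  }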
17
Mapping Strategies I Direct Mapping
[Diagram: Mp blocks 0, 1, 2, 3, 4, ..., 127, 128, 129, ..., 4094, 4095 folding onto cache blocks 0, 1, 2, 3, ..., 127]
In the direct mapping technique, an Mp block q maps to cache block p, where p = q mod N and N = the number of blocks in cache.
18
Mapping Strategies I Direct Mapping
Under direct mapping, cache block p can contain the following Mp blocks:
0: 0, 128, 256, 384, 512, 640, ..., 3840, 3968
1: 1, 129, 257, 385, 513, 641, ..., 3841, 3969
.. ..
127: 127, 255, 383, 511, 639, ..., 3967, 4095
In fact, each cache block can have 32 possible Mp blocks mapped to it -- there are 128 cache blocks and 4096 Mp blocks, and 4096/128 = 32. If cache block p can have 32 possible Mp blocks loaded into it, which Mp block is currently in cache block p? How does the cache know this?
19
Mapping Strategies I Direct Mapping
The answer to the previous question depends upon address partitioning, i.e., a logical division of an Mp address into component fields. Recall that we have a 16-bit (word-oriented) address, and that Mp (and cache) blocks have 16 words in them (16 words × 4096 blocks = 65536 words).
[Diagram: a 16-bit Mp address]
Notice that an Mp address can be partitioned into 2 fields: an upper 12 bits, which identifies the Mp block number (address), and a lower 4 bits, which identifies the word within that block (one of 16 within a block).
20
Mapping Strategies I Direct Mapping
A 16-bit Mp address thus partitions as:
| 12-bit block address (label) | 4-bit word address in block |
Recall that each cache block can hold one of 32 possible Mp blocks; e.g., cache block 0 could have Mp blocks 0, 128, 256, ... mapped to it. Now, the upper 12 bits of the address above specify the Mp block number (in 12 bits), and the lower 7 bits of this group specify the cache block that the given Mp block maps to:
| 5 bits | 7 bits specifying the cache block |
21
Mapping Strategies I Direct Mapping
To see how this works, look at address 32385. First of all, we note that address 32385 is in Mp block 2024, which contains addresses 2024 × 16 to 2024 × 16 + 15, i.e., 32384 to 32399. Mp block 2024 maps to cache block p = 2024 mod 128 = 104. Now examine 32385 in binary:
01111 1101000 0001
and note that
0001 = 1 (word 1 of the block)
1101000 = 104 (cache block 104)
01111 = 15 (the 15th of the 32 Mp blocks that can map to block 104)
These upper 5 bits identify the different Mp blocks (32 of them) which can map to any given cache block, so every Mp address decomposes as:
| 5 bits (tag) | 7 bits (cache block) | 4 bits (word in block) |
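The same decomposition, carried out with shifts and masks over the 5/7/4 field widths above:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint16_t addr     = 32385;                /* 01111 1101000 0001 */
      unsigned word     =  addr        & 0xF;   /* low 4 bits         */
      unsigned block    = (addr >> 4)  & 0x7F;  /* middle 7 bits      */
      unsigned tag      = (addr >> 11) & 0x1F;  /* high 5 bits        */
      unsigned mp_block =  addr >> 4;           /* full 12-bit label  */
      printf("tag=%u cache block=%u word=%u (Mp block %u)\n",
             tag, block, word, mp_block);       /* 15, 104, 1, 2024   */
      return 0;
  }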
22
Mapping Strategies I Direct Mapping
In a direct-mapped cache, each block of cache has an associated tag field (consisting of, in our example, 5 bits). The upper 5 bits of the address(es) in the memory block which is currently in a given cache block are stored in these tag bits; this is how (in a direct-mapped cache) the cache knows what block of memory is in a given cache block.
[Diagram: the 32 Mp blocks that can map to cache block 1, with the one actually resident in cache block 1 highlighted; each cache block carries 5 tag bits]
The middle 7 bits (red in the diagram) identify the cache block; the upper 5 bits are used by the cache to mark which Mp block is in that cache block.
23
Mapping Strategies I Direct Mapping
This is how a direct-mapped cache determines whether a given word with address k is in cache:
a) it divides the address into the respective fields:
  • upper 5 bits (tag)
  • middle 7 bits (cache block)
  • lower 4 bits (word in block)
b) it uses the middle 7 bits to identify the cache block the word would be in
c) it then checks the TAG bits of this cache block (5 bits) to see if they match the upper 5 bits of the requested address
If they match, that memory block (and therefore that word) is in the cache, and the cache uses the lower 4 bits to identify the word in the block. This word is passed to the CPU (if the operation in question is a read operation). If the tag bits DO NOT MATCH, then the block is not in the cache, and must be loaded into cache from Mp.
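The whole lookup, as a self-contained sketch over the agreed model (128 lines of 16 words, 5-bit tag). The valid bit is an addition the slides do not discuss; it distinguishes a never-loaded line from a tag mismatch.

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define LINES 128
  #define WORDS 16

  static uint16_t Mp[65536];

  static struct {
      bool     valid;   /* has this line ever been loaded?           */
      unsigned tag;     /* upper 5 bits of the resident Mp block     */
      uint16_t data[WORDS];
  } cache[LINES];

  static uint16_t cache_read(uint16_t addr)
  {
      unsigned word = addr & 0xF;          /* lower 4 bits           */
      unsigned line = (addr >> 4) & 0x7F;  /* middle 7 bits          */
      unsigned tag  = addr >> 11;          /* upper 5 bits           */

      if (!cache[line].valid || cache[line].tag != tag) {
          /* tag mismatch (or empty line): load the block from Mp   */
          uint16_t base = addr & 0xFFF0;   /* first word of block    */
          for (int w = 0; w < WORDS; w++)
              cache[line].data[w] = Mp[base + w];
          cache[line].tag   = tag;
          cache[line].valid = true;
      }
      return cache[line].data[word];       /* serve the word (a hit) */
  }

  int main(void)
  {
      Mp[32385] = 7;
      printf("%u\n", cache_read(32385));   /* miss, then serves 7    */
      printf("%u\n", cache_read(32386));   /* hit in the same block  */
      return 0;
  }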
24
Mapping Strategies I Direct Mapping
Replacement Algorithm: in a direct-mapped cache, the replacement algorithm is trivial:
if a word in memory block q is requested,
and memory block q maps to cache block p,
and cache block p currently contains another memory block r (not q),
then: if the dirty bit has been set for this block (or for words in it), the old Mp block r (currently in cache block p) is written back to Mp; then Mp block q is loaded into cache block p.
In other words, an Mp block can only go to the cache block it is mapped to.