CS1104 Help Session I: Memory (Semester II 2001/02)

Transcript and Presenter's Notes


1
CS1104 Help Session I: Memory (Semester II 2001/02)
  • Colin Tan,
  • S15-04-05,
  • Ctank@comp.nus.edu.sg

2
Memory
  • Memory can be visualized as a stack of pigeon
    holes. Current computers have about 128,000,000
    pigeon holes.
  • Each pigeon hole is given a number, starting from
    0. This number is called an address.
  • Each pigeon hole will contain either data (e.g.
    numbers you want to add together) or an
    instruction (e.g. add two numbers).

3
Memory
  • Memory locations 0 to 3 contain instructions,
    locations 4 to 6 contain data.
  • Note: In reality, instructions are also encoded
    into numbers!

4
Addresses
  • As mentioned, each pigeon hole has a number
    identifying it called an address.
  • When the CPU requires an instruction, it will
    send the instruction's address to memory, and
    the memory will return the instruction at that
    address.
  • E.g. at IF, the CPU will send 0 to memory, and
    the memory returns li $t1, 5.
  • At MEM, the CPU will send 6 to memory, and
    memory returns 10.
  • At WB, the CPU writes 10 back to $t1.

5
Addressing Bits
  • Computers work only in binary.
  • Hence the addresses generated in the previous
    example are also in binary!
  • In general, to address a maximum of n memory
    locations, you will need m = log2 n bits in your
    address.
  • Conversely, if you have m bits in your address,
    you can access a maximum of 2^m memory locations
    (a small sketch follows below).
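
A minimal C sketch of this relationship, assuming nothing beyond the slides' own numbers (the 128,000,000-location figure is from slide 2):

    #include <stdio.h>

    /* Number of address bits m needed for n locations: the smallest m with 2^m >= n. */
    unsigned bits_needed(unsigned long long n) {
        unsigned m = 0;
        while ((1ULL << m) < n)
            m++;
        return m;
    }

    int main(void) {
        /* 128,000,000 pigeon holes need 27 address bits, since 2^27 = 134,217,728 >= 128,000,000. */
        printf("bits for 128,000,000 locations: %u\n", bits_needed(128000000ULL));
        /* Conversely, m = 32 address bits reach at most 2^32 = 4,294,967,296 locations. */
        printf("max locations for 32 bits: %llu\n", 1ULL << 32);
        return 0;
    }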

6
Memory Hierarchy
  • Motivation:
  • Not all memory is created equal
  • Cheap memory -> slow
  • Fast memory -> expensive
  • DRAM: 70 ns access time, $1/MByte
  • SRAM: 8 ns access time, $50/MByte
  • So, you can choose either:
  • Fast but very small memory, OR
  • Large but very slow memory.

7
Memory Hierarchy
  • Memory hierarchy gives you a third option
  • Large, but very fast memory
  • Though slower than the expensive memory mentioned
    earlier.

8
Locality
  • Locality is a particular type of behavior
    exhibited by running programs
  • Spatial locality: if a memory location has been
    accessed, it is very likely its neighbor will
    also be accessed.
  • Temporal locality: if a memory location has been
    accessed, it is very likely that it will be
    accessed again sometime soon.

9
Locality - Example
  • Consider the following program:
  • for (i = 0; i < 10; i++)
  •     a[i] = b[i] + c[i];

10
Locality - Example
  • In memory it will look like this

11
Locality - Example
  • Tracing the execution of the program

12
Locality - Example
  • Focusing only on the addresses of the fetched
    instructions, we see that the addresses the
    instructions are fetched from are
  • 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2, 3, 4, 5, 6,
    7, 8, 9, 10, 2, 3, 4, 5, ...
  • Here we see both
  • Spatial locality (e.g. after location 0 is
    accessed, location 1 is accessed, then 2, etc.)
  • Temporal locality (e.g. location 2 is accessed 10
    times!)

13
Effect of Locality
  • Locality means that in the short run out of all
    the memory you have (perhaps up to 128,000,000
    pigeon holes!), only a very small number of
    locations are actually being accessed!
  • In our example, for ten iterations only memory
    locations 2 to 10 are being accessed, out of
    128,000,000 possible locations!
  • What if we had a tiny amount of very fast (but
    expensive!) memory and kept these locations in
    that fast memory?
  • We can speed up access times dramatically!!
  • This is the idea behind caches.

14
How Do Caches Help?
  • The average memory access time (AMAT) is given
    by:
  • AMAT = hit_rate x Tcache + miss_rate x (Tmemory +
    Tcache)
  • Tcache = time to read the cache (8 ns for an SRAM
    cache)
  • Tmemory = time to read main memory (70 ns for
    DRAM)
  • miss_rate = probability of not finding what we
    want in the cache.
  • Because of locality, miss_rate is very small,
    typically about 3% to 5%.
  • Here, our AMAT = 0.95 x 8 ns + 0.05 x (70 + 8) ns
    = 11.5 ns
  • Our AMAT is only about 43% slower than pure SRAM
    memory (11.5 ns vs. 8 ns).
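
As a quick check of the arithmetic above, a small C sketch of the AMAT formula; the 8 ns, 70 ns and 95% figures are the slide's own:

    #include <stdio.h>

    /* AMAT = hit_rate x Tcache + miss_rate x (Tmemory + Tcache), as on this slide. */
    double amat(double hit_rate, double t_cache_ns, double t_memory_ns) {
        double miss_rate = 1.0 - hit_rate;
        return hit_rate * t_cache_ns + miss_rate * (t_memory_ns + t_cache_ns);
    }

    int main(void) {
        /* 95% hit rate, 8 ns SRAM cache, 70 ns DRAM memory -> 11.5 ns. */
        printf("AMAT = %.1f ns\n", amat(0.95, 8.0, 70.0));
        return 0;
    }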

15
How Do Caches Help?
  • What about cost?
  • Let's consider:
  • A system with 32 MB of DRAM memory and 512 KB of
    SRAM cache.
  • Cost is $1/MB for DRAM, and $50/MB for SRAM.
  • If we had 32 MB of SRAM, access time is 8 ns, but
    the cost would be $1,600!
  • With 32 MB of DRAM, the cost is $32, but access
    time is 70 ns!
  • But with 32 MB of DRAM and 512 KB (1/2 MB) of
    SRAM, the cost is $32 + (512/1024) x $50 = $57!

16
How do Caches Help?
  • So with pure SRAM, we can have an 8 ns average
    access time at $1,600.
  • With pure DRAM, our memory will cost $32, but all
    accesses will take 70 ns!
  • With DRAM memory and an SRAM cache, we can have
    an 11.5 ns access time at $57.
  • So for a performance drop of 43%, we have a cost
    savings of over 2700%!
  • Hence caches give us large memory size (32 MB) at
    close to the cost of DRAM technology ($57 vs.
    $32), but at close to the speed of expensive
    SRAM technology (11.5 ns vs. 8 ns).

17
Cache Architecture
  • Caches consist of blocks (or lines). Each block
    stores data from memory.
  • The block allocation problem:
  • Given data from an address A, how do we decide
    which cache block its data should go to?

18
The Block Allocation Problem
  • 3 possible solutions:
  • Data from each address A will go to a fixed
    block:
  • Direct mapped cache
  • Data from each address A may go to any block:
  • Fully associative cache
  • Data from address A will go to a fixed set of
    blocks.
  • Data may be put into any block within that set:
  • Set associative cache

19
Direct Mapped Caches
  • The value of a portion of the memory address is
    used to decide which block to send the data to:

Address A = [ Tag | Block Index | Block Offset | Byte Offset ]
  • The Block Index portion is used to decide which
    block data from this address should go to.

20
Example
  • The number of bits in the block index is log2 N,
    where N is the total number of blocks.
  • For a 4-block cache, the block index portion of
    the address will be 2 bits, and these 2 bits can
    take on the value of 00, 01, 10 or 11.
  • The exact value of these 2 bits will determine
    which block the data for that address will go to.

21
Direct Mapped Addressing Example
  • Show how an address generated by the MIPS CPU
    will be divided into byte offset, block offset,
    block index and tag portions for the following
    cases (a worked sketch follows below):
  • i) Block size = 1 word, 128 blocks
  • ii) Block size = 4 words, 64 blocks
  • All MIPS addresses are 32-bit byte addresses
    (i.e. they address individual bytes in a word).
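
A small C sketch that derives the field widths for both cases from the definitions above: 2 byte-offset bits for 4-byte words, log2 of the block size and the block count, and the tag as whatever is left of the 32-bit address.

    #include <stdio.h>

    /* log2 for exact powers of two (block sizes and block counts here always are). */
    static unsigned log2u(unsigned x) {
        unsigned m = 0;
        while ((1u << m) < x)
            m++;
        return m;
    }

    /* Split a 32-bit MIPS byte address into tag / block index / block offset / byte offset widths. */
    static void field_widths(unsigned words_per_block, unsigned num_blocks) {
        unsigned byte_off  = 2;                      /* 4 bytes per word */
        unsigned block_off = log2u(words_per_block);
        unsigned index     = log2u(num_blocks);
        unsigned tag       = 32 - byte_off - block_off - index;
        printf("tag=%u  index=%u  block offset=%u  byte offset=%u\n",
               tag, index, block_off, byte_off);
    }

    int main(void) {
        field_widths(1, 128);  /* Case I:  tag=23, index=7, block offset=0, byte offset=2 */
        field_widths(4, 64);   /* Case II: tag=22, index=6, block offset=2, byte offset=2 */
        return 0;
    }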

22
Case I
23
Case II
24
Example
  • The value of the two block index bits will
    determine which block the data will go to,
    following the scheme shown below

25
Solving Direct-Mapped Cache Problems
  • Question 7.7
  • Basic formula:
  • Blk_Addr = floor(word_address / words_per_block)
    mod N
  • N here is the total number of blocks in the cache
  • This is the mathematical version of taking the
    value of the Block Index bits from the address.
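
A one-line C version of that formula; the word address 77, 4-word blocks and 16-block cache used below are hypothetical values for illustration, not taken from Question 7.7:

    #include <stdio.h>

    /* Blk_Addr = floor(word_address / words_per_block) mod N  (integer division is the floor). */
    unsigned block_index(unsigned word_address, unsigned words_per_block, unsigned n_blocks) {
        return (word_address / words_per_block) % n_blocks;
    }

    int main(void) {
        /* floor(77 / 4) = 19, and 19 mod 16 = 3, so word address 77 lands in block 3. */
        printf("block %u\n", block_index(77, 4, 16));
        return 0;
    }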

26
A Complication: Multiple-Word Blocks
  • Single-word blocks do not support spatial
    locality:
  • Spatial locality: the likelihood of accessing a
    neighbor of a piece of data that was just
    accessed is high.
  • But with single-word blocks, none of the
    neighbors are in the cache!
  • All accesses to neighbors that were not accessed
    before will miss!

27
An Example: Question 7.8
28
Accessing Individual Words
  • In our example, each block has 4 words.
  • But we always access memory 1 word at a time!
    (e.g. lw)
  • Use the Block Offset to specify which of the 4
    words in a block we want to read

29
The Block Offset
  • Number of block offset bits = log2 M, where M is
    the number of words per block.
  • For our example, M = 4, so the number of block
    offset bits is 2.
  • These two bits can take on the values 00, 01,
    10 and 11.
  • Note that for single-word blocks, the number of
    block offset bits is log2 1, which is 0. I.e.
    there are no block offset bits for single-word
    blocks.
  • These values determine exactly which word within
    a block address A is referring to.

30
Who Am I? Purpose of the Tag
  • Many different addresses may map to the same
    block e.g. (Block Index portions shown
    highlighted)
  • All 3 addresses are different, but all map to
    block 00010010

31
Disambiguation
  • We need a way to disambiguate this situation:
  • Otherwise how do we know that the data in block x
    actually comes from address A and not from
    another address A' that has the same block index
    bit value?
  • The portion of address A to the left of the
    Block Index can be used for disambiguation.
  • This portion is called the tag, and the tag for
    address A is stored in the cache together with
    address A's data.

32
The Tag
  • When we access the cache, the Tag portion and
    Block Index portions of address A are extracted.
  • The Block Index portion will tell the cache
    controller which block of cache to look at.
  • The Tag portion is compared against the tag
    stored in that block. If the tags match, we have
    a cache hit and the data is read from the cache
    (a sketch of this lookup follows below).
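
A minimal C sketch of this lookup for a direct-mapped cache; the geometry (64 blocks of 4 words, matching Case II earlier) and the struct layout are illustrative assumptions, not a prescribed implementation:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_BLOCKS      64
    #define WORDS_PER_BLOCK  4

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint32_t data[WORDS_PER_BLOCK];
    } CacheBlock;

    static CacheBlock cache[NUM_BLOCKS];

    /* Returns true on a hit and copies the requested word into *word_out. */
    bool cache_lookup(uint32_t addr, uint32_t *word_out) {
        uint32_t block_offset = (addr >> 2) & 0x3;   /* bits 3..2: word within the block  */
        uint32_t block_index  = (addr >> 4) & 0x3F;  /* bits 9..4: which of the 64 blocks */
        uint32_t tag          = addr >> 10;          /* remaining upper bits              */

        CacheBlock *b = &cache[block_index];
        if (b->valid && b->tag == tag) {             /* tags match -> cache hit           */
            *word_out = b->data[block_offset];
            return true;
        }
        return false;                                /* miss: fetch from main memory      */
    }

    int main(void) {
        uint32_t w;
        /* Cold cache: every block starts invalid, so this first access misses. */
        printf("hit? %d\n", cache_lookup(0x1234u, &w));
        return 0;
    }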

33
Accessing Individual Bytes
  • MIPS addresses are byte addresses, and actually
    index individual bytes rather than words.
  • Each MIPS word consists of 4 bytes.
  • The byte offset tells us exactly which byte
    within a word we are referring to.

34
Advantages & Disadvantages of Direct Mapped Caches
  • Advantages:
  • Simple to implement
  • Fast performance:
  • Less time to detect a cache hit -> less time to
    get data from the cache -> faster performance
  • Disadvantages:
  • Poor temporal locality:
  • Many addresses may map to the same block.
  • The next time address A is accessed, its data may
    have been replaced by the contents of another
    address A'.

35
Improving Temporal Locality: The Fully Associative Cache
  • In the fully associative cache, data from an
    address A can go to any block in cache.
  • In practice, data will go into the first
    available cache block.
  • When the cache is full, a replacement policy is
    invoked to choose which block of cache to throw
    out.

36
Advantages and Disadvantages: Fully Associative Cache
  • Advantages:
  • Good temporal locality properties:
  • Flexible block placement allows smart replacement
    policies such that blocks that are likely to be
    referenced again will not be replaced. E.g. LRU,
    LFU.
  • Disadvantages
  • Complex and too expensive for large caches
  • Each block needs a comparator to check the tag.
  • With 8192 blocks, we need 8192 comparators!

37
A Compromise: Set Associative Caches
  • Represents a compromise between direct-mapped and
    fully associative caches.
  • Cache is divided into sets of blocks.
  • An address A is mapped directly to a set using a
    similar scheme as for direct mapped caches.
  • Once the set has been determined, the data from A
    may be stored in any block within a set - Fully
    associative within a set!

38
Set Associative Cache
  • An n-way set associative cache will have n blocks
    per set.
  • For example, for a 16-block cache that is
    implemented as a 2-way set associative cache,
    each set has 2 blocks, and we have a total of 8
    sets.
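
A small C sketch of how the set is chosen, using the slide's 16-block, 2-way example (8 sets); the block address 13 below is just a hypothetical input:

    #include <stdio.h>

    /* For an n-way set associative cache: set = block_address mod num_sets,
       where num_sets = num_blocks / n. The data may then occupy any of the
       n blocks within that set. */
    unsigned set_index(unsigned block_address, unsigned num_blocks, unsigned n_ways) {
        unsigned num_sets = num_blocks / n_ways;     /* 16 / 2 = 8 sets */
        return block_address % num_sets;
    }

    int main(void) {
        printf("set %u\n", set_index(13, 16, 2));    /* 13 mod 8 = 5 */
        return 0;
    }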

39
Advantages and Disadvantages: Set Associative Cache
  • Advantages
  • Almost as simple to build as a direct-mapped
    cache.
  • Only n comparators are needed for an n-way set
    associative cache. For 2-way set-associative,
    only 2 comparators are needed to compare tags.
  • Supports temporal locality by having full
    associativity within a set.

40
Advantages and Disadvantages: Set Associative Cache
  • Disadvantages:
  • Not as good as a fully associative cache in
    supporting temporal locality.
  • For LRU schemes, because of the small
    associativity, it is actually possible to have a
    0% hit rate for temporally local data.
  • E.g. if our accesses are A1 A2 A3 A1 A2 A3, and
    if A1, A2 and A3 map to the same 2-way set, then
    the hit rate is 0% as subsequent accesses replace
    previous accesses under the LRU scheme.

41
Multi-level Cache
  • Let the first level of cache (closest to the
    CPU) be called L1, and the next level L2.
  • Let Phit_L1 be the hit rate of L1, Tcache_L1 the
    cache access time of L1, and Tmiss_L1 the miss
    penalty of L1.
  • AMAT of L1 = Phit_L1 x Tcache_L1 + (1 - Phit_L1)
    x Tmiss_L1
  • What is Tmiss_L1?
  • If L1 misses, then we will attempt to get the
    data from L2. Hence Tmiss_L1 is actually just the
    AMAT of L2!
  • Let Phit_L2 be the hit rate of L2, Tcache_L2 the
    cache access time of L2, and Tmiss_L2 the miss
    penalty of L2.

42
Multilevel Cache
  • Tmiss_L1 = AMAT of L2 = Phit_L2 x Tcache_L2 +
    (1 - Phit_L2) x Tmiss_L2
  • Substitute this back and we get:
  • AMAT of L1 = Phit_L1 x Tcache_L1 + (1 - Phit_L1)
    x (Phit_L2 x Tcache_L2 + (1 - Phit_L2) x
    Tmiss_L2)
  • Tmiss_L2 is of course the time taken to access
    the slow DRAM memory.
  • What if we had an L3 cache?
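
A short C sketch of the two-level formula; the hit rates and access times used below are hypothetical illustrative numbers, not from the lecture:

    #include <stdio.h>

    /* AMAT = Phit_L1 x Tcache_L1 + (1 - Phit_L1) x
              (Phit_L2 x Tcache_L2 + (1 - Phit_L2) x Tmiss_L2)   */
    double amat_two_level(double p_hit_l1, double t_l1_ns,
                          double p_hit_l2, double t_l2_ns, double t_mem_ns) {
        double miss_penalty_l1 = p_hit_l2 * t_l2_ns + (1.0 - p_hit_l2) * t_mem_ns;
        return p_hit_l1 * t_l1_ns + (1.0 - p_hit_l1) * miss_penalty_l1;
    }

    int main(void) {
        /* e.g. 95% L1 hits at 2 ns, 90% L2 hits at 8 ns, 70 ns DRAM. */
        printf("AMAT = %.2f ns\n", amat_two_level(0.95, 2.0, 0.90, 8.0, 70.0));
        return 0;
    }

Extending the same idea, an L3 cache would simply replace Tmiss_L2 with the AMAT of L3.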

43
Other Problems
  • Question 7.9

44
Virtual Memory Motivation
  • Drive space is very, very cheap:
  • Typically about 2 cents per megabyte.
  • It would be ideal if we could set aside a portion
    of drive space to be used as memory.
  • Unfortunately, disk drives are very slow:
  • The fastest access time is about 10 ms, which is
    roughly a million times slower than SRAM and over
    a hundred thousand times slower than DRAM.
  • Idea: use drive space as memory, and main memory
    to cache the drive space!
  • This is the idea behind virtual memory.

45
Main Idea
  • Virtual memory (residing on disk) is cached by
    main memory
  • Main memory is cached by system cache
  • All memory transfers are only between consecutive
    levels (e.g. VM to main memory, main memory to
    cache).

46
Cache vs. VM
  • The concept behind VM is almost identical to the
    concept behind caches.
  • But the terminology is different!
  • Cache block <-> VM page
  • Cache miss <-> VM page fault
  • Caches are implemented completely in hardware; VM
    is implemented in software, with hardware support
    from the CPU.
  • Cache speeds up main memory access, while main
    memory speeds up VM access.

47
Technical Issues of VM
  • Cache misses are relatively cheap to remedy:
  • The miss penalty is essentially the time taken to
    access main memory (around 60-80 ns).
  • The pipeline freezes for about 60-80 cycles.
  • Page faults are EXPENSIVE!
  • The page fault penalty is the time taken to
    access the disk.
  • This may take 50 ms or more, depending on the
    speed of the disk and I/O bus.
  • This wastes millions of processor cycles!
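To see why, a rough worked figure using the slide's own numbers: a 60-80 ns miss penalty freezing the pipeline for 60-80 cycles implies roughly 1 ns per cycle, so a 50 ms page fault costs on the order of 50 ms / 1 ns = 50,000,000 cycles.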

48
Virtual Memory Design
  • Because page fault penalties are so heavy, it is
    not practical to use direct-mapped or
    set-associative placement:
  • These have poorer hit rates.
  • Main memory caching of VM is therefore always
    fully associative.
  • Full associativity gives only a 1% or 2%
    improvement in hit rate over direct-mapped or
    set-associative designs.
  • But with such heavy page fault penalties, a 1%
    improvement is A LOT!
  • It is also relatively cheap to implement full
    associativity in software.

49
Summary
  • Memory can be thought of as pigeon holes where
    CPU stores instructions and data.
  • Each pigeon hole (memory location) is given a
    number called its address.
  • Memory technology can be cheap and slow (DRAM) or
    fast and expensive (SRAM)
  • Locality allows us to use a small amount of fast
    expensive memory to store parts of the cheap and
    slow memory to improve performance.
  • Caches are organized into blocks.

50
Summary
  • Mapping between memory addresses and cache
    blocks can be accomplished by:
  • Directly mapping a memory location to a cache
    block (direct mapped)
  • Slotting a memory location into any block (fully
    associative)
  • Mapping a memory location to a set of blocks,
    then slotting it into any block within the set
    (set associative)
  • Virtual memory uses disk space as main memory,
    DRAM main memory as a cache for the disk, and
    SRAM as a cache for the DRAM.