Internal Memory - PowerPoint PPT Presentation

Title:

Internal Memory

Description:

Computer Organization and Architecture Chapter 4 Cache Memory Topics Computer Memory System Overview Memory Hierarchy Cache Memory Principles Elements of Cache Design ... – PowerPoint PPT presentation

Slides: 81
Provided by: Adria189
Tags: cache | internal | memory

Transcript and Presenter's Notes

1
Computer Organization and Architecture
Chapter 4 Cache Memory
2
Topics
  • Computer Memory System Overview
  • Memory Hierarchy
  • Cache Memory Principles
  • Elements of Cache Design
  • Pentium and PowerPC Cache

3
Computer Memory System Overview
  • Characteristics of memory systems
  • Location
  • Capacity
  • Unit of transfer
  • Access method
  • Performance
  • Physical type
  • Physical characteristics
  • Organization

4
Location
  • CPU registers, control memory
  • Internal main memory
  • External secondary memory

5
Capacity
  • In terms of words or bytes
  • 1 byte = 8 bits
  • Word size
  • The natural unit of organization
  • Sizes of 8, 16, and 32 bits are common; 64 bits is also used

6
Unit of Transfer
  • Number of data elements transferred at a time
  • Internal
  • Usually governed by data bus width
  • External
  • Usually a block which is much larger than a word

7
Addressable Unit
  • Smallest location which can be uniquely addressed
  • Word or byte
  • E.g., Motorola 68000
  • word = 16 bits
  • internal transfer unit = 16 bits
  • addressable unit = 8 bits (byte addressable)
  • Let $A$ = address length in bits
  • $N$ = number of addressable units
  • ⇒ $2^A = N$

8
Access Methods (1)
  • Sequential
  • Data does not have a unique address
  • Start at the beginning and read through in order
  • Must read intermediate data items until the
    desired item found
  • Access time depends on location of data and
    previous location
  • e.g. tape

9
Access Methods (2)
  • Direct
  • Individual blocks have unique addresses
  • Access is by jumping to vicinity plus sequential
    search
  • Access time depends on location and previous
    location
  • e.g. disk

10
Access Methods (3)
  • Random
  • Individual addresses identify locations exactly
  • Location can be selected randomly and addressed
    and accessed directly
  • Access time is independent of location or
    previous access (i.e., constant)
  • e.g. RAM

11
Access Methods (4)
  • Associative
  • A variation of random access
  • Data is located by a comparison with contents of
    a portion of the store
  • All words are searched simultaneously
  • Access time is independent of location or
    previous access
  • e.g. cache

12
Performance (1)
  • Access time
  • Time between presenting the address and getting
    the valid data
  • For random access memory: time to address the data
    unit and perform the transfer
  • For non-random access memory: time to position the
    hardware mechanism at the desired position
  • Memory cycle time
  • Primarily applied to random access memory
  • Time may be required for the memory to recover
    before next access
  • Cycle time = access time + recovery time

13
Performance (2)
  • Transfer rate $R$ (bits per second)
  • Rate at which data can be transferred in/out of
    memory
  • For random access memory, $R = 1/(\text{memory cycle time})$
  • For non-random access memory, $T_N = T_A + N/R$,
    where
  • $T_N$ = average time to read or write $N$ bits
  • $T_A$ = average access time
  • $N$ = number of bits
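
The non-random access formula can be checked with a few lines of Python. This is only a sketch: the function name and the device figures (10 ms access time, 1 Mbit/s transfer rate) are made up for illustration and do not come from the slides.

```python
def transfer_time(t_access: float, n_bits: int, rate_bps: float) -> float:
    """Average time T_N = T_A + N/R to read or write n_bits (seconds)."""
    return t_access + n_bits / rate_bps

# Hypothetical device: 10 ms average access time, 1 Mbit/s transfer rate.
t = transfer_time(t_access=0.010, n_bits=8_000, rate_bps=1_000_000)
print(f"{t * 1e3:.1f} ms")  # positioning time plus 8 ms of transfer
```

For small transfers the access time dominates; for large ones the N/R term does.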

14
Physical Types
  • Semiconductor
  • RAM
  • Magnetic
  • Disk, tape
  • Optical
  • CD, DVD

15
Physical Characteristics
  • Decay
  • Volatility
  • Erasability
  • Power consumption

16
Organization
  • Physical arrangement of bits into words
  • Not always obvious
  • e.g. interleaved

17
The Bottom Line
  • How much?
  • Capacity
  • How fast?
  • Time is money
  • How expensive?
  • Cost/bit

18
Memory Hierarchy (1)
  • Major design objective of memory systems
  • Provision of adequate storage capacity at
  • an acceptable level of performance
  • a reasonable cost
  • Memory technologies
  • Smaller access time ⇒ greater cost/bit
  • Greater capacity ⇒ smaller cost/bit
  • Greater capacity ⇒ greater access time
  • ⇒ Dilemma
  • ⇒ Solution: memory hierarchy

19
(No Transcript)
20
Memory Hierarchy (2)
  • If
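  • Memory organized according to A) - C): going down the
    hierarchy, cost/bit decreases, capacity increases,
    and access time increases
  • Data and instructions distributed according to D):
    frequency of access by the CPU decreases going down
    the hierarchy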
  • then
  • Overall cost reduced
  • Level of performance maintained
  • How can we validate D)?

21
Locality of Reference (1)
  • Basis for validity of D)
  • During the course of the execution of a program,
    memory references tend to cluster
  • Examples?
  • Over a long period of time, the clusters in use
    migrate from one locality to another
  • Over a short period of time, fixed clusters are
    used primarily
  • Current locality kept in high speed memory
  • ⇒ average access time reduced

22
Locality of Reference (2)
  • Spatial locality
  • Tendency of execution to involve a number of
    memory locations that are clustered
  • E.g., sequential instruction access, subroutines,
    arrays, tables
  • Temporal locality
  • Tendency to access memory locations that have
    been used recently
  • E.g., iteration loops

23
Typical Memory Hierarchy
  • Registers
  • In CPU
  • Internal or Main memory
  • May include one or more levels of cache
  • RAM
  • External memory
  • Backing store

24
Hierarchy List
  • Registers
  • L1 Cache
  • L2 Cache
  • Main memory
  • Disk cache
  • Disk
  • Optical
  • Tape

25
Performance example (1)
  • Assume 2-level memory system
  • Level 1 access time: $T_1$
  • Level 2 access time: $T_2$
  • Hit ratio $H$ = fraction of time a reference
    can be found in level 1
  • Average access time:
  • $T_{ave}$ = prob(found in level 1) × T(found in level 1)
    + prob(not found in level 1) × T(not found in level 1)
  • $= H \times T_1 + (1 - H) \times (T_1 + T_2)$
  • $= T_1 + (1 - H) T_2$

26
Performance example (2)
  • Assume 2-level memory system
  • Level 1 access time $T_1 = 1\ \mu s$
  • Level 2 access time $T_2 = 10\ \mu s$
  • Hit ratio $H = 95\%$
  • Average access time:
  • $T_{ave} = H \times T_1 + (1 - H) \times (T_1 + T_2)$
    $= 0.95 \times 1 + (1 - 0.95) \times (1 + 10)$
    $= 0.95 + 0.05 \times 11 = 1.5\ \mu s$
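
The same calculation can be repeated for other hit ratios with a short Python sketch. The function name and the extra hit ratios are chosen here for illustration; only the 95% case comes from the slide.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Two-level average access time: H*T1 + (1 - H)*(T1 + T2)."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# T1 = 1 us and T2 = 10 us as in the example; times are in microseconds.
for h in (0.80, 0.90, 0.95, 0.99):
    print(f"H = {h:.2f}: {average_access_time(h, 1.0, 10.0):.2f} us")
```

The closer H gets to 1, the closer the average gets to the level 1 access time.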

27
Performance example (3)
Higher hit ratio ⇒ better performance
28
So you want Speed?
  • It is possible to build a computer which uses
    only static RAM (the technology used for cache)
  • This would be very fast
  • This would need no cache
  • How can you cache cache?
  • This would cost a very large amount
  • Stick with memory hierarchy!

29
Cache Memory Principles
  • Objective
  • High speed
  • Large memory size
  • Less expensive memory system
  • Cache
  • Small amount of fast memory
  • Sits between normal main memory and CPU
  • May be located on CPU chip or module

30
Cache and Main Memory
  • Cache contains a copy of portions of main memory
  • Cache: smaller, faster; main memory: larger, slower

31
Cache operation - overview
  • Consider READ operation
  • CPU requests contents of memory location
  • Check cache for this data
  • If present, get from cache (fast)
  • If not present, read required block from main
    memory to cache
  • Then deliver from cache to CPU
  • Q: Why deliver a whole block into the cache?
  • Cache includes tags to identify which block of
    main memory is in each cache slot

32
Typical Cache Organization
33
Cache/Main-Memory Structure
  • Memory
  • $2^n$ addressable words
  • each word has a unique n-bit address
  • $M$ fixed length blocks of $K$ words each ⇒ $M = 2^n / K$
  • Cache
  • $C$ slots (lines) of $K$ words each
  • $C \ll M$

34
Cache/Main-Memory Structure
  • At any time, some subset of blocks resides in
    lines
  • As $C \ll M$, each line includes a tag indicating
    which block is being stored
  • tag is a portion of an address

35
(line)
36
Elements of Cache Design
  • Size
  • Mapping Function
  • Replacement Algorithm
  • Write Policy
  • Block Size
  • Number of Caches

37
Size does matter
  • Usually 1K - 512K
  • Cost
  • More cache is more expensive
  • Speed
  • More cache is faster (up to a point)
  • Checking cache for data takes time

38
Mapping Function
  • Algorithms for mapping main memory blocks to
    cache lines
  • Needed, as $C \ll M$
  • Approaches
  • Direct
  • Associative
  • Set associative

39
Mapping Function Example
  • Cache of 64KByte
  • Cache block of 4 bytes
  • i.e. cache is 16K ($2^{14}$) lines of 4 bytes (why?)
  • 16 MBytes main memory, byte addressable
  • 24 bit address
  • ($2^{24}$ = 16M)
  • 4M blocks
  • $C$ = 16K, $M$ = 4M, $C \ll M$
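
The figures in this example follow from the three sizes given. A small Python sketch of the arithmetic (variable names are chosen here, not taken from the slides):

```python
cache_bytes = 64 * 1024           # 64 KByte cache
block_bytes = 4                   # 4-byte blocks (lines)
memory_bytes = 16 * 1024 * 1024   # 16 MByte byte-addressable main memory

lines = cache_bytes // block_bytes               # C
blocks = memory_bytes // block_bytes             # M
address_bits = (memory_bytes - 1).bit_length()   # bits to address every byte
line_bits = (lines - 1).bit_length()             # bits to select a line
word_bits = (block_bytes - 1).bit_length()       # bits to select a byte in a block

print(lines, blocks, address_bits, line_bits, word_bits)
```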

40
Direct Mapping (1)
  • Each block of main memory maps to only one
    possible cache line
  • i.e. if a block is in cache, it must be in one
    specific place
  • Mapping
  • $i = j \bmod m$, where
  • $i$ = cache line number
  • $j$ = memory block number
  • $m$ = number of lines (i.e., $C$)

41
Direct Mapping (2)
  • Example of mapping 16 blocks, 4 lines
  • Line: blocks
  • 0: 0, 4, 8, 12
  • 1: 1, 5, 9, 13
  • 2: 2, 6, 10, 14
  • 3: 3, 7, 11, 15
  • Which block (in the line)?
  • No two blocks in the same line have the same Tag
    field in address
  • Check contents of cache by finding line and then
    check Tag
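
The table for 16 blocks and 4 lines can be regenerated from the mapping rule. In this sketch the function name is invented, and the tag is taken to be the block number divided by the number of lines (its upper bits), which is why no two blocks in a line share a tag.

```python
def blocks_per_line(num_blocks, num_lines):
    """Group memory block numbers by the cache line they map to."""
    table = {line: [] for line in range(num_lines)}
    for block in range(num_blocks):
        table[block % num_lines].append(block)   # i = j mod m
    return table

table = blocks_per_line(num_blocks=16, num_lines=4)
for line, blocks in table.items():
    tags = [block // 4 for block in blocks]      # upper bits of the block number
    print(f"line {line}: blocks {blocks}, tags {tags}")
```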

42
Direct Mapping - Address Structure
  • Address is in 3 fields
  • Least Significant w bits identify unique word in
    a block (or line)
  • Most Significant s bits specify one memory block
  • The MSBs are split into
  • cache line field of $r$ bits, where $m = 2^r$ (or $C = 2^r$)
  • tag of $s - r$ (most significant) bits

43
Direct Mapping Cache Line Table
  • Cache line Main Memory blocks held
  • $0$: $0$, $m$, $2m$, ..., $2^s - m$
  • $1$: $1$, $m + 1$, $2m + 1$, ..., $2^s - m + 1$
  • ...
  • $m - 1$: $m - 1$, $2m - 1$, $3m - 1$, ..., $2^s - 1$

44
Direct Mapping Cache Organization
45
Direct Mapping Example (1)
Tag (s - r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits
  • 24 bit address
  • 2 bit word identifier (4 byte block)
  • 22 bit block identifier
  • 8 bit tag (22-14)
  • 14 bit slot or line
  • Again
  • No two blocks in the same line have the same Tag
    field
  • Check contents of cache by finding line and
    checking Tag

46
(No Transcript)
47
Direct Mapping Example (2)
  • Q1: Where in cache is the word from main memory
    location 16339D mapped?
  • Tag (8 bits): 0001 0110 | Line (14 bits): 0011 0011 1001 11 | Word (2 bits): 01
  • Ans: line 0CE7, tag 16, word offset 1
  • Q2: Where in cache is the word from main memory
    location ABCDEF mapped?
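
The address breakdown for this direct mapping example (8-bit tag, 14-bit line, 2-bit word) can be done with shifts and masks. The function name below is invented for illustration; the field widths are those of the example.

```python
def split_direct(address, line_bits=14, word_bits=2):
    """Split an address into (tag, line, word) for a direct-mapped cache."""
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word

for addr in (0x16339D, 0xABCDEF):
    tag, line, word = split_direct(addr)
    print(f"{addr:06X}: tag {tag:02X}, line {line:04X}, word {word}")
```

Running it on 16339D reproduces the answer to Q1 and gives a way to check an answer to Q2.
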
48
Direct Mapping Summary
  • Address length = $(s + w)$ bits
  • Number of addressable units = $2^{s+w}$ words or bytes
  • Block size = line size = $2^w$ words or bytes
  • Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
  • Number of lines in cache = $m = 2^r$
  • Size of tag = $(s - r)$ bits

49
Direct Mapping pros & cons
  • Advantages
  • Simple
  • Inexpensive to implement
  • Disadvantage
  • Fixed location for given block
  • ⇒ If a program repeatedly accesses 2 blocks that map
    to the same line, the cache miss rate is very high
  • ⇒ These blocks will be continually swapped in
    and out ⇒ hit ratio will be low
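
The disadvantage can be seen in a tiny simulation. This sketch assumes a 4-line direct-mapped cache and two invented reference patterns; for simplicity each line stores the whole block number rather than just the tag.

```python
def direct_mapped_hits(block_refs, num_lines):
    """Count hits for a sequence of block references in a direct-mapped cache."""
    lines = [None] * num_lines           # block currently held by each line
    hits = 0
    for block in block_refs:
        index = block % num_lines        # the only line this block may use
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block         # miss: replace whatever was there
    return hits

conflicting = [0, 4] * 5   # blocks 0 and 4 both map to line 0
friendly = [0, 5] * 5      # blocks 0 and 5 map to lines 0 and 1
print(direct_mapped_hits(conflicting, 4), direct_mapped_hits(friendly, 4))
```

The conflicting pattern never hits even though three of the four lines stay empty.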

50
Associative Mapping
  • A main memory block can load into any line of
    cache
  • Memory address is interpreted as tag and word
  • Tag uniquely identifies block of memory
  • Every line's tag is examined for a match
  • Cache searching gets expensive
  • must simultaneously examine every line's tag for
    a match

51
Fully Associative Cache Organization
52
(No Transcript)
53
Associative Mapping: Address Structure Example
Tag: 22 bits | Word: 2 bits
  • 24 bit address
  • 22 bit tag stored with each 32 bit block of data
  • Compare tag field with tag entry in cache to
    check for hit
  • Least significant 2 bits of address identify
    which byte is required from 32 bit data block

54
Associative Mapping Example
Tag: 22 bits | Word: 2 bits
  Address   Tag      Cache line   Offset   Data
  FFFFFC    3FFFFF   3FFF         00       24
  16339D    058CE7   0001         01       DC
  ABCDEF    ?        ?            ?        ?
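
A minimal Python sketch of the associative lookup, assuming the 22-bit tag and 2-bit word fields of this example. The function names are invented, and the loop only models what hardware does with parallel comparators. Only the data bytes shown in the table are known, so the remaining bytes of each block are zero placeholders.

```python
def split_associative(address, word_bits=2):
    """Split an address into (tag, word offset) for associative mapping."""
    return address >> word_bits, address & ((1 << word_bits) - 1)

def lookup(lines, address):
    """Compare the tag with every line's tag; return the byte or None on a miss."""
    tag, word = split_associative(address)
    for stored_tag, block in lines:
        if stored_tag == tag:
            return block[word]      # hit
    return None                     # miss

# (tag, 4-byte block) pairs; unknown bytes are filled with 0 as placeholders.
lines = [(0x3FFFFF, [0x24, 0, 0, 0]), (0x058CE7, [0, 0xDC, 0, 0])]
print(hex(lookup(lines, 0x16339D)), lookup(lines, 0xABCDEF))
```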

55
Associative Mapping Summary
  • Address length = $(s + w)$ bits
  • Number of addressable units = $2^{s+w}$ words or bytes
  • Block size = line size = $2^w$ words or bytes
  • Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
  • Number of lines in cache = undetermined (not fixed by the address format)
  • Size of tag = $s$ bits

56
Associative Mapping pros & cons
  • Advantage
  • Flexible
  • Disadvantages
  • Cost
  • Complex circuit for simultaneous comparison

57
Set Associative Mapping
  • Compromise between the previous two
  • Cache is divided into v sets of k lines each
  • $m = v \times k$, where $m$ = number of lines
  • $i = j \bmod v$, where
  • $i$ = cache set number
  • $j$ = memory block number
  • A given block maps to any line in a given set
  • k-way set associative cache
  • 2-way and 4-way are common

58
Set Associative Mapping Example
  • $m$ = 16 lines, $v$ = 8 sets
  • ⇒ $k$ = 2 lines/set, 2-way set associative mapping
  • Assume 32 blocks in memory, $i = j \bmod v$
  • Set: blocks
  • 0: 0, 8, 16, 24
  • 1: 1, 9, 17, 25
  • ...
  • 7: 7, 15, 23, 31
  • A given block can be in one of 2 lines in only
    one set
  • e.g., block 17 can be assigned to either line 0
    or line 1 in set 1
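
The placement rule for this 2-way example can be written out directly. In this sketch the function names are invented, and a line is identified by a (set, way) pair, which is one possible numbering.

```python
def set_table(num_blocks, num_sets):
    """Group memory block numbers by the set they map to (i = j mod v)."""
    table = {s: [] for s in range(num_sets)}
    for block in range(num_blocks):
        table[block % num_sets].append(block)
    return table

def candidate_lines(block, num_sets, ways):
    """Return the (set, way) pairs where a block is allowed to reside."""
    set_index = block % num_sets
    return [(set_index, way) for way in range(ways)]

print(set_table(32, 8)[1])           # blocks that compete for set 1
print(candidate_lines(17, 8, 2))     # block 17: either line of set 1
```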

59
Set Associative Mapping: Address Structure
Tag: (s - d) bits | Set: d bits | Word: w bits
  • $d$ bits: $v = 2^d$, specify one of $v$ sets
  • $s$ bits specify one of $2^s$ blocks
  • Use set field to determine cache set to look in
  • Compare tag field simultaneously to see if we
    have a hit

60
K Way Set Associative Cache Organization
61
Set Associative Mapping: Example
Tag: 9 bits | Set: 13 bits | Word: 2 bits
  • Same example, 2-way set associative
  • $2^{14}$ lines, 2 lines/set ⇒ $2^{13}$ sets ⇒ $2^9$ blocks can
    be loaded to either of the two lines in a set
  • Each block mapped into a set has a unique tag
  • E.g. (the address is first split into the 9-bit tag
    and the remaining 15 bits):
  Address   Split (9 + 15 bits)   Tag   Set    Offset   Data
  FFFFFF    1FF 7FFF              1FF   1FFF   11       68
  16339D    02C 339D              02C   0CE7   01       DC
  ABCDEF    ?                     ?     ?      ?        ?
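
The field values in this 2-way example (9-bit tag, 13-bit set, 2-bit word) can be checked with a short Python sketch. The function name is invented for illustration.

```python
def split_set_associative(address, set_bits=13, word_bits=2):
    """Split an address into (tag, set, word) for a set-associative cache."""
    word = address & ((1 << word_bits) - 1)
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)
    return tag, set_index, word

for addr in (0xFFFFFF, 0x16339D, 0xABCDEF):
    tag, set_index, word = split_set_associative(addr)
    print(f"{addr:06X}: tag {tag:03X}, set {set_index:04X}, word {word:02b}")
```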

62
(No Transcript)
63
Set Associative Mapping Summary
  • Address length = $(s + w)$ bits
  • Number of addressable units = $2^{s+w}$ words or bytes
  • Block size = line size = $2^w$ words or bytes
  • Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
  • Number of lines in set = $k$
  • Number of sets = $v = 2^d$
  • Number of lines in cache = $kv = k \times 2^d$
  • Size of tag = $(s - d)$ bits

64
Remarks
  • Why is the simultaneous comparison cheaper here,
    compared to associative mapping?
  • Tag is much smaller
  • Only k tags within a set are compared
  • Relationship between set associative mapping and the
    first two approaches, which are its extreme cases
  • $k = 1$ ⇒ $v = m$ ⇒ direct (1 line/set)
  • $k = m$ ⇒ $v = 1$ ⇒ associative (one big set)

65
Replacement Algorithms (1): Direct Mapping
  • Replacement algorithm
  • When a new block is brought into cache, one of
    existing blocks must be replaced
  • Direct Mapping
  • No choice
  • Each block only maps to one line
  • Replace that line

66
Replacement Algorithms (2): Associative & Set Associative
  • Hardware implemented algorithm (speed)
  • Least Recently used (LRU)
  • e.g. in 2 way set associative
  • Which of the 2 blocks is LRU?
  • First in first out (FIFO)
  • replace block that has been in cache longest
  • Least frequently used
  • replace block which has had fewest hits
  • Random
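
LRU replacement in a set-associative cache can be sketched in software as follows. This is only a model: real caches implement the policy in hardware with much simpler bookkeeping, and the function name, the use of `OrderedDict`, and the reference pattern are choices made here for illustration.

```python
from collections import OrderedDict

def simulate_lru(block_refs, num_sets, ways):
    """Count hits for a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in block_refs:
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)       # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used
            lines[block] = True
    return hits

refs = [0, 8, 0, 16, 0, 8]                 # all three blocks map to set 0
print(simulate_lru(refs, num_sets=8, ways=2))
```

With `ways=1` the same function behaves like a direct-mapped cache, where there is no choice of victim.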

67
Write Policy
  • Must not overwrite a cache block unless main
    memory is up to date
  • Multiple CPUs may have individual caches
  • I/O may address main memory directly

68
Write through
  • All writes go to main memory as well as cache
  • Both copies always agree
  • Multiple CPUs can monitor main memory traffic to
    keep local (to CPU) cache up to date
  • Disadvantage
  • Lots of traffic ⇒ bottleneck

69
Write back
  • Updates initially made in cache only
  • Update bit for cache slot is set when update
    occurs
  • If block is to be replaced, write to main memory
    only if update bit is set, i.e., only if the
    cache line is dirty, i.e., only if at least
    one word in the cache line is updated
  • Other caches get out of sync
  • I/O must access main memory through cache
  • N.B. 15% of memory references are writes
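
The role of the update (dirty) bit can be illustrated with a toy model of a single cache line. Everything here is a simplifying assumption: the class and function names are invented, a block holds one value instead of several words, and nothing is fetched from memory on a write miss.

```python
class WriteBackLine:
    """One cache line holding a single block, with an update (dirty) bit."""
    def __init__(self):
        self.block = None
        self.data = None
        self.dirty = False

def write(line, block, value, main_memory):
    """Write to the cache only; flush the old block first if it is dirty."""
    writes_to_memory = 0
    if line.block != block:                       # a different block is resident
        if line.dirty:
            main_memory[line.block] = line.data   # write back on replacement
            writes_to_memory = 1
        line.block, line.dirty = block, False
    line.data, line.dirty = value, True           # update cache, set the bit
    return writes_to_memory

main_memory = {0: "a", 4: "b"}
line = WriteBackLine()
traffic = [write(line, 0, "x", main_memory),
           write(line, 0, "y", main_memory),
           write(line, 4, "z", main_memory)]
print(traffic, main_memory)
```

Two writes to block 0 cost one memory write in total, and it happens only when block 4 replaces it; until then main memory still holds the stale value, which is why other caches and I/O can get out of sync.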

70
Block Size
  • Block size = line size
  • As block size increases from very small
  • ⇒ hit ratio increases because of
  • the principle of locality
  • As block size becomes very large
  • ⇒ hit ratio decreases as
  • Number of blocks decreases
  • Probability of referencing all words in a block
    decreases
  • 4 - 8 addressable units is reasonable

71
Number of Caches
  • Two aspects
  • Number of levels
  • Unified vs. split

72
Multilevel Caches
  • Modern CPU has on-chip cache (L1) that increases
    overall performance
  • e.g., 80486: 8 KB
  • Pentium: 16 KB
  • PowerPC: up to 64 KB
  • Secondary, off-chip cache (L2) provides high
    speed access to main memory
  • Generally 512KB or less

73
Unified vs. Split
  • Unified cache
  • Stores data and instructions in one cache
  • Flexible and can balance the load between data
    and instruction fetches
  • ⇒ higher hit ratio
  • Only one cache to design and implement
  • Split cache
  • Two caches, one for data and one for instructions
  • Trend toward split cache
  • Good for superscalar machines that support
    parallel execution, prefetch, and pipelining
  • Overcome cache contention

74
Pentium 4 Cache
  • 80386: no on-chip cache
  • 80486: single on-chip cache, 8 KB, using 16 byte lines
    and four way set associative organization
  • Pentium (all versions): two on-chip L1 caches
  • Data & instructions
  • Pentium 4 L1 caches
  • 8k bytes
  • 64 byte lines
  • four way set associative
  • L2 cache
  • Feeding both L1 caches
  • 256k
  • 128 byte lines
  • 8 way set associative

75
Pentium 4 Diagram (Simplified)
76
Pentium 4 Core Processor
  • Fetch/Decode Unit
  • Fetches instructions from L2 cache
  • Decode into micro-ops
  • Store micro-ops in L1 cache
  • Out of order execution logic
  • Schedules micro-ops
  • Based on data dependence and resources
  • May speculatively execute
  • Execution units
  • Execute micro-ops
  • Data from L1 cache
  • Results in registers
  • Memory subsystem
  • L2 cache and systems bus

77
Pentium 4 Design Reasoning
  • Decodes instructions into RISC like micro-ops
    before L1 cache
  • Micro-ops fixed length
  • Superscalar pipelining and scheduling
  • Pentium instructions long & complex
  • Performance improved by separating decoding from
    scheduling & pipelining
  • (More later, ch. 14)
  • Data cache is write back
  • Can be configured to write through
  • L1 cache controlled by 2 bits in register
  • CD = cache disable
  • NW = not write through
  • 2 instructions to invalidate (flush) cache and
    write back then invalidate

78
Power PC Cache Organization
  • 601: 1 x 32 KB, 8-way set associative, 32 bytes/line
  • 603: 2 x 8 KB, 2-way set associative, 32 bytes/line
  • 604: 2 x 16 KB, 4-way set associative, 32 bytes/line
  • 620: 2 x 32 KB, 8-way set associative, 64 bytes/line
  • G3 & G4
  • 2 x 32 KB L1 cache
  • 8-way set associative
  • G3: 64 bytes/line, G4: 32 bytes/line
  • 256 KB, 512 KB or 1 MB L2 cache
  • two-way set associative

79
PowerPC G4
80
Comparison of Cache Sizes