Internal Memory presentation

Title: Internal Memory

1
Computer Organization and Architecture
Chapter 4 Cache Memory
2
Topics

Computer Memory System Overview
Memory Hierarchy
Cache Memory Principles
Elements of Cache Design
Pentium and PowerPC Cache

3
Computer Memory System Overview

Characteristics of memory systems
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organization

4
Location

CPU: registers, control memory
Internal: main memory
External: secondary memory

5
Capacity

Expressed in words or bytes
1 byte = 8 bits
Word size
The natural unit of organization
Common sizes are 8, 16, and 32 bits; 64-bit words are also used

6
Unit of Transfer

Number of data elements transferred at a time
Internal
Usually governed by data bus width
External
Usually a block which is much larger than a word

7
Addressable Unit

Smallest location which can be uniquely addressed
Word or byte
E.g., Motorola 68000
word: 16 bits
internal transfer unit: 16 bits
addressable unit: 8 bits (byte addressable)
Let A = address length in bits
N = number of addressable units
→ $2^A = N$
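The relation $2^A = N$ can be turned around to ask how many address bits a given number of addressable units needs. A minimal Python sketch (the helper name `address_bits` is hypothetical) computes the smallest A with $2^A \ge N$ using exact integer arithmetic:

```python
def address_bits(n_units: int) -> int:
    """Smallest address length A (in bits) such that 2**A >= n_units."""
    if n_units < 1:
        raise ValueError("need at least one addressable unit")
    # (n - 1).bit_length() equals ceil(log2(n)) for integers, with no float rounding
    return (n_units - 1).bit_length()

# 16 MB of byte-addressable memory needs 24-bit addresses
print(address_bits(16 * 2**20))  # 24
```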

8
Access Methods (1)

Sequential
Data does not have a unique address
Start at the beginning and read through in order
Must read intermediate data items until the desired item is found
Access time depends on location of data and
previous location
e.g. tape

9
Access Methods (2)

Direct
Individual blocks have unique addresses
Access is by jumping to the vicinity, then searching sequentially
Access time depends on location and previous
location
e.g. disk

10
Access Methods (3)

Random
Individual addresses identify locations exactly
Location can be selected randomly and addressed
and accessed directly
Access time is independent of location or
previous access (i.e., constant)
e.g. RAM

11
Access Methods (4)

Associative
A variation of random access
Data is located by a comparison with contents of
a portion of the store
All words are searched simultaneously
Access time is independent of location or
previous access
e.g. cache

12
Performance (1)

Access time
Time between presenting the address and getting
the valid data
For random-access memory: time to address the data unit and perform the transfer
For non-random-access memory: time to position the hardware mechanism at the desired location
Memory Cycle time
Primarily applied to random access memory
Additional time may be required for the memory to recover before the next access
Cycle time = access time + recovery time

13
Performance (2)

Transfer rate, R (bps)
Rate at which data can be transferred into or out of memory
For random-access memory, $R = 1/(\text{memory cycle time})$
For non-random-access memory, $T_N = T_A + N/R$, where
$T_N$ = average time to read or write N bits
$T_A$ = average access time
N = number of bits
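The non-random-access formula $T_N = T_A + N/R$ can be evaluated directly. A minimal Python sketch; the device figures (5 ms access time, 1 Mbps rate, 4096-bit record) are hypothetical values chosen only for illustration:

```python
def transfer_time(t_access: float, n_bits: int, rate_bps: float) -> float:
    """T_N = T_A + N / R: average time to read or write N bits (seconds)."""
    return t_access + n_bits / rate_bps

# Hypothetical device: 5 ms average access time, 1 Mbps transfer rate, 4096-bit record
t = transfer_time(5e-3, 4096, 1e6)
print(f"{t * 1e3:.3f} ms")  # 9.096 ms
```

For such devices the positioning time $T_A$ dominates small transfers, which is why larger transfers amortize it better.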

14
Physical Types

Semiconductor
RAM
Magnetic
Disk, tape
Optical
CD, DVD

15
Physical Characteristics

Decay
Volatility
Erasability
Power consumption

16
Organization

Physical arrangement of bits into words
Not always obvious
e.g. interleaved

17
The Bottom Line

How much?
Capacity
How fast?
Time is money
How expensive?
Cost/bit

18
Memory Hierarchy (1)

Major design objective of memory systems
Provision of adequate storage capacity at
an acceptable level of performance
a reasonable cost
Memory technologies
Smaller access time → greater cost/bit
Greater capacity → smaller cost/bit
Greater capacity → greater access time
→ DILEMMA
→ Solution: MEMORY HIERARCHY

19
(No Transcript)
20
Memory Hierarchy (2)

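If
Memory is organized according to A) to C), i.e., going down the hierarchy: A) decreasing cost/bit, B) increasing capacity, C) increasing access time
Data and instructions are distributed according to D): decreasing frequency of access by the processor going down the hierarchy
then
Overall cost is reduced
Level of performance is maintained
How can we validate D)?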

21
Locality of Reference (1)

Basis for validity of D)
During the course of the execution of a program,
memory references tend to cluster
Examples?
Over a long period of time, the clusters in use migrate from one locality to another
Over a short period of time, fixed clusters are used primarily
Current locality kept in high-speed memory
→ average access time reduced

22
Locality of Reference (2)

Spatial locality
Tendency of execution to involve a number of
memory locations that are clustered
E.g., sequential instruction access, subroutines,
arrays, tables
Temporal locality
Tendency to access memory locations that have
been used recently
E.g., iteration loops
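Spatial locality can be seen in an address trace. A minimal Python sketch, assuming a hypothetical 8×8 array of 4-byte integers stored row by row and 16-byte blocks: it measures how often an access lands in the same block as the access just before it, for a row-wise and a column-wise traversal:

```python
def same_block_fraction(addresses, block_size):
    """Fraction of accesses that fall in the same block as the previous access."""
    same = sum(1 for prev, cur in zip(addresses, addresses[1:])
               if prev // block_size == cur // block_size)
    return same / (len(addresses) - 1)

ROWS, COLS, ELEM = 8, 8, 4  # hypothetical 8x8 array of 4-byte ints, row-major layout

def addr(r, c):
    return (r * COLS + c) * ELEM

row_wise = [addr(r, c) for r in range(ROWS) for c in range(COLS)]  # inner loop walks a row
col_wise = [addr(r, c) for c in range(COLS) for r in range(ROWS)]  # inner loop jumps 32 bytes

print(same_block_fraction(row_wise, 16))  # 48/63, about 0.76
print(same_block_fraction(col_wise, 16))  # 0.0
```

Both loops touch exactly the same addresses; only the order differs, and only the row-wise order exploits spatial locality.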

23
Typical Memory Hierarchy

Registers
In CPU
Internal or Main memory
May include one or more levels of cache
RAM
External memory
Backing store

24
Hierarchy List

Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape

25
Performance example (1)

Assume 2-level memory system
Level 1 access time: $T_1$
Level 2 access time: $T_2$
Hit ratio, H = fraction of time a reference can be found in level 1
Average access time:
$$T_{ave} = P(\text{found in level 1}) \times T(\text{found in level 1}) + P(\text{not found in level 1}) \times T(\text{not found in level 1})$$
$$= H \times T_1 + (1 - H) \times (T_1 + T_2) = T_1 + (1 - H)\,T_2$$

26
Performance example (2)

Assume 2-level memory system
Level 1 access time: $T_1 = 1\ \mu s$
Level 2 access time: $T_2 = 10\ \mu s$
Hit ratio: $H = 95\%$
Average access time:
$$T_{ave} = H \times T_1 + (1 - H) \times (T_1 + T_2) = 0.95 \times 1 + (1 - 0.95) \times (1 + 10) = 0.95 + 0.05 \times 11 = 1.5\ \mu s$$

27
Performance example (3)
Higher hit ratio → better performance
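The two-level formula and the numeric example can be checked with a short Python sketch (the function name is hypothetical). It assumes, as the derivation does, that a miss costs $T_1 + T_2$, and sweeps H to show the effect of the hit ratio:

```python
def avg_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Two-level average access time: H*T1 + (1-H)*(T1+T2), which equals T1 + (1-H)*T2."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Slide values: T1 = 1 us, T2 = 10 us, H = 0.95
print(round(avg_access_time(0.95, 1.0, 10.0), 3))  # 1.5

# Sweeping H: the average approaches T1 as the hit ratio approaches 1
for h in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(h, round(avg_access_time(h, 1.0, 10.0), 3))
```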
28
So you want Speed?

It is possible to build a computer that uses only static RAM (the technology used for cache)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount
Stick with memory hierarchy!

29
Cache Memory Principles

Objective
High speed
Large memory size
Less expensive memory system
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module

30
Cache and Main Memory

Cache contains a copy of portions of main memory
Cache: smaller, faster; main memory: larger, slower

31
Cache operation - overview

Consider READ operation
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main
memory to cache
Then deliver from cache to CPU
Q: Why deliver a whole block into the cache?
Cache includes tags to identify which block of
main memory is in each cache slot
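The read path above can be sketched in Python. This is a toy model under stated assumptions: a hypothetical `SimpleCache` with 4-word blocks, unlimited capacity and no replacement (the mapping slides deal with placement). On a miss the whole block is copied from main memory, and the word is then served from the cache:

```python
BLOCK_SIZE = 4  # words per block (hypothetical)

class SimpleCache:
    """Toy read path: on a miss, copy the whole block from main memory, then serve from cache."""
    def __init__(self, memory):
        self.memory = memory  # main memory modelled as a list of words
        self.blocks = {}      # block number (acting as the tag) -> list of words
        self.hits = self.misses = 0

    def read(self, address):
        block, offset = divmod(address, BLOCK_SIZE)
        if block in self.blocks:
            self.hits += 1    # present: served from the cache (fast)
        else:
            self.misses += 1  # absent: read the required block from main memory
            start = block * BLOCK_SIZE
            self.blocks[block] = self.memory[start:start + BLOCK_SIZE]
        return self.blocks[block][offset]  # deliver from cache to CPU

cache = SimpleCache(list(range(100, 164)))
values = [cache.read(a) for a in range(8)]  # eight sequential reads
print(cache.hits, cache.misses)  # 6 2
```

With sequential reads, only the first word of each block misses; the remaining words of the block are already in the cache.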

32
Typical Cache Organization
33
Cache/Main-Memory Structure

Memory
$2^n$ addressable words
each word has a unique n-bit address
M fixed-length blocks of K words each → $M = 2^n / K$
Cache
C slots (lines) of K words each
$C \ll M$

34
Cache/Main-Memory Structure

At any time, some subset of blocks resides in
lines
As $C \ll M$, each line includes a tag indicating
which block is being stored
tag is a portion of an address

35
36
Elements of Cache Design

Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches

37
Size does matter

Usually 1 KB to 512 KB
Cost
More cache is more expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time

38
Mapping Function

Algorithms for mapping main memory blocks to
cache lines
Needed, as $C \ll M$
Approaches
Direct
Associative
Set associative

39
Mapping Function Example

Cache of 64 KBytes
Cache block of 4 bytes
i.e., cache is 16K ($2^{14}$) lines of 4 bytes (why?)
16 MBytes main memory, byte addressable
24-bit address
($2^{24} = 16M$)
4M blocks
$C = 16K$, $M = 4M$, $C \ll M$
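The line and block counts in this example follow from simple division by the block size. A short Python sketch of that arithmetic (all variable names are illustrative):

```python
KB, MB = 2**10, 2**20

cache_bytes, block_bytes = 64 * KB, 4
memory_bytes = 16 * MB                  # byte-addressable main memory

C = cache_bytes // block_bytes          # number of cache lines
M = memory_bytes // block_bytes         # number of main-memory blocks
address_bits = (memory_bytes - 1).bit_length()

print(C, C.bit_length() - 1)            # 16384 14
print(M, M.bit_length() - 1)            # 4194304 22
print(address_bits)                     # 24
```

For powers of two, `bit_length() - 1` is the base-2 logarithm, so the cache has $2^{14}$ lines and memory has $2^{22}$ blocks.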

40
Direct Mapping (1)

Each block of main memory maps to only one
possible cache line
i.e. if a block is in cache, it must be in one
specific place
Mapping
$i = j \bmod m$, where
i = cache line number
j = main memory block number
m = number of lines (i.e., C)

41
Direct Mapping (2)

Example of mapping 16 blocks to 4 lines
Line 0: blocks 0, 4, 8, 12
Line 1: blocks 1, 5, 9, 13
Line 2: blocks 2, 6, 10, 14
Line 3: blocks 3, 7, 11, 15
Which block (in the line)?
No two blocks in the same line have the same Tag field in their address
Check contents of cache by finding the line and then checking the Tag
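The mapping $i = j \bmod m$ reproduces the 16-block, 4-line table. A minimal Python sketch (the helper name `direct_line` is hypothetical):

```python
def direct_line(block: int, num_lines: int) -> int:
    """Direct mapping: block j can only go to line i = j mod m."""
    return block % num_lines

m = 4
table = {i: [j for j in range(16) if direct_line(j, m) == i] for i in range(m)}
print(table)  # {0: [0, 4, 8, 12], 1: [1, 5, 9, 13], 2: [2, 6, 10, 14], 3: [3, 7, 11, 15]}
```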

42
Direct Mapping - Address Structure

Address is in 3 fields
Least significant w bits identify a unique word within a block (or line)
Most significant s bits specify one memory block
The MSBs are split into
a cache line field of r bits, where $m = 2^r$ (i.e., $C = 2^r$)
a tag of s - r (most significant) bits

43
Direct Mapping Cache Line Table

Cache line: main memory blocks held
0: $0, m, 2m, \ldots, 2^s - m$
1: $1, m + 1, 2m + 1, \ldots, 2^s - m + 1$
...
m - 1: $m - 1, 2m - 1, 3m - 1, \ldots, 2^s - 1$

44
Direct Mapping Cache Organization
45
Direct Mapping Example (1)
Address fields: Tag (s - r) = 8 bits | Line or slot (r) = 14 bits | Word (w) = 2 bits

24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (22 - 14)
14-bit slot or line
Again
No two blocks in the same line have the same Tag field
Check contents of cache by finding the line and checking the Tag

46
(No Transcript)
47
Direct Mapping Example (2)

Q1: Where in the cache is the word from main memory location 16339D mapped?
16339D = 0001 0110 | 0011 0011 1001 11 | 01 (Tag 8 bits | Line 14 bits | Word 2 bits)
Line bits regrouped as hex: 00 1100 1110 0111 = 0CE7
Ans: Line 0CE7, Tag 16, word offset 1
Q2: Where in the cache is the word from main memory location ABCDEF mapped?
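The field split for Q1 can be done with shifts and masks. A minimal Python sketch for this example's 8/14/2-bit format (the helper name `split_direct` is hypothetical); it can also be used to check an answer to Q2:

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # 24-bit address from the example

def split_direct(address: int):
    """Split a 24-bit address into (tag, line, word) for the direct-mapped example cache."""
    if not 0 <= address < 1 << (TAG_BITS + LINE_BITS + WORD_BITS):
        raise ValueError("address out of range")
    word = address & ((1 << WORD_BITS) - 1)                  # least significant w bits
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # next r bits
    tag = address >> (WORD_BITS + LINE_BITS)                 # remaining s - r bits
    return tag, line, word

tag, line, word = split_direct(0x16339D)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=1
```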
48
Direct Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
Number of lines in cache = $m = 2^r$
Size of tag = (s - r) bits

49
Direct Mapping: Pros and Cons

Advantages
Simple
Inexpensive to implement
Disadvantage
Fixed location for a given block
→ If a program repeatedly accesses 2 blocks that map to the same line, the miss rate is very high
→ These blocks will be continually swapped in and out → hit ratio will be low
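The swapping effect can be simulated on a trace of block numbers. A minimal Python sketch (the function name is hypothetical) of a direct-mapped cache with 4 lines: blocks 0 and 4 both map to line 0, so alternating between them never hits, while blocks 0 and 1 coexist:

```python
def direct_mapped_hits(blocks, num_lines):
    """Simulate a direct-mapped cache on a sequence of block numbers; return (hits, misses)."""
    lines = [None] * num_lines  # block currently held by each line
    hits = misses = 0
    for j in blocks:
        i = j % num_lines       # the only line block j may occupy
        if lines[i] == j:
            hits += 1
        else:
            misses += 1
            lines[i] = j        # evict whatever was in that line
    return hits, misses

print(direct_mapped_hits([0, 4] * 10, 4))  # (0, 20)
print(direct_mapped_hits([0, 1] * 10, 4))  # (18, 2)
```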

50
Associative Mapping

A main memory block can load into any line of
cache
Memory address is interpreted as tag and word
Tag uniquely identifies block of memory
Every line's tag is examined for a match
Cache searching gets expensive
must simultaneously examine every line's tag for a match

51
Fully Associative Cache Organization
52
53
Associative Mapping: Address Structure Example
Address fields: Tag (22 bits) | Word (2 bits)

24-bit address
22-bit tag stored with each 32-bit block of data
Compare tag field with tag entry in cache to check for hit
Least significant 2 bits of address identify which byte is required from the 32-bit data block

54
Associative Mapping Example
Address fields: Tag (22 bits) | Word (2 bits)

Address | Tag | Cache line | Offset | Data
FFFFFC | 3FFFFF | 3FFF | 00 | 24
16339D | 058CE7 | 0001 | 01 | DC
ABCDEF | ? | ? | ? | ?
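In associative mapping the tag is simply the block number, and a lookup must compare it against every stored tag. A minimal Python sketch with hypothetical names `split_assoc` and `lookup`; the hardware does all comparisons in parallel, whereas this loop does them one at a time:

```python
WORD_BITS = 2  # 24-bit address = 22-bit tag + 2-bit byte offset

def split_assoc(address: int):
    """Fully associative mapping: the tag is the whole 22-bit block number."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)

def lookup(cache_tags, address):
    """Return the line whose stored tag matches, or None on a miss."""
    tag, _ = split_assoc(address)
    for line, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return line
    return None

for a in (0xFFFFFC, 0x16339D):
    tag, offset = split_assoc(a)
    print(f"{a:06X}: tag={tag:06X} offset={offset:02b}")
# FFFFFC: tag=3FFFFF offset=00
# 16339D: tag=058CE7 offset=01

# Any line may hold any block: place block 058CE7 in (arbitrary) line 5 of an 8-line cache
cache_tags = [None] * 8
cache_tags[5] = 0x058CE7
print(lookup(cache_tags, 0x16339D), lookup(cache_tags, 0xFFFFFC))  # 5 None
```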

55
Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
Number of lines in cache = not determined by the address format
Size of tag = s bits

56
Associative Mapping: Pros and Cons

Advantage
Flexible
Disadvantages
Cost
Complex circuit for simultaneous comparison

57
Set Associative Mapping

Compromise between the previous two
Cache is divided into v sets of k lines each
$m = v \times k$, where m = number of lines
$i = j \bmod v$, where
i = cache set number
j = main memory block number
A given block maps to any line in a given set
k-way set-associative cache
2-way and 4-way are common

58
Set Associative Mapping Example

m = 16 lines, v = 8 sets
→ k = 2 lines/set: 2-way set-associative mapping
Assume 32 blocks in memory, $i = j \bmod v$
Set 0: blocks 0, 8, 16, 24
Set 1: blocks 1, 9, 17, 25
...
Set 7: blocks 7, 15, 23, 31
A given block can be in one of 2 lines in only one set
e.g., block 17 can be assigned to either line 0 or line 1 in set 1
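The set assignment $i = j \bmod v$ for this example can be checked in Python. A minimal sketch (the helper name `set_index` is hypothetical):

```python
def set_index(block: int, num_sets: int) -> int:
    """Set-associative mapping: block j goes to set i = j mod v, then to any line of that set."""
    return block % num_sets

m, v = 16, 8   # lines and sets from the example
k = m // v     # lines per set
sets = {i: [j for j in range(32) if set_index(j, v) == i] for i in range(v)}
print(k)                          # 2
print(sets[0], sets[1], sets[7])  # [0, 8, 16, 24] [1, 9, 17, 25] [7, 15, 23, 31]
print(set_index(17, v))           # 1
```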

59
Set Associative Mapping: Address Structure
Address fields: Tag (s - d bits) | Set (d bits) | Word (w bits)

d bits: $v = 2^d$, specify one of v sets
s bits specify one of $2^s$ blocks
Use the set field to determine which cache set to look in
Compare the tag field with the k tags in that set simultaneously to see if we have a hit

60
k-Way Set Associative Cache Organization
61
Set Associative Mapping: Example
Address fields: Tag (9 bits) | Set (13 bits) | Word (2 bits)

Same example, 2-way set associative
$2^{14}$ lines, 2 lines/set → $2^{13}$ sets → $2^9$ blocks map to each set and can be loaded into either of its two lines
Each block mapped into a set has a unique tag
E.g.:
Address | Tag, remaining 15 bits | Tag | Set | Offset | Data
FFFFFF | 1FF 7FFF | 1FF | 1FFF | 11 | 68
16339D | 02C 339D | 02C | 0CE7 | 01 | DC
ABCDEF | ? | ? | ? | ? | ?
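The 9/13/2-bit split can be reproduced with shifts and masks. A minimal Python sketch (the helper name `split_set_assoc` is hypothetical); it reproduces the two filled rows and can be used to check the ABCDEF row:

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # 24-bit address, 2-way example

def split_set_assoc(address: int):
    """Split an address into (tag, set, word) for the 2-way set-associative example."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

for a in (0xFFFFFF, 0x16339D):
    tag, s, w = split_set_assoc(a)
    print(f"{a:06X}: tag={tag:03X} set={s:04X} word={w:02b}")
# FFFFFF: tag=1FF set=1FFF word=11
# 16339D: tag=02C set=0CE7 word=01
```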

62
(No Transcript)
63
Set Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^s$
Number of lines in set = k
Number of sets = $v = 2^d$
Number of lines in cache = $kv = k \times 2^d$
Size of tag = (s - d) bits

64
Remarks

Why is the simultaneous comparison cheaper here than in associative mapping?
Tag is much smaller
Only k tags within a set are compared
Relationship between set-associative mapping and the two extreme cases
k = 1 → v = m → direct mapping (1 line/set)
k = m → v = 1 → associative mapping (one big set)

65
Replacement Algorithms (1): Direct Mapping

Replacement algorithm
When a new block is brought into the cache, one of the existing blocks must be replaced
Direct Mapping
No choice
Each block only maps to one line
Replace that line

66
Replacement Algorithms (2): Associative and Set Associative

Hardware-implemented algorithm (for speed)
Least recently used (LRU)
e.g., in 2-way set associative
Which of the 2 blocks is LRU?
First in first out (FIFO)
replace the block that has been in the cache longest
Least frequently used (LFU)
replace the block that has had the fewest hits
Random
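LRU in a k-way set-associative cache can be sketched with Python's `collections.OrderedDict`, one per set, keeping tags ordered from least to most recently used. The class name and the block trace are hypothetical. Running the same trace with k = 1, 2 and 4 on a 4-line cache also shows the two extreme cases: k = 1 behaves like direct mapping and k = m like fully associative mapping:

```python
from collections import OrderedDict

class SetAssociativeLRU:
    """k-way set-associative cache over block numbers, with LRU replacement in each set."""

    def __init__(self, num_lines: int, k: int):
        if num_lines % k:
            raise ValueError("num_lines must be a multiple of k")
        self.k = k
        self.v = num_lines // k  # number of sets
        # one OrderedDict per set: tags ordered from least to most recently used
        self.sets = [OrderedDict() for _ in range(self.v)]
        self.hits = self.misses = 0

    def access(self, block: int) -> bool:
        s = self.sets[block % self.v]  # set i = j mod v
        tag = block // self.v
        if tag in s:
            s.move_to_end(tag)         # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) == self.k:
            s.popitem(last=False)      # evict the least recently used tag
        s[tag] = None
        return False

trace = [0, 4, 0, 4, 8, 0]  # hypothetical trace; blocks 0, 4, 8 collide when k = 1
results = {}
for k in (1, 2, 4):
    cache = SetAssociativeLRU(num_lines=4, k=k)
    for block in trace:
        cache.access(block)
    results[k] = (cache.hits, cache.misses)
print(results)  # {1: (0, 6), 2: (2, 4), 4: (3, 3)}
```

Real hardware tracks LRU with a few status bits per set (for 2-way, a single USE bit suffices) rather than an ordered structure.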

67
Write Policy

Must not overwrite a cache block unless main
memory is up to date
Multiple CPUs may have individual caches
I/O may address main memory directly

68
Write through

All writes go to main memory as well as cache
Both copies always agree
Multiple CPUs can monitor main memory traffic to
keep local (to CPU) cache up to date
Disadvantage
Lots of traffic → bottleneck

69
Write back

Updates initially made in cache only
Update bit for cache slot is set when update
occurs
If block is to be replaced, write to main memory
only if update bit is set, i.e., only if the
cache line is dirty, i.e., only if at least
one word in the cache line is updated
Other caches get out of sync
I/O must access main memory through cache
N.B. about 15% of memory references are writes
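The traffic difference between the two policies can be counted on a small trace. A minimal Python sketch under stated assumptions: a hypothetical direct-mapped cache of 4 lines, write-allocate on a write miss, and dirty lines flushed at the end of the trace. Note that it counts write transfers to main memory: one word per write for write-through, one whole block per dirty eviction for write-back:

```python
def count_memory_writes(trace, num_lines, policy):
    """Count write transfers to main memory for a direct-mapped cache.
    trace: list of (op, block) with op 'R' or 'W'; write-allocate is assumed."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    lines = [None] * num_lines   # block held by each line
    dirty = [False] * num_lines  # the update ("dirty") bit of each line
    writes = 0
    for op, block in trace:
        i = block % num_lines
        if lines[i] != block:                        # miss: replace the line
            if policy == "write-back" and dirty[i]:
                writes += 1                          # write the dirty victim back once
            lines[i], dirty[i] = block, False
        if op == "W":
            if policy == "write-through":
                writes += 1                          # every write also goes to main memory
            else:
                dirty[i] = True                      # write-back: only set the update bit
    if policy == "write-back":
        writes += sum(dirty)                         # flush lines still dirty at the end
    return writes

trace = [("W", 0)] * 10 + [("R", 4)]  # ten writes to block 0, then block 4 evicts it
print(count_memory_writes(trace, 4, "write-through"))  # 10
print(count_memory_writes(trace, 4, "write-back"))     # 1
```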

70
Block Size

Block size = line size
As block size increases from very small
→ hit ratio increases because of the principle of locality
As block size becomes very large
→ hit ratio decreases as
Number of blocks decreases
Probability of referencing all words in a block
decreases
4 - 8 addressable units is reasonable

71
Number of Caches

Two aspects
Number of levels
Unified vs. split

72
Multilevel Caches

Modern CPU has on-chip cache (L1) that increases
overall performance
e.g., 80486: 8 KB
Pentium: 16 KB
PowerPC: up to 64 KB
Secondary, off-chip cache (L2) provides high-speed access to main memory
Generally 512 KB or less

73
Unified vs. Split

Unified cache
Stores data and instructions in one cache
Flexible and can balance the load between data
and instruction fetches
→ higher hit ratio
Only one cache to design and implement
Split cache
Two caches, one for data and one for instructions
Trend toward split cache
Good for superscalar machines that support
parallel execution, prefetch, and pipelining
Eliminates contention for the cache between instruction fetch/decode and execution

74
Pentium 4 Cache

80386: no on-chip cache
80486: single on-chip cache of 8 KB, using 16-byte lines and a four-way set-associative organization
Pentium (all versions): two on-chip L1 caches
Data and instructions
Pentium 4 L1 caches
8 KB
64-byte lines
four-way set associative
L2 cache
Feeding both L1 caches
256 KB
128-byte lines
8-way set associative
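The cache parameters above determine the number of lines and sets and the widths of the offset and set fields. A minimal Python sketch (the helper name `cache_geometry` is hypothetical), assuming all sizes are powers of two:

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int):
    """Return (lines, sets, offset_bits, set_bits) for a set-associative cache.
    Assumes size, line size and associativity are powers of two."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    set_bits = sets.bit_length() - 1           # log2(number of sets)
    return lines, sets, offset_bits, set_bits

print(cache_geometry(8 * 1024, 64, 4))     # Pentium 4 L1: (128, 32, 6, 5)
print(cache_geometry(256 * 1024, 128, 8))  # Pentium 4 L2: (2048, 256, 7, 8)
```

With ways = 1 the same function describes the earlier direct-mapped example: 64 KB with 4-byte lines gives $2^{14}$ lines and a 14-bit line field.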

75
Pentium 4 Diagram (Simplified)
76
Pentium 4 Core Processor

Fetch/Decode Unit
Fetches instructions from L2 cache
Decode into micro-ops
Store micro-ops in L1 cache
Out-of-order execution logic
Schedules micro-ops
Based on data dependence and resources
May speculatively execute
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem
L2 cache and system bus

77
Pentium 4 Design Reasoning

Decodes instructions into RISC-like micro-ops before the L1 cache
Micro-ops are fixed length
Superscalar pipelining and scheduling
Pentium instructions are long and complex
Performance improved by separating decoding from scheduling and pipelining
(More later, Chapter 14)
Data cache is write-back
Can be configured to be write-through
L1 cache controlled by 2 bits in a control register (CR0)
CD = cache disable
NW = not write-through
2 instructions: invalidate (flush) the cache (INVD); write back then invalidate (WBINVD)

78
Power PC Cache Organization

601: 1 × 32 KB, 8-way set associative, 32 bytes/line
603: 2 × 8 KB, 2-way set associative, 32 bytes/line
604: 2 × 16 KB, 4-way set associative, 32 bytes/line
620: 2 × 32 KB, 8-way set associative, 64 bytes/line
G3 and G4
2 × 32 KB L1 cache
8-way set associative
G3: 64 bytes/line, G4: 32 bytes/line
256 KB, 512 KB or 1 MB L2 cache
two-way set associative

79
PowerPC G4
80
Comparison of Cache Sizes
