CS 3xx Introduction to High Performance Computer Architecture: Address Accessible Memories
1
CS 3xx Introduction to High Performance Computer
Architecture Address Accessible Memories
  • A.R. Hurson
  • 325 CS Building,
  • Missouri S&T
  • hurson@mst.edu

2
Introduction to High Performance Computer
Architecture
  • Read sections 7.1-7.4

3
Introduction to High Performance Computer
Architecture
  • Memory System
  • In pursuit of improving performance, and hence
    reducing CPU time, in this section we will talk
    about the memory system.
  • The goal is to develop means to reduce m
  • and k.

4
Introduction to High Performance Computer
Architecture
  • Memory System
  • Memory Requirements for a Computer
  • An internal storage medium to store the
    intermediate as well as the final results,
  • An external storage medium to store input
    information, and
  • An external storage medium to store permanent
    results for future use.

5
Introduction to High Performance Computer
Architecture
  • Memory System
  • Different parameters can be used in order to
    classify the memory systems.
  • In the following we will use the access mode in
    order to classify memory systems
  • Access mode is defined as the way the information
    stored in the memory is accessed.

6
Introduction to High Performance Computer
Architecture
  • Memory System Access Mode
  • Address Accessible Memory: information is
    accessed by its address in the memory space.
  • Content Addressable Memory: information is
    accessed by its contents (or partial contents).

7
Introduction to High Performance Computer
Architecture
  • Memory System Access Mode
  • Within the scope of address accessible memory we
    can distinguish several sub-classes
  • Random Access Memory (RAM): Access time is
    independent of the location of the information.
  • Sequential Access Memory (SAM): Access time is a
    function of the location of the information.
  • Direct Access Memory (DAM): Access time is
    partially independent of and partially dependent
    on the location of the information.

8
Introduction to High Performance Computer
Architecture
  • Memory System Access Mode
  • Even within each subclass, we can distinguish
    several sub subclasses.
  • For example within the scope of Direct Access
    Memory we can recognize different groups
  • Movable head disk,
  • Fixed head disk,
  • Parallel disk

9
Introduction to High Performance Computer
Architecture
  • Memory System
  • Movable head disk: Each surface has just one
    read/write head. To initiate a read or write,
    the read/write head must first be positioned on
    the right track (the seek time).
  • Seek time involves a mechanical movement and is
    hence, relatively, very slow and time consuming.

10
Introduction to High Performance Computer
Architecture
  • Memory System
  • Fixed head disk: Each track has its own
    read/write head. This eliminates the seek time.
    However, this performance improvement comes at
    the expense of cost.

11
Introduction to High Performance Computer
Architecture
  • Memory System
  • Parallel disk: To respond to the growth in
    performance and capacity of semiconductor
    technology, secondary storage technology
    introduced RAID (Redundant Array of Inexpensive
    Disks).
  • In short, RAID is a large array of small
    independent disks acting as a single
    high-performance logical disk.

12
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • RAID increases performance and reliability
    through two mechanisms:
  • Data Striping
  • Redundancy

13
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Concept of data striping (distributing data
    transparently over multiple disks) is used to
    allow parallel access to the data and hence to
    improve disk performance.
  • In data striping, the data set is partitioned
    into equal size segments, and segments are
    distributed over multiple disks.
  • The size of a segment is called the striping unit.
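The striping layout described above can be made concrete: with n disks and a fixed striping unit, segment i of the data set lands on disk i mod n. A minimal sketch (the 4-disk, 4-byte striping unit, and data contents are illustrative assumptions):

```python
def stripe(data, n_disks, unit):
    """Partition data into equal-size segments and deal them
    round-robin across the disks (segment i -> disk i mod n)."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), unit):
        disks[(i // unit) % n_disks].extend(data[i:i + unit])
    return disks

data = bytes(range(16))                  # a 16-byte data set
disks = stripe(data, n_disks=4, unit=4)
# Segment 0 -> disk 0, segment 1 -> disk 1, ...
assert bytes(disks[1]) == bytes(range(4, 8))
```

Because consecutive segments sit on different disks, a large request spanning several segments can be served by all disks in parallel.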

14
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Redundant information allows reconstruction of
    data if a disk fails. There are two choices for
    storing redundant data:
  • Store redundant information on a small number of
    separate disks (check disks).
  • Distribute the redundant information uniformly
    over all disks.
  • Redundant information can be an exact duplicate
    of the data, or we can use a parity scheme:
    additional information that can be used to
    recover from failure of any one disk in the array.
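The parity scheme above can be sketched as follows: with XOR parity, the contents of any single failed disk can be rebuilt from the surviving disks plus the parity block. A minimal sketch (the three-disk array and block contents are illustrative assumptions):

```python
from functools import reduce

def parity_block(blocks):
    """XOR the given blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(survivors, parity):
    """Rebuild the one missing block: XOR of parity and all survivors."""
    return parity_block(survivors + [parity])

# Three data disks holding one block each (illustrative contents).
d0, d1, d2 = b"\x0f\xf0", b"\x33\x55", b"\xa5\x5a"
p = parity_block([d0, d1, d2])

# Disk 1 fails; its block is recovered from the others plus parity.
assert reconstruct([d0, d2], p) == d1
```

This is why a single parity disk suffices to tolerate one failure, as in levels 3 through 5 below.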

15
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 0: Striping without redundancy
  • Offers the best write performance: no redundant
    data is being updated.
  • Offers the highest space utilization.
  • Does not offer the best read performance.

16
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 0

17
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 1: Mirrored (two identical copies)
  • Each disk has a mirror image.
  • Is the most expensive solution: space
    utilization is the lowest.
  • Parallel reads are allowed.
  • Write involves two disks; in some cases this will
    be done in sequence.
  • Maximum transfer rate is equal to the transfer
    rate of one disk (no striping).

18
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 1

Mirrored
19
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 0+1: Striping and Mirroring
  • Parallel reads are allowed.
  • Space utilization is the same as level 1.
  • Write involves two disks and cost of write is the
    same as Level 1.
  • Maximum transfer rate is equal to the aggregate
    bandwidth of the disks.

20
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 2: Error Correcting Codes
  • The striping unit is a single bit.
  • Hamming coding is used as the redundancy scheme.
  • Space utilization increases as the number of data
    disks increases.
  • Maximum transfer rate is equal to the aggregate
    bandwidth of the disks: read is very efficient
    for large requests and bad for small requests
    the size of an individual block.

21
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 2

Error Correcting Codes
22
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 3: Bit-Interleaved Parity
  • The striping unit is a single bit.
  • Unlike level 2, the check disk is just a
    single parity disk, and hence it offers higher
    space utilization than level 2.
  • Write protocol is similar to level 2: a
    read-modify-write cycle.
  • Similar to level 2, it can process one I/O at a
    time: each read and write request involves all
    disks.

23
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 3

24
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 4: Block-Interleaved Parity
  • The striping unit is a disk block.
  • Read requests the size of a single block can
    be served by just one disk.
  • Parallel reads are possible for small requests;
    large requests can utilize the full bandwidth.
  • Write involves the modified block and the check
    disk.
  • Space utilization increases with the number of
    data disks.

25
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 4

26
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 5: Block-Interleaved Distributed Parity
  • Parity blocks are uniformly distributed among all
    disks. This eliminates the bottleneck at the
    check disk.
  • Several writes can be potentially done in
    parallel.
  • Read requests have a higher level of parallelism.

27
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 5

28
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 6: P+Q Redundancy
  • Can tolerate a higher level of failure than
    level 2.
  • It requires two check disks; similar to level
    5, redundant blocks are uniformly distributed at
    the block level over the disks.
  • For small and large read requests, and large
    write requests, performance is similar to
    level 5.
  • For small write requests, it behaves the same as
    level 2.

29
Introduction to High Performance Computer
Architecture
  • Memory System RAID
  • Level 6

30
Introduction to High Performance Computer
Architecture
  • Memory System RAID

31
Introduction to High Performance Computer
Architecture
  • Memory System RAM
  • Random access memory can also be grouped into
    different classes
  • Read Only Memory (ROM)
  • Programmable ROM
  • Erasable Programmable ROM (EPROM)
  • Electrically Alterable ROM (EAROM)
  • Flash Memory

32
Introduction to High Performance Computer
Architecture
  • Memory System RAM
  • Read/Write Memory (RWM)
  • Static RAM (SRAM)
  • Dynamic RAM (DRAM)
  • Synchronous DRAM
  • Double-Data-Rate SDRAM
  • Volatile/Non-Volatile Memory
  • Destructive/Non-Destructive Read Memory

33
Introduction to High Performance Computer
Architecture
  • Memory System
  • Within the scope of Random Access Memory we are
    concerned about two major issues
  • Access Gap: the difference between the CPU
    cycle time and the main memory cycle time.
  • Size Gap: the difference between the size of
    the main memory and the size of the information
    space.

34
Introduction to High Performance Computer
Architecture
  • Memory System
  • Within the scope of the memory system, the goal
    is to design and build a system with low cost per
    bit, high speed, and high capacity. In other
    words, in the design of a memory system we want
    to
  • Match the rate of the information access with the
    processor speed.
  • Attain adequate performance at a reasonable cost.

35
Introduction to High Performance Computer
Architecture
  • Memory System
  • The appearance of a variety of hardware as well
    as software solutions reflects the fact that the
    trade-off between cost, speed, and capacity can
    be made more attractive by combining different
    hardware systems coupled with special features:
    the memory hierarchy.

36
Introduction to High Performance Computer
Architecture
  • Memory System Access gap
  • The access gap problem was created by advances in
    technology. In fact, in early computers, such as
    the IBM 704, the CPU and main memory cycle times
    were identical: 12 µsec.
  • The IBM 360/195 had a logic delay of 5 nsec per
    stage, a CPU cycle time of 54 nsec, and a main
    memory cycle time of .756 µsec.
  • The CDC 7600 had CPU and main memory cycle times
    of 27.5 nsec and .275 µsec, respectively.

37
Introduction to High Performance Computer
Architecture
  • Access gap
  • How to reduce the access gap bottleneck
  • Software Solutions
  • Devise algorithmic techniques to reduce the
    number of accesses to the main memory.
  • Hardware Solutions
  • Reduce the access gap.
  • Advances in technology
  • Interleaved memory
  • Application of registers
  • Cache memory

38
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • A memory is n-way interleaved if it is composed
    of n independent modules, and a word at address i
    is in module number i mod n.
  • This implies consecutive words in consecutive
    memory modules.
  • If the n modules can be operated independently
    and if the memory bus line is time shared among
    memory modules then one should expect an increase
    in bandwidth between the main memory and the CPU.

39
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • Dependencies in the programs branches and
    randomness in accessing the data will degrade the
    effect of memory interleaving.

40
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • To show the effectiveness of memory interleaving,
    assume a pure sequential program of m
    instructions.
  • For a conventional system in which main memory is
    composed of a single module, the system has to go
    through m-fetch cycles and m-execute cycles in
    order to execute the program.
  • For a system in which main memory is composed of
    n modules, the system executes the same program
    by executing ⌈m/n⌉ fetch cycles and m execute
    cycles.
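The saving above is easy to quantify: m sequential instructions need m fetch cycles from a single module but only ⌈m/n⌉ from n interleaved modules. A small check (the m = 100, n = 4 figures are illustrative):

```python
import math

def fetch_cycles(m, n_modules=1):
    """Fetch cycles needed for m sequential instructions when main
    memory is built from n independent, interleaved modules."""
    return math.ceil(m / n_modules)

m = 100
assert fetch_cycles(m) == 100                 # single-module memory
assert fetch_cycles(m, n_modules=4) == 25     # 4-way interleaving
assert fetch_cycles(101, n_modules=4) == 26   # a partial last group still costs a cycle
```

The ceiling reflects that a final group of fewer than n instructions still occupies one full fetch cycle.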

41
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • The concept of modular memory can be traced back
    to the design of the so-called Harvard-class
    machines, where the main memory was composed of
    two modules namely, program memory and data
    memory.

42
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • It was in the design of the ILLIAC II System,
    where the concept of the interleaved memory was
    introduced.
  • In this machine, the memory was composed of two
    units. The even addresses generated by the CPU
    were sent to the module 0 and the odd addresses
    were directed to the module 1.

43
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • In general when the main memory is composed of n
    different modules, the addresses in the address
    space can be distributed among the memory modules
    in two different fashions
  • Consecutive addresses in consecutive memory
    modules Low Order Interleaving.
  • Consecutive addresses in the same memory module
    High Order Interleaving.

44
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • Whether low order interleaving or high order
    interleaving, a word address is composed of two
    parts
  • Module Address, and
  • Word Address.
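Both orderings split a word address into a module address and a word address; they differ only in which end of the address selects the module. A minimal sketch, assuming the module count and module size are powers of two (the 8-module, 1024-words-per-module sizes are illustrative):

```python
def low_order(addr, n_modules):
    """Low-order interleaving: the low bits pick the module, so
    consecutive addresses fall in consecutive modules."""
    return addr % n_modules, addr // n_modules     # (module, word)

def high_order(addr, words_per_module):
    """High-order interleaving: the high bits pick the module, so
    consecutive addresses stay in the same module."""
    return addr // words_per_module, addr % words_per_module

# Consecutive addresses 0..3 with 8 modules of 1024 words each:
assert [low_order(a, 8)[0] for a in range(4)] == [0, 1, 2, 3]
assert [high_order(a, 1024)[0] for a in range(4)] == [0, 0, 0, 0]
```

This is why low-order interleaving is the one that spreads a sequential instruction stream across all modules.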

45
Introduction to High Performance Computer
Architecture
  • Questions
  • Name and discuss the factors which influence the
    speed, cost, and capacity of the main memory the
    most.
  • Compare and contrast low-order interleaving
    against high-order interleaving.
  • Dependencies in the program degrade the
    effectiveness of the memory interleaving
    justify it.

46
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • Within the scope of interleaved memory, a memory
    conflict (contention, interference) is defined if
    two or more addresses are issued to the same
    memory module.
  • In the worst case all the addresses issued are
    referred to the same memory module.
  • In this case the system's performance will be
    degraded to the level of a single module memory
    organization.

47
Introduction to High Performance Computer
Architecture
  • Access gap Interleaved Memory
  • To take advantage of interleaving, the CPU should
    be able to perform look-ahead fetches: issuing
    addresses before they are really needed.
  • In the case of straight-line programs and lack of
    random data access, such a look-ahead policy can
    be enforced very easily and effectively.

48
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch
  • Assume λ is the probability of a successful
    branch; hence 1-λ is the probability of a
    sequential instruction (for a straight-line
    program, λ is zero).
  • In the case of interleaving where memory is
    composed of n modules, the CPU employs a
    look-ahead policy and issues n instruction
    fetches to the memory.
  • Naturally, memory utilization will be degraded if
    one of these n instructions generates a
    successful branch.

49
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch
  • P(1) = λ: Prob. of the 1st instruction
    generating a successful branch.
  • P(2) = λ(1-λ): Prob. of the 2nd instruction
    generating a successful branch.
  • P(k) = λ(1-λ)^(k-1): Prob. of the kth instruction
    generating a successful branch.
  • P(n) = (1-λ)^(n-1): Prob. of the first n-1
    instructions all being sequential instructions.

50
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch
  • Note: in the case of P(n), it does not matter
    whether or not the last instruction is a
    sequential instruction.
  • The average number of memory modules used
    effectively is
  • IB(n) = Σ_{k=1..n} k·P(k) = (1-(1-λ)^n)/λ

51
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch
  • Example
  • For λ = 5% and n = 4, IB(n) ≈ 3.8
  • For λ = 5% and n = 8, IB(n) ≈ 6.8
  • For λ = 10% and n = 4, IB(n) ≈ 3.4
  • For λ = 10% and n = 8, IB(n) ≈ 5.7
  • Less branching, as expected, implies higher
    memory utilization.
  • Memory utilization is not linear in the number of
    memory modules.
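The branch probabilities above imply a closed form for the average number of busy modules, (1-(1-λ)^n)/λ. A minimal check against the λ = 10% examples, with a Monte Carlo cross-check added as a sketch (trial count and seed are arbitrary choices):

```python
import random

def ib(lam, n):
    """Expected number of modules kept busy by n look-ahead fetches
    when each instruction branches with probability lam.
    Closed form of sum(k * P(k)): (1 - (1-lam)^n) / lam."""
    return (1 - (1 - lam) ** n) / lam

def ib_sim(lam, n, trials=200_000, seed=1):
    """Monte Carlo cross-check: count fetches up to and including the
    first successful branch (or all n if the first n-1 are sequential)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        used = n
        for k in range(1, n):
            if rng.random() < lam:
                used = k
                break
        total += used
    return total / trials

assert round(ib(0.10, 4), 1) == 3.4   # matches the example above
assert round(ib(0.10, 8), 1) == 5.7   # matches the example above
assert abs(ib(0.10, 8) - ib_sim(0.10, 8)) < 0.05
```

The closed form follows because the number of useful fetches is min(X, n) for a geometric random variable X with parameter λ.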

52
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch

53
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Branch

54
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access
  • In the case of data access, the effectiveness of
    the interleaved memory will be compromised if,
    among the n requests made to the memory, some
    refer to the same memory module.

55
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access
  • Access requests are queued.
  • A scanner checks the request at the head of
    the queue:
  • If there is no conflict, the request is passed to
    the memory.
  • If there is a conflict, scanning is suspended
    until the conflict is resolved.

56
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access

57
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access

  • Prob. of one successful access.
  • Prob. of two successful accesses.
  • Prob. of k successful accesses.

58
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access
  • The average number of active memory modules is
    approximately √n.
  • If n = 16, one can conclude that, on average,
    just 4 modules can be kept busy under randomly
    generated access requests.
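The claim above can be checked with a small simulation of the queued-scanner model described earlier: random requests are granted until one addresses an already-busy module. This is a sketch under that simplified model (trial count and seed are arbitrary choices):

```python
import random

def avg_busy_modules(n, trials=100_000, seed=7):
    """Average number of randomly addressed requests granted before
    the scanner stalls on a busy module."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        busy = set()
        while True:
            module = rng.randrange(n)
            if module in busy:
                break              # conflict: scanning suspends
            busy.add(module)
        total += len(busy)
    return total / trials

# For n = 16 the simulation gives roughly 4 to 5 busy modules on
# average, in line with the observation above.
assert 3.5 < avg_busy_modules(16) < 5.5
```

So under random data access, adding modules yields strongly diminishing returns compared with sequential instruction fetching.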

59
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Effect of Random Data-Access
  • Naturally, performance of the interleaved memory
    under random data access can be improved by not
    allowing an access to a busy module to stop other
    accesses to the main memory.
  • In other words, the conflicting access is
    queued and retried later.
  • This concept was first implemented in the design
    of the CDC 6600 Stunt Box.

60
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Stunt Box
  • Stunt Box is designed to provide a maximum flow
    of addresses to the Main memory.
  • Stunt Box is a piece of hardware that controls
    and regulates accesses to the main memory.
  • Stunt Box allows access out-of-order to the main
    memory.

61
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Stunt Box
  • Stunt Box is composed of three parts
  • Hopper
  • Priority Network
  • Tag Generator and Distributor

62
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Stunt Box
  • Hopper is an assembly of four registers to retain
    storage reference information until storage
    conflicts are resolved.
  • Priority Network prioritizes the requests to the
    memory generated by the central processor and the
    peripheral processors.
  • Tag Generator is used to control read/write
    conflict.

63
Introduction to High Performance Computer
Architecture
  • Interleaved Memory Stunt Box

64
Introduction to High Performance Computer
Architecture
  • Stunt Box Flow of Data and Control
  • Assuming an empty hopper, a storage address from
    one of the sources is entered in register M1.
  • The access request in M1 is issued to the main
    memory.
  • The contents of the registers in the hopper are
    circulated every 75 nanoseconds.
  • If a request is accepted by the main memory, it
    will not be recirculated back to M1.
    Otherwise, every 300 nanoseconds it will be
    sent back to the main memory for a possible
    access.

65
Introduction to High Performance Computer
Architecture
  • Stunt Box Flow of Data and Control
  • Time events of a request (times in nanoseconds)
  • t0 - Enter M1
  • t25 - Send to the central storage
  • t75 - M1 to M4
  • t150 - M4 to M3
  • t225 - M3 to M2
  • t300 - M2 to M1 (if not accepted)
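The recirculation schedule above can be sketched as a toy calculation: a rejected request travels M1 → M4 → M3 → M2 → M1, one full 300 ns hopper revolution per failed issue, and is sent to central storage 25 ns after (re)entering M1. The function name and the "accepted on the Nth issue" parameter are illustrative assumptions:

```python
def stunt_box_grant_time(accept_on_issue, period_ns=300, send_offset_ns=25):
    """Time (ns) at which a request is finally sent to central
    storage, if it is accepted on its Nth issue from register M1.
    Each failed issue costs one full 300 ns recirculation through
    M4, M3, and M2 back to M1."""
    return (accept_on_issue - 1) * period_ns + send_offset_ns

assert stunt_box_grant_time(1) == 25    # accepted on the first issue
assert stunt_box_grant_time(2) == 325   # one recirculation later
assert stunt_box_grant_time(4) == 925
```

The bounded retry interval is what guarantees that a request, sooner or later, will be granted.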

66
Introduction to High Performance Computer
Architecture
  • Stunt Box Example
  • Assume a sequence of access requests is initiated
    to the same memory module

67
Introduction to High Performance Computer
Architecture
  • Stunt Box Example
  • The previous chart indicates:
  • Access out of order,
  • A request to the memory, sooner or later, will be
    granted.

68
Introduction to High Performance Computer
Architecture
  • Read sections 7.5-7.6

69
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality
  • Analysis of a large number of typical programs
    has shown that most of their execution time is
    spent in a few main routines.
  • As a result, a number of instructions are
    executed repeatedly. This maybe in the form of a
    single loop, nested loops, or a few subroutines
    that repeatedly call each other.

70
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality
  • It has been observed that a program spends 90% of
    its execution time in only 10% of the code: the
    principle of locality.
  • The main observation is that many instructions in
    each of a few localized areas of the program are
    repeatedly executed, while the remainder of the
    program is accessed relatively infrequently.

71
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality locality can be
    represented in two forms
  • Temporal Locality If an item is referenced, it
    will tend to be referenced again soon.
  • Spatial Locality If an item is referenced,
    nearby items tend to be referenced soon.

72
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality
  • Now, if it can be arranged to have the active
    segments of a program in a fast memory, then the
    total execution time can be significantly
    reduced.
  • Such a fast memory is called a cache (slave,
    buffer) memory.

73
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality
  • Cache is a level of memory inserted between the
    main memory and the CPU.
  • For economic reasons, the cache is relatively
    much smaller than main memory.
  • To make the cache effective, it must be
    considerably faster than the main memory.

74
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Principle of Locality
  • The main memory and the cache are partitioned
    into blocks of equal sizes.
  • Naturally, because of the size gap between the
    main memory and the cache, at each moment of time
    only a portion of the main memory is resident in
    the cache.

75
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • The concept of the cache was introduced in the
    mid-1960s by Wilkes.
  • When a memory request is generated, it is first
    presented to the cache memory, and if the cache
    cannot respond, the request is then presented to
    the main memory.

76
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • The idea of cache is similar to virtual memory in
    that some active portion of a low-speed memory is
    stored in duplicate in a higher-speed memory.
  • The difference between cache and virtual memory
    is a matter of implementation; the two approaches
    are conceptually the same because they both rely
    on the correlation properties observed in
    sequences of address references.

77
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Cache implementations are totally different from
    virtual memory implementations because of the
    speed requirements of the cache. If we assume
    that cache memory has an access time of one
    machine cycle, then main memory typically has an
    access time anywhere from 4 to 20 times longer,
    not the 500 times longer typical of the delay due
    to a page fault in virtual memory.
  • In general, caches are controlled by hardware
    algorithms.

78
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Cache vs. Virtual Memory

79
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Ranges of parameters for cache

80
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Address Mapping
  • Each reference to a memory word is presented to
    the cache.
  • The cache searches its directory:
  • If the item is in the cache, it is accessed
    from the cache.
  • Otherwise, a miss occurs.
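The directory search above can be sketched in miniature. The slide leaves the mapping scheme open, so the fully associative dictionary directory and the 8-word block size below are illustrative assumptions:

```python
class TinyCache:
    """Toy cache: a directory maps resident block numbers to blocks."""
    def __init__(self, block_size):
        self.block_size = block_size
        self.directory = {}                # block number -> block contents

    def access(self, addr):
        """Return (hit, block) for a word address; on a miss the real
        hardware would go to main memory and fill the block."""
        block_no = addr // self.block_size
        if block_no in self.directory:
            return True, self.directory[block_no]
        return False, None

# Preload the block holding octal address 01173 (8 words per block).
cache = TinyCache(block_size=8)
cache.directory[0o1173 // 8] = b"resident block"

hit, _ = cache.access(0o1173)
assert hit                     # like address 01173 in the diagram
hit, _ = cache.access(0o1163)
assert not hit                 # like address 01163: a different block, a miss
```

With this block size, 01173 and 01163 fall in different blocks, which is what makes one a hit and the other a miss.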

81
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping

82
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • In our previous diagram:
  • A reference to address 01173 is responded to by
    the cache.
  • A reference to address 01163 produces a miss.

83
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Replacement Policy
  • For each read operation that causes a cache miss,
    the item is retrieved from the main memory and
    copied into the cache. If the cache is full, this
    forces some other item in the cache to be
    identified and removed to make room for the new
    item.
  • The collection of rules governing such
    activities is referred to as the Replacement
    Algorithm.

84
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Replacement Policy
  • The cache-replacement decision is critical; a
    good replacement algorithm, naturally, can yield
    somewhat higher performance than a bad
    replacement algorithm.

85
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Let h be the probability of a cache hit (the hit
    ratio),

and tcache and tmain be the respective cycle
times of cache and main memory; then
  • teff = tcache + (1-h)·tmain
  • (1-h) is the probability of a miss (the miss
    ratio).
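The effective access time can be checked numerically. A minimal sketch; the 1-cycle cache time, 10-cycle main memory time, and h = 0.9 are illustrative assumptions:

```python
def t_eff(t_cache, t_main, hit_ratio):
    """Effective access time: every access pays the cache time, and a
    miss (probability 1 - h) additionally pays the main memory time."""
    return t_cache + (1 - hit_ratio) * t_main

assert abs(t_eff(1, 10, 0.9) - 2.0) < 1e-9    # 1 + 0.1 * 10
assert abs(t_eff(1, 10, 1.0) - 1.0) < 1e-9    # perfect hit ratio: cache speed only
```

Even a modest miss ratio dominates t_eff when main memory is much slower than the cache, which is why high hit ratios matter.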

86
Introduction to High Performance Computer
Architecture
  • Cache Memory Issues of Concern
  • Read Policy
  • Load Through
  • Write policy (on hit)
  • Write through
  • Write back ⇒ dirty bit
  • Write policy (on miss)
  • Write allocate
  • No-write allocate
  • Placement/replacement policy
  • Address Mapping

87
Introduction to High Performance Computer
Architecture
  • Cache Memory Issues of Concern
  • In Case of Miss-Hit
  • For read operation, the block containing the
    referenced address is moved to the cache.
  • For write operation, the information is written
    directly into the main memory.

88
Introduction to High Performance Computer
Architecture
  • Questions
  • Compare and contrast different write policies
    against each other.
  • In case of miss-hit, why are read and write
    operations treated differently?

89
Introduction to High Performance Computer
Architecture
  • Cache Memory Issues of Concern
  • Sources for cache misses
  • Compulsory cold start misses
  • Capacity
  • Conflict placement/replacement policy

90
Introduction to High Performance Computer
Architecture
  • Cache Memory Issues of Concern
  • It has been shown that increasing the cache size
    and/or the degree of associativity will reduce
    the cache miss ratio.
  • Naturally, compulsory miss ratios are independent
    of cache size.

91
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Mixed caches: the cache contains both
    instructions and data (unified caches).
  • Instruction-only and data-only caches: dedicated
    caches for instructions and data.

92
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • In general, miss ratios for instruction caches
    are lower than miss ratios for data caches.
  • For smaller cache sizes, unified caches offer
    higher miss ratios than dedicated caches.
    However, as the cache size increases, the miss
    ratio of unified caches relative to dedicated
    caches decreases.

93
Introduction to High Performance Computer
Architecture
  • Exam question
  • a) Define the term interleaved memory.
  • A memory is n-way interleaved if it is composed
    of n independent modules, and a word at address i
    is in module number i mod n. Modules are
    operated independently, and the memory bus line
    is time-shared among the memory modules.

94
Introduction to High Performance Computer
Architecture
  • Exam question
  • b) Define high-order interleaving
  • Consecutive addresses in the same memory module.
  • c) Define low-order interleaving
  • Consecutive addresses in consecutive memory
    modules.

95
Introduction to High Performance Computer
Architecture
  • Exam question
  • d) Compare and contrast high-order interleaving
    with low-order interleaving.
  • Speed: low-order interleaving
  • Fault tolerance: high-order interleaving
  • Block transfer: high-order interleaving
  • Enforcing security: high-order interleaving
  • Multiprocessing: high-order interleaving

96
Introduction to High Performance Computer
Architecture
  • Exam question
  • e) Two issues affect the performance of an
    interleaved memory.
  • What are they?
  • Random access to data
  • Branches in programs
  • Show (prove) how they affect the
    effectiveness of the interleaved memory.

97
Introduction to High Performance Computer
Architecture
  • Exam question
  • f) With respect to part (e), discuss
    solutions (one for each case).
  • Reduce the number of branches in the program, or
  • Expect the compiler to reshuffle instructions in
    an attempt to have branch instructions in the
    rightmost modules.
  • Application of the stunt box and its intention.

98
Introduction to High Performance Computer
Architecture
  • Exam question
  • As a computer architect, in the design process of
    an ALU, what initial issues does one have to keep
    in mind? Name them and discuss their
    importance.
  • Functionality
  • Representation of information
  • Length and number of operand(s)
  • Organization
  • Serial
  • Parallel
  • Modular

99
Introduction to High Performance Computer
Architecture
  • Exam question
  • Cray Y-MP/8 (a vector processor) has a cycle time
    of 6ns. During a cycle, the results of both an
    addition and a multiplication can be completed.
    Furthermore, there are eight processors operating
    simultaneously without interference in the best
    case. Calculate the peak performance of the Cray
    Y-MP/8 (in MIPS).

100
Introduction to High Performance Computer
Architecture
  • Exam question - Solution
  • Peak Performance = (2 results per cycle × 8
    processors) / 6 ns ≈ 2667 MIPS

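The peak-performance arithmetic can be checked in a few lines (the slide's formula image did not survive extraction; variable names below are mine):

```python
cycle_time_ns = 6        # Cray Y-MP/8 cycle time
results_per_cycle = 2    # one addition and one multiplication complete per cycle
processors = 8           # eight processors operating without interference

ops_per_second = results_per_cycle * processors / (cycle_time_ns * 1e-9)
peak_mips = ops_per_second / 1e6
print(round(peak_mips))  # 2667
```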
101
Introduction to High Performance Computer
Architecture
  • Exam question
  • Assume we have a machine where the CPI is 2.0
    when all memory accesses (instruction fetches and
    data fetches) hit in the cache. The only data
    accesses are loads and stores (note, these are
    one-address type instructions), and these total
    40% of the instructions (the rest of the instructions
    deal only with registers). If the miss penalty
    is 25 clock cycles and the miss rate is 2%, how
    much faster would the machine be if all accesses
    were cache hits?

102
Introduction to High Performance Computer
Architecture
  • Exam question Solution
  • CPI (all hits) = 2
  • CPI (actual) = 2 + (1 + 0.4) × 0.02 × 25 = 2.7
  • Speedup = CPI (actual) / CPI (all hits) = 2.7 / 2 = 1.35

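The same computation as a quick Python check (variable names are mine):

```python
base_cpi = 2.0            # CPI when every access hits in the cache
refs_per_instr = 1 + 0.4  # one instruction fetch, plus a data access on 40% of instructions
miss_rate = 0.02
miss_penalty = 25         # clock cycles

actual_cpi = base_cpi + refs_per_instr * miss_rate * miss_penalty
speedup = actual_cpi / base_cpi
print(round(actual_cpi, 2), round(speedup, 2))  # 2.7 1.35
```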
103
Introduction to High Performance Computer
Architecture
  • Exam question
  • Apply the Column Compression technique to perform
    the following operation
  • 1110111
  • 1101011
  • Note numbers are in 2's complement format.
  • Column Compression does not work for 2's
    complement numbers, so we need to convert them
    into positive numbers, or
  • Apply the column compression technique on the numbers
    and then correct the result.

104
Introduction to High Performance Computer
Architecture
  • Exam question - Solution

(Step-by-step column-compression work: the column alignment of the partial
products was lost in extraction. Final result: 0000000010111101.)
105
Introduction to High Performance Computer
Architecture
  • Exam question
  • Figure 1 shows the ith stage logic of a parallel
    ALU, where Ai, Bi, and Ci are the operands and
    the carry-in, respectively, and S2, S1, S0, and
    M are the control signals. Determine under what
    values of S2, S1, S0, M, and C1 (carry-in to the
    rightmost stage) the ALU performs the following
    operations
  • a) F ← A - B (Why?)
  • b) F ← B (Why?)

106
Introduction to High Performance Computer
Architecture
  • Exam question - Solution

F ← B:
M = 0 (logic operation)
C1 = x (don't care)
S2 = 0 (disable A)
S1 = 0 (disable 1's complement of B)
S0 = 1 (pass B to the ALU)
107
Introduction to High Performance Computer
Architecture
  • Exam question
  • Use the SRT division method to perform the following
    operation
  • AQ / B, where
  • AQ = .11001100 and B = .0111
  • Show the step-by-step operation.

108
Introduction to High Performance Computer
Architecture
  • Exam question Solution
  • Note the divide overflow condition: (A) > (B). To
    eliminate divide overflow, the dividend must be
    shifted to the right and 1 added to its
    exponent
  • AQ = .0110 0110
  • Normalized B = .1110

109
Introduction to High Performance Computer
Architecture
  • Exam question - Solution

1.1110 110 Negative Result, shift and insert 0
1.1101 100 Skip over 1s
1.0110 011 Add B
0.0100 011 Positive Result, shift and insert 1
0.100 0111
1.101 0111 Negative Result, shift and insert 0
1.010 1110 Now we need to correct remainder
110
Introduction to High Performance Computer
Architecture
  • Exam question - Solution

1.101
0.100
111
Introduction to High Performance Computer
Architecture
  • Exam question
  • Assume we are utilizing a parallel disk (RAID)
    composed of 6 and 8 disks (% of data disks
    available).

112
Introduction to High Performance Computer
Architecture
  • Exam question
  • A memory is n-way interleaved if

1) It is composed of n independent modules, 2)
Address i is in module (i mod n), and 3) the bus is
shared among the modules.
113
Introduction to High Performance Computer
Architecture
  • Exam question
  • Define high-order interleaving,
  • Define low-order interleaving,

Consecutive addresses in the same module.
Consecutive addresses in consecutive modules.
114
Introduction to High Performance Computer
Architecture
  • Exam question
  • Address accessible memory can be classified as
  • Explain access gap as clearly as possible.

1) RAM 2) SAM 3) DAM
The difference between main memory cycle time and
CPU cycle time.
115
Introduction to High Performance Computer
Architecture
116
Introduction to High Performance Computer
Architecture
  • Memory System Cache Memory
  • Address Mapping
  • Direct Mapping
  • Associative Mapping
  • Set Associative Mapping

117
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • In the following discussion assume
  • B = block size = 2^b
  • C = number of blocks in the cache = 2^c
  • M = number of blocks in main memory = 2^m
  • S = number of sets in the cache = 2^s

118
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Direct Mapping
  • Block K of main memory maps into block (K modulo
    C) of the cache.
  • Since more than one main memory block is mapped
    into a given cache position, contention may arise
    even when the cache is not full.

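The direct-mapping address split can be sketched in a few lines (an illustrative sketch; the field widths b and c follow the slides' notation, the function name is mine):

```python
def direct_map(addr, b, c):
    """Split a main-memory address into (tag, cache block, word) fields for
    direct mapping: memory block K maps to cache block K mod C (C = 2**c)."""
    word = addr & ((1 << b) - 1)          # low b bits: word within the block
    block = (addr >> b) & ((1 << c) - 1)  # next c bits: cache block index
    tag = addr >> (b + c)                 # remaining m - c bits: the tag
    return tag, block, word

# Example: 16-byte blocks (b = 4), 8 cache blocks (c = 3)
print(direct_map(0b101_011_0110, 4, 3))  # (5, 3, 6)
```

Note the block field directly equals (memory block number) mod C, so no search is needed: only one tag-register, the one at that block index, has to be compared.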
119
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Direct Mapping
  • Address mapping can be implemented very easily.
  • Replacement policy is very simple and trivial.
  • In general, cache utilization is low.

120
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Direct Mapping
  • Main memory address is of the following form
  • A Tag-register of length m-c is dedicated to each
  • cache block

121
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Direct Mapping
  • The content of the tag-register selected by the block
    field is compared against the tag portion of the
    address

122
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Direct Mapping

123
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Associative Mapping
  • A block of main memory can potentially reside in
    any cache block position. This flexibility can
    be achieved by utilizing a wider Tag-Register.
  • Address mapping requires hardware facility to
    allow simultaneous search of tag-registers.
  • A reasonable replacement policy can be adopted
    (least recently used).
  • Cache can be used very effectively.

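In hardware, all tag-registers are searched simultaneously by an associative memory; the sketch below simulates that compare with a sequential scan (the function name and sample tags are mine):

```python
def assoc_lookup(tag_registers, tag):
    # Simulated associative search: in hardware every tag-register is
    # compared against the address tag at once; here we scan them.
    for block, stored in enumerate(tag_registers):
        if stored == tag:
            return block  # hit: index of the cache block holding the data
    return None           # miss: bring the block in from main memory

tags = [0x1A, 0x07, 0x33, 0x07]
print(assoc_lookup(tags, 0x33))  # 2 (hit)
print(assoc_lookup(tags, 0x50))  # None (miss)
```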
124
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Associative Mapping
  • Main memory address is of the following form
  • A tag-register of length m is dedicated to each
  • cache block.

125
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Associative Mapping
  • The tag portion of the address is searched (in
    parallel) against the contents of the
    tag-registers
  • If there is no match (a miss), the block is brought
    from main memory into the proper cache block.

126
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Associative Mapping

127
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping
  • Is a compromise between Direct-Mapping and
    Associative Mapping.
  • Blocks of cache are grouped into sets (S), and
    the mapping allows a block of main memory (K) to
    reside in any block of the set (K modulo S).
  • Address mapping can be implemented easily at a
    more reasonable hardware cost relative to the
    associative mapping.

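The set-associative address split mirrors the direct-mapped one, with the set field replacing the block field (an illustrative sketch; field widths b and s follow the slides' notation, the function name is mine):

```python
def set_assoc_map(addr, b, s):
    """Split an address into (tag, set, word) for set-associative mapping:
    memory block K may reside in any block of set K mod S (S = 2**s)."""
    word = addr & ((1 << b) - 1)              # low b bits: word within block
    set_index = (addr >> b) & ((1 << s) - 1)  # next s bits: set index
    tag = addr >> (b + s)                     # remaining m - s bits: the tag
    return tag, set_index, word

# Example: 16-byte blocks (b = 4), 4 sets (s = 2)
print(set_assoc_map(0b10101_10_0110, 4, 2))  # (21, 2, 6)
```

Only the tag-registers of the selected set (the set size, e.g. 2 or 4) need to be compared in parallel, which is why the hardware cost sits between direct and fully associative mapping.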
128
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping
  • This scheme allows one to employ a reasonable
    replacement policy within the blocks of a set and
    hence offers better cache utilization than the
    direct-mapping scheme.

129
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping
  • Main memory address is of the following form
  • A tag-register of length m-s is dedicated to each
  • block in the cache.

130
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping
  • Contents of Tag-registerss are compared
    simultaneously against the tag portion of the
    address

131
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping

132
Introduction to High Performance Computer
Architecture
  • Cache Memory Address Mapping
  • Set Associative Mapping

133
Introduction to High Performance Computer
Architecture
  • Questions
  • Compare and contrast unified cache against
    dedicated caches.
  • Compare and contrast Direct Mapping, Associative
    Mapping, and Set Associative Mapping against each
    other.

134
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Main Memory (4-way interleaved)
  • Size 512-4096 k bytes
  • Cycle Time 1.04 µsec
  • Block Size 1 k bytes

135
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Cache
  • Size 16 k bytes
  • Access Time 80 nsec
  • Block Size 1 k bytes
  • Address Mapping Associative Mapping
  • Replacement Policy Least Recently Used
  • Read Policy Read-Through
  • Write Policy Write-Back, write access to the
    main memory does not cause any cache reassignment.

136
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Hardware Configuration
  • An associative memory of 16 words, each 14 bits
    long, represents the collection of the
    tag-registers.
  • Each block is a collection of 16 units each of
    length 64 bytes.
  • Each block has a validity register of length 16.

137
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Hardware Configuration
  • A validity bit represents the availability of a
    unit within a block in the cache.
  • A unit is the smallest granule of information
    which is transferred between the main memory and
    the cache.
  • The units in a block are brought in on a demand
    basis.

138
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Main memory address format

139
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 360/85
  • Flow of Operations

140
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155
  • Main Memory (4-way interleaved)
  • Size 256 - 2048 k bytes
  • Cycle time 2.100 µsec
  • Block size 32 bytes

141
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155
  • Cache
  • Size 8 k bytes
  • Cycle time 230 nsec
  • Block size 32 bytes
  • Address Mapping
  • Set associative mapping
  • Set-size 2

142
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155
  • Hardware
  • Configuration

143
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155
  • Main memory address format
  • Cache address format

144
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155
  • Memory Organization and View

145
Introduction to High Performance Computer
Architecture
  • Cache Memory IBM 370/155

146
Introduction to High Performance Computer
Architecture
  • Cache Memory 68040
  • Two dedicated caches on processor chip
  • Instruction cache
  • Data cache
  • Each cache is of size 4K bytes with 4-way set
    associative organization.
  • Each cache is a collection of 64 sets of 4 blocks
    each.

147
Introduction to High Performance Computer
Architecture
  • Cache Memory 68040
  • Each block is a collection of 4 long words; a
    long word is 4 bytes long.
  • Each cache block has a valid bit and a dirty bit.
  • Either the write-back or the write-through policy can
    be employed.
  • Replacement policy: a randomly selected block within
    the set is replaced.

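The stated 68040 geometry is self-consistent: sets × ways × block size should equal the 4K-byte capacity. A quick check (variable names are mine):

```python
long_word_bytes = 4
block_bytes = 4 * long_word_bytes  # 4 long words per block = 16 bytes
sets, ways = 64, 4                 # 64 sets of 4 blocks (4-way set associative)

cache_bytes = sets * ways * block_bytes
print(cache_bytes)  # 4096, i.e. 4K bytes per cache
```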
148
Introduction to High Performance Computer
Architecture
  • Cache Memory 68040
  • Main Memory Address Format

149
Introduction to High Performance Computer
Architecture
  • Cache Memory Pentium III
  • Has two cache Levels
  • Level1
  • Has dedicated caches for instructions and data.
  • Each cache is 16K bytes.
  • Data cache is 4-way set associative organization.
  • Instruction cache is 2-way set associative
    organization.
  • Both write back and write through policies can be
    adopted.

150
Introduction to High Performance Computer
Architecture
  • Cache Memory Pentium III
  • Level2
  • It is a unified cache, either external to the
    processor chip (Pentium III Katmai) or internal
    to the processor chip (Pentium III Coppermine).
  • If internal, it is of size 256 Kbytes SRAM, 4-way
    set associative organization.
  • If external, it is of size 512 Kbytes, 8-way set
    associative organization.
  • Either write back or write through policy can be
    employed.

151
Introduction to High Performance Computer
Architecture
  • Cache Memory
  • How to make cache faster?
  • Make the cache faster: better technology,
  • Make the cache larger,
  • Sub-block cache blocks: a portion of a block is
    the granule of information transferred between
    the main memory and the cache,
  • Use a write buffer: care should be taken to preserve
    the write-read order.

152
Introduction to High Performance Computer
Architecture
  • Cache Memory
  • How to make cache faster?
  • Early restart: allow the CPU to continue as
    soon as the requested data is in the cache (read
    through),
  • Out-of-order fetch: attempt to fetch the
    requested word first; should be used in
    conjunction with read-through.
  • Multi-level cache memory organization.