Review CPSC 321

Title: Review CPSC 321


1
Review CPSC 321
  • Andreas Klappenecker

2
Announcements
  • Tuesday, November 30, midterm exam

3
Cache
  • Placement strategies
  • direct mapped
  • fully associative
  • set-associative
  • Replacement strategies
  • random
  • FIFO
  • LRU

4
Direct Mapped Cache
  • Mapping: address modulo the number of blocks in
    the cache, x -> x mod B

5
Set Associative Caches
  • Each block maps to a unique set,
  • the block can be placed into any element of that
    set,
  • Position is given by
  • (Block number) modulo (# of sets in cache)
  • If the sets contain n elements, then the cache is
    called n-way set associative
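
As a minimal sketch of the placement rule above, the following Python function computes the set a block maps to. The function name `set_index`, the 8-block cache, and block number 13 are assumptions chosen for illustration; they do not come from the slides.

```python
def set_index(block_number: int, num_blocks: int, ways: int) -> int:
    """Return the set that a memory block maps to in an n-way set-associative cache.

    ways == 1 models a direct-mapped cache (one block per set);
    ways == num_blocks models a fully associative cache (a single set).
    """
    if ways <= 0 or num_blocks % ways != 0:
        raise ValueError("ways must be positive and divide the number of blocks")
    num_sets = num_blocks // ways
    return block_number % num_sets


# Hypothetical cache with 8 blocks; block number 13 is an arbitrary example.
print(set_index(13, 8, 1))  # direct mapped: 13 mod 8
print(set_index(13, 8, 2))  # 2-way set associative: 13 mod 4
print(set_index(13, 8, 8))  # fully associative: only set 0 exists
```

The block may then be placed in any of the `ways` entries of the returned set, which is where the replacement strategy comes in.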

6
Direct Mapped Cache
The index is determined by address mod 1024
  • Cache with 1024 = 2^10 words
  • tag from cache is compared against upper portion
    of the address
  • If tag = upper 20 bits and valid bit is set, then
    we have a cache hit; otherwise it is a cache
    miss
  • What kind of locality are we taking advantage of?

Byte offset
7
Direct Mapped Cache
  • Taking advantage of spatial locality

Block offset
8
Address Determination
  • reconstruction of the memory address
  • tag bits | set index bits | block offset |
    byte offset
  • Example
  • 32 bit words, cache capacity 2^12 = 4096 words,
    blocks of 8 words, direct mapped
  • byte offset 2 bits, block offset 3 bits, set
    index bits 9 bits, tag bits 18 bits
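
A sketch of how the address fields in this example could be extracted and put back together. The field widths (18/9/3/2) are taken from the example above; the names `AddressFields`, `split_address`, `join_address`, and the sample address are hypothetical.

```python
from typing import NamedTuple


class AddressFields(NamedTuple):
    tag: int
    set_index: int
    block_offset: int
    byte_offset: int


# Field widths from the example: 32-bit address = 18 + 9 + 3 + 2 bits.
INDEX_BITS, BLOCK_BITS, BYTE_BITS = 9, 3, 2


def split_address(addr: int) -> AddressFields:
    """Split a 32-bit byte address into tag, set index, block offset, byte offset."""
    byte_offset = addr & ((1 << BYTE_BITS) - 1)
    block_offset = (addr >> BYTE_BITS) & ((1 << BLOCK_BITS) - 1)
    set_index = (addr >> (BYTE_BITS + BLOCK_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + BLOCK_BITS + INDEX_BITS)
    return AddressFields(tag, set_index, block_offset, byte_offset)


def join_address(f: AddressFields) -> int:
    """Reconstruct the memory address by concatenating the four fields."""
    addr = (f.tag << INDEX_BITS) | f.set_index
    addr = (addr << BLOCK_BITS) | f.block_offset
    return (addr << BYTE_BITS) | f.byte_offset


fields = split_address(0x12345678)  # arbitrary sample address
print(fields)
print(hex(join_address(fields)))
```

Splitting and rejoining should always give back the original address, which is a quick sanity check for hand-computed field widths.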

10
Example
  • Suppose you want to realize a cache with a
    capacity for 8 KB of data (32 bits of address
    size). Assume that the blocksize is 4 words and a
    word consists of 4 bytes.
  • How many bits are needed to realize a direct
    mapped cache?
  • 8 KByte = 2K words = 512 blocks = 2^9 blocks
  • direct mapped => # index bits = log2(2^9) = 9
  • 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148
    bits
    = number of blocks x (bits per block + tag + valid
    bit)
  • How many bits are needed to realize an 8-way set
    associative cache?
  • The number of tag bits increases by 3. Why?
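
The bit count in this example can be checked with a short calculation. This is a sketch under the slide's assumptions (32-bit addresses, 4-byte words, one valid bit per block, power-of-two sizes); the function name `cache_total_bits` and its parameters are hypothetical.

```python
def cache_total_bits(data_bytes, block_words, ways, address_bits=32, word_bytes=4):
    """Return (tag_bits, total_bits) for a cache with power-of-two dimensions.

    total_bits = number of blocks x (data bits per block + tag bits + 1 valid bit)
    """
    num_blocks = data_bytes // (block_words * word_bytes)
    num_sets = num_blocks // ways
    # bit_length() - 1 equals log2(n) when n is a power of two.
    index_bits = num_sets.bit_length() - 1
    block_offset_bits = block_words.bit_length() - 1
    byte_offset_bits = word_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits
    data_bits_per_block = block_words * word_bytes * 8
    return tag_bits, num_blocks * (data_bits_per_block + tag_bits + 1)


print(cache_total_bits(8 * 1024, block_words=4, ways=1))  # direct mapped
print(cache_total_bits(8 * 1024, block_words=4, ways=8))  # 8-way set associative
```

With 8 ways there are only 2^9 / 8 = 2^6 sets, so the index shrinks from 9 to 6 bits and the three bits it gives up become part of the tag.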

11
Typical Questions
  • Show the evolution of a cache
  • Determine the number of bits needed in an
    implementation of a cache
  • Know the placement and replacement strategies
  • Be able to design a cache according to
    specifications
  • Determine the number of cache misses
  • Measure cache performance
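
For practice with "evolution of a cache" and miss-counting questions, a small simulator can be used to check hand-worked answers. This is a sketch that assumes LRU replacement and a trace of block numbers; the function name `count_misses` and the sample trace are illustrative assumptions, not taken from the slides.

```python
from collections import OrderedDict


def count_misses(block_trace, num_blocks, ways):
    """Count misses for a trace of block numbers in an n-way cache with LRU replacement."""
    num_sets = num_blocks // ways
    # Each set keeps its resident blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_trace:
        entries = sets[block % num_sets]
        if block in entries:
            entries.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) == ways:
                entries.popitem(last=False)  # set is full: evict least recently used
            entries[block] = True
    return misses


trace = [0, 8, 0, 6, 8]  # arbitrary sample trace of block numbers
for ways in (1, 2, 4):   # direct mapped, 2-way, fully associative (4 blocks total)
    print(ways, count_misses(trace, num_blocks=4, ways=ways))
```

For this trace, higher associativity removes the conflict misses between blocks 0 and 8, which both map to the same set in the direct-mapped case.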

12
Typical Questions
  • What kind of placement is typically used in
    virtual memory systems?
  • What is a translation lookaside buffer?
  • Why is a TLB used?

13
Pages: virtual memory blocks
  • Page faults: if data is not in memory, retrieve
    it from disk
  • huge miss penalty, thus pages should be fairly
    large (e.g., 4KB)
  • reducing page faults is important (LRU is worth
    the price)
  • can handle the faults in software instead of
    hardware
  • using write-through takes too long so we use
    writeback
  • Example: page size 2^12 = 4KB, 2^18 physical pages
  • main memory <= 1GB, virtual memory <= 4GB
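
The sizes in this example follow from the bit widths. The sketch below redoes the arithmetic and splits a virtual address into page number and page offset; the 32-bit virtual address width is an assumption inferred from the 4GB limit, and the name `split_virtual_address` is hypothetical.

```python
PAGE_OFFSET_BITS = 12       # page size 2^12 bytes = 4KB (from the example)
PHYSICAL_PAGE_BITS = 18     # 2^18 physical pages (from the example)
VIRTUAL_ADDRESS_BITS = 32   # assumption: 32-bit virtual addresses

page_size = 1 << PAGE_OFFSET_BITS
physical_memory = (1 << PHYSICAL_PAGE_BITS) * page_size  # 2^30 bytes = 1GB
virtual_memory = 1 << VIRTUAL_ADDRESS_BITS               # 2^32 bytes = 4GB
virtual_pages = virtual_memory // page_size              # 2^20 virtual pages


def split_virtual_address(va: int):
    """Return (virtual page number, page offset) for a virtual address."""
    return va >> PAGE_OFFSET_BITS, va & (page_size - 1)


print(physical_memory, virtual_memory, virtual_pages)
print(split_virtual_address(0x12345678))
```

Only the page number is translated; the page offset is carried over unchanged into the physical address.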

14
Page Faults
  • Incredibly high penalty for a page fault
  • Reduce number of page faults by optimizing page
    placement
  • Use fully associative placement
  • full search of pages is impractical
  • pages are located by a full table that indexes
    the memory, called the page table
  • the page table resides within the memory

15
Page Tables
The page table maps each page to either a page in
main memory or to a page stored on disk
16
Page Tables

17
Making Memory Access Fast
  • Page tables slow us down
  • Memory access will take at least twice as long
  • access page table in memory
  • access page
  • What can we do?

Memory access is local => use a cache that keeps
track of recently used address translations,
called translation lookaside buffer
18
Making Address Translation Fast
  • A cache for address translations: translation
    lookaside buffer
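
A toy model of the idea, as a sketch only: translations are looked up in a small TLB first, and the in-memory page table is consulted only on a TLB miss. The class and attribute names, the 4-entry TLB size, the LRU replacement of TLB entries, and the 4KB page size are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_OFFSET_BITS = 12  # assumption: 4KB pages


class PageFault(Exception):
    """Raised when the requested page is not in main memory."""


class Translator:
    def __init__(self, page_table, tlb_entries=4):
        # page_table: dict from virtual page number to physical page number;
        # a missing key models a page that currently lives on disk.
        self.page_table = page_table
        self.tlb = OrderedDict()  # recently used translations, LRU order
        self.tlb_entries = tlb_entries
        self.tlb_hits = 0
        self.page_table_accesses = 0

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_OFFSET_BITS
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn in self.tlb:
            self.tlb_hits += 1
            self.tlb.move_to_end(vpn)          # mark as most recently used
            ppn = self.tlb[vpn]
        else:
            self.page_table_accesses += 1      # the extra memory access
            if vpn not in self.page_table:
                raise PageFault(f"virtual page {vpn:#x} is not in memory")
            ppn = self.page_table[vpn]
            if len(self.tlb) == self.tlb_entries:
                self.tlb.popitem(last=False)   # evict least recently used entry
            self.tlb[vpn] = ppn
        return (ppn << PAGE_OFFSET_BITS) | offset


mmu = Translator({0: 5, 1: 9})       # hypothetical page table
print(hex(mmu.translate(0x0004)))    # TLB miss, page table access
print(hex(mmu.translate(0x0008)))    # same page: TLB hit
print(mmu.tlb_hits, mmu.page_table_accesses)
```

Because accesses are local, most translations hit in the TLB, so the second memory access for the page table is avoided in the common case.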

19
MIPS Processor and Variations
20
Datapath for MIPS instructions
Note the seven control signals!
21
Single Cycle Datapath
22
Pipelined Version
23
Obstacles to Pipelining
  • Structural Hazards
  • hardware cannot support the combination of
    instructions in the same clock cycle
  • Control Hazards
  • need to make decision based on results of one
    instruction while other is still executing
  • Data Hazards
  • instruction depends on results of instruction
    still in pipeline

24
Control Hazards: Resolution (for branch)
  • stall pipeline
  • predict result
  • delayed branch

25
Stall on Branch
  • Assume that all branch computations are done in
    stage 2
  • Delay by one cycle to wait for the result

26
Branch Prediction
  • Predict branch result
  • For example, always predict that the branch
    is not taken
    (e.g., reasonable for while instructions)
  • if choice is correct, then pipeline runs at full
    speed
  • if choice is incorrect, then pipeline stalls
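
A rough way to compare always stalling with predict-not-taken is to count stall cycles over a sequence of branch outcomes. This sketch assumes a one-cycle penalty per stalled or mispredicted branch, in line with the assumption that branches are resolved in stage 2; the function names and the sample outcome sequence are hypothetical.

```python
def stalls_always_stall(outcomes, penalty=1):
    """Every branch waits for its result, whether or not it is taken."""
    return penalty * len(outcomes)


def stalls_predict_not_taken(outcomes, penalty=1):
    """Only branches that turn out to be taken (mispredictions) cost a stall."""
    return penalty * sum(1 for taken in outcomes if taken)


# Hypothetical while loop whose exit test at the top is not taken for
# 9 iterations and is taken once, when the loop finally exits.
while_loop = [False] * 9 + [True]
print(stalls_always_stall(while_loop))
print(stalls_predict_not_taken(while_loop))
```

The benefit depends entirely on how often the prediction is right; for a branch that is usually taken, predict-not-taken stalls almost as often as always stalling.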

27
Branch Prediction
28
Delayed Branch
29
Data Hazards
  • A data hazard results if an instruction depends
    on the result of a previous instruction
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3 // $s0 to be determined
  • These dependencies happen often, so it is not
    possible to avoid them completely
  • Use forwarding to get missing data from internal
    resources once available

30
Forwarding
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3

33
Typical Questions
  • Given a brief specification of the processor and
    a sequence of instructions, determine all
    pipeline hazards.
  • Most typical question: fill in some steps in a
    timing diagram (almost every exam has such a
    question, google).

34
Example
  • add $1, $2, $3     _ _ _ _ _
  • add $4, $5, $6     _ _ _ _ _
  • add $7, $8, $9     _ _ _ _ _
  • add $10, $11, $12  _ _ _ _ _
  • add $13, $14, $1   _ _ _ _ _ (data arrives early: OK)
  • add $15, $16, $7   _ _ _ _ _ (data arrives on time: OK)
  • add $17, $18, $13  _ _ _ _ _ (uh, oh)
  • add $19, $20, $17  _ _ _ _ _ (uh, oh)
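
The annotations in this example can be reproduced by measuring how far back the producing instruction is. The sketch below assumes a five-stage pipeline without forwarding, in which the register file is written in the first half of a cycle and read in the second half; under that assumption a distance of 3 is just in time and a distance of 1 or 2 is a data hazard. The function name `classify_dependencies` and the tuple encoding are hypothetical.

```python
def classify_dependencies(program):
    """Classify each instruction by its distance to the nearest producer of a source.

    program: list of (dest, src1, src2) register numbers, in program order.
    """
    results = []
    for i, (_, src1, src2) in enumerate(program):
        distance = None
        # Scan backwards for the closest earlier instruction writing a source register.
        for j in range(i - 1, -1, -1):
            if program[j][0] in (src1, src2):
                distance = i - j
                break
        if distance is None:
            results.append("no dependency")
        elif distance >= 4:
            results.append("data arrives early")
        elif distance == 3:
            results.append("data arrives on time")
        else:
            results.append("data hazard")
    return results


program = [
    (1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12),
    (13, 14, 1), (15, 16, 7), (17, 18, 13), (19, 20, 17),
]
for instr, verdict in zip(program, classify_dependencies(program)):
    print("add", instr, "->", verdict)
```

With forwarding, the two hazard cases between back-to-back add instructions would no longer require a stall, so the thresholds in the sketch apply only to the no-forwarding assumption.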

35
Verilog
36
Mixed Questions