Title: The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms
Slide 1: The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms
- ACM SIGMETRICS '05 (ACM International Conference on Measurement and Modeling of Computer Systems)
- Ali R. Butt, Chris Gniady, Y. Charlie Hu
- Purdue University
Presented by Hsu Hao Chen
Slide 2: Outline
- Introduction
- Motivation
- Replacement Algorithm
- OPT
- LRU
- LRU-2
- 2Q
- LIRS
- LRFU
- MQ
- ARC
- Performance Evaluation
- Conclusion
Slide 3: Introduction
- Improving file system performance
- Design effective block replacement algorithms for the buffer cache
- Almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account the file system prefetching that exists in all modern operating systems
- Cache hit ratio is used as the sole performance metric
- What about the actual number of disk I/O requests?
- What about the actual running time of applications?
Slide 4: Introduction (Cont.)
- Kernel prefetching in Linux
- Beneficial for sequential accesses
[Figure: various kernel components on the path from a file system operation to the disk]
Slide 5: Motivation
- The goals of a buffer cache replacement algorithm
- Minimize the number of disk I/Os
- Reduce the running time of applications
- Example
- Without prefetching, Belady's algorithm results in 16 misses while LRU results in 23 misses
- With prefetching, Belady's algorithm is no longer optimal!
Slide 6: Replacement Algorithm
- OPT
- Evicts the block that will be referenced farthest in the future
- Often used for comparative studies
- Prefetched blocks are assumed to be the most recently accessed, so OPT can immediately determine right and wrong prefetches (see the sketch below)
Slide 7: Replacement Algorithm
- LRU
- Replaces the block that has not been accessed for the longest time
- Prefetched blocks are inserted at the MRU position just like regular blocks (see the sketch below)
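A minimal LRU sketch in Python (class and method names are ours); per the slide, a prefetched block is inserted at the MRU end exactly like an on-demand block:

from collections import OrderedDict

class LRUCache:
    """Minimal LRU buffer cache sketch (illustrative only)."""
    def __init__(self, size):
        self.size = size
        self.blocks = OrderedDict()          # LRU end first, MRU end last

    def access(self, block):                 # on-demand or prefetch alike
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)   # refresh to MRU position
        else:
            if len(self.blocks) >= self.size:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[block] = True        # insert at MRU position
        return hit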
Slide 8: Replacement Algorithm
- LRU pathological case
- The working set size is larger than the cache
- The application has a looping access pattern
- In this case, LRU replaces every block just before it is used again, as the demo below shows
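A self-contained demonstration of this pathology (the cache and trace sizes are ours): a 3-block LRU cache scanning 4 blocks in a loop misses on every access, because LRU always evicts exactly the block the loop needs next:

from collections import OrderedDict

cache, size, misses = OrderedDict(), 3, 0
for block in [1, 2, 3, 4] * 5:                # looping access pattern
    if block in cache:
        cache.move_to_end(block)
    else:
        misses += 1
        if len(cache) >= size:
            cache.popitem(last=False)         # evicts the next block needed
        cache[block] = True
print(misses)  # 20: all 20 accesses miss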
Slide 9: Replacement Algorithm
- LRU-2
- Tries to avoid the pathological cases of LRU
- LRU-K replaces a block based on its Kth-to-last reference; the authors recommend K = 2
- LRU-2 can quickly remove cold blocks from the cache
- Each block access requires log(N) operations to manipulate a priority queue, where N is the number of blocks in the cache (see the sketch below)
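A minimal LRU-2 sketch (names ours). It keeps the last two reference times per block and evicts the block whose second-to-last reference is oldest; blocks seen only once rank oldest of all, which is why cold blocks leave quickly. A real implementation keeps these values in a priority queue for the log(N) cost the slide mentions, and adds a correlated-reference period that this sketch omits:

class LRU2Cache:
    def __init__(self, size):
        self.size = size
        self.hist = {}     # block -> (2nd-to-last, last) access times
        self.clock = 0

    def access(self, block):
        self.clock += 1
        hit = block in self.hist
        if not hit and len(self.hist) >= self.size:
            # Evict the smallest 2nd-to-last timestamp; 0 marks blocks
            # referenced only once, so they are evicted first.
            victim = min(self.hist, key=lambda b: self.hist[b][0])
            del self.hist[victim]
        last = self.hist[block][1] if hit else 0
        self.hist[block] = (last, self.clock)   # slide the history window
        return hit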
Slide 10: Replacement Algorithm
- 2Q
- Proposed to achieve page replacement performance similar to LRU-2 with low overhead (constant time, like LRU)
- All missed blocks go into the A1in queue
- Addresses of blocks replaced from A1in go into the A1out queue
- Re-referenced blocks go into the Am queue
- Prefetched blocks are treated as on-demand blocks; if a prefetched block is evicted from the A1in queue before an on-demand access, it is simply discarded (see the sketch below)
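A minimal 2Q sketch (the class name and queue-size defaults are ours; the original 2Q paper treats the thresholds Kin and Kout as tunables). A1in is a FIFO of newly missed blocks, A1out remembers only addresses, and Am is an LRU list of re-referenced blocks:

from collections import OrderedDict, deque

class TwoQCache:
    def __init__(self, size, kin=None, kout=None):
        self.size = size
        self.kin = kin or max(size // 4, 1)    # A1in capacity threshold
        self.kout = kout or max(size // 2, 1)  # A1out (ghost) capacity
        self.a1in = deque()                    # FIFO of new resident blocks
        self.a1out = deque()                   # addresses of evicted blocks
        self.am = OrderedDict()                # LRU of re-referenced blocks

    def access(self, block):
        if block in self.am:                   # hot block: refresh MRU
            self.am.move_to_end(block)
            return True
        if block in self.a1in:                 # recent block: stays put
            return True
        # Miss. A hit in A1out is still a disk miss (only the address
        # was kept), but it proves re-reference, so the block enters Am.
        if block in self.a1out:
            self.a1out.remove(block)
            self.am[block] = True
        else:
            self.a1in.append(block)
        while len(self.a1in) + len(self.am) > self.size:
            if len(self.a1in) > self.kin:
                self.a1out.append(self.a1in.popleft())  # keep address only
                if len(self.a1out) > self.kout:
                    self.a1out.popleft()
            elif self.am:
                self.am.popitem(last=False)             # evict LRU of Am
            else:
                self.a1in.popleft()
        return False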
Slide 12: Replacement Algorithm
- LIRS (Low Inter-reference Recency Set)
- LIR block: has been accessed again since being inserted into the LRU stack
- HIR block: referenced less frequently
- Prefetched blocks are inserted into the part of the cache that maintains HIR blocks
Slide 13: Replacement Algorithm
- LRFU (Least Recently/Frequently Used)
- Replaces the block with the smallest C(x) value
- Prefetched blocks are treated as the most recently accessed
- Problem: how to assign the initial weight C(x) to a prefetched block
- Solution: a prefetched flag is set, and the weight is assigned when the block is accessed on-demand
- C(x) is updated for every block x at every time t:
    C(x) = 1 + 2^(-λ) · C(x)   if x is referenced at time t
    C(x) = 2^(-λ) · C(x)       otherwise
  where λ is a tunable parameter; initially, C(x) = 0 (see the sketch below)
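A minimal LRFU sketch (names ours). Instead of touching every block at every tick, it decays C(x) lazily using the closed form C · 2^(-λ·Δt), which is equivalent to applying the per-step rule above Δt times. λ = 0 behaves like LFU, and large λ approaches LRU:

class LRFUCache:
    def __init__(self, size, lam=0.5):
        self.size, self.lam = size, lam
        self.clock = 0
        self.crf = {}      # block -> (C(x) at last touch, touch time)

    def _value_now(self, block):
        c, t = self.crf[block]
        return c * 2 ** (-self.lam * (self.clock - t))   # lazy decay

    def access(self, block):
        self.clock += 1
        hit = block in self.crf
        if not hit and len(self.crf) >= self.size:
            # Evict the block with the smallest current C(x) value;
            # a real implementation would keep a heap instead.
            victim = min(self.crf, key=self._value_now)
            del self.crf[victim]
        old = self._value_now(block) if hit else 0.0     # initially 0
        self.crf[block] = (1.0 + old, self.clock)        # reference bonus
        return hit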
Slide 14: Replacement Algorithm
- MQ (Multi-Queue)
- Uses m LRU queues (typically m = 8)
- Q0, Q1, ..., Qm-1, where Qi contains blocks that have been referenced at least 2^i times but no more than 2^(i+1) - 1 times recently
- Does not increment the reference counter when a block is prefetched (see the sketch below)
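A simplified MQ sketch (names ours): a block with reference count f sits in queue min(floor(log2(f)), m-1). The paper's MQ also expires blocks into lower queues over time, which this sketch omits. Per the slide, a prefetch does not increment the reference counter:

from collections import OrderedDict
from math import log2

class MQCache:
    def __init__(self, size, m=8):
        self.size, self.m = size, m
        self.queues = [OrderedDict() for _ in range(m)]
        self.freq = {}                         # block -> reference count

    def _queue_of(self, block):
        # Queue index from the count; count 0 (pure prefetch) maps to Q0.
        return min(int(log2(max(self.freq[block], 1))), self.m - 1)

    def access(self, block, prefetch=False):
        hit = block in self.freq
        if hit:
            self.queues[self._queue_of(block)].pop(block)
        elif len(self.freq) >= self.size:
            for q in self.queues:              # evict the LRU block of the
                if q:                          # lowest non-empty queue
                    victim, _ = q.popitem(last=False)
                    del self.freq[victim]
                    break
        if not hit:
            self.freq[block] = 0
        if not prefetch:                       # prefetches do not bump
            self.freq[block] += 1              # the reference counter
        self.queues[self._queue_of(block)][block] = True
        return hit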
Slide 16: Replacement Algorithm
- ARC (Adaptive Replacement Cache)
- Maintains two LRU lists
- L1: pages that have been referenced only once
- L2: pages that have been referenced at least twice
- Each list has the same length c as the cache
- The cache contains the tops of both lists, T1 and T2
[Figure: lists L1 and L2 with their resident tops T1 and T2, whose combined size is c]
Slide 17: Replacement Algorithm
- ARC attempts to maintain a target size B_T1 for list T1
- When the cache is full, ARC replaces
- the LRU page from T1, if |T1| > B_T1
- the LRU page from T2, otherwise
- If a prefetched block is already in the ghost queue, it is not moved to the second queue but to the first queue (see the sketch below)
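A simplified ARC sketch (names ours; p plays the role of B_T1, and the ±1 adaptation plus unbounded ghost lists are simplifications of the real algorithm). It includes the slide's prefetch rule: a prefetched block found in a ghost list re-enters the first queue (T1), not T2:

from collections import OrderedDict

class ARCCache:
    def __init__(self, c):
        self.c, self.p = c, 0                            # p ~ B_T1 target
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident tops
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghost addresses

    def _replace(self):
        if self.t1 and len(self.t1) > self.p:            # T1 over target
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = True                          # remember address
        elif self.t2:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = True
        else:
            self.t1.popitem(last=False)

    def access(self, block, prefetch=False):
        if block in self.t1 or block in self.t2:         # real cache hit
            self.t1.pop(block, None)
            self.t2.pop(block, None)
            self.t2[block] = True                        # promote to T2
            return True
        ghost = block in self.b1 or block in self.b2
        if block in self.b1:
            self.p = min(self.p + 1, self.c)             # favor recency
            del self.b1[block]
        elif block in self.b2:
            self.p = max(self.p - 1, 0)                  # favor frequency
            del self.b2[block]
        if len(self.t1) + len(self.t2) >= self.c:
            self._replace()
        if ghost and not prefetch:
            self.t2[block] = True                        # on-demand re-reference
        else:
            self.t1[block] = True                        # new or prefetched block
        return False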
Slide 18: Performance Evaluation
- Simulation environment
- The authors implement a buffer cache simulator that is functionally equivalent to Linux (prefetching, I/O clustering)
- With DiskSim, they simulate the I/O time of applications
- Applications
- Sequential access: cscope, glimpse
- Random access: tpc-h, tpc-r
- Multi1: workload in a code development environment
- Multi2: workload in graphics development and simulation
- Multi3: workload in a database and a web index server
Slide 19: Performance Evaluation (Cont.)
[Graphs for cscope (sequential): hit ratio, number of clustered disk requests, and execution time]
Slide 20: Performance Evaluation (Cont.)
[Graphs for cscope (sequential): hit ratio, number of clustered disk requests, and execution time]
Slide 21: Performance Evaluation (Cont.)
[Graphs for glimpse (sequential): hit ratio, number of clustered disk requests, and execution time]
Slide 22: Performance Evaluation (Cont.)
[Graphs for tpc-h (random): hit ratio, number of clustered disk requests, and execution time]
Slide 23: Performance Evaluation (Cont.)
[Graphs for tpc-r (random): hit ratio, number of clustered disk requests, and execution time]
Slide 24: Performance Evaluation (Cont.)
- Concurrent applications
- Multi1: hit ratios and disk requests with and without prefetching exhibit behavior similar to cscope
- Multi2: behavior is similar to Multi1, but prefetching does not improve the execution time (CPU-bound viewperf)
- Multi3: behavior is similar to tpc-h
- Synchronous vs. asynchronous prefetching
- With prefetching, the number of disk requests is at least 30% lower than without prefetching for all algorithms except OPT, especially when asynchronous prefetching is used
[Table: number and size of disk I/Os for cscope at a 128MB cache size]
Slide 25: Conclusion
- Kernel prefetching can have a significant performance impact on different replacement algorithms
- Application file access patterns matter for prefetching disk data
- Sequential access
- Random access
- With or without prefetching, hit ratio is not the sole performance metric