Title: The Five-Minute Rule 20 Years Later (And How Flash Memory Changes The Rules)
Goetz Graefe
Presented by Abhinav Parate
2. Storage Hierarchy
[Figure: storage hierarchy diagram with FLASH as a new layer]
3. Comparing Flash with Disks
4. When should we increase main memory?
- Metrics to decide:
- Cost of infrastructure
- Cost of maintenance
- Mean Time to Failure
- Performance improvement
- Simplest answer: increase RAM size if it is insufficient to hold frequently accessed data items
- But what time period counts as "frequent"?
5. Cost of accessing a data item
- A disk provides N accesses per second and costs D
- $DA = D/N$: cost of providing one disk access per second
- $M$: cost of 1 byte of main memory
- $I$: expected interval (in seconds) until the same data item is accessed again
- $B$: size of the data item in bytes
6. Cost of accessing a data item
- Number of accesses per second for the data item: $1/I$
- Cost if the item is accessed from disk: $DA/I$
- Cost if the item is kept in memory: $M \cdot B$
- Keep the data item in memory if its memory cost is less than its disk access cost: $M \cdot B < DA/I$
- Equivalently: $I < DA/(M \cdot B)$
- For a 1 KB data item, $I < 400$ s, i.e., roughly 5 minutes at 1987 costs
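The break-even test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the disk price, access rate, and memory price below are hypothetical values chosen only so that a 1 KB item reproduces the 400 s figure.

```python
def disk_access_cost(disk_price: float, accesses_per_sec: float) -> float:
    """DA = D / N: price of sustaining one disk access per second."""
    return disk_price / accesses_per_sec


def break_even_interval(da: float, mem_price_per_byte: float, item_bytes: int) -> float:
    """Largest re-access interval I (seconds) for which caching pays off: DA / (M * B)."""
    return da / (mem_price_per_byte * item_bytes)


def keep_in_memory(interval_s: float, da: float, mem_price_per_byte: float, item_bytes: int) -> bool:
    """Keep the item in RAM if its memory cost M * B is below its disk cost DA / I."""
    return mem_price_per_byte * item_bytes < da / interval_s


# Hypothetical inputs: a 30,000-dollar disk doing 15 accesses/s, RAM at 5 dollars per KB.
DA = disk_access_cost(30_000.0, 15.0)  # 2000.0 dollars per (access/s)
M = 5.0 / 1024                          # dollars per byte
B = 1024                                # 1 KB item

print(break_even_interval(DA, M, B))    # 400.0 seconds, about 5 minutes
print(keep_in_memory(300.0, DA, M, B))  # re-accessed every 5 minutes: True
print(keep_in_memory(600.0, DA, M, B))  # re-accessed every 10 minutes: False
```

Because the rule is a ratio, scaling all prices by the same factor leaves the break-even interval unchanged; only the relative price of disk accesses and memory bytes matters.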
7. The Five-Minute Rule
- In 1987: keep a 1 KB data item in main memory if it is accessed again within 5 minutes.
- In 1967, the frequent period was 0.5 s.
- For 2007, the authors had predicted a 5-hour rule.
- At actual 2007 prices, the period turned out to be a little under 6 hours.
8. Sample Case
- A database consists of 500,000 records of 1,000 bytes each.
- Peak load is 600 transactions per second.
- Only 6% of the data receives 96% of the accesses and is re-accessed within 5 minutes.
- That 6% of the data resides in main memory.
- The remaining data is accessed via two hard disks to support a 1-second access time.
- At 1987 costs, this design saved $3.5M compared with an entirely main-memory design.
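The sizing arithmetic behind this case can be made explicit. This is a small sketch that assumes, hypothetically, that each transaction touches exactly one record (the slide does not state accesses per transaction) and that load splits evenly over the two disks; the cost figures are not modeled.

```python
def size_sample_case(records: int, record_bytes: int, peak_tps: int,
                     hot_pct: int, hot_access_pct: int,
                     accesses_per_txn: int = 1, disks: int = 2):
    """Return (RAM bytes for the hot set, disk accesses/s, accesses/s per disk)."""
    hot_records = records * hot_pct // 100   # records kept in main memory
    ram_bytes = hot_records * record_bytes
    # Accesses that miss the in-memory hot set must go to disk.
    disk_rate = peak_tps * accesses_per_txn * (100 - hot_access_pct) / 100
    return ram_bytes, disk_rate, disk_rate / disks


ram, disk_rate, per_disk = size_sample_case(500_000, 1_000, 600, 6, 96)
print(ram, disk_rate, per_disk)  # 30000000 24.0 12.0
```

Under these assumptions, about 30 MB of RAM absorbs 96% of the load, leaving each disk a modest 12 accesses per second.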
9. Back to Present
- Technology has changed
- Multiple cores
- Virtualization
- Data sizes have increased tremendously
- The performance gap between RAM and disks has increased
- FLASH memory comes into the picture!
10. Flash memory characteristics
- Purchase cost
- Access Latency
- Bandwidth
- Density
- Power consumption
- Cooling costs
- On every one of these, flash lies in between RAM and rotating hard disks!
11. Comparison of Flash and Disks
12. Desirability of Flash Memory
- Disk I/O is increasingly becoming the bottleneck, as the number of CPU instructions that can execute during one disk I/O is steadily increasing
- A faster intermediate memory in the storage hierarchy is highly desirable
13. Limitations of Flash Memory
- Write bandwidth is lower than read bandwidth.
- Rewriting a block requires erasing the entire block.
- Reliability: 100,000 to 1M erase-and-write cycles
- Requires a wear-leveling mechanism
- Requires an agent that erases flash blocks as soon as their contents have been written to hard disk.
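To make the erase-before-write and wear-leveling requirements concrete, here is a toy model. It is a sketch under simplifying assumptions, not how any particular flash controller works: blocks are written out of place, a hypothetical background "erase agent" reclaims stale blocks (including blocks whose pages were written back to disk), and wear leveling simply picks the least-erased free block. The 100,000-cycle default is taken from the lower bound on the slide.

```python
class FlashDevice:
    """Toy flash model (hypothetical): erase-before-write plus simple wear leveling."""

    def __init__(self, num_blocks: int, max_erases: int = 100_000):
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks))  # erased blocks, ready to be written
        self.to_erase = set()               # stale blocks waiting for the erase agent
        self.mapping = {}                   # logical block -> physical block
        self.max_erases = max_erases        # endurance limit per block

    def write(self, logical: int) -> int:
        """Write out of place, since a written block cannot be rewritten without an erase."""
        if not self.free:
            raise RuntimeError("no erased blocks; run the erase agent first")
        # Wear leveling: choose the least-erased free block (ties go to the lowest number).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            self.to_erase.add(old)  # the previous copy is now stale
        self.mapping[logical] = target
        return target

    def written_back_to_disk(self, logical: int) -> None:
        """Once a page is safely on disk, its flash block can be reclaimed."""
        self.to_erase.add(self.mapping.pop(logical))

    def run_erase_agent(self) -> None:
        """Erase stale blocks so they can be reused; retire blocks that reach the limit."""
        for b in self.to_erase:
            self.erase_counts[b] += 1
            if self.erase_counts[b] < self.max_erases:
                self.free.add(b)
        self.to_erase.clear()


dev = FlashDevice(num_blocks=3)
print(dev.write(0), dev.write(0))  # a rewrite lands on a new block: 0 1
dev.run_erase_agent()               # block 0 has now been erased once
print(dev.write(5))                 # least-worn free block: 2
```

Erasing eagerly in the background keeps a pool of ready blocks, so foreground writes never wait for a slow erase.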
14. The presentation ahead ...
- Key challenges in using flash memory
- Addressing challenges
- Lots of open questions
- Implications for greening the computing infrastructure
15. Challenge 1: Which hardware interface to use?
- Use DIMM?
- Use Serial-ATA?
- Use a new hardware interface?
- Defining and developing a new hardware interface is a time-consuming exercise
- So use one of the existing interfaces
16. Challenge 2: Use as Buffer or Persistent Storage?
- Database systems are concerned with providing consistency.
- Databases have a large number of small updates and must maintain recovery logs.
- Logs must be written to persistent storage quickly.
- For database systems: use flash as persistent storage!
17. Challenge 2: Use as Buffer or Persistent Storage?
- File systems manipulate file contents in memory and write each file to disk in its entirety.
- Consistency is achieved via careful write ordering, quick write-back, and expensive file-system checks.
- If flash is treated as persistent storage, page movement between flash and disks is expensive.
- For file systems: use flash memory as a buffer pool!
18. Challenge 3: How to track frequent pages?
- Current systems estimate and administer frequently accessed pages using LRU.
- Maintain two LRU chains in RAM.
19. Least Recently Used Chain
- LRU for RAM
- LRU for flash memory
[Figure: LRU chain with entries labeled T(N), T(N-1), ..., T(1)]
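One plausible way to run the two LRU chains is sketched below. It assumes (this is our assumption, not stated on the slides) that a page evicted from the RAM chain is demoted to the flash chain, and that a page evicted from the flash chain then lives only on disk. Capacities are counted in pages, and the class name is hypothetical.

```python
from collections import OrderedDict


class TwoTierLRU:
    """Two LRU chains kept in RAM: one for RAM-resident pages, one for flash-resident pages."""

    def __init__(self, ram_pages: int, flash_pages: int):
        self.ram = OrderedDict()    # least recently used first, most recent last
        self.flash = OrderedDict()
        self.ram_cap = ram_pages
        self.flash_cap = flash_pages

    def access(self, page_id) -> str:
        """Touch a page and report where it was found: 'ram', 'flash', or 'disk'."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)
            return "ram"
        if page_id in self.flash:
            del self.flash[page_id]  # promote the page from flash to RAM
            source = "flash"
        else:
            source = "disk"
        self._insert_into_ram(page_id)
        return source

    def _insert_into_ram(self, page_id) -> None:
        self.ram[page_id] = None
        if len(self.ram) > self.ram_cap:
            victim, _ = self.ram.popitem(last=False)  # LRU page leaves RAM ...
            self.flash[victim] = None                  # ... and is demoted to flash
            if len(self.flash) > self.flash_cap:
                self.flash.popitem(last=False)         # LRU flash page now lives only on disk


cache = TwoTierLRU(ram_pages=2, flash_pages=2)
print([cache.access(p) for p in (1, 2, 3, 1, 3)])
# ['disk', 'disk', 'disk', 'flash', 'ram']
```

Keeping both chains in RAM means the system can decide where a page lives without touching flash or disk just to consult bookkeeping.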
20. Challenge 4: How to decide the size of RAM and flash?
21. Challenge 5: How to move pages among layers in the hierarchy?
- RAM and flash
- DMA Transfer
- Flash and Disk
- DMA (hardware)?
- Transfer buffer in RAM (software)?
22. Challenge 6: How to track page locations?
- File systems
- Maintain pointer pages
- Each pointer points to a data page or to a run of contiguous data pages
- Moving an individual page may require breaking up a run and updating pointer pages
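The cost of breaking up a run shows up in a small sketch. Here a pointer is modeled as a run (starting logical page, device, starting physical page, length); this representation and the "disk"/"flash" labels are illustrative assumptions, not an actual file-system format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    logical_start: int   # first logical page covered by this pointer
    device: str          # hypothetical device label, e.g. "disk" or "flash"
    physical_start: int  # location of that first page on the device
    length: int          # number of contiguous pages in the run


def move_page(runs: List[Run], logical_page: int, device: str, physical: int) -> List[Run]:
    """Relocate one logical page, splitting the run that contains it into up to three runs."""
    result = []
    for r in runs:
        if not (r.logical_start <= logical_page < r.logical_start + r.length):
            result.append(r)  # run unaffected by the move
            continue
        offset = logical_page - r.logical_start
        if offset > 0:  # pages before the moved one keep their location
            result.append(Run(r.logical_start, r.device, r.physical_start, offset))
        result.append(Run(logical_page, device, physical, 1))  # the moved page
        tail = r.length - offset - 1
        if tail > 0:  # pages after the moved one form a new run
            result.append(Run(logical_page + 1, r.device, r.physical_start + offset + 1, tail))
    return result


runs = [Run(0, "disk", 100, 10)]
for r in move_page(runs, 4, "flash", 7):
    print(r)
```

Moving a single page out of the middle of a run turns one pointer into three, which is why frequent page movement between flash and disk is awkward for run-based file systems.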
23. Challenge 6: How to track page locations?
- Database systems
- Use B-tree indexes
- Other kinds of indexes have been implemented efficiently on top of B-trees
- Moving a page requires updating the pointers in its parent node and its neighbors
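A minimal sketch of those pointer updates follows. It assumes, for illustration only, that pages live in a dict keyed by address, that pages carry child and left/right sibling pointers but no parent pointer, and that the parent's address is known from the root-to-leaf traversal. The function name is hypothetical.

```python
def relocate_btree_page(store: dict, old: int, new: int, parent: int) -> None:
    """Move the page at address `old` to address `new` and repair the pointers to it."""
    page = store.pop(old)
    store[new] = page
    # The parent's child pointer must now reference the new address.
    children = store[parent]["children"]
    children[children.index(old)] = new
    # The left and right neighbors' sibling pointers must be updated as well.
    if page["prev"] is not None:
        store[page["prev"]]["next"] = new
    if page["next"] is not None:
        store[page["next"]]["prev"] = new


store = {
    1: {"children": [10, 11, 12], "prev": None, "next": None},  # root
    10: {"children": [], "prev": None, "next": 11},              # leaves linked left to right
    11: {"children": [], "prev": 10, "next": 12},
    12: {"children": [], "prev": 11, "next": None},
}
relocate_btree_page(store, old=11, new=99, parent=1)
print(store[1]["children"], store[10]["next"], store[12]["prev"])  # [10, 99, 12] 99 99
```

Only a constant number of neighboring pages change, which is why B-trees make page movement between flash and disk comparatively cheap for database systems.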
24. Benefits to Database Systems
- Checkpoint processing
- provides consistency in databases
- writes dirty pages to persistent storage
- persistent flash storage is faster
- pages need to be written to disk only when the page-replacement policy requires it
- Recovery logs
- quick writes
25. Benefits to Database Systems
- Query processing
- Index-based selection is faster
- Query optimizers need to consider more index-based query plans
- Index joins and intersections
- Example
- Table scan: 100M rows in 100 s
- Disk-based index fetches 10K rows in 100 s
- So a table scan is more efficient if the result has more than 10K rows
- A flash-based index scan fetches 500K rows in the same time!
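The crossover in this example can be computed directly from the slide's numbers. The sketch assumes, as a simplification, that index fetch time grows linearly with the number of result rows; the function names are hypothetical.

```python
SCAN_SECONDS = 100.0                      # full table scan of 100M rows
DISK_INDEX_ROWS_PER_S = 10_000 / 100.0    # disk-based index: 10K rows in 100 s
FLASH_INDEX_ROWS_PER_S = 500_000 / 100.0  # flash-based index: 500K rows in 100 s


def break_even_rows(index_rows_per_s: float, scan_seconds: float = SCAN_SECONDS) -> float:
    """Result size at which index fetches take as long as a full scan."""
    return index_rows_per_s * scan_seconds


def choose_plan(result_rows: int, index_rows_per_s: float,
                scan_seconds: float = SCAN_SECONDS) -> str:
    """Pick the cheaper plan, assuming fetch time grows linearly with result size."""
    index_seconds = result_rows / index_rows_per_s
    return "index" if index_seconds < scan_seconds else "scan"


print(break_even_rows(DISK_INDEX_ROWS_PER_S), break_even_rows(FLASH_INDEX_ROWS_PER_S))
print(choose_plan(50_000, DISK_INDEX_ROWS_PER_S), choose_plan(50_000, FLASH_INDEX_ROWS_PER_S))
```

A 50K-row result flips from a table scan to an index plan once the index lives on flash, which is why optimizers should weigh index-based plans more heavily.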
26. Problem of Optimal B-tree Page Size
- Flash and disk lead to two different optimal page sizes
27. Implications for Green Computing
- This work focuses on infrastructure cost.
- Optimizing for energy instead may lead to different optimal B-tree page sizes.
- Optimizing infrastructure cost can significantly reduce RAM size and hence energy consumption.
- However, it also introduces a large flash memory into the system.
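28. Implications for Green Computing
- Let $P_{\mathrm{flash}}$ be the power consumption with flash memory
- Let $P_{\mathrm{noflash}}$ be the power consumption without flash memory
- Let $T_{\mathrm{flash}}$, $T_{\mathrm{noflash}}$ denote system throughput with/without flash
- The system is green if both hold:
- $P_{\mathrm{flash}} / P_{\mathrm{noflash}} < 1$
- $T_{\mathrm{flash}} / T_{\mathrm{noflash}} > 1$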
29. Implications for Green Computing
- What if $P_{\mathrm{flash}} / P_{\mathrm{noflash}} > 1$?
- In this case, the system is green if
- $T_{\mathrm{flash}} / T_{\mathrm{noflash}} > P_{\mathrm{flash}} / P_{\mathrm{noflash}}$
- i.e., the gain in throughput is higher than the extra power spent
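The two greenness criteria from the last two slides can be encoded as one check. This is a sketch of those stated conditions only: when power drops but throughput does not rise, the slides give no verdict, and the function conservatively returns False. The function name and the sample numbers are hypothetical.

```python
def is_green(p_flash: float, p_noflash: float, t_flash: float, t_noflash: float) -> bool:
    """Apply the slides' criteria for whether adding flash makes the system greener."""
    power_ratio = p_flash / p_noflash
    throughput_ratio = t_flash / t_noflash
    if power_ratio < 1:
        # Less power: green if throughput also improves (case not covered otherwise).
        return throughput_ratio > 1
    # More power: green only if the throughput gain outpaces the extra power.
    return throughput_ratio > power_ratio


print(is_green(80, 100, 120, 100))   # less power, more throughput: True
print(is_green(120, 100, 150, 100))  # 20% more power, 50% more throughput: True
print(is_green(120, 100, 110, 100))  # 20% more power, only 10% more throughput: False
```

The second criterion is the same as requiring throughput per watt, $T/P$, to improve.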
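30. Some calculations
- Assume a linear relation between the number of frequently accessed pages and the frequent period
- If $M$ is the RAM used in the no-flash system, the flash-based system uses
- $M/15$ of RAM
- $4M$ of flash memory
- With $p_{\mathrm{ram}}$ and $p_{\mathrm{flash}}$ the power consumption per unit of RAM and flash capacity:
- $P_{\mathrm{flash}} = \frac{M}{15} \, p_{\mathrm{ram}} + 4M \, p_{\mathrm{flash}}$
- $P_{\mathrm{noflash}} = M \, p_{\mathrm{ram}}$
- $P_{\mathrm{flash}} < P_{\mathrm{noflash}} \iff 4 \, p_{\mathrm{flash}} < \frac{14}{15} \, p_{\mathrm{ram}} \iff p_{\mathrm{flash}} < \frac{14}{60} \, p_{\mathrm{ram}}$
- This condition holds in practice.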
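The power comparison can be checked numerically against the closed-form threshold. The sketch uses the slide's sizing assumptions (RAM shrinks to $M/15$, flash of size $4M$ is added); the per-unit power figures are hypothetical values chosen only to show one case on each side of the threshold.

```python
def power_no_flash(m: float, p_ram: float) -> float:
    """P_noflash = M * p_ram."""
    return m * p_ram


def power_with_flash(m: float, p_ram: float, p_flash: float) -> float:
    """P_flash = (M / 15) * p_ram + 4M * p_flash, per the slide's sizing assumptions."""
    return (m / 15) * p_ram + 4 * m * p_flash


def flash_saves_power(p_ram: float, p_flash: float) -> bool:
    """Closed-form condition from the slide: p_flash < (14 / 60) * p_ram."""
    return p_flash < (14 / 60) * p_ram


# Hypothetical per-unit power figures (arbitrary units), one on each side of the threshold.
for p_flash in (1.0, 2.0):
    direct = power_with_flash(15.0, 6.0, p_flash) < power_no_flash(15.0, 6.0)
    print(p_flash, direct, flash_saves_power(6.0, p_flash))
```

The direct comparison and the closed-form test agree, and $M$ cancels out: the verdict depends only on the ratio $p_{\mathrm{flash}}/p_{\mathrm{ram}}$.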
31. Conclusions
- It is desirable to have a faster intermediate memory in the storage hierarchy.
- Database systems are likely to benefit a lot.
- The benefits for file systems are less clear.
- Flash can improve system throughput and reduce power consumption.
- Reducing RAM usage can lead to significant power savings.
32. Thank You!