Title: Fractal Prefetching B-Trees: Optimizing Both Cache and Disk Performance

1. Fractal Prefetching B-Trees: Optimizing Both Cache and Disk Performance
Joint work with
2. B-Tree Operations Review
- Search
  - binary search in every node on the path
- Insertion/Deletion
  - search followed by data movement
- Range Scan
  - locate a collection of tuples in a range
  - traverse the linked list of leaf nodes
  - different from search-like operations
3. Disk-optimized B-Trees
- Traditional focus: I/O performance
  - minimize the number of disk accesses
  - optimal tree nodes are disk pages, typically 4KB-64KB large
4. Cache-optimized B-Trees
- Recent studies: cache performance
  - e.g. Rao & Ross, SIGMOD '00; Bohannon, McIlroy & Rastogi, SIGMOD '01; Chen, Gibbons & Mowry, SIGMOD '01
  - cache line size is 32-128B
  - optimal tree nodes are only a few cache lines large
5. Large Difference in Node Sizes
6. Cache-optimized B-Trees: Poor I/O Performance
- may fetch a distinct disk page for every node on the path of a search
- similar penalty for range scans
7. Disk-optimized B-Trees: Poor Cache Performance
- binary search in a large node suffers an excessive number of cache misses (explained later in the talk)
8. Optimizing for Both Cache and Disk Performance?
9. Our Approach
- Fractal Prefetching B-Trees (fpB-Trees)
  - embedding cache-optimized trees inside disk-optimized trees
10. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
11. Page Structure of Disk-optimized B-Trees
- We focus on fixed-size keys
  - (please see our full paper for a discussion of variable-size keys)
- An index entry is <key, page ID> or <key, tuple ID>
12. Binary Search in a B-Tree Page
- Suppose
  - an index entry array has 1023 index entries, numbered 1-1023
  - 8 index entries / cache line
  - the array occupies 128 cache lines
  - e.g. 8KB page, an entry is <4B key, 4B page ID>, 64B cache line, 8B header
(Figure: the 1023-entry array spans 128 cache lines, labeled 1st through 128th)
13. Binary Search in a B-Tree Page
(Figure: searching for entry 71; the active range narrows from [1, 1023] toward the target)
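The cache behavior sketched in the figure can be reproduced with a small simulation (a hypothetical sketch; the 1023 entries, 8 entries per line, and the search for entry 71 follow the slide's example): because the active range stays far wider than one cache line until the final steps, almost every probe lands in a different cache line.

```python
def binary_search_lines(n_entries, per_line, target):
    """Binary search over entries 1..n_entries; return the probe count
    and the set of distinct cache lines touched."""
    lo, hi = 0, n_entries - 1
    probes, lines = 0, set()
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        lines.add(mid // per_line)   # cache line holding the probed entry
        value = mid + 1              # entry at index i stores key i+1
        if value == target:
            break
        elif value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes, lines

probes, lines = binary_search_lines(1023, 8, 71)
print(probes, len(lines))  # 10 probes touch 7 distinct cache lines
```

Only the last four of the ten probes fall in the same cache line; the first probes all miss, which is why binary search in a large disk-optimized node pays close to one cache miss per probe.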
14. Fractal Prefetching B-Trees (fpB-Trees)
- Embedding cache-optimized trees inside disk pages
- good search cache performance
  - binary search in cache-optimized nodes
  - much better locality
  - use cache prefetching
- good search disk performance
  - nodes are embedded into disk pages
15. Node Size Mismatch Problem
- Disk page size and cache-optimized node size are determined by hardware parameters and key sizes
- Ideally, cache-optimized trees fit nicely in disk pages
- But usually this is not true!
(Figure: a 2-level tree overflows the page; another 2-level tree underflows, but adding one more level overflows)
16. Two Solutions
- Solution 1: use different sizes for in-page leaf and nonleaf nodes
  - e.g. a smaller root when the tree overflows, a larger root when it underflows
- Solution 2: overflowing nodes become roots of new pages
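A quick back-of-the-envelope check shows why the mismatch is the common case (a hypothetical sketch; the 16KB page, 256B node, and fan-out of 32 are illustrative values, not taken from the paper): a complete cache-optimized tree rarely fills a disk page exactly.

```python
def tree_bytes(levels, fanout, node_bytes):
    """Total size of a complete tree: 1 + f + f^2 + ... nodes."""
    nodes = sum(fanout ** l for l in range(levels))
    return nodes * node_bytes

PAGE = 16 * 1024          # 16KB disk page (illustrative)
NODE, FANOUT = 256, 32    # 256B cache-optimized node, fan-out 32 (illustrative)

two = tree_bytes(2, FANOUT, NODE)    # 33 nodes   ->   8448B: underflows the page
three = tree_bytes(3, FANOUT, NODE)  # 1057 nodes -> 270592B: overflows the page
print(two < PAGE < three)  # True: neither 2 nor 3 levels fits exactly
```

Solution 1 would resize the root so the in-page tree fills the page; Solution 2 would keep uniform 256B nodes and spill the overflowing third level into new pages.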
17. The Two Solutions from Another Point of View
- Conceptually, we apply disk and cache optimizations in different orders
- Solution 1: disk-first
  - first build the disk-optimized pages
  - then fit smaller trees into disk pages by allowing different node sizes
- Solution 2: cache-first
  - first build the cache-optimized trees
  - then group nodes together and place them into disk pages
18. Insertion and Deletion Cache Performance
- In disk-optimized B-Trees, data movement is very expensive
  - the huge array structure in disk pages
  - on average, we need to move half the array
- In our fpB-Trees, the cost of data movement is much smaller
  - small cache-optimized nodes
- We show that fpB-Trees have much better insertion/deletion performance than disk-optimized B-Trees with fixed-size keys
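The data-movement argument above is easy to quantify (a hypothetical sketch; the 2046-entry page array and 64-entry node are illustrative sizes): inserting into a sorted array shifts half the entries on average, so replacing one page-sized array with small in-page nodes cuts the expected shift cost proportionally.

```python
def expected_shifts(n_entries):
    """Average number of entries shifted when inserting at a
    uniformly random position in a sorted array of n entries."""
    return n_entries / 2

page_array = expected_shifts(2046)  # one big per-page array -> ~1023 entries moved
node_array = expected_shifts(64)    # one small in-page node -> ~32 entries moved
print(page_array / node_array)      # roughly 32x less data movement per insert
```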
19. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
20. Jump-pointer Array Prefetching for Range Scans
- Recall that range scans essentially traverse the linked list of leaf nodes
- Previous proposal for range scan cache performance (SIGMOD '01)
  - build data structures to hold leaf node addresses
  - prefetch leaf nodes during range scans
(Figure: Internal Jump-Pointer Array)
21. New Proposal: I/O Prefetching
- Link leaf parent pages together
- Employ jump-pointer array prefetching in I/O
  - jump-pointer arrays contain leaf page IDs
  - prefetch leaf pages to improve range scan I/O performance
- Very useful when leaf pages are not sequential on disk
  - non-clustered index under frequent updates
  - (when sequential prefetching is not applicable)
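The mechanism can be sketched as follows (hypothetical code; `prefetch_page` stands in for issuing an asynchronous read, and the prefetch distance of 2 is illustrative): while scanning leaf page i, the jump-pointer array supplies the ID of the page d positions ahead, so its read overlaps with processing even when the leaf pages are scattered on disk.

```python
def range_scan(jump_pointers, start, count, distance, prefetch_page, read_page):
    """Scan `count` leaf pages from `start`, prefetching `distance` ahead
    using the jump-pointer array of leaf page IDs."""
    # warm up: request the first `distance` pages before scanning begins
    for i in range(start, min(start + distance, start + count)):
        prefetch_page(jump_pointers[i])
    results = []
    for i in range(start, start + count):
        ahead = i + distance
        if ahead < start + count:          # keep `distance` requests in flight
            prefetch_page(jump_pointers[ahead])
        results.append(read_page(jump_pointers[i]))
    return results

# toy driver: page IDs are deliberately non-sequential (non-clustered index)
ids = [40, 7, 93, 12, 55, 88, 3, 61]
issued = []
out = range_scan(ids, 0, 8, 2, issued.append, lambda pid: pid)
print(out)      # [40, 7, 93, 12, 55, 88, 3, 61]
print(issued)   # every page was requested before it was read
```

Sequential prefetching cannot produce this schedule, because the next leaf page's ID is not predictable from the current one; the jump-pointer array is what makes the page IDs available early.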
22. Both Cache and I/O Prefetching in fpB-Trees
- Two jump-pointer arrays in fpB-Trees
  - one for range scan cache performance, containing leaf node addresses for cache prefetching
  - one for range scan disk performance, containing leaf page IDs for I/O prefetching
23. More Details in Our Paper
- Computation of optimal node sizes
- Data structures
- Algorithms
  - Bulkload
  - Search
  - Insertion
  - Deletion
  - Range scan
24. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
25. Implementation
- We implemented a buffer manager and three index structures on top of the buffer manager
26. Experiments and Methodology
- Experiments
  - Search: (1) cache performance; (2) disk performance (improving cache performance while preserving good disk performance)
  - Update: (3) cache performance (solving the data movement problem)
  - Range Scan: (4) cache performance; (5) disk performance (jump-pointer array prefetching)
- Methodology
  - cache performance: detailed cycle-by-cycle simulations
    - memory system parameters in the near future
    - better prefetching support
  - range scan I/O performance: execution times on real machines
  - search I/O performance: counting the number of I/Os
    - I/O operations in search do not overlap
27. Search Cache Performance
2000 random searches after bulkload; 100% full except root; 16KB pages
- fpB-Trees perform significantly better than disk-optimized B-Trees
  - achieving speedups of 1.09-1.77 at all sizes, over 1.25 when trees contain at least 1M entries
- The performances of the two fpB-Trees are similar
28. Search I/O Performance
2000 random searches after bulkloading 10M index entries; 100% full except root
- Disk-first fpB-Trees access < 3% more pages
  - very small I/O performance impact
- Cache-first fpB-Trees may access up to 25% more pages in our results
29. Insertion Cache Performance
2000 random insertions after bulkloading 3M keys, 70% full
- fpB-Trees are significantly faster than disk-optimized B-Trees
  - achieving up to 35-fold speedups over disk-optimized B-Trees
- Data movement costs dominate disk-optimized B-Tree performance
30. Range Scan Cache Performance
100 scans starting at random locations in an index bulkloaded with 3M keys, 100% full; each range contains 1M keys; 16KB pages
- Disk-first and cache-first fpB-Trees achieve speedups of 4.2 and 3.5 over disk-optimized B-Trees
- Jump-pointer array cache prefetching is effective
31. Range Scan I/O Performance
8-processor machine (RS/6000 line), 2GB memory, 80 SSA disks; mature index on a 12.8GB table
- IBM DB2 Universal Database
- Jump-pointer array I/O prefetching achieves speedups of 2.5-5.0 for disk-optimized B-Trees
32. Other Experiments
- We find similar benefits in deletion cache performance
  - up to 20-fold speedups
- We performed many cache performance experiments and got similar results for
  - varying tree sizes, bulkload factors, and page sizes
  - mature trees
  - varying key sizes (20B keys)
- We performed range scan I/O experiments on our own index implementations and saw up to 6.9-fold speedups
33. Related Work
- Micro-indexing (discussed briefly by Lomet, SIGMOD Record, Sep. 2001)
(Figure: Micro-index)
- We are the first to quantitatively analyze performance for micro-indexing
  - improves search cache performance
  - but suffers from the data movement problem in updates because of the contiguous array structure
  - fpB-Trees have much better update performance
34. Fractal Prefetching B-Trees: Conclusion
- Search: combine cache-optimized and disk-optimized node sizes
  - better cache performance
    - 1.1-1.8 speedup over disk-optimized B-Trees
  - good disk performance for disk-first fpB-Trees
    - disk-first fpB-Trees visit < 3% more disk pages
    - we only recommend cache-first fpB-Trees with very large memory
- Update: solve the data movement problem by using smaller nodes
  - better cache performance
    - up to a 20-fold speedup over disk-optimized B-Trees
- Range Scan: employ jump-pointer array prefetching
  - better cache performance
  - better disk performance
    - 2.5-5.0 speedup on IBM DB2
35. Back-Up Slides
36. Previous Work: Prefetching B-Trees
- (SIGMOD 2001)
- Study B-Trees in a main-memory environment
- For search: prefetching wider tree nodes
  - increase node size to multiple cache lines wide
  - use prefetching to read all cache lines of a node in parallel
(Figure: a Prefetching B-Tree with four-line nodes vs. a B-Tree with one-line nodes)
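The benefit of reading a node's lines in parallel can be sketched with a simple latency model (hypothetical; the 250-cycle miss and 15-cycle pipelined gap are illustrative figures, not the paper's measured parameters): without prefetching, the misses for a multi-line node are serialized; with prefetching, they overlap, so extra lines cost only a small increment each.

```python
T_FULL, T_GAP = 250, 15   # cycles: full cache miss vs. next pipelined line (illustrative)

def node_read_cost(lines, prefetch):
    """Cycles to fetch all cache lines of one tree node."""
    if prefetch:                       # misses overlap: one full miss, rest pipelined
        return T_FULL + (lines - 1) * T_GAP
    return lines * T_FULL              # misses are serialized

print(node_read_cost(4, False), node_read_cost(4, True))  # 1000 vs 295 cycles
```

This is why a wider node is nearly free to read under prefetching while it sharply reduces tree height, the trade-off the Prefetching B-Tree exploits.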
37. Prefetching B-Trees (cont'd)
- For range scan: jump-pointer array prefetching
  - build jump-pointer arrays to hold leaf node addresses
  - prefetch leaf nodes with the jump-pointer array
  - two implementations
(Figure: External Jump-Pointer Array; Internal Jump-Pointer Array)
38. Optimization in the Disk-first Approach
- Two conflicting goals: 1) optimize search cache performance; 2) maximize page fan-out to preserve good I/O performance
- Optimal criterion: maximize page fan-out while keeping the analytical search cost within 10% of the optimal
- Details in the paper
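The criterion can be sketched with a simplified analytical model (hypothetical; this cost model is a rough stand-in for the paper's, and the latencies, line size, and tree size are illustrative): a node w cache lines wide holds more entries, reducing tree height, while a prefetched probe of that node costs one full miss plus (w-1) shorter pipelined gaps. Among all widths whose search cost stays within 10% of the minimum, we pick the widest (largest fan-out).

```python
import math

T_FULL, T_GAP = 250, 15     # cycles: full miss vs. next pipelined line (illustrative)
ENTRIES_PER_LINE = 8        # 64B line / 8B entry, as on the earlier slide
PAGE_LINES = 256            # 16KB page / 64B line

def fan_out_and_cost(width, n_keys=10_000_000):
    """Node fan-out and total search cost for nodes `width` lines wide."""
    m = width * ENTRIES_PER_LINE             # entries per node
    levels = math.ceil(math.log(n_keys, m))  # tree height for n_keys entries
    probe = T_FULL + (width - 1) * T_GAP     # prefetch pipelines the node's lines
    return m, levels * probe

def pick_width(max_width=PAGE_LINES, slack=1.10):
    costs = {w: fan_out_and_cost(w)[1] for w in range(1, max_width + 1)}
    best = min(costs.values())
    # widest node (largest fan-out) whose search cost is within 10% of optimal
    return max(w for w, c in costs.items() if c <= slack * best)

print(pick_width())
```

Under these illustrative parameters, the cheapest width is not the widest acceptable one: the 10% slack lets the selection trade a little search time for noticeably higher fan-out, which is exactly the tension between the two goals on this slide.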
39. Cache-first fpB-Trees: Structure
- Group sibling leaf nodes into the same pages for range scans
- Group a parent and its children into the same page for searches
- Leaf parent nodes may be put into overflow pages
40. Simulation Parameters
- Models all the gory details, including memory system contention
41. Optimal Node Size Computation (key = 4B)
- Optimal criterion: maximize page fan-out while keeping the analytical search cost within 10% of the optimal
- We used these optimal values in our experiments
42. Search Cache Performance
2000 random searches after bulkload; 100% full except root; 16KB pages
- Cache-sensitive schemes (fpB-Trees and micro-indexing) all perform significantly better than disk-optimized B-Trees
- The performances of the cache-sensitive schemes are similar
43. Search Cache Performance (Varying Page Sizes)
(Figure: execution time in M cycles for 4KB, 8KB, and 32KB pages)
- Same experiments but with different page sizes
- We see the same trends: cache-sensitive schemes are better
- They achieve speedups of 1.09-1.77 at all sizes, 1.25-1.77 when trees contain at least 1M entries
44. Optimal Width Selection
(16KB pages, 4B keys; disk-first and cache-first fpB-Trees)
- Our selected trees perform within 2% and 5% of the best for disk-first and cache-first fpB-Trees, respectively
45. Search I/O Performance
(2000 random searches, 4B keys)
- Disk-first fpB-Trees access < 3% more pages
  - very small I/O performance impact
- Cache-first fpB-Trees may access up to 25% more pages in our results
46. Insertion Cache Performance
2000 random insertions after bulkloading 3M keys, 70% full
47. Insertion Cache Performance II
2000 random insertions after bulkloading 3M keys; 16KB pages
- fpB-Trees are significantly faster than both disk-optimized B-Trees and micro-indexing
- fpB-Trees achieve up to 35-fold speedups over disk-optimized B-Trees across all page sizes
48. Insertion Cache Performance II (cont'd)
2000 random insertions after bulkloading 3M keys; 16KB pages
- Two major costs: data movement and page splits
- Micro-indexing still suffers from data movement costs
- fpB-Trees avoid this problem with smaller nodes
49. Space Utilization
(4B keys; after bulkload, 100% full; mature trees)
- Disk-first fpB-Trees incur < 9% space overhead
- Cache-first fpB-Trees may use up to 36% more pages in our results
50. Range Scan Cache Performance
100 scans starting at random locations in an index bulkloaded with 3M keys; each range spans 1M keys; 16KB pages
- Disk-first and cache-first fpB-Trees achieve speedups of 3.5-4.2 and 3.0-3.5 over disk-optimized B-Trees
51. Range Scan I/O Performance
(10M entries in the range; 10 disks)
- Setup: SGI Origin 200 with four 180MHz R10000 processors, 128MB memory, 12 SCSI disks (10 of them used in the experiments); range scans on mature trees
- Jump-pointer array prefetching achieves up to a 6.9-fold speedup
52. Jump-pointer Array Prefetching on IBM DB2
- Setup: 8-processor machine (RS/6000 line), 2GB memory, 80 SSA disks; mature index on a 12.8GB table; SELECT COUNT(*) FROM data
- Jump-pointer array prefetching achieves speedups of 2.5-5.0