Title: Fractal Prefetching B-Trees: Optimizing Both Cache and Disk Performance

1. Fractal Prefetching B-Trees: Optimizing Both Cache and Disk Performance
Joint work with
2. B-Tree Operations Review
- Search
  - binary search in every node on the path
- Insertion/Deletion
  - search followed by data movement
- Range Scan
  - locate a collection of tuples in a range
  - traverse the linked list of leaf nodes
  - different from search-like operations
3. Disk-optimized B-Trees
- Traditional focus: I/O performance
  - minimize the number of disk accesses
  - optimal tree nodes are disk pages, typically 4KB-64KB large
4. Cache-optimized B-Trees
- Recent studies: cache performance
  - e.g. Rao & Ross, SIGMOD '00; Bohannon, McIlroy & Rastogi, SIGMOD '01; Chen, Gibbons & Mowry, SIGMOD '01
  - cache line size is 32-128B
  - optimal tree nodes are only a few cache lines large
5. Large Difference in Node Sizes
6. Cache-optimized B-Trees: Poor I/O Performance
- may fetch a distinct disk page for every node on the path of a search
- similar penalty for range scans
7. Disk-optimized B-Trees: Poor Cache Performance
- binary search in a large node suffers an excessive number of cache misses (explained later in the talk)
8. Optimizing for Both Cache and Disk Performance?
9. Our Approach
- Fractal Prefetching B-Trees (fpB-Trees)
  - embedding cache-optimized trees inside disk-optimized trees
10. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
11. Page Structure of Disk-optimized B-Trees
- We focus on fixed-size keys
  - (please see our full paper for a discussion of variable-size keys)
- An index entry is <key, page ID> or <key, tuple ID>
12. Binary Search in a B-Tree Page
- Suppose
  - an index entry array has 1023 index entries, numbered 1-1023
  - 8 index entries / cache line
  - the array occupies 128 cache lines
  - e.g. 8KB page, an entry is <4B key, 4B page ID>, 64B cache line, 8B header
(Figure: the 1023-entry array spans 128 cache lines, labeled 1st through 128th)
13. Binary Search in a B-Tree Page
(Figure: searching for entry 71; the active range narrows from [1, 1023] toward the target)
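The cache behavior sketched in the figure can be reproduced with a small simulation (a hypothetical sketch; the 1023 entries, 8 entries per line, and the search for entry 71 follow the slide's example): because the active range stays far wider than one cache line until the final steps, almost every probe lands in a different cache line.

```python
def binary_search_lines(n_entries, per_line, target):
    """Binary search over entries 1..n_entries; return the probe count
    and the set of distinct cache lines touched."""
    lo, hi = 0, n_entries - 1
    probes, lines = 0, set()
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        lines.add(mid // per_line)   # cache line holding the probed entry
        value = mid + 1              # entry at index i stores key i+1
        if value == target:
            break
        elif value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes, lines

probes, lines = binary_search_lines(1023, 8, 71)
print(probes, len(lines))  # 10 probes touch 7 distinct cache lines
```

Only the last four of the ten probes fall in the same cache line; the first probes all miss, which is why binary search in a large disk-optimized node pays close to one cache miss per probe.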
14. Fractal Prefetching B-Trees (fpB-Trees)
- Embedding cache-optimized trees inside disk pages
- good search cache performance
  - binary search in cache-optimized nodes
  - much better locality
  - use cache prefetching
- good search disk performance
  - nodes are embedded into disk pages
15. Node Size Mismatch Problem
- Disk page size and cache-optimized node size are determined by hardware parameters and key sizes
- Ideally, cache-optimized trees fit nicely in disk pages
- But usually this is not true!
(Figure: a 2-level tree overflows the page; another 2-level tree underflows, but adding one more level overflows)
16. Two Solutions
- Solution 1: use different sizes for in-page leaf and nonleaf nodes
  - e.g. a smaller root when the tree overflows, a larger root when it underflows
- Solution 2: overflowing nodes become roots of new pages
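A quick back-of-the-envelope check shows why the mismatch is the common case (a hypothetical sketch; the 16KB page, 256B node, and fan-out of 32 are illustrative values, not taken from the paper): a complete cache-optimized tree rarely fills a disk page exactly.

```python
def tree_bytes(levels, fanout, node_bytes):
    """Total size of a complete tree: 1 + f + f^2 + ... nodes."""
    nodes = sum(fanout ** l for l in range(levels))
    return nodes * node_bytes

PAGE = 16 * 1024          # 16KB disk page (illustrative)
NODE, FANOUT = 256, 32    # 256B cache-optimized node, fan-out 32 (illustrative)

two = tree_bytes(2, FANOUT, NODE)    # 33 nodes   ->   8448B: underflows the page
three = tree_bytes(3, FANOUT, NODE)  # 1057 nodes -> 270592B: overflows the page
print(two < PAGE < three)  # True: neither 2 nor 3 levels fits exactly
```

Solution 1 would resize the root so the in-page tree fills the page; Solution 2 would keep uniform 256B nodes and spill the overflowing third level into new pages.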
17. The Two Solutions from Another Point of View
- Conceptually, we apply disk and cache optimizations in different orders
- Solution 1: disk-first
  - first build the disk-optimized pages
  - then fit smaller trees into disk pages by allowing different node sizes
- Solution 2: cache-first
  - first build the cache-optimized trees
  - then group nodes together and place them into disk pages
18. Insertion and Deletion Cache Performance
- In disk-optimized B-Trees, data movement is very expensive
  - the huge array structure in disk pages
  - on average, we need to move half the array
- In our fpB-Trees, the cost of data movement is much smaller
  - small cache-optimized nodes
- We show that fpB-Trees have much better insertion/deletion performance than disk-optimized B-Trees with fixed-size keys
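The data-movement argument above is easy to quantify (a hypothetical sketch; the 2046-entry page array and 64-entry node are illustrative sizes): inserting into a sorted array shifts half the entries on average, so replacing one page-sized array with small in-page nodes cuts the expected shift cost proportionally.

```python
def expected_shifts(n_entries):
    """Average number of entries shifted when inserting at a
    uniformly random position in a sorted array of n entries."""
    return n_entries / 2

page_array = expected_shifts(2046)  # one big per-page array -> ~1023 entries moved
node_array = expected_shifts(64)    # one small in-page node -> ~32 entries moved
print(page_array / node_array)      # roughly 32x less data movement per insert
```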
19. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
20. Jump-pointer Array Prefetching for Range Scans
- Recall that range scans essentially traverse the linked list of leaf nodes
- Previous proposal for range scan cache performance (SIGMOD '01)
  - build data structures to hold leaf node addresses
  - prefetch leaf nodes during range scans
(Figure: Internal Jump-Pointer Array)
21. New Proposal: I/O Prefetching
- Link leaf parent pages together
- Employ jump-pointer array prefetching in I/O
  - jump-pointer arrays contain leaf page IDs
  - prefetch leaf pages to improve range scan I/O performance
- Very useful when leaf pages are not sequential on disk
  - non-clustered index under frequent updates
  - (when sequential prefetching is not applicable)
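The mechanism can be sketched as follows (hypothetical code; `prefetch_page` stands in for issuing an asynchronous read, and the prefetch distance of 2 is illustrative): while scanning leaf page i, the jump-pointer array supplies the ID of the page d positions ahead, so its read overlaps with processing even when the leaf pages are scattered on disk.

```python
def range_scan(jump_pointers, start, count, distance, prefetch_page, read_page):
    """Scan `count` leaf pages from `start`, prefetching `distance` ahead
    using the jump-pointer array of leaf page IDs."""
    # warm up: request the first `distance` pages before scanning begins
    for i in range(start, min(start + distance, start + count)):
        prefetch_page(jump_pointers[i])
    results = []
    for i in range(start, start + count):
        ahead = i + distance
        if ahead < start + count:          # keep `distance` requests in flight
            prefetch_page(jump_pointers[ahead])
        results.append(read_page(jump_pointers[i]))
    return results

# toy driver: page IDs are deliberately non-sequential (non-clustered index)
ids = [40, 7, 93, 12, 55, 88, 3, 61]
issued = []
out = range_scan(ids, 0, 8, 2, issued.append, lambda pid: pid)
print(out)      # [40, 7, 93, 12, 55, 88, 3, 61]
print(issued)   # every page was requested before it was read
```

Sequential prefetching cannot produce this schedule, because the next leaf page's ID is not predictable from the current one; the jump-pointer array is what makes the page IDs available early.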
22. Both Cache and I/O Prefetching in fpB-Trees
- Two jump-pointer arrays in fpB-Trees
  - one for range scan cache performance, containing leaf node addresses for cache prefetching
  - one for range scan disk performance, containing leaf page IDs for I/O prefetching
23. More Details in Our Paper
- Computation of optimal node sizes
- Data structures
- Algorithms
  - Bulkload
  - Search
  - Insertion
  - Deletion
  - Range scan
24. Outline
- Overview
- Optimizing Searches and Updates
- Optimizing Range Scans
- Experimental Results
- Related Work
- Conclusion
25. Implementation
- We implemented a buffer manager and three index structures on top of the buffer manager
26. Experiments and Methodology
- Experiments
  - Search: (1) cache performance; (2) disk performance (improving cache performance while preserving good disk performance)
  - Update: (3) cache performance (solving the data movement problem)
  - Range Scan: (4) cache performance; (5) disk performance (jump-pointer array prefetching)
- Methodology
  - cache performance: detailed cycle-by-cycle simulations
    - memory system parameters in the near future
    - better prefetching support
  - range scan I/O performance: execution times on real machines
  - search I/O performance: counting the number of I/Os
    - I/O operations in search do not overlap
27. Search Cache Performance
2000 random searches after bulkload; 100% full except root; 16KB pages
- fpB-Trees perform significantly better than disk-optimized B-Trees
  - achieving speedups of 1.09-1.77 at all sizes, over 1.25 when trees contain at least 1M entries
- The performances of the two fpB-Trees are similar
28. Search I/O Performance
2000 random searches after bulkloading 10M index entries; 100% full except root
- Disk-first fpB-Trees access < 3% more pages
  - very small I/O performance impact
- Cache-first fpB-Trees may access up to 25% more pages in our results
29. Insertion Cache Performance
2000 random insertions after bulkloading 3M keys, 70% full
- fpB-Trees are significantly faster than disk-optimized B-Trees
  - achieving up to 35-fold speedups over disk-optimized B-Trees
- Data movement costs dominate disk-optimized B-Tree performance
30. Range Scan Cache Performance
100 scans starting at random locations in an index bulkloaded with 3M keys, 100% full; each range contains 1M keys; 16KB pages
- Disk-first and cache-first fpB-Trees achieve speedups of 4.2 and 3.5 over disk-optimized B-Trees
- Jump-pointer array cache prefetching is effective
31. Range Scan I/O Performance
8-processor machine (RS/6000 line), 2GB memory, 80 SSA disks; mature index on a 12.8GB table
- IBM DB2 Universal Database
- Jump-pointer array I/O prefetching achieves speedups of 2.5-5.0 for disk-optimized B-Trees
32. Other Experiments
- We find similar benefits in deletion cache performance
  - up to 20-fold speedups
- We performed many cache performance experiments and got similar results for
  - varying tree sizes, bulkload factors, and page sizes
  - mature trees
  - varying key sizes (20B keys)
- We performed range scan I/O experiments on our own index implementations and saw up to 6.9-fold speedups
33. Related Work
- Micro-indexing (discussed briefly by Lomet, SIGMOD Record, Sep. 2001)
(Figure: Micro-index)
- We are the first to quantitatively analyze performance for micro-indexing
  - improves search cache performance
  - but suffers from the data movement problem in updates because of the contiguous array structure
  - fpB-Trees have much better update performance
34. Fractal Prefetching B-Trees: Conclusion
- Search: combine cache-optimized and disk-optimized node sizes
  - better cache performance
    - 1.1-1.8 speedup over disk-optimized B-Trees
  - good disk performance for disk-first fpB-Trees
    - disk-first fpB-Trees visit < 3% more disk pages
    - we only recommend cache-first fpB-Trees with very large memory
- Update: solve the data movement problem by using smaller nodes
  - better cache performance
    - up to a 20-fold speedup over disk-optimized B-Trees
- Range Scan: employ jump-pointer array prefetching
  - better cache performance
  - better disk performance
    - 2.5-5.0 speedup on IBM DB2
35. Back-Up Slides
36. Previous Work: Prefetching B-Trees
- (SIGMOD 2001)
- Study B-Trees in a main-memory environment
- For search: prefetching wider tree nodes
  - increase node size to multiple cache lines wide
  - use prefetching to read all cache lines of a node in parallel
(Figure: a Prefetching B-Tree with four-line nodes vs. a B-Tree with one-line nodes)
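The benefit of reading a node's lines in parallel can be sketched with a simple latency model (hypothetical; the 250-cycle miss and 15-cycle pipelined gap are illustrative figures, not the paper's measured parameters): without prefetching, the misses for a multi-line node are serialized; with prefetching, they overlap, so extra lines cost only a small increment each.

```python
T_FULL, T_GAP = 250, 15   # cycles: full cache miss vs. next pipelined line (illustrative)

def node_read_cost(lines, prefetch):
    """Cycles to fetch all cache lines of one tree node."""
    if prefetch:                       # misses overlap: one full miss, rest pipelined
        return T_FULL + (lines - 1) * T_GAP
    return lines * T_FULL              # misses are serialized

print(node_read_cost(4, False), node_read_cost(4, True))  # 1000 vs 295 cycles
```

This is why a wider node is nearly free to read under prefetching while it sharply reduces tree height, the trade-off the Prefetching B-Tree exploits.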
37. Prefetching B-Trees (cont'd)
- For range scan: jump-pointer array prefetching
  - build jump-pointer arrays to hold leaf node addresses
  - prefetch leaf nodes with the jump-pointer array
  - two implementations
(Figure: External Jump-Pointer Array; Internal Jump-Pointer Array)
38. Optimization in the Disk-first Approach
- Two conflicting goals: 1) optimize search cache performance; 2) maximize page fan-out to preserve good I/O performance
- Optimal criterion: maximize page fan-out while keeping the analytical search cost within 10% of the optimal
- Details in the paper
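The criterion can be sketched with a simplified analytical model (hypothetical; this cost model is a rough stand-in for the paper's, and the latencies, line size, and tree size are illustrative): a node w cache lines wide holds more entries, reducing tree height, while a prefetched probe of that node costs one full miss plus (w-1) shorter pipelined gaps. Among all widths whose search cost stays within 10% of the minimum, we pick the widest (largest fan-out).

```python
import math

T_FULL, T_GAP = 250, 15     # cycles: full miss vs. next pipelined line (illustrative)
ENTRIES_PER_LINE = 8        # 64B line / 8B entry, as on the earlier slide
PAGE_LINES = 256            # 16KB page / 64B line

def fan_out_and_cost(width, n_keys=10_000_000):
    """Node fan-out and total search cost for nodes `width` lines wide."""
    m = width * ENTRIES_PER_LINE             # entries per node
    levels = math.ceil(math.log(n_keys, m))  # tree height for n_keys entries
    probe = T_FULL + (width - 1) * T_GAP     # prefetch pipelines the node's lines
    return m, levels * probe

def pick_width(max_width=PAGE_LINES, slack=1.10):
    costs = {w: fan_out_and_cost(w)[1] for w in range(1, max_width + 1)}
    best = min(costs.values())
    # widest node (largest fan-out) whose search cost is within 10% of optimal
    return max(w for w, c in costs.items() if c <= slack * best)

print(pick_width())
```

Under these illustrative parameters, the cheapest width is not the widest acceptable one: the 10% slack lets the selection trade a little search time for noticeably higher fan-out, which is exactly the tension between the two goals on this slide.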
39. Cache-first fpB-Trees: Structure
- Group sibling leaf nodes into the same pages for range scans
- Group a parent and its children into the same page for searches
- Leaf parent nodes may be put into overflow pages
40. Simulation Parameters
- Models all the gory details, including memory system contention
41. Optimal Node Size Computation (key = 4B)
- Optimal criterion: maximize page fan-out while keeping the analytical search cost within 10% of the optimal
- We used these optimal values in our experiments
42. Search Cache Performance
2000 random searches after bulkload; 100% full except root; 16KB pages
- Cache-sensitive schemes (fpB-Trees and micro-indexing) all perform significantly better than disk-optimized B-Trees
- The performances of the cache-sensitive schemes are similar
43. Search Cache Performance (Varying Page Sizes)
(Figure: execution time in M cycles for 4KB, 8KB, and 32KB pages)
- Same experiments but with different page sizes
- We see the same trends: cache-sensitive schemes are better
- They achieve speedups of 1.09-1.77 at all sizes, 1.25-1.77 when trees contain at least 1M entries
44. Optimal Width Selection
(16KB pages, 4B keys; disk-first and cache-first fpB-Trees)
- Our selected trees perform within 2% and 5% of the best for disk-first and cache-first fpB-Trees, respectively
45. Search I/O Performance
(2000 random searches, 4B keys)
- Disk-first fpB-Trees access < 3% more pages
  - very small I/O performance impact
- Cache-first fpB-Trees may access up to 25% more pages in our results
46. Insertion Cache Performance
2000 random insertions after bulkloading 3M keys, 70% full
47. Insertion Cache Performance II
2000 random insertions after bulkloading 3M keys; 16KB pages
- fpB-Trees are significantly faster than both disk-optimized B-Trees and micro-indexing
- fpB-Trees achieve up to 35-fold speedups over disk-optimized B-Trees across all page sizes
48. Insertion Cache Performance II (cont'd)
2000 random insertions after bulkloading 3M keys; 16KB pages
- Two major costs: data movement and page splits
- Micro-indexing still suffers from data movement costs
- fpB-Trees avoid this problem with smaller nodes
49. Space Utilization
(4B keys; after bulkload, 100% full; mature trees)
- Disk-first fpB-Trees incur < 9% space overhead
- Cache-first fpB-Trees may use up to 36% more pages in our results
50. Range Scan Cache Performance
100 scans starting at random locations in an index bulkloaded with 3M keys; each range spans 1M keys; 16KB pages
- Disk-first and cache-first fpB-Trees achieve speedups of 3.5-4.2 and 3.0-3.5 over disk-optimized B-Trees
51. Range Scan I/O Performance
(10M entries in the range; 10 disks)
- Setup: SGI Origin 200 with four 180MHz R10000 processors, 128MB memory, 12 SCSI disks (10 of them used in the experiments); range scans on mature trees
- Jump-pointer array prefetching achieves up to a 6.9-fold speedup
52. Jump-pointer Array Prefetching on IBM DB2
- Setup: 8-processor machine (RS/6000 line), 2GB memory, 80 SSA disks; mature index on a 12.8GB table; SELECT COUNT(*) FROM data
- Jump-pointer array prefetching achieves speedups of 2.5-5.0