Title: Cache Conscious Indexing for Decision-Support in Main Memory
1. Cache Conscious Indexing for Decision-Support in Main Memory
2. Why In-Memory Databases?
- Telecommunications
- CAD tools
- Moore's law will allow us to store entire relations in memory
3. Redesigning DBMSs
- Optimize memory-CPU performance rather than disk-memory performance
- Re-evaluate the space/time tradeoff: main-memory space isn't cheap
- Given a fixed space budget, optimize response time for lookups
4. Indices in In-Memory DBMSs
- Trade-off: a little extra space vs. increased performance
- Index design takes on new dimensions for in-memory databases
- Space overhead cannot be ignored, so hash tables are unacceptable
5. Hardware Solutions
- Caches
- Growing disparity between CPU performance and memory performance
- Cache misses can't be overlapped with other work the way disk I/Os can
6. Solution
- CSS-tree (Cache-Sensitive Search tree) indices exploit cache behavior to improve lookup performance
7. Direct-Mapped Cache
8. Fully Associative Cache
9. 2-Way Set-Associative Cache
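The three organizations above differ only in how many lines ("ways") a memory block may occupy within its set. The toy model below is a minimal sketch, not a hardware description: the class name `SetAssociativeCache`, the 64-byte line size, and LRU replacement within a set are assumptions. `ways=1` models a direct-mapped cache and `ways=num_lines` a fully associative one.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache: num_lines lines split into sets of `ways` lines, LRU within a set.

    ways=1 -> direct-mapped; ways=num_lines -> fully associative.
    Line size and LRU replacement are illustrative assumptions.
    """

    def __init__(self, num_lines, ways, line_size=64):
        assert num_lines % ways == 0
        self.num_sets = num_lines // ways
        self.ways = ways
        self.line_size = line_size
        # one LRU-ordered dict of resident tags per set
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, address):
        """Touch one byte address; return True on a hit, False on a miss."""
        block = address // self.line_size   # memory block (line) holding the byte
        index = block % self.num_sets       # the only set this block may live in
        tag = block // self.num_sets        # distinguishes blocks sharing that set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # hit: mark as most recently used
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.popitem(last=False)       # set full: evict least recently used
        lines[tag] = None
        return False
```

With 4 lines of 64 bytes, addresses 0 and 256 map to the same set in every configuration. Alternating between them misses on every access in the direct-mapped cache (conflict misses), but only on the first touch of each line in the 2-way and fully associative caches.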
10. Binary Search on a Sorted Array
- Store the relation sorted on a key
- Cache performance depends on tuple size
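A minimal sketch of why binary search's cache behavior depends on tuple size: every probe reads one tuple, and when tuples are large, nearly every probe lands on a different cache line. `binary_search` and `lines_touched` are hypothetical helpers; the 64-byte line size and the assumption that each tuple's key sits at its start, with tuples stored contiguously from offset 0, are illustrative.

```python
def binary_search(keys, x):
    """Return (index of x or -1, list of probed positions)."""
    lo, hi = 0, len(keys) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if keys[mid] == x:
            return mid, probes
        if keys[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def lines_touched(probes, tuple_size, line_size=64):
    """Distinct cache lines read if tuple p starts at byte p * tuple_size
    and only its key (first bytes) is read on each probe."""
    return len({(p * tuple_size) // line_size for p in probes})


keys = list(range(0, 2000, 2))
idx, probes = binary_search(keys, 1234)
print(idx, len(probes), lines_touched(probes, 4), lines_touched(probes, 64))
```

With 64-byte tuples each probe costs its own line; with 4-byte tuples the last few probes, which are close together, can share lines.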
11. T-Trees
12. Enhanced B+-Trees
13. Hash Indices
[Figure: hash directory with bucket addresses 000 through 111]
- Each bucket holds however many (key, pointer) pairs fit into a cache line
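The hash-index slide suggests sizing each bucket to one cache line. A minimal sketch, assuming a 64-byte line and 8-byte pairs (4-byte key, 4-byte record pointer), hence 8 pairs per line; `CacheLineHashIndex` is a hypothetical name, and chaining a full bucket to an extra overflow line is one possible policy, not necessarily the one on the slide. With 3 hash bits the buckets correspond to the addresses 000 to 111. Python lists only model the grouping; they do not control real memory layout.

```python
LINE_SIZE = 64                           # hypothetical cache-line size (bytes)
PAIR_SIZE = 8                            # hypothetical 4-byte key + 4-byte pointer
PAIRS_PER_LINE = LINE_SIZE // PAIR_SIZE  # 8 pairs fit in one line


class CacheLineHashIndex:
    """Hash index whose buckets are chains of cache-line-sized groups of pairs."""

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        # bucket i is a list of "lines"; each line holds at most PAIRS_PER_LINE pairs
        self.buckets = [[[]] for _ in range(1 << bits)]

    def _chain(self, key):
        return self.buckets[hash(key) & self.mask]  # low-order hash bits pick the bucket

    def insert(self, key, rid):
        chain = self._chain(key)
        if len(chain[-1]) == PAIRS_PER_LINE:
            chain.append([])                       # bucket full: add an overflow line
        chain[-1].append((key, rid))

    def lookup(self, key):
        """Return (rid or None, number of lines examined)."""
        chain = self._chain(key)
        for lines_read, line in enumerate(chain, start=1):
            for k, rid in line:
                if k == key:
                    return rid, lines_read
        return None, len(chain)
```

A lookup that stays within the first line of its bucket costs one cache line; an overflowed bucket costs one more line per overflow group.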
14. Idea Behind CSS-Trees
- Save space by not storing child pointers
- Use an array as a tree
- Store pointers implicitly as offsets into the array
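15. Useful Formulas for CSS-Trees
- $n$ = number of elements, $m$ = number of keys per node, $N$ = number of leaf nodes (nodes in the sorted array)
- (EQ 1) Children of node $b$ are nodes $b(m+1)+1$ to $b(m+1)+(m+1)$
- (EQ 2) $N = \lceil n/m \rceil$
- (EQ 3) Number of internal nodes: $\lceil (N-1)/m \rceil$
- (EQ 4) First leaf node in the bottom level: $\frac{(m+1)^k - 1}{m}$, where $k = \lceil \log_{m+1} N \rceil$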
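The formulas on slide 15 can be checked numerically. This sketch assumes nodes are numbered level by level starting from 0 at the root, as EQ 1 implies, and takes EQ 3 as $\lceil (N-1)/m \rceil$ and EQ 4 as $((m+1)^k - 1)/m$ with $k = \lceil \log_{m+1} N \rceil$; those two closed forms are our derivation for this layout, and the function names are hypothetical.

```python
def children(b, m):
    """EQ 1: node numbers of the m + 1 children of node b (root is node 0)."""
    return range(b * (m + 1) + 1, b * (m + 1) + (m + 1) + 1)


def leaf_nodes(n, m):
    """EQ 2: N = ceil(n / m) nodes in the sorted array."""
    return -(-n // m)


def internal_nodes(N, m):
    """EQ 3: ceil((N - 1) / m) internal nodes, for N >= 1."""
    return (N + m - 2) // m


def first_bottom_leaf(N, m):
    """EQ 4: ((m+1)^k - 1) / m, the first node on level k = ceil(log_{m+1} N).
    k is found with integer arithmetic to avoid floating-point log errors."""
    cap = 1
    while cap < N:
        cap *= m + 1      # cap ends as (m + 1) ** k
    return (cap - 1) // m
```

For example, with $m = 4$ and $N = 26$ there are 7 internal nodes and the bottom level starts at node 31: nodes 31 and 32 are bottom-level leaves (children of node 6), while nodes 7 to 30 are leaves one level up.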
16. How It Works
17. Building a Full CSS-Tree
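A full CSS-tree can be built bottom-up from the sorted array. The following is a minimal sketch of one way to build and search it with the slide-15 formulas, not the paper's code. Assumptions (ours): each directory key holds the largest key in the corresponding child's subtree; the internal-node count is $\lceil (N-1)/m \rceil$ and the bottom level starts at node $((m+1)^k - 1)/m$; bottom-level leaves hold the beginning of the sorted array and the remaining leaves (one level up) hold the end; Python lists stand in for the contiguous arrays a real implementation would use. All function names are hypothetical.

```python
from bisect import bisect_left


def build_css_tree(a, m):
    """Build a full CSS-tree directory over sorted list a (len(a) >= 1)."""
    n = len(a)
    N = -(-n // m)                 # leaf nodes: ceil(n / m)
    I = (N + m - 2) // m           # internal nodes: ceil((N - 1) / m)
    cap = 1
    while cap < N:                 # cap = (m + 1)^k, smallest power >= N
        cap *= m + 1
    F = (cap - 1) // m             # first leaf node on the bottom level
    T = I + N                      # nodes are numbered 0 .. T-1 level by level
    tree = {"a": a, "m": m, "I": I, "T": T, "F": F, "bottom": T - F}
    maxkey = [None] * T
    for b in range(T - 1, -1, -1):  # children have larger numbers, so go backwards
        if b >= I:                  # leaf: largest key in its chunk of the array
            c = leaf_to_chunk(tree, b)
            maxkey[b] = a[min((c + 1) * m, n) - 1]
        else:                       # internal: largest key of its last existing child
            maxkey[b] = maxkey[min(b * (m + 1) + (m + 1), T - 1)]
    # Key j of node b = largest key under child j. Missing children (only in the
    # last internal node) are padded with the last existing child's key.
    tree["dir"] = [[maxkey[min(b * (m + 1) + 1 + j, T - 1)] for j in range(m)]
                   for b in range(I)]
    return tree


def leaf_to_chunk(tree, leaf):
    """Map a leaf node number to its m-element chunk of the sorted array."""
    if leaf >= tree["F"]:
        return leaf - tree["F"]                 # bottom level: start of the array
    return leaf - tree["I"] + tree["bottom"]    # level above: wraps to the end


def css_search(tree, x):
    """Return the index of x in the sorted array, or -1 if absent."""
    a, m, I, T = tree["a"], tree["m"], tree["I"], tree["T"]
    b = 0
    while b < I:                                # walk down the internal nodes
        j = bisect_left(tree["dir"][b], x)      # first key >= x; j == m: last child
        b = b * (m + 1) + 1 + j                 # implicit child pointer (EQ 1)
        if b >= T:                              # no such node: x exceeds every key
            return -1
    c = leaf_to_chunk(tree, b)
    lo, hi = c * m, min((c + 1) * m, len(a))
    i = bisect_left(a, x, lo, hi)               # search within the leaf
    return i if i < hi and a[i] == x else -1
```

Only the directory (`tree["dir"]`) is extra space; child positions are computed arithmetically instead of being stored as pointers.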
18. Searching Within a Node
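Within a node, the search only has to pick the child to descend into. A minimal sketch using a plain lower-bound binary search over the node's keys (a tuned implementation might unroll this loop for a fixed node size); `node_branch` is a hypothetical name. For a node with $m$ keys the loop runs at most $\lceil \log_2(m+1) \rceil$ times.

```python
def node_branch(keys, x):
    """Return (j, comparisons): j is the first slot with keys[j] >= x,
    where j == len(keys) means 'descend into the last child'."""
    lo, hi, comparisons = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if keys[mid] < x:
            lo = mid + 1      # x lies to the right of keys[mid]
        else:
            hi = mid          # keys[mid] >= x: answer is mid or further left
    return lo, comparisons


print(node_branch([3, 7, 11, 15, 19, 23, 27, 31], 12))
```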
19. Level CSS-Trees
- Node size $m = 2^t$
- Keys per node: $m - 1$, so each node has $m$ children
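With $m = 2^t$ and $m - 1$ keys, the binary search inside a level CSS-tree node is perfectly balanced: each step splits the $2^j - 1$ remaining keys into two halves of $2^{j-1} - 1$, so every search takes exactly $t$ comparisons. The sketch below writes that search as a fixed sequence of halving steps; it assumes children of node $b$ are numbered $bm+1$ to $bm+m$ (by analogy with EQ 1 for fan-out $m$), and the function names are hypothetical.

```python
def level_children(b, m):
    """Children of node b when every node has m children (assumed numbering)."""
    return range(b * m + 1, b * m + m + 1)


def level_node_branch(keys, x, t):
    """keys holds 2**t - 1 sorted entries. Returns how many keys are < x
    (the child index, 0 .. 2**t - 1) using exactly t comparisons."""
    pos = 0                       # number of keys already known to be < x
    step = 1 << (t - 1)
    while step:                   # runs exactly t times
        if keys[pos + step - 1] < x:   # middle key of the remaining range
            pos += step
        step >>= 1
    return pos


keys = [10 * i for i in range(1, 16)]   # m = 16, t = 4: 15 keys per node
print(level_node_branch(keys, 75, 4))
```

Because the loop count depends only on $t$, the comparison sequence is identical for every search key, which is what makes it easy to specialize for a fixed node size.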
20. Level vs. Full CSS-Trees
- Level CSS-trees are deeper because of their smaller branching factor ($m$ instead of $m + 1$)
- Level CSS-trees need fewer comparisons per node
- Level CSS-trees incur more cache accesses and node traversals
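The trade-off above can be made concrete with a back-of-the-envelope count. This sketch assumes a hypothetical node size $m = 16$ and, as a simplification, the same number of leaf nodes (70,000) under both trees; it counts directory levels (node traversals) and worst-case comparisons per node. The helper names are hypothetical.

```python
def levels_needed(leaf_nodes, fanout):
    """Smallest k with fanout**k >= leaf_nodes: directory levels above the leaves."""
    k, cap = 0, 1
    while cap < leaf_nodes:
        cap *= fanout
        k += 1
    return k


def worst_case_comparisons(keys_per_node):
    """Binary search over s keys needs at most ceil(log2(s + 1)) comparisons."""
    return keys_per_node.bit_length()


M = 16            # hypothetical node size: 16 keys (full) vs. 15 keys (level)
LEAVES = 70_000   # hypothetical leaf-node count, assumed equal for both trees

full = (levels_needed(LEAVES, M + 1), worst_case_comparisons(M))
level = (levels_needed(LEAVES, M), worst_case_comparisons(M - 1))
print("full  CSS-tree (levels, comparisons/node):", full)
print("level CSS-tree (levels, comparisons/node):", level)
```

Here the full CSS-tree needs 4 directory levels with up to 5 comparisons each, while the level CSS-tree needs 5 levels with exactly 4 comparisons each: fewer comparisons per node, but one more node to visit per search.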
21. Time Analysis
22. Space Analysis
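A rough space estimate follows from the internal-node count. Assuming the level-by-level array layout, with $N = \lceil n/m \rceil$ leaf nodes and $\lceil (N-1)/m \rceil$ internal nodes of $m$ keys each (our derivation, not a figure from the slides), the directory holds

```latex
m \left\lceil \frac{N-1}{m} \right\rceil \;<\; N - 1 + m,
\qquad N = \left\lceil \frac{n}{m} \right\rceil
\quad\Longrightarrow\quad
\text{directory} \approx \frac{n}{m} \text{ keys for } n \gg m^2
```

so the directory adds roughly a fraction $1/m$ of the array's size, with no stored child pointers.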
23. Experiment
- Results are for an UltraSPARC II
- Keys: randomly generated integers between 0 and 1 million
- Performed 5 tests of 100,000 searches for random keys
24. Figure 5a: Array Size vs. Time
25. Figure 5b: Array Size vs. Time
26. Figure 6a: Array Size vs. Second-Level Cache Accesses
27. Figure 6b: Array Size vs. Second-Level Cache Misses
28. Figure 7: Node Size vs. Time
29. CSS-Tree Performance on Other Queries
- CSS-trees are very good for individual selection queries
- CSS-trees will probably perform best on range queries
- Index nested-loops join vs. sort-merge join
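The points on range queries and joins can be illustrated on a sorted array. In this minimal sketch, `bisect_left` stands in for a CSS-tree descent to the first qualifying key (an assumption for brevity); the function names are hypothetical, and which join wins in practice depends on input sizes and the cache, which the sketch does not model.

```python
from bisect import bisect_left


def range_query(a, lo, hi):
    """All keys in [lo, hi]: one descent finds the start, then a sequential
    scan of the sorted array (contiguous in a real array implementation)."""
    out = []
    i = bisect_left(a, lo)                # stand-in for the CSS-tree descent
    while i < len(a) and a[i] <= hi:
        out.append(a[i])
        i += 1
    return out


def index_nested_loops_join(outer, inner_sorted):
    """For each outer key, probe the sorted inner array (the index) for matches."""
    result = []
    for x in outer:
        i = bisect_left(inner_sorted, x)
        while i < len(inner_sorted) and inner_sorted[i] == x:
            result.append((x, inner_sorted[i]))
            i += 1
    return result


def sort_merge_join(outer, inner_sorted):
    """Sort the outer input, then merge it with the already sorted inner array."""
    result, i = [], 0
    for x in sorted(outer):
        while i < len(inner_sorted) and inner_sorted[i] < x:
            i += 1                        # advance past smaller inner keys
        j = i                             # scan matches without losing i (duplicates)
        while j < len(inner_sorted) and inner_sorted[j] == x:
            result.append((x, inner_sorted[j]))
            j += 1
    return result
```

Both joins produce the same pairs; the index nested-loops join performs one index descent per outer key, while the sort-merge join sorts the outer input once and then scans both inputs sequentially.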
30. Doubts About CSS-Trees
- Flexibility of CSS-trees across different cache designs
- Applicability to variable-sized records
- Multiple CSS-tree indices on different keys
31. Conclusion
- CSS-trees improve search performance by exploiting knowledge of cache behavior
32. One Last Thought
- Cache designs
- Should we redesign them to give programmers more control?