The Buffer Tree: A Technique for Designing Batched External Data Structures - PowerPoint PPT Presentation

Provided by: Tian74

1
The Buffer Tree: A Technique for Designing
Batched External Data Structures
  • Presenter: FANG, Tian

2
Agenda
  • Quick review of B-trees
  • Motivation and Ideas
  • Overview of the Buffer Tree
  • Details of the Buffer Tree
  • Applications
  • Conclusion

3
Symbols
  • N: number of elements in the problem instance
  • M: number of elements that fit into main memory
  • B: number of elements per disk block
  • n = N/B (problem size in blocks)
  • m = M/B (memory size in blocks)

4
Quick review of B-trees
  • Construction in O(Sort(N)) I/Os when all elements
    are known beforehand
  • O(N/B) space
  • O(log_B N + T/B) I/Os per range query, where T is
    the number of reported elements
  • O(log_B N) I/Os per update

5
Motivation
  • There is a gap between sorting by repeated B-tree
    insertion and the optimal bound:
  • O(N log_B N) vs. O(Sort(N)) = O(n log_m n) I/Os
  • Take advantage of large main memory
  • Target batched scenarios
  • Results need not be reported immediately.
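To make the gap concrete, here is a small back-of-the-envelope sketch. It drops all constant factors, and the sizes (2^32 elements, 2^26 elements of memory, 2^10 elements per block) are hypothetical values chosen only for illustration.

```python
import math

def btree_sort_ios(N, B):
    """Insert N elements one at a time into a B-tree: O(N log_B N) I/Os."""
    return N * math.log(N, B)

def optimal_sort_ios(N, M, B):
    """Sort(N) = O(n log_m n) I/Os with n = N/B and m = M/B."""
    n, m = N / B, M / B
    return n * math.log(n, m)

# Hypothetical sizes: 2^32 elements, memory holds 2^26, a block holds 2^10.
N, M, B = 2**32, 2**26, 2**10
ratio = btree_sort_ios(N, B) / optimal_sort_ios(N, M, B)
print(f"B-tree sorting does ~{ratio:.0f}x more I/Os")  # prints: ~2383x
```

In this constants-dropped model most of the gap comes from the factor B: the B-tree pays per element, while the optimal bound pays per block.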

6
Ideas
  • Laziness
  • The operations are batched, so we don't need to
    report results immediately.
  • Main memory is large, so we can postpone disk
    writes as long as possible.
  • This gives us a chance to carry out some operations
    in memory before writing to disk.

7
Overview of Buffer Tree
  • Based on an (a,b)-tree with a = m/4 and b = m
  • A buffer of O(m) blocks is attached to each node
  • Height O(log_m n); space O(n) blocks
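Every non-root node has at least a = m/4 children, so a tree over n leaf blocks has height at most about log_{m/4} n. A short derivation (assuming m ≥ 16) shows this is still O(log_m n):

```latex
\log_{m/4} n = \frac{\log n}{\log m - \log 4}
\le \frac{\log n}{\tfrac{1}{2}\log m} = 2\log_m n
\qquad \text{whenever } \log m \ge 2\log 4,\ \text{i.e. } m \ge 16.
```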

8
Overview of Buffer Tree
  • How does it work?
  • Commands are carried out lazily
  • What is inserted into the buffers?
  • How are the buffers emptied?

9
What is inserted into the buffer tree?
  • Commands rather than plain values
  • (Command, Value, Time)
  • E.g. (insert, 10, 1022), (query, 2, 1323),
    (rangesearch, (10, 29), 0923)
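A minimal sketch of how such elements might be represented, restricted to insert and delete commands. The names `Element` and `make_element` are hypothetical. The fields are reordered relative to the slide's (Command, Value, Time) so that ordinary tuple comparison sorts by value first and then by time, which keeps each key's commands in issue order.

```python
import itertools
from typing import NamedTuple

class Element(NamedTuple):
    key: int       # the value the command refers to
    time: int      # timestamp: the order in which commands were issued
    command: str   # "insert" or "delete" in this sketch

_clock = itertools.count()  # hypothetical global timestamp source

def make_element(command, key):
    """Wrap an operation as a buffer-tree element with a fresh timestamp."""
    return Element(key, next(_clock), command)

buffer = [make_element("insert", 10), make_element("delete", 4),
          make_element("delete", 10)]
print([(e.key, e.command) for e in sorted(buffer)])
# [(4, 'delete'), (10, 'insert'), (10, 'delete')]
```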

10
How to empty a buffer
  • Internal nodes
  • Leaf nodes (may trigger rebalancing)

11
Empty the Buffer of Internal Node v
  • Sort the elements in the buffer
  • Merge the sorted elements in the buffer with the
    additional sorted elements from the parent.
  • Remove matched insert/delete pairs.
  • Distribute the remaining elements to the children.
  • Empty the buffers of the children that become full
    (more than m blocks).
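The steps above can be sketched in memory as follows, assuming elements are `(key, time, op)` tuples and that the node routes keys with a sorted list of `splitters` (child i receives keys in [splitters[i-1], splitters[i])). All function names are hypothetical, and a real implementation streams blocks to and from disk instead of building Python lists.

```python
import bisect
import heapq
from itertools import groupby

# Buffered elements are (key, time, op) tuples, op in {"insert", "delete"};
# tuple order sorts by key, then by time.

def cancel_pairs(sorted_elems):
    """Drop each delete together with the latest surviving insert of the same key."""
    out = []
    for _key, group in groupby(sorted_elems, key=lambda e: e[0]):
        kept = []
        for e in group:                      # same key, in time order
            if e[2] == "delete" and kept and kept[-1][2] == "insert":
                kept.pop()                   # matched pair: both disappear
            else:
                kept.append(e)
        out.extend(kept)
    return out

def empty_internal_buffer(buffer, parent_run, splitters):
    """Return one sorted list of elements per child (len(splitters) + 1 children)."""
    # 1-2. Sort this node's own buffer, then merge it with the run that the
    #      parent already passed down in sorted order (parent_run must be sorted).
    merged = heapq.merge(sorted(buffer), parent_run)
    # 3. Remove matched insert/delete pairs.
    remaining = cancel_pairs(merged)
    # 4. Route each element to the child whose key interval contains it.
    children = [[] for _ in range(len(splitters) + 1)]
    for e in remaining:
        children[bisect.bisect_right(splitters, e[0])].append(e)
    return children

buffer = [(7, 3, "insert"), (2, 1, "insert"), (7, 5, "delete")]
parent_run = [(4, 2, "insert"), (9, 4, "insert")]
print(empty_internal_buffer(buffer, parent_run, splitters=[5]))
# [[(2, 1, 'insert'), (4, 2, 'insert')], [(9, 4, 'insert')]]
```

Step 5 is left to the caller: after appending each returned list to the matching child's buffer, it would recursively empty any child whose buffer now exceeds m blocks.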

12
Empty the Buffer of Leaf Node v
  • Sort the buffer and remove matched insert/delete
    pairs
  • Merge the result with the elements stored in the
    leaves.
  • Place the resulting leaves as the leaves of v in
    sorted order
  • If the number of leaves < k (the number of leaves
    of v before emptying), add dummy blocks until the
    number of leaves = k
  • Insert the additional leaves one at a time,
    rebalancing after each.
  • Delete the dummy blocks one at a time.
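A rough in-memory sketch of the leaf case, assuming set semantics (distinct keys) and a hypothetical helper name. It computes the re-packed leaf blocks and how many extra blocks or dummy blocks the rebalancing phase would have to handle; the rebalancing itself is not shown. Taking the last operation per key in time order has the same effect as cancelling matched insert/delete pairs when a key is not re-inserted while already present.

```python
def empty_leaf_buffer(buffer, leaf_keys, B, k):
    """Apply a leaf node's buffered operations to its sorted leaf keys.

    buffer:    list of (key, time, op) with op in {"insert", "delete"}
    leaf_keys: sorted keys currently stored in the node's k leaf blocks
    Returns (blocks, extra, dummies): the re-packed leaf blocks, how many blocks
    beyond k must be inserted with rebalancing, and how many dummy blocks must
    be added (and later deleted one at a time).
    """
    # Sorted by (key, time), so the last assignment per key is its latest operation.
    last_op = {}
    for key, _time, op in sorted(buffer):
        last_op[key] = op
    # One merge pass over the sorted buffered keys and the sorted leaf keys.
    merged, i = [], 0
    for key in sorted(last_op):
        while i < len(leaf_keys) and leaf_keys[i] < key:
            merged.append(leaf_keys[i])
            i += 1
        if i < len(leaf_keys) and leaf_keys[i] == key:
            i += 1  # the stored copy is superseded by the buffered operation
        if last_op[key] == "insert":
            merged.append(key)
    merged.extend(leaf_keys[i:])
    blocks = [merged[j:j + B] for j in range(0, len(merged), B)]
    extra = max(0, len(blocks) - k)
    dummies = max(0, k - len(blocks))
    return blocks, extra, dummies

print(empty_leaf_buffer([(5, 1, "insert"), (2, 2, "delete"), (8, 3, "insert")],
                        [1, 2, 3, 4], B=2, k=2))
# ([[1, 3], [4, 5], [8]], 1, 0)
```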

13
Rebalancing algorithm
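  • For insertions: as in an ordinary (a,b)-tree
    (split nodes that get too many children)
  • For deletions
  • Before fusing with or sharing children with a
    sibling, perform one or two buffer-emptying
    processes on the siblings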

14
Analysis of the I/Os
  • The cost consists of two parts
  • The cost of emptying buffers
  • Let's guess O(n log_m n) I/Os?
  • The cost of rebalancing
  • By the properties of (a,b)-trees, the number of
    rebalancing operations is bounded by O(n/m)
  • Each rebalancing operation takes O(m) I/Os
  • Total: O(n) I/Os
  • Overall total: O(n log_m n) I/Os
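Adding the two parts explicitly, with the buffer-emptying term taken as the guessed bound on this slide:

```latex
\underbrace{O(n\log_m n)}_{\text{buffer emptying}}
+ \underbrace{O(n/m)}_{\text{rebalancing operations}} \cdot \underbrace{O(m)}_{\text{I/Os each}}
= O(n\log_m n) + O(n) = O(n\log_m n).
```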

15
Analysis of I/Os (2)
  • Why O(n log_m n)?
  • Intuitively:
  • Each block takes part in O(log_m n) buffer
    emptyings (about one per level)
  • There are n blocks, so O(n log_m n) in total
  • Can a buffer be emptied in linear I/Os?

16
Analysis (3)
  • Empty a buffer of x blocks (x > m) in O(x) I/Os
  • Exploit the fact that elements are distributed
    in sorted order
  • Sort the at most m blocks of unsorted elements
  • Merge them with the additional sorted elements

[Figure: buffer contents, labeled "Sorted" and "At most m unsorted"]
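To see why the cost is linear, here is a rough block-transfer count under a simplified I/O model (an assumption: each block read or written costs one I/O, and each child may receive one partially filled block). The function name and parameters are hypothetical.

```python
import math

def emptying_io_cost(unsorted, sorted_part, M, B, fanout):
    """Block transfers for one buffer emptying in a simplified I/O model.

    unsorted:    elements not known to be in order (at most M, so they fit in memory)
    sorted_part: elements that arrived from the parent already sorted
    fanout:      number of children receiving output (at most m = M/B)
    """
    assert unsorted <= M, "the unsorted part must fit in main memory"
    reads = math.ceil(unsorted / B)          # load once, then sort in memory
    reads += math.ceil(sorted_part / B)      # one sequential scan while merging
    # Output is written once, plus at most one partial block per child.
    writes = math.ceil((unsorted + sorted_part) / B) + fanout
    return reads + writes

# M = 1024, B = 16 (so m = 64); the buffer holds x = 4096 elements = 256 blocks.
print(emptying_io_cost(1024, 3072, M=1024, B=16, fanout=64))  # 576
```

Because x > m blocks and the fan-out is at most m, the extra partial blocks are dominated by x, so the total stays within a constant multiple of x (here 576 ≤ 3 · 256).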
17
Summary of Buffer Tree
  • Linear space
  • O(n log_m n) I/Os for an arbitrary sequence of N
    intermixed insert and delete operations on an
    initially empty buffer tree.
  • Amortized cost per insertion or deletion:
    O((log_m n)/B) I/Os
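Dividing the total O(n log_m n) by the N operations gives O((log_m n)/B) per operation, since n = N/B. A quick numeric comparison with the B-tree's O(log_B N) per update, with constants dropped and the same hypothetical sizes as before:

```python
import math

def amortized_io_per_update(N, M, B):
    """Buffer tree: O((log_m n)/B) I/Os per operation, constants dropped."""
    n, m = N / B, M / B
    return math.log(n, m) / B

def btree_io_per_update(N, B):
    """B-tree: O(log_B N) I/Os per update, constants dropped."""
    return math.log(N, B)

# Hypothetical sizes: 2^32 elements, 2^26 fit in memory, 2^10 per block.
N, M, B = 2**32, 2**26, 2**10
print(amortized_io_per_update(N, M, B))  # about 0.0013 I/Os per operation
print(btree_io_per_update(N, B))         # about 3.2 I/Os per operation
```

The buffer tree's amortized cost is far below one I/O per operation, which is only possible because results are reported lazily.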

18
Applications
  • Sorting
  • External Priority Queue
  • Buffered Range Search (further reading)
  • Buffered Segment Tree (further reading)

19
Application: Sorting
  • By the nature of the (a,b)-tree, the elements are
    stored in sorted order in the leaves.
  • In a buffer tree, some elements may still be in
    buffers.
  • After the N updates, empty all buffers in O(n) I/Os,
    for O(n log_m n) = O(Sort(N)) I/Os in total.
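A toy in-memory sketch of sorting with buffers. It assumes a fixed, hypothetical tree shape (hand-chosen `splitters`, no rebalancing), insert-only keys, and a buffer capacity `cap` in elements standing in for m blocks. Inserts go to the root buffer; full buffers are emptied downward; at the end every buffer is emptied and the leaves are read left to right.

```python
import bisect

class Node:
    """Hypothetical node: internal if it has children, leaf otherwise."""
    def __init__(self, splitters=None, children=None):
        self.splitters = splitters or []   # child i gets keys in [s[i-1], s[i])
        self.children = children or []
        self.buffer = []                   # pending inserts
        self.items = []                    # sorted keys stored at a leaf

def empty_buffer(node, cap, force=False):
    if node.children:
        node.buffer.sort()
        for key in node.buffer:
            node.children[bisect.bisect_right(node.splitters, key)].buffer.append(key)
        node.buffer = []
        for child in node.children:
            if force or len(child.buffer) >= cap:   # recurse only into full buffers
                empty_buffer(child, cap, force)
    else:
        node.items = sorted(node.items + node.buffer)  # merge into leaf storage
        node.buffer = []

def buffer_tree_sort(keys, root, cap):
    for k in keys:
        root.buffer.append(k)
        if len(root.buffer) >= cap:
            empty_buffer(root, cap)
    empty_buffer(root, cap, force=True)   # final emptying of every buffer
    out = []
    def collect(node):
        if node.children:
            for c in node.children:
                collect(c)
        else:
            out.extend(node.items)
    collect(root)
    return out

root = Node([10], [Node([5], [Node(), Node()]), Node([15], [Node(), Node()])])
print(buffer_tree_sort([12, 3, 17, 8, 1, 14, 9, 20, 5, 10], root, cap=3))
# [1, 3, 5, 8, 9, 10, 12, 14, 17, 20]
```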

20
Application: External Priority Queue
  • The smallest element can be found once the
    elements are sorted.
  • Same problem as with sorting: some elements are
    in buffers rather than in the leaves.
  • Only the buffers on the path from the root to the
    leftmost leaf node need to be emptied.

21
Application: External Priority Queue
  • Delete the (m/4)B smallest elements in the tree
    and keep them in main memory
  • We can then answer the next (m/4)B DeleteMin
    operations without performing any I/Os
  • Emptying the buffers takes O(m log_m n) I/Os, so the
    amortized cost per DeleteMin is O((log_m n)/B)
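A minimal sketch of the batching idea, with a plain Python list standing in for the buffer tree (so the expensive step is only simulated, and counted). The class and attribute names are hypothetical; `k` plays the role of (m/4)B. Handling inserts that are smaller than the in-memory batch is an assumption of this sketch, needed so that later DeleteMin answers stay correct.

```python
import bisect

class BatchedPriorityQueue:
    """`external` stands in for the buffer tree; `batch` holds up to k smallest
    elements in memory. Invariant: every element of batch <= every element of external.
    """

    def __init__(self, k):
        self.k = k
        self.external = []
        self.batch = []       # sorted ascending
        self.refills = 0      # counts the expensive "empty the leftmost path" steps

    def insert(self, x):
        if self.batch and x < self.batch[-1]:
            bisect.insort(self.batch, x)          # x is among the smallest elements
            if len(self.batch) > self.k:
                self.external.append(self.batch.pop())  # evict the largest
        else:
            self.external.append(x)               # would go into the root buffer

    def delete_min(self):
        if not self.batch:
            if not self.external:
                raise IndexError("delete_min from empty queue")
            # Stand-in for emptying the buffers on the root-to-leftmost-leaf
            # path and deleting the k smallest elements from the tree.
            self.external.sort()
            self.batch = self.external[:self.k]
            self.external = self.external[self.k:]
            self.refills += 1
        return self.batch.pop(0)                  # answered from memory, no I/O

pq = BatchedPriorityQueue(k=3)
for x in [5, 1, 9, 3, 7]:
    pq.insert(x)
out = [pq.delete_min()]
pq.insert(0)
pq.insert(8)
out += [pq.delete_min() for _ in range(6)]
print(out, pq.refills)  # [1, 0, 3, 5, 7, 8, 9] 2
```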

22
Application: Buffered Range Tree
  • Use the buffer strategy
  • Insert a range-search element into the buffers
  • Results are reported in batches, when the buffer
    is emptied.
  • Split a range-search element when its range spans
    several children.
  • Details: further reading
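A sketch of the splitting step only, assuming half-open ranges [lo, hi) and a node whose child i covers [splitters[i-1], splitters[i]). The function name is hypothetical, and reporting the contents of fully covered subtrees is not shown.

```python
import bisect

def split_range(lo, hi, splitters):
    """Split a range-search element [lo, hi) among a node's children.

    Returns (child_index, sub_lo, sub_hi) pieces, one per child the range touches.
    """
    if lo >= hi:
        return []
    first = bisect.bisect_right(splitters, lo)   # child containing lo
    last = bisect.bisect_left(splitters, hi)     # child containing keys just below hi
    pieces = []
    for i in range(first, last + 1):
        sub_lo = lo if i == first else splitters[i - 1]
        sub_hi = hi if i == last else splitters[i]
        pieces.append((i, sub_lo, sub_hi))
    return pieces

# The slide's example range (10, 29) at a node with a single splitter 20:
print(split_range(10, 29, [20]))  # [(0, 10, 20), (1, 20, 29)]
```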

23
Conclusion
  • A technique for designing efficient batched
    external data structures.
  • Search tree
  • Sorting
  • External Priority Queue
  • Buffered Range Search
  • Buffered Segment Tree