Title: Temporal Databases
1 Temporal Databases
S. Srinivasa Rao, April 12, 2007
Part 1 based on Ch. 23 of C.J. Date (slides by Prof. Ghafoor, EE 562)
Part 2 based on slides by Prof. Arge, I/O-algorithms
2 Outline
- Part 1: Introduction to temporal databases
- Part 2: Temporal index: the persistent B-tree and its applications
3 Introduction
- Temporal database: a database that contains historical data as well as current data.
  - Note: "historical" is a misleading term; temporal databases may contain data regarding the future as well as the past.
- Extreme case: data is only inserted, never deleted, from a temporal database (e.g., vehicle position data in the project).
- So far, we have studied the other extreme, i.e., snapshot databases.
- Distinguishing feature: the element of time.
4 Introduction
- Temporal data: an encoded representation of timestamped facts.
  - Each tuple must include at least one timestamp.
- Problem: what about queries that produce results that are not temporal, i.e., whose result lies outside the domain of the (temporal) database?
  - e.g., Get the names of all people who have supplied something in the past.
- Redefine temporal database: a database that includes, but is not limited to, temporal data.
5 Motivation
- Queries on time-varying data are difficult to express in SQL.
- Temporal databases provide built-in support for recording and querying such information.
- It is possible to use SQL to evaluate these queries, but performance is poor.
6 Motivation
- Most applications manage temporal data.
- If a temporal database is used for such data:
  - Schemas, including integrity constraints, are simpler.
  - Queries are simpler.
  - Application code is less complex:
    - easier to understand
    - easier to produce
    - easier to maintain
7 Applications
- Most applications of database technology are temporal in nature:
  - Financial apps: portfolio management, accounting, banking, stock market analysis, audit analysis
  - Record-keeping apps: personnel, medical records, inventory management, legal records (commercial laws change frequently)
  - Data warehousing: historical trends for analysis
  - Scheduling apps: airline, car and hotel reservations, and project management
  - Scientific apps: weather monitoring, chemical process monitoring
8 Intervals
- An interval [s,e] is a set of times from time s to time e.
  - Does the interval [s,e] represent an infinite set?
- Assumption: the timeline is a finite sequence of discrete, indivisible time quanta.
  - Time quantum: the smallest unit of time the system can represent.
  - Timepoint/point: a time unit considered indivisible for our purposes.
- An interval is treated as a single type, not as a pair of separate values.
- An interval can be open or closed with respect to its start point and its end point.
  - e.g., [d04,d10], [d04,d11), (d03,d10] and (d03,d11) all represent the sequence of days from day 4 to day 10 inclusive.
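Because the timeline is discrete, any open or closed interval can be normalized to a closed one by stepping an open endpoint one time point inward. A minimal Python sketch, assuming days dNN are encoded as the integer NN; `Interval`, `make_interval` and `points` are hypothetical names chosen for illustration:

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Closed interval [start, end] over discrete time points (here: integer day numbers)."""
    start: int
    end: int


def make_interval(s: int, e: int, start_closed: bool = True, end_closed: bool = True) -> Interval:
    """Normalize an interval with open or closed endpoints to closed form.

    On a discrete timeline an open start (s, ... begins at the next point s + 1,
    and an open end ..., e) stops at the previous point e - 1.
    """
    start = s if start_closed else s + 1
    end = e if end_closed else e - 1
    if start > end:
        raise ValueError("interval contains no time points")
    return Interval(start, end)


def points(i: Interval) -> List[int]:
    """Every time point in the interval, in order."""
    return list(range(i.start, i.end + 1))


# The four notations from the slide, with dNN encoded as the integer NN.
forms = [
    make_interval(4, 10),                                        # [d04,d10]
    make_interval(4, 11, end_closed=False),                      # [d04,d11)
    make_interval(3, 10, start_closed=False),                    # (d03,d10]
    make_interval(3, 11, start_closed=False, end_closed=False),  # (d03,d11)
]
print(points(forms[0]))  # [4, 5, 6, 7, 8, 9, 10]
```

Storing only the closed form means every later operator needs to handle a single representation.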
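9 Operators on Intervals
- Temporal predicate operators, for closed intervals i1 = [s1,e1] and i2 = [s2,e2] of discrete time points:
  - i1 BEFORE i2: ($e_1 < s_2$)
  - i1 MEETS i2: ($s_2 = e_1 + 1$), i.e., i2 begins at the time point immediately after i1 ends
  - i1 EQUALS i2: ($s_1 = s_2$ AND $e_1 = e_2$)
  - i1 OVERLAPS i2: ($s_2 \le s_1 \le e_2$ OR $s_1 \le s_2 \le e_1$)
[Figure: timelines of i1 and i2 illustrating each predicate]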
10 Operators on Intervals
- i1 DURING i2: ($s_2 < s_1$ AND $e_2 > e_1$)
- i1 STARTS i2: ($s_1 = s_2$ AND $e_1 < e_2$)
- i1 FINISHES i2: ($e_1 = e_2$ AND $s_1 > s_2$)
- Additional operators:
  - i1 MERGES i2: (i1 MEETS i2 OR i1 OVERLAPS i2)
  - i1 CONTAINS i2: (i2 DURING i1)
[Figure: timelines of i1 and i2 for DURING, STARTS and FINISHES]
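The predicates on these two slides translate almost literally into code. A minimal Python sketch, assuming closed intervals over integer time points; MEETS is taken to mean that i2 starts at the point right after i1 ends ($s_2 = e_1 + 1$), which is what the COLLAPSE example with [d03,d04] and [d05,d05] requires. As written on the slide, MEETS (and therefore MERGES) is directional. The function names are illustrative only:

```python
from typing import NamedTuple


class Interval(NamedTuple):
    s: int  # start point (inclusive)
    e: int  # end point (inclusive)


def before(i1: Interval, i2: Interval) -> bool:
    return i1.e < i2.s


def meets(i1: Interval, i2: Interval) -> bool:
    # i2 begins at the time point immediately after i1 ends
    return i2.s == i1.e + 1


def equals(i1: Interval, i2: Interval) -> bool:
    return i1.s == i2.s and i1.e == i2.e


def overlaps(i1: Interval, i2: Interval) -> bool:
    return i2.s <= i1.s <= i2.e or i1.s <= i2.s <= i1.e


def during(i1: Interval, i2: Interval) -> bool:
    return i2.s < i1.s and i2.e > i1.e


def starts(i1: Interval, i2: Interval) -> bool:
    return i1.s == i2.s and i1.e < i2.e


def finishes(i1: Interval, i2: Interval) -> bool:
    return i1.e == i2.e and i1.s > i2.s


def merges(i1: Interval, i2: Interval) -> bool:
    return meets(i1, i2) or overlaps(i1, i2)


def contains(i1: Interval, i2: Interval) -> bool:
    return during(i2, i1)


d = Interval  # days dNN encoded as the integer NN
print(meets(d(3, 4), d(5, 5)), overlaps(d(5, 5), d(5, 6)))  # True True
print(during(d(4, 5), d(3, 6)), contains(d(3, 6), d(4, 5)))  # True True
```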
11 Scalar and Relational Operators
- DURATION(i): returns the number of time points in i
  - e.g., DURATION([d03,d07]) returns 5
- i1 UNION i2
  - returns $[\min(s_1,s_2), \max(e_1,e_2)]$ if (i1 MERGES i2)
  - otherwise undefined
- i1 INTERSECT i2
  - returns $[\max(s_1,s_2), \min(e_1,e_2)]$ if (i1 OVERLAPS i2)
  - otherwise undefined
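A minimal Python sketch of these three operators, assuming closed intervals over integer time points and using `None` to stand for "undefined". Since MEETS is directional, this sketch lets UNION accept adjacent intervals in either argument order; that choice is an assumption, not something the slide states:

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    s: int  # start point (inclusive)
    e: int  # end point (inclusive)


def duration(i: Interval) -> int:
    """Number of time points in a closed interval."""
    return i.e - i.s + 1


def overlaps(i1: Interval, i2: Interval) -> bool:
    return i2.s <= i1.s <= i2.e or i1.s <= i2.s <= i1.e


def merges(i1: Interval, i2: Interval) -> bool:
    """OVERLAPS, or MEETS in either order (adjacent with no gap)."""
    return overlaps(i1, i2) or i2.s == i1.e + 1 or i1.s == i2.e + 1


def union(i1: Interval, i2: Interval) -> Optional[Interval]:
    """[MIN(s1,s2), MAX(e1,e2)] if the intervals merge; None means undefined."""
    if not merges(i1, i2):
        return None
    return Interval(min(i1.s, i2.s), max(i1.e, i2.e))


def intersect(i1: Interval, i2: Interval) -> Optional[Interval]:
    """[MAX(s1,s2), MIN(e1,e2)] if the intervals overlap; None means undefined."""
    if not overlaps(i1, i2):
        return None
    return Interval(max(i1.s, i2.s), min(i1.e, i2.e))


print(duration(Interval(3, 7)))                   # 5
print(union(Interval(3, 4), Interval(5, 6)))      # Interval(s=3, e=6)
print(intersect(Interval(3, 5), Interval(4, 6)))  # Interval(s=4, e=5)
print(union(Interval(1, 1), Interval(3, 5)))      # None
```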
12 Aggregate Operators
- EXPAND(X)
  - X is a set of intervals; the output is also a set of intervals.
  - Used to generate unit intervals, each covering a single time quantum.
  - The expanded form of X is the set of all intervals of the form [p,p] where p is a time point in some interval in X.
  - e.g.
    - X1 = {[d01,d01], [d03,d05], [d04,d06]}
    - X2 = {[d01,d01], [d03,d04], [d05,d05], [d05,d06]}
    - X3 = {[d01,d01], [d03,d03], [d04,d04], [d05,d05], [d06,d06]}
    - Then EXPAND(X1) = EXPAND(X2) = X3
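EXPAND is a one-line set comprehension once intervals are closed integer ranges. A minimal Python sketch, assuming intervals are `(start, end)` tuples and day dNN is encoded as NN; it reproduces the example above:

```python
from typing import Set, Tuple

Interval = Tuple[int, int]  # closed [start, end]; day dNN encoded as NN


def expand(xs: Set[Interval]) -> Set[Interval]:
    """EXPAND: one unit interval [p, p] for every time point p covered by some interval in xs."""
    return {(p, p) for (s, e) in xs for p in range(s, e + 1)}


X1 = {(1, 1), (3, 5), (4, 6)}
X2 = {(1, 1), (3, 4), (5, 5), (5, 6)}
X3 = {(1, 1), (3, 3), (4, 4), (5, 5), (6, 6)}

print(sorted(expand(X1)))  # [(1, 1), (3, 3), (4, 4), (5, 5), (6, 6)]
print(expand(X1) == expand(X2) == X3)  # True
```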
13 Aggregate Operators
- COLLAPSE(X)
  - The collapsed form of X is the set Y of intervals of the same type such that:
    - (a) X and Y have the same expanded (unfolded) form, and
    - (b) no two distinct members i1 and i2 of Y are such that (i1 MERGES i2) is true.
  - e.g.
    - X1 = {[d01,d01], [d03,d05], [d04,d06]}
    - X2 = {[d01,d01], [d03,d04], [d05,d05], [d05,d06]}
    - X3 = {[d01,d01], [d03,d06]}
    - Then COLLAPSE(X1) = COLLAPSE(X2) = X3
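COLLAPSE can be computed with a single sweep after sorting by start point: an interval is absorbed into the previous result whenever it overlaps it or meets it, and otherwise starts a new result. A minimal Python sketch under the same `(start, end)` encoding; it satisfies (a) because no time point is added or lost, and (b) because consecutive results are separated by at least one uncovered point:

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # closed [start, end]; day dNN encoded as NN


def collapse(xs: Set[Interval]) -> Set[Interval]:
    """COLLAPSE: merge intervals that overlap or meet until no two results MERGE."""
    result: List[Interval] = []
    for s, e in sorted(xs):  # sorted by start point
        if result and s <= result[-1][1] + 1:
            # s falls inside the last interval (OVERLAPS) or right after it (MEETS)
            last_s, last_e = result[-1]
            result[-1] = (last_s, max(last_e, e))
        else:
            result.append((s, e))  # gap of at least one point: start a new interval
    return set(result)


X1 = {(1, 1), (3, 5), (4, 6)}
X2 = {(1, 1), (3, 4), (5, 5), (5, 6)}
X3 = {(1, 1), (3, 6)}
print(collapse(X1) == collapse(X2) == X3)  # True
```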
14 Relational Operators Involving Intervals
- PACK r ON A groups the relation r by all of its attributes apart from A and collapses each group's set of A values. This is equivalent to:
  WITH ( r GROUP { A } AS X ) AS R1,
       ( EXTEND R1 ADD COLLAPSE ( X ) AS Y ) { ALL BUT X } AS R2 :
  R2 UNGROUP Y
- UNPACK r ON A
  - Replace COLLAPSE with EXPAND in the definition of PACK.
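PACK and UNPACK can be mimicked on an in-memory relation stored as a set of tuples. In this Python sketch the interval attribute A is assumed, for simplicity, to be the last component of each tuple: grouping on the remaining components plays the role of GROUP, applying COLLAPSE or EXPAND per group plays the role of EXTEND, and flattening the result plays the role of UNGROUP. The helper names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Interval = Tuple[int, int]  # closed [start, end]


def expand(xs):
    return {(p, p) for (s, e) in xs for p in range(s, e + 1)}


def collapse(xs):
    result = []
    for s, e in sorted(xs):
        if result and s <= result[-1][1] + 1:  # overlaps or meets the previous interval
            result[-1] = (result[-1][0], max(result[-1][1], e))
        else:
            result.append((s, e))
    return set(result)


def regroup(r: Set[tuple], per_group) -> Set[tuple]:
    """Group r on all components except the last (the interval attribute A),
    apply per_group to each group's set of intervals, then ungroup."""
    groups: Dict[tuple, Set[Interval]] = defaultdict(set)
    for *rest, interval in r:
        groups[tuple(rest)].add(interval)
    return {(*key, iv) for key, ivs in groups.items() for iv in per_group(ivs)}


def pack(r: Set[tuple]) -> Set[tuple]:
    return regroup(r, collapse)


def unpack(r: Set[tuple]) -> Set[tuple]:
    return regroup(r, expand)


r = {("S1", (1, 3)), ("S1", (4, 5)), ("S1", (8, 9)), ("S2", (2, 2))}
print(sorted(pack(r)))                   # [('S1', (1, 5)), ('S1', (8, 9)), ('S2', (2, 2))]
print(sorted(unpack({("S2", (2, 3))})))  # [('S2', (2, 2)), ('S2', (3, 3))]
```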
15 Example
- Given two temporal relations:
  - S: supplier S was under contract during the interval DURING
  - SP: supplier S was able to supply part P during the interval DURING
[Tables: sample values of relations SP and S]
16 Example 1
- Active supplier intervals: get S-DURING pairs for suppliers who have been able to supply at least one part during at least one interval of time, where DURING designates such an interval.
  - PACK SP { S, DURING } ON DURING
[Tables: SP and RESULT]
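The query can be tried on a small in-memory relation. The SP tuples below are hypothetical sample data (not the values from the slide's table); the sketch projects SP onto { S, DURING } and then packs on DURING, so overlapping or adjacent supply intervals of the same supplier are combined:

```python
from collections import defaultdict


def collapse(xs):
    """Merge overlapping or adjacent (meeting) closed intervals."""
    result = []
    for s, e in sorted(xs):
        if result and s <= result[-1][1] + 1:
            result[-1] = (result[-1][0], max(result[-1][1], e))
        else:
            result.append((s, e))
    return set(result)


def pack(r):
    """PACK on the last component (the interval) of each tuple."""
    groups = defaultdict(set)
    for *rest, interval in r:
        groups[tuple(rest)].add(interval)
    return {(*key, iv) for key, ivs in groups.items() for iv in collapse(ivs)}


# Hypothetical SP tuples (S, P, DURING); days dNN encoded as NN.
SP = {
    ("S1", "P1", (2, 4)),
    ("S1", "P2", (3, 5)),
    ("S1", "P3", (9, 10)),
    ("S2", "P1", (1, 2)),
}

sp_s_during = {(s, during) for (s, _p, during) in SP}  # SP { S, DURING }
result = pack(sp_s_during)
print(sorted(result))  # [('S1', (2, 5)), ('S1', (9, 10)), ('S2', (1, 2))]
```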
17 Example 2
- Inactive (passive) supplier intervals: get S-DURING pairs for suppliers who have been unable to supply any parts at all during at least one interval of time, where DURING designates such an interval.
  PACK
    ( ( UNPACK S { S, DURING } ON DURING )
      MINUS
      ( UNPACK SP { S, DURING } ON DURING ) )
  ON DURING
- Shorthand: U_MINUS
[Table: RESULT]
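The same expression can be evaluated step by step on hypothetical sample data: unpack both operands to unit intervals, take the set difference, and pack the remainder. The S and SP values below are invented for illustration and are not the slide's tables:

```python
from collections import defaultdict


def expand(xs):
    """Unit intervals [p, p] for every covered time point p."""
    return {(p, p) for (s, e) in xs for p in range(s, e + 1)}


def collapse(xs):
    """Merge overlapping or adjacent (meeting) intervals."""
    result = []
    for s, e in sorted(xs):
        if result and s <= result[-1][1] + 1:
            result[-1] = (result[-1][0], max(result[-1][1], e))
        else:
            result.append((s, e))
    return set(result)


def regroup(r, per_group):
    """Group on all components but the last (the interval), transform, ungroup."""
    groups = defaultdict(set)
    for *rest, interval in r:
        groups[tuple(rest)].add(interval)
    return {(*key, iv) for key, ivs in groups.items() for iv in per_group(ivs)}


def pack(r):
    return regroup(r, collapse)


def unpack(r):
    return regroup(r, expand)


# Hypothetical sample data; days dNN encoded as NN.
S = {("S1", (1, 10)), ("S2", (1, 4))}                 # (S, DURING)
SP = {("S1", "P1", (2, 4)), ("S1", "P2", (3, 5)),
      ("S1", "P3", (9, 10)), ("S2", "P1", (1, 2))}    # (S, P, DURING)

sp_s_during = {(s, during) for (s, _p, during) in SP}  # SP { S, DURING }
result = pack(unpack(S) - unpack(sp_s_during))         # set difference plays MINUS
print(sorted(result))  # [('S1', (1, 1)), ('S1', (6, 8)), ('S2', (3, 4))]
```

Unpacking first is what makes MINUS correct here: a plain difference of the packed relations would remove nothing, because no tuple of S equals a tuple of SP { S, DURING }.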
18 More Relational Operators
- USING ( AList ) ◄ r1 op r2 ► is shorthand for
  PACK ( ( UNPACK r1 ON ( AList ) ) op ( UNPACK r2 ON ( AList ) ) ) ON ( AList )
  - where op is UNION, INTERSECT, MINUS or JOIN
- Comparison operators on relations are defined similarly:
  - USING ( AList ) ◄ r1 rel-op r2 ► is equivalent to
    ( UNPACK r1 ON ( AList ) ) rel-op ( UNPACK r2 ON ( AList ) )
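The USING shorthand is a higher-order wrapper: unpack both operands, apply the ordinary operator, and (for relational results) pack again. A minimal Python sketch for a single interval attribute stored as the last tuple component; Python's set operators stand in for UNION (`operator.or_`), INTERSECT (`operator.and_`) and MINUS (`operator.sub`), and JOIN is omitted. The names `using` and `using_compare` are hypothetical:

```python
import operator
from collections import defaultdict


def expand(xs):
    return {(p, p) for (s, e) in xs for p in range(s, e + 1)}


def collapse(xs):
    result = []
    for s, e in sorted(xs):
        if result and s <= result[-1][1] + 1:  # overlaps or meets the previous interval
            result[-1] = (result[-1][0], max(result[-1][1], e))
        else:
            result.append((s, e))
    return set(result)


def regroup(r, per_group):
    groups = defaultdict(set)
    for *rest, interval in r:
        groups[tuple(rest)].add(interval)
    return {(*key, iv) for key, ivs in groups.items() for iv in per_group(ivs)}


def pack(r):
    return regroup(r, collapse)


def unpack(r):
    return regroup(r, expand)


def using(op, r1, r2):
    """USING (A) ◄ r1 op r2 ► for a set-valued op such as union, intersection, difference."""
    return pack(op(unpack(r1), unpack(r2)))


def using_compare(rel_op, r1, r2):
    """USING (A) ◄ r1 rel-op r2 ► for a comparison such as ==, <= (subset)."""
    return rel_op(unpack(r1), unpack(r2))


r1 = {("S1", (1, 3)), ("S1", (4, 6))}
r2 = {("S1", (1, 6))}
print(r1 == r2)                            # False: different tuples
print(using_compare(operator.eq, r1, r2))  # True: same time points per supplier
print(sorted(using(operator.sub, r2, {("S1", (2, 2))})))  # [('S1', (1, 1)), ('S1', (3, 6))]
```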
19 Part 2: Persistent B-trees and applications
20 Persistent B-tree
- In some applications we are interested in being able to access previous versions of a data structure:
  - Databases
  - Geometric data structures
- Partial persistence:
  - Update the current version (getting a new version)
  - Query all versions
- We would like to have a partially persistent B-tree with:
  - $O(N/B)$ space, where N is the number of updates performed
  - $O(\log_B N)$ I/O update
  - $O(\log_B N + T/B)$ I/O query in any version, where T is the number of reported elements
21 Persistent B-tree
- Easy way to make a B-tree partially persistent:
  - Copy the structure at each operation
  - Maintain a version-access structure (B-tree)
- Good: $O(\log_B N + T/B)$ I/O query in any version, but:
  - $O(N/B)$ I/O update
  - $O(N^2/B)$ space
[Figure: one full copy of the B-tree per version (i1, i2, ..., i)]
22 Persistent B-tree
- Idea: elements are augmented with an existence interval and stored in one structure
- Persistent B-tree with parameter b:
  - A directed graph
    - Nodes contain elements augmented with existence intervals
    - At any time t, the nodes with elements alive at time t form a B-tree with leaf and branching parameter b (i.e., each node/leaf has at least b/4 and at most b children/keys in it)
  - A B-tree with leaf and branching parameter b on the indegree-0 nodes (the roots)
  ⇒ If b = B: query at any time t in $O(\log_B N + T/B)$ I/Os
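The existence-interval idea can be seen without any B-tree machinery: each element is stored once with the half-open range of versions [born, died) in which it is alive, updates only touch the current version, and a query at version t keeps the elements whose range contains t. This Python sketch uses a linear scan in place of the persistent B-tree, so it illustrates only the versioning, not the I/O bounds; `ExistenceIntervalSet` is a hypothetical name:

```python
import math
from typing import List


class ExistenceIntervalSet:
    """Partially persistent set illustrating existence intervals.

    Each element is stored once together with the half-open range of versions
    [born, died) in which it is alive. Updates only touch the current version;
    queries may ask about any version. A linear scan stands in for the B-tree.
    """

    def __init__(self) -> None:
        self.version = 0
        self.records: List[list] = []  # entries [key, born, died]

    def insert(self, key: int) -> None:
        self.version += 1
        self.records.append([key, self.version, math.inf])

    def delete(self, key: int) -> None:
        self.version += 1
        for rec in self.records:
            if rec[0] == key and rec[2] == math.inf:  # the currently alive copy
                rec[2] = self.version  # close its existence interval
                return
        raise KeyError(key)

    def range_query(self, t: int, lo: int, hi: int) -> List[int]:
        """Keys in [lo, hi] that are alive in version t."""
        return sorted(k for k, born, died in self.records
                      if born <= t < died and lo <= k <= hi)


s = ExistenceIntervalSet()
s.insert(5)  # version 1
s.insert(3)  # version 2
s.delete(5)  # version 3
s.insert(7)  # version 4
print(s.range_query(2, 0, 10))  # [3, 5]
print(s.range_query(4, 0, 10))  # [3, 7]
```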
23 Persistent B-tree Updates
- Updates are performed as in a B-tree
- To obtain linear space we maintain the new-node invariant:
  - A new node contains between $\frac{3}{8}B$ and $\frac{7}{8}B$ alive elements and no dead elements
24 Persistent B-tree Insert
- Search for the relevant leaf u and insert the new element
- If u contains B + 1 elements: block overflow
  - Version split: mark u dead and create a new node u' with the x alive elements of u
  - If $x > \frac{7}{8}B$: strong overflow
  - If $x < \frac{3}{8}B$: strong underflow
  - If $\frac{3}{8}B \le x \le \frac{7}{8}B$ then recursively update parent(u):
    - Delete (persistently) the reference to u and insert a reference to u'
25 Persistent B-tree Insert
- Strong overflow ($x > \frac{7}{8}B$)
  - Split u' into v' and v'' with x/2 elements each ($x/2 > \frac{7}{16}B$)
  - Recursively update parent(u):
    - Delete the reference to u and insert references to v' and v''
- Strong underflow ($x < \frac{3}{8}B$)
  - Merge the x elements with the y live elements obtained by a version split on a sibling
  - If $x + y > \frac{7}{8}B$ then (strong overflow) perform a split into two nodes with $(x+y)/2$ elements each ($(x+y)/2 > \frac{7}{16}B$)
  - Recursively update parent(u): delete two references, insert one or two references
26 Persistent B-tree Delete
- Search for the relevant leaf u and mark the element dead
- If u contains fewer than B/4 alive elements: block underflow
  - Version split: mark u dead and create a new node u' with the x alive elements of u
  - Strong underflow ($x < \frac{3}{8}B$):
    - Merge (version split on a sibling) and possibly split (strong overflow)
    - Recursively update parent(u): delete two references, insert one or two references
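The rebalancing decisions after a version split depend only on the alive counts, so they can be checked in isolation. This Python sketch assumes the new-node invariant uses the bounds $\frac{3}{8}B$ and $\frac{7}{8}B$ and returns the alive-element counts of the node(s) that replace the dead node u; the function name is hypothetical and nothing here touches disk blocks:

```python
from typing import List, Optional


def rebalance_after_version_split(x: int, B: int, y: Optional[int] = None) -> List[int]:
    """Alive-element counts of the node(s) that replace a dead node u.

    x: alive elements copied into the new node u' by the version split of u.
    y: live elements taken from a sibling by a second version split
       (only needed on strong underflow).
    The new-node invariant is assumed to be 3/8*B <= count <= 7/8*B.
    """
    low, high = 3 * B / 8, 7 * B / 8
    if x < low:  # strong underflow: merge with the sibling's live elements
        if y is None:
            raise ValueError("strong underflow requires the sibling's live count y")
        total = x + y
        if total > high:  # merged node would strongly overflow: split it in two
            return [total // 2, total - total // 2]
        return [total]
    if x > high:  # strong overflow: split u' into two nodes
        return [x // 2, x - x // 2]
    return [x]  # invariant holds: only parent(u) needs updating


B = 64  # illustrative block capacity: invariant range is [24, 56]
print(rebalance_after_version_split(40, B))      # [40]
print(rebalance_after_version_split(60, B))      # [30, 30]
print(rebalance_after_version_split(20, B, 50))  # [35, 35]
```

Every returned count lands inside the assumed invariant range, which is exactly why each new node must absorb a linear number of updates before it can die.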
27 Persistent B-tree
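28 Persistent B-tree Analysis
- Update: $O(\log_B N)$
  - Search and rebalance on one root-leaf path
- Space: $O(N/B)$
  - At least $\frac{1}{8}B$ updates in a leaf during its existence interval (a new leaf starts with between $\frac{3}{8}B$ and $\frac{7}{8}B$ alive elements, so reaching B + 1 elements or fewer than B/4 alive elements takes at least $\frac{1}{8}B$ updates)
  - When leaf u dies:
    - At most two other nodes are created
    - At most one block overflow/underflow occurs one level up (in parent(u))
  ⇒ During N updates we create:
    - $O(N/B)$ leaves
    - $O(N/B^{i+1})$ nodes i levels above the leaves
  ⇒ $O(N/B)$ blocks in total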
29 Summary/Conclusion: Persistent B-tree
- Persistent B-tree:
  - Update the current version
  - Query all versions
- Efficient implementation obtained using existence intervals
  - Standard technique
⇒ During N operations:
  - $O(N/B)$ space
  - $O(\log_B N)$ update
  - $O(\log_B N + T/B)$ query
30 Interval Management
- Problem:
  - Maintain N intervals with unique endpoints dynamically such that a stabbing query with point x (report all intervals containing x) can be answered efficiently
- As in the (one-dimensional) B-tree case we are interested in:
  - $O(N/B)$ space
  - $O(\log_B N)$ update
  - $O(\log_B N + T/B)$ query
31 Interval Management: Static Solution
- Sweep from left to right, maintaining a persistent B-tree:
  - Insert an interval when its left endpoint is reached
  - Delete an interval when its right endpoint is reached
- Query x is answered by reporting all intervals in the B-tree at time x:
  - $O(N/B)$ space
  - $O(\log_B N + T/B)$ query
  - $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ construction using the buffer technique
- Dynamic with $O(\log_B^2 N)$ insert bound using the logarithmic method
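The sweep can be illustrated in memory. This Python sketch substitutes the naive copy-every-version persistence (one full snapshot per event, quadratic space) for the persistent B-tree, so it demonstrates only the sweep and the "query the version current at time x" logic, not the I/O bounds. Intervals are assumed closed with integer endpoints; a deletion at coordinate r is ordered after a query at x = r so that the interval is still reported there:

```python
from bisect import bisect_right, insort
from typing import List, Tuple

Interval = Tuple[int, int]


def build(intervals: List[Interval]):
    """Sweep left to right; record one version of the alive set after every event."""
    events = [(lo, 0, (lo, hi)) for lo, hi in intervals]   # kind 0: insert at left endpoint
    events += [(hi, 1, (lo, hi)) for lo, hi in intervals]  # kind 1: delete at right endpoint
    events.sort()
    keys, versions, alive = [], [], []
    for coord, kind, iv in events:
        if kind == 0:
            insort(alive, iv)
        else:
            alive.remove(iv)
        keys.append((coord, kind))
        versions.append(tuple(alive))  # naive persistence: a full copy per version
    return keys, versions


def stab(index, x: int) -> List[Interval]:
    """Report all intervals containing x: the version current at sweep time x."""
    keys, versions = index
    k = bisect_right(keys, (x, 0))  # events at or before x, excluding deletions at x
    return list(versions[k - 1]) if k else []


intervals = [(1, 5), (2, 3), (4, 8), (6, 7)]
index = build(intervals)
print(stab(index, 3))  # [(1, 5), (2, 3)]
print(stab(index, 5))  # [(1, 5), (4, 8)]
```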
32 Internal Memory Logarithmic Method Idea
- Given a (semi-dynamic) structure D on a set V with:
  - $O(\log N)$ query, $O(\log N)$ delete, $O(N \log N)$ construction
- Logarithmic method:
  - Partition V into subsets $V_0, V_1, \ldots, V_{\log N}$, where $|V_i| = 2^i$ or $|V_i| = 0$
  - Build $D_i$ on $V_i$
  - Delete: $O(\log N)$
  - Query: query each $D_i$ ⇒ $O(\log^2 N)$
  - Insert: find the first empty $D_i$ and construct $D_i$ out of the new element and the $\sum_{j=0}^{i-1} 2^j = 2^i - 1$ elements in $V_0, V_1, \ldots, V_{i-1}$
    - $O(2^i \log 2^i)$ construction ⇒ $O(\log N)$ per moved element
    - Each element is moved $O(\log N)$ times ⇒ $O(\log^2 N)$ amortized insert
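A minimal Python sketch of the logarithmic method, assuming the static structure D is simply a sorted array built by sorting ($O(n \log n)$) and queried by binary search. Insertion behaves like incrementing a binary counter: full levels are emptied into the first empty level. Deletion is omitted, and the class name is hypothetical:

```python
from bisect import bisect_left
from typing import List, Optional


class LogarithmicMethod:
    """Logarithmic method over a static structure D (here: a sorted array).

    levels[i] is either None (V_i empty) or a sorted list of exactly 2**i elements.
    """

    def __init__(self) -> None:
        self.levels: List[Optional[List[int]]] = []

    def insert(self, x: int) -> None:
        moved = [x]
        i = 0
        # Collect V_0 .. V_{i-1} until the first empty level i is found.
        while i < len(self.levels) and self.levels[i] is not None:
            moved.extend(self.levels[i])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = sorted(moved)  # 1 + (2**i - 1) = 2**i elements

    def contains(self, x: int) -> bool:
        """Query every D_i: O(log N) structures, O(log N) each."""
        for d in self.levels:
            if d is not None:
                j = bisect_left(d, x)
                if j < len(d) and d[j] == x:
                    return True
        return False


lm = LogarithmicMethod()
for v in range(13):
    lm.insert(v)
print([len(d) if d is not None else 0 for d in lm.levels])  # [1, 0, 4, 8]
```

The level sizes mirror the binary representation of N (13 = 1101 in binary), which is why there are at most $\log N + 1$ structures to query.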
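33 External Logarithmic Method Idea
- Decrease the number of subsets $V_i$ to $\log_B N$ to get an $O(\log_B^2 N)$ query
- Problem: there are not enough elements in $V_0, V_1, \ldots, V_{i-1}$ to build a $V_i$ of size $B^i$
- Solution: we allow $V_i$ to contain any number of elements $\le B^i$
  - Insert: find the first $D_i$ such that $1 + \sum_{j=0}^{i} |V_j| \le B^i$ (room for the new element) and construct a new $D_i$ from the new element and the elements in $V_0, V_1, \ldots, V_i$
  - We move at least $B^{i-1}$ elements (otherwise $D_{i-1}$ would have had room)
  - If $D_i$ can be constructed in $O((|V_i|/B)\log_B |V_i|) = O(B^{i-1}\log_B N)$ I/Os, every moved element is charged $O(\log_B N)$ I/Os
  - Each element is moved $O(\log_B N)$ times ⇒ $O(\log_B^2 N)$ I/Os amortized insert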
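The only change from the internal version is how the target level is chosen: level i may hold up to $B^i$ elements, and an insert rebuilds the first level that can absorb everything below it plus the new element. A minimal in-memory Python sketch, again using sorted arrays as the static structure and a small illustrative B; it counts element moves but does not model real I/Os, and the class name is hypothetical:

```python
from bisect import bisect_left
from typing import List


class ExternalLogarithmicMethod:
    """Logarithmic method with O(log_B N) subsets: V_i holds at most B**i elements."""

    def __init__(self, B: int) -> None:
        self.B = B
        self.levels: List[List[int]] = []
        self.moved = 0  # total element moves (including placing the new element)

    def insert(self, x: int) -> None:
        # Find the first i such that 1 + |V_0| + ... + |V_i| <= B**i (room for x).
        total, i = 1, 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            total += len(self.levels[i])
            if total <= self.B ** i:
                break
            i += 1
        merged = [x]
        for j in range(i + 1):  # rebuild D_i from V_0 .. V_i and x
            merged.extend(self.levels[j])
            self.levels[j] = []
        self.moved += len(merged)
        self.levels[i] = sorted(merged)

    def contains(self, x: int) -> bool:
        """Query every D_i; there are O(log_B N) of them."""
        for d in self.levels:
            j = bisect_left(d, x)
            if j < len(d) and d[j] == x:
                return True
        return False


ext = ExternalLogarithmicMethod(B=4)
for v in range(100):
    ext.insert(v)
print([len(d) for d in ext.levels], ext.moved / 100)  # level sizes, average moves per element
```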
34 External Logarithmic Method Idea
- Given a (semi-dynamic) linear-space external data structure with:
  - $O(\log_B N + T/B)$ I/O query
  - $O(\frac{N}{B}\log_B N)$ I/O construction
  - ($O(\log_B N)$ I/O delete)
⇒ Linear-space dynamic data structure with:
  - $O(\log_B^2 N + T/B)$ I/O query
  - $O(\log_B^2 N)$ I/O amortized insert
  - ($O(\log_B N)$ I/O delete)
- Dynamic interval management:
  - $O(\log_B^2 N + T/B)$ I/O query
  - $O(\log_B^2 N)$ I/O amortized insert
35 Planar Point Location
- Static problem:
  - Store a planar subdivision with N segments on disk such that the region containing a query point q can be found I/O-efficiently
- We concentrate on the vertical ray-shooting query:
  - Segments can store the regions they bound
  - Segments do not have to form a subdivision
- Dynamic problem:
  - Insert/delete segments
  - (we will not discuss this)
36 Static Solution
- A vertical line imposes an above-below order on the segments it intersects
- Sweep from left to right, maintaining a persistent B-tree on the above-below order:
  - Left endpoint: insert segment
  - Right endpoint: delete segment
- Query q is answered by a successor query on the B-tree at time $q_x$:
  - $O(N/B)$ space
  - $O(\log_B N)$ query
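The meaning of the query can be checked with a brute-force version: at sweep time $q_x$ the alive segments are those whose x-range contains $q_x$, and the answer is the successor of q in their above-below order, i.e., the lowest alive segment at or above q. This Python sketch assumes non-vertical, pairwise non-crossing segments given as ((x1, y1), (x2, y2)) with x1 < x2, and it scans all segments instead of using the persistent B-tree:

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]  # ((x1, y1), (x2, y2)) with x1 < x2


def y_at(seg: Segment, x: int) -> Fraction:
    """Exact y-coordinate of the (non-vertical) segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + Fraction(y2 - y1, x2 - x1) * (x - x1)


def ray_shoot_up(segments: List[Segment], q: Point) -> Optional[Segment]:
    """First segment hit by the upward vertical ray from q (successor of q at time q_x)."""
    qx, qy = q
    best = None
    for seg in segments:
        (x1, _), (x2, _) = seg
        if x1 <= qx <= x2:  # segment is alive at sweep time q_x
            y = y_at(seg, qx)
            if y >= qy and (best is None or y < best[0]):
                best = (y, seg)
    return best[1] if best is not None else None


segs = [((0, 0), (10, 0)), ((0, 8), (10, 8)), ((2, 3), (6, 3)), ((1, 4), (9, 6))]
print(ray_shoot_up(segs, (3, 1)))  # ((2, 3), (6, 3))
print(ray_shoot_up(segs, (7, 4)))  # ((1, 4), (9, 6))
```

Comparing segments by their y-value at a common abscissa is only meaningful when both segments span that abscissa, which is the comparability issue raised on the next slide.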
37 Static Solution
- Note: not all segments are comparable!
  - We have to be careful about what we compare
  ⇒ Problem: routing elements in internal nodes of leaf-oriented B-trees
- Luckily, we can modify the persistent B-tree to use regular (live) elements as routing elements
- However, the buffer-technique construction cannot be used
  ⇒ Only an $O(N \log_B N)$ I/O construction algorithm
  - Cannot be made dynamic using the logarithmic method
38 References
- External Memory Geometric Data Structures. Lecture notes by Lars Arge, Sections 1-4.
- I/O-efficient Point Location using Persistent B-trees. Lars Arge, Andrew Danner and Sha-Mayn Teh.