Persistent data structures - PowerPoint PPT Presentation

Slides: 72

Provided by: Sintec

Transcript and Presenter's Notes

1
Persistent data structures
2
Ephemeral: a modification destroys the version that we modify.
Persistent: modifications are nondestructive. Each modification creates a new version. All versions coexist. We have one big data structure that represents all versions.
3
Partially persistent: can access any version, but can modify only the most recent one.
[Figure: version diagram with labels V1, V2, V3, V4, V5]
4
Fully persistent: can access and modify any version any number of times.
[Figure: branching version diagram with labels V1, V2, V4, V5, V5, V3]
5
Confluently persistent: fully persistent, and there is an operation that combines two or more versions into one new version.
[Figure: version diagram with labels V1, V2, V4, V5, V5, V3]
6
Purely functional: you are not allowed to change a field in a node after it is initialized. This is all you can do in a purely functional programming language.
7
Example -- stacks
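Two operations: S' = push(x, S) and (x, S') = pop(S)
[Figure: linked stack cells shared by the versions produced by successive push(x, ·), push(z, ·), and pop operations; the pop returns (y, S3)]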
Stacks are automatically fully persistent.
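A minimal sketch of why a linked stack is fully persistent for free, assuming an immutable cell type. The names `Cell`, `push`, and `pop` below are illustrative choices, not taken from the slides:

```python
from typing import Any, NamedTuple, Optional


class Cell(NamedTuple):
    """One immutable stack cell; None stands for the empty stack."""
    head: Any
    tail: Optional["Cell"]


def push(x, s):
    # Allocates one new cell and shares the old stack unchanged.
    return Cell(x, s)


def pop(s):
    # Returns the top element and the remaining stack; s itself is untouched.
    if s is None:
        raise IndexError("pop from empty stack")
    return s.head, s.tail


s1 = push(1, None)
s2 = push(2, s1)      # one version built on s1
s3 = push(3, s1)      # a second, independent version built on s1
```

Since no cell is ever modified, every old version stays valid and can itself be extended any number of times.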
8
Example -- queues
Two operations: Q' = inject(x, Q) and (x, Q') = pop(Q)
[Figure: a linked-list queue after successive inject(x, ·), inject(z, ·), and pop operations; the pop returns (y, Q3)]
We get partial persistence: we never need to store two different values in the same field.
How do we make the queue fully persistent?
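The partial persistence of the linked queue can be sketched as follows, assuming each version is recorded as a (head, tail, size) triple. The class and method names are hypothetical:

```python
class QNode:
    def __init__(self, value):
        self.value = value
        self.next = None   # written at most once, by a later inject


class PartialQueue:
    """Partially persistent queue: read any version, modify only the newest."""

    def __init__(self):
        # versions[v] = (head, tail, size); version 0 is the empty queue
        self.versions = [(None, None, 0)]

    def inject(self, x):
        head, tail, size = self.versions[-1]
        node = QNode(x)
        if size == 0:
            head = node
        else:
            tail.next = node   # this field was None until now
        self.versions.append((head, node, size + 1))
        return len(self.versions) - 1

    def pop(self):
        head, tail, size = self.versions[-1]
        if size == 0:
            raise IndexError("pop from empty queue")
        self.versions.append((head.next, tail, size - 1))
        return head.value

    def contents(self, version):
        head, _, size = self.versions[version]
        out, node = [], head
        for _ in range(size):
            out.append(node.value)
            node = node.next
        return out
```

Old versions stay readable because a `next` field only ever changes from empty to one value, and the stored size stops an old version from walking into nodes added later.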
9
Example -- double ended queues
Four operations: Q' = inject(x, Q), (x, Q') = eject(Q), Q' = push(x, Q), (x, Q') = pop(Q)
[Figure: an eject followed by an inject of z on a linked-list deque]
Here it's not even obvious how to get partial persistence.
10
Maybe we should use stacks
Stacks are easy. We know how to simulate queues
with stacks. So we should be able to get
persistent queues this way...
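[Figure: a deque built from two stacks; push and pop act on one stack, inject and eject on the other]
When one of the stacks gets empty we split the other.
[Figure: an eject on a deque holding 1, 2, 3, 4 that triggers a split]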
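A rough ephemeral sketch of the two-stack simulation, assuming Python lists serve as the stacks (top at the end of the list). The class name `StackDeque` and the helper `_split` are illustrative:

```python
class StackDeque:
    """Ephemeral deque simulated by two stacks."""

    def __init__(self):
        self.left = []    # top of this stack is the front of the deque
        self.right = []   # top of this stack is the back of the deque

    def push(self, x):
        self.left.append(x)

    def inject(self, x):
        self.right.append(x)

    def pop(self):
        if not self.left:
            self._split(self.right, self.left)
        if not self.left:
            raise IndexError("pop from empty deque")
        return self.left.pop()

    def eject(self):
        if not self.right:
            self._split(self.left, self.right)
        if not self.right:
            raise IndexError("eject from empty deque")
        return self.right.pop()

    @staticmethod
    def _split(full, empty):
        # Move the bottom half of `full` (rounded up) onto `empty`, reversed,
        # so that the far end of the deque becomes the top of `empty`.
        k = (len(full) + 1) // 2
        empty.extend(reversed(full[:k]))
        del full[:k]
```

The split costs time linear in the stack size, which is what the amortized analysis on the next slide pays for.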
11
Deque by stack simulation (ephemeral analysis)
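$\Phi = \bigl|\,|S_l| - |S_r|\,\bigr|$
Each operation changes the potential by O(1). The amortized cost of the reverse is 0.
[Figure: two successive eject operations on the two-stack deque holding 1, 2, 3, 4, before and after the split]
In a persistent setting it is not clear that this potential is well defined.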
12
Deque by stack simulation (partial persistence)
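$\Phi = \bigl|\,|S_l| - |S_r|\,\bigr|$, where S is the live stack, the one which we can modify.
Everything still works.
When we do the reversal, in order not to modify any other stack, we copy the nodes!
[Figure: two successive eject operations on the two-stack deque holding 1, 2, 3, 4, before and after the split]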
13
Deque by stack simulation (full persistence)
Can repeat the expensive operation over and over again.
[Figure: the same expensive eject applied repeatedly to one version]
A sequence of n operations that costs $\Omega(n^2)$.
14
Summary so far
Stacks are automatically fully persistent.
Got partially persistent queues in O(1) time per pop/inject.
Got partially persistent deques in O(1) amortized time per operation.
How about fully persistent queues? Partially persistent search trees, other data structures?
Can we do something general?
15
Some easy observations
You could copy the entire data structure before doing the operation: $\Omega(n)$ time per update, $\Omega(nm)$ space.
You could also refrain from doing anything and just keep a log of the updates. When accessing version i, first perform the i updates in order to obtain version i: $\Omega(i)$ time per access, O(m) space.
You could use a hybrid approach that would store the entire sequence of updates and in addition every kth version for some suitable k. Either the space or the access time blows up by a factor of $\sqrt{m}$.
Can you do things more efficiently ?
16
How about search trees ?
All modifications occur on a path.
So it suffices to copy one path.
This is the path copying method.
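Path copying can be sketched on a plain unbalanced binary search tree, assuming immutable nodes. The names `Tree`, `insert`, and `keys` are hypothetical, and balancing is left out to keep the sketch short:

```python
from typing import NamedTuple, Optional


class Tree(NamedTuple):
    key: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def insert(t, key):
    """Return the root of a new version; only the search path is copied."""
    if t is None:
        return Tree(key)
    if key < t.key:
        return t._replace(left=insert(t.left, key))
    if key > t.key:
        return t._replace(right=insert(t.right, key))
    return t   # key already present: the new version is the old one


def keys(t):
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


v1 = None
for k in (12, 3, 18):
    v1 = insert(v1, k)
v2 = insert(v1, 16)   # v1 is still intact after this call
```

Every subtree off the search path is shared between the old and the new version, so an update allocates one node per level of the path.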
17
Example -- path copying
[Figure: search tree with keys 3, 1, 12, 18, 15, 14, 20, 28, 21, 40, and the key 16]
18
Example -- path copying
[Figure: the same search tree after the update, with new copies of the nodes 12, 18, 15, 14 along one path, and the key 16]
19
Path copying -- analysis
Gives fully persistent search trees!
O(log n) time for update and access
O(log n) space per update
Want the space bound to be proportional to the
number of field modifications that the ephemeral
update did.
In the case of search trees we want the space
consumption of an update to be O(1) (at least
amortized).
20
Application -- planar point location
Suppose that the Euclidean plane is subdivided
into polygons by n line segments that intersect
only at their endpoints. Given such a polygonal
subdivision and an on-line sequence of query
points in the plane, the planar point location
problem is to determine for each query point the
polygon containing it.
Measure an algorithm by three parameters: 1) The
preprocessing time. 2) The space required for the
data structure. 3) The time per query.
21
Planar point location -- example
22
Planar point location -- example
23
Solving planar point location (Cont.)
Partition the plane into vertical slabs by
drawing a vertical line through each endpoint.
Within each slab the lines are totally ordered.
Allocate a search tree per slab containing the
lines at the leaves; with each line associate the
polygon above it.
Allocate another search tree on the x-coordinates
of the vertical lines.
24
Solving planar point location (Cont.)
To answer a query, first find the appropriate slab.
Then search the slab to find the polygon.
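The slab method can be sketched with sorted lists standing in for the search trees. This is only an illustration under several assumptions: segments are given as ((x1, y1), (x2, y2)) with x1 < x2, none is vertical, they meet only at endpoints, and the query returns the segment directly below the point instead of a polygon label. All function names are hypothetical:

```python
from bisect import bisect_right


def y_at(seg, x):
    """Height of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slabs(segments):
    """One sorted list of segments per slab: quadratic space in the worst case."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        crossing = [s for s in segments if s[0][0] <= left and right <= s[1][0]]
        crossing.sort(key=lambda s: y_at(s, mid))
        slabs.append(crossing)
    return xs, slabs


def segment_below(xs, slabs, qx, qy):
    i = bisect_right(xs, qx) - 1          # first search: find the slab
    if i < 0 or i >= len(slabs):
        return None                        # query is outside all slabs
    slab = slabs[i]
    lo, hi = 0, len(slab)
    while lo < hi:                         # second search: inside the slab
        mid = (lo + hi) // 2
        if y_at(slab[mid], qx) <= qy:
            lo = mid + 1
        else:
            hi = mid
    return slab[lo - 1] if lo > 0 else None


segs = [((0, 0), (4, 0)), ((0, 2), (2, 3)), ((2, 3), (4, 2))]
xs, slabs = build_slabs(segs)
```

Both searches are binary searches, which matches the O(log n) query time claimed for this method; the per-slab lists are where the space goes.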
25
Planar point location -- example
26
Planar point location -- analysis
Query time is O(log n). How about the space?
$\Theta(n^2)$ in the worst case.
And so could be the preprocessing time.
27
Planar point location -- bad example
Total lines O(n), and number of lines in each
slab is O(n).
28
Planar point location persistence
So how do we improve the space bound ?
Key observation: the lists of the lines in
adjacent slabs are very similar.
Create the search tree for the first slab. Then
obtain the next one by deleting the lines that
end at the corresponding vertex and adding the
lines that start at that vertex.
How many insertions/deletions are there
altogether?
2n
29
Planar point location persistence (cont)
Updates should be persistent since we need all
search trees at the end.
Partial persistence is enough
Well, we already have the path copying method,
let's use it. What do we get?
O(n log n) space and O(n log n) preprocessing time.
We shall improve the space bound to O(n).
30
What are we after ?
Break each operation into elementary access steps
(ptr traversal) and update steps (assignments,
allocations).
Want a persistent simulation which consumes O(1)
time per update or access step, and O(1) space
per update step.
31
Making data structures persistent (DSST 89)
We will show a general technique to make data
structures partially and later fully persistent.
The time penalty of the transformation would be
O(1) per elementary access and update step.
The space penalty of the transformation would be
O(1) per update step.
In particular, this would give us an O(n) space
solution to the planar point location problem
32
The fat node method
Every pointer field can store many values, each
tagged with a version number.
[Figure: a fat pointer field; labels NULL, 4, 5, 7, 15]
33
The fat node method (Cont.)
Simulation of an update step when producing version i:
When a new node is created by the ephemeral update, we create a new node; each value of a field in the new node is marked with version i.
When we change the value of a field f to v, we add an entry to the list of f with key i and value v.
[Figure: a fat pointer field; labels NULL, 4, 5, 7, 15]
34
The fat node method (Cont.)
Simulation of an access step when navigating in version i:
The relevant value is the one tagged with the largest version number not exceeding i.
[Figure: a fat pointer field; labels NULL, 4, 5, 7, 15]
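A minimal sketch of a single fat field under partial persistence, assuming version numbers are integers that only grow. The class `FatField` and its methods are illustrative names:

```python
from bisect import bisect_right


class FatField:
    """One field of a fat node: a list of (version, value) entries."""

    def __init__(self):
        self.versions = []   # strictly increasing version stamps
        self.values = []     # values[k] was written in versions[k]

    def write(self, version, value):
        # Update step: append an entry with key `version`.
        if self.versions and version < self.versions[-1]:
            raise ValueError("partial persistence: only the newest version may change")
        if self.versions and self.versions[-1] == version:
            self.values[-1] = value          # second write within one version
        else:
            self.versions.append(version)
            self.values.append(value)

    def read(self, version):
        # Access step: entry with the largest stamp not exceeding `version`.
        k = bisect_right(self.versions, version)
        if k == 0:
            raise KeyError("field did not exist in this version")
        return self.values[k - 1]


f = FatField()
f.write(1, "a")
f.write(4, "b")
f.write(7, "c")
```

The binary search in `read` is the source of the logarithmic slowdown per pointer traversal.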
35
Partially persistent deques via the fat node method
[Figure: a deque node holding x with version stamps; labels 1, 1, V1, Null, Null]
36
Fat node -- analysis
Space is ok: O(1) per update step.
That would give O(n) space for planar point
location since each insertion/deletion does O(1)
changes amortized.
We screwed up the time: it may take O(log m)
to traverse a pointer, where m is the number of
versions.
So query time goes up to $O(\log^2 n)$ and
preprocessing time is $O(n \log^2 n)$.
37
Node copying
This is a general method to make pointer based
data structures partially persistent.
Nodes must have bounded indegree and bounded
outdegree.
We will show this method first for balanced
search trees, which is a slightly simpler case
than the general case.
Idea: it is similar to the fat node method, just
that we won't make nodes too fat.
38
Partially persistent balanced search trees via
node copying
Here it suffices to allow one extra pointer field
in each node.
Each extra pointer is tagged with a version
number and a field name.
When the ephemeral update allocates a new node,
you allocate a new node as well.
When the ephemeral update changes a pointer field:
if the extra pointer is empty, use it; otherwise
copy the node. Try to store a pointer to the new
copy in its parent. If the extra pointer at the
parent is occupied, copy the parent and continue
going up this way.
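The rules above can be sketched on an unbalanced, insert-only binary search tree with one extra pointer per node. This is a simplification made for illustration: the slides use balanced 2-4 trees, and the names `Node`, `read`, and `PersistentBST` are hypothetical:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.extra = None    # (version, field name, target) or None


def read(node, field, version):
    """Value of a pointer field as seen by the given version."""
    e = node.extra
    if e is not None and e[1] == field and e[0] <= version:
        return e[2]
    return getattr(node, field)


class PersistentBST:
    def __init__(self):
        self.roots = [None]            # roots[v] is the root of version v

    def insert(self, key):
        old = len(self.roots) - 1
        new = old + 1
        path, node = [], self.roots[old]
        while node is not None:
            if key == node.key:        # nothing changes in the new version
                self.roots.append(self.roots[old])
                return new
            field = "left" if key < node.key else "right"
            path.append((node, field))
            node = read(node, field, old)
        target = Node(key)             # the node the ephemeral update allocates
        while path:
            parent, field = path.pop()
            if parent.extra is None:   # room left: use the extra pointer
                parent.extra = (new, field, target)
                self.roots.append(self.roots[old])
                return new
            # No room: copy the node with its current pointers, then go up.
            copy = Node(parent.key, read(parent, "left", old),
                        read(parent, "right", old))
            setattr(copy, field, target)
            target = copy
        self.roots.append(target)      # copying reached the root
        return new

    def keys(self, version):
        out = []

        def walk(n):
            if n is not None:
                walk(read(n, "left", version))
                out.append(n.key)
                walk(read(n, "right", version))

        walk(self.roots[version])
        return out


t = PersistentBST()
for k in (5, 3, 8, 1, 4):
    t.insert(k)
```

A copy starts with an empty extra pointer, so a node that was just copied can absorb the next change without being copied again.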
39
Insert into persistent 2-4 trees with node copying
[Figure: 2-4 tree with keys 3, 1, 12, 18, 14, 20, 28, 21, 16]
40
Insert into persistent 2-4 trees with node copying
[Figure: the 2-4 tree with keys 3, 1, 12, 18, 14, 20, 28, 21, 16 after the insertion; a new node holds 12, 18, 14 and a pointer is stamped with version 1]
41
Insert into persistent 2-4 trees with node copying
[Figure: the same tree with version stamp 1 and the new node 12, 18, 14; key 29 is being inserted]
42
Insert into persistent 2-4 trees with node copying
[Figure: the same tree with version stamp 1 and the new node 12, 18, 14; a second new node holds 29, 20, 28, 21]
43
Insert into persistent 2-4 trees with node copying
[Figure: the same tree with version stamps 2 and 1; new nodes hold 12, 18, 14 and 29, 20, 28, 21]
44
Node copying -- analysis
The time slowdown per access step is O(1) since
there is only a constant number of extra pointers per
node.
What about the space blowup ?
O(1) (amortized) new nodes per update step due to
nodes that would have been created by the
ephemeral implementation as well.
How about nodes that are created due to node
copying when the extra pointer is full ?
45
Node copying -- analysis
We'll show that only O(1) copyings occur on
average per update step.
Amortized space consumption = real space consumption + $\Delta\Phi$
$\Phi$ = number of used slots in live nodes
A node is live if it is reachable from the root
of the most recent version.
=> Amortized space cost of node copying is 0.
46
Node copying in general
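Each persistent node has d + p + e + 1 pointers:
d original pointers (those of the ephemeral node)
e extra pointers
p predecessor pointers
1 copy pointer.
[Figure: a persistent node with version stamps; labels 4, 7, 11, 6, 5, live]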
47
Simulating an update step in node x
When there is no free extra pointer in x, copy x.
When you copy node x, and x points to y, c(x)
should point to y; update the corresponding
predecessor pointer in y. Add x to the set S of
copied nodes. (S contains no nodes initially.)
[Figure: nodes x and y; labels 7, 7]
48
Node copying in general (cont)
Take out a node x from S, go to the nodes pointing to
x and update them, maybe copying more nodes.
[Figure: nodes x and y; labels 7, 7, 11]
52
Node copying in general (cont)

Remove any node x from S.
For each node y indicated by a predecessor pointer in x, find in y the live pointer to x.
If this pointer has version stamp i, replace it by a pointer to c(x). Update the corresponding reverse pointer.
If this pointer has version stamp less than i, add to y a pointer to c(x) with version stamp i. If there is no room, copy y as before, and add it to S. Update the corresponding reverse pointer.

53
Node copying (analysis)
Actual space consumed is $|S|$.
$\Phi$ = number of used extra fields in live nodes
$\Delta\Phi = -e|S| + p|S|$
This is at most $-|S|$ if e > p, which pays for the $|S|$ copies. (Actually e = p suffices if we were more careful.)
So whether there were any copyings or not, the amortized space cost of a single update step is O(1).
54
The fat node method - full persistence
Does it also work for full persistence?
[Figure: a fat pointer field and a branching version tree; labels NULL, 1, 5, 5, 6, 7, 6]
We have a navigation problem.
55
The fat node method - full persistence (cont)
Maintain a total order of the version tree.
56
The fat node method - full persistence (cont)
When a new version is created, add it to the list
immediately after its parent.
=> The list is a preorder of the version tree.
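The insertion rule can be sketched with a plain Python list, which costs linear time per operation and is only meant to show the ordering. The class `VersionList` and its methods are hypothetical names:

```python
class VersionList:
    """Total order of versions: each new version goes right after its parent."""

    def __init__(self, root=0):
        self.order = [root]

    def add_child(self, parent, child):
        # Inserting directly after the parent keeps the list a preorder
        # of the version tree (most recently created child first).
        self.order.insert(self.order.index(parent) + 1, child)

    def precedes(self, a, b):
        # Order query: does version a come before version b in the list?
        return self.order.index(a) < self.order.index(b)


vl = VersionList(0)
vl.add_child(0, 1)    # version 1 derived from version 0
vl.add_child(1, 2)    # version 2 derived from version 1
vl.add_child(0, 3)    # version 3 derived from version 0
```

With this rule every subtree of the version tree occupies a contiguous stretch of the list, which is what makes nearest-entry lookups meaningful.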
57
The fat node method - full persistence (cont)
When traversing a field in version i, the
relevant value is the one recorded with a version
at or before i in the list and closest to it.
[Figure: a fat pointer field; labels NULL, 1, 5, 6]
58
The fat node method - full persistence (cont)
How do we update?
[Figure: a fat pointer field; labels NULL, 1, 5, 6]
59
The fat node method - full persistence (cont)
[Figure: a fat pointer field and the version tree after version 10 is added; labels NULL, 1, 10, 5, 6, 7, 9, 8]
So what is the algorithm in general?
60
The fat node method - full persistence (cont)
Suppose that when we create version i we change
field f to have value v.
Let i1 (i2) be the first version to the left
(right) of i that has a value recorded at field f.
[Figure: the version list around i, with i1 to its left and i2 to its right, and the entries of field f]
61
The fat node method - full persistence (cont)
We add the pair (i, v) to the list of f.
Let i' be the version following i in the version
list.
[Figure: the version list with i1, i, i', i2 and the entries of field f after the update]
If (i' < i2), or i' exists and i2 does not exist,
add the pair (i', v') where v' is the value
associated with i1.
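The update rule can be sketched for a single field, assuming a Python list as the version list and a dictionary for the recorded entries. The scans are linear here, and all names are hypothetical:

```python
class FullyPersistentField:
    """One fat field under full persistence."""

    def __init__(self, initial):
        self.order = [0]               # version list; version 0 is the root
        self.entries = {0: initial}    # version -> value recorded at this field

    def new_version(self, parent):
        child = len(self.order)
        self.order.insert(self.order.index(parent) + 1, child)
        return child

    def read(self, version):
        # Closest entry at or before `version` in the version list.
        k = self.order.index(version)
        while self.order[k] not in self.entries:
            k -= 1                     # stops at the root, which has an entry
        return self.entries[self.order[k]]

    def write(self, version, value):
        k = self.order.index(version)
        nxt = self.order[k + 1] if k + 1 < len(self.order) else None
        # If the successor i' has no entry of its own (i' < i2, or no i2),
        # pin down the value it sees now, which is the one from i1.
        if nxt is not None and nxt not in self.entries:
            self.entries[nxt] = self.read(nxt)
        self.entries[version] = value


f = FullyPersistentField("a")
v1 = f.new_version(0)    # version list: 0, 1
v2 = f.new_version(0)    # version list: 0, 2, 1
f.write(v2, "c")         # must not leak into version 1
```

Without the extra entry for the successor, version 1 in this example would wrongly start seeing the value written in version 2.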
62
Fully persistent 2-4 trees with the fat node
method
[Figure: 2-4 tree with keys 3, 1, 12, 18, 14, 20, 28, 21, 16; version stamp 0]
63
Insert into fully persistent 2-4 trees (fat nodes)
[Figure: the 2-4 tree with keys 3, 1, 12, 18, 14, 20, 28, 21, 16 and a new node 12, 18, 14; pointers stamped with versions 0 and 1]
64
Insert into fully persistent 2-4 trees (fat nodes)
[Figure: the same tree with the new node 12, 18, 14; key 29 is being inserted; pointers stamped with versions 0, 1, 2]
65
Insert into persistent 2-4 trees with node copying
[Figure: the same tree with new nodes 12, 18, 14 and 29, 20, 28, 21; pointers stamped with versions 0, 1, 2]
66
Insert into persistent 2-4 trees with node copying
[Figure: the same tree with new nodes 12, 18, 14 and 29, 20, 28, 21; pointers stamped with versions 0, 1, 2, with one more stamp 1 added]
67
Fat node method (cont)
How do we efficiently find the right value of a
field in version i ?
Store the values sorted by the order determined
by the version list. Use a search tree to
represent this sorted list.
To carry out a find on such a search tree we need
in each node to answer an order query on the
version list.
Use Dietz and Sleator's data structure for the
version list.
68
Fat node method (summary)
We can find the value to traverse in O(log(m))
where m is the number of versions
We get O(1) space increase per ephemeral update
step
O(log m) time slowdown per ephemeral access step
69
Node splitting
Similar to node copying (slightly more involved).
Allows us to avoid the O(log m) time slowdown.
Converts any pointer based data structure with constant indegrees and outdegrees to a fully persistent one.
The time slowdown per access step is O(1) (amortized).
The space blowup per update step is O(1) (amortized).
70
Search trees via node splitting
You get fully persistent search trees in which
each operation takes O(log n) amortized time and
space.
Why is the space O(log n)?
Because in the ephemeral setting the space
consumption is O(1) only in the amortized sense.
71
Search trees via node splitting
So what do we need in order to get persistent
search trees with O(1) space cost per update
(amortized) ?
We need an ephemeral structure in which the space
consumption per update is O(1) in the worst case.
You can do it! => Red-black trees with lazy
recoloring
72
What about deques ?
We can apply node splitting to get fully
persistent deques with O(1) time per operation.
We can also transform the simulation by stacks
into a real-time simulation and get an O(1) time
solution.
What if we want to add the operation concatenate ?
None of the methods seems to extend...
