PXML: Probabilistic Semistructured Databases

- Edward Hung, Lise Getoor, V.S. Subrahmanian
- University of Maryland, College Park

Outline

- Motivating example
- Semistructured data model
- PXML data model
- Semantics
- Interpretation
- Satisfaction
- Algebra
- Related work
- Future work

Motivating Example

- Surveillance applications monitoring a region of a battlefield
- An image processing system identifies vehicles in convoys appearing in the region at different times
- Convoys
- Timestamp
- Tanks, trucks, etc.
- Uncertainty
- Number of vehicles
- Category and identity of a vehicle, e.g., is it a tank? A T-72?

Motivating Example

- A Doppler speed system detects the speed and velocity of convoys and infers their possible destinations
- Convoys
- Timestamp
- Possible destinations
- Uncertainty
- Number of places the convoy will go
- The names of the places

Motivating Example

- Semistructured data model
- General hierarchical structure is known.
- The schema is not fixed
- Number of vehicles
- Properties of vehicles
- Our work stores uncertain information in probabilistic environments.

Semistructured Data Model

- Instance S = (V, lch, t, val)
- lch(o, l): the set of children of o with label l
- G = (V, lch) is a rooted, directed, edge-labeled graph

Semistructured Data Model

- Example
- [Figures: the semistructured instance at time 10 and at time 15]

PXML Data Model

- Uncertainty
- Existence of sub-objects
- Number of sub-objects
- Identity of the sub-objects

PXML Data Model

- Weak instance W = (V, lch, t, val, card)
- The cardinality constraint card(o, l) gives the bounds on the number of sub-objects with edge label l connected to the same parent o.

PXML Data Model

- Example
- Convoy 2 surely has a timestamp
- card(convoy2, ts) = [1, 1]
- Convoy 2 may have one to two trucks
- card(convoy2, truck) = [1, 2]

PXML Data Model (Cardinality)

- Example of cardinality

Weak Instance W = Semistructured Instance + card

PXML Data Model

- Compatible instances
- A semistructured instance S = (V_S, lch_S, t_S, val_S) is compatible with a weak instance W = (V_W, lch_W, t_W, val_W, card) if
- (V_S, lch_S) is a rooted connected graph.
- If o is a leaf in S, then: if o is also a leaf in W, t_S(o) = t_W(o) and val_S(o) = val_W(o); otherwise, the type and value are defined as unknown.
- Otherwise, card(o, l).min ≤ k ≤ card(o, l).max, where k is the number of l-labeled children of o, i.e., k = |lch_S(o, l)|.
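The cardinality part of the compatibility test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function name `cardinality_ok` are hypothetical, and the rootedness and type/value conditions are not checked.

```python
def cardinality_ok(vertices, lch_s, card):
    """Check card(o, l).min <= k <= card(o, l).max for every non-leaf o of S.

    vertices: set of objects present in the instance S
    lch_s:    {(o, label): set of l-labeled children of o in S}
    card:     {(o, label): (min, max)} taken from the weak instance W
    (All three layouts are assumptions made for this sketch.)
    """
    for (o, label), (lo, hi) in card.items():
        if o not in vertices:
            continue  # o does not occur in S, so nothing to check
        is_leaf = not any(kids for (p, _), kids in lch_s.items() if p == o)
        if is_leaf:
            continue  # leaves fall under the type/value condition instead
        k = len(lch_s.get((o, label), set()))
        if not lo <= k <= hi:
            return False
    return True


card_w = {("convoy2", "ts"): (1, 1), ("convoy2", "truck"): (1, 2)}

good = cardinality_ok(
    {"convoy2", "ts2", "truck3"},
    {("convoy2", "ts"): {"ts2"}, ("convoy2", "truck"): {"truck3"}},
    card_w,
)
bad = cardinality_ok(  # convoy2 has a truck but no timestamp
    {"convoy2", "truck3"},
    {("convoy2", "truck"): {"truck3"}},
    card_w,
)
```

Here `good` describes a convoy2 with one timestamp and one truck, while `bad` omits the timestamp and so violates card(convoy2, ts).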

PXML Data Model

- Example
- There are surely 2 convoys.
- card(S, convoy) = [2, 2]
- Convoy 1 surely has a timestamp, a truck and a tank.
- card(convoy1, ts) = [1, 1]
- card(convoy1, truck) = [1, 1]
- card(convoy1, tank) = [1, 1]
- Convoy 2 surely has a timestamp.
- card(convoy2, ts) = [1, 1]
- Convoy 2 may have one to two trucks.
- card(convoy2, truck) = [1, 2]

PXML Data Model

- D(W): the set of all semistructured instances compatible with a weak instance W

PXML Data Model (Weak Instance)

- Example of a weak instance W
- card(S1, convoy) = [2, 2]
- card(convoy1, ts) = [1, 1]
- card(convoy1, truck) = [1, 1]
- card(convoy1, tank) = [1, 1]
- card(convoy2, ts) = [1, 1]
- card(convoy2, truck) = [1, 2]

PXML Data Model

- Example of an instance compatible with W
- [Figure: a semistructured instance drawn against the cardinality constraints of W]

PXML Data Model

- Potential child set
- PC(o), the potential child set of a non-leaf object o in a weak instance W, is the set of all possible sets of children of o satisfying the cardinality constraints.

PXML Data Model

- Example
- Convoy 2 surely has one timestamp, which is surely 15. Convoy 2 may have a truck of type mac and/or a truck of type rover.
- card(convoy2, truck) = [1, 2]
- card(convoy2, ts) = [1, 1]
- PC(convoy2) = { {ts2, truck3}, {ts2, truck4}, {ts2, truck3, truck4} }
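The potential child set of convoy2 can be enumerated mechanically from the cardinality bounds. The sketch below is one possible way to do it, assuming each potential child set is the union of one admissible subset per edge label; the function name and the dictionary layout are hypothetical.

```python
from itertools import combinations, product


def potential_child_sets(lch, card):
    """Enumerate PC(o) for a single object o.

    lch:  {label: list of possible children of o with that label}
    card: {label: (min, max)} cardinality bounds for o
    (Both layouts are assumptions made for this sketch.)
    """
    per_label = []
    for label, children in lch.items():
        lo, hi = card[label]
        # every subset of this label's children whose size is within the bounds
        choices = [
            frozenset(subset)
            for k in range(lo, min(hi, len(children)) + 1)
            for subset in combinations(children, k)
        ]
        per_label.append(choices)
    # a potential child set picks one admissible subset per label
    return {frozenset().union(*combo) for combo in product(*per_label)}


pc_convoy2 = potential_child_sets(
    lch={"ts": ["ts2"], "truck": ["truck3", "truck4"]},
    card={"ts": (1, 1), "truck": (1, 2)},
)
```

With the bounds from the example, `pc_convoy2` contains exactly the three sets listed for PC(convoy2).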

PXML Data Model

- Probabilistic instance I = (V, lch, t, val, card, ipf)
- The interval probability function ipf(o, c) w.r.t. the set PC(o) associates, with each c in PC(o), a closed subinterval [lb(c), ub(c)] ⊆ [0, 1].

PXML Data Model

- Example
- PC(convoy2) = { {ts2, truck3}, {ts2, truck4}, {ts2, truck3, truck4} }
- ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
- ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
- ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]

Probabilistic Instance I = Weak Instance W + ipf

PXML Data Model

- Here the ipf assigns a probability interval to each possible set of children.
- Further independence assumptions can make the representation more compact,
- e.g., independence between trucks and tanks;
- e.g., all trucks are indistinguishable.

Semantics (Global Interpretation)

- Interpretation
- Global interpretation, P
- a mapping from D(W) (the set of semistructured instances compatible with W) to [0, 1] s.t. $\sum_{S \in D(W)} P(S) = 1$
- Example with six compatible instances, S1a to S1f
- P(S1a) = 0.12
- P(S1b) = 0.08
- P(S1c) = 0.2
- P(S1d) = 0.18
- P(S1e) = 0.12
- P(S1f) = 0.3

Semantics (Local Interpretation)

- An object probability function (OPF) for an object o w.r.t. a weak instance W is a mapping w: PC(o) → [0, 1] s.t. $\sum_{c \in PC(o)} w(c) = 1$

Semantics

- Example
- ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
- ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
- ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
- w_convoy2({ts2, truck3}) = 0.2
- w_convoy2({ts2, truck4}) = 0.5
- w_convoy2({ts2, truck3, truck4}) = 0.3

Semantics (Local Interpretation)

- Previously, probabilities were assigned globally to each compatible instance.
- Now we assign probabilities to the actual children of each non-leaf object in a local manner.
- [Figure: the OPF w_convoy2, a mapping from PC(convoy2) to [0, 1], drawn on the weak instance W]

Semantics (Local Interpretation)

- Interpretation
- Local interpretation, p
- a mapping from the set of non-leaf objects to OPFs
- Example
- p(convoy2) = w_convoy2

Semantics (Local → Global)

- Assume that the probability of any potential child of an object o is independent of the non-descendants of o.
- W operator
- The W operator returns the probabilities assigned to every semistructured instance compatible with a given weak instance, consistent with a given local interpretation.
- Given a semistructured instance S compatible with a weak instance W and a local interpretation p for W,
- $W(p)(S) = \prod_{o \in S} p(o)(C_S(o))$, where the product ranges over the non-leaf objects o of S and $C_S(o)$ is the set of children of o in S
- Theorem
- W(p) is a global interpretation for W
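To make the product formula concrete, the following sketch enumerates the compatible instances of the running convoy example and multiplies the local probabilities along each one. It assumes the object graph is a tree (every object has one parent) and uses the OPF values given in the examples; the function name `global_from_local` and the data layout are hypothetical.

```python
from itertools import product

# Local interpretation: one OPF per non-leaf object (values from the running example).
opf = {
    "S1": {frozenset({"convoy1", "convoy2"}): 1.0},
    "convoy1": {
        frozenset({"ts1", "truck1", "tank1"}): 0.4,
        frozenset({"ts1", "truck1", "tank2"}): 0.6,
    },
    "convoy2": {
        frozenset({"ts2", "truck3"}): 0.2,
        frozenset({"ts2", "truck4"}): 0.5,
        frozenset({"ts2", "truck3", "truck4"}): 0.3,
    },
}


def global_from_local(opf, root):
    """Return [(instance, probability)] where instance maps each non-leaf
    object to its chosen child set. Assumes a tree-shaped weak instance."""

    def expand(o):
        if o not in opf:  # leaf: a single empty choice with probability 1
            yield {}, 1.0
            return
        for child_set, p in opf[o].items():
            subtrees = [list(expand(child)) for child in sorted(child_set)]
            for combo in product(*subtrees):
                instance, prob = {o: child_set}, p
                for sub_instance, sub_prob in combo:
                    instance.update(sub_instance)
                    prob *= sub_prob  # multiply the local probabilities
                yield instance, prob

    return list(expand(root))


global_interp = global_from_local(opf, "S1")
```

The six probabilities returned for this example are 0.12, 0.08, 0.2, 0.18, 0.12 and 0.3 (up to floating-point rounding), and they sum to 1, as a global interpretation requires.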

Semantics

- Example
- ipf(S1, {convoy1, convoy2}) = [1, 1]
- w_S1({convoy1, convoy2}) = 1
- ipf(convoy1, {ts1, truck1, tank1}) = [0.2, 0.6]
- ipf(convoy1, {ts1, truck1, tank2}) = [0.4, 0.8]
- w_convoy1({ts1, truck1, tank1}) = 0.4
- w_convoy1({ts1, truck1, tank2}) = 0.6
- ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
- ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
- ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
- w_convoy2({ts2, truck3}) = 0.2
- w_convoy2({ts2, truck4}) = 0.5
- w_convoy2({ts2, truck3, truck4}) = 0.3

Semantics

- Example
- W(p)(S1a)
- = p(S1)({convoy1, convoy2}) x p(convoy1)({ts1, truck1, tank1}) x p(convoy2)({ts2, truck3, truck4})
- = w_S1({convoy1, convoy2}) x w_convoy1({ts1, truck1, tank1}) x w_convoy2({ts2, truck3, truck4})
- = 1 x 0.4 x 0.3
- = 0.12

Semantics

- Example
- Similarly, we can get
- W(p)(S1a) = 0.12
- W(p)(S1b) = 0.08
- W(p)(S1c) = 0.2
- W(p)(S1d) = 0.18
- W(p)(S1e) = 0.12
- W(p)(S1f) = 0.3

Semantics (Global → Local)

- (Same assumption) The probability of any potential child of an object o is independent of the non-descendants of o.
- Given a global interpretation P for a weak instance W,
- P satisfies W iff P(c | o, ndes(o)) = P(c | o) for every non-leaf object o and every c in PC(o)
- ndes(o) is the set of non-descendants of o.

Semantics (Global → Local)

- D operator
- The D operator returns the probabilities assigned to each possible set of children of every non-leaf object, consistent with a given global interpretation.
- Given a global interpretation P that satisfies a weak instance W, for any non-leaf object o and any c in PC(o),
- $w_{P,o}(c) = \frac{\sum_{S \in D(W),\ o \in S,\ C_S(o) = c} P(S)}{\sum_{S \in D(W),\ o \in S} P(S)}$
- D(P) returns a function defined as follows: for any non-leaf object o, D(P)(o) = w_{P,o}

Semantics (Global → Local)

- Theorem
- D(P) is a local interpretation for W
- Example
- Derive D(P)(convoy2) = w_{P,convoy2} from the global interpretation P(S1a) = 0.12, P(S1b) = 0.08, P(S1c) = 0.2, P(S1d) = 0.18, P(S1e) = 0.12, P(S1f) = 0.3
- w_{P,convoy2}({ts2, truck3, truck4}) = (0.12 + 0.18) / 1 = 0.3
- w_{P,convoy2}({ts2, truck3}) = (0.08 + 0.12) / 1 = 0.2
- w_{P,convoy2}({ts2, truck4}) = (0.2 + 0.3) / 1 = 0.5
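The derivation of w_{P,convoy2} is a marginalization over the compatible instances, which the sketch below reproduces. It assumes each instance is summarized only by the child set it gives convoy2 (S1a and S1d have both trucks, S1b and S1e only truck3, S1c and S1f only truck4, as the sums in the example imply); the function name `local_opf` is hypothetical.

```python
from collections import defaultdict

BOTH = frozenset({"ts2", "truck3", "truck4"})
ONLY3 = frozenset({"ts2", "truck3"})
ONLY4 = frozenset({"ts2", "truck4"})

# (child set of convoy2 in the instance, P(instance)); None would mean
# that convoy2 does not occur in the instance at all.
instances = [
    (BOTH, 0.12),   # S1a
    (ONLY3, 0.08),  # S1b
    (ONLY4, 0.2),   # S1c
    (BOTH, 0.18),   # S1d
    (ONLY3, 0.12),  # S1e
    (ONLY4, 0.3),   # S1f
]


def local_opf(instances):
    """Sum P over instances per child set, then divide by the total
    probability of the instances that contain the object."""
    numerator = defaultdict(float)
    denominator = 0.0
    for child_set, p in instances:
        if child_set is None:
            continue  # the object is absent from this instance
        numerator[child_set] += p
        denominator += p
    return {c: total / denominator for c, total in numerator.items()}


w_convoy2 = local_opf(instances)
```

Since every instance contains convoy2, the denominator is 1 and the result matches the three values derived by hand.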

Semantics (Local ↔ Global)

- Theorems
- Suppose p is a local interpretation for a weak instance W; then D(W(p)) = p.
- Suppose P is a global interpretation that satisfies a weak instance W; then W(D(P)) = P.

Semantics (Satisfaction)

- Given a probabilistic instance I and a non-leaf object o,
- OC(o), the object constraints, are lb(c) ≤ p(c) ≤ ub(c) for every c in PC(o), together with $\sum_{c \in PC(o)} p(c) = 1$
- p(c) is a real-valued variable denoting the probability that c is the actual set of children of o.

Semantics (Satisfaction)

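- Example
- ipf(convoy2, {ts2, truck3}) = [0.2, 0.3]
- ipf(convoy2, {ts2, truck4}) = [0.3, 0.5]
- ipf(convoy2, {ts2, truck3, truck4}) = [0.2, 0.4]
- OC(convoy2):
- 0.2 ≤ p({ts2, truck3}) ≤ 0.3
- 0.3 ≤ p({ts2, truck4}) ≤ 0.5
- 0.2 ≤ p({ts2, truck3, truck4}) ≤ 0.4
- p({ts2, truck3}) + p({ts2, truck4}) + p({ts2, truck3, truck4}) = 1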

Semantics (Local Satisfaction)

- An OPF w satisfies a non-leaf object o iff w is a probability distribution over PC(o) that respects ipf, i.e., w(c) lies in the interval ipf(o, c) for every c in PC(o).
- A local interpretation p satisfies a non-leaf object o iff p(o) satisfies o.
- A local interpretation p satisfies a probabilistic instance I iff p satisfies every non-leaf object of I.
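Local satisfaction amounts to two checks per object: the OPF sums to 1, and each value falls inside its ipf interval. A minimal sketch, assuming OPFs and ipfs are stored as dictionaries keyed by child set (the function name `satisfies` and the tolerance are hypothetical choices):

```python
def satisfies(opf, ipf, tol=1e-9):
    """True iff opf is a distribution over the same child sets as ipf
    and every value lies within its [lb, ub] interval."""
    if set(opf) != set(ipf):
        return False
    if abs(sum(opf.values()) - 1.0) > tol:
        return False
    return all(ipf[c][0] - tol <= w <= ipf[c][1] + tol for c, w in opf.items())


only3 = frozenset({"ts2", "truck3"})
only4 = frozenset({"ts2", "truck4"})
both = frozenset({"ts2", "truck3", "truck4"})

ipf_convoy2 = {only3: (0.2, 0.3), only4: (0.3, 0.5), both: (0.2, 0.4)}

ok = satisfies({only3: 0.2, only4: 0.5, both: 0.3}, ipf_convoy2)
not_ok = satisfies({only3: 0.5, only4: 0.3, both: 0.2}, ipf_convoy2)
```

The first OPF is the one used for convoy2 in the examples and passes; the second also sums to 1 but puts 0.5 on {ts2, truck3}, outside [0.2, 0.3], so it fails.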

Semantics (Global Satisfaction)

- A global interpretation P satisfies a probabilistic instance I iff D(P) satisfies I.
- Corollary
- A local interpretation p satisfies a probabilistic instance I iff W(p) satisfies I.

Semantics (Consistency)

- A probabilistic instance is locally consistent iff there is a local interpretation that satisfies it.
- A probabilistic instance is globally consistent iff there is a global interpretation that satisfies it.
- Theorem
- Every probabilistic instance is locally and globally consistent.

Algebra

- Operators
- Projection
- Selection
- Cross-product
- Path expression
- o.l1.l2. ... .ln, e.g., S1.convoy.truck

Algebra (Projection)

- Ancestor projection
- Descendant projection
- Single projection

Algebra (Projection)

- Ancestor projection is defined for semistructured instances, weak instances, and probabilistic instances.

Algebra (Projection)

- Ancestor projection on a probabilistic instance I2: before
- card(I2, convoy) = [1, 1]
- ipf(I2, {convoy1}) = 1
- card(convoy1, ts) = [1, 1]
- card(convoy1, truck) = [1, 1]
- card(convoy1, tank) = [1, 1]
- ipf(convoy1, {ts1, truck1, tank1}) = [0, 0.3]
- ipf(convoy1, {ts1, truck1, tank2}) = [0.1, 0.4]
- ipf(convoy1, {ts1, truck2, tank1}) = [0.3, 0.5]
- ipf(convoy1, {ts1, truck2, tank2}) = [0.3, 0.6]

Algebra (Projection)

- Ancestor projection on a probabilistic instance I2: after
- card(I2, convoy) = [1, 1]
- card(convoy1, truck) = [1, 1]
- ipf(I2, {convoy1}) = 1
- Children of convoy1 before: C_I2(convoy1) = {ts1, truck1, truck2, tank1, tank2}
- Children of convoy1 after: {truck1, truck2}
- Let Cd be the children removed by the projection: Cd = {ts1, tank1, tank2}
- PC(convoy1) = { {truck1}, {truck2} }

Algebra (Projection)

- For each c in the new PC(convoy1), ipf(convoy1, c) = [a, min(1, b)], where a and b are the sums of the lower and upper bounds of the old child sets that contain c
- For {truck1}: a = 0.0 + 0.1 = 0.1, b = 0.3 + 0.4 = 0.7, so ipf(convoy1, {truck1}) = [0.1, min(1, 0.7)] = [0.1, 0.7]
- For {truck2}: a = 0.3 + 0.3 = 0.6, b = 0.5 + 0.6 = 1.1, so ipf(convoy1, {truck2}) = [0.6, min(1, 1.1)] = [0.6, 1]
- ipf(convoy1) ← tight(ipf(convoy1)) [Dekhtyar, Goldsmith (2002)]
- Before tight: ipf(convoy1, {truck1}) = [0.1, 0.7], ipf(convoy1, {truck2}) = [0.6, 1]
- After tight: ipf(convoy1, {truck1}) = [0.1, 0.4], ipf(convoy1, {truck2}) = [0.6, 0.9]
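The interval arithmetic in this projection example can be replayed in code. The sketch below sums the bounds of the old child sets that agree on the kept children, caps the upper bound at 1, and then tightens. The tightening rule used here (each bound is clipped against what the other intervals leave available) is an assumption: the slides only cite Dekhtyar and Goldsmith (2002) for `tight` without spelling it out, although this rule does reproduce the numbers shown. Function names and data layout are hypothetical.

```python
def project_ipf(ipf, keep):
    """Sum lower and upper bounds of old child sets with the same kept children."""
    sums = {}
    for child_set, (lb, ub) in ipf.items():
        key = frozenset(child_set) & keep
        a, b = sums.get(key, (0.0, 0.0))
        sums[key] = (a + lb, b + ub)
    return {k: (a, min(1.0, b)) for k, (a, b) in sums.items()}


def tight(ipf):
    """Assumed tightening rule: a bound cannot exceed what the other
    intervals allow if all probabilities must sum to 1."""
    total_lb = sum(lb for lb, _ in ipf.values())
    total_ub = sum(ub for _, ub in ipf.values())
    return {
        c: (max(lb, 1.0 - (total_ub - ub)), min(ub, 1.0 - (total_lb - lb)))
        for c, (lb, ub) in ipf.items()
    }


ipf_convoy1 = {
    frozenset({"ts1", "truck1", "tank1"}): (0.0, 0.3),
    frozenset({"ts1", "truck1", "tank2"}): (0.1, 0.4),
    frozenset({"ts1", "truck2", "tank1"}): (0.3, 0.5),
    frozenset({"ts1", "truck2", "tank2"}): (0.3, 0.6),
}

projected = project_ipf(ipf_convoy1, keep=frozenset({"truck1", "truck2"}))
tightened = tight(projected)
```

For this input, `projected` holds [0.1, 0.7] for {truck1} and [0.6, 1] for {truck2}, and `tightened` holds [0.1, 0.4] and [0.6, 0.9], up to floating-point rounding.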


Algebra (Projection)

- Descendant projection
- card(I3, truck) = [0, 3], ipf(I3, c) = [0, 1]
- One naive strategy
- Our better strategy is similar to the one in cross product

Algebra (Projection)

- Single projection
- card(I3, truck) = [0, 3], ipf(I3, c) = [0, 1]

Algebra (Projection)

- Equivalence
- [Figures: pairs of projection expressions marked as equivalent, and one pair marked as in general non-equivalent]
- e1 and e2 are sequences of zero or more edges. Thus, I.e1.lm can include I.lm, I.l1.lm, I.l2.l3.lm, etc.

Algebra (Selection)

- Similar to ancestor projection
- The path expression specifies leaf objects with a specified value.

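Algebra (Selection)

- [Figure: selection on a semistructured instance I1]

Algebra (Selection)

- card(I7, convoy) = [1, 2], w_I7({convoy1}) = 0.2, w_I7({convoy2}) = 0.5, w_I7({convoy1, convoy2}) = 0.3
- card(convoy1, tank) = [1, 1], w_convoy1({tank1}) = 0.3, w_convoy1({tank2}) = 0.7
- card(convoy2, tank) = [1, 1], w_convoy2({tank2}) = 0.4, w_convoy2({tank3}) = 0.6
- The instances in D(I7) have probabilities 0.054, 0.06, 0.126, 0.14, 0.036, 0.3, 0.2 and 0.084.
- The selected instances are renormalized by their total probability, 0.14 + 0.3 + 0.054 + 0.036 + 0.084 = 0.614, i.e., each selected probability is divided by 0.614.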

Algebra (Selection)

- card(I7, convoy) = [1, 2], ipf(I7, {convoy1}) = [0.1, 0.3], ipf(I7, {convoy2}) = [0.4, 0.6], ipf(I7, {convoy1, convoy2}) = [0.2, 0.4]
- card(convoy1, tank) = [1, 1], ipf(convoy1, {tank1}) = [0.2, 0.4], ipf(convoy1, {tank2}) = [0.6, 0.8]
- card(convoy2, tank) = [1, 1], ipf(convoy2, {tank2}) = [0.3, 0.5], ipf(convoy2, {tank3}) = [0.5, 0.7]
- Probability intervals of the instances in D(I7): [0.012, 0.08], [0.02, 0.12], [0.02, 0.112], [0.06, 0.24], [0.036, 0.16], [0.08, 0.24], [0.24, 0.48], [0.06, 0.224]
- Conditionalization of interval probabilities [Dekhtyar, Goldsmith (2002)]

Algebra (Cross product (x))

- Probabilistic conjunction strategies
- Example
- Ignorance
- Positive correlation
- Negative correlation
- Independence
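The slides name the four conjunction strategies without giving their formulas. The sketch below uses the interval formulas commonly given for these strategies in the probabilistic database literature; treat them as an assumption rather than as the definitions used in this work. Intervals are (lower, upper) pairs and the function names are hypothetical.

```python
def conj_ignorance(x, y):
    """Nothing is known about how the two events depend on each other."""
    (l1, u1), (l2, u2) = x, y
    return (max(0.0, l1 + l2 - 1.0), min(u1, u2))


def conj_positive(x, y):
    """Positive correlation: one event implies the other."""
    (l1, u1), (l2, u2) = x, y
    return (min(l1, l2), min(u1, u2))


def conj_negative(x, y):
    """Negative correlation: the events overlap as little as possible."""
    (l1, u1), (l2, u2) = x, y
    return (max(0.0, l1 + l2 - 1.0), max(0.0, u1 + u2 - 1.0))


def conj_independence(x, y):
    """Independent events: multiply the bounds."""
    (l1, u1), (l2, u2) = x, y
    return (l1 * l2, u1 * u2)


# Illustrative inputs: an interval for a truck and an interval for a tank.
truck1 = (0.2, 0.7)
tank1 = (0.1, 0.6)

results = {
    "ignorance": conj_ignorance(truck1, tank1),
    "positive": conj_positive(truck1, tank1),
    "negative": conj_negative(truck1, tank1),
    "independence": conj_independence(truck1, tank1),
}
```

In a cross product, a strategy like these would combine the interval of a child set from one instance with the interval of a child set from the other; which strategy applies depends on what is known about the relationship between the two instances.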

Algebra (Cross product (x))

- card(I4, truck) = [1, 1], ipf(I4, {truck1}) = [0.2, 0.7], ipf(I4, {truck2}) = [0.3, 0.8]
- card(I5, tank) = [1, 1], ipf(I5, {tank1}) = [0.1, 0.6], ipf(I5, {tank2}) = [0.4, 0.9]
- I6 = I4 x I5: card(I6, truck) = [1, 1], card(I6, tank) = [1, 1]

Algebra (Cross product)

- Equivalence
- (I1 x I2) x I3, I1 x (I2 x I3), and (I1 x I3) x I2 are equivalent.

Related Work

- Semistructured Probabilistic Objects (SPOs) (Dekhtyar, Goldsmith, Hawkes, 2001)
- SPOs express probabilistic information in a semistructured manner
- The PXML data model stores XML data AND probabilistic information.

Related Work

- Algebras: TAX, SAL
- TAX (Jagadish, Lakshmanan, Srivastava, 2001)
- uses a pattern tree to extract subsets of nodes, one for each embedding of the pattern tree
- fixed number of children
- SAL (Beeri, Tzaban, 1999)
- binds objects to variables
- original structure is totally lost

Future Work

- Implement the system
- Query optimization

Summary

- PXML data model
- Semistructured instance
- Weak instance (add cardinality)
- Probabilistic instance (add ipf)
- Semantics
- Local and Global
- Interpretation
- Satisfaction
- Algebra
- Projections, selection, cross product

Related Work

- Bayesian nets (Pearl, 1988)
- random variables (probability of events)
- Ours: the existence of children requires the existence of their parents