Structured Queries for Semistructured Probabilistic Data - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Structured Queries for Semistructured Probabilistic Data

Description:

Year 2000. The Story of the SPO model. Reasoning with uncertainty in AI applications ... Year 2000. Probability Distributions. discrete. finite domains ... – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 44
Provided by: dekht
Category:

less

Transcript and Presenter's Notes

Title: Structured Queries for Semistructured Probabilistic Data


1
Structured Queries for Semistructured
Probabilistic Data
  • Alex Dekhtyar
  • Kevin Krol Mathias
  • Praveen Gutti

Department of Computer Science University of
Kentucky
2
Outline
  • Semistructured Probabilistic Objects (SPOs)
  • Why?
  • How?
  • Semistructured Probabilistic Algebra
    (SP-Algebra)
  • SPOQL

3
The Story of the SPO model
Year 2000
  • Reasoning with uncertainty in AI applications

Bayesian Networks
Markov Decision Processes
4
The Story of the SPO model
Year 2000
  • Reasoning with uncertainty in AI applications

Bayesian Networks
Markov Decision Processes
  • Probability Distributions
  • discrete
  • finite domains
  • conditional (CPTs)
  • with meta-information?

The probability distribution of performance in a
database class for CS majors who got A in Data
Structures
5
The Story of the SPO model
major CS
Conditional
Probability Table
meta-data
The probability distribution of performance in a
database class for CS majors who got A in Data
Structures
Data StructuresA
6
The Story of the SPO model
major CS
Data StructuresA
Probabilistic Object
  • Barbara,Garcia-Molina, Porter, 1992
  • Lakshmanan, Leone, Ross, Subrahmanian, 1997
  • Dey, Sarkar, 1996
  • Eiter, Liu, Lukasiewicz, Subrahmanian, 2001

7
The Story of the SPO model
major CS
Data StructuresA
Probabilistic Object
Semistricutured
(SPO)
Should be treated as records from the same
relation
8
SPO Birds Eye View
? S1
T context (meta-data)
S ltT,V,P,C,?gt
V participating random variables
(any additional information about the
distribution that we know, which isnt stochastic)
P probability table Pdom(V)? 0,1
C conditionals
? path, internal pseudo-id
9
Outline
  • Semistructured Probabilistic Objects (SPOs)
  • Why?
  • How?
  • Semistructured Probabilistic Algebra
    (SP-Algebra)
  • SPOQL

10
What questions would you ask?
  • Find all probability distributions for CS majors?
  • Find all probability distributions involving
    Databases
  • Find all probability distributions for students
    who got A in Data Structures
  • Find the marginal probability distribution of
    performance in OS
  • Find the probability distribution of performance
    in AI for students with A in Data Structures
    and Databases.

11
What questions would you ask?
  • Find the probability of students getting an A
    in Databases
  • Find all grade combinations which have
    probability over 9
  • Find the joint distributions of grades from the
    data in the database

12
SP-Algebra
Selection (s) Projection (p) Conditionalization
(µ) Cartesian Product (x) Join ( , ) Mix (
)
13
Selection

On context
On participating variables
On conditionals

14
Selection

On probabilities
On probability table

15
Selection

On probabilities
On probability table

16
Projection

On context On conditionals On participating
variables
marginalization

17
Conditionalization

Finds conditional probability distributions
  • Remove rows not matching condition
  • normalize
  • 3. add conditional


18
Cartesian ProductJoin, Mix
Compatibility
  • different participating
  • variables
  • matching conditionals

X
  • common participating
  • variables
  • matching conditionals

Join or cartesian product, depending on
compatibility


19
Cartesian ProductJoin, Mix


X Z Pr
X Y Pr
A A B B C C
A A B B C C


20
Cartesian ProductJoin, Mix
P(ZX)


X Z Pr
X Y Pr
A A B B C C


21
Cartesian ProductJoin, Mix


X Z Pr
X Y Pr
A A B B C C
X
X
X


22
SP-Algebra
Implementation
Selection (s) Projection (p) Conditionalization
(µ) Cartesian Product (x) Join ( , ) Mix (
)
  • on top of RDBMS
  • JDBC
  • needs views
  • XML I/O
  • mediator

Path (?) of the result SP-Algebra operation
23
Outline
  • Semistructured Probabilistic Objects (SPOs)
  • Why?
  • How?
  • Semistructured Probabilistic Algebra
    (SP-Algebra)
  • SPOQL

24
From Algebra to Structured Queries
Whats more readable?
pDB,AI((AI in V)(Plt0.2)(µOSA(S))) x
smajorCS( pDS,Networks(sPgt0.1 (R) x T))
or
SELECT S.var.DB, S.var.AI, RT.var.DS,
RT.var.Networks FROM S TIMES (R TIMES T) RT WHERE
EXISTS S.var.AI AND S.tbl.problt0.2 AND
R.tbl.probgt0.1 AND RT.major
CS CONDITIONAL S.var.OS A
Declarative query language appealed to
application programmers
25
SPOQL Syntax
p
SELECT ltselectlistgt FROM ltfromlistgt WHERE
ltexpressiongt CONDITIONAL ltconditionalizationsgt
X
s
µ
26
SPOQL Syntax
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
27
SPOQL Syntax
Variable projection on final result
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
28
SPOQL Syntax
Projection of context and conditionals
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
29
SPOQL Syntax
Join, alias for intermediate result
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
30
SPOQL Syntax
Cartesian product, no alias
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
31
SPOQL Syntax
mix
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
32
SPOQL Syntax
Selection on content applied to final SP-relation
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
Selection on conditions
33
SPOQL Syntax
Selection on participating random variables
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
34
SPOQL Syntax
Selection on probabilities applied to
intermediate result
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
Selection on probability table
35
SPOQL Syntax
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
Conditioning instructions
36
SPOQL Syntax
SELECT var.DB, S.cnt.major, R.cnd.OS FROM (S JOIN
R) SR, (T TIMES W) WHERE cnt.year gt 1999 AND
S.cnd.DS A AND
EXISTS T.var.DB AND SR.tbl.P lt
0.1 AND tbl.DB A CONDITIONAL
W.var.DS A AND
R.var.OS B
37
SPOQL Semantics. Order of operations
p
s
µ
Only operations that explicitly name S are
performed on S
X
1. Perform all conditionalizations
p
p
2. Perform all selections
3. Perform all projections
s
s
4. Join/CP/Mix with other parts of query
µ
µ
Selections on probability and probability tables
do not commute with other operations
R
S
Order of operations _at_ translation time matters!
38
SPOQL Semantics. Questions
  • What if need different order?
  • How to select which chains to combine
    (join/cp/mix)?

39
SPOQL Semantics. Questions
  • What if need different order?
  • nested queries in ltfromlistgt
  • How to select which chains to combine
    (join/cp/mix)?
  • instantiate parentheses
  • left-to-right
  • correct for joins on context

40
SPOQL Implementation
  • Implemented on top of SPDBMS
  • Queries converted into SP-Algebra
  • No optimization (yet)

41
SPOQL Conclusions
  • SPO model stores joint, conditional distributions
    of discrete random variables with finite domains
    plus additional meta-data
  • Structured queries are easier to generate within
    applications
  • (easier for humans to comprehend)
  • SPOQL-to-SP-Algebra translation requires more
    care
  • see Algorithm Evaluate

42
SPO model next steps
  • Query optimization
  • Incorporation into AI applications (planning
    with uncertainty and constraints)
  • XQuery-powered back-end

43
Thank you!
Write a Comment
User Comments (0)
About PowerShow.com