Probabilistic/Uncertain Data Management

Provided by: dbCsBe
Learn more at: https://dsf.berkeley.edu

1
Probabilistic/Uncertain Data Management
  • Dalvi, Suciu. Efficient query evaluation on
    probabilistic databases, VLDB 2004.
  • Das Sarma et al. Working models for uncertain
    data, ICDE 2006.
  • Slides based on the Suciu/Dalvi SIGMOD 2005 tutorial

2
What is a Probabilistic Database ?
  • "An item belongs to the database" is a
    probabilistic event
  • Tuple-existence uncertainty
  • Attribute-value uncertainty
  • "A tuple is an answer to the query" is a
    probabilistic event
  • Can be extended to all data models; we discuss
    only probabilistic relational data

3
Possible Worlds Semantics
Attribute domains
int, char(30), varchar(55), datetime
Number of values: $2^{32}$, $2^{120}$, $2^{440}$, $2^{64}$
Relational schema
Employee(name: varchar(55), dob: datetime, salary: int)
Number of tuples: $2^{440} \times 2^{64} \times 2^{23}$
Number of instances: $2^{2^{440} \times 2^{64} \times 2^{23}}$
Database schema
Employee(. . .), Projects(. . .), Groups(. . .), WorksFor(. . .)
Number of instances: N (BIG but finite)
4
The Definition
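The set of all possible database instances
$\mathrm{INST} = \{I_1, I_2, I_3, \ldots, I_N\}$
A probabilistic database $I^p$ is a probability distribution Pr over INST; we will use Pr or $I^p$ interchangeably
Definition: A possible world is an instance I s.t. $\Pr(I) > 0$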
5
Query Semantics
Given a query Q and a probabilistic database
$I^p$, what is the meaning of $Q(I^p)$?
6
Query Semantics
Semantics 1: Possible Answers. A probability
distribution on sets of tuples
$\forall A.\ \Pr(Q = A) = \sum_{I \in \mathrm{INST},\ Q(I) = A} \Pr(I)$
Semantics 2: Possible Tuples. A probability
function on tuples
$\forall t.\ \Pr(t \in Q) = \sum_{I \in \mathrm{INST},\ t \in Q(I)} \Pr(I)$
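The two semantics can be compared on a small example. The following is a minimal sketch, assuming a probabilistic database is given explicitly as a list of (instance, probability) pairs; the three worlds, their probabilities, and the helper names are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from fractions import Fraction


def possible_answers(worlds, query):
    """Semantics 1: distribution over whole answer sets, Pr(Q = A)."""
    dist = defaultdict(Fraction)
    for instance, pr in worlds:
        dist[frozenset(query(instance))] += pr
    return dict(dist)


def possible_tuples(worlds, query):
    """Semantics 2: marginal probability of each answer tuple, Pr(t in Q)."""
    marg = defaultdict(Fraction)
    for instance, pr in worlds:
        for t in set(query(instance)):
            marg[t] += pr
    return dict(marg)


# Hypothetical possible worlds over Purchase(name, product).
worlds = [
    (frozenset({("John", "Gadget"), ("Sue", "Gadget")}), Fraction(1, 2)),
    (frozenset({("John", "Gadget"), ("John", "Camera"), ("Sue", "Camera")}),
     Fraction(1, 4)),
    (frozenset({("John", "Gadget")}), Fraction(1, 4)),
]


def common_products(instance):
    """Products bought by both John and Sue in one world."""
    john = {p for n, p in instance if n == "John"}
    sue = {p for n, p in instance if n == "Sue"}
    return john & sue


answers = possible_answers(worlds, common_products)
tuples = possible_tuples(worlds, common_products)
```

Here `answers` assigns a probability to each complete answer set (including the empty one), while `tuples` only records how likely each individual product is to appear in the answer.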
7
Example Query Semantics
Purchase^p
SELECT DISTINCT x.product FROM Purchase^p x, Purchase^p y
WHERE x.name = 'John' and x.product = y.product and y.name = 'Sue'
Pr(I1) = 1/3, Pr(I2) = 1/12, Pr(I3) = 1/2, Pr(I4) = 1/12
Possible answers semantics
Possible tuples semantics
8
Possible Worlds Query Semantics
  • Possible answers semantics
  • Precise
  • Can be used to compose queries
  • Difficult user interface
  • Possible tuples semantics
  • Less precise, but simple; sufficient for most
    apps
  • Cannot be used to compose queries
  • Simple user interface

9
Possible Worlds Semantics Summary
  • Complete model: clean formal semantics for SQL
    queries
  • Not very useful as a representation or
    implementation tool
  • HUGE number of possible worlds!
  • Need more effective representation formalisms
  • Something that users can understand/explore
  • Allow more efficient query execution
  • Avoid possible worlds explosion
  • Perhaps giving up completeness

10
Representation Formalisms
  • Problem: need a good representation formalism
  • Will be interpreted as possible worlds
  • Several formalisms exist, but no winner

Main open problem in probabilistic db
11
Evaluation of Formalisms
  • Completeness?
  • What possible worlds can it represent?
  • What probability distributions on worlds?
  • Closure?
  • Is it closed under evaluation of query operators?

12
Outline
  • A complete formalism
  • Intensional Databases
  • Incomplete formalisms
  • Various expressibility/complexity tradeoffs
  • Focus on Explicit Independent Tuples

13
Intensional Database
[Fuhr & Roelleke 1997]
Atomic event ids
e1, e2, e3, . . .
Probabilities
$p_1, p_2, p_3, \ldots \in [0,1]$
Event expressions: ∧, ∨, ¬
e3 ∧ (e5 ∨ ¬e2)
Intensional probabilistic database J: each
tuple t has an event attribute t.E
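To make the event attribute concrete, the sketch below computes the probability of an event expression by brute-force enumeration of truth assignments. It assumes the atomic events are mutually independent; the nested-tuple encoding of expressions, the function names, and the probability values are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def holds(expr, assignment):
    """Evaluate an event expression under a truth assignment of atomic events."""
    op = expr[0]
    if op == "var":
        return assignment[expr[1]]
    if op == "not":
        return not holds(expr[1], assignment)
    if op == "and":
        return holds(expr[1], assignment) and holds(expr[2], assignment)
    if op == "or":
        return holds(expr[1], assignment) or holds(expr[2], assignment)
    raise ValueError(f"unknown operator: {op}")


def event_probability(expr, probs):
    """Sum the weights of all assignments that satisfy expr.

    Assumes independent atomic events; exponential in the number of events.
    """
    names = sorted(probs)
    total = Fraction(0)
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if holds(expr, assignment):
            weight = Fraction(1)
            for name in names:
                weight *= probs[name] if assignment[name] else 1 - probs[name]
            total += weight
    return total


# Hypothetical probabilities for the atomic events.
probs = {"e2": Fraction(1, 2), "e3": Fraction(1, 2), "e5": Fraction(1, 4)}
# e3 AND (e5 OR NOT e2)
expr = ("and", ("var", "e3"), ("or", ("var", "e5"), ("not", ("var", "e2"))))
p = event_probability(expr, probs)
```

The enumeration visits every combination of atomic events, which is why evaluating long event expressions quickly becomes expensive.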
14
Intensional DB ⇒ Possible Worlds
(Figure: an intensional database J and the possible worlds $I^p$ it represents)

15
Possible Worlds ⇒ Intensional DB
(Figure: possible worlds $I^p$ with probabilities $p_1$, $p_2$, $p_3$, $p_4$ encoded as an intensional database J)
Intensional DBs are complete
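One way to see why completeness holds is a chain-rule encoding, sketched below. This is an illustrative construction under stated assumptions, not necessarily the one drawn on the slide: it uses N-1 independent selector events, where world i is chosen when e_i is the first true event, and each tuple's event is the disjunction of the selectors of the worlds that contain it. All names and the sample worlds are hypothetical.

```python
from fractions import Fraction
from itertools import product


def encode(worlds):
    """Turn a list of (instance, probability) pairs into selector events.

    Returns Pr(e_1), ..., Pr(e_{N-1}) and, for each tuple, the set of world
    indices whose selector events are OR-ed to form the tuple's event.
    Assumes distinct instances with positive probabilities that sum to 1.
    """
    event_probs = []
    remaining = Fraction(1)
    for _, pr in worlds[:-1]:
        event_probs.append(pr / remaining)  # Pr(e_i | no earlier event fired)
        remaining -= pr
    tuple_events = {}
    for i, (instance, _) in enumerate(worlds):
        for t in instance:
            tuple_events.setdefault(t, set()).add(i)
    return event_probs, tuple_events


def decode(event_probs, tuple_events):
    """Expand the encoding back into a distribution over instances."""
    dist = {}
    for values in product([False, True], repeat=len(event_probs)):
        weight = Fraction(1)
        for v, p in zip(values, event_probs):
            weight *= p if v else 1 - p
        # World i is selected when e_i is the first true event; else the last.
        chosen = values.index(True) if True in values else len(values)
        instance = frozenset(
            t for t, idxs in tuple_events.items() if chosen in idxs
        )
        dist[instance] = dist.get(instance, Fraction(0)) + weight
    return dist


# Hypothetical worlds over two tuples "a" and "b".
worlds = [
    (frozenset({"a"}), Fraction(1, 3)),
    (frozenset({"a", "b"}), Fraction(1, 12)),
    (frozenset({"b"}), Fraction(1, 2)),
    (frozenset(), Fraction(1, 12)),
]
event_probs, tuple_events = encode(worlds)
roundtrip = decode(event_probs, tuple_events)
```

Decoding the encoded database reproduces the original distribution, which is the sense in which any distribution over a finite set of worlds can be represented.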
16
Closure Under Operators
[Fuhr & Roelleke 1997]
Π, −, σ

One still needs to compute the probability of the event
expression
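The closure property can be sketched for three operators as follows. This is a simplified illustration: an intensional relation is assumed to be a list of (tuple, event expression) pairs, atomic events are plain strings, and the relations, event names, and helper functions are hypothetical. Selection keeps a tuple's event, join conjoins the events of the joined tuples, and duplicate-eliminating projection disjoins the events of the tuples that collapse together.

```python
def select(rel, pred):
    """Selection: surviving tuples keep their event expression unchanged."""
    return [(t, e) for t, e in rel if pred(t)]


def join(r, s, on):
    """Join: the output tuple exists only if both inputs exist (AND)."""
    return [(t + u, ("and", e, f)) for t, e in r for u, f in s if on(t, u)]


def project(rel, cols):
    """Projection with duplicate elimination: merge events with OR."""
    out = {}
    for t, e in rel:
        key = tuple(t[c] for c in cols)
        out[key] = ("or", out[key], e) if key in out else e
    return list(out.items())


# Hypothetical intensional relations Person(name, city), Purchase(cust, product).
person = [(("John", "Seattle"), "e1"), (("Sue", "Seattle"), "e2")]
purchase = [(("John", "Gadget"), "e3"), (("Sue", "Gadget"), "e4")]

gadgets = select(purchase, lambda t: t[1] == "Gadget")
joined = join(person, gadgets, lambda t, u: t[0] == u[0])
cities = project(joined, [1])
```

The result is again an intensional relation, here a single tuple for Seattle with event (e1 ∧ e3) ∨ (e2 ∧ e4); its probability is not produced by the operators themselves and has to be computed separately.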
17
Summary on Intensional Databases
  • Event expression for each tuple
  • Possible worlds: any subset
  • Probability distribution: any
  • Complete but impractical
  • Evaluate the probability of long event
    expressions
  • Important abstraction: consider restrictions
  • Related to c-tables [Imielinski & Lipski 1984]
18
Restricted Formalisms
  • Explicit tuples
  • Have a tuple template for every tuple that may
    appear in a possible world
  • Focus on the case of independent tuple events

19
Explicit Independent Tuples
Tuple = independent event
Atomic, distinct. May use TIDs.
Can be easily extended to capture attribute-value
uncertainty
20
Explicit Independent Tuples
Tuple-independent probabilistic database
$\Pr(I) = \prod_{t \in I} \mathrm{pr}(t) \times \prod_{t \notin I} (1 - \mathrm{pr}(t))$
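The product formula can be checked numerically on a tiny database. The sketch below assumes three tuples with hypothetical probabilities; it enumerates every subset as a possible world, confirms that the world probabilities sum to 1, and computes the expected number of tuples, which by linearity of expectation equals the sum of the tuple probabilities.

```python
from fractions import Fraction
from itertools import combinations


def world_probability(world, tuple_probs):
    """Pr(I): product of pr(t) for t in I and (1 - pr(t)) for t not in I."""
    pr = Fraction(1)
    for t, p in tuple_probs.items():
        pr *= p if t in world else 1 - p
    return pr


def all_worlds(tuple_probs):
    """Every subset of the tuple set is a candidate possible world."""
    tuples = list(tuple_probs)
    for k in range(len(tuples) + 1):
        for subset in combinations(tuples, k):
            yield frozenset(subset)


# Hypothetical tuple probabilities.
tuple_probs = {"t1": Fraction(1, 2), "t2": Fraction(1, 3), "t3": Fraction(1, 4)}

total = sum(world_probability(w, tuple_probs) for w in all_worlds(tuple_probs))
expected_size = sum(
    len(w) * world_probability(w, tuple_probs) for w in all_worlds(tuple_probs)
)
```

With n tuples there are 2^n worlds, so this enumeration is only practical as a sanity check on very small inputs.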
21
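Tuple Prob. ⇒ Possible Worlds
$E[\mathrm{size}(I^p)] = 2.3$ tuples
$\sum = 1$ (the world probabilities sum to 1)
(Figure: a tuple-independent database J and its possible worlds $I^p$)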

22
Tuple-Independent DBs are Incomplete
  • Very limited: cannot capture correlations across
    tuples
  • Not closed
  • Query operators can introduce complex
    correlations!

(Figure: an $I^p$ whose worlds have probabilities $p_1$, $p_1 p_2$, $1 - p_1 - p_1 p_2$)
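A small counterexample illustrates the incompleteness. The sketch below uses a hypothetical distribution, not the one from the slide's figure, in which two tuples always appear together. Any tuple-independent database representing it would have to use the tuples' marginal probabilities, and the product formula with those marginals fails to reproduce the world probabilities.

```python
from fractions import Fraction


def marginals(worlds):
    """Pr(t appears) for each tuple, summed over the worlds containing it."""
    probs = {}
    for instance, pr in worlds:
        for t in instance:
            probs[t] = probs.get(t, Fraction(0)) + pr
    return probs


def independent_probability(world, tuple_probs):
    """World probability under the tuple-independence assumption."""
    pr = Fraction(1)
    for t, p in tuple_probs.items():
        pr *= p if t in world else 1 - p
    return pr


# Hypothetical correlated distribution: t1 and t2 occur together or not at all.
worlds = [
    (frozenset(), Fraction(1, 2)),
    (frozenset({"t1", "t2"}), Fraction(1, 2)),
]

m = marginals(worlds)
mismatch = [
    (w, pr, independent_probability(w, m))
    for w, pr in worlds
    if independent_probability(w, m) != pr
]
```

Both worlds end up in `mismatch`, and the independent model also gives positive probability to the world containing only t1, which the original distribution rules out.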
23
Tuple Prob. ⇒ Query Evaluation
SELECT DISTINCT x.city FROM Person x, Purchase y
WHERE x.Name = y.Customer and y.Product = 'Gadget'
$p_1 \, (1 - (1 - q_2)(1 - q_3))$
$1 - \bigl(1 - p_2 \, (1 - (1 - q_5)(1 - q_6))\bigr)\bigl(1 - p_3 q_7\bigr)$
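The answer probabilities on this slide are built from two rules for independent events: multiply for "both happen" and use 1 - ∏(1 - p) for "at least one happens". The sketch below evaluates expressions of that shape with hypothetical numeric values. Because the slide's tables did not survive in the transcript, the grouping of tuples into two output cities is an assumption read off the expressions themselves.

```python
from fractions import Fraction


def independent_or(probs):
    """Pr(at least one of several independent events) = 1 - prod(1 - p)."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p
    return 1 - none


# Hypothetical tuple probabilities: p_i for Person, q_j for Purchase.
p1, p2, p3 = Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)
q2, q3 = Fraction(1, 2), Fraction(1, 2)
q5, q6, q7 = Fraction(1, 3), Fraction(1, 3), Fraction(1, 2)

# One person with two matching purchases: p1 * (1 - (1 - q2)(1 - q3)).
first_city = p1 * independent_or([q2, q3])

# Two people in the same city, combined by DISTINCT:
# 1 - (1 - p2 * (1 - (1 - q5)(1 - q6))) * (1 - p3 * q7).
second_city = independent_or([p2 * independent_or([q5, q6]), p3 * q7])
```

The nesting mirrors the query plan: purchases are merged per customer first, then joined with the person tuple, and finally people are merged per city.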
24
Application 1: Similarity Predicates
Step 1: evaluate predicates
SELECT DISTINCT x.city FROM Person x, Purchase y
WHERE x.Name = y.Cust and y.Product = 'Gadget'
and x.profession ~ 'scientist' and y.category ~ 'music'
25
Application 1: Similarity Predicates
Step 1: evaluate predicates
SELECT DISTINCT x.city FROM Person^p x, Purchase^p y
WHERE x.Name = y.Cust and y.Product = 'Gadget'
and x.profession ~ 'scientist' and y.category ~ 'music'
Step 2: evaluate rest of query
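The two steps might look like the following sketch. It rests on several assumptions that the slides do not state: the similarity score comes from Python's `difflib.SequenceMatcher` and is read directly as a tuple probability, person names are unique, and all tuples are independent. The sample rows and function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical similarity score in [0, 1], read here as a probability."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical deterministic inputs.
person = [  # (name, city, profession)
    ("John", "Seattle", "research scientist"),
    ("Sue", "Seattle", "science teacher"),
]
purchase = [  # (cust, product, category)
    ("John", "Gadget", "music"),
    ("Sue", "Gadget", "musical"),
]

# Step 1: evaluate the similarity predicates, producing probabilistic tables.
person_p = [(n, c, similarity(prof, "scientist")) for n, c, prof in person]
purchase_p = [(cu, pr, similarity(cat, "music")) for cu, pr, cat in purchase]


# Step 2: evaluate the rest of the query over the probabilistic tables.
def city_probabilities(person_p, purchase_p):
    none_by_city = {}
    for name, city, px in person_p:
        no_purchase = 1.0
        for cust, product, py in purchase_p:
            if cust == name and product == "Gadget":
                no_purchase *= 1 - py
        qualifies = px * (1 - no_purchase)  # person AND at least one purchase
        none_by_city[city] = none_by_city.get(city, 1.0) * (1 - qualifies)
    return {city: 1 - none for city, none in none_by_city.items()}


result = city_probabilities(person_p, purchase_p)
```

Separating the steps keeps the similarity function pluggable: only step 1 depends on how scores are produced, while step 2 is ordinary evaluation over independent tuples.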
26
Summary on Explicit Independent Tuples
  • Independent tuples
  • Possible worlds: subsets
  • Probability distribution: restricted
  • Closure: no