Probabilistic Skylines on Uncertain Data (VLDB 2007), Jian Pei et al.


Title: Probabilistic Skylines on Uncertain Data VLDB2007 Jian Pei et al


1
Probabilistic Skylines on Uncertain
Data (VLDB 2007), Jian Pei et al.
  • Supervisor: Dr Benjamin Kao
  • Presenter: For
  • Date: 22 Feb 2008

Note: the possible-world concept may need to be introduced.
2
Outline
  • Motivation
  • Traditional and Probabilistic Skyline
  • Problem Definition
  • Computation Problem and Algorithms (Top down and
    Bottom up)
  • Experimental Results

3
Motivation: Skyline Analysis on NBA Players'
Performance
Each player has multiple records.
Note: read out the topic and then the subtopic so
the audience knows what is being covered.
Note: define the skyline and explain the graph;
here, the larger the value, the better.
Example: instance e dominates b, c and d.
4
Motivation: Skyline Analysis on NBA Players with
Multiple Records
5
Motivation: Skyline Analysis on NBA Players with
Multiple Records
  • Easy approach: averaging
  • After averaging, Arbor (x) is better in assists than
    Eddy, but Eddy (point b) dominates all games of Arbor (x).
  • One game of Bob (point a) biases his aggregate value.

It is not fair to say that Eddy is worse in assists
than Arbor.
It is not fair for Bob to be severely affected by
a single game.
Note: complete, but a new graph is needed.
6
Motivation: Motivating Result Using the Probabilistic
Skyline
  • Olajuwon and Kobe Bryant are missing from the
    aggregate skyline but present in the probabilistic
    skyline.
  • Their performance varies a lot over games.
  • Details are in the experiment analysis.

Note (done): pictures of the two players are missing.
7
Traditional and Probabilistic Skyline: Semantic
Difference of Dominance between Objects
Certain Data
Uncertain Data
  • Dominance
  • Certain model: an object dominates another object
    with probability 1.
  • Uncertain model: an object dominates another
    object with probability P.

Assume the smaller the value, the better.
Note (missing): a Flash animation showing the
calculation would be better.
Consider object d
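In the certain model, dominance between two points can be checked directly. The following is a minimal Python sketch, assuming (as on the slide) that smaller values are better; the function name `dominates` and the sample points are illustrative and do not come from the slides.

```python
from typing import Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if point u dominates point v (smaller is better).

    u dominates v when u is no worse in every dimension and strictly
    better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


print(dominates((1, 2), (2, 2)))  # True: better in dim 0, equal in dim 1
print(dominates((1, 3), (2, 2)))  # False: worse in dim 1
print(dominates((2, 2), (2, 2)))  # False: identical points
```

With uncertain data the same test is applied to pairs of instances, and the outcome becomes a probability rather than a yes/no answer.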
14
Probabilistic Skyline: Calculation of Probability
Object A dominates Object C
Pr[A ≺ C] = (1/4)(1/3)(4 + 4 + 0) = 2/3
For easier illustration, the discrete case is used.
Explanation of symbols
Note (done): a Flash animation is needed to demonstrate
the calculation of dominance.
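The dominance probability in the discrete case can be sketched as a count over instance pairs. This is a minimal Python illustration, assuming every instance of an object is equally likely and smaller values are better. The coordinates of A and C are hypothetical, chosen only so that the three instances of C are dominated by 4, 4 and 0 instances of A, which reproduces the slide's 2/3.

```python
from fractions import Fraction
from itertools import product


def dominates(u, v):
    """True if instance u dominates instance v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def dominance_probability(U, V):
    """Pr[U dominates V] for discrete objects with equally likely instances."""
    # Count the instance pairs (u, v) in which u dominates v.
    pairs = sum(1 for u, v in product(U, V) if dominates(u, v))
    return Fraction(pairs, len(U) * len(V))


# Hypothetical coordinates (not from the slides).
A = [(1, 1), (1, 2), (2, 1), (2, 2)]   # 4 instances, each with probability 1/4
C = [(3, 3), (4, 4), (0, 5)]           # 3 instances, each with probability 1/3

print(dominance_probability(A, C))  # 2/3
```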
15
Probabilistic Skyline: From Dominance to Skyline
  • Intuition for finding the skyline: the probability
    that an object is not dominated by other objects
  • We need a new measure ..

Note (done): use a Flash animation to show the grouping
of objects A, B and C.
Note (done): please change the equation of 0 (1/3)(1/3).
16
Probabilistic Skyline: Idea
  • Intuition
  • 1) We know the definition of dominance.
  • 2) Skyline: not dominated by other objects.

Note (missing): demonstrate "not dominated" for objects A and B.
Consider object A, instance by instance.
17
Probabilistic Skyline: Idea
  • Intuition
  • 1) We know the definition of dominance.
  • 2) Skyline: not dominated by other objects.

Note (missing): demonstrate "not dominated" for objects
A and B. We see that no instance of object A is
dominated by instances of other objects.
20
Probabilistic Skyline: Idea
  • Intuition
  • No instance of object A is dominated by instances
    of other objects, so the probability of object A
    being dominated is 0. The skyline probability of
    object A is therefore 1.

Note (done): demonstrate "not dominated" for objects
A and B.
24
Probabilistic Skyline: Calculation of the
Probabilistic Skyline
Pr(D) = ?
Pr(d1) = (1 - 1/4) = 3/4
Pr(d2) = (1 - 1/4)(1 - 2/3) = 1/4
Pr(d3) = (1 - 1/4) = 3/4
Pr(D) = (1/3)(3/4 + 1/4 + 3/4)
      = 7/12
Note (done): another Flash animation to show the
calculation of the skyline probability, 7/12.
Open question: where to explain the consequence of an
instance being dominated by an object?
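The calculation above can be sketched in Python. This is a minimal illustration, assuming equally likely instances, independent objects, and smaller values being better: the skyline probability of an instance is the product, over every other object, of the fraction of that object's instances that do not dominate it, and the skyline probability of an object is the average over its instances. The objects X and Y and all coordinates are hypothetical, chosen so that D reproduces the factors (1 - 1/4) and (1 - 2/3) and the result 7/12.

```python
from fractions import Fraction


def dominates(u, v):
    """True if instance u dominates instance v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def instance_skyline_probability(u, others):
    """Probability that instance u is dominated by no other object."""
    prob = Fraction(1)
    for V in others:
        dominators = sum(1 for v in V if dominates(v, u))
        prob *= 1 - Fraction(dominators, len(V))
    return prob


def skyline_probability(U, others):
    """Average of the instance skyline probabilities of U."""
    return sum(instance_skyline_probability(u, others) for u in U) / len(U)


# Hypothetical data (not from the slides).
X = [(0, 0), (8, 8), (9, 8), (8, 9)]   # one of four instances dominates all of D
Y = [(2, 2), (3, 3), (9, 9)]           # two of three instances dominate d2 only
D = [(1, 5), (5, 5), (5, 1)]           # d1, d2, d3

for d in D:
    print(d, instance_skyline_probability(d, [X, Y]))  # 3/4, 1/4, 3/4
print(skyline_probability(D, [X, Y]))                  # 7/12
```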
25
Probabilistic Skyline: The p-skyline
  • 1-skyline: {A, B}
  • 7/12-skyline: {A, B, D}

If time permits, use the formula to find the skyline
probability of object C as well.
26
Problem Definition
  • Given a set of uncertain objects S and a
    probability threshold p (0 ≤ p ≤ 1), the problem
    of probabilistic skyline computation is to
    compute the p-skyline on S.
  • 1-skyline: {A, B}
  • 7/12-skyline: {A, B, D}
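The problem statement can be made concrete with a naive baseline that computes every skyline probability exhaustively and keeps the objects that reach the threshold. This Python sketch is not the paper's bottom-up or top-down algorithm; it assumes equally likely instances, independent objects, and smaller values being better. The objects X, Y, D and their coordinates are hypothetical.

```python
from fractions import Fraction


def dominates(u, v):
    """True if instance u dominates instance v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def skyline_probability(U, others):
    """Average over instances of the probability of not being dominated."""
    total = Fraction(0)
    for u in U:
        prob = Fraction(1)
        for V in others:
            prob *= 1 - Fraction(sum(1 for v in V if dominates(v, u)), len(V))
        total += prob
    return total / len(U)


def p_skyline(objects, p):
    """Names of the objects whose skyline probability is at least p."""
    result = []
    for name, U in objects.items():
        others = [V for other, V in objects.items() if other != name]
        if skyline_probability(U, others) >= p:
            result.append(name)
    return result


# Hypothetical data (not from the slides).
objects = {
    "X": [(0, 0), (8, 8), (9, 8), (8, 9)],   # skyline probability 1/4
    "Y": [(2, 2), (3, 3), (9, 9)],           # skyline probability 1/2
    "D": [(1, 5), (5, 5), (5, 1)],           # skyline probability 7/12
}

print(p_skyline(objects, Fraction(1, 2)))    # ['Y', 'D']
print(p_skyline(objects, Fraction(7, 12)))   # ['D']
```

The exhaustive approach compares every instance with every instance of every other object, which is the cost the bounding and pruning techniques aim to avoid.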

27
Computation Problem of p-skyline
  • First, each uncertain object may have many
    instances. We have to process a large number of
    instances.
  • Second, we have to consider many probabilities in
    deriving the probabilistic skylines.

28
Algorithms (Top down and Bottom up)
  • Data
  • Multiple records of objects in the hope of
    approximating the probability density function
  • Techniques
  • Bounding
  • Pruning
  • Refining

The full algorithms are very detailed; only the
techniques the authors use for efficient pruning will
be discussed.
Assumption: the smaller the value, the better.
Note: tell the audience clearly what data is
being processed.
29
Bottom-up Algorithm: Technique, Minimum Bounding
Box (MBB)
Note (done): a Flash animation drawing the bounding box
of object D and demonstrating the two properties.
30
Bottom-up Algorithm - Pruning Techniques (1/3):
Using Umin and Umax to Decide Membership of the
p-skyline
  • For an uncertain object U and probability
    threshold p, if Pr(Umin) < p, then U is not in
    the p-skyline. If Pr(Umax) ≥ p, then U is in the
    p-skyline.

Note (done): use a Flash animation of Figure 3 to illustrate.
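The membership test with Umin and Umax can be sketched as follows. This is a minimal Python illustration, assuming smaller values are better, equally likely instances, and that Umin and Umax are the minimum and maximum corners of the object's minimum bounding box, treated as virtual instances. Because any point that dominates Umin also dominates every instance of U, and any point that dominates an instance of U also dominates Umax, Pr(Umax) ≤ Pr(U) ≤ Pr(Umin). The function names and the data are hypothetical.

```python
from fractions import Fraction


def dominates(u, v):
    """True if point u dominates point v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def mbb_corners(U):
    """Minimum and maximum corners (Umin, Umax) of the bounding box of U."""
    dims = range(len(U[0]))
    u_min = tuple(min(u[d] for u in U) for d in dims)
    u_max = tuple(max(u[d] for u in U) for d in dims)
    return u_min, u_max


def point_skyline_probability(point, others):
    """Probability that a (possibly virtual) point is not dominated."""
    prob = Fraction(1)
    for V in others:
        prob *= 1 - Fraction(sum(1 for v in V if dominates(v, point)), len(V))
    return prob


def decide_membership(U, others, p):
    """Decide p-skyline membership from the MBB corners alone, if possible."""
    u_min, u_max = mbb_corners(U)
    if point_skyline_probability(u_min, others) < p:
        return "not in p-skyline"   # upper bound of Pr(U) is below p
    if point_skyline_probability(u_max, others) >= p:
        return "in p-skyline"       # lower bound of Pr(U) reaches p
    return "undecided"              # bounds straddle p; refine further


# Hypothetical data (not from the slides).
X = [(0, 0), (8, 8), (9, 8), (8, 9)]
Y = [(2, 2), (3, 3), (9, 9)]
D = [(1, 5), (5, 5), (5, 1)]   # Pr(Dmin) = 3/4, Pr(Dmax) = 1/4

print(decide_membership(D, [X, Y], Fraction(4, 5)))  # not in p-skyline
print(decide_membership(D, [X, Y], Fraction(1, 4)))  # in p-skyline
print(decide_membership(D, [X, Y], Fraction(1, 2)))  # undecided
```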
31
Bottom-up Algorithm - Pruning Techniques (2/3):
Using Vmax to Prune Instances of Objects
  • Let U and V be uncertain objects such that U ≠ V.
    If u is an instance of U and Vmax ≺ u, then
    Pr(u) = 0.

c2 is dominated by the max corner of D's bounding box,
and therefore by all instances of object D.
Pr(c2) = (1 - 3/3)(..)(..)
       = 0
Note (done): use a Flash animation of the equation ()()() to illustrate.
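This instance-level pruning rule can be sketched in a few lines of Python. The sketch assumes smaller values are better and that Vmax is the maximum corner of V's bounding box; an instance dominated by Vmax is dominated by every instance of V, so its factor (1 - |V|/|V|) is 0. The function names and the coordinates are hypothetical.

```python
def dominates(u, v):
    """True if point u dominates point v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def max_corner(V):
    """Maximum corner (Vmax) of the bounding box of V."""
    return tuple(max(v[d] for v in V) for d in range(len(V[0])))


def prune_instances(U, V):
    """Split U into (kept, pruned); pruned instances have Pr(u) = 0."""
    v_max = max_corner(V)
    kept = [u for u in U if not dominates(v_max, u)]
    pruned = [u for u in U if dominates(v_max, u)]
    return kept, pruned


# Hypothetical data (not from the slides).
D = [(1, 5), (5, 5), (5, 1)]   # Dmax = (5, 5)
C = [(2, 7), (6, 6), (4, 9)]   # only (6, 6) lies beyond Dmax in both dimensions

kept, pruned = prune_instances(C, D)
print(kept)    # [(2, 7), (4, 9)]
print(pruned)  # [(6, 6)]
```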
32
Bottom-up Algorithm - Pruning Techniques (3/3):
Using a Subset of Instances to Prune Objects
Estimate an upper bound of Pr(Vmin) using Pr(Umax).
Pr(Vmin) = (1 - |U'|/|U|)(..)(..), where U' is the
set of instances of U that dominate Vmin.
If U' is large, more instances dominate Vmin,
so Pr(Vmin) is low.
Note: how can this be said better?
Note (done): better to use a Flash illustration.
For easier understanding, you can take min Pr(u)
over the instances u in U'.
To estimate the upper bound of Pr(Vmin) using Umax,
assume that all points of U appear only in U' and the
green region, so that Vmin is dominated by
fewer instances.
33
Bottom-up Algorithm - Pruning Techniques (3/3):
Using a Subset of Instances to Prune Objects
  • Special Case
  • As a special case, if there exists an instance u
    ∈ U such that Pr(u)
  • Very useful: an uncertain object that is only
    partially computed can be used to prune other objects.

34
Bottom-up Algorithm: Simplified Version of the
Bottom-up Algorithm
Input: instances of objects and their Umin
  • for each u in the list
  •   if (u is dominated by another object)
  •     prune u // c2 is dominated by D
  •   end if
  •   if (u is Umin)
  •     compute Pr(Umin)
  •     if (Pr(Umin) < p)
  •       prune u // Umin
  •     end if
  •   end if
  •   use Pr(u) to update the upper and lower bounds of Pr(U)
  •   decide membership of U in the p-skyline
  •   prune other objects // check with other Umins
  • end for

Note (missing): pictures for illustration.
All instances of the uncertain objects, as well as
each Umin, are put into a list.
35
Top-down Algorithm: Difference between the Top-down
and Bottom-up Algorithms
  • Bottom-up: start with a single instance of an
    uncertain object.
  • Top-down: start with the whole set of instances of
    an uncertain object.

36
Top-down Algorithm: Idea of Bounding
  • The skyline probability of each subset of an
    uncertain object's instances can be bounded using
    the subset's MBB.
  • The skyline probability of the uncertain object
    can be bounded by the weighted mean of the bounds
    of its subsets.

Note (missing): if possible, draw a graph with 4 squares
inside it to replace the upper one.
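The weighted-mean bound can be sketched as follows. This is a minimal Python illustration, assuming smaller values are better and equally likely instances: each subset of U's instances is bounded by the skyline probabilities of the two corners of its MBB, and the bounds are combined with weights proportional to the subset sizes. The partition used here is chosen by hand; the partition-tree construction is not shown. All names and data are hypothetical.

```python
from fractions import Fraction


def dominates(u, v):
    """True if point u dominates point v (smaller is better)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def mbb_corners(points):
    """Minimum and maximum corners of the bounding box of the points."""
    dims = range(len(points[0]))
    return (tuple(min(q[d] for q in points) for d in dims),
            tuple(max(q[d] for q in points) for d in dims))


def point_skyline_probability(point, others):
    """Probability that a (possibly virtual) point is not dominated."""
    prob = Fraction(1)
    for V in others:
        prob *= 1 - Fraction(sum(1 for v in V if dominates(v, point)), len(V))
    return prob


def bound_by_partition(subsets, others):
    """(lower, upper) bounds on Pr(U) from a partition of U's instances."""
    total = sum(len(s) for s in subsets)
    lower = upper = Fraction(0)
    for s in subsets:
        s_min, s_max = mbb_corners(s)
        weight = Fraction(len(s), total)
        lower += weight * point_skyline_probability(s_max, others)
        upper += weight * point_skyline_probability(s_min, others)
    return lower, upper


# Hypothetical data (not from the slides); the exact Pr(D) is 7/12.
X = [(0, 0), (8, 8), (9, 8), (8, 9)]
Y = [(2, 2), (3, 3), (9, 9)]

print(bound_by_partition([[(1, 5), (5, 5), (5, 1)]], [X, Y]))      # whole object
print(bound_by_partition([[(1, 5)], [(5, 5), (5, 1)]], [X, Y]))    # two subsets
print(bound_by_partition([[(1, 5)], [(5, 5)], [(5, 1)]], [X, Y]))  # singletons
```

On this data the lower bound tightens from 1/4 to 5/12 as the partition is refined, and with singleton subsets both bounds coincide with the exact value 7/12.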
37
Top-down Algorithm: Supporting Data Structure,
the Partition Tree
(Figure: partition tree with nodes labelled A, B, C, D)
Note (missing): the look of the partition tree, in 2 dimensions.
Note (missing): mark the levels of the partition tree (0, 1, 2, etc.).
For simplicity, a 2-d tree will be used to
illustrate the concept.
38
Top-down Algorithm: Partition Tree for Bounding
(Figure: partition trees with nodes labelled A, B, C, D)
  • Compare a partition of U with another object's
    partition tree as follows: traverse the partition
    tree of the other uncertain object V in a
    depth-first manner.

Note: the wording needs to change if possible
dominating objects are mentioned.
Open question: introduce possible dominating objects
before discussing the algorithms?
39
Top-down Algorithm: All Possible Situations during
Partition Tree Traversal
(Figure: partition trees with nodes labelled A, B, C, D)
40
Top-down Algorithm: Situations (1/3) during
Partition Tree Traversal for Bounding Calculation
(Figure: partition trees with nodes labelled A, B, C, D)
41
Top-down Algorithm: Situations (2/3) during
Partition Tree Traversal for Bounding Calculation
(Figure: partition trees with nodes labelled A, B, C, D)
Note: place the two trees here; it is better to use
the subtree starting at level 1.
42
Top-down Algorithm: Situations (3/3) during
Partition Tree Traversal for Bounding Calculation
(Figure: partition trees with nodes labelled A, B, C, D)
Estimate lower bound
Estimate upper bound
Note: place the two trees here; it is better to use
the subtree starting at level 1.
43
Top-down Algorithm: Pruning the Partition Tree (1/3)
(Figure: partition trees with nodes labelled A, B, C, D)
  • Compare ABCD with B

Note: better to put a tree here.
44
Top-down Algorithm: Pruning the Partition Tree (2/3)
(Figure: partition trees with nodes labelled A, B, C, D)
Note: better to put a tree here.
45
Top-down Algorithm: Pruning the Partition Tree (3/3)
(Figure: partition trees with nodes labelled A, B, C, D)
46
Experiment: Data and Experiment
  • Experiment: aggregate skyline and probabilistic
    skyline (0.1-skyline)
  • Data set: NBA players' performance records (339,721)
  • Attributes: points, assists, rebounds

47
Experiment: Results
  • 1) The top 12 players in the probabilistic skyline
    also appear in the aggregate skyline.
  • 2) Players like Olajuwon and Kobe Bryant appear
    only in the probabilistic skyline, not in the
    aggregate skyline.
  • 3) The probabilistic skyline and the aggregate
    skyline can disagree: player A dominates B in the
    aggregate data set, but the order is reversed in
    the probabilistic skyline.

48
Experiment
49
Experiment: Results Analysis
  • 2) Players like Olajuwon and Kobe Bryant appear
    only in the probabilistic skyline, not in the
    aggregate skyline.
  • Finding
  • Compared to the aggregate skyline, the
    probabilistic skyline finds not only players
    who consistently perform well, but also
    outstanding players with large variances in
    performance.

50
Experiment: Results Analysis
  • 3) Disagreement between the probabilistic skyline
    and the aggregate skyline: Ewing (0.13577) has a
    higher skyline probability than Brand (0.10966),
    though Ewing is dominated by Brand in the
    aggregate data set.
  • Finding
  • Ewing played very well in a few games.
  • Probabilistic skylines disclose interesting
    knowledge about uncertain data which cannot be
    captured by traditional skyline analysis.
  • Objects can be ranked by skyline probability,
    which cannot be done with the aggregate
    skyline.

51
Experiment: Results Analysis
52
Other Experiments: Synthetic Data Sets
  • Data
  • Synthetic data sets where instances of objects
    are generated in anti-correlated, independent,
    and correlated distributions

53
Other Experiment Results: Effect of Probability
Threshold on Size of Skyline
54
Other Experiment Results: Effect of Dimensionality
on Size of Skyline
55
Other Experiment Results: Effect of Cardinality
(Instances) on Size of Skyline
56
Other Experiment Results: Scalability with Respect
to Probability Threshold
57
Other Experiment Results
  • Compare top-down and bottom-up with respect to
    dimensionality and cardinality

58
The End