Mining Favorable Facets - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Mining Favorable Facets


1
Mining Favorable Facets
  • Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
  • Jian Pei (Simon Fraser University)
  • Ada Wai-Chee Fu (The Chinese University of Hong Kong)
  • Ke Wang (Simon Fraser University)

Prepared and presented by Raymond Chi-Wing Wong
2
Outline
  1. Introduction
  2. Skyline
  3. Algorithm
  4. Empirical Study
  5. Conclusion

3
1. Introduction
Suppose we want to look for a vacation package among 3 packages.
We want to have a cheaper package.
We want to have a higher hotel-class.
Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5
Suppose we compare package a and package b.
  • We know that package a is better than package b because
  • the price of package a is lower
  • the hotel-class of package a is higher
Package a dominates package b.
Thus, we do not need to consider package b.
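The comparison above can be sketched as a small dominance test. This is a minimal illustration that assumes each package is a (price, hotel-class) tuple; the names `packages` and `dominates` are introduced here for illustration and are not from the slides.

```python
# Packages from the slide, stored as (price, hotel_class).
packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}

def dominates(q, p):
    """True if q is no worse than p on both attributes and strictly better on one."""
    q_price, q_class = q
    p_price, p_class = p
    no_worse = q_price <= p_price and q_class >= p_class
    strictly_better = q_price < p_price or q_class > p_class
    return no_worse and strictly_better

print(dominates(packages["a"], packages["b"]))  # True: a is cheaper and has a higher class
print(dominates(packages["a"], packages["c"]))  # False: c has a higher class than a
```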

5
1. Introduction
Suppose we want to look for a vacation package among 3 packages.
We want to have a cheaper package.
We want to have a higher hotel-class.
Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5
Package a dominates package b.
Suppose we compare package a and package c.
  • We know that
  • Package a has a lower price
  • Package c has a higher hotel-class
  • We cannot conclude
  • that package a is better than package c (i.e., that package a dominates package c)
  • that package c is better than package a (i.e., that package c dominates package a)
Package a is NOT dominated by any other package.
Package c is NOT dominated by any other package.
Thus, package a and package c are all of the best possible choices.
We call package a and package c skyline points: points that are not dominated by any other point.
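The skyline definition can be sketched in a few lines. This is a minimal quadratic-time illustration on the three packages; `PACKAGES`, `dominates` and `skyline` are names assumed here, not taken from the slides.

```python
# Packages from the slide, stored as (price, hotel_class).
PACKAGES = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}

def dominates(q, p):
    """True if q is no worse than p on both attributes and strictly better on one."""
    no_worse = q[0] <= p[0] and q[1] >= p[1]
    strictly_better = q[0] < p[0] or q[1] > p[1]
    return no_worse and strictly_better

def skyline(packages):
    """Names of the packages that no other package dominates."""
    return sorted(
        name for name, p in packages.items()
        if not any(dominates(q, p) for other, q in packages.items() if other != name)
    )

print(skyline(PACKAGES))  # ['a', 'c']
```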

6
1. Introduction
Suppose we want to look for a vacation package among 6 packages.
We want to have a cheaper package.
We want to have a higher hotel-class.
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
How about Hotel-group?
Different customers may have different preferences on Hotel-group.
Here x < y means that hotel-group x is preferred to hotel-group y.
Suppose a customer has the following preference: H < T < M.
The skyline points are packages a and c.
Suppose another customer has the following preference: H < M < T.
The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
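The effect of a preference on the skyline can be sketched as follows. This illustration assumes that a preference is given as a set of (better, worse) hotel-group pairs, reading H < T < M as "H is preferred to T, and T to M", and that the preference is transitive. The helper names `closure`, `dominates` and `skyline` are hypothetical.

```python
# Packages from the slide, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def closure(prefs):
    """Transitive closure of a set of (better, worse) pairs."""
    closed = set(prefs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for y2, z in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed

def dominates(q, p, prefs):
    """q dominates p: no worse on every attribute, strictly better on at least one."""
    group_better = (q[2], p[2]) in prefs
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    strictly_better = q[0] < p[0] or q[1] > p[1] or group_better
    return no_worse and strictly_better

def skyline(packages, prefs=()):
    """Names of the packages not dominated under the given hotel-group preference."""
    prefs = closure(prefs)
    return sorted(
        name for name, p in packages.items()
        if not any(dominates(q, p, prefs) for other, q in packages.items() if other != name)
    )

print(skyline(PACKAGES, {("H", "T"), ("T", "M")}))  # H < T < M gives ['a', 'c']
print(skyline(PACKAGES, {("H", "M"), ("M", "T")}))  # H < M < T gives ['a', 'c', 'e']
```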
7
1. Introduction
Suppose we want to look for a vacation package among 6 packages.
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Suppose a customer has the following preference: H < T < M.
The skyline points are packages a and c.
Suppose another customer has the following preference: H < M < T.
The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.
8
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.
Customer  Preference on Hotel-group  Skyline
Alice     T < M                      a, c
Bob       No special preference      a, c, e, f
Chris     H < M                      a, c, e
David     H < M < T                  a, c, e
Emily     H < T < M                  a, c
Fred      M < T                      a, c, e, f
What preferences make package f a skyline point?
Preferences: No special preference; M < T
Bob and Fred are the potential customers.
9
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package e) to potential customers.
Customer  Preference on Hotel-group  Skyline
Alice     T < M                      a, c
Bob       No special preference      a, c, e, f
Chris     H < M                      a, c, e
David     H < M < T                  a, c, e
Emily     H < T < M                  a, c
Fred      M < T                      a, c, e, f
What preferences make package e a skyline point?
Preferences: No special preference; H < M; H < M < T; M < T
Bob, Chris, David and Fred are the potential customers.
Problem: Given a package, we want to find the preferences or conditions under which this package is a skyline point.
These preferences are the package's favorable facets.
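Finding the potential customers for a package can be sketched by computing each customer's skyline and checking whether the package is in it. This illustration assumes that each preference is written out as a transitively closed set of (better, worse) pairs, so H < M < T also lists H < T. `CUSTOMERS` and `potential_customers` are hypothetical names.

```python
# Packages from the slide, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

# Customer preferences as (better, worse) pairs, already transitively closed.
CUSTOMERS = {
    "Alice": {("T", "M")},
    "Bob": set(),
    "Chris": {("H", "M")},
    "David": {("H", "M"), ("M", "T"), ("H", "T")},
    "Emily": {("H", "T"), ("T", "M"), ("H", "M")},
    "Fred": {("M", "T")},
}

def dominates(q, p, prefs):
    """q dominates p: no worse on every attribute, strictly better on at least one."""
    group_better = (q[2], p[2]) in prefs
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    strictly_better = q[0] < p[0] or q[1] > p[1] or group_better
    return no_worse and strictly_better

def skyline(prefs):
    """Names of the packages not dominated under the given preference."""
    return sorted(
        name for name, p in PACKAGES.items()
        if not any(dominates(q, p, prefs) for other, q in PACKAGES.items() if other != name)
    )

def potential_customers(package):
    """Customers under whose preference the package is a skyline point."""
    return [name for name, prefs in CUSTOMERS.items() if package in skyline(prefs)]

print(potential_customers("f"))  # ['Bob', 'Fred']
print(potential_customers("e"))  # ['Bob', 'Chris', 'David', 'Fred']
```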
10
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Problem: Given a package, we want to find the preferences or conditions under which this package is a skyline point.
These preferences are the package's favorable facets.
12
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
We can solve the problem by a naive method: Lattice Search.
[Figure: lattice of preferences on Hotel-group. Each node is labeled with the skyline under that preference, e.g. SKY = {a, c, e, f}, SKY = {a, c}, SKY = {a, c, e}.]
13
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
We can solve the problem by a naive method: Lattice Search.
Consider package f.
[Figure: lattice of preferences on Hotel-group, with nodes such as T < H, H < T, M < T, and {T < H, M < H}. Each node is labeled with the skyline under that preference, e.g. SKY = {a, c, e, f}, SKY = {a, c}, SKY = {a, c, e}.]
This approach has two disadvantages.
1. Computation is costly. We need to compute all skyline points for each possible preference.
2. It is difficult to interpret the results. There are many preferences which qualify package f as a skyline point.
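The naive search can be sketched by enumerating every preference and recomputing the skyline for each one. This illustration assumes that a preference is any strict partial order over the three hotel groups, stored as a set of (better, worse) pairs. The function names are hypothetical, and the exhaustive enumeration is only practical for a handful of values, which is the cost problem named above.

```python
from itertools import combinations, permutations

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ("T", "H", "M")

def closure(prefs):
    """Transitive closure of a set of (better, worse) pairs."""
    closed = set(prefs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for y2, z in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed

def all_preferences(values):
    """Every strict partial order over `values`, each as a frozenset of pairs."""
    pairs = list(permutations(values, 2))
    orders = set()
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            closed = closure(subset)
            # Keep sets that are already transitive and contain no x < y together with y < x.
            if closed == set(subset) and not any((y, x) in closed for x, y in closed):
                orders.add(frozenset(closed))
    return orders

def dominates(q, p, prefs):
    group_better = (q[2], p[2]) in prefs
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)

def skyline(prefs):
    return sorted(
        name for name, p in PACKAGES.items()
        if not any(dominates(q, p, prefs) for other, q in PACKAGES.items() if other != name)
    )

# One skyline computation per node of the lattice.
lattice = {prefs: skyline(prefs) for prefs in all_preferences(GROUPS)}
qualifying = [prefs for prefs, sky in lattice.items() if "f" in sky]
print(len(lattice), "preferences examined;", len(qualifying), "make f a skyline point")
```

Even on this toy table the search computes one full skyline per preference, and the answer comes back as a long list of qualifying preferences rather than a compact rule.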
14
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
We can solve the problem by a naive method: Lattice Search.
Consider package f.
We find that whenever the preference contains T < M or H < M, package f is not a skyline point.
[Figure: the lattice of preferences with a border for f, separating the preferences under which f is a skyline point from those under which f is not a skyline point.]
15
1. Introduction
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
We can solve the problem by a naive method: Lattice Search.
Consider package f.
We find that whenever the preference contains T < M or H < M, package f is not a skyline point.
[Figure: the lattice of preferences with a border for f, separating the preferences under which f is a skyline point from those under which f is not a skyline point.]
We can say that T < M or H < M is a minimal disqualifying condition (MDC).
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
16
3. Algorithm
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
  • How to find the MDCs of a point?

17
3. Algorithm
Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)
Point q is said to quasi-dominate point p if all attributes of point q are NOT worse than those of point p.
e.g., Package a quasi-dominates package f because
1. Package a has a lower (or better) price than package f
2. Package a has a higher (or better) hotel-class than package f
Since package a quasi-dominates package f, we define Ra→f as follows: T < M.
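Quasi-dominance and the resulting condition can be sketched as below. This illustration assumes that quasi-dominance compares only price and hotel-class, as in the example, and that the condition is the single hotel-group pair (better, worse) that q would still need in order to dominate p. `quasi_dominates` and `required_preference` are hypothetical names.

```python
# Packages from the slide, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """True if q is not worse than p on price and on hotel-class."""
    return q[0] <= p[0] and q[1] >= p[1]

def required_preference(q, p):
    """Hotel-group preference (better, worse) that q needs to dominate p.

    Returns None when both packages belong to the same hotel-group,
    because no preference is needed in that case.
    """
    if q[2] == p[2]:
        return None
    return (q[2], p[2])

a, c, e, f = (PACKAGES[name] for name in "acef")
print(quasi_dominates(a, f), required_preference(a, f))  # True ('T', 'M')
print(quasi_dominates(c, f), required_preference(c, f))  # True ('H', 'M')
print(quasi_dominates(e, f))                             # False: e has a lower hotel-class
```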
18
3. Algorithm
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
  • Two Algorithms
      • MDC-O: Computing MDC On-the-fly
          • Does not store MDCs of points
          • Computes the MDC of a given point on-the-fly
      • MDC-M: A Materialization Method
          • Stores MDCs of all points
  • Indexing Method for Speed-up
      • R-tree

19
3.1 MDC-O: Computing MDC On-the-fly
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
  • On-the-fly Algorithm
  • Given
      • a data point p
  • Variable
      • MDC(p): the minimal disqualifying conditions of p
  • Algorithm
      • MDC(p) ← ∅
      • For each data point q which quasi-dominates p
          • if MDC(p) does not contain Rq→p, insert Rq→p into MDC(p)
      • Return MDC(p)
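The on-the-fly algorithm can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each condition is a frozenset of (better, worse) hotel-group pairs, an empty condition means that p is dominated under every preference, and "does not contain" is read as a minimality check that skips a condition implied by a smaller one. All function names are hypothetical.

```python
# Packages from the slide, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """True if q is not worse than p on price and on hotel-class."""
    return q[0] <= p[0] and q[1] >= p[1]

def condition(q, p):
    """Preference pairs that q needs in order to dominate p (empty if same group)."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})

def insert_minimal(mdc, cond):
    """Add cond unless a smaller or equal condition is already stored."""
    if any(existing <= cond for existing in mdc):
        return
    # Drop stored conditions that the new, smaller condition makes redundant.
    mdc.difference_update({existing for existing in mdc if cond < existing})
    mdc.add(cond)

def mdc_on_the_fly(name, packages=PACKAGES):
    """Scan all packages once and collect the minimal disqualifying conditions of `name`."""
    p = packages[name]
    mdc = set()
    for other, q in packages.items():
        if other == name or q == p:  # skip the point itself and exact duplicates
            continue
        if quasi_dominates(q, p):
            insert_minimal(mdc, condition(q, p))
    return mdc

print(mdc_on_the_fly("f"))  # the two conditions T < M and H < M
print(mdc_on_the_fly("a"))  # set(): no preference disqualifies a
```

Nothing is stored between queries, so every call scans the whole table again.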

20
3.2 MDC-M: A Materialization Method
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
  • Materialization Algorithm
  • Variable
      • MDC(p): the minimal disqualifying conditions of p
  • Algorithm
      • For each data point p
          • MDC(p) ← ∅
          • For each data point q which quasi-dominates p
              • if MDC(p) does not contain Rq→p, then insert Rq→p into MDC(p)
          • Store MDC(p)
  • Query Algorithm
  • Given
      • a data point p
  • Algorithm
      • Return MDC(p)
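The materialization method can be sketched as a table that is filled once and then answered by lookup. This illustration keeps the same assumptions as before: conditions are frozensets of (better, worse) hotel-group pairs, and an empty condition means the point is always dominated. `materialize` and `query` are hypothetical names.

```python
# Packages from the slide, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """True if q is not worse than p on price and on hotel-class."""
    return q[0] <= p[0] and q[1] >= p[1]

def condition(q, p):
    """Preference pairs that q needs in order to dominate p (empty if same group)."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})

def materialize(packages):
    """Precompute and store MDC(p) for every package p."""
    table = {}
    for name, p in packages.items():
        mdc = set()
        for other, q in packages.items():
            if other == name or q == p or not quasi_dominates(q, p):
                continue
            cond = condition(q, p)
            if any(existing <= cond for existing in mdc):
                continue  # already covered by a stored condition
            mdc = {existing for existing in mdc if not cond < existing}
            mdc.add(cond)
        table[name] = frozenset(mdc)
    return table

MDC_TABLE = materialize(PACKAGES)  # built once, stored for all later queries

def query(name):
    """Query algorithm: a single lookup in the stored table."""
    return MDC_TABLE[name]

print(query("f"))  # the two conditions T < M and H < M
```

The trade-off against the on-the-fly version is storage for the table in exchange for constant-time queries.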

21
4. Empirical Study
  • Datasets
      • Synthetic Dataset
      • Real Datasets (from UCI)
          • Nursery Dataset
          • Automobile Dataset
  • Default Values (Synthetic)
      • No. of tuples: 500K
      • No. of numeric dimensions: 3
      • No. of categorical dimensions: 1
      • No. of values in a nominal dimension: 20

22
4. Empirical Study
Without indexing: MDC-O has the slowest search time; MDC-M has a faster search time.
Storage of MDC: 8MB
With indexing: MDC-O and MDC-M both have a fast search time.
23
4. Empirical Study
  • Automobile
  • Three car models

Car         MDC
Honda       Toyota < Honda
Mitsubishi  Honda < Mitsubishi or Toyota < Mitsubishi
Toyota      -
Honda: A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
Mitsubishi: A salesperson should promote this car to a customer who prefers Mitsubishi to the others.
Toyota: A salesperson should promote this car to ANY customer.
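The way a salesperson would use the MDC table can be sketched as a simple check. This illustration assumes that a customer's preference is a transitively closed set of (better, worse) pairs and that a car remains a skyline point exactly when the preference contains none of its MDCs. `should_promote` is a hypothetical name.

```python
# MDCs from the table above, each written as a (better, worse) pair.
MDC = {
    "Honda": {("Toyota", "Honda")},
    "Mitsubishi": {("Honda", "Mitsubishi"), ("Toyota", "Mitsubishi")},
    "Toyota": set(),
}

def should_promote(car, customer_prefs):
    """True if the customer's preference contains no disqualifying condition of the car."""
    return not (MDC[car] & set(customer_prefs))

prefers_toyota_to_honda = {("Toyota", "Honda")}
prefers_mitsubishi = {("Mitsubishi", "Honda"), ("Mitsubishi", "Toyota")}

print(should_promote("Honda", prefers_toyota_to_honda))   # False
print(should_promote("Mitsubishi", prefers_mitsubishi))   # True
print(should_promote("Toyota", prefers_toyota_to_honda))  # True
```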
24
5. Conclusion
  • Skyline
  • Favorable Facets
  • Minimal Disqualifying Condition
  • Algorithm
      • On-the-fly
      • Materialization
  • Empirical Study

25
Q&A
  • Poster Board
  • Title: Mining Favorable Facets
  • Date: Monday, 13th August
  • Place: Poster board carrying number 31

26
3.3 Speedup
[Figure: point p plotted on two totally-ordered attributes, with both axes marked "a better value" and an origin at 0. All points in the region from 0 to p (e.g., point q) quasi-dominate point p.]
  • Build an R-tree based on the totally-ordered attributes
  • For each point p,
      • MDC(p) ← ∅
      • Perform a range search from 0 to the value of dimension D of p for each dimension D
      • For each point q found in the range search
          • insert Rq→p into MDC(p)
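The range-search idea can be sketched without an R-tree. In this illustration a list sorted by price, searched with `bisect`, stands in for the R-tree, and the range is adapted to the vacation table, where a lower price and a higher hotel-class are better. Conditions are frozensets of (better, worse) hotel-group pairs, minimality pruning is omitted for brevity, and `range_search` and `mdc` are hypothetical names.

```python
from bisect import bisect_right

# Packages from the slides, stored as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

# Stand-in index: packages sorted by price (an R-tree would index both attributes).
BY_PRICE = sorted(PACKAGES.items(), key=lambda item: item[1][0])
PRICES = [pkg[0] for _, pkg in BY_PRICE]

def range_search(p):
    """Names of packages whose price is <= p's price and whose class is >= p's class."""
    upper = bisect_right(PRICES, p[0])  # every entry before `upper` is cheap enough
    return [name for name, q in BY_PRICE[:upper] if q[1] >= p[1]]

def mdc(name):
    """Collect the condition of every quasi-dominating point found by the range search."""
    p = PACKAGES[name]
    result = set()
    for other in range_search(p):
        if other == name:
            continue
        q = PACKAGES[other]
        result.add(frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])}))
    return result

print(range_search(PACKAGES["f"]))  # ['a', 'c', 'f']
print(mdc("f"))                     # the two conditions T < M and H < M
```

Only the points returned by the range search are examined, instead of testing every point for quasi-dominance.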