1
Hierarchies in Data Mining
  • Raghu Ramakrishnan
  • ramakris@yahoo-inc.com
  • Chief Scientist for Audience and Cloud Computing
  • Yahoo!

2
About this Talk
  • Common theme: multidimensional view of data
  • Reveals patterns that emerge at coarser
    granularity
  • Widely recognized, e.g., generalized association
    rules
  • Helps handle imprecision
  • Analyzing imprecise and aggregated data
  • Helps handle data sparsity
  • Even with massive datasets, sparsity is a
    challenge!
  • Defines candidate space of subsets for
    exploratory mining
  • Forecasting query results over future data
  • Using predictive models as summaries
  • Potentially, space of mining experiments?

3
Driving Applications
  • Business Intelligence of combined text and
    relational data (Joint with IBM)
  • Burdick, Deshpande, Jayram, Vaithyanathan
  • Analyzing mass spectra from ATOFMS (NSF ITR
    project with environmental chemists at UW and
    Carleton College)
  • Chen, Chen, Huang, Musicant, Grossman, Schauer
  • Goal-oriented anonymization of cancer data (NSF
    CyberTrust project)
  • Chen, LeFevre, DeWitt, Shavlik, Hanrahan (Chief
    Epidemiologist, Wisconsin), Trentham-Dietz
  • Analyzing network traffic data
  • Chen, Yegneswaran, Barford
  • Content optimization and ad serving
  • Many people at Yahoo!

4
Background: The Multidimensional Data Model and Cube Space
5
Star Schema
FACT TABLE: SERVICE(pid, timeid, locid, repair)
DIMENSION TABLES: TIME(timeid, date, week, year), PRODUCT(pid, pname, category, model), LOCATION(locid, country, region, state)
6
Dimension Hierarchies
  • For each dimension, the set of values can be
    organized in a hierarchy

[Diagram: dimension hierarchies. PRODUCT: model → category → automobile (ALL). TIME: date → week, and date → month → quarter → year. LOCATION: state → region → country]
7
Multidimensional Data Model
  • One fact table D(X, M)
  • X = ⟨X1, X2, ...⟩: dimension attributes
  • M = ⟨M1, M2, ...⟩: measure attributes
  • Domain hierarchy for each dimension attribute
  • Collection of domains Hier(Xi) = (Di(1), ..., Di(k))
  • The extended domain E(Xi) = ∪k=1..t DXi(k)
  • Value mapping function γD1→D2(x)
  • e.g., γmonth→year(12/2005) = 2005
  • Forms the value hierarchy graph
  • Stored as a dimension table attribute (e.g., week
    for a time value) or as conversion functions (e.g.,
    month, quarter)
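The value mapping γ between domains can be implemented either as a dimension-table lookup or as a plain conversion function. A minimal sketch (the date encoding and function names are illustrative, not from the talk):

```python
# Sketch of value-mapping functions gamma for the Time dimension.
# A date is a hypothetical (day, month, year) tuple.

def month_of(date):          # gamma_{date -> month}
    day, month, year = date
    return (month, year)

def year_of(month):          # gamma_{month -> year}
    m, y = month
    return y

# 12/2005 rolls up to 2005, as in the slide's example.
assert year_of(month_of((15, 12, 2005))) == 2005
```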

8
Multidimensional Data
[Diagram: the cube space. Dimension attribute Automobile: models Civic, Camry, F150, Sierra roll up to categories Sedan and Truck, then to ALL (levels 1-3). Dimension attribute Location: states NY, MA, CA, TX roll up to regions East and West, then to ALL. Precise facts p1-p4 sit in (state, model) cells.]
9
Cube Space
  • Cube space: C = E(X1) × E(X2) × ... × E(Xd)
  • Region: hyper-rectangle in cube space
  • c = (v1, v2, ..., vd), vi ∈ E(Xi)
  • E.g., c1 = (NY, Camry); c2 = (West, Sedan)
  • Region granularity
  • gran(c) = (d1, d2, ..., dd), di = Domain(c.vi)
  • E.g., gran(c1) = (State, Model); gran(c2) = (Region, Category)
  • Region coverage
  • coverage(c) = all facts in c
  • Region set: all regions with the same granularity
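Region coverage can be sketched as a roll-up test against the dimension hierarchies; the hierarchies and facts below are hypothetical stand-ins for the talk's running example:

```python
# Sketch: a region is a tuple of values, one per dimension; its
# coverage is the set of facts whose coordinates roll up to it.

REGION_OF_STATE = {"NY": "East", "MA": "East", "CA": "West", "TX": "West"}
CATEGORY_OF_MODEL = {"Camry": "Sedan", "Civic": "Sedan",
                     "F150": "Truck", "Sierra": "Truck"}

facts = [("NY", "Camry"), ("MA", "F150"), ("CA", "Civic")]

def rolls_up(fact, region):
    (state, model), (loc, auto) = fact, region
    loc_ok = loc in ("ALL", state, REGION_OF_STATE[state])
    auto_ok = auto in ("ALL", model, CATEGORY_OF_MODEL[model])
    return loc_ok and auto_ok

def coverage(region):
    return [f for f in facts if rolls_up(f, region)]

assert coverage(("NY", "Camry")) == [("NY", "Camry")]
assert coverage(("West", "Sedan")) == [("CA", "Civic")]
```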

10
OLAP Over Imprecise Data
with Doug Burdick, Prasad Deshpande, T.S. Jayram, and Shiv Vaithyanathan
In VLDB '05 and '06; joint work with IBM Almaden
11
Imprecise Data
[Diagram: the same cube, now with an imprecise fact p5 recorded at (MA, Truck): its location is precise, but its automobile is known only at the Category level]
12
Querying Imprecise Facts
Query: Auto = F150, Loc = MA, SUM(Repair) = ???
How do we treat p5?
[Diagram: the imprecise fact p5 = (MA, Truck) overlaps the cells (MA, F150) and (MA, Sierra)]
13
Allocation (1)
[Diagram: p5 spans the cells (MA, F150) and (MA, Sierra) within the Truck category]
14
Allocation (2)
  • (Huh? Why 0.5 / 0.5? Hold on to that thought.)

[Diagram: p5 replaced by two weighted copies, one in (MA, F150) and one in (MA, Sierra), each with allocation weight 0.5]
15
Allocation (3)
Query: Auto = F150, Loc = MA, SUM(Repair) = 150
Query the Extended Data Model!
16
Allocation Policies
  • Procedure for assigning allocation weights is
    referred to as an allocation policy
  • Each allocation policy uses different information
    to assign allocation weight
  • Key contributions
  • Appropriate characterization of the large space
    of allocation policies (VLDB 05)
  • Designing efficient algorithms for allocation
    policies that take into account the correlations
    in the data (VLDB 06)

17
Motivating Example
Query: COUNT
  • We propose desiderata that enable appropriate
    definition of query semantics for imprecise data

18
Desideratum I: Consistency
  • Consistency specifies the relationship between
    answers to related queries on a fixed data set

19
Desideratum II: Faithfulness
[Diagram: three related data sets over the cells (NY, MA) × (F150, Sierra)]
  • Faithfulness specifies the relationship between
    answers to a fixed query on related data sets

20
Imprecise facts lead to many possible worlds [Kripke, 1963]
[Diagram: four possible worlds w1-w4, each a different completion of the imprecise facts among p1-p5 into specific cells]
21
Query Semantics
  • Given all possible worlds together with their
    probabilities, queries are easily answered using
    expected values
  • But number of possible worlds is exponential!
  • Allocation gives facts weighted assignments to
    possible completions, leading to an extended
    version of the data
  • Size increase is linear in number of (completions
    of) imprecise facts
  • Queries operate over this extended version
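Aggregation over the extended data model then reduces to a weighted sum over the facts' weighted completions. A sketch, with hypothetical measure values chosen so that the running example's SUM(Repair) = 150 comes out:

```python
# Sketch of querying the extended data model: each imprecise fact is
# stored as weighted copies over its possible completion cells, and
# aggregates multiply measures by the allocation weights.
# Measure values (100 each) are hypothetical, not from the talk.

extended = [
    # (cell, allocation weight, repair cost)
    (("MA", "F150"),   1.0, 100),   # p3: precise fact
    (("MA", "F150"),   0.5, 100),   # p5 allocated to (MA, F150)
    (("MA", "Sierra"), 0.5, 100),   # p5 allocated to (MA, Sierra)
]

def sum_measure(cell):
    return sum(w * m for c, w, m in extended if c == cell)

assert sum_measure(("MA", "F150")) == 150.0
```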

22
Storing Allocations using the Extended Data Model
[Diagram: the extended data model stores p5's weighted copies in (MA, F150) and (MA, Sierra) alongside the precise facts p1-p4]
23
Allocation Policy: Count
[Diagram: count-based allocation; each imprecise fact's weight on a candidate cell is proportional to the number of precise facts already in that cell (cells c1, c2; facts p1-p6)]
24
Allocation Policy: Measure
[Diagram: measure-based allocation; each imprecise fact's weight on a candidate cell is proportional to the sum of the measure over the precise facts in that cell (cells c1, c2; facts p1-p6)]
25
Allocation Policy Template
26
Allocation Graph
27
Example Processing of Allocation Graph
[Diagram: allocation graph. Precise cells: Cell(MA, Civic), Cell(NY, F150), Cell(NY, Sierra), Cell(MA, F150), Cell(MA, Sierra). Step 1 computes Qsum(r) for each cell; step 2 computes the allocation weights pc,r for the imprecise fact <MA, Truck>, here 2/3 and 1/3]
28
Processing Allocation Graph
  • What if precise cells and imprecise facts do not
    fit into memory?
  • Need to scan precise cells twice for each
    imprecise fact
  • Identify groups of imprecise facts that can be
    processed in the same scan
  • Algorithm will process these groups

[Diagram: imprecise facts p6-p14 (<MA, Sedan>, <MA, Truck>, <CA, ALL>, <East, Truck>, <West, Sedan>, <ALL, Civic>, <ALL, Sierra>, <West, Civic>, <West, Sierra>) grouped into components c1-c5 with the precise cells they touch, e.g., Cell(MA, Civic), Cell(MA, Sierra), Cell(NY, F150), Cell(CA, Civic), Cell(CA, Sierra)]
29
Summary
  • Consistency and faithfulness
  • Desiderata for designing query semantics for
    imprecise data
  • Allocation is the key to our framework
  • Aggregation operators with appropriate guarantees
    of consistency and faithfulness
  • Efficient algorithms for allocation policies
  • Lots of recent work on uncertainty and
    probabilistic data processing
  • Sensor data, errors, Bayesian inference

VLDB 05 (semantics), 06 (implementation)
30
Dealing with Data Sparsity
  • Deepak Agarwal, Andrei Broder, Deepayan
    Chakrabarti, Dejan Diklic, Vanja Josifovski,
    Mayssam Sayyadian
  • Estimating Rates of Rare Events at Multiple
    Resolutions, KDD 2007

31
Motivating ApplicationContent Match Problem
  • Problem
  • Which ads are good on which pages?
  • Pages: no control; ads: can control
  • First simplification
  • (Page, Ad) completely characterized by a set of
    high-dimensional features
  • Naïve approach
  • Experiment with all possible pairs several times
    and estimate CTR
  • Of course, this doesn't work
  • Most (ad, page) pairs have very few impressions,
    if any, and even fewer clicks
  • Severe data sparsity

32
Estimation in the Tail
  • Use an existing, well-understood hierarchy
  • Categorize ads and webpages to leaves of the
    hierarchy
  • CTR estimates of siblings are correlated
  • The hierarchy allows us to aggregate data
  • Coarser resolutions provide reliable estimates
    for rare events,
  • which then influence estimation at finer
    resolutions

Similar coarsening, different motivation: "Mining Generalized Association Rules", Ramakrishnan Srikant and Rakesh Agrawal, VLDB 1995
33
System Overview
Pipeline: retrospective data (URL, ad, isClicked) → crawl URLs (all clicked URLs plus a sample of the rest) → classify pages and ads → rare-event estimation using the hierarchy → impute impressions, fixing sampling bias
34
Sampling of Webpages
  • Naïve strategy: sample at random from the set of
    URLs
  • Sampling errors in impression volume AND click
    volume
  • Instead, we propose
  • crawling all URLs with at least one click, and
  • a sample of the remaining URLs
  • Variability is only in impression volume

35
Imputation of Impression Volume
  • Region node = (page node, ad node)
  • Build a region hierarchy
  • A cross-product of the page hierarchy and the ad
    hierarchy

[Diagram: region hierarchy levels Z(0), ..., Z(i); leaf regions are (page leaf, ad leaf) pairs from the page and ad hierarchies]
36
Exploiting Taxonomy Structure
  • Consider the bottom two levels of the taxonomy
  • Each cell corresponds to a (page, ad)-class pair
  • Key point: children under a parent node are alike
  • and expected to have similar CTRs
  • (i.e., form a cohesive block)

37
Imputation of Impression Volume
For any level Z(i): impressions in cell (i, j) = nij + mij + xij
  • Row constraint: each row sums to Σj nij + K·Σj mij, the total impressions (known; K is the sampling scale-up factor)
  • Column constraint: each column sums to the impressions on ads of this ad class
38
Imputation of Impression Volume
  • Block constraint: the cells in each block sum to the block's total
39
Imputing xij
  • Iterative Proportional Fitting [Darroch, 1972]
  • Initialize xij = nij + mij
  • Top-down
  • Scale all xij in every block in Z(i+1) to sum to
    its parent in Z(i)
  • Scale all xij in Z(i+1) to sum to the row totals
  • Scale all xij in Z(i+1) to sum to the column
    totals
  • Repeat for every level Z(i)
  • Bottom-up: similar

[Diagram: levels Z(i) and Z(i+1) over page classes × ad classes, with one block of Z(i+1) rolling up to a single cell of Z(i)]
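The alternating scaling steps can be sketched as classical IPF on a single two-way table (the talk additionally scales blocks to their parents; all numbers here are illustrative):

```python
# Sketch of Iterative Proportional Fitting for one level: scale a
# matrix x alternately so rows match row totals and columns match
# column totals. Row/column targets are hypothetical.

def ipf(x, row_tot, col_tot, iters=100):
    for _ in range(iters):
        for i, r in enumerate(row_tot):        # row scaling
            s = sum(x[i])
            if s:
                x[i] = [v * r / s for v in x[i]]
        for j, c in enumerate(col_tot):        # column scaling
            s = sum(row[j] for row in x)
            if s:
                for row in x:
                    row[j] *= c / s
    return x

x = ipf([[1.0, 1.0], [1.0, 1.0]], row_tot=[3.0, 1.0], col_tot=[2.0, 2.0])
assert abs(sum(x[0]) - 3.0) < 1e-6             # row total matched
assert abs(x[0][0] + x[1][0] - 2.0) < 1e-6     # column total matched
```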
40
Imputation Summary
  • Given
  • nij (impressions in clicked pool)
  • mij (impressions in sampled non-clicked pool)
  • impressions on ads of each ad class in the ad
    hierarchy
  • We get
  • Estimated impression volume Ñij = nij + mij + xij
    in each region ij of every level Z(.)

41
Dealing with Data Sparsity
  • Deepak Agarwal, Pradheep Elango, Nitin Motgi,
    Seung-Taek Park, Raghu Ramakrishnan, Scott Roy,
    Joe Zachariah
  • Real-time Content Optimization through Active
    User Feedback, NIPS 2008

42
Yahoo! Home Page Featured Box
  • It is the top-center part of the Y! Front Page
  • It has four tabs: Featured, Entertainment,
    Sports, and Video

43
Novel Aspects
  • Classical: arms assumed fixed over time
  • Here: we gain and lose arms over time
  • Some theoretical work by Whittle in the '80s
    (operations research)
  • Classical: serving rule updated after each pull
  • Here: we compute an optimal design in batch mode
  • Classical: CTR generally assumed stationary
  • Here: CTRs are highly dynamic and non-stationary

44
Bellwether Analysis: Global Aggregates from Local Regions
with Beechung Chen, Jude Shavlik, and Pradeep Tamma
In VLDB '06
45
Motivating Example
  • A company wants to predict the first-year
    worldwide profit of a new item (e.g., a new
    movie)
  • By looking at features and profits of previous
    (similar) movies, we predict the expected total
    profit (1-year US sales) for the new movie
  • Wait a year and write a query! If you can't wait,
    stay awake
  • The most predictive features may be based on
    sales data gathered by releasing the new movie in
    many regions (different locations over
    different time periods)
  • Example region-based features: 1st-week sales
    in Peoria, week-to-week sales growth in
    Wisconsin, etc.
  • Gathering this data has a cost (e.g., marketing
    expenses, waiting time)
  • Problem statement: find the most predictive
    region features that can be obtained within a
    given cost budget

46
Key Ideas
  • Large datasets are rarely labeled with the
    targets that we wish to learn to predict
  • But for the tasks we address, we can readily use
    OLAP queries to generate features (e.g., 1st week
    sales in Peoria) and even targets (e.g., profit)
    for mining
  • We use data-mining models as building blocks in
    the mining process, rather than thinking of them
    as the end result
  • The central problem is to find data subsets
    (bellwether regions) that lead to predictive
    features which can be gathered at low cost for a
    new case

47
Motivating Example
  • A company wants to predict the first year's
    worldwide profit for a new item, by using its
    historical database
  • Database Schema
  • The combination of the underlined attributes
    forms a key

48
A Straightforward Approach
  • Build a regression model to predict item profit
  • There is much room for accuracy improvement!

By joining and aggregating tables in the historical database, we can create a training set: item-table features plus a target.
An example regression model: Profit = β0 + β1·Laptop + β2·Desktop + β3·RdExpense
49
Using Regional Features
  • Example region: [1st week, HK]
  • Regional features
  • Regional Profit: the 1st-week profit in HK
  • Regional Ad Expense: the 1st-week ad expense in
    HK
  • A possibly more accurate model:
  • Profit[1yr, All] = β0 + β1·Laptop + β2·Desktop
    + β3·RdExpense + β4·Profit[1wk, HK]
    + β5·AdExpense[1wk, HK]
  • Problem: which region should we use?
  • The smallest region that improves the accuracy
    the most
  • We give each candidate region a cost
  • The most cost-effective region is the
    "bellwether region"

50
Basic Bellwether Problem
51
Basic Bellwether Problem
Location domain hierarchy
  • Historical database DB
  • Training item set I
  • Candidate region set R
  • E.g., [1-n week, Location]
  • Target generation query: τi(DB) returns the
    target value of item i ∈ I
  • E.g., τi = Sum(Profit) over [i, 1-52, All] in
    ProfitTable
  • Feature generation query: φi,r(DB), i ∈ Ir and r
    ∈ R
  • Ir: the set of items in region r
  • E.g., (Category_i, RdExpense_i, Profit[i, 1-n,
    Loc], AdExpense[i, 1-n, Loc])
  • Cost query: κr(DB), r ∈ R, the cost of
    collecting data from r
  • Predictive model: hr(x), r ∈ R, trained on
    {(φi,r(DB), τi(DB)) : i ∈ Ir}
  • E.g., a linear regression model

52
Basic Bellwether Problem
Features φi,r(DB); target τi(DB)
Features aggregate over the data records in region r = [1-2, USA]; the target is the total profit in [1-52, All]
  • For each region r, build a predictive model
    hr(x); then choose the bellwether region r such that
  • Coverage(r) ≥ minimum coverage support
  • Cost(r, DB) ≤ cost threshold
  • Error(hr) is minimized
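The selection rule above is essentially a filtered argmin over candidate regions. A sketch with hypothetical regions, coverages, costs, and errors (not data from the paper):

```python
# Sketch of the basic bellwether search: among candidate regions,
# keep those meeting the coverage and cost constraints, then pick
# the one whose model error is smallest.

candidates = [
    # (region, coverage, cost, model_error) -- all hypothetical
    ("1-8 month, MD", 0.90, 40, 10.0),
    ("1-2 week, USA", 0.95, 90,  9.0),
    ("1-1 week, HK",  0.30, 10,  8.0),
]

def bellwether(regions, min_coverage, budget):
    feasible = [r for r in regions
                if r[1] >= min_coverage and r[2] <= budget]
    return min(feasible, key=lambda r: r[3])[0] if feasible else None

assert bellwether(candidates, min_coverage=0.5, budget=50) == "1-8 month, MD"
```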

53
Experiment on a Mail Order Dataset
Error-vs-Budget Plot
  • Bel Err: the error of the bellwether region found
    using a given budget
  • Avg Err: the average error of all the cube
    regions with costs under a given budget
  • Smp Err: the error of a set of randomly sampled
    (non-cube) regions with costs under a given budget

1-8 month, MD
(RMSE Root Mean Square Error)
54
Experiment on a Mail Order Dataset
Uniqueness Plot
  • Y-axis: the fraction of regions that are as good as
    the bellwether region
  • i.e., the fraction of regions that satisfy the
    constraints and have errors within the 99%
    confidence interval of the bellwether region's
    error
  • We have 99% confidence that [1-8 month, MD]
    is a quite unusual bellwether region

1-8 month, MD
55
Basic Bellwether Computation
  • OLAP-style bellwether analysis
  • Candidate regions: regions in a data cube
  • Queries: OLAP-style aggregate queries
  • E.g., Sum(Profit) over a region
  • Efficient computation
  • Use iceberg cube techniques to prune infeasible
    regions (Beyer & Ramakrishnan, ICDE '99;
    Han, Pei, Dong & Wang, SIGMOD '01)
  • Infeasible regions: regions with cost > B or
    coverage < C
  • Share computation by generating the features and
    target values for all the feasible regions all
    together
  • Exploit distributive and algebraic aggregate
    functions
  • Simultaneously generating all the features and
    target values reduces DB scans and repeated
    aggregate computation

56
Subset Bellwether Problem
57
Subset-Based Bellwether Prediction
  • Motivation: different subsets of items may have
    different bellwether regions
  • E.g., The bellwether region for laptops may be
    different from the bellwether region for clothes
  • Two approaches

Bellwether Tree and Bellwether Cube (over item dimensions such as R&D Expenses and Category)
58
Bellwether Tree
  • How to build a bellwether tree
  • Similar to regression tree construction
  • Starting from the root node, recursively split
    the current leaf node using the best split
    criterion
  • A split criterion partitions a set of items into
    disjoint subsets
  • Pick the split that reduces the error the most
  • Stop splitting when the number of items in the
    current leaf node falls under a threshold value
  • Prune the tree to avoid overfitting

59
Bellwether Tree
  • How to split a node
  • Split criterion:
  • Numeric split: Ak ≤ threshold
  • Categorical split: on the values of Ak
  • (Ak is an item-table feature)
  • Pick the best split criterion
  • Best split: the split that can reduce the error
    the most

Find the bellwether region for S; h = bellwether model for S. Find the bellwether region for each Sp; hp = bellwether model for Sp. Compare the total parent error with the total child error. (S is the set of items at the parent node, and Sp is the set of items at the pth child node.)
60
Problem of Naïve Tree Construction
  • A naïve bellwether tree construction algorithm
    will scan the dataset n × m times
  • n is the number of nodes
  • m is the number of candidate split criteria
  • Idea: extend the RainForest framework [Gehrke
    et al., '98]

  • For each node
  • Try all candidate split criteria to find the
    best one
  • It needs to scan the dataset m times

61
Efficient Tree Construction
  • Idea: extend the RainForest framework [Gehrke
    et al., '98]
  • Build the tree level by level
  • Scan the entire dataset once per level and keep
    small sufficient statistics in memory (size
    O(n·s·c))
  • Sufficient statistics for a split criterion:
  • |Sp| and Error(hp; Sp),
  • for p = 1 to the number of children
  • Split all the nodes at that level
  • after the scan, based on the sufficient
    statistics
  • Further improved by a hybrid algorithm

[Diagram: the tree is built with one scan per level: 1st scan for the root (node 1), 2nd scan for nodes 2-3, 3rd scan for nodes 4-7, 4th scan for nodes 8-9]
62
Bellwether Cube
[Diagram: bellwether cube over R&D Expenses × Category, with rollup and drilldown]
The number in a cell is the error of the bellwether region for that subset of items
63
Problem of Naïve Cube Construction
  • A naïve bellwether cube construction algorithm
    will conduct a basic bellwether search for the
    subset of items in each cell
  • A basic bellwether search involves building a
    model for each candidate region
  • For each cell: build a model for each candidate
    region

64
Efficient Cube Construction
  • Idea: transform model construction into
    computation of distributive or algebraic
    aggregate functions
  • Let S1, ..., Sn partition S
  • S = S1 ∪ ... ∪ Sn and Si ∩ Sj = ∅
  • Distributive function: α(S) = F(α(S1), ...,
    α(Sn))
  • E.g., Count(S) = Sum(Count(S1), ..., Count(Sn))
  • Algebraic function: α(S) = F(G(S1), ..., G(Sn))
  • G(Si) returns a fixed-length vector of values
  • E.g., Avg(S) = F(G(S1), ..., G(Sn))
  • G(Si) = (Sum(Si), Count(Si))
  • F((a1, b1), ..., (an, bn)) = Sum(ai) / Sum(bi)
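A minimal sketch of the algebraic view of Avg, with G producing per-partition sufficient statistics and F combining them:

```python
# Sketch: Avg is algebraic -- each partition contributes the
# fixed-length vector G(Si) = (Sum(Si), Count(Si)), and F combines
# the vectors without revisiting the raw data.

def G(s):                     # per-partition sufficient statistics
    return (sum(s), len(s))

def F(stats):                 # combine: Sum(ai) / Sum(bi)
    total = sum(a for a, _ in stats)
    count = sum(b for _, b in stats)
    return total / count

S1, S2 = [1, 2, 3], [4, 5]
assert F([G(S1), G(S2)]) == sum(S1 + S2) / len(S1 + S2)
```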

65
Efficient Cube Construction
  • Build models for the finest-grained cells
  • For higher-level cells, use data cube computation
    techniques to compute the aggregate functions
  • For each finest-grained cell: build models to
    find the bellwether region
  • For each higher-level cell: compute aggregate
    functions to find the bellwether region

66
Efficient Cube Construction
  • Classification models
  • Use the prediction cube [Chen et al., '05]
    execution framework
  • Regression models (weighted linear regression;
    builds on work in [Chen, Dong, Han, Wah & Wang,
    VLDB '02])
  • Having the sum of squared errors (SSE) for each
    candidate region is sufficient to find the
    bellwether region
  • SSE(S) is an algebraic function, where S is a set
    of items
  • SSE(S) = q({g(Sk) : k = 1, ..., n})
  • S1, ..., Sn partition S
  • g(Sk) = (Ykᵀ·Wk·Yk, Xkᵀ·Wk·Xk, Xkᵀ·Wk·Yk)
  • q({(Ak, Bk, Ck) : k = 1, ..., n}) = Σk Ak −
    (Σk Ck)ᵀ (Σk Bk)⁻¹ (Σk Ck)

where Yk is the vector of target values for set Sk of items, Xk is the matrix of features for set Sk, and Wk is the weight matrix for set Sk
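In the one-regressor case the three statistics are scalars, which makes the decomposition easy to check in a few lines; the data and weights below are illustrative:

```python
# Sketch of the algebraic SSE decomposition for weighted linear
# regression, in the one-regressor case where Y'WY, X'WX, X'WY are
# scalars. g() is computed per partition; q() combines partitions.

def g(xs, ys, ws):            # (Y'WY, X'WX, X'WY) for one partition
    A = sum(w * y * y for w, y in zip(ws, ys))
    B = sum(w * x * x for w, x in zip(ws, xs))
    C = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    return (A, B, C)

def q(parts):                 # SSE = sum A - (sum C)^2 / (sum B)
    A = sum(p[0] for p in parts)
    B = sum(p[1] for p in parts)
    C = sum(p[2] for p in parts)
    return A - C * C / B

# Two partitions of a tiny dataset vs. the union computed directly.
x1, y1, w1 = [1.0, 2.0], [1.0, 2.5], [1.0, 1.0]
x2, y2, w2 = [3.0], [2.5], [1.0]
sse_merged = q([g(x1, y1, w1), g(x2, y2, w2)])
sse_direct = q([g(x1 + x2, y1 + y2, w1 + w2)])
assert abs(sse_merged - sse_direct) < 1e-9
```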
67
Experimental Results
68
Experimental Results Summary
  • We have shown the existence of bellwether regions
    on a real mail-order dataset
  • We characterize the behavior of bellwether trees
    and bellwether cubes using synthetic datasets
  • We show our computation techniques improve
    efficiency by orders of magnitude
  • We show our computation techniques scale linearly
    in the size of the dataset

69
Characteristics of Bellwether Trees Cubes
  • Result
  • Bellwether trees & cubes have better accuracy
    than basic bellwether search
  • Increasing noise → increasing error
  • Increasing complexity → increasing error
  • Dataset generation
  • Use a random tree to generate different
    bellwether regions for different subsets of items
  • Parameters
  • Noise
  • Concept complexity of tree nodes

(Settings: 15 nodes, noise level 0.5)
70
Efficiency Comparison
Naïve computation methods
Our computation techniques
71
Scalability
72
Exploratory Mining: Prediction Cubes
with Beechung Chen, Lei Chen, and Yi Lin
In VLDB '05
73
The Idea
  • Build OLAP data cubes in which cell values
    represent decision/prediction behavior
  • In effect, build a tree for each cell/region in
    the cube; observe that this is not the same as a
    collection of trees used in an ensemble method!
  • The idea is simple, but it leads to promising
    data mining tools
  • Ultimate objective: exploratory analysis of the
    entire space of data mining choices
  • Choice of algorithms, data conditioning
    parameters, ...

74
Example (1/7) Regular OLAP
Z: dimensions; Y: measure
Goal: look for patterns of unusually high numbers of applications
75
Example (2/7) Regular OLAP
Goal: look for patterns of unusually high numbers of applications
Z: dimensions; Y: measure; now at finer regions
76
Example (3/7) Decision Analysis
Goal: analyze a bank's loan decision process w.r.t. two dimensions: Location and Time
Fact table D, with Z: dimensions, X: predictors, Y: class
77
Example (3/7) Decision Analysis
  • Are there branches (and time windows) where
    approvals were closely tied to sensitive
    attributes (e.g., race)?
  • Suppose you partitioned the training data by
    location and time, chose the partition for a
    given branch and time window, and built a
    classifier. You could then ask, "Are the
    predictions of this classifier closely correlated
    with race?"
  • Are there branches and times with decision making
    reminiscent of 1950s Alabama?
  • Requires comparison of classifiers trained using
    different subsets of data.

78
Example (4/7) Prediction Cubes
  • Build a model using data from USA in Dec. 1985
  • Evaluate that model
  • Measure in a cell:
  • Accuracy of the model
  • Predictiveness of Race, measured based on that
    model
  • Similarity between that model and a given model

79
Example (5/7) Model-Similarity
Given: data table D, a target model h0(X), and a test set Δ without labels
"The loan decision process in USA during Dec '04 was similar to a discriminatory decision model"
80
Example (6/7) Predictiveness
Given: data table D, attributes V, and a test set Δ without labels
[Diagram: at level (Country, Month), build models h(X) and h(X − V) from D; the predictiveness of V is the gap between their performance on the test set Δ]
"Race was an important predictor of loan approval decision in USA during Dec '04"
81
Example (7/7) Prediction Cube
Cell value: predictiveness of Race
82
Efficient Computation
  • Reduce prediction cube computation to data cube
    computation
  • Represent a data-mining model as a distributive
    or algebraic (bottom-up computable) aggregate
    function, so that data-cube techniques can be
    directly applied

83
Bottom-Up Data Cube Computation
Cell values: numbers of loan applications
84
Functions on Sets
  • Bottom-up computable functions: functions that
    can be computed using only summary information
  • Distributive function: α(X) = F(α(X1), ...,
    α(Xn))
  • X = X1 ∪ ... ∪ Xn and Xi ∩ Xj = ∅
  • E.g., Count(X) = Sum(Count(X1), ..., Count(Xn))
  • Algebraic function: α(X) = F(G(X1), ..., G(Xn))
  • G(Xi) returns a fixed-length vector of values
  • E.g., Avg(X) = F(G(X1), ..., G(Xn))
  • G(Xi) = (Sum(Xi), Count(Xi))
  • F((s1, c1), ..., (sn, cn)) = Sum(si) / Sum(ci)
85
Scoring Function
  • Represent a model as a function of sets
  • Conceptually, a machine-learning model h(X;
    σZ(D)) is a scoring function Score(y, x; σZ(D))
    that gives each class y a score on test example x
  • h(x; σZ(D)) = argmax_y Score(y, x; σZ(D))
  • Score(y, x; σZ(D)) ≈ p(y | x, σZ(D))
  • σZ(D): the set of training examples (a cube
    subset of D)
86
Bottom-up Score Computation
  • Key observations
  • Observation 1: Score(y, x; σZ(D)) is a function
    of the cube subset σZ(D); if it is distributive or
    algebraic, bottom-up data cube computation
    techniques can be directly applied
  • Observation 2: having the scores for all the test
    examples and all the cells is sufficient to
    compute a prediction cube
  • Scores → predictions → cell values
  • Details depend on what each cell means (i.e., the
    type of prediction cube), but are straightforward

87
Machine-Learning Models
  • Naïve Bayes:
  • Scoring function: algebraic
  • Kernel-density-based classifier:
  • Scoring function: distributive
  • Decision tree, random forest:
  • Neither distributive nor algebraic
  • PBE: probability-based ensemble (new)
  • Makes any machine-learning model distributive
  • An approximation
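To see why Naïve Bayes scoring is algebraic: the per-partition statistics are counts, which merge by addition, and the score depends only on the merged counts. A sketch with illustrative data (not the paper's datasets):

```python
# Sketch: Naive Bayes sufficient statistics are class counts and
# (class, feature, value) counts. They merge by addition across
# partitions; the score is computed from the merged counts.
from collections import Counter

def stats(examples):          # G(Si): counts for one partition
    cls, joint = Counter(), Counter()
    for x, y in examples:
        cls[y] += 1
        for j, v in enumerate(x):
            joint[(y, j, v)] += 1
    return cls, joint

def merge(parts):             # F: add the count vectors
    cls, joint = Counter(), Counter()
    for c, j in parts:
        cls.update(c)
        joint.update(j)
    return cls, joint

def score(y, x, cls, joint):  # unnormalized p(y) * prod p(x_j | y)
    s = cls[y] / sum(cls.values())
    for j, v in enumerate(x):
        s *= joint[(y, j, v)] / cls[y]
    return s

part1 = [((0,), "yes"), ((1,), "no")]
part2 = [((0,), "yes")]
cls, joint = merge([stats(part1), stats(part2)])
assert score("yes", (0,), cls, joint) > score("no", (0,), cls, joint)
```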

88
Probability-Based Ensemble
PBE version of a decision tree on [WA, '85]
Decision tree on [WA, '85]
Decision trees built on the lowest-level cells
89
Probability-Based Ensemble
  • Scoring function:
  • h(y | x; bi(D)): model h's estimate of p(y | x,
    bi(D))
  • g(bi | x): a model that predicts the probability
    that x belongs to base subset bi(D)
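The PBE combination can be sketched as a gated mixture of base-cell models; the models and gate weights below are illustrative lookup functions, not the paper's trained models:

```python
# Sketch of the probability-based ensemble: combine base-cell models
# h_i with a gate g that predicts which base subset x came from:
#   Score(y | x) = sum_i g(b_i | x) * h_i(y | x)

def pbe_score(y, x, base_models, gate):
    return sum(gate(i, x) * h(y, x) for i, h in enumerate(base_models))

# Two base cells; the gate weights them 0.75 / 0.25 for this x.
h0 = lambda y, x: 0.9 if y == "approve" else 0.1
h1 = lambda y, x: 0.2 if y == "approve" else 0.8
gate = lambda i, x: 0.75 if i == 0 else 0.25

s = pbe_score("approve", None, [h0, h1], gate)
assert abs(s - (0.75 * 0.9 + 0.25 * 0.2)) < 1e-9
```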

90
Outline
  • Motivating example
  • Definition of prediction cubes
  • Efficient prediction cube materialization
  • Experimental results
  • Conclusion

91
Experiments
  • Quality of PBE on 8 UCI datasets
  • The quality of the PBE version of a model is
    slightly worse (0-6%) than that of the model
    trained directly on the whole training data
  • Efficiency of the bottom-up score computation
    technique
  • Case study on demographic data

92
Efficiency of Bottom-up Score Computation
  • Machine-learning models
  • J48: J48 decision tree
  • RF: random forest
  • NB: Naïve Bayes
  • KDC: kernel-density-based classifier
  • Bottom-up method (PBE-J48, PBE-RF, NB, KDC) vs.
    exhaustive method (J48ex, RFex, NBex, KDCex)

93
Synthetic Dataset
  • Dimensions: Z1, Z2, and Z3
  • Decision rule [shown as a diagram over Z1, Z2,
    and Z3]
94
Efficiency Comparison
[Plot: execution time (sec) vs. number of records, comparing the exhaustive method with bottom-up score computation]
95
Conclusions
96
Related Work Building Models on OLAP Results
  • Multi-dimensional regression [Chen, VLDB '02]
  • Goal: detect changes of trends
  • Build linear regression models for cube cells
  • Step-by-step regression in stream cubes [Liu,
    PAKDD '03]
  • Loglinear-based quasi cubes [Barbara, J. IIS '01]
  • Use a loglinear model to approximately compress
    dense regions of a data cube
  • NetCube [Margaritis, VLDB '01]
  • Build Bayes nets over the dataset to
    approximately answer count queries

97
Related Work (Contd.)
  • Cubegrades [Imielinski, J. DMKD '02]
  • Extend cubes with ideas from association rules
  • How does the measure change when we roll up or
    drill down?
  • Constrained gradients [Dong, VLDB '01]
  • Find pairs of similar cell characteristics
    associated with big changes in measure
  • User-cognizant multidimensional analysis
    [Sarawagi, VLDBJ '01]
  • Help users find the most informative unvisited
    regions in a data cube using the max-entropy
    principle
  • Multi-structural DBs [Fagin et al., PODS '05,
    VLDB '05]
  • Experiment databases: "Towards an Improved
    Experimental Methodology in Machine Learning"
    [Blockeel & Vanschoren, PKDD 2007]

98
Take-Home Messages
  • A promising exploratory data analysis paradigm
  • Can use models to identify interesting subsets
  • Concentrate only on subsets in cube space
  • These are meaningful, tractable subsets
  • Precompute results and provide users with an
    interactive tool
  • A simple way to plug something into cube-style
    analysis:
  • Try to describe/approximate it by a
    distributive or algebraic function

99
Big Picture
  • Why stop with decision behavior? Can apply to
    other kinds of analyses too
  • Why stop at browsing? Can mine prediction cubes
    in their own right
  • Exploratory analysis of mining space
  • Dimension attributes can be parameters related to
    algorithm, data conditioning, etc.
  • Tractable evaluation is a challenge
  • Large number of dimensions, real-valued
    dimension attributes, difficulties in
    compositional evaluation
  • Active learning for experiment design, extending
    compositional methods

100
Conclusion
  • Hierarchies are widely used, and a promising tool
    to help us deal with
  • Data sparsity
  • Data imprecision and uncertainty
  • Exploratory analysis
  • Experiment planning and management
  • Area is as yet under-appreciated
  • Lots of work on taxonomies and how to use them,
    but there are many novel ways of using them that
    have not received enough attention