Title: Incremental Integration of Probabilistic Models Learned from Data
1. Incremental Integration of Probabilistic Models Learned from Data
- Jian Xu (Louisiana State U)
- Pedrito Maynard-Zhang (Amazon.com)
- Jianhua Chen (Louisiana State U)
2. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
3. Motivating Scenario
- A doctor diagnoses from symptoms, history, and test results, drawing on the doctor's own knowledge of the domain
- Expert i's knowledge arrives at time ti, expert j's at time tj, and expert k's knowledge is en route
- Knowledge is represented as probability models learned from data
- Goal: an integrated model for diagnosis
4. Incremental Integration Problem
- At each time ti, a learning algorithm produces BNi from that source's data
- An integration algorithm combines BN1, ..., BNn into an aggregate BN
- The ideal baseline would run the learning algorithm on all M samples generated from the true BN to obtain the optimal BN, but this is not possible in practice
5. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
6. MC Batch Integration
- Maynard-Reid and Chajewska (2001) show that if sources learn joint distributions using MLE, then LinOP (the linear opinion pool) is the correct integration rule, with each source weighted by the fraction of the data it saw
- For BN integration, they adapt the MDL learning algorithm
- Use LinOP to approximate the needed statistics, since the data itself is unavailable
- Use the estimated fraction of data expert i saw as its weight ai
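The LinOP rule above can be sketched in a few lines. This is a minimal illustration over full joint distributions represented as dicts on a shared outcome space; the representation and all names are illustrative, not from the paper.

```python
# LinOP (linear opinion pool): a weighted mixture of distributions,
# with each source weighted by the fraction of data it saw.

def linop(joints, sample_counts):
    """Pool joint distributions; weights are proportional to sample counts."""
    total = sum(sample_counts)
    return {
        x: sum((m / total) * p[x] for p, m in zip(joints, sample_counts))
        for x in joints[0]
    }

# Two sources over one binary variable, based on 300 and 100 samples.
p1 = {"T": 0.9, "F": 0.1}
p2 = {"T": 0.5, "F": 0.5}
pooled = linop([p1, p2], [300, 100])  # weights 0.75 and 0.25
# pooled["T"] == 0.75 * 0.9 + 0.25 * 0.5 == 0.8
```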
7. MC Batch Integration Algorithm
- Select the BN most likely to have generated the data, using MDL and LinOP
- Search over structures by adding, deleting, and reversing edges
- Score candidates using LinOP-based MDL
- Parameterize the selected structure using LinOP
- Use random restarts to avoid getting stuck in a local maximum
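The search loop above can be sketched as greedy hill climbing with random restarts. This is a toy sketch: the `score` callable stands in for the LinOP-based MDL score (not implemented here), acyclicity checks are omitted, and all names are illustrative.

```python
import itertools
import random

def neighbors(edges, nodes):
    """Candidate structures one edit away: add, delete, or reverse an edge."""
    for u, v in itertools.permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                # delete
            yield (edges - {(u, v)}) | {(v, u)}   # reverse
        elif (v, u) not in edges:
            yield edges | {(u, v)}                # add

def hill_climb(nodes, score, restarts=5, seed=0):
    """Greedy ascent from several random start points; returns the best
    edge set found and its score."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = set()
        if rng.random() < 0.5 and len(nodes) >= 2:  # randomized start
            u, v = rng.sample(nodes, 2)
            current = {(u, v)}
        cur_score = score(current)
        improved = True
        while improved:
            improved = False
            for cand in neighbors(current, nodes):
                s = score(cand)
                if s > cur_score:
                    current, cur_score, improved = cand, s, True
        if cur_score > best_score:
            best, best_score = current, cur_score
    return best, best_score

# Toy score that rewards the edge ("A", "B") and penalizes extra edges.
toy = lambda e: (("A", "B") in e) - 0.1 * len(e)
structure, s = hill_climb(["A", "B", "C"], toy)  # finds {("A", "B")}
```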
8. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
9. Batch-Based Strawman 1
- Algorithm
- Wait for all models to arrive
- Apply batch algorithm
- Drawbacks
- Must store all models
- Can do nothing while waiting for models
- May not be able to tell when all models have arrived
- Models may never stop arriving (e.g., periodic reports)
10. Batch-Based Strawman 2
- Algorithm
- Store each model that arrives
- Apply the batch algorithm to all stored models after each new arrival
- Drawbacks
- Must store all models
- Roughly O(i) time to add the ith model, so O(n²) total for n models
11. Incremental Integration Algorithm
- Integrate the first group of sources to arrive, and treat this intermediate result as an aggregate source BN
- Assign the aggregate source a new weight by setting its sample count to the sum of the sample counts of all sources integrated so far
- When new BNs arrive, integrate them with the current aggregate BN
12. Source Definition
- Tuple ⟨p, M, ρ, ess⟩ where
- p: BN representing the source's beliefs
- M: number of samples the distribution is based on
- ρ, ess: parameters defining the prior over the space of distributions
- ρ: prior over the sample space
- ess: number of "virtual" samples the distribution-space prior is based on
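A minimal container for this tuple might look as follows. The field names are illustrative (the symbol for the sample-space prior is written `rho` here); in a real implementation `p` would be a Bayesian network object.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Source:
    """One source's model plus the bookkeeping the integrator needs."""
    p: Any        # BN (or joint distribution) representing the source's beliefs
    M: int        # number of real samples the distribution is based on
    rho: Any      # prior over the sample space
    ess: float    # "virtual" sample count behind the distribution-space prior

s = Source(p={"T": 0.6, "F": 0.4}, M=500, rho=None, ess=10.0)
```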
13. Incremental Integration Algorithm
1. DM ← ⟨pD, 0, ρD, essD⟩
2. loop
   (a) Wait until a new group g of sources Sg = {S1, ..., Skg} arrives, with associated weights and cumulative estimated sample size Mg
   (b) DM ← ⟨pD, MD, ρD, essD⟩ where
       - pD ← the integration of pD and Sg using the batch integration algorithm, with MD/(MD + Mg) as the aggregated source's weight, and
       - MD ← MD + Mg
3. until no new sources arrive
4. return DM
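For full joint distributions the loop above reduces to repeated weighted pooling, with the aggregate weighted MD/(MD + Mg) at each step. The sketch below uses LinOP in place of the BN batch-integration step; the representation and names are illustrative. Note the incremental result matches a batch LinOP over all sources, which is the order-independence property for joints.

```python
# Incrementally fold source groups into an aggregate joint distribution.

def integrate_group(agg_p, agg_M, group):
    """Fold a group of (joint, sample_count) sources into the aggregate,
    giving the aggregate weight M_D / (M_D + M_g) as in step 2(b)."""
    M_g = sum(m for _, m in group)
    total = agg_M + M_g
    pooled = {
        x: (agg_M / total) * agg_p[x] + sum((m / total) * p[x] for p, m in group)
        for x in agg_p
    }
    return pooled, total

sources = [({"T": 0.8, "F": 0.2}, 100),
           ({"T": 0.4, "F": 0.6}, 300),
           ({"T": 0.5, "F": 0.5}, 100)]

# Integrate one source at a time, as in the extreme case the paper tests.
agg_p, agg_M = sources[0]
for src in sources[1:]:
    agg_p, agg_M = integrate_group(agg_p, agg_M, [src])

# Batch check: (100*0.8 + 300*0.4 + 100*0.5) / 500 == 0.5 for outcome "T".
```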
14. Justification
- We show the algorithm is order-independent when applied to joint distributions
- The order-independence property holds only approximately for BNs
- The approximation is due to generalization and the greedy optimization search
15. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
16. Structure of Asia BN
[Diagram: the Asia network over eight nodes: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, Dyspnea]
17. Pro: Performance
- Anytime response
- The most up-to-date aggregate model is always available
- Efficient integration
- Typically, fewer sources are involved in each iteration
- Total integration time is O(n) for n sources
- Idle time utilization
- Can take advantage of wait times to do integration, reducing the total wall-clock time for integration
- Space saving
- Space is required only for the current aggregate model and arriving sources
18. Time Comparison
Comparing the total integration time of the incremental and batch integration algorithms as the number of sources increases from 1 to 15, for fixed source sizes of 20–1000
19. Pro: Accuracy
- Incremental integration accuracy is relatively close to batch integration accuracy
- The difference is introduced by local optima in the search space
- The difference generally decreases with larger source sizes
20. Accuracy Comparison
Comparing incremental, batch, and source accuracy
over time when incrementally combining sources of
size 50
21. Con: Bias and Inertia
- Bias is introduced via local optima in the search space
- Inertia: incoming sources with small weights are unable to change the aggregate significantly after a point
- Inertia cuts both ways: bias in the aggregate can be countered and held at bay by accurate sources with relatively large weights
22. Bias and Inertia
Effect of a highly weighted, inaccurate source arriving early (third) among 10 lower-weight, higher-accuracy sources
23. Con: Sensitivity to Order
- Different source orderings can produce markedly different results, even for same-size or same-weight sources
- The accuracy of the sources also matters: less accurate sources can introduce bias, which is then subject to the inertia effect
- Source ordering vs. bias tradeoff
- If bad sources arrive early, the bias they introduce is easier to undo, but also easier to introduce in the first place
- If bad sources arrive late, they are less likely to introduce bias, but any bias they do introduce is more difficult to undo
24. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
25. BN Subtraction
- Scenarios
- Incorporating updates
- De-duplicating shared BNs
- Algorithm: the incremental integration algorithm, but using negative weights for the BNs to remove
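On joint distributions, the negative-weight idea can be illustrated directly: a source folded in with weight M can be "un-mixed" later by folding it in again with weight -M. The sketch below uses LinOP as the integration step; all names are illustrative.

```python
# Subtract a previously integrated source by pooling it back in with a
# negative sample count.

def linop(joints_and_counts):
    """Weighted pool over (joint, sample_count) pairs; counts may be negative."""
    total = sum(m for _, m in joints_and_counts)
    first = joints_and_counts[0][0]
    return {x: sum((m / total) * p[x] for p, m in joints_and_counts)
            for x in first}, total

p1, M1 = {"T": 0.9, "F": 0.1}, 100
p2, M2 = {"T": 0.5, "F": 0.5}, 300

agg, M = linop([(p1, M1), (p2, M2)])          # combined model, M == 400
restored, M_r = linop([(agg, M), (p2, -M2)])  # remove p2 again
# `restored` equals p1 up to floating point, and M_r == M1.
```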
26. Outline
- Incremental integration problem
- Existing batch integration approach
- BN incremental integration
- Pros, cons, and experiments
- BN subtraction
- Conclusion and future work
27. Conclusion
- The incremental algorithm supports anytime querying, utilizes idle time, and saves space
- The result of incremental integration of joint distributions is independent of the source order
- Experiments show that the BN integration result depends on the source order to a degree, mainly due to bias introduced by greedy optimization and maintained by an inertial effect
- The reduction in accuracy of the incremental algorithm may be acceptable
28. Future Work
- Optimally grouping sources to minimize the total integration time (we only explored the extreme of integrating one source at a time)
- Reducing the high computation cost caused by heavy reliance on BN inference
- Seek faster inference algorithms, e.g., approximate inference
- Organize the sources into a hierarchical integration tree, which allows parallel, distributed integration
- Subtraction experiments
- Detecting shared sources
29. Acknowledgment
- Work partially supported by
- NSF grant ITR-0326387
- AFOSR grants FA9550-05-1-0454, F49620-03-1-0238,
F49620-03-1-0239, and F49620-03-1-0241
30. Thank you!