Title: Parameter Related Domain Knowledge for Learning in Bayesian Networks
1. Parameter Related Domain Knowledge for Learning in Bayesian Networks
Stefan Niculescu, PhD Candidate, Carnegie Mellon University
Joint work with Professor Tom Mitchell and Dr. Bharat Rao
April 2005
2. Domain Knowledge
- In the real world, data is often too sparse to allow building an accurate model
- Domain knowledge can help alleviate this problem
- Several types of domain knowledge
- Relevance of variables (feature selection)
- Conditional Independences among variables
- Parameter Domain Knowledge
3. Parameter Domain Knowledge
- A Bayes Net for a real-world domain:
  - can have a huge number of parameters
  - often there is not enough data to estimate them accurately
- Parameter Domain Knowledge constraints:
  - reduce the number of parameters to estimate
  - reduce the variance of the parameter estimates
4. Outline
- Motivation
- Parameter Related Domain Knowledge
- Experiments
- Related Work
- Summary / Future Work
5. Parameters and Counts
Theorem. The Maximum Likelihood estimators are given by

  theta_ijk = N_ijk / (sum over k' of N_ijk')

where theta_ijk is the CPT entry for variable Xi taking value k given parent configuration j, and N_ijk is the number of observations with that joint configuration.
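The counting behind this estimator can be sketched in code (an illustrative sketch, not from the slides; the function and variable names are mine):

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Maximum Likelihood CPT: theta_ijk = N_ijk / sum_k' N_ijk'.

    data: list of dicts mapping variable name -> observed value.
    Returns {(parent_config, child_value): probability}.
    """
    joint = Counter()     # N_ijk: counts of (parent config, child value)
    marginal = Counter()  # N_ij:  counts of each parent config alone
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        marginal[cfg] += 1
    # Normalize each parent configuration's counts into a distribution.
    return {key: n / marginal[key[0]] for key, n in joint.items()}
```

With sparse data, many parent configurations receive few counts, which is exactly the variance problem that the domain-knowledge constraints in this talk aim to reduce.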
6. Parameter Sharing
Theorem. The Maximum Likelihood estimators are given by pooling: each shared parameter is estimated from the aggregated counts of all CPT positions that share it, normalized so that each distribution still sums to one.
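A minimal sketch of this pooled-count estimator for sharing within a single distribution (my reconstruction for illustration; the notation is not from the slides):

```python
from collections import Counter

def shared_mle(counts, groups):
    """MLE under parameter sharing within one distribution.

    counts: dict value -> observed count N_k.
    groups: list of lists; values in the same group share one parameter.
    Each shared parameter gets (sum of group counts) / (group size * N),
    so the estimated probabilities still sum to 1.
    """
    total = sum(counts.values())
    theta = {}
    for group in groups:
        pooled = sum(counts.get(v, 0) for v in group)
        for v in group:
            theta[v] = pooled / (len(group) * total)
    return theta
```

Pooling spreads the evidence for a shared parameter over all positions that carry it, which is why sharing lowers the variance of the estimates when data is sparse.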
7. Incomplete Data, Frequentist
8. Dependent Dirichlet Priors
9. Bayesian Averaging
10. Hierarchical Parameter Sharing
11. Probability Mass Sharing
Domain Knowledge: parameters of a given color have the same sum across all distributions.
12. Probability Ratio Sharing
Domain Knowledge: parameters of a given color preserve their relative ratios across all distributions.
13. Where are we right now?
14. Outline
- Motivation
- Parameter Related Domain Knowledge
- Experiments
- Related Work
- Summary / Future Work
15. Datasets
- Project World (CALO)
  - 6 persons, 200 emails
  - Manually labeled as About / Not About Meetings
  - Data: (Person, Email, Topic)
- Artificial Datasets
  - Kept most of the characteristics of the data, BUT new emails were generated where the frequencies of certain words were shared across users
- Purpose
  - Domain Knowledge readily available
  - To be able to study the effect of training set size (up to 5000)
  - To be able to compare our estimated distribution to the true distribution
16. Approach
- Can model Email using a Naive Bayes model
  - Without Parameter Sharing (PSNB)
  - With Parameter Sharing (SSNB)
- Also compare with a model that assumes the sender is irrelevant (GNB): the frequencies of words within a topic are learnt from all examples
[Figure: the two Naive Bayes network structures over Sender, Topic, and Word]
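One way to read the difference between the models (an illustrative sketch in Python; the function and variable names are mine, not from the talk): a sender-specific model estimates P(Word | Topic, Sender) separately per sender, while a fully shared model pools counts across senders within each topic.

```python
from collections import Counter, defaultdict

def word_given_topic(emails, share_across_senders):
    """Estimate word frequencies from (sender, topic, word) triples.

    share_across_senders=False -> one table per (sender, topic), as in a
    sender-specific Naive Bayes.
    share_across_senders=True  -> one pooled table per topic, the fully
    shared case where word frequencies are learnt from all examples.
    Illustrative reconstruction, not the authors' code.
    """
    counts = defaultdict(Counter)
    for sender, topic, word in emails:
        key = topic if share_across_senders else (sender, topic)
        counts[key][word] += 1
    # Normalize each table into a conditional distribution over words.
    return {key: {w: n / sum(c.values()) for w, n in c.items()}
            for key, c in counts.items()}
```

With few emails per sender, the per-sender tables are noisy; pooling the counts of words whose frequencies are known to be shared is what gives SSNB its advantage in the sparse-data regime reported on the next slide.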
17. Effect of Training Set Size
- As expected:
  - SSNB performs better than both other models
  - SSNB and PSNB tend to perform similarly as the training set grows, but SSNB is much better when data is sparse
18. Outline
- Motivation
- Parameter Related Domain Knowledge
- Experiments
- Related Work
- Summary / Future Work
19. Dirichlet Priors in a Bayes Net
The Domain Expert specifies an assignment of parameters (the Prior Belief), but leaves room for some error (the Spread).
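For context, a standard result (stated here as background, since the slide's formula is not reproduced): with a Dirichlet prior over a multinomial parameter vector, the MAP estimate after observing counts $N_k$ is

```latex
\hat{\theta}_k \;=\; \frac{N_k + \alpha_k - 1}{\sum_{k'} \left( N_{k'} + \alpha_{k'} - 1 \right)}
```

The prior mean $\alpha_k / \sum_{k'} \alpha_{k'}$ plays the role of the expert's prior belief, while the magnitude $\sum_{k'} \alpha_{k'}$ controls the spread: the larger the sum, the more tightly the prior concentrates around that belief.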
20. HMMs and DBNs
21. Module Networks
- In a Module:
  - Same parents
  - Same CPTs
Image from "Learning Module Networks" by Eran Segal and Daphne Koller
22. Context-Specific Independence
[Figure: network with Burglary, Set, and Alarm illustrating context-specific independence]
23. Outline
- Motivation
- Parameter Related Domain Knowledge
- Experiments
- Related Work
- Summary / Future Work
24. Summary
- Parameter Related Domain Knowledge is needed when data is scarce
- Developed methods to estimate parameters
  - For each of the four types of Domain Knowledge presented
  - From both complete and incomplete data
- Markov Models, Module Networks, and Context-Specific Independence are particular cases of our parameter sharing domain knowledge
- Models using Parameter Sharing performed better than two classical Bayes Nets on synthetic data
25. Future Work
- Automatically find Shared Parameters
- Study interactions among different types of Domain Knowledge
- Incorporate Domain Knowledge about continuous variables
- Investigate Domain Knowledge in the form of inequality constraints
26. Questions?
27. The End
28. Backup Slides
29. Hierarchical Parameter Sharing
30. Full Data Observability, Frequentist
31. Probability Mass Sharing
- Want to model P(Word | Language)
- Two languages: English and Spanish
  - Different sets of words
- Domain Knowledge:
  - Aggregate probability mass of nouns is the same in both
  - The same holds for adjectives, verbs, etc.
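Written out for this example, the mass sharing constraint on nouns reads (my transcription of the stated Domain Knowledge):

```latex
\sum_{w \in \text{Nouns}} P(w \mid \text{English}) \;=\; \sum_{w \in \text{Nouns}} P(w \mid \text{Spanish})
```

and analogous equalities hold for the adjective and verb groups.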
32. Probability Mass Sharing
33. Full Data Observability, Frequentist
34. Probability Ratio Sharing
- Want to model P(Word | Language)
- Two languages: English and Spanish
  - Different sets of words
- Domain Knowledge:
  - Word groups, e.g. about computers: computer, mouse, monitor, etc.
  - Relative frequency of "computer" to "mouse" is the same in both languages
  - Aggregate mass can be different
[Figure: word groups T1 (Computer Words) and T2 (Business Words)]
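For this example, the ratio sharing constraint reads (my transcription of the stated Domain Knowledge):

```latex
\frac{P(\text{computer} \mid \text{English})}{P(\text{mouse} \mid \text{English})}
\;=\;
\frac{P(\text{computer} \mid \text{Spanish})}{P(\text{mouse} \mid \text{Spanish})}
```

Unlike mass sharing, the total probability assigned to the computer-words group may differ between the two languages; only the proportions within the group are tied.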
35. Probability Ratio Sharing
36. Full Data Observability, Frequentist