KDD Group Presentation

Real Time Bayesian Networks Inference

Wednesday, March 14, 2001
Haipeng Guo
KDD Research Group
Department of Computing and Information Sciences
Kansas State University

Presentation Outline

- Bayesian Networks Introduction
- Bayesian Networks Inference Algorithms Review
- Real Time Related Issues
- A Distributed Anytime Architecture for Probabilistic Reasoning, from the Santos paper (Santos 1995)
- Summary

Bayesian Networks Introduction

- Definition
- Why is it important?
- Examples
- Applications

Bayesian Networks

- Bayesian networks, also called Bayesian belief networks (BBNs), causal networks, or probabilistic networks, are a network-based framework for representing and analyzing causal models involving uncertainty.
- A BBN is a directed acyclic graph (DAG) with conditional probabilities for each node.
- Nodes represent random variables in a problem domain.
- Arcs represent conditional dependence relationships among these variables.
- Each node contains a CPT (conditional probability table) that gives the probabilities of this node taking specific values given the values of its parent nodes.
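
The definition above (a DAG plus one CPT per node) can be made concrete with a minimal sketch. The three-node network, its variable names, and every probability below are made-up illustrative assumptions, not values from the slides. The sketch shows how the joint probability of a complete assignment factors into one CPT lookup per node.

```python
# Hypothetical network: Rain -> WetGrass <- Sprinkler (all variables Boolean).
# Every number below is an illustrative assumption.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# Each CPT maps a tuple of parent values to P(node = True | parents).
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.05,
    },
}


def prob(node, value, assignment):
    """P(node = value | values of its parents taken from assignment)."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true


def joint(assignment):
    """Joint probability of a complete assignment: product of CPT entries."""
    result = 1.0
    for node in parents:
        result *= prob(node, assignment[node], assignment)
    return result


example = {"Rain": True, "Sprinkler": False, "WetGrass": True}
print(joint(example))  # product 0.2 * 0.9 * 0.9
```

Storing only P(node = True | parents) is a shortcut that works for Boolean variables; multi-valued variables would need a full row per parent configuration.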

Family-Out Example

- "Suppose when I go home at night, I want to know if my family is home before I try the doors. (Perhaps the most convenient door to enter is double locked when nobody is home.) Now, often when my wife leaves the house, she turns on an outdoor light. However, she sometimes turns on the lights if she is expecting a guest. Also, we have a dog. When nobody is home, the dog is put in the back yard. The same is true if the dog has bowel problems. Finally, if the dog is in the back yard, I will probably hear her barking (or what I think is her barking), but sometimes I can be confused by other dogs."

Asia Example from Medical Diagnostics

Why is BBN important?

- Offers a compact, intuitive, and efficient graphical representation of dependence relations between entities of a problem domain (models the world in a more natural way than rule-based systems and neural networks).
- Handles uncertain knowledge in a mathematically rigorous yet efficient and simple way.
- Provides a computational architecture for computing the impact of evidence nodes on the beliefs (probabilities) of query nodes of interest.
- Growing number of creative applications.

Alarm Example: the power of BBN

- The Alarm network
- 37 variables, 509 parameters (instead of 2^37)
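
The saving quoted above comes from storing one small table per node instead of the full joint distribution. The sketch below counts free parameters both ways. Since the Alarm structure itself is not reproduced in these slides, it uses a hypothetical chain of 37 binary variables as a stand-in, so the factored count differs from the 509 quoted for Alarm.

```python
def cpt_parameters(cardinality, parents):
    """Free parameters of a factored model: for every node,
    (number of parent configurations) * (node cardinality - 1)."""
    total = 0
    for node, pars in parents.items():
        rows = 1
        for p in pars:
            rows *= cardinality[p]
        total += rows * (cardinality[node] - 1)
    return total


def full_joint_parameters(cardinality):
    """Free parameters of the unfactored joint distribution."""
    size = 1
    for k in cardinality.values():
        size *= k
    return size - 1


# Hypothetical stand-in structure: a chain X0 -> X1 -> ... -> X36.
chain_parents = {f"X{i}": ((f"X{i - 1}",) if i > 0 else ()) for i in range(37)}
chain_card = {name: 2 for name in chain_parents}

print(cpt_parameters(chain_card, chain_parents))  # 1 + 36 * 2 = 73
print(full_joint_parameters(chain_card))          # 2**37 - 1
```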

Applications

- Medical diagnostic systems
- Real-time weapons scheduling
- Jet-engines fault diagnosis
- Intel processor fault diagnosis (Intel)
- Generator monitoring expert system (General Electric)
- Software troubleshooting (Microsoft Office Assistant, Win98 print troubleshooting)
- Space shuttle engine monitoring (Vista project)
- Biological sequences analysis and classification

Bayesian Networks Inference

- Given observed evidence, do some computation to answer queries.
- An evidence e is an assignment of values to a set of variables E in the domain, E = {Xk+1, ..., Xn}.
  - For example, E = e: {Visit Asia = True, Smoke = True}
- Queries
  - Posterior belief: compute the conditional probability of a variable given the evidence, e.g. P(Lung Cancer | Visit Asia = TRUE AND Smoke = TRUE) = ? This kind of inference task is called belief updating.
  - MPE: compute the most probable explanation given the evidence. An explanation for the evidence is a complete assignment {X1 = x1, ..., Xn = xn} that is consistent with the evidence. Computing an MPE means finding an explanation such that no other explanation has higher probability. This kind of inference task is called belief revision.

Belief Updating

- The problem is to compute P(X = x | E = e), the probability of query nodes X given the observed values of evidence nodes E = e.
- For example, suppose that a patient arrives and it is known for certain that he has recently visited Asia and has dyspnea.
  - What is the impact of this evidence on the probabilities of the other variables in the network? P(Lung Cancer) = ?

[Figure: the Asia network, with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, tub. or lung cancer, X-Ray, and Dyspnea]
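
Belief updating can be illustrated by brute-force enumeration on a tiny network. The sketch below is not the Asia network: it uses a hypothetical three-node network with made-up probabilities, and it simply sums the joint distribution over all worlds consistent with the evidence. That is exponential in the number of variables, which is why the algorithms reviewed later matter.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler; numbers are made up.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.05},
}


def joint(world):
    """Joint probability of a complete assignment (product of CPT entries)."""
    result = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(world[p] for p in pars)]
        result *= p_true if world[node] else 1.0 - p_true
    return result


def posterior(query, evidence):
    """P(query = True | evidence), summing the joint over consistent worlds."""
    nodes = list(parents)
    numer = denom = 0.0
    for values in product([True, False], repeat=len(nodes)):
        world = dict(zip(nodes, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(world)
        denom += p
        if world[query]:
            numer += p
    return numer / denom


print(posterior("Rain", {"WetGrass": True}))
```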

Belief Revision

Let W be the set of all nodes in our given Bayesian network. Let the evidence e be the observation that the roses are okay. Our goal is now to determine the assignment w to all nodes which maximizes P(w | e).

We only need to consider assignments where the node roses is set to okay and maximize P(w), i.e. find the most likely state of the world given the evidence that the roses are okay in this world.

The best solution then becomes:

P(sprinklers = F, rain = T, street = wet, lawn = wet, soil = wet, roses = okay) = 0.2646
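
Belief revision can be sketched the same brute-force way: enumerate every complete assignment that agrees with the evidence and keep the one with the highest joint probability. The network and numbers below are hypothetical (the CPTs of the roses example are not given in the slides), so the result is not the 0.2646 quoted above.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler; numbers are made up.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.05},
}


def joint(world):
    """Joint probability of a complete assignment (product of CPT entries)."""
    result = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(world[p] for p in pars)]
        result *= p_true if world[node] else 1.0 - p_true
    return result


def most_probable_explanation(evidence):
    """Return (world, P(world)) maximizing the joint among worlds
    consistent with the evidence. Maximizing P(w) over consistent worlds
    also maximizes P(w | e), since P(e) is a constant."""
    nodes = list(parents)
    best_world, best_p = None, -1.0
    for values in product([True, False], repeat=len(nodes)):
        world = dict(zip(nodes, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(world)
        if p > best_p:
            best_world, best_p = world, p
    return best_world, best_p


mpe, p = most_probable_explanation({"WetGrass": True})
print(mpe, p)
```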

Complexity of BBN Inference

- Probabilistic inference using belief networks is NP-hard (Cooper 1990).
- Approximating probabilistic inference in Bayesian belief networks is NP-hard (Dagum 1993).
- Hardness does not mean we cannot solve inference. It implies that:
  - We cannot find a general procedure that works efficiently for all networks.
  - However, for particular families of networks, we can have provably efficient algorithms, either exact or approximate.
- Instead of a general exact algorithm, we look for special-case, average-case, and approximate algorithms.
- A variety of approximate, heuristic, hybrid, and special-case algorithms should be taken into consideration.

BBN Inference Algorithms

- Exact algorithms
  - Pearl's message propagation algorithm (for singly connected networks only)
  - Variable elimination
  - Cutset conditioning
  - Clique tree clustering
  - SPI (Symbolic Probabilistic Inference)
- Approximate algorithms
  - Partial evaluation methods: perform exact inference partially
  - Variational approach: exploit averaging phenomena in dense networks (law of large numbers)
  - Search-based algorithms: convert the inference problem to an optimization problem, then use heuristic search to solve it
  - Stochastic sampling, also called Monte Carlo algorithms

PolyTree

- Singly connected networks (or polytrees)

Definition: a directed acyclic graph (DAG) in which at most one undirected path exists between any two nodes.

[Figure: example graphs labeled "Multiple parents and/or multiple children", "Polytree structure satisfies definition", and "Do not satisfy definition"]
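
The polytree definition can be checked mechanically: ignoring arc directions, the graph must contain no cycle. A minimal sketch using union-find follows; it assumes the input is already a DAG with no duplicate arcs, and the two example graphs are hypothetical.

```python
def is_polytree(nodes, arcs):
    """True if at most one undirected path exists between any two nodes,
    i.e. the undirected skeleton of the DAG is a forest."""
    root = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the component representative,
        # compressing the path along the way.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v were already connected: a second path
        root[ru] = rv
    return True


# Node C has multiple parents and one child, yet no undirected cycle.
tree_like = [("A", "C"), ("B", "C"), ("C", "D")]
# Diamond: two undirected paths from A to D.
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

print(is_polytree("ABCD", tree_like))  # True
print(is_polytree("ABCD", diamond))    # False
```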

Propagation Algorithm Objective

- The algorithm's purpose is fusing and propagating the impact of new evidence and beliefs through Bayesian networks so that each proposition eventually will be assigned a certainty measure consistent with the axioms of probability theory. (Pearl, 1988, p 143)

PolyTree Propagation Example

The impact of each new piece of evidence is viewed as a perturbation that propagates through the network via message-passing between neighboring variables . . . (Pearl, 1988, p 143)

λ: message to parent

π: message from parent

- Exact algorithm, for polytrees only, linear in the size of the network

Cutset Conditioning Algorithm

- Transform the network into several simpler polytrees by conditioning on the cutset, then call the polytree propagation algorithm on each. Each simpler network has one or more variables instantiated to a definite value. P(X | E) is computed as a weighted average over the values computed by each polytree (Pearl 1988).
- A cutset is a set of nodes that, when instantiated, renders the network singly connected.
- First exact algorithm for multiply connected networks; time complexity is exponential in the size of the cutset.
- There are exponentially many such cutset instantiations.
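
The weighted average used by cutset conditioning can be written out explicitly. The notation is an assumption introduced here, not taken from the slides: C is the set of cutset variables, c ranges over its instantiations, and e is the evidence.

```latex
P(X \mid e) \;=\; \sum_{c} P(X \mid c, e)\, P(c \mid e)
```

Each term P(X | c, e) is computed by polytree propagation on the network with C fixed to c, and the weights P(c | e) sum to one. The number of terms grows exponentially with the number of cutset variables.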

Clique Tree Clustering Algorithm

- Transform the network into a tree of cliques, then compute probabilities for the cliques during two-way message passing; the individual node probabilities P(X | E) are calculated from the probabilities of the cliques.
- A clique W of G is a maximal complete subset of G, that is, there is no other complete subset of G which properly contains W.
- The most commonly used exact inference algorithm for general networks.
- Efficient for sparse networks, but can perform very badly on more general, dense networks.
- Exact, for multiply connected networks, exponential time complexity in the size of the network

Clique tree clustering

[Figure: processing pipeline. Moralization, Triangulation, Identify Cliques, Form Clique Tree, λ, π message passing, P(Clq_i) and P(X | E)]
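
The first step of clique tree clustering, moralization, is simple enough to sketch: connect ("marry") every pair of parents that share a child, then drop the arc directions. The example network below is hypothetical. The later steps (triangulation, clique identification, tree construction, message passing) are not shown.

```python
from itertools import combinations


def moralize(parents):
    """Return the moral graph of a DAG as a set of undirected edges.

    `parents` maps each node to a tuple of its parents. Each edge is a
    frozenset so that (a, b) and (b, a) are the same edge.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))  # original arc, undirected
        for a, b in combinations(pars, 2):
            edges.add(frozenset((a, b)))      # marry co-parents
    return edges


# Hypothetical network: Rain -> WetGrass <- Sprinkler.
example = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
for edge in sorted(tuple(sorted(e)) for e in moralize(example)):
    print(edge)
```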

Variable Elimination Algorithm

- General idea
  - Write the query as a nested sum, over the non-query variables, of a product of CPT terms
  - Iteratively
    - Move all irrelevant terms outside of the innermost sum
    - Perform the innermost sum, getting a new term
    - Insert the new term into the product
- Computation depends on the order of elimination; a good elimination ordering can reduce complexity.
- The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination. This quantity is called the induced width of the graph according to the specified ordering.
- Finding an ordering that minimizes the induced width is NP-hard.
- Exact, for all networks, exponential time complexity, inefficient
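
The iterative steps above can be sketched with explicit factor tables. This is a minimal illustration for Boolean variables, not an efficient implementation. A factor is represented as a pair (variable names, table), and the chain A -> B -> C with its numbers is a hypothetical example.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        world = dict(zip(out_vars, values))
        table[values] = (ft[tuple(world[v] for v in fv)]
                         * gt[tuple(world[v] for v in gv)])
    return out_vars, table


def sum_out(var, f):
    """Perform the sum over `var`, producing a new, smaller factor."""
    fv, ft = f
    idx = fv.index(var)
    table = {}
    for values, p in ft.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:idx] + fv[idx + 1:], table


def eliminate(factors, order):
    """Sum out the variables in `order`. Factors that do not mention the
    current variable stay outside the sum and are left untouched."""
    factors = list(factors)
    for var in order:
        related = [f for f in factors if var in f[0]]
        if not related:
            continue
        factors = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(var, prod))  # insert the new term
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


T, F = True, False
# Hypothetical chain A -> B -> C with made-up CPTs.
f_a = (("A",), {(T,): 0.3, (F,): 0.7})
f_b = (("A", "B"), {(T, T): 0.8, (T, F): 0.2, (F, T): 0.1, (F, F): 0.9})
f_c = (("B", "C"), {(T, T): 0.6, (T, F): 0.4, (F, T): 0.2, (F, F): 0.8})

variables, marginal = eliminate([f_a, f_b, f_c], ["A", "B"])
print(variables, marginal)  # marginal distribution of C
```

Eliminating A first keeps every intermediate factor down to at most two variables here; on larger networks the choice of ordering decides how large those intermediate factors become.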

SPI(Symbolic Probabilistic Inference)

- General idea
  - Transform the BBN inference problem into a well-defined combinatorial optimization problem, the Optimal Factoring Problem (OFP). The problem thus becomes finding an optimal factoring given a set of probability distributions. The solution of the OFP is then used to combine the CPTs that describe the BBN and extract the desired marginal distribution.
- OFP itself is NP-hard.
- Exact, for all networks, exponential time complexity, inefficient

Factoring 1 needs 72 multiplications

Factoring 2 needs only 28 multiplications

Approximate Algorithms

- Exact inference for large-scale networks is apparently infeasible.
- Real-life networks can have up to thousands of nodes.
  - For example, QMR (Quick Medical Reference) consists of a combination of statistical and expert knowledge for approximately 600 significant diseases and 4000 findings.
  - The median size of the maximal clique of the moralized graph is 151.5 nodes. It is intractable for all exact inference algorithms.
- Approximate algorithms can be categorized into
  - Partial evaluation methods: perform exact inference partially
  - Variational approach: exploit averaging phenomena in dense networks (law of large numbers)
  - Search-based algorithms: convert the inference problem to an optimization problem, then use heuristic search to solve it
  - Stochastic sampling, also called Monte Carlo algorithms

Perform Exact Algorithm Partially

- General idea: reduce the complexity by reducing the solution space
  - Partial sets of node instantiations
  - Partial sets of hypotheses
  - Partial sets of nodes
- Bounded conditioning (Cooper 1991)
- Localized partial evaluation (Draper 1994)
- Incremental SPI (D'Ambrosio 1993)
- Probabilistic partial evaluation (Poole 1997)
- Mini-buckets algorithm (Dechter 1997)
- Approximate, for all networks, complexity not clear

Variational Method

- General idea: exploit averaging phenomena in dense graphs.
- A sum can be avoided if it contains a sufficient number of terms such that a law of large numbers can be invoked.
- Graphically, the model is transformed into a sub-graph of the original model in which some of the finding nodes are delinked until it is possible to run an exact algorithm on the resulting graph (Jaakkola & Jordan 1999).
- Approximate, efficient, for dense graphs only

Search based algorithms

- General idea: convert the problem into an optimization problem, then use heuristic search to solve it.
- Consider node instantiations across the entire graph.
- Exploit characteristics of the problem domain to help the search.
- A general hope is that a relatively small fraction of the exponentially many node instantiations contains a majority of the probability mass, and that by exploring the high-probability instantiations (bounding the unexplored probability mass) one can obtain reasonable bounds on posterior probabilities.
- Cooper 1985, Peng & Reggia 1987, Henrion 1991
- Best-first search (A*), linear programming, genetic algorithms
- Charniak 1994, Santos 1993, Carlos 1993
- Approximate, heuristic, may fail

Stochastic Sampling Algorithms

- General idea: run repeated simulations according to the BBN; the probability of an event of interest is estimated using the frequency with which that event occurs in a set of samples.
- Logic sampling (Henrion 1988)
- Forward sampling
- Backward sampling (Fung 1994)
- Likelihood weighting (Fung & Chang 1990)
- Importance sampling (Shachter 1990)
- Approximate; performance depends only on the CPTs; can handle very large networks, but has difficulty with extremely unlikely events.
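
Two of the samplers listed above can be sketched in a few lines each. The network, its topological order, and all probabilities are hypothetical, and the code follows the textbook form of logic sampling and likelihood weighting rather than any specific implementation from the cited papers.

```python
import random

# Hypothetical network: Rain -> WetGrass <- Sprinkler; numbers are made up.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.05},
}
ORDER = ["Rain", "Sprinkler", "WetGrass"]  # parents before children


def p_true(node, world):
    """P(node = True | parent values already sampled into world)."""
    return cpt[node][tuple(world[p] for p in parents[node])]


def logic_sampling(query, evidence, n, rng):
    """Forward-sample complete worlds; discard those contradicting evidence."""
    kept = hits = 0
    for _ in range(n):
        world = {}
        for node in ORDER:
            world[node] = rng.random() < p_true(node, world)
        if all(world[k] == v for k, v in evidence.items()):
            kept += 1
            hits += world[query]
    return hits / kept if kept else float("nan")


def likelihood_weighting(query, evidence, n, rng):
    """Clamp evidence nodes; weight each sample by the evidence likelihood."""
    num = den = 0.0
    for _ in range(n):
        world, weight = {}, 1.0
        for node in ORDER:
            pt = p_true(node, world)
            if node in evidence:
                world[node] = evidence[node]
                weight *= pt if evidence[node] else 1.0 - pt
            else:
                world[node] = rng.random() < pt
        den += weight
        if world[query]:
            num += weight
    return num / den


rng = random.Random(0)
print(logic_sampling("Rain", {"WetGrass": True}, 20000, rng))
print(likelihood_weighting("Rain", {"WetGrass": True}, 20000, rng))
```

Logic sampling throws away every sample that disagrees with the evidence, which is the source of the difficulty with unlikely events noted above; likelihood weighting keeps every sample but down-weights the implausible ones.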

Inference Algorithm Conclusions

- The general problem of exact inference is NP-hard.
- The general problem of approximate inference is NP-hard.
- Exact inference works for small, sparse networks only.
- There is no single champion among either exact or approximate inference algorithms.
- The goal of research should be to identify effective approximate techniques that work well on large classes of problems.
- Another direction is the integration of various kinds of approximate and exact algorithms, exploiting the best characteristics of each algorithm.

A Distributed Anytime Inference Architecture

- On a Distributed Anytime Architecture for Probabilistic Reasoning
- Air Force Institute of Technology
- Eugene Santos Jr., 1995

Anytime algorithms

- To meet the demand for real-time inference, an inference algorithm must have two capabilities:
  - Provide a near-optimal solution at any given moment
  - Improve upon solutions as more time and resources are allocated
- Algorithms which have this property of producing a solution at any point in time are called anytime algorithms.
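
One way to picture the anytime property is a search that always holds a best-so-far answer and can be stopped after any step. The sketch below is a generic illustration, not the algorithm from the Santos paper; the objective `score` and the proposal function `propose` are hypothetical stand-ins for "probability of an assignment" and "generate a candidate assignment". Note that best-so-far random search only guarantees that the answer never gets worse, not that it is near-optimal.

```python
import itertools
import random


def anytime_search(score, propose, rng):
    """Yield (best_candidate, best_score) after every step.

    The generator never stops on its own: the caller decides how much
    time to spend and can take the current answer at any moment.
    """
    best = propose(rng)
    best_score = score(best)
    while True:
        yield best, best_score
        candidate = propose(rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s


def score(bits):
    """Hypothetical objective: number of ones in the assignment."""
    return sum(bits)


def propose(rng):
    """Hypothetical proposal: a uniformly random 10-bit assignment."""
    return tuple(rng.randint(0, 1) for _ in range(10))


rng = random.Random(0)
# "Interrupt" the search after 200 steps and keep the score history.
history = [s for _, s in itertools.islice(anytime_search(score, propose, rng), 200)]
print(history[0], history[-1])
```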

Anywhere Algorithms

- To exploit parallelism and distributed processing to reduce time complexity, the tasks in the distributed environment must be able to exploit intermediate results produced by the other components of the system.
- Algorithms with this property are called anywhere algorithms.
- When different algorithms having both anytime and anywhere properties are harnessed together into a cooperative system, the resultant architecture can exploit the best characteristics of each algorithm.

The OVERMIND Architecture

- Part of PESKI, an online expert system for engine diagnosis for the Space Shuttle Program
- Three components
  - IRA (Intelligent Resource Allocator): manages and allocates available computing resources
  - OVERSEER (Overseer Task Manager): initiates new tasks, directs messages/information
  - LOTS (Library of Tasks): a set of BBN inference algorithms suitable for performing various tasks, including an A* search algorithm, a genetic algorithm, an integer linear programming algorithm, and a hybrid stochastic algorithm (HySS)

General Idea

- The best algorithm to use is problem-instance dependent.
- In a set of anywhere algorithms, if each particular algorithm is good at a certain portion of a problem, we can take the partial solution of one algorithm and pass it to another approach which itself works better on the new portion.
- This leads to an anytime, anywhere solution.
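
The hand-off described above can be sketched with two cooperating stages: a cheap stage that produces a rough answer quickly, and a refinement stage that accepts that answer as its starting point. Both stages and the objective are hypothetical stand-ins (random guessing in place of a genetic algorithm, hill climbing in place of a deterministic search); this is not the OVERMIND implementation.

```python
import random

TARGET = (1, 0, 1, 1, 0, 1, 0, 0)  # hypothetical hidden optimum


def score(bits):
    """Hypothetical objective standing in for the probability of an
    assignment: number of positions that agree with TARGET."""
    return sum(b == t for b, t in zip(bits, TARGET))


def random_search(n_bits, tries, rng):
    """Cheap first stage: the best of a few random guesses."""
    guesses = [tuple(rng.randint(0, 1) for _ in range(n_bits))
               for _ in range(tries)]
    return max(guesses, key=score)


def hill_climb(start):
    """Second stage with the anywhere property: it accepts an initial
    guess produced elsewhere and improves it by single-bit flips."""
    current = start
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            neighbor = current[:i] + (1 - current[i],) + current[i + 1:]
            if score(neighbor) > score(current):
                current, improved = neighbor, True
    return current


rng = random.Random(0)
seed = random_search(len(TARGET), 5, rng)  # partial solution from stage one
final = hill_climb(seed)                   # handed to stage two
print(score(seed), score(final))
```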

Genetic Algorithms

- A heuristic search algorithm modeled after natural genetic evolution
- Has the anytime and anywhere properties.
- No stopping criterion that guarantees an optimal answer.
- Its ability to generate solutions early can serve as a starting point, where possible, for other, deterministic algorithms.

Best-First Search (A*)

- A heuristic algorithm searching for the optimal solution from an initial state
- Provides an approximate answer when interrupted
- Allows the algorithm to accept an initial guess from another source
- Uses best-first search to find the most probable complete instantiation among those compatible with the guess

IRA(Intelligent Resource Allocator)

- Serves to maximize processor use by coordinating requests for resources from OVERSEER and the tasks themselves.
- Hardware: a network of workstations
- Identifies resource requirements for different tasks
  - GA: single CPU
  - ILP: multiprocessing

The OVERSEER (Task Manager)

- Currently plays a simple messenger role.
- Advanced capabilities involve deliberation scheduling, employing meta-reasoning to consider what computational tasks to execute.
- To do this, some estimate of the runtime and quality of results should be available for each algorithm.

Implementation and results

- The strengths of the different methods are combined:
  - GAs produced reasonable solutions immediately
  - A* took those solutions near some maxima
  - HySS fine-tuned those maxima
  - ILP finished the optimization and generated the optimal solution
- Result
  - Initial test: multiple instances of GAs
  - GAs 20 speed up
  - HySS 35 speed up
  - A* and ILP 1525 speed up

Summary

- Exploited the anytime and anywhere properties of several inference algorithms, such as GAs, ILP, and A*, and unified them into a single model of parallel computation.
- The architecture can use the best characteristics of each algorithm.

Future Research

- Consider more algorithms
- Study the relationship between the problem domain and the corresponding solution domain to help deliberation scheduling.

The End

- Any Questions ?

Linear Programming

- The problem of finding the most probable explanation has been transformed into an integer linear programming problem with a set of constraints to be satisfied.
- Efficient algorithms for linear programming can be used to compute the optimal solution.