Wednesday, March 14, 2001 - PowerPoint PPT Presentation


Title: Wednesday, March 14, 2001

KDD Group Presentation
Real Time Bayesian Networks Inference
Wednesday, March 14, 2001
Haipeng Guo
KDD Research Group
Department of Computing and Information Sciences
Kansas State University

Presentation Outline
  • Bayesian Networks Introduction
  • Bayesian Networks Inference Algorithms Review
  • Real Time Related Issues
  • A Distributed Anytime Architecture for
    Probabilistic Reasoning (from the Santos paper)
  • Summary

Bayesian Networks Introduction
  • Definition
  • Why is it important?
  • Examples
  • Applications

Bayesian Networks
  • Bayesian Networks, also called Bayesian Belief
    networks, causal networks, or probabilistic
    networks, are a network-based framework for
    representing and analyzing causal models
    involving uncertainty
  • A BBN is a directed acyclic graph (DAG) with
    conditional probabilities for each node.
  • Nodes represent random variables in a problem
  • Arcs represent conditional dependence
    relationships among these variables.
  • Each node contains a CPT (Conditional
    Probability Table) that gives the probabilities
    of this node taking specific values given the
    values of its parent nodes.

Family-Out Example
  • "Suppose when I go home at night, I want to know
    if my family is home before I try the
    doors. (Perhaps the most convenient door to enter
    is double locked when nobody is home.) Now, often
    when my wife leaves the house, she turns on an
    outdoor light. However, she sometimes turns on
    the light if she is expecting a guest. Also, we
    have a dog. When nobody is home, the dog is put
    in the back yard. The same is true if the dog has
    bowel problems. Finally, if the dog is in the
    back yard, I will probably hear her barking (or
    what I think is her barking), but sometimes I can
    be confused by other dogs."
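
The story above implies a small network structure, and the saving that a BBN gives over a full joint table can be counted directly. The sketch below is only illustrative: the node names and the assumption that every variable is binary are ours, not the slides'.

```python
# Structure read off the quoted story; node names are illustrative.
parents = {
    "family_out": [],
    "bowel_problem": [],
    "light_on": ["family_out"],
    "dog_out": ["family_out", "bowel_problem"],
    "hear_bark": ["dog_out"],
}
# Assumption: every variable is binary.
cardinality = {name: 2 for name in parents}


def cpt_parameters(parents, cardinality):
    """Count independent CPT entries: (|X| - 1) per parent configuration."""
    total = 0
    for node, pa in parents.items():
        configs = 1
        for p in pa:
            configs *= cardinality[p]
        total += (cardinality[node] - 1) * configs
    return total


def joint_parameters(cardinality):
    """Count independent entries in the full joint probability table."""
    size = 1
    for k in cardinality.values():
        size *= k
    return size - 1


bn_params = cpt_parameters(parents, cardinality)
full_params = joint_parameters(cardinality)
print(bn_params, full_params)
```

The gap between the two counts grows exponentially with the number of variables, which is the point of the compact representation.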

Asia Example from Medical Diagnostics

Why are BBNs important?
  • Offer a compact, intuitive, and efficient
    graphical representation of dependence relations
    between entities of a problem domain (they model
    the world in a more natural way than rule-based
    systems and neural networks)
  • Handle uncertain knowledge in a mathematically
    rigorous yet efficient and simple way
  • Provide a computational architecture for
    computing the impact of evidence nodes on the
    beliefs (probabilities) of query nodes of interest
  • A growing number of creative applications

Alarm Example: the power of BBN
  • The Alarm network
  • 37 variables, 509 parameters (instead of 2^37)

Applications
  • Medical diagnostic systems
  • Real-time weapons scheduling
  • Jet-engine fault diagnosis
  • Intel processor fault diagnosis (Intel)
  • Generator monitoring expert system (General
  • Software troubleshooting (Microsoft Office
    Assistant, Win98 print troubleshooting)
  • Space shuttle engine monitoring (Vista project)
  • Biological sequence analysis and classification

Bayesian Networks Inference
  • Given some observed evidence, do some computation
    to answer queries
  • An evidence e is an assignment of values to a set
    of variables E in the domain, E = {X_{k+1}, ..., X_n}
  • For example, E = e = {Visit Asia = True, Smoke = True}
  • Queries
  • The posterior belief: compute the conditional
    probability of a variable given the evidence,
  • P(Lung Cancer | Visit Asia = TRUE AND Smoke =
    TRUE) = ?
  • This kind of inference task is called
    Belief Updating
  • MPE: compute the Most Probable Explanation given
    the evidence
  • An explanation for the evidence is a complete
    assignment {X_1 = x_1, ..., X_n = x_n} that is
    consistent with the evidence. Computing an MPE is
    finding an explanation such that no other
    explanation has higher probability
  • This kind of inference task is called Belief
    Revision
Belief Updating
  • The problem is to compute P(X = x | E = e), the
    probability of query nodes X given the observed
    value of evidence nodes E = e.
  • For example: suppose that a patient arrives and
    it is known for certain that he has recently
    visited Asia and has dyspnea.
  • What impact does this evidence have on
    the probabilities of the other variables in the
    network? P(Lung Cancer) = ?
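
As a rough illustration of belief updating, the sketch below computes a posterior by brute-force enumeration of the joint distribution. The three-node chain (smoke, cancer, dyspnea) and all CPT numbers are made up for the example; they are not the Asia network's actual values.

```python
from itertools import product

VARS = ["smoke", "cancer", "dyspnea"]


def joint(a):
    """P(smoke, cancer, dyspnea) as a product of CPT entries (made-up numbers)."""
    p_s = 0.5                            # P(smoke = True)
    p_c = 0.1 if a["smoke"] else 0.01    # P(cancer = True | smoke)
    p_d = 0.7 if a["cancer"] else 0.2    # P(dyspnea = True | cancer)
    return ((p_s if a["smoke"] else 1 - p_s)
            * (p_c if a["cancer"] else 1 - p_c)
            * (p_d if a["dyspnea"] else 1 - p_d))


def posterior(query, evidence):
    """Return P(query = True | evidence) by summing the joint over all worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # skip assignments inconsistent with the evidence
        p = joint(a)
        den += p
        if a[query]:
            num += p
    return num / den


prior = posterior("cancer", {})
belief = posterior("cancer", {"dyspnea": True})
print(prior, belief)
```

Enumeration touches every complete assignment, so it is exponential in the number of variables; it is useful only as a reference point for the algorithms reviewed later.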

[Figure: Asia network fragment with nodes "Visit to Asia", "Lung Cancer", and "tub. or lung cancer"]

Belief Revision
Let W be the set of all nodes in our given
Bayesian network. Let the evidence e be the
observation that the roses are okay. Our goal is
now to determine the assignment w to all nodes
that maximizes P(w | e).
We only need to consider assignments where the
node roses is set to okay and maximize P(w), i.e.
find the most likely state of the world given the
evidence that the roses are okay in this world.
The best solution then becomes
P(sprinklers = F, rain = T, street = wet, lawn =
wet, soil = wet, roses = okay) = 0.2646

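
Belief revision can be sketched the same brute-force way: enumerate the complete assignments consistent with the evidence and keep the one with the highest joint probability. The network below is a hypothetical three-node chain with made-up numbers, not the roses example from the slide.

```python
from itertools import product

NAMES = ["smoke", "cancer", "dyspnea"]


def joint(smoke, cancer, dyspnea):
    """Joint probability of one complete assignment (made-up CPT numbers)."""
    p = 0.5                                  # P(smoke) = P(not smoke) = 0.5
    p_c = 0.1 if smoke else 0.01             # P(cancer = True | smoke)
    p *= p_c if cancer else 1 - p_c
    p_d = 0.7 if cancer else 0.2             # P(dyspnea = True | cancer)
    p *= p_d if dyspnea else 1 - p_d
    return p


def most_probable_explanation(evidence):
    """Return the complete assignment w maximizing P(w) among those matching evidence."""
    best, best_p = None, -1.0
    for values in product([True, False], repeat=len(NAMES)):
        w = dict(zip(NAMES, values))
        if any(w[k] != v for k, v in evidence.items()):
            continue
        p = joint(**w)
        if p > best_p:
            best, best_p = w, p
    return best, best_p


best, best_p = most_probable_explanation({"dyspnea": True})
print(best, best_p)
```

Maximizing P(w) over consistent assignments gives the same answer as maximizing P(w | e), because the two differ only by the constant factor 1 / P(e).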
Complexity of BBN Inference
  • Probabilistic inference using belief networks is
    NP-hard [Cooper 1990]
  • Approximating probabilistic inference in
    Bayesian belief networks is NP-hard [Dagum 1993]
  • Hardness does not mean we cannot solve inference.
    It implies that
  • We cannot find a general procedure that works
    efficiently for all networks
  • However, for particular families of networks, we
    can have provably efficient algorithms, either
    exact or approximate
  • Instead of a general exact algorithm, we seek
    special-case, average-case, and approximate
    algorithms
  • A variety of approximate, heuristic, hybrid and
    special-case algorithms should be taken into
    consideration

BBN Inference Algorithms
  • Exact algorithms
  • Pearl's message propagation algorithm (for singly
    connected networks only)
  • Variable elimination
  • Cutset conditioning
  • Clique tree clustering
  • SPI (Symbolic Probabilistic Inference)
  • Approximate algorithms
  • Partial evaluation methods: performing exact
    inference partially
  • Variational approach: exploiting averaging
    phenomena in dense networks (law of large numbers)
  • Search-based algorithms: converting the inference
    problem to an optimization problem, then using
    heuristic search to solve it
  • Stochastic sampling, also called Monte Carlo
    algorithms

Singly Connected Networks (or Polytrees)
Definition: a directed acyclic graph (DAG) in
which at most one undirected path exists between
any two nodes.
[Figure: example graphs. Nodes with multiple parents and/or multiple children in a polytree structure satisfy the definition; graphs with more than one undirected path between two nodes do not.]

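
The definition can be checked mechanically: a DAG is a polytree exactly when its undirected skeleton contains no cycle. The sketch below uses a union-find structure for this; it assumes the input is already a DAG with no duplicate edges, and the example graphs are illustrative.

```python
def is_polytree(nodes, edges):
    """Return True if at most one undirected path joins any two nodes.

    Assumes `edges` describes a DAG without duplicate edges. Edge directions
    are ignored: the test is whether the undirected skeleton is a forest.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Find the component representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v were already connected: a second path exists
        parent[ru] = rv
    return True


# Multiple parents (dog_out) and multiple children (family_out), still a polytree.
family_nodes = ["family_out", "bowel_problem", "light_on", "dog_out", "hear_bark"]
family_edges = [("family_out", "light_on"), ("family_out", "dog_out"),
                ("bowel_problem", "dog_out"), ("dog_out", "hear_bark")]

# A diamond: two undirected paths from A to D, so not a polytree.
diamond_nodes = ["A", "B", "C", "D"]
diamond_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

print(is_polytree(family_nodes, family_edges))
print(is_polytree(diamond_nodes, diamond_edges))
```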
Propagation Algorithm Objective
  • "The algorithm's purpose is fusing and
    propagating the impact of new evidence and
    beliefs through Bayesian networks so that each
    proposition eventually will be assigned a
    certainty measure consistent with the axioms of
    probability theory." (Pearl, 1988, p. 143)

PolyTree Propagation Example
"The impact of each new piece of evidence is
viewed as a perturbation that propagates through
the network via message-passing between neighboring
variables . . ." (Pearl, 1988, p. 143)
λ: message to parent
π: message from parent
  • Exact algorithm, for polytrees only, linear in the
    size of the network

Cutset Conditioning Algorithm
  • Transform the network into several simpler
    polytrees by conditioning on the cutset, and then
    call the polytree propagation algorithm. Each
    simple network has one or more variables
    instantiated to a definite value. P(X | E) is
    computed as a weighted average over the values
    computed by each polytree. [Pearl 1988]
  • A cutset is a set of nodes that, when instantiated,
    renders the network singly connected.
  • First exact algorithm for multiply connected
    networks, exponential time complexity in the
    size of the cutset.
  • There are exponentially many such cutset
    instantiations
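
The weighted average can be written out explicitly. The symbols C (the cutset) and c (one instantiation of it) are notation assumed here for illustration; they do not appear on the slides.

```latex
% Each term P(X | C = c, E = e) is computed by polytree propagation
% on the network with the cutset fixed to c.
P(X \mid E = e) \;=\; \sum_{c} P(X \mid C = c,\, E = e)\; P(C = c \mid E = e)

% The number of terms is the number of cutset instantiations:
\#\{c\} \;=\; \prod_{C_i \in C} |C_i| \;=\; 2^{k} \quad \text{for } k \text{ binary cutset nodes}
```

The second line is where the exponential dependence on cutset size comes from: one polytree run is needed per instantiation.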

Clique Tree Clustering Algorithm
  • Transform the network into a tree of cliques,
    then compute probabilities for the cliques
    during a two-way message passing; the
    individual node probabilities P(X | E) are
    calculated from the probabilities of the cliques
  • A clique W of G is a maximal complete subset of
    G, that is, there is no other complete subset of
    G which properly contains W
  • The most commonly used exact inference algorithm
    for general networks
  • Efficient for sparse networks, but can have
    very bad performance for more general, dense
    networks
  • Exact, for multiply connected networks,
    exponential time complexity in the size of the
    largest clique

[Figure: clique tree clustering steps: identify cliques, form clique tree, message passing, then compute P(Clq_i) and P(X | E)]

Variable Elimination Algorithm
  • General idea
  • Write the query as a nested sum over a product
    of CPT entries
  • Iteratively
  • Move all irrelevant terms outside of the innermost
    sum
  • Perform the innermost sum, getting a new term
  • Insert the new term into the product
  • Computation depends on the order of elimination; a
    good elimination ordering can reduce
    the computation
  • The size of the largest clique in the induced
    graph is thus an indicator of the complexity of
    variable elimination. This quantity is called the
    induced width of a graph according to the
    specified ordering
  • Finding an ordering that minimizes the induced
    width is NP-hard
  • Exact, for all networks, exponential time
    complexity, inefficient
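
The iterative steps above can be sketched for binary variables as follows. This is a minimal illustration, not a tuned implementation: the factor representation (a tuple of variable names plus a dict table), the chain S -> C -> D, and the CPT numbers are all assumptions, and evidence handling is left out.

```python
from itertools import product


def multiply(f, g):
    """Pointwise product of two factors, each a (variables, table) pair."""
    fv, ft = f
    gv, gt = g
    vars_ = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for values in product([True, False], repeat=len(vars_)):
        a = dict(zip(vars_, values))
        table[values] = (ft[tuple(a[v] for v in fv)]
                         * gt[tuple(a[v] for v in gv)])
    return vars_, table


def sum_out(var, f):
    """Perform the innermost sum: marginalize `var` out of factor f."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for values, p in ft.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table


def eliminate(factors, order):
    """Eliminate variables in `order`; return the product of what remains."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]  # irrelevant terms stay outside
        if not touching:
            continue
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(var, prod)]  # insert the new term
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Hypothetical chain S -> C -> D with made-up CPTs.
p_s = (("S",), {(True,): 0.5, (False,): 0.5})
p_c_given_s = (("C", "S"), {(True, True): 0.1, (False, True): 0.9,
                            (True, False): 0.01, (False, False): 0.99})
p_d_given_c = (("D", "C"), {(True, True): 0.7, (False, True): 0.3,
                            (True, False): 0.2, (False, False): 0.8})

marginal = eliminate([p_s, p_c_given_s, p_d_given_c], ["S", "C"])
print(marginal)
```

In this chain every intermediate factor has at most two variables; a poor ordering on a denser network would create larger intermediate factors, which is the cost the induced width measures.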

SPI(Symbolic Probabilistic Inference)
  • General idea
  • Transform the BBN inference problem into a
    well-defined combinatorial optimization problem,
    the Optimal Factoring Problem (OFP). The
    problem then becomes finding an optimal factoring
    given a set of probability distributions. The
    solution of the OFP is then used to combine the
    CPTs that describe the BBN and extract the desired
    marginal distribution.
  • OFP itself is NP-Hard.
  • Exact, for all networks, exponential time
    complexity, inefficient

[Figure: two factorings of the same query. Factoring 1 needs 72 multiplications; Factoring 2 needs only 28 multiplications.]

Approximate Algorithms
  • Exact inference for large-scale networks is
    apparently infeasible.
  • Real-life networks can have up to thousands of
    nodes.
  • For example, QMR (Quick Medical Reference)
    consists of a combination of statistical
    and expert knowledge for approximately 600
    significant diseases and 4000 findings.
  • The median size of the maximal clique of the
    moralized graph is 151.5 nodes. It is
    intractable for all exact inference algorithms.
  • Approximate algorithms can be categorized into
  • Partial evaluation methods: performing exact
    inference partially
  • Variational approach: exploiting averaging
    phenomena in dense networks (law of large numbers)
  • Search-based algorithms: converting the inference
    problem to an optimization problem, then using
    heuristic search to solve it
  • Stochastic sampling, also called Monte Carlo
    algorithms

Perform Exact Algorithm Partially
  • General idea: reduce the complexity by reducing
    the solution space
  • Partial sets of node instantiations
  • Partial sets of hypotheses
  • Partial sets of nodes
  • Bounded conditioning [Cooper 1991]
  • Localized partial evaluation [Draper 1994]
  • Incremental SPI [D'Ambrosio 1993]
  • Probabilistic partial evaluation [Poole 1997]
  • Mini-buckets algorithm [Dechter 1997]
  • Approximate, for all networks, complexity not

Variational Method
  • General idea: exploit averaging phenomena in
    dense graphs
  • A sum can be avoided if it contains a sufficient
    number of terms such that a law of large numbers
    can be invoked
  • Graphically, the model is transformed into a
    sub-graph of the original model in which some of
    the finding nodes are delinked until it is
    possible to run an exact algorithm on the
    resulting graph. [Jaakkola & Jordan 1999]
  • Approximate, efficient, for dense graphs only

Search based algorithms
  • General idea: convert the problem into an
    optimization problem, then use heuristic search
    to solve it.
  • Consider node instantiations across the entire
    network
  • Exploit characteristics of the problem domain to
    help the search
  • A general hope is that a relatively small fraction
    of the exponentially many node instantiations
    contains a majority of the probability mass, and
    that by exploring the high-probability
    instantiations (bounding the unexplored
    probability mass) one can obtain reasonable
    bounds on posterior probabilities.
  • [Cooper 1985], [Peng & Reggia 1987], [Henrion 1991]
  • Best-first search (A*), linear programming,
    genetic algorithms
  • [Charniak 1994], [Santos 1993], [Carlos 1993]
  • Approximate, heuristic, may fail

Stochastic Sampling Algorithms
  • General idea: run repeated simulations according
    to the BBN; the probability of an event of
    interest is estimated using the frequency with
    which that event occurs in a set of samples.
  • Logic sampling [Henrion 1988]
  • Forward sampling
  • Backward sampling [Fung 1994]
  • Likelihood weighting [Fung & Chang 1990]
  • Importance sampling [Shachter 1990]
  • Approximate, performance depends only on the
    CPTs, can handle very large networks, but has
    difficulty with extremely unlikely events.
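
As one concrete instance of the sampling idea, here is a minimal likelihood-weighting sketch. It assumes a hypothetical chain smoke -> cancer -> dyspnea with made-up CPT numbers: non-evidence nodes are sampled forward, the evidence node is fixed, and each sample is weighted by the probability of the evidence given its sampled parent.

```python
import random


def likelihood_weighting(n_samples, seed=0):
    """Estimate P(cancer = True | dyspnea = True) on a made-up chain."""
    rng = random.Random(seed)
    weighted_true = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        # Sample the non-evidence nodes in topological order.
        smoke = rng.random() < 0.5
        cancer = rng.random() < (0.1 if smoke else 0.01)
        # Evidence dyspnea = True is not sampled; weight by its likelihood.
        weight = 0.7 if cancer else 0.2
        total_weight += weight
        if cancer:
            weighted_true += weight
    return weighted_true / total_weight


estimate = likelihood_weighting(200_000)
print(estimate)
```

With these numbers the exact posterior is 11/65 (about 0.169), so the estimate can be compared against it. If the evidence were far less likely, most samples would carry tiny weights, which is the difficulty with unlikely events noted above.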

Inference Algorithm Conclusions
  • The general problem of exact inference is
    NP-hard
  • The general problem of approximate inference is
    NP-hard
  • Exact inference works for small, sparse networks
  • No single champion, either exact or approximate
  • The goal of research should be that of
    identifying effective approximate techniques that
    work well in large classes of problems.
  • Another direction is the integration of various
    kinds of approximate and exact algorithms
    exploiting the best characteristics of each

A Distributed Anytime Inference Architecture
  • On a Distributed Anytime Architecture for
    Probabilistic Reasoning
  • Air Force Institute of Technology
  • Eugene Santos Jr., 1995

Anytime algorithms
  • To meet the demand for real-time inference, an
    inference algorithm must have two capabilities
  • Provide a near-optimal solution at any given
    time
  • Improve upon solutions as more time and
    resources are allocated
  • Algorithms which have this property of producing
    a solution at any point in time are called
    anytime algorithms
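
The anytime property can be sketched as a generator that always has a best-so-far answer available and never makes it worse. Everything in this example is hypothetical: the bit-flip local search, the toy score function, and the use of a step budget in place of a wall-clock deadline.

```python
import random
from itertools import islice


def anytime_search(score, n_bits, seed=0):
    """Yield (best, best_score) after every step; the caller may stop at any point."""
    rng = random.Random(seed)
    best = tuple(rng.random() < 0.5 for _ in range(n_bits))
    best_score = score(best)
    while True:
        yield best, best_score
        # Propose a neighbor by flipping one randomly chosen bit.
        i = rng.randrange(n_bits)
        candidate = best[:i] + (not best[i],) + best[i + 1:]
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s


def toy_score(bits):
    # Hypothetical stand-in for the probability of a complete assignment.
    return sum(bits)


# Interrupt after 50 steps; a real system would stop on a deadline instead.
trace = [s for _, s in islice(anytime_search(toy_score, 8), 50)]
print(trace[0], trace[-1])
```

Because a candidate replaces the incumbent only when it scores higher, the quality of the answer returned at interruption time is non-decreasing in the time allowed.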

Anywhere Algorithms
  • To exploit parallelism and distributed processing
    to reduce the time complexity, the tasks in the
    distributed environment must be able to exploit
    intermediate results produced by the other
    components of the system.
  • Algorithms with this property are called
    anywhere algorithms.
  • When different algorithms having both anytime and
    anywhere properties are harnessed together into a
    cooperative system, the resultant architecture
    can exploit the best characteristics of each
    algorithm

The OVERMIND Architecture
  • Part of PESKI, an online expert system for engine
    diagnosis for the Space Shuttle Program
  • Three components
  • IRA (Intelligent Resource Allocator)
  • Manages and allocates available computing
    resources
  • OVERSEER (Overseer Task Manager)
  • Initiates new tasks, directs messages/information
  • LOTS (Library of Tasks)
  • A set of BBN inference algorithms suitable for
    performing various including an A* search
    algorithm, a genetic algorithm, an integer linear
    programming algorithm and a hybrid stochastic

General Idea
  • The best algorithm to use is problem-instance
    dependent
  • In a set of anywhere algorithms, if each
    particular algorithm is good at certain portion
    of a problem we can then take the partial
    solution of an algorithm and pass it to another
    approach which itself works better on the new
  • This leads to an anytime anywhere solution
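
The handoff between cooperating tasks can be sketched as follows. The two tasks, the `start` parameter through which a task accepts another task's partial solution, and the toy objective are all hypothetical stand-ins; they are not the OVERMIND algorithms or interfaces.

```python
import random


def score(bits):
    # Hypothetical objective standing in for the probability of an assignment.
    return sum(bits)


def random_sampler(n_bits, start=None, tries=20, seed=0):
    """Cheap stochastic task: quickly produces a reasonable solution."""
    rng = random.Random(seed)
    best = start if start is not None else (False,) * n_bits
    for _ in range(tries):
        cand = tuple(rng.random() < 0.5 for _ in range(n_bits))
        if score(cand) > score(best):
            best = cand
    return best


def hill_climb(n_bits, start=None):
    """Deterministic task: refines whatever solution it is handed."""
    best = start if start is not None else (False,) * n_bits
    improved = True
    while improved:
        improved = False
        for i in range(n_bits):
            cand = best[:i] + (not best[i],) + best[i + 1:]
            if score(cand) > score(best):
                best, improved = cand, True
    return best


# Each task starts from the shared best-so-far solution left by the previous one.
early = random_sampler(10)
shared = None
for task in (random_sampler, hill_climb):
    shared = task(10, start=shared)
print(score(early), score(shared))
```

The design choice illustrated is the common interface: because every task can both emit and accept an intermediate solution, tasks can be chained or run side by side without knowing each other's internals.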

Genetic Algorithms
  • A heuristic search algorithm modeled after
    natural genetic evolution
  • Has the anytime and anywhere properties.
  • No stopping criterion that guarantees an optimal
    solution
  • Its ability to generate solutions early can serve
    as a starting point, if possible, for other
    deterministic algorithms.

Best-First Search (A*)
  • A heuristic algorithm searching for an optimal
    solution from an initial state
  • Provides an approximate answer when interrupted
  • Allows the algorithm to accept an initial guess
    from other sources
  • Use Best-first search to find the most probable
    complete instantiation among those compatible
    with the guess

IRA(Intelligent Resource Allocator)
  • Serves to maximize processor use by coordinating
    requests for resources from OVERSEER and the
    tasks themselves.
  • Hardware: a network of workstations
  • Identify resource requirements for different
  • GA: single CPU
  • ILP: multiprocessing

The OVERSEER (Task Manager)
  • Currently a simple messenger role
  • Advanced capabilities involve deliberation
    scheduling, employing meta-reasoning to consider
    what computational tasks to execute.
  • To do this, some estimate of runtime and quality
    of results should be available for each

Implementation and results
  • The strengths of different methods are combined
  • GAs produce reasonable solutions immediately
  • A* took those solutions near some maxima
  • HySS fine-tuned those maxima
  • ILP finished the optimization and generated the
    optimal solution
  • Result
  • Initial test: multiple instances of GAs
  • GAs 20 speed up
  • HySS 35 speed up
  • A* and ILP 1525 speed up

Summary
  • Exploited the anytime anywhere properties of
    several inference algorithms such as GAs, ILP and
    A* and unified them into a single model of
    parallel computation.
  • The architecture can use the best characteristics
    of each algorithm.

Future Research
  • Consider more algorithms
  • Study the relationship between the problem domain
    and the corresponding solutions domain to help
    deliberation scheduling.

The End
  • Any Questions ?

Linear Programming
  • The problem of finding the most probable
    explanation has been transformed into an integer
    linear programming problem with a set of
    constraints to be satisfied.
  • Efficient algorithms for linear programming can
    be used to compute the optimal solution