Title: Estimation Of Distribution Algorithm based on Markov Random Fields
1. Estimation of Distribution Algorithm based on Markov Random Fields
- Siddhartha Shakya
- School of Computing
- The Robert Gordon University
2. Outline
- From GAs to EDAs
- Probabilistic Graphical Models in EDAs
  - Bayesian networks
  - Markov Random Fields
- Fitness modelling approach to estimating and sampling MRFs in EDAs
  - Gibbs distribution, energy function and modelling the fitness
  - Estimating parameters (the fitness modelling approach)
  - Sampling the MRF (several different approaches)
- Conclusion
3. Genetic Algorithms (GAs)
- Population-based optimisation technique
- Based on Darwin's theory of evolution
- A solution is encoded as a set of symbols known as a chromosome
- A population of solutions is generated
- Genetic operators are then applied to the population to produce the next generation, which replaces the parent population
4. Simple GA simulation
5. GA to EDA
6. Simple EDA simulation
[Figure: worked simulation of a simple EDA on 5-bit strings — a selected population of binary strings, the univariate marginal probability vector estimated from it (0.5, 0.5, 1.0, 1.0, 0.5), and new solutions sampled from that vector.]
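The simulation above can be sketched as a minimal univariate EDA (UMDA-style). This is an illustration only, not code from the talk; the population size, selection size and generation count are arbitrary assumptions.

```python
import random

def univariate_eda(fitness, n, pop_size=20, select=10, generations=50):
    """Minimal univariate EDA: estimate the marginal probability of a 1 at
    each position from the selected solutions, then sample a new population
    from those marginals."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Select the fittest solutions
        pop.sort(key=fitness, reverse=True)
        selected = pop[:select]
        # Estimate the univariate marginal probabilities
        p = [sum(x[i] for x in selected) / select for i in range(n)]
        # Sample a new population from the probability vector
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# Illustrative usage: OneMax (count of ones) as the fitness function
random.seed(1)
best = univariate_eda(sum, n=10)
```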
7. Joint Probability Distribution (JPD)
- A solution is treated as a set of random variables
- These variables have a joint probability distribution (JPD)
- The full JPD is exponential in the number of variables, and therefore not feasible to calculate in most cases
- It needs simplification!
8. Factorisation of the JPD
- Univariate model: no interaction; the simplest model
- Bivariate model: pair-wise interaction
- Multivariate model: interaction of more than two variables
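In symbols, the three families of factorisation can be written as follows (these are standard reconstructions, since the slide equations were not preserved; the bivariate form shown assumes a chain/tree structure):

```latex
% Univariate: full independence
p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i)
% Bivariate (e.g. chain or tree): each variable conditioned on one other
p(\mathbf{x}) = p(x_r) \prod_{i \neq r} p\bigl(x_i \mid x_{j(i)}\bigr)
% Multivariate: factors over larger groups (cliques) of variables
p(\mathbf{x}) = \frac{1}{Z} \prod_{k} \psi_k(\mathbf{x}_{C_k})
```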
9. Typical estimation and sampling of the JPD in EDAs
- Learn the interactions between variables in the solution
- Learn the probabilities associated with the interacting variables
- Together these specify the JPD p(x)
- Sample the JPD (i.e. the learned probabilities)
10. Probabilistic Graphical Models
- An efficient tool to represent the factorisation of the JPD
- A marriage between probability theory and graph theory
- Consist of two components:
  - Structure
  - Parameters
- Two types of PGM:
  - Directed PGM (Bayesian networks)
  - Undirected PGM (Markov random fields)
11. Directed PGM (Bayesian networks)
- Structure
  - Directed Acyclic Graph (DAG)
- Independence relationship
  - A variable is conditionally independent of the rest of the variables given its parents
- Parameters
  - Conditional probabilities
12. Bayesian networks
- The factorisation of the JPD encoded in terms of conditional probabilities gives the JPD for a BN
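The factorisation referred to here is conventionally written as (reconstructed, since the slide equation was an image):

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

where pa(x_i) denotes the set of parents of X_i in the DAG.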
13. Estimating a Bayesian network
- Estimate the structure
- Estimate the parameters
- This completely specifies the JPD
- The JPD can then be sampled
14. BN based EDAs
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate a BN from the selected set
   - Estimate structure
   - Estimate parameters
4. Sample the BN to generate the new population
5. Replace the parents with the new set and go to 2 until the termination criterion is satisfied
15. How to estimate and sample a BN in EDAs
- Estimating the structure
  - Score + search techniques
  - Conditional independence tests
- Estimating the parameters
  - Trivial in EDAs, since the dataset is complete
  - Estimate the probabilities of parents before children
- Sampling
  - Probabilistic Logic Sampling (sample parents before children)
16. BN based EDAs
- A well established approach in EDAs
  - BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
- References
  - Larrañaga and Lozano 2002
  - Pelikan 2002
17. Markov Random Fields (MRFs)
- Structure
  - Undirected graph
- Local independence
  - A variable is conditionally independent of the rest of the variables given its neighbours
- Global independence
  - Two sets of variables are conditionally independent of each other if there is a third set that separates them
- Parameters
  - Potential functions defined on the cliques
[Figure: an example undirected graph over six variables X1–X6.]
18. Markov Random Fields
- The factorisation of the JPD encoded in terms of potential functions over the maximal cliques gives the JPD for an MRF
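In standard notation (reconstructed, since the slide equation was an image), the MRF factorisation is:

```latex
p(\mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C),
\qquad
Z = \sum_{\mathbf{y}} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{y}_C)
```

where C ranges over the maximal cliques of the graph, ψ_C are the potential functions, and Z is the partition function.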
19. Estimating a Markov random field
- Estimate the structure from data
- Estimate the parameters
- This requires the potential functions to be numerically defined
- Together these completely specify the JPD
- The JPD can then be sampled
- However, there is no specific sampling order (the graph is not a DAG), so sampling is a bit problematic
20. MRFs in EDAs
- Recently proposed as an estimation-of-distribution technique in EDAs
  - Shakya et al. 2004, 2005
  - Santana et al. 2003, 2005
21. MRF based EDA
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate an MRF from the selected set
   - Estimate structure
   - Estimate parameters
4. Sample the MRF to generate the new population
5. Replace the parents with the new solutions and go to 2 until the termination criterion is satisfied
22. How to estimate and sample an MRF in EDAs
- Learning the structure
  - Conditional independence tests (MN-EDA, MN-FDA)
  - Linkage detection algorithm (LDFA)
- Learning the parameters
  - Junction tree approach (FDA)
  - Junction graph approach (MN-FDA)
  - Kikuchi approximation approach (MN-EDA)
  - Fitness modelling approach (DEUM)
- Sampling
  - Probabilistic Logic Sampling (FDA, MN-FDA)
  - Probability vector approach (DEUMpv)
  - Direct sampling of the Gibbs distribution (DEUMd)
  - Metropolis sampler (Is-DEUMm)
  - Gibbs sampler (Is-DEUMg, MN-EDA)
23. Fitness modelling approach
- Hammersley-Clifford theorem: the JPD for any MRF follows a Gibbs distribution (a)
- The energy of the Gibbs distribution is written in terms of potential functions over the cliques
- Assume the probability of a solution is proportional to its fitness (b)
- From (a) and (b), a model of the fitness function, the MRF fitness model (MFM), is derived
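The derivation sketched above can be written out as follows (a reconstruction consistent with the DEUM papers; the slide equations were not preserved):

```latex
% (a) Hammersley-Clifford: the JPD of an MRF is a Gibbs distribution
p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{Z},
\qquad
U(\mathbf{x}) = \sum_{C} u_C(\mathbf{x}_C)
% (b) Probability of a solution proportional to its fitness
p(\mathbf{x}) = \frac{f(\mathbf{x})}{\sum_{\mathbf{y}} f(\mathbf{y})}
% Equating (a) and (b) gives the MRF fitness model (MFM)
-\ln f(\mathbf{x}) = \frac{U(\mathbf{x})}{T} + \mathrm{const}
```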
24. MRF fitness model (MFM)
- Properties
  - Completely specifies the JPD for the MRF
  - Negative relationship between fitness and energy, i.e. minimising the energy maximises the fitness
- Tasks
  - Need to find the structure for the MRF
  - Need to numerically define the clique potential functions
25. MRF fitness model (MFM)
- Let us start with the simplest model, the univariate model (this eliminates structure learning)
- For the univariate model there will be n singleton cliques
- For each singleton clique, assign a potential function
- The corresponding MFM
- In terms of the Gibbs distribution
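For the univariate case, with a potential α_i x_i on each singleton clique and alleles encoded as x_i ∈ {-1, +1} (the encoding used in the DEUM papers), the MFM takes the form below; a constant term absorbing the normalisation may also be included. This is a reconstruction, as the slide equations were images:

```latex
-\ln f(\mathbf{x}) = U(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i x_i,
\qquad x_i \in \{-1, +1\},
\qquad
p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{Z}
```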
26. Estimating MRF parameters using the MFM
- Each chromosome gives us a linear equation
- Applying this to a set of selected solutions gives us a system of linear equations
- Solving it gives us an approximation to the MRF parameters
- Knowing the MRF parameters completely specifies the JPD
- The next step is to sample the JPD
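The steps above can be sketched as follows. The DEUM papers solve the system with SVD-based least squares; this illustration uses plain normal equations with Gaussian elimination instead, and the constant term α_0 and the {-1, +1} encoding are assumptions modelled on those papers.

```python
import math

def estimate_mfm_parameters(selected, fitnesses):
    """Fit the univariate MFM  -ln f(x) = a0 + sum_i a_i * s_i  (s_i in {-1,1})
    to a selected set of solutions by least squares (normal equations)."""
    n = len(selected[0])
    # Design matrix: constant term plus the {-1,1}-encoded alleles
    A = [[1.0] + [1.0 if b == 1 else -1.0 for b in x] for x in selected]
    b = [-math.log(f) for f in fitnesses]  # fitness must be positive
    m = n + 1
    # Normal equations: (A^T A) a = A^T b
    ata = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(m)]
           for i in range(m)]
    atb = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for r in range(m):
            if r != col and abs(ata[col][col]) > 1e-12:
                factor = ata[r][col] / ata[col][col]
                ata[r] = [a - factor * c for a, c in zip(ata[r], ata[col])]
                atb[r] -= factor * atb[col]
    return [atb[i] / ata[i][i] if abs(ata[i][i]) > 1e-12 else 0.0
            for i in range(m)]
```

Each selected chromosome contributes one row of the design matrix and one entry -ln f(x) of the right-hand side, exactly as described on the slide.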
27. General DEUM framework
- Distribution Estimation Using MRF algorithm (DEUM)
1. Initialise parent population P
2. Select a set D from P (D = P can also be used)
3. Build an MFM, fit it to D, and estimate the MRF parameters
4. Sample the MRF to generate the new population
5. Replace P with the new population and go to 2 until the termination criterion is satisfied
28. How to sample the MRF
- Probability vector approach
- Direct sampling of the Gibbs distribution
- Metropolis sampling
- Gibbs sampling
29. Probability vector approach to sampling the MRF
- Minimise U(x) to maximise f(x)
- To minimise U(x), each term α_i x_i should be minimal
- This suggests that if α_i is negative then the corresponding x_i should be positive
- We could obtain an optimal chromosome for the current population just by looking at the sign of each α_i
- However, the current population does not always contain enough information to generate the optimum
- Instead, we use the sign of each α_i to update a vector of probabilities
30. DEUM with probability vector (DEUMpv)
31. Updating rule
- Uses the sign of each MRF parameter to direct the search towards the value of the respective variable that minimises the energy U(x)
- A learning rate controls convergence
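A minimal sketch of such an updating rule, modelled on PBIL-style updates; the exact rule and the learning rate value are assumptions, not reproduced from the slides:

```python
def update_probability_vector(p, alpha, lam=0.1):
    """DEUMpv-style update (a sketch): shift each marginal probability
    towards the allele value that minimises the energy U(x) = sum a_i x_i.
    A negative a_i favours x_i = 1, a positive a_i favours x_i = 0.
    The learning rate lam controls how fast the vector converges."""
    return [pi * (1 - lam) + (lam if a < 0 else 0.0)
            for pi, a in zip(p, alpha)]

# Each p_i moves towards 1 where a_i < 0 and towards 0 where a_i > 0
p = update_probability_vector([0.5] * 5, [-1.0, 2.0, -0.5, 3.0, -0.1])
```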
32. Simulation of DEUMpv
[Figure: worked DEUMpv simulation on 5-bit strings — a selected population (e.g. 01111 and 10101, with OneMax fitnesses 4 and 3), the probability vector initialised to (0.5, 0.5, 0.5, 0.5, 0.5), and its update towards (0.4, 0.6, 0.6, 0.6, 0.6) after applying the updating rule.]
33. Results
34. Results
35. Results
- Deceptive problem
- No solution was found
36. Sampling the MRF
- Probability vector approach
- Direct sampling of the Gibbs distribution
- Metropolis sampling
- Gibbs sampling
37. Direct sampling of the Gibbs distribution
- In the probability vector approach, only the sign of the MRF parameters is used
- However, one could directly sample from the Gibbs distribution and make use of the values of the MRF parameters
- One could also use a temperature coefficient to manipulate the probabilities
38. Direct sampling of the Gibbs distribution
39. Direct sampling of the Gibbs distribution
- The temperature coefficient plays an important role
- Decreasing T cools each probability towards either 1 or 0, depending upon the sign and value of α_i
- This forms the basis for DEUM based on direct sampling of the Gibbs distribution (DEUMd)
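For the univariate MFM the Gibbs distribution factorises over the variables, so each bit can be sampled independently. A sketch under that assumption (the sigmoid form follows from the univariate Gibbs distribution; parameter names are illustrative):

```python
import math
import random

def sample_direct(alpha, T, pop_size, seed=0):
    """Directly sample the univariate Gibbs distribution:
    p(x_i = 1) = e^(-a_i/T) / (e^(-a_i/T) + e^(a_i/T)) = 1 / (1 + e^(2*a_i/T)).
    As T decreases, each probability cools towards 0 or 1 depending on
    the sign of a_i (negative a_i favours x_i = 1)."""
    rng = random.Random(seed)
    p = [1.0 / (1.0 + math.exp(2.0 * a / T)) for a in alpha]
    return [[1 if rng.random() < pi else 0 for pi in p]
            for _ in range(pop_size)]
```

With a low temperature such as T = 0.1, strongly negative parameters give probabilities very close to 1 and strongly positive ones probabilities very close to 0, which is the cooling behaviour the slide describes.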
40. DEUM with direct sampling (DEUMd)
1. Generate an initial population, P, of size M
2. Select the N fittest solutions, N ≤ M
3. Calculate the MRF parameters
4. Generate M new solutions by sampling the univariate distribution
5. Replace P with the new population and go to 2 until complete
41. DEUMd simulation
[Figure: worked DEUMd simulation on 5-bit strings with OneMax fitnesses, e.g. 01111 (4), 10111 (4), 01101 (3) and 01010 (2).]
42. Experimental results
43. F6 function optimisation
44. Plateau problem (n = 180)
45. Checker Board problem (n = 100)
46. Trap function of order 5 (n = 60)
47. Experimental results (standard deviations in parentheses)

Problem         Metric       GA                      UMDA                   PBIL                   DEUMd
Checker Board   Fitness      254.68 (4.39)           233.79 (9.2)           243.5 (8.7)            254.1 (5.17)
Checker Board   Evaluations  427702.2 (1098959.3)    50228.2 (9127)         191476.8 (37866.65)    33994 (13966.75)
Equal-Products  Fitness      211.59 (1058.47)        5.03 (18.29)           9.35 (43.36)           2.14 (6.56)
Equal-Products  Evaluations  1000000 (0)             1000000 (0)            1000000 (0)            1000000 (0)
Colville        Fitness      0.61 (1.02)             40.62 (102.26)         2.69 (2.54)            0.61 (0.77)
Colville        Evaluations  1000000 (0)             62914.56 (6394.58)     1000000 (0)            1000000 (0)
Six Peaks       Fitness      99.1 (9)                98.58 (3.37)           99.81 (1.06)           100 (0)
Six Peaks       Evaluations  49506 (4940)            121333.76 (14313.44)   58210 (3659.15)        26539 (1096.45)
48. Analysis of results
- For univariate problems (OneMax), given a population size of 1.5n, P = D and T → 0, the solution was found in a single generation
- For problems with low-order dependency between variables (Plateau and Checker Board), performance was significantly better than that of other univariate EDAs
- For deceptive problems with higher-order dependency (Trap function and Six Peaks), DEUMd was deceived, but by slowing the cooling rate it was able to find the solution for the Trap function of order 5
- For problems where the optimum was not known, performance was comparable to that of the GA and other EDAs, and better in some cases
49. Cost-benefit analysis (the cost)
- Polynomial cost of estimating the distribution, compared to the linear cost of other univariate EDAs
- Other univariate EDAs only need to compute the univariate marginal frequencies
50. Cost-benefit analysis (the benefit)
- DEUMd can significantly reduce the number of fitness evaluations
- The quality of solution was better for DEUMd than for the other EDAs compared
- DEUMd should be tried on problems where the increased solution quality outweighs the computational cost
51. Sampling the MRF
- Probability vector approach
- Direct sampling of the Gibbs distribution
- Metropolis sampling
- Gibbs sampling
52. Example problem: 2D Ising spin glass
- Given the coupling constants J, find the value of each spin that minimises H
- MRF fitness model
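The Hamiltonian being minimised is the standard one below (a reconstruction using a common sign convention, with the sum over nearest-neighbour pairs; the slide equation was an image). In DEUM's fitness-modelling terms, the energy U of the MFM plays the role of H:

```latex
H(\mathbf{s}) = -\sum_{\langle i, j \rangle} J_{ij}\, s_i s_j,
\qquad s_i \in \{-1, +1\}
```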
53. Metropolis sampler
54. Difference in energy
55. DEUM with Metropolis sampler
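The Metropolis sampler and the local energy difference from slides 53–54 can be sketched as follows. This assumes periodic boundaries and ferromagnetic-style couplings stored per bond; all names and parameter values are illustrative, not taken from the slides:

```python
import math
import random

def metropolis_sweep(spins, J_right, J_down, T, seed=0):
    """One Metropolis sweep over an LxL Ising grid (periodic boundaries).
    Flipping a single spin changes only the four bonds touching it, so the
    energy difference is computed locally: dH = 2 * s_ij * (local field)."""
    rng = random.Random(seed)
    L = len(spins)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Local field from the right, left, down and up neighbours
        h = (J_right[i][j] * spins[i][(j + 1) % L]
             + J_right[i][(j - 1) % L] * spins[i][(j - 1) % L]
             + J_down[i][j] * spins[(i + 1) % L][j]
             + J_down[(i - 1) % L][j] * spins[(i - 1) % L][j])
        dH = 2.0 * spins[i][j] * h
        # Accept downhill moves always, uphill moves with prob e^(-dH/T)
        if dH <= 0 or rng.random() < math.exp(-dH / T):
            spins[i][j] = -spins[i][j]
    return spins
```

At low temperature an already-aligned ferromagnetic configuration is a stable minimum: every proposed flip costs energy and is almost never accepted.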
56. Results
57. Sampling the MRF
- Probability vector approach
- Direct sampling of the Gibbs distribution
- Metropolis sampling
- Gibbs sampling
58. Conditionals from the Gibbs distribution
- For the 2D Ising spin glass problem
59. Gibbs sampler
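A Gibbs sampler resamples each spin from its conditional distribution given its neighbours; for the Ising model this conditional is p(s = +1 | neighbours) = 1 / (1 + e^(-2h/T)), where h is the local field. A sketch under the same periodic-boundary assumptions as before (names are illustrative):

```python
import math
import random

def gibbs_sweep(spins, J_right, J_down, T, seed=0):
    """One Gibbs-sampling sweep over an LxL Ising grid (periodic boundaries).
    Each spin is resampled from its exact conditional given its four
    neighbours: p(s_ij = +1 | neighbours) = 1 / (1 + exp(-2 * h / T))."""
    rng = random.Random(seed)
    L = len(spins)
    for i in range(L):
        for j in range(L):
            # Local field from the right, left, down and up neighbours
            h = (J_right[i][j] * spins[i][(j + 1) % L]
                 + J_right[i][(j - 1) % L] * spins[i][(j - 1) % L]
                 + J_down[i][j] * spins[(i + 1) % L][j]
                 + J_down[(i - 1) % L][j] * spins[(i - 1) % L][j])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * h / T))
            spins[i][j] = 1 if rng.random() < p_up else -1
    return spins
```

Unlike Metropolis, every spin is updated on each sweep and no proposal is ever "rejected"; the conditional itself decides the new value.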
60. DEUM with Gibbs sampler
61. Results
62. Summary
- From GAs to EDAs
- The PGM approach to modelling and sampling the distribution in EDAs
- DEUM: an MRF approach to modelling and sampling
  - Learning structure: no structure learning so far (fixed models are used)
  - Learning parameters: the fitness modelling approach
  - Sampling the MRF:
    - Probability vector approach
    - Direct sampling of the Gibbs distribution
    - Metropolis sampler
    - Gibbs sampler
- The results are encouraging, and there is a lot more to explore