Estimation Of Distribution Algorithm based on Markov Random Fields
1
Estimation Of Distribution Algorithm based on
Markov Random Fields
  • Siddhartha Shakya
  • School Of Computing
  • The Robert Gordon University

2
Outline
  • From GAs to EDAs
  • Probabilistic Graphical Models in EDAs
  • Bayesian networks
  • Markov Random Fields
  • Fitness modelling approach to estimating and
    sampling MRF in EDA
  • Gibbs distribution, energy function and modelling
    the fitness
  • Estimating parameters (Fitness modelling
    approach)
  • Sampling MRF (several different approaches)
  • Conclusion

3
Genetic Algorithms (GAs)
  • Population-based optimisation technique
  • Based on Darwin's theory of evolution
  • A solution is encoded as a set of symbols known
    as a chromosome
  • A population of solutions is generated
  • Genetic operators are then applied to the
    population to produce the next generation, which
    replaces the parent population
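The loop described above can be sketched as follows (a minimal illustration; the operator choices, tournament selection, one-point crossover and bit-flip mutation, and all parameter values are assumptions, not from the slides):

```python
import random

def ga_generation(population, fitness, crossover_rate=0.9, mutation_rate=0.01):
    """One GA generation: tournament selection, one-point crossover,
    bit-flip mutation; the new generation replaces the parents."""
    def tournament():
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    next_gen = []
    while len(next_gen) < len(population):
        p1, p2 = tournament(), tournament()
        if random.random() < crossover_rate:
            cut = random.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]        # one-point crossover
        else:
            child = p1[:]
        # bit-flip mutation (XOR with a Bernoulli(mutation_rate) bit)
        child = [bit ^ (random.random() < mutation_rate) for bit in child]
        next_gen.append(child)
    return next_gen

onemax = lambda x: sum(x)
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(50):
    pop = ga_generation(pop, onemax)
```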

4
Simple GA simulation
5
GA to EDA
6
Simple EDA simulation
(Figure: step-by-step simulation in which a population of bit strings is selected, univariate marginal probabilities (e.g. 0.5, 0.5, 1.0, 1.0, 0.5) are estimated from the selected solutions, and a new population is sampled from those probabilities.)
7
Joint Probability Distribution (JPD)
  • A solution is treated as a set of random
    variables
  • Their joint probability distribution (JPD) is
    p(x) = p(x1, ..., xn)
  • The full JPD is exponential in the number of
    variables, so it is not feasible to compute in
    most cases
  • It needs simplification!

8
Factorisation of JPD
  • Univariate model: no interaction; the simplest
    model
  • Bivariate model: pair-wise interaction
  • Multivariate model: interaction among more than
    two variables

9
Typical estimation and sampling of JPD in EDAs
  • Learn the interaction between variables in the
    solution
  • Learn the probabilities associated with
    interacting variables
  • This specifies the JPD p(x)
  • Sample the JPD (i.e. learned probabilities)

10
Probabilistic Graphical Models
  • An efficient tool for representing the
    factorisation of the JPD
  • A marriage between probability theory and graph
    theory
  • Consists of two components
  • Structure
  • Parameters
  • Two types of PGM
  • Directed PGM (Bayesian networks)
  • Undirected PGM (Markov random fields)

11
Directed PGM (Bayesian networks)
  • Structure
  • Directed Acyclic Graph (DAG)
  • Independence relationship
  • A variable is conditionally independent of the
    rest of the variables given its parents
  • Parameters
  • Conditional probabilities

12
Bayesian networks
  • The factorisation of the JPD is encoded in terms
    of conditional probabilities
  • JPD for a BN: p(x) = Πi p(xi | pa(xi)), where
    pa(xi) denotes the parents of xi
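As a toy illustration of this factorisation (the three-variable network and all of its probabilities are invented for the example):

```python
# Toy BN: X1 -> X2, X1 -> X3, so p(x) = p(x1) p(x2|x1) p(x3|x1).
p_x1 = {0: 0.4, 1: 0.6}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x1 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

def joint(x1, x2, x3):
    """JPD as the product of each variable's conditional given its parents."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1[x1][x3]

# The factorised JPD still sums to 1 over all 2^3 configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```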

13
Estimating a Bayesian network
  • Estimate structure
  • Estimate parameters
  • This completely specifies the JPD
  • JPD can then be Sampled

14
BN based EDAs
  • Initialise parent solutions
  • Select a set from the parent solutions
  • Estimate a BN from the selected set
  • Estimate structure
  • Estimate parameters
  • Sample the BN to generate the new population
  • Replace parents with the new set and go to 2
    until the termination criterion is satisfied

15
How to estimate and sample BN in EDAs
  • Estimating structure
  • Score + search techniques
  • Conditional independence tests
  • Estimating parameters
  • Trivial in EDAs: the dataset is complete
  • Estimate probabilities of parents before children
  • Sampling
  • Probabilistic logic sampling (sample parents
    before children)
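Probabilistic logic sampling simply draws each variable after its parents, from its conditional distribution. A sketch on an invented two-variable network:

```python
import random

# Toy BN X1 -> X2 (the probabilities are invented for the example).
p_x1 = 0.6                        # p(X1 = 1)
p_x2_given_x1 = {0: 0.3, 1: 0.8}  # p(X2 = 1 | X1)

def sample_one():
    """Ancestral (probabilistic logic) sampling: parent first, then child."""
    x1 = 1 if random.random() < p_x1 else 0
    x2 = 1 if random.random() < p_x2_given_x1[x1] else 0
    return x1, x2

samples = [sample_one() for _ in range(10000)]
freq_x1 = sum(x1 for x1, _ in samples) / len(samples)  # should approach 0.6
```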

16
BN based EDAs
  • Well-established approach in EDAs
  • BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
  • References
  • Larrañaga and Lozano 2002
  • Pelikan 2002

17
Markov Random Fields (MRF)
  • Structure
  • Undirected Graph
  • Local independence
  • A variable is conditionally independent of the
    rest of the variables given its neighbours
  • Global independence
  • Two sets of variables are conditionally
    independent of each other if there is a third set
    that separates them
  • Parameters
  • Potential functions defined on the cliques

(Figure: an undirected graph over the variables X1 to X6.)
18
Markov Random Field
  • The factorisation of the JPD is encoded in terms
    of potential functions over the maximal cliques
  • JPD for an MRF: p(x) = (1/Z) Πc ψc(xc), where Z
    is the partition function
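The same factorisation can be checked by brute force on a tiny invented model (a three-variable chain whose potentials reward agreeing neighbours):

```python
from itertools import product

# Chain X1 - X2 - X3: maximal cliques {X1, X2} and {X2, X3}.
def psi(a, b):
    """Pairwise clique potential (values are illustrative)."""
    return 2.0 if a == b else 1.0

def unnormalised(x):
    return psi(x[0], x[1]) * psi(x[1], x[2])

# Partition function Z: sum of potential products over all configurations.
Z = sum(unnormalised(x) for x in product((0, 1), repeat=3))

def p(x):
    return unnormalised(x) / Z

total = sum(p(x) for x in product((0, 1), repeat=3))  # should be 1
```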

19
Estimating a Markov Random field
  • Estimate the structure from data
  • Estimate the parameters
  • Requires the potential functions to be
    numerically defined
  • This completely specifies the JPD
  • The JPD can then be sampled
  • There is no specific ordering (not a DAG), so
    sampling is a bit problematic

20
MRF in EDA
  • Has recently been proposed as an estimation of
    distribution technique in EDA
  • Shakya et al. 2004, 2005
  • Santana et al. 2003, 2005

21
MRF based EDA
  • Initialise parent solutions
  • Select a set from the parent solutions
  • Estimate an MRF from the selected set
  • Estimate structure
  • Estimate parameters
  • Sample the MRF to generate the new population
  • Replace parents with the new solutions and go to
    2 until the termination criterion is satisfied

22
How to estimate and sample MRF in EDA
  • Learning Structure
  • Conditional Independence test (MN-EDA, MN-FDA)
  • Linkage detection algorithm (LDFA)
  • Learning parameters
  • Junction tree approach (FDA)
  • Junction graph approach (MN-FDA)
  • Kikuchi approximation approach (MN-EDA)
  • Fitness modelling approach (DEUM)
  • Sampling
  • Probabilistic Logic Sampling (FDA, MN-FDA)
  • Probability vector approach (DEUMpv)
  • Direct sampling of Gibbs distribution (DEUMd)
  • Metropolis sampler (Is-DEUMm)
  • Gibbs Sampler (Is-DEUMg, MN-EDA)

23
Fitness modelling approach
  • Hammersley-Clifford theorem: the JPD for any MRF
    follows a Gibbs distribution
  • (a) The energy of the Gibbs distribution is
    written in terms of potential functions over the
    cliques
  • (b) The probability of a solution is assumed to
    be proportional to its fitness
  • From (a) and (b), a model of the fitness
    function, the MRF fitness model (MFM), is derived
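Written out, the derivation sketched in these bullets is (reconstructed from the DEUM literature; the slides' own notation may differ slightly):

```latex
% (a) Hammersley-Clifford: the JPD of an MRF is a Gibbs distribution
p(\mathbf{x}) = \frac{e^{-U(\mathbf{x})/T}}{\sum_{\mathbf{y}} e^{-U(\mathbf{y})/T}},
\qquad U(\mathbf{x}) = \sum_{c \in C} u_c(\mathbf{x}_c)

% (b) probability of a solution proportional to its fitness
p(\mathbf{x}) = \frac{f(\mathbf{x})}{\sum_{\mathbf{y}} f(\mathbf{y})}

% equating (a) and (b) gives the MRF fitness model (MFM)
-\ln f(\mathbf{x}) = \frac{U(\mathbf{x})}{T} + \mathrm{const}
```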

24
MRF fitness Model (MFM)
  • Properties
  • Completely specifies the JPD for the MRF
  • Negative relationship between fitness and energy,
    i.e. minimising the energy maximises the fitness
  • Task
  • Need to find the structure for the MRF
  • Need to numerically define the clique potential
    functions

25
MRF Fitness Model (MFM)
  • Let us start with the simplest model, the
    univariate model (this eliminates structure
    learning)
  • For the univariate model there will be n
    singleton cliques
  • For each singleton clique, assign a potential
    function
  • The corresponding MFM
  • In terms of the Gibbs distribution
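For the univariate case these equations reduce to the following (reconstructed from the DEUM papers, which use the encoding x_i in {-1, 1}):

```latex
% n singleton cliques, each with potential alpha_i x_i
U(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i x_i, \qquad x_i \in \{-1, 1\}

% univariate MFM: each chromosome with fitness f(x) supplies one equation
-\ln f(\mathbf{x}) = \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n
  + \mathrm{const}

% Gibbs form: the JPD factorises into independent univariate marginals
p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i), \qquad
p(x_i) = \frac{e^{-\alpha_i x_i / T}}{e^{-\alpha_i / T} + e^{\alpha_i / T}}
```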

26
Estimating MRF parameters using MFM
  • Each chromosome gives us a linear equation
  • Applying it to a set of selected solutions gives
    us a system of linear equations
  • Solving the system gives an approximation to the
    MRF parameters
  • Knowing the MRF parameters completely specifies
    the JPD
  • The next step is to sample the JPD
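The estimation step can be sketched as a least-squares solve (numpy's `lstsq` is SVD-based; the ±1 encoding follows the DEUM papers, the tiny data set is invented, and any constant term is omitted for simplicity):

```python
import numpy as np

def estimate_mrf_parameters(population, fitnesses):
    """Fit the univariate MFM -ln f(x) = sum_i alpha_i x_i.
    Each row of A is one chromosome in {-1, +1} encoding; the
    system is usually over- or under-determined, so it is solved
    by least squares (via SVD)."""
    A = np.where(np.asarray(population) == 1, 1.0, -1.0)
    b = -np.log(np.asarray(fitnesses, dtype=float))
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha

# Tiny invented example: fitness rises with the number of 1s.
pop = [[0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1]]
fit = [10.0, 10.0, 100.0, 2.0]
alpha = estimate_mrf_parameters(pop, fit)
# Negative alpha_i favours x_i = +1, since that minimises alpha_i * x_i.
```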

27
General DEUM framework
  • Distribution Estimation Using MRF algorithm
    (DEUM)
  • 1. Initialise parent population P
  • 2. Select set D from P (D = P can be used!)
  • 3. Build an MFM, fit it to D, and estimate the
    MRF parameters
  • 4. Sample the MRF to generate the new population
  • 5. Replace P with the new population and go to 2
    until the termination criterion is satisfied

28
How to sample MRF
  • Probability vector approach
  • Direct Sampling of Gibbs Distribution
  • Metropolis sampling
  • Gibbs sampling

29
Probability vector approach to sample MRF
  • Minimise U(x) to maximise f(x)
  • To minimise U(x), each term αi xi should be at
    its minimum
  • This suggests that if αi is negative then the
    corresponding xi should be positive
  • We could get an optimum chromosome for the
    current population just by looking at the αi
  • However, the current population does not always
    contain enough information to generate the
    optimum
  • So we use the sign of each αi to update a vector
    of probabilities

30
DEUM with probability vector (DEUMpv)
31
Updating Rule
  • Uses the sign of each MRF parameter to direct the
    search towards the value of the corresponding
    variable that minimises the energy U(x)
  • A learning rate controls convergence
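One way to read this rule is as a PBIL-style update (a sketch only; the exact DEUMpv rule is in the corresponding paper, and this learning-rate form is an assumption):

```python
def update_probability_vector(p, alpha, learning_rate=0.1):
    """Nudge each p_i towards the bit value that minimises alpha_i * x_i:
    towards 1 when alpha_i < 0, towards 0 when alpha_i > 0."""
    return [
        (1 - learning_rate) * pi + learning_rate * (1.0 if a < 0 else 0.0)
        for pi, a in zip(p, alpha)
    ]

p = [0.5, 0.5, 0.5]
alpha = [-0.9, 0.2, -0.1]
p = update_probability_vector(p, alpha)
# each entry has moved 10% of the way towards its target of 1 or 0
```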

32
Simulation of DEUMpv
(Figure: step-by-step DEUMpv simulation, showing a population of bit strings with their fitness values, a probability vector initialised to 0.5, and the updated vector, e.g. 0.4, 0.6, 0.6, 0.6, 0.6, after applying the updating rule.)
33
Results
  • OneMax Problem

34
Results
  • F6 function optimisation

35
Results
  • Trap 5 function
  • Deceptive problem
  • No solution found

36
Sampling MRF
  • Probability vector approach
  • Direct sampling of Gibbs distribution
  • Metropolis sampling
  • Gibbs sampling

37
Direct Sampling of Gibbs distribution
  • In the probability vector approach, only the sign
    of the MRF parameters is used
  • However, one could directly sample the Gibbs
    distribution and make use of the values of the
    MRF parameters
  • A temperature coefficient can also be used to
    manipulate the probabilities

38
Direct Sampling of Gibbs distribution
39
Direct Sampling of Gibbs distribution
  • The temperature coefficient has an important role
  • Decreasing T cools each probability towards
    either 1 or 0, depending upon the sign and value
    of the corresponding αi
  • This forms the basis for the DEUM based on direct
    sampling of the Gibbs distribution (DEUMd)
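Under the univariate MFM with the x_i in {-1, 1} encoding used in the DEUM papers, the Gibbs marginals can be computed and sampled directly; this sketch shows how lowering T sharpens them:

```python
import math

def p_plus_one(alpha_i, T):
    """p(x_i = +1) under the Gibbs distribution with U(x) = sum_i alpha_i x_i."""
    return 1.0 / (1.0 + math.exp(2.0 * alpha_i / T))

# Cooling: as T decreases, the marginal saturates towards 0 or 1
# according to the sign of alpha_i (here alpha_i < 0, so towards 1).
probs = [p_plus_one(-0.5, T) for T in (10.0, 1.0, 0.1)]
```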

40
DEUM with direct sampling (DEUMd)
  • 1. Generate the initial population, P, of size M
  • 2. Select the N fittest solutions, N <= M
  • 3. Calculate the MRF parameters
  • 4. Generate M new solutions by sampling the
    univariate distribution
  • 5. Replace P by the new population and go to 2
    until termination
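The five steps can be sketched end to end on OneMax (a minimal illustration: the ±1 encoding and least-squares MFM fit follow the DEUM literature, while the population sizes, cooling factor and the added constant term alpha_0 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def deumd(fitness, n, M=60, N=30, generations=30, T0=2.0, cooling=0.9):
    """Minimal DEUMd loop: fit the univariate MFM to the N fittest
    solutions, then sample the Gibbs marginals at temperature T."""
    P = rng.integers(0, 2, size=(M, n))
    T = T0
    for _ in range(generations):
        order = np.argsort([-fitness(x) for x in P])
        D = P[order[:N]]                              # N fittest, N <= M
        # One linear equation per selected chromosome:
        # -ln f(x) = alpha_0 + sum_i alpha_i x_i, with x_i in {-1, +1}.
        A = np.hstack([np.ones((N, 1)), np.where(D == 1, 1.0, -1.0)])
        b = -np.log(np.array([fitness(x) for x in D], dtype=float) + 1e-9)
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        alpha = coeffs[1:]
        p1 = 1.0 / (1.0 + np.exp(2.0 * alpha / T))    # p(x_i = 1)
        P = (rng.random((M, n)) < p1).astype(int)     # sample new population
        T *= cooling                                  # cool the distribution
    return max(P, key=fitness)

onemax = lambda x: int(np.sum(x))
best = deumd(onemax, n=20)
```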

41
DEUMd simulation
(Figure: DEUMd simulation on a population of four 5-bit chromosomes with fitness values 4, 4, 3 and 2.)
42
Experimental results
  • OneMax Problem

43
F6 function optimisation
44
Plateau Problem (n = 180)
45
Checker Board Problem (n = 100)
46
Trap function of order 5 (n = 60)
47
Experimental results
Problem (metric)              GA                    UMDA                  PBIL                  DEUMd
Checker Board (fitness)       254.68 (4.39)         233.79 (9.2)          243.5 (8.7)           254.1 (5.17)
Checker Board (evaluations)   427702.2 (1098959.3)  50228.2 (9127)        191476.8 (37866.65)   33994 (13966.75)
Equal-Products (fitness)      211.59 (1058.47)      5.03 (18.29)          9.35 (43.36)          2.14 (6.56)
Equal-Products (evaluations)  1000000 (0)           1000000 (0)           1000000 (0)           1000000 (0)
Colville (fitness)            0.61 (1.02)           40.62 (102.26)        2.69 (2.54)           0.61 (0.77)
Colville (evaluations)        1000000 (0)           62914.56 (6394.58)    1000000 (0)           1000000 (0)
Six Peaks (fitness)           99.1 (9)              98.58 (3.37)          99.81 (1.06)          100 (0)
Six Peaks (evaluations)       49506 (4940)          121333.76 (14313.44)  58210 (3659.15)       26539 (1096.45)
48
Analysis of Results
  • For univariate problems (OneMax), given a
    population size of 1.5n, D = P and T -> 0, the
    solution was found in a single generation
  • For problems with low-order dependency between
    variables (Plateau and Checker Board), the
    performance was significantly better than that of
    other univariate EDAs
  • For the deceptive problems with higher-order
    dependency (Trap function and Six Peaks), DEUMd
    was deceived, but by slowing the cooling rate it
    was able to find the solution for Trap of order 5
  • For the problems where the optimum was not known,
    the performance was comparable to that of the GA
    and other EDAs, and was better in some cases

49
Cost- Benefit Analysis (the cost)
  • Polynomial cost of estimating the distribution,
    compared to the linear cost of other univariate
    EDAs
  • Cost of computing the univariate marginal
    frequencies
  • Cost of computing the SVD

50
Cost- Benefit Analysis (the benefit)
  • DEUMd can significantly reduce the number of
    fitness evaluations
  • The quality of solution was better for DEUMd than
    for the other compared EDAs
  • DEUMd should be tried on problems where the
    increased solution quality outweighs the
    computational cost

51
Sampling MRF
  • Probability vector approach
  • Direct Sampling of Gibbs Distribution
  • Metropolis sampling
  • Gibbs sampling

52
Example problem: 2D Ising Spin Glass
Given the coupling constants J, find the value of
each spin that minimises the energy H
MRF fitness model
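The energy being minimised can be sketched as follows (assuming the standard 2D Ising form H = -Σ J_ij s_i s_j over lattice neighbour pairs, with periodic boundaries and random couplings chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def ising_energy(s, J_right, J_down):
    """H = -sum of J_ij * s_i * s_j over horizontal and vertical
    neighbour pairs of a periodic L x L grid, with s_i in {-1, +1}."""
    h = np.sum(J_right * s * np.roll(s, -1, axis=1))  # bonds to the right
    v = np.sum(J_down * s * np.roll(s, -1, axis=0))   # bonds downwards
    return -(h + v)

L = 8
J_right = rng.choice([-1.0, 1.0], size=(L, L))  # random +/-1 couplings
J_down = rng.choice([-1.0, 1.0], size=(L, L))
spins = rng.choice([-1, 1], size=(L, L))
H = ising_energy(spins, J_right, J_down)
```

Flipping every spin leaves H unchanged, since each product s_i s_j is preserved; this is the usual spin-flip symmetry of the model.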
53
Metropolis Sampler
54
Difference in Energy
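The point of the energy difference is that a single spin flip only touches the four bonds at that site, so dH is cheap to compute. A sketch of a Metropolis sweep built on that (the formulation is assumed, not taken verbatim from the slides):

```python
import math
import random

def delta_energy(s, J_right, J_down, i, j, L):
    """Energy change when spin (i, j) flips, for H = -sum J s s' on a
    periodic L x L lattice: dH = 2 * s[i][j] * (local field)."""
    field = (J_right[i][j] * s[i][(j + 1) % L]
             + J_right[i][(j - 1) % L] * s[i][(j - 1) % L]
             + J_down[i][j] * s[(i + 1) % L][j]
             + J_down[(i - 1) % L][j] * s[(i - 1) % L][j])
    return 2.0 * s[i][j] * field

def metropolis_sweep(s, J_right, J_down, L, T):
    """One Metropolis sweep: propose L*L random single-spin flips."""
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        dH = delta_energy(s, J_right, J_down, i, j, L)
        # Accept if energy decreases, otherwise with Boltzmann probability.
        if dH <= 0 or random.random() < math.exp(-dH / T):
            s[i][j] = -s[i][j]
    return s

L = 4
J = [[1.0] * L for _ in range(L)]                 # ferromagnetic couplings
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
metropolis_sweep(spins, J, J, L, T=1.0)
```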
55
DEUM with Metropolis sampler
56
Results
57
Sampling MRF
  • Probability vector approach
  • Direct Sampling of Gibbs Distribution
  • Metropolis sampling
  • Gibbs sampling

58
Conditionals from Gibbs distribution
For the 2D Ising spin glass problem
59
Gibbs Sampler
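For this model the single-spin conditional follows in closed form from the Gibbs distribution: with H = -Σ J s s', p(s_ij = +1 | neighbours) = 1 / (1 + e^(-2 * field / T)), where field is the coupling-weighted sum of the four neighbours. A sketch of a Gibbs sweep (reconstructed under that formulation, with periodic boundaries assumed):

```python
import math
import random

def gibbs_sweep(s, J_right, J_down, L, T):
    """One Gibbs sweep: resample each spin from its exact conditional
    given its four neighbours on a periodic L x L lattice."""
    for i in range(L):
        for j in range(L):
            field = (J_right[i][j] * s[i][(j + 1) % L]
                     + J_right[i][(j - 1) % L] * s[i][(j - 1) % L]
                     + J_down[i][j] * s[(i + 1) % L][j]
                     + J_down[(i - 1) % L][j] * s[(i - 1) % L][j])
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field / T))
            s[i][j] = 1 if random.random() < p_up else -1
    return s

L = 4
J = [[1.0] * L for _ in range(L)]                 # ferromagnetic couplings
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
for _ in range(5):
    gibbs_sweep(spins, J, J, L, T=1.0)
```

At low temperature the conditionals saturate: for a ferromagnetic lattice already in an aligned state, a sweep leaves it aligned.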
60
DEUM with Gibbs sampler
61
Results
62
Summary
  • From GA to EDA
  • PGM approach to modelling and sampling the
    distribution in EDA
  • DEUM: an MRF approach to modelling and sampling
  • Learn structure: no structure learning so far
    (fixed models are used)
  • Learn parameters: fitness modelling approach
  • Sample the MRF
  • Probability vector approach to sampling
  • Direct sampling of the Gibbs distribution
  • Metropolis sampler
  • Gibbs sampler
  • The results are encouraging, and there is a lot
    more to explore