CSE 634 Data Mining Concepts - PowerPoint PPT Presentation

About This Presentation
Title:

CSE 634 Data Mining Concepts

Description:

Title: PowerPoint Presentation Last modified by: Gibral Mohammed Abosamra Created Date: 1/1/1601 12:00:00 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 85
Provided by: kauEduSa2
Category:
Tags: cse | concepts | data | design | mining


Transcript and Presenter's Notes

Title: CSE 634 Data Mining Concepts


1
CSE 634 Data Mining Concepts & Techniques
Prof. Anita Wasilewska
Genetic Algorithms (GAs)
By Group 1: Abhishek Sharma, Mikhail Rubnich, George Iordache, Marcela Boboila
2
General description of the method

By Abhishek Sharma
3
References
  • Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques.
    Morgan Kaufmann Publishers, 2003.
  • Data Mining Techniques: Class Lecture Notes and PP Slides.
  • http://cs.felk.cvut.cz/~xobitko/ga/
  • Massachusetts Institute of Technology, Prof. de Weck and Prof. Willcox,
    Multidisciplinary System Design Optimization Course Lecture Notes on
    Heuristic Techniques, "A Basic Introduction to Genetic Algorithms":
    http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf

4
History of Genetic Algorithms
  • Evolutionary computing was introduced in the 1960s by I. Rechenberg.
  • Professor John Holland at the University of Michigan, in his book
    "Adaptation in Natural and Artificial Systems", explored the concept of
    using mathematically based artificial evolution as a method to conduct
    a structured search for solutions to complex problems.
  • Dr. David E. Goldberg, in his 1989 landmark text "Genetic Algorithms in
    Search, Optimization and Machine Learning", suggested applications for
    genetic algorithms in a wide range of engineering fields.

5
What Are Genetic Algorithms (GAs)?
  • Genetic Algorithms are search and optimization techniques based on
    Darwin's principle of natural selection.
  • Problems are solved by an evolutionary process that results in a best
    (fittest) solution (the survivor). In other words, the solution is
    evolved.
  • 1. Inheritance: offspring acquire characteristics.
  • 2. Mutation: change, to avoid similarity.
  • 3. Natural selection: variations improve survival.
  • 4. Recombination: crossover.

6
Genetics
  • Chromosome
  • All living organisms consist of cells. Each cell contains the same set
    of chromosomes.
  • Chromosomes are strings of DNA and consist of genes, which are blocks
    of DNA.
  • Each gene encodes a trait, for example eye color.
  • Reproduction
  • During reproduction, recombination (or crossover) occurs first. Genes
    from the parents combine to form a whole new chromosome. The newly
    created offspring can then be mutated. The changes are mainly caused by
    errors in copying genes from the parents.
  • The fitness of an organism is measured by the success of the organism
    in its life (survival).

Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
7
Principle Of Natural Selection
  • "Select the best, discard the rest."
  • Two elements are required before a genetic algorithm can be applied to
    a problem:
  • A method for representing a solution (encoding)
  • e.g. a string of bits, numbers, or characters
  • A method for measuring the quality of any proposed solution (a fitness
    function)
  • e.g. determining total weight

8
GA Elements
Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
9
Search Space
  • When solving a problem, we usually look for the solution that is best
    among others. The space of all feasible solutions (the set of objects
    among which the desired solution lies) is called the search space (also
    state space). Each point in the search space represents one feasible
    solution. Each feasible solution can be "marked" by its value or
    fitness for the problem.
  • Initialization
  • Initially, many individual solutions are randomly generated to form an
    initial population covering the entire range of possible solutions
    (the search space).
  • Each point in the search space represents one possible solution,
    marked by its value (fitness).
  • Selection
  • A proportion of the existing population is selected to breed a new
    generation.
  • Reproduction
  • A second-generation population of solutions is generated from those
    selected, through the genetic operators crossover and mutation.
  • Termination
  • A solution is found that satisfies minimum criteria.
  • A fixed number of generations is reached.
  • The allocated budget (computation, time/money) is reached.
  • The highest-ranking solution's fitness is reaching or has reached a
    plateau.

10
Methodology Associated with GAs
  • Begin
  • T = 0 (first step)
  • Initialize population
  • Evaluate solutions
  • Optimum solution? If yes (Y): Stop
  • If no (N): Selection, then Crossover, then Mutation
  • T = T + 1 (go to next step), then evaluate solutions again
Citation: http://cs.felk.cvut.cz/~xobitko/ga/
11
Creating a GA on Computer
Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;
    While (Fitness Value != Optimal Value)
    {
        Selection;    // Natural Selection, Survival Of Fittest
        Crossover;    // Reproduction, Propagate favorable characteristics
        Mutation;     // Mutation
        Calculate Fitness Function;
    }
}
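The pseudocode above can be turned into a small runnable sketch. The version below is only one possible reading: the bit-string encoding, fitness-proportionate selection, single-point crossover, the parameter values, and the generation cap are all assumptions added for illustration, not part of the slide.

```python
import random

def simple_genetic_algorithm(fitness, length, optimal, pop_size=20,
                             mutation_rate=0.05, max_generations=500, seed=0):
    """Evolve bit lists until one reaches `optimal` or the generation cap hits."""
    rng = random.Random(seed)
    # Initialize the population with random bit lists.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        # Calculate the fitness function for every individual.
        scores = [fitness(individual) for individual in population]
        best = max(population, key=fitness)
        if fitness(best) >= optimal:
            return best, generation
        # Selection: fitness-proportionate (assumes non-negative fitness;
        # the +1 keeps every weight strictly positive).
        weights = [score + 1 for score in scores]
        next_population = []
        while len(next_population) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            # Crossover: single point.
            point = rng.randint(1, length - 1)
            child = mother[:point] + father[point:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), max_generations

# Toy problem (hypothetical): maximize the number of 1 bits in a 12-bit string.
best, generations = simple_genetic_algorithm(sum, length=12, optimal=12)
```

The generation cap plays the role of the "allocated budget" termination criterion, so the loop ends even if the optimum is never found.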
12
Nature Vs Computer - Mapping
Nature         Computer
Population     Set of solutions
Individual     Solution to a problem
Fitness        Quality of a solution
Chromosome     Encoding for a solution
Gene           Part of the encoding of a solution
Reproduction   Crossover
13
Encoding
  • The process of representing the solution in the form of a string that
    conveys the necessary information.
  • Just as each gene in a chromosome controls a particular characteristic
    of the individual, each element in the string represents a
    characteristic of the solution.

14
Encoding Methods
  • Binary Encoding: the most common method of encoding. Chromosomes are
    strings of 1s and 0s, and each position in the chromosome represents a
    particular characteristic of the problem.
  • Permutation Encoding: useful in ordering problems such as the Traveling
    Salesman Problem (TSP). In TSP, for example, every chromosome is a
    string of numbers, each of which represents a city to be visited.
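A brief sketch of the two encodings described above. The item names, city coordinates, and helper names are hypothetical and chosen only to make the example self-contained.

```python
import math

# Binary encoding: bit i says whether item i is included (hypothetical items).
items = ["tent", "stove", "map", "rope"]
chromosome = [1, 0, 1, 1]
packed = [item for item, bit in zip(items, chromosome) if bit]

# Permutation encoding: the chromosome is an ordering of city numbers.
cities = {1: (0, 0), 2: (3, 0), 3: (3, 4)}  # hypothetical coordinates
tour = [1, 2, 3]

def tour_length(tour):
    """Length of the closed tour that visits the cities in chromosome order."""
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(math.dist(cities[a], cities[b]) for a, b in legs)
```

With permutation encoding, crossover and mutation have to keep the chromosome a valid permutation, which is why ordering problems usually need their own operators.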

15
Encoding Methods (contd.)
  • Value Encoding: used in problems where complicated values, such as
    real numbers, are used and where binary encoding would not suffice.
  • Good for some problems, but it is often necessary to develop specific
    crossover and mutation techniques for these chromosomes.

16
Encoding Methods (contd.)
  • Tree Encoding: used mainly for evolving programs or expressions, i.e.
    for genetic programming.
  • Every chromosome is a tree of objects, such as values and arithmetic
    operators, or commands in a programming language.

( +  x  ( /  5  y ) )
( do_until  step  wall )
Citation: http://ocw.mit.edu/NR/rdonlyres/Aeronautics-and-Astronautics/16-888Spring-2004/D66C4396-90C8-49BE-BF4A-4EBE39CEAE6F/0/MSDO_L11_GA.pdf
17
GA Operators

By Mikhail Rubnich
18
References
  • Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques.
    Morgan Kaufmann Publishers, 2003.
  • http://www.ai-junkie.com/ga/intro/gat2.html
  • http://www.faqs.org/faqs/ai-faq/genetic/part2/
  • http://en.wikipedia.org/wiki/Genetic_algorithms

19
Citation: http://www.ewh.ieee.org/soc/es/May2001/14/GA.GIF
20
Basic GA Operators
  • Recombination
  • Crossover - Looking for solutions near
    existing solutions
  • Mutation - Looking at completely new
    areas of search space

21
Fitness function
  • Quantifies the optimality of a solution (that is, a chromosome) so that
    a particular chromosome may be ranked against all the other
    chromosomes.
  • A fitness value is assigned to each solution depending on how close it
    actually is to solving the problem.
  • An ideal fitness function correlates closely with the goal and is
    quickly computable.
  • For instance, the knapsack problem:
  • Fitness function: total value of the things in the knapsack.
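As a minimal sketch of the knapsack fitness function mentioned above, assuming a binary chromosome (one bit per item). The capacity check and the zero score for overweight solutions are assumptions added here; the slide only names the total value. The item values and weights are hypothetical.

```python
def knapsack_fitness(chromosome, values, weights, capacity):
    """Total value of the selected items; overweight solutions score 0."""
    total_weight = sum(w for w, bit in zip(weights, chromosome) if bit)
    if total_weight > capacity:
        return 0  # assumption: infeasible solutions get the worst fitness
    return sum(v for v, bit in zip(values, chromosome) if bit)

values = [60, 100, 120]   # hypothetical item values
weights = [10, 20, 30]    # hypothetical item weights
capacity = 50
```

For example, `knapsack_fitness([0, 1, 1], values, weights, capacity)` sums the values of the second and third items, while `[1, 1, 1]` exceeds the capacity and scores 0.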

22
Recombination
  • Main idea: "Select the best, discard the rest."
  • The process that chooses which solutions are preserved and allowed to
    reproduce, and which ones must die out.
  • The main goal of the recombination operator is to emphasize the good
    solutions and eliminate the bad solutions in a population (while
    keeping the population size constant).

23
So, how to select the best?
  • Roulette Selection
  • Rank Selection
  • Steady State Selection
  • Tournament Selection

24
Roulette wheel selection
  • Main idea: the fitter the solution, the greater its chance of being
    chosen.
  • How does it work?

25
Example of Roulette wheel selection
No.    String  Fitness  % of Total
1      01101   169      14.4
2      11000   576      49.2
3      01000   64       5.5
4      10011   361      30.9
Total          1170     100.0
Citation: www.cs.vu.nl/~gusz/
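The table above can be reproduced with a few lines of code. In the sketch below the fitness is assumed to be the square of the string read as a binary number, because that matches every row of the table (for example 01101 is 13, and 13 squared is 169). The function names are illustrative.

```python
import random
from itertools import accumulate

population = ["01101", "11000", "01000", "10011"]

def fitness(bits):
    """Assumed fitness: the square of the string read as a binary number."""
    return int(bits, 2) ** 2

scores = [fitness(s) for s in population]
total = sum(scores)
shares = [round(100 * s / total, 1) for s in scores]  # the "% of Total" column

def roulette_select(population, scores, rng):
    """Spin the wheel once: each individual owns a slice proportional to its score."""
    cumulative = list(accumulate(scores))
    spin = rng.uniform(0, cumulative[-1])
    for individual, edge in zip(population, cumulative):
        if spin <= edge:
            return individual
    return population[-1]

chosen = roulette_select(population, scores, random.Random(1))
```

Over many spins, string 2 (11000) would be expected to be picked roughly half of the time, in line with its 49.2% share.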
26
Roulette wheel selection
All you have to do is spin the ball and grab the chromosome at the point
where it stops.
27
Crossover
  • Main idea: combine the genetic material (bits) of 2 parent chromosomes
    (solutions) and produce a new child possessing characteristics of both
    parents.
  • How does it work?
  • There are several methods.

28
Crossover methods
  • Single-Point Crossover: a random point is chosen on the individual
    chromosomes (strings) and the genetic material is exchanged at this
    point.
  • Citation: http://www.ewh.ieee.org/soc/es/May2001/14/CROSS0.GIF

29
Crossover methods
  • Two-Point Crossover: two random points are chosen on the individual
    chromosomes (strings) and the genetic material is exchanged at these
    points.

Chromosome 1:  11011 00100 110110
Chromosome 2:  10101 11000 011110
Offspring 1:   10101 00100 011110
Offspring 2:   11011 11000 110110
NOTE: These chromosomes are different from the last example.
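The two crossover methods above can be sketched as follows. The crossover points are passed in explicitly (an assumption made so that the slide's example can be reproduced); in a real run they would be chosen at random.

```python
def single_point_crossover(a, b, point):
    """Swap everything after `point` between the two parents."""
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a, b, first, second):
    """Swap the segment between the two points; offspring order follows the slide."""
    offspring1 = b[:first] + a[first:second] + b[second:]
    offspring2 = a[:first] + b[first:second] + a[second:]
    return offspring1, offspring2

chromosome1 = "1101100100110110"  # 11011 00100 110110
chromosome2 = "1010111000011110"  # 10101 11000 011110
offspring = two_point_crossover(chromosome1, chromosome2, 5, 10)
```

With crossover points 5 and 10, the middle five bits are exchanged and the result matches Offspring 1 and Offspring 2 from the slide.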
30
Crossover methods
  • Uniform Crossover: each gene (bit) is selected randomly from one of the
    corresponding genes of the parent chromosomes.

Chromosome 1:  11011 00100 110110
Chromosome 2:  10101 11000 011110
Offspring:     10111 00000 110110
NOTE: Uniform Crossover yields ONLY 1 offspring.
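A minimal sketch of uniform crossover on bit strings, assuming each gene is taken from either parent with equal probability (the slide does not state the probability). The helper `is_uniform_child` is an illustrative name, used here to check that the slide's offspring is a possible outcome.

```python
import random

def uniform_crossover(a, b, rng):
    """Build one child by picking each gene from one of the two parents."""
    return "".join(rng.choice(pair) for pair in zip(a, b))

def is_uniform_child(child, a, b):
    """True if every gene of `child` comes from the same position in a or b."""
    return len(child) == len(a) == len(b) and all(
        gene in pair for gene, pair in zip(child, zip(a, b)))

chromosome1 = "1101100100110110"
chromosome2 = "1010111000011110"
child = uniform_crossover(chromosome1, chromosome2, random.Random(0))
```

Positions where both parents agree are always preserved in the child; only the positions where they differ are actually randomized.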
31
Crossover (contd.)
  • Crossover between 2 good solutions MAY NOT ALWAYS yield a better or
    equally good solution.
  • Since the parents are good, the probability of the child being good is
    high.
  • If the offspring is not good (a poor solution), it will be removed in
    the next iteration during selection.

32
Elitism
  • Main idea: copy the best chromosomes (solutions) to the new population
    before applying crossover and mutation.
  • When creating a new population by crossover or mutation, the best
    chromosome might be lost.
  • Elitism forces GAs to retain some number of the best individuals at
    each generation.
  • It has been found that elitism significantly improves performance.
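Elitism can be sketched as a thin wrapper around whatever breeding step is used. The function name, the `breed` callback, and the `elite_count` parameter are hypothetical names introduced for this illustration.

```python
def next_generation_with_elitism(population, fitness, breed, elite_count=1):
    """Copy the best individuals unchanged, then fill the rest by breeding."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_count]
    # `breed(population, n)` is assumed to return n offspring produced by
    # selection, crossover, and mutation.
    offspring = breed(population, len(population) - elite_count)
    return elites + offspring

# Toy usage: individuals are numbers, fitness is the number itself, and the
# stand-in breeding step just returns zeros.
new_population = next_generation_with_elitism(
    [3, 9, 4, 1], fitness=lambda x: x, breed=lambda pop, n: [0] * n)
```

Because the elites bypass crossover and mutation, the best fitness in the population can never decrease from one generation to the next.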

33
Mutation
  • Main idea: random inversion of bits in a solution to maintain diversity
    in the population set.
  • Ex. giraffes: mutations could be beneficial.
  • Citation: http://www.ewh.ieee.org/soc/es/May2001/14/MUTATE0.GIF
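Bit-inversion mutation can be sketched in a few lines. The per-bit mutation `rate` is an assumed parameter; the slide does not give a value.

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit of the string independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)

rng = random.Random(0)
unchanged = mutate("10110", 0.0, rng)  # rate 0: nothing flips
inverted = mutate("10110", 1.0, rng)   # rate 1: every bit flips
```

In practice the rate is kept small so that mutation adds diversity without destroying good solutions.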

34
Advantages and disadvantages
  • Advantages
  • Always an answer; the answer gets better with time
  • Good for noisy environments
  • Inherently parallel; easily distributed
  • Issues
  • Performance
  • The solution is only as good as the evaluation function (often the
    hardest part)
  • Termination criteria

35
Applications - Genetic programming and data
mining

By George Iordache
36
  • A.A. Freitas. A survey of evolutionary algorithms for data mining and
    knowledge discovery, Pontificia Universidade Catolica do Parana,
    Brazil. In A. Ghosh and S. Tsutsui, editors, Advances in Evolutionary
    Computation, pages 819--845. Springer-Verlag, 2002.
    http://citeseer.ist.psu.edu/cache/papers/cs/23050/httpzSzzSzwww.ppgia.pucpr.brzSzalexzSzpub_papers.dirzSzAdvEC-bk.pdf/freitas01survey.pdf
  • Anita Wasilewska, Course Lecture Notes (2007 and previous years) on
    Classification (Data Mining book Chapters 5 and 7):
    http://www.cs.sunysb.edu/cse634/lecture_notes/07classification.pdf
  • J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed.,
    Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6
  • R. Mendes, F. Voznika, A. Freitas, and J. Nievola. Discovering fuzzy
    classification rules with genetic programming and co-evolution,
    Pontificia Universidade Catolica do Parana, Brazil. In L. de Raedt and
    A. Siebes, editors, 5th European Conference on Principles and Practice
    of Knowledge Discovery in Databases (PKDD'01), volume 2168 of LNAI,
    pages 314--325. Springer-Verlag, 2001.
    http://citeseer.ist.psu.edu/cache/papers/cs/23050/httpzSzzSzwww.ppgia.pucpr.brzSzalexzSzpub_papers.dirzSzPKDD-2001.pdf/mendes01discovering.pdf
  • John R. Koza, Medical Informatics, Department of Medicine, Department
    of Electrical Engineering, Stanford University. Genetic algorithms and
    genetic programming, Lecture notes, 2003.
    www.genetic-programming.com/c2003lecture1modified.ppt

37
Genetic Programming
  • A program in C:

    int foo (int time)
    {
        int temp1, temp2;
        if (time > 10)
            temp1 = 3;
        else
            temp1 = 4;
        temp2 = temp1 + 1 + 2;
        return (temp2);
    }

  • Equivalent expression (similar to a classification rule in data
    mining):

    (+ 1 2 (IF (> TIME 10) 3 4))

Citation: www.genetic-programming.com/c2003lecture1modified.ppt
38
Program tree
(+ 1 2 (IF (> TIME 10) 3 4))
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
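The program tree can be represented and evaluated directly. The sketch below uses nested tuples for the prefix expression with the operators written as "+" and ">"; the tuple layout and the `evaluate` helper are assumptions made for illustration, not Koza's implementation.

```python
def evaluate(node, env):
    """Recursively evaluate a program tree given variable bindings in `env`."""
    if isinstance(node, tuple):
        head, *args = node
        if head == "IF":
            condition, then_branch, else_branch = args
            chosen = then_branch if evaluate(condition, env) else else_branch
            return evaluate(chosen, env)
        if head == ">":
            return evaluate(args[0], env) > evaluate(args[1], env)
        if head == "+":
            return sum(evaluate(arg, env) for arg in args)
        raise ValueError(f"unknown function: {head}")
    if isinstance(node, str):
        return env[node]  # terminal: a variable such as TIME
    return node           # terminal: a constant

# (+ 1 2 (IF (> TIME 10) 3 4))
program = ("+", 1, 2, ("IF", (">", "TIME", 10), 3, 4))
```

Evaluating `program` with TIME bound to 11 follows the same path as the C function with `time = 11`: the condition holds, so 3 is added to 1 + 2.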
39
Given data
Input: independent variable X    Output: dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
40
Problem description
Objective: Find a computer program with one input (independent variable X) whose output Y equals the given data.
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, /}
3 Initial population: Randomly created individuals from elements in T and F.
4 Fitness: $|y'_0 - y_0| + |y'_1 - y_1| + \dots$, where $y'_i$ is the computed output and $y_i$ is the given output for $x_i$ in the range $[-1, 1]$.
5 Termination: An individual emerges whose sum of absolute errors (the value of its fitness function) is less than 0.1.
Citation: www.genetic-programming.com/c2003lecture1modified.ppt
41
Generation 0
Population of 4 randomly created individuals:
x
x + 1
x^2 + 1
2
Citation: examples taken from www.genetic-programming.com/c2003lecture1modified.ppt
42
X      Y     X+1   |X+1-Y|  X^2+1  |X^2+1-Y|  2  |2-Y|  X      |X-Y|
-1.00  1.00  0     1        2      1          2  1      -1.00  2
-0.80  0.84  0.20  0.64     1.64   0.80       2  1.16   -0.80  1.64
-0.60  0.76  0.40  0.36     1.36   0.60       2  1.24   -0.60  1.36
-0.40  0.76  0.60  0.16     1.16   0.40       2  1.24   -0.40  1.16
-0.20  0.84  0.80  0.04     1.04   0.20       2  1.16   -0.20  1.04
0.00   1.00  1.00  0        1      0          2  1      0.00   1
0.20   1.24  1.20  0.04     1.04   0.20       2  0.76   0.20   1.04
0.40   1.56  1.40  0.16     1.16   0.40       2  0.44   0.40   1.16
0.60   1.96  1.60  0.36     1.36   0.60       2  0.04   0.60   1.36
0.80   2.44  1.80  0.64     1.64   0.80       2  0.44   0.80   1.64
1.00   3.00  2.00  1        2      1          2  1      1.00   2
Fitness (sum of each error column):
X+1:    4.40
X^2+1:  6.00
2:      9.48
X:      15.40
Best in Gen 0: X+1 (fitness 4.40, the smallest sum of absolute errors)
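The fitness values for generation 0 can be recomputed from the given data. The sketch below hard-codes the eleven data points from the "Given data" slide and writes the four individuals as Python lambdas (an illustrative shortcut; a genetic programming system would evaluate program trees instead).

```python
xs = [i / 5 for i in range(-5, 6)]  # -1.0, -0.8, ..., 1.0
ys = [1.00, 0.84, 0.76, 0.76, 0.84, 1.00, 1.24, 1.56, 1.96, 2.44, 3.00]

generation0 = {
    "x + 1": lambda x: x + 1,
    "x^2 + 1": lambda x: x * x + 1,
    "2": lambda x: 2,
    "x": lambda x: x,
}

def fitness(program):
    """Sum of absolute errors over the given data points (lower is better)."""
    return sum(abs(program(x) - y) for x, y in zip(xs, ys))

scores = {name: round(fitness(p), 2) for name, p in generation0.items()}
best = min(scores, key=scores.get)
```

The same function also confirms the termination criterion for x^2 + x + 1: its sum of absolute errors on this data is below 0.1.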
43
Mutation
Mutation: picking "2" as mutation point
[Figure: program trees before and after mutation]
Citation: part of the pictures used as examples are taken from www.genetic-programming.com/c2003lecture1modified.ppt
44
Crossover
Crossover: picking the "+" subtree and the leftmost "x" as crossover points
Citation: example taken from www.genetic-programming.com/c2003lecture1modified.ppt
45
Generation 1
[Figure: program trees of the Generation 1 individuals]
Citation: part of the examples is taken from www.genetic-programming.com/c2003lecture1modified.ppt
46
X      Y     X+1   |X+1-Y|  1  |1-Y|  X      |X-Y|  X^2+X+1  |X^2+X+1-Y|
-1.00  1.00  0     1        1  0      -1.00  2      1        0
-0.80  0.84  0.20  0.64     1  0.16   -0.80  1.64   0.84     0
-0.60  0.76  0.40  0.36     1  0.24   -0.60  1.36   0.76     0
-0.40  0.76  0.60  0.16     1  0.24   -0.40  1.16   0.76     0
-0.20  0.84  0.80  0.04     1  0.16   -0.20  1.04   0.84     0
0.00   1.00  1.00  0        1  0      0.00   1      1        0
0.20   1.24  1.20  0.04     1  0.24   0.20   1.04   1.24     0
0.40   1.56  1.40  0.16     1  0.56   0.40   1.16   1.56     0
0.60   1.96  1.60  0.36     1  0.96   0.60   1.36   1.96     0
0.80   2.44  1.80  0.64     1  1.44   0.80   1.64   2.44     0
1.00   3.00  2.00  1        1  2      1.00   2      3        0
Fitness (sum of each error column):
X+1:      4.40
1:        6.00
X:        15.40
X^2+X+1:  0.00
Found! X^2+X+1 reproduces the given data exactly (fitness 0.00).
47
GA Classification
Classify customers based on number of children
and salary
Parameter                     Number of children (NOC)                        Salary (S)
Domain                        0 .. 10                                         0 .. 500000
Syntax of atomic expression   NOC = x, NOC < x, NOC <= x, NOC > x, NOC >= x   S = x, S < x, S > x
Citation: data table is taken from Prof. Anita Wasilewska's previous years' course slides
48
GA Classification Rules
  • A classification rule is of the form (the rule is in predicate form;
    see course lectures):
  • IF formula THEN class = ci
  • The formula is the antecedent; the class assignment is the consequence.

49
Formula representation
  • Possible rule:
  • If (NOC = 2) AND (S > 80000) then GOOD (customer)

Formula (as a tree): AND( =(NOC, 2), >(S, 80000) )
Class: GOOD
Citation: the example is taken from Prof. Anita Wasilewska's previous years' course slides
50
Initial data table
Nr. Crt.  Number of children (NOC)  Salary (S)  Type of customer (C)
1         2                         > 80000     GOOD
2         1                         > 30000     GOOD
3         0                         = 50000     GOOD
4         > 2                       < 10000     BAD
5         10                        = 30000     BAD
6         5                         < 30000     BAD
51
Initial data (written as rules inferred from the
initial table)
  • Rule 1: If (NOC = 2) AND (S > 80000) then C = GOOD
  • Rule 2: If (NOC = 1) AND (S > 30000) then C = GOOD
  • Rule 3: If (NOC = 0) AND (S = 50000) then C = GOOD
  • Rule 4: If (NOC > 2) AND (S < 10000) then C = BAD
  • Rule 5: If (NOC = 10) AND (S = 30000) then C = BAD
  • Rule 6: If (NOC = 5) AND (S < 30000) then C = BAD

52
Generation 0
  • Population of 3 randomly created individuals:
  • If (NOC > 3) AND (S > 10000) then C = GOOD
  • If (NOC > 1) AND (S > 30000) then C = GOOD
  • If (NOC > 0) AND (S < 40000) then C = GOOD
  • We want to find a more general (if possible, the most general)
    characteristic description for class GOOD => assign predicted class
    GOOD to all individuals.

53
Generation 0
Individual 1: (NOC > 3) AND (S > 10000)    tree: AND( >(NOC, 3), >(S, 10000) )
Individual 2: (NOC > 1) AND (S > 30000)    tree: AND( >(NOC, 1), >(S, 30000) )
Individual 3: (NOC > 0) AND (S < 40000)    tree: AND( >(NOC, 0), <(S, 40000) )
54
Fitness function
  • For one rule (IF A THEN C):
  • CF (confidence factor): $CF = \frac{|A \cup C|}{|A|}$
  • $|A|$ = number of records that satisfy A
  • $|A \cup C|$ = number of records that satisfy A and are in the
    predicted class C

Citation: the confidence formula is taken from class slides http://www.cs.sunysb.edu/cse634/lecture_notes/07association.pdf
55
Fitness function Generation 0
  • Rule 1: If (NOC = 2) AND (S > 80000) then GOOD
  • Rule 2: If (NOC = 1) AND (S > 30000) then GOOD
  • Rule 3: If (NOC = 0) AND (S = 50000) then GOOD
  • Rule 4: If (NOC > 2) AND (S < 10000) then BAD
  • Rule 5: If (NOC = 10) AND (S = 30000) then BAD
  • Rule 6: If (NOC = 5) AND (S < 30000) then BAD
  • Fitness of Individual 1: If (NOC > 3) AND (S > 10000) then GOOD
  • |A| = 2 (Rules 5, 6), |A ∪ C| = 0, CF = 0 / 2 = 0
  • Fitness of Individual 2: If (NOC > 1) AND (S > 30000) then GOOD
  • |A| = 1 (Rule 1), |A ∪ C| = 1, CF = 1 / 1 = 1
  • Fitness of Individual 3: If (NOC > 0) AND (S < 40000) then GOOD
  • |A| = 4 (Rules 2, 4, 5, 6), |A ∪ C| = 1, CF = 1 / 4 = 0.25

Best in Gen 0: Individual 2 (CF = 1)
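The confidence factors above can be recomputed with a short script. One interpretation has to be assumed: a table row counts as satisfying an antecedent when its constraints can hold together with the antecedent's (for example, row 6 with S < 30000 is counted for S > 10000). This reading is inferred from the counts on the slide, and the integer domains 0 .. 10 and 0 .. 500000 are used to turn each constraint into a closed range. All names are illustrative.

```python
DOMAIN = {"NOC": (0, 10), "S": (0, 500000)}

def to_range(attribute, op, value):
    """Turn an atomic expression into a closed integer range (low, high)."""
    low, high = DOMAIN[attribute]
    if op == "=":
        return value, value
    if op == ">":
        return value + 1, high
    if op == "<":
        return low, value - 1
    raise ValueError(op)

def compatible(attribute, first, second):
    """True if the two constraints on `attribute` can hold at the same time."""
    low1, high1 = to_range(attribute, *first)
    low2, high2 = to_range(attribute, *second)
    return max(low1, low2) <= min(high1, high2)

TABLE = [  # the six rows of the initial data table
    ({"NOC": ("=", 2), "S": (">", 80000)}, "GOOD"),
    ({"NOC": ("=", 1), "S": (">", 30000)}, "GOOD"),
    ({"NOC": ("=", 0), "S": ("=", 50000)}, "GOOD"),
    ({"NOC": (">", 2), "S": ("<", 10000)}, "BAD"),
    ({"NOC": ("=", 10), "S": ("=", 30000)}, "BAD"),
    ({"NOC": ("=", 5), "S": ("<", 30000)}, "BAD"),
]

def confidence(antecedent, predicted="GOOD"):
    """CF = (rows matching A that are in the predicted class) / (rows matching A)."""
    matched = [label for row, label in TABLE
               if all(compatible(a, antecedent[a], row[a]) for a in antecedent)]
    return matched.count(predicted) / len(matched) if matched else 0.0

generation0 = [
    {"NOC": (">", 3), "S": (">", 10000)},
    {"NOC": (">", 1), "S": (">", 30000)},
    {"NOC": (">", 0), "S": ("<", 40000)},
]
fitness_values = [confidence(individual) for individual in generation0]
```

Under this reading the three individuals score 0, 1, and 0.25, so the second individual is the best of generation 0.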
56
Mutation
Mutation of the constant 40000 into 90000:
Before: (NOC > 0) AND (S < 40000)    tree: AND( >(NOC, 0), <(S, 40000) )
After:  (NOC > 0) AND (S < 90000)    tree: AND( >(NOC, 0), <(S, 90000) )
57
Crossover
Parents:
(NOC > 1) AND (S > 30000)    tree: AND( >(NOC, 1), >(S, 30000) )
(NOC > 0) AND (S < 40000)    tree: AND( >(NOC, 0), <(S, 40000) )
Crossover exchanges the salary subtrees of the two parents.
Children:
(NOC > 1) AND (S < 40000)    tree: AND( >(NOC, 1), <(S, 40000) )
(NOC > 0) AND (S > 30000)    tree: AND( >(NOC, 0), >(S, 30000) )
58
Generation 1
Individual 1: (NOC > 1) AND (S < 40000)    tree: AND( >(NOC, 1), <(S, 40000) )
Individual 2: (NOC > 0) AND (S > 30000)    tree: AND( >(NOC, 0), >(S, 30000) )
Individual 3: (NOC > 0) AND (S < 90000)    tree: AND( >(NOC, 0), <(S, 90000) )
59
Fitness function Generation 1
  • Rule 1: If (NOC = 2) AND (S > 80000) then GOOD
  • Rule 2: If (NOC = 1) AND (S > 30000) then GOOD
  • Rule 3: If (NOC = 0) AND (S = 50000) then GOOD
  • Rule 4: If (NOC > 2) AND (S < 10000) then BAD
  • Rule 5: If (NOC = 10) AND (S = 30000) then BAD
  • Rule 6: If (NOC = 5) AND (S < 30000) then BAD
  • Individual 1: If (NOC > 1) AND (S < 40000) then GOOD
  • |A| = 3 (Rules 4, 5, 6), |A ∪ C| = 0, CF = 0 / 3 = 0
  • Individual 2: If (NOC > 0) AND (S > 30000) then GOOD
  • |A| = 2 (Rules 1, 2), |A ∪ C| = 2, CF = 2 / 2 = 1
  • Individual 3: If (NOC > 0) AND (S < 90000) then GOOD
  • |A| = 5 (Rules 1, 2, 4, 5, 6), |A ∪ C| = 2, CF = 2 / 5 = 0.4

Best in Gen 1: Individual 2 (CF = 1)
60
GA Operators on Rules: Flockhart's paper approach

By Marcela Boboila
61
  • I. W. Flockhart and N. J. Radcliffe. GA-MINER: parallel data mining
    with hierarchical genetic algorithms, final report.
    EPCC-AIKMS-GAMINER-Report 1.0. University of Edinburgh, UK, 1995.
    http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/3487/httpzSzzSzwww.quadstone.co.ukzSzianzSzaikmszSzreport.pdf/flockhart95gaminer.pdf
  • I. W. Flockhart and N. J. Radcliffe, "A genetic algorithm-based
    approach to data mining," in The Second International Conference on
    Knowledge Discovery and Data Mining (KDD-96), (Portland, OR),
    p. 299-302, AAAI Press, Aug. 2-4 1996.
    http://citeseer.ist.psu.edu/cache/papers/cs/3487/httpzSzzSzwww.quadstone.co.ukzSzianzSzaikmszSzkdd96a.pdf/flockhart96genetic.pdf

62
From rules to subset descriptions
  • Step 1: We have the following rules, which describe part of the data
    table:
  • Rule 1: A1 => C
  • Rule 2: A2 => C
  • Rule n: An => C
  • Step 2: (A1 U A2 U ... U An) => C
  • Step 3: We look only at the antecedent to get the subset description:
  • (A1 U A2 U ... U An)

63
Part of the data table. An example
Nr. Crt.  Age       Hobby    Class C
1         20 .. 30  dancing  GOOD
2         25 .. 55  reading  GOOD
Rule 1: If Age = 20 .. 30 AND Hobby = dancing then GOOD    (A1 => C)
Rule 2: If Age = 25 .. 55 AND Hobby = reading then GOOD    (A2 => C)
Combined: A1 U A2 => C
64
From rules to subset descriptions. An example
  • Step 1: We have the rules
  • Rule 1: If Age = 20 .. 30 AND Hobby = dancing then GOOD
  • Rule 2: If Age = 25 .. 55 AND Hobby = reading then GOOD
  • Step 2: We combine the antecedent parts to form a single rule
    describing the subset of individuals in the same class:
  • If ((Age = 20 .. 30 AND Hobby = dancing) OR (Age = 25 .. 55 AND Hobby =
    reading)) then GOOD
  • Step 3: subset description = antecedent part:
  • (Age = 20 .. 30 AND Hobby = dancing) OR (Age = 25 .. 55 AND Hobby =
    reading)

65
Subset description
  • Tree: or( and(Age = 20 .. 30, Hobby = dancing),
    and(Age = 25 .. 55, Hobby = reading) )
  • Each "and" group is a Clause; each attribute constraint (e.g. Age = 20
    .. 30) is a Term.
66
Subset description
  • Chromosomes are represented as subset descriptions.
  • Subsets consist of disjunctions and conjunctions of attribute value or
    attribute range constraints:
  • Subset Description: Clause or Clause
  • Clause: Term and Term
  • Term: Attribute in Value Set, or Attribute in Range
  • E.g. (Age = 20 .. 30 and Hobby = dancing) or (Age = 25 .. 55 and Hobby
    = reading)

67
Crossover
  • Apply crossover at all levels, successively
  • Subset description crossover
  • Clause crossover (uniform or single-point)
  • Term crossover

68
Subset description crossover
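[Figure: subset description crossover]
Parent A: Clause A1 OR Clause A2 OR Clause A3 (unpartnered clause kept with probability rBias)
Parent B: Clause B1 OR Clause B2 OR Clause B4 (unpartnered clause kept with probability 1 - rBias)
Clause crossover is applied to the pairs (A1, B1) and (A2, B2).
Child C: Clause C1 OR Clause C2 OR Clause C3 OR Clause C4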
69
Subset description crossover
  • Consider the following 2 descriptors (chromosomes):
  • A: Clause A1 or Clause A2 or Clause A3
  • B: Clause B1 or Clause B2 or Clause B4
  • Apply clause crossover (uniform or single-point) to cross clause A1
    with B1, and A2 with B2.
  • For clauses with no partner:
  • Include A3 with probability rBias (first parent).
  • Include B4 with probability 1 - rBias (second parent).
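The steps above can be sketched as follows. Descriptors are modelled as slot-aligned lists with None marking a missing clause, and the clause crossover itself is passed in as a callback; both choices, and all names, are assumptions made for this illustration rather than GA-MINER's actual data structures.

```python
import random

def subset_description_crossover(first, second, r_bias, rng, cross_clauses):
    """Cross two slot-aligned descriptors; None marks a slot with no clause."""
    child = []
    for a, b in zip(first, second):
        if a is not None and b is not None:
            child.append(cross_clauses(a, b))       # partnered clauses
        elif a is not None and rng.random() < r_bias:
            child.append(a)                         # unpartnered, first parent
        elif b is not None and rng.random() < 1 - r_bias:
            child.append(b)                         # unpartnered, second parent
    return child

parent_a = ["A1", "A2", "A3", None]
parent_b = ["B1", "B2", None, "B4"]
stand_in = lambda a, b: a + "x" + b  # placeholder for a real clause crossover
rng = random.Random(0)
all_first = subset_description_crossover(parent_a, parent_b, 1.0, rng, stand_in)
all_second = subset_description_crossover(parent_a, parent_b, 0.0, rng, stand_in)
```

With rBias = 1 only the first parent's unpartnered clause (A3) survives, and with rBias = 0 only the second parent's (B4); values in between mix the two.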

70
Uniform clause crossover
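[Figure: uniform clause crossover]
Parent A: Age = 20 .. 30 AND Height = 1.5 .. 2.0 (unpartnered term kept with probability rBias)
Parent B: Age = 0 .. 25 AND Hobby = dancing (unpartnered term kept with probability 1 - rBias)
Term crossover is applied to the two Age terms.
Child: Hobby = dancing AND Age = .. AND Height = 1.5 .. 2.0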
71
Uniform clause crossover
  • Consider the clauses
  • A: Age = 20 .. 30 and Height = 1.5 .. 2.0
  • B: Hobby = dancing and Age = 0 .. 25
  • Align the clauses with respect to terms:
  • A:                 Age = 20 .. 30   Height = 1.5 .. 2.0
  • B: Hobby = dancing  Age = 0 .. 25
  • Apply term crossover between the Age terms.
  • Include:
  • the Height term (with no partner) in the child with probability rBias.
  • the Hobby term (with no partner) in the child with probability
    (1 - rBias).
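A sketch of uniform clause crossover, with each clause modelled as a dictionary from attribute name to term. The dictionary layout, the `cross_terms` callback, and the stand-in term crossover used in the usage lines are assumptions for illustration.

```python
import random

def uniform_clause_crossover(first, second, r_bias, rng, cross_terms):
    """Cross two clauses given as {attribute: term} dictionaries."""
    child = {}
    for attribute in sorted(set(first) | set(second)):
        if attribute in first and attribute in second:
            # Terms on the same attribute are partners: apply term crossover.
            term = cross_terms(first[attribute], second[attribute])
            if term is not None:
                child[attribute] = term
        elif attribute in first:
            if rng.random() < r_bias:
                child[attribute] = first[attribute]
        elif rng.random() < 1 - r_bias:
            child[attribute] = second[attribute]
    return child

clause_a = {"Age": (20, 30), "Height": (1.5, 2.0)}
clause_b = {"Hobby": {"dancing"}, "Age": (0, 25)}
keep_first = lambda a, b: a  # placeholder for a real term crossover
rng = random.Random(0)
from_first = uniform_clause_crossover(clause_a, clause_b, 1.0, rng, keep_first)
from_second = uniform_clause_crossover(clause_a, clause_b, 0.0, rng, keep_first)
```

Keying the clauses by attribute makes the "align clauses with respect to terms" step implicit: terms with the same key are partners.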

72
Single-point clause crossover
[Figure: single-point clause crossover]
Clause A: Age = 20 .. 30 AND Height = 1.5 .. 2.0
Clause B: Age = 0 .. 25 AND Hobby = dancing
Figure labels: Crossover Point; From first child; From second child
Result: Age = 0 .. 25
73
Single-point clause crossover
  • Consider the clauses
  • A: Age = 20 .. 30 and Height = 1.5 .. 2.0
  • B: Hobby = dancing and Age = 0 .. 25
  • Align the clauses with respect to terms:
  • A:                 Age = 20 .. 30   Height = 1.5 .. 2.0
  • B: Hobby = dancing  Age = 0 .. 25
  • E.g. consider a crossover point between Hobby and Age:
  • the child takes the terms to the left of the crossover point in clause
    A, and the terms to the right of the crossover point in clause B.
  • Child C: Age = 0 .. 25
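The single-point case can be sketched with the same dictionary model of a clause. The explicit attribute `order` (the aligned column order Hobby, Age, Height) and the integer `point` are assumptions used to express "between Hobby and Age".

```python
def single_point_clause_crossover(first, second, order, point):
    """Take terms left of `point` from the first clause, the rest from the second."""
    child = {}
    for position, attribute in enumerate(order):
        source = first if position < point else second
        if attribute in source:
            child[attribute] = source[attribute]
    return child

clause_a = {"Age": (20, 30), "Height": (1.5, 2.0)}
clause_b = {"Hobby": "dancing", "Age": (0, 25)}
order = ["Hobby", "Age", "Height"]  # aligned term positions
# point = 1 places the crossover point between Hobby and Age.
child = single_point_clause_crossover(clause_a, clause_b, order, point=1)
```

Clause A has no Hobby term to the left of the point, and clause B has no Height term to the right of it, so only B's Age term reaches the child.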

74
Term crossover value terms
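First parent (unique values kept with probability rBias): Hobby = dancing, singing
Second parent (unique values kept with probability 1 - rBias): Hobby = dancing, hiking
Child (possible values): Hobby = dancing, singing, hiking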
75
Term crossover range terms
First parent (each limit chosen with probability rBias): Age = 20 .. 30
Second parent (each limit chosen with probability 1 - rBias): Age = 0 .. 25
Child: Age = low limit .. high limit
76
Term crossover
  • Used to combine two terms concerning the same attribute.
  • Consider the clauses
  • A: Hobby = dancing, singing and Age = 20 .. 30
  • B: Hobby = hiking, dancing and Age = 0 .. 25
  • How to form the child:
  • Value terms
  • Include values common to both parents, e.g. dancing.
  • Include values unique to one parent with a probability, e.g. rBias for
    singing and 1 - rBias for hiking.
  • Range terms
  • Select the low and high limits with a probability:
  • Low limit for Age: rBias for value 20 and 1 - rBias for value 0.
  • High limit for Age: rBias for value 30 and 1 - rBias for value 25.
  • Later prune (get rid of) non-valid ranges.
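A sketch of the two term crossovers described above. Value terms are modelled as Python sets and range terms as (low, high) tuples; returning None for a non-valid range is one possible way to "prune" and is an assumption, as are the function names.

```python
import random

def cross_value_terms(first, second, r_bias, rng):
    """Keep common values; keep unique values with rBias or (1 - rBias)."""
    child = set(first & second)
    child |= {v for v in sorted(first - second) if rng.random() < r_bias}
    child |= {v for v in sorted(second - first) if rng.random() < 1 - r_bias}
    return child

def cross_range_terms(first, second, r_bias, rng):
    """Pick each limit from the first parent with probability rBias."""
    low = first[0] if rng.random() < r_bias else second[0]
    high = first[1] if rng.random() < r_bias else second[1]
    return (low, high) if low <= high else None  # None: non-valid range, pruned

rng = random.Random(0)
hobby_a, hobby_b = {"dancing", "singing"}, {"hiking", "dancing"}
hobby_child = cross_value_terms(hobby_a, hobby_b, 0.5, rng)
age_child = cross_range_terms((20, 30), (0, 25), 0.5, rng)
```

For the Age example every combination of limits (20 or 0 with 30 or 25) is a valid range, so pruning only matters when the parents' ranges do not overlap.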

77
Mutation
  • Apply mutation at all levels, successively
  • Subset description mutation
  • Clause mutation
  • Term mutation

78
Subset description mutation
[Figure: subset description mutation]
Start: Clause A1 OR Clause A2 OR Clause A3; clause mutation is applied to each clause.
Decision "Do add/delete clause?": yes with probability pCls.
Decision "Add or delete?": 50% each (equal probability).
Add: Clause A1 OR Clause A2 OR Clause A3 OR Clause A4
Delete: one clause is removed from Clause A1 OR Clause A2 OR Clause A3
79
Subset description mutation
  • Consider the following descriptor (chromosome):
  • A: Clause A1 or Clause A2 or Clause A3 or Clause A4
  • Steps:
  • 1. Apply clause mutation to each of the clauses A1, A2, A3 and A4.
  • 2. Decide with probability pCls whether or not to perform an add/delete
    clause operation.
  • 3. If add/delete has been decided, either add a new clause or delete an
    existing clause, with equal probability (50%):
  • deletion: pick a clause at random and delete it.
  • adding: generate a new clause at random (from random possible
    attributes with random values/ranges assigned).
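The three steps above can be sketched as one function. The `mutate_clause` and `random_clause` callbacks are hypothetical stand-ins for the clause-level operations, and the list-of-clauses representation is an assumption.

```python
import random

def mutate_description(clauses, p_cls, rng, mutate_clause, random_clause):
    """Mutate every clause, then maybe add or delete one clause."""
    mutated = [mutate_clause(clause) for clause in clauses]   # step 1
    if rng.random() < p_cls:                                  # step 2
        if rng.random() < 0.5:                                # step 3: add
            mutated.append(random_clause())
        elif mutated:                                         # step 3: delete
            mutated.pop(rng.randrange(len(mutated)))
    return mutated

rng = random.Random(0)
descriptor = ["A1", "A2", "A3", "A4"]
same_size = mutate_description(descriptor, 0.0, rng, str.lower, lambda: "new")
resized = mutate_description(descriptor, 1.0, rng, str.lower, lambda: "new")
```

Clause mutation has the same shape one level down: apply term mutation to each term, then add or delete a term with probability pTerm.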

80
Clause mutation
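[Figure: clause mutation]
Start: Hobby = dancing AND Age = 20 .. 30; term mutation is applied to Term Hobby and Term Age.
Decision "Do add/delete term?": yes with probability pTerm.
Decision "Add or delete?": 50% each (equal probability).
Add: Term Hobby AND Term Age AND Term X
Delete: one term is removed from Term Hobby AND Term Age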
81
Clause mutation
  • Consider the following clause:
  • Hobby = dancing and Age = 20 .. 30
  • Steps:
  • 1. Apply term mutation to each term.
  • 2. Decide with probability pTerm whether or not to perform an
    add/delete term operation.
  • 3. If add/delete has been decided, either add a new term or delete an
    existing term, with equal probability (50%):
  • deletion: pick a term at random and delete it.
  • adding: generate a new term at random.

82
Term mutation - Value
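[Figure: term mutation of a value term]
Start: Hobby = dancing
Decision "Do term mutation?": yes with probability rMutTerm.
Decision "Attribute or value/range?": attribute mutation with probability rAvr, value mutation with probability (1 - rAvr).
Attribute mutation result: Occupation = student
Value mutation result: Hobby = swimming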
83
Term mutation - Range
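[Figure: term mutation of a range term]
Start: Age = 10 .. 50
Decision "Do term mutation?": yes with probability rMutTerm.
Decision "Attribute or value/range?": attribute mutation with probability rAvr, range mutation with probability (1 - rAvr).
Attribute mutation result: Occupation = student
Range mutation result: Age = 3 .. 25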
84
Term mutation
  • First decide, with probability rMutTerm, whether or not to mutate this
    term.
  • If term mutation is decided, perform attribute mutation with
    probability rAvr, or value/range mutation with probability (1 - rAvr).
  • Consider the following term: Hobby = dancing
  • Attribute mutation: randomly choose another available attribute, e.g.
    occupation, and a random value for it, e.g. student. New term:
    Occupation = student
  • Value mutation: randomly choose another value for the current
    attribute, e.g. swimming. New term: Hobby = swimming
  • Consider the following term: Age = 10 .. 50
  • Range mutation: randomly choose another range for the current
    attribute, e.g. 3 .. 25. New term: Age = 3 .. 25
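Term mutation for value terms can be sketched as below. The `choices` table of attributes and their possible values is hypothetical, a term is modelled as an (attribute, value) pair, and the sketch assumes every attribute has at least two values to choose from; range terms would need a random range generator instead.

```python
import random

def mutate_term(term, r_mut_term, r_avr, rng, choices):
    """Mutate one (attribute, value) term using the two-stage decision."""
    attribute, value = term
    if rng.random() >= r_mut_term:
        return term                                   # no mutation this time
    if rng.random() < r_avr:
        # Attribute mutation: another attribute with a random value for it.
        others = sorted(a for a in choices if a != attribute)
        new_attribute = rng.choice(others)
        return new_attribute, rng.choice(choices[new_attribute])
    # Value mutation: another value for the current attribute.
    alternatives = [v for v in choices[attribute] if v != value]
    return attribute, rng.choice(alternatives)

choices = {  # hypothetical attribute domains
    "Hobby": ["dancing", "swimming", "reading"],
    "Occupation": ["student", "teacher"],
}
rng = random.Random(0)
term = ("Hobby", "dancing")
kept = mutate_term(term, 0.0, 0.5, rng, choices)
new_attribute_term = mutate_term(term, 1.0, 1.0, rng, choices)
new_value_term = mutate_term(term, 1.0, 0.0, rng, choices)
```

Setting rMutTerm to 0 leaves the term untouched, while rAvr = 1 or 0 forces attribute mutation or value mutation respectively, mirroring the two branches on the slide.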