Gene Expression Messy GAs - PowerPoint PPT Presentation

Provided by: illiga

1
Gene Expression Messy GAs
  • Kargupta, et al
  • Presented by
  • Abhishek Singh

2
Underpinnings
  • Extension of mGA
  • Black Box Optimization
  • Relevance of SEARCH
  • Importance of intra-cellular information flow for
    SEARCH
  • No explicit modeling
  • Precursor to Model Building GAs

3
Overview
  • SEARCH (some gory details)!
  • Natural Evolution as a SEARCH
  • Basic Gene Expression Messy GA
  • Updates on the GEMGA
  • Interspersed with Results and Achievements
  • Summary and Conclusions
  • Punctuated with Discussions and Debates
    (hopefully)

4
SEARCH overview
  • Black Box Search
  • Enumeration
  • Induction
  • Search Envisioned as Relation and Class
    Hierarchizing
  • Framework to formalize sample complexity,
    difficulty, etc
  • Relations, Class, and Samples

5
SEARCH contd.
  • Enumerative search exponential
  • Alternative - stochastic decision making based on
    sampling
  • Consequence Premise
  • Assumes inductive relationships
  • No relations enumeration!
  • Relations classify search domain
  • Classes contain optima

6
SEARCH Decomposition
  • Relation, R: set of ordered pairs
  • Class, C: instantiation of a relation
  • Sample, S: instance of a class
  • Relations imposed implicitly or explicitly
  • Representation
  • Operators
  • Heuristics
  • Direct modeling
  • Typical example for GAs follows
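
A minimal sketch of the relation/class/sample decomposition for bit strings. It assumes a relation can be modeled as a set of fixed bit positions (a schema partition), so that each class is one assignment of values to those positions; the helper name classes_of is hypothetical.

```python
from collections import defaultdict


def classes_of(relation, samples):
    """Partition samples into classes under a relation.

    relation: tuple of bit positions the relation looks at (an assumption
    made for this sketch); every other position is a don't-care, shown as '#'.
    """
    classes = defaultdict(list)
    for s in samples:
        label = "".join(s[i] if i in relation else "#" for i in range(len(s)))
        classes[label].append(s)
    return dict(classes)


# The eight 4-bit samples used in the example slide.
samples = ["1100", "1001", "1110", "1011", "0100", "0011", "0000", "0111"]

# A relation on the first bit only yields two classes, 0### and 1###.
for label, members in sorted(classes_of((0,), samples).items()):
    print(label, members)
```

A relation on more positions gives finer classes: classes_of((0, 1), samples) splits the same samples into four classes.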

7
Relations, Classes, and Samples
[Figure: eight 4-bit strings (1100, 1001, 1110, 1011, 0100, 0011, 0000, 0111) shown across three panels: relation space, class space, and sample space]

8
SEARCH components
  • Classification based on relations
  • Sampling
  • Evaluation, ordering, selection of better
    classes
  • Evaluation, ordering selection of better
    relations
  • Resolution

9
A little bit of theory!
  • $r_i$: the $i$-th relation
  • $\Psi_r$: set of all relations
  • $C_i$: set of classes created by $r_i$, with $\|C_i\| = N_i$
  • $P$: perturbation operator (dumb or smart)
  • $T$, $T_r$: class/relation comparison statistics
  • $M_i$: number of best classes kept from $C_i$ (depends on
    decision error probability)
  • $S_r$: the relations from $\Psi_r$ that are actually used
  • $C_i$, $r_i$ (sampled): classes/relations as ordered from samples

10
SEARCH challenges
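  • Based on $T$, classes are ordered and the best $M_i$
    are selected
  • Pruning of the class search (resolution)
  • The relation must classify the space such that the class
    containing the optimum, $C^*_i$, is among the $M_i$ best under $T$
  • The true ordering of $C_i$ and the ordering estimated from
    samples may not be the same
  • $C^*_i$ must also be among the $M_i$ best in the sampled ordering
  • Based on $T_r$, relations are ordered and selected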
  • Pruning of relations search based on past

11
SEARCH challenges contd.
  • Ordering dependent on sampling
  • SEARCH fails if
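  • The defining relation is such that, for the chosen $T$, the
    optimum-containing class $C^*_i$ is not among the $M_i$ best
  • Stochastic error leaves $C^*_i$ outside the $M_i$ best of the
    sampled ordering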

12
Quantifying the Challenges
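  • Relation selection success:
  • $\Pr(\mathrm{CRS} \mid r_i) \ge \Pr(r_k \le_{T_r} r_i)_{\min}^{\|\Psi_r\| - \|S_r\|}$
  • Class selection success:
  • $\Pr(\mathrm{CCS} \mid r_i) \ge \Pr(C_k \le_T C_i)_{\min}^{N_i - M_i}$
  • Overall success:
  • $\prod_{\forall r_i \in S_r} \Pr(\mathrm{CRS} \mid r_i)\,\Pr(\mathrm{CCS} \mid r_i)$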

13
Specializing the Equation
  • Specialize to a specific class comparison
    statistic and representation
  • Similar to earlier work on decision making
    (Goldberg, et al, 1992)
  • Order statistics used

14
Ordinal Class Selection
  • Probability of a correct binary decision:
  • $\Pr(F^T_{j,i} \le_a F^T_{k,i}) \ge 1 - 2^{nH(a)}\,(a - d)^{an}$
  • $d$: zone of indifference
  • $F$: CDF of the function values
  • $a$: quantile of the CDF
  • $n$: number of samples
  • $H$: binary entropy function
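
A small numeric sketch of the decision bound, assuming it reads 1 - 2^(n H(a)) (a - d)^(a n). The operator between a and d is not legible in the transcript, so the minus sign is an assumption; the function names are hypothetical.

```python
import math


def binary_entropy(a):
    """Binary entropy H(a) in bits, with H(0) = H(1) = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def decision_bound(n, a, d):
    """Assumed lower bound on the probability of a correct binary decision.

    n: number of samples, a: quantile, d: zone of indifference (d < a).
    The value can be negative, in which case the bound says nothing.
    """
    return 1.0 - 2.0 ** (n * binary_entropy(a)) * (a - d) ** (a * n)


# With a = 0.5 and a wide zone of indifference the bound tightens as n grows.
for n in (10, 20, 40):
    print(n, decision_bound(n, a=0.5, d=0.4))
```

Under this reading the bound is loose: for a = 0.5 it only becomes informative when d exceeds 0.25, since otherwise the entropy term outgrows (a - d)^(a n).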

15
Class Selection contd.
  • For correct class selection:
  • $\Pr(\mathrm{CCS} \mid r_i) \ge \left[1 - 2^{nH(a)}\,(a - d)^{an}\right]^{N_i - M_i}$
  • $d = \min_j \left[F(F^T_{*,i}) - F(F^T_{j,i})\right]$ over all classes $j$

16
Ordinal Relation Selection
  • Analysis same as for class selection except Tr
    used for comparison.
  • For one relation $r_i$:
  • $\Pr(\mathrm{CRS} \mid r_i) \ge \left[1 - 2^{n_r H(a_r)}\,(a_r - d_r)^{a_r n_r}\right]^{\|\Psi_r\| - \|S_r\|}$
  • Overall success probability bound

17
Overall Success
  • Combine the search for better classes and
    relations
  • $q$: overall success probability
  • $d$: bound on the minimum zone of indifference over all classes
    compared with the optimum-containing class
  • $N_{\max} = \max N_i \;\; \forall r_i \in S_r$
  • $M_{\min} = \min M_i \;\; \forall r_i \in S_r$

18
Sample Complexity
  • Total number of Function Evaluations
  • For non-relational, enumerative search, SC becomes the
    size of the search space
  • To bound SEARCH, $N_{\max}$ and $\|S_r\|$ need to be bounded
    by polynomials

19
Order-k delineable Problems
  • The order of a relation, $o(r_i)$, is defined as the log of
    the number of classes it defines
  • If $o(r_i)$ grows as $O(l)$, then $N_{\max}$ is exponential
  • For polynomial search:
  • Bound $o(r_i) \le k$
  • Bound $\|S_r\| = O(\mathrm{poly}(l))$
  • This defines a generalized order-k delineable
    problem
  • A polynomial bound means simple relations can
    capture solutions!
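
A quick arithmetic sketch of why bounding the order matters. It assumes relations are schema partitions over an alphabet of a given size, so an order-k relation defines alphabet_size^k classes and there are C(l, k) relations of order exactly k; the function names are hypothetical.

```python
import math


def classes_per_relation(alphabet_size, order):
    """Number of classes defined by one relation of the given order."""
    return alphabet_size ** order


def relations_of_order(length, order):
    """Number of ways to choose which positions an order-k relation fixes."""
    return math.comb(length, order)


def enumeration_size(alphabet_size, length):
    """Size of the full search space, the cost of plain enumeration."""
    return alphabet_size ** length


l, k = 30, 3
print(classes_per_relation(2, k))  # 8 classes per order-3 relation
print(relations_of_order(l, k))    # 4060 relations, polynomial in l
print(enumeration_size(2, l))      # 1073741824 points to enumerate
```

With k fixed, both the classes per relation and the number of relations stay polynomial in l, while enumeration grows exponentially.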

20
Milestone 1
  • Discussed the basic motivation behind GEMGA
  • SEARCH challenges (déjà vu)
  • Need for Relational search
  • Enumeration bounds any BBS
  • Bound on Relation set cardinality for polynomial
    search
  • Next step SEARCH implementation as GEMGA

21
Food for thought
  • Linkage specific spatial operator-defined
    relation/problem
  • Precursors to model builders
  • Work relates to earlier and contemporary work on
    problem difficulty and decision making
  • Break

22
Recap
  • Introduced SEARCH (in some detail)
  • SEARCH challenges (intuitive and quantified)
  • Order statistics for bounds on SEARCH complexity
  • Without hierarchical search BBS is too expensive
  • Restrict sampling and use Intelligent Guessing

23
Nature Questions and answers
  • Natural evolution evolved fitter (?) organisms
  • About $3 \times 10^9$ base pairs in humans implies a HUGE
    search space
  • Without a priori knowledge evolution is a BBS
  • Not enough time for enumerative search

24
Some Questions
  • Problem of adequate time
  • Shapiro: junkyard tornado!
  • Holland: schema processing
  • Goldberg: problem decomposition
  • Kauffman: gene expression
  • Problem of selection space
  • Problem of recombination
  • Recombination good if we know what to combine
  • Natural recombination different from GAs

25
Evolution as Information Flow
  • Extracellular: storage, exploration and
    transmission within generations
  • Intracellular: gene expression
  • Most GAs model extracellular flow
  • What about intracellular mechanisms (introns,
    diploidy, gene expression, etc.)?

26
Expression Mechanisms
  • Transcription: DNA → mRNA
  • Translation: mRNA → proteins
  • Protein folding
  • Protein → phenotype
  • Let's see these in some detail

27
Transcription
  • Initiated and terminated by specific sequences
    in the DNA
  • RNA polymerase transcribes the portion in between
    (AGCT → AGCU)
  • Regulatory proteins bind to DNA portions and
    control transcription
  • Gene activator
  • Gene repressor

28
Translation
  • mRNA is template for protein formation
  • 61 base triplets (codons) correspond to 20 amino acids
  • The remaining 3 triplets signal termination (stop codons)
  • Many to one mapping
  • Regulated by control systems of repressors,
    promoters, and operators
  • Specialization within cells

29
Protein folding
  • 3-D structure determines protein function
    (phenotype)
  • This defines fitness space
  • Phenotype-genotype correspondence

30
Intracellular Information Flow

31
The SEARCH perspective
  • Sample space: DNA population
  • Class space: amino acid sequences
  • mRNA corresponds to DNA schemas
  • mRNA translates to amino acid sequences
  • These define equivalence classes
  • Relation space: regulatory mechanisms
  • The transcription process defines classes
  • Transcription is controlled by extra- and intra-DNA
    components (feedback loop)

32
SEARCH answers questions (?)
  • Time
  • Nature searched for relations too
  • Selection
  • Feed back loops apportion selection pressure
  • Recombination
  • Resolution of classes
  • Representation (diploidy, introns etc)

33
Doubts, concerns
  • Concept of natural optimality
  • Evolution as adaptation not SEARCH
  • Temporal and Spatial niches

34
Messy GAs
  • Separate relation/class space from sample space
  • Deterministically processed order k relations and
    classes

35
Messy GAs contd.
  • Relation still not separate from class
  • Sample space consisted of one template
  • No implicit parallelism thus expensive
  • fmGA tackled this issue (still problems of cross
    competition between classes from different
    relations)

36
GEMGA overview
  • Messy GA continued
  • SC is stated for an order-k problem of length $l$ over
    alphabet $\Lambda$
  • Separates relation, class, and sample space
  • Explicit relation learning
  • Many changes and updates

37
Representation
  • Messy schemes maintained (locus, value)
  • Gene has additional variables
  • Weights, linkage lists, capacity
  • Start with simple GEMGA with only weights
    (initialized to 1)
  • No under/over-specification

38
Population Sizing
  • $C$ is the signal/noise coefficient
  • At least one instance of the optimal order-k class in
    a population of $\Lambda^k$
  • Similar structure to previous equations
  • mGA was $O((\Lambda^k)\, l)$
  • Order of relations processed:
  • $2^l$ relations, but only $O(k)$ processed

39
Basic Operators
  • Transcription
  • Selection
  • Class Selection
  • String Selection
  • Recombination
  • Each in detail

40
Transcription
  • Detects appropriate order-k relations
  • Relations need to be compared
  • GEMGA processes relations in distributed manner
  • Every chromosome evaluates its genes for instance
    of good class
  • Quality of good classes determine quality of
    relation

41
Transcription continued
  • Flip each gene
  • Note the change in fitness
  • Fitness increases:
  • Gene is not part of a good class
  • Set weight to zero
  • Fitness decreases:
  • Gene may be part of a good class
  • Set weight to $\Delta$fitness
  • Repeat for C < ?
42
Selection
  • Class Selection
  • Grow better classes
  • Gene with higher weight overwrites one with lower
    weight on other string
  • String Selection
  • Binary Tournament Selection
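
The two selection steps can be sketched as below. This assumes that a gene's weight travels with it when it overwrites the gene on the other string, and that ties leave both strings unchanged; neither detail is stated on the slide, and the function names are hypothetical.

```python
import random


def class_selection(a, wa, b, wb):
    """Gene-wise competition between two strings, modified in place.

    At each locus the gene with the higher weight overwrites the gene
    (and weight) at the same locus of the other string.
    """
    for i in range(len(a)):
        if wa[i] > wb[i]:
            b[i], wb[i] = a[i], wa[i]
        elif wb[i] > wa[i]:
            a[i], wa[i] = b[i], wb[i]


def binary_tournament(population, fitness, rng):
    """Pick two distinct strings at random and return the fitter one."""
    x, y = rng.sample(population, 2)
    return x if fitness(x) >= fitness(y) else y


a, wa = [1, 0, 1], [2, 0, 1]
b, wb = [0, 1, 0], [1, 3, 1]
class_selection(a, wa, b, wb)
print(a, b)
print(binary_tournament([[0, 0], [1, 1]], fitness=sum, rng=random.Random(0)))
```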

43
Recombination
  • Randomly pick two strings
  • Consider all genes for swapping with some
    probability
  • If weight of gene is greater than corresponding
    gene then swap it
  • What does this do?
  • Preserve tight linkage
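
A sketch of the weight-guided swap, assuming the comparison is one-directional (a gene of the first string is swapped when its weight exceeds that of the corresponding gene in the second) and that weights are exchanged along with the genes. The parameter p_swap and the function name are hypothetical.

```python
import random


def recombine(a, wa, b, wb, p_swap, rng):
    """Swap genes between two strings in place, locus by locus.

    Each locus is considered with probability p_swap; the swap happens
    only if the first string's gene has the higher weight.
    """
    for i in range(len(a)):
        if rng.random() < p_swap and wa[i] > wb[i]:
            a[i], b[i] = b[i], a[i]
            wa[i], wb[i] = wb[i], wa[i]


a, wa = [1, 1, 0], [3, 0, 1]
b, wb = [0, 0, 1], [1, 2, 1]
recombine(a, wa, b, wb, p_swap=1.0, rng=random.Random(1))
print(a, b)
```

Because genes with zero weight never win the comparison, positions that transcription marked as irrelevant are not moved by this operator.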

44
The Algorithm
  • Primordial Phase
  • l generations (all genes considered)
  • Juxtapositional Phase
  • Selection and recombination applied
  • Every chromosome converges to optimal class when
  • Substituting n
  • Overall SC: $O(\Lambda^k (l + k))$
  • Solution quality?

45
Results
  • Tested over uniform l bit trap functions of
    length l
  • Order-l delineability needed
  • Function evaluations grow linearly with l
  • Population size is constant for constant l
  • Scaling and Noise added

46
Results contd.

47
Milestone 2
  • Natural BBS discussed
  • SEARCH implemented as GEMGA to solve order-k
    delineable problems
  • Polynomial time achieved
  • Issues solved
  • Relation (linkage) space searched
  • Simplistic relation search mechanism
  • Scope for improvements
  • Similarity to hybrid GAs (local search)

48
GEMGA revisions
  • Need for more explicit relation learning
  • Linkage set added to gene representation
  • Transcription extended
  • And class selection linked with linkage

49
Linkage Set
  • For each gene the set stores related genes in
    chromosome
  • If genes 1,5,10,15 are related then linkage set
    for gene 1 is 5,10,15 and so on
  • Linkage space over all genes defines relation
    space
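
The linkage-set example can be sketched as a mapping from each gene to the other genes of its group. Storing the sets in a dictionary keyed by gene index is an assumption of this sketch, and the function name is hypothetical.

```python
def linkage_sets(groups, length):
    """Build one linkage set per gene from groups of related genes."""
    sets = {gene: set() for gene in range(length)}
    for group in groups:
        members = set(group)
        for gene in group:
            # A gene's linkage set holds every other gene of its group.
            sets[gene] |= members - {gene}
    return sets


sets = linkage_sets([[1, 5, 10, 15]], length=16)
print(sets[1])   # the genes related to gene 1
print(sets[5])   # the genes related to gene 5
print(sets[0])   # gene 0 is unrelated to the others
```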

50
Transcription II
  • In addition to previous transcription operator
  • Tries to identify the exact relations (construct
    the linkage set)
  • Not very clear how!
  • Transcription II applied for $l^2 - l$ generations

51
Transcription II contd.
  • Pick two points (with weight > 0) on the chromosome
  • Keep the original fitness value
  • Perturb both genes
  • If the change in fitness ≠ the change due to
    perturbing a single gene, then the genes are
    related
  • Put them in the linkage set
  • Change weights to 1
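
One way to read the pairwise test is as a non-additivity check: two genes are treated as related when the fitness change from perturbing both differs from the sum of the changes from perturbing each alone. That reading is an assumption, since the slide only compares against the single-gene change; the function name and the tolerance tol are hypothetical.

```python
def are_linked(chromosome, i, j, fitness, tol=1e-9):
    """Return True if flipping genes i and j together is non-additive."""

    def flip(bits, *positions):
        out = list(bits)
        for p in positions:
            out[p] = 1 - out[p]
        return out

    base = fitness(chromosome)  # original fitness is kept for comparison
    d_i = fitness(flip(chromosome, i)) - base
    d_j = fitness(flip(chromosome, j)) - base
    d_ij = fitness(flip(chromosome, i, j)) - base
    return abs(d_ij - (d_i + d_j)) > tol


def toy_fitness(s):
    """Genes 0 and 1 interact (product term); gene 2 contributes alone."""
    return s[0] * s[1] + s[2]


print(are_linked([1, 1, 1], 0, 1, toy_fitness))  # True: genes interact
print(are_linked([1, 1, 1], 0, 2, toy_fitness))  # False: effects add up
```

Checking every ordered pair of positions this way is what drives the $l^2 - l$ term in the cost of this phase.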

52
Class Selection II
  • Here cardinality of linkage set decides gene
    growth
  • Same as previous except genes with high
    cardinality overwrite lesser genes
  • Linkage sets of genes with greater cardinality
    are copied with genes
  • Weight becomes a criterion for gene consideration

53
Recombination
  • Same as before except cardinality of linkage set
    is used
  • Genes with larger linkage sets are chosen and
    exchanged
  • This preserves linkage better

54
Algorithm Complexity
  • Transcription I:
  • $l$ generations
  • Transcription II:
  • $l^2 - l$ generations in worst case
  • Juxtapositional phase:
  • Same as before, $O(k)$
  • Population: $O(\Lambda^k)$
  • Total SC: $O(\Lambda^k (l^2 + k))$
  • Worse than before, but better linkage
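
The totals follow from adding the generation counts of the phases and multiplying by the population size. The sketch below assumes exactly that, ignores constant factors, and takes k generations for the juxtapositional phase; the function and variant names are hypothetical.

```python
def generations(l, k, variant):
    """Generation count for the basic and the revised algorithm."""
    if variant == "basic":
        return l + k                  # transcription + juxtapositional
    if variant == "revised":
        return l + (l * l - l) + k    # adds Transcription II: l^2 - l
    raise ValueError("unknown variant: " + variant)


def sample_complexity(alphabet_size, l, k, variant):
    """Order-of-growth estimate: population size times generations."""
    return alphabet_size ** k * generations(l, k, variant)


for l in (10, 20, 40):
    print(l,
          sample_complexity(2, l, 3, "basic"),
          sample_complexity(2, l, 3, "revised"))
```

The revised count simplifies to l^2 + k, so doubling l roughly quadruples the evaluations, against roughly doubling for the basic version.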

55
Final GEMGA
  • Linear Sample Complexity achieved
  • Change in representation (again)
  • Transcription II dropped but relation learning
    maintained
  • Recombination and Expression combined

56
Representation
  • Chromosome: $\{\mathrm{gene}_i, \mathrm{link}_i\} \;\; \forall i \le l$
  • Gene has locus, value, capacity (for improvement)
  • Link has:
  • Linkage set: set of related genes
  • Weights: number of occurrences of a particular linkage in
    the population
  • Goodness (0,1): how good the linkage
    is w.r.t. fitness contribution
  • Trials: number of times the linkage has been
    tried
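
The representation can be sketched with plain data classes. Field names follow the slide; the types, the default values, and the helper make_chromosome are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus: int
    value: int
    capacity: int = 0        # room for improvement, set by transcription


@dataclass
class Link:
    linkage_set: frozenset = frozenset()  # loci of related genes
    weight: int = 0          # occurrences of this linkage in the population
    goodness: float = 0.0    # fitness-contribution score in (0, 1)
    trials: int = 0          # times this linkage has been tried


@dataclass
class Chromosome:
    genes: list
    links: list


def make_chromosome(bits):
    """Build a chromosome with one gene and one empty link per locus."""
    genes = [Gene(locus=i, value=b) for i, b in enumerate(bits)]
    links = [Link() for _ in bits]
    return Chromosome(genes=genes, links=links)


c = make_chromosome([1, 0, 1])
print(c.genes[1], c.links[1])
```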

57
Transcription
  • Same as before except that capacity is set to 1
    if fitness after perturbation increases, else 0
  • All genes with capacity 0 are put in the initial
    linkage set
  • Continued for l generations

58
Recombination Expression
  • Two phases
  • Pre-recombination Expression
  • GEMGA Recombination
  • Pre-recombination determines related gene
    clusters
  • GEMGA recombination ensures growth of proper
    classes and relations

59
Pre-recombination
  • Applied several times during first generation
  • Pair of chromosomes selected
  • Of those in initial linkage set (ILS) genes with
    same values and capacities are extracted
  • If this set is present in ILS then weight of
    linkage is increased by one
  • If not then this set is added to ILS
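
A rough sketch of one pre-recombination step. It assumes each chromosome is a mapping from locus to a (value, capacity) pair, that the candidate loci are those already in the initial linkage set, and that a newly added set starts with weight 1; none of these details is given on the slide, and all names are hypothetical.

```python
def pre_recombine(genes_a, genes_b, ils_loci, ils):
    """Update the linkage-set store from one pair of chromosomes.

    genes_a, genes_b: dict locus -> (value, capacity)
    ils_loci: loci currently in the initial linkage set
    ils: dict frozenset_of_loci -> weight, updated in place
    """
    # Extract the loci whose value and capacity agree in both chromosomes.
    common = frozenset(i for i in ils_loci if genes_a[i] == genes_b[i])
    if common:
        # Known set: weight goes up by one. New set: added with weight 1.
        ils[common] = ils.get(common, 0) + 1
    return common


a = {0: (1, 0), 1: (0, 0), 2: (1, 1)}
b = {0: (1, 0), 1: (1, 0), 2: (1, 1)}
store = {}
pre_recombine(a, b, ils_loci=[0, 1], ils=store)
pre_recombine(a, b, ils_loci=[0, 1], ils=store)
print(store)
```

No fitness function appears anywhere in this step, which matches the claim that pre-recombination needs no fitness evaluations.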

60
Pre-recombination contd.
  • Gives an $l \times l$ conditional probability matrix
  • $M_{i,j} = \Pr(\text{genes } i \text{ and } j \text{ together})$
  • Final linkage set constructed:
  • $\max_j M_{i,j}$ over all $j$ is calculated for each $i$
  • All genes with $M_{i,j}$ within $\epsilon$ of the max are
    included in the linkage set for $i$
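
The thresholding step can be sketched as below, assuming the matrix is a list of rows, that the diagonal entry is ignored, and that "within epsilon" means the gap to the row maximum is at most eps. The function name is hypothetical.

```python
def final_linkage_sets(M, eps):
    """For each gene i, keep the genes j whose M[i][j] is near the row max."""
    n = len(M)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best = max(M[i][j] for j in others)
        result.append({j for j in others if best - M[i][j] <= eps})
    return result


# Toy matrix: genes 0, 1, 2 co-occur often; gene 3 rarely joins them.
M = [
    [0.00, 0.90, 0.85, 0.10],
    [0.90, 0.00, 0.88, 0.20],
    [0.85, 0.88, 0.00, 0.15],
    [0.10, 0.20, 0.15, 0.00],
]
print(final_linkage_sets(M, eps=0.1)[:3])
```

A smaller eps gives tighter linkage sets; a larger one admits weakly associated genes.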

61
GEMGA recombination
  • An element from the linkage set of one chromosome is chosen
    for swapping, based on weight and goodness
  • If the goodness values of the linkage sets it disrupts in the
    other chromosome are lower than its own, then SWAP
    and adjust the linkage sets
  • Goodness is set by the change in fitness due to
    recombination

62
GEMGA recombination contd.
  • Of the two original chromosomes and the two recombined
    chromosomes, two are selected based on goodness
    and fitness
  • Apply iteratively over all pairs
  • No fitness evaluations are needed if the fitness and
    disrupted fitness are stored

63
GEMGA analysis
  • Transcription applied for l generations
  • In Pre-recombination no fitness evaluations
  • Population: $O(\Lambda^k)$
  • SC: $O(\Lambda^k\, l)$
  • Linear growth of fitness evaluations and
    relational learning

64
Milestone 3
  • GEMGA designed with strong linkage learning and
    linear time
  • More complex relation building
  • Tested on deceptive multi-modal functions to
    validate conclusions
  • Could it be tested for tougher relations

65
Musings
  • LLGA solved linkages sequentially, GEMGA solves
    them in parallel
  • Multi-Objective optimization
  • Niche specific SEARCH?
  • Linkage is a specific relation
  • Test on problems with relational complexity
    beyond deception
  • Similarity to natural gene expression?
  • Memory Drawbacks

66
Summary and Conclusion
  • Need for relation based search
  • GEMGA spans a class of methods that model
    relationships within individuals
  • GEMGA shown to solve difficult problems
    efficiently
  • Walsh analysis on GEMGA (Kargupta & Park, 1999)
  • Used on G.P. (Neill & Ryan, 2000)