Analysis of Alternative Caching Methods for jFuzz

Slides: 26
Provided by: peopleC


1
Analysis of Alternative Caching Methods for jFuzz
  • David Harvison
  • Adam Kiezun

2
Summary
  • Problem: jFuzz makes many redundant calls to the
    constraint solver.
  • Approach: measure the average hit rates of
    different caching strategies.
  • Results: global caching should reduce the number
    of calls to the constraint solver.

3
NASA Java PathFinder
  • Dynamic analysis framework for Java, implemented
    as a JVM
  • Features: backtracking, executing all thread
    interleavings, executing a program on all
    possible inputs, and assigning attributes to
    variables

4
jFuzz Architecture
  • Runs JPF many times on the subject program and
    input files.
  • Each run collects the Path Condition (PC);
    negates each constraint, reduces, and solves;
    and uses the new PCs to generate new input files.
  • Keeps track of inputs which caused exceptions to
    be thrown.
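
The "negates each constraint" step can be sketched as follows. This is a minimal illustration, not jFuzz's actual code: it assumes a PC is an ordered list of constraints and that each new PC keeps the constraints before the negated one (a common scheme in concolic testing). The names `Constraint` and `negated_pcs` are hypothetical, and reduction and solving are left out.

```python
from typing import List, NamedTuple


class Constraint(NamedTuple):
    """Hypothetical constraint: a textual expression plus a negation flag."""
    expr: str
    negated: bool = False

    def negate(self) -> "Constraint":
        return Constraint(self.expr, not self.negated)


def negated_pcs(pc: List[Constraint]) -> List[List[Constraint]]:
    """For each position i, keep pc[:i] and negate pc[i] (assumed scheme)."""
    return [pc[:i] + [pc[i].negate()] for i in range(len(pc))]


# Illustrative constraints only; they are not taken from a real run.
pc = [Constraint("a < 3"), Constraint("b > 6"), Constraint("a = 2")]
new_pcs = negated_pcs(pc)
```

Each element of `new_pcs` would then be reduced and handed to the solver (or looked up in a cache first) to produce a new input file.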

5
jFuzz Architecture
[Diagram: jFuzz architecture. Labels: Subject and Input, Subject and Original Input, jFuzz, JPF, PC, Negated PC, Cache, Solver, New Input, Inputs which cause crashes]

6
Levels of Caching
  • Local caching: each run of JPF has its own cache.
  • Global caching: one persistent cache is shared
    across all runs of JPF.
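
The difference between the two levels can be sketched by replaying recorded solver queries against either a fresh cache per run or one shared cache. This is an illustrative sketch only: `CountingCache` and `replay` are hypothetical names, queries are modeled as plain strings, and the solver is replaced by a placeholder.

```python
from typing import Callable, Dict, Hashable, List


class CountingCache:
    """Dictionary-backed cache that counts hits."""

    def __init__(self) -> None:
        self.store: Dict[Hashable, object] = {}
        self.hits = 0

    def lookup(self, key: Hashable, solve: Callable[[], object]) -> object:
        if key in self.store:
            self.hits += 1
        else:
            self.store[key] = solve()  # call the solver only on a miss
        return self.store[key]


def replay(runs: List[List[str]], global_cache: bool) -> float:
    """Replay solver queries (one list per JPF run); return the overall hit rate."""
    shared = CountingCache()
    hits = total = 0
    for queries in runs:
        # Local caching starts every run with an empty cache.
        cache = shared if global_cache else CountingCache()
        start = cache.hits
        for q in queries:
            cache.lookup(q, lambda: "solution")  # placeholder solver result
        hits += cache.hits - start
        total += len(queries)
    return hits / total if total else 0.0
```

With `runs = [["p", "q"], ["p", "q", "r"]]`, the local variant never hits because no query repeats within a run, while the global variant hits on the two queries already seen in the first run.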

7
Hash Properties
  • Sound: two objects have the same hash only if
    they are interchangeable.
  • Sound and complete: two objects have the same
    hash if and only if they are interchangeable.

8
Ideal Cache
  • Path Condition 1:
    1: x 3
  • Path Condition 2:
    1: y < 4
    2: 2 y > 6
  • These two PCs are equivalent.
  • Calculating this would be too much work for large
    PCs.
  • Hash functions need to be fast.

9
Caching Trade-offs
  • Hit rate: the percentage of the time that the data
    being asked for is in the cache.
  • Speed of hashing: inversely related to the hit
    rate.

10
Types of Caching
  • Identity Hash: every PC has a unique value. Hit
    only if the exact PC is seen again.
  • Name Dependent Hash: unique value for structurally
    different PCs. This includes variable names.
  • Name Independent Hash: same as name dependent,
    except variable names are factored out.

12
Motivation
  • Path Condition 1:
    1: a b < 10
    2: b > 6
    3: c < 15
    4: a < 3
    5: c d > 7
    6: e != 1
    7: c e 5
    8: a = 2
  • Path Condition 2:
    1: x y < 10
    2: z < 15
    3: y > 6
    4: x < 3
    5: z w > 7
    6: w != 1
    7: x = 2

13
Motivation
  • Path Condition 1:
    1: a b < 10
    2: b > 6
    3: c < 15
    4: a < 3
    5: c d > 7
    6: e != 1
    7: c e 5
    8: a != 2
  • Path Condition 2:
    1: x y < 10
    2: z < 15
    3: y > 6
    4: x < 3
    5: z w > 7
    6: w != 1
    7: x != 2

14
Motivation
  • Path Condition 1:
    1: a b < 10
    2: b > 6
    4: a < 3
    8: a != 2
  • Path Condition 2:
    1: x y < 10
    3: y > 6
    4: x < 3
    7: x != 2
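
Comparing this slide with the previous one suggests that reduction keeps only the conjuncts connected to the negated one through shared variables. The following sketch works under that assumption; it is not taken from jFuzz. Conjuncts are modeled only by the set of variables they mention, and `reduce_pc` is a hypothetical name.

```python
from typing import List, Set


def reduce_pc(var_sets: List[Set[str]], negated_index: int) -> List[int]:
    """Return indices of conjuncts transitively sharing variables with the negated one."""
    relevant = set(var_sets[negated_index])
    keep = {negated_index}
    changed = True
    while changed:  # iterate until no further conjunct becomes relevant
        changed = False
        for i, variables in enumerate(var_sets):
            if i not in keep and variables & relevant:
                keep.add(i)
                relevant |= variables
                changed = True
    return sorted(keep)


# Variables of conjuncts 1-8 of Path Condition 1 (operators do not matter here).
pc1_vars = [{"a", "b"}, {"b"}, {"c"}, {"a"}, {"c", "d"}, {"e"}, {"c", "e"}, {"a"}]
kept = [i + 1 for i in reduce_pc(pc1_vars, negated_index=7)]  # 1-based numbering
```

For Path Condition 1 with conjunct 8 negated, `kept` is `[1, 2, 4, 8]`, which matches the conjuncts that remain on this slide.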

15
Motivation
  • Path Condition 1:
    1: a b < 10
    2: b > 6
    3: a < 3
    4: a != 2
  • Path Condition 2:
    1: x y < 10
    2: y > 6
    3: x < 3
    4: x != 2
  • Name dependent caching will pass both of these to
    the solver.
  • Name independent caching will recognize these are
    the same.

16
Removing Name Dependence
For each conjunct in the PC:
  • Locate the variables.
  • If a variable has been seen before, use the
    previously assigned name.
  • Otherwise, replace the variable name with a name
    that will be consistent between runs.
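
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the jFuzz implementation: conjuncts are modeled as strings, variables are assumed to match a simple identifier pattern, the `+` operator in the sample data is assumed for illustration, and `normalize` is a hypothetical name.

```python
import re
from typing import Dict, List

IDENT = re.compile(r"[A-Za-z_]\w*")  # assumed variable syntax


def normalize(pc: List[str]) -> List[str]:
    """Rename variables to var1, var2, ... in order of first appearance."""
    names: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        old = match.group(0)
        if old not in names:  # first sighting: assign the next canonical name
            names[old] = f"var{len(names) + 1}"
        return names[old]  # seen before: reuse the earlier name

    return [IDENT.sub(rename, conjunct) for conjunct in pc]


# Sample PCs; the "+" operator is an assumption made for this example.
pc1 = ["a + b < 10", "b > 6", "a < 3", "a != 2"]
pc2 = ["x + y < 10", "y > 6", "x < 3", "x != 2"]
```

Here `normalize(pc1)` and `normalize(pc2)` produce the same list, so a cache keyed on the normalized form would treat the two PCs as one entry. Note that this sketch is sensitive to the order of conjuncts and operands, so it only detects PCs that differ in variable names.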

17
Removing Name Dependence
  • Path Condition 1:
    1: a b < 10
    2: b > 6
    3: a < 3
    4: a != 2
  • Path Condition 2:
    1: x y < 10
    2: y > 6
    3: x < 3
    4: x != 2

18
Removing Name Dependence
  • Path Condition 1:
    1: var1 var2 < 10
    2: var2 > 6
    3: var1 < 3
    4: var1 != 2
  • Path Condition 2:
    1: var1 var2 < 10
    2: var2 > 6
    3: var1 < 3
    4: var1 != 2
  • The PCs are now name independent.
  • This can reduce the number of times the solver is
    called.

19
Case Study
  • Subject: Sat4J, a SAT solver written in Java. It
    takes inputs in DIMACS files and is 10 kloc.
  • Goals: compare global vs. local caching, and
    compare name dependent vs. name independent
    caching.

test1.dimacs:
    c test 3 single clauses
    c and 2 binary clauses
    p cnf 4 5
    1 0
    2 0
    3 0
    -2 4 0
    -3 4 0
PC size: 250 constraints
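
For readers unfamiliar with the input format, the sample file can be read with a few lines of code. This is a minimal sketch of a DIMACS CNF reader, not Sat4J's parser; the line breaks in `TEST1` are reconstructed from the format's conventions (comment lines start with `c`, the problem line starts with `p`, and each clause ends with `0`), and `parse_dimacs` is a hypothetical name.

```python
from typing import List, Tuple

TEST1 = """\
c test 3 single clauses
c and 2 binary clauses
p cnf 4 5
1 0
2 0
3 0
-2 4 0
-3 4 0
"""


def parse_dimacs(text: str) -> Tuple[int, int, List[List[int]]]:
    """Parse DIMACS CNF text into (num_vars, num_clauses, clauses)."""
    num_vars = num_clauses = 0
    clauses: List[List[int]] = []
    current: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # blank line or comment
        if line.startswith("p"):
            _, _fmt, nv, nc = line.split()  # e.g. "p cnf 4 5"
            num_vars, num_clauses = int(nv), int(nc)
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:  # 0 terminates the current clause
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    return num_vars, num_clauses, clauses
```

Parsing `TEST1` yields 4 variables and 5 clauses: three single-literal clauses and two binary clauses, as the file's comments state.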

20
Local Caching
  • Name dependent caching does nothing. This is by
    design.
  • Name independent caching is sporadic, with high
    hit rates on runs with more input creation.

21
Global Caching
  • Name dependent caching plateaus between 70% and
    80%.
  • Name independent caching quickly approaches a 99%
    average hit rate.

22
Results
  • Global caching quickly achieves a higher hit rate
    than local caching.
  • Name independent caching is better in both cases.

23
Conclusions
  • Name independent caching is better than name
    dependent caching.
  • Global caching has a much higher hit rate than
    local caching.
  • The gains from implementing global caching instead
    of local caching should be higher than the gains
    from providing name independence.

24
Parallelization
  • Current bottleneck is waiting for JPF to finish.
  • Should execute on multiple input files
    simultaneously.
  • Distribute work over multiple computers.
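
On a single machine, running several inputs at once could look like the sketch below. It is only an illustration of the idea: `run_all` and `fake_run` are hypothetical names, and the sketch assumes each JPF run can be started independently (for example as a separate process), which the slides do not confirm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_all(inputs: List[str], run_jpf: Callable[[str], List[str]],
            workers: int = 4) -> Dict[str, List[str]]:
    """Run `run_jpf` on several input files concurrently; map input -> new inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_jpf, inputs))  # results keep the input order
    return dict(zip(inputs, results))


def fake_run(path: str) -> List[str]:
    """Stand-in for one JPF run (hypothetical): derives two new input names."""
    return [f"{path}.1", f"{path}.2"]
```

A call such as `run_all(["a.dimacs", "b.dimacs"], fake_run)` returns the new inputs produced for each file. A global cache shared between such runs would also need to be safe for concurrent access.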

25
Selecting the Next Input
  • One input produces many more inputs.
  • Currently the oldest is selected.
  • Oldest input is closest to the current input.
  • Test different heuristics for picking the next
    input.
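
One way to make the selection policy easy to experiment with is to keep pending inputs in a queue and pass the heuristic in as a function. This is a sketch with hypothetical names (`InputQueue`, `choose`); the default picks the oldest pending input, matching the current behavior described above.

```python
from collections import deque
from typing import Callable, Deque, Optional


class InputQueue:
    """Pending inputs with a pluggable selection heuristic."""

    def __init__(self, choose: Optional[Callable[[Deque[str]], str]] = None) -> None:
        self.pending: Deque[str] = deque()
        # Default heuristic: pick the oldest pending input (FIFO).
        self.choose = choose or (lambda pending: pending[0])

    def add(self, item: str) -> None:
        self.pending.append(item)

    def next(self) -> str:
        item = self.choose(self.pending)
        self.pending.remove(item)  # drop the selected input from the queue
        return item
```

Passing `lambda pending: pending[-1]` instead selects the newest input, and other heuristics can be compared the same way without changing the rest of the loop.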