DynaMine: Finding Common Error Patterns by Mining Software Revision Histories



1
DynaMine: Finding Common Error Patterns by Mining Software Revision Histories
  • Benjamin Livshits, Stanford University
  • Thomas Zimmermann, Saarland University
2
A Box Full of Nails
  • A lot of promise, potential, excitement
  • Not that many success stories
  • Not sure what to apply it to
  • Let's try this particularly exciting idea
  • Miners looking at their tools
  • Promises, promises
  • Interesting usage patterns found by CVS mining
  • Interesting error patterns found by CVS mining

3
My Background
  • Tools for bug detection
  • Analysis: pointer analysis, etc.
  • Mostly static, some dynamic
  • Applications
  • Security
  • Buffer overruns
  • Format string violations
  • SQL injections
  • Cross-site scripting
  • HTTP response splitting
  • Data lifetimes
  • J2EE patterns
  • Bad session stores
  • Lapsed listeners
  • Eclipse patterns
  • Missing calls to dispose
  • Not calling super
  • Forgetting to deregister listeners

4
Glorified Bug Finding System
  • A language for describing bug patterns
  • Called PQL; see OOPSLA 2005
  • 2 years of work
  • Static and dynamic analysis combined
  • We don't know what to look for
  • Took a long time to find useful error patterns
  • Programmers often don't recognize patterns
  • Have pretty good tools
  • How do we find more patterns to check?
  • Want to find error patterns in unfamiliar code

5
The Usual Suspects
  • Much bug-detection research in recent years
  • Focus: generic patterns, sometimes language-specific
  • NULL dereferences
  • Security
  • Buffer overruns
  • Format string violations
  • Memory
  • Double-deletes
  • Memory leaks
  • Locking errors/threads
  • Deadlock/race detection
  • Atomicity
  • Let's look at the space of error patterns in more detail

6
Classification of Error Patterns
Generic patterns -- the usual suspects
  • NULL dereferences
  • Buffer overruns
  • Double-deletes
  • Locking errors/threads

App-specific patterns particular to a system or a set of APIs
  • Bugs in J2EE servlets
  • Device drivers
  • Bugs in Linux code
[Figure: Error Pattern Iceberg]
7
Classification of Error Patterns
There are hundreds of WinAmp plugins out there
Generic patterns -- the usual suspects
  • NULL dereferences
  • Buffer overruns
  • Double-deletes
  • Locking errors/threads

Does anybody know any good error patterns specific to WinAmp plugins?
App-specific patterns particular to a system or a set of APIs
  • Intuition
  • Many other application-specific patterns exist
  • Much of the application-specific territory remains a gray area so far
  • Goal: let's figure out what the patterns are

8
Motivation: Matching Method Pairs
  • Start small
  • Matching method pairs
  • Only two methods
  • A very simple state machine
  • Calls must match perfectly; order matters
  • Very common; our inspiration:
  • System calls: fopen/fclose, lock/unlock
  • GUI operations: addNotify/removeNotify, addListener/removeListener, createWidget/destroyWidget
  • Want to find more of the same
  • And, if we are lucky, more interesting patterns
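A matching method pair is the simplest state machine: two states, and each call must flip the object to the other state. As a minimal sketch (the class and method names are hypothetical, and lock/unlock merely stand in for any pair listed above), a checker for a single object's call sequence could look like this:

```java
import java.util.List;

/** Two-state checker for one object's calls to a matching method pair (hypothetical names). */
public class MatchingPairChecker {
    private enum State { CLOSED, OPEN }

    /**
     * Ignoring all other methods, returns true if the calls alternate
     * first, second, first, second, ... and no `first` is left open at the end.
     */
    public static boolean follows(List<String> calls, String first, String second) {
        State state = State.CLOSED;
        for (String call : calls) {
            if (call.equals(first)) {
                if (state == State.OPEN) return false;   // `first` repeated before `second`
                state = State.OPEN;
            } else if (call.equals(second)) {
                if (state == State.CLOSED) return false; // `second` without a preceding `first`
                state = State.CLOSED;
            }
        }
        return state == State.CLOSED;                    // a dangling `first` is also a violation
    }

    public static void main(String[] args) {
        System.out.println(follows(List.of("lock", "unlock", "lock", "unlock"), "lock", "unlock")); // true
        System.out.println(follows(List.of("lock", "lock", "unlock"), "lock", "unlock"));           // false
        System.out.println(follows(List.of("lock"), "lock", "unlock"));                              // false
    }
}
```

Both omission (a dangling lock) and repetition (lock, lock) are rejected, which is what "calls must match perfectly, order matters" asks for.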

9
DynaMine: Our Insight
  • Our problem
  • Want to find patterns whose violation causes errors
  • Want to find patterns for program understanding
  • Our technique
  • Look at revision histories
  • Use data mining techniques to find methods that are often added at the same time
  • Crucial observation: things that are frequently checked in together often form a pattern
10
DynaMine: Our Insight (continued)
  • Now we know the potential patterns
  • Profile the patterns
  • Run the application
  • See how many times each pattern is followed or violated
  • Hits: number of times a pattern is followed
  • Misses: number of times a pattern is violated
  • Based on these statistics, classify the patterns
  • Usage patterns: almost always hold
  • Error patterns: violated a large number of times, but still hold most of the time
  • Unlikely patterns: not validated enough times
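The classification is described only qualitatively here. A minimal sketch of how hit and miss counts could map to the three categories, assuming hypothetical thresholds (MIN_HITS, USAGE_RATE and ERROR_RATE are illustrative values, not DynaMine's):

```java
/** Classifies a profiled pattern from its hit and miss counts (thresholds are hypothetical). */
public class PatternClassifier {
    public enum Kind { USAGE, ERROR, UNLIKELY }

    // Hypothetical cut-offs; the slide describes the categories only in words.
    static final int MIN_HITS = 5;           // fewer hits: "not validated enough times"
    static final double USAGE_RATE = 0.95;   // holds at least this often: "almost always holds"
    static final double ERROR_RATE = 0.5;    // holds more often than this: "holds most of the time"

    public static Kind classify(int hits, int misses) {
        if (hits < MIN_HITS) return Kind.UNLIKELY;
        double holdRate = (double) hits / (hits + misses); // hits >= MIN_HITS > 0, so no division by zero
        if (holdRate >= USAGE_RATE) return Kind.USAGE;
        if (holdRate > ERROR_RATE) return Kind.ERROR;      // violated often, but still mostly followed
        return Kind.UNLIKELY;
    }

    public static void main(String[] args) {
        System.out.println(classify(200, 1));  // USAGE
        System.out.println(classify(60, 20));  // ERROR
        System.out.println(classify(2, 0));    // UNLIKELY
        System.out.println(classify(10, 30));  // UNLIKELY
    }
}
```

Using a rate rather than raw miss counts keeps heavily exercised usage patterns from being misclassified as error patterns just because they ran many times.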

11
Architecture of DynaMine
  • Revision history mining: mine CVS histories, then sort and filter the patterns
  • Dynamic analysis: instrument relevant method calls, run the application, post-process
  • Reporting: classify into usage patterns, error patterns, and unlikely patterns; report patterns and report bugs
12
Mining approach
13
Mining Basics
  • Rely on co-change
  • Simplification: look at method calls only
  • Look for interesting patterns in the way methods are called
  • Example: a sequence of revisions of files Foo.java, Bar.java, Baz.java, Qux.java

14
Mining Matching Method Calls
  • Use our observation
  • Methods that are frequently added simultaneously
    often represent a usage pattern
  • For instance, addListener() and removeListener()
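Before any mining, each check-in has to be turned into the set of method calls it added. A minimal sketch, assuming the call names in a changed file have already been extracted before and after the check-in (source parsing is omitted, and the class name is hypothetical), computes the added calls as a multiset difference:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Computes which method calls a check-in added to a file (hypothetical input format). */
public class AddedCalls {
    /**
     * Returns the calls in `after` that have no counterpart in `before`, treating both lists
     * as multisets. The result is a set: one mining transaction per check-in.
     */
    public static Set<String> addedCalls(List<String> before, List<String> after) {
        Map<String, Integer> remaining = new HashMap<>();
        for (String call : before) remaining.merge(call, 1, Integer::sum);
        Set<String> added = new TreeSet<>();
        for (String call : after) {
            int left = remaining.getOrDefault(call, 0);
            if (left > 0) {
                remaining.put(call, left - 1);   // this occurrence existed before the check-in
            } else {
                added.add(call);                 // an extra occurrence: the check-in added this call
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> before = List.of("addListener", "paint");
        List<String> after = List.of("addListener", "paint", "removeListener", "paint");
        System.out.println(addedCalls(before, after)); // [paint, removeListener]
    }
}
```

Counting a call as added whenever its number of occurrences grows is our assumption; a tool could instead compare call sites by position or by enclosing method.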

15
Data Mining Summary
  • We consider method calls added in each check-in
  • We want to find patterns of method calls
  • Too many potential patterns to consider
  • Want to filter and rank them
  • Use support and confidence for that
  • Support and confidence of each pattern
  • Standard metrics used in data mining
  • Support reflects how many times each pair appears
  • Confidence reflects how strongly a particular
    pair is correlated
  • Refer to the paper for details
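For illustration only, here are the textbook association-rule definitions, which the paper may refine: support({a, b}) counts the check-ins that add both a and b, and confidence(a => b) = support({a, b}) / support({a}). The helper names below, including the hypothetical minePairs, and the thresholds passed to it are ours:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Support and confidence over check-in transactions (standard association-rule definitions). */
public class PairMining {
    /** Number of check-ins whose added calls include every method in `items`. */
    public static int support(List<Set<String>> checkIns, Set<String> items) {
        int count = 0;
        for (Set<String> added : checkIns) {
            if (added.containsAll(items)) count++;
        }
        return count;
    }

    /** Confidence of a => b: the fraction of check-ins adding a that also add b. */
    public static double confidence(List<Set<String>> checkIns, String a, String b) {
        int supportA = support(checkIns, Set.of(a));
        if (supportA == 0) return 0.0;
        return (double) support(checkIns, new HashSet<>(List.of(a, b))) / supportA;
    }

    /** All rules a => b (a != b) meeting both thresholds, by brute force over method pairs. */
    public static List<String> minePairs(List<Set<String>> checkIns, int minSupport, double minConfidence) {
        Set<String> methods = new TreeSet<>();
        for (Set<String> added : checkIns) methods.addAll(added);
        List<String> rules = new ArrayList<>();
        for (String a : methods) {
            for (String b : methods) {
                if (a.equals(b)) continue;
                if (support(checkIns, Set.of(a, b)) >= minSupport
                        && confidence(checkIns, a, b) >= minConfidence) {
                    rules.add(a + " => " + b);
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        // Each set holds the method calls added by one check-in (toy data).
        List<Set<String>> checkIns = List.of(
                Set.of("addListener", "removeListener"),
                Set.of("addListener", "removeListener", "paint"),
                Set.of("addListener"),
                Set.of("paint"));
        System.out.println(support(checkIns, Set.of("addListener", "removeListener"))); // 2
        System.out.println(confidence(checkIns, "removeListener", "addListener"));      // 1.0
        System.out.println(minePairs(checkIns, 2, 0.6));
        // [addListener => removeListener, removeListener => addListener]
    }
}
```

Confidence is directional: in the toy data removeListener => addListener has confidence 1.0, while addListener => removeListener has only 2/3, because one check-in added a listener without removing it. The brute-force loop is quadratic in the number of methods; real miners prune candidates first.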

16
Improvements Over the Traditional Approach
  • Default data mining approach doesn't quite work
  • Filters based on confidence and support
  • Still too many potential patterns!
  • Filtering
  • Consider only patterns with the same initial
    subsequence as potential patterns
  • Ranking
  • Use one-line fixes to find likely error patterns

17
Matching Initial Call Sequences
[Figure: matching initial call sequences, annotated with pair counts]
18
Using Fixes to Rank Patterns
  • Look for one-call additions which likely indicate
    fixes
  • Rank patterns with such methods higher

This is a fix! Move patterns containing
removeListener up
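The slides do not spell out the ranking formula. One hedged reading, sketched below: treat every check-in that adds exactly one call as a likely fix of that call, then sort candidate pairs so that those whose methods appear in more such fixes come first. The scoring rule and all names are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Ranks candidate pairs using one-call check-ins as evidence of fixes (hypothetical scoring). */
public class FixRanking {
    /** A candidate pattern mined from co-change. */
    public record Pair(String first, String second) {}

    /** For each method, the number of check-ins that added only a call to it. */
    public static Map<String, Integer> oneCallFixes(List<Set<String>> checkIns) {
        Map<String, Integer> fixes = new HashMap<>();
        for (Set<String> added : checkIns) {
            if (added.size() == 1) fixes.merge(added.iterator().next(), 1, Integer::sum);
        }
        return fixes;
    }

    /** Returns the pairs sorted by descending fix count; ties keep their original order. */
    public static List<Pair> rank(List<Pair> pairs, List<Set<String>> checkIns) {
        Map<String, Integer> fixes = oneCallFixes(checkIns);
        List<Pair> ranked = new ArrayList<>(pairs);
        ranked.sort(Comparator.comparingInt((Pair p) ->
                fixes.getOrDefault(p.first(), 0) + fixes.getOrDefault(p.second(), 0)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Set<String>> checkIns = List.of(
                Set.of("createWidget", "destroyWidget"),
                Set.of("addListener", "removeListener"),
                Set.of("removeListener"));          // a one-call addition: likely a fix
        List<Pair> pairs = List.of(
                new Pair("createWidget", "destroyWidget"),
                new Pair("addListener", "removeListener"));
        System.out.println(rank(pairs, checkIns).get(0));
        // Pair[first=addListener, second=removeListener]
    }
}
```

Because List.sort is stable, pairs with no fix evidence keep whatever order the support and confidence ranking gave them.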
19
Applications under Study
  • Apply these ideas to the revision history of
    Eclipse and jEdit
  • Very large open-source projects
  • Many people work on both, all over the planet
  • 122 on Eclipse
  • 92 on jEdit
  • Many check-ins
  • Eclipse 2,837,854
  • jEdit 144,495
  • Long histories
  • Eclipse since 2001
  • jEdit since 2000

20
Some patterns (as promised)
21
Categories of Patterns
  • Method calls during execution
  • Care about the methods
  • Care about the order
  • Care about the parameters/return values
  • Here are some common cases
  • Matching method pairs
  • State machines
  • More complex patterns

22
Some Interesting Method Pairs (1)
23
Some Interesting Method Pairs (2)
Register/unregister the current widget with the
parent display object for subsequent event
forwarding
24
Some Interesting Method Pairs (3)
Add/remove listener for a particular kind of GUI
events
25
Some Interesting Method Pairs (4)
Use the OS-native locking mechanism for resources such as icons
26
State Machines
  • Order captured by a state machine
  • Must be followed precisely: omitting or repeating a method call is a sign of error
  • Simplest formalism for describing the object life-cycle
  • Matching method pairs are a special case
  • Very common in C
  • Consider OS code
  • Less common in Java, but...

27
State Machines (1)
  • o.enterAlignment o.redoAlignment
    o.exitAlignment
  • Part of the org.eclipse.jdt.internal.formatter.Scribe class, responsible for pretty-printing code
  • enterAlignment/exitAlignment pairs must match
  • redoAlignment is invoked in exception cases
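Since enterAlignment/exitAlignment pairs must match, one simple check is a depth counter, exactly as for balanced parentheses; it handles flat and nested pairs alike. This sketch treats one object's trace as a list of method names, and the rule that redoAlignment is legal only while an alignment is open is our assumption (the class name is hypothetical):

```java
import java.util.List;

/** Checks one object's enter/redo/exit alignment calls with a depth counter (hypothetical name). */
public class AlignmentChecker {
    /** True if enter/exit calls balance and redoAlignment occurs only inside an open alignment. */
    public static boolean isBalanced(List<String> calls) {
        int depth = 0;                                  // number of currently open alignments
        for (String call : calls) {
            switch (call) {
                case "enterAlignment" -> depth++;
                case "exitAlignment" -> {
                    if (depth == 0) return false;       // exit without a matching enter
                    depth--;
                }
                case "redoAlignment" -> {
                    if (depth == 0) return false;       // assumption: redo only inside an alignment
                }
                default -> { }                          // other calls are irrelevant to this pattern
            }
        }
        return depth == 0;                              // every enter must eventually be exited
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(List.of("enterAlignment", "enterAlignment",
                "redoAlignment", "exitAlignment", "exitAlignment")));             // true
        System.out.println(isBalanced(List.of("enterAlignment", "redoAlignment"))); // false
        System.out.println(isBalanced(List.of("exitAlignment")));                   // false
    }
}
```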

28
State Machines (2)
  • o.beginCompoundEdit()
  • (o.insert(...) o.remove(...))
  • o.endCompoundEdit()
  • Compound edits within jEdit can be undone all at once
  • beginCompoundEdit/endCompoundEdit act as brackets
  • Other operations in between
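Because this pattern is regular, a lightweight check is to encode each relevant call as one letter and match a regular expression. The sketch assumes the intended pattern is beginCompoundEdit, then any number of insert or remove calls, then endCompoundEdit, with insert and remove also allowed outside a compound edit; those repetition operators are our reading, nested compound edits are not handled, and the class name is hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;

/** Regex-based checker for compound edits on one buffer (hypothetical name). */
public class CompoundEditChecker {
    // Assumed pattern: ( insert | remove | beginCompoundEdit (insert | remove)* endCompoundEdit )*
    private static final Pattern COMPOUND_EDITS = Pattern.compile("([IR]|B[IR]*E)*");

    /** Encodes the relevant calls as one letter each; unrelated calls are dropped. */
    static String encode(List<String> calls) {
        StringBuilder sb = new StringBuilder();
        for (String call : calls) {
            switch (call) {
                case "beginCompoundEdit" -> sb.append('B');
                case "insert" -> sb.append('I');
                case "remove" -> sb.append('R');
                case "endCompoundEdit" -> sb.append('E');
                default -> { }
            }
        }
        return sb.toString();
    }

    public static boolean follows(List<String> calls) {
        return COMPOUND_EDITS.matcher(encode(calls)).matches();
    }

    public static void main(String[] args) {
        System.out.println(follows(List.of("beginCompoundEdit", "insert", "remove", "endCompoundEdit"))); // true
        System.out.println(follows(List.of("beginCompoundEdit", "insert")));                              // false
    }
}
```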

29
State Machines (3)
  • OS.PmMemCreateMC
  • OS.PmMemStart OS.PmMemFlush OS.PmMemStop
  • OS.PmMemReleaseMC
  • Memory context manipulation (like memory pools)
  • Wrappers around underlying OS functionality
  • The middle part of the pattern is optional
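An explicit transition table is another common encoding, convenient once a machine has more than two states. This sketch assumes the optional middle part is the whole Start, Flush, Stop group taken at most once, and that the trace belongs to a single memory context; the OS. qualifier is dropped and the state names are ours:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Transition-table checker for the memory-context pattern (hypothetical state names). */
public class MemoryContextChecker {
    enum State { INIT, CREATED, STARTED, FLUSHED, STOPPED, RELEASED }

    private static final Set<String> ALPHABET = Set.of(
            "PmMemCreateMC", "PmMemStart", "PmMemFlush", "PmMemStop", "PmMemReleaseMC");

    private static final Map<State, Map<String, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        TRANSITIONS.put(State.INIT, Map.of("PmMemCreateMC", State.CREATED));
        // From CREATED, either run the optional Start/Flush/Stop part or release right away.
        TRANSITIONS.put(State.CREATED, Map.of("PmMemStart", State.STARTED,
                                              "PmMemReleaseMC", State.RELEASED));
        TRANSITIONS.put(State.STARTED, Map.of("PmMemFlush", State.FLUSHED));
        TRANSITIONS.put(State.FLUSHED, Map.of("PmMemStop", State.STOPPED));
        TRANSITIONS.put(State.STOPPED, Map.of("PmMemReleaseMC", State.RELEASED));
        TRANSITIONS.put(State.RELEASED, Map.of());  // no pattern call may follow a release
    }

    /** True if the pattern calls in one memory context's trace drive the machine to RELEASED. */
    public static boolean accepts(List<String> calls) {
        State state = State.INIT;
        for (String call : calls) {
            if (!ALPHABET.contains(call)) continue;  // calls outside the pattern are ignored
            State next = TRANSITIONS.get(state).get(call);
            if (next == null) return false;         // call not allowed in the current state
            state = next;
        }
        return state == State.RELEASED;
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("PmMemCreateMC", "PmMemReleaseMC")));                 // true
        System.out.println(accepts(List.of("PmMemCreateMC", "PmMemStart", "PmMemFlush",
                "PmMemStop", "PmMemReleaseMC")));                                                  // true
        System.out.println(accepts(List.of("PmMemCreateMC", "PmMemStart", "PmMemReleaseMC")));   // false
    }
}
```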

30
More Complex Stuff (1)
    try {
        monitor.beginTask(null, Policy.totalWork);
        int depth = -1;
        try {
            workspace.prepareOperation(null, monitor);
            workspace.beginOperation(true);           // matched by endOperation() in the finally block
            depth = workspace.getWorkManager().beginUnprotected(); // matched by endUnprotected(depth)
            return runInWorkspace(Policy.subMonitorFor(monitor,
                    Policy.opWork, SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
        } catch (OperationCanceledException e) {
            workspace.getWorkManager().operationCanceled();
            return Status.CANCEL_STATUS;
        } finally {
            // Only undo beginUnprotected() if it actually ran.
            if (depth >= 0)
                workspace.getWorkManager().endUnprotected(depth);
            workspace.endOperation(null, false,
                    Policy.subMonitorFor(monitor, Policy.endOpWork));
        }
    } catch (CoreException e) {
        // ...
    }

31
More Complex Stuff (2)

32
More Complex Stuff (3)

33
Grammar for Workspace Transactions
  • Requires human intelligence
  • Requires a lot of it
  • Is actually an excellent pattern: we haven't seen any runtime violations

S → O
O → w.prepareOperation() w.beginOperation() U w.endOperation()
U → w.getWorkManager().beginUnprotected() S w.getWorkManager().operationCanceled() w.getWorkManager().beginUnprotected()
34
Dynamic checking
35
Dynamically Check the Patterns
  • Home-grown bytecode instrumentor
  • Get a list of matching patterns
  • Instrument calls to any of the methods to dump
    parameters
  • Post-processing of the output
  • Process a stream of events
  • Find and count matches and mismatches
  • Example event stream: o.register(d), o.deregister(d), o.deregister(d)
  • The register/deregister pair is matched; the second deregister is mismatched
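The post-processing step could be sketched as a single pass over the dumped events, matching each second call against an outstanding first call on the same receiver and argument. The Event format and class name are hypothetical, and counting a register that is never deregistered as a miss is our assumption; main replays the example stream from this slide:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Post-processes an event stream, counting matches and mismatches of a method pair. */
public class PairMatcher {
    /** One instrumented call: receiver, method name, and argument (hypothetical dump format). */
    public record Event(String receiver, String method, String argument) {}

    /** Returns {hits, misses} for the pair first/second, matched per (receiver, argument). */
    public static int[] count(List<Event> events, String first, String second) {
        Map<List<String>, Integer> open = new HashMap<>(); // outstanding `first` calls per key
        int hits = 0;
        int misses = 0;
        for (Event e : events) {
            List<String> key = List.of(e.receiver(), e.argument());
            if (e.method().equals(first)) {
                open.merge(key, 1, Integer::sum);
            } else if (e.method().equals(second)) {
                int outstanding = open.getOrDefault(key, 0);
                if (outstanding > 0) {
                    open.put(key, outstanding - 1);
                    hits++;                  // closes an earlier `first` on the same key
                } else {
                    misses++;                // nothing to close, e.g. a duplicate deregister
                }
            }
        }
        for (int outstanding : open.values()) {
            misses += outstanding;           // `first` calls never closed by the end of the run
        }
        return new int[] {hits, misses};
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("o", "register", "d"),
                new Event("o", "deregister", "d"),
                new Event("o", "deregister", "d"));
        System.out.println(Arrays.toString(count(events, "register", "deregister"))); // [1, 1]
    }
}
```

The two counts correspond to the hits and misses used earlier in the talk to classify patterns.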
36
Experiments
37
Experimental Setup
  • Applied to Eclipse and jEdit
  • 3,600,000 lines of Java code combined
  • Included many plugins
  • Times
  • 6 days to fetch and process CVS histories
  • 30 minutes to compute the patterns
  • An hour to instrument
  • 15 minutes to run
  • And we are done!

38
Experimental Summary
  • Pattern classification
  • 56 patterns total
  • 13 are usage patterns
  • 8 are error patterns
  • 11 are unlikely patterns
  • 24 were not hit at runtime
  • Error patterns
  • Resulted in a total of 264 dynamically confirmed
    pattern violations

39
Summary
  • Knowing code patterns is important
  • We explored using software histories
  • Co-change often indicates patterns
  • Use previous fixes (one-line changes) to drive
    error patterns
  • Found interesting patterns
  • Matching method pairs
  • State machines
  • More complex stuff
  • Confirmed valid patterns
  • Found pattern violations at runtime
  • We have a paper in FSE 2005