Getting Started in Program Analysis Research: Outline

1
Getting Started in Program Analysis Research
Outline
  • Background and useful skills
  • Ana
  • Using and developing analysis
  • Mary Lou
  • Identifying and building infrastructure
  • Lori
  • Evaluating your analysis
  • Ana

2
Ana Milanova
  • I am from Bulgaria
  • National High School for Math and Science
  • American University in Bulgaria, 1997
  • I have a degree in Business Administration
  • Rutgers University, PhD in CS, 2003
  • Now Assistant Professor at RPI
  • Research: program analysis for software tools
  • Family
  • Husband Tony
  • Katarina, 5 and Petar, 2

3
Program Analysis: Useful Background and Skills
4
Program Analysis
  • Static program analysis
  • Analyzes the source code of the program
  • Reasons about run-time behavior properties without
    running the program
  • E.g., "The object values that flow to reference
    variable x are only of classes A and B, but not
    C."
  • Static analyses are conservative: they consider all
    possible run-time behaviors of the program

5
Program Analysis
  • Dynamic program analysis
  • Analyzes a set of program executions
  • Reasons about run-time behavior properties over
    observed executions
  • E.g., The object values that flowed to reference
    variable x during observed executions were only
    of classes A and B, but not C.
  • Dynamic analyses are incomplete: they consider only
    behaviors over particular executions
  • Goal: combine with static analysis

6
Uses of Static Program Analysis
  • Compilers: the traditional application domain
  • Enables optimizing transformations
  • Software engineering tools
  • Static debugging, verification, security
  • Uncover difficult errors and security flaws
  • Testing
  • Evaluate and improve test suites
  • Software understanding
  • Calling structure
  • Complex dependences
  • Change impacts

7
Uses of Program Analysis
  • Analysis for compiler optimization is different from
    analysis for software tools
  • Different requirements, different success
    criteria (more later)

8
Static Analysis Methodologies
  • Data-flow analysis
  • Constraint-based program analysis
  • Abstract interpretation
  • Type and effect systems
  • Model checking

9
Example: Data-flow Analysis
  • Flow facts
  • Information that we are propagating
  • E.g., the set of definitions (i,1), (i,4), (i,6)
  • Transfer functions
  • The effect of a statement on the incoming flow
    facts
  • E.g., statement i = 11 at 6 kills the incoming
    definition (i,4), and generates definition (i,6)

Example program, annotated with the definitions of i that reach each point:
1. i = 11; read x, y     after: (i,1)
2. if x < y              after: (i,1) on both branches
3. p(i)
4. i = j op 5            after: (i,4)
5. p(i)                  after: (i,4)
6. i = 11                after: (i,6)
7. i = i op i            before: (i,1), (i,6)
10
Theory
  • Data-flow frameworks
  • Control-flow graph (CFG)
  • Space of flow facts L
  • Space of transfer functions F
  • Certain properties of L and F allow a general
    solution procedure
  • Fixed-point iteration
  • Termination: the iterative computation terminates
  • Safety (correctness, soundness): the solution is
    conservative
  • For most problems the analysis produces noise
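
The fixed-point iteration can be sketched on the reaching-definitions example from slide 9. This is a minimal Python sketch, not the presenters' implementation. The CFG edges below are an assumption reconstructed from the flow facts in the transcript (statement 2 branches to 3 and 4, and 3 and 6 both lead to 7); the names SUCC, DEFS and reaching_definitions are hypothetical.

```python
from collections import deque

# Assumed CFG for the slide-9 example: statement -> successor statements.
SUCC = {1: [2], 2: [3, 4], 3: [7], 4: [5], 5: [6], 6: [7], 7: []}
# Statements that define a variable; here only i is ever defined.
DEFS = {1: "i", 4: "i", 6: "i", 7: "i"}


def reaching_definitions(succ, defs):
    """Return (IN, OUT): sets of (variable, statement) facts per statement."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    in_facts = {n: set() for n in succ}
    out_facts = {n: set() for n in succ}
    worklist = deque(sorted(succ))
    while worklist:
        n = worklist.popleft()
        # Merge: a definition reaches n if it leaves any predecessor of n.
        in_facts[n] = set().union(*(out_facts[p] for p in pred[n]))
        # Transfer function: kill other definitions of the variable, add own.
        var = defs.get(n)
        if var is None:
            new_out = set(in_facts[n])
        else:
            new_out = {d for d in in_facts[n] if d[0] != var} | {(var, n)}
        if new_out != out_facts[n]:
            out_facts[n] = new_out
            worklist.extend(s for s in succ[n] if s not in worklist)
    return in_facts, out_facts


IN, OUT = reaching_definitions(SUCC, DEFS)
```

The loop terminates because each OUT set can only grow and the set of possible facts is finite; IN[7] ends up holding the two definitions shown before statement 7 in the example.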

11
Theory and Practice
  • Analysis cost: how much time and memory
  • Analysis precision: how much noise
  • E.g., for a.m(): a more precise analysis reports
    a → {B}, and a less precise analysis reports
    a → {A, B, C}
  • Typically, there is a tradeoff between cost and
    precision!
  • In practice, we need to analyze very large
    programs, 100K LOC, even 1M LOC

12
Theory and Practice
  • Approximations - introduce noise
  • make the CFG smaller
  • make the set of flow facts smaller
  • make the transfer functions converge faster
  • Approximations are necessary
  • But be careful: different approximations suit
    different analyses

13
Standard Approximations
  • Flow-sensitive vs. flow-insensitive

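Program: x = true; y = false; x = y
Flow-sensitive (one solution per program point):
  after x = true:   x = {true}
  after y = false:  x = {true}, y = {false}
  after x = y:      x = {false}, y = {false}
Flow-insensitive (one solution for the whole program):
  x = {true, false}, y = {false}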
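
The difference between the two approximations can be made concrete with a small Python sketch over the three-statement program on this slide (x = true; y = false; x = y, as read from the transcript). The encoding of statements as (target, source) pairs and the function names are assumptions made for illustration only.

```python
# Straight-line program from the slide, as (target, source) assignments.
# A source is either a boolean constant or the name of another variable.
PROGRAM = [("x", True), ("y", False), ("x", "y")]


def values_of(source, env):
    """Possible values of a source under the variable -> value-set map env."""
    if isinstance(source, bool):
        return {source}
    return set(env.get(source, set()))


def flow_sensitive(program):
    """One solution per program point; an assignment overwrites its target."""
    env, points = {}, []
    for target, source in program:
        env = {**env, target: values_of(source, env)}
        points.append(env)
    return points


def flow_insensitive(program):
    """One solution for the whole program; statement order is ignored."""
    env = {}
    changed = True
    while changed:
        changed = False
        for target, source in program:
            merged = env.get(target, set()) | values_of(source, env)
            if merged != env.get(target, set()):
                env[target] = merged
                changed = True
    return env
```

The flow-insensitive version is cheaper to run on large programs because it keeps a single map, at the price of reporting that x may be true or false everywhere.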
14
Standard Approximations
  • Context-sensitive vs. context-insensitive

Code: A(bool X) { this.f = X; }   a = new A(true); b = new A(false);
Context-insensitive (merged flow): a.f = {true, false}, b.f = {true, false}
Context-sensitive: a.f = {true}, b.f = {false}
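
A minimal Python sketch of the same contrast, assuming the constructor simply stores its argument in field f as on the slide. The CALL_SITES encoding and both function names are hypothetical; a real analysis would work on a call graph rather than a list of call sites.

```python
# Call sites of the constructor A(bool X) { this.f = X; } from the slide:
# the variable receiving the new object and the argument passed for X.
CALL_SITES = [("a", True), ("b", False)]


def context_insensitive(call_sites):
    """Analyze the constructor once; X merges the arguments of all call sites."""
    merged_x = {arg for _, arg in call_sites}
    return {f"{var}.f": set(merged_x) for var, _ in call_sites}


def context_sensitive(call_sites):
    """Analyze the constructor separately for each call site (its context)."""
    return {f"{var}.f": {arg} for var, arg in call_sites}
```

The context-insensitive result contains the noise shown on the slide (both fields may hold true or false), while the context-sensitive result keeps the two calls apart.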
15
Useful Background and Skills
  • Higher-level undergraduate or graduate courses
    on
  • Programming Languages, Compilers, Algorithms,
    Logic, Software Engineering, Architecture
  • Analytical and programming skills
  • Step 1: Design a program analysis algorithm
  • Understand your target language (e.g., Java, C,
    and C++)
  • Step 2: Implement the analysis algorithm
  • Understand the language(s) of the infrastructure
  • Step 3: Evaluate the analysis algorithm

16
Useful Resources
  • Books (my personal list)
  • Compilers: Principles, Techniques, and Tools by
    Aho, Sethi, and Ullman, Ch. 10
  • An introduction to data-flow analysis
  • Principles of Program Analysis by Nielson, Nielson,
    and Hankin
  • An excellent reference for advanced students
  • Model Checking by Clarke, Grumberg, and Peled
  • Course material on the web
  • Classes taught by professors
  • My class (there are better ones, of course)
    www.cs.rpi.edu/milanova/csci6961/lectures/

17
Using and Developing Program Analysis
  • Mary Lou Soffa
  • University of Virginia

18
About Mary Lou Soffa
  • Confused about what I wanted to be
  • Ph.D. programs
  • Mathematics, Sociology, Philosophy, Environmental
    Acoustics: disenchanted
  • Found what I really loved: computer science
  • After 25 years at Pitt, moved to UVA
  • Small farm: grow crops, love my tractor
  • Passion: increasing the participation of women
    and minorities in computer science
  • Professional achievement: 24 Ph.D. students, ½ of
    whom are women.

19
Program analysis
  • How to apply program analysis in your research
  • What are the questions, and what do you have to do?

20
Solve a problem: program behavior, static or
dynamic

Determine information needed

What parts of program are involved

Develop appropriate representation

Develop analysis

Develop algorithm
21
Have a goal: program code
  • Problem
  • Improve performance
  • Understand program
  • Find errors
  • Locate cause of errors
  • Need to collect information about the program
    that helps you infer properties of program
  • Static or dynamic code

22
Determine information needed
  • What questions are you asking?
  • What do you need to gather to answer the questions?
  • Examples
  • Statements needed to compute an expression
  • Values that are always constant at a particular
    program point
  • Locations of dead statements
  • Branches that are correlated

23
Example: redundancy
  • Remove redundancies with the goal of improving
    performance
  • Redundant expressions
  • Redundant loads
  • Redundant stores
  • Dead code
  • Static
  • Remove redundant expressions from program
    representation

24
Redundant expressions
  • Does the value need to be computed for correct
    semantics?
  • X = A op B
  • F = C op E
  • C = C op 1
  • If (cond) then R = A op B; S = C op E
  • Else X = A op B; A = 6
  • End if
  • G = A op B

25
What parts of program involved
  • Given information you need, what parts of
    program are involved
  • Examples
  • Branches and statements that change values in
    conditionals
  • All possible execution paths
  • Array definitions and uses
  • Types
  • Loops

26
Example: Redundant expressions
  • Expressions
  • Definitions
  • Control flow among definitions and expressions
  • Program paths

27
Program representation
  • Program representation that enables collection of
    information
  • Granularity
  • Source, intermediate, binary
  • Issues: how to get one representation from another
    representation

28
Example: redundant expressions
  • Want to know how expressions flow
  • Is the value of an expression the same when the
    expression is used again?
  • Need a control flow graph with statements in nodes,
    at the intermediate level
  • E.g., X = A op B

29
Available Expressions
  • Control flow graph

Entry block: X = A op B; F = C op E; C = C op 1
Then branch: R = A op B; S = C op E
Else branch: X = A op B; A = 6
Join block:  G = A op B
30
Formulate analysis over representation
  • How to gather information from representation
  • How many analyses
  • Direction of flow of analysis
  • Along all paths or any path
  • Local solution
  • Global solution

31
Example: Redundant expressions
  • Local: basic block (single entry/exit)
  • What expressions are generated
  • What expressions are killed by a definition
  • Global: flow over the flow graph
  • Forward flow
  • Must be true on all paths

32
Redundant Expressions
  • Control flow graph

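Entry block: X = A op B; F = C op E; C = C op 1
  available on exit: A op B
Then branch: R = A op B; S = C op E
  available on entry: A op B
  available on exit: A op B, C op E
Else branch: X = A op B; A = 6
  available on entry: A op B
Join block:  G = A op B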
33
Develop analyses
  • Data flow equations: use the data flow framework
  • Algorithm
  • Preciseness
  • Expense

34
Data flow equations
  • Gen(B): all expressions computed in B
  • Kill(B): every definition in B kills the incoming
    available expressions that use the defined variable
  • Out(B) = Gen(B) ∪ (In(B) - Kill(B))
  • In(B) = ∩ Out(j), over all predecessors j of B
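
These equations can be run on the redundant-expressions example from the earlier slides. The following Python sketch rests on stated assumptions: the operators were lost in the transcript, so "+" is used as a stand-in; the block names (entry, then, else, join) and all function names are hypothetical; Gen and Kill are applied statement by statement inside each block rather than precomputed per block.

```python
# Each statement is (defined variable, expression or None). "+" is a stand-in
# for the operator lost in the transcript (an assumption).
BLOCKS = {
    "entry": [("X", "A+B"), ("F", "C+E"), ("C", "C+1")],
    "then": [("R", "A+B"), ("S", "C+E")],
    "else": [("X", "A+B"), ("A", None)],  # A = 6 computes no expression
    "join": [("G", "A+B")],
}
PRED = {"entry": [], "then": ["entry"], "else": ["entry"], "join": ["then", "else"]}
ALL_EXPRS = {e for stmts in BLOCKS.values() for _, e in stmts if e}


def uses(expr, var):
    """True if var is an operand of expr."""
    return var in expr.split("+")


def transfer(stmts, available):
    """Apply Out = Gen ∪ (In - Kill), one statement at a time."""
    available = set(available)
    for var, expr in stmts:
        if expr:
            available.add(expr)  # gen: the expression is computed here
        # kill: defining var invalidates every expression that reads var
        available = {e for e in available if not uses(e, var)}
    return available


def available_expressions(blocks, pred):
    # Must-analysis: start optimistically with every expression available on
    # block exits, then shrink until nothing changes (greatest fixed point).
    out = {b: set(ALL_EXPRS) for b in blocks}
    inn = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            preds = pred[b]
            inn[b] = set.intersection(*(out[p] for p in preds)) if preds else set()
            new_out = transfer(blocks[b], inn[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return inn, out


IN, OUT = available_expressions(BLOCKS, PRED)
```

Under these assumptions, A+B is available on entry to both branches, so R = A+B and the second X = A+B are redundant, while nothing is available at the join because A = 6 kills A+B on the else path.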

35
Dynamic Optimization
  • Static optimizations
  • Apply before execution
  • Dynamic Optimizations
  • Apply during execution: redundant expressions
  • Binary code
  • Program traces

36
B1:  1. A = 4
     2. T1 = A op B
B2:  3. L1: T2 = T1 / C
     4. If T2 < W go to L2
B3:  5. M = T1 op K
     6. T3 = M op 1
B4:  7. L2: H = I
     8. M = T3 - H
     9. If T3 > 0 go to L3
B5: 10. Go to L1
B6: 11. L3: halt
37
Program Trace
Binary code
  • A = 4
  • T1 = A op B
  • T2 = T1 / C
  • If !(T2 < W) jump out
  • H = I
  • M = T3 - H
  • If T3 > 0 go to L3
  • T2 = T1 / C
  • If !(T2 < W) jump out
  • M = T1 op K
  • T3 = M op 1
  • H = I
  • M = T3 - H
  • halt

38
Dynamic optimization
  • Note
  • Single entry multiple exits
  • No Loops
  • Need a representation: bring up a level from
    binary code
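
Because a trace has a single entry and no loops, redundant computations can be found in one forward pass, with no fixed-point iteration. The Python sketch below illustrates that idea only; it is not the presenters' dynamic optimizer. The trace is a subset of the statements on the previous slide (jumps and copies left out), "op" stands for an operator that is not legible in the transcript, and the function name is hypothetical.

```python
def redundant_computations(trace):
    """Return indices of statements that recompute a still-available expression.

    trace is a list of (target, expression) pairs, where an expression is a
    tuple (operator, operand, operand).
    """
    available = set()
    redundant = []
    for index, (target, expr) in enumerate(trace):
        if expr in available:
            redundant.append(index)
        available.add(expr)
        # Redefining target invalidates every expression that reads it.
        available = {e for e in available if target not in e[1:]}
    return redundant


# Subset of the trace from the slide; "op" marks an unknown operator.
TRACE = [
    ("T1", ("op", "A", "B")),
    ("T2", ("/", "T1", "C")),
    ("M", ("-", "T3", "H")),
    ("T2", ("/", "T1", "C")),   # same operands, none redefined in between
    ("M", ("op", "T1", "K")),
    ("T3", ("op", "M", "1")),
    ("M", ("-", "T3", "H")),    # T3 was redefined, so this is not redundant
]
REDUNDANT = redundant_computations(TRACE)
```

The single pass keeps overhead low, which matters because a dynamic optimizer runs while the program executes.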

39
Applying optimizations
  • Not as complicated
  • But, cannot tolerate much overhead
  • Phases in static
  • Developed algorithm that can apply multiple
    optimizations
  • Demand driven
  • Limit study of dynamic optimizations

40
Conclusion
  • Need analysis in many different applications
  • Virtual execution environments
  • Multicore
  • Wireless sensor networks
  • Testing
  • Testing for wireless sensor networks
  • Testing for security

41
Identifying and Building Infrastructure
42
Lori's Journey
  • Science/Math love: Started in chemistry at
    liberal arts college.
  • Field trip and first CS course -> CS major.
  • Advisor's strong push for grad school -> U Pitt.
  • Took compilers course from Mary Lou -> PhD in
    compiler optimization.
  • Big year: 10/85 - married Mark. 1/86 - started at
    Rice. 4/86 - PhD.
  • Family: The Yankees returned north 3 years later!
  • University of Delaware: 15 yrs. Visiting,
    Assistant, Associate, Full
  • Family: Lauren (HS senior), Lindsay (16 and
    driving), Matt (11)
  • Support: Mark, Mark, Mark, Mary Lou, Errol,
    Sandee, CRA-W
  • Currently: software tools, testing, compiler
    optimization

43
Identifying and Building Infrastructure for
Analysis Research
  • What kinds of infrastructure do you need?
  • How to identify and build infrastructure
  • Examples

44
What kinds of infrastructure do you need?
Analysis Research and Evaluation
People
Analysis Framework Software
Labspace
Hardware
Workloads
45
Identifying Analysis Framework Software
  • Determine Goals: short term, long term
  • Specify Requirements: needed, desired (prioritized)
  • Search for Possibilities: peers/experts, technical
    papers, Internet search
  • Try Them Out: install, run tests, read docs,
    examine code, try a small task
  • Weigh Choices: meet requirements? ease of
    use/change? ...
46
Example: Identifying Analysis Framework Software
  • Determine Goals: evaluate new analysis on Java, on
    its own and in a client tool
  • Specify Requirements: needed: call graph, CFG, CHG;
    realistic environment/apps; easy to extend/build
    client tools
  • Search for Possibilities: common environment is
    IDE, Java -> Eclipse platform
  • Try Them Out: install and explore, write a small
    plugin, use call graph, CHG, CFG for a small task
  • Weigh Choices: learning curve vs. available
    analyses, realism
47
Implementing Your Analysis
  • Once you have decided on an infrastructure
  • Think Reuse!! Think modularity!!
  • Think prototype, but extensible and scalable
  • Test, test, test - try to be systematic
  • Debug: not easy

48
Example: Implementing My NL Analysis
  • Build small modular components -> reuse
  • Analyzing method signatures to extract NL
  • Building program representation for NL
  • Traversing program rep
  • Building program rep for IR
  • Design reps to avoid loss of info -> reuse
  • Ids and their roles and locations in code
  • Verb, Direct object rep -> extensible

49
Managing the Evolving Software Infrastructure
  • Managing change over time and people
  • CVS, subversion
  • Tracking tasks, bugs, deadlines/goals
  • TRAC, bugzilla, gforge
  • Maintaining documentation
  • JavaDocs, Doxygen
  • Testing, testing, testing
  • Unit, system, regression -- test suites
  • Sounds like software engineering

50
Selecting Appropriate Hardware
  • Determine Goals: short term, long term
  • Specify Requirements: needed, desired (prioritized)
  • Search for Possibilities: peers/experts, system
    staff
  • Weigh Choices: meet requirements? costs within
    budget? need to ask for money?
51
Gathering Good Workloads
  • Kind of evaluation desired: case studies,
    controlled experiment
  • Representative
  • Synthesized benchmarks
  • Try to reduce threats to validity of experiments:
    varied/similar, domain, size, complexity/form,
    known and available to others
52
Example: Gathering Good Workloads
  • Kind of evaluation desired: case studies
  • Research questions
  • How effective is our FindConcept tool versus other
    code search tools (lexical search and IR), in
    terms of precision and recall?
  • How does the human effort compare?
  • Representative
  • Try to reduce threats to validity of experiments:
    varied/similar, domain, size, complexity/form,
    known/available to others
  • SourceForge: very large, many CVS updates (active),
    varied in domain
53
Identifying Strong Students
  • Teach a compiler or program analysis course
    regularly
  • Identify students from the course
  • Ideal
  • Creative, quick to understand analysis
  • Good problem solver
  • Hard working
  • Good coder
  • Good communicator, good writer
  • show initiative and interest in analysis
  • Some training will be required.
  • Start Small. Create a Pipeline.

54
Building a Working Lab Space
  • Needs
  • One workspace/computer/storage per grad student
  • Room for growth and undergrad researchers
  • Current technology: minimize maintenance of old
    machines?
  • Lab printer
  • Lab library of research-oriented background
    books
  • Make it somewhere students want to work
  • Posters/pictures/plants
  • Open and pleasant: microwave, fridge,
    coffeepot?
  • All needed resources/supplies easily available
  • Conference room for larger research meetings

55
Static Program Analysis: Evaluating Your Analysis
56
A Typical Program Analysis Research Project
  • Step 1: Design your analysis
  • Reason about safety
  • Reason about complexity in terms of program size
  • Step 2: Implement your analysis
  • Hard!
  • Complex and difficult to test, debug and verify:
    a real problem
  • Step 3: EVALUATE!

57
Evaluation of a Compiler Analysis
  • Strict requirements for the analysis
  • Safety is crucial!
  • An unsafe analysis may miss an execution path,
    and result in a change of the original program
  • Analysis time (and space)
  • Constrained by normal compilation time
  • Objective success criteria
  • Show improvement in execution time
  • Show reduction in memory footprint

58
Evaluation of a Compiler Analysis
  • Established benchmarks
  • E.g., the SPEC JVM98
  • General evaluation of Java compilers
  • E.g., the DaCapo benchmark suite
  • Memory intensive Java applications
  • Ideally you would say something like this:
  • "Our analysis increases compilation time by at
    most 10%, and results in a speed-up of 10-16% on
    the SPEC JVM98 benchmarks."

59
Evaluation of an Analysis for a Software Tool
  • Requirements for the analysis - not so strict
  • Relaxing safety is OK!
  • Analysis time (space) is not so crucial
  • Developers would definitely wait if the analysis
    finds difficult bugs such as data races and
    memory leaks
  • Success criteria - not so objective
  • Precision: low noise
  • Practicality: practical time/space requirements,
    works on 100K LOC
  • Usability of tool
  • Bugs found: absolutely sure

60
Evaluation of an Analysis for a Software Tool
  • Precision is CRUCIAL: noise is really bad!
  • E.g., there are 10 buffer overflow bugs in
    program P
  • Safe analysis A issues 1000 warnings: 10 are real
    and 990 are false positives
  • Unsafe analysis B issues 13 warnings: 8 are real
    and 5 are false positives
  • Analysis B is much more useful than analysis A!
  • Absolute precision: measured more and more often
  • Choose a subset of analyzed programs
  • Manually find the real solution
  • Compare with analysis solution
  • Precision: how much noise is there?
  • Recall (if the analysis is unsafe): how much did
    it miss?
  • E.g., for a.m(): the real solution is a → {B}, a
    safe analysis solution is a → {A, B, C}.
    Precision: 67% noise!
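
The comparison of analyses A and B can be worked through numerically. This is a small Python sketch using the standard definitions of precision (real warnings over all warnings) and recall (real warnings over all real bugs); the function name and the convention of returning 1.0 for an empty denominator are assumptions.

```python
def precision_recall(true_positives, false_positives, real_bugs):
    """Precision and recall of a bug-finding analysis, as fractions in [0, 1]."""
    reported = true_positives + false_positives
    precision = true_positives / reported if reported else 1.0
    recall = true_positives / real_bugs if real_bugs else 1.0
    return precision, recall


# Numbers from the slide: program P has 10 real buffer overflow bugs.
analysis_a = precision_recall(true_positives=10, false_positives=990, real_bugs=10)
analysis_b = precision_recall(true_positives=8, false_positives=5, real_bugs=10)

print(f"A: precision {analysis_a[0]:.1%}, recall {analysis_a[1]:.0%}")
print(f"B: precision {analysis_b[0]:.1%}, recall {analysis_b[1]:.0%}")
```

Analysis A finds every bug, but only 1 warning in 100 is real; analysis B misses 2 of the 10 bugs, yet roughly 6 of every 10 warnings are real, which is why the slide calls B more useful.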

61
Evaluation of an Analysis for a Software Tool
  • Finding a benchmark set
  • Depends on analysis application
  • Large programs
  • Diverse programs, as many as is feasible
  • Publicly available: sourceforge.org
  • Look at benchmark suites in published work!
  • Ideally, you will have a large set of diverse
    programs, will show acceptable absolute precision
    (low false positive rate) and practical cost

62
Comparison with Existing Analysis
  • Well-known program analysis problems
  • Haven't we solved that problem yet?
  • E.g., Points-to analysis
  • Design a new analysis A
  • Compare with best known analysis B
  • Show improvement in one or more of analysis
    cost, analysis precision

63
What Not to Do
  • Propose a new analysis without any evaluation
  • E.g., "We describe this new great points-to
    analysis."
  • Design your own metric, different from
    established metrics
  • E.g., "We propose a novel points-to analysis A
    and points-to analysis A' which improves on A.
    Therefore, both A and A' are great."
  • Use non-standard benchmarks
  • Report on a subset: the ones for which the
    analysis works

64
Questions
66
An Example: Devirtualization in Object-oriented
Programs
  • Polymorphism and dynamic dispatch
  • class A { void m() { ... } }
  • class B extends A { void m() { ... } }
  • class C extends A { void m() { ... } }
  • Virtual call a.m() is dispatched at run-time,
    based on the class of the receiver: A, B or C
  • Powerful: enables modern software engineering
  • But costly: 13% of time spent in virtual dispatch
  • Analysis: only B objects ever flow to a
  • Optimization: virtual call a.m() => direct call
    to B.m()
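
The devirtualization decision can be sketched in Python, assuming the receiver-class set for a call site has already been computed by some analysis. The PARENT and DECLARES tables encode the hierarchy from the slide; the function names are hypothetical and the sketch ignores interfaces, abstract methods and dynamic class loading.

```python
# Hierarchy from the slide: B and C extend A, and each class declares m().
PARENT = {"A": None, "B": "A", "C": "A"}
DECLARES = {"A": {"m"}, "B": {"m"}, "C": {"m"}}


def dispatch(cls, method):
    """Mimic dynamic dispatch: nearest class up the hierarchy declaring method."""
    while cls is not None:
        if method in DECLARES[cls]:
            return f"{cls}.{method}"
        cls = PARENT[cls]
    raise LookupError(method)


def devirtualize(receiver_classes, method):
    """Return the single direct-call target, or None if the call stays virtual."""
    targets = {dispatch(cls, method) for cls in receiver_classes}
    return targets.pop() if len(targets) == 1 else None
```

With the precise result from the slide (only B objects flow to a), devirtualize({"B"}, "m") yields the direct target B.m; with the imprecise set {"A", "B", "C"} there are three possible targets and the call must stay virtual.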

67
Uses of Static Program Analysis
  • Software engineering tools
  • Static debugging, verification, security
  • Uncover difficult errors and security flaws
  • Testing
  • Evaluate and improve test suites
  • Software understanding
  • Calling structure
  • Complex dependences
  • Change impacts
  • Many (unexplored) areas of application

68
Static Debugging
  • Analyze the program and look for bugs
  • Memory and pointer bugs: memory leaks, null
    pointer dereferences, double frees, buffer
    overflows, etc.
  • Concurrency bugs: races, deadlocks
  • Issue warnings
  • Microsoft
  • PREfix and PREfast tools in use since 2000
  • Many new tools developed
  • IBM
  • Tools for static debugging of production J2EE
  • Tools for security auditing of J2EE

69
Software Testing
  • Coverage-based testing
  • Improve test quality with good coverage
  • E.g., cover all possible receiver classes at
    virtual calls
  • Step 1: analyze the tested code
  • What are all possible receiver classes at virtual
    calls?
  • a.m(): Analysis: only B objects ever flow to a
  • Step 2: insert instrumentation
  • Step 3: run tests and report coverage
  • What were the receiver classes actually observed
    while running the tests?
  • Compare the possible receiver classes (Step 1)
    with the observed ones (Step 3)
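
The comparison step can be sketched as follows in Python. The inputs are hypothetical: POSSIBLE stands for the output of the static analysis (Step 1) and OBSERVED for what the instrumentation recorded (Step 3); the call site x.n() and its classes are made up for illustration, and the function name is an assumption.

```python
def receiver_coverage(possible, observed):
    """Per call site: (fraction of possible receiver classes seen, classes missed)."""
    report = {}
    for site, classes in possible.items():
        seen = observed.get(site, set()) & classes
        missed = classes - seen
        ratio = len(seen) / len(classes) if classes else 1.0
        report[site] = (ratio, missed)
    return report


# Hypothetical inputs: static analysis result vs. classes seen during tests.
POSSIBLE = {"a.m()": {"B"}, "x.n()": {"A", "C"}}
OBSERVED = {"a.m()": {"B"}, "x.n()": {"A"}}
REPORT = receiver_coverage(POSSIBLE, OBSERVED)
```

A more precise static analysis shrinks the "possible" sets, so fewer infeasible receiver classes are reported as uncovered.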
70
Software Understanding
  • Navigate through calling structure
  • Reason about (im)mutability
  • Powerful, central to imperative programming
  • Many real bugs are due to unintended mutability
  • Q1: is a method A.m() side-effect free?
  • Q2: can a private field in a class A be mutated
    by untrusted clients of A (i.e., classes that use
    A)?
  • Reason about other quality attributes
  • Find code related to a change, etc.
  • Reverse engineering

[Figure: calling structure with nodes X.n() and B.m()]
71
Program Representations
  • if (x < y) then z = 1 else z = 2
  • Control Flow Graph
  • Linear
  • 3-address statements
  • Flow of control
  • Syntax Tree
  • Tree
  • Parse tree of the program