An Introduction to Proof-Carrying Code, Peter Lee, Carnegie Mellon University

1
An Introduction to Proof-Carrying Code
Peter Lee
Carnegie Mellon University
  • Lecture 1
  • October 29, 2001

ConCert Meeting
2
Plan
  • Today: Show and tell.
  • Cartoons
  • Some history
  • Special J compiler
  • Demo
  • Next time: Technical details.
  • LFi and Oracle-based checking
  • Safety policies
  • Compiler strategy and annotations
  • Engineering considerations
  • Ideas for ConCert-related projects

3
(No Transcript)
4
Ariane 5
  • On June 4, 1996, the Ariane 5 took off on its
    maiden flight.
  • 40 seconds into its flight it veered off course
    and exploded.
  • The cause was later found to be an error in reuse
    of a software component.
  • For the next two years, virtually every research
    presentation used this picture.

5
Better, Faster, Cheaper
  • In 1999, NASA lost both the Mars Polar Lander and
    the Climate Orbiter.
  • Later investigations determined software errors
    were to blame.
  • Orbiter: Component reuse error.
  • Lander: Precondition violation.

6
USS Yorktown
After a crew member mistakenly entered a zero
into the data field of an application, the
computer system proceeded to divide another
quantity by that zero. The operation caused a
buffer overflow, in which data leaked from a
temporary storage space in memory, and the error
eventually brought down the ship's propulsion
system. The result: the Yorktown was dead in the
water for more than two hours.
7
Programmable mobile devices
By 2003, one in five people will own a mobile
communications device. Nokia expects to sell 500M
Java-enabled phones in 2003. Most of these
devices will be power and memory limited.
8
Security Attacks
  • According to CERT, the majority of security
    attacks exploit:
  • input validation failure
  • buffer overflow
  • VBS

http://www.cert.org/summaries/CS-2000-04.html
9
BSOD embarrassments
10
Observations
  • Failures often due to simple problems in the
    details.
  • Reuse is critical but perilous.
  • Performance still matters a lot.

11
Safety Engineering
  • Small theorems about large programs would be
    useful.
  • Need clearly specified interfaces and checking of
    interface compliance.
  • Must not sacrifice performance.

12
The Code Safety Problem
Please install and execute this.
13
Code Safety
Code
Trusted Host
14
Approach 4: Formal Verification
Code
But really really really hard and must be correct.
Trusted Host
15
A Key Idea: Explicit Proofs
Code
Certifying Prover
Proof Checker
Proof
Trusted Host
16
A Key Idea: Explicit Proofs
Code
Certifying Prover
Proof
Proof Checker
17
Proof-Carrying Code [Necula & Lee, OSDI '96]
A
rlrrllrrllrlrlrllrlrrllrrll
B
18
Proof-Carrying Code
Code
Certifying Prover
Proof
Proof Checker
19
Automation via Certifying Compilation
Certifying Compiler
Certifying Prover
Proof Checker
20
The Role of Programming Languages
  • Civilized programming languages can provide
    safety for free.
  • Well-formed/well-typed ⇒ safe.
  • Idea: Arrange for the compiler to explain why
    the target code it generates preserves the safety
    properties of the source program.

21
The Role of Java in this Short Course
  • In recent years, Java has been the main focus of
    my work.
  • Java is just barely a civilized programming
    language.
  • We routinely do better than this.

22
Java
  • Java is probably a worthwhile subject of
    research.
  • However, it contains many outrageous and mostly
    inexcusable design errors.
  • As researchers, we should not forget that we have
    already done much better, and must continue to do
    better in the future.

23
Note
  • Our current approach seems to work for many
    problems.
  • But it is the only one we have tried; there are
    many others.
  • PCC is a general concept and we have just barely
    scratched the surface.

24
Overview of Our Approach
OK, but let me quickly look over the instructions
first.
Please install and execute this.
Code producer
Host
25
Overview of Our Approach
Code producer
Host
26
Overview of Our Approach
This store instruction is dangerous!
Code producer
Host
27
Overview of Our Approach
Can you prove that it is always safe?
Code producer
Host
28
Overview of Our Approach
Yes! Here's the proof I got from my certifying
Java compiler!
Can you prove that it is always safe?
?
Code producer
Host
29
Overview of Our Approach
Your proof checks out. I believe you because I
believe in logic.
?
Code producer
Host
30
Some History
31
History: early 90s
  • Fox project starts building the FoxNet
  • Need to control memory layout of data
  • Words, bytes, etc. (endianness? alignment?)
  • Boxed vs unboxed data (efficiency? control?)
  • Packet headers (how to write packet filters?)
  • ML not expressive enough, and compiler technology
    is inadequate
  • Harper invents intensional polymorphism, typed
    intermediate languages, and type-directed
    compiling
  • Biagioni, et al., extend SML design

32
History: mid 90s
  • Question: Can these ideas be used in a
    production-quality compiler for a big language
    like ML?
  • Morrisett and Tarditi build TIL
  • General hints on IL design
  • Encouraging signs that optimizations are OK
  • Stone and Harper design the MIL
  • Lots of work, world-wide, on type-directed
    compiling
  • Work begins on TILT

33
History: mid 90s
  • An easy observation in 1995
  • Types in TIL are not carried all the way down to
    the final target code
  • The idea of enclosing LF encodings of proofs with
    code is floating around
  • Lee and Necula work on this, but get nowhere
  • Many problems, such as optimizations
  • Necula goes to DEC SRC to intern with Detlefs and
    Nelson
  • Works on extending ESC to catch memory leaks in
    Modula-3 programs
  • The next Fall, takes Frank's Constructive Logic
    course

34
History: 1996
  • Necula and Lee write several standard BPF packet
    filters in hand-optimized Alpha assembly code.
  • Simple operational semantics for a core safe
    Alpha
  • Checks safety conditions for each instruction
    execution
  • Proof system for real Alpha
  • Encoded in LF
  • Proofs generated and checked using Elf
  • Results in self-certified code, later
    proof-carrying code
  • Plus proof representations, certifying
    compilation, safety policies (incl. resource
    bounds)
  • Inspires significant follow-on and new work at
    Cornell, Princeton, INRIA, and many other places

35
History: 1999
  • CMU releases PCC to Cedilla Systems Incorporated.
  • Patent 6,128,774. Oct. 2000, "Safe to execute
    verification of software" (Necula and Lee)
  • Patent 6,253,370. June 2001, "Method and
    apparatus for annotating a computer program to
    facilitate subsequent processing of the program"
    (Abadi, Ghemawat, and Stata)
  • In less than 26 months, a complete optimizing
    ahead-of-time PCC compiler for Java.

36
Applets, Not Craplets
37
History: Today
  • Strong similarities in TILT, PCC, TAL,
  • Compiler design is changing
  • Some day, all compilers will be certifying

38
History: Today
  • Are proofs really necessary?
  • Probably not
  • And they are messy, compared to types
  • But as a verification mechanism, proofchecking
    seems to have some possibly significant
    engineering advantages over typechecking

39
The primary contribution
  • Proof engineering.
  • PCC more clearly defined the proof-engineering
    problem
  • How to do checking
  • with minimal overhead and restriction on
    programs,
  • with minimal time and space overhead in checking,
  • with minimal size and complexity of the checker,
  • and with minimal need for changes when the proof
    system changes

40
K Virtual Machine
  • Designed to support the CLDC.
  • Must fit into <128KB.
  • Must have fast bytecode verification.
  • kJava class files must be Java-compatible.
  • Divides bytecode verification into two stages.

41
kJava and KVM
kJava Compiler
kJava Preverifier
Verifier
42
KVM Verification
  • Preverification is performed by the code
    producer.
  • Uses global (iterative) analysis to compute the
    types of stack slots and local vars at every join
    point.
  • Second stage is performed by class loader.
  • Simple linear scan verifies correctness of
    join-point annotations.
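
The second stage can be pictured with a small sketch. This is a toy model, not the KVM verifier: the instruction names (`copy`, `set`, `goto`, `ifne`, `return`), the `SUBTYPES` table, and the representation of annotations as tuples of local-variable types are all assumptions made for illustration. It shows how join-point annotations let a single forward scan replace the iterative analysis.

```python
# Assumption: a two-element type hierarchy, Long <: Number.
SUBTYPES = {("Long", "Long"), ("Number", "Number"), ("Long", "Number")}

def assignable(state, target):
    """True if every local's type in `state` is a subtype of `target`'s."""
    return len(state) == len(target) and all(
        (s, t) in SUBTYPES for s, t in zip(state, target))

def verify(code, maps, entry):
    """Single linear scan. `maps` gives the locals' types at join points."""
    cur = entry                      # None means "not reachable by fall-through"
    for pc, (op, arg) in enumerate(code):
        if pc in maps:               # join point: check, then trust the annotation
            if cur is not None and not assignable(cur, maps[pc]):
                return False
            cur = maps[pc]
        if cur is None:
            return False             # unannotated code after a goto/return
        if op == "copy":             # locals[dst] = locals[src]
            dst, src = arg
            cur = cur[:dst] + (cur[src],) + cur[dst + 1:]
        elif op == "set":            # locals[idx] = a value of declared type
            idx, ty = arg
            cur = cur[:idx] + (ty,) + cur[idx + 1:]
        elif op in ("goto", "ifne"): # branch targets must be annotated
            if arg not in maps or not assignable(cur, maps[arg]):
                return False
            if op == "goto":
                cur = None
        elif op == "return":
            cur = None
        else:
            return False
    return True

# A loop shaped like "Number y = x; while (...) y = nextValue(y);"
code = [("copy", (1, 0)), ("goto", 3), ("set", (1, "Number")),
        ("ifne", 2), ("return", None)]
good_maps = {2: ("Long", "Number"), 3: ("Long", "Number")}
bad_maps = {2: ("Long", "Long"), 3: ("Long", "Long")}
ok = verify(code, good_maps, ("Long", "unset"))
rejected = not verify(code, bad_maps, ("Long", "unset"))
```

With the correct annotations the scan succeeds; with annotations that claim `y` is always a `Long`, the fall-through into the loop test is rejected.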

43
KVM Example [from Frank Yellin]
static void test(Long x) {
    Number y = x;
    while (y.intValue() != 0) {
        y = nextValue(y);
    }
    return;
}

 0. aload_0
 1. astore_1
 2. goto 10
      Long Number <>
 5. aload_1
 6. invokestatic nextValue(Number)
 9. astore_1
      Long Number <>
10. aload_1
11. invokevirtual intValue()
14. ifne 5
17. return
44
KVM Verification
  • The second stage verifier is a 10KB program that
    requires
  • a single scan of the code, and
  • <100 bytes of run-time storage.
  • Impressive!
  • This is Java verification done right.

45
Join-Point Annotations
  • All of these approaches to certified code make
    use of join-point typing annotations to reduce
    code verification to a simple problem.
  • They are essentially the classical loop
    invariants of the Dijkstra/ Hoare program
    verification approach.

46
Overheads
  • In TAL and PCC we observe relatively large
    annotation sizes (10-20%), sometimes much more.
  • Unknown for kJava.
  • Research question:
  • Can we reduce this size?
  • Checking speed and storage space are also a
    problem.

47
The Special J Compiler
48
High-Level Architecture
Code
Verification condition generator
Checker
Explanation
Agent
Safety policy
Host
50
The VCGen
  • The verification condition generator (VCGen)
    examines each instruction.
  • It is a symbolic evaluator that essentially
    implements the operational semantics of a safe
    version of the machine language.
  • It checks some simple properties directly.
  • E.g., direct jumps go to legal addrs.
  • Informally, it invokes the Checker when
    dangerous instructions are encountered.

51
The VCGen, cont'd
  • Examples of dangerous instructions
  • memory operations
  • procedure calls
  • procedure returns
  • For each such instruction, VCGen creates a
    verification condition (VC).

52
High-Level Architecture
Code
Verification condition generator
Checker
Explanation
Agent
Safety policy
Host
53
The Checker
  • When given a VC, the Checker attempts to
    determine its validity.
  • Sometimes, it consults the explanation for help
    with this.
  • If successful, it allows VCGen to proceed.
  • The set of allowable VCs and their valid proofs
    is defined by the safety policy.

54
High-Level Architecture
Code
Verification condition generator
Checker
Explanation
Agent
Safety policy
Host
55
The Safety Policy
  • The safety policy is defined by an inference
    system that defines
  • the language of predicates (for VCs)
  • the axioms and inference rules for writing valid
    proofs of VCs.
  • specifications (pre/post-conditions) for each
    required entry point in the code.

56
Operational Semantics
  • The VCGen is derived (by hand) directly from the
    operational semantics of a safe machine.
  • The calls to the checker establish that the code
    always makes progress (or halts normally) in the
    operational semantics.
  • This leads to a standard notion of soundness.

57
What Can't Be Enforced?
  • Liveness properties currently cannot be enforced
    by this architecture.
  • In practice, however, safety properties are often
    good enough.

58
Architecture
Ginseng
Special J
Code producer
Host
59
Architecture
Java binary
Native code
Special J
VCGen
Annotations
VC
Axioms
Proof checker
Proof
Code producer
Host
60
Architecture
Java binary
Native code
Certifying compiler
VCGen
Annotations
VC
VCGen
Axioms
VC
Axioms
Proof generator
Proof checker
Proof
Code producer
Host
61
Java Virtual Machine
Java Verifier
Checker
Proof-carrying code
JVM
JNI
62
Show either the Mandelbrot or NBody3D demo.
63
Crypto Test Suite Results [Cedilla Systems]
(Chart; axis labeled "sec".)
On average, 72.8% faster than Java, 37.5% faster
than Java with a JIT.
64
Java Grande Suite v2.0 [Cedilla Systems]
(Chart; axis labeled "sec".)
65
Java Grande Bench Suite [Cedilla Systems]
(Chart; axis labeled "ops".)
66
Ginseng
  • VCGen: 15KB, roughly similar to a KVM verifier
    (but with floating-point).
  • Checker: 4KB, generic.
  • Safety Policy: 19KB, declarative and
    machine-generated.
  • Dynamic loading, cross-platform support: 22KB,
    some optional.
67
Example Source Code
public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}

68
Example Target Code
ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl 0, 4(esp)
    je L6
    movl 4(esp), ebx
    movl 4(ebx), ecx
    testl ecx, ecx
    jg L22
    ret
L22:
    xorl edx, edx
    cmpl 0, 8(esp)
    je L6
    movl 8(esp), eax
    movl 4(eax), esi
L7:
    ANN_LOOP(INV (csubneq ebx 0), (csubneq eax 0),
             (csubb edx ecx), (of rm mem),
             MODREG (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl esi, edx
    jae L13
    movl 8(ebx, edx, 4), edi
    movl edi, 8(eax, edx, 4)
    incl edx
    cmpl ecx, edx
    jl L7
    ret
L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop
L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop
69
Cut Points
  • Each loop entry must be annotated as a cut point.
  • VCGen requires this so that checking can be
    performed in a single scan of the code.
  • As a convenience, the modified registers are also
    declared in the cut annotations.

72
A Note about Memory
  • We define a type for valid heap memory states:
  • mem : exp
  • and operators for reading and writing heap
    memory:
  • (sel M A)
  • (upd M A E)
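
A minimal sketch of the two operators, assuming a heap state is an immutable map from addresses to values. The names `sel` and `upd` come from the slide; the dictionary representation and the `EMPTY` constant are illustrative assumptions.

```python
from types import MappingProxyType

# Assumption: a memory state is a read-only address -> value mapping.
EMPTY = MappingProxyType({})

def upd(m, a, e):
    """Return a new memory state equal to m except that address a holds e."""
    new = dict(m)
    new[a] = e
    return MappingProxyType(new)

def sel(m, a):
    """Return the contents of address a in memory state m."""
    return m[a]

m1 = upd(EMPTY, 100, 7)
m2 = upd(m1, 104, 9)
```

Because `upd` never mutates its argument, each store yields a fresh state, which mirrors how VCGen names successive memory states (`rm_1`, `rm_2`, and so on).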

73
The VCGen Process (1)
Code:
_bcopy__6arrays5BcopyAIAI:
    cmpl 0, src
    je L6
    movl src, ebx
    movl 4(ebx), ecx
    testl ecx, ecx
    jg L22
    ret
L22:
    xorl edx, edx
    cmpl 0, dst
    je L6
    movl dst, eax
    movl 4(eax), esi
L7: ANN_LOOP(INV ...

Assumptions and register state:
A0: (type src_1 (jarray jint))
A1: (type dst_1 (jarray jint))
A2: (type rm_1 mem)
A3: (csubneq src_1 0)
ebx = src_1
ecx = (sel4 rm_1 (add src_1 4))
A4: (csubgt (sel4 rm_1 (add src_1 4)) 0)
edx = 0
A5: (csubneq dst_1 0)
eax = dst_1
esi = (sel4 rm_1 (add dst_1 4))
74
The VCGen Process (2)
Code:
L7: ANN_LOOP(INV (csubneq ebx 0), (csubneq eax 0),
             (csubb edx ecx), (of rm mem),
             MODREG (EDI, EDX, EFLAGS, FFLAGS, RM))
    cmpl esi, edx
    jae L13
    movl 8(ebx,edx,4), edi
    movl edi, 8(eax,edx,4)

Assumptions and register state:
A3
A5
A6: (csubb 0 (sel4 rm_1 (add src_1 4)))
edi = edi_1
edx = edx_1
rm = rm_2
A7: (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
!!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))
75
The Checker (1)
The checker is asked to verify that
    (saferd4 (add src_1 (add (imul edx_1 4) 8)))
under assumptions
    A0: (type src_1 (jarray jint))
    A1: (type dst_1 (jarray jint))
    A2: (type rm_1 mem)
    A3: (csubneq src_1 0)
    A4: (csubgt (sel4 rm_1 (add src_1 4)) 0)
    A5: (csubneq dst_1 0)
    A6: (csubb 0 (sel4 rm_1 (add src_1 4)))
    A7: (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
The checker looks in the PCC for a proof of this
VC.
76
The Checker (2)
In addition to the assumptions, the proof may use
axioms and proof rules defined by the host, such
as
szint : pf (size jint 4).
rdArray4 : {M:exp} {A:exp} {T:exp} {OFF:exp}
    pf (type A (jarray T)) ->
    pf (type M mem) ->
    pf (nonnull A) ->
    pf (size T 4) ->
    pf (arridx OFF 4 (sel4 M (add A 4))) ->
    pf (saferd4 (add A OFF)).
77
Checker (3)
A proof for
(saferd4 (add src_1 (add (imul edx_1 4) 8)))
in the Java specification looks like this
(excerpt)
(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4
(below1 A7)))
This proof can be easily validated via LF type
checking.
78
VCGen Summary
  • VCGen is a symbolic evaluator for the object
    language.
  • It essentially implements a reference
    interpreter, except
  • it uses symbolic values in order to model all
    possible executions, and
  • instead of performing run-time checks, it asks a
    Checker to verify the safety of dangerous
    instructions.
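
The summary above can be sketched as a toy symbolic evaluator for straight-line code. Everything here is an illustrative assumption rather than the actual VCGen: the instruction tuples (`mov`, `add`, `load`, `store`), the register name `rm` for the memory state, and the tuple encoding of expressions. The `saferd`/`safewr` predicate shapes follow the safety-policy slides.

```python
def vcgen(code, regs):
    """Symbolically evaluate straight-line code; return the list of VCs."""
    regs = dict(regs)                 # register -> symbolic expression
    vcs = []

    def val(x):
        # Registers are looked up; anything else is a literal constant.
        return regs[x] if x in regs else x

    for ins in code:
        op = ins[0]
        if op == "mov":               # ("mov", dst, src)
            regs[ins[1]] = val(ins[2])
        elif op == "add":             # ("add", dst, a, b)
            regs[ins[1]] = ("add", val(ins[2]), val(ins[3]))
        elif op == "load":            # ("load", dst, addr_reg)
            addr = val(ins[2])
            vcs.append(("saferd", regs["rm"], addr))   # ask the Checker
            regs[ins[1]] = ("sel", regs["rm"], addr)
        elif op == "store":           # ("store", addr_reg, src)
            addr, v = val(ins[1]), val(ins[2])
            vcs.append(("safewr", regs["rm"], addr, v))
            regs["rm"] = ("upd", regs["rm"], addr, v)
        else:
            raise ValueError("unknown instruction %r" % (op,))
    return vcs

code = [("add", "t", "rd", 8), ("load", "x", "t"), ("store", "t", "x")]
vcs = vcgen(code, {"rd": "cd", "rm": "cm"})
```

Where a reference interpreter would test the address at run time, this version records a condition over symbolic values, so one pass covers every concrete execution of the path.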

79
Safety Policies
  • More formally, we begin by defining the
    small-step operational semantics of a machine
    (called the s86).
  • ?, ?, pc ? instr ? ?, pc
  • We define the machine so that only safe
    executions are defined.

program
program counter
register state
80
Safety Policies, cont'd
  • For convenience we choose the s86 to be a
    restriction of the x86.
  • Hence all s86 programs will execute faithfully on
    a real x86.
  • Except that on some programs in which the s86
    does not execute, the x86 might do something
    weird.
  • The goal then is to prove that any given program
    always makes progress (or returns) in the s86.
  • With such a proof, the x86 is then just as good
    as an s86.
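
One way to picture "only safe executions are defined" is a step function that is partial. The sketch below is a toy machine, not the s86: the two instructions, the `SAFE_ADDRS` region, and the state layout are assumptions chosen for illustration. Returning `None` models a state in which the safe machine has no transition.

```python
# Assumption: the host allows reads only in this address range.
SAFE_ADDRS = range(0x1000, 0x2000)

def step(state, instr):
    """One step of the restricted machine; None means no rule applies."""
    regs, mem = state
    op = instr[0]
    if op == "load":                  # ("load", dst, addr_reg)
        addr = regs[instr[2]]
        if addr not in SAFE_ADDRS:
            return None               # stuck: the unsafe read is simply undefined
        return ({**regs, instr[1]: mem.get(addr, 0)}, mem)
    if op == "addi":                  # ("addi", dst, src, constant)
        return ({**regs, instr[1]: regs[instr[2]] + instr[3]}, mem)
    return None

safe = step(({"a": 0x1000}, {0x1000: 5}), ("load", "b", "a"))
stuck = step(({"a": 0}, {}), ("load", "b", "a"))
```

A progress proof for a program rules out the `None` case on every reachable state, which is what lets the unrestricted machine stand in for the restricted one.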

81
Verification Conditions
  • The point of the verification conditions, then,
    is to provide such progress theorems for each
    instruction in the program.
  • In other words, a VC's validity says that the
    corresponding instruction has a defined
    execution in the s86 operational semantics.

82
Symbolic Evaluator
  • We can define the verification condition
    generator (VCGen) via a symbolic evaluator
  • SE?,?,?0,Post(i, ?, L)
  • The result of symbolic evaluation is a
    conjunction of VCs, so the overall progress
    theorem is then
  • Pre ? SE?,?,?0,Post(i, ?, L)

annotations
LF signature
entry point
postcondition
83
Soundness
  • For particular operational semantics (a safe x86
    and a safe Alpha), we have presented theorems
    that say, essentially
  • Thm If Pre ? SE?,?,?0,Post(i, ?, L), then
    execution of ?, given Pre and ?0, and starting
    from entry point i, will always make progress (or
    return).

84
Getting from Concept to Implementation
  • In an actual implementation, it is also handy to
    have a bit more than just a VC generator.
  • Precise syntax for VCs.
  • Pre/post-conditions for each entry point expected
    by the host in any downloaded code.
  • Precisely specified logical system for proving
    the VCs.
  • Verifier for meta-data.

85
Safety Policy Implementations
  • Safety policies are thus given in four parts
  • A verification-condition generator (VCGen).
  • A specification of the pre/post-conditions for
    all required procedures.
  • A specification of the inference rules for
    constructing valid proofs.
  • Plug-ins for performing meta-data verification.
  • LF (Elf syntax) is used for the rule and pre/post
    specifications, C for the VCGen and plug-ins.

86
C?!@@!
  • The use of C to define and implement the VCGen
    is, at best, expedient and at worst dubious.
  • However, since any code-inspection system must
    parse object files (not trivial!) and understand
    the instruction set, this seems to have practical
    benefits.
  • Clearly, a more formal approach would be
    desirable.

87
How Do We Know That It's Right?
88
How Do We Know That It's Right?
  • Although the papers and dissertation follow a
    rigorous development leading to a soundness
    result, in practice it is tempting to hack in new
    things in the LF signature

89
Example: Java Type-Safety Specification
  • Our largest example of a safety-policy
    specification is for the SpecialJ Java
    native-code compiler.
  • It contains about 140 inference rules.
  • Roughly speaking, these rules can be separated
    into 5 classes.

90
Safety Policy: Rule Excerpts
1. Standard syntax and rules for first-order
logic.

Syntax of predicates.
/\  : pred -> pred -> pred.
\/  : pred -> pred -> pred.
=>  : pred -> pred -> pred.
all : (exp -> pred) -> pred.

Type of valid proofs, indexed by predicate.
pf : pred -> type.

Inference rules.
truei : pf true.
andi  : {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q).
andel : {P:pred} {Q:pred} pf (/\ P Q) -> pf P.
ander : {P:pred} {Q:pred} pf (/\ P Q) -> pf Q.
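
Because proofs are indexed by the predicate they prove, checking a proof amounts to computing its type. A minimal sketch for the conjunction fragment follows; the rule names `truei`, `andi`, `andel`, and `ander` are from the slide, while the tuple encoding and the `hyp` constructor for named assumptions are assumptions made for illustration (implicit arguments are inferred rather than passed).

```python
def check(proof, hyps):
    """Return the predicate proved by `proof`, or raise ValueError."""
    tag = proof[0]
    if tag == "truei":
        return "true"
    if tag == "hyp":                       # hypothetical: look up an assumption
        return hyps[proof[1]]
    if tag == "andi":                      # from P and Q conclude (and P Q)
        return ("and", check(proof[1], hyps), check(proof[2], hyps))
    if tag in ("andel", "ander"):          # project one side of a conjunction
        p = check(proof[1], hyps)
        if not (isinstance(p, tuple) and p[0] == "and"):
            raise ValueError("expected a conjunction")
        return p[1] if tag == "andel" else p[2]
    raise ValueError("unknown rule: %r" % (tag,))

hyps = {"h": ("and", "A", "B")}
swap = ("andi", ("ander", ("hyp", "h")), ("andel", ("hyp", "h")))
proved = check(swap, hyps)
```

Here `swap` is a proof of `B /\ A` from the assumption `A /\ B`; a host would compare `proved` against the VC it needs.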
91
Safety Policy: Rule Excerpts
2. Syntax and rules for arithmetic and equality.
"csuble" means ≤ in the x86 machine.
=  : exp -> exp -> pred.
<> : exp -> exp -> pred.
eq_le : {E:exp} {E':exp}
    pf (csubeq E E') -> pf (csuble E E').
moddist : {E:exp} {E':exp} {D:exp}
    pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)).
sym : {E:exp} {E':exp}
    pf (= E E') -> pf (= E' E).
<>sym : {E:exp} {E':exp}
    pf (<> E E') -> pf (<> E' E).
tr : {E:exp} {E':exp} {E'':exp}
    pf (= E E') -> pf (= E' E'') -> pf (= E E'').
92
Safety Policy: Rule Excerpts
3. Syntax and rules for the Java type system.
jint    : exp.
jfloat  : exp.
jarray  : exp -> exp.
jinstof : exp -> exp.
of      : exp -> exp -> pred.
faddf : {E:exp} {E':exp}
    pf (of E jfloat) ->
    pf (of E' jfloat) ->
    pf (of (fadd E E') jfloat).
ext : {E:exp} {C:exp} {D:exp}
    pf (jextends C D) ->
    pf (of E (jinstof C)) ->
    pf (of E (jinstof D)).
93
Safety Policy: Sample Rules
4. Rules describing the layout of data structures.
aidxi : {I:exp} {LEN:exp} {SIZE:exp}
    pf (below I LEN) ->
    pf (arridx (add (imul I SIZE) 8) SIZE LEN).
wrArray4 : {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp}
    pf (of A (jarray T)) ->
    pf (of M mem) ->
    pf (nonnull A) ->
    pf (size T 4) ->
    pf (arridx OFF 4 (sel4 M (add A 4))) ->
    pf (of E T) ->
    pf (safewr4 (add A OFF) E).
This sel4 means the result of reading 4 bytes
from heap M at address A+4.
94
Safety Policy: Sample Rules
5. Quick hacks.
nlt0_0 : pf (csubnlt 0 0).
nlt1_0 : pf (csubnlt 1 0).
nlt2_0 : pf (csubnlt 2 0).
nlt3_0 : pf (csubnlt 3 0).
nlt4_0 : pf (csubnlt 4 0).
Sometimes unclean things are put into the
specification...
95
The Basic Trick
  • Recall the bcopy program

public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}

96
Unoptimized Loop Body
L11:
    movl 4(ebx), eax
    cmpl eax, edx
    jae L24
L17:
    cmpl 0, 12(ebp)
    movl 8(ebx, edx, 4), esi
    je L21
L20:
    movl 12(ebp), edi
    movl 4(edi), eax
    cmpl eax, edx
    jae L24
L23:
    movl esi, 8(edi, edx, 4)
    movl edi, 12(ebp)
    incl edx
L9:
    ANN_INV(ANN_DOM_LOOP,
            LF_(/\ (of rm mem ) (of loc1 (jarray jint) ))_LF,
            RB(EBP,EBX,ECX,ESP,FTOP,LOC4,LOC3))
    cmpl ecx, edx
    jl L11
Bounds check on src.
Bounds check on dst.
Note: L24 raises the ArrayIndex exception.
97
Unoptimized Code is Easy
  • In the absence of optimizations, proving the
    safety of array accesses is relatively easy.
  • Indeed, in this case it is reasonable for VCGen
    to verify the safety of the array accesses.
  • As the optimizer becomes more successful,
    verification gets harder.

98
Role of Loop Invariants
  • It is for this reason that the optimizer's
    knowledge must be conveyed to the theorem prover.
  • Essentially, any facts about program values that
    were used to perform code-motion optimizations
    must be declared in an invariant.

99
Optimized Loop Body
Essential facts about live variables, used by the
compiler to eliminate bounds-checks in the loop
body.
L7:
    ANN_LOOP(INV (csubneq ebx 0), (csubneq eax 0),
             (csubb edx ecx), (of rm mem),
             MODREG (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl esi, edx
    jae L13
    movl 8(ebx, edx, 4), edi
    movl edi, 8(eax, edx, 4)
    incl edx
    cmpl ecx, edx
100
Certifying Compiling and Proving
  • Intuitively, we will arrange for the Prover to be
    at least as powerful as the Compiler's optimizer.
  • Hence, we will expect the Prover to be able to
    reverse engineer the reasoning process that led
    to the given machine code.
  • An informal concept, needing a formal
    understanding! (Type theory is essential here)

101
What is Safety, Anyway?
  • If the compiler fails to optimize away a
    bounds-check, it will insert code to perform the
    check.
  • This means that programs may still abort at
    run-time, albeit with a well-defined exception.
  • Is this safe behavior?

102
Compiler Development
  • The PCC infrastructure catches many (probably
    most) compiler bugs early.
  • Our standard regression test does not execute the
    object code!
  • Principle: Most compiler bugs show up as safety
    violations.

103
Example Bug
L42:
    movl 4(eax), edx
    testl edx, edx
    jle L47
L46:
    ... set up for loop ...
L44:
    ... enter main loop code ...
    jl L44
    jmp L32
L47:
    fldz
    fldz
L32:
    ... return sequence ...
    ret
104
Example Bug
L42:
    movl 4(eax), edx
    testl edx, edx
    jle L47
L46:
    ... set up for loop ...
L44:
    ... enter main loop code ...
    jl L44
    jmp L32
L47:
    fldz
L32:
    ... return sequence ...
    ret
105
Another Example Bug
Suppose bcopy's inner loop is changed
L7:
    ANN_LOOP( ... )
    cmpl esi, edx
    jae L13
    movl 8(ebx, edx, 4), edi
    movl edi, 8(eax, edx, 4)
    incl edx
    cmpl ecx, edx
    jl L7
    ret
106
Another Example Bug
Suppose bcopy's inner loop is changed
L7:
    ANN_LOOP( ... )
    cmpl esi, edx
    jae L13
    movl 8(ebx, edx, 4), edi
    movl edi, 8(eax, edx, 4)
    addl 2, edx
    cmpl ecx, edx
    jl L7
    ret
Again, PCC spots the danger.
107
Yet Another
class Floatexc extends Exception {
    public static int f(int x) throws Floatexc { return x; }
    public static int g(int x) { return x; }
    public static float handleit(int x, int y) {
        float fl = 0;
        try {
            x = f(x);
            fl = 1;
            y = f(y);
        } catch (Floatexc b) {
            fl += fl;
        }
        return fl;
    }
}
108
Yet Another
Install handler:
    pushl _6except8Floatexc_C
    call __Jv_InitClass
    addl 4, esp
Enter try block:
L17:
    movl 0, -4(ebp)
    pushl 8(ebp)
    call _6except8Floatexc_MfI
    addl 4, esp
    movl eax, ecx
A handler:
L22:
    flds -4(ebp)
    fadds -4(ebp)
    jmp L18
109
Another Example [by George Necula]
void fir(int *data, int dlen, int *filter, int flen) {
    int i, j;
    for (i = 0; i <= dlen - flen; i++) {
        int s = 0;
        for (j = 0; j < flen; j++)
            s += filter[j] * data[i+j];
        data[i] = s;
    }
}
110
Compiled Example
/* rd=data, rdl=dlen, rf=filter, rfl=flen */
        ri = 0
        sub t1 = rdl, rfl
L0:     CUT(ri,rj,rs,t2,t3,t4,rm)
        le t2 = ri, t1
        jeq t2, L3
        rs = 0
        rj = 0
L1:     CUT(rj,rs,t2,t3,t4)
        lt t2 = rj, rfl
        jeq t2, L2
        ult t2 = rj, rfl
        jeq t2, Labort
        ld t3 = rf + 4*rj
        add t2 = ri, rj
        ult t4 = t2, rdl
        jeq t4, Labort
        ld t2 = rd + 4*t2
        mul t2 = t3, t2
        add rs = rs, t2
        add rj = rj, 1
        jmp L1
L2:     ult t2 = ri, rdl
        jeq t2, Labort
        st rd + 4*ri = rs
        add ri = ri, 1
        jmp L0
L3:     ret
Labort: call abort
111
The Safety Policy
  • The safety policy defines verification conditions
    of the form
  • true, E = E
  • saferd(M, E), safewr(M, E, E)
  • array(EA, ES, EL), vector(EA, ES, EL)
  • Pre_fir = array(rd,4,rdl),
    vector(rf,4,rfl)
  • Post_fir = true

112
VCGen Example
Code:
        ri = 0
        sub t1 = rdl, rfl
L0:     CUT(ri,rj,rs,t2,t3,t4,rm)
        le t2 = ri, t1
        jeq t2, L3
        ...
L3:     ret

VCGen actions:
Set rd=cd, rdl=cdl, rf=cf, rfl=cfl, rm=cm
Assume precondition: array(cd,4,cdl), vector(cf,4,cfl)
Set ri = 0
Set t1 = sub(cdl,cfl)
Set ri=ci, rj=cj, rs=cs, t2=c2, t3=c3, t4=c4, rm=cm
Set t2 = le(ci, sub(cdl,cfl))
Assume not(le(ci, sub(cdl,cfl)))
Check postcondition; check rd, rdl, rf, rfl have
initial values
113
VCGen Example
Code:
        ri = 0
        sub t1 = rdl, rfl
L0:     CUT(ri,rj,rs,t2,t3,t4,rm)
        le t2 = ri, t1
        jeq t2, L3
        rs = 0
        rj = 0
L1:     CUT(rj,rs,t2,t3,t4)
        lt t2 = rj, rfl
        jeq t2, L2
        ...
L2:     ult t2 = ri, rdl
        jeq t2, Labort
        st rd + 4*ri = rs

VCGen actions:
Set ri = 0
Set t1 = sub(cdl,cfl)
Set ri=ci, rj=cj, rs=cs, t2=c2, t3=c3, t4=c4, rm=cm
Set t2 = le(ci, sub(cdl,cfl))
Assume le(ci, sub(cdl,cfl))
Set rs = 0
Set rj = 0
Set rj=cj, rs=cs, t2=c2, t3=c3, t4=c4
Set t2 = lt(cj, cfl)
Assume not(lt(cj, cfl))
Set t2 = ult(ci, cdl)
Assume ult(ci, cdl)
Check safewr(cm, add(cd,mul(4,ci)), cs)
114
More on the Safety Policy
  • Some of the inference rules in the LF signature

rdarray:  saferd(M,add(A,mul(S,I))) <-
              array(A,S,L), ult(I,L).
rdvector: saferd(M,add(A,mul(S,I))) <-
              vector(A,S,L), ult(I,L).
wrarray:  safewr(M,add(A,mul(S,I)),V) <-
              array(A,S,L), ult(I,L).
115
The Checker
  • When the Checker is invoked on
  • safewr(cm, add(cd,mul(4,ci)), cs)
  • there are assumptions:
  • assume0: ult(ci,cdl).
  • assume1: not(lt(cj,cfl)).
  • assume2: le(ci, sub(cdl,cfl)).
  • assume3: vector(cf,4,cfl).
  • assume4: array(cd,4,cdl).

116
The Checker, cont'd
  • The VC
  • safewr(cm, add(cd,mul(4,ci)), cs)
  • can be verified by using the rule
  • wrarray: safewr(M,add(A,mul(S,I)),V) <-
  • array(A,S,L), ult(I,L).
  • and assumptions
  • assume0: ult(ci,cdl).
  • assume4: array(cd,4,cdl).

117
Proof Representation
  • A simple (but somewhat naïve) representation of
    the proof is simply the sequence of proof rules
  • wrarray, assume4, assume0
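
A rough sketch of how a checker could replay such a sequence: each name either closes the current goal (an assumption) or replaces it with the rule's premises, solved left to right. The rule and assumption names are from the slides; the tuple encoding, the convention that capitalized strings are variables, and the omission of an occurs check and of variable renaming are simplifying assumptions.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b agree, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

RULES = {"wrarray": (("safewr", "M", ("add", "A", ("mul", "S", "I")), "V"),
                     [("array", "A", "S", "L"), ("ult", "I", "L")])}
ASSUMPTIONS = {"assume0": ("ult", "ci", "cdl"),
               "assume4": ("array", "cd", 4, "cdl")}

def check(goal, proof):
    """Replay `proof` (a list of names) against `goal`."""
    goals, s = [goal], {}
    for name in proof:
        if not goals:
            return False
        g = goals.pop(0)
        if name in ASSUMPTIONS:
            head, body = ASSUMPTIONS[name], []
        elif name in RULES:
            head, body = RULES[name]   # each rule is used once, so no renaming
        else:
            return False
        s = unify(g, head, s)
        if s is None:
            return False
        goals = body + goals
    return not goals

vc = ("safewr", "cm", ("add", "cd", ("mul", 4, "ci")), "cs")
accepted = check(vc, ["wrarray", "assume4", "assume0"])
```

The order of the names matters: `assume4` fixes `L` to `cdl`, after which `assume0` matches the remaining `ult` premise.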

118
Optimized Code
  • The previous example was somewhat simplified.
  • More realistic code is optimized, usually based
    on inferences about integer values.
  • Such optimizations require that arithmetic
    invariants be placed in the cut points.

119
Optimized Example
/* rd=data, rdl=dlen, rf=filter, rfl=flen */
        ri = 0
        sub t1 = rdl, rfl
L0:     CUT(ri>0, ri,rj,...)
        le t2 = ri, t1
        jeq t2, L3
        rs = 0
        rj = 0
L1:     CUT(rj>0, rj,rs,...)
        lt t2 = rj, rfl
        jeq t2, L2
        ld t3 = rf + 4*rj
        add t2 = ri, rj
        ld t2 = rd + 4*t2
        mul t2 = t3, t2
        add rs = rs, t2
        add rj = rj, 1
        jmp L1
L2:     st rd + 4*ri = rs
        add ri = ri, 1
        jmp L0
L3:     ret
120
VCGen Example
Code:
        ri = 0
        sub t1 = rdl, rfl
L0:     CUT(ri>0, ri,rj,rs,t2,t3,t4,rm)
        le t2 = ri, t1
        jeq t2, L3
        rs = 0
        rj = 0

VCGen actions:
Set ri = 0
Set t1 = sub(cdl,cfl)
Set ri=ci, rj=cj, rs=cs, t2=c2, t3=c3, t4=c4, rm=cm
Assume gt(ci,0)
Set t2 = le(ci, sub(cdl,cfl))
Assume le(ci, sub(cdl,cfl))
121
Practical Considerations
122
Trusted Computing Base
  • The trusted computing base is the software
    infrastructure that is responsible for ensuring
    that only safe execution is possible.
  • Obviously, any bugs in the TCB can lead to unsafe
    execution.
  • Thus, we want the TCB to be simple, as well as
    fast and small.

123
VCGen's Complexity
  • Fortunately, proofs can be quite small, and
    proofchecking can be quite simple, small, and
    fast.
  • VCGen, at core, is also simple and fast.
  • But in practice it gets to be quite complicated.

124
VCGen's Complexity
  • Some complications
  • If dealing with machine code, then VCGen must
    parse machine code.
  • Maintaining the assumptions and current context
    in a memory-efficient manner is not easy.
  • Note that Sun's KVM does verification in a single
    pass and only 8KB RAM!

125
VC Explosion
a=b => (x=c => safef(y,c) /\ x<>c => safef(x,y))
/\ a<>b => (a=x => safef(y,x) /\ a<>x => safef(a,y))
Exponential growth in size of the VC is
possible. And it actually happens in practice!
Precondition: safef(i,j)
126
VC Explosion
(Flowchart: test a = b, then assign x to a or x to c; join point annotated INV P(a,b,c,x); test a = c, then assign y to a or y to c; call f(a,c).)
(a=b => P(x,b,c,x) /\ a<>b => P(a,b,x,x))
/\ (forall a,c. P(a,b,c,x) => (a=c => safef(y,c)
                               /\ a<>c => safef(a,y)))
Growth can usually be controlled by careful
placement of just the right join-point
invariants.
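
The growth can be made concrete with a small size calculation. This sketch only counts formula nodes: it reuses the same postcondition in both branches instead of performing the substitutions, and the predicate names and the per-diamond invariant `P` are placeholders chosen for illustration.

```python
def size(f):
    """Number of nodes in a formula encoded as nested tuples."""
    return 1 + sum(size(x) for x in f[1:]) if isinstance(f, tuple) else 1

def wp_diamond(cond, post_then, post_else):
    # wp(if c then S1 else S2, Q) = (c => wp(S1,Q)) /\ (not c => wp(S2,Q))
    return ("and", ("imp", cond, post_then),
                   ("imp", ("not", cond), post_else))

def vc_without_invariants(n):
    """n conditionals in sequence: each one copies everything after it."""
    q = ("safef", "a", "c")
    for k in range(n):
        q = wp_diamond(("eq", "a", "b%d" % k), q, q)
    return q

def vc_with_invariants(n):
    """Assumption: a fixed-size invariant P is declared at every join point,
    so each conditional is checked on its own against P."""
    p = ("P", "a", "b", "c")
    return [wp_diamond(("eq", "a", "b%d" % k), p, p) for k in range(n)]

sizes_plain = [size(vc_without_invariants(n)) for n in range(4)]
size_cut = sum(size(v) for v in vc_with_invariants(3))
```

Without join-point invariants the size roughly doubles per conditional; with them it grows linearly in the number of conditionals.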
127
Stack Slots
  • Each procedure will want to use the stack for
    local storage.
  • This raises a serious problem because a lot of
    information is lost by VCGen (such as the value)
    when data is stored into memory.

128
Stack Slots
  • We avoid this problem by assuming that procedures
    use up to 256 words of stack as registers.
  • Main restriction
  • No indirect addressing of stack slots.

129
Callee-save Registers
  • Standard calling conventions dictate that the
    contents of some registers be preserved.
  • These callee-save registers are specified along
    with the pre/post-conditions for each procedure.
  • The preservation of their values must be verified
    at every return instruction.
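
A small sketch of the check at a return instruction, assuming registers hold symbolic values as in VCGen. The helper names and the rule that a VC is discharged only when both sides are syntactically identical are illustrative assumptions.

```python
def callee_save_vcs(entry, at_return, preserved):
    """Build one equality VC per callee-save register."""
    return [("eq", at_return[r], entry[r]) for r in preserved]

def trivially_valid(vc):
    # Assumption: only syntactic identity is accepted here; a real
    # checker would consult the proof for anything harder.
    return vc[0] == "eq" and vc[1] == vc[2]

entry = {"esp": "esp_0", "ebp": "ebp_0", "eax": "eax_0"}
good = {"esp": "esp_0", "ebp": "ebp_0", "eax": ("add", "eax_0", 1)}
bad = {"esp": "esp_0", "ebp": ("add", "ebp_0", 4), "eax": "eax_0"}
preserved = ["esp", "ebp"]      # e.g. taken from an RB(...) declaration

good_ok = all(trivially_valid(v) for v in callee_save_vcs(entry, good, preserved))
bad_ok = all(trivially_valid(v) for v in callee_save_vcs(entry, bad, preserved))
```

Registers outside the preserved set, such as `eax` above, may change freely without producing a VC.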

130
Function specifications
ANN_FUNCTION(__Jv_instanceof, LF_(/\ (of loc3
(jinstof _4java4lang6Object_C)) (/\ (of
(loc2 jint) (/\ (jelemtype loc1)
(of rm mem))))_LF, LF_(/\ (of eax jbool)
(of rm mem))_LF, RB(ESP,EBP,FTOP), 3,4)
131
Annotations used by Special J
  • ANN_CLASS
  • ANN_FUNCTION
  • ANN_LOCALS
  • ANN_INV
  • ANN_DOM_LOOP
  • ANN_DOMINATOR
  • ANN_SYMBOLADDR
  • ANN_CALLJAVAVIRTUAL
  • ANN_CALLJAVAINTERFACE
  • ANN_JUMPTHROUGHTABLE
  • ANN_INSTALLEDJAVAHANDLER
  • ANN_UNINSTALLEDJAVAHANDLER
  • ANN_UNREACHABLE

132
ANN_CLASS and ANN_FUNCTION
  • Normally, ANN_FUNCTION is not used. Instead,
    ANN_CLASS declares that an object file implements
    a Java class.

public final class Factor1
ANN_CLASS(_7Factor1_vt)
133
ANN_LOCALS
  • As a convenience for VCGen, the number of stack
    slots is declared for each method.

public static void combineTags(Node n, int i)

ANN_LOCALS(__7Factor1_McombineTagsL4NodeXI, 8)
.text
.align 4
.globl __7Factor1_McombineTagsL4NodeXI
__7Factor1_McombineTagsL4NodeXI:
134
ANN_INV / ANN_DOM_LOOP
  • Loop invariants.

ANN_INV(ANN_DOM_LOOP,
  LF_(/\ (nonnull loc2)
      (/\ (of rm mem)
          (of eax (jinstof _4java4util12ListIterator_vt))))_LF,
  RB(EBP,ESP,FTOP,LOC4,LOC3,LOC2))
135
ANN_DOMINATOR
  • Dominating join points are marked.

ANN_DOMINATOR .L536_dom
jle .L237
.L237:
ANN_INV(.L536_dom,
  LF_(/\ (nonnull loc3)
      (/\ (of rm mem)
          (of loc3 (jinstof _4Node_vt))))_LF,
  RB(EBP,ESP,FTOP,LOC5,LOC4))
136
Invariants
  • Special J currently emits the following kinds of
    invariants:
  • true, false
  • x = y, x <> y (x, y registers or constants)
  • x < y (signed and unsigned)
  • x : t
  • jint, jbool,
  • Jclassdesc
  • jinstof(C)
  • implSpecIntf(x,y,z)

137
Virtual method invocation
public static void combineTags(Node n, int i) {
  if (i > 0) {
    if (!n.isString()) {
      Iterator iter = n.getSubtrees();
      while (iter.hasNext()) {
        combineTags((Node)(iter.next()), i-1);
      }
    }
  }
}
138
Virtual method invocation, cont'd
  • For the loop body:

pushl 1                                # vmethod
ANN_SYMBOLADDR(0)
pushl _4java4util8Iterator_vt          # class
pushl -4(ebp)                          # object
call __Jv_LookupInterfaceMethod
addl 12, esp
pushl -4(ebp)
ANN_CALLJAVAVIRTUAL(_4java4util8Iterator_vt, 1)   # next method
call eax
addl 4, esp
ANN_SYMBOLADDR(0)
pushl _4Node_vt
pushl 0
pushl eax
call __Jv_checkCast
139
Jump tables
public static final void closeToString(int t) throws IOException {
  if (!isEmpty(t)) {
    switch (getColor(t)) {
      case -1: break;  // no color
      case 0: singleTagString('r', noSecond, false); break;
      case 1: singleTagString('g', noSecond, false); break;
      case 2: singleTagString('b', noSecond, false); break;
      case 3: singleTagString('c', noSecond, false); break;
      case 4: singleTagString('m', noSecond, false); break;
      case 5: singleTagString('y', noSecond, false); break;
      case 6: singleTagString('k', noSecond, false); break;
      case 7: singleTagString('w', noSecond, false); break;
    }
  }
}

140
Jump tables, cont'd
ANN_DOMINATOR .L181_dom
jae .L23
.L33:
ANN_JUMPTHROUGHTABLE(.L32, 9)
ANN_SYMBOLADDR(0)
jmp .L32(, ebx, 4)
.L24:
pushl 0
pushl 0
pushl 119
call __3Tag_MsingleTagStringCCZ
addl 12, esp
jmp .L23
.L25:
.L32:
.long .L23
.long .L31
.long .L30
.long .L29
.long .L28
.long .L27
.long .L26
.long .L25
.long .L24
141
Exception handlers
public Object clone() {
  try {
    return super.clone();
  } catch (CloneNotSupportedException e) {
    return null;
  }
}
142
Exception handlers, cont'd
__7Context_Mclone:
pushl ebp
movl esp, ebp
call __Jv_GetExcHandler
ANN_SYMBOLADDR(0)
pushl .L11
ANN_SYMBOLADDR(0)
pushl _4java4lang26CloneNotSupportedException_vt
pushl ebp
pushl 1
pushl (eax)
ANN_INSTALLJAVAHANDLER(.L11)
movl esp, (eax)
pushl 8(ebp)
ANN_DOMINATOR .L14_dom
call __4java4lang6Object_Mclone
addl 4, esp
.L9:
movl eax, 8(ebp)
call __Jv_GetExcHandler
movl (esp), ebx
ANN_UNINSTALLJAVAHANDLER(1)
.L11:
ANN_INV(.L14_dom, LF_(of rm mem)_LF, RB(EBP,ESP,FTOP,LOC3,LOC2))
nop
.L12:
xorl eax, eax
movl ebp, esp
popl ebp
ret
143
Efficient Representation and Validation of Proofs
144
Goals
  • We would like a representation for proofs that is
  • compact,
  • fast to check,
  • requires very little memory to check,
  • and is canonical, in the sense of accommodating
    many different logics without requiring a
    reimplementation of the checker.

145
Three Approaches
  • 1. Direct representation of a logic.
  • 2. Use of a Logical Framework.
  • 3. Oracle strings.
  • We will reject (1).
  • We consider only (2) and (3).

146
Logical Framework
  • For representation of proofs we use the Edinburgh
    Logical Framework (LF).

147
LFi
Skip?
148
LF Example in Elf Syntax
exp  : type
pred : type
pf   : pred -> type

true : pred
/\   : pred -> pred -> pred
=>   : pred -> pred -> pred
all  : (exp -> pred) -> pred

truei : pf true
andi  : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R)
andel : {P:pred} {R:pred} pf (/\ P R) -> pf P
impi  : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R)
alli  : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P)
alle  : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E)
149
LF as a Proof Representation
  • LF is canonical, in that a single typechecker for
    LF can serve as a proofchecker for many different
    logics specified in LF. See [Avron, et al. '92].
  • But the efficiency of the representation is poor.

150
Size of LF Representation
  • Proofs in LF are extremely large, due to large
    amounts of repetition.
  • Consider the representation of P ⇒ P ∧ P
    for some predicate P.
  • The proof of this predicate has the following LF
    representation:

(=> P (/\ P P))
(impi P (/\ P P) ([X:pf P] andi P P X X))
151
Checking LF
  • The nice thing is that typechecking is enough for
    proofchecking. (The theorem is in the LF paper.)
  • But the proofs are extremely large.

(impi P (/\ P P) ([X:pf P] andi P P X X))
  : pf (=> P (/\ P P))
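To make "typechecking is proofchecking" concrete, here is a minimal sketch of a checker for just the propositional rules truei, andi, andel and impi. It is not an LF typechecker: the tuple encoding of predicates and proofs, the `Hyp` class and the function `infer` are assumptions chosen for illustration, and the bound proof variable of impi is modelled by a Python function.

```python
class ProofError(Exception):
    """Raised when a proof term does not typecheck."""


class Hyp:
    """A hypothesis of type `pf pred`; only the checker creates valid ones."""
    def __init__(self, pred):
        self.pred = pred


TRUE = ("true",)


def infer(proof, active=()):
    """Return the predicate that `proof` proves, or raise ProofError."""
    if isinstance(proof, Hyp):
        # A hypothesis is usable only inside the impi that introduced it.
        if not any(h is proof for h in active):
            raise ProofError("hypothesis used outside its scope")
        return proof.pred
    rule = proof[0]
    if rule == "truei":
        return TRUE
    if rule == "andi":                      # andi P R (pf P) (pf R)
        _, p, r, dp, dr = proof
        if infer(dp, active) != p or infer(dr, active) != r:
            raise ProofError("andi: premises do not prove P and R")
        return ("/\\", p, r)
    if rule == "andel":                     # andel P R (pf (/\ P R))
        _, p, r, d = proof
        if infer(d, active) != ("/\\", p, r):
            raise ProofError("andel: premise does not prove P /\\ R")
        return p
    if rule == "impi":                      # impi P R (pf P -> pf R)
        _, p, r, body = proof
        h = Hyp(p)
        if infer(body(h), active + (h,)) != r:
            raise ProofError("impi: body does not prove R")
        return ("=>", p, r)
    raise ProofError(f"unknown rule {rule!r}")


if __name__ == "__main__":
    P = "P"
    proof = ("impi", P, ("/\\", P, P), lambda x: ("andi", P, P, x, x))
    print(infer(proof))
```

The explicit P and R arguments are what make the LF term large: every rule application repeats the predicates it is applied to.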
152
Implicit LF
  • A dramatic improvement can be achieved by using a
    variant of LF, called Implicit LF, or LFi.
  • In LFi, parts of the proof can be replaced by
    placeholders.

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
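One way to see why the placeholders carry no information is to check the proof against a known goal instead of inferring its type. The sketch below is an assumption-laden toy, not the LFi algorithm of Necula and Lee: proofs omit the P and R arguments entirely, and the hypothetical function `check` recovers them from the placeholder-free goal.

```python
class Hyp:
    """A hypothesis introduced by impi; carries the predicate it proves."""
    def __init__(self, pred):
        self.pred = pred


def check(proof, goal, active=()):
    """True iff `proof`, written without its P and R arguments, proves `goal`."""
    if isinstance(proof, Hyp):
        return any(h is proof for h in active) and proof.pred == goal
    rule = proof[0]
    if rule == "truei":
        return goal == ("true",)
    if rule == "andi" and goal[0] == "/\\":
        _, dp, dr = proof          # P and R are read off the goal
        _, p, r = goal
        return check(dp, p, active) and check(dr, r, active)
    if rule == "impi" and goal[0] == "=>":
        _, body = proof            # P and R are read off the goal
        _, p, r = goal
        h = Hyp(p)
        return check(body(h), r, active + (h,))
    return False


if __name__ == "__main__":
    proof = ("impi", lambda x: ("andi", x, x))
    print(check(proof, ("=>", "P", ("/\\", "P", "P"))))
```

Only rules whose elided arguments occur in the goal are handled here; a rule such as andel, whose R does not appear in its conclusion, would need that argument kept.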
153
Soundness of LFi
  • The soundness of the LFi type system is given by
    a theorem that states
  • If, in context Γ, a term M has type A in LFi (and
    Γ and A are placeholder-free), then there is a
    term M' such that M' has type A in LF.

154
Typechecking LFi
  • The typechecking algorithm for LFi is given in
    [Necula & Lee, LICS'98].
  • A key aspect of the algorithm is that it avoids
    repeated typechecking of reconstructed terms.
  • Hence, the placeholders save not only space, but
    also time.

155
Effectiveness of LFi
  • In experiments with PCC, LFi leads to substantial
    reductions in proof size and checking time.
  • Improvements increase nonlinearly with proof size.

156
The Need for Improvement
  • Despite the great improvement of LFi, in our
    experiments we observe that, in practice, LFi
    proofs are 10-200% of the size of the code.

157
How Big is a Proof?
  • A basic question is how much essential
    information is in a proof?
  • In this proof, there are only 2 uses of rules, and in
    each case the rule used was the only one that could
    have been used.

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
158
Improving the Representation
  • We will now improve on the compactness of proof
    representation by making use of the observation
    that large parts of proofs are deterministically
    generated from the inference rules.

159
Additional References
  • For LF
  • Harper, Honsell, Plotkin. A framework for
    defining logics. Journal of the ACM, 40(1),
    143-184, Jan. 1993.
  • Avron, Honsell, Mason, Pollack. Using typed
    lambda calculus to implement formal systems on a
    machine. Journal of Automated Reasoning, 9(3),
    309-354, 1992.

160
Additional References
  • For Elf
  • Pfenning. Logic programming in the LF logical
    framework. Logical Frameworks, Huet & Plotkin
    (Eds.), 149-181, Cambridge Univ. Press, 1991.
  • Pfenning. Elf: A meta-language for deductive
    systems (system description). 12th International
    Conference on Automated Deduction, LNAI 814,
    811-815, 1994.

161
Oracle-Based Checking
162
Necula's Example: Syntax of Girard's System F
ty  : type
int : ty
arr : ty -> ty -> ty
all : (ty -> ty) -> ty

exp : type
z   : exp
s   : exp -> exp
lam : (exp -> exp) -> exp
app : exp -> exp -> exp

of  : exp -> ty -> type
163
Necula's Example: Typing Rules for System F
tz   : of z int
ts   : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty}
       ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty}
       of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty}
       ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty}
       of E (all T) -> of E (T T1)
164
LF Representation
  • Consider the lambda expression

(λf.(f λx.x) (f 0)) λy.y

  • It is represented in LF as follows

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0))
    (lam [Y:exp] Y)
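The LF encoding uses higher-order abstract syntax: object-level binders become functions of the framework. A small sketch of the same idea, using Python functions for binders, is given here. The constructors `lam`, `app`, `Z` and the printer `show` are illustrative assumptions; the numeral is written z as in the signature, and generated variable names X0, X1 stand in for F, X, Y.

```python
def lam(body):
    """lam : (exp -> exp) -> exp, with the binder as a Python function."""
    return ("lam", body)


def app(fun, arg):
    """app : exp -> exp -> exp."""
    return ("app", fun, arg)


Z = ("z",)  # z : exp


def show(term, depth=0):
    """Print a term in Elf-like syntax, naming bound variables by depth."""
    if isinstance(term, str):          # a variable name supplied by `show`
        return term
    tag = term[0]
    if tag == "z":
        return "z"
    if tag == "lam":
        var = f"X{depth}"
        return f"(lam [{var}:exp] {show(term[1](var), depth + 1)})"
    if tag == "app":
        return f"(app {show(term[1], depth)} {show(term[2], depth)})"
    raise ValueError(f"not a term: {term!r}")


# (λf.(f λx.x) (f 0)) λy.y
TERM = app(lam(lambda f: app(app(f, lam(lambda x: x)), app(f, Z))),
           lam(lambda y: y))

if __name__ == "__main__":
    print(show(TERM))
```

Substitution and variable renaming come for free from function application, which is why the encoding needs no separate notion of bound-variable names.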
165
Necula's Example
  • Now suppose that this term is an applet, with the
    safety policy that all applets must be well-typed
    in System F.
  • One way to make a PCC is to attach a typing
    derivation to the term.

166
Typing Derivation in LF
(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
              (tapp (app F (lam [X:exp] X)) (app F 0) int int
                    (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                          (tins F ([T:ty] arr T T) (arr int int) FT)
                          (tlam int int ([X:exp] X)
                                ([X:exp][XT:of X int] XT)))
                    (tapp F 0 int int
                          (tins F ([T:ty] arr T T) int FT)
                          t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y)
                          ([Y:exp][YT:of Y T] YT)))))
167
Typing Derivation in LFi
(tapp * * (all ([T:*] arr T T)) int
      (tlam * * *
            ([F:*][FT:of F (all ([T:ty] arr T T))]
              (tapp * * * int
                    (tapp * * (arr int int) (arr int int)
                          (tins * * * FT)
                          (tlam * * * ([X:*][XT:*] XT)))
                    (tapp * * int int
                          (tins * * * FT)
                          t0))))
      (tgen * * ([T:*] (tlam * * * ([Y:*][YT:*] YT)))))
I think. I did this by hand!
168
LF Representation
  • Using 16 bits per token, the LF representation of
    the typing derivation requires over 2,200 bits.
  • The LFi representation requires about 700 bits.
  • (The term itself requires only about 360 bits.)

Skip ahead
169
A Bit More about LFi
  • To convert an LF term into an LFi term, a
    representation algorithm is used [Necula & Lee,
    LICS'98].
  • Intuition: When typechecking a term
  • c M1 M2 ... Mn : A (in a context Γ)
  • we know, if A has no placeholders, that some of
    the M1 ... Mn may appear in A.

170
A Bit More about LFi, cont'd
  • For example, when the rule

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty}
       of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

  • is applied at top level, the first two arguments
    are present in the term

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0))
    (lam [Y:exp] Y)

  • and thus can be elided.
171
A Bit More about LFi, cont'd
  • A similar trick works at lower levels by relying
    on the fact that typing constraints are solved in
    a certain order (e.g., right-to-left).
  • See the paper for complete details.

172
Can We Do Better?
tz   : of z int
ts   : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty}
       ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty}
       of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty}
       ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty}
       of E (all T) -> of E (T T1)
173
Determinism
  • Looking carefully at the typing rules, we
    observe:
  • For any typing goal where the term is known but
    the type is not,
  • there are 3 possibilities: tgen, tins, other.
  • If the type structure is known, there are only 2
    choices: tapp or other.

174
How Much Essential Information?
(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
              (tapp (app F (lam [X:exp] X)) (app F 0) int int
                    (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                          (tins F ([T:ty] arr T T) (arr int int) FT)
                          (tlam int int ([X:exp] X)
                                ([X:exp][XT:of X int] XT)))
                    (tapp F 0 int int
                          (tins F ([T:ty] arr T T) int FT)
                          t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y)
                          ([Y:exp][YT:of Y T] YT)))))
175
How Much Essential Information?
  • There are 15 applications of rules in this
    derivation.
  • So, conservatively:
  • ⌈log2 3⌉ × 15 = 30 bits
  • In other words, 30 bits should be enough to
    encode the choices made by a type inference
    engine for this term.
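The estimate is simple enough to recompute. The helper name `oracle_bits` below is made up for this sketch; the inputs (3 candidate rules per step, 15 rule applications) and the comparison figures of 2,200 and 700 bits are the ones quoted on the slides.

```python
import math


def oracle_bits(choices_per_step, steps):
    """Bits needed if every step picks one of `choices_per_step` rules,
    using a fixed-width code per step."""
    return math.ceil(math.log2(choices_per_step)) * steps


if __name__ == "__main__":
    bits = oracle_bits(3, 15)
    # Compare against the sizes quoted for the LF and LFi derivations.
    print(bits, 2200 / bits, 700 / bits)
```

A fixed-width code is the conservative choice; steps with only one applicable rule would need no bits at all.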

176
Oracle-based Checking
  • Idea: Implement the proofchecker as a
    nondeterministic logic interpreter whose
  • program consists of the derivation rules, and
  • initial goal is the judgment to be verified.
  • We will avoid backtracking by relying on the
    oracle string.

Skip ahead
177
Why Higher-Order?
  • The syntax of VCs for the Java type-safety policy
    is as follows:

E ::= x | c E1 ... En
F ::= true | F1 ∧ F2 | ∀x.F | E | E ⇒ F

  • The LF encodings are simple Horn clauses
    (requiring only first-order unification).
    Higher-order features are needed only for
    implication and universal quantification.
178
Why Higher-Order?
  • Perhaps first-order Horn logic (or perhaps
    first-order hereditary Harrop formulas) is
    enough.
  • Indeed, first-order expressions and formulas seem
    to be enough for the VCs in type-safety policies.
  • However, higher-order and modal logics would
    require higher-order features.

179
A Simplification: A Fragment of LF
  • Level-0 types.
  • A ::= a | A1 → A2
  • Level-1 types (β-normal form).
  • B ::= a M1 ... Mn | B1 → B2 | Πx:A.B
  • Level-0 kinds.
  • K ::= Type | A → K
  • Level-0 terms (β-normal form).
  • M ::= λx:A.M | c M1 ... Mn | x M1 ... Mn

180
LF Fragment
  • This fragment simplifies matters considerably,
    without restricting the application to PCC.
  • Level-0 types to encode syntax.
  • Level-1 types to encode derivations.
  • No level-1 terms since we never reconstruct a
    derivation, only verify that one exists.

181
LF Fragment, cont'd
Level-0 types:
ty : type
exp : type
Level-1 type family:
of : exp -> ty -> type
Disallowing level-2 and higher type families
seems not to have any practical impact.
182
Logic Interpreter: Goals
For Necula's example, the interpreter will be
started with the goal
∃t:ty. of E t
183
Naïve Interpreter
solve(B1 → B2)      = ∀x:B1. solve(B2)
solve(Πx:A.B)       = ∀x:A. solve(B)
solve(a M1 ... Mn)  = subgoals(B, a M1 ... Mn)
    where B is the type of a level-1 constant or a
    level-1 quantified variable (in scope), as
    selected by the oracle.

subgoals(B1 → B2, B)  = solve(B1) ∧ subgoals(B2, B)
subgoals(Πx:A.B', B)  = ∃x:A. subgoals(B', B)
subgoals(a M1' ... Mn', a M1 ... Mn)
                      = M1 = M1' ∧ ... ∧ Mn = Mn'
184
Back to the example
  • Consider
  • solve(of E t)
  • This consults the oracle.
  • Since there are 3 level-1 constants that could be
    used at this point, 2 bits are fetched from the
    oracle string (to select tapp).
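The mechanism can be sketched for a much simpler setting than LF. The code below is a toy under stated assumptions: goals are ground atoms (so no unification is needed), rules are (head, body) pairs, and the `Oracle` class, `solve` function, `RULES` table and `fuel` limit are all invented for illustration. What it does share with the slides is the key point: whenever several rules could apply, a fixed number of bits is fetched from the oracle string instead of backtracking.

```python
import math


class Oracle:
    """Hands out choices encoded in a string of '0'/'1' characters."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def choose(self, n):
        """Pick an index below n; consumes no bits when n == 1."""
        if n == 1:
            return 0
        width = math.ceil(math.log2(n))
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) < width:
            return None                     # oracle string exhausted
        self.pos += width
        return int(chunk, 2)


def solve(goal, rules, oracle, fuel=1000):
    """Prove `goal` from Horn clauses, never backtracking."""
    if fuel == 0:
        return False                        # guard against runaway recursion
    candidates = [body for head, body in rules if head == goal]
    if not candidates:
        return False
    k = oracle.choose(len(candidates))
    if k is None or k >= len(candidates):
        return False
    return all(solve(sub, rules, oracle, fuel - 1) for sub in candidates[k])


RULES = [
    ("safe", ["checked"]),                  # choice 0
    ("safe", ["typed", "bounded"]),         # choice 1
    ("safe", ["trusted"]),                  # choice 2
    ("typed", []),
    ("bounded", ["typed"]),
]

if __name__ == "__main__":
    print(solve("safe", RULES, Oracle("01")))
```

With three rules for the same head, two bits are consumed at that goal, matching the count on the slide; the goals with a single applicable rule cost nothing.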

185
Higher-Order Unification
  • The unification goals that remain after solve are
    higher-order and thus only semi-decidable.
  • A nondeterministic unification procedure (also
    driven by the oracle string) is used.
  • Some standard LP optimizations are also used.

186
Certifying Theorem Proving
187
Certifying Theorem Proving
  • Time does not allow a description here.
  • See
  • Necula and Lee. Proof generation in the
    Touchstone theorem prover. CADE'00.
  • Of particular interest
  • Proof-generating congruence-closure and simplex
    algorithms.

188
Resource Constraints
  • Bounds on certain resources can be enforced via
    counting.
  • In a Reference Interpreter:
  • Maintain a global counter.
  • Increment the count for each instruction
    executed.
  • Verify for each instruction that the limit is not
    exceeded.
  • Use the compiler to optimize away the counting
    operations.
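A reference interpreter of this kind can be sketched in a few lines. The three-instruction machine below (`add`, `jnz`, `halt`), the function `run` and the exception name are assumptions for illustration only; the part taken from the slide is the global counter that is incremented and checked on every instruction.

```python
class ResourceLimitExceeded(Exception):
    """Raised when the instruction budget is used up."""


def run(program, limit, acc=0):
    """Execute `program` with one accumulator, counting every instruction."""
    count = 0
    pc = 0
    while pc < len(program):
        count += 1                      # increment for each instruction
        if count > limit:               # verify the limit is not exceeded
            raise ResourceLimitExceeded(f"more than {limit} instructions")
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jnz":               # jump to arg if acc is nonzero
            pc = arg if acc != 0 else pc + 1
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return acc, count


COUNTDOWN = [("add", -1), ("jnz", 0), ("halt", 0)]

if __name__ == "__main__":
    print(run(COUNTDOWN, limit=100, acc=3))
```

The compiler's job, per the last bullet, is to remove most of these per-instruction checks where the count can be bounded statically, for example by charging a whole basic block at once.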

189
Ten Good Things About PCC
  • 1. Someone else does all the really hard work.
  • 2. The host system changes very little.
  • ...

190
Logic as a lingua franca
Code
Certifying Prover
Proof
Proof Engine
191
Logic as a lingua franca
Code
Policy
Certifying Prover
VC
Proof
Proof Checker
192
Logic as a lingua franca
iadd iaload ...
Policy
Certifying Prover
VC
Proof
Proof Checker
193
Logic as a lingua franca
addl eax,ebx
testl ecx,ecx
jz NULLPTR
movl 4(ecx),edx
cmpl edx,ebx
jae ARRAYBNDS
movl 8(ecx,ebx,4),edx
...
Policy
Certifying Prover
VC
Proof
Proof Checker
Adequacy of dynamic checks and wrappers can be
verified.
194
Logic as a lingua franca
add eax,ebx
movl 8(ecx,ebx,4)
...
Policy
Certifying Prover
VC
Proof
Proof Checker
Safety of optimized code can be verified.
195
Ten Good Things About PCC
  • 3. You choose the language.
  • 4. Optimized (unsafe) code is OK.
  • 5. Verifies that your optimizer and dynamic
    checks are OK.

196
The Role of Programming Languages
  • Civilized programming languages can provide
    safety for free.
  • Well-formed/well-typed ⇒ safe.
  • Idea: Arrange for the compiler to explain why
    the target code it generates preserves the safety
    properties of the source program.

197
Certifying Compilers [Necula & Lee, PLDI'98]
  • Intuition
  • Compiler knows why each translation step is
    semantics-preserving.
  • So, have it generate a proof that safety is