Title: Secure Compiler Seminar 4/11 Visions toward a Secure Compiler
1. Secure Compiler Seminar 4/11: Visions toward a Secure Compiler
- Toshihiro YOSHINO <tossy-2_at_yl.is.s.u-tokyo.ac.jp>
- (D1, Yonezawa Lab.)
2. Talk Agenda
- Brief introduction to TAL and PCC
- Introduction to my master's thesis
- Visions toward a Secure Compiler
3. Brief Introduction to TAL and PCC
4. Background
- Program verification: mathematically assure that a program has certain properties
- Useful for security
- Memory access safety, information flow analysis, etc.
- Verifying low-level code directly reduces the TCB
- TCB: Trusted Computing Base
- High-level code must be compiled after it is verified
- -> We must trust the compiler
- Assemblers are much simpler than compilers
5. Current Techniques and Problems
- Code signing
- Based on public key cryptography
- Can prove the genuineness of code
- Cannot prove safety by itself
- Signature matching
- Uses a dictionary of malicious patterns and matches target programs against it
- Employed in many antivirus systems
- Passing the check does NOT mean the code is safe
- Often unable to detect very new viruses
6. Proof-Carrying Code [Necula et al. 1997]
- Technique for safe execution of untrusted code
- Code consumer does not need to trust the producer
- Code is distributed with a proof of its safety
- Producer creates a proof
- Consumer verifies the proof against its security policy
7. Proof-Carrying Code [Necula et al. 1997]
- Low cost for the consumer
- Consumer has only to verify the proof
- For example, by typechecking
- Tamper-proof
- If the code passes the check, it does NO harm even if it has been modified
- If a modification makes the code fail the check, the code will not run, so it is safe
- Otherwise, the code still obeys the consumer's security policy
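The tamper-proofing argument can be illustrated with a toy consumer-side check. This is only a sketch under strong assumptions: the "code" is a list of hypothetical `("store", address, value)` instructions, the policy is a made-up memory bound, and the check is simple enough that no separate proof object is needed. It is not the PCC proof format.

```python
POLICY_MEMORY_SIZE = 8  # assumed consumer policy: only cells 0..7 may be written


def verify(code):
    """Consumer-side check: every instruction is a store to an allowed cell."""
    return all(
        isinstance(instr, tuple)
        and len(instr) == 3
        and instr[0] == "store"
        and isinstance(instr[1], int)
        and 0 <= instr[1] < POLICY_MEMORY_SIZE
        for instr in code
    )


def run_if_safe(code):
    """Run the code only if it passes the check; otherwise refuse to run it."""
    if not verify(code):
        return None  # rejected: the code never runs
    memory = [0] * POLICY_MEMORY_SIZE
    for _, address, value in code:
        memory[address] = value
    return memory


original = [("store", 0, 5), ("store", 3, 7)]
tampered_harmless = [("store", 0, 9)]    # modified, but still within the policy
tampered_harmful = [("store", 100, 1)]   # modified to write outside the policy
```

Both outcomes from the slide show up here: the harmless modification still runs and still obeys the policy, while the harmful one is rejected before execution.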
8. Typed Assembly Language [Morrisett et al. 1999]
- Extends a conventional assembly language with static type checking
- An instance of Proof-Carrying Code
- By type checking, it can guarantee
- Memory access safety
- The program never accesses memory outside the area allocated for it
- Interface consistency
- Type agreement of arguments / return values of functions
- etc.
9. TAL System Illustrated
[Figure: the TAL system. Labels: code with type information, type checker, assembler, linker, code consumer.]
10. A Brief Example of a TAL Program
Program code (same as conventional assembly languages):
    fact:
      movl %eax, %ecx
      movl $1, %eax
    loop:
      mull %ecx
      decl %ecx
      cmpl $0, %ecx
      jg loop
    end:
Type information (used to typecheck the program):
    fact: eax: B4
    loop: eax: B4, ecx: B4
    end:  eax: B4
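To give a feel for how label annotations are used, here is a heavily simplified checker for the factorial example. It is a sketch, not the TAL type system: there is only one type (B4), the instruction rules are assumptions made up for this illustration, and the side effect of `mull` on edx is ignored.

```python
B4 = "B4"  # 4-byte integer type, named after the annotations on the slide

# Registers that must hold a B4 value whenever control reaches each label.
LABEL_TYPES = {
    "fact": {"eax": B4},
    "loop": {"eax": B4, "ecx": B4},
    "end": {"eax": B4},
}

# Operands are register names (str), immediates (int), or label names.
PROGRAM = [
    ("fact", [("movl", "eax", "ecx"), ("movl", 1, "eax")]),
    ("loop", [("mull", "ecx"), ("decl", "ecx"), ("cmpl", 0, "ecx"), ("jg", "loop")]),
    ("end", []),
]


def satisfies(env, required):
    """True if env gives every required register its required type."""
    return all(env.get(reg) == ty for reg, ty in required.items())


def operand_ok(env, operand):
    """Immediates are always B4; registers must be known to hold B4."""
    return isinstance(operand, int) or env.get(operand) == B4


def check(program, label_types):
    for index, (label, instrs) in enumerate(program):
        env = dict(label_types[label])  # assume only what the annotation promises
        for op, *args in instrs:
            if op == "movl":
                if not operand_ok(env, args[0]):
                    return False
                env[args[1]] = B4
            elif op == "mull":  # multiplies eax by the operand
                if not (operand_ok(env, args[0]) and env.get("eax") == B4):
                    return False
            elif op == "decl":
                if not operand_ok(env, args[0]):
                    return False
            elif op == "cmpl":
                if not all(operand_ok(env, a) for a in args):
                    return False
            elif op == "jg":  # the jump target's annotation must hold here
                if not satisfies(env, label_types[args[0]]):
                    return False
            else:
                return False  # unknown instruction: reject
        # Falling through into the next block must satisfy its annotation too.
        if index + 1 < len(program):
            if not satisfies(env, label_types[program[index + 1][0]]):
                return False
    return True
```

Calling `check(PROGRAM, LABEL_TYPES)` accepts the example. Dropping the first `movl` makes it fail, because ecx is then not known to be B4 when control falls through to `loop`.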
11. Related Work: TALK, TOS [Maeda, 2005]
- TALK: TAL for Kernel
- Morrisett et al. use a garbage collector for memory management in TAL
- For an OS, GC cannot be assumed
- Must implement memory management (malloc/free)
- TOS: Typed Operating System
- An experimental OS written in TALK
12. Introduction to My Master's Thesis
13. My Work for Master's Thesis
- A Framework Using a Common Language to Build Program Verifiers for Low-Level Languages
- To help developers of program verifiers
- To be a common basis for verification of low-level programs
- Such as assembly and machine languages
14. Motivation: Verifiers are Hard to Develop
- Especially for low-level languages
- Complex semantics
- The semantics of each instruction is complex
- There are many instructions in a language
- Low portability
- Low-level languages depend heavily on the underlying architecture
- Accordingly, the entire verifier also depends on the underlying architecture
15. Our Idea
- Split a verifier into three parts
- Design a common language,
- Translate the target program into that language, and
- Verify the translated program
- These parts are explicitly independent of each other
- Thus we can replace them easily
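The three-part split can be sketched as plain function composition. Everything named here is hypothetical and chosen for illustration: the common language is modelled as a list of strings, the two translators handle made-up source syntaxes, and the verification logic is a toy check. The actual framework was implemented in Java and is not reproduced here.

```python
from typing import Callable, List

# Hypothetical model: a common-language program is a list of command strings.
CommonProgram = List[str]
Translator = Callable[[str], CommonProgram]          # source text -> common language
VerificationLogic = Callable[[CommonProgram], bool]  # common language -> success/fail


def make_verifier(translate: Translator, check: VerificationLogic) -> Callable[[str], bool]:
    """Compose a translator and a verification logic into a verifier."""
    def verify(source: str) -> bool:
        return check(translate(source))
    return verify


# Two toy translators for two made-up source syntaxes.
def translate_lines(source: str) -> CommonProgram:
    return [line.strip().lower() for line in source.splitlines() if line.strip()]


def translate_semicolons(source: str) -> CommonProgram:
    return [part.strip() for part in source.split(";") if part.strip()]


# Toy verification logic: the program never contains the "error" command.
def no_error_command(program: CommonProgram) -> bool:
    return "error" not in program


# The same verification logic is reused with either translator.
verify_a = make_verifier(translate_lines, no_error_command)
verify_b = make_verifier(translate_semicolons, no_error_command)
```

Because the verification logic only sees the common language, swapping the translator is the only change needed to target a different source syntax.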
16. Our Idea
[Figure: a verifier built from (1) the semantics of the common language, (2) a translator, and (3) a verification logic, producing a result (success/fail)]
17. How Do We Solve the Problems?
- Coping with complex semantics
- Only translators need to care about the semantics of the source language
- A translator is reusable
- Once the description is written, we can reuse it
- Improving portability
- The verification logic is also reusable
- Once implemented, it can be used for other architectures simply by replacing translators
18. How Do We Solve the Problems?
[Figure: the verifier with several interchangeable translators and verification logics sharing the semantics of the common language; result: success/fail]
19. Overview of the Work
- Designed a framework to build program verifiers
- Designed a common language, ADL
- Discussed the correctness of translators
- Proved that the assured properties are preserved throughout translation
- Implemented the framework in Java
20. ADL: A Common Language
[Figure: verifier overview with translator, verification logic, semantics of the common language, and result (success/fail)]
21. ADL: A Common Language (Design Concept)
- ADL: Architecture Description Language
- From observation of many architectures
- Data is stored in registers and memory, and is manipulated according to the program
- Jumps alone are sufficient as a control flow structure
- Expressiveness
- Arithmetic, logical operations, etc.
- C-like expressions
- Conservative semantics
- No need to describe indecent programs
- To simplify the semantics
22. ADL: A Common Language (Overview of the Language)
- Imperative language which manipulates registers and memory
- 5 kinds of commands
- nop, error, assignment, goto, if-then-else
- More like C than assembly
- Infix operators, parenthesized formulae
- Conditional execution on an arbitrary condition using the if command
- Only goto modifies control flow
- Unconditional branch
23. ADL: A Common Language (A Brief Example)
ADL:
    data: ...
    main:
      ebx = data
      eax = 0
      goto lp
    lp:
      eax = eax + 4(ebx)
      ebx = 4(ebx + 4)
      if ebx == null then
        goto end
      else goto lp
    end:
      goto end
x86:
    data: ...
    main:
      movl $data, %ebx
      movl $0, %eax
    lp:
      addl 0(%ebx), %eax
      movl 4(%ebx), %ebx
      cmpl $0, %ebx
      je end
      jmp lp
    end:
      jmp end
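The five kinds of commands are few enough that a step-bounded interpreter fits in a few lines. The sketch below is an assumption-laden model, not the ADL interpreter from the thesis: commands are Python tuples, expressions are Python functions over registers and memory, memory is a flat dictionary with 0 standing for null, and every block is assumed to end in a goto. It runs the list-summing example on a two-node list.

```python
def run(program, entry, regs, mem, max_steps=1000):
    """Execute commands until the step budget is used up; return the registers."""
    label, pc = entry, 0
    for _ in range(max_steps):
        if pc >= len(program[label]):
            raise RuntimeError("fell off the end of a block")
        cmd = program[label][pc]
        # An if command selects one of its two sub-commands to execute.
        while cmd[0] == "if":
            cmd = cmd[2] if cmd[1](regs, mem) else cmd[3]
        if cmd[0] == "error":
            raise RuntimeError("error state reached")
        if cmd[0] == "goto":  # only goto modifies control flow
            label, pc = cmd[1], 0
            continue
        if cmd[0] == "assign":
            regs[cmd[1]] = cmd[2](regs, mem)
        pc += 1  # nop and assign continue with the next command
    return regs


# Two list nodes (value, next) at made-up addresses 100 and 200; 0 plays the role of null.
memory = {100: 3, 104: 200, 200: 4, 204: 0}

program = {
    "main": [
        ("assign", "ebx", lambda r, m: 100),
        ("assign", "eax", lambda r, m: 0),
        ("goto", "lp"),
    ],
    "lp": [
        ("assign", "eax", lambda r, m: r["eax"] + m[r["ebx"]]),
        ("assign", "ebx", lambda r, m: m[r["ebx"] + 4]),
        ("if", lambda r, m: r["ebx"] == 0, ("goto", "end"), ("goto", "lp")),
    ],
    "end": [("goto", "end")],
}

result = run(program, "main", {}, memory)
print(result["eax"])  # 7
```

The step budget is needed because the example ends in the self-loop `goto end`, which never terminates.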
24. ADL: A Common Language (Restrictions)
- ADL has a few restrictions by design
- Code and data are completely separated
- We assume NOTHING about the memory layout of a program
- To simplify the semantics
- Some programs cannot be expressed
- However, most decent programs can be written even under these restrictions
- To be discussed in the next slide
25. ADL: A Common Language > Restrictions (Separation of Code and Data)
- Code is not treated as data
- ADL programs cannot read / write code
- We cannot express programs which use dynamic code generation
- But the patterns of the generated code are fixed in many cases -> another solution is possible
- For example, prepare a function for each pattern of code
26. ADL: A Common Language > Restrictions (No Assumptions about Memory Layout)
- Casting is prohibited
- ADL distinguishes integers and pointers
- In real architectures, pointers are not distinguished from integers
- Pointer arithmetic is restricted
- Only pointer + integer and pointer - pointer are defined
- Other operations return an undetermined value
- Sufficient for array/structure operations and offset calculation
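The restricted pointer arithmetic can be modelled with a small value domain. This is a sketch under assumptions that are not stated on the slide: a pointer is a (block, offset) pair, subtraction is only defined for pointers into the same block, and integer + pointer (in that operand order) is treated as undetermined. The names `Ptr` and `Undetermined` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ptr:
    block: str   # name of the memory block the pointer refers to
    offset: int  # byte offset inside that block


@dataclass(frozen=True)
class Undetermined:
    """Result of any operation the language leaves undefined."""


Value = Union[int, Ptr, Undetermined]


def add(a: Value, b: Value) -> Value:
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    if isinstance(a, Ptr) and isinstance(b, int):  # pointer + integer
        return Ptr(a.block, a.offset + b)
    return Undetermined()


def sub(a: Value, b: Value) -> Value:
    if isinstance(a, int) and isinstance(b, int):
        return a - b
    if isinstance(a, Ptr) and isinstance(b, Ptr) and a.block == b.block:
        return a.offset - b.offset  # pointer - pointer gives an integer offset
    return Undetermined()


base = Ptr("data", 0)
field = add(base, 4)        # address of a structure field at offset 4
distance = sub(field, base)  # offset calculation
```

Because a pointer never becomes a plain integer, no cast can turn an arbitrary number into an address, which is what lets the language avoid assumptions about memory layout.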
27. Program Translator
[Figure: verifier overview with translator, verification logic, semantics of the common language, and result (success/fail)]
28. Program Translator
- Translates low-level programs into ADL
- We must assure that program translators are correct
- Otherwise, we cannot trust the entire verifier
- Correctness is defined in the following discussion
29. Program Translator: What Is Correctness of Program Translation?
- Instruction: a function over machine states
- Correctness: the correspondence between the states of the two machines is preserved by translation
[Figure: state sequences of the original program and of the translated program]
30. Program Translator: How to Confirm Correctness of Translation
- Any program results in corresponding states for any input -> correctness
- Total inspection is NOT realistic
- A theorem prover would be useful
- Automatic proving is future work
- But how do we confirm the correctness of the description of the source language?
- At this time, we take an empirical approach
- Test several cases using an interpreter
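The empirical approach amounts to checking, on selected inputs, that stepping the original instruction and stepping its translation lead to corresponding states. The sketch below assumes a toy source machine with a single 32-bit `addl` instruction, a translation into an assignment, and state equality as the correspondence. All function names are hypothetical.

```python
import itertools

WORD = 2**32  # assumed 32-bit registers


def step_original(state, instr):
    """Reference semantics of the toy source instruction ("addl", src, dst)."""
    _, src, dst = instr
    new = dict(state)
    new[dst] = (state[dst] + state[src]) % WORD
    return new


def translate(instr):
    """Translate the instruction into an assignment (target, expression)."""
    _, src, dst = instr
    return (dst, lambda s: (s[dst] + s[src]) % WORD)


def translate_buggy(instr):
    """A deliberately wrong translator that forgets 32-bit wraparound."""
    _, src, dst = instr
    return (dst, lambda s: s[dst] + s[src])


def step_translated(state, command):
    target, expr = command
    new = dict(state)
    new[target] = expr(state)
    return new


def check_translation(translator, instr, test_states):
    """True if original and translated steps agree on every test state."""
    command = translator(instr)
    return all(
        step_original(s, instr) == step_translated(s, command)
        for s in test_states
    )


# Test inputs include boundary values so that overflow is exercised.
values = [0, 1, 2**31, 2**32 - 1]
states = [{"eax": a, "ebx": b} for a, b in itertools.product(values, repeat=2)]
instr = ("addl", "ebx", "eax")
good = check_translation(translate, instr, states)
bad = check_translation(translate_buggy, instr, states)
```

Testing like this can expose a wrong translation rule, as with the missing wraparound, but it cannot establish correctness for all inputs. That is why a theorem prover is listed as the longer-term option.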
31. Verification Logic
[Figure: verifier overview with translator, verification logic, semantics of the common language, and result (success/fail)]
32. Verification Logic
- Verifies the properties of translated programs
- A function that takes a program and returns success or fail
- Soundness must be assured
- This is the task of the creator of a verification logic
- Here we do not discuss it any further
- Definition: soundness of a verification logic
- Verification logic V: State -> Bool
- The set {S | V(S)} is closed under step execution
- If V(S), execution never falls into the error state, and
- If V(S) and S -> T (-> means step execution), then V(T)
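On a finite, deterministic toy machine the soundness conditions can be checked by enumeration. The sketch below assumes such a machine (states 0..7 plus an error state, with a made-up step function). It checks that no accepted state is the error state and that accepted states only step to accepted states, which together imply by induction that execution from an accepted state never reaches the error state.

```python
ERROR = "error"
STATES = list(range(8)) + [ERROR]


def step(state):
    """Made-up step function: even counters advance by 2, odd ones fail."""
    if state == ERROR:
        return ERROR
    if state % 2 == 0:
        return (state + 2) % 8
    return ERROR


def is_sound(states, step_fn, V):
    """Check closure under step execution and exclusion of the error state."""
    for s in states:
        if not V(s):
            continue
        if s == ERROR:          # an accepted state must not be the error state
            return False
        if not V(step_fn(s)):   # V(S) and S -> T must imply V(T)
            return False
    return True


def v_even(state):
    """Accepts exactly the even counters."""
    return state != ERROR and state % 2 == 0


def v_all(state):
    """Accepts every non-error state; too permissive to be sound."""
    return state != ERROR


sound = is_sound(STATES, step, v_even)
unsound = is_sound(STATES, step, v_all)
```

`v_all` fails because it accepts state 1, whose successor is the error state, so the accepted set is not closed under step execution.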
33. Verification Logic: Soundness of Verification Logic
[Figure: within the set of machine states, the subset of states S such that V(S)]
- Soundness: if V(S) and S -> T, then V(T)
34. Verification Logic: Program Translation and Verification
- We proved the following theorem
- If the program translator is correct, and
- the verification logic is sound, then
- -> verification of the original program and verification of the translated program are equivalent
- A closed subset can be defined on the states of the translation source language
35. Implementation
- Framework
- ADL data structures
- ADL interpreter
- Used to confirm the correctness of translators
- Translator and verification logic interfaces
- Translation rule compiler
- Compiles translation rules into a Java implementation of a translator
- And, as a proof of concept,
- Translators from Intel x86 and SPARC
- A simple type checker
36. Related Work: Foundational TAL [Crary, 2003]
- The TAL type checker is still large
- The TALx86 type checker consists of approx. 23k LoC in OCaml (!)
- TCB is reduced by using a logical framework
- Designed a language called TALT on the Twelf logical framework [Pfenning et al., 1999]
- Proved GC safety of TALT by machine
- Correspondence between TALT and realistic architectures is not discussed
- The TALT type system is fixed
- Our work allows replacement of verification logics
37. Future Work
- Automatically confirm the correctness of translation
- Automatic testing
- Cooperating with emulators or debuggers
- Or, build a model and use a theorem prover
- Support dynamic memory allocation
- Currently all memory must be allocated statically
- Support concurrent programs
- Concurrency is not taken into consideration
- For application to OSes, etc., concurrency plays an important role
38. Visions toward a Secure Compiler
39. What Is a Secure Compiler?
- A compiler which produces certified code
- For example, TAL code as output
- Like the Popcorn compiler in TALx86
- Safe dialect of C -> TALx86
- A compiler which assures correct compilation (optionally)
- Like the credible compiler [Rinard, 1999]
- Reduces TCB
40. Motivation
- Infrastructure has been built
- TALK, TOS [Maeda, 2005]
- Verifier framework [Yoshino, 2006]
- Next we have to build a house on it!
- Most people do not want to write low-level code directly
- -> Secure Compiler
41. Toward a Secure World
- If we built a secure compiler
- Memory-error-free systems
- Prevent memory-error-based attacks
- OS kernel, core libraries, network servers
- Writing secure code
- Vulnerable code will result in verification failure
- So code security will be improved
- The rest is yet to be discovered
42. Tasks to Do
- Determine what properties to assure
- Memory access safety? Information flow?
- Must be mechanically checkable
- Design the verification logic
- Use the verifier framework?
- Design the language
- Target: TAL-based? ADL?
- ADL can be used as a certified language
- Register allocation is already done, so a simple mapping will be possible
- Source: ???