Title: New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security
1New Approaches to Mobile CodeReconciling
Execution Efficiencywith Provable Security
UC Irvine project transprose transporting
programs securely
- Michael Franz
- University of California, Irvine
- July 2001
2Introduction
- mobile code is an enabling technology
- download functionality as needed
- handheld, untethered devices, information
appliances - platform-independent ? identical code can run on
PDAs, desktop machines, even supercomputers - but, many unresolved issues with respect to
- performance of the mobile program (on the target)
- performance of the mobile code distribution
mechanism - protecting the host against malicious mobile
programs - guarding a mobile programs secrets against a
malicious host
3Guiding Overall Objective
- make mobile code practical, so that
- eventually, native code will need to exist only
transiently, created on-the-fly and consumed on
the spot - while mobile code will be used as the storage and
distribution medium
4Context
- dynamic code-generation technology is approaching
maturity and processors are becoming fast enough
to sustain it (in real time) - this is rapidly diminishing the value of binary
compatibility - moreover, dynamic optimization techniques yield
better code than static compilation - exploit actual processor parameters (caches, )
- live profiling data may be available
- gt mobile code will define future platform(s)
5Mobile Code Security
- most approaches are based on some type-safe
programming language - host systems publish their policies in terms of
type-safe APIs - conformance to that interface is then guaranteed
by the mobile code transportation scheme - semantically equivalent to transporting source
code - however, for efficiency and quality of dynamic
code generation, usually want to transport a
format closer to the machine while still
preserving source-program type-safety semantics
6Existing Practice Java
- the Java Virtual Machine is the de-facto standard
format for distributing mobile programs - the JVM has an instruction set that has been
designed specifically for representing Java
programs - interestingly enough, there still are JVM
programs for which no legal equivalent Java
source program exists - there are also legal Java programs that are
rejected by all possible JVM bytecode verifiers
Staerk00 - security is obtained by verifying the JVM
bytecode, essentially a symbolic execution of the
program
7Security vs. Efficiency
- the Java Virtual Machine's instruction format is
not very capable in transporting the results of
program analyses and optimizations - as a consequence, when Java byte-code is
transmitted, each recipient must repeat most of
the analyses and optimizations that could have
been performed just once at the origin - the main reason why Java byte-code has these
deficiencies is to allow verification by the
recipient
8Security vs. Efficiency
- for example, a code producer often has
information about the redundancy of a type or
index check - but this fact cannot be communicated safely to
the code consumer - not in a manner that the
recipient can be sure that this is not a false
claim inserted by a malicious third party - similar concerns inhibit common compiler
optimizations such as common subexpression
elimination at the code producers side
9An Alternative Approach PCC
- instead of executing the program symbolically at
the receivers site (which is time consuming and
complex), the code producer attaches a proof
that the code is correct - the proof shortcuts the verification checking
a given solution is often much simpler than
finding it in the first place - the Java KVM for embedded devices uses a kind of
PCC (stack maps) that may become a standard for
Java
10A Third Approach
- instead of verifying or checking, we have been
been investigating a class of mobile code
representations that can provably encode only
legal programs - security is obtained by construction
- the need for verification disappears
- our approach can provide the identical security
guarantees as the Java Virtual Machine, but it
can express most of them statically as a
well-formedness property of the encoding itself - in our solution, an incoming mobile program may
not do the intended task, but it will not do
anything bad - for any definition of bad that
can be cast into a type system - interestingly enough, such intrinsically secure
mobile code is also denser than virtual machine
code, and permits to generate better object code,
and faster
11A Third Approach Two Variants
- we have in fact designed not just one, but two
alternative mobile-code representations, both of
which provide security by construction - they differ in the semantic level at which they
describe the mobile program - high-level close to the source language but
with supporting compiler-related information - low-level as close to what a modern code
generator back-end needs without being
target-machine specific
12Rationale for Multi-Track Approach
- the relative trade-offs (encoding density vs.
decoding/dynamic compilation speed vs. code
quality) are completely unknown and can only be
determined by collecting experience with actual
prototypes - by implementing both the high-level and the
low-level solution, we are exploring the design
space rather than designing an ad-hoc solution
13Low-Level Encoding PLDI01
- SafeTSA preserves control and dataflow
information as well as full typing for each
intermediate result - it is based on SSA form, a representation that is
also used internally by a number of important
state-of-the-art research compilers for Java,
e.g., - IBM T.J. Watson Lab Jalapeño
- Microsoft Marmot
- Sun Microsystems HotSpot Server
- SafeTSA is far easier to parse into a form useful
for code optimization than JVM-code
14Current Status and Results
- based on Martin Oderskys Pizza front-end
- can compile all of Java to safeTSA
- prototype run-time environment almost finished
will provide full interoperability between
safeTSA and JVM-based class files - can mix and match both formats with dynamic
loading - call-backs from JVM to safeTSA are ugly
- safeTSA representation is surprisingly small
15High-Level Encoding Babel01
- ultra-compact representation using grammar-based
compression of abstract syntax trees - goal is to transport the source program along
with as much compiler-related support information
as possible
16Schematic Overview
Source Parser
CodeGenerator
classic Frontend
AST Encoder
AST Decoder
classic Backend
PPM-Model Arithmetic Encoder
PPM-Model Arithmetic Decoder
011000101010
Compression / Decompression
17Compression Overview
- Parsing get AST from source
- Serialize get stream of symbols from AST
- Modeling use context and abstract grammar to
build predictive statistical model - Coding use arithmetic coding with model
18Types of nodes in AST
- String, Integer, Terminal
- List e.g. Block BlockStatement
- Aggregate e.g. IF cond thenbranch elsebranch
- Choice e.g. BinOp Plus Minus
- Information is in choice nodes
- want to guess which choice is taken
19Transmitting an AST
- any predefined serialization will do
- we use depth first (pre-order)
- when serialized, most info in AST is redundant,
e.g. - order and kind of kids of aggregate nodes known
- this is because we use knowledge of the grammar
- must encode index of choice made at choice nodes
20Prediction by Partial Match (PPM)
- dynamically maintain counts of characters seen
after various contexts - contexts may be of various lengths
- eg. for abcd, contexts for d are
- length 1 context c
- length 2 context bc
- length 3 context abc
- predict characters in current context by looking
at what occurred previously
21Maintaining Contexts
a
b
c
d
22Adapting PPM To Work On Trees
- each node is a symbol
- the context is path from root to the current
node in the AST - problem in DFS, what when we reach leaf node and
go back up to ancestor? - pop context all active nodes moved up one
position to their parents (in context tree)
23Encoding
- PPM is used to model the choices made at choice
nodes, i.e. associate a probability with every
choice - these probabilities are used to drive an
arithmetic coder to output bits
24Compressing Constants
- constants (strings, integers, names) are a
significant fraction of source - to compress make table of constants, and refer
to them by their index in this table - further compress maintain different tables for
strings, names etc. reduces number of bits in
index - currently exploring more sophisticated context
modeling ideas for compressing constants
25AST Compression Example
AST for i i 1
Relevant grammar rules Stmt IfWhileAssign. A
ssign Lvalue Expr Lvalue FieldVarAccess Exp
r UnaryBinary Binary BinOp Expr Expr
Choice nodes
Preorder traversal Stmt Assign Lvalue VarAccess i
Expr Binary Expr VarAccess i Expr
Literal IntLiteral 1 BinOp
26AST Compression Example
Context tree
AST for i i 1
27AST Compression Example
AST for i i 1
Context tree
28AST Compression Example
AST for i i 1
Context tree
Model Prob(j) 0.3 Prob(k) 0.5 Prob(i) 0.2
Send model and choice i to arithmetic coder
29Status and Results
- compressor/decompressor prototype written in
Python - completely generic can be used with any
abstract grammar - have implemented the Java abstract grammar
- works with single Java source files as well as
entire packages. - comparison for Java class-file compression with
Pughs results (best published Java compressor)
30Results Classes
Classes from Suns javac package - all sizes in
bytes
31Results Archives
compressed collections of classes - all sizes in
bytes
- compressed ASTs are 5-50 smaller than Pughs
- 3-8 times smaller than uncompressed class files
or JAR files
32Performance-Enhancing Information
- now raise the semantic level of the grammar
- e.g. Escape Analysis
- an object that doesnt escape its defining
scope can be allocated on the stack rather than
on the heap - this optimization alone can often double
performance - the analysis itself is very difficult to do, but
the results of the analysis are easy to verify - augment the type system by escaping/non-escaping
- make this part of the encoding scheme itself
- e.g., gt a non-escaping object cannot be assigned
to a variable from an enclosing scope
33Insights So Far
- abstract syntax trees viable as a mobile code
format - can be highly compressed
- Java archives by factor of 3-8
- 5-50 better than Java bytecode specific
compression by Pugh
34Overall Project Achievements
- lead the way to a genuine improvement over
virtual machine transportation formats - security without need for validation
- tamper-proof performance-improving information
- innovative and generic program compression method
as a useful by-product of this effort
35Task Schedule
- Y1 Milestones
- source-level representation gt Java compression
- low-level representation
- core calculus representation
- Y2 Milestones
- system prototypes
- trade-off analysis
- encoding format comprehensive definition
- End of Project
- system deliverable
- comprehensive documentation
1999
2000
2001
2002
- investigate
- multiple source languages
- graph-based encoding schemes
- proof-carrying code
- investigate
- requirements ofoptimizing code generators
- integration of security vs. compiler-related data
- investigate
- mutual interaction of security, efficiency, and
compression density - security of system
36Mobile Code Security Revisited
- provided through type-safe programming language
and type-safe APIs - semantically equivalent to transporting source
code (everybody does it this way) - but many policies currently cannot be expressed
in terms of a type system and hence need to be
implemented inside the library - open only files in directory X
- initiate connections only with IP addresses in
range - execute no more than N instructions between OS
calls - do not send on network after reading local
files - gt security automata
- need to represent these properties directly and
support them along the whole pipeline from code
producer to code consumer - gt some other PIs in Oasis are working on these
themes and their work can be directly beneficial
to this project
37Transition of Technology
- our prototype implementation(s) will be made
available in source form - the idea is to create a turnkey replacement to
current Java compilers and JVM runtime systems - you simply take your code and recompile using our
compiler - it will then run on our runtime
- our runtime will also run your old JVM class
files - you can even mix our stuff with JVM class files
- gt we simply provide a new (better!) mobile code
transportation layer without changing anything
else
38Thank You