New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security

1
New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security
UC Irvine project transprose: transporting programs securely

Michael Franz
University of California, Irvine
July 2001

2
Introduction

mobile code is an enabling technology
download functionality as needed
handheld, untethered devices, information appliances
platform-independent ⇒ identical code can run on PDAs, desktop machines, even supercomputers
but there are many unresolved issues with respect to
performance of the mobile program (on the target)
performance of the mobile-code distribution mechanism
protecting the host against malicious mobile programs
guarding a mobile program's secrets against a malicious host

3
Guiding Overall Objective

make mobile code practical, so that eventually
native code will need to exist only transiently, created on the fly and consumed on the spot
while mobile code will be used as the storage and distribution medium

4
Context

dynamic code-generation technology is approaching maturity, and processors are becoming fast enough to sustain it (in real time)
this is rapidly diminishing the value of binary compatibility
moreover, dynamic optimization techniques yield better code than static compilation
exploit actual processor parameters (caches, …)
live profiling data may be available
⇒ mobile code will define future platform(s)

5
Mobile Code Security

most approaches are based on some type-safe programming language
host systems publish their policies in terms of type-safe APIs
conformance to that interface is then guaranteed by the mobile-code transportation scheme
this is semantically equivalent to transporting source code
however, for efficiency and for the quality of dynamically generated code, one usually wants to transport a format closer to the machine while still preserving the source program's type-safety semantics

6
Existing Practice: Java

JVM bytecode is the de facto standard format for distributing mobile programs
the JVM has an instruction set that was designed specifically for representing Java programs
interestingly enough, there are still JVM programs for which no legal equivalent Java source program exists
there are also legal Java programs that are rejected by all possible JVM bytecode verifiers [Staerk00]
security is obtained by verifying the JVM bytecode, which is essentially a symbolic execution of the program

7
Security vs. Efficiency

the Java Virtual Machine's instruction format is poorly suited to transporting the results of program analyses and optimizations
as a consequence, when Java bytecode is transmitted, each recipient must repeat most of the analyses and optimizations that could have been performed just once at the origin
the main reason Java bytecode has these deficiencies is to allow verification by the recipient

8
Security vs. Efficiency

for example, a code producer often knows that a type or index check is redundant
but this fact cannot be communicated safely to the code consumer: the recipient cannot be sure that it is not a false claim inserted by a malicious third party
similar concerns inhibit common compiler optimizations, such as common subexpression elimination, on the code producer's side

9
An Alternative Approach: Proof-Carrying Code (PCC)

instead of executing the program symbolically at the receiver's site (which is time-consuming and complex), the code producer attaches a proof that the code is safe, i.e., that it satisfies the host's safety policy
the proof shortcuts verification: checking a given solution is often much simpler than finding it in the first place
the Java KVM for embedded devices uses a kind of PCC (stack maps) that may become a standard for Java
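
The point that checking is simpler than finding can be sketched with a toy stack machine in the spirit of stack maps. This is a simplified, hypothetical illustration, not the KVM format. The following are all assumptions made here:
- the instruction set;
- the function name `check_with_stack_map`;
- a map that records only stack depths (real stack maps also record types).

The producer supplies the stack depth expected at every branch target. The consumer then validates those claims in a single forward pass, with no fixpoint iteration over the control-flow graph.

```python
# Hypothetical toy instruction set: opcode -> (values popped, values pushed).
EFFECT = {"push": (0, 1), "pop": (1, 0), "add": (2, 1),
          "ifz": (1, 0), "goto": (0, 0), "ret": (0, 0)}


def check_with_stack_map(code, stack_map):
    """Accept `code` only if the producer-supplied depths in `stack_map` are consistent.

    `code` is a list of tuples (opcode, [branch target]); `stack_map` maps an
    instruction index to the operand-stack depth claimed at that index.
    One linear pass suffices because every join point is annotated.
    """
    depth, reachable = 0, True  # the stack is empty on entry to instruction 0
    for pc, (op, *arg) in enumerate(code):
        if pc in stack_map:
            if reachable and depth != stack_map[pc]:
                return False  # fall-through depth contradicts the claim
            depth, reachable = stack_map[pc], True
        elif not reachable:
            return False  # code after goto/ret must be annotated to be checked
        pops, pushes = EFFECT[op]
        if depth < pops:
            return False  # stack underflow
        depth += pushes - pops
        if op in ("ifz", "goto"):
            target = arg[0]
            if not 0 <= target < len(code) or stack_map.get(target) != depth:
                return False  # each branch must agree with its target's entry
        reachable = op not in ("goto", "ret")
    return not reachable  # execution must not fall off the end


prog = [("push",), ("ifz", 4), ("push",), ("pop",), ("ret",)]
print(check_with_stack_map(prog, {4: 0}))  # True
print(check_with_stack_map(prog, {4: 1}))  # False: the claim contradicts the branch
```

A false claim is simply rejected, so a malicious annotation can cause no harm.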

10
A Third Approach

instead of verifying or checking, we have been investigating a class of mobile-code representations that can provably encode only legal programs
security is obtained by construction
the need for verification disappears
our approach can provide the same security guarantees as the Java Virtual Machine, but it can express most of them statically, as a well-formedness property of the encoding itself
in our solution, an incoming mobile program may not do the intended task, but it will not do anything bad, for any definition of "bad" that can be cast into a type system
interestingly enough, such intrinsically secure mobile code is also denser than virtual-machine code, and it allows better object code to be generated, and faster
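
One way to picture security by construction is a decoder that never reads a raw opcode. Instead, it reads an index into the productions that are legal at the current point, given the expected type and the variables in scope. The sketch assumes a hypothetical two-type expression language (`int`, `bool`); `alternatives` and `decode_typed` are illustrative names. The `% len(alts)` stands in for an entropy decoder driven by exactly those alternatives. Whatever integers arrive, the result is a well-typed expression over in-scope variables, so there is nothing left for a verifier to reject.

```python
def alternatives(expected, scope):
    """Productions that are legal for an expression of type `expected` in `scope`."""
    if expected == "int":
        alts = [("lit",), ("add", "int", "int")]
    else:  # expected == "bool"
        alts = [("lit",), ("less", "int", "int"), ("not", "bool")]
    # Only in-scope variables of the right type can be named at all.
    return alts + [("var", name) for name, t in scope.items() if t == expected]


def decode_typed(expected, scope, symbols, depth=0):
    """Build an expression of type `expected`, consuming integers from `symbols`."""
    alts = alternatives(expected, scope)
    if depth >= 3:  # bound the tree: only leaves below a fixed depth
        alts = [a for a in alts if a[0] in ("lit", "var")]
    kind, *rest = alts[next(symbols) % len(alts)]  # any integer picks a legal production
    if kind == "lit":
        value = next(symbols)
        return str(value) if expected == "int" else ("true" if value % 2 else "false")
    if kind == "var":
        return rest[0]
    args = [decode_typed(t, scope, symbols, depth + 1) for t in rest]
    template = {"add": "({} + {})", "less": "({} < {})", "not": "!{}"}[kind]
    return template.format(*args)


print(decode_typed("bool", {"x": "int", "b": "bool"}, iter([1, 2, 0, 7])))  # (x < 7)
```

The boolean variable `b` can never appear where an `int` is expected: it is simply not among the alternatives there.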

11
A Third Approach: Two Variants

we have in fact designed not just one but two alternative mobile-code representations, both of which provide security by construction
they differ in the semantic level at which they describe the mobile program
high-level: close to the source language, but with supporting compiler-related information
low-level: as close as possible to what a modern code-generator back end needs, without being target-machine specific

12
Rationale for Multi-Track Approach

the relative trade-offs (encoding density vs. decoding/dynamic-compilation speed vs. code quality) are completely unknown and can only be determined by collecting experience with actual prototypes
by implementing both the high-level and the low-level solution, we are exploring the design space rather than designing an ad-hoc solution

13
Low-Level Encoding [PLDI01]

SafeTSA preserves control-flow and dataflow information as well as full typing for each intermediate result
it is based on static single assignment (SSA) form, a representation that is also used internally by a number of important state-of-the-art research compilers for Java, e.g.,
IBM T.J. Watson Lab: Jalapeño
Microsoft: Marmot
Sun Microsystems: HotSpot Server
SafeTSA is far easier than JVM code to parse into a form useful for code optimization
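
The idea of full typing for every intermediate result can be illustrated in a heavily simplified, hypothetical form. Each type gets its own "plane" of SSA values, and each operand is an index into the plane of the type the operation expects. The following are all assumptions of this sketch:
- the operation table;
- the function name `run_block`;
- the restriction to one straight-line block.

SafeTSA itself also deals with control flow and dominance, which are omitted here. Because an integer addition can only name values in the `int` plane, a type-confused operand cannot even be written down. The only remaining check is that an index is in range.

```python
from collections import defaultdict

# Hypothetical operations: name -> (operand types, result type, semantics).
OPS = {
    "iadd": (("int", "int"), "int", lambda a, b: a + b),
    "ilt": (("int", "int"), "bool", lambda a, b: a < b),
    "not": (("bool",), "bool", lambda a: not a),
}


def run_block(instructions):
    """Evaluate a straight-line block whose operands index per-type value planes."""
    planes = defaultdict(list)  # type name -> SSA values of that type, in order
    for op, *operands in instructions:
        if op == "iconst":  # a constant's only operand is its literal value
            planes["int"].append(operands[0])
            continue
        arg_types, result_type, semantics = OPS[op]
        if len(operands) != len(arg_types):
            raise ValueError(f"{op}: expected {len(arg_types)} operands")
        args = []
        for t, index in zip(arg_types, operands):
            # An operand means "the index-th value of type t", never a raw register,
            # so the only possible error is an out-of-range index.
            if not 0 <= index < len(planes[t]):
                raise ValueError(f"{op}: no {t} value #{index}")
            args.append(planes[t][index])
        planes[result_type].append(semantics(*args))
    return dict(planes)


block = [("iconst", 2), ("iconst", 5), ("iadd", 0, 1), ("ilt", 0, 2), ("not", 0)]
print(run_block(block))  # {'int': [2, 5, 7], 'bool': [True, False]}
```

Note that `("not", 0)` refers to the first boolean value, not to the first integer: the operation determines which plane an index addresses.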

14
Current Status and Results

based on Martin Odersky's Pizza front end
can compile all of Java to SafeTSA
prototype run-time environment almost finished
will provide full interoperability between SafeTSA and JVM-based class files
can mix and match both formats with dynamic loading
call-backs from JVM code to SafeTSA code are ugly
the SafeTSA representation is surprisingly small

15
High-Level Encoding [Babel01]

ultra-compact representation using grammar-based compression of abstract syntax trees
goal is to transport the source program along with as much compiler-related support information as possible

16
Schematic Overview
Figure: Source Parser (classic front end) → AST Encoder → PPM-model arithmetic encoder → compressed bits (011000101010) → PPM-model arithmetic decoder → AST Decoder → CodeGenerator (classic back end); the encoder side performs compression, the decoder side decompression.

17
Compression Overview

Parsing: get an AST from the source
Serialize: get a stream of symbols from the AST
Modeling: use context and the abstract grammar to build a predictive statistical model
Coding: use arithmetic coding with the model
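
The first two steps can be tried out with Python's standard `ast` module. It is used here only as a stand-in for a Java front end; the prototype works on a Java abstract grammar. The sketch parses one statement and serializes its tree depth-first in pre-order as a stream of node-type names. Identifier and literal values are left out, since constants are compressed separately.

```python
import ast


def preorder_symbols(node):
    """Yield AST node-type names depth-first, each parent before its children."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder_symbols(child)


tree = ast.parse("i = i + 1")
print(list(preorder_symbols(tree)))
# On Python 3.8+: ['Module', 'Assign', 'Name', 'Store', 'BinOp', 'Name', 'Load', 'Add', 'Constant']
```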

18
Types of Nodes in the AST

String, Integer, Terminal
List: e.g., Block, a list of BlockStatement
Aggregate: e.g., IF → cond thenbranch elsebranch
Choice: e.g., BinOp → Plus | Minus
the information is in the choice nodes
we want to guess which choice is taken

19
Transmitting an AST

any predefined serialization will do
we use depth-first (pre-order) traversal
when serialized, most information in the AST is redundant, e.g., the order and kind of the children of aggregate nodes are known
this is because we use knowledge of the grammar
we must encode the index of the choice made at each choice node
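
What follows is a minimal sketch of grammar-driven serialization. It assumes a small hypothetical abstract grammar written with the node kinds listed earlier; the names in `GRAMMAR` are illustrative, not the prototype's Java grammar. Each kind contributes differently to the stream:
- aggregate nodes contribute nothing, because the grammar fixes the order and kind of their children;
- choice nodes contribute the index of the alternative taken;
- lists contribute their length;
- terminals contribute their value.

In the real scheme, choice indices would go to the PPM model and arithmetic coder, and terminals to the constant tables. Here both simply go into one Python list.

```python
# Hypothetical abstract grammar: symbol -> (node kind, specification).
GRAMMAR = {
    "Stmt": ("choice", ["Assign", "While"]),
    "Assign": ("aggregate", ["Var", "Expr"]),
    "While": ("aggregate", ["Expr", "Block"]),
    "Block": ("list", "Stmt"),
    "Expr": ("choice", ["Var", "IntLit", "Binary"]),
    "Binary": ("aggregate", ["BinOp", "Expr", "Expr"]),
    "BinOp": ("choice", ["Plus", "Minus"]),
    "Plus": ("aggregate", []),
    "Minus": ("aggregate", []),
    "Var": ("terminal", str),
    "IntLit": ("terminal", int),
}


def encode_tree(symbol, tree, out):
    """Append the non-redundant part of `tree` (which derives `symbol`) to `out`."""
    kind, spec = GRAMMAR[symbol]
    if kind == "choice":
        out.append(spec.index(tree[0]))  # which alternative was taken
        encode_tree(tree[0], tree, out)
    elif kind == "aggregate":
        for child_symbol, child in zip(spec, tree[1]):
            encode_tree(child_symbol, child, out)  # nothing emitted for the node itself
    elif kind == "list":
        out.append(len(tree[1]))
        for child in tree[1]:
            encode_tree(spec, child, out)
    else:  # terminal
        out.append(tree[1])
    return out


def decode_tree(symbol, stream):
    """Rebuild a tree deriving `symbol` from an iterator produced by `encode_tree`."""
    kind, spec = GRAMMAR[symbol]
    if kind == "choice":
        return decode_tree(spec[next(stream)], stream)
    if kind == "aggregate":
        return (symbol, [decode_tree(child_symbol, stream) for child_symbol in spec])
    if kind == "list":
        return (symbol, [decode_tree(spec, stream) for _ in range(next(stream))])
    return (symbol, spec(next(stream)))  # terminal: convert with its Python type


# i = i + 1
assign_tree = ("Assign", [("Var", "i"),
                          ("Binary", [("Plus", []), ("Var", "i"), ("IntLit", 1)])])
print(encode_tree("Stmt", assign_tree, []))  # [0, 'i', 2, 0, 0, 'i', 1, 1]
```

Eight symbols suffice for i = i + 1, and five of them are small choice indices. The structure itself (an assignment has a target and an expression, a binary node has an operator and two operands) is never transmitted.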

20
Prediction by Partial Match (PPM)

dynamically maintain counts of characters seen after various contexts
contexts may be of various lengths, e.g., for abcd, the contexts for d are
length 1 context: c
length 2 context: bc
length 3 context: abc
predict characters in the current context by looking at what occurred previously
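
The bookkeeping can be sketched as a table of symbol counts per context. This is a deliberately simplified model, not full PPM: it predicts from the longest previously seen context and omits PPM's escape probabilities and exclusions. `ContextModel` and its methods are illustrative names.

```python
from collections import Counter, defaultdict


class ContextModel:
    """Counts of symbols seen after every context of length 0..max_order."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context string -> next-symbol counts

    def contexts(self, history):
        """Contexts of lengths 1..max_order that end at the current position."""
        longest = min(self.max_order, len(history))
        return [history[-k:] for k in range(1, longest + 1)]

    def update(self, history, symbol):
        """Record that `symbol` followed `history`, in every active context."""
        for ctx in [""] + self.contexts(history):
            self.counts[ctx][symbol] += 1

    def predict(self, history):
        """Probabilities from the longest context already seen (no escapes)."""
        for ctx in reversed([""] + self.contexts(history)):
            seen = self.counts.get(ctx)
            if seen:
                total = sum(seen.values())
                return {s: n / total for s, n in seen.items()}
        return {}  # nothing seen yet


model = ContextModel(max_order=3)
print(model.contexts("abc"))  # ['c', 'bc', 'abc']: the contexts for d in "abcd"
text = "abcabd"
for i, ch in enumerate(text):
    model.update(text[:i], ch)
print(model.predict("ab"))  # {'c': 0.5, 'd': 0.5}
```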

21
Maintaining Contexts

Figure: contexts maintained for the string abcd

22
Adapting PPM to Work on Trees

each node is a symbol
the context is the path from the root to the current node in the AST
problem: in a depth-first traversal, what happens when we reach a leaf node and go back up to an ancestor?
pop context: all active nodes are moved up one position to their parents (in the context tree)
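
The tree variant changes only what counts as history. Instead of the preceding characters, a node's context is the path of ancestors above it in the AST. The sketch below shows how that path can be maintained during a depth-first walk, assuming trees given as hypothetical `(label, children)` pairs. The `path.pop()` after the children corresponds to the pop-context step. Only the last `order` ancestors are used, mirroring PPM's bounded context length. The count tables themselves would work as in the string case.

```python
def path_contexts(tree, order=2):
    """Yield (context, label) pairs in pre-order; context = last `order` (>= 1) ancestors."""
    path = []  # labels from the root down to the parent of the current node

    def visit(node):
        label, children = node
        yield tuple(path[-order:]), label
        path.append(label)  # descending: this node now belongs to the context
        for child in children:
            yield from visit(child)
        path.pop()  # climbing back up: "pop context"

    yield from visit(tree)


small_ast = ("Assign", [("Var", []), ("Binary", [("Var", []), ("Lit", [])])])
for context, label in path_contexts(small_ast):
    print(context, label)
```

The two `Var` nodes are predicted in different contexts, `('Assign',)` and `('Assign', 'Binary')`. A model can therefore learn different statistics for assignment targets and for operands.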

23
Encoding

PPM is used to model the choices made at choice nodes, i.e., to associate a probability with every alternative
these probabilities are used to drive an arithmetic coder that outputs bits
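
A minimal, exact-precision sketch of how such probabilities drive an arithmetic coder follows. Each symbol narrows the current interval in proportion to its probability. The output is the shortest bit string whose dyadic interval fits inside the final one. The sketch makes these simplifications:
- Python's `fractions.Fraction` avoids rounding, whereas a practical coder would use fixed-width integers with renormalization;
- the decoder is told the symbol count, whereas in the AST setting the grammar determines when decoding stops;
- the model is a fixed dictionary that borrows the probabilities from the AST compression example in this talk, whereas with PPM it would change from symbol to symbol.

```python
from fractions import Fraction
from math import ceil

MODEL = {"j": Fraction(3, 10), "k": Fraction(1, 2), "i": Fraction(1, 5)}


def intervals(model):
    """Assign each symbol a sub-interval [lo, hi) of [0, 1) sized by its probability."""
    result, lo = {}, Fraction(0)
    for symbol, p in model.items():
        result[symbol] = (lo, lo + p)
        lo += p
    return result


def arith_encode(symbols, model):
    """Narrow [0, 1) once per symbol, then emit the shortest identifying bit string."""
    low, width = Fraction(0), Fraction(1)
    table = intervals(model)
    for s in symbols:
        lo, hi = table[s]
        low, width = low + width * lo, width * (hi - lo)
    k = 1
    while True:  # smallest k with a k-bit b such that [b/2^k, (b+1)/2^k) lies in [low, low+width)
        b = ceil(low * 2**k)
        if Fraction(b + 1, 2**k) <= low + width:
            return format(b, f"0{k}b")
        k += 1


def arith_decode(bits, model, count):
    """Repeat the encoder's interval narrowing, choosing the sub-interval holding the value."""
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    table, out = intervals(model), []
    for _ in range(count):
        for s, (lo, hi) in table.items():
            if low + width * lo <= value < low + width * hi:
                out.append(s)
                low, width = low + width * lo, width * (hi - lo)
                break
    return out


print(arith_encode(["i"], MODEL))  # 111
```

Likely symbols shrink the interval less and so cost fewer bits. This is why a good predictive model translates directly into a smaller output.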

24
Compressing Constants

constants (strings, integers, names) are a significant fraction of the source
to compress: make a table of constants, and refer to them by their index in this table
to compress further: maintain different tables for strings, names, etc., which reduces the number of bits per index
currently exploring more sophisticated context-modeling ideas for compressing constants
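
The saving from separate tables can be quantified with fixed-width indices: a table of n entries needs ⌈log₂ n⌉ bits per reference. The sketch assumes, as the split into tables implies, that the decoder already knows from the grammar position which kind of constant comes next. Under that assumption, the kind itself costs nothing. The sample constants are made up for illustration, and a real coder would model the indices rather than send them at fixed width.

```python
from math import ceil, log2


def index_bits(table_size):
    """Fixed-width bits needed to address one entry of a table of this size."""
    return ceil(log2(table_size)) if table_size > 1 else 0


def reference_cost(occurrences):
    """Bits to reference every (kind, value) occurrence: one shared table vs. per-kind tables."""
    tables = {}
    for kind, value in occurrences:
        table = tables.setdefault(kind, [])
        if value not in table:
            table.append(value)
    shared_size = sum(len(t) for t in tables.values())
    shared = len(occurrences) * index_bits(shared_size)
    separate = sum(index_bits(len(tables[kind])) for kind, _ in occurrences)
    return shared, separate


names = ["i", "j", "k", "sum", "len", "main"] + ["i"] * 4
occurrences = ([("name", n) for n in names]
               + [("string", "hello"), ("string", "world")]
               + [("int", 0), ("int", 1), ("int", 42)])
print(reference_cost(occurrences))  # (60, 38)
```

Here 11 distinct constants need 4-bit indices in one shared table. Split by kind, the tables need only 3 bits for names, 1 for strings, and 2 for integers.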

25
AST Compression Example
Figure: AST for i = i + 1, with its choice nodes marked
Relevant grammar rules:
Stmt → If | While | Assign
Assign → Lvalue Expr
Lvalue → Field | VarAccess
Expr → Unary | Binary
Binary → BinOp Expr Expr
Preorder traversal: Stmt Assign Lvalue VarAccess i Expr Binary Expr VarAccess i Expr Literal IntLiteral 1 BinOp

26
AST Compression Example
Figure: AST for i = i + 1 and its context tree

27
AST Compression Example
Figure: AST for i = i + 1 and its context tree

28
AST Compression Example
Figure: AST for i = i + 1 and its context tree
Model: Prob(j) = 0.3, Prob(k) = 0.5, Prob(i) = 0.2
Send the model and the choice i to the arithmetic coder

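
Using only the model probabilities on this slide, two quantities can be checked by hand: the cost of the transmitted choice, and the model's average cost per choice. This is a worked illustration, not a figure from the talk:

$$
\begin{aligned}
-\log_2 \mathrm{Prob}(i) &= -\log_2 0.2 = \log_2 5 \approx 2.32\ \text{bits}\\
H &= -\sum_{x \in \{i,j,k\}} \mathrm{Prob}(x)\,\log_2 \mathrm{Prob}(x) \approx 0.464 + 0.521 + 0.500 \approx 1.49\ \text{bits}
\end{aligned}
$$

Since i is the least likely alternative in this model, sending it costs more than the log₂ 3 ≈ 1.58 bits of a uniform code. The saving materializes on average, when frequent choices receive high probabilities.
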
29
Status and Results

compressor/decompressor prototype written in Python
completely generic: can be used with any abstract grammar
have implemented the Java abstract grammar
works with single Java source files as well as entire packages
comparison for Java class-file compression with Pugh's results (best published Java compressor)

30
Results: Classes
Classes from Sun's javac package (all sizes in bytes)

31
Results: Archives
compressed collections of classes (all sizes in bytes)

compressed ASTs are 5-50% smaller than Pugh's format
3-8 times smaller than uncompressed class files or JAR files

32
Performance-Enhancing Information

now raise the semantic level of the grammar, e.g., escape analysis
an object that doesn't escape its defining scope can be allocated on the stack rather than on the heap
this optimization alone can often double performance
the analysis itself is very difficult to do, but its results are easy to verify
augment the type system with escaping/non-escaping
make this part of the encoding scheme itself
⇒ e.g., a non-escaping object cannot be assigned to a variable from an enclosing scope
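
The claim that escape information is easy to check, even though it is hard to infer, can be illustrated with a checker for a toy block-structured language. Everything here is a hypothetical sketch:
- each `("var", name, flag)` declaration is taken to allocate a fresh object;
- `flag` is `"local"` (non-escaping, stack-allocatable) or `"heap"`;
- the checker enforces the rule above, plus the transitive case of passing a local object through a heap variable.

It is one linear pass over the program, with no analysis to speak of.

```python
def check_escapes(block, scopes=()):
    """Return True iff no 'local' (non-escaping) object can reach an enclosing scope.

    Statements: ("var", name, "local" | "heap"), ("assign", dst, src), ("block", stmts).
    """
    scopes = list(scopes) + [{}]  # innermost scope last
    depth = len(scopes) - 1

    def lookup(name):
        for d in range(depth, -1, -1):
            if name in scopes[d]:
                return d, scopes[d][name]
        raise NameError(name)

    for stmt in block:
        if stmt[0] == "var":
            scopes[depth][stmt[1]] = stmt[2]
        elif stmt[0] == "assign":
            dst_depth, dst_flag = lookup(stmt[1])
            src_depth, src_flag = lookup(stmt[2])
            # A local object may only flow into another local variable declared
            # in the same or a more deeply nested scope.
            if src_flag == "local" and (dst_flag != "local" or dst_depth < src_depth):
                return False
        elif stmt[0] == "block":
            if not check_escapes(stmt[1], scopes):
                return False
    return True


ok = [("var", "outer", "heap"),
      ("block", [("var", "tmp", "local"), ("var", "alias", "local"), ("assign", "alias", "tmp")])]
leak = [("var", "outer", "heap"),
        ("block", [("var", "tmp", "local"), ("assign", "outer", "tmp")])]
print(check_escapes(ok), check_escapes(leak))  # True False
```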

33
Insights So Far

abstract syntax trees are viable as a mobile-code format
they can be highly compressed
Java archives by a factor of 3-8
5-50% better than Pugh's Java-bytecode-specific compression

34
Overall Project Achievements

lead the way to a genuine improvement over virtual-machine transportation formats
security without the need for validation
tamper-proof performance-improving information
an innovative and generic program compression method as a useful by-product of this effort

35
Task Schedule

Y1 Milestones
source-level representation ⇒ Java compression
low-level representation
core calculus representation

Y2 Milestones
system prototypes
trade-off analysis
comprehensive definition of the encoding format

End of Project
system deliverable
comprehensive documentation

Timeline: 1999, 2000, 2001, 2002

investigate
multiple source languages
graph-based encoding schemes
proof-carrying code

investigate
requirements of optimizing code generators
integration of security vs. compiler-related data

investigate
mutual interaction of security, efficiency, and compression density
security of the system

36
Mobile Code Security Revisited

provided through a type-safe programming language and type-safe APIs
semantically equivalent to transporting source code (everybody does it this way)
but many policies currently cannot be expressed in terms of a type system and hence need to be implemented inside the library, e.g.:
open only files in directory X
initiate connections only with IP addresses in a given range
execute no more than N instructions between OS calls
do not send on the network after reading local files
⇒ security automata
need to represent these properties directly and support them along the whole pipeline from code producer to code consumer
⇒ some other PIs in Oasis are working on these themes, and their work can directly benefit this project
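
A security automaton for the last policy in the list can be sketched as a transition table over a trace of security-relevant events. The state and event names (`clean`, `tainted`, `read_file`, `send`, `compute`) are hypothetical. A missing transition means the event would violate the policy, so a monitor, or code that inlines it, would stop the program before that event.

```python
# Policy: "do not send on the network after reading local files".
TRANSITIONS = {
    ("clean", "compute"): "clean",
    ("clean", "send"): "clean",
    ("clean", "read_file"): "tainted",
    ("tainted", "compute"): "tainted",
    ("tainted", "read_file"): "tainted",
    # deliberately no ("tainted", "send") entry
}


def first_violation(events, state="clean"):
    """Return the index of the first event the automaton rejects, or None."""
    for index, event in enumerate(events):
        state = TRANSITIONS.get((state, event))
        if state is None:
            return index
    return None


print(first_violation(["send", "read_file", "compute", "send"]))  # 3
print(first_violation(["compute", "send", "send"]))  # None
```

Unlike a type, this property depends on the order of events at run time. This is why it is hard to express in a type system.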

37
Transition of Technology

our prototype implementation(s) will be made available in source form
the idea is to create a turnkey replacement for current Java compilers and JVM runtime systems
you simply take your code and recompile it using our compiler
it will then run on our runtime
our runtime will also run your old JVM class files
you can even mix our format with JVM class files
⇒ we simply provide a new (better!) mobile-code transportation layer without changing anything else

38
Thank You
