New Approaches to Mobile Code: Reconciling Execution Efficiency with Provable Security - PowerPoint PPT Presentation

1
New Approaches to Mobile Code: Reconciling
Execution Efficiency with Provable Security
UC Irvine project transprose: transporting
programs securely
  • Michael Franz
  • University of California, Irvine
  • July 2001

2
Introduction
  • mobile code is an enabling technology
  • download functionality as needed
  • handheld, untethered devices, information
    appliances
  • platform-independent => identical code can run on
    PDAs, desktop machines, even supercomputers
  • but, many unresolved issues with respect to
  • performance of the mobile program (on the target)
  • performance of the mobile code distribution
    mechanism
  • protecting the host against malicious mobile
    programs
  • guarding a mobile program's secrets against a
    malicious host

3
Guiding Overall Objective
  • make mobile code practical, so that
  • eventually, native code will need to exist only
    transiently, created on-the-fly and consumed on
    the spot
  • while mobile code will be used as the storage and
    distribution medium

4
Context
  • dynamic code-generation technology is approaching
    maturity and processors are becoming fast enough
    to sustain it (in real time)
  • this is rapidly diminishing the value of binary
    compatibility
  • moreover, dynamic optimization techniques yield
    better code than static compilation
  • exploit actual processor parameters (caches, ...)
  • live profiling data may be available
  • => mobile code will define future platform(s)

5
Mobile Code Security
  • most approaches are based on some type-safe
    programming language
  • host systems publish their policies in terms of
    type-safe APIs
  • conformance to that interface is then guaranteed
    by the mobile code transportation scheme
  • semantically equivalent to transporting source
    code
  • however, for efficiency and quality of dynamic
    code generation, usually want to transport a
    format closer to the machine while still
    preserving source-program type-safety semantics

6
Existing Practice: Java
  • the Java Virtual Machine is the de-facto standard
    format for distributing mobile programs
  • the JVM has an instruction set that has been
    designed specifically for representing Java
    programs
  • interestingly enough, there still are JVM
    programs for which no legal equivalent Java
    source program exists
  • there are also legal Java programs that are
    rejected by all possible JVM bytecode verifiers
    [Staerk00]
  • security is obtained by verifying the JVM
    bytecode, essentially a symbolic execution of the
    program

7
Security vs. Efficiency
  • the Java Virtual Machine's instruction format is
    not well suited to transporting the results of
    program analyses and optimizations
  • as a consequence, when Java byte-code is
    transmitted, each recipient must repeat most of
    the analyses and optimizations that could have
    been performed just once at the origin
  • the main reason why Java byte-code has these
    deficiencies is to allow verification by the
    recipient

8
Security vs. Efficiency
  • for example, a code producer often has
    information about the redundancy of a type or
    index check
  • but this fact cannot be communicated safely to
    the code consumer - not in a manner that the
    recipient can be sure that this is not a false
    claim inserted by a malicious third party
  • similar concerns inhibit common compiler
    optimizations such as common subexpression
    elimination at the code producer's side

9
An Alternative Approach: PCC
  • instead of executing the program symbolically at
    the receiver's site (which is time consuming and
    complex), the code producer attaches a proof
    that the code is correct
  • the proof shortcuts the verification: checking
    a given solution is often much simpler than
    finding it in the first place
  • the Java KVM for embedded devices uses a kind of
    PCC (stack maps) that may become a standard for
    Java

10
A Third Approach
  • instead of verifying or checking, we have been
    investigating a class of mobile code
    representations that can provably encode only
    legal programs
  • security is obtained by construction
  • the need for verification disappears
  • our approach can provide the identical security
    guarantees as the Java Virtual Machine, but it
    can express most of them statically as a
    well-formedness property of the encoding itself
  • in our solution, an incoming mobile program may
    not do the intended task, but it will not do
    anything bad - for any definition of bad that
    can be cast into a type system
  • interestingly enough, such intrinsically secure
    mobile code is also denser than virtual machine
    code, and makes it possible to generate better
    object code, and to generate it faster

11
A Third Approach: Two Variants
  • we have in fact designed not just one, but two
    alternative mobile-code representations, both of
    which provide security by construction
  • they differ in the semantic level at which they
    describe the mobile program
  • high-level: close to the source language but
    with supporting compiler-related information
  • low-level: as close as possible to what a modern
    code generator back-end needs without being
    target-machine specific

12
Rationale for Multi-Track Approach
  • the relative trade-offs (encoding density vs.
    decoding/dynamic compilation speed vs. code
    quality) are completely unknown and can only be
    determined by collecting experience with actual
    prototypes
  • by implementing both the high-level and the
    low-level solution, we are exploring the design
    space rather than designing an ad-hoc solution

13
Low-Level Encoding [PLDI01]
  • SafeTSA preserves control and dataflow
    information as well as full typing for each
    intermediate result
  • it is based on SSA form, a representation that is
    also used internally by a number of important
    state-of-the-art research compilers for Java,
    e.g.,
  • IBM T.J. Watson Lab: Jalapeño
  • Microsoft: Marmot
  • Sun Microsystems: HotSpot Server
  • SafeTSA is far easier to parse into a form useful
    for code optimization than JVM-code

14
Current Status and Results
  • based on Martin Odersky's Pizza front-end
  • can compile all of Java to SafeTSA
  • prototype run-time environment almost finished;
    will provide full interoperability between
    SafeTSA and JVM-based class files
  • can mix and match both formats with dynamic
    loading
  • call-backs from JVM to SafeTSA are ugly
  • SafeTSA representation is surprisingly small

15
High-Level Encoding [Babel01]
  • ultra-compact representation using grammar-based
    compression of abstract syntax trees
  • goal is to transport the source program along
    with as much compiler-related support information
    as possible

16
Schematic Overview
[Figure: Source -> Parser (classic frontend) -> AST Encoder -> PPM-Model Arithmetic Encoder -> bit stream (011000101010) -> PPM-Model Arithmetic Decoder -> AST Decoder -> Code Generator (classic backend). The encoder and decoder stages form the compression / decompression layer.]

17
Compression Overview
  • Parsing: get AST from source
  • Serialize: get stream of symbols from AST
  • Modeling: use context and abstract grammar to
    build predictive statistical model
  • Coding: use arithmetic coding with model

18
Types of nodes in AST
  • String, Integer, Terminal
  • List, e.g. Block -> BlockStatement*
  • Aggregate, e.g. IF -> cond thenbranch elsebranch
  • Choice, e.g. BinOp -> Plus | Minus
  • Information is in choice nodes
  • want to guess which choice is taken

19
Transmitting an AST
  • any predefined serialization will do
  • we use depth first (pre-order)
  • when serialized, most info in AST is redundant,
    e.g.
  • the order and kind of children of aggregate nodes are known
  • this is because we use knowledge of the grammar
  • must encode index of choice made at choice nodes
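
The serialization described above can be sketched as a pre-order walk that is driven by the grammar and emits only choice indices and leaf payloads. This is a minimal illustration, not the project's encoder: the GRAMMAR table, the (kind, payload) node layout, and the function names are assumptions, and the grammar is a small fragment loosely modeled on the rules shown later in the deck.

```python
# Hypothetical abstract grammar fragment. Alternatives that are named but
# not defined (If, While, Field, Unary) are simply never used below.
GRAMMAR = {
    "Stmt":      ("choice", ["If", "While", "Assign"]),
    "Assign":    ("aggregate", ["Lvalue", "Expr"]),
    "Lvalue":    ("choice", ["Field", "VarAccess"]),
    "Expr":      ("choice", ["Unary", "Binary", "VarAccess", "Literal"]),
    "Binary":    ("aggregate", ["BinOp", "Expr", "Expr"]),
    "BinOp":     ("choice", ["Plus", "Minus"]),
    "Plus":      ("aggregate", []),
    "Minus":     ("aggregate", []),
    "VarAccess": ("leaf", None),
    "Literal":   ("leaf", None),
}


def serialize(expected, node, out):
    """Pre-order walk; `expected` is the grammar symbol required here."""
    kind, payload = node
    rule, parts = GRAMMAR[expected]
    if rule == "choice":
        out.append(parts.index(kind))      # only the choice index is sent
        serialize(kind, node, out)
    elif rule == "aggregate":
        # Order and kind of children are fixed by the grammar: nothing sent.
        for part, child in zip(parts, payload):
            serialize(part, child, out)
    else:
        out.append(payload)                # leaf: send the constant itself


def deserialize(expected, stream):
    """Inverse walk: the grammar tells the decoder what to expect next."""
    rule, parts = GRAMMAR[expected]
    if rule == "choice":
        return deserialize(parts[next(stream)], stream)
    if rule == "aggregate":
        return (expected, [deserialize(part, stream) for part in parts])
    return (expected, next(stream))


# An assignment of the form "i = i + 1" as a (kind, payload) tree.
tree = ("Assign", [
    ("VarAccess", "i"),
    ("Binary", [("Plus", []), ("VarAccess", "i"), ("Literal", 1)]),
])

out = []
serialize("Stmt", tree, out)
restored = deserialize("Stmt", iter(out))
```

Every index in the stream selects among alternatives that are legal at that position, which is why only the choices carry information worth modeling.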

20
Prediction by Partial Match (PPM)
  • dynamically maintain counts of characters seen
    after various contexts
  • contexts may be of various lengths
  • e.g. for abcd, the contexts for d are:
  • length 1 context: c
  • length 2 context: bc
  • length 3 context: abc
  • predict characters in current context by looking
    at what occurred previously
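
The counting scheme can be sketched in a few lines. This is a simplified illustration, assuming a maximum context length of 3 and a plain fallback to the longest context already seen; a full PPM model also reserves escape probability for unseen symbols, which is omitted here. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_context_counts(text, max_order=3):
    """Count which character follows each context of length 1..max_order."""
    counts = defaultdict(Counter)
    for pos, char in enumerate(text):
        for order in range(1, max_order + 1):
            if pos - order < 0:
                break                      # not enough history yet
            counts[text[pos - order:pos]][char] += 1
    return counts


def predict(counts, history, max_order=3):
    """Relative frequencies taken from the longest context seen before."""
    for order in range(min(max_order, len(history)), 0, -1):
        context = history[-order:]
        if context in counts:
            total = sum(counts[context].values())
            return {c: n / total for c, n in counts[context].items()}
    return {}


counts = build_context_counts("abcd")
prediction = predict(build_context_counts("abcabd"), "ab")
```

For "abcd" the character d is recorded under the contexts c, bc, and abc, matching the list on the slide.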

21
Maintaining Contexts

[Figure: context tree for the symbols a, b, c, d]

22
Adapting PPM To Work On Trees
  • each node is a symbol
  • the context is the path from the root to the
    current node in the AST
  • problem: in DFS, what happens when we reach a
    leaf node and go back up to an ancestor?
  • pop context: all active nodes are moved up one
    position to their parents (in the context tree)
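
A rough sketch of path-based contexts follows. It assumes nodes are (label, children) pairs and that the context is the last max_order labels on the path from the root; the recursion pushes a label when descending and restores the parent's context on return, which stands in for the "pop context" step. This illustrates the idea only, not the prototype's data structure.

```python
from collections import Counter, defaultdict


def count_tree_contexts(node, counts, path=(), max_order=2):
    """DFS over (label, children) nodes, counting each label under the
    last `max_order` labels on the path from the root."""
    label, children = node
    context = path[-max_order:] if max_order else ()
    counts[context][label] += 1
    for child in children:
        # Descending pushes the current label; returning from the call
        # pops it again, so siblings see the same parent context.
        count_tree_contexts(child, counts, path + (label,), max_order)
    return counts


tree = ("Assign", [
    ("VarAccess", []),
    ("Binary", [("Plus", []), ("VarAccess", []), ("Literal", [])]),
])
counts = count_tree_contexts(tree, defaultdict(Counter))
```

The same label (VarAccess) is counted under two different contexts, which is what lets the model predict differently for the left-hand side of an assignment and for an operand of a binary expression.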

23
Encoding
  • PPM is used to model the choices made at choice
    nodes, i.e. associate a probability with every
    choice
  • these probabilities are used to drive an
    arithmetic coder to output bits
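
To show how probabilities drive the coder, the sketch below narrows an interval once per choice using exact fractions. It is an idealized illustration: practical arithmetic coders work with finite precision and emit bits incrementally, and the two models and their probabilities here are made up for the example.

```python
from fractions import Fraction


def narrow(low, high, model, symbol):
    """Shrink [low, high) to the sub-interval the model assigns to symbol."""
    width = high - low
    cumulative = Fraction(0)
    for candidate, prob in model:
        if candidate == symbol:
            return low + width * cumulative, low + width * (cumulative + prob)
        cumulative += prob
    raise KeyError(symbol)


def encode(choices):
    """choices: list of (model, symbol) pairs; returns the final interval."""
    low, high = Fraction(0), Fraction(1)
    for model, symbol in choices:
        low, high = narrow(low, high, model, symbol)
    return low, high


def decode(value, models):
    """Recover the symbols from any value inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    symbols = []
    for model in models:
        for candidate, _ in model:
            lo, hi = narrow(low, high, model, candidate)
            if lo <= value < hi:
                symbols.append(candidate)
                low, high = lo, hi
                break
    return symbols


# Hypothetical models for two successive choice nodes.
stmt_model = [("If", Fraction(1, 4)), ("While", Fraction(1, 4)),
              ("Assign", Fraction(1, 2))]
op_model = [("Plus", Fraction(3, 4)), ("Minus", Fraction(1, 4))]

low, high = encode([(stmt_model, "Assign"), (op_model, "Plus")])
decoded = decode(low, [stmt_model, op_model])
```

The width of the final interval is the product of the chosen probabilities, so likely choices cost few bits and unlikely ones cost more.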

24
Compressing Constants
  • constants (strings, integers, names) are a
    significant fraction of source
  • to compress: make a table of constants, and refer
    to them by their index in this table
  • to compress further: maintain separate tables for
    strings, names, etc.; this reduces the number of
    bits in each index
  • currently exploring more sophisticated context
    modeling ideas for compressing constants
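
The effect of separate tables on index width can be sketched as follows. The example assumes fixed-width indices and a made-up list of constants; the function names are hypothetical and the prototype may encode indices differently.

```python
def build_tables(constants):
    """constants: iterable of (kind, value) pairs. Returns the per-kind
    tables and the stream of (kind, index) references."""
    tables, refs = {}, []
    for kind, value in constants:
        table = tables.setdefault(kind, [])
        if value not in table:
            table.append(value)
        refs.append((kind, table.index(value)))
    return tables, refs


def index_bits(size):
    """Bits needed for a fixed-width index into a table of `size` entries."""
    return max(1, (size - 1).bit_length())


constants = [("name", "i"), ("name", "i"), ("int", 1),
             ("string", "hello"), ("name", "j"), ("int", 1)]
tables, refs = build_tables(constants)

shared_size = sum(len(table) for table in tables.values())
bits_shared = index_bits(shared_size)            # one table for everything
bits_names = index_bits(len(tables["name"]))     # separate table for names
```

With four distinct constants, a single shared table needs 2 bits per reference, while the two-entry name table needs only 1.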

25
AST Compression Example
[Figure: AST for i = i + 1, with choice nodes marked]
Relevant grammar rules:
  Stmt -> If | While | Assign | ...
  Assign -> Lvalue Expr
  Lvalue -> Field | VarAccess
  Expr -> Unary | Binary
  Binary -> BinOp Expr Expr
Preorder traversal: Stmt Assign Lvalue VarAccess i
Expr Binary Expr VarAccess i Expr
Literal IntLiteral 1 BinOp

26
AST Compression Example
[Figure: AST for i = i + 1 alongside its context tree]

27
AST Compression Example
[Figure: AST for i = i + 1 alongside its context tree]

28
AST Compression Example
[Figure: AST for i = i + 1 alongside its context tree]
Model: Prob(j) = 0.3, Prob(k) = 0.5, Prob(i) = 0.2
Send model and choice i to arithmetic coder
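
As a rough guide to what this step costs, an ideal arithmetic coder spends about -log2(p) bits on a choice of probability p. The snippet below applies that standard formula to the probabilities shown on the slide; the function name is hypothetical.

```python
import math

# Probabilities taken from the slide's model.
model = {"j": 0.3, "k": 0.5, "i": 0.2}


def ideal_bits(model, choice):
    """Information content of `choice` under `model`, in bits."""
    return -math.log2(model[choice])


bits_for_i = ideal_bits(model, "i")   # a little over 2.3 bits
bits_for_k = ideal_bits(model, "k")   # 1 bit
```

Sending the less likely choice i costs more than twice as much as sending k, which is why a sharper model translates directly into a smaller file.
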
29
Status and Results
  • compressor/decompressor prototype written in
    Python
  • completely generic: can be used with any
    abstract grammar
  • have implemented the Java abstract grammar
  • works with single Java source files as well as
    entire packages.
  • comparison for Java class-file compression with
    Pugh's results (best published Java compressor)

30
Results: Classes
Classes from Sun's javac package - all sizes in
bytes

31
Results: Archives
compressed collections of classes - all sizes in
bytes
  • compressed ASTs are 5-50% smaller than Pugh's
  • 3-8 times smaller than uncompressed class files
    or JAR files

32
Performance-Enhancing Information
  • now raise the semantic level of the grammar
  • e.g. Escape Analysis
  • an object that doesn't escape its defining
    scope can be allocated on the stack rather than
    on the heap
  • this optimization alone can often double
    performance
  • the analysis itself is very difficult to do, but
    the results of the analysis are easy to verify
  • augment the type system by escaping/non-escaping
  • make this part of the encoding scheme itself
  • e.g., => a non-escaping object cannot be assigned
    to a variable from an enclosing scope
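
The rule in the last bullet is cheap to check once the producer has annotated each object. The sketch below is a deliberately reduced illustration: it models scopes as nesting depths and looks only at plain variable assignments, whereas a real check would also have to cover fields, return values, and parameters. The Var type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    scope_depth: int   # 0 = outermost scope, larger = more deeply nested
    escaping: bool     # producer's annotation for the object it holds


def check_assignment(target: Var, source: Var) -> bool:
    """Accept `target = source` unless it would store a non-escaping
    object in a variable that belongs to an enclosing scope."""
    if source.escaping:
        return True
    return target.scope_depth >= source.scope_depth


outer = Var("cache", scope_depth=0, escaping=True)
local = Var("tmp", scope_depth=2, escaping=False)
sibling = Var("tmp2", scope_depth=2, escaping=False)

leak_allowed = check_assignment(outer, local)     # rejected
copy_allowed = check_assignment(sibling, local)   # accepted
```

The consumer only has to run this comparison per assignment; it never repeats the expensive analysis that produced the annotations.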

33
Insights So Far
  • abstract syntax trees viable as a mobile code
    format
  • can be highly compressed
  • Java archives by factor of 3-8
  • 5-50% better than Java-bytecode-specific
    compression by Pugh

34
Overall Project Achievements
  • lead the way to a genuine improvement over
    virtual machine transportation formats
  • security without need for validation
  • tamper-proof performance-improving information
  • innovative and generic program compression method
    as a useful by-product of this effort

35
Task Schedule
  • Y1 Milestones
  • source-level representation => Java compression
  • low-level representation
  • core calculus representation
  • Y2 Milestones
  • system prototypes
  • trade-off analysis
  • encoding format comprehensive definition
  • End of Project
  • system deliverable
  • comprehensive documentation

1999
2000
2001
2002
  • investigate
  • multiple source languages
  • graph-based encoding schemes
  • proof-carrying code
  • investigate
  • requirements of optimizing code generators
  • integration of security vs. compiler-related data
  • investigate
  • mutual interaction of security, efficiency, and
    compression density
  • security of system

36
Mobile Code Security Revisited
  • provided through type-safe programming language
    and type-safe APIs
  • semantically equivalent to transporting source
    code (everybody does it this way)
  • but many policies currently cannot be expressed
    in terms of a type system and hence need to be
    implemented inside the library
  • open only files in directory X
  • initiate connections only with IP addresses in
    range
  • execute no more than N instructions between OS
    calls
  • do not send on network after reading local
    files
  • => security automata
  • need to represent these properties directly and
    support them along the whole pipeline from code
    producer to code consumer
  • => some other PIs in Oasis are working on these
    themes and their work can be directly beneficial
    to this project
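
The policy "do not send on network after reading local files" is a typical candidate for a security automaton. A minimal two-state sketch is shown below, assuming the program's behaviour is observed as a sequence of event names; the event names and class are hypothetical and not tied to any particular runtime.

```python
class NoSendAfterRead:
    """Two-state automaton: once a local file has been read,
    network sends are refused."""

    def __init__(self):
        self.has_read_file = False

    def step(self, event):
        """Return True if the event is allowed, updating the state."""
        if event == "read_file":
            self.has_read_file = True
            return True
        if event == "send":
            return not self.has_read_file
        return True   # all other events are unrestricted


def first_violation(events):
    """Index of the first refused event, or None if the trace is allowed."""
    monitor = NoSendAfterRead()
    for position, event in enumerate(events):
        if not monitor.step(event):
            return position
    return None


violation = first_violation(["send", "read_file", "compute", "send"])
clean = first_violation(["send", "send", "compute"])
```

The policy depends on the history of events rather than on the type of any single call, which is why it does not fit into a type-safe API alone.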

37
Transition of Technology
  • our prototype implementation(s) will be made
    available in source form
  • the idea is to create a turnkey replacement for
    current Java compilers and JVM runtime systems
  • you simply take your code and recompile using our
    compiler
  • it will then run on our runtime
  • our runtime will also run your old JVM class
    files
  • you can even mix our stuff with JVM class files
  • => we simply provide a new (better!) mobile code
    transportation layer without changing anything
    else

38
Thank You