Implementing Oblivious Hashing Using Overlapped Instruction Encodings

1
Implementing Oblivious Hashing Using Overlapped Instruction Encodings
Mariusz H. Jakubowski and Ramarathnam Venkatesan, Microsoft Research
Matthias Jacob, Nokia Research
  • ACM Multimedia and Security '07
  • Dallas, TX (USA)
  • September 20-21, 2007

2
Introduction
  • Field of work: Software protection
  • Obfuscation and tamper-resistance
  • Prevention (or delaying) of reverse engineering
    and hacking
  • Securing of content-rights systems (DRM)
  • Background: Two specific protection techniques
  • Oblivious hashing (OH): Computing hashes
    (fingerprints) of execution traces
  • Overlapped code: Jumping into the middle of
    instructions to obfuscate and protect against
    disassembly
  • Goals of our work
  • Apply overlapped code towards obfuscation and
    tamper-resistance via OH.
  • Study new techniques in terms of formal models,
    avoiding ad hoc approaches.

3
Overview
Oblivious hashing via overlapped code
  • Introduction
  • Background
  • Software protection
  • Oblivious hashing (OH)
  • Overlapped code
  • Code interleaving
  • Conclusion

4
Software Protection
  • Obfuscation: Making programs hard to understand
  • Tamper-resistance: Making programs hard to modify
  • Obfuscation ? tamper-resistance
  • Tamper-resistance ? obfuscation?

5
Formal Obfuscation
  • Impossible in general
  • Black-box model (Barak et al.)
  • Source code doesn't help an adversary who can
    examine input-output behavior.
  • Worst-case programs and poly-time attackers
  • Possible in specific limited scenarios
  • Secret hiding by hashing (Lynn et al.)
  • Point functions (Wee, Kalai et al.)
  • Results difficult to use in practice.

6
Tamper-Resistance
  • Many techniques used in practice, e.g.:
  • Code-integrity checksums (e.g., Atallah et al.'s
    software guards)
  • Anti-debugging and anti-disassembly methods
  • Virtual machines and interpreters
  • Polymorphic and metamorphic code
  • Never-ending battle in a very active field
  • Targets: DRM, CD/DVD protection, games, dongles,
    licensing, etc.
  • Defenses: Binary packers and cryptors, special
    compilers, transformation tools, programming
    strategies, etc.
  • Current techniques tend to be ad hoc
  • No provable security
  • No analysis of time required to crack protected
    instances

7
Tamper-Resistance Model
Abstraction of software tamper-resistance (Dedic
et al., IH '07)
  • Program: A graph G
  • Execution: A random walk on G
  • Integrity checks
  • Probabilistic monitoring of a set of G's nodes
  • Detection of failures that lead to delayed
    responses
  • Security analysis: Graph game on G between
    attacker and defender
  • OH and overlapped code in context of model
  • Provide a source of integrity checks.
  • Help enforce local indistinguishability and
    other engineering assumptions about
    implementation.

8
Oblivious Hashing
  • Computation of hashes over program traces
  • Initialize hash values at specific points.
  • Update hashes upon assignments and branches.

Original code:
    int x = 123;
    if (GetUserInput() > 10) {
        x = x + 1;
    } else {
        printf("Hello\n");
    }

Hashed code (after the hash transform):
    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (GetUserInput() > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);
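
The hash transform above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the mixing function (an FNV-style multiply-and-XOR on 32 bits), the branch identifiers, the `increment` parameter used to model tampering, and the table of expected hashes are all illustrative choices, not details taken from the slides.

```python
MASK = 0xFFFFFFFF
BRANCH_ID_1, BRANCH_ID_2 = 0xB1, 0xB2  # hypothetical branch identifiers


def update_hash(h, value):
    """Fold one traced value into the running hash (toy FNV-style mixing)."""
    return ((h ^ (value & MASK)) * 16777619) & MASK


def hashed_program(user_input, increment=1):
    """Hashed version of the slide's example; `increment` models tampering."""
    h = 2166136261                        # INITIALIZE_HASH
    output = []
    x = 123
    h = update_hash(h, x)                 # hash the assignment
    if user_input > 10:
        h = update_hash(h, BRANCH_ID_1)   # hash the branch taken
        x = x + increment
        h = update_hash(h, x)
    else:
        h = update_hash(h, BRANCH_ID_2)
        output.append("Hello")
    return x, h, output


# Reference hashes recorded from untampered runs, one per execution path.
EXPECTED = {True: hashed_program(11)[1], False: hashed_program(0)[1]}


def verify_hash(user_input, h):
    """VERIFY_HASH: compare the trace hash with the recorded reference."""
    return h == EXPECTED[user_input > 10]
```

Calling `hashed_program(11, increment=2)` simulates a patched assignment; the resulting hash no longer matches the reference for that path, so `verify_hash` returns False.
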
9
Overlapped Code
  • Code sharing among different paths
  • Semantic: Sharing of code blocks among execution
    paths.
  • Physical: Sharing of code bytes among machine or
    byte-code instructions.
  • Purposes
  • Anti-disassembly and anti-decompilation
  • Obfuscation
  • Tamper-resistance from code sharing and explicit
    OH

10
Semantic Overlap
increase_win():     increase_ctr(win);   return win
increase_loss():    increase_ctr(loss);  return loss
increase_ctr(ctr):  (ctr)++              (shared code section)
  • Code section is shared along different paths
  • Automated via code outlining
11
Physical Overlap

Execution and disassembly depend on entry point
into code.
Sample x86 code: B8 B8 04 05 2D 05 90
(?? marks immediate bytes that lie beyond the sample.)
Offset 0:  B8 B8 04 05 2D    mov eax, 2D0504B8h
           05 90             add eax, ??????90h
Offset 1:  B8 04 05 2D 05    mov eax, 052D0504h
           90                nop
Offset 2:  04 05             add al, 5
           2D 05 90          sub eax, ????9005h
Offset 3:  05 2D 05 90       add eax, ??90052Dh
Offset 4:  2D 05 90          sub eax, ????9005h
Offset 5:  05 90             add eax, ??????90h
Note: Disassembly tends to resynchronize
naturally, but we can prevent this.
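
A minimal sketch of why the entry point matters: the decoder below knows only the five opcodes that occur in the sample bytes (B8, 05, 2D, 04, 90) and their total lengths, then performs a linear sweep from a chosen offset. The table and the function name `linear_sweep` are illustrative assumptions; a real x86 decoder handles far more encodings.

```python
SAMPLE = bytes.fromhex("B8B804052D0590")

# opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0xB8: ("mov eax, imm32", 5),
    0x05: ("add eax, imm32", 5),
    0x2D: ("sub eax, imm32", 5),
    0x04: ("add al, imm8", 2),
    0x90: ("nop", 1),
}


def linear_sweep(code, offset):
    """Decode instructions one after another, starting at `offset`."""
    listing = []
    while offset < len(code):
        name, length = OPCODES[code[offset]]
        raw = code[offset:offset + length]
        text = " ".join(f"{b:02X}" for b in raw)
        if len(raw) < length:
            # The instruction needs bytes beyond the end of the sample.
            listing.append((offset, text, name + " (truncated)"))
            break
        listing.append((offset, text, name))
        offset += length
    return listing


for start in range(6):
    print(f"Offset {start}:")
    for addr, raw, name in linear_sweep(SAMPLE, start):
        print(f"  {addr}: {raw:<16}{name}")
```

Every starting offset yields a different but valid instruction stream from the same seven bytes.
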
12
Disassembly Synchronization
  • Often observed in practice, but previously not
    explained mathematically.
  • Limits effectiveness of code overlapping for
    security.
  • Requires explicit anti-synchronization measures
    to enforce protection.
  • Rigorous explanation: Kruskal count

Example of corruption and synchronization

Original code:
00411410 55                 push ebp
00411411 8B EC              mov  ebp,esp
00411413 81 EC C0 00 00 00  sub  esp,0C0h
00411419 53                 push ebx
0041141A 56                 push esi
0041141B 57                 push edi
0041141C 8D BD 40 FF FF FF  lea  edi,[ebp-0C0h]

After corrupting the byte at 00411413 (81 becomes 12):
00411410 55                 push ebp
00411411 8B EC              mov  ebp,esp
00411413 12 EC              adc  ch,ah                  (corrupted byte)
00411415 C0 00 00           rol  byte ptr [eax],0
00411418 00 53 56           add  byte ptr [ebx+56h],dl
0041141B 57                 push edi                    (synchronization point)
0041141C 8D BD 40 FF FF FF  lea  edi,[ebp-0C0h]
13
Disassembly Synchronization
  • Disassembly: A leapfrog process over code bytes
  • Each byte address contains an instruction of a
    definite length.
  • After disassembling an instruction, a
    disassembler skips to the next instruction.
  • Example: Sequence of instruction lengths at
    consecutive offsets: 3 4 6 2 6 3 4 5 3 3 5 4 2 7
    3 1 4

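Sequence of instruction lengths:  3 4 6 2 6 3 4 5 3 3 5 4 2 7 3 1 4
Disassembly at offset 0:          3 2 3 3 4 1 4
Disassembly at offset 1:          4 3 3 4 1 4
Disassembly at offset 2:          6 3 4 1 4
Disassembly at offset 3:          2 3 3 4 1 4
Disassembly at offset 4:          6 5 1 4
Synchronization point: all five disassemblies reach
offset 15 (length 1) and coincide from there on.
Kruskal count: Such disassembly synchronizes in
about B^2/16 steps, where B is the average number of
bytes per instruction.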
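
The leapfrog process on this slide's length sequence can be replayed directly. The sketch below treats the list index as the byte offset and the list value as the length of the instruction decoded there; the helper name `leapfrog` is an illustrative assumption.

```python
# Instruction length found at each consecutive byte offset (from the slide).
LENGTHS = [3, 4, 6, 2, 6, 3, 4, 5, 3, 3, 5, 4, 2, 7, 3, 1, 4]


def leapfrog(lengths, start):
    """Return the offsets visited when disassembling from `start`."""
    visited = []
    pos = start
    while pos < len(lengths):
        visited.append(pos)
        pos += lengths[pos]      # skip to the next instruction
    return visited


# Disassemble from each of the first five offsets.
traces = {start: leapfrog(LENGTHS, start) for start in range(5)}

# Offsets visited by every trace; the smallest one is where all of them
# have synchronized.
common = set.intersection(*(set(t) for t in traces.values()))
sync_point = min(common)

for start, trace in traces.items():
    print(start, [LENGTHS[p] for p in trace])
print("synchronization at offset", sync_point)
```

Printing the lengths along each trace reproduces the five rows of the slide's diagram.
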
14
Disassembly Synchronization
Model of the disassembly process
  • Let InstructionLength(address) = length of
    instruction found at address.
  • Starting at slightly different addresses x and
    y, a disassembler iterates
  • x ← x + InstructionLength(x)
    (leapfrog x)
  • y ← y + InstructionLength(y)
    (leapfrog y)
  • Our goal: Compute N = approximate number of steps
    before any intermediate x is equal to any
    intermediate y.
  • Treat all possible values of x-y as states of a
    Markov chain.
  • N is the coupling time of this Markov chain.
  • Kruskal count: N is about B^2/16, where B is the
    average instruction length.
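
The model can be explored empirically with a small Monte Carlo sketch. Several things here are assumptions rather than content from the slides: instruction lengths are drawn independently and uniformly from 1..max_len, a "step" is counted as one leap of whichever pointer is currently behind, and the function names are illustrative. The sketch estimates the coupling time under those assumptions; it is not meant to confirm the quoted constant.

```python
import random


def coupling_steps(rng, max_len, gap=1):
    """Count leaps until two leapfrog walks started `gap` bytes apart meet."""
    lengths = {}                          # lazily generated random "code"

    def instruction_length(addr):
        if addr not in lengths:
            lengths[addr] = rng.randint(1, max_len)
        return lengths[addr]

    x, y, steps = 0, gap, 0
    while x != y:
        # Advancing the lagging pointer finds the first common address,
        # because both walks only ever move forward.
        if x < y:
            x += instruction_length(x)
        else:
            y += instruction_length(y)
        steps += 1
    return steps


def mean_coupling_steps(max_len, trials=2000, seed=0):
    """Average coupling time over independent random code sequences."""
    rng = random.Random(seed)
    total = sum(coupling_steps(rng, max_len) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for max_len in (2, 4, 8, 16):
        average_length = (max_len + 1) / 2
        print(average_length, mean_coupling_steps(max_len))
```
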

15
Code Interleaving
  • A method to overlap arbitrary code blocks
  • Explicitly prevents disassembly resynchronization
  • Adds tamper-resistance
  • Hash of instruction bytes only (like traditional
    code checksums)
  • Hash of instruction bytes and program state (like
    oblivious hashing)
  • Basic algorithm
  • Code interspersing: Create a block of interleaved
    instructions from two code blocks.
  • Code merging: Inject hashing instructions
    overlapped with existing instructions.

16
Code Interleaving: Basic Idea
Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B
17
Code Interleaving: Basic Idea
Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B
After code interspersing:
SEQ1: INST_1
      JMP L2
SEQ2: INST_A
      JMP LB
L2:   INST_2
      JMP L3
LB:   INST_B
L3:
  • Code interspersing: Interleave instructions,
    injecting jumps as needed to maintain control
    flow.
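
Code interspersing can be sketched on symbolic instructions. The version below is a minimal illustration, assuming two blocks of equal length and straight-line code; the label names (S1_i, S2_i, END) and both helper functions are hypothetical, chosen only to show that each entry point still executes its own original sequence.

```python
def intersperse(seq1, seq2):
    """Interleave two equally long blocks, adding jumps to keep control flow."""
    assert len(seq1) == len(seq2)
    n = len(seq1)
    lines = []                                    # (label or None, instruction)
    for i in range(n):
        last = i == n - 1
        next1 = "END" if last else f"S1_{i + 1}"
        lines.append((f"S1_{i}", seq1[i]))
        lines.append((None, f"JMP {next1}"))      # hop over the SEQ2 piece
        lines.append((f"S2_{i}", seq2[i]))
        if not last:
            lines.append((None, f"JMP S2_{i + 1}"))  # hop over the SEQ1 piece
    lines.append(("END", None))
    return lines


def run(lines, entry):
    """Follow control flow from `entry`; return the non-jump instructions run."""
    index = {label: i for i, (label, _) in enumerate(lines) if label}
    pc, executed = index[entry], []
    while lines[pc][1] is not None:
        instr = lines[pc][1]
        if instr.startswith("JMP "):
            pc = index[instr[4:]]
        else:
            executed.append(instr)
            pc += 1
    return executed


block = intersperse(["INST_1", "INST_2"], ["INST_A", "INST_B"])
for label, instr in block:
    print(f"{(label + ':') if label else '':<6}{instr or ''}")
```

For the two-instruction blocks on the slide this produces the same layout as the interspersed listing, with S1_1, S2_1 and END playing the roles of L2, LB and L3.
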

18
Code Interleaving: Basic Idea
Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B
After code interspersing:
SEQ1: INST_1
      JMP L2
SEQ2: INST_A
      JMP LB
L2:   INST_2
      JMP L3
LB:   INST_B
L3:
After code merging:
Disassembly at SEQ1:
SEQ1: INST_1
      HASH_1
      INST_2
      HASH_2
Disassembly at SEQ2:
SEQ2: INST_A
      HASH_A
      INST_B
  • Code interspersing: Interleave instructions,
    injecting jumps as needed to maintain control
    flow.
  • Code merging: Replace jumps with hash
    instructions, maintaining control flow.
  • E.g., JMP L2, INST_A, JMP LB transforms into
    HASH_1.
  • HASH_1 contains INST_A and part of HASH_A.
  • Suitable hash instructions must be found (and fit
    together like puzzle pieces).
  • Various possibilities identified on x86.
  • Can also design custom byte-codes to maximize
    utility of overlapping.
19
Code Interleaving Example
Two input code blocks (x86):
SEQ1: C1 E0 02           shl eax, 2
I11:  40                 inc eax
      C3                 ret
SEQ2: 48                 dec eax
I21:  C1 E8 03           shr eax, 3
      C3                 ret

After code interspersing:
SEQ1: C1 E0 02           shl eax, 2
      EB 03              jmp I11
SEQ2: 48                 dec eax
      EB 04              jmp I21
I11:  90                 nop
      40                 inc eax
      EB 03              jmp O
I21:  C1 E8 03           shr eax, 3
O:    90                 nop
      C3                 ret

After code merging (OH instructions marked):
Disassembly at SEQ1:
SEQ1: C1 E0 02           shl eax, 2
      81 F1 48 81 E9 90  xor ecx, 90E98148h    (OH)
I11:  40                 inc eax
      81 C1 C1 E8 03 90  add ecx, 9003E8C1h    (OH)
O:    C3                 ret
Disassembly at SEQ2:
SEQ2: 48                 dec eax
      81 E9 90 40 81 C1  sub ecx, 0C1814090h   (OH)
I21:  C1 E8 03           shr eax, 3
O:    90                 nop
      C3                 ret
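
To check that the merged bytes behave as the slide describes, the sketch below emulates only the handful of x86 encodings that appear in the example and runs the same byte string from both entry points. The toy emulator, its function name `run`, and the choice to start ecx at 0 are assumptions made for illustration; it is not a general x86 interpreter.

```python
# Merged code bytes from the slide: SEQ1 enters at offset 0, SEQ2 at offset 5.
CODE = bytes.fromhex("C1E00281F14881E9904081C1C1E80390C3")
MASK = 0xFFFFFFFF


def run(code, entry, eax, ecx=0):
    """Execute from `entry` until ret; return (eax, ecx, executed mnemonics)."""
    pc, trace = entry, []
    while True:
        op = code[pc]
        if op == 0xC3:                                   # ret
            trace.append("ret")
            return eax, ecx, trace
        if op == 0x90:                                   # nop
            trace.append("nop")
            pc += 1
        elif op == 0x40:                                 # inc eax
            eax = (eax + 1) & MASK
            trace.append("inc eax")
            pc += 1
        elif op == 0x48:                                 # dec eax
            eax = (eax - 1) & MASK
            trace.append("dec eax")
            pc += 1
        elif op == 0xC1 and code[pc + 1] == 0xE0:        # shl eax, imm8
            eax = (eax << code[pc + 2]) & MASK
            trace.append("shl eax")
            pc += 3
        elif op == 0xC1 and code[pc + 1] == 0xE8:        # shr eax, imm8
            eax >>= code[pc + 2]
            trace.append("shr eax")
            pc += 3
        elif op == 0x81 and code[pc + 1] in (0xF1, 0xE9, 0xC1):
            imm = int.from_bytes(code[pc + 2:pc + 6], "little")
            if code[pc + 1] == 0xF1:                     # xor ecx, imm32
                ecx ^= imm
                trace.append("xor ecx")
            elif code[pc + 1] == 0xE9:                   # sub ecx, imm32
                ecx = (ecx - imm) & MASK
                trace.append("sub ecx")
            else:                                        # add ecx, imm32
                ecx = (ecx + imm) & MASK
                trace.append("add ecx")
            pc += 6
        else:
            raise ValueError(f"unsupported byte {op:02X} at offset {pc}")
```

Entering at offset 0 computes (eax << 2) + 1 as in the original SEQ1, entering at offset 5 computes (eax - 1) >> 3 as in SEQ2, and in both cases ecx accumulates a value that depends on code bytes belonging to the other sequence. Changing a shared byte, for example the nop at offset 15, therefore alters SEQ2's behavior and SEQ1's hash at once.
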
20
Code Interleaving
  • Observations
  • Tamper-resistance comes from two main sources:
  • Implicit: Shared instruction bytes
  • Explicit: OH instructions
  • Disassembly synchronization is explicitly
    prevented.
  • Method enables code-byte hashes even on
    architectures that do not allow explicit access
    to code bytes.
  • Extensions
  • Iteration to build up complexity
  • Enhances security at little or no implementation
    cost.
  • Complex (emergent) code patterns and behaviors
    can arise.
  • Implementation over custom byte codes designed to
    maximize utility of overlapping (unlike x86)

21
Experimental Results
Performance impact on SpecINT benchmarks (0 = no
overlapping, 1 = full overlapping)
  • Tool implementation using Vulcan
    (binary-rewriting framework)
  • Reasonable impact on performance, depending on
    desired security level
  • Remaining work on analyzing security in practice

22
Conclusion
  • Contributions
  • Investigation of overlapped code for software
    protection
  • Study of disassembly synchronization and other
    roadblocks
  • Design of code interleaving and outlining to
    address limitations
  • Integrity checking via oblivious hashing
  • Placement in context of security models, not ad
    hoc methods
  • Tool implementations to verify practical
    effectiveness
  • Code interleaving and outlining for x86 binaries
  • Iteration framework to enhance security
  • Future work
  • Security analysis in theory and practice
  • Other overlapped-code methods
  • Porting to custom byte-codes