Implementing Oblivious Hashing Using Overlapped Instruction Encodings

1
Implementing Oblivious Hashing Using Overlapped Instruction Encodings

Mariusz H. Jakubowski, Ramarathnam Venkatesan (Microsoft Research)
Matthias Jacob (Nokia Research)

ACM Multimedia and Security '07
Dallas, TX (USA)
September 20-21, 2007

2
Introduction

Field of work: Software protection
  Obfuscation and tamper-resistance
  Prevention (or delaying) of reverse engineering and hacking
  Securing of content-rights systems (DRM)
Background: Two specific protection techniques
  Oblivious hashing (OH): Computing hashes (fingerprints) of execution traces
  Overlapped code: Jumping into the middle of instructions to obfuscate and protect against disassembly
Goals of our work
  Apply overlapped code towards obfuscation and tamper-resistance via OH.
  Study new techniques in terms of formal models, avoiding ad hoc approaches.

3
Overview
Oblivious hashing via overlapped code

Introduction
Background
  Software protection
  Oblivious hashing (OH)
  Overlapped code
Code interleaving
Conclusion

4
Software Protection

Obfuscation
Making programs hard to understand
Tamper-resistance
Making programs hard to modify
Obfuscation ⇒ tamper-resistance
Tamper-resistance ⇒ obfuscation?

5
Formal Obfuscation

Impossible in general
  Black-box model (Barak et al.)
    Source code doesn't help an adversary who can examine input-output behavior.
  Worst-case programs and poly-time attackers
Possible in specific limited scenarios
  Secret hiding by hashing (Lynn et al.)
  Point functions (Wee, Kalai et al.)
Results are difficult to use in practice.

6
Tamper-Resistance

Many techniques used in practice, e.g.:
  Code-integrity checksums (e.g., Atallah et al.'s software guards)
  Anti-debugging and anti-disassembly methods
  Virtual machines and interpreters
  Polymorphic and metamorphic code
Never-ending battle in a very active field
  Targets: DRM, CD/DVD protection, games, dongles, licensing, etc.
  Defenses: Binary packers and cryptors, special compilers, transformation tools, programming strategies, etc.
Current techniques tend to be ad hoc
  No provable security
  No analysis of time required to crack protected instances

7
Tamper-Resistance Model
Abstraction of software tamper-resistance (Dedic et al., IH '07)

Program: A graph G
Execution: A random walk on G
Integrity checks
  Probabilistic monitoring of a set of G's nodes
  Detection of failures that lead to delayed responses
Security analysis: Graph game on G between attacker and defender
OH and overlapped code in context of model
  Provide a source of integrity checks.
  Help enforce local indistinguishability and other engineering assumptions about implementation.

8
Oblivious Hashing

Computation of hashes over program traces
Initialize hash values at specific points.
Update hashes upon assignments and branches.

Original code:
    int x = 123;
    if (GetUserInput() > 10) {
        x = x + 1;
    } else {
        printf("Hello\n");
    }

Hashed code (after the hash transform):
    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (GetUserInput() > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);
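The hash transform on this slide can be mimicked in a few lines of Python. This is a minimal sketch, not the authors' implementation: the polynomial hash in `update_hash`, the branch-ID values, the `initial_x` parameter (used here to simulate an attacker patching the constant 123), and the `verify` helper that compares against a trusted reference run are all assumptions made for illustration.

```python
MASK = 0xFFFFFFFF
BRANCH_ID_1 = 1001  # hypothetical branch identifiers
BRANCH_ID_2 = 1002


def update_hash(h, value):
    """Fold one traced value into the running hash.

    A toy polynomial hash stands in for whatever hash a real system uses.
    """
    return (h * 31 + value) & MASK


def run_hashed(user_input, initial_x=123):
    """Hashed version of the slide's program; returns (x, hash1)."""
    hash1 = 0                                      # INITIALIZE_HASH(hash1)
    x = initial_x
    hash1 = update_hash(hash1, x)                  # hash the assignment
    if user_input > 10:
        hash1 = update_hash(hash1, BRANCH_ID_1)    # hash the branch taken
        x = x + 1
        hash1 = update_hash(hash1, x)
    else:
        hash1 = update_hash(hash1, BRANCH_ID_2)
        print("Hello")
    return x, hash1


def verify(user_input, observed_hash):
    """VERIFY_HASH: compare against a trusted run on the same input."""
    _, expected = run_hashed(user_input)
    return observed_hash == expected


if __name__ == "__main__":
    _, good = run_hashed(11)
    _, bad = run_hashed(11, initial_x=999)   # simulated tampering
    print(verify(11, good), verify(11, bad))
```

Because the hash covers values and branch identifiers from the execution trace rather than code bytes, a patched constant changes the hash even though no code bytes are ever read.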
9
Overlapped Code

Code sharing among different paths
  Semantic: Sharing of code blocks among execution paths.
  Physical: Sharing of code bytes among machine or byte-code instructions.
Purposes
  Anti-disassembly and anti-decompilation
  Obfuscation
  Tamper-resistance from code sharing and explicit OH

10
Semantic Overlap
increase_win():    increase_ctr(win); return win
increase_loss():   increase_ctr(loss); return loss
increase_ctr(ctr): (ctr)++

Code section is shared along different paths
Automated via code outlining
11
Physical Overlap

Execution and disassembly depend on entry point into code.
Sample x86 code: B8 B8 04 05 2D 05 90

Offset 0:  B8 B8 04 05 2D   mov eax, 2D0504B8
           05 90            add eax, 90
Offset 1:  B8 04 05 2D 05   mov eax, 52D0504
           90               nop
Offset 2:  04 05            add al, 5
           2D 05 90         sub eax, 9005
Offset 3:  05 2D 05 90      add eax, 90052D
Offset 4:  2D 05 90         sub eax, 9005
Offset 5:  05 90            add eax, 90
(Immediates that run past the end of the sample are shown truncated.)

Note: Disassembly tends to resynchronize naturally, but we can prevent this.
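The per-offset decodings can be reproduced with a table-driven decoder that knows only the five one-byte opcodes appearing in the sample (B8 = mov eax, imm32; 05 = add eax, imm32; 2D = sub eax, imm32; 04 = add al, imm8; 90 = nop). The sketch below is a toy for this sample only, not a general x86 disassembler; the function name and the "(truncated)" marker are illustrative choices.

```python
SAMPLE = bytes.fromhex("B8B804052D0590")

# opcode -> (mnemonic, size of the immediate operand in bytes)
OPCODES = {
    0xB8: ("mov eax", 4),
    0x05: ("add eax", 4),
    0x2D: ("sub eax", 4),
    0x04: ("add al", 1),
    0x90: ("nop", 0),
}


def disassemble(code, offset):
    """Decode linearly from `offset`; return a list of (offset, text)."""
    out = []
    while offset < len(code):
        mnemonic, imm_size = OPCODES[code[offset]]
        imm = code[offset + 1 : offset + 1 + imm_size]
        if imm_size == 0:
            text = mnemonic
        else:
            # x86 immediates are little-endian.
            text = f"{mnemonic}, {int.from_bytes(imm, 'little'):X}"
            if len(imm) < imm_size:
                text += " (truncated)"   # operand runs past the sample
        out.append((offset, text))
        offset += 1 + imm_size
    return out


if __name__ == "__main__":
    for start in range(len(SAMPLE)):
        print(f"Offset {start}:")
        for addr, text in disassemble(SAMPLE, start):
            print(f"  {addr}: {text}")
```

The same seven bytes yield a different instruction stream for nearly every entry point, which is the property overlapped code exploits.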
12
Disassembly Synchronization

Often observed in practice, but previously not explained mathematically.
Limits effectiveness of code overlapping for security.
Requires explicit anti-synchronization measures to enforce protection.
Rigorous explanation: Kruskal count

Example of corruption and synchronization

Original code:
00411410  55                 push ebp
00411411  8B EC              mov  ebp,esp
00411413  81 EC C0 00 00 00  sub  esp,0C0h
00411419  53                 push ebx
0041141A  56                 push esi
0041141B  57                 push edi
0041141C  8D BD 40 FF FF FF  lea  edi,[ebp-0C0h]

After corrupting the byte at 00411413 (81 becomes 12):
00411410  55                 push ebp
00411411  8B EC              mov  ebp,esp
00411413  12 EC              adc  ch,ah                   <- corrupted byte
00411415  C0 00 00           rol  byte ptr [eax],0
00411418  00 53 56           add  byte ptr [ebx+56h],dl
0041141B  57                 push edi                     <- synchronization point
0041141C  8D BD 40 FF FF FF  lea  edi,[ebp-0C0h]
13
Disassembly Synchronization

Disassembly: A leapfrog process over code bytes
  Each byte address contains an instruction of a definite length.
  After disassembling an instruction, a disassembler skips to the next instruction.
Example: Sequence of instruction lengths at consecutive offsets

Sequence of instruction lengths:  3 4 6 2 6 3 4 5 3 3 5 4 2 7 3 1 4
Disassembly at offset 0:          3 2 3 3 4 1 4
Disassembly at offset 1:          4 3 3 4 1 4
Disassembly at offset 2:          6 3 4 1 4
Disassembly at offset 3:          2 3 3 4 1 4
Disassembly at offset 4:          6 5 1 4
Synchronization point: the trailing 1 4, which all five disassemblies share.

Kruskal count: Such disassembly synchronizes in about B^2/16 steps, where B = average number of bytes per instruction.
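The leapfrog process on this slide's length sequence is easy to replay. The sketch below assumes, as the slide does, that every offset holds an instruction of the listed length; the function names are illustrative.

```python
LENGTHS = [3, 4, 6, 2, 6, 3, 4, 5, 3, 3, 5, 4, 2, 7, 3, 1, 4]


def leapfrog(lengths, start):
    """Return the offsets a disassembler visits when it starts at `start`."""
    visited = []
    pos = start
    while pos < len(lengths):
        visited.append(pos)
        pos += lengths[pos]   # skip over the instruction just decoded
    return visited


def sync_point(lengths, starts):
    """First offset visited by every walk, or None if they never meet."""
    common = set(leapfrog(lengths, starts[0]))
    for s in starts[1:]:
        common &= set(leapfrog(lengths, s))
    return min(common) if common else None


if __name__ == "__main__":
    for start in range(5):
        seen = [LENGTHS[i] for i in leapfrog(LENGTHS, start)]
        print(f"offset {start}: {seen}")
    print("all five walks meet at offset", sync_point(LENGTHS, [0, 1, 2, 3, 4]))
```

For this sequence the five walks first share offset 15, the instruction of length 1 that precedes the final 4.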
14
Disassembly Synchronization
Model of the disassembly process

Let InstructionLength(address) = length of instruction found at address.
Starting at slightly different addresses x and y, a disassembler iterates:
  x ← x + InstructionLength(x)   (leapfrog x)
  y ← y + InstructionLength(y)   (leapfrog y)
Our goal: Compute N = approximate number of steps before any intermediate x is equal to any intermediate y.
  Treat all possible values of x-y as states of a Markov chain.
  N is the coupling time of this Markov chain.
Kruskal count: N is about B^2/16, where B is the average instruction length.
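The coupling time N can be estimated empirically. The Monte Carlo sketch below rests on assumptions not taken from the slides: instruction lengths are drawn independently and uniformly from 1..max_len (real x86 code is not distributed this way), the two walks start at offsets 0 and 1, and a "step" means advancing whichever pointer is currently behind. It reports whatever it measures and does not claim to reproduce the constant in the Kruskal-count estimate.

```python
import random


def coupling_steps(lengths, x, y):
    """Leapfrog steps until walks from x and y land on the same offset.

    Returns None if either walk leaves the code before they meet.
    """
    steps = 0
    while x != y:
        if x < y:
            x += lengths[x]   # leapfrog x
        else:
            y += lengths[y]   # leapfrog y
        steps += 1
        if max(x, y) >= len(lengths):
            return None
    return steps


def average_coupling_steps(max_len, trials, seed, code_size=2000):
    """Mean coupling steps over random codes with lengths in 1..max_len."""
    rng = random.Random(seed)
    total, met = 0, 0
    for _ in range(trials):
        lengths = [rng.randint(1, max_len) for _ in range(code_size)]
        steps = coupling_steps(lengths, 0, 1)
        if steps is not None:
            total += steps
            met += 1
    return total / met if met else None


if __name__ == "__main__":
    for max_len in (4, 8, 16):
        mean_len = (1 + max_len) / 2
        avg = average_coupling_steps(max_len, trials=500, seed=1)
        print(f"mean instruction length {mean_len}: about {avg:.1f} steps")
```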

15
Code Interleaving

A method to overlap arbitrary code blocks
  Explicitly prevents disassembly resynchronization
  Adds tamper-resistance
    Hash of instruction bytes only (like traditional code checksums)
    Hash of instruction bytes and program state (like oblivious hashing)
Basic algorithm
  Code interspersing: Create a block of interleaved instructions from two code blocks.
  Code merging: Inject hashing instructions overlapped with existing instructions.

16
Code Interleaving: Basic Idea

Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B
17
Code Interleaving: Basic Idea

Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B

After code interspersing:
SEQ1: INST_1
      JMP L2
SEQ2: INST_A
      JMP LB
L2:   INST_2
      JMP L3
LB:   INST_B
L3:

Code interspersing: Interleave instructions, injecting jumps as needed to maintain control flow.
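Code interspersing can be sketched on symbolic instructions. The version below is an illustration under assumptions, not the authors' tool: it alternates one instruction from each block, names labels `S1_0`, `S2_0`, and so on (hypothetical names; the slide uses L2, LB, L3), and drops a jump whose target is the very next item. A tiny interpreter then checks that each entry point still executes its own block in order.

```python
def intersperse(seq1, seq2):
    """Interleave two instruction lists, adding jumps to keep control flow.

    Returns a list of ("label", name), ("inst", name), ("jmp", target).
    """
    seqs = {1: seq1, 2: seq2}

    def label(s, i):
        return f"S{s}_{i}" if i < len(seqs[s]) else "END"

    layout = []
    for i in range(max(len(seq1), len(seq2))):
        for s in (1, 2):
            if i < len(seqs[s]):
                layout.append(("label", label(s, i)))
                layout.append(("inst", seqs[s][i]))
                layout.append(("jmp", label(s, i + 1)))
    layout.append(("label", "END"))
    # A jump to the label that immediately follows it is redundant.
    return [item for k, item in enumerate(layout)
            if not (item[0] == "jmp" and layout[k + 1] == ("label", item[1]))]


def run(layout, entry):
    """Execute from label `entry`; return the instructions encountered."""
    index = {arg: k for k, (kind, arg) in enumerate(layout) if kind == "label"}
    pc, trace = index[entry], []
    while layout[pc] != ("label", "END"):
        kind, arg = layout[pc]
        if kind == "inst":
            trace.append(arg)
        pc = index[arg] if kind == "jmp" else pc + 1
    return trace


if __name__ == "__main__":
    layout = intersperse(["INST_1", "INST_2"], ["INST_A", "INST_B"])
    for kind, arg in layout:
        print(f"{arg}:" if kind == "label" else f"    {kind.upper()} {arg}")
    print(run(layout, "S1_0"), run(layout, "S2_0"))
```

For the two-instruction blocks on this slide the sketch emits three jumps, matching the slide's layout; code merging would then replace those jumps with overlapping hash instructions, which requires concrete encodings and is not attempted here.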

18
Code Interleaving: Basic Idea

Two input code blocks:
SEQ1: INST_1
      INST_2
SEQ2: INST_A
      INST_B

After code interspersing:
SEQ1: INST_1
      JMP L2
SEQ2: INST_A
      JMP LB
L2:   INST_2
      JMP L3
LB:   INST_B
L3:

After code merging:
Disassembly at SEQ1:  INST_1  HASH_1  INST_2  HASH_2
Disassembly at SEQ2:  INST_A  HASH_A  INST_B

Code interspersing: Interleave instructions, injecting jumps as needed to maintain control flow.
Code merging: Replace jumps with hash instructions, maintaining control flow.
  E.g., JMP L2 / INST_A / JMP LB transforms into HASH_1.
  HASH_1 contains INST_A and part of HASH_A.
  Suitable hash instructions must be found (and fit together like puzzle pieces).
    Various possibilities identified on x86.
    Can also design custom byte-codes to maximize utility of overlapping.

19
Code Interleaving: Example

Two input code blocks (x86):
SEQ1:  C1 E0 02   shl eax, 2
I11:   40         inc eax
       C3         ret
SEQ2:  48         dec eax
I21:   C1 E8 03   shr eax, 3
       C3         ret

After code interspersing:
SEQ1:  C1 E0 02   shl eax, 2
       EB 03      jmp I11
SEQ2:  48         dec eax
       EB 04      jmp I21
I11:   90         nop
       40         inc eax
       EB 03      jmp O
I21:   C1 E8 03   shr eax, 3
O:     90         nop
       C3         ret

After code merging (OH instructions marked):
Disassembly at SEQ1:
SEQ1:  C1 E0 02            shl eax, 2
       81 F1 48 81 E9 90   xor ecx, 90E98148   (OH)
I11:   40                  inc eax
       81 C1 C1 E8 03 90   add ecx, 9003E8C1   (OH)
O:     C3                  ret
Disassembly at SEQ2:
SEQ2:  48                  dec eax
       81 E9 90 40 81 C1   sub ecx, C1814090   (OH)
I21:   C1 E8 03            shr eax, 3
O:     90                  nop
       C3                  ret
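The merged byte string can be checked mechanically. The sketch below decodes only the handful of 32-bit x86 encodings that occur in this example (C1 /4 and /5 with an 8-bit immediate, 81 /0, /5 and /6 with a 32-bit immediate, and the one-byte 40, 48, 90, C3), all in register-direct form; it is a toy for this slide, not a general disassembler. The entry offsets 0 (SEQ1) and 5 (SEQ2) are read off the byte layout.

```python
MERGED = bytes.fromhex("C1E002" "81F14881E990" "40" "81C1C1E80390" "C3")
SEQ1_ENTRY, SEQ2_ENTRY = 0, 5

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GROUP_81 = {0: "add", 5: "sub", 6: "xor"}   # 81 /digit, imm32
GROUP_C1 = {4: "shl", 5: "shr"}             # C1 /digit, imm8
ONE_BYTE = {0x40: "inc eax", 0x48: "dec eax", 0x90: "nop", 0xC3: "ret"}


def decode(code, pos):
    """Decode one instruction at `pos`; return (text, length in bytes)."""
    op = code[pos]
    if op in ONE_BYTE:
        return ONE_BYTE[op], 1
    modrm = code[pos + 1]
    if modrm >> 6 != 3:
        raise ValueError("only register-direct operands are handled")
    digit, reg = (modrm >> 3) & 7, REG32[modrm & 7]
    if op == 0x81:
        imm = int.from_bytes(code[pos + 2 : pos + 6], "little")
        return f"{GROUP_81[digit]} {reg}, {imm:X}", 6
    if op == 0xC1:
        return f"{GROUP_C1[digit]} {reg}, {code[pos + 2]}", 3
    raise ValueError(f"opcode {op:02X} not handled by this sketch")


def disassemble(code, entry):
    """Linear disassembly from `entry` up to and including ret."""
    out, pos = [], entry
    while pos < len(code):
        text, size = decode(code, pos)
        out.append((pos, text))
        pos += size
        if text == "ret":
            break
    return out


if __name__ == "__main__":
    for name, entry in (("SEQ1", SEQ1_ENTRY), ("SEQ2", SEQ2_ENTRY)):
        print(f"Disassembly at {name}:")
        for pos, text in disassemble(MERGED, entry):
            print(f"  {pos:2d}: {text}")
```

The two decodings share no instruction boundary until the final ret, so a byte inside the overlapped region belongs to an instruction in both views.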
20
Code Interleaving

Observations
  Tamper-resistance comes from two main sources:
    Implicit: Shared instruction bytes
    Explicit: OH instructions
  Disassembly synchronization is explicitly prevented.
  Method enables code-byte hashes even on architectures that do not allow explicit access to code bytes.
Extensions
  Iteration to build up complexity
    Enhances security at little or no implementation cost.
    Complex (emergent) code patterns and behaviors can arise.
  Implementation over custom byte codes designed to maximize utility of overlapping (unlike x86)