1
Practical Data Confinement
  • Andrey Ermolinskiy, Sachin Katti, Scott Shenker,
    Lisa Fowler, Murphy McCauley

2
Introduction
  • Controlling the flow of sensitive information is
    one of the central challenges in managing an
    organization
  • Preventing exfiltration (theft) by malicious
    entities
  • Enforcing dissemination policies

3
Why is it so hard to secure sensitive data?
4
Why is it so hard to secure sensitive data?
  • Modern software is rife with security holes that
    can be exploited for exfiltration

5
Why is it so hard to secure sensitive data?
  • Modern software is rife with security holes that
    can be exploited for exfiltration
  • Users must be trusted to remember, understand,
    and obey dissemination restrictions
  • In practice, users are careless and often
    inadvertently allow data to leak
  • E-mail sensitive documents to the wrong parties
  • Transfer data to insecure machines and portable
    devices

6
Our Goal
  • Develop a practical data confinement solution

7
Our Goal
  • Develop a practical data confinement solution
  • Key requirement: compatibility with existing
    infrastructure and patterns of use
  • Support current operating systems, applications,
    and means of communication
  • Office productivity apps: word processing,
    spreadsheets, etc.
  • Communication: E-mail, IM, VoIP, FTP, DFS, etc.
  • Avoid imposing restrictions on user behavior
  • Allow access to untrusted Internet sites
  • Permit users to download and install untrusted
    applications

8
Our Assumptions and Threat Model
  • Users
  • Benign, do not intentionally exfiltrate data
  • Make mistakes, inadvertently violate policies
  • Software platform (productivity applications and
    OS)
  • Non-malicious; does not exfiltrate data in its
    pristine state
  • Vulnerable to attacks if exposed to external
    threats
  • Attackers
  • Malicious external entities seeking to exfiltrate
    sensitive data
  • Penetrate security barriers by exploiting
    vulnerabilities in the software platform

9
Central Design Decisions
  • Policy enforcement responsibilities
  • Cannot rely on human users
  • The system must track the flow of sensitive
    information and enforce restrictions when the
    data is externalized

10
Central Design Decisions
  • Policy enforcement responsibilities
  • Cannot rely on human users
  • The system must track the flow of sensitive
    information and enforce restrictions when the
    data is externalized
  • Granularity of information flow tracking (IFT)
  • Need fine-grained byte-level tracking and policy
    enforcement to prevent accidental partial
    exfiltrations

11
Central Design Decisions
  • Placement of functionality
  • PDC inserts a thin software layer (hypervisor)
    between the OS and hardware
  • The hypervisor implements byte-level IFT and
    policy enforcement
  • A hypervisor-level solution
  • Retains compatibility with existing OSes and
    applications
  • Has sufficient control over hardware

12
Central Design Decisions
  • Placement of functionality
  • PDC inserts a thin software layer (hypervisor)
    between the OS and hardware
  • The hypervisor implements byte-level IFT and
    policy enforcement
  • A hypervisor-level solution
  • Retains compatibility with existing OSes and
    applications
  • Has sufficient control over hardware
  • Resolving tension between safety and user freedom
  • Partition the application environment into two
    isolated components: a Safe world and a Free
    world

13
Partitioning the User Environment
[Diagram: a Safe Virtual Machine (access to sensitive data) and an
Unsafe Virtual Machine (unrestricted communication and execution of
untrusted code) run side by side on the hypervisor (IFT, policy
enforcement), which mediates the hardware (CPU, memory, disk, NIC,
USB, printer, etc.)]
14
Partitioning the User Environment
[Diagram legend: sensitive vs. non-sensitive data; trusted vs.
untrusted (potentially malicious) code/data; exposure to the threat
of exfiltration]
15
PDC Use Cases
  • Logical air gaps for high-security environments
  • VM-level isolation obviates the need for multiple
    physical networks
  • Preventing information leakage via e-mail
  • "Do not disseminate the attached document"
  • Digital rights management
  • Keeping track of copies; document self-destruct
  • Auto-redaction of sensitive content

16
Talk Outline
  • Introduction
  • Requirements and Assumptions
  • Use Cases
  • PDC Architecture
  • Prototype Implementation
  • Preliminary Performance Evaluation
  • Current Status and Future Work

17
PDC Architecture: Hypervisor
  • PDC uses an augmented hypervisor to
  • Ensure isolation between safe and unsafe VMs
  • Track the propagation of sensitive data in the
    safe VM
  • Enforce security policy at exit points
  • Network I/O, removable storage, printer, etc.

18
PDC Architecture: Tag Tracking in the Safe VM
  • PDC associates an opaque 32-bit sensitivity tag
    with each byte of virtual hardware state
  • User-accessible CPU registers
  • Volatile memory
  • Files on disk

19
PDC Architecture: Tag Tracking in the Safe VM
  • These tags are viewed as opaque identifiers
  • The semantics can be tailored to fit the specific
    needs of administrators/users
  • Tags can be used to specify
  • Security policies
  • Levels of security clearance
  • High-level data objects
  • High-level data types within an object

20
PDC Architecture: Tag Tracking in the Safe VM
  • An augmented x86 emulator performs fine-grained
    instruction-level tag tracking (current
    implementation is based on QEMU)
  • PDC tracks explicit data flows (variable
    assignments, arithmetic operations)

[Diagram: add eax, ebx merges the tags of eax and ebx into eax]
21
PDC Architecture: Tag Tracking in the Safe VM
  • An augmented x86 emulator performs fine-grained
    instruction-level tag tracking (current
    implementation is based on QEMU)
  • PDC also tracks flows resulting from pointer
    dereferencing

[Diagram: mov eax, (ebx) merges the tags of the pointer register ebx
and the referenced memory bytes into eax]
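
To make the propagation rules on these two slides concrete, here is a
minimal C sketch; the tag_t type, the use of 0 for untainted data, and
the merge rule are illustrative assumptions, not PDC's actual code:

    #include <stdint.h>

    typedef uint32_t tag_t;              /* 0 = untainted (assumption) */

    #define NUM_REGS  8
    #define MEM_BYTES (1u << 20)         /* toy guest memory size */

    static tag_t reg_tag[NUM_REGS];      /* one tag per emulated register */
    static tag_t mem_tag[MEM_BYTES];     /* one tag per guest memory byte */

    /* Hypothetical merge rule: a real system would consult a tag
       lattice or policy; here we simply keep any non-zero tag. */
    static tag_t tag_merge(tag_t a, tag_t b) { return a ? a : b; }

    /* add dst, src: the result depends on both operands, so merge. */
    static void track_add(int dst, int src) {
        reg_tag[dst] = tag_merge(reg_tag[dst], reg_tag[src]);
    }

    /* mov dst, (ptr): the loaded bytes' tags are merged into dst,
       as is the pointer register's own tag (flow through pointer
       dereferencing, as in the diagram above). */
    static void track_load4(int dst, int ptr, uint32_t addr) {
        tag_t t = reg_tag[ptr];
        for (int i = 0; i < 4; i++)
            t = tag_merge(t, mem_tag[addr + i]);
        reg_tag[dst] = t;
    }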
22
Challenges
  • Tag storage overhead in memory and on disk
  • Naïve implementation would incur a 400% overhead
    (a 32-bit tag for every byte of data)
  • Computational overhead of online tag tracking
  • Tag explosion
  • Tag tracking across pointer dereferences
    exacerbates the problem
  • Tag erosion due to implicit flows
  • Bridging the semantic gap between application
    data units and low-level machine state
  • Impact of VM-level isolation on user experience

23
Talk Outline
  • Introduction
  • Requirements and Assumptions
  • Use Cases
  • PDC Architecture
  • Prototype Implementation
  • Storing sensitivity tags in memory and on disk
  • Fine-grained tag tracking in QEMU
  • On-demand emulation
  • Policy enforcement
  • Performance Evaluation
  • Current Status and Future Work

24
PDC Implementation: The Big Picture
[Architecture diagram: the safe VM (App1, App2) runs paravirtualized
on PDC-Xen (ring 0), which maintains shadow page tables, safe VM page
tables, and the PageTag Mask; Dom 0 hosts QEMU / tag tracker (with
PageTag Descriptors and the emulated safe VM), a network daemon
attached to the NIC, and an NFS server backed by PDC-ext3; the safe
VM's VFS/NFS client reaches the server over Xen-RPC via a shared ring
buffer and event channel]
25
Storing Tags in Volatile Memory
  • PDC maintains a 64-bit PageTagSummary for each
    page of machine memory
  • Uses a 4-level tree data structure to keep
    PageNumber → PageTagSummary mappings

[Diagram: the PageNumber is split into four index fields (bit
boundaries at 9, 19, 29, and 31); the final level is an array of
64-bit PageTagSummary structures]
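
A C sketch of the lookup, assuming each of the first three fields
indexes a 1024-entry node and the final 2-bit field selects within a
leaf array (the split is read off the diagram and may differ in PDC):

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t PageTagSummary;

    /* Interior node of the 4-level tree: children are either the
       next level's nodes or, at the last level, arrays of
       PageTagSummary structures. */
    typedef struct node {
        void *child[1024];
    } node_t;

    static PageTagSummary *lookup_summary(node_t *root, uint32_t pfn) {
        node_t *l2 = root->child[pfn & 0x3ff];                 /* bits 0-9   */
        if (!l2) return NULL;
        node_t *l3 = l2->child[(pfn >> 10) & 0x3ff];           /* bits 10-19 */
        if (!l3) return NULL;
        PageTagSummary *leaf = l3->child[(pfn >> 20) & 0x3ff]; /* bits 20-29 */
        if (!leaf) return NULL;
        return &leaf[(pfn >> 30) & 0x3];                       /* bits 30-31 */
    }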
26
Storing Tags in Volatile Memory
  • A PageTagSummary holds either a page-wide tag
    (for uniformly-tagged pages) or a pointer to a
    PageTagDescriptor
  • PageTagDescriptor stores fine-grained
    (byte-level) tags within a page in one of two
    formats: a linear array of tags (indexed by page
    offset) or an RLE encoding
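
A sketch of the corresponding in-memory structures; stealing the low
bit of the 64-bit summary to distinguish the two cases is an
assumption, not PDC's documented encoding:

    #include <stdint.h>

    #define PAGE_SIZE 4096
    typedef uint32_t tag_t;

    /* Byte-level tags within one page, in one of the two formats. */
    typedef struct {
        int      is_rle;
        uint16_t n_runs;                       /* used by the RLE form */
        union {
            tag_t linear[PAGE_SIZE];           /* indexed by page offset */
            struct { uint16_t len; tag_t tag; } runs[PAGE_SIZE];
        } u;
    } PageTagDescriptor;

    /* 64 bits per machine page: a page-wide tag or a descriptor pointer. */
    typedef uint64_t PageTagSummary;

    static inline int is_uniform(PageTagSummary s)    { return s & 1; }
    static inline tag_t uniform_tag(PageTagSummary s) { return (tag_t)(s >> 32); }
    static inline PageTagDescriptor *descriptor(PageTagSummary s) {
        return (PageTagDescriptor *)(uintptr_t)s;
    }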
27
Storing Tags on Disk
  • PDC-ext3 provides persistent storage for the safe
    VM
  • New i-node field for file-level tags
  • Leaf indirect blocks store pointers to
    BlockTagDescriptors
  • BlockTagDescriptor: byte-level tags within a block

[Diagram: the i-node carries a FileTag; indirect blocks lead to leaf
indirect blocks, whose pointers reference BlockTagDescriptors stored
as a linear array or RLE encoding alongside the data blocks]
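
The on-disk structures might simply mirror the in-memory formats; a
sketch under that assumption (PDC-ext3's actual layout is not shown
on the slide):

    #include <stdint.h>

    /* New i-node field: the file-level tag. */
    struct pdc_inode_extra { uint32_t file_tag; };

    /* Byte-level tags within one 4 KB block, referenced from a leaf
       indirect block; linear or RLE, as in memory. */
    struct block_tag_descriptor {
        uint8_t  is_rle;
        uint16_t n_runs;
        union {
            uint32_t linear[4096];
            struct { uint16_t len; uint32_t tag; } runs[4096];
        } u;
    };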
28
Back to the Big Picture
[Architecture diagram repeated from slide 24, now highlighting the
emulated CPU context inside QEMU / tag tracker]
29
Fine-Grained Tag Tracking
  • A modified version of QEMU emulates the safe VM
    and tracks movement of sensitive data
  • QEMU relies on runtime binary recompilation to
    achieve reasonably efficient emulation
  • We augment the QEMU compiler to generate a tag
    tracking instruction stream from the input stream
    of x86 instructions

[Diagram: input x86 instructions are translated to the intermediate
representation (TCG) in stage 1; stage 2 emits the host machine code
block (x86) and, alongside it, the tag tracking code block]
30
Fine-Grained Tag Tracking
  • Tag tracking instructions manipulate the tag
    status of emulated CPU registers and memory

Basic instruction format:
  Action:        Clear, Set, Merge
  Dest. Operand: Reg, Mem
  Src. Operand:  Reg, Mem
  • The tag tracking instruction stream executes
    asynchronously in a separate thread
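
A plausible C encoding of that format; the field names, widths, and
log sentinel are assumptions:

    #include <stdint.h>

    typedef enum { TT_CLEAR, TT_SET, TT_MERGE } tt_action;
    typedef enum { TT_REG, TT_MEM } tt_operand_kind;

    /* Sentinel: operand value unknown at compile time; the tracker
       fetches it from the argument log at runtime (next slide). */
    #define TT_FROM_LOG 0xffffffffu

    typedef struct {
        uint8_t  action;      /* tt_action: Clear, Set, Merge */
        uint8_t  dst_kind;    /* tt_operand_kind: Reg or Mem */
        uint8_t  src_kind;
        uint8_t  width;       /* bytes affected, e.g. 4 in Clear4 */
        uint32_t dst;         /* register index or memory address */
        uint32_t src;
    } tt_insn;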

31
Fine-Grained Tag Tracking
  • Problem: some of the instruction arguments are
    not known at compile time
  • Example: mov eax, (ebx)
  • Source memory address is not known
  • The main emulation thread writes the values of
    these arguments to a temporary log (a circular
    memory buffer) at runtime
  • The tag tracker fetches unknown values from this
    log
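
The argument log can be a single-producer/single-consumer ring; a
minimal sketch in C (the busy-waiting and fixed size are
simplifications, not PDC's actual buffer):

    #include <stdint.h>

    #define LOG_SLOTS 4096                 /* power of two (assumed) */
    static uint32_t log_buf[LOG_SLOTS];
    static volatile uint32_t log_head, log_tail;

    /* Emulation thread: record a runtime value (e.g., an address). */
    static void log_push(uint32_t v) {
        while (log_head - log_tail == LOG_SLOTS)
            ;                              /* full: wait for tracker */
        log_buf[log_head % LOG_SLOTS] = v;
        __sync_synchronize();              /* publish before advancing */
        log_head++;
    }

    /* Tag tracker thread: fetch the next unknown argument. */
    static uint32_t log_pop(void) {
        while (log_head == log_tail)
            ;                              /* empty: wait for producer */
        uint32_t v = log_buf[log_tail % LOG_SLOTS];
        __sync_synchronize();
        log_tail++;
        return v;
    }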

32
Binary Recompilation (Example)
Input x86 instruction → intermediate representation → tag tracking
instructions:

mov eax, 123
  IR:  movi_i32 tmp0,123; st_i32 tmp0,env,0x0
  Tag: Clear4 eax

push ebp
  IR:  ld_i32 tmp0,env,0x14; ld_i32 tmp2,env,0x10;
       movi_i32 tmp14,0xfffffffc; add_i32 tmp2,tmp2,tmp14;
       qemu_st_logaddr tmp0,tmp2; st_i32 tmp2,env,0x10
  Tag: Set4 mem,ebp,0; Merge4 mem,esp,0

(qemu_st_logaddr writes MachineAddr(esp) to the tag tracking argument
log, where the tag tracker fetches the memory operand's address)
33
Binary Recompilation
  • But things get more complex
  • Switching between operating modes
    (protected/real/virtual-8086, 16/32-bit)

34
Binary Recompilation
  • But things get more complex
  • Switching between operating modes
    (protected/real/virtual-8086, 16/32-bit)
  • Recovering from exceptions in the middle of a
    translation block

35
Binary Recompilation
  • But things get more complex
  • Switching between operating modes
    (protected/real/virtual-8086, 16/32-bit)
  • Recovering from exceptions in the middle of a
    translation block
  • Multiple memory addressing modes

36
Binary Recompilation
  • But things get more complex
  • Switching between operating modes
    (protected/real/virtual-8086, 16/32-bit)
  • Recovering from exceptions in the middle of a
    translation block
  • Multiple memory addressing modes
  • Repeating instructions
  • rep movs

37
Binary Recompilation
  • But things get more complex
  • Switching between operating modes
    (protected/real/virtual-8086, 16/32-bit)
  • Recovering from exceptions in the middle of a
    translation block
  • Multiple memory addressing modes
  • Repeating instructions
  • rep movs
  • Complex instructions whose semantics are
    partially determined by the runtime state

[Diagram: the iret stack frame: saved EIP, saved CS, saved EFLAGS,
saved ESP, saved SS; how much of it iret pops depends on the runtime
state, e.g. whether the return crosses privilege levels]
38
Back to the Big Picture
[Architecture diagram repeated from slide 24]
39
On-Demand Emulation
  • During virtualized execution, PDC-Xen uses the
    paging hardware to intercept accesses to
    sensitive data
  • Maintains shadow page tables, in which all memory
    pages containing tagged data are marked as not
    present

  • Access to a tagged page from the safe VM causes a
    page fault and transfer of control to the
    hypervisor

[Diagram: PDC-Xen (ring 0) holds the PageTag Mask, shadow page
tables, and safe VM page tables; QEMU / tag tracker holds the PageTag
Descriptors]
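
A sketch of the interception, assuming the PageTag Mask is a per-page
bitmap; all helper names here are illustrative:

    #include <stdint.h>

    #define PTE_PRESENT 0x1ull

    extern uint64_t page_tag_mask[];       /* one bit per machine page */
    extern void suspend_guest_vcpu(void);
    extern void handoff_to_qemu(void *saved_cpu_context);

    static int page_is_tagged(unsigned long pfn) {
        return (page_tag_mask[pfn / 64] >> (pfn % 64)) & 1;
    }

    /* Shadow PTE construction: clear the present bit for tagged pages
       so any guest access faults into the hypervisor. */
    static uint64_t make_shadow_pte(uint64_t guest_pte, unsigned long pfn) {
        return page_is_tagged(pfn) ? (guest_pte & ~PTE_PRESENT) : guest_pte;
    }

    /* Page fault dispatch. */
    void on_shadow_fault(unsigned long pfn, void *saved_cpu_context) {
        if (page_is_tagged(pfn)) {
            suspend_guest_vcpu();               /* block the SafeVM VCPU */
            handoff_to_qemu(saved_cpu_context); /* resume in emulated mode */
        }
        /* else: ordinary shadow-paging fault handling */
    }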
40
On-Demand Emulation
  • If the page fault is due to tagged data, PDC-Xen
    suspends the guest domain and transfers control
    to the emulator (QEMU)
  • QEMU initializes the emulated CPU context from
    the native processor context (saved upon entry to
    the page fault handler) and resumes the safe VM
    in emulated mode

[Diagram: an access to a tagged page suspends the SafeVM VCPU in the
page fault handler; the Dom0 VCPU runs QEMU / tag tracker, which maps
the safe VM's memory and drives the emulated SafeVM CPU]
41
On-Demand Emulation
  • Returning from emulated execution
  • QEMU terminates the main emulation loop, waits
    for the tag tracker to catch up
  • QEMU then makes a hypercall to PDC-Xen and
    provides
  • Up-to-date processor context for the safe VM VCPU
  • Up-to-date PageTagMask

42
On-Demand Emulation
  • Returning from emulated execution
  • QEMU terminates the main emulation loop, waits
    for the tag tracker to catch up
  • QEMU then makes a hypercall to PDC-Xen and
    provides
  • Up-to-date processor context for the safe VM VCPU
  • Up-to-date PageTagMask
  • The hypercall awakens the safe VM VCPU (blocked
    in the page fault handler)
  • The page fault handler
  • Overwrites the call stack with up-to-date values
    of CS/EIP, SS/ESP, EFLAGS
  • Restores other processor registers
  • Returns control to the safe VM

43
On-Demand Emulation - Challenges
44
On-Demand Emulation - Challenges
  • Updating PTEs in read-only page table mappings
  • Solution: QEMU maintains local writable shadow
    copies and synchronizes them in the background
    via hypercalls

45
On-Demand Emulation - Challenges
  • Updating PTEs in read-only page table mappings
  • Solution: QEMU maintains local writable shadow
    copies and synchronizes them in the background
    via hypercalls
  • Transferring control to the hypervisor during
    emulated execution (hypercall and fault handlers)
  • Emulating hypervisor-level code is not an option
  • Solution: transient switch to native execution
  • Resume native execution at the instruction that
    causes a jump to the hypervisor (e.g., int 0x82
    for hypercalls)

46
On-Demand Emulation - Challenges
  • Delivery of timer interrupts (events) in emulated
    mode
  • The hardware clock advances faster in the
    emulated context (i.e., each instruction consumes
    more clock cycles)
  • Xen needs to scale the delivery of timer events
    accordingly

47
On-Demand Emulation - Challenges
  • Delivery of timer interrupts (events) in emulated
    mode
  • The hardware clock advances faster in the
    emulated context (i.e., each instruction consumes
    more clock cycles)
  • Xen needs to scale the delivery of timer events
    accordingly
  • Use of the clock cycle counter (rdtsc
    instruction)
  • Linux timer interrupt/event handler uses the
    clock cycle counter to estimate timer jitter
  • After switching from emulated to native
    execution, the guest kernel observes a sudden
    jump forward in time

48
Policy Enforcement
  • The policy controller module
  • Resides in dom0 and interposes between the
    front-end and back-end device drivers
  • Fetches policies from a central policy server
  • Looks up the tags associated with the data in
    shared I/O request buffers and applies policies

[Diagram: the policy controller in Dom0 interposes between the
network interface and block storage back-ends (Dom0) and their
front-ends in the safe VM]
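
A sketch of the per-request check, assuming the controller sees
byte-level tags alongside the I/O buffer; policy_lookup and the
policy_t layout are illustrative, not PDC's actual interface:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t tag_t;
    typedef enum { POLICY_ALLOW, POLICY_DENY } policy_action;
    typedef struct { policy_action action; } policy_t;

    /* Policies fetched/cached from the central policy server. */
    extern const policy_t *policy_lookup(tag_t tag);

    /* Inspect the tags of an outgoing I/O buffer; block the request
       if any byte's policy forbids externalization. */
    int enforce_on_buffer(const tag_t *tags, size_t n_bytes) {
        for (size_t i = 0; i < n_bytes; i++) {
            if (tags[i] == 0)
                continue;                  /* untainted byte */
            const policy_t *p = policy_lookup(tags[i]);
            if (p && p->action == POLICY_DENY)
                return -1;                 /* deny the whole request */
        }
        return 0;
    }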
49
Network Communication
  • PDC annotates outgoing packets with
    PacketTagDescriptors, carrying the sensitivity
    tags
  • Current implementation transfers annotated
    packets via a TCP/IP tunnel

[Diagram: the original frame (EthHdr/IPHdr/TCPHdr/Payload) is
prepended with an annotation carrying the tags and encapsulated
behind a new EthHdr/IPHdr/TCPHdr for transport through the TCP/IP
tunnel]
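
One plausible wire format for the PacketTagDescriptor, encoding runs
of uniformly-tagged payload bytes so the annotation grows with tag
fragmentation rather than payload size (layout assumed):

    #include <stdint.h>

    /* A run of contiguous payload bytes sharing one sensitivity tag. */
    struct tag_run {
        uint16_t offset;       /* byte offset into the original payload */
        uint16_t length;       /* run length in bytes */
        uint32_t tag;          /* 32-bit sensitivity tag */
    } __attribute__((packed));

    /* Prepended to the original frame before the whole unit enters
       the TCP/IP tunnel. */
    struct packet_tag_descriptor {
        uint16_t n_runs;
        struct tag_run runs[]; /* n_runs entries */
    } __attribute__((packed));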
50
Talk Outline
  • Introduction
  • Requirements and Assumptions
  • Use Cases
  • PDC Architecture
  • Prototype Implementation
  • Preliminary Performance Evaluation
  • Application-level performance overhead
  • Filesystem performance overhead
  • Network bandwidth overhead
  • Current Status and Future Work

51
Preliminary Performance Evaluation
  • Experimental setup
  • Quad-core AMD Phenom 9500, 2.33GHz, 3GB of RAM
  • 100Mbps Ethernet
  • PDC Hypervisor based on Xen v.3.3.0
  • Paravirtualized Linux kernel v.2.6.18-8
  • Tag tracker based on QEMU v.0.10.0

52
Application-Level Overhead
  • Goal: estimate the overall performance penalty
    (as perceived by users) in realistic usage
    scenarios
  • First scenario: recursive text search within a
    directory tree (grep)
  • Input dataset: 1GB sample of the Enron corporate
    e-mail database (http://www.cs.cmu.edu/~enron)
  • We mark a fraction (F) of the messages as
    sensitive, assigning them a uniform sensitivity
    tag
  • We search the dataset for a single-word string
    and measure the overall running time

53
Application-Level Overhead
[Chart: grep running time vs. F (%), the fraction of sensitive
messages, for three configurations: PDC-Xen with paravirt. Linux and
tag tracking; standard Xen with paravirt. Linux; Linux on bare metal]
54
Filesystem Performance Overhead
  • Configurations:
  • C1: Linux on bare metal, standard ext3
  • C2: Xen, paravirt. Linux; dom0 exposes a
    paravirt. block device; guest domain mounts it as
    ext3
  • C3: Xen, paravirt. Linux; dom0 exposes ext3 to
    the guest domain via NFS/TCP
  • C4: Xen, paravirt. Linux; dom0 exposes ext3 to
    the guest domain via NFS/Xen-RPC
  • C5: Xen, paravirt. Linux; dom0 exposes PDC-ext3
    to the guest domain via NFS/Xen-RPC
  • First experiment: sequential file write
    throughput
  • Create a file → write 1GB of data sequentially →
    close → sync

55
Filesystem Performance Overhead
  • Configurations:
  • C1: Linux on bare metal, standard ext3
  • C2: Xen, paravirt. Linux; dom0 exposes a
    paravirt. block device; guest domain mounts it as
    ext3
  • C3: Xen, paravirt. Linux; dom0 exposes ext3 to
    the guest domain via NFS/TCP
  • C4: Xen, paravirt. Linux; dom0 exposes ext3 to
    the guest domain via NFS/Xen-RPC
  • C5: Xen, paravirt. Linux; dom0 exposes PDC-ext3
    to the guest domain via NFS/Xen-RPC

56
Filesystem Performance Overhead
  • Second experiment: metadata operation overhead
  • M1: Create a large directory tree (depth = 6,
    fanout = 6)
  • M2: Remove the directory tree created by M1
    (rm -rf)

57
Network Bandwidth Overhead
  • We used iperf to measure end-to-end bandwidth
    between a pair of directly-connected hosts
  • Configurations:
  • NC1: No packet interception
  • NC2: Interception and encapsulation
  • NC3: Interception, encapsulation, and annotation
    with sensitivity tags
  • Sender assigns sensitivity tags to a random
    sampling of outgoing packets
  • We vary two parameters: Tag Prevalence (P) and
    Tag Fragmentation (F)

58
Network Bandwidth Overhead
59
Performance Evaluation - Summary
  • Application performance in the safe VM
  • 10x slowdown in the worst-case scenario
  • We expect to reduce this overhead significantly
    through a number of optimizations
  • Disk and network I/O overhead
  • Proportional to the amount of sensitive data and
    the degree of tag fragmentation
  • 4x overhead in the worst-case scenario (assuming
    32-bit tag identifiers)

60
Summary and Future Work
  • PDC seeks a practical solution to the problem of
    data confinement
  • Defend against exfiltration by outside attackers
  • Prevent accidental policy violations
  • Hypervisor-based architecture provides mechanisms
    for isolation, information flow tracking, and
    policy enforcement
  • Currently working on
  • Improving stability and performance of the
    prototype
  • Studying the issue of taint explosion in Windows
    and Linux environments and its implications for
    PDC