Provided by: silvio2


Security Applications For Emulation

Speaker details
  • An independent researcher.
  • Presented a number of vulnerabilities at the
    first Ruxcon after auditing the open source
    kernels (FreeBSD, NetBSD, Linux, OpenBSD).
  • Also interested in Reverse Engineering, speaking
    at CanSecWest on Linux malware.

  • A Presentation examining public research, and the
    results of my own research, on the topic of
    emulation applied to security.
  • Technology review
  • Security applications for emulation
  • Reverse engineering Cisco IOS Heap Management
  • Tracing and evaluating the capabilities of
    binaries and potential malware
  • Dynamic Taint Analysis
  • Automated unpacking
  • Symbolic Execution
  • Detecting Runtime Errors in Programs
  • And introducing a new tool for detecting out
    of bounds heap access in the Linux Kernel

  • Different technologies all sharing similar
  • Virtualization
  • Emulation
  • Dynamic Binary Translation
  • Different types of virtualization
  • Full Virtualization provides a simulation of the
    underlying hardware
  • Host performs native execution of the guest as
    much as possible.
  • Not an emulator, so aiming for near native
    performance
  • In i386, if there isn't full virtualization
    hardware support, privileged code is translated
  • Eg VMWare, VirtualBox
  • Virtualization is an important technology, but
    this presentation focuses on the host being able
    to intercept and emulate each individual
    instruction in the guest. This is in contrast to
    virtualization, which executes guest code
    natively as much as possible, with little general
    host interception.

Emulation and Dynamic Binary Translation
  • Emulation
  • Emulator Fetches, Decodes and Executes
    instruction by instruction
  • Different types of emulators: whole system
    emulators capable of running unmodified guest
    operating systems, or emulators only capable of
    running applications on specific systems.
  • Guest state is maintained in software, including
    the CPU, system memory, and for whole system
    emulators, hardware devices.
  • Eg Bochs
  • Used in the open source automated unpacker,
    Pandora's Bochs.
  • Dynamic Binary Translation
  • A faster form of emulation
  • Caches blocks of decoded and translated
    code
  • Eg QEMU
  • Used in Argos, a system for capturing 0day
    attacks
  • Used in my MemCheck tool for detecting Linux
    kernel heap access bugs.

Dynamic Analysis and Emulation
  • An emulator can be used to implement dynamic
    analysis.
  • Dynamic Analysis means running a program and
    seeing what's going on as it executes, eg as in a
    debugger.
  • It can mean identifying specific behaviors in the
    program, such as how the program accesses memory,
    transfers execution control, or treats network
  • Dynamic analysis using a debugger is prone to
    anti-debugging tricks, and is very cumbersome
    when applied in a kernel context.
  • A robust solution is to perform dynamic analysis
    from inside an emulator.
  • Hooks are added in the fetch/decode/execute loop
    of an emulator.
  • When modifying a dynamic binary translator
    generally, instrumentation or callbacks are added
    to the translated code blocks.
  • All the applications for emulation presented are
    related to, or are applications of, dynamic analysis.
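
The idea of adding hooks to the fetch/decode/execute loop can be sketched as follows. This is a minimal illustration only: the three-opcode instruction set, the `ToyEmulator` class and the `trace_program` helper are all hypothetical and do not correspond to Bochs, QEMU or any other real emulator.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction format: (opcode, operand_a, operand_b)
Instr = Tuple[str, int, int]
Hook = Callable[[int, Instr, Dict[str, int]], None]


class ToyEmulator:
    """Fetch/decode/execute loop for a made-up machine with two registers."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.regs: Dict[str, int] = {"pc": 0, "r0": 0, "r1": 0}
        self.memory: Dict[int, int] = {}
        self.hooks: List[Hook] = []

    def add_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def run(self) -> None:
        while self.regs["pc"] < len(self.program):
            pc = self.regs["pc"]
            instr = self.program[pc]           # fetch
            for hook in self.hooks:            # analysis sees every instruction
                hook(pc, instr, self.regs)
            op, a, b = instr                   # decode
            self.regs["pc"] = pc + 1
            if op == "li":                     # r[a] = immediate b
                self.regs[f"r{a}"] = b
            elif op == "store":                # memory[b] = r[a]
                self.memory[b] = self.regs[f"r{a}"]
            elif op == "jmp":                  # pc = a
                self.regs["pc"] = a
            elif op == "halt":
                break
            else:
                raise ValueError(f"unknown opcode {op!r}")


def trace_program(program: List[Instr]) -> List[Tuple[int, str]]:
    """Run the program and return (pc, opcode) for each executed instruction."""
    trace: List[Tuple[int, str]] = []
    emu = ToyEmulator(program)
    emu.add_hook(lambda pc, instr, regs: trace.append((pc, instr[0])))
    emu.run()
    return trace
```

Because the hook runs on the host side of the loop, the guest program has no debugger to detect; a dynamic binary translator would instead insert an equivalent callback into each translated block.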

Part i) Reverse Engineering Cisco IOS's Heap Management

Reverse Engineering Cisco IOS with Dynamips
  • Dynamips is an open source emulator and binary
    translator of Cisco hardware running PPC/MIPS IOS
  • Potential future development environment for IOS
  • Dynamic analysis of IOS
  • My experience is with IOS on MIPS
  • IOS MIPS images use an invalid ELF e_machine
  • Some IDA (5.2) bugs with MIPS (turn off macros to
  • Dynamic analysis can identify heap management
    functions in IOS and provide a means to
    potentially implement Valgrind style heap
  • It can also be used to reverse engineer other
    components of IOS.
  • Dynamic analysis is different to the static
    approach, and has some advantages
  • Can be completely automated
  • Since the behavior of the IOS implementation is
    relatively constant this method can work across
    different IOS images, providing new or obsolete
    features aren't being examined

IOS Heap Management Basics
  • Well documented public research in developing
    heap based buffer overflow exploits describes
    general heap layout.
  • IOS heap allocated buffers have a header
    appearing directly before the buffer, and a
    trailer that follows the buffer.
  • These 'chunks' form a doubly linked list.
  • Chunk header begins with a known constant
  • This fact is used later in the analysis.

Dynamic Analysis Approach
  • Knowing the header constant of a malloc chunk
    enables us to track memory allocations by
    intercepting writes to memory of that particular
    constant.
  • Heap management is slightly different in a kernel
    but a kernel or user mode alloc/free still has a
    set of expected semantics and prototypes.
  • An alloc(ation) function returns a pointer to an
    allocated buffer.
  • But don't expect there only to be one argument of
    the allocation size, eg kmalloc in Linux has
    multiple arguments including flags.
  • Free might have multiple arguments also, but one
    of those arguments is certainly a pointer to an
    allocated buffer.
  • By tracking allocations, and checking the
    behavior of functions, we can infer the locations
    of malloc and free.

Identifying Functions with Dynamic Analysis
  • Finding malloc
  • Track writes to memory that write the constant
    that identifies a malloc chunk.
  • Track procedure exits, checking the return value
    for a pointer to a known allocated buffer. This
    return value is the chunk location + chunk header
    size.
  • First function to return allocated buffer is
    malloc, but sample a number of times to be sure.
  • Finding free
  • Find two malloc calls that return the same
    address.
  • Free must have occurred between mallocs since
    logically, allocated buffers can't overlap.
  • Track procedure calls with an argument matching
    freed memory, eg free(ptr).
  • Sample large enough set, common function among
    samples is free.
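
The "finding free" steps above can be sketched as a set intersection over trace samples. This is only an illustration under assumptions: the event tuples, the `free_candidates` name and the addresses in the usage note are hypothetical, and malloc is assumed to have been located already.

```python
from typing import Any, Dict, Iterable, Optional, Set, Tuple

# Hypothetical trace event formats produced by the emulator hooks:
#   ("call", func_addr, (arg0, arg1, ...))   procedure entry
#   ("ret",  func_addr, return_value)        procedure exit
Event = Tuple[str, int, Any]


def free_candidates(events: Iterable[Event], malloc_addr: int) -> Set[int]:
    """Return functions that were called with a buffer pointer in every
    sample where malloc later returned that same pointer again."""
    # buffer address -> functions called with it since it was allocated
    seen_since_alloc: Dict[int, Set[int]] = {}
    candidates: Optional[Set[int]] = None

    for kind, func, payload in events:
        if kind == "call":
            for arg in payload:
                if arg in seen_since_alloc:
                    seen_since_alloc[arg].add(func)
        elif kind == "ret" and func == malloc_addr:
            ptr = payload
            if ptr in seen_since_alloc:
                # The same address was returned twice, so a free must have
                # happened in between. Keep only functions common to all samples.
                sample = seen_since_alloc[ptr]
                candidates = set(sample) if candidates is None else candidates & sample
            seen_since_alloc[ptr] = set()

    return candidates if candidates is not None else set()
```

With enough samples the intersection should shrink to a single function. A function such as a memory copy may take the buffer as an argument in some samples, but it is unlikely to appear in all of them.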

Testing the results with a double free and overlapping allocation checker.
  • How can we determine if malloc and free are the
    only heap management functions?
  • The solution is to trace those functions while
    running IOS, building our own representation of
    the heap, all the while checking for consistency
    in our representation.
  • Certain conditions should always be true in a
    well managed heap. If any assertions fail
    catastrophically, our model of the heap is
    incorrect.
  • Only allocated memory can be freed.
  • Allocated memory cannot overlap.
  • This results in a checker that can be used to
    detect double free bugs in IOS, as they happen,
    much like Valgrind. But IOS checks the
    consistency of the heap regularly and also during
    free, so the checker is probably only useful for
    automated analysis.
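
The two invariants above can be sketched as a small heap model fed by the malloc and free tracers. The class and method names are hypothetical, and a linear scan is used for clarity rather than speed.

```python
from typing import Dict


class HeapModelError(Exception):
    """Raised when a traced event contradicts our model of the heap."""


class HeapModel:
    def __init__(self) -> None:
        self.live: Dict[int, int] = {}  # buffer start address -> size

    def on_malloc(self, addr: int, size: int) -> None:
        # Invariant: allocated memory cannot overlap.
        for start, length in self.live.items():
            if addr < start + length and start < addr + size:
                raise HeapModelError(
                    f"allocation {addr:#x}+{size:#x} overlaps {start:#x}+{length:#x}"
                )
        self.live[addr] = size

    def on_free(self, addr: int) -> None:
        # Invariant: only allocated memory can be freed.
        if addr not in self.live:
            raise HeapModelError(f"free of unallocated buffer {addr:#x}")
        del self.live[addr]
```

A `HeapModelError` means either a real heap bug in the guest or, more often during reverse engineering, that the model is missing a heap management function.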

Detecting IOS 0-day
  • Another type of IOS checker could potentially be
    made to detect 0-day attacks.
  • IOS exploitation uses corrupted malloc chunks
    that are subsequently freed.
  • Freeing the corrupt chunk causes an arbitrary
    write to memory.
  • The checker could confirm the consistency of
    header attributes such as the size of each chunk
    through the interception of free calls.
  • For more complete coverage, the chunk header
    could be retrieved and stored after every malloc,
    subsequently being verified before free.
  • In a roll-out, honeypots could automatically
    detect mass 0-day exploitation and raise alarms
    of the attack.
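
The "store after malloc, verify before free" idea can be sketched as below. The header size, the offset of the size field and the byte order are placeholder assumptions, not the real IOS chunk layout, and `read_mem` stands in for whatever guest memory accessor the emulator provides.

```python
from typing import Callable, Dict

# Hypothetical layout values, chosen only for illustration.
HEADER_SIZE = 16   # bytes of chunk header directly before the buffer
SIZE_OFFSET = 4    # offset of the size field inside the header

ReadMem = Callable[[int, int], bytes]  # (guest address, length) -> bytes


class ChunkHeaderGuard:
    """Remember each chunk's size field at malloc, re-check it at free."""

    def __init__(self, read_mem: ReadMem) -> None:
        self.read_mem = read_mem
        self.saved_sizes: Dict[int, int] = {}

    def _read_size(self, buf: int) -> int:
        raw = self.read_mem(buf - HEADER_SIZE + SIZE_OFFSET, 4)
        return int.from_bytes(raw, "big")

    def on_malloc_return(self, buf: int) -> None:
        self.saved_sizes[buf] = self._read_size(buf)

    def on_free_call(self, buf: int) -> bool:
        """Return False if the header changed since allocation."""
        expected = self.saved_sizes.pop(buf, None)
        if expected is None:
            return True  # buffer was allocated before tracing started
        return self._read_size(buf) == expected
```

A `False` result from `on_free_call` is the point at which a honeypot deployment would raise an alarm, before the corrupted chunk is unlinked.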

Reference Counting.
  • Tracing malloc and free, shows us conditions
    where we are freeing the same memory twice, or
    performing a double free.
  • Potentially this could indicate a bug in IOS but
    there are simply too many alerts to be
  • In fact, it turns out that, as suspected by other
    researchers, allocated buffers are reference
    counted.
  • Before the two frees is a call to
    increment the reference count (IncRefCnt) of the
    buffer, thus causing the first free to simply
    decrement the count without actually freeing the
    buffer.
  • MIPS has an atomic addition instruction, used
    only for incrementing the malloc chunk refcnt.
  • Any procedure that uses this instruction on a
    malloc chunk is IncRefCnt.
  • For other architectures, the refcnt field in the
    malloc chunk is at a fixed offset, and writes to
    this address may also indicate the location of
    IncRefCnt.
  • Tracing also reveals the appearance of
    overlapping memory allocations.
  • In later versions of IOS, a 'MallocLite'
    implementation is used.
  • A 64k allocation is used which is subsequently
    subdivided for use in allocations.
  • This feature may affect the writing of heap
    exploits and should be taken into account.
  • If malloc recursively calls itself, requesting
    64k of memory, then MallocLite is allocating this
    larger block of memory.
  • For tracing, ignoring recursive allocations works.

  • The malloc tracer could potentially be used to
    implement a Valgrind style MemCheck tool to
    detect out of bounds heap access.
  • This could be used alongside fuzzing to provide
    more accurate detection of vulnerabilities when
    they happen.
  • Easy to implement, but the initial attempt
    resulted in too many false positives.
  • Problem: there are other functions that have
    direct access to internal heap structures besides
    malloc, free and IncRefCnt, eg CheckHeaps.
  • More reversing is required.
  • If Cisco gave me access to the source, I'm pretty
    sure I could whack this out in a week :-)
  • The MemCheck concept was later successfully
    implemented for the Linux Kernel as source code
    is openly available.

Cisco IOS Summary
  • By modifying the open source Cisco emulator,
    dynamips, dynamic analysis of IOS is possible.
  • Dynamic Analysis of IOS can aid in reverse
    engineering.
  • Potentially one day we will have Valgrind style
    IOS memory checking tool, or in the near future a
    0-day detection tool.

Part ii) Tracing execution and evaluating the capabilities of binaries and potential malware

Tracing and evaluating the capabilities of binaries and potential malware
  • Running binary inside a sandboxed environment
    logging events of interest.
  • System calls, registry changes, files accessed,
    process management, services started or stopped
  • Public websites offer free online services to
    evaluate binaries and potential malware.
  • Trace useful for quickly determining what a
    binary is doing.
  • May help in determining if binary is malicious.
  • A non emulated approach is to trace the binary
    using a debugger based tool from userspace within
    a VM.
  • Malware almost certain to use anti debugging
    tricks which may make tracing problematic.
  • Another approach is to perform the execution
    inside an emulator.
  • Emulated approach very resistant to modern
    anti-debugging tricks.

  • TTAnalyze: a Master's thesis that presented a
    closed source fork of QEMU that logged Windows
    system calls.
  • Important as other techniques such as automated
    unpacking are based on similar methods and the
    thesis clearly describes the implementation.
  • Windows XP running as a guest, emulated by a fork
    of QEMU in the host.
  • Host uploads binary to guest using virtual
    network created by VM.
  • Binary is executed in guest environment.
  • Host monitors execution and logs events of
    interest.
TTAnalyze concepts
  • Host emulator intercepts every instruction.
  • It identifies instructions that belong to the
    process being monitored.
  • How to know what code is part of the process we
    wish to monitor?
  • CR3 register (the page directory base address) is
    unique for each process.
  • Kernel maintains a process list (EPROCESS) with
    these addresses.
  • Given a specific process instruction, it may be
    executing either kernel code or user code.
  • For our target process, kernel code is when EIP
  • For the target process, it checks EIP, and if it
    points to a Windows API call it logs the event.
  • It also logs returning from Windows API calls.
  • To know the addresses of each Windows API call,
    it uses the PEB from the target process to
    eventually retrieve a list of all loaded DLLs.
  • The library calls in each DLL are parsed, and
    their addresses noted.
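
The per-instruction filtering described above can be sketched like this. It is not TTAnalyze's code: `make_api_logger`, the CR3 values and the API address in the usage note are hypothetical, and the address table is assumed to have been built already from the parsed DLL exports.

```python
from typing import Callable, Dict, List, Tuple


def make_api_logger(
    target_cr3: int, api_table: Dict[int, str]
) -> Tuple[Callable[[int, int], None], List[str]]:
    """Build a per-instruction hook that logs API entry for one process.

    target_cr3: page directory base address of the monitored process.
    api_table:  maps the entry address of each exported function to its name.
    """
    log: List[str] = []

    def on_instruction(cr3: int, eip: int) -> None:
        if cr3 != target_cr3:
            return                      # instruction belongs to another process
        name = api_table.get(eip)       # exact match on the function entry point
        if name is not None:
            log.append(name)

    return on_instruction, log
```

For example, `hook, log = make_api_logger(0x39000, {0x7C801A28: "CreateFileA"})` followed by `hook(0x39000, 0x7C801A28)` appends one entry. The exact-match lookup is also what makes the approach fragile against code that enters a function past its first instruction.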

TTAnalyze Implementation
  • A component that executes inside the guest
  • Kernel driver to parse kernel EPROCESS list, to
    obtain the page directory address (CR3), and PEB
    of the target process.
  • RPC mechanism to control guest operations from
    the host
  • Uploading executables to guest
  • Controlling execution of the target process,
    which is initially started in a suspended state
    to allow querying.
  • Querying the kernel driver for the page directory
    base (CR3) and PEB.
  • QEMU modifications
  • Identifying the process of interest using the CR3
    result from the guest kernel driver.
  • The PEB is used to establish a list of
    addresses for each Windows API call in a DLL
  • Identifying entering and leaving Windows API
    calls in the guest, based on intercepting each
    instruction and checking EIP.

TTAnalyze Implementation Challenges
  • Arguments for system calls which reside in
    virtual memory might be paged out.
  • QEMU page fault handler detects condition then
    alters guest code to access target memory, paging
    it in.
  • Malware can use the Native API directly.
  • Understanding this requires unofficial
    documentation of API.
  • Trap native calls by checking each instruction
    for an OS trap (int 2e or sysenter).

TTAnalyze Attacks
  • Malware might evade detection of Windows API
    calls which is dependent on exact EIP matching.
  • Vulnerable if malware doesn't jump to the very
    beginning of a function, eg the caller might
    implement the callee's prologue.
  • Malware might detect guest changes.
  • Communication channel between host and guest.
  • Kernel driver component.
  • See Pandora's Bochs (An automated unpacker)
    implementation with no guest changes.
  • Malware might detect system emulators
  • CPU Bugs (in errata) generally not implemented
  • Model Specific Registers implementation different
    for different CPU vendors.

Binary Tracing Summary
  • Existing software that traces binaries using a
    userland style debugger based tool in a VM,
    vulnerable to many anti-debugging tricks.
  • An emulator can present a solution to that
    problem.

Part iii) Using emulation for dynamic taint analysis

Dynamic Taint Analysis
  • A technique used to analyze the flow of data
    in a program.
  • Has applications in identifying vulnerabilities
    as they happen, eg Argos.
  • Has also been used to identify spyware, eg,
  • Is a general concept that can be used in a number
    of applications, including symbolic execution.
  • Traces the flow of data, instruction by
    instruction, from a source that generates
    'tainted' data, to sinks where the data is used.
  • Variables, registers and memory are tagged as
    being tainted or clean.
  • Destination operand in instruction becomes
    tainted when a source operand is tainted.
  • Sometimes it's useful that data can become
    untainted by certain operations.

Dynamic Taint Analysis in Vulnerability Detection
  • Dynamic Taint Analysis has been applied for
    vulnerability detection such as SQL injection, or
    incorrect use of the Unix exec() or system()
    calls which run executables.
  • Source of user input, that is untrusted data,
    taints the data.
  • Flow of untrusted data followed by taint
  • If untrusted data checked in a condition, then
    input validation deemed to have occurred, so
    untaint data.
  • At site of exec(), system(), or even
    mysql_query, check that argument is non tainted.
  • If tainted, then untrusted data assumed to have
    reached privileged code and vulnerability has

Argos: a tool for detecting 0day attacks
  • Uses dynamic taint analysis to detect 0day
    attacks.
  • An open source fork of QEMU.
  • Detects exploits as they are happening and
    automatically generates vulnerability
    signatures.
  • Vision is of an automatic worm defense system.
  • Honeypots detect 0day attacks.
  • Generates and delivers vulnerability signatures
    to intrusion prevention systems
  • Argos works by dynamic taint analysis of network
    data which is considered untrusted.
  • Taints data returned from QEMU emulated network
  • Exploits detected when there is code redirection
    under attacker control.
  • If EIP becomes tainted (under the control of the
    attacker).
  • If EIP points to tainted data.
  • Execve system calls checked for tainted arguments.
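
The two control-flow checks listed above can be sketched as a function called on every indirect jump, call or return. This is an illustration of the checks as described on the slide, not Argos's implementation; the function name, parameters and operand naming are assumptions.

```python
from typing import Optional, Set


def check_control_transfer(
    tainted_regs: Set[str],
    tainted_mem: Set[int],
    target_operand: str,
    new_eip: int,
) -> Optional[str]:
    """Return an alert string if a control transfer looks attacker controlled.

    tainted_regs / tainted_mem: locations currently holding network data.
    target_operand: register or location the new EIP was loaded from.
    new_eip: the address execution is about to continue at.
    """
    if target_operand in tainted_regs:
        return "EIP loaded from tainted operand"
    if new_eip in tainted_mem:
        return "EIP points to tainted data"
    return None
```

The first case corresponds to an overwritten return address or function pointer; the second to injected code being executed directly.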

Dynamic Taint Analysis Summary
  • Dynamic Taint Analysis is a technique used to
    track the flow of data.
  • Important because it can be used as a general
    technique in more applied topics.
  • Has applications including vulnerability
    detection and is used in places like symbolic
    execution.

Part iv) Automated Unpacking
  • A packer rewrites an executable, wrapping a new
    layer of code around the original program.
  • Essentially becomes an executable inside an
    executable.
  • A packer is used to compress, obfuscate or
    encrypt the original executable
  • Today almost all malware is packed.
  • Packers originally used for compression
  • I remember packers (or crunchers) from the early
    90's, and had 2 floppy disks full of them, for
    the Commodore 64!
  • The resulting packed executable consists of a
    runtime unpacking layer and a binary blob of the
    compressed or obfuscated original program.
  • At runtime, the unpacking layer decompresses the
    blob, writing the original executable to memory.
    It then transfers execution back to the original
    program.
  • Not all packers follow this behavior. Some
    packers convert the original executable to PCODE.
    At runtime the packed executable acts as a VM.

  • Unpacking is the process of extracting the
    original executable from a packed image.
  • The manual approach is to run the packed
    executable in a debugger, skipping the unpacking
    stub which writes to memory the original image,
    and breaking (in the debugger) when execution
    transfers to the now unpacked image.
  • Dump memory, then rebuild the image so it's a
    valid executable again.
  • Requires fixing the Import Address Table.
  • ImpRec can do this.
  • Debugger scripts can automate the process on
    specific unpackers by identifying instruction
    sequences that indicate which stage the unpacking
    stub is in.

Automated Unpacking
  • Unpacking can be automated.
  • Run packed executable.
  • Track all memory writes by executable.
  • If execution transfers to a previously written
    memory location, then unpacking is deemed to have
    occurred.
  • May be necessary to repeat as multiple layers may
  • Public automated unpackers available from
    Offensive Computing, and also Pandora's Bochs.
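
The write-then-execute rule above can be sketched as a pair of hooks. The `UnpackDetector` class is hypothetical, and it tracks individual byte addresses for simplicity where a real tool would more likely track pages or ranges.

```python
from typing import List, Set


class UnpackDetector:
    """Flags execution of memory the program wrote earlier in the run."""

    def __init__(self) -> None:
        self.written: Set[int] = set()
        self.layer_entry_points: List[int] = []

    def on_write(self, addr: int, size: int = 1) -> None:
        self.written.update(range(addr, addr + size))

    def on_execute(self, eip: int) -> bool:
        """Return True when a new unpacked layer starts executing at eip."""
        if eip in self.written:
            self.layer_entry_points.append(eip)
            # Start fresh so that a further packing layer is detected too.
            self.written.clear()
            return True
        return False
```

When `on_execute` returns `True`, the caller would dump memory at that point. Repeating until the run ends handles multiple layers.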

Automated Unpacking Implementation Approaches
  • Multiple approaches in implementation
  • Use hardware page protection in OS to track
    writes and execution. Eg Offensive Computing.
    This results in high performance.
  • If running inside a virtualized environment like
    VMWare, VM might be detected. Offensive
    Computing recommend using a real goat machine.
  • Dynamic Instrumentation or complete emulation of
    packed program to track memory writes and
    execution.
  • Offensive Computing use instrumentation approach
    with Intel PIN framework.
  • Pandora's Bochs uses the Bochs emulator.

Automated Unpacking using an Emulator
  • Emulation is a mature closed source technology
    used by AntiVirus
  • Original usage of emulation was to detect
    polymorphic viruses, but now used for unpacking
  • Typical AntiVirus emulator emulates both the
    instruction set and parts of the operating
    system.
  • This is how I wrote my own automated unpacker and
  • There are no software licensing problems since
    the emulator is only a regular piece of
  • Another approach is to use a whole system
    emulator such as Bochs or QEMU running an
    installed OS.
  • Non emulated approaches are more likely to be
    detected or be susceptible to anti-debugging tricks
    employed by malware.

Using an AV style Emulator as a CPU checker
  • While developing my AV style emulator, a need
    arose to verify the emulation.
  • I implemented a program tracer to trace programs
    in parallel to emulation
  • Tracer needed to automatically evade
    anti-debugging tricks
  • Instructions needed to be emulated that would
    indicate the program was being debugged (eg,
    EFlags popf, rdtsc, or software int1 being
    confused with single stepping).
  • Library calls also (eg, Process32 which shows
    debugger in process list, and IsDebuggerPresent).
  • For each traced instruction, the emulator
    executes the same instruction.
  • The CPU state from the tracer is verified against
    the state of the emulator, and checked for
    differences.
  • Some instructions produced differences between
    emulation and tracing, not due to a fault of the
    emulator or tracer.
  • CPU Bugs. Some Instructions not following Intel
  • Not setting/clearing processor status flags

Automated Unpacking using an Emulator
  • The required changes to an emulator involve
    modifying the software MMU to track memory writes,
    and checking each instruction to see if the EIP
    matches any addresses where memory writes have
    occurred.
  • Similar problems as TTAnalyze are present in
    determining what code is part of the target
    process.
  • The Renovo unpacker from the BitBlaze project
    follows the TTAnalyze approach in starting the
    executable in a suspended state, and then using a
    kernel driver in the guest to find the page
    directory base address of the process.
  • Pandora's Bochs uses an unmodified guest system
    and instead watches for changes in the CR3
    register to identify the target process.
  • To determine the value of CR3 it takes into
    account that in kernel mode Windows uses the fs
    register to reference a known structure leading
    to the EPROCESS list which, as in TTAnalyze,
    contains the page directory base address (CR3) of
    each process.

Attacks against Automated Unpackers and Emulators
  • Malware might make use of unimplemented emulation
    of the architecture, instruction set or operating
    system.
  • For AV emulators, use of obscure libraries.
  • For whole system emulators, detection of the
    emulator. Malware might check existence of known
    CPU errata.
  • Having malware require activation (eg, using the
    Internet), or only occasionally activating.

Attacks (cont) Virtual Machine Packers
  • Packer translates executable into PCODE.
  • At runtime, PCODE is decoded and executed in the
    style of a virtual machine.
  • PCODE can be polymorphic.
  • This type of packer doesn't follow the 'write to
    memory then execute' algorithm.
  • Eg, Themida, but fortunately these packers are
    not as common in current malware.
  • No automated method of unpacking against an
    unknown packer of this type.

Automated Unpacking Summary
  • Automated unpacking works on a theory of
    intercepting execution on previously written
    memory addresses.
  • Multiple approaches to implementation; emulation
    has some advantages.
  • Automated unpacking doesn't work on VM based
    packers.

Part v) Using emulation to design and implement symbolic execution

Symbolic Execution
  • A technique used to analyze programs.
  • For unknown input to a program, it maintains
    generalized information on program state,
    systematically exploring program paths.
  • Really a definition for mixed symbolic
    execution.
  • Execution occurs by emulating instructions and
    using symbolic formulas instead of concrete data
    for user defined input.
  • Example symbolic data can be network packet
    contents, program arguments, file contents etc
  • Symbolic formulas contain information on all
    program states on that program path for arbitrary
    user input, that is, all the values the data can
    possibly hold as held true by the symbolic
    formula.
  • Bug finding is equivalent to solving the
  • Eg, is this pointer being dereferenced ever equal
    to 0, given arbitrary user input?
  • And if so, what is the user input that generates
    that bug?

SMT Based Constraint Solvers
  • Symbolic equations are generated for instructions
    that have symbolic arguments.
  • Conditional instructions generate equations which
    are constraints (eg, x
  • Equations handled by Satisfiability Modulo
    Theories (SMT) solvers.
  • Efficient SMT based solvers are a relatively new
    achievement in the past decade.
  • Annual SMT competition pits solvers against each
    other.
  • Microsoft has their own solver which is free to
    use, but not open source.
  • A number of open source solvers available.
  • SMT Solver can be queried, given a set of
    equations and constraints, to see if certain
    queried constraints are true.
  • Can easily determine if symbolic pointer is
  • SMT solvers can also generate concrete solutions
    from symbolic equations

Applications of Symbolic Execution
  • As a Bug checker
  • Dawson Engler's closed source C checker EXE which
    could detect buffer overflows, null pointer
    dereferences and divisions by zero.
  • The open source Catchconv which doesn't explore
    program paths, but checks assertions on a given
    set of input using symbolic execution to find
    signedness bugs.
  • Intelligent fuzzing
  • Symbolic Execution can automatically enumerate
    the paths and data in a program that fuzzing
    normally misses, aiming towards complete
    automated code coverage.
  • Eg, closed source Microsoft Sage research
  • Tracing and evaluating the capabilities of
    binaries and potential malware
  • The closed source BitBlaze project implements
    BitScope which is in a similar vein to TTAnalyze
    except it symbolically explores the many program
    paths in potential malware to find its
    capabilities.
Symbolic Execution Implementation
  • Emulator runs program, instruction by
    instruction, generating symbolic equations for
    instructions when a source operand is symbolic,
    such as the symbolic equation ebx = eax + 10.
  • In an instruction, if a source operand is
    symbolic, destination becomes symbolic.
  • This is implemented using Dynamic Taint Analysis
  • At conditional instructions, two possible
    equations, the condition being true, or the
    condition being false.
  • Symbolic Execution explores each path
  • A symbolic constraint representing the conditions
    truth is given to each path, eg (x 10 and x
  • Feasibility, that is if an equation can be
    satisfied as true, of each path is determined by
    SMT solvers.
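
Path exploration with per-path constraints can be sketched as below. Two substitutions are made to keep the example self-contained: the program is a hand-built tree of branches rather than emulated machine code, and feasibility is decided by brute force over an 8-bit input instead of an SMT solver. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, Union

Constraint = Callable[[int], bool]   # predicate over the symbolic input x


@dataclass
class Branch:
    cond: Constraint
    if_true: "Node"
    if_false: "Node"


Node = Union[Branch, str]            # a leaf is just a label for the path end


def solve(constraints: Sequence[Constraint]) -> Optional[int]:
    """Stand-in for an SMT solver: try every 8-bit value of the input."""
    for value in range(256):
        if all(c(value) for c in constraints):
            return value
    return None


def explore(node: Node, path: Tuple[Constraint, ...] = ()) -> List[Tuple[str, int]]:
    """Return (leaf label, concrete witness input) for every feasible path."""
    if isinstance(node, str):
        witness = solve(path)
        return [(node, witness)] if witness is not None else []
    negated: Constraint = lambda v, c=node.cond: not c(v)
    # Each side of the branch gets the condition, or its negation, as a constraint.
    return (explore(node.if_true, path + (node.cond,))
            + explore(node.if_false, path + (negated,)))
```

For a program such as `Branch(lambda x: x < 10, Branch(lambda x: x > 20, "unreachable", "small"), Branch(lambda x: x == 42, "bug", "large"))`, the contradictory path is discarded and the `"bug"` leaf is reported together with an input that reaches it.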

Symbolic Execution Challenges
  • Symbolic Execution may never terminate in the
    presence of loops, so loops must be simplified,
    typically through unrolling.
  • Symbolic Execution therefore is not complete.
  • Path explosion: dealing with functions like
    strcmp with symbolic input produces many possible
    paths, an exponential number of paths for the
    size of the string.
  • BitBlaze approach: hard code 'function summaries'
    to deal with common library functions.
  • Dealing with symbolic pointers.
  • Dynamic taint analysis has trouble determining
    the target memory that becomes tainted if a
    pointer is symbolic.
  • Requires SMT solver to determine concrete
    solutions of pointer.
  • SMT solver support used for target architecture
    may not be complete
  • No public solvers support floating point.

Symbolic Execution Summary
  • Symbolic execution is a relatively new method to
    analyze programs.
  • Applications include bug checkers, smart fuzzers,
    and binary evaluation.
  • I believe symbolic execution has a big part in
    the future of automated analysis.

Part vi) Detecting Runtime Errors in Programs
  • Valgrind is a heavyweight dynamic binary
    instrumentation framework.
  • Most well known for the MemCheck checker.
  • Memcheck used as a bug checker for incorrect heap
    use or access.
  • Also detects uninitialized variable use.
  • Translates machine code to IR, then allows
    instrumentation, with modules that implement
    runtime checkers.
  • Valgrind's Memcheck can detect out of bounds or
    invalid heap access and tracks what addresses can
    be accessed by maintaining a 'shadow memory'
    mirroring allocations on the heap.
  • For each address in shadow memory, also stores
    whether it's initialized or not.
  • Then checks all guest memory references belong to
    the shadow memory using IR instrumentation.

Valgrind's MemCheck with uninitialized variables
  • Uninitialized variable checker implemented using
    dynamic taint analysis.
  • Newly allocated memory and new stack frames
    considered tainted.
  • Initializing data untaints it.
  • Alert when using tainted/uninitialized data.
  • Naive implementation causes false positives.
  • Memcpy of padded structures or memcpy of
    structures with uninitialized members causes
    false positives.
  • Fixed by warning only when using uninitialized
    variables in system calls, conditions or being
    dereferenced as a pointer.
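
The refined rule above (copying uninitialized bytes is allowed, using them is not) can be sketched as follows. This is a simplified byte-granular illustration, not Valgrind's implementation; the class, method names and the set of "use" kinds are assumptions.

```python
from typing import List, Set, Tuple

# Uses that matter: copying is not in this set, so memcpy stays silent.
REPORTED_USES = {"condition", "syscall", "deref"}


class UninitChecker:
    def __init__(self) -> None:
        self.uninit: Set[int] = set()             # addresses of uninitialized bytes
        self.alerts: List[Tuple[int, str]] = []

    def on_alloc(self, addr: int, size: int) -> None:
        self.uninit.update(range(addr, addr + size))   # new memory is tainted

    def on_write(self, addr: int, size: int = 1) -> None:
        self.uninit.difference_update(range(addr, addr + size))

    def on_copy(self, dst: int, src: int, size: int) -> None:
        """memcpy-style copy: the initialized state travels with the data."""
        state = [(src + i) in self.uninit for i in range(size)]
        for i, is_uninit in enumerate(state):
            if is_uninit:
                self.uninit.add(dst + i)
            else:
                self.uninit.discard(dst + i)

    def on_use(self, addr: int, kind: str) -> None:
        if kind in REPORTED_USES and addr in self.uninit:
            self.alerts.append((addr, kind))
```

Copying a half-initialized structure therefore raises nothing by itself; an alert appears only if one of the uninitialized bytes later feeds a condition, a system call or a pointer dereference.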

Detecting Runtime Heap Errors in the Linux Kernel
  • Tools that have similar designs or aims to detect
    some classes of heap errors in the Linux Kernel.
  • KEFence (Linux) / MemGuard (FreeBSD)
  • Detects overflows (and underflows for KEFence,
    but not both at the same time) of heap buffers.
  • Allocates a guard page next to the allocated
    buffer that page faults on any access.
  • Only detects overflows, not arbitrary invalid
    access.
  • KmemCheck (Linux)
  • Used to detect uninitialized variable bugs.
  • Maintains a shadow memory indicating state of
    data being initialized or not.
  • Page faults on all heap access, then checks
    shadow memory against access.
  • UML Valgrind
  • Doesn't seem active, and source unavailable :(

Linux Kernel MemCheck
  • My own runtime checker that detects out of bounds
    heap access in the Linux Kernel.
  • Not Valgrind's MemCheck I named it poorly I
  • Tested under Linux 2.6.26 using a Windows Vista
    Cygwin host.
  • Implemented as a C++ fork of QEMU.
  • Dumps kernel stack trace on guest access
    violation.
  • Only reports when a memory access violation
    occurs, much like Valgrind.
  • Not a static analysis tool.
  • Host maintains 'shadow memory' of guest Linux
    Kernel heap that identifies valid heap
  • The shadow memory is created by intercepting the
    heap management functions in the Linux kernel and
    building a representation of the guest heap.
  • MemCheck validates all memory access against this
    shadow memory (like Valgrind).
  • Except in heap management functions like kmalloc,
    kfree etc.
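
The shadow memory check can be sketched as below. This is an illustration in Python rather than the tool's own code; the `ShadowHeap` class, the explicit heap range and the sorted-list lookup are assumptions made for the example.

```python
import bisect
from typing import Dict, List


class ShadowHeap:
    """Host-side record of live guest heap allocations."""

    def __init__(self, heap_start: int, heap_end: int) -> None:
        self.heap_start = heap_start
        self.heap_end = heap_end
        self.starts: List[int] = []        # sorted allocation start addresses
        self.sizes: Dict[int, int] = {}    # start address -> size

    def on_alloc(self, addr: int, size: int) -> None:
        if addr not in self.sizes:
            bisect.insort(self.starts, addr)
        self.sizes[addr] = size

    def on_free(self, addr: int) -> None:
        if addr in self.sizes:
            del self.sizes[addr]
            self.starts.pop(bisect.bisect_left(self.starts, addr))

    def access_ok(self, addr: int, length: int = 1) -> bool:
        """Check one guest memory access against the shadow memory."""
        if not (self.heap_start <= addr < self.heap_end):
            return True                    # not a heap address, out of scope
        i = bisect.bisect_right(self.starts, addr) - 1
        if i < 0:
            return False                   # below the first live allocation
        start = self.starts[i]
        return addr + length <= start + self.sizes[start]
```

The emulator's memory access path would call `access_ok` for every load and store made outside the allocator functions themselves, and dump a guest stack trace when it returns `False`.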

Linux Kernel Heap Management
  • Linux has had several memory allocators, the
    latest Linux kernels now using the SLUB
    allocator.
  • MemCheck only supports the latest SLUB
    allocator.
  • There are also three internal allocators in Linux
    that use the heap.
  • The Page Allocator, using the buddy allocator
    internally, which only handles allocations of
    sizes being a predetermined multiple of the page
    size.
  • The page allocator can be called directly or
    indirectly from the slub allocator.
  • The Slub Allocator, which handles allocations of
    varying sizes by dividing up a slab that
    originates from the page allocator.
  • The BootMem Allocator which uses a simpler
    algorithm than the other allocators during boot
    time only.
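
The relationship between the page allocator and the slub allocator can be illustrated with a simplified model. The 4096-byte page size, the function names and the plain back-to-back object layout are assumptions made for this sketch; the real slub allocator also handles alignment, per-CPU caches and optional debug metadata.

```python
PAGE_SIZE = 4096  # assumed page size


def page_order(nbytes):
    """Smallest order such that 2**order pages hold nbytes.

    The buddy allocator hands out blocks of 2**order contiguous pages.
    """
    order = 0
    while (PAGE_SIZE << order) < nbytes:
        order += 1
    return order


def carve_slab(base, order, object_size):
    """Divide a slab of 2**order pages into equal-sized objects.

    Returns the start address of each object, packed back to back.
    """
    slab_bytes = PAGE_SIZE << order
    count = slab_bytes // object_size
    return [base + i * object_size for i in range(count)]
```

For example, a request for three pages is rounded up to an order 2 block of four pages, and a single-page slab of 64-byte objects holds 64 objects placed directly next to each other.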

Linux Kernel Heap Tracing and Guest Linux
  • MemCheck must trace the kernel allocator
    functions to properly create its shadow memory.
  • However tracing an unmodified Linux guest
    presents problems.
  • The Page Allocator does not always return the
    address of the allocated page contents, but
    instead returns a structure describing the page.
  • The Slub Allocator defines kmalloc as an inline
    function which can't be intercepted using a
    compile time symbol address.
  • Following internal logic can be difficult, such
    as kmalloc using the page allocator internally.
  • The solution is to use a modified guest Linux
    Kernel that uses instrumentation of the
    allocators that MemCheck can easily intercept

MemCheck QEMU implementation
  • QEMU was modified to implement MemCheck.
  • MemCheck is written in C++ running in a Windows
    host, so I ported QEMU 0.9.1 to compile under
    g++. In hindsight, porting was not necessary and
    not worth the effort. I also backported some
    patches that fix failures of 0.9.1 on Windows.
  • QEMU has an optimization of merging basic blocks
    in a translation block. I needed basic block
    granularity to correctly intercept the beginning
    of functions so this QEMU optimization was turned
    off.
  • A tracer was implemented to track functions using
    a callback interface on function entry or exit.
  • By tracing the heap management code, a simple
    shadow memory was constructed using C++ STL maps
    for the implementation.
  • The software MMU in QEMU was modified to check
    that each memory access targets a valid address
    in the shadow memory.
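
One way such an entry/exit tracer could work is sketched below. This is an assumption about the general technique, not a description of the actual QEMU patch: the class name, the cpu dictionary and its "arg0", "return_address" and "retval" keys are hypothetical stand-ins for guest CPU state. Function exit is detected here by waiting for execution to reach the return address seen at entry, which ignores complications such as recursion.

```python
class FunctionTracer:
    """Fires callbacks when execution enters or leaves a hooked function."""

    def __init__(self):
        self._hooks = {}    # entry address -> (on_entry, on_exit)
        self._pending = []  # stack of (return_address, on_exit, context)

    def hook(self, entry_addr, on_entry, on_exit):
        self._hooks[entry_addr] = (on_entry, on_exit)

    def on_block(self, pc, cpu):
        """Call once for every basic block the emulator starts executing."""
        # A block starting at a pending return address means the hooked
        # function has returned to its caller.
        if self._pending and self._pending[-1][0] == pc:
            _, on_exit, context = self._pending.pop()
            on_exit(cpu, context)
        # A block starting at a hooked address is a function entry.
        if pc in self._hooks:
            on_entry, on_exit = self._hooks[pc]
            context = on_entry(cpu)
            self._pending.append((cpu["return_address"], on_exit, context))
```

To trace an allocator, the entry callback would return the requested size and the exit callback would pair it with the returned pointer, which is the information a shadow memory needs. The approach depends on each hooked function starting its own block, which is why merging basic blocks gets in the way.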

MemChecking the Linux Kernel
  • The Linux Test Project (LTP) contains 3000 tests
    for the Linux Kernel which exercise much of the
    core kernel code.
  • Ran the default test suite on Linux
    using MemCheck.
  • MemCheck is slow, but still allows for
    interactive sessions.
  • Fedora Linux takes 30 minutes to boot.
  • Let the test suite run overnight.
  • No out of bounds access detected.
  • Reran the test suite with slub debugging enabled
    which, in combination with MemCheck, may result
    in more bugs being detected.
  • Again, no out of bounds access detected.
  • While no immediate bugs were identified, MemCheck
    may be used against future kernel releases,
    possibly as part of an automated test suite, or
    to aid kernel debugging.

MemCheck Limitations
  • Because MemCheck is based on QEMU, very little
    hardware is emulated so most of the Linux driver
    code is not tested.
  • Buffer overflows don't necessarily result in
    memory access using invalid heap addresses.
  • A slab based allocator fits heap allocations next
    to each other, so buffers overflow into adjacent
    and valid heap allocations.
  • A solution is to boot Linux using the slub_debug
    kernel option which separates heap objects using
    a redzone.
  • If MemCheck generates a report from a vulnerable
    kernel module, only kernel addresses are given in
    the stack trace; no symbolic names are used.
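
The adjacent allocation problem, and how a redzone helps, can be shown with a toy layout. The function names, object size and redzone width are illustrative assumptions, not the values slub debugging actually uses.

```python
def layout(base, size, count, redzone=0):
    """Return (start, size) for each object in a slab-like layout.

    With redzone > 0, each object is followed by that many bytes which
    belong to no allocation.
    """
    stride = size + redzone
    return [(base + i * stride, size) for i in range(count)]


def is_valid(addr, objects):
    """True if addr falls inside any live object."""
    return any(start <= addr < start + size for start, size in objects)


# A one-byte overflow past the end of the first 64-byte object.
overflow_addr = 0x1000 + 64

packed = layout(0x1000, 64, 4)             # objects directly adjacent
padded = layout(0x1000, 64, 4, redzone=8)  # objects separated by a redzone

missed = is_valid(overflow_addr, packed)      # lands in the next object
caught = not is_valid(overflow_addr, padded)  # lands in the redzone
```

Without the redzone the overflowing address belongs to the neighbouring object, so an address-validity check accepts it. With the redzone the same address belongs to no object and is reported. An overflow long enough to jump over the redzone would still be missed.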

MemCheck TODO
  • A solution to the adjacent buffer problem is to
    associate every heap access with its original
    allocation by tracking heap pointers using
    dynamic taint analysis.
  • This use of dynamic taint analysis could also be
    applied in userland, as a Valgrind checker.
  • Dynamic taint analysis can also be the basis of
    tracking uninitialized variable usage without the
    false positives currently associated with
  • Dynamic taint analysis could also be used to
    implement garbage collection, which could be used
    to identify memory leaks at the exact location of
    each leak.
  • Symbol names for addresses in kernel modules!
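
The pointer tracking idea in the first item can be sketched as follows. This is only an illustration of the concept under simplifying assumptions: the class names are hypothetical, and a real implementation would propagate the tag through every guest instruction that copies or modifies a pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    start: int
    size: int


@dataclass(frozen=True)
class TaintedPointer:
    """A pointer value tagged with the allocation it was derived from."""

    addr: int
    origin: Allocation

    def __add__(self, offset):
        # Pointer arithmetic changes the address but keeps the tag.
        return TaintedPointer(self.addr + offset, self.origin)

    def in_bounds(self, length=1):
        # Check the access against the original allocation only, not
        # against whatever happens to live at the target address.
        o = self.origin
        return o.start <= self.addr and self.addr + length <= o.start + o.size
```

If two 64-byte allocations are adjacent, a pointer derived from the first one and advanced by 64 bytes points at valid memory in the second. Because the tag still names the first allocation, the access is flagged as out of bounds.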

MemCheck Packages
  • http// For the
  • http// For commentary
    on some of MemCheck's internals.

Runtime Error Detection Summary
  • Existing tools for runtime error detection
    include Valgrind, which detects userland heap
    errors.
  • Tools for the kernel exist, such as kmemcheck,
    which detects uninitialized variables.
  • MemCheck is a new tool to detect heap bugs in the
    Linux Kernel, and operates similarly to Valgrind.

That's all folks…
  • A 2008 CQU Graduate looking for interesting