RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System

Transcript and Presenter's Notes

1
RAMSES (Regeneration And iMmunity SErviceS): A
Cognitive Immune System
Self Regenerative Systems 18 December 2007
  • Mark Cornwell
  • James Just
  • Nathan Li
  • Robert Schrag
  • Global InfoTek, Inc

R. Sekar, Stony Brook University
2
Outline
  • Overview
  • Efficient content-based taint identification
  • Syntax and taint-aware policies
  • Memory attack detection and response
  • Testing
  • Red Team suggestions
  • Questions
  • Demo

3
RAMSES Attack Context
  • Attack target: a program mediating access to
    protected resources/services
  • Attack approach: use maliciously crafted input
    to exert unintended control over protected
    resource operations
  • Resource or service uses
  • Well-defined APIs to access
  • OS resources
  • Command interpreters
  • Database servers
  • Transaction servers
  • Internal interfaces
  • Data structures and functions within program
  • Used by program components to talk to each other

4
Example 1: SquirrelMail Command Injection
Input Interface
sendto=nobody; rm -rf *
$send_to_list = $_GET['sendto']
$command = "gpg -r $send_to_list 2>&1"
Program
popen($command)
$command = gpg -r nobody; rm -rf * 2>&1
Output Interface
Attack: removes all removable files in the web
server document tree
5
Example 2: phpBB SQL Injection
Input Interface
topic=-1 UNION SELECT ord(substring(user_password,1,1))
FROM phpbb_users WHERE user_id=3
$topic_id = $_GET['topic']
$sql = "SELECT p.post_id FROM POSTS_TABLE WHERE
p.topic_id = $topic_id"
Program
sql_query($sql)
$sql = SELECT p.post_id FROM POSTS_TABLE WHERE
p.topic_id = -1 UNION SELECT ord(substring(user_password,1,1))
FROM phpbb_users WHERE user_id=3
Output Interface
Attack: steal another user's password
6
Attack Space of Interest (CVE 2006)
Generalized Injection Attacks
7
Detection Approach
  • Attack: use maliciously crafted input to exert
    unintended control over output operations
  • Detect exertion of control
  • Based on taint: the degree to which output depends
    on input
  • Detect if control is intended
  • Requires policies (or training)
  • Application-independent policies are preferable

8
RAMSES Goals and Approach
  • Taint analysis: develop efficient and
    non-invasive alternatives
  • Analyze observed inputs and outputs
  • Needs no modifications to program
  • Language-neutral
  • Leverage learning to speed up analysis
  • Attack detection: develop a framework to detect a
    wide range of attacks, while minimizing policy
    development effort and false positives/negatives
    (FPs/FNs)
  • Structure-aware policies: leverage the
    interplay between taint and structural changes to
    output requests
  • Use Address-Space Randomization (ASR) for memory
    corruption
  • ASR: efficient, in-band, positive tainting for
    pointer-valued data
  • Immunization: filter out future attack instances
  • Output filters: drop output requests that violate
    taint-based policies
  • Input filters: project policies on outputs to
    those on inputs
  • Relies on learning relationships between input
    and output fields
  • Network-deployable

9
Efficient Content-Based Taint Identification
10
Steps
  • Develop efficient algorithms for inferring flow
    of input data into outputs
  • Compare input and output values
  • Allow for parts of input to flow into parts of
    output
  • Tolerate some changes to input
  • Changes such as space removal, quoting, escaping,
    case-folding are common in string-based
    interfaces
  • Based on approximate substring matching
  • Leverage learning to speed up taint inference
  • Even the efficient content-matching algorithms
    are too expensive to run on every input/output
  • Same learning techniques can be used for
    detecting attacks using anomaly detection

11
Weighted Substring Edit Distance Algorithm
  • Maintain a matrix $D_{i,j}$ of minimum edit
    distance between $p_{1..i}$ and $s_{1..j}$
  • $D_{i,j} = \min\{\, D_{i-1,j-1} + \mathrm{SubstCost}(p_i, s_j),\;
    D_{i-1,j} + \mathrm{DeleteCost}(p_i),\;
    D_{i,j-1} + \mathrm{InsertCost}(s_j) \,\}$
  • $D_{0,j} = 0$ (no cost for omitting any prefix of
    $s$)
  • $D_{i,0} = \mathrm{DeleteCost}(p_1) + \cdots + \mathrm{DeleteCost}(p_i)$
  • Matches can be reconstructed from the $D$ matrix
  • Quadratic time and space complexity
  • Uses $O(|p| \cdot |s|)$ memory and time
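
The recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the RAMSES implementation: the function name, the unit default costs, and the returned end index are assumptions made for the example.

```python
def substring_edit_distance(p, s, subst_cost=None, delete_cost=None, insert_cost=None):
    """Return (cost, end) for the cheapest approximate match of p inside s.

    `end` is the index in s just past the matched substring.
    Default unit costs are an assumption; the slide leaves the weights open.
    """
    subst = subst_cost or (lambda a, b: 0 if a == b else 1)
    delete = delete_cost or (lambda a: 1)
    insert = insert_cost or (lambda b: 1)
    m, n = len(p), len(s)
    # D[0][j] = 0: skipping any prefix of s is free.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # D[i][0]: all of p[:i] must be deleted.
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + delete(p[i - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + subst(p[i - 1], s[j - 1]),
                D[i - 1][j] + delete(p[i - 1]),
                D[i][j - 1] + insert(s[j - 1]),
            )
    # The best match may end anywhere in s, so scan the last row.
    end = min(range(n + 1), key=lambda j: D[m][j])
    return D[m][end], end
```

Passing a custom `insert_cost` that makes a backslash free is one way to tolerate escaping: an input `O'Brien` then matches an output containing `O\'Brien` at cost 0 instead of 1.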

12
Improving performance
  • Quadratic complexity algorithms can be too
    expensive for large $s$, e.g., HTML outputs
  • Storage requirements are even more problematic
  • Solution: use a linear-time coarse filtering
    algorithm
  • Approximate $D$ by $FD$, defined on substrings of $s$
    of length $|p|$
  • Let $P$ (and $S$) denote the multiset of characters in
    $p$ (resp., $s$)
  • $FD(p, s) = \min(|P - S|, |S - P|)$
  • Slide a window of size $|p|$ over $s$, compute $FD$
    incrementally
  • Prove $D(p, r) < t \Rightarrow FD(p, r) < t$ for all
    substrings $r$ of $s$
  • Result: $O(|p|^2)$ space and time complexity in
    practice
  • Implementation results
  • Typically 30x improvement in speed
  • 200x to 1000x reduction in space
  • Preliminary performance measurements: 40MB/sec
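
A minimal sketch of the sliding-window coarse filter described above, assuming unit edit costs. The function name `coarse_filter` and the choice to return window start offsets are illustrative assumptions. Only windows that pass would then be handed to the quadratic algorithm.

```python
from collections import defaultdict

def coarse_filter(p, s, t):
    """Return start offsets of length-len(p) windows of s with FD below t.

    FD is the number of characters of p (as a multiset) missing from the
    window. For equal-size multisets |P - S| equals |S - P|, so one count
    is enough.
    """
    m = len(p)
    if m == 0 or len(s) < m:
        return []
    diff = defaultdict(int)  # diff[c] = count of c in p minus count in window
    for c in p:
        diff[c] += 1
    fd = m                   # empty window: every character of p is missing
    hits = []
    for j, c in enumerate(s):
        # Character c enters the window.
        if diff[c] > 0:
            fd -= 1
        diff[c] -= 1
        # Character s[j - m] leaves the window once it is full.
        if j >= m:
            old = s[j - m]
            diff[old] += 1
            if diff[old] > 0:
                fd += 1
        if j >= m - 1 and fd < t:
            hits.append(j - m + 1)
    return hits
```

Each step updates `fd` in constant time, which is what makes the pass linear in the length of `s`.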

13
Efficient online operation
  • Weighted edit-distance algorithms are still too
    expensive if applied to every input/output
  • Need to run for every input parameter and output
  • Key idea
  • Use learning to construct a classifier for
    outputs
  • Each class consists of similarly tainted outputs
  • Taint is identified quickly once the class is
    known
  • Classifying strings is difficult
  • Our technique operates on parse trees of output
  • For ease of development, generality, and
    tolerance to syntax errors, we use a rough
    parser
  • Classifier is a decision tree that inspects parse
    tree nodes in an order that leads to good
    decisions

14
Decision Tree Construction
  • Examines the nodes of syntax tree in some order
  • The order of examination is a function of the set
    of syntax trees
  • Chooses nodes that are present in all candidate
    syntax trees
  • Avoids tests on tainted data, as they can vary
  • Avoids tests that don't provide a significant
    degree of discrimination
  • Similar-valued fields are collected
    together and generalized, instead of storing
    individual values
  • Incorporates a notion of suitability for each
    field or subtree in the syntax tree
  • Takes into account approximations made in parsing
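
Two of the rules above (test only nodes present in all candidates, and skip tests that do not discriminate) can be sketched as follows. This is a simplified illustration, not the RAMSES classifier: whitespace-separated tokens stand in for parse-tree nodes, and `build_tree`, `classify`, and the `skip` parameter (positions known to hold tainted data) are hypothetical names.

```python
def build_tree(samples, skip=frozenset()):
    """Build a decision tree from samples, a list of (label, tokens) pairs.

    Returns either a leaf (a list of labels) or a pair (position, branches),
    where branches maps a token value to a subtree.
    """
    if len(samples) <= 1:
        return [label for label, _ in samples]
    shortest = min(len(tokens) for _, tokens in samples)
    for pos in range(shortest):          # only positions present in all samples
        if pos in skip:                  # already tested, or tainted
            continue
        values = {tokens[pos] for _, tokens in samples}
        if len(values) > 1:              # the test must discriminate
            branches = {}
            for value in values:
                subset = [(l, t) for l, t in samples if t[pos] == value]
                branches[value] = build_tree(subset, skip | {pos})
            return (pos, branches)
    return [label for label, _ in samples]   # nothing left to test

def classify(tree, tokens):
    """Follow the tree for a new token sequence; return the matching labels."""
    while isinstance(tree, tuple):
        pos, branches = tree
        if pos >= len(tokens) or tokens[pos] not in branches:
            return []                    # no known class
        tree = branches[tokens[pos]]
    return tree
```

A query that differs from a training sample only in an untested position, such as a literal value, lands in the same class.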

15
Example of a Decision Tree
  • 1. SELECT * FROM phpbb_config
  • 2. SELECT u.*,s.* FROM phpbb_sessions
    s,phpbb_users u WHERE
    s.session_id='a3523d78160efdafe63d8db1ce5cb0ba'
    AND u.user_id=s.session_user_id
  • 3. SELECT * FROM phpbb_themes WHERE themes_id=1
  • 4. SELECT c.cat_id,c.cat_title,c.cat_order FROM
    phpbb_categories c,phpbb_forums f WHERE
    f.cat_id=c.cat_id GROUP BY
    c.cat_id,c.cat_title,c.cat_order ORDER BY
    c.cat_order
  • 5. SELECT * FROM phpbb_forums ORDER BY
    cat_id,forum_order
  • switch (1)
  • case ROOT: switch (1.1)
  • case CMD: switch (1.1.2)
  • case c: FINAL @1.1.1 SELECT
    @1.1.3 .cat_id,c.cat_title,c.cat_order
    FROM phpbb_categories
    c,phpbb_forums f WHERE f.cat_id=c.cat_id GROUP
    BY
    c.cat_id,c.cat_title,c.cat_order ORDER BY
    c.cat_order
  • case u: FINAL @1.1.1 SELECT
    @1.1.3 .*,s.* FROM phpbb_sessions
    s,phpbb_users u WHERE
    s.session_id='a3523d78160efdafe63d8db1ce5cb0ba'
    AND
    u.user_id=s.session_user_id
  • case *: FINAL @1.1.1 SELECT
    @1.1.3 FROM phpbb_??????

16
Implementation Status and Next Steps
  • Rough parsers implemented for
  • HTML/XML
  • Shell-like languages (including Perl/PHP)
  • SQL
  • Preliminary performance measurements
  • Construction of decision trees 3MB/sec
  • Classification only 15MB/sec
  • Significant improvements expected with some
    performance tuning
  • Next steps
  • Develop better clustering/classification
    algorithms based on tree edit-distance
  • Current algorithm is based entirely on a top-down
    traversal, and fails to exploit similarities
    among subtrees

17
Syntax and taint-aware policies
18
Overview of Policies
  • Leverage structure + taint to simplify/generalize
    policy
  • Policy structure mirrors that of parse trees
  • And-Or trees with cycles
  • Can specify constraints on values (using regular
    expressions) and taint associated with a parse
    tree node
  • Most attacks detected using one basic policy
  • Controlling commands vs command parameters
  • Controlling pointers vs data

19
Controlling commands vs. parameters
  • Observation: parameters don't alter the syntactic
    structure of the victim's requests
  • Policy: the structure of the parse tree for the
    victim's request should not be controlled by
    untrusted input (tainted data)
  • Alternate formulation: tainted data shouldn't
    span multiple fields or tokens in the victim's
    request

20
Policy prohibiting structure changes
  • Define structure change without using a
    reference
  • Avoids need for training and associated FP issues
  • Policy 1
  • Tainted data cannot span multiple nodes
  • for binary data, it should not span multiple
    fields
  • Policy 2
  • Tainted data cannot straddle multiple subtrees
  • Tainted data spans two adjacent subtrees, and at
    least one of them is not fully tainted
  • Tainted data overflowed beyond the end of one
    subtree and resulted in a second subtree
  • Both policies can be further refined to constrain
    the node types and children subtrees of the nodes
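
A minimal sketch of Policy 1 for a shell-like command string. The regex tokenizer is a crude stand-in for the rough parser, and locating the tainted span with `str.find` is a simplifying assumption. Both function names are hypothetical.

```python
import re

# Crude tokenizer: runs of shell operators, or runs of anything else
# that is not whitespace or an operator character.
TOKEN = re.compile(r"[;|&<>]+|[^\s;|&<>]+")

def tainted_tokens(output, taint_start, taint_end):
    """Tokens of output that overlap the tainted span [taint_start, taint_end)."""
    return [m.group() for m in TOKEN.finditer(output)
            if m.start() < taint_end and taint_start < m.end()]

def violates_policy_1(output, tainted):
    """True if the tainted text spans more than one token of the output."""
    start = output.find(tainted)
    if start < 0:
        return False  # no taint found in this output
    return len(tainted_tokens(output, start, start + len(tainted))) > 1
```

With the SquirrelMail example, the tainted value `nobody; rm -rf *` covers five tokens of the command, while a benign recipient name covers exactly one.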

21
Commands vs. parameters: Example 2
  • Memory corruption attack overflowing stack buffer
  • For binary data, we talk about message fields
    rather than parse trees

  • Violation: tainted data spans multiple stack
    fields
  • Heap overflows involve tainted data spanning
    across multiple heap blocks

22
Attacks Detected by the "No Structure Change" Policy
  • Various forms of script or command injection
  • SQL injection
  • XPath injection
  • Format string attacks
  • HTTP response splitting
  • Log injection
  • Stack overflow and heap overflow

23
Application-specific policies
  • Not all attacks have the flavor of command
    injection
  • Develop application-specific policies to detect
    such attacks
  • Policy 3 (cross-site scripting): no tainted
    scripts in HTML data
  • Policy 4 (path traversal): tainted file names
    cannot access data outside of a certain document
    tree
  • Other examples
  • Policy 5: no tainted CMD_NAME or CMD_SEPARATOR
    nodes in shell or SQL commands
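
Policy 4 can be illustrated with a purely lexical check, sketched below under stated assumptions: POSIX-style paths, no symlink resolution, and a hypothetical function name. A deployed check would also need to account for symlinks and encoding tricks.

```python
import posixpath

def violates_policy_4(doc_root, tainted_name):
    """True if a tainted file name resolves outside the document tree.

    Lexical check only: '..' segments are normalized, and an absolute
    tainted name replaces the root entirely (posixpath.join semantics).
    """
    root = posixpath.normpath(doc_root)
    target = posixpath.normpath(posixpath.join(root, tainted_name))
    return posixpath.commonpath([root, target]) != root
```

The document root `/srv/docs` used in testing is an arbitrary example path.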

24
Implementation status
  • Four test applications
  • phpBB
  • SquirrelMail
  • PHP/XMLRPC
  • WebGoat (J2EE)
  • Detects following attacks without FPs
  • Command injection (Policies 1, 2, 5)
  • SQL injection (1, 2, 5)
  • XSS (3)
  • HTTP Response splitting (2)
  • Path traversal (4)
  • Memory corruption detected using ASR
  • Should be able to detect many other attacks
    easily
  • XPATH injection (1,2), Format-string (1, 2), Log
    injection (1,2)

25
Memory Attack Discussion
26
Memory Error Based Remote Attack
  • Attacker's goal
  • Overwrite target of interest to take over
    instruction execution
  • Attacker's approach
  • Propagate attacker controlled input to target of
    interest
  • Violate certain structural constraints in the
    propagation process

27
Stack Frame Structural Violation
Figure: call stack layout, from high addresses (top)
to low addresses (bottom)
  • A's stack frame: function arguments, return
    address, previous stack frame, exception
    registration record, local variables
  • B's stack frame: function arguments, return
    address (to A), previous stack frame, local
    variables
  • C's stack frame: function arguments, return
    address (to B), previous stack frame (EBP),
    exception registration record (FS:[0]), local
    variables (ESP)
28
Heap Block Structural Violation
Figure: Windows free heap block header structure
(fields: Size, Previous Size, Segment Index, Flags,
Unused, Tag Index, FLink, BLink)
  • The violation happens when removing a free block
    from the doubly linked list
  • Gives the ability to write 4 bytes to any address,
    usually a well-known address such as a function
    pointer, return address, SEH, etc.
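
The 4-byte write can be seen in a toy model of the unlink step. This sketch assumes a 32-bit list entry with the forward link at offset 0 and the backward link at offset 4, and it models memory as a Python dictionary. The `unlink` function and the addresses used are illustrative only.

```python
def unlink(memory, entry):
    """Remove the list entry at address `entry` from a doubly linked list.

    memory maps addresses to 32-bit values. The forward link is read from
    entry + 0 and the backward link from entry + 4.
    """
    flink = memory[entry]
    blink = memory[entry + 4]
    memory[blink] = flink       # blink->flink = flink
    memory[flink + 4] = blink   # flink->blink = blink

# If an overflow has replaced both links with attacker-chosen values,
# the first store writes an arbitrary value to an arbitrary address.
heap = {0x1000: 0x41414141,     # overwritten FLink: value to write
        0x1004: 0x0012FF00}     # overwritten BLink: address to write
unlink(heap, 0x1000)
```

After the call, `heap[0x0012FF00]` holds `0x41414141`, which is the (address to write, value to write) pair discussed later under heap overflow analysis.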

29
ASLR and Crash Analysis
  • ASLR randomizes the addresses of targets of
    interest
  • Memory attack using the original address will
    miss and cause crash (exception).
  • Crash analysis tracks back to vulnerability,
    which enables accurate signature generation
  • Structural information usually retrievable at
    runtime, thanks to enhanced debugging technology
  • Crash analysis aided with JIT (Just-In-Time
    Tracing)
  • JIT triggered at certain events
  • Suspicious network inputs, e.g. sensitive JMP
    address
  • Attach/detach JIT monitor at event of interest
  • Memory can be dumped at the right granularity,
    with log info ranging from a few KB to 2GB

30
Crash Root Cause Analysis
Figure: root cause analysis diagram
  • Inputs: exception record/context; faulting
    thread/instructions/registers; stack
    trace/heap/module/symbols
  • Stack corruption
  • Heap corruption
  • Read access violation: bad EIP (corrupted
    return address or SEH)
  • Read access violation: bad dereference (corrupted
    local variables/passed parameters)
  • Write access violation (address to write, value
    to write)
31
Stack-based Overflow Analysis
  • Target driven analysis
  • The goal of attack string is to overwrite target
    of interest on stack, e.g., return address, SEH
    handler.
  • Start by matching target values from the crash
    dump (e.g., EIP, EBP, and SEH handler) against
    the input
  • More efficient than pattern matching over the
    whole address space
  • If any targets are matched in input, expand in
    both directions to find LCS
  • A match usually indicates the input size needed
    to overflow certain targets
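
A minimal sketch of the target-driven matching described above, assuming 32-bit little-endian targets. The function names are hypothetical, and "bytes to overwrite" is taken here to mean the input length up to and including the target value.

```python
import struct

def bytes_to_overwrite(target_value, net_input):
    """Input length needed to reach past the target value, or None.

    Searches the input for the 4-byte little-endian encoding of a value
    taken from the crash dump (e.g. the faulting EIP).
    """
    needle = struct.pack("<I", target_value)
    pos = net_input.find(needle)
    return None if pos < 0 else pos + len(needle)

def expand_match(net_input, i, memory, k, length):
    """Grow a known match net_input[i:i+length] == memory[k:k+length].

    Extends the match in both directions while bytes agree and returns
    (start offset in the input, total matched length).
    """
    lo = 0
    while i - lo > 0 and k - lo > 0 and net_input[i - lo - 1] == memory[k - lo - 1]:
        lo += 1
    hi = length
    while (i + hi < len(net_input) and k + hi < len(memory)
           and net_input[i + hi] == memory[k + hi]):
        hi += 1
    return i - lo, lo + hi
```

Starting from the target and expanding outward avoids comparing the input against the whole address space.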

32
SEH Overflow and Analysis
  • An approach unique to Windows exploits
  • SEH stands for Structured Exception Handler
  • Windows puts an EXCEPTION_REGISTRATION_RECORD
    chain on the stack, with the SEH in each record
  • More reliable and powerful than overwriting the
    return address
  • More JMP addresses to use (pop/pop/ret)
  • An exception (accidental/intentional) is desired
  • Can bypass /GS buffer check
  • SEH crash analysis
  • Catch the first exception as well as the second
    one (caused by ASR)
  • Locate the SEH chain head from first dump,
    usually overwritten by input
  • Usually first exception is enough, second
    exception can be used for confirmation

33
Heap Overflow Analysis
  • How to analyze a heap overflow attack?
  • The exploit happens in the free block's unlink
  • Multiple ways to trigger
  • Write Access Violation with ASR
  • when overwriting at an invalid address
  • Overwrites a 4-byte value at an arbitrary address
  • Targets of interest include return address, SEH,
    PEB and UEF
  • Exploit contains the pair (Address to Write,
    Value to Write)
  • Appears in the overflowed heap blocks
  • Usually contained in registers
  • Should be provided from input by the attacker
  • Match found in synthetic heap exploits
  • The value pair needs to be at a fixed offset
  • For a given heap overflow vulnerability
  • To enable overwriting the right address with the
    right value

34
Case Studies
35
Case Study RPC DCOM
  • Step 1: Exception Analysis
  • FAULTING_IP:
  • 18759f
  • ExceptionCode: c0000005 (Access violation)
  • Attempt to read from address 0018759f
  • PROCESS_NAME: svchost.exe
  • FAULTING_THREAD: 00000290
  • PRIMARY_PROBLEM_CLASS: STACK_CORRUPTION
  • Step 2: Target/Input correlation
  • StackBase: 0x6c0000, StackLimit: 0x6bc000, Size:
    0x4000
  • Begin analyze on Target Overwrite and Input
    Correlation
  • Analyze crash EIP
  • Find EIP pattern at socket input
  • Bytes size to overwrite EIP: 128
  • Analyze crash EIP done!
  • Analyze SEH
  • Find SEH byte at socket input
  • Bytes size to overwrite SEH handler: 1588
  • Analyze SEH done!

36
Signature Generation
  • Signature generation
  • Signature captures the vulnerability
    characteristics
  • Minimum size to overwrite certain target(s)
  • Use contexts to reduce false positives
  • Using the calling stack of the incoming input
  • Stack offset can uniquely identify the context
  • Using the semantic context of the incoming input
  • Message format like HTTP URL/parameter
  • Binary message field
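
One way to picture such a signature is a context label plus a minimum length, as in the sketch below. The class, its field names, and the context string are hypothetical illustrations and do not describe the format RAMSES actually generates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OverflowSignature:
    context: str        # where the input arrives, e.g. a message field name
    min_overwrite: int  # fewest input bytes that reach the target

    def matches(self, context, field_bytes):
        """Flag input in the same context that is long enough to reach the target."""
        return context == self.context and len(field_bytes) >= self.min_overwrite

# Hypothetical signature: 128 bytes reach the target in this context.
sig = OverflowSignature(context="example.field", min_overwrite=128)
```

Tying the length threshold to a context is what keeps long but harmless inputs elsewhere from being flagged.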

37
Components Implementation
Figure: component diagram
  • Protected Application: source of the crash
    (exception)
  • RAMSES Crash Monitor: catches exceptions of
    interest only; snapshots for a given period;
    self healer
  • RAMSES Crash Analyzer: fault type detection;
    security-oriented analysis; feedback
  • Infrastructure (uses the Windows Debug Engine):
    save crash dump, extract relevant info,
    search/match, disassemble
  • Flow labels, numbered 1 to 5 in the figure: crash
    (exception), generate crash dump, provide input
    history, analyze, signature

A crash dump provides the same interface as a live
process, so the Crash Analyzer does not actually
have to work on a saved crash dump file.
38
Testing
39
Test Attacks and Applications
  • Baseline Applications
  • phpBB (PHP)
  • SquirrelMail (PHP)
  • WebGoat (Java)
  • hMailServer (C++)

Many sublanguages: SQL, XML, JavaScript, HTML,
HTTP, JSON, shell, cmd, path
40
Possible Testbed Configurations
41
Traffic Generation
  • Purpose
  • Coverage of legitimate structural variation in
    monitored structures
  • SQL, command strings, call parameters
  • Stress of log complexity for practicality
  • Multiple users, multiple sessions
  • Performance measurements
  • Program performance metrics
  • Quantify performance impact

42
Traffic Generation to Web Sites
  • Approaches
  • Simple Record/Playback (basic)
  • with minor substitutions (cookies, IPs)
  • shell scripts, netcat, MaxQ (Jython-based)
  • Custom DOM/Ajax scripting (learning)
  • Can access dynamically generated browser content
    after (or during) client-side script eval
  • Automated site crawls of URLs
  • Automated form contents (site specific metadata)
  • COTS tools
  • Load testing and metrics

43
(No Transcript)
44
Red Team Suggestions
45
Suggested Red Team ROEs
  • Initial telecons held in Fall
  • Claim: RAMSES will defeat most generalized
    injection attacks on protected applications
  • Red Team should target our current and planned
    applications rather than new ones (unless new
    application, sample attacks and complete traffic
    generator can be provided to RAMSES far enough in
    advance for learning and testing)
  • Remote network access to the targeted application
  • Attack designated application suite
  • Required instrumentation yet to be determined
  • Red Team exercise start 15 April or later

46
RAMSES Project Schedule
Figure: project schedule (Gantt chart), CY06 Q4
through CY09 Q3
  • Baseline Tasks: 1. Refine RAMSES Requirements;
    2. Design RAMSES; 3. Develop Components;
    4. Integrate System; 5. Analyze Test RAMSES;
    6. Coordinate Rept
  • Prototypes (chart markers 1, 2, 3)
  • Optional Tasks: O.3 Cross-Area Exper
  • Red Team Exercise
  • Today: 11 September 2007
47
Next Steps
48
Plans
  • Develop input filters from output policies
  • Extend memory error analyzer
  • Demonstrate RAMSES on more applications and
    attack types
  • Native C/C++ app (most likely app is
    hMailServer)
  • Java
  • Integrate components
  • Performance and false positive testing
  • Red Team exercise

49
Questions?
50
Backup
51
Tokenizing and Parsing
  • Focus on rough parsing that reveals approximate
    structure, but not necessarily all the details
  • Accurate parsers are time-consuming to write
  • More importantly, they may not gracefully handle
    errors (common in HTML) or language extensions
    and variations (different shells, different
    flavors of SQL)
  • Implemented using Flex/Bison
  • Currently done for SQL and shell command
    languages
  • Parse into a sequence of statements, each
    statement consisting of a command name and
    parameters
  • Incorporates a notion of confidence to deal with
    complex language features, e.g., variable
    substitutions in shell
  • Modest effort for adding additional languages,
    but substantially simplifies subsequent learning
    tasks
  • Don't anticipate significant additions to this
    language list (other than HTML/XML)

52
Taint inference Vs Taint-tracking
  • Disadvantages of learning
  • False negatives if inputs transformed before use
  • Low likelihood for most web apps
  • False positives due to coincidence
  • Mitigated using statistical information
  • Plan to evaluate these experimentally
  • Benefits of learning
  • Low performance overhead
  • Some significant implicit flows handled without
    incurring high false positives
  • Can address multi-step attacks where
    tainted data is first stored in a file/database
    before use
  • More generally, in dealing with information flow
    that crosses module boundaries

53
Attack Coverage 2004
(Stack-smashing, heap overflow, integer overflow,
data attacks)
Generalized Injection Attacks
CVE Vulnerabilities (Ver. 20040901)
54
RAMSES System Concept
Protected System
Web Server (IIS/Apache)
Web App (PHP/ ASP)
SQL Database (MySQL)
Network/App Firewall (e.g. mod_security)


OS DLLs
Application DLLs
Network DLLs
  • Key research problems
  • Learn taint propagation
  • Identify tainted components in output, generate
    filtering criteria
  • Learn input/output transformation
  • Use transformation to project output filters to
    input

55
Advantages of RAMSES Filters
  • Filters easily sharable
  • Complements Application Community focus on end
    user applications
  • Filters are human readable
  • Filter generation algorithms can be enhanced to
    address privacy concerns wrt sharing

56
Filter types
  • Filter Criteria
  • Correlative filters
  • Equality-based filter
  • Structure-based filter
  • Statistical filter
  • Causal filters
  • Filtering criteria derived from attack detection
    criteria (policy or anomaly)
  • Filter Location
  • Input filter
  • Easier to deploy but harder to synthesize
  • Output filter (precedes sensitive operation)
  • Easier to synthesize than input filter, but
    deployment needs deeper instrumentation
  • May be too late for some attacks (memory
    corruption)

Note: All filters are evaluated using a large number
of benign samples and ?1 attack sample