VERNIER: Virtualized Execution Realizing Network Infrastructures Enhancing Reliability


1
VERNIER: Virtualized Execution Realizing Network
Infrastructures Enhancing Reliability
  • VERNIER Project Team
  • DARPA Application Communities Kickoff Meeting
  • July 7, 2006

2
Outline
  • Background
  • Project Overview
  • Objectives
  • Project Scope
  • Research Challenges
  • Breakthrough Capabilities
  • Expected Results
  • Team: Key Personnel and Roles
  • Technical Approach
  • Scenario Exemplars
  • Project Plan: Schedule and Milestones
  • Experimentation and Evaluation
  • Technology Transition Plan

3
Background
  • Commercial-off-the-shelf (COTS) software
  • Large organizations, including DoD, have become
    dependent on it
  • Yet, most COTS software is not dependable enough
    for critical applications
  • Security breaches
  • Misconfiguration
  • Bugs
  • Large, homogeneous COTS deployments, such as
    those in DoD, accentuate the risk, since many
    users
  • Experience the same failures caused by the same
    vulnerabilities, configuration errors, and bugs
  • Suffer the same costly, adverse consequences
  • Alternatives, such as government-funded
    development of high-assurance systems, present
    significant barriers in:
  • Cost
  • Functionality
  • Performance

4
VERNIER Project Objectives
  • Develop new technologies to deliver the benefits
    of scaling techniques to large application
    communities
  • Provide enhanced survivability to the DoD
    computing infrastructure
  • Enhance the cost, functionality, and performance
    advantages of COTS computing environments
  • Investigate and develop new technologies aimed at
    enabling communities of systems running similar,
    widely available COTS software to perform more
    robustly in the face of attacks and software
    faults
  • Deliver a demonstrated, functioning,
    transition-ready system that implements these new
    AC survivability technologies
  • Technical approach: augmented virtual machine
    monitor
  • Commercial transition partner: VMware, Inc.

5
Project Scope
  • Collaborative detection and diagnosis of failures
  • Collaborative response to failures
  • Advanced situational awareness capabilities
  • Collective understanding of community state
  • Predictive capability: early warning of potential
    future problems
  • Key goal: turn the size and homogeneity of the
    user community into an advantage by converting
    scattered deployments of vulnerable COTS systems
    into cohesive, survivable application communities
    that detect, diagnose, and recover from their own
    failures
  • What COTS?
  • Microsoft Windows, IE, Office suite, and the like

6
Research Challenges
  • Extracting behavioral models from binary programs
  • Breakthrough novel techniques required
  • Quasi-static state analysis for black-box
    binaries
  • Scaled information sharing
  • Networked application communities sharing
    knowledge about the software they run
  • Intelligent, comprehensive recovery
  • Predictive situational awareness
  • Automatic, easy-to-understand gauges

7
Breakthrough Capabilities
8
Expected Results and Impact
  • COTS Product (VMware) with breakthrough
    capabilities for application communities
  • Scalability to 100K nodes running augmented
    VMware and custom Vernier software
  • Automatic collaborative failure diagnosis and
    recovery
  • Survivable robust system
  • Community-aware solution

9
VERNIER Team
  • SRI International, Menlo Park, CA
  • Patrick Lincoln, Principal Investigator
  • Steve Dawson, Project manager; integration
  • Linda Briesemeister, Knowledge sharing;
    collaborative response
  • Hassen Saidi, Learning-based diagnosis; code
    analysis; situation awareness
  • Stanford University
  • John Mitchell, Stanford PI; code analysis;
    host-based detection and response
  • Dan Boneh, Knowledge sharing protocols
  • Mendel Rosenblum, VMM infrastructure;
    collaborative response; transition liaison
  • Palo Alto Research Center (PARC)
  • Jim Thornton, PARC PI; configuration monitoring
    and response; situation awareness
  • Dirk Balfanz, Community response management
  • Glenn Durfee, Configuration monitoring and
    response; situation awareness
  • Technology transition partner: VMware, Inc.

10
John Boyd's OODA Loop
[Figure: Boyd's OODA loop. Stages: Observe, Orient, Decide, Act.
Inputs to Observe: Unfolding Circumstances, Outside Information,
Unfolding Interaction With Environment. Elements of Orient: Cultural
Traditions, Genetic Heritage, New Information, Previous Experience,
Analyses & Synthesis. Flows between stages: Observations, Decision
(Hypothesis), Action (Test), Feed Forward, Feedback, Implicit
Guidance & Control.]
"Note how orientation shapes observation, shapes
decision, shapes action, and in turn is shaped by
the feedback and other phenomena coming into our
sensing or observing window. Also note how the
entire loop (not just orientation) is an
ongoing many-sided implicit cross-referencing
process of projection, empathy, correlation, and
rejection." From "The Essence of Winning and
Losing," John R. Boyd, January 1996.
Defense and the National Interest,
http://www.d-n-i.net, 2001
11
VERNIER Technical Approach
12
Notional Host System Architecture
13
An Abstraction-Based Diagnosis Capability for
VERNIER
  • Hassen Saidi, SRI

14
Objectives
  • Based on the general principle that much of
    security amounts to making sure that an
    application does what it is supposed to do...
    and nothing else!
  • Build models of application behavior (what the
    application is supposed to do).
  • Monitor application behavior and report
    malfunctions and unintended behaviors (deviations
    from the modeled behavior).
  • Use the recorded execution traces as raw data for
    a set of abstraction-based diagnosis engines (why
    did the deviation from good intended behavior
    occur, to the extent that we can do a good
    job of answering such a question).
  • Share the state of alerts and diagnoses among the
    nodes of the community (sharing the bad news, but
    also the good!).
  • Aggregate the diagnosis outputs and the alerts
    into a situation awareness gauge.

15
[Figure: VERNIER layered architecture, with the overall goal labeled
INCREASED APPLICATION COMMUNITY SURVIVABILITY. Labels, top to bottom:
Global situation awareness (Situation Awareness Gauge UI, Secure
Knowledge Sharing Network); Collaborative diagnosis, collaborative
response (Collaborative Response, Learning-Based Diagnosis); Local
diagnosis, local response (Quasi-Static Code Analysis, Configuration
Analysis, Network Traffic Analysis); Safe execution and Runtime data
(Monitoring and Control, App OS; Execution, Configuration, Network
Traffic); Dynamic VMM; VERNIER OS Base; VM Kernel.]
16
Approach
  • We combine a set of well-known and
    well-established techniques:
  • Building increasingly accurate models of
    application behavior
  • Static analysis combined with predicate
    abstraction to build Dyck and CFG models used for
    static-analysis-based intrusion detection
  • Implement mechanisms for monitoring sequences of
    states and actions of an application for the
    following purposes
  • Check if a known bad sequence is executed
    (signature-based!)
  • Check for previously unknown variations of known
    bad sequences (correlation!)
  • Find root-causes for unexpected malfunction and
    malicious exploits (Diagnosis)
  • Diagnosis is performed using techniques borrowed
    from
  • Delta-debugging (root-cause diagnosis)
  • Anomaly detection (correlation)
  • The situation awareness gauge is implemented as a
    platform-independent web interface

17
Monitoring-Based Diagnosis
  • We combine these techniques into two phases
  • Monitoring: applications are monitored, and
    sequences of executions along with configurations
    are stored.
  • Diagnosis: differences between good runs and bad
    runs are the first clues used for diagnosis.
  • Traces of executions are sequences of
  • System calls
  • Method calls
  • Changes in configurations
  • The more information is stored, the better the
    chance that malfunctions and malicious behaviors
    are properly diagnosed.

18
Quasi-static binary analysis and predicate
abstraction-based intrusion detection
  • Use static analysis to recover the control
    flow graph (CFG) of the application.
  • The CFG is generated by compilers for source code.
  • Recover class hierarchy for object code of OO
    applications.
  • Build a pushdown system which is a model that
    represents an over approximation of the sequences
    of methods and system calls of the application.
  • Deal with context sensitivity to match exit calls
    to return locations.
  • Use predicate abstraction and data flow analysis
    to refine the pushdown system and obtain a more
    accurate model.
  • Improving the knowledge about arguments to
    monitored calls.
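
A minimal sketch of how a pushdown-style model can check an observed sequence of calls and returns. It is an illustration only, not the project's implementation: the call graph, the `(kind, name)` event format, and the function name `accepts` are all hypothetical. The stack supplies the context sensitivity, since a return is accepted only when it matches the most recent unmatched call.

```python
def accepts(call_graph, entry, trace):
    """Return True if a trace of ("call" | "ret", name) events fits the model.

    call_graph maps each function to the set of functions it may call.
    This is an over-approximation: any call the graph allows is accepted,
    in any order.
    """
    stack = [entry]
    for kind, name in trace:
        if kind == "call":
            # The callee must be reachable from the function on top of the stack.
            if name not in call_graph.get(stack[-1], set()):
                return False
            stack.append(name)
        elif kind == "ret":
            # A return must match the most recent unmatched call.
            if len(stack) == 1 or stack[-1] != name:
                return False
            stack.pop()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return True


# Hypothetical model and traces.
CALL_GRAPH = {"main": {"read_config", "serve"}, "serve": {"recv", "send"}}
good_trace = [("call", "read_config"), ("ret", "read_config"),
              ("call", "serve"), ("call", "recv"), ("ret", "recv"),
              ("ret", "serve")]
bad_trace = [("call", "serve"), ("call", "exec")]  # "exec" is not in the model

print(accepts(CALL_GRAPH, "main", good_trace))
print(accepts(CALL_GRAPH, "main", bad_trace))
```

Refinement by predicate abstraction or data flow analysis would shrink the sets in the call graph or add constraints on call arguments; the sketch leaves that out.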

19
Better Models and Better Monitoring
  • We are interested not just in detecting
    intrusions, but also in generating high-level
    explanations of why an application deviates
    from its intended behavior.
  • CFG and Dyck models are all over-approximations
    of the application's behavior (potential attacks
    are only discovered when the application's
    behavior deviates from the model).
  • We will use the runs of the application to
    generate under-approximations of the application's
    behavior!
  • Alternatively, every model representing an
    over-approximation has a dual that represents an
    under-approximation (over- and
    under-approximations don't have to be the same
    type of model!).
  • We will combine over- and under-approximations to
    reduce the risk of missing possible attacks.
  • We will refine the over- and under-approximations
    to improve the application model.

20
Combining over and under approximations
Over approximation (constructed by static
analysis)
Under approximation (constructed from runs)
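
One way to picture the combination is as a three-way classification of an observed call sequence. The sketch below assumes both approximations can be represented as plain sets of sequences, with the under-approximation contained in the over-approximation. The function name, the labels, and the example sequences are hypothetical.

```python
def classify(sequence, under, over):
    """Place an observed sequence relative to the two approximations."""
    if sequence in under:
        return "known good"          # seen in recorded good runs
    if sequence in over:
        return "allowed, unobserved" # statically possible, never seen in a run
    return "deviation"               # outside the static model


# Hypothetical approximations, as sets of call sequences.
under = {("open", "read", "close")}
over = under | {("open", "close"), ("open", "read", "read", "close")}

for seq in [("open", "read", "close"), ("open", "close"), ("open", "exec")]:
    print(seq, "->", classify(seq, under, over))
```

Sequences in the middle band are the interesting ones: the static model alone would accept them silently, while the recorded runs alone would flag them all.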
21
What if we don't have a model of the application?
  • We can monitor the application as a blackbox and
    intercept system calls
  • Learn a model of good behaviors
  • Learn a model of bad behaviors
  • Anomalies are differences between good and bad
    behaviors
  • Borrow from delta-debugging techniques to find
    root-causes of misbehaviors
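
For readers unfamiliar with delta debugging, here is a simplified sketch of its input-minimization idea: repeatedly try smaller subsets of the recorded steps and keep any subset that still reproduces the failure. It is a reduced variant of the published ddmin algorithm, not the project's diagnosis engine. The `fails` predicate and the step names are hypothetical stand-ins for replaying a trace.

```python
def ddmin(steps, fails):
    """Shrink a failing list of steps to a smaller list that still fails."""
    assert fails(steps), "the full input must reproduce the failure"
    n = 2  # number of chunks to split into
    while len(steps) >= 2:
        chunk = max(1, len(steps) // n)
        subsets = [steps[i:i + chunk] for i in range(0, len(steps), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [s for j, sub in enumerate(subsets) if j != i for s in sub]
            if fails(subset):
                # One chunk alone reproduces the failure: restart on it.
                steps, n, reduced = subset, 2, True
                break
            if fails(complement):
                # The failure survives without this chunk: drop the chunk.
                steps, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(steps):
                break  # already at single-step granularity
            n = min(len(steps), n * 2)
    return steps


# Hypothetical recorded session; the failure needs only one of the steps.
session = ["start browser", "visit A", "visit BAD", "visit C"]
print(ddmin(session, lambda s: "visit BAD" in s))  # prints ['visit BAD']
```

In practice each call to `fails` would be expensive, since it means replaying steps against a restored state, so the number of replays matters as much as the final result.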

22
Analyzing Differences between runs
  • There are many differences between execution
    traces
  • Could consider arbitrary lengths of different
    sub-sequences
  • Difference of length k should be considered where
    k is defined depending on the application, the
    size of the collected data, and the sensitivity
    of the analysis

23
Delta Differences: k = 2
good run
bad run
a b b b b c c b b d
a b b b b c c b b d
Both sequences have the same set of 2-event
sequences. This means that k needs to be
increased, and that k = 2 is too abstract a way of
distinguishing the two sequences.
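
A minimal sketch of the k-length comparison, using two short hypothetical traces (not the ones on the slide) chosen so that k = 2 cannot tell them apart but k = 3 can. The function names are illustrative.

```python
def kgrams(trace, k):
    """Set of all length-k windows of consecutive events in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def delta(good, bad, k):
    """Windows that appear in exactly one of the two traces."""
    return kgrams(good, k) ^ kgrams(bad, k)


good_run = "a b b c".split()
bad_run = "a b b b c".split()

print(delta(good_run, bad_run, 2))  # prints set()
print(delta(good_run, bad_run, 3))  # prints {('b', 'b', 'b')}
```

With k = 2 both traces reduce to the windows ab, bb, and bc, so the difference is empty. With k = 3 the extra repetition in the bad run shows up as the window bbb.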
24
Delta Differences: k = 3
25
Diagnosis
  • One of the 6 sequences that are not common to the
    two runs is the source of the problem. Which
    one? We can rank the sequences in order of
    importance based on:
  • Application-specific criteria: use distance to
    common sequences for every application-specific
    origin of a sequence (e.g., process identity or
    user identity)
  • Application-independent criteria: use distance to
    common sequences
  • Use distance to common sequences or known bad
    sequences, ignoring the order of execution of
    calls
  • Increasing k provides a better explanation, but
    generates a large number of sequences.
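
The slides do not define "distance", so the sketch below makes two assumptions purely for illustration: an order-sensitive Hamming distance between equal-length sequences, and an order-insensitive variant that compares the sequences as bags of calls. Ranking the farthest sequence first is also an assumed policy. All names and data are hypothetical.

```python
from collections import Counter


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def bag_distance(s, t):
    """Number of calls that must be swapped, ignoring execution order."""
    diff = Counter(s)
    diff.subtract(Counter(t))
    return sum(abs(v) for v in diff.values()) // 2


def rank(uncommon, common, distance=hamming):
    """Score each uncommon sequence by its distance to the nearest common one."""
    scored = [(min(distance(u, c) for c in common), u) for u in uncommon]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


common = {("a", "b", "b"), ("b", "b", "c")}
uncommon = {("b", "b", "b"), ("c", "d", "a")}
print(rank(uncommon, common))
# prints [(3, ('c', 'd', 'a')), (1, ('b', 'b', 'b'))]
```

Passing `distance=bag_distance` gives the order-insensitive ranking; for example, `("b", "c", "b")` is at Hamming distance 2 from `("b", "b", "c")` but at bag distance 0.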

26
More abstraction
  • There are more good runs than bad ones! We need
    to compare the bad runs to the union of good
    runs. Caveat: a union of good runs that each
    contain a single sequence cancels out the one
    bad run that contains all those sequences!
  • Use average-sequence-weight ranking

27
Situation Awareness Gauge
28
Situation Awareness Gauge
  • Implemented as a platform-independent web
    interface (e.g., Ruby on Rails)
  • Content is defined by the database's content:
    attacks, failures, diagnoses, etc.
  • Gauges are simple displays of the number of
    attacks and failures and of various parameters
  • Gives the user the ability to initiate response
    and diagnosis activities on other nodes via the
    database

29
Configuration-based Detection, Diagnosis,
Recovery, and Situational Awareness
  • Jim Thornton, PARC

30
Importance of Configuration
  • Static configuration state highly correlated with
    system behavior
  • Many attacks/bugs/errors introduced by way of a
    substantive change to configuration
  • "A central problem in system administration is
    the construction of a secure and scalable scheme
    for maintaining configuration integrity of a
    computer system over the short term, while
    allowing configuration to evolve gradually over
    the long term." (Mark Burgess, author of cfengine)

31
AC Opportunity
  • Leverage scale of population to learn what are
    bad states in configuration space

Today: every configuration change is an
uncontrolled experiment
AC future: configuration changes managed as
controlled, reversible trials
32
Live Monitoring of Configuration State
  • State analysis
  • Comparative diagnosis
  • Vulnerability assessment
  • Clustering similar nodes and contextualizing
    observations
  • Detect change events
  • Cluster low-level changes into transactions
  • Log events for problem detection, mitigation and
    user interaction
  • Share events in real-time for situational
    awareness
  • Active learning
  • Automated experiments to isolate root causes
  • Managed testing of official changes like patch
    installation
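
As an illustration of clustering low-level changes into transactions, the sketch below groups change events by time proximity. The gap threshold, the event format, and the example events are assumptions; a real monitor might also group by the process that made the change.

```python
def cluster_into_transactions(events, max_gap=2.0):
    """Group (timestamp, change) events separated by at most max_gap seconds."""
    transactions = []
    for ts, change in sorted(events):
        if transactions and ts - transactions[-1][-1][0] <= max_gap:
            transactions[-1].append((ts, change))  # continue current transaction
        else:
            transactions.append([(ts, change)])    # start a new transaction
    return transactions


# Hypothetical change events: (seconds since start, description).
events = [
    (0.0, "file added: app.dll"),
    (0.5, "registry key set: app/version"),
    (1.0, "file added: app.exe"),
    (60.0, "registry key set: proxy/server"),
]
for tx in cluster_into_transactions(events):
    print(len(tx), "changes starting at", tx[0][0])
```

Here the first three changes form one transaction (plausibly a single install) and the later registry change stands alone.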

33
Live Control of Configuration State
  • Modification for Reversibility and
    Experimentation
  • Coarse-grained: VM rollback
  • Medium-grained: installer/uninstaller activation
  • Fine-grained: direct manipulation of low-level
    state elements
  • Prevention
  • In-progress detection of changes
  • Interruption of change sequence
  • Reversal of partial effects

34
Identifying Badness
  • Objective Deterministic Criteria
  • Rootkit detection from structural features
  • Published attack signatures
  • Objective Heuristic Criteria
  • Performance outside of normal parameters
  • Subjective End-User Report
  • Dialog with user to gather info, e.g. temporal
    data for failure appearance
  • Administrative Policy
  • Rules specified by administrators within community

35
Local Components
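[Figure: local components on one host. An App VM and an Experimental
VM each run COTS App 1 and App 2 plus an Agent on an App OS. A VERNIER
VM contains the Console (UI), Comm, Diag, and VERNIER Monitor/Control
components on the VERNIER OS Base. All VMs run on the VMM (VM Kernel).
Numbered interface markers: 1 between VERNIER Monitor/Control and the
Agents, 2 between the VERNIER OS Base and the VMM, 3 toward the
Community.]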
36
Key Interfaces
  • VERNIER-Agent (TCP/IP, XML?)
  • Registry change events
  • Filesystem change events
  • Install events
  • Manipulate registry
  • Manipulate filesystem
  • Control System Restore
  • VERNIER-VMM (?)
  • Suspend
  • Resume
  • Checkpoint
  • Revert
  • Clone
  • Reset
  • Lock memory
  • Process events
  • Read memory
  • Read/write disk
  • VERNIER-Community
  • (?)
  • Cluster management
  • Experience reports
  • Unknown
  • Prevalent
  • Known Bad
  • Presumed Good
  • State exchange
  • Experiment request/response

37
Local Functions
[Figure: local functions. Components: Network Tap, Communication
Manager, Console, Response Controller, Analysis & Diagnosis,
Configuration Analysis, Behavior Analysis, Traffic Analysis, Agent
Inside, Event Stream, VMM, Firewall. Local DB contents: local
condition detail, event logs, labeled condition signatures, state
snapshots, experimental data.]
38
Adapting and Extending Host-based, Run-time Win32
Bot Detection for VERNIER
  • Liz Stinson, Stanford

39
Overview
  • Background on Stanford's botnet research
  • Plans for adapting and extending this work for
    application to VERNIER

40
Exploit botnet characteristic: ongoing command
and control
  • Network-based approaches
  • Filtering (protocol, port, host, content-based)
  • Look for traffic patterns (e.g., DynDNS, Dagon)
  • Hard (bot writers can encrypt traffic, permute it
    to look like normal traffic, ...): bot writers
    control the arena.
  • Host-based approaches
  • Ours: have more information at the host level.
  • Since the bot is controlled externally, use this
    meta-level behavioral signature as the basis of
    detection

41
Our approach
  • Look at the syscalls made by a program
  • In particular, at certain of their arguments: our
    sinks
  • Possible sources for these sinks:
  • Local: mouse, keyboard, file I/O, ...
  • Remote: network I/O
  • An instance of external control occurs when data
    from a remote source reaches a sink
  • Surprisingly, this works really well: for all
    bots tested (ago, dsnx, evil, g-sys, sd, spy),
    every command that exhibited external control
    was detected
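
The source-to-sink idea can be pictured as a small taint tracker. The sketch is a toy model under stated assumptions: buffers are identified by name, taint is tracked per whole buffer, and the class and method names are hypothetical. It is not the BotSwat implementation.

```python
class TaintTracker:
    """Tracks which buffers hold data that arrived from a remote source."""

    def __init__(self):
        self.tainted = set()

    def receive_from_network(self, buf):
        self.tainted.add(buf)          # remote source: mark tainted

    def read_local(self, buf):
        self.tainted.discard(buf)      # keyboard, mouse, file I/O: clean

    def copy(self, src, dst):
        # Taint follows the data when one buffer is copied into another.
        if src in self.tainted:
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def reaches_sink(self, buf):
        """True if a syscall argument (a sink) holds remotely sourced data."""
        return buf in self.tainted


tracker = TaintTracker()
tracker.receive_from_network("recv_buf")
tracker.copy("recv_buf", "url_arg")
tracker.read_local("typed_path")
print(tracker.reaches_sink("url_arg"))     # prints True
print(tracker.reaches_sink("typed_path"))  # prints False
```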

42
Big picture
43
Design
44
Two modes
  • Cause-and-effect semantics
  • Tight relationship between receipt of some data
    over network and subsequent use of some portion
    of that data in a sink
  • Correlative semantics: a looser relationship
  • Use of some data that is the same as some data
    received over the network
  • Why necessary?
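
Correlative semantics can be sketched as a content comparison: rather than following data through the program, check whether a sink argument contains bytes equal to something received over the network. The minimum match length, the function name, and the sample data below are assumptions made for illustration.

```python
def correlates(sink_arg, received, min_len=8):
    """True if sink_arg contains any min_len-byte run seen in received data."""
    for data in received:
        for i in range(len(data) - min_len + 1):
            if data[i:i + min_len] in sink_arg:
                return True
    return False


# Hypothetical network input and syscall arguments.
received = [b"PRIVMSG #chan :.download http://example.test/x.exe"]
print(correlates(b"http://example.test/x.exe", received))     # prints True
print(correlates(b"C:\\Users\\alice\\notes.txt", received))   # prints False
```

A short `min_len` raises the chance of coincidental matches, and arguments that were reformatted after receipt can defeat an exact comparison like this one.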

45
Behaviors: ideally disjoint at the lowest level in
the call stack
46
Results
  • Looked at 6 bots: agobot, dsnxbot, evilbot,
    g-sysbot, sdbot, spybot
  • At least 4 have totally independent code bases
  • g-sys non-trivially extends sd
  • Spybot borrows only the SYN flood implementation
    from sd
  • Wide variation in implementation
  • Every command that exhibited external control was
    detected; almost every instance of external
    control was flagged (3 false negatives)

47
Results
48
Correlative semantics
  • Why necessary?
  • Why: bots with C library functions statically
    linked in; unconstrained out-of-band (OOB) copies
  • In general, almost as good as cause-and-effect
    semantics (static vs. dynamic linking)
  • Exceptions: commands that format received
    parameters (e.g., via sprintf)

49
Comparison
50
Comparison
51
Benign program testing
  • Tested against some benign programs that interact
    with the network
  • Firefox, mIRC, Unreal IRCd
  • 3 contextual false positives
  • IRCd: sent on X, heard on Y
  • Firefox: dereferencing embedded links
  • Artificial false positives: quite a few
  • mIRC: DCC capabilities
  • Firefox: saving contents to a file, ...

52
False positives
  • Contextual false positives: not present in bots
  • External control heuristic correctly detected, but
    these actions under these circumstances are widely
    accepted as non-malicious
  • Artificial false positives: not present in bots
  • Definition of external control implies no user
    input agreeing to the particular behavior
  • But we don't explicitly track clean data (that
    received via keyboard or mouse)
  • Spurious false positives
  • Any other incorrect flagging of external control

53
Our mechanism: review
  • Single behavioral meta-signature detects wide
    variety of behaviors on majority of Win32 bots
  • Resilient to differences in implementation
  • Resilient in face of unconstrained OOB copies
  • Resilient to encryption, with some constraints
  • Resilient to changes in command-and-control
    protocol (e.g. from IRC to HTTP) and parameters
    (e.g. for rendezvous point)

54
Plans for VERNIER
  • (1) Reimplement BotSwat
  • Using correlative semantics
  • With improved statistical analysis comparing
    contents of buffers received over the network to
    arguments of selected syscalls
  • Probably as an entirely kernel-space
    implementation
  • May leverage some Livewire support to confirm
    integrity of BotSwat and its components
  • May also leverage Livewire support to enable
    better resilience to bot use of private
    encryption functions
  • Using its "watch memory range X (and let me know
    when it changes)" functionality

55
Plans for VERNIER
  • (2) Confirm BotSwat works at detecting back-door
    programs
  • Obtain various samples of these programs
  • Determine whether additional syscalls might need
    to be hooked in order to provide better coverage
    of the functionality exported by these programs

56
Plans for VERNIER
  • (3) Feasibility of simple approach to detecting
    keyloggers
  • If it is the case that the API call to insert
    self into the call chain for receiving keyboard
    input (for an arbitrary window, not owned by the
    calling process) eventually traps to a system
    call, then this is a simple extension to BotSwat
    (a new syscall to hook)
  • Otherwise, we need to provide a user-space
    component to achieve this
  • Any process that signs itself up to receive
    keyboard input not destined for that process is
    suspect
  • Can extend this paradigm to trap calls to read
    another process's memory
  • The Win32 API has the ReadProcessMemory function,
    which enables one process to read another
    process's memory contents (under certain
    circumstances)

57
Plans for VERNIER
  • (4) Leverage Virtual Machine Introspection (VMI)
    IDS technology to:
  • Confirm integrity of kernel component of BotSwat
  • Confirm integrity of keyboard/mouse drivers (to
    ensure that no process is able to obtain
    keyboard/mouse input by replacing the relevant
    kernel-mode device drivers)
  • Possibly also augment BotSwat's resilience to
    target programs' use of private encryption
    functions, and the like

58
Plans for VERNIER
  • (5) Botnet mitigation: whistleblower
  • Once some bot B is detected on some host machine
    via BotSwat, obtain from B (programmatically) the
    C&C parameters in order to prevent C&C traffic
    for that botnet from entering or leaving the DoD
    network
  • Basically, push out firewall filter
  • Also push sample of bot executable to
    anti-malware scanner so that it can generate a
    signature for this malware executable

59
Plans for VERNIER
  • (6) Botnet R&D
  • After detecting a bot and pushing out filters, we
    would like to be able to poke that bot
    (programmatically) in a controlled environment
  • Get it to generate variants of some exploit where
    those variants could be used as input to an
    automated vulnerability signature generator
  • Bot would then be operating effectively as a flow
    classifier
  • Especially for zero-day exploits (or others that
    do not already have a NIDS signature)
  • Requires learning the command used by the bot to
    generate such scan/spread packets as well as
    learning how to gain control of the bot
  • Note this is not attempting to solve the problem
    of automated vulnerability signature generation,
    but simply to get the bot to act as a flow
    classifier

60
Plans for VERNIER
  • (7) Setting the stage: generating a version of
    the bot that will not trip anti-malware signature
    scanners
  • From Christodorescu/Jha (Testing Malware
    Detectors), we have techniques for performing
    source-code-level obfuscations, including
    variable renaming and encapsulating/encrypting
    portions of the source code
  • Christodorescu/Jha showed that the major
    anti-virus scanners performed very poorly in
    response to encapsulation using hex encoding

61
Knowledge Sharing in VERNIER
  • Patrick Lincoln, SRI

62
Knowledge Sharing
  • Need: communication is the core concept of a
    community
  • Application communities rely on the ability to
    share knowledge: reliable, efficient, authentic,
    secure
  • Approach: two-tier peer-to-peer platform
  • Tuple space (à la Linda)
  • Considering JXTA, jxtaSpaces implementation of
    tuple spaces
  • Two-tier for better scalability
  • If needed, hypercube hashtable index (à la
    Obreiter and Graf)
  • Benefits: reliable, efficient (local) knowledge
    sharing
  • Competition: other possible methods for knowledge
    sharing include explicit messaging, centralized
    database, and statically indexed knowledge
    structures.
  • Other approaches lack scalability, are
    unreliable, and can be difficult to secure

63
Knowledge Sharing Levels
  • Lower level (within a cluster)
  • Tuple space (à la Linda (Gelernter))
  • Simple queries
  • (, name, ) returns records regarding name
  • Concurrent access and update
  • Higher level (supernodes)
  • Nodes aggregate knowledge of an entire cluster
  • Use abstraction to summarize current situation
  • Application-level multicast to push out summaries
  • Supernode pushes all summary updates into local
    tuple space
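
A minimal in-memory sketch of the lower-level tuple space, to make the wildcard query concrete. It borrows Linda's `out` and `rd` operation names but simplifies them: `rd` here is non-blocking and returns every match, `None` stands in for a wildcard field, and concurrent access is not handled. The record layout is hypothetical.

```python
class TupleSpace:
    """Tiny single-process tuple space with wildcard reads."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Publish a tuple into the space."""
        self._tuples.append(tuple(tup))

    def rd(self, pattern):
        """Return all tuples matching the pattern; None matches any field."""
        return [
            t for t in self._tuples
            if len(t) == len(pattern)
            and all(p is None or p == field for p, field in zip(pattern, t))
        ]


space = TupleSpace()
space.out(("node-17", "browser", "crash"))
space.out(("node-42", "browser", "ok"))
space.out(("node-42", "mailer", "ok"))

# All records regarding "browser", whatever the node or status.
print(space.rd((None, "browser", None)))
# prints [('node-17', 'browser', 'crash'), ('node-42', 'browser', 'ok')]
```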

64
Group Communication
  • Group communication is key
  • For higher level, certain usual assumptions
  • Reliable delivery
  • Ordered message delivery
  • Spread (www.spread.org) as a basis for
    implementation of group communication
  • Building on Secure Spread and Progress Software's
    (progress.com) more secure, reliable, scalable
    variants of Spread

65
Group Communication Security and Privacy
Secrecy and Authenticity
  • Security and privacy are critical aspects of
    VERNIER
  • Must authenticate reports and ensure correctness
  • Confidentiality of reports
  • Protecting user privacy (my files, my keystrokes)
  • Protect aspects of applications
  • Protect configuration information
  • Protect vulnerability detection information
  • Community members send status reports to local
    supernode
  • Reports propagated throughout network

66
Group Communication Security
  • Defense against
  • Network attacks sending forged messages to
    supernodes
  • PKI
  • Compromised community member sending false
    reports
  • Statistical anomaly detection (e.g., EMERALD)
  • Virtualization
  • Any report generated within compromised virtual
    machine must be consistent with what is observed
    outside the virtualization layer

67
Group Communication Security
  • Secure audit logs
  • Secure log of all P2P status reports
  • Enable post-mortem analysis on detected attacks
  • Cryptographic protection of log (Boneh, Waters)
  • Sanitizing status reports
  • Status reports reveal private information
  • Special encryption enabling read only by
    credentialed members and search (as in search
    over an encrypted database) by the community
  • Mitigating denial of service attacks on
    supernodes
  • Re-election of supernodes when under attack
  • Securing configuration update messages
  • PKI authenticating legitimate reports from
    community members
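
To illustrate what tamper evidence for a log of status reports can look like, here is a sketch of a plain hash chain. This is a simpler, stand-in technique, not the cryptographic scheme the slide attributes to Boneh and Waters. It only detects edits if the latest digest is also kept somewhere the attacker cannot rewrite. Function names and the report format are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # digest used before the first entry


def _digest(prev, report):
    payload = json.dumps(report, sort_keys=True)
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


def append_report(log, report):
    """Append a report, chaining its digest to the previous entry."""
    prev = log[-1]["digest"] if log else GENESIS
    log.append({"report": report, "digest": _digest(prev, report)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if _digest(prev, entry["report"]) != entry["digest"]:
            return False
        prev = entry["digest"]
    return True


log = []
append_report(log, {"node": "node-17", "status": "browser crash"})
append_report(log, {"node": "node-42", "status": "ok"})
print(verify(log))                   # prints True
log[0]["report"]["status"] = "ok"    # tamper with an old entry
print(verify(log))                   # prints False
```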

68
VERNIER Scenarios, Schedule, and Plans
  • Steve Dawson, SRI

69
Example Scenarios / Use Cases
  • Browser crash: demonstrate both local crash
    recovery from a nonmalicious failure and
    proactive community avoidance of the same failure
  • Simple case: a repeatable Web browser crash occurs
    when visiting a particular URL
  • Local diagnosis: launch one or more copies of the
    VM, rolled back to a known good state; play back
    step by step; observe that visiting the URL
    always causes the crash
  • Local response: quarantine the URL
  • Collaborative diagnosis: problem reported to the
    community; other installations attempt to
    replicate the problem, correlate observed
    behavior with relevant configuration details, and
    discover that the problem occurs only for browser
    version X or earlier
  • Collaborative response: recommend community-wide
    upgrade
  • More complex variations could involve situations
    in which the circumstances leading to the browser
    crash involve multiple steps or interactions with
    other software

70
Example Scenarios / Use Cases (2)
  • Phishing scenario: show how VERNIER can mitigate
    threats even when the attack is unknown and
    requires (unwitting) human participation
  • Cleverly constructed e-mail induces some key
    individuals to run a malicious program that
    subsequently interferes with their ability to
    send and/or receive e-mail
  • Local diagnosis: detect and correlate the
    installation actions of the unknown program;
    separately, affected users report difficulty with
    e-mail; VERNIER runs an experiment with a
    checkpointed VM to determine possible association
    with the newly installed program
  • Local response: malicious program automatically
    removed (possibly by reverting to checkpointed
    VM)
  • Collaborative diagnosis: VERNIER instances share
    information about the installed program even
    before users report a problem; community observes
    use of unknown software, raising level of
    suspicion
  • Collaborative response: warning to community
    against activity leading to installation of
    malicious program

71
Example Scenarios / Use Cases (3)
  • Patching scenario: demonstrate mitigation of
    nonmalicious threats such as new software bugs
  • Variation on the phishing scenario, where
    installation of a seemingly beneficial software
    patch has unintended side effects or introduces a
    new bug not observed previously

72
Schedule and Milestones
73
Experimentation and Evaluation
  • Project testbed
  • Cluster of 300 virtual hosts
  • 30 server-class physical hosts
  • 10 virtual nodes per server
  • Housing and cluster configuration yet to be
    determined
  • Single cluster in one location?
  • Three clusters, one at each participant site?
    (Current plan)
  • Software
  • Host OS: Linux
  • Guest (community) OS: Microsoft Windows
  • Applications: IE browser (possibly others), MS
    Office
  • Simulations and scalability
  • Financially infeasible to scale to thousands of
    nodes
  • Plan is to use hybrid simulation to test
    scalability
  • Real (live) nodes provide actual data
  • Simulated nodes use synthesized data generated by
    perturbing data collected from real clusters
    supernodes

74
Success Criteria
  • Metrics and targets (team-defined)
  • False positives (FP) / False negatives (FN)
  • Phase 1: FP < 10%, FN < 20%
  • Phase 2: FP < 1%, FN < 2% (order of magnitude
    improvement)
  • Percent loss of network availability
  • Phase 1: at most 20% per node, with at most 80%
    over any 500 ms interval
  • Phase 2: at most 5% per node, with at most 20%
    over any 500 ms interval
  • Average time to recovery
  • Phase 1: assuming a fix exists (not a FN), at
    most 30 minutes to recover the entire community
  • Phase 2: at most 10 minutes
  • Average network and computational overhead
  • No more than 30% slowdown for applications
  • No more than 100 KB/s average VERNIER-induced
    network traffic per node
  • Percent accuracy of prediction
  • Phase 1: effects of problems predicted within 15
    minutes of onset; set of nodes wrongly predicted
    (either way) differs by no more than 40% of
    actual
  • Phase 2: prediction within 5 minutes; predicted
    set differs by no more than 20%
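
A small sketch of how the false positive and false negative targets might be checked against experimental counts. Two things are assumptions here, since the slides do not spell them out: that the target numbers are percentages, and that the rates are computed as FP / (FP + TN) and FN / (FN + TP). The counts in the example are made up.

```python
# Assumed reading of the slide: targets are percentages, strict upper bounds.
TARGETS = {
    1: {"fp": 10.0, "fn": 20.0},
    2: {"fp": 1.0, "fn": 2.0},
}


def rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate) in percent."""
    fp_rate = 100.0 * fp / (fp + tn)
    fn_rate = 100.0 * fn / (fn + tp)
    return fp_rate, fn_rate


def meets_targets(phase, tp, fp, tn, fn):
    fp_rate, fn_rate = rates(tp, fp, tn, fn)
    target = TARGETS[phase]
    return fp_rate < target["fp"] and fn_rate < target["fn"]


# Hypothetical counts from one experiment: FP rate 5%, FN rate 10%.
counts = dict(tp=90, fp=5, tn=95, fn=10)
print(rates(**counts))             # prints (5.0, 10.0)
print(meets_targets(1, **counts))  # prints True
print(meets_targets(2, **counts))  # prints False
```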

75
Technology Transition
  • Ultimate goal of VERNIER is a COTS solution
  • Transition partner: VMware, Inc.
  • Supporting VERNIER initially by providing VMware
    licenses for the testbed
  • May provide limited technical assistance in
    developing necessary VERNIER-to-VMM APIs (details
    currently under discussion)
  • Have agreed to define their own success criteria
    for the technology
  • Functionality, performance, cost, and other
    relevant goals that, if met, would lead VMware to
    pursue further development and integration of
    VERNIER technology into the VMware product line
  • Initial response suggests general agreement with
    the metrics we've already proposed (may want to
    tweak the numbers a bit), plus
  • Breadth of operating system support
  • Breadth of application support

76
Next Steps
  • VERNIER team workshop
  • Full day (at least)
  • Brainstorming and detailed planning
  • Target date: week of July 17
  • Continue discussions with VMware on success
    criteria, etc.