Title: Design and Validation of an Intrusion-Tolerant Publish and Subscribe System
Probabilistic Validation of Computer System Survivability
William H. Sanders, University of Illinois at Urbana-Champaign
whs_at_uiuc.edu
www.iti.uiuc.edu
Defending Against a Wide Variety of Attacks
Figure: attacker classes (Nation-states, Terrorists, Multinationals; Serious hackers; Script kiddies) and their motivations, ranging from economic intelligence, military spying, information terrorism, and disciplined strategic cyber attack, through selling secrets, civil disobedience, embarrassing organizations, harassment, and stealing credit cards, to collecting trophies, copy-cat attacks, curiosity, and thrill-seeking.
Intrusion Tolerance: A New Paradigm for Security
Validation of Computer System/Network Survivability
- Security is no longer absolute.
- Trustworthy computer systems/networks must operate through attacks, providing proper service in spite of possibly partially successful attacks.
- Intrusion tolerance claims to provide proper operation under such conditions.
- Validation of security/survivability must be done
  - during all phases of the design process, to make design choices, and
  - during testing, deployment, operation, and maintenance, to gain confidence that the amount of intrusion tolerance provided is as advertised.
Validating Computer System Security: Research Goal
- CONTEXT: Create robust software and hardware that are fault-tolerant, attack-resilient, and easily adaptable to changes in functionality and performance over time.
- GOAL: Create an underlying scientific foundation, methodologies, and tools that will
  - enable clear and concise specifications,
  - quantify the effectiveness of novel solutions,
  - test and evaluate systems in an objective manner, and
  - predict system assurance with confidence.
Existing Security/Survivability Validation Approaches
- Most traditional approaches to security validation have focused on avoiding intrusions (non-circumventability), or have not been quantitative, instead focusing on and specifying procedures that should be followed during the design of a system (e.g., the Security Evaluation Criteria [DOD85, ISO99]).
- When quantitative methods have been used, they have typically either been based on formal methods (e.g., [Lan81]), aiming to prove that certain security properties hold given a specified set of assumptions, or been quite informal, using a team of experts (often called a red team, e.g., [Low01]) to try to compromise a system.
- Both of these approaches have been valuable in identifying system vulnerabilities, but probabilistic techniques are also needed.
Example Probabilistic Validation Study
- Evaluation of the DPASA-DV project design
  - DPASA-DV: Designing Protection and Adaptation into a Survivability Architecture: Demonstration and Validation
- Design of a Joint Battlespace Infosphere (JBI)
  - Publish, subscribe, and query (PSQ) features
  - Ability to fulfill its mission in the presence of attacks, failures, or accidents
- Uses multiple, synergistic validation techniques
JBI Design Overview
Survivability/Security Validation Goal
- Provide convincing evidence that the design, when implemented, will provide satisfactory mission support under real use scenarios and in the face of cyber-attacks.
- More specifically, determine whether the design, when implemented, will meet the project goals.
- This assurance case is supported by
  - rigorous logical arguments,
  - experimental evaluation, and
  - a detailed executable model of the design.
Goal: Design a Publish and Subscribe Mechanism that
- Provides 100% of critical functionality when under sustained attack by a Class-A red team with 3 months of planning.
- Detects 95% of large-scale attacks within 10 minutes of attack initiation and 99% of attacks within 4 hours, with a false-alarm rate of less than 1%.
- Displays meaningful attack state alarms.
- Prevents 95% of attacks from achieving attacker objectives for 12 hours.
- Reduces low-level alerts by a factor of 1000 and displays meaningful attack state alarms.
- Shows survivability versus cost/performance trade-offs.
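As a hedged illustration of how the detection goal above could be checked mechanically, the sketch below tests hypothetical per-attack detection latencies against the stated thresholds. The function names, the use of `None` for an undetected attack, and the reading of the false-alarm target as a rate below 1% are assumptions for illustration, not part of the DPASA design.

```python
from datetime import timedelta

def fraction_within(latencies, bound):
    """Fraction of attacks detected within `bound` (None = never detected)."""
    if not latencies:
        raise ValueError("need at least one attack")
    detected = sum(1 for t in latencies if t is not None and t <= bound)
    return detected / len(latencies)

def meets_detection_goal(large_scale, all_attacks, false_alarm_rate):
    """Hypothetical check of the detection goal: 95% of large-scale attacks
    within 10 minutes, 99% of all attacks within 4 hours, and (assumed
    reading) a false-alarm rate below 1%."""
    return (
        fraction_within(large_scale, timedelta(minutes=10)) >= 0.95
        and fraction_within(all_attacks, timedelta(hours=4)) >= 0.99
        and false_alarm_rate < 0.01
    )

# Made-up latencies: 19 of 20 large-scale attacks caught in 3 minutes,
# one after 30 minutes; all 100 attacks caught well within 4 hours.
large = [timedelta(minutes=3)] * 19 + [timedelta(minutes=30)]
every = [timedelta(minutes=3)] * 100
print(meets_detection_goal(large, every, false_alarm_rate=0.005))  # True
```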
Integrated Survivability Validation Procedure
Figure: the diagram relates the requirement R and its decomposition (requirement decomposition into S, Q, and P) to a functional model of the system (probabilistic or logical), namely a functional model of the relevant subset of the system consisting of a Model for Access Proxy, a Model for Client, and a Model for PSQ Server. The model assumptions (AP1, AP2, AA1, AA2, AA3) are backed by supporting logical arguments and experimentation (M1 (Network Domains) through M6; L1 (ADF) through L3).
Integrated Survivability Validation Procedure Steps
- Step 1 (R): A precise statement of the requirements.
- Step 2: High-level functional model description:
  - 2a: data and alert flows for the processes related to the requirements, and
  - 2b: assumed attacks and attack effects.
  Technique: threat/vulnerability analysis whiteboarding.
Integrated Survivability Validation Procedure Steps
- Step 3: Detailed descriptions of model component behaviors representing 2a and 2b, along with statements of the underlying assumptions made for each component.
  Technique: probabilistic modeling or logical argumentation, depending on the requirement.
Integrated Survivability Validation Procedure Steps
- Step 4: Construct the executable functional model.
  Technique: probabilistic modeling, if the model constructed in Step 3 is probabilistic.
- Step 5 (in parallel with Step 4):
  - a) verification of the modeling assumptions of Step 3 (technique: logical argumentation), and
  - b) where possible, justification of the model parameter values chosen in Step 4 (technique: experimentation).
Integrated Survivability Validation Procedure Steps
- Step 6: Run the executable model for the measures that correspond to the requirements of Step 1.
  Technique: probabilistic modeling.
Integrated Survivability Validation Procedure Steps
- Step 7: Comparison of the results obtained in Step 6, noting in particular the configurations and parameter values for which the requirements of Step 1 are satisfied.
Note that if the requirement being addressed is not quantitative, Steps 4 and 6 are skipped.
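The control flow of the seven steps, including the rule that Steps 4 and 6 are skipped for non-quantitative requirements, can be sketched as follows. The step summaries are paraphrased from the slides; the function itself is only an illustrative assumption about how one might track which steps apply to a given requirement.

```python
STEPS = {
    1: "State the requirement precisely",
    2: "High-level functional model: (2a) data/alert flows, (2b) assumed attacks and effects",
    3: "Detailed component behaviors and their assumptions",
    4: "Construct the executable functional model",
    5: "Verify assumptions (logical argumentation); justify parameters (experimentation)",
    6: "Run the executable model for the Step 1 measures",
    7: "Compare results against the Step 1 requirements",
}

def applicable_steps(quantitative: bool) -> list[int]:
    """Return the step numbers that apply to a requirement."""
    # Per the slides, a non-quantitative requirement skips Steps 4 and 6.
    skipped = set() if quantitative else {4, 6}
    return [n for n in sorted(STEPS) if n not in skipped]

for n in applicable_steps(quantitative=False):
    print(n, STEPS[n])
```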
Step 1: Requirement Specification
- Expressed in an argument graph
Figure: argument graph whose nodes include JBI critical mission objectives; JBI critical functionality; JBI mission detection/correlation requirements; Initialized JBI provides essential services; JBI properly initialized; IDS objectives; Authorized publish processed successfully; Authorized subscribe processed successfully; Authorized query processed successfully; Authorized join/leave processed successfully; Unauthorized activity properly rejected; Confidential info not exposed.
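An argument graph of this kind can be evaluated bottom-up: a claim holds only if all of its supporting sub-claims hold, and a leaf holds only if there is evidence for it. The sketch below assumes pure AND semantics and a parent/child structure chosen for illustration (the slide lists the nodes, but the extracted text does not preserve the edges), so the hierarchy shown is hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Claim:
    name: str
    children: list[Claim] = field(default_factory=list)

    def holds(self, evidence: dict[str, bool]) -> bool:
        # Leaves need explicit supporting evidence; missing evidence counts as False.
        if not self.children:
            return evidence.get(self.name, False)
        # Inner claims use AND semantics: every sub-claim must hold.
        return all(child.holds(evidence) for child in self.children)

# Hypothetical hierarchy built from node names on the slide.
SERVICE_LEAVES = [
    "Authorized publish processed successfully",
    "Authorized subscribe processed successfully",
    "Authorized query processed successfully",
    "Authorized join/leave processed successfully",
    "Unauthorized activity properly rejected",
    "Confidential info not exposed",
]
services = Claim("Initialized JBI provides essential services",
                 [Claim(n) for n in SERVICE_LEAVES])
critical = Claim("JBI critical functionality",
                 [Claim("JBI properly initialized"), services])

evidence = {name: True for name in SERVICE_LEAVES + ["JBI properly initialized"]}
print(critical.holds(evidence))  # True
evidence["Confidential info not exposed"] = False
print(critical.holds(evidence))  # False
```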
Argument Graph for the Design
Step 2: System and Attack Assumption Definition
Example high-level description:
- Steps 4-5: The Access Proxy verifies whether the client is in a valid session by sending the session key accompanying the IO to the Downstream Controller for verification.
- Step 6: The Access Proxy forwards the IO to the PSQ Server in its quadrant.
- ...
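The two message-flow steps described above (session verification with the Downstream Controller, then forwarding to the quadrant's PSQ Server) can be sketched as follows. All class and method names are hypothetical, and the sketch ignores transport, cryptography, and concurrency; it only illustrates the ordering of the check and the forward.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class DownstreamController:
    valid_session_keys: set[str] = field(default_factory=set)

    def verify(self, session_key: str) -> bool:
        return session_key in self.valid_session_keys

@dataclass
class PSQServer:
    received: list[bytes] = field(default_factory=list)

    def accept(self, io: bytes) -> None:
        self.received.append(io)

@dataclass
class AccessProxy:
    dc: DownstreamController
    psq: PSQServer  # the PSQ Server in this proxy's own quadrant

    def handle(self, io: bytes, session_key: str) -> bool:
        # Steps 4-5: ask the Downstream Controller whether the session is valid.
        if not self.dc.verify(session_key):
            return False  # no valid session: the IO is not forwarded
        # Step 6: forward the IO to the PSQ Server in this quadrant.
        self.psq.accept(io)
        return True

dc = DownstreamController({"key-123"})
psq = PSQServer()
ap = AccessProxy(dc, psq)
print(ap.handle(b"io-1", "key-123"), ap.handle(b"io-2", "bogus"))  # True False
print(psq.received)  # [b'io-1']
```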
Attack Model Description
- Definitions
  - Intrusion, prevented intrusion, tolerated intrusion
  - New vulnerabilities
- Assumptions
  - Outside attackers only
  - Attacker(s) with unlimited resources
  - Consider successful (and harmful) attacks only
  - No patches applied for vulnerabilities found during the mission/scenario execution
Attack Model Description
- Attack propagation
  - MTTD: mean time to discovery of a vulnerability
  - MTTE: mean time to exploitation of a vulnerability
- 3 types of vulnerabilities
  - Infrastructure-level vulnerabilities (attacks in depth)
    - OS vulnerability
    - Non-JBI-specific application-level vulnerability
    - $p_{\text{common}}$: common-mode failure
  - Data-level vulnerabilities (attacks in breadth)
    - Using the application data of JBI software
  - Across process domains
    - Flaw in protection domains
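To make MTTD and MTTE concrete, the sketch below assumes (as is common in such models, but not stated on the slide) that discovery and exploitation times are independent and exponentially distributed, so that a host is compromised once discovery plus exploitation completes before the mission ends. It compares a Monte Carlo estimate with the closed-form CDF of the sum of two exponentials. Common-mode effects ($p_{\text{common}}$), detection, and recovery are deliberately left out, and the parameter values are made up.

```python
import math
import random

def p_compromise_mc(mttd, mtte, mission, n=100_000, seed=1):
    """Monte Carlo estimate of P(discovery + exploitation <= mission)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        discovery = rng.expovariate(1.0 / mttd)     # mean MTTD
        exploitation = rng.expovariate(1.0 / mtte)  # mean MTTE
        if discovery + exploitation <= mission:
            hits += 1
    return hits / n

def p_compromise_exact(mttd, mtte, mission):
    """Closed-form CDF of the sum of two independent exponentials."""
    a, b, t = mttd, mtte, mission
    if math.isclose(a, b):
        # Equal means: Erlang-2 distribution.
        return 1.0 - math.exp(-t / a) * (1.0 + t / a)
    return 1.0 - (a * math.exp(-t / a) - b * math.exp(-t / b)) / (a - b)

# Hypothetical values in minutes: MTTD = 500, MTTE = 30, 12-hour mission.
print(round(p_compromise_exact(500, 30, 720), 3))
print(round(p_compromise_mc(500, 30, 720), 3))
```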
Attack Model Description
- Attack effects
  - Compromise
    - Launching pad for further attacks
    - Malicious behavior
  - Crash
    - Attack propagation stopped
    - (DoS)
- Distinction between OSes with and without protection domains
Attack Model Description
- Intrusion detection
  - $p_{\text{detect}} = 0$ if the sensors are compromised
  - $p_{\text{detect}} > 0$ otherwise
- Attack responses
  - Restart processes
  - Secure reboot
  - Permanent isolation
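The detection rule and the three responses above can be sketched as follows. The rule that $p_{\text{detect}}$ drops to 0 when the sensors are compromised comes from the slide; the escalation order (restart processes first, then secure reboot, then permanent isolation for a host that keeps being detected) is an assumed policy for illustration only.

```python
import random
from enum import Enum

class Response(Enum):
    RESTART_PROCESSES = 1
    SECURE_REBOOT = 2
    PERMANENT_ISOLATION = 3

def detect(rng: random.Random, sensors_compromised: bool, p_detect: float) -> bool:
    """One detection trial for an intrusion on a host."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1] for uncompromised sensors")
    # Compromised sensors report nothing: the effective probability is 0.
    effective = 0.0 if sensors_compromised else p_detect
    return rng.random() < effective

def respond(prior_detections: int) -> Response:
    """Hypothetical escalation policy based on earlier detections on the host."""
    if prior_detections == 0:
        return Response.RESTART_PROCESSES
    if prior_detections == 1:
        return Response.SECURE_REBOOT
    return Response.PERMANENT_ISOLATION

rng = random.Random(7)
history = 0
for trial in range(5):
    if detect(rng, sensors_compromised=False, p_detect=0.8):
        print(trial, respond(history).name)
        history += 1
```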
Infrastructure Attacks Example
Figure: at T = 85 min., a vulnerability is discovered on the main process domain (PD) of OS 1. The diagram shows the Quadrant 1 hosts (Access Proxy, DC, Policy Server, PSQ Server, Guardian, SM, and Correlator, each running OS 1) and a Publishing Client running OS 1, together with the Access Proxies and PSQ Servers of Quadrants 2, 3, and 4 (running OS 2, OS 3, and OS 4, respectively), arranged across the Crumple Zone, Operations Zone, and Executive Zone. Each host is drawn behind its ADF NIC.
Step 3: Detailed Descriptions of Model Component Behaviors and Assumptions (Access Proxy)
4.4 Access Proxy
4.4.1 Model Description
AM1: If a process domain in the DJM proxy is not corrupted, it forwards the traffic it is designated to handle from the Quadrant isolation switch to core quadrant elements and vice versa. All traffic being forwarded is well-formed (if the proxy is correct). The following kinds of traffic are handled:
1. IOs (together with tokens) sent from publishing clients to the core (we do not distinguish between IOs sent via different protocols such as RMI or SOAP/HTTP).
...
AM2: Attacks on an access proxy are enabled if either or both of the following hold:
1. The Quadrant isolation switch is ON, and one or more clients are corrupted, leading to
a) direct attacks, which can cause the corruption of the process domain corresponding to the domain of the attacking process on the compromised client.
...
AM3: If an attack occurs on the access proxy, it can have the following effects:
1. Direct attacks leading to process corruption:
a) enable corruption of other process domains on the host.
...
4.4.2 Facts and Simplifications
AF1: Each access proxy runs on a dedicated host machine.
AF2: DoS attacks result in increased delays.
...
Figure: Model of Access Proxy
4.4.3 Assumptions
AA1: Only well-formed traffic is forwarded by a correct access proxy.
AA2: The access proxy cannot access cryptographic keys used to sign messages that pass through it.
AA3: The access proxy cannot access the contents of an IO if application-level end-to-end encryption is being used.
AA4: Attacks on an access proxy can only be launched from compromised clients, or from corrupted core elements that interact with the access proxy during the normal course of a mission.
...
Step 4: Construct Executable Functional Model
Step 5: Supporting Logical Arguments
Logical Argument Sample
Figure: argument graph with three layers.
- Functional model: PSQ Server Model; Access Proxy Model.
- Model assumptions: AA2 AP Application-layer Integrity; AA3 AP Application-layer Confidentiality; SA3 IO Integrity in PSQ Server; SA4 Client Confidentiality in PSQ Server.
- Supporting arguments: Private Key Confidentiality; No Cryptography in Access Proxy; No Unauthorized Direct Access; No Unauthorized Indirect Access; Not Preconfigured; Not Reconfigurable; ADF NIC services protected; Keys Not Guessable; Keys Protected from Theft; Physical Protection of CAC device; Protection of CAC Authentication Data; No Compromise of Authorized Process Accessing CAC; DoD Common Access Card (CAC); Algorithmic Framework; Key Length; Key Lifetime; PKCS #11 Compliance; Tamperproof.
Steps 6 and 7: Measures and Results
- Assumptions: $C_{PUB}$ is the conjunction of
  - $C^{1}_{PUB}$: the publishing client is successfully registered with the core, and
  - $C^{2}_{PUB}$: the publishing client's mission application interacts with the client as intended.
- Definition of a successful publish: $E_{PUB}$ is the conjunction of
  - $E^{1}_{PUB}$: the data flow for the IO is correct,
  - $E^{2}_{PUB}$: the time required for the publish operation is less than $t_{max}$, and
  - $E^{3}_{PUB}$: the IO received by the subscriber has the same essential content as that assembled by the publisher.
- Measure: $P(E_{PUB} \mid C_{PUB})$
  - Fraction of successful publishes in a 12-hour period
  - Between clients that cannot be compromised
- Objective
  - $P(E_{PUB} \mid C_{PUB}) \ge p_{PUB}$ for a 12-hour mission
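Given a log of publish operations (for example, from simulation runs), the measure $P(E_{PUB} \mid C_{PUB})$ can be estimated as the fraction of successful publishes among those that satisfy $C_{PUB}$. The record fields below are hypothetical stand-ins for $C^{1}_{PUB}$, $C^{2}_{PUB}$, and $E^{1}_{PUB}$ through $E^{3}_{PUB}$; $t_{max}$ and $p_{PUB}$ are parameters the slide leaves unspecified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PublishRecord:
    registered: bool        # C1_PUB: client registered with the core
    app_as_intended: bool   # C2_PUB: mission application behaves as intended
    flow_correct: bool      # E1_PUB: data flow for the IO is correct
    latency_s: float        # compared against t_max for E2_PUB
    content_matches: bool   # E3_PUB: subscriber sees the same essential content

def estimate_p_pub(records, t_max_s):
    """Estimate P(E_PUB | C_PUB) as a conditional success fraction."""
    conditioned = [r for r in records if r.registered and r.app_as_intended]
    if not conditioned:
        raise ValueError("no publishes satisfy C_PUB")
    successes = sum(
        1 for r in conditioned
        if r.flow_correct and r.latency_s < t_max_s and r.content_matches
    )
    return successes / len(conditioned)

def meets_objective(records, t_max_s, p_pub):
    return estimate_p_pub(records, t_max_s) >= p_pub

log = [
    PublishRecord(True, True, True, 0.4, True),
    PublishRecord(True, True, True, 2.5, True),   # too slow: fails E2_PUB
    PublishRecord(True, True, True, 0.3, True),
    PublishRecord(True, True, False, 0.2, True),  # bad data flow: fails E1_PUB
    PublishRecord(False, True, True, 0.1, True),  # excluded: C1_PUB false
]
print(estimate_p_pub(log, t_max_s=1.0))  # 0.5
```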
Vulnerability Discovery Rate Study
Figure panels: fraction of successful publishes versus MTTD; number of successful intrusions versus MTTD.
Varying the Number of OSes and of OSes with Process Domains
Autonomic Distributed Firewall (ADF) NIC Policies
Figure panels: fraction of successful publishes; total number of intrusions.
- Per-process-domain (per-PD) policies considerably increase performance (10% unavailability vs. 1.5% at MTTD = 100 minutes).
- ADF NICs can handle per-port policies, so the design should take advantage of this feature, which implies setting the communication ports in advance.
Experimental Validation Overview
- System Validation Results
- Methodology Review
- Example Attack Graphs
- Example Attack Step Priority Determination
- Detailed Analysis of Example Attack Steps
  - ADF susceptibility to DoS attack vulnerability
  - RMI Registration vulnerability
  - RMI Method Fuzzer
- Remaining work
System-Level Validation Methodology (SVM)
- Objectives
  - Improve the system's survivability
  - Conduct specific system-level validation tasks
  - Address all of the system-level concepts and mechanisms that may contribute to improvement, e.g., protocols and application scenarios
- Main idea
  - Think like an attacker
  - Examine whether a given attacker goal can be achieved
  - If so, alter the implementation so as to preclude such achievement
- Procedure
  - Top-down, beginning with a specific high-level attacker goal
  - Critical steps of the high-level attack tree are elaborated further as sub-trees, down to a level that admits adversarial testing
SVM Step-by-Step
- Step 1: State an attacker goal G in precise terms, along with accompanying assumptions regarding the system and its environment.
- Step 2: Understand the system components and interactions that support the event the attacker wishes to preclude.
- Step 3: Construct an attack tree T(G) based on Steps 1 and 2, where each leaf (attack step) of the tree is specific enough to be pursued by a team of 1-3 persons.
- Step 4: Determine all the minimal attacks associated with T(G) and, using appropriate quantitative and qualitative criteria, prioritize the attack steps to be considered by the individual teams.
- Step 5: For a given attack step s, its assigned team determines whether further expansion is required. If so, a sub-tree T(s) (having s as its root) is constructed such that its leaves (low-level steps) are specific enough to admit adversarial testing.
- Step 6: Determine all the minimal low-level attacks associated with T(s) and prioritize them using appropriate quantitative and qualitative criteria.
- Step 7: Guided by the priorities determined in Step 6, conduct adversarial testing of selected low-level attacks to see whether goal s can indeed be realized.
- Step 8: Report results.
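Steps 4 and 6 require all minimal attacks of an attack tree. For an AND/OR tree, these are the minimal sets of leaf attack steps that together achieve the root, analogous to minimal cut sets in fault trees. The sketch below computes them recursively; it is a generic textbook construction rather than the tool used in the study, and the tree and step names are hypothetical.

```python
from itertools import product

def minimize(attacks):
    """Drop any attack that strictly contains another attack."""
    return {a for a in attacks if not any(other < a for other in attacks)}

def minimal_attacks(node):
    """node is a leaf step name (str) or a tuple ("AND" | "OR", [children])."""
    if isinstance(node, str):
        return {frozenset([node])}
    kind, children = node
    child_attacks = [minimal_attacks(c) for c in children]
    if kind == "OR":
        # Any one child's attack achieves the parent.
        combined = set().union(*child_attacks)
    elif kind == "AND":
        # One attack from every child is needed; merge each combination.
        combined = {frozenset().union(*combo) for combo in product(*child_attacks)}
    else:
        raise ValueError(f"unknown gate {kind!r}")
    return minimize(combined)

# Hypothetical goal with steps s1..s5.
tree = ("OR", [
    "s1",
    ("AND", ["s1", "s2"]),                  # subsumed by s1 alone
    ("AND", ["s3", ("OR", ["s4", "s5"])]),
])
for attack in sorted(minimal_attacks(tree), key=sorted):
    print(sorted(attack))
```

Minimization matters here: the attack {s1, s2} achieves the goal but is not minimal, because s1 alone already does.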
SVM Attacker Goals
- We are currently considering the following attacker goals:
  - G1: Prevent client publish
  - G2: Prevent IO delivery to client (subscription)
  - G3: Prevent a successful query operation
  - G4: Prevent a successful client registration
  - G5: Defeat confidentiality of IO data
  - G6: Modify IO data
  - G7: Modify data in repository
G5: Defeat Confidentiality of IO Data
- Definitions for G5
  - IO data is data that has been or will be carried as the protected payload of an IO (Information Object).
  - Confidentiality of such data is defeated if it can be viewed in plaintext (unencrypted) by an adversary.
- Assumptions for G5
  - The clients are assumed not to be compromised in their initial state. An adversary may compromise any host as part of an attack.
  - IOs are encrypted by the publishing client, decrypted by the PSQ server for guardian checks, and encrypted again for delivery to the subscribed client.
  - IOs archived in the PSQ repository are stored in plaintext.
  - With the exception of the man-in-the-middle (MITM) attack, the core and client have successfully exchanged contact information and configuration information, and have successfully authenticated each other.
  - The core has successfully set up the publishing client's NICs.
G5 High-level Attack Tree
G5 Attack Steps/Minimal Attacks
Elaboration of Attack Steps
- An attack step s of a high-level tree often requires further development as a sub-tree (with root s).
- If so, it is assigned to a Focus Group that
  - develops an attack sub-tree wherein each leaf (low-level attack step) is refined enough to either
    - a) admit adversarial testing, or
    - b) allow a decision, based on other evidence, on whether it is realizable by an attacker;
  - conducts adversarial tests for type a) steps; and
  - provides arguments for type b) steps.
Attack Sub-tree for Bypass AP Step
Bypass AP Attack Steps/Attacks
Focus Group Status Reports
- Progress of work by each focus group is reported via a Focus Group Status Report (FGSR) that is typically updated at least once per week.
- For each low-level attack step (leaf of an FG sub-tree), the FGSR reports the following information, as illustrated by several entries of the Bypass AP report.
Bypass AP FGSR (Sample Entries)
Focus Group Status Report. Focus group name: BypassAP. Last updated on 11/15/04.
Columns: LL Step Description; SVTask, AS Refs; Progress Status; Results; TR/Bugzilla References; Comments; Resp.

Entry 1 (Resp: MI)
- LL step description: Completely circumvent the AP by exploiting a misconfiguration in the network. Path: pass traffic at the physical or packet level (outside the VPG layer).
- SVTask, AS refs: (1, 7, 2) (2, 32, 2) (3, 32, 2) (4, 10, 2) (5, 16, 2) (6, 32, 2) (7, 16, 2)
- Status, results, and comments: Complete. Initial results showed that dual-homed hosts would forward packets to the core interface if not firewalled, but no mis-wirings were found. However, as lab topologies change and/or the lab is moved, the test will need to be re-run. Implemented as an ACME test that attempts to ICMP- and TCP-ping hosts in the core network address range from the attacker machine. This test may also be conducted manually using nmap and ping.

Entry 2 (Resp: JL)
- LL step description: Exploit Java-based proxying code on the AP. Path: pass traffic from the client at the VPG layer, but below the application layer, by compromising and reconfiguring the AP.
- SVTask, AS refs: (1, 7, 18) (2, 32, 18) (3, 32, 18) (4, 10, 18) (5, 16, 18) (6, 32, 18) (7, 16, 18)
- Status, results, and comments: Completed initial examination of the code; no vulnerabilities found. Continue with additional code examination as time permits.

Entry 3 (Resp: SD)
- LL step description: Exploit a NIDS vulnerability by modifying its configuration via host compromise. Path: completely circumvent the AP by getting the NIDS to forward traffic from the client-side non-ADF interface.
- SVTask, AS refs: (1, 7, 23) (2, 32, 23) (3, 32, 23) (4, 10, 23) (5, 16, 23) (6, 32, 23) (7, 16, 23)
- Status, results, and comments: No path to host compromise identified yet. Requires host compromise and root access (at least). SELinux policies make this attack path more difficult.
Summary of Attack Steps/Minimal Attacks
- For the seven high-level attack trees that have been developed, there are
  - 524 attack steps (including repeats)
  - 114 different attack steps:
    - 35 attack steps under consideration
    - 20 attack steps yet to be addressed
    - 55 attack steps that are trivial, but depend on others to be accomplished
    - 4 attack steps that are being ignored for various reasons
- The numbers of different minimal attacks for each high-level goal (derived automatically from the goal's attack tree) are as follows:
  - G1: 54
  - G2: 43
  - G3: 36
  - G4: 52
  - G5: 8
  - G6: 12
  - G7: 11
- Total number of minimal attacks with respect to all goals: 216
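The counts above are internally consistent, as a quick tally shows: the four status categories add up to the 114 distinct attack steps, and the per-goal minimal-attack counts add up to the reported total of 216.

```python
steps_by_status = {
    "under consideration": 35,
    "yet to be addressed": 20,
    "trivial, but dependent on others": 55,
    "ignored": 4,
}
minimal_attacks_by_goal = {
    "G1": 54, "G2": 43, "G3": 36, "G4": 52, "G5": 8, "G6": 12, "G7": 11,
}

distinct_steps = sum(steps_by_status.values())
total_minimal_attacks = sum(minimal_attacks_by_goal.values())
print(distinct_steps, total_minimal_attacks)  # 114 216
```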
Attack Steps Currently Being Addressed
Attack Steps That Will Be Addressed
Attack Steps That Are Trivial, but Depend on Others to Be Accomplished
Attack Steps That Will Be Ignored
Example Attack Step Analysis: ADF DoS Attack
- Three metrics were used to benchmark the ADF:
  - Max. throughput: the fastest receive rate at which there is no packet loss
  - Available bandwidth: the amount of data that can be transmitted in a fixed amount of time (when no flood is in progress)
  - Minimum flood rate: the lowest flood rate that leads to a successful denial-of-service attack
- Floods cause packet loss, which in turn lowers bandwidth due to TCP congestion control. UDP will suffer high packet loss.
- Experimental setup
  - Follows RFC 2544 as much as possible
  - Max flood rate is 44,000 frames/s, i.e., about 22 Mbit/s (for 64-byte frames)
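The 22 Mbit/s figure follows from counting only the 64 bytes of each frame: 44,000 frames/s × 64 bytes × 8 bits ≈ 22.5 Mbit/s. If the 8-byte preamble and 12-byte inter-frame gap that Ethernet also puts on the wire are included (as line-rate calculations usually do), the same frame rate occupies about 29.6 Mbit/s. The helper below is only a worked check of that arithmetic.

```python
def frame_rate_to_mbps(frames_per_s, frame_bytes, overhead_bytes=0):
    """Convert a frame rate to Mbit/s; overhead_bytes adds per-frame wire overhead."""
    return frames_per_s * (frame_bytes + overhead_bytes) * 8 / 1e6

ETHERNET_PREAMBLE_AND_GAP = 8 + 12  # preamble/SFD + minimum inter-frame gap, in bytes

print(frame_rate_to_mbps(44_000, 64))                             # 22.528
print(frame_rate_to_mbps(44_000, 64, ETHERNET_PREAMBLE_AND_GAP))  # 29.568
```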
Max Throughput Experiment Setup
- TCP packets are sent through the loop at a rate of 100 packets/s
- Each packet has a unique serial number
- Dropped packets are detected by out-of-sequence packets
- Flood traffic is generated at increasing rates until packet loss occurs in the loop traffic
- Flood is composed of Ethernet frames that carry a TCP-SYN payload destined for a non-listening port on the target machine
- Test performed for 64-byte and 1280-byte frames
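Detecting drops from the serial numbers amounts to counting gaps in the received sequence. The sketch assumes, as a simplification, that serials increase by one per packet sent and that packets in the loop are neither reordered nor duplicated; losses after the last received packet are not visible to this check.

```python
def count_dropped(serials):
    """Count packets missing from an in-order sequence of received serial numbers."""
    dropped = 0
    expected = None
    for s in serials:
        if expected is not None and s > expected:
            dropped += s - expected  # every skipped serial was lost
        expected = s + 1
    return dropped

received = [0, 1, 2, 5, 6, 9]
print(count_dropped(received))  # 4 (serials 3, 4, 7, 8 missing)
```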
Max Throughput Graph
Figure: maximum throughput by frame size.
Available Bandwidth Experiment Setup
- Iperf is used to measure available bandwidth between two hosts on the network
- Iperf uses the maximum frame size to measure bandwidth
Available Bandwidth vs. Policy Rules (no flood)
Figure: available bandwidth versus DPASA rule set size.
Minimum Flood Rate Experiment Setup
- Flood traffic is generated at increasing rates until iperf shows near 0 bits-per-second bandwidth
- Flood traffic is 64-byte frames of TCP-SYN packets destined for a non-listening port
- Policy rule set allows attacker traffic through the NIC
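The ramp-up procedure can be sketched as a search over increasing flood rates that stops at the first rate for which measured bandwidth is near 0. In a real run `measure_bandwidth_bps` would drive the flood generator and iperf; here it is a hypothetical callable, the 1 kbit/s threshold for "near 0" is an assumption, and the toy bandwidth model exists only to exercise the search.

```python
from typing import Callable, Iterable, Optional

def minimum_flood_rate(
    rates: Iterable[int],
    measure_bandwidth_bps: Callable[[int], float],
    near_zero_bps: float = 1_000.0,
) -> Optional[int]:
    """Return the lowest flood rate (frames/s) that drives bandwidth to near 0."""
    for rate in sorted(rates):
        if measure_bandwidth_bps(rate) <= near_zero_bps:
            return rate
    return None  # no tested rate achieved a successful DoS

def toy_model(rate: int) -> float:
    # Made-up model: 10 Mbit/s link losing 500 bit/s per flood frame/s.
    return max(0.0, 10e6 - 500.0 * rate)

print(minimum_flood_rate(range(0, 44_001, 1_000), toy_model))  # 20000
```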
Minimum Flood Rate for Successful DoS
Figure: minimum flood rate versus DPASA rule set size, with the maximum rate of a 10 Mbps network marked for reference.
Summary of Results
- On a 100 Mbit/s network,
  - ADF NIC is highly vulnerable to DoS flood attacks.
  - Attempting to block the attacker does not mitigate the effect.
  - VPGs introduce an additional performance penalty.
  - (EFW is also highly vulnerable, though it showed slightly improved results.)
- On a 10 Mbit/s network,
  - Flooding success depends on other factors, such as policy size (number of rules to be traversed) and VPG use.
  - ADF NIC can handle flooding from a 10 Mbit/s network if the number of rules processed for incoming packets is small (< 16).
  - On average, current DPASA policies discard a packet within 3 rules; the maximum is 6.
- Feedback to the design team
  - Upstream mitigation must be used to prevent the flood (for example, throttle endpoints to 10 Mbit/s).
  - Design policies to discard attacker packets as soon as possible.
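The last recommendation can be illustrated with a first-match rule list, where the cost of handling a packet grows with the number of rules traversed before a decision. The sketch assumes first-match semantics with default deny, a common firewall model that is not confirmed here for the ADF NIC; the rules and packet fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Packet = dict  # hypothetical packet representation, e.g. {"src": ..., "dport": ...}

@dataclass(frozen=True)
class Rule:
    matches: Callable[[Packet], bool]
    action: str  # "allow" or "discard"

def evaluate(rules: list[Rule], packet: Packet) -> tuple[str, int]:
    """First-match evaluation; returns (action, number of rules traversed)."""
    for i, rule in enumerate(rules, start=1):
        if rule.matches(packet):
            return rule.action, i
    return "discard", len(rules)  # assumed default deny after all rules

allow_rules = [Rule(lambda p, port=port: p["dport"] == port, "allow")
               for port in (5000, 5001, 5002, 5003, 5004)]
block_attacker = Rule(lambda p: p["src"] == "attacker", "discard")

flood_packet = {"src": "attacker", "dport": 9}
print(evaluate(allow_rules + [block_attacker], flood_packet))  # ('discard', 6)
print(evaluate([block_attacker] + allow_rules, flood_packet))  # ('discard', 1)
```

Placing the discard rule first cuts the per-packet work for flood traffic from six rule checks to one in this toy policy.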
Design and Implementation Oriented Validation of Survivable Systems
A. Agbaria, T. Courtney, M. Ihde, W. H. Sanders, M. Seri, and S. Singh
Design Phase Validation
Implementation Phase Validation
- Let PUB be the requirement of successfully processing a publish request.
- Let C be the preconditions.
- Let E be the desired event, i.e., the successful processing of a request to publish. E is a conjunction of
  - $E_1$: the data flow of the publish is correct,
  - $E_2$: timeliness,
  - $E_3$: integrity, and
  - $E_4$: confidentiality.
- The requirement PUB: $P(E \mid C) \ge p$
- A study of the design reveals that integrity and confidentiality can be regarded as probability-1 events.
- We obtain the following logical decomposition:
  - $PUB_1$: $P(E_1 \wedge E_2 \mid E_3 \wedge E_4 \wedge C) \ge p$
  - $PUB_2$: $P(E_3 \mid C) = 1$
  - $PUB_3$: $P(E_4 \mid C) = 1$
- It can be shown that $(PUB_1 \wedge PUB_2 \wedge PUB_3) \Rightarrow PUB$.
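One way to see why the three sub-requirements imply PUB, assuming $P(E_3 \wedge E_4 \wedge C) > 0$ so that all conditional probabilities are defined, is the chain rule combined with the observation that two probability-1 events have a probability-1 conjunction:

```latex
\begin{aligned}
P(E_3 \wedge E_4 \mid C) &\ge P(E_3 \mid C) + P(E_4 \mid C) - 1 = 1
  && \text{(by } PUB_2, PUB_3\text{)}\\
P(E \mid C) &= P(E_1 \wedge E_2 \wedge E_3 \wedge E_4 \mid C)\\
  &= P(E_1 \wedge E_2 \mid E_3 \wedge E_4 \wedge C)\, P(E_3 \wedge E_4 \mid C)\\
  &= P(E_1 \wedge E_2 \mid E_3 \wedge E_4 \wedge C) \ge p
  && \text{(by } PUB_1\text{)}
\end{aligned}
```

Since a probability cannot exceed 1, the first line gives $P(E_3 \wedge E_4 \mid C) = 1$, which is what the step from the third line to the fourth uses.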
Figure: validation flowchart.
- Step 1: Formulate a precise statement of R (requirement and sub-requirements).
- Step 2: If R is logically decomposable, decompose it iteratively (logical decomposition).
- Step 3: For every atomic requirement Ra, decide whether it is quantitative; if not, use logical argumentation.
- Step 4: Detailed description of components; build a high-level description of the system and its operational environment (functional model, data flow, and an attack graph constructed automatically from the attack tree).
- Step 5: Justify the modeling assumptions of Step 4 (verify assumptions and parameter values).
- Step 6: Construct a simulation model (probabilistic model of the system and its operational environment, covering infrastructure-level attacks and probabilistic measures).
- Step 7: Evaluate and compare with the requirement: either the system is valid w.r.t. the requirement, or it is not valid.
Figure: argument graph for the Access Proxy model assumptions AA2 (AP application-layer integrity) and AA3 (AP application-layer confidentiality), with supporting arguments including private key confidentiality, no cryptography in the AP, no unauthorized direct or indirect access, keys not guessable and protected from theft, DoD CAC protections, algorithmic framework, key length and lifetime, PKCS #11 compliance, and tamperproofing.
Figure: survivable publish-subscribe system: management staff; a core of four quadrants (Quad 1 to Quad 4) spanning the Executive, Operations, and Crumple Zones; the network; and the client zone.
Figure: Access Proxy built from isolated process domains in SELinux (Domain1 to Domain6), with a Local Controller that first restarts domains and eventually restarts the host, and proxy logic that inspects, forwards, and rate-limits traffic to and from PS, DC, and PSQImpl over RMI, IIOP, TCP, and UDP, along with sensor reports.