Title: The DPASA Survivable JBI: A High-water Mark in Intrusion Tolerant Systems
1. The DPASA Survivable JBI: A High-water Mark in Intrusion Tolerant Systems
- Partha Pal
- On Behalf of the Entire DPASA Team
- BBN Technologies, Adventium Labs, SRI, U Illinois and U Maryland
The DPASA project was sponsored by DARPA under an AFRL contract during 2002-2005. The BBN-led DPASA team designed the survivable architecture, used it to defense-enable a DoD-relevant information system, and subjected it to multiple Red-Team evaluations.
2. Outline
- Intrusion Tolerance
- The DPASA Approach
- Survivability Architecture
- Design Principles
- Baseline (undefended) and the Survivable System
- Evaluation Results
- Conclusion and Future Direction
3. Intrusion Tolerance
4. Generations of Security Research
No system is perfectly secure; it is only adequately secured with respect to the perceived threat.
5. 3rd (moving towards 4th) Generation
- 3rd Generation: Tolerance and Survivability
- Assumes that attacks and other bad things cannot be totally prevented: some attacks will succeed, and may not even be detected in time
- Focuses on desired qualities or attributes that need to be preserved, retained or continued, even if in a degraded manner
- availability (of information and service)
- integrity (of information and service)
- confidentiality (of information)
- Next Generation of Survivability
- Regain, recoup, regroup and even improve
6. The DPASA Approach
7. Drivers and Contributing Factors
- COTS: bugs and unknown vulnerabilities
- Open/interoperable: discovery and use of new exploits
- Distributed: more places to attack
- Interconnected: attack initiation and propagation
- Interdependent: cascade effect
Unavailability is detectable: can't reach the server, so wait or give up.
Corruption: wrong answer; can get pretty bad.
Exfiltration: stolen data.
Attacker may try to introduce corruption or steal rather than disrupt!
8. Contention Between Defense and the Adversary
- Continued operation
- Preserve C, I and A
- Degrade
- Attacker and application compete for the same resources
- corrupt
- consume
- Application security
- Adaptive response
[Figure: applications and an attacker sharing memory and CPU on the same hosts]
The game is inherently biased against the defense: the adversary needs to find only one way to win, whereas the defense needs to cover as many possibilities as it can. Therefore, in the short term, successfully denying or delaying the adversary is a win for the defense.
9. Defense Mechanisms
[Figure: applications and an attacker sharing memory and CPU on the same hosts]
- Defense mechanisms: mechanisms that do not contribute to the principal functionality of the system, but are included in the system to preserve or bolster C, I and A
- Tools, protocols, subsystems
- Network, Host (OS), Application layer mechanisms
10. Survivability Architecture
- Survivability architecture
- Survivability goals, undefended system, design principles
- Organization of components, both functional components from the undefended system and the added defense mechanisms, their interconnections, and the protocols that govern them
- Entities, interconnections, protocols
11. Designing for Survivability
- Key motto: combine protection, detection and adaptive response
- High barrier to entry from outside as well as going from one part to another
- Improve the chance to spot attacker activity
- Adapt to changes caused by the attacker
[Figure: Key Assets]
12. Dynamic Defense in Depth
- Multiple layers of defense
- Unlikely that all layers have the same hole
- Dynamically changing the defenses
- Analogous to changing your passwords
- Reduces the likelihood of success of dictionary attacks
- Unpredictable to the attacker
- Disclose as little as possible to the attacker; confuse and obfuscate his view
Choice and organization of defenses follow from requirements and design principles
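A rough way to see why multiple layers help is to assume, hypothetically, that each layer is breached independently with some probability. The function name and the probabilities below are invented for illustration; they are not DPASA measurements.

```python
from math import prod

def breach_probability(layer_probs):
    """Probability that every layer is breached, assuming independent layers."""
    return prod(layer_probs)

# Hypothetical per-layer breach probabilities, for illustration only.
one_layer = breach_probability([0.5])
three_layers = breach_probability([0.5, 0.5, 0.5])
print(one_layer, three_layers)
```

The independence assumption is exactly what a shared hole breaks: if every layer has the same vulnerability, the product no longer applies and the layers behave like one.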
13. Design Principles
- SPOF protection
- Controlled use of diversity
- Physical barriers before key assets
- Robust basis of defense in depth
- Containment layers
- Modularity
- Range of adaptive responses
- Human override
- Minimalism
- Configuration generation from specs
Many of these are surprisingly simplistic and intuitive, but it is also surprising how many of these are routinely ignored in current system design.
14. SPOF Protection
- It may be impossible to protect all single points of failure (SPOFs) in a system
- Depending on the level of abstraction/granularity there may be way too many
- Do not go overboard in choosing the unit
- A host, a process, an instance representing a physical object
- Not the DMA controller, bus, or the CPU in a host
- Units that perform key or essential functions and are exposed to the outside must not be left as SPOFs
- The web server that runs your electronic storefront, or facilitates collaboration
- The database or application server that your sales force or analysts constantly need
- Do not ignore how you access the network!
- Typically mitigated by redundancy
- Spatial redundancy may not always be possible
- Redundancy in time domain (restart)
- Managing redundancy
- Transparent (middleware)
- Applications are aware of the redundancy
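The check for unprotected units can be sketched as a simple inventory pass. This is a minimal illustration, assuming a deployment is described as a mapping from units (hosts or processes) to the functions they provide; `find_spofs` and the sample deployment are hypothetical and not part of DPASA.

```python
from collections import defaultdict

def find_spofs(deployment, essential):
    """Return essential functions that are served by at most one unit.

    deployment: dict mapping a unit name (host or process) to the set of
                functions it provides.
    essential:  set of function names that must stay available.
    """
    providers = defaultdict(set)
    for unit, functions in deployment.items():
        for function in functions:
            providers[function].add(unit)
    # An essential function with a single provider is a SPOF; one with no
    # provider at all is reported too, since it cannot be served.
    return {f: sorted(providers[f]) for f in essential if len(providers[f]) <= 1}

# Hypothetical deployment, for illustration only.
deployment = {
    "web1": {"storefront"},
    "web2": {"storefront"},
    "db1": {"orders-db"},
    "gw1": {"network-access"},
}
essential = {"storefront", "orders-db", "network-access"}
spofs = find_spofs(deployment, essential)
print(spofs)
```

In this made-up deployment the storefront is replicated, but the database and the network gateway are each served by one unit, which mirrors the reminder not to ignore how the network is accessed.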
15. Diversity and Physical Barriers
[Figure callouts: SPOF? 4 replicas are still accessible: run same attack 4 times? Introduce physical barriers using DMZ]
- Notion of zones: Crumple zone, Operations zone, Executive zone
- Enablers
- Application level proxies
- Additional features
- Rate limiting
- Size limiting
- Learning usage pattern
- Tunnel termination
- Insertion of protocol diversity
[Figure labels: management and decision-making functions; controlled communication; main operational functionality; access points]
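Two of the proxy features listed above, rate limiting and size limiting, can be sketched with a token bucket. This is an illustrative sketch only: `ProxyGuard` and its parameters are hypothetical names, and the slides do not say how the DPASA access proxies implemented these checks.

```python
import time

class ProxyGuard:
    """Admission check an application-level proxy might run per request."""

    def __init__(self, rate_per_sec, burst, max_bytes, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.burst = burst            # bucket capacity
        self.max_bytes = max_bytes    # size limit per request
        self.clock = clock            # injectable clock, eases testing
        self.tokens = float(burst)
        self.last = clock()

    def admit(self, payload: bytes) -> bool:
        # Size limiting: reject oversized requests before spending a token.
        if len(payload) > self.max_bytes:
            return False
        # Rate limiting: refill the token bucket, then try to take one token.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy would call `admit` on each incoming request and forward only the admitted ones, so a flood or an oversized message is absorbed at the access point rather than in the zones behind it.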
16. Controlled Use of Diversity
- Source of artificial diversity
- Hardware architecture
- OS
- Programming language
- Application
- COTS
- n-version programming?
- Automated diversity generation?
- Diversity is expensive
- Initial investment, continued maintenance and management
- Controlled use of diversity
- In a given situation more diversity is not necessarily better
- Given the organization in the figure, using 4 different OS is not better than using 3
- There are situations where a small additional investment provides a big payoff: identify and take advantage of these!
[Figure: hosts a, b, c and d organized into quads (labels quad1, quad2, quad 4), with OS labels SELinux, Windows and Solaris. Logos are registered trademarks of their respective owners.]
17. Robust Basis for Defense in Depth
- It is likely that a majority of the defense mechanisms are software
- Depends on hardware, OS and network services
- May depend on other software mechanisms as well!
- How to avoid a house of cards in building defense in depth?
- Forming a robust basis: useful things to consider while trying to satisfy a need
- Hardware based mechanisms
- Cryptographic strengths
- Assumptions about operating environment
- Redundancy: hardware based vs. software based
- Interconnecting hosts in a network or inter-network: use of managed switches is better than programming it in
- Storing and using private keys: smart cards/separate co-processors are better than using the main disk/memory/CPU
- Fine grain packet filtering and encryption: a NIC based solution is better than software tools (iptables etc.)
18. Containment Layers
- Containment layers: architectural construct that helps limit the spread of attacks/attack effects
- Two main dimensions to consider
- Spatial and Functional
[Figures: containment in the spatial dimension; adding the functional dimension]
19. Modularity
- Survivable system must adapt to changes caused by attacks
- Are containment and redundancy enough to support adaptive response?
Will the system still work if you kill the affected application? What if we have to go up in the spatial containment hierarchy: shut down the host, quarantine the host or the network containing the host?
- Modularity is the design property that facilitates such responses
- Enablers
- Actuator mechanisms to effect the response
- Post-action coordination (implemented in code): healing/recovery, masking/degradation
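Going up the spatial containment hierarchy (kill the application, then shut down the host, then quarantine the network) can be sketched as a walk over parent links. The unit names, the `PARENT` and `ACTIONS` tables and the `contained` callback are hypothetical stand-ins for real actuators and post-action coordination.

```python
# Hypothetical spatial containment hierarchy: each unit names its container.
PARENT = {
    "app:psq": "host:h1",
    "host:h1": "net:lan1",
    "net:lan1": None,
}

# Hypothetical actuator chosen by the kind of unit being contained.
ACTIONS = {"app": "kill", "host": "shutdown", "net": "quarantine"}

def escalation_path(unit):
    """List (action, unit) pairs from the affected unit up to the outermost layer."""
    path = []
    while unit is not None:
        kind = unit.split(":", 1)[0]
        path.append((ACTIONS[kind], unit))
        unit = PARENT[unit]
    return path

def respond(unit, contained):
    """Apply responses up the hierarchy until `contained` reports success.

    contained(action, unit) is a caller-supplied check; here it stands in for
    real detection and post-action coordination.
    """
    taken = []
    for action, target in escalation_path(unit):
        taken.append((action, target))
        if contained(action, target):
            break
    return taken
```

The sketch only works if each unit can be removed without breaking the rest of the system, which is the modularity property the slide asks for.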
20. Range of Adaptive Response
- Survivable system must adapt to changes caused by attacks
- It is important to have a range of adaptive responses
- Some symptoms are more critical than others, e.g., a port scan vs. all heartbeats going down
- In some cases response delayed is response failed, e.g., when an attack signature has been observed
- Some responses are more severe than others, e.g., restoring a file vs. isolating a network
- Rapid response: local scope, fully automated, local decision making based on local observation
- Spurious file or process: delete or kill
- Lost file: recover
- Coordinated response: system wide scope, mostly automated, coordinated decision making (multiple rounds of message exchanges) based on corroborated information from multiple parts of the system
- Restart a function, reboot a host, isolate a network
- Human assisted response
- Clean a host and restart
- Examine the log (forensics) to identify a signature and patch
- Enablers
- Advanced middleware, sensors and correlators, logical decision tools/expert systems
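The three response tiers can be sketched as a small dispatcher. This is an assumption-laden illustration: the `Observation` fields, the symptom names and the corroboration threshold of two reporters are hypothetical and do not describe the DPASA decision logic.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    symptom: str        # e.g. "spurious_file", "heartbeat_loss"
    reporters: int      # how many distinct parts of the system reported it
    known_remedy: bool  # whether an automated remedy exists

# Hypothetical local remedies, taken from the rapid-response examples.
LOCAL_REMEDIES = {
    "spurious_file": "delete",
    "spurious_process": "kill",
    "lost_file": "recover",
}

def choose_tier(obs: Observation) -> str:
    # Rapid response: a local observation with a local, automated remedy.
    if obs.symptom in LOCAL_REMEDIES:
        return "rapid:" + LOCAL_REMEDIES[obs.symptom]
    # Coordinated response: needs corroboration from more than one reporter.
    if obs.known_remedy and obs.reporters >= 2:
        return "coordinated"
    # Everything else is handed to an operator.
    return "human-assisted"
```

For example, `choose_tier(Observation("lost_file", 1, True))` selects the rapid tier, while an uncorroborated report of an unfamiliar symptom falls through to the human-assisted tier.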
21. Baseline and Survivable System
22. Baseline (Undefended System)
[Figure: baseline system diagram. Recoverable labels: Information Object Repository, Metadata Repository, Security Repository, JBoss application servers, PSQ servers, core LAN, emulated public IP network, Client LANs 1 to 4, Clients 1 to 10. Application and server labels: WxHaz, ChemHaz, EDC, JEES, TAP, AODB, TARGET, CAF, MAF, CombatOps, HUB, AODBSVR, TAPDB, SWDIST, DB SVR 1, DB SVR 2, BE SVR 1. Platform labels: Solaris, Windows (various versions).]
23. Defense-Enabled System
[Figure: defense-enabled system diagram. Recoverable labels: QIS, VLAN, ADF NIC, bump-in-the-wire with ADF, VPN routers, hubs, experiment control/logging network, emulated IP network using VLANs in a single Cisco 3750, Client LANs 1 to 4, Clients 1 to 9. Application and server labels: WxHaz, ChemHaz, EDC, JEES, TAP, AODB, TARGET, CAF, CombatOps, HUB, AODBSVR, TAPDB, SWDIST, DB SVR 1, DB SVR 2, BE SVR 1. Platform labels: SELinux, WinXP Pro, Win2000, Solaris 8.]
24. Key Aspects of the Survivability Architecture
- Defense mechanisms
- Policy enforcement
- Encryption
- Authentication
- Detection and correlation
- Redundancy/redundancy management
- Adaptive response (recover, degrade)
- Design principles and enablers
- Multiple layers: policy, encryption, authentication
- SPOF, Diversity, Hardware grounding, Modularity, Containment, Range of adaptive response
- Architectural elements
- Zones, Quadrants, Survivable Middleware, Protection domains
- System Managers (SM), Access Proxies (AP), Local Controllers (LC)
- Protocols
- Corruption Tolerant PSQ embedded in the Survivable Middleware
- Heartbeats
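The heartbeat protocol named above can be sketched as a monitor that records the last beat from each sender and reports those that have gone quiet. `HeartbeatMonitor`, its timeout and the sender names are hypothetical; the slides do not give the actual heartbeat message format or timing.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per sender and report senders that went quiet."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated
        self.last_seen = {}      # sender -> time of last heartbeat

    def beat(self, sender, now):
        # Record a heartbeat; `now` is passed in so the caller owns the clock.
        self.last_seen[sender] = now

    def missing(self, now):
        # Senders whose last heartbeat is older than the timeout.
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout)

# Illustrative use with made-up sender names and times.
monitor = HeartbeatMonitor(timeout=5.0)
monitor.beat("lc-host1", now=0.0)
monitor.beat("lc-host2", now=4.0)
quiet = monitor.missing(now=7.0)
print(quiet)
```

A single missing sender might justify only a local check, whereas all heartbeats going down at once is the kind of critical symptom that calls for a coordinated response.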
25. Some Annotations
- Policy Enforcement (permissions, capabilities)
- JVM security policy
- SELinux/CSA policies
- Process Protection Domain
- System Protection Domain
- ADF policies
- Network Protection Domain
- Encryption
- Outer VPN
- ADF VPGs
- Application level encryption
- Authentication
- VPN level (router-router, router-hosts)
- ADF level (host to host)
- Application level
- Adaptive responses
- Restore files
- Kill processes
- Isolate host
- Reboot host
- Retry PSQ operations
- Adjust redundancy/level of tolerance (degrade)
- Restart application
- Quarantine network segments
- Detection and Correlation
- Embedded sensors (applications, proxies, heartbeats)
- Policy engines
- NIDS
- EMERALD
- Advisor
26. Survivable Middleware
Undefended Pub/Sub Middleware: password login, no redundancy at core.
Survivable Middleware adds a stronger level of authentication, access control and reliability: cryptography-based login, redundant core, transparent protocol based on weak assumptions, multiple transports.
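One common way a client can tolerate a corrupt reply from a redundant core is to accept a value only when a strict majority of replicas agree on it. The sketch below illustrates that general technique; it is not the DPASA corruption tolerant PSQ protocol, and `accept_reply` and the replica names are hypothetical.

```python
from collections import Counter

def accept_reply(replies, replicas):
    """Return the value reported by a strict majority of replicas, else None.

    replies:  dict mapping replica name to the (hashable) value it returned.
    replicas: total number of replicas in the redundant core.
    """
    if not replies:
        return None
    value, votes = Counter(replies.values()).most_common(1)[0]
    return value if votes > replicas // 2 else None

# Illustrative use: four core replicas, one of which returns a corrupt value.
replies = {"q1": "msg-42", "q2": "msg-42", "q3": "msg-42", "q4": "corrupt"}
result = accept_reply(replies, replicas=4)
print(result)
```

With four replicas a value needs three matching replies, so one corrupt replica cannot change the answer; a two-against-two split yields no accepted value and the operation would have to be retried.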
27. Evaluation Results
28. Red Team Evaluation (Adversarial)
- Run 1
- Defended system ran for 14 hours with no visible impact
- The policies were so tight that the red team had no visibility of their actions or their impact
- Run 2 (modified the policy to give the red team the visibility they requested)
- 12 hour scenario completed, but the red team was able to cause significant hiccups during the scenario
- With the added visibility they were able to DoS specific clients when they needed to publish information
- Run 3 (different red team)
- Within an hour they took out the PIX VPNs!
- Residual flaw in the Cisco router configuration (recall red teams have complete knowledge of everything): in addition to the agreed upon span port, the configuration also gave them trunk port access!
- Rerunning the same attack without the trunk port did not succeed, but the red team was divided in their opinion about whether the attack could be customized to work without trunk port access
Although having access to trunk ports in multiple routers in a backbone is a considerable amount of privilege, run 3 exposed and exploited the tradeoff we made in the design!
No loss of published information or corruption!
29. Red Team Evaluation (Cooperative)
Compressed 3 hr scenario
Traitor blue team member(s) worked with the red team
Red team started inside the defended system with attack code pre-positioned
High-level access and higher privilege implied some of the sensors were blind
New flaws and defense opportunities exposed
Bad refs Spread, Java serialization, MSQL Injection, ADF Policy Server exploit
This extraordinary success of the defense required considerable human help
30. Conclusion and Future Direction
31. Current Conclusion
- A high-water mark in survivable system design
- Proof that information systems can be made highly survivable
- Survivability Architecture: individual mechanisms abound, this was a great first example of integrating them coherently with a tight and consistent policy
- There is no such thing as "improbable risk" against a highly motivated adversary
- Exploiting the SPOF PIX VPN routers was assessed to be an improbable risk
- Created a daunting level of difficulty to breach confidentiality and integrity, but availability is not there yet
- That is despite all the redundancy, diversity and adaptive response
- Loss is easily detected
- Human intelligence required in interpreting observed information and controlling the architecture
32. Future Direction
- What to do with availability
- Beyond degradation?
- Regenerate? Learn while you regenerate?
- Artificial diversity?
- Minimizing the need for human intelligence?
- Motivation
- Cost issue
- Response time
- Human factors
- Can there be an expert system/expert assistant?
33. Reference Material
- For more information
- Papers about this project
- http://www.dist-systems.bbn.com/papers/2005/ACSAC/index2.shtml
- http://www.dist-systems.bbn.com/papers/2005/ACSAC/index.shtml
- http://www.dist-systems.bbn.com/papers/2006/NCA/index.shtml
- http://www.dist-systems.bbn.com/papers/2005/NCA/index.shtml
- Other BBN papers
- Michael Atighetchi, Partha Pal, Franklin Webber, Richard Schantz, Christopher Jones, Joseph Loyall. Adaptive Cyberdefense for Survival and Intrusion Tolerance. IEEE Internet Computing, Vol. 8, No. 6, November/December 2004, pp. 25-33.
- http://www.dist-systems.bbn.com/papers/2006/SPE/index.shtml
- COCA
- http://www.cs.cornell.edu/home/ldzhou/coca.htm
- MAFTIA paper
- http://www.maftia.org/
- OASIS book
- Useful technologies
- ADF (3COM, Secure Computing, Adventium Labs)
- http://doi.ieeecomputersociety.org/10.1109/DSN.2006.17
- http://doi.ieeecomputersociety.org/10.1109/DISCEX.2001.932222
- SELinux, CSA
- http://www.nsa.gov/selinux/
- http://www.cisco.com/en/US/products/sw/secursw/ps5057/index.html
- EMERALD (SRI)
- http://www.csl.sri.com/projects/emerald/
- Routers, Managed switches (various vendors: Cisco, HP etc.)
- http://www.cisco.com/warp/public/707/21.html
- http://www.hp.com/rnd/index.htm
- Tripwire (Tripwire Inc), Veracity (Rocksoft)
- http://www.tripwire.com/index.cfm