Title: Assured Cloud Computing for Assured Information Sharing
1. Assured Cloud Computing for Assured Information Sharing
- Dr. Bhavani Thuraisingham, The University of Texas at Dallas (UTD)
- November 2013
2. Team Members
- Sponsor: Air Force Office of Scientific Research
- The University of Texas at Dallas
- Dr. Murat Kantarcioglu, Dr. Latifur Khan, Dr. Kevin Hamlen, Dr. Zhiqiang Lin, Dr. Kamil Sarac
- Sub-contractors
- Prof. Elisa Bertino (Purdue)
- Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)
- Collaborators
- Late Dr. Steve Barker, King's College, University of London (EOARD)
- Dr. Barbara Carminati, Dr. Elena Ferrari, University of Insubria (EOARD)
3. Outline
- Objectives
- Assured Information Sharing
- Layered Framework
- Our Research
- Education
- Acknowledgements
- Research funded by the Air Force Office of Scientific Research
- Education funded by the National Science Foundation
4. Objectives
- Cloud computing is an example of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
- Our research on cloud computing is based on Hadoop, MapReduce, and Xen.
- Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
- Xen is a virtual machine monitor developed at the University of Cambridge, England.
- Our goal is to build a secure cloud infrastructure for assured information sharing applications.
5. Information Operations Across Infospheres: Assured Information Sharing
- Objectives
- Develop a framework for secure and timely data sharing across infospheres
- Investigate access control and usage control policies for secure data sharing
- Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners
- Budget: FY06-08, AFOSR 300K; state match 150K
[Diagram: each agency component (Agency A, Agency B, Agency C) publishes its data/policy to the coalition data/policy store]
- Scientific/Technical Approach
- Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
- Develop more sophisticated policies based on role-based and usage-control-based access control models
- Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
- Develop data mining techniques to carry out defensive and offensive information operations
- Accomplishments
- Developed an experimental system for determining information loss due to security policy enforcement
- Developed a strategy for applying game theory to semi-trustworthy partners, with simulation results
- Developed data mining techniques for conducting defensive operations against untrustworthy partners
- Challenges
- Handling dynamically changing trust levels
- Scalability
6. Architecture (2005-2008)
[Diagram: agency components (Agency A, Agency B, Agency C) export their data/policy to the coalition data/policy store; partners range from trustworthy to semi-trustworthy to untrustworthy]
7. Our Approach
- Integrate the Medicaid claims data and mine the data; next, enforce policies and determine how much information has been lost (trustworthy partners); prototype system; application of semantic web technologies
- Apply game theory and probing to extract information from semi-trustworthy partners
- Conduct active defence and determine the actions of an untrustworthy partner; defend ourselves from our partners using data mining techniques
- Conduct active defence: find out what our partners are doing by monitoring them, so that we can defend ourselves in dynamic situations
- Trust for peer-to-peer networks (infrastructure security)
8. Policy Enforcement Prototype (Dr. Mamoun Awad, postdoc, and students)
[Diagram: coalition policy enforcement prototype]
9. Game Theory for Assured Information Sharing
- Game theory studies such interactions through mathematical representations of gain
- Each party is considered a player
- The information they gain from each other is considered a payoff
- The scenario is considered a finite repeated game
- Information is exchanged in discrete chunks each round
- The situation terminates at a finite yet unforeseeable point in the future
- Actions within the game are to either lie or tell the truth
- Our goal: all players draw the conclusion that telling the truth is the best option (see the sketch below)
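A minimal simulation sketch of this repeated game; the payoff numbers are purely illustrative assumptions, not values from the project. Each party either tells the truth or lies each round, and a tit-for-tat strategy illustrates how sustained truth-telling can emerge as the best option.

    public class SharingGame {
        enum Action { TRUTH, LIE }

        // Payoff to the first player given both actions; numbers are illustrative only.
        static int payoff(Action mine, Action theirs) {
            if (mine == Action.TRUTH && theirs == Action.TRUTH) return 3; // both gain useful information
            if (mine == Action.LIE && theirs == Action.TRUTH) return 5;   // free-ride on an honest partner
            if (mine == Action.TRUTH && theirs == Action.LIE) return 0;   // misled by a lying partner
            return 1;                                                     // mutual lying: little is learned
        }

        public static void main(String[] args) {
            // Tit-for-tat: start truthful, then mirror the partner's previous action.
            Action lastA = Action.TRUTH;
            Action lastB = Action.TRUTH;
            int totalA = 0;
            int totalB = 0;
            for (int round = 0; round < 100; round++) {
                Action a = lastB; // A mirrors B's last action
                Action b = lastA; // B mirrors A's last action
                totalA += payoff(a, b);
                totalB += payoff(b, a);
                lastA = a;
                lastB = b;
            }
            System.out.println("A: " + totalA + ", B: " + totalB); // A: 300, B: 300
        }
    }

Under these assumed payoffs a one-shot lie pays once but is punished in every later round, so mutual truth-telling yields the highest sustained total.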
10. Incentive Issues in Assured Information Sharing (DoD MURI Project 2008-2013, AFOSR)
- Motivation
- Misaligned incentives can be a significant problem in information security
- Software bugs vs. software companies' incentives
- Incentive issues in information sharing have been explored to some extent
- Incentive issues in file-sharing p2p networks
- Assured information sharing creates new challenges
- Security considerations vs. utility
- Technical Approach
- Verify that the other participants do not lie about their data
- If the data is revealed as-is: trust but verify (our initial results in a DKE 2008 paper)
- If the data is not revealed (e.g., SMC techniques are used): non-cooperative computing, mechanism design, SMC with rational adversaries
11. Layered Framework for Assured Cloud Computing
12. Secure Query Processing with Hadoop/MapReduce
- We have studied clouds based on Hadoop
- Query rewriting and optimization techniques designed and implemented for two types of data:
- (i) Relational data: secure query processing with Hive
- (ii) RDF data: secure query processing with SPARQL
- Demonstrated with XACML policies
- Joint demonstration with King's College and the University of Insubria
- First demo (2011): each party submits their data and policies; our cloud manages the data and policies
- Second demo (2012): multiple clouds
13. Fine-Grained Access Control with Hive
- System architecture
- Table/view definition and loading
- Users can create tables as well as load data into tables. Further, they can also upload XACML policies for the tables they are creating (see the sketch below).
- Users can also create XACML policies for tables/views.
- Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.
- CollaborateCom 2010
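A minimal sketch of the enforcement idea on this slide, assuming a hypothetical PolicyDecisionPoint in place of the real XACML evaluation; the connection URL, credentials and table name are placeholders, while the HiveServer2 JDBC driver itself is standard. The point is simply that the policy decision happens before the HiveQL statement is submitted.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class GuardedHiveQuery {

        // Hypothetical stand-in for an XACML policy decision point.
        interface PolicyDecisionPoint {
            boolean permits(String user, String action, String table);
        }

        public static void main(String[] args) throws Exception {
            // Stub decision logic; a real deployment would evaluate the uploaded XACML policies.
            PolicyDecisionPoint pdp = (user, action, table) ->
                    "analyst".equals(user) && "SELECT".equals(action);

            String user = "analyst";
            String table = "claims"; // placeholder table name
            if (!pdp.permits(user, "SELECT", table)) {
                throw new SecurityException("policy denies " + user + " SELECT on " + table);
            }

            Class.forName("org.apache.hive.jdbc.HiveDriver"); // standard Hive JDBC driver
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", user, "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }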
14. SPARQL Query Optimizer for Secure RDF Data Processing
- Build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples)
- Build an efficient query mechanism for data stored in Hadoop
- Integrate with Jena (see the sketch below)
- Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena
- IEEE Transactions on Knowledge and Data Engineering, 2011
[Diagram: web interface accepting new data and queries and returning answers from the server backend]
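A minimal Jena sketch of the query path on this slide, assuming the policy-driven rewriting has already produced the final SPARQL string; the data file and predicate URI are placeholders.

    import org.apache.jena.query.Query;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class JenaSparqlDemo {
        public static void main(String[] args) {
            // Load an RDF graph (placeholder file; in our setting the triples
            // would come from the Hadoop-backed store).
            Model model = ModelFactory.createDefaultModel();
            model.read("data.rdf");

            // The (already policy-rewritten) SPARQL query.
            String sparql = "SELECT ?s ?o WHERE { ?s <http://example.org/sharesWith> ?o }";
            Query query = QueryFactory.create(sparql);
            try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("s") + " -> " + row.get("o"));
                }
            }
        }
    }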
15. Demonstration: Concept of Operation
[Diagram: agencies 1..n connect through a user interface layer; relational data goes to the fine-grained access control with Hive component, RDF data to the SPARQL query optimizer for secure RDF data processing]
16. RDF-Based Policy Engine
[Diagram: interface to the semantic web; UTDallas technology layers an inference engine/rules processor (e.g., Pellet) over policies, ontologies and rules in RDF, the Jena RDF engine, and RDF documents]
17. RDF-Based Policy Engine on the Cloud
- Determine how access is granted to a resource as well as how a document is shared
- Users specify policies, e.g., access control, redaction, release policies
- Parse a high-level policy to a low-level representation
- Support graph operations and visualization; policies executed as graph operations
- Execute policies as SPARQL queries over large RDF graphs on Hadoop (see the sketch below)
- Support for policies over traditional data and its provenance
- IFIP Data and Applications Security 2010; ACM SACMAT 2011
- A testbed for evaluating different policy sets over different data representations, also supporting provenance as a directed graph and viewing policy outcomes graphically
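A minimal sketch of "policy executed as a SPARQL query": a redaction policy expressed as a CONSTRUCT query that copies only triples about resources not marked secret. The ex: vocabulary is an assumption for illustration, not the engine's actual schema.

    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.rdf.model.Model;

    public class RedactionPolicy {
        // Applies the redaction policy to 'model' and returns the redacted copy.
        static Model redact(Model model) {
            String policy =
                "PREFIX ex: <http://example.org/> " +
                "CONSTRUCT { ?doc ?p ?o } " +
                "WHERE { ?doc ?p ?o . " +
                "        FILTER NOT EXISTS { ?doc ex:classification ex:Secret } }";
            return QueryExecutionFactory
                    .create(QueryFactory.create(policy), model)
                    .execConstruct();
        }
    }

Because the policy is itself a query, the same engine that answers SPARQL over the Hadoop-resident graphs can enforce it, which is what makes graph-based execution and visualization of policy outcomes possible.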
18. Integration with Assured Information Sharing
[Diagram: agencies 1..n submit SPARQL queries and RDF data and policies through a user interface layer; a policy translation and transformation layer and an RDF data preprocessor feed a MapReduce framework for query processing over Hadoop HDFS, which returns the result]
19. Architecture
20. Key Feature 1: Policy Reciprocity
- Agency 1 wishes to share its resources if Agency 2 also shares its resources with it
- Use our combined policies
- Allow agents to define policies based on reciprocity and mutual interest amongst cooperating agencies
- Schematic SPARQL query (B stands for the requested variables, P for the combined reciprocity pattern):

    SELECT ?B
    FROM NAMED <uri1>
    FROM NAMED <uri2>
    WHERE { P }
21. Key Feature 2: Develop and Scale Policies
- Agency 1 wishes to extend its existing policies with support for constructing policies at a finer granularity
- The policy engine provides:
- A policy interface that should be implemented by all policies (see the sketch below)
- The ability to add newer types of policies as needed
22. Key Feature 3: Justification of Resources
- Agency 1 asks Agency 2 for a justification of resource R2
- The policy engine allows agents to define policies over provenance
- Agency 2 can provide the provenance to Agency 1, but protect it by using access control or redaction policies
23. Key Feature 4: Development Testbed
- The policy framework provides three configurations:
- A standalone version for development and testing
- A version backed by a relational database
- A cloud-based version, which achieves high availability and scalability while maintaining low setup and operation costs
24. Secure Storage and Query Processing in a Hybrid Cloud
- The use of hybrid clouds is an emerging trend in cloud computing
- Ability to exploit public resources for high throughput, yet better able to control costs and data privacy
- Several key challenges
- Data design: how to store data in a hybrid cloud? The solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs, and query workload characteristics (see the sketch after this list)
- Query processing: how to execute a query over a hybrid cloud? The solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud
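A minimal sketch of the data-design decision; the decision rule is entirely assumed, not the project's actual cost model: sensitive partitions stay on the private cloud (or would be encrypted before going public), and the rest are placed publicly only when monetary cost and workload characteristics justify it.

    public class HybridPlacement {
        enum Site { PRIVATE, PUBLIC }

        // Illustrative inputs: sensitivity of the partition, how often the
        // workload touches it, and the public cloud's per-query monetary cost.
        static Site place(boolean sensitive, double queryFrequency,
                          double publicCostPerQuery, double budgetPerQuery) {
            if (sensitive) {
                return Site.PRIVATE; // or store encrypted on the public side
            }
            // Send hot, cheap-to-host partitions public for throughput.
            return (publicCostPerQuery <= budgetPerQuery && queryFrequency > 0.5)
                    ? Site.PUBLIC : Site.PRIVATE;
        }

        public static void main(String[] args) {
            System.out.println(place(false, 0.9, 0.001, 0.01)); // PUBLIC
            System.out.println(place(true, 0.9, 0.001, 0.01));  // PRIVATE
        }
    }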
25. Hypervisor Integrity and Forensics in the Cloud
[Diagram: guest applications and operating systems (Linux, Solaris, XP, MacOS) run on a virtualization layer (Xen, vSphere) above the hardware layer; integrity and forensics concerns arise at both the hypervisor and cloud levels]
- Secure control flow of hypervisor code
- Integrity via in-lined reference monitor
- Forensics data extraction in the cloud
- Multiple VMs: de-mapping (isolating) each VM's memory from physical memory
26. Cloud-Based Malware Detection (Dr. Mehedy)
27. Cloud-Based Malware Detection
- ACM Transactions on Management Information Systems
- Binary feature extraction involves enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
- For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million
- On a single machine this may take hours, depending on available computing resources; not acceptable for training from a stream of binaries
- We use the cloud to overcome this bottleneck
- A cloud MapReduce framework is used to extract and select features from each chunk (see the sketch below)
- A 10-node cloud cluster is 10 times faster than a single node
- Very effective in a dynamic framework, where malware characteristics change rapidly
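A minimal Hadoop MapReduce sketch of the extraction step, assuming each input line carries one executable hex-encoded (a preprocessing choice made here for illustration, not necessarily the paper's): the mapper emits each distinct 6-gram once per binary, and the reducer counts how many binaries contain each 6-gram, the raw statistic from which information gain can be computed in a later selection pass.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class NgramExtraction {

        // Assumes each input line holds one executable, hex-encoded (2 chars per byte).
        public static class NgramMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final int N = 6; // 6-grams, as in the slide
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String hex = value.toString();
                Set<String> seen = new HashSet<>(); // emit each n-gram once per binary
                for (int i = 0; i + 2 * N <= hex.length(); i += 2) {
                    seen.add(hex.substring(i, i + 2 * N));
                }
                for (String gram : seen) {
                    context.write(new Text(gram), ONE);
                }
            }
        }

        // Counts how many binaries contain each n-gram; information-gain based
        // selection then keeps only the most discriminative n-grams.
        public static class DocFreqReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int count = 0;
                for (IntWritable v : values) {
                    count += v.get();
                }
                context.write(key, new IntWritable(count));
            }
        }
    }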
28. Identity Management Considerations in a Cloud
- A trust model that handles (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability
- Several technologies have to be examined to develop the trust model
- Service-oriented standards such as SAML and XACML, and identity management technologies such as OpenID
- Does one size fit all? Can we develop a trust model that will be applicable to all types of clouds, such as private clouds, public clouds and hybrid clouds?
- The identity architecture has to be integrated into the cloud architecture
29. Education
- NSF Capacity Building Grant on Assured Cloud Computing
- Introduce cloud computing into several cyber security courses
- Completed courses:
- Data and Applications Security
- Data Storage
- Digital Forensics
- Secure Web Services
- Computer and Information Security
- Capstone course: one course that covers all aspects of assured cloud computing
- Week-long course to be given at Texas Southern University
30. Directions
- Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)
- Exploring the Xen VMM and examining security issues
- Developing automated techniques for VMM introspection
- Examine VMM issues
- Integrate secure storage algorithms into Hadoop
- Identity management