Hadoop online trainings - PowerPoint PPT Presentation

Slides: 38
Provided by: srikanthhadoop


1
  • Assured Cloud Computing for Assured Information
    Sharing

Presented By
2
Outline
  • Objectives
  • Assured Information Sharing
  • Layered Framework for a Secure Cloud
  • Cloud-based Assured Information Sharing
  • Cloud-based Secure Social Networking
  • Other Topics
  • Secure Hybrid Cloud
  • Cloud Monitoring
  • Cloud for Malware Detection
  • Cloud for Secure Big Data
  • Education
  • Directions
  • Related Books

www.kellytechno.com
3
Team Members
  • Sponsor: Air Force Office of Scientific Research
  • The University of Texas at Dallas
  • Dr. Murat Kantarcioglu, Dr. Latifur Khan, Dr.
    Kevin Hamlen, Dr. Zhiqiang Lin, Dr. Kamil Sarac
  • Sub-contractors
  • Prof. Elisa Bertino (Purdue)
  • Ms. Anita Miller, Late Dr. Bob Johnson (North
    Texas Fusion Center)
  • Collaborators
  • Late Dr. Steve Barker, Dr. Maribel Fernandez,
    King's College, University of London (EOARD)
  • Dr. Barbara Carminati, Dr. Elena Ferrari,
    University of Insubria (EOARD)

4
Objectives
  • Cloud computing is an example of computing in
    which dynamically scalable and often virtualized
    resources are provided as a service over the
    Internet. Users need not have knowledge of,
    expertise in, or control over the technology
    infrastructure in the "cloud" that supports them.
  • Our research on cloud computing is based on
    Hadoop, MapReduce and Xen
  • Apache Hadoop is a Java software framework that
    supports data-intensive distributed applications
    under a free license. It enables applications to
    work with thousands of nodes and petabytes of
    data. Hadoop was inspired by Google's MapReduce
    and Google File System (GFS) papers.
  • Xen is a virtual machine monitor (VMM) developed
    at the University of Cambridge, England
  • Our goal is to build a secure cloud
    infrastructure for assured information sharing
    and related applications
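To make the MapReduce model that Hadoop implements concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases, using word count as the example. It does not use the Hadoop API; the function names are hypothetical, and the in-memory grouping stands in for the framework's distributed shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["secure cloud", "assured cloud computing"]
    print(word_count(docs))  # {'secure': 1, 'cloud': 2, 'assured': 1, 'computing': 1}
```

In Hadoop the map and reduce calls would run on different nodes and the shuffle would move data over the network; the logic per key is the same.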

5
Information Operations Across Infospheres
Assured Information Sharing
  • Objectives
  • Develop a Framework for Secure and Timely Data
    Sharing across Infospheres
  • Investigate Access Control and Usage Control
    policies for Secure Data Sharing
  • Develop innovative techniques for extracting
    information from trustworthy, semi-trustworthy
    and untrustworthy partners

[Figure: Agencies A, B and C each publish their component data/policy into the shared data/policy for the coalition]
  • Scientific/Technical Approach
  • Conduct experiments as to how much information is
    lost as a result of enforcing security policies
    in the case of trustworthy partners
  • Develop more sophisticated policies based on
    role-based and usage control based access
    control models
  • Develop techniques based on game theoretical
    strategies to handle partners who are
    semi-trustworthy
  • Develop data mining techniques to carry out
    defensive and offensive information operations
  • Accomplishments
  • Developed an experimental system for
    determining information loss due to security
    policy enforcement
  • Developed a strategy for applying game theory to
    semi-trustworthy partners, with simulation results
  • Developed data mining techniques for conducting
    defensive operations for untrustworthy partners
  • Challenges
  • Handling dynamically changing trust levels
  • Scalability
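A minimal sketch of one way information loss from policy enforcement could be measured, assuming records are attribute dictionaries and a policy is simply the set of attributes a partner may see. Both the policy form and the loss metric (fraction of attribute values withheld) are hypothetical stand-ins, not the experimental system described above.

```python
def enforce_policy(records: list[dict], allowed: set[str]) -> list[dict]:
    """Release only the attributes a partner is allowed to see (hypothetical policy form)."""
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


def information_loss(original: list[dict], released: list[dict]) -> float:
    """Fraction of attribute values withheld by policy enforcement (hypothetical metric)."""
    total = sum(len(r) for r in original)
    kept = sum(len(r) for r in released)
    return 0.0 if total == 0 else 1 - kept / total


if __name__ == "__main__":
    claims = [
        {"claim_id": 1, "diagnosis": "J45", "zip": "75080"},
        {"claim_id": 2, "diagnosis": "E11", "zip": "75081"},
    ]
    released = enforce_policy(claims, {"claim_id", "diagnosis"})
    print(round(information_loss(claims, released), 3))  # 0.333
```

A richer experiment would compare the output of the data mining step (for example, the patterns found) with and without the policy, rather than counting raw attribute values.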

6
Our Approach
  • Policy-based Information Sharing
  • Integrate the Medicaid claims data and mine the
    data
  • Enforce policies and determine how much
    information has been lost (Trustworthy partners)
  • Application of Semantic web technologies
  • Apply game theory and probing to extract
    information from semi-trustworthy partners
  • Conduct Active Defence and determine the actions
    of an untrustworthy partner
  • Defend ourselves from our partners using data
    analytics techniques
  • Conduct active defence: find out what our
    partners are doing by monitoring them, so that we
    can defend ourselves in dynamic situations
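Game-theoretic handling of semi-trustworthy partners can be illustrated with a repeated sharing game. The sketch below swaps in the textbook iterated prisoner's dilemma with a tit-for-tat strategy; the payoff values and strategy names are assumptions chosen for illustration, not the strategies developed in this work.

```python
# Payoffs (row player, column player) for one round of a hypothetical sharing game:
# mutual sharing helps both; a partner that withholds while the other shares free-rides.
PAYOFF = {
    ("share", "share"): (3, 3),
    ("share", "withhold"): (0, 5),
    ("withhold", "share"): (5, 0),
    ("withhold", "withhold"): (1, 1),
}


def tit_for_tat(partner_history):
    """Share first, then copy whatever the partner did in the previous round."""
    return "share" if not partner_history else partner_history[-1]


def always_withhold(partner_history):
    """A partner that never shares."""
    return "withhold"


def play(strategy_a, strategy_b, rounds=10):
    """Run the repeated game and return the total payoff of each player."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # (30, 30)
    print(play(tit_for_tat, always_withhold))  # (9, 14)
```

Tit-for-tat limits how much a non-sharing partner can exploit us to a single round, which is the intuition behind reacting to a partner's observed behaviour.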

7
Policy Enforcement Prototype
[Figure: coalition policy enforcement prototype]
8
Layered Framework for Assured Cloud Computing
9
Secure Query Processing with Hadoop/MapReduce
  • We have studied clouds based on Hadoop
  • Query rewriting and optimization techniques
    designed and implemented for two types of data
  • (i) Relational data: secure query processing with
    Hive
  • (ii) RDF data: secure query processing with
    SPARQL
  • Demonstrated with XACML policies
  • Joint demonstration with King's College and
    University of Insubria
  • First demo (2011): each party submits its data
    and policies
  • Our cloud will manage the data and policies
  • Second demo (2012): multiple clouds
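The following is a simplified, hypothetical sketch of XACML-style decision logic for a query against a table: rules carry a target (role, resource, action) and an effect, and matching rules are combined with a deny-overrides algorithm. Real XACML also has Indeterminate results, obligations and richer targets, which are omitted here; the `Rule` class and `evaluate` function are illustrative names, not an XACML library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str    # "Permit" or "Deny"
    role: str      # subject role the rule targets; "*" matches any role
    resource: str  # table or view name; "*" matches any resource
    action: str    # e.g. "select"

    def matches(self, role: str, resource: str, action: str) -> bool:
        return (self.role in ("*", role)
                and self.resource in ("*", resource)
                and self.action == action)


def evaluate(rules, role, resource, action):
    """Simplified deny-overrides: any matching Deny wins, then any matching Permit."""
    effects = {r.effect for r in rules if r.matches(role, resource, action)}
    if "Deny" in effects:
        return "Deny"
    if "Permit" in effects:
        return "Permit"
    return "NotApplicable"


if __name__ == "__main__":
    policy = [
        Rule("Permit", "analyst", "claims", "select"),
        Rule("Deny", "*", "claims_ssn", "select"),
    ]
    print(evaluate(policy, "analyst", "claims", "select"))      # Permit
    print(evaluate(policy, "analyst", "claims_ssn", "select"))  # Deny
    print(evaluate(policy, "guest", "claims", "select"))        # NotApplicable
```

In a query-rewriting setting, the engine would run such a check for every table a query touches before rewriting or rejecting it.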

10
Fine-grained Access Control with Hive
System Architecture
  • Table/View definition and loading
  • Users can create tables as well as load data into
    tables. Further, they can also upload XACML
    policies for the tables they are creating.
  • Users can also create XACML policies for
    tables/views.
  • Users can define views only if they have
    permissions for all tables specified in the query
    used to create the view. They can also either
    specify or create XACML policies for the views
    they are defining.
  • CollaborateCom 2010
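The view-definition rule above reduces to a simple check: every base table referenced by the view's query must be accessible to the user. A minimal sketch, assuming table names have already been extracted from the HiveQL query and permissions are kept as a hypothetical table-to-users map:

```python
def can_define_view(user: str, tables_in_query: set[str],
                    permissions: dict[str, set[str]]) -> bool:
    """Allow the view only if the user holds permission on every base table it references."""
    return all(user in permissions.get(table, set()) for table in tables_in_query)


if __name__ == "__main__":
    permissions = {"claims": {"alice", "bob"}, "patients": {"alice"}}
    view_tables = {"claims", "patients"}  # tables referenced by the view's query
    print(can_define_view("alice", view_tables, permissions))  # True
    print(can_define_view("bob", view_tables, permissions))    # False
```

Unknown tables default to an empty permission set, so a view over a table with no recorded permissions is refused.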

11
SPARQL Query Optimizer for Secure RDF Data
Processing
  • Build an efficient storage mechanism using
    Hadoop for large amounts of data (e.g., a billion
    triples)
  • Build an efficient query mechanism for data
    stored in Hadoop
  • Integrate with Jena
  • Developed a query optimizer and query rewriting
    techniques for RDF data with XACML policies,
    implemented on top of Jena
  • IEEE Transactions on Knowledge and Data
    Engineering, 2011
[Figure: a web interface sends queries and new data to the server backend, which returns answers]
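A common way to store very large RDF datasets in HDFS is to split triples into files by predicate, further splitting rdf:type triples by their object, so that a triple pattern only reads the files it needs. The sketch below assumes that layout and uses an in-memory dictionary in place of HDFS files; it illustrates the idea and is not necessarily the exact storage scheme of the cited work.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def file_for(predicate, obj):
    """One file per predicate; rdf:type triples are split further by their object."""
    return f"type_{obj}" if predicate == RDF_TYPE else predicate


def partition(triples):
    """Group triples into files; the predicate is implied by the file name, so keep (s, o)."""
    files = defaultdict(list)
    for s, p, o in triples:
        files[file_for(p, o)].append((s, o))
    return dict(files)


def match(files, predicate, obj=None):
    """Answer the pattern (?s, predicate, obj) by reading only the relevant file(s)."""
    if predicate == RDF_TYPE and obj is None:
        keys = [k for k in files if k.startswith("type_")]
    else:
        keys = [file_for(predicate, obj)]
    rows = [row for k in keys for row in files.get(k, [])]
    return [(s, o) for s, o in rows if obj is None or o == obj]


if __name__ == "__main__":
    triples = [
        ("ex:alice", "rdf:type", "ex:Student"),
        ("ex:bob", "rdf:type", "ex:Professor"),
        ("ex:alice", "ex:advisor", "ex:bob"),
    ]
    files = partition(triples)
    print(sorted(files))                            # ['ex:advisor', 'type_ex:Professor', 'type_ex:Student']
    print(match(files, "rdf:type", "ex:Student"))  # [('ex:alice', 'ex:Student')]
```

Because each file holds one predicate, a query optimizer can estimate join costs from file sizes and choose which patterns to join first in each MapReduce job.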
12
Demonstration Concept of Operation
[Figure: agencies 1..n access a user interface layer; relational data is handled by fine-grained access control with Hive, and RDF data by the SPARQL query optimizer for secure RDF data processing]
13
RDF-Based Policy Engine
[Figure: layered stack, top to bottom: interface to the Semantic Web; inference engine/rules processor (e.g., Pellet); policies, ontologies and rules in RDF; Jena RDF engine; RDF documents. Technology by UT Dallas.]
14
RDF-based Policy Engine on the Cloud
  • Determine how access is granted to a resource as
    well as how a document is shared
  • Users specify policies, e.g., access control,
    redaction and release policies
  • Parse a high-level policy to a low-level
    representation
  • Support graph operations and visualization;
    policies are executed as graph operations
  • Execute policies as SPARQL queries over large RDF
    graphs on Hadoop
  • Support for policies over traditional data and
    its provenance
  • IFIP Data and Applications Security, 2010, ACM
    SACMAT 2011

A testbed for evaluating different policy sets
over different data representations. It also
supports provenance as a directed graph and
viewing policy outcomes graphically.
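Executing a policy as a graph query means expressing it as a set of triple patterns and checking whether the graph contains a match. The sketch below is a naive basic-graph-pattern matcher in plain Python standing in for SPARQL on Jena/Hadoop; the predicates `memberOf` and `ownedBy` and the policy itself are hypothetical.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None if it cannot."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None
            new[p_term] = t_term
        elif p_term != t_term:
            return None
    return new


def query(graph, patterns, binding=None):
    """All variable bindings satisfying every triple pattern (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            results.extend(query(graph, rest, extended))
    return results


# Hypothetical policy: ?user may read ?doc if the user belongs to the organization owning it.
READ_POLICY = [("?user", "memberOf", "?org"), ("?doc", "ownedBy", "?org")]


def can_read(graph, user, doc):
    return bool(query(graph, READ_POLICY, {"?user": user, "?doc": doc}))


if __name__ == "__main__":
    graph = {
        ("alice", "memberOf", "agency1"),
        ("bob", "memberOf", "agency2"),
        ("report1", "ownedBy", "agency1"),
    }
    print(can_read(graph, "alice", "report1"))  # True
    print(can_read(graph, "bob", "report1"))    # False
```

On a large graph, the same pattern would be evaluated as a series of MapReduce joins rather than this nested loop.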
15
Integration with Assured Information Sharing
[Figure: agencies 1..n send SPARQL queries and RDF data and policies through the user interface layer; a policy translation and transformation layer and an RDF data preprocessor feed a MapReduce framework for query processing over Hadoop HDFS, which returns the result]
16
Architecture
17
Policy Reciprocity
  • Agency 1 wishes to share its resources if Agency
    2 also shares its resources with it
  • Use our combined policies
  • Allow agents to define policies based on
    reciprocity and mutual interest amongst
    cooperating agencies
  • SPARQL query:
      SELECT B
      FROM NAMED uri1
      FROM NAMED uri2
      WHERE P
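Reciprocity can be read as a condition evaluated over both agencies' policy graphs together, much as the query above names both graphs. A minimal sketch, assuming each graph is a set of hypothetical (owner, resource, grantee) sharing declarations:

```python
def reciprocal_grant(graph_a, graph_b, owner, requester, resource):
    """Release resource only if owner offers it to requester AND requester shares back.

    Evaluating over the union of both graphs mirrors a query that names both agencies' graphs.
    """
    combined = graph_a | graph_b
    offered = (owner, resource, requester) in combined
    reciprocated = any(o == requester and g == owner for o, _, g in combined)
    return offered and reciprocated


if __name__ == "__main__":
    agency1 = {("agency1", "r1", "agency2")}
    agency2 = {("agency2", "r2", "agency1")}
    print(reciprocal_grant(agency1, agency2, "agency1", "agency2", "r1"))  # True
    print(reciprocal_grant(agency1, set(), "agency1", "agency2", "r1"))    # False
```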

18
Develop and Scale Policies
  • Agency 1 wishes to extend its existing policies
    with support for constructing policies at a finer
    granularity.
  • The policy engine provides a policy interface
    that must be implemented by all policies
  • Newer types of policies can be added as needed
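A plug-in policy interface could look like the following sketch: the engine only knows the abstract `Policy` type, so a finer-grained policy type can be added without changing the engine. All class names, the request format, and the deny-by-default rule are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class Policy(ABC):
    """Interface every policy implements so the engine can treat them uniformly."""

    @abstractmethod
    def applies_to(self, request: dict) -> bool: ...

    @abstractmethod
    def permits(self, request: dict) -> bool: ...


class AccessControlPolicy(Policy):
    def __init__(self, resource: str, allowed: set[str]):
        self.resource, self.allowed = resource, set(allowed)

    def applies_to(self, request):
        return request["resource"] == self.resource

    def permits(self, request):
        return request["requester"] in self.allowed


class TimeLimitedPolicy(Policy):
    """A finer-grained policy type added later without touching the engine."""

    def __init__(self, resource: str, expires_at: int):
        self.resource, self.expires_at = resource, expires_at

    def applies_to(self, request):
        return request["resource"] == self.resource

    def permits(self, request):
        return request["time"] < self.expires_at


class PolicyEngine:
    def __init__(self):
        self.policies: list[Policy] = []

    def register(self, policy: Policy) -> None:
        self.policies.append(policy)

    def is_permitted(self, request: dict) -> bool:
        """Deny by default; otherwise every applicable policy must permit."""
        applicable = [p for p in self.policies if p.applies_to(request)]
        return bool(applicable) and all(p.permits(request) for p in applicable)


if __name__ == "__main__":
    engine = PolicyEngine()
    engine.register(AccessControlPolicy("r1", {"agency2"}))
    engine.register(TimeLimitedPolicy("r1", expires_at=100))
    print(engine.is_permitted({"resource": "r1", "requester": "agency2", "time": 50}))   # True
    print(engine.is_permitted({"resource": "r1", "requester": "agency2", "time": 150}))  # False
```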

19
Justification of Resources
  • Agency 1 asks Agency 2 for a justification of
    resource R2
  • Policy engine
  • Allows agents to define policies over provenance
  • Agency 2 can provide the provenance to Agency 1
  • But protect it by using access control or
    redaction policies
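Sharing provenance while protecting parts of it can be sketched as a traversal of the provenance graph followed by a redaction step. The sketch assumes provenance is a map from each resource to the sources it was derived from; the node names and the redaction rule (mask sensitive nodes rather than drop them) are hypothetical.

```python
from collections import deque


def lineage(prov, resource):
    """All ancestors of resource, breadth-first, in a provenance graph {node: [sources]}."""
    seen, order = set(), []
    queue = deque(prov.get(resource, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(prov.get(node, []))
    return order


def justify(prov, resource, redacted):
    """Provenance released to a partner: sensitive nodes are masked, not dropped."""
    return ["[REDACTED]" if n in redacted else n for n in lineage(prov, resource)]


if __name__ == "__main__":
    prov = {"R2": ["report_q3", "sensor_feed"], "report_q3": ["informant_x"]}
    print(justify(prov, "R2", {"informant_x"}))  # ['report_q3', 'sensor_feed', '[REDACTED]']
```

Masking keeps the shape of the justification visible to Agency 1 while hiding the identity of the protected source.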

20
Other Example Policies
  • Agency 1 shares a resource with Agency 2 provided
    Agency 2 does not share with Agency 3
  • Agency 1 shares a resource with Agency 2
    depending on the content of the resource or until
    a certain time
  • Agency 1 shares a resource R with Agency 2
    provided Agency 2 does not infer sensitive data S
    from R (inference problem)
  • Agency 1 shares a resource with Agency 2 provided
    Agency 2 shares the resource only with those in
    its organizational (or social) network
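Conditions such as "no onward sharing with Agency 3" and "until a certain time" can be written as predicates that must all hold before a resource is released. A minimal sketch; the (sharer, resource, recipient) record format, the agency names and the composing function are hypothetical:

```python
from datetime import datetime, timezone


def no_onward_sharing(shares, recipient, excluded, resource):
    """Holds if recipient has not passed resource on to the excluded agency."""
    return (recipient, resource, excluded) not in shares


def before_deadline(now, deadline):
    """Holds only until a certain time."""
    return now < deadline


def agency1_releases(resource, shares, now, deadline):
    """Agency 1 shares resource with Agency 2 provided Agency 2 does not share it
    with Agency 3, and only until the deadline (hypothetical composition)."""
    return (no_onward_sharing(shares, "agency2", "agency3", resource)
            and before_deadline(now, deadline))


if __name__ == "__main__":
    deadline = datetime(2014, 1, 1, tzinfo=timezone.utc)
    now = datetime(2013, 6, 1, tzinfo=timezone.utc)
    print(agency1_releases("R", set(), now, deadline))                         # True
    print(agency1_releases("R", {("agency2", "R", "agency3")}, now, deadline))  # False
```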

21
Analyzing and Securing Social Networks in the
Cloud
  • Analytics
  • Location Mining from Online Social Networks
  • Predicting Threats from Social Network Data
  • Sentiment Analysis
  • Cloud Platform for implementation
  • Security and Privacy
  • Preventing the Inference of Private Attributes
    (liberal or conservative, gay or straight)
  • Access Control in Social Networks
  • Cloud Platform for implementation
22
Security Policies for On-Line Social Networks
(OSN)
  • Security policies are expressed in SWRL (Semantic
    Web Rule Language); examples:
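SWRL rules have the form body -> head over ontology facts. A sketch of how one such rule might be applied by forward chaining until no new facts appear: the rule `friendOf(?owner, ?req) ^ owns(?owner, ?photo) -> canView(?req, ?photo)` is a hypothetical example, not one of the policies from this work, and a real deployment would use a SWRL-capable reasoner such as Pellet instead.

```python
def apply_rule(facts):
    """Hypothetical SWRL-style rule:
    friendOf(?owner, ?req) ^ owns(?owner, ?photo) -> canView(?req, ?photo)
    Facts are (predicate, subject, object) triples; returns the facts the rule derives."""
    derived = set()
    for pred, owner, req in facts:
        if pred != "friendOf":
            continue
        for pred2, owner2, photo in facts:
            if pred2 == "owns" and owner2 == owner:
                derived.add(("canView", req, photo))
    return derived


def forward_chain(facts):
    """Apply the rule repeatedly until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = apply_rule(facts) - facts
        if not new:
            return facts
        facts |= new


if __name__ == "__main__":
    facts = {("friendOf", "alice", "bob"), ("owns", "alice", "photo1")}
    print(("canView", "bob", "photo1") in forward_chain(facts))  # True
```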

23
Security Policy Enforcement
  • A reference monitor evaluates the requests.
  • Admin requests for access control can be
    evaluated by rule rewriting
  • Example: assume Bob submits the following admin
    request
  • It is rewritten as the following rule
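A reference monitor mediates every request. The sketch below pairs that with one possible reading of rule rewriting for admin requests: an admin request is accepted only from the resource owner and is then turned into an authorization rule scoped to that resource. The class, method names and acceptance condition are assumptions for illustration.

```python
class ReferenceMonitor:
    """Every access goes through check(); owners always have access to their resources."""

    def __init__(self, owners):
        self.owners = dict(owners)   # resource -> owner
        self.rules = []              # callables (requester, resource) -> bool

    def admin_request(self, admin, resource, condition):
        """Hypothetical rewriting: accept the request only from the resource's owner,
        then store it as an authorization rule limited to that resource."""
        if self.owners.get(resource) != admin:
            return False
        self.rules.append(lambda req, res: res == resource and condition(req))
        return True

    def check(self, requester, resource):
        if self.owners.get(resource) == requester:
            return True
        return any(rule(requester, resource) for rule in self.rules)


if __name__ == "__main__":
    friends = {"bob": {"carol"}}
    monitor = ReferenceMonitor({"photo1": "bob"})
    print(monitor.admin_request("eve", "photo1", lambda r: True))                 # False
    print(monitor.admin_request("bob", "photo1", lambda r: r in friends["bob"]))  # True
    print(monitor.check("carol", "photo1"), monitor.check("dave", "photo1"))      # True False
```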

24
Framework Architecture
25
Secure Social Networking in the Cloud with
Twitter-Storm
[Figure: social networks 1..N feed a user interface layer; relational data is handled by fine-grained access control with Hive, and RDF data by the SPARQL query optimizer for secure RDF data processing]
26
Secure Storage and Query Processing in a Hybrid
Cloud
  • The use of hybrid clouds is an emerging trend in
    cloud computing
  • Ability to exploit public resources for high
    throughput
  • Yet, better able to control costs and data
    privacy
  • Several key challenges
  • Data design: how to store data in a hybrid cloud?
  • Solution must account for data representation
    used (unencrypted/encrypted), public cloud
    monetary costs and query workload characteristics
  • Query processing: how to execute a query over a
    hybrid cloud?
  • Solution must provide query rewrite rules that
    ensure the correctness of a generated query plan
    over the hybrid cloud
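As an illustration of the data-design question, here is a hypothetical greedy placement: sensitive partitions stay in the private cloud, leftover private capacity holds the most frequently queried non-sensitive partitions, and the rest go to the public cloud. It ignores encrypted representations and real monetary cost models, both of which a full solution must account for; the `Partition` fields and the `route` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    size_gb: float
    sensitive: bool
    queries_per_day: int


def place(partitions, private_capacity_gb):
    """Greedy placement: sensitive partitions stay private, leftover private space goes to
    the most-queried non-sensitive partitions, and the rest go to the public cloud."""
    placement, used = {}, 0.0
    for p in partitions:
        if p.sensitive:
            placement[p.name] = "private"
            used += p.size_gb
    if used > private_capacity_gb:
        raise ValueError("sensitive data alone exceeds private capacity")
    others = sorted((p for p in partitions if not p.sensitive),
                    key=lambda p: p.queries_per_day, reverse=True)
    for p in others:
        if used + p.size_gb <= private_capacity_gb:
            placement[p.name] = "private"
            used += p.size_gb
        else:
            placement[p.name] = "public"
    return placement


def route(query_partitions, placement):
    """The set of clouds a query must touch under a given placement."""
    return {placement[name] for name in query_partitions}


if __name__ == "__main__":
    parts = [
        Partition("patients", 40, True, 10),
        Partition("logs", 500, False, 5),
        Partition("catalog", 20, False, 100),
    ]
    plan = place(parts, private_capacity_gb=100)
    print(plan)  # {'patients': 'private', 'catalog': 'private', 'logs': 'public'}
    print(sorted(route({"patients", "logs"}, plan)))  # ['private', 'public']
```

A query whose partitions span both clouds is where the query rewrite rules mentioned above become necessary.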

27
Hypervisor integrity and forensics in the Cloud
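[Figure: layered architecture with applications on top of guest operating systems (Linux, Solaris, XP, MacOS), a virtualization layer (Xen, vSphere) containing the hypervisor, where cloud integrity and forensics checks are applied, and the hardware layer]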
  • Secure control flow of hypervisor code
  • Integrity via in-lined reference monitor
  • Forensics data extraction in the cloud
  • Multiple VMs
  • De-mapping (isolate) each VM memory from physical
    memory
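An in-lined reference monitor enforces control-flow integrity by inserting a check before each indirect transfer so that only statically approved targets can be reached. The Python sketch below only models that check at the level of function calls; real IRMs rewrite the hypervisor's machine code, and the handler names here are hypothetical.

```python
class ControlFlowViolation(Exception):
    """Raised when an indirect transfer targets an unapproved destination."""


def guarded_dispatch(allowed_targets):
    """Return a dispatcher that checks each indirect call target before jumping,
    modelling the check an in-lined reference monitor inserts."""
    allowed = frozenset(allowed_targets)

    def dispatch(target, *args):
        if target not in allowed:
            name = getattr(target, "__name__", target)
            raise ControlFlowViolation(f"illegal indirect call to {name!r}")
        return target(*args)

    return dispatch


# Hypothetical hypercall handlers standing in for legitimate hypervisor entry points.
def handle_console_write(msg):
    return f"console: {msg}"


def handle_timer(ticks):
    return ticks + 1


def injected_payload():
    return "attacker code"


if __name__ == "__main__":
    dispatch = guarded_dispatch([handle_console_write, handle_timer])
    print(dispatch(handle_timer, 41))  # 42
    try:
        dispatch(injected_payload)
    except ControlFlowViolation as err:
        print("blocked:", err)
```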

28
Cloud-based Malware Detection
29
Cloud-based Malware Detection
  • Binary feature extraction involves
  • Enumerating binary n-grams from the binaries and
    selecting the best n-grams based on information
    gain
  • For a training set of 3,500 executables, the
    number of distinct 6-grams can exceed 200
    million
  • On a single machine, this may take hours,
    depending on the available computing resources,
    which is not acceptable for training from a
    stream of binaries
  • We use the cloud to overcome this bottleneck
  • A cloud MapReduce framework is used to extract
    and select features from each chunk
  • A 10-node cloud cluster is 10 times faster than a
    single node
  • Very effective in a dynamic framework, where
    malware characteristics change rapidly
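The feature-selection step can be sketched in a single process: enumerate the distinct byte n-grams of each binary, then score each n-gram by information gain with respect to the malware/benign label and keep the top k. In the MapReduce version, the per-binary enumeration and the per-n-gram counting would be split across mappers and reducers; the function names and the toy data are assumptions.

```python
import math
from collections import Counter


def ngrams(data: bytes, n: int) -> set[bytes]:
    """Distinct byte n-grams of one binary (presence only, not frequency)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos malware and neg benign samples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def select_ngrams(samples, n, k):
    """samples: list of (binary_bytes, is_malware). Returns the k n-grams with the
    highest information gain about the label."""
    pos_total = sum(1 for _, label in samples if label)
    neg_total = len(samples) - pos_total
    base = entropy(pos_total, neg_total)
    with_pos, with_neg = Counter(), Counter()
    for data, label in samples:
        for gram in ngrams(data, n):
            (with_pos if label else with_neg)[gram] += 1
    scores = {}
    for gram in set(with_pos) | set(with_neg):
        pp, pn = with_pos[gram], with_neg[gram]   # samples containing gram
        ap, an = pos_total - pp, neg_total - pn   # samples lacking gram
        conditional = ((pp + pn) * entropy(pp, pn) + (ap + an) * entropy(ap, an)) / len(samples)
        scores[gram] = base - conditional
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    training = [
        (b"\x90\x90\xcc", True), (b"\x90\x90\x00", True),
        (b"\x00\x01\x02", False), (b"\x01\x02\x03", False),
    ]
    print(sorted(select_ngrams(training, n=2, k=2)))  # [b'\x01\x02', b'\x90\x90']
```

With 200 million candidate 6-grams, the counting loop is the expensive part, which is why distributing it across a cluster pays off.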

30
Identity Management Considerations in a Cloud
  • Trust model that handles
  • (i) Various trust relationships, (ii) access
    control policies based on roles and attributes,
    (iii) real-time provisioning, (iv) authorization,
    and (v) auditing and accountability.
  • Several technologies are being examined to
    develop the trust model
  • Service-oriented technology standards such as
    SAML and XACML, and identity management
    technologies such as OpenID.
  • Does one size fit all?
  • Can we develop a trust model that will be
    applicable to all types of clouds, such as private
    clouds, public clouds and hybrid clouds?
  • Identity architecture has to be integrated into
    the cloud architecture.
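A trust model that combines roles, attributes, authorization and auditing can be sketched as a single decision function that also writes an audit record. This is a minimal illustration under assumed data structures; it is not SAML, XACML or OpenID, which would supply the identities and attributes in practice.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    name: str
    roles: set[str]
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Permission:
    action: str
    resource: str
    required_role: str
    required_attributes: dict[str, str] = field(default_factory=dict)


def authorize(subject, action, resource, permissions, audit_log):
    """Grant if some permission matches the action, the resource, a role the subject
    holds and every required attribute; each decision is logged for accountability."""
    granted = any(
        p.action == action
        and p.resource == resource
        and p.required_role in subject.roles
        and all(subject.attributes.get(k) == v for k, v in p.required_attributes.items())
        for p in permissions
    )
    audit_log.append((subject.name, action, resource, granted))
    return granted


if __name__ == "__main__":
    perms = [Permission("read", "claims_db", "analyst", {"clearance": "secret"})]
    alice = Subject("alice", {"analyst"}, {"clearance": "secret"})
    bob = Subject("bob", {"analyst"}, {"clearance": "public"})
    log = []
    print(authorize(alice, "read", "claims_db", perms, log))  # True
    print(authorize(bob, "read", "claims_db", perms, log))    # False
    print(len(log))                                          # 2
```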

31
Big Data and the Cloud
  • Big Data describes large and complex data that
    cannot be managed by traditional data management
    tools
  • From petabytes to exabytes to zettabytes of data
  • Need tools for capture, storage, search, sharing,
    analysis and visualization of big data
  • Examples include
  • Web logs, RFID and surveillance data, sensor
    networks, social network data (graphs), text and
    multimedia, data pertaining to astronomy,
    atmospheric science, genomics, biogeochemical and
    biological fields, and video archives
  • Big Data Technologies
  • Hadoop/MapReduce Platform, Hive Platform, Twitter
    Storm Platform, Google App Engine, Amazon EC2
    Cloud, offerings from Oracle and IBM for Big Data
    management; others: Cassandra, Mahout, Pig Latin,
    ...
  • Cloud computing is emerging as a critical tool
    for Big Data management
  • It is critical to maintain security and privacy
    for Big Data

32
Security and Privacy for Big Data
  • Secure Storage and Infrastructure
  • How can technologies such as Hadoop and MapReduce
    be secured?
  • Secure Data Management
  • Techniques for Secure Query Processing
  • Examples: securing Hive and Cassandra
  • Big Data for Security
  • Analysis of Security Data (e.g., Malware
    analysis)
  • Regulations, Compliance and Governance
  • What are the regulations for storing, retaining,
    managing, transferring and analyzing Big Data?
  • Are corporations in compliance with the
    regulations?
  • The privacy of individuals has to be maintained
    not just for raw data but also for data
    integration and analytics
  • Roles and responsibilities must be clearly
    defined

33
Security and Privacy for Big Data
  • Regulations Stifling Innovation?
  • A major concern is that too many regulations will
    stifle innovation
  • Corporations must take advantage of Big Data
    technologies to improve business
  • But this could infringe on individual privacy
  • Regulations may also interfere with privacy, for
    example by requiring data to be retained
  • Challenge: how can one carry out analytics and
    still maintain privacy?
  • National Science Foundation workshop planned for
    Spring 2014 at the University of Texas at Dallas

34
Education on Secure Cloud Computing and Related
Technologies
  • Secure Cloud Computing
  • NSF Capacity Building Grant on Assured Cloud
    Computing
  • Introduce cloud computing into several cyber
    security courses
  • Completed courses
  • Data and Applications Security, Data Storage,
    Digital Forensics, Secure Web Services
  • Computer and Information Security
  • Capstone Course
  • One course that covers all aspects of assured
    cloud computing
  • Week-long course to be given at Texas Southern
    University
  • Analyzing and Securing Social Networks
  • Big Data Analytics and Security

35
Directions
  • Secure VMM and VNM
  • Designing Secure XEN VMM
  • Developing automated techniques for VMM
    introspection
  • Determine a secure network infrastructure for the
    cloud
  • Integrate Secure Storage Algorithms into Hadoop
  • Identity Management in the Cloud
  • Secure cloud-based Big Data Management/Social
    Networking

36
Related Books
  • Developing and Securing the Cloud, CRC Press
    (Taylor and Francis), November 2013
    (Thuraisingham)
  • Secure Data Provenance and Inference Control with
    Semantic Web, CRC Press, 2014, in print
    (Cadenhead, Kantarcioglu, Khadilkar,
    Thuraisingham)
  • Analyzing and Securing Social Media, CRC Press,
    2014, in preparation (Abrol, Heatherly,
    Khan, Kantarcioglu, Khadilkar, Thuraisingham)

37
Thank you