Big Data and Hadoop Components - PowerPoint PPT Presentation

About This Presentation
Title:

Big Data and Hadoop Components

Description:

Explore the various Hadoop components that constitute the overall Hadoop ecosystem and Hadoop architecture.

Transcript and Presenter's Notes


1
Hadoop Components Architecture Big Data
Hadoop Training
2
Understand how the Hadoop ecosystem works to
master Apache Hadoop skills and gain in-depth
knowledge of the big data ecosystem and Hadoop
architecture. Before you enroll in any big data
Hadoop training course, it helps to have a basic
idea of how the Hadoop ecosystem works. Learn
about the various Hadoop components that
constitute the Apache Hadoop architecture in
this presentation.
3
Defining Architecture Components of the Big Data
Ecosystem
4
(No Transcript)
5
Core Hadoop Components
  • 1) Hadoop Common
  • 2) Hadoop Distributed File System (HDFS)
  • 3) MapReduce - Distributed Data Processing
    Framework of Apache Hadoop
  • 4) YARN
  • Read More in Detail about Hadoop Components -
    https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
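
The MapReduce model listed among the core components can be pictured with a small in-memory sketch. This is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for illustration. It only shows the three conceptual steps: map emits key/value pairs, the framework groups them by key, and reduce combines each group.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input standing in for lines read from files in HDFS.
counts = word_count(["big data hadoop", "hadoop yarn", "Big Data"])
```

On a real cluster the map and reduce steps run in parallel on many machines and the input comes from HDFS; the sketch keeps everything in one process so the data flow is easy to follow.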

6
Data Access Components of Hadoop Ecosystem- Pig
and Hive
7
Apache Pig
  • Apache Pig is a tool developed at Yahoo for
    analysing huge data sets efficiently and
    easily. It provides a high-level data flow
    language, Pig Latin, that is optimized,
    extensible and easy to use.
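
The "data flow" style of Pig Latin, where each statement transforms the result of the previous one, can be sketched in plain Python. This is an illustration only and does not run Pig; the records and the variable names are hypothetical, and the Pig Latin lines in the comments are rough equivalents rather than a tested script.

```python
from collections import Counter

# Hypothetical input: (user, action) pairs standing in for a loaded log file.
records = [
    ("alice", "click"),
    ("bob", "view"),
    ("alice", "click"),
    ("carol", "click"),
    ("bob", "click"),
]

# Step 1, roughly: clicks = FILTER records BY action == 'click';
clicks = [record for record in records if record[1] == "click"]

# Step 2, roughly: grouped = GROUP clicks BY user;
#                  counts  = FOREACH grouped GENERATE group, COUNT(clicks);
clicks_per_user = Counter(user for user, _ in clicks)
```

Each step names an intermediate result, which is the property that makes a data flow language convenient for building up an analysis one transformation at a time.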

8
Apache Hive
  • Hive, developed by Facebook, is a data warehouse
    built on top of Hadoop. It provides a simple
    SQL-like language known as HiveQL for querying,
    data summarization and analysis. Hive versions
    before 3.0 could also speed up queries through
    indexing.
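
To give a feel for the kind of summarization query HiveQL is used for, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite is not Hive, and the table name, columns and rows are hypothetical; the point is only that a GROUP BY query of this shape reads much the same in SQL and in HiveQL.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("IN", 10), ("US", 7), ("IN", 5), ("US", 3), ("DE", 4)],
)

# Summarize the data: number of rows and total views per country.
summary = conn.execute(
    "SELECT country, COUNT(*) AS n, SUM(views) AS total "
    "FROM page_views GROUP BY country ORDER BY country"
).fetchall()
conn.close()
```

In Hive, a query like this would be compiled into distributed jobs over data stored in Hadoop rather than executed by a local database engine.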

9
Data Integration Components of Hadoop Ecosystem-
Sqoop and Flume
10
Apache Sqoop
  • The Sqoop component is used for importing data
    from external sources into related Hadoop
    components like HDFS, HBase or Hive. It can also
    be used for exporting data from Hadoop to other
    external structured data stores.

11
Flume
  • The Flume component is used to gather and
    aggregate large amounts of data. Apache Flume
    collects data from its origin and delivers it
    to its resting location (HDFS).

12
Data Storage Component of Hadoop Ecosystem- HBase
13
HBase
  • HBase is a column-oriented NoSQL database that
    uses HDFS for underlying storage of data. HBase
    supports random reads and writes as well as
    batch computations using MapReduce. With HBase,
    enterprises can create large tables with
    millions of rows and columns on commodity
    hardware.
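
The column-oriented data model described above can be pictured as a map from row key to named columns. The following is a toy in-memory sketch, not the HBase client API; the class name TinyColumnStore, its methods and the sample rows are assumptions made for illustration. It shows random reads by row key, sparse rows, and a scan over a sorted key range.

```python
class TinyColumnStore:
    """Toy in-memory model: row key -> 'family:qualifier' -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        """Write one cell; rows and columns are created on first use."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        """Random read: return one cell, or the whole row if no column is given."""
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start, stop):
        """Yield rows whose keys fall in [start, stop), in sorted key order."""
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, dict(self._rows[key])


# Hypothetical rows; note that user#002 simply has no 'stats:logins' cell.
table = TinyColumnStore()
table.put("user#001", "info:name", "Asha")
table.put("user#001", "stats:logins", 12)
table.put("user#002", "info:name", "Ben")
```

Rows do not need to share the same columns, which is why very wide tables with many sparsely populated columns are practical in this model.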

14
Monitoring, Management and Orchestration
Components of Hadoop Ecosystem- Oozie and
Zookeeper
15
Oozie
  • Oozie is a workflow scheduler in which workflows
    are expressed as Directed Acyclic Graphs. Oozie
    runs in a Java servlet container (Tomcat) and
    uses a database to store all running workflow
    instances, their states and variables, along
    with the workflow definitions, in order to
    manage Hadoop jobs (MapReduce, Sqoop, Pig and
    Hive).
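
Expressing a workflow as a Directed Acyclic Graph means each action runs only after the actions it depends on have finished. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to derive one valid run order. It is not Oozie's workflow format or API; the action names and the run_order helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_report": {"pig_clean"},
    "mapreduce_index": {"pig_clean"},
    "notify": {"hive_report", "mapreduce_index"},
}


def run_order(dag):
    """Return one valid execution order.

    Raises graphlib.CycleError if the graph contains a cycle,
    that is, if it is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
```

Here hive_report and mapreduce_index do not depend on each other, so a scheduler would be free to run them in parallel once pig_clean completes.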

16
Zookeeper
  • Zookeeper is the coordination service of the
    ecosystem and provides simple, fast, reliable
    and ordered operational services for a Hadoop
    cluster. Zookeeper is responsible for the
    synchronization service, the distributed
    configuration service and for providing a
    naming registry for distributed systems.
  • To Know about other Hadoop Components -
    https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
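
The naming-registry role attributed to Zookeeper above can be pictured as a small tree of paths, each holding a piece of data. The sketch below is a toy in-memory model and not the Zookeeper client API; the class TinyRegistry, its methods, and the paths and addresses are hypothetical. It leaves out everything that makes the real service useful across machines, such as replication, ordering guarantees and change notifications.

```python
class TinyRegistry:
    """Toy in-memory tree of path -> data, loosely modelled on a path namespace."""

    def __init__(self):
        self._nodes = {"/": b""}

    def create(self, path, data=b""):
        """Create a node; its parent must already exist."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise ValueError(f"{path} already exists")
        self._nodes[path] = data

    def get(self, path):
        """Return the data stored at a path."""
        return self._nodes[path]

    def children(self, path):
        """List the names of the direct children of a path, sorted."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


# Hypothetical service registrations.
registry = TinyRegistry()
registry.create("/services")
registry.create("/services/worker-1", b"10.0.0.5:9000")
registry.create("/services/worker-2", b"10.0.0.6:9000")
```

A client that wants to find the available workers would list the children of /services and read the address stored under each one.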