Big Data Online Training - PowerPoint PPT Presentation


About This Presentation

Big Data Online Training


Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and management courses. We are dedicated to designing, developing and implementing training programs for students, corporate employees and business professionals.

Slides: 19
Provided by: Learntek
Transcript and Presenter's Notes

Title: Big Data Online Training


  • The following topics will be covered in our Online Training:

What is Hadoop?
  • Hadoop is a free, Java-based programming framework that supports the
    processing of large data sets in a distributed computing environment.
    It is part of the Apache project sponsored by the Apache Software
    Foundation. Hadoop makes it possible to run applications on systems
    with thousands of nodes and thousands of terabytes of storage
    capacity. Its distributed file system provides rapid data transfer
    rates among nodes and allows the system to continue operating
    uninterrupted if a node fails. This approach lowers the risk of
    catastrophic system failure, even if a significant number of nodes
    become inoperative.

Copyright @ 2015 Learntek. All Rights Reserved.
Why Hadoop?
  • Large volumes of data: the ability to store and process huge amounts
    of data of any variety (structured, unstructured and semi-structured),
    quickly. With data volumes and varieties constantly increasing,
    especially from social media and the Internet of Things (IoT), that's
    a key consideration.
  • Computing power: Hadoop's distributed computing model processes big
    data fast. The more computing nodes you use, the more processing
    power you have.
  • Fault tolerance: data and application processing are protected
    against hardware failure. If a node goes down, jobs are automatically
    redirected to other nodes so the distributed computation does not
    fail. Multiple copies of all data are stored automatically.
  • Flexibility: unlike a traditional relational database, you don't have
    to preprocess data before storing it. You can store as much data as
    you want and decide how to use it later. That includes unstructured
    data such as text, images and videos.
  • Low cost: the open-source framework is free and uses commodity
    hardware to store large quantities of data.
  • Scalability: you can easily grow your system to handle more data
    simply by adding nodes. Little administration is required.
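The fault-tolerance point above can be illustrated with a small simulation. This is plain Python, not Hadoop code; the node names are made up, and the replication factor of 3 mirrors the HDFS default:

```python
import random

def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (HDFS-style)."""
    placement = {}
    for block in blocks:
        placement[block] = random.sample(nodes, replication)
    return placement

def file_readable(placement, failed_nodes):
    """A file stays readable if every block has a surviving replica."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )

nodes = [f"node{i}" for i in range(10)]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)

# With three replicas per block, losing any single node loses no data.
assert file_readable(placement, failed_nodes={"node0"})
```

Because each block lives on three different nodes, any single-node failure leaves every block readable, which is exactly the guarantee described above.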

Big Data Hadoop Training: Hadoop Introduction
  • Introduction to Data and Systems
  • Types of Data
  • The traditional way of dealing with large data and its problems
  • Types of Systems
  • Scaling
  • What is Big Data?
  • Challenges in Big Data
  • Challenges in Traditional Applications
  • New Requirements
  • What is Hadoop?
  • Why Hadoop?
  • Brief history of Hadoop
  • Features of Hadoop
  • Hadoop and RDBMS
  • Hadoop Ecosystem overview
Hadoop Installation
  • Installation in detail
  • Creating an Ubuntu image in VMware
  • Downloading Hadoop
  • Installing SSH
  • Configuring Hadoop and HDFS
  • MapReduce: download, installation and configuration
  • Hive: download, installation and configuration
  • Pig: download, installation and configuration
  • Sqoop: download, installation and configuration
  • Configuring Hadoop in different modes
Hadoop Distributed File System (HDFS)
  • File system concepts: blocks, replication factor, version file, safe mode, namespace IDs
  • Purpose of the NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker
  • HDFS shell commands: copy, delete, create directories, etc.
  • Reading and writing in HDFS
  • Differences between Unix commands and HDFS commands
  • Hadoop admin commands
  • Hands-on exercise with Unix and HDFS commands
  • Read/write in HDFS: the internal process between Client, NameNode and DataNodes
  • Accessing HDFS using the Java API
  • Various ways of accessing HDFS
  • Understanding HDFS Java classes and methods
  • Admin: commissioning/decommissioning a DataNode, Balancer, replication policy, network distance/topology script
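As a quick illustration of the blocks topic above: HDFS stores a file as fixed-size blocks (128 MB by default in Hadoop 2.x; older releases used 64 MB), and only the last block may be smaller. A sketch of the arithmetic in plain Python:

```python
# Illustrative arithmetic only, not Hadoop code.
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
assert len(sizes) == 3
assert sizes[-1] == 44 * 1024 * 1024
```

Note that a small file does not waste a full block on disk: the final block only occupies its actual size, even though it counts as one block against the NameNode's metadata.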
MapReduce Programming
  • About MapReduce
  • Understanding blocks and input splits
  • MapReduce data types; understanding Writable
  • Data flow in a MapReduce application
  • Understanding a MapReduce problem on datasets
  • MapReduce and functional programming
  • Writing a MapReduce application: the Mapper function, the Reducer function and the Driver
  • Usage of the Combiner; understanding the Partitioner
  • Usage of the Distributed Cache
  • Passing parameters to the mapper and reducer
  • Analysing the results and log files
  • Input formats and output formats
  • Counters; skipping bad and unwanted records
  • Writing joins in MapReduce with two input files; join types
  • Executing a MapReduce job: insights
  • Exercises on MapReduce
  • Job scheduling and types of schedulers
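The mapper/reducer topics above can be sketched without a cluster. The following is a pure-Python imitation of the classic word-count data flow (map, shuffle, reduce); it uses no Hadoop APIs:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle phase: group all values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: sum the counts for one word."""
    return key, sum(values)

lines = ["Hadoop stores data", "Hadoop processes data"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())

assert result == {"hadoop": 2, "stores": 1, "data": 2, "processes": 1}
```

A Combiner would apply the same `reducer` logic on each mapper's local output before the shuffle, cutting the amount of data moved between nodes without changing the result.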
Hive
  • Hive concepts
  • Schema on Read vs Schema on Write
  • Hive architecture
  • Install and configure Hive on a cluster
  • Meta Store: purpose and types of configurations
  • Different types of tables in Hive
  • Buckets and partitions
  • Joins in Hive
  • Hive Query Language and Hive data types
  • Data loading into Hive tables
  • Hive query execution
  • Hive library functions and Hive UDFs
  • Hive limitations
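The Schema on Read vs Schema on Write topic is easy to illustrate: Hive applies the table schema when data is read, not when it is loaded. A toy sketch in plain Python (the name/age schema is made up for illustration):

```python
# Schema-on-read sketch: ingest stores raw text untouched; the schema
# (name: str, age: int) is applied only at query time, as Hive does.
raw_storage = []

def ingest(line):
    """Loading is just an append; nothing is parsed or validated here."""
    raw_storage.append(line)

def query_ages(min_age):
    """The schema is imposed here, at read time."""
    rows = []
    for line in raw_storage:
        name, age = line.split(",")
        if int(age) >= min_age:
            rows.append(name)
    return rows

ingest("alice,34")
ingest("bob,19")
assert query_ages(30) == ["alice"]
```

A schema-on-write system (a traditional RDBMS) would instead parse and validate each row inside `ingest`, rejecting malformed data up front; Hive's choice makes loading cheap and flexible at the cost of later query-time parsing.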
Pig
  • Pig basics
  • Install and configure Pig on a cluster
  • Pig library functions
  • Pig vs Hive
  • Write sample Pig Latin scripts
  • Modes of running Pig
  • Running in the Grunt shell
  • Running as a Java program
  • Pig UDFs
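A typical first Pig Latin script loads records, filters them, then groups and counts. The sketch below mirrors that flow in plain Python; the Pig statements shown as comments are hypothetical (field names and relation names are made up):

```python
from collections import Counter

# Sample web-log records: (method, path). Made up for illustration.
records = [
    ("GET", "/index"), ("POST", "/login"),
    ("GET", "/index"), ("GET", "/about"),
]

# logs = LOAD 'logs' AS (method:chararray, path:chararray);
# gets = FILTER logs BY method == 'GET';
gets = [r for r in records if r[0] == "GET"]

# grouped = GROUP gets BY path;
# counts  = FOREACH grouped GENERATE group, COUNT(gets);
counts = Counter(path for _, path in gets)

assert counts == {"/index": 2, "/about": 1}
```

Each Python step corresponds to one Pig relation, which is the mental model Pig encourages: a dataflow of named intermediate results rather than a single monolithic query.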

HBase
  • HBase concepts and architecture
  • Region server architecture
  • File storage architecture
  • HBase basics: column access, scans
  • HBase use cases
  • Install and configure HBase on a multi-node cluster
  • Create a database; develop and run sample applications
  • Access data stored in HBase using the Java API
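The column-access topics above rest on HBase's data model: a table maps a row key and a column (family:qualifier) to timestamped versions of a value. A toy model in plain Python, not the HBase API (table, row and column names are made up):

```python
# Toy sketch of HBase's data model: cell = (row key, "family:qualifier"),
# each cell holding a list of (timestamp, value) versions.
table = {}

def put(row, column, value, ts):
    """Append a new version of a cell, as an HBase Put does."""
    table.setdefault((row, column), []).append((ts, value))

def get(row, column):
    """Return the most recent version, the default behaviour of a Get."""
    versions = table.get((row, column), [])
    return max(versions)[1] if versions else None

put("user1", "info:email", "old@example.com", ts=1)
put("user1", "info:email", "new@example.com", ts=2)
assert get("user1", "info:email") == "new@example.com"
```

Keeping old versions around (rather than overwriting in place) is what lets HBase serve time-travel reads and makes its storage files append-only.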
Sqoop
  • Install and configure Sqoop on a cluster
  • Connecting to an RDBMS
  • Installing MySQL
  • Importing data from MySQL to Hive
  • Exporting data to MySQL
  • Internal mechanism of import/export
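Conceptually, a Sqoop import reads rows from an RDBMS table and writes them out as delimited text. The sketch below imitates that idea with Python's built-in sqlite3 standing in for MySQL; no Sqoop APIs are used, and the table is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

def import_table(conn, table):
    """Dump a table as comma-delimited lines, Sqoop's default file layout."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    return sorted(",".join(str(col) for col in row) for row in cursor)

assert import_table(conn, "employees") == ["1,alice", "2,bob"]
```

Real Sqoop parallelises this by splitting the table on a key column and running one such SELECT per mapper, which is the internal mechanism the last bullet above refers to.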

Oozie
  • Introduction to Oozie
  • Oozie architecture
  • XML file specifications
  • Specifying a workflow
  • Control nodes
  • The Oozie job coordinator
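A minimal workflow.xml sketch for the workflow-specification topic above. The action name and the `${jobTracker}`/`${nameNode}` properties below are placeholders; treat this as an illustration of the start/action/kill/end structure rather than a ready-to-run file:

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `start`, `kill` and `end` elements are the control nodes from the bullet list above; each action declares where control flows on success (`ok`) and on failure (`error`).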

Flume
  • Introduction to Flume
  • Configuration and setup
  • Flume sink with example
  • Channels
  • Flume source with example
  • Complex Flume architectures
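For the configuration topic above, a sketch of a minimal single-agent Flume configuration wiring a source to a sink through a channel. The netcat-to-logger pipeline is a common first example; the agent name `a1` and the port are arbitrary:

```properties
# One agent (a1) with one source, one channel and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines of text on a TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: log each event (useful for testing a new pipeline).
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

The complex architectures bullet above builds on exactly this pattern: fan-out is multiple channels per source, and multi-hop flows chain one agent's sink to the next agent's source.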

ZooKeeper
  • Introduction to ZooKeeper
  • Challenges in distributed applications
  • Coordination
  • ZooKeeper design goals
  • Data model and hierarchical namespace
  • Client APIs
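The hierarchical-namespace topic can be pictured as a small path-keyed store: znodes are addressed by slash-separated paths, each holding a small piece of data. The toy model below is illustrative only and is not the ZooKeeper client API (the `/app/lock-*` names are made up):

```python
# Toy znode tree: path -> data. The root "/" always exists.
znodes = {"/": b""}

def create(path, data=b""):
    """Create a znode; like ZooKeeper, the parent must already exist."""
    parent = path.rsplit("/", 1)[0] or "/"
    if parent not in znodes:
        raise KeyError(f"parent {parent} does not exist")
    znodes[path] = data

def get_children(path):
    """List the direct children of a znode, by name."""
    prefix = path.rstrip("/") + "/"
    names = []
    for p in znodes:
        rest = p[len(prefix):]
        if p.startswith(prefix) and rest and "/" not in rest:
            names.append(rest)
    return sorted(names)

create("/app")
create("/app/lock-0001")
create("/app/lock-0002")
assert get_children("/app") == ["lock-0001", "lock-0002"]
```

Sorted, sequentially numbered children like these underpin ZooKeeper's classic coordination recipes (e.g. distributed locks, leader election), which is why the namespace design matters for the coordination bullet above.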

Hadoop 2.0 and YARN
  • Hadoop 1.0 limitations
  • MapReduce limitations
  • History of Hadoop 2.0
  • HDFS 2 architecture
  • HDFS 2 quorum-based storage
  • HDFS 2 high availability
  • HDFS 2 federation
  • YARN architecture
  • Classic MapReduce vs YARN
  • YARN apps
  • YARN multi-tenancy
  • YARN Capacity Scheduler
Prerequisites
  • Knowledge of any programming language, databases and the Linux
    operating system. Core Java or Python knowledge is helpful.
