The Best MapReduce Online Training with Job Assist - PowerPoint PPT Presentation

About This Presentation
Title:

The Best MapReduce Online Training with Job Assist

Description:

Mindmajix MapReduce Training helps you learn an implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The framework takes care of scheduling tasks, monitoring them and re-executing any failed tasks.


Transcript and Presenter's Notes


1
Learn Overview of Hadoop MapReduce
https://mindmajix.com/
2
(No Transcript)
3
Overview of Hadoop MapReduce
  • MapReduce is a software framework for easily
    writing applications that process vast amounts
    of data (multi-terabyte data sets) in parallel
    on large clusters of commodity hardware in a
    reliable, fault-tolerant manner. (Learn about
    the basics of MapReduce in the Mapreduce
    Tutorial column.)
  • A MapReduce job usually splits the input data
    set into independent chunks, which are processed
    by the map tasks in a completely parallel manner.
  • The framework sorts the outputs of the maps,
    which then act as inputs to the reduce tasks.
  • Typically, both the input and the output of the
    job are stored in a file system, and the
    framework takes care of scheduling tasks,
    monitoring them and re-executing any failed
    tasks.
  • The MapReduce framework consists of a single
    master JobTracker and one slave TaskTracker per
    cluster node.
  • The master is responsible for scheduling the
    job's component tasks on the slaves, monitoring
    them and re-executing any failed tasks; the
    slaves execute the tasks as directed by the
    master.
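
The split, map, sort and reduce flow described on this slide can be illustrated with a small single-process sketch. It is plain Python, not the Hadoop API; the word-count task and the names map_task, reduce_task and run_job are assumptions chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter

def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def reduce_task(key, values):
    """Reduce: sum all counts that share the same key."""
    return key, sum(values)

def run_job(lines, chunk_size=2):
    # Split the input data set into independent chunks.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Each chunk goes to its own map task (run one after another here,
    # in parallel on a real cluster).
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    # Sort the map output by key so that equal keys become adjacent.
    mapped.sort(key=itemgetter(0))
    # Each group of equal keys is the input of one reduce call.
    return [reduce_task(key, [count for _, count in group])
            for key, group in groupby(mapped, key=itemgetter(0))]

result = run_job(["big data", "big cluster", "data data"])
```

On a real cluster the scheduling, monitoring and re-execution of these tasks is done by the framework rather than by a loop like this one.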

4
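What Are the Different Types of Input Formats?
  • 1. Text input format (TextInputFormat)
  • 2. Key value text input format
    (KeyValueTextInputFormat)
  • 3. NLine input format (NLineInputFormat)
  • 4. Sequence file input format
    (SequenceFileInputFormat)
  • 5. Sequence file as text input format
    (SequenceFileAsTextInputFormat)
  • 6. Sequence file as binary input format
    (SequenceFileAsBinaryInputFormat)
  • 7. DB input format (DBInputFormat)
  • 8. Multiple inputs (MultipleInputs)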

5
What Are the Different Types of Output Formats?
  • 1. Text output format (TextOutputFormat)
  • 2. Sequence file output format
    (SequenceFileOutputFormat)
  • 3. Sequence file as binary output format
    (SequenceFileAsBinaryOutputFormat)

6
  • 1. Text input format
  • This is the default input format; each record is
    a line of input.
  • The key for each record is the byte offset of
    the beginning of the line, and the value is the
    contents of the line.
  • The matching output format is the Text output
    format, whose final output is a key-value pair
    delimited by a tab.
  • 2. Key value text input format
  • If the output of the Text output format is fed
    as input to another job, we can specify this
    input format for that job, since each line is
    already a tab-delimited key-value pair.
  • We can override the default delimiter using a
    configuration property.
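
How the two formats turn a file into records can be sketched as follows. This is a plain-Python imitation, not Hadoop code; the function names text_records and key_value_records and the sample data are assumptions made for the example.

```python
def text_records(data: bytes):
    """Yield (byte_offset, line) pairs, in the style of the Text input format."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The key is the offset at which the line starts; the value is
        # the line without its terminator.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

def key_value_records(data: bytes, separator="\t"):
    """Yield (key, value) pairs by splitting each line at the first separator."""
    for _, line in text_records(data):
        key, _, value = line.partition(separator)
        yield key, value

data = b"apple\t3\nbanana\t7\n"
offsets = list(text_records(data))
pairs = list(key_value_records(data))
```

Passing a different separator argument plays the role of overriding the default delimiter.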

7
  • 3. NLine input format
  • This is used if we want each mapper to receive a
    fixed number of lines as input.
  • The mapreduce.input.lineinputformat.linespermap
    property is set to the value N.
  • It otherwise works like the Text input format;
    the difference is in the number of lines per
    input split.
  • MapReduce has support for binary formats as
    well.
  • 4. Sequence file input format
  • This file format stores sequences of binary
    key-value pairs.
  • Sequence files are well suited as a format for
    MapReduce data since they are splittable.
  • To use sequence file data as input to the
    mapper, we need to specify the sequence file
    input format as the input format.
  • We also need to declare the key and value data
    types to match the sequence file's key and value
    types.
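
The effect of a lines-per-map setting can be pictured with a short plain-Python sketch. It only models the grouping of lines into splits; n_line_splits and the sample records are hypothetical names, not part of Hadoop.

```python
def n_line_splits(lines, lines_per_map):
    """Group input lines into splits of at most `lines_per_map` lines each."""
    if lines_per_map < 1:
        raise ValueError("lines_per_map must be at least 1")
    return [lines[i:i + lines_per_map]
            for i in range(0, len(lines), lines_per_map)]

lines = [f"record-{i}" for i in range(7)]
# With N = 3, seven lines are handed to three mappers: 3 + 3 + 1.
splits = n_line_splits(lines, lines_per_map=3)
```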

8
  • 5. Sequence file as text input format
  • This is like the sequence file input format, but
    it converts the sequence file's keys and values
    to Text objects. The conversion is performed by
    calling toString() on the keys and values.
  • 6. Sequence file as binary input format
  • This is like the sequence file input format, but
    it retrieves the sequence file's keys and values
    as opaque binary objects, encapsulated as
    BytesWritable objects.
  • 7. DB input format
  • It is used when reading data from a relational
    database using JDBC.
  • 8. Multiple inputs
  • This technique is used when we need to process
    data that could be in the same format or in
    different formats but may have different
    representations.
  • While using this, we need to specify a mapper
    and an input format for each input path.
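
The idea behind multiple inputs, one mapper per input path with all mappers emitting the same kind of key-value pair, can be sketched in plain Python. The paths, the record layouts and the names csv_mapper and tsv_mapper are all hypothetical.

```python
def csv_mapper(line):
    """Hypothetical mapper for comma-separated 'id,amount' lines."""
    key, amount = line.split(",")
    return key, int(amount)

def tsv_mapper(line):
    """Hypothetical mapper for tab-separated 'amount<TAB>id' lines."""
    amount, key = line.split("\t")
    return key, int(amount)

# Each input path is paired with its own mapper and its own sample data.
inputs = {
    "sales/store_a.csv": (csv_mapper, ["u1,10", "u2,5"]),
    "sales/store_b.tsv": (tsv_mapper, ["7\tu1"]),
}

# Both mappers emit (id, amount), so one reduce step can combine them.
totals = {}
for path, (mapper, lines) in inputs.items():
    for line in lines:
        key, amount = mapper(line)
        totals[key] = totals.get(key, 0) + amount
```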

9
  • 1. Text output format
  • This is the default output format; the key and
    value are separated by a tab delimiter. The
    delimiter can be changed using a configuration
    property.
  • We can suppress the key or value from the output
    using the NullWritable type.
  • 2. Sequence file output format
  • It writes sequence files for its output.
  • 3. Sequence file as binary output format
  • It writes keys and values in raw binary format
    into a sequence file container.
  • The output format classes generate a set of
    files as output, one file for each reducer, and
    their names are part-r-00000, part-r-00001, etc.
  • If we want to write multiple output files for
    each reducer, then we use the MultipleOutputs
    class. This will generate one output file for
    each key in the reducer, and the name can be
    prefixed with the key.
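
The file naming and the tab-delimited record layout can be mimicked in a few lines of plain Python. This is an illustration only; part_file_name and format_record are hypothetical helpers, and None stands in for a suppressed key or value.

```python
def part_file_name(reducer_index):
    """Name of the output file written by one reduce task."""
    return f"part-r-{reducer_index:05d}"

def format_record(key, value, separator="\t"):
    """Render one output line; a None key or value is left out entirely."""
    if key is None:
        return str(value)
    if value is None:
        return str(key)
    return f"{key}{separator}{value}"

names = [part_file_name(i) for i in range(3)]
line = format_record("data", 3)
value_only = format_record(None, 3)
```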

10
Data Types and Custom Writable
  • Hadoop comes with a large selection of Writable
    classes in the org.apache.hadoop.io package.
  • Hadoop provides Writable wrappers for all the
    Java primitive types except char, and each has a
    get() and set() method for retrieving and
    storing the data.
    1. ByteWritable: byte; BooleanWritable: boolean.
    2. ShortWritable: short; IntWritable and
       VIntWritable: int.
    3. FloatWritable: float; DoubleWritable: double.
    4. LongWritable and VLongWritable: long.
  • When we have numerics, we can select either
    fixed length (IntWritable and LongWritable) or
    variable length (VIntWritable and
    VLongWritable).
  • Text: It is equivalent to String in Java.
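
The trade-off between fixed-length and variable-length numerics can be seen by comparing encoded sizes. The sketch below is plain Python; its variable-length scheme is loosely modelled on a length-prefixed encoding, and the exact byte layout should be treated as an assumption rather than Hadoop's wire format.

```python
import struct

def fixed_int(value):
    """Fixed length: always 4 bytes, big-endian, like a Java int."""
    return struct.pack(">i", value)

def variable_int(value):
    """Variable length: one byte for small values, marker byte plus payload otherwise."""
    if -112 <= value <= 127:
        return struct.pack(">b", value)
    negative = value < 0
    # One's complement keeps the payload non-negative for negative inputs.
    magnitude = ~value if negative else value
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    # The marker byte records the sign and how many payload bytes follow.
    marker = (-120 if negative else -112) - len(payload)
    return struct.pack(">b", marker) + payload

sizes = {n: (len(fixed_int(n)), len(variable_int(n)))
         for n in (5, 1000, 2**31 - 1)}
```

Small values save space with the variable-length form, while values near the top of the int range cost one byte more than the fixed-length form.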

11
  • NullWritable: It is a special type of Writable;
    no bytes are written to or read from the stream.
  • In MapReduce, a key or value can be declared as
    NullWritable when we don't want to use it in the
    final output.
  • It is an immutable singleton, and the instance
    can be retrieved by calling NullWritable.get().
  • This will store an empty value in the output.
  • Writable collections: There are six Writable
    collection types in Hadoop: ArrayWritable,
    TwoDArrayWritable, ArrayPrimitiveWritable,
    MapWritable, SortedMapWritable and
    EnumSetWritable.
  • ArrayPrimitiveWritable is a wrapper for arrays
    of Java primitives.
  • ArrayWritable and TwoDArrayWritable are Writable
    implementations for arrays and two-dimensional
    arrays (array of arrays) of Writable instances.
  • All the elements of an ArrayWritable or
    TwoDArrayWritable must be instances of the same
    class.
  • Custom Writable: Instead of using the existing
    Writable classes, if we want to implement our
    own, we can develop a custom Writable by
    implementing the WritableComparable interface.
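
What a custom writable key has to provide (serialise its fields, read them back in the same order, and define an ordering for the sort phase) can be sketched in plain Python. The PageVisit class and its fields are hypothetical, and the method names write and read_fields only echo the role of the Java interface methods.

```python
import io
import struct
from functools import total_ordering

@total_ordering
class PageVisit:
    """Hypothetical composite key: a page name plus a visit count."""

    def __init__(self, page="", visits=0):
        self.page = page
        self.visits = visits

    def write(self, out):
        """Serialise the fields to a binary stream in a fixed order."""
        encoded = self.page.encode("utf-8")
        out.write(struct.pack(">i", len(encoded)))
        out.write(encoded)
        out.write(struct.pack(">i", self.visits))

    def read_fields(self, src):
        """Restore the fields in the same order they were written."""
        (length,) = struct.unpack(">i", src.read(4))
        self.page = src.read(length).decode("utf-8")
        (self.visits,) = struct.unpack(">i", src.read(4))

    def _sort_key(self):
        return (self.page, self.visits)

    def __eq__(self, other):
        return self._sort_key() == other._sort_key()

    def __lt__(self, other):
        # Ordering used when keys are sorted: by page, then by visits.
        return self._sort_key() < other._sort_key()

    def __hash__(self):
        return hash(self._sort_key())

# Round trip: write a key to a buffer and read it back into a new object.
buffer = io.BytesIO()
PageVisit("home", 42).write(buffer)
buffer.seek(0)
restored = PageVisit()
restored.read_fields(buffer)
```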

12
  • Boost your career opportunities by enrolling in
    the Mindmajix Technologies free live Hadoop
    Administration demo.
  • Contact Details
  • Mindmajix Technologies
  • INDIA 91 924 633 3245
  • USA - 1 201 3780 518, 1 972-427-3027
  • Email info_at_mindmajix.com
  • Official Website https://mindmajix.com/
  • Learn how to use Hadoop MapReduce, from beginner
    basics to advanced techniques
  • Hadoop Tutorial
  • Hadoop Interview Questions
  • MapReduce Tutorial
  • MapReduce Interview Questions
