Apache Trafodian - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Trafodian

Description:

This presentation gives an overview of the Apache Trafodian project. It explains Trafodian architecture in relation to Hadoop/HBase and its process structure, and gives links for further information and for connecting with the author.

Slides: 11
Provided by: semtechs


Transcript and Presenter's Notes



1
What Is Apache Tez?
  • An application framework
  • Built on top of Apache Hadoop YARN
  • Uses directed acyclic graphs (DAGs)
  • Open source / Apache 2.0 license
  • Scalable
  • Performant

2
Hadoop Ecosystem
3
Tez DAG
  • Tez directed acyclic graphs (DAGs)
  • Distributed data processing
  • Vertices represent data transformations
  • Edges represent data movement
  • For data processing applications
  • Tez is an execution engine
  • Built on top of YARN
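The vertex/edge model above can be sketched as a small in-memory graph. This is a plain-Python toy, not the Tez API (Tez itself is Java, built around classes such as `DAG`, `Vertex` and `Edge`); the class `Dag` and its methods are hypothetical names chosen for illustration. Each vertex holds a transformation, each edge moves a vertex's output to its consumer, and a topological sort makes sure producers run before consumers while rejecting any cycle.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List

Transform = Callable[[List], List]


class Dag:
    """Toy DAG: vertices hold transformations, edges move data (not the Tez API)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Transform] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)  # producer -> consumers

    def add_vertex(self, name: str, transform: Transform) -> "Dag":
        self.vertices[name] = transform
        return self

    def add_edge(self, producer: str, consumer: str) -> "Dag":
        if producer not in self.vertices or consumer not in self.vertices:
            raise KeyError(f"unknown vertex in edge {producer} -> {consumer}")
        self.edges[producer].append(consumer)
        return self

    def topological_order(self) -> List[str]:
        # Kahn's algorithm: repeatedly take vertices with no unprocessed inputs.
        indegree = {v: 0 for v in self.vertices}
        for consumers in self.edges.values():
            for c in consumers:
                indegree[c] += 1
        ready = deque(v for v, d in indegree.items() if d == 0)
        order: List[str] = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for c in self.edges[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.vertices):
            raise ValueError("graph contains a cycle, so it is not a DAG")
        return order

    def run(self, source: List) -> Dict[str, List]:
        has_input_edge = {c for cs in self.edges.values() for c in cs}
        inbox: Dict[str, List] = defaultdict(list)
        results: Dict[str, List] = {}
        for v in self.topological_order():
            # Root vertices read the source data; others read what their edges delivered.
            data = inbox[v] if v in has_input_edge else source
            results[v] = self.vertices[v](data)
            for c in self.edges[v]:
                inbox[c].extend(results[v])  # the edge moves data to its consumer
        return results


dag = (
    Dag()
    .add_vertex("tokenize", lambda lines: [w for line in lines for w in line.split()])
    .add_vertex("upper", lambda words: [w.upper() for w in words])
    .add_vertex("count", lambda words: [len(words)])
    .add_edge("tokenize", "upper")
    .add_edge("upper", "count")
)
out = dag.run(["a b", "c"])
print(out["count"])  # [3]
```

The builder-style chaining loosely echoes the "add vertices, then add edges" workflow on the slides; a real Tez job would also attach processors, inputs and outputs to each vertex.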

4
Tez Performance
  • Performance improvement compared to MapReduce
  • No need to store intermediate data in HDFS between MR jobs
  • Better execution performance
  • Expressive dataflow API for DAG
  • Visualise what you wish to construct
  • Add processor vertices to graph
  • Add data movement edges to graph
  • To build the computational DAG that you require
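To make the point about avoiding HDFS storage between MR jobs concrete, the sketch below runs the same three stages two ways: as chained jobs that each materialize their output to a simulated store, and as one pipeline that keeps intermediate data in memory. The store, paths and function names are hypothetical, and the model is deliberately simplified: real Tez can still spill shuffle data to local disk, but it avoids writing each intermediate result back to HDFS.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[List[int]], List[int]]


def run_as_chained_jobs(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """MR style: every stage is its own job and materializes its output to storage."""
    storage: Dict[str, List[int]] = {}  # stands in for HDFS
    writes = 0
    current = data
    for i, stage in enumerate(stages):
        path = f"/tmp/job{i}/output"  # hypothetical output path
        storage[path] = stage(current)
        writes += 1
        current = storage[path]  # the next job re-reads from storage
    return current, writes


def run_as_single_dag(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """DAG style: stages are vertices; data flows along edges in memory."""
    current = data
    for stage in stages:
        current = stage(current)
    storage = {"/tmp/dag/output": current}  # only the final result is written
    return storage["/tmp/dag/output"], len(storage)


stages: List[Stage] = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x for x in xs if x > 3],
]
print(run_as_chained_jobs(stages, [1, 2, 3]))  # ([5, 7], 3)
print(run_as_single_dag(stages, [1, 2, 3]))    # ([5, 7], 1)
```

Both runs produce the same answer; the difference is the two extra materializations in the chained version, which is where the MR approach pays for HDFS round trips.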

5
Tez Deployment
  • Tez is client-side
  • Install the Tez client locally
  • Build the task DAG
  • Upload the DAG/Tez libraries to HDFS
  • Execute a YARN-based job
  • From the Tez client
  • Using the HDFS-based DAG library
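The client-side setup above depends on telling Tez where its libraries live in HDFS. In Tez this is the `tez.lib.uris` property in `tez-site.xml`, typically pointing at a Tez tarball uploaded with `hdfs dfs -put`. The sketch below generates that fragment with Python's standard `xml.etree.ElementTree`; the HDFS path is a hypothetical placeholder, and a real installation may need further properties.

```python
import xml.etree.ElementTree as ET


def tez_site_xml(lib_uris: str) -> str:
    """Build a minimal tez-site.xml body that sets tez.lib.uris."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "tez.lib.uris"
    ET.SubElement(prop, "value").text = lib_uris
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical HDFS location of the uploaded Tez tarball.
xml_text = tez_site_xml("hdfs:///apps/tez/tez.tar.gz")
print(xml_text)
```

Because the libraries are fetched from HDFS at run time, nothing Tez-specific has to be installed on the cluster nodes themselves, which is what "Tez is client-side" refers to.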

6
Tez Existing MR Tasks
  • Tez can process existing MapReduce (MR) tasks
  • No need for any modification
  • Allows for phased migration
  • Of existing MR jobs to DAGs
  • Allows for near-real-time task types
  • Rather than just MR tasks, which are
  • Batch-oriented
  • Iterative
  • Resource-intensive
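One way to see why MR jobs fit into Tez unchanged: an MR job is just a two-vertex DAG, a map vertex and a reduce vertex joined by a scatter-gather (shuffle) edge. The plain-Python word count below illustrates that shape; it is not how Tez's MR compatibility layer is implemented, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_vertex(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """The old map() logic, unchanged: emit (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


def scatter_gather(pairs: List[Tuple[str, int]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """The shuffle edge: partition by key so each key lands on exactly one reducer."""
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # crc32 gives a stable partition across runs (Python's hash() of str is randomized).
        partitions[zlib.crc32(key.encode()) % num_reducers][key].append(value)
    return partitions


def reduce_vertex(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """The old reduce() logic, unchanged: sum the counts for each key."""
    return {key: sum(values) for key, values in partition.items()}


def word_count_dag(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    result: Dict[str, int] = {}
    for partition in scatter_gather(map_vertex(lines), num_reducers):
        result.update(reduce_vertex(partition))
    return result


counts = word_count_dag(["to be or", "not to be"])
print(sorted(counts.items()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Once a job is expressed this way, migrating it means adding more vertices and edges around the existing map and reduce steps rather than rewriting them.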

7
Tez API
  • Tez DAG defines the job
  • Vertex defines one DAG job step
  • Requires user logic and resources for the step
  • Edge defines one DAG data movement step
  • From producer to consumer
  • Edge properties define movement
  • How data moves
  • When data moves (consumer scheduling relative to the producer)
  • Defines durability of data
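The three edge properties above can be modelled as three independent settings. The enum member names below mirror those of Tez's `DataMovementType`, `SchedulingType` and `DataSourceType` (Tez also defines `CUSTOM` movement and `PERSISTED_RELIABLE` durability, omitted here), but the code is a simplified Python model, not the Tez API; `route` and `consumer_may_start` are hypothetical helpers. Durability is only recorded, not enforced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DataMovementType(Enum):  # how data moves
    ONE_TO_ONE = 1
    BROADCAST = 2
    SCATTER_GATHER = 3


class SchedulingType(Enum):  # when data moves, relative to the producer
    SEQUENTIAL = 1
    CONCURRENT = 2


class DataSourceType(Enum):  # how durable the producer's output is
    PERSISTED = 1
    EPHEMERAL = 2


@dataclass(frozen=True)
class EdgeProperty:
    movement: DataMovementType
    scheduling: SchedulingType
    source: DataSourceType


def route(prop: EdgeProperty, producer_outputs: List[List[int]], num_consumers: int) -> List[List[int]]:
    """Return the input each consumer task receives over this edge."""
    if prop.movement is DataMovementType.ONE_TO_ONE:
        # Producer task i feeds consumer task i only.
        if len(producer_outputs) != num_consumers:
            raise ValueError("one-to-one needs equal producer and consumer task counts")
        return [list(out) for out in producer_outputs]
    if prop.movement is DataMovementType.BROADCAST:
        # Every consumer task receives the output of every producer task.
        everything = [x for out in producer_outputs for x in out]
        return [list(everything) for _ in range(num_consumers)]
    # SCATTER_GATHER: each producer partitions its output; consumer i gathers partition i.
    inboxes: List[List[int]] = [[] for _ in range(num_consumers)]
    for out in producer_outputs:
        for x in out:
            inboxes[x % num_consumers].append(x)
    return inboxes


def consumer_may_start(prop: EdgeProperty, producer_finished: bool) -> bool:
    """SEQUENTIAL consumers wait for the producer; CONCURRENT ones may overlap with it."""
    return prop.scheduling is SchedulingType.CONCURRENT or producer_finished


shuffle = EdgeProperty(DataMovementType.SCATTER_GATHER, SchedulingType.SEQUENTIAL, DataSourceType.PERSISTED)
print(route(shuffle, [[1, 2, 3], [4, 5]], 2))  # [[2, 4], [1, 3, 5]]
```

A classic MR shuffle corresponds to scatter-gather, sequential, persisted; a broadcast edge is the usual way to ship a small table to every task of a join.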

8
Tez Hive
  • Increased performance
  • Compared to MapReduce
  • No need to use HDFS for intermediate steps
  • Greater parallelism via DAGs
  • Less complex steps in a DAG compared to MR
  • Reduced latency
  • Higher throughput
  • Better speed
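One way to see where the extra parallelism comes from: once a Hive query is a single DAG, any vertices that do not depend on each other, such as two table scans feeding a join, can be scheduled at the same time. The sketch groups vertices into "waves" that could run together. The query, table names and vertex names are hypothetical, and a real scheduler works at the level of individual tasks rather than whole vertices.

```python
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group vertices into waves; a vertex runs once all of its inputs have finished."""
    remaining = {v: set(d) for v, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Every vertex whose inputs are all done can run in parallel in this wave.
        wave = sorted(v for v, d in remaining.items() if d <= done)
        if not wave:
            raise ValueError("cycle or missing vertex: no runnable vertices left")
        waves.append(wave)
        done.update(wave)
        for v in wave:
            del remaining[v]
    return waves


# Hypothetical plan for:
#   SELECT c.region, SUM(o.total) FROM orders o
#   JOIN customers c ON o.cust_id = c.id GROUP BY c.region
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
}
print(execution_waves(plan))  # [['scan_customers', 'scan_orders'], ['join'], ['aggregate']]
```

Four vertices finish in three waves here because the two scans overlap; the deeper and wider the plan, the more such overlap a single DAG can expose.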

9
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

10
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration