An introduction to Apache Sqoop

About This Presentation
Title: An introduction to Apache Sqoop

Description: An introduction to Apache Sqoop: what is it, and how does it assist in large-volume data transfer between Hadoop and external sources?

Slides: 10
Provided by: semtechs

Transcript and Presenter's Notes


1
Apache Sqoop
  • What is it?
  • How does it work?
  • Interfaces
  • Example
  • Architecture

www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2
Sqoop: What is it?
  • A command line interface (plus a web interface in Sqoop2)
  • For data import / export between Hadoop and external data stores
  • Uses map-only jobs from MapReduce
  • Supports incremental loads
  • Written in Java
  • Open source, released under the Apache license
  • Uses plugins (connectors) for new types of data source
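
As a rough illustration of the incremental load feature listed above, the sketch below assembles the arguments for an append-mode incremental import. The flags `--incremental`, `--check-column` and `--last-value` are Sqoop 1 import options; the helper name, host, database and table are hypothetical placeholders, and credentials are left out for brevity.

```python
import shlex


def incremental_import_args(connect, table, check_column, last_value):
    """Build the argument list for an append-mode incremental Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--incremental", "append",        # import only rows added since the last run
        "--check-column", check_column,   # column compared against --last-value
        "--last-value", str(last_value),  # highest value seen by the previous import
    ]


# Hypothetical connection details, for illustration only.
args = incremental_import_args(
    "jdbc:mysql://db.example.com:3306/db3",
    "orders",
    "id",
    1000,
)
print(shlex.join(args))
```

On the next run the caller would pass the new highest value of the check column, so that only rows added since the previous import are transferred.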

3
Sqoop: How does it work?
  • Data is sliced into partitions
  • Mappers transfer the data, one partition each
  • Data types are determined via the source metadata
  • Many data transfer formats are supported, e.g. CSV, Avro
  • Can import into:
  • Hive (use the --hive-import flag)
  • HBase (use the HBase flags, e.g. --hbase-table)
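
The partitioning step can be pictured with a small sketch. It assumes an integer key column whose minimum and maximum are already known, and divides that range into one contiguous slice per mapper. This is a simplified illustration of the idea behind Sqoop's `--split-by` and `--num-mappers` options, not Sqoop's actual splitter code; the function name and the `id` column are hypothetical.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive range [min_val, max_val] into contiguous slices,
    at most one per mapper, with sizes differing by no more than one."""
    total = max_val - min_val + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_val
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer key values than mappers
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each slice becomes the WHERE clause of one mapper's query.
for lo, hi in split_ranges(1, 1000, 4):
    print(f"WHERE id >= {lo} AND id <= {hi}")
```

An evenly distributed key gives each mapper a similar amount of work; a skewed key would leave some mappers with far more rows than others.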

4
Sqoop Interfaces
  • Get data from:
  • Relational databases
  • Data warehouses
  • NoSQL databases
  • Load to Hive and HBase
  • Integrates with Oozie for scheduling

5
Sqoop Example
  • An example Sqoop command to load data from MySQL into Hive:
    bin/sqoop-import \
      --connect jdbc:mysql://<mysql host>:<mysql port>/db3 \
      --username <username> \
      --password <password> \
      --table <tableName> \
      --hive-table <Hive tableName> \
      --create-hive-table \
      --hive-import \
      --hive-home <hive path>
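
The same command can be assembled programmatically, which helps when the connection details come from configuration. The sketch below is one possible way to do it: the helper name and all argument values are hypothetical, and it swaps `--password` for Sqoop's `-P` option (prompt on the console) so the password does not appear on the command line.

```python
import shlex


def hive_import_args(host, port, database, username, table, hive_table,
                     hive_home=None):
    """Build the argument list for a MySQL-to-Hive Sqoop import."""
    args = [
        "bin/sqoop-import",
        "--connect", f"jdbc:mysql://{host}:{port}/{database}",
        "--username", username,
        "-P",                    # prompt for the password instead of passing it
        "--table", table,
        "--hive-table", hive_table,
        "--create-hive-table",   # fail if the Hive table already exists
        "--hive-import",
    ]
    if hive_home is not None:
        args += ["--hive-home", hive_home]
    return args


# Hypothetical values, for illustration only.
cmd = hive_import_args("dbhost", 3306, "db3", "etl", "customers", "customers_hive")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where Sqoop is installed.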

6
Sqoop Architecture
  • Sqoop has moved from Sqoop1 to Sqoop2
  • Changed from a client install to a server install
  • Now has web and command line access
  • The server now accesses Hive and HBase
  • Oozie uses the REST API

7
Sqoop Architecture - Sqoop1
8
Sqoop Architecture - Sqoop2
9
Contact Us
  • Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can just pay for those hours that you need
  • To solve your problems