An introduction to Apache Spark - PowerPoint PPT Presentation

About This Presentation
Title:

An introduction to Apache Spark

Description:

An introduction to Apache Spark: what it is, how it works, why you would use it, and some examples of its use.

Slides: 8
Provided by: semtechs


Transcript and Presenter's Notes



1
Apache Spark
  • What is it?
  • How does it work?
  • Benefits
  • Tuning
  • Examples

www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2
Spark: What is it?
  • Open source
  • An alternative to MapReduce for certain applications
  • A low-latency cluster computing system
  • For very large data sets
  • Can be up to 100 times faster than MapReduce for
      • Iterative algorithms
      • Interactive data mining
  • Can be used with Hadoop / HDFS
  • Originally released under a BSD license (now licensed under the Apache License 2.0)

3
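Spark: How does it work?
  • Uses in-memory cluster computing
  • Memory access is faster than disk access, so data kept in memory can be
    reused across operations without being re-read from disk
  • Provides APIs in
      • Scala
      • Java
      • Python
  • Can be accessed from Scala and Python shells
  • An Apache project (in the Apache incubator when these slides were written;
    a top-level Apache project since 2014)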
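To illustrate why keeping data in memory helps iterative algorithms, here is a minimal sketch assuming a recent Apache Spark release (2.x or 3.x, `org.apache.spark` package, spark-core on the classpath) running in local mode. The object name `IterativeSketch`, the toy data and the learning rate are hypothetical. The dataset is cached once with `cache()` and then scanned on every iteration of a simple gradient descent that converges to the value minimising the squared distance to the data points (their mean).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, data and learning rate: a sketch only.
object IterativeSketch {
  // Gradient descent on f(w) = (1/2n) * sum((w - x)^2); the minimum is the mean of the data
  def fitMean(sc: SparkContext, values: Seq[Double], iterations: Int = 50): Double = {
    // cache() keeps the partitions in memory after the first pass,
    // so later iterations do not recompute or re-read them
    val data = sc.parallelize(values).cache()
    val n = data.count() // first pass: materialises the cached partitions
    val learningRate = 0.5
    var w = 0.0
    for (_ <- 1 to iterations) {
      val current = w // copy into a val so the Spark closure captures a plain Double
      // Gradient of f at w is (1/n) * sum(w - x); each iteration re-scans the cached data
      val gradient = data.map(x => current - x).sum() / n
      w -= learningRate * gradient
    }
    data.unpersist()
    w
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Iterative Sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(fitMean(sc, Seq(1.0, 2.0, 3.0, 4.0)))
    } finally {
      sc.stop()
    }
  }
}
```

With four numbers on one machine the cache makes no measurable difference; the point is the access pattern, where every iteration scans the same dataset again, which is the case the "up to 100 times faster" claim refers to.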

4
Spark: Benefits
  • Scales to very large clusters
  • Uses in-memory processing for increased speed
  • High-level APIs
      • Java, Scala, Python
  • Low-latency interactive shell access

5
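Spark: Tuning
  • Bottlenecks in the cluster can come from CPU, memory or network bandwidth
  • Tune the data serialization method
      • Java serialization (ObjectOutputStream) vs Kryo, which is usually
        faster and more compact
  • Memory tuning
      • Prefer primitive types and arrays over standard collection classes
      • Set JVM flags (e.g. -XX:+UseCompressedOops for heaps under 32 GB)
      • Store objects in serialized form, e.g. RDD persistence with the
        MEMORY_ONLY_SER storage level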
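The serialization and persistence options above can be combined as in the following minimal sketch, assuming a recent Apache Spark release (2.x or 3.x) with spark-core on the classpath. `spark.serializer`, `org.apache.spark.serializer.KryoSerializer`, `SparkConf.registerKryoClasses` and `StorageLevel.MEMORY_ONLY_SER` come from Spark itself; the `Reading` case class, the `TuningSketch` object and the data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type, used only for illustration
case class Reading(sensor: Int, value: Double)

// Hypothetical object name: a sketch of Kryo + serialized in-memory persistence
object TuningSketch {
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("Tuning Sketch")
      .setMaster("local[2]")
      // Use Kryo instead of the default Java (ObjectOutputStream) serialization
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registered classes are written as a small ID rather than the full class name
      .registerKryoClasses(Array(classOf[Reading]))

  // Returns (number of readings, sum of their values) from a serialized, cached RDD
  def run(sc: SparkContext): (Long, Double) = {
    val readings = sc
      .parallelize(1 to 1000)
      .map(i => Reading(i % 10, i.toDouble))
      // Keep each partition in memory as serialized bytes: a smaller footprint,
      // at the cost of extra CPU to deserialize on every access
      .persist(StorageLevel.MEMORY_ONLY_SER)
    val count = readings.count()
    val total = readings.map(_.value).sum()
    readings.unpersist()
    (count, total)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      println(run(sc))
    } finally {
      sc.stop()
    }
  }
}
```

Serialized storage trades CPU for memory, so it pays off mainly when cached data would not otherwise fit; Kryo makes that trade cheaper than Java serialization.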

6
Spark: Examples
  • Example from spark-project.org: a Spark job in Scala
  • It counts the lines containing "a" and the lines containing "b" in a system log

    /* SimpleJob.scala */
    import spark.SparkContext
    import SparkContext._

    object SimpleJob {
      def main(args: Array[String]) {
        val logFile = "/var/log/syslog" // Should be some file on your system
        // Arguments: master URL, job name, Spark home, JARs to ship to the workers
        val sc = new SparkContext("local", "Simple Job", "YOUR_SPARK_HOME",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
        // Read the log as an RDD of lines (at least 2 splits) and cache it,
        // since it is scanned twice below
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }
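The example above uses the early, pre-Apache `spark` package and the four-argument `SparkContext` constructor. In later Apache Spark releases the classes live in `org.apache.spark` and configuration goes through `SparkConf`. Below is a minimal sketch of the same job for a recent release (2.x or 3.x), assuming spark-core on the classpath; the `SimpleApp` object and `countAB` helper are hypothetical names, and taking the log path from the first argument (defaulting to the original `/var/log/syslog`) is an assumption added for convenience.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical names: a modern-API counterpart of SimpleJob
object SimpleApp {
  // Counts lines containing "a" and lines containing "b" in a text file
  def countAB(sc: SparkContext, path: String): (Long, Long) = {
    // Read the file as an RDD of lines (at least 2 partitions) and cache it,
    // because it is scanned twice
    val logData = sc.textFile(path, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    (numAs, numBs)
  }

  def main(args: Array[String]): Unit = {
    // Should be some file on your system
    val logFile = args.headOption.getOrElse("/var/log/syslog")
    val conf = new SparkConf().setAppName("Simple Job").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val (numAs, numBs) = countAB(sc, logFile)
      println(s"Lines with a: $numAs, Lines with b: $numBs")
    } finally {
      sc.stop()
    }
  }
}
```

Hard-coding the `local` master mirrors the original; when launching with `spark-submit` the master is normally supplied on the command line instead.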

7
Contact Us
  • Feel free to contact us at
      • www.semtech-solutions.co.nz
      • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can pay for just the hours you need to solve your problems