Apache Samza - PowerPoint PPT Presentation

About This Presentation
Title: Apache Samza

Description:

This presentation gives an overview of the Apache Samza project. It explains Samza's stream processing capabilities as well as its architecture, users, and use cases, and ends with links for further information and for connecting with the author.

Slides: 14
Provided by: semtechs

Transcript and Presenter's Notes


1
What Is Apache Samza?
  • An asynchronous computational framework
  • For distributed, sub-second stream processing
  • Provides fault tolerance, isolation and stateful
    processing
  • Open source / Apache 2.0 license
  • Developed in Java and Scala
  • Runs stand-alone or on YARN

2
Samza Use Cases
  • Applications that require millisecond-to-second
    response times
  • Streaming analytics
  • DDoS attack detection
  • Fraud detection
  • Metric anomaly detection
  • System notifications
  • Performance monitoring

3
Samza Users

4
Samza Partitioned Stream
  • Samza uses streams to process data
  • Streams are ordered collections of immutable objects
  • Each object is a key-value pair
  • Each stream is sharded into partitions
  • This allows the architecture to scale
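
To make the partitioning idea concrete, here is a minimal sketch in plain Python. It does not use Samza or any messaging system. The partition count, the CRC32 hash and the helper names (partition_for, append) are assumptions chosen for illustration; real systems use their own partitioners.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(stream, key, value):
    """Append a (key, value) message to its partition; return (partition, offset)."""
    partition = partition_for(key)
    stream[partition].append((key, value))
    return partition, len(stream[partition]) - 1


# A "stream" here is just a dict of partition number -> ordered list of messages.
stream = defaultdict(list)
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
for key, value in events:
    append(stream, key, value)
```

Because the partition is derived from the key, all messages for one key land in the same partition and keep their relative order, while different partitions can be processed in parallel.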

5
Samza APIs
  • High Level Streams API (Java): stream-based processing API
  • Low Level Task API (Java): message-based processing API
  • Table API: random access by key to data sources
  • Testing Samza: Samza's integration testing framework
  • Samza SQL: stream processing via SQL and UDFs
  • Apache Beam: Samza provides a Beam runner for application
    execution

6
Samza Architecture

7
Samza Architecture
  • Applications are broken down into tasks
  • Each task consumes data from a stream partition
  • Tasks are executed within containers
  • A coordinator assigns tasks to containers
  • Tasks checkpoint the offset of their last processed message
  • Each task has its own state store for state
    management
  • Samza replicates changes to the local store in a
    separate stream
  • This allows later recovery of local stores
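
The checkpointing bullet can be illustrated with a small plain-Python sketch. This is not Samza code: the Task class, the dict standing in for a durable checkpoint store and the max_messages parameter (used to simulate a crash) are all hypothetical. Checkpointing after every message is a simplification made to keep the example short.

```python
class Task:
    """Toy task that consumes one partition and records its progress."""

    def __init__(self, partition, checkpoints):
        self.partition = partition
        self.checkpoints = checkpoints  # dict standing in for a durable checkpoint store
        self.processed = []

    def run(self, messages, max_messages=None):
        # Resume right after the last checkpointed offset (or from 0 if none exists).
        start = self.checkpoints.get(self.partition, -1) + 1
        end = len(messages)
        if max_messages is not None:
            end = min(end, start + max_messages)
        for offset in range(start, end):
            self.processed.append(messages[offset])
            self.checkpoints[self.partition] = offset  # checkpoint after each message


messages = ["m0", "m1", "m2", "m3", "m4"]
checkpoints = {}

first = Task(partition=0, checkpoints=checkpoints)
first.run(messages, max_messages=3)  # handles m0..m2, then we pretend it crashes

second = Task(partition=0, checkpoints=checkpoints)
second.run(messages)  # a replacement task picks up at m3
```

The replacement task never sees m0 to m2 again because it reads the stored offset before consuming. With less frequent checkpoints, some messages after the last checkpoint would be processed a second time after a restart.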

8
Samza Architecture
  • Task container coordination

9
Samza Architecture
  • Fault tolerance of state
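
The idea of recovering a local store from a separate stream of changes can be sketched as follows. This is plain Python, not the Samza API: StoreWithChangelog is a hypothetical class, a list stands in for the durable change stream, and None is used as a delete marker.

```python
class StoreWithChangelog:
    """Toy key-value store that mirrors every write into a replayable change list."""

    def __init__(self, changelog):
        self.changelog = changelog  # list standing in for a durable stream
        self.local = {}             # the task's local state

    def put(self, key, value):
        self.local[key] = value
        self.changelog.append((key, value))

    def delete(self, key):
        self.local.pop(key, None)
        self.changelog.append((key, None))  # None marks a deletion

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state by replaying the change list from the beginning."""
        store = cls(changelog)
        for key, value in changelog:
            if value is None:
                store.local.pop(key, None)
            else:
                store.local[key] = value
        return store


changelog = []
store = StoreWithChangelog(changelog)
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)
store.delete("b")

# Simulate losing the local store and rebuilding it elsewhere from the changes.
recovered = StoreWithChangelog.restore(changelog)
```

Replaying the changes in order yields the same contents as the lost store, so a task restarted on another machine can continue with its state intact.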

10
Samza Architecture
  • Incremental checkpointing

11
Samza Architecture
  • State management

12
Available Books
  • See Big Data Made Easy (Apress, Jan 2015)
  • See Mastering Apache Spark (Packt, Oct 2015)
  • See Complete Guide to Open Source Big Data
    Stack (Apress, Jan 2018)
  • Find the author on Amazon:
    www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn:
    www.linkedin.com/in/mike-frampton-38563020

13
Connect
  • Feel free to connect on LinkedIn:
    www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
    open-source-systems.blogspot.com/
  • I am always interested in new technology,
    opportunities, technology-based issues and
    big data integration