Title: Apache Samza


1
What Is Apache Samza?
  • An asynchronous computational framework
  • For distributed, sub-second stream processing
  • Provides fault tolerance, isolation and stateful
    processing
  • Open source / Apache 2.0 license
  • Developed in Java and Scala
  • Runs stand-alone or on YARN

2
Samza Use Cases
  • Applications that require millisecond-to-second
    response times
  • Streaming analytics
  • DDOS attack detection
  • Fraud detection
  • Metric anomaly detection
  • System notifications
  • Performance monitoring

3
Samza Users
4
Samza Partitioned Streams
  • Samza processes data as streams
  • A stream is a collection of immutable messages
  • Each message is a key-value pair
  • Each stream is sharded into partitions, and each
    partition is an ordered sequence of messages
  • Partitioning lets the architecture scale out
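A minimal sketch of key-based partitioning, assuming a message's partition is chosen by hashing its key modulo the partition count. The `Message`, `Partitioner`, `partitionFor` and `shard` names are hypothetical, and `String.hashCode` only stands in for whatever hash function the underlying messaging system actually uses. The idea it illustrates is that every message with the same key lands in the same partition, in arrival order, so partitions can be processed in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical key-value message type used only for this sketch.
final class Message {
    final String key;
    final String value;

    Message(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical partitioner: hash the key, then take it modulo the partition count.
// String.hashCode is an assumption; real systems use their own hash functions.
final class Partitioner {
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Shard a batch of messages; arrival order is preserved within each partition
    static List<List<Message>> shard(List<Message> messages, int numPartitions) {
        List<List<Message>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Message m : messages) {
            partitions.get(partitionFor(m.key, numPartitions)).add(m);
        }
        return partitions;
    }
}
```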

5
Samza APIs
  • High Level Streams API (Java): stream-based
    processing
  • Low Level Task API (Java): message-based
    processing
  • Table API: random access to data sources by key
  • Samza testing: an integration testing framework
    for Samza applications
  • Samza SQL: stream processing via SQL and UDFs
  • Apache Beam: Samza provides a Beam runner for
    executing Beam applications
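To contrast the message-based and stream-based styles, here is a plain-Java sketch of the same job (drop empty messages, upper-case the rest) written both ways. `MessageHandler`, `UpperCaseTask` and `ApiStyles` are hypothetical names, not Samza classes: Samza's own low-level interface is `StreamTask`, whose `process` method also receives a `MessageCollector` and a `TaskCoordinator`, and in the high-level API operators such as `filter` and `map` are called on a `MessageStream`. Here `java.util.stream` only stands in for that operator chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical low-level interface: called once per message (not Samza's StreamTask).
interface MessageHandler {
    void process(String message, List<String> output);
}

// Message-based style: the task decides, per message, what to emit.
final class UpperCaseTask implements MessageHandler {
    @Override
    public void process(String message, List<String> output) {
        if (!message.isEmpty()) {                 // filter, written imperatively
            output.add(message.toUpperCase());    // map, written imperatively
        }
    }
}

final class ApiStyles {
    // Drive the low-level handler one message at a time
    static List<String> runLowLevel(List<String> input) {
        MessageHandler task = new UpperCaseTask();
        List<String> output = new ArrayList<>();
        for (String message : input) {
            task.process(message, output);
        }
        return output;
    }

    // Stream-based style: declare a chain of operators over the whole stream
    static List<String> runHighLevel(List<String> input) {
        return input.stream()
                .filter(message -> !message.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```

The low-level style gives the task full control over each message; the high-level style trades that control for a declarative pipeline that is shorter to write.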

6
Samza Architecture
7
Samza Architecture
  • Applications are broken down into tasks
  • Each task consumes data from a stream partition
  • Tasks are executed within containers
  • A coordinator assigns tasks to containers
  • Tasks checkpoint the offset of the last message
    they processed
  • Each task has its own local state store for state
    management
  • Samza replicates changes to each local store to
    a separate changelog stream
  • This allows local stores to be recovered later,
    for example after a failure
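Offset checkpointing can be illustrated with a sketch of one task reading one partition. It assumes an in-memory checkpoint and a hypothetical `CheckpointingTask` class that records the last processed offset every `checkpointInterval` messages; a restart resumes from the message after the checkpoint. Messages processed after the last checkpoint but before a crash are processed again, so this scheme gives at-least-once rather than exactly-once processing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task that reads one partition and checkpoints its last processed offset.
// The in-memory checkpoint and fixed checkpoint interval are assumptions for illustration.
final class CheckpointingTask {
    private final List<String> partition;              // input partition; list index = offset
    private final int checkpointInterval;              // checkpoint after this many messages
    private long checkpointedOffset = -1;              // last processed offset persisted (-1 = none yet)
    final List<String> processed = new ArrayList<>();  // every message handled, across restarts

    CheckpointingTask(List<String> partition, int checkpointInterval) {
        this.partition = partition;
        this.checkpointInterval = checkpointInterval;
    }

    // Process up to maxMessages, starting after the last checkpoint (as after a restart).
    // Returning early simulates a crash: progress since the last checkpoint is not saved.
    void run(int maxMessages) {
        long offset = checkpointedOffset + 1;
        int sinceCheckpoint = 0;
        for (int i = 0; i < maxMessages && offset < partition.size(); i++, offset++) {
            processed.add(partition.get((int) offset));
            sinceCheckpoint++;
            if (sinceCheckpoint == checkpointInterval) {
                checkpointedOffset = offset;   // persist the offset of the last processed message
                sinceCheckpoint = 0;
            }
        }
    }

    long lastCheckpoint() {
        return checkpointedOffset;
    }
}
```

Calling `run(3)` and then `run(10)` on a five-message partition with an interval of 2 processes the third message twice: it was handled before the simulated crash but after the last checkpoint.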

8
Samza Architecture
  • Task container coordination
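A sketch of the coordination step, assuming one task per input partition and a simple round-robin spread of tasks over containers. `Coordinator.assign` and the `"Partition N"` task names are illustrative assumptions; Samza's actual grouping of partitions into tasks and of tasks into containers is configurable and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical coordinator: one task per partition, tasks spread round-robin over containers.
// Round-robin and the "Partition N" task names are assumptions for illustration.
final class Coordinator {
    static List<List<String>> assign(int numPartitions, int numContainers) {
        List<List<String>> containers = new ArrayList<>();
        for (int c = 0; c < numContainers; c++) {
            containers.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String taskName = "Partition " + p;            // one task per input partition
            containers.get(p % numContainers).add(taskName);
        }
        return containers;
    }
}
```

With five partitions and two containers, the first container runs the tasks for partitions 0, 2 and 4 and the second runs those for partitions 1 and 3.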

9
Samza Architecture
  • Fault tolerance of state
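Fault tolerance of state can be sketched as a local key-value store that mirrors every write into a changelog and is rebuilt by replaying that changelog in order. `ChangelogBackedStore` and its nested `Change` type are hypothetical; an in-memory `List` stands in for the durable changelog stream, and a `null` value is assumed to mark a delete.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical task-local store that replicates every write to a changelog.
// An in-memory List stands in for the durable changelog stream; null value = delete.
final class ChangelogBackedStore {
    static final class Change {
        final String key;
        final String value;

        Change(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, String> local = new HashMap<>();
    private final List<Change> changelog;

    ChangelogBackedStore(List<Change> changelog) {
        this.changelog = changelog;
    }

    void put(String key, String value) {
        local.put(key, value);
        changelog.add(new Change(key, value));   // replicate the write
    }

    void delete(String key) {
        local.remove(key);
        changelog.add(new Change(key, null));    // replicate the delete as a null value
    }

    String get(String key) {
        return local.get(key);
    }

    // Rebuild a store (for example on another machine) by replaying the changelog in order
    static ChangelogBackedStore restore(List<Change> changelog) {
        ChangelogBackedStore store = new ChangelogBackedStore(changelog);
        for (Change change : changelog) {
            if (change.value == null) {
                store.local.remove(change.key);
            } else {
                store.local.put(change.key, change.value);
            }
        }
        return store;
    }
}
```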

10
Samza Architecture
  • Incremental checkpointing
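One way to read incremental checkpointing is that each checkpoint persists only the state that changed since the previous one, rather than a full snapshot of the store. The sketch assumes that reading; `IncrementalStore` is a hypothetical class that tracks dirty keys and returns them as the delta to persist.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical store for incremental checkpointing: a checkpoint persists only the
// keys written since the previous checkpoint, not a snapshot of the whole store.
final class IncrementalStore {
    private final Map<String, Integer> state = new HashMap<>();
    private final Map<String, Integer> dirty = new LinkedHashMap<>(); // changes since last checkpoint

    void put(String key, int value) {
        state.put(key, value);
        dirty.put(key, value);   // repeated writes to one key collapse into one delta entry
    }

    // Return the delta that would be written durably, then start a new delta
    Map<String, Integer> checkpoint() {
        Map<String, Integer> delta = new LinkedHashMap<>(dirty);
        dirty.clear();
        return delta;
    }

    int size() {
        return state.size();
    }
}
```

The cost of a checkpoint then grows with the number of keys changed since the last one, not with the total size of the store.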

11
Samza Architecture
  • State management
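Because messages are partitioned by key, every message for a given key reaches the same task, so that task can keep per-key state in its own local store instead of calling a remote database. A minimal sketch, assuming a `HashMap` as the local store and a hypothetical `CountingTask` that counts messages per key; two task instances never share state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical task whose per-key state lives in its own local store (a HashMap here).
// Each instance has a private store, so tasks never share or contend for state.
final class CountingTask {
    private final Map<String, Long> localStore = new HashMap<>();

    // Read-modify-write against local state, with no remote database call
    void process(String key) {
        localStore.merge(key, 1L, Long::sum);
    }

    long countFor(String key) {
        return localStore.getOrDefault(key, 0L);
    }
}
```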

12
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data
    Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

13
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration