Apache Kafka PowerPoint PPT Presentation


Title: Apache Kafka


1
What Is Apache Kafka?
  • A stream processing platform
  • Open source / Apache 2.0 license
  • Written in Java and Scala
  • A publish/subscribe system for record streams
  • Scalable / fault tolerant
  • Topic-based, partitioned FIFO queues

2
How Does Kafka Work?
  • Kafka runs as a cluster of servers
  • Stores records in topics
  • Topics are partitioned into queues
  • Partitions are distributed across the cluster
  • Consumers are organised into groups
  • Stream processors transform records
  • Reusable connectors process queues,
    for instance database connectors
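
To make the consumer group idea concrete, here is a minimal sketch of how the partitions of one topic might be shared out among the members of a group, so that each partition is read by exactly one member. The function name `assign_partitions` and the consumer names are hypothetical, and the round-robin rule is only an illustration; Kafka's own assignment strategies are pluggable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment of partitions to one consumer group.

    Illustration only: not Kafka's actual assignor implementation.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        # Each partition goes to exactly one member of the group.
        assignment[members[i % len(members)]].append(partition)
    return assignment


group = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

With more consumers than partitions, some members receive an empty list, which mirrors the fact that extra consumers in a group sit idle.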

3
Kafka APIs
  • Producer API: applications publish records to topics
  • Consumer API: applications subscribe to topics and process
    record streams
  • Streams API: an application acts as a stream processor,
    transforming streams
  • Connector API: build reusable producers and consumers,
    e.g. RDBMS connectors
  • Admin API: topic and broker management
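
The stream processor role can be pictured as a function that reads records from an input topic and emits transformed records to an output topic. The sketch below models that with a plain Python generator over in-memory lists; it does not use the Kafka Streams library (which is a Java/Scala library), and the names `uppercase_errors`, `input_topic` and `output_topic` are made up for the example.

```python
def uppercase_errors(records):
    """Toy stream processor: keep error records and upper-case their values."""
    for key, value in records:
        if "error" in value.lower():   # filter step
            yield key, value.upper()   # transform (map) step


# Stand-ins for an input and an output topic (assumption: lists of (key, value)).
input_topic = [
    ("app1", "started"),
    ("app2", "Error: disk full"),
    ("app1", "error: timeout"),
]
output_topic = list(uppercase_errors(input_topic))
print(output_topic)
# [('app2', 'ERROR: DISK FULL'), ('app1', 'ERROR: TIMEOUT')]
```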

4
Kafka Logical Architecture
5
Kafka Topic Queue Offsets
6
Kafka Topic Queue Offsets
  • Records are published to topics
  • Topics are multi-subscriber
  • Topics contain partition queues
  • A partition queue contains an ordered sequence of records
  • Each record has a queue offset (its position)
  • Consumers use the offset to read records
  • Queue record retention is configurable
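
As a rough picture of the offset mechanics listed above, the following sketch models a single partition as an append-only list in which every record keeps its offset even after older records are discarded. The class `PartitionLog` and its methods are hypothetical, and retention is simplified to "drop everything before a given offset"; in Kafka itself retention is configured by time or size.

```python
class PartitionLog:
    """Toy append-only log standing in for one partition of a topic."""

    def __init__(self):
        self.records = []      # retained records, oldest first
        self.base_offset = 0   # offset of records[0]

    def append(self, value):
        """Store a record and return the offset assigned to it."""
        offset = self.base_offset + len(self.records)
        self.records.append(value)
        return offset

    def read(self, offset, max_records=10):
        """Return (offset, value) pairs starting at the requested offset."""
        start = max(offset, self.base_offset) - self.base_offset
        chunk = self.records[start:start + max_records]
        return [(self.base_offset + start + i, v) for i, v in enumerate(chunk)]

    def truncate_before(self, offset):
        """Simplified retention: drop records older than `offset`."""
        drop = min(max(offset - self.base_offset, 0), len(self.records))
        del self.records[:drop]
        self.base_offset += drop


log = PartitionLog()
for value in ["a", "b", "c"]:
    log.append(value)

print(log.read(1))       # [(1, 'b'), (2, 'c')]
log.truncate_before(2)   # retention removes offsets 0 and 1
print(log.read(0))       # [(2, 'c')]
print(log.append("d"))   # 3
```

A consumer that remembers only "the next offset I want" can resume reading at any time, which is why the offset, not the record's position in storage, is the stable identifier.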

7
Kafka Producer Consumer
8
Kafka Producer Consumer
  • Producers write to partitions, e.g. Producer1 → P0
  • Producers are responsible for the record → partition
    mapping
  • Kafka only guarantees order within a partition
  • A Kafka cluster contains <n> servers
  • Partitions are mapped to servers
  • Consumers are members of consumer groups
  • Each consumer must maintain its partition read
    offset
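
The record → partition mapping and the per-partition ordering guarantee can be illustrated with a small key-hashing sketch. It assumes records carry a key and that the producer picks a partition by hashing it; `zlib.crc32` is used here purely as a stand-in hash, and `choose_partition` is a hypothetical helper, not a Kafka client function.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition; crc32 is a stand-in hash (assumption)."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]  # P0, P1, P2

# Interleave records for two keys; seq is the order in which they were produced.
for seq, key in enumerate([b"user-1", b"user-2", b"user-1", b"user-2", b"user-1"]):
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, seq))

for number, records in enumerate(partitions):
    print(f"P{number}: {records}")
```

Because a given key always hashes to the same partition, all records for that key stay in production order within it, while no ordering is implied between records that land in different partitions.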

9
Kafka's Stack Role
  • As a low-latency messaging system
  • Records are load balanced across partitions
  • As a storage system
  • Using local file system storage
  • Scales horizontally in terms of performance
  • As a stream processing system
  • Using the Streams API to transform data
  • Data replication provides fault tolerance

10
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data
    Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

11
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration