Apache Samoa ML PowerPoint PPT Presentation

Title: Apache Samoa ML


1
What Is Apache Samoa?
  • An Apache incubator project
  • A machine learning framework
  • A distributed scalable system
  • Deploys to existing Apache systems
  • Storm, S4, Samza, AVRO
  • Deploy a Samoa algorithm to these systems
  • Samoa abstracts implementation via API
  • Designed for stream processing
  • Offers a range of ML algorithms

2
Samoa Terms
  • Samoa terms that might be of use

  • PE: Processing element
  • PI: Processing item
  • EPI: Entrance processing item
  • Spout: A Storm term for a data source
  • Bolt: A Storm term for a data join element
  • ML: Machine learning

3
Samoa Algorithms
  • Samoa supported algorithms
  • Prequential Evaluation Task
  • Vertical Hoeffding Tree Classifier
  • Adaptive Model Rules Regressor
  • Bagging and Boosting
  • Distributed Stream Clustering
  • Distributed Stream Frequent Itemset Mining
  • SAMOA for MOA users
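
The prequential (test-then-train) evaluation named in the list above can be illustrated with a minimal sketch. It is plain Python rather than Samoa code, and the names MajorityClassLearner and prequential_accuracy are hypothetical, chosen only for this illustration.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def train(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(learner, stream):
    """Test on each instance first, then train on that same instance."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        total += 1
        learner.train(x, y)
    return correct / total if total else 0.0


stream = [(0, "a"), (1, "a"), (2, "b"), (3, "a")]
accuracy = prequential_accuracy(MajorityClassLearner(), stream)
```

Every instance is used for testing before the learner has seen its label, so no separate hold-out set is needed on an unbounded stream.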

4
Samoa Architecture
5
Samoa Architecture
6
Samoa Architecture
  • The aim of Samoa is to provide implementation
    abstraction
  • For stream processing algorithms
  • Written using its API
  • Against the stream processing systems that it
    supports
  • So for instance, write an algorithm once and
  • Deploy to S4 and Storm
  • The deployment process creates a platform jar
  • That you can deploy to the specific platform
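
The "write once, deploy to several platforms" idea above can be sketched as follows. This is a conceptual model only: Platform, LocalPlatform and BatchingPlatform are hypothetical stand-ins and do not correspond to real Samoa, S4 or Storm classes.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical adapter for one stream processing engine."""

    @abstractmethod
    def run(self, algorithm, events):
        ...


class LocalPlatform(Platform):
    def run(self, algorithm, events):
        # Process events one at a time in the current process.
        return [algorithm(e) for e in events]


class BatchingPlatform(Platform):
    """Pretends to be a different engine that groups events first."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

    def run(self, algorithm, events):
        results = []
        for i in range(0, len(events), self.batch_size):
            batch = events[i:i + self.batch_size]
            results.extend(algorithm(e) for e in batch)
        return results


def double(event):
    # The "algorithm" is written once and knows nothing about the platform.
    return event * 2


events = [1, 2, 3, 4, 5]
local_result = LocalPlatform().run(double, events)
batched_result = BatchingPlatform(batch_size=2).run(double, events)
```

The algorithm is unchanged between the two runs; only the adapter differs, which is the role the platform-specific jar plays in the description above.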

7
Samoa Topology
8
Samoa Topology
  • Samoa provides a simple topology for stream
    processing
  • This includes the elements
  • Processor
  • Content Event
  • Stream
  • Task
  • Topology Builder
  • Learner
  • Processing Item

9
Samoa Processor
  • Processor is the basic logical processing unit
  • All logic is written in the processor
  • In Samoa, a Processor is an interface
  • Users can implement this interface
  • To build their own custom class
  • A processor in a Samoa topology can be either
  • A regular processor inside the topology, or
  • An entrance processor, which sources the stream
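
A minimal sketch of the two kinds of processor described above. The Processor class here is a simplified stand-in written in Python, and the method names process and events are assumptions made for illustration, not the Samoa interface itself.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Simplified stand-in for the Processor interface."""

    @abstractmethod
    def process(self, event):
        """Handle one event; return True if it was handled."""


class WordCountProcessor(Processor):
    """Custom processor: all of the logic lives inside process()."""

    def __init__(self):
        self.counts = {}

    def process(self, event):
        for word in event.split():
            self.counts[word] = self.counts.get(word, 0) + 1
        return True


class ListEntranceProcessor:
    """Entrance processor: it sources the stream instead of consuming one."""

    def __init__(self, items):
        self._items = list(items)

    def events(self):
        yield from self._items


source = ListEntranceProcessor(["a b", "b c"])
counter = WordCountProcessor()
for event in source.events():
    counter.process(event)
```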

10
Samoa Content Event
  • A message or an event is called a Content Event in
    Samoa
  • It is an event which contains content which
  • Needs to be processed by the processors
  • ContentEvent has been implemented as an
    interface in Samoa
  • Users need to implement the ContentEvent interface
  • To create their custom message classes
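
A custom message class of the kind described above might look like the sketch below. The ContentEvent base class is a simplified Python stand-in, and get_key and is_last_event are assumed method names used only to show the shape of such an interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ContentEvent(ABC):
    """Simplified stand-in for the ContentEvent interface."""

    @abstractmethod
    def get_key(self):
        """Key that could be used to route the event."""

    @abstractmethod
    def is_last_event(self):
        """True when this event marks the end of the stream."""


@dataclass
class SensorReading(ContentEvent):
    """Hypothetical custom message carrying one sensor value."""

    sensor_id: str
    value: float
    last: bool = False

    def get_key(self):
        return self.sensor_id

    def is_last_event(self):
        return self.last


reading = SensorReading("s1", 21.5)
```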

11
Samoa Stream
  • A stream is a physical unit of a Samoa topology
  • Which connects different Processors with each
    other
  • A stream is also created by a TopologyBuilder
  • Just like a Processor
  • A stream can have a single source but many
    destinations
  • A Processor which is the source of a stream owns
    the stream
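
The single-source, many-destination rule above can be modelled with a toy stream. All names here (Stream, Source, Collector, create_stream) are hypothetical; create_stream only stands in for the builder step that creates a stream for its owning processor.

```python
class Stream:
    """Toy stream: exactly one source, any number of destinations."""

    def __init__(self, source):
        self.source = source
        self.destinations = []

    def connect(self, processor):
        self.destinations.append(processor)

    def put(self, event):
        # Deliver the event to every connected destination.
        for processor in self.destinations:
            processor.process(event)


class Collector:
    """Destination processor that simply records what it receives."""

    def __init__(self):
        self.seen = []

    def process(self, event):
        self.seen.append(event)


class Source:
    """Source processor: it owns the stream it writes to."""

    def __init__(self):
        self.output = None

    def emit(self, event):
        self.output.put(event)


def create_stream(source):
    # Stands in for the builder creating a stream owned by `source`.
    stream = Stream(source)
    source.output = stream
    return stream


source = Source()
stream = create_stream(source)
first, second = Collector(), Collector()
stream.connect(first)
stream.connect(second)
source.emit("hello")
```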

12
Samoa Task
  • Task is similar to a job in Hadoop
  • Task is an execution entity
  • A topology must be defined inside a task
  • Samoa can only execute classes
  • That implement the Task interface
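
As a rough illustration of the points above, the sketch below defines a toy Task base class and a runner that refuses anything else. The names init, get_topology and execute are assumptions for this example and are not taken from the Samoa API.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Simplified stand-in for the Task interface."""

    @abstractmethod
    def init(self):
        """Build the topology."""

    @abstractmethod
    def get_topology(self):
        """Return the topology built by init()."""


class EchoTask(Task):
    def __init__(self):
        self.topology = None

    def init(self):
        # The topology is defined inside the task.
        self.topology = {"name": "echo", "processors": ["source", "printer"]}

    def get_topology(self):
        return self.topology


def execute(task):
    """Hypothetical runner: only Task implementations are accepted."""
    if not isinstance(task, Task):
        raise TypeError("only Task implementations can be executed")
    task.init()
    return task.get_topology()


topology = execute(EchoTask())

try:
    execute("not a task")
    rejected = False
except TypeError:
    rejected = True
```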

13
Samoa Topology Builder
  • TopologyBuilder is a builder class
  • Which builds physical units of the topology
  • And assembles them together
  • Each topology has a name
  • An example topology might have
  • An EntrancePI
  • Some PIs
  • Some streams
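
The example topology above (one entrance unit, some processing units, some streams) can be assembled with a toy builder. This TopologyBuilder is a Python sketch whose method names are assumptions; it only mirrors the idea of a named topology built step by step.

```python
class TopologyBuilder:
    """Toy builder; not the Samoa class of the same name."""

    def __init__(self, name):
        self.name = name
        self.entrance = None
        self.processors = []
        self.streams = []  # (source, destination) pairs

    def add_entrance_processor(self, processor):
        self.entrance = processor
        return self

    def add_processor(self, processor):
        self.processors.append(processor)
        return self

    def connect(self, source, destination):
        self.streams.append((source, destination))
        return self

    def build(self):
        # Check that every stream joins units that were actually added.
        known = set(self.processors) | {self.entrance}
        for source, destination in self.streams:
            if source not in known or destination not in known:
                raise ValueError(f"unknown unit in stream {source} -> {destination}")
        return {
            "name": self.name,
            "entrance": self.entrance,
            "processors": list(self.processors),
            "streams": list(self.streams),
        }


topology = (
    TopologyBuilder("word-count")
    .add_entrance_processor("reader")
    .add_processor("splitter")
    .add_processor("counter")
    .connect("reader", "splitter")
    .connect("splitter", "counter")
    .build()
)
```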

14
Samoa Learner
  • Learners are sub-topologies
  • Use the init() function to
  • Add streams
  • Add processors
  • Specify connections to the topology
  • Use the getInputProcessor() function to
  • Add the processor that will manage the input stream
  • Use the getResultStream() function to
  • Specify what is going to be the output stream
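
The three functions above can be pictured with a toy sub-topology. The sketch is plain Python: ToyBuilder and ToyLearner are hypothetical, and the snake_case method names only echo init(), getInputProcessor() and getResultStream() from the slide.

```python
class ToyBuilder:
    """Records what a learner adds; a stand-in for a topology builder."""

    def __init__(self):
        self.processors = []
        self.streams = []

    def add_processor(self, name):
        self.processors.append(name)
        return name

    def create_stream(self, source, destination=None):
        stream = (source, destination)
        self.streams.append(stream)
        return stream


class ToyLearner:
    """Sub-topology with one input processor and one result stream."""

    def init(self, builder):
        # Add processors and streams, and wire them together.
        self.model = builder.add_processor("model")
        self.aggregator = builder.add_processor("aggregator")
        builder.create_stream(self.model, self.aggregator)
        # The result stream has no destination yet; the caller connects it.
        self.result = builder.create_stream(self.aggregator)

    def get_input_processor(self):
        return self.model

    def get_result_stream(self):
        return self.result


builder = ToyBuilder()
learner = ToyLearner()
learner.init(builder)
```

The enclosing topology only needs the input processor and the result stream; everything else stays inside the learner.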

15
Samoa Processing Item
  • Processing Item is a hidden physical unit of the
    topology
  • It is just a wrapper around a Processor
  • It is used internally
  • It is not accessible from the API
  • Connects the Processor to the other processors
    in the topology
  • Simple Processing Item (PI)
  • Entrance Processing Item (EntrancePI)
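
The wrapper idea above can be sketched as follows. ProcessingItem and EntranceProcessingItem are toy Python classes written for this illustration; in Samoa the real equivalents are internal, so nothing here should be read as their actual implementation.

```python
class ProcessingItem:
    """Toy wrapper: holds a processor plus wiring the processor never sees."""

    def __init__(self, processor):
        self._processor = processor
        self._downstream = []

    def connect(self, other):
        self._downstream.append(other)

    def receive(self, event):
        # Run the wrapped logic, then pass the result downstream.
        result = self._processor(event)
        for item in self._downstream:
            item.receive(result)


class EntranceProcessingItem(ProcessingItem):
    """Wraps a source: it injects events rather than receiving them."""

    def __init__(self, source):
        super().__init__(lambda event: event)
        self._source = source

    def start(self):
        for event in self._source:
            self.receive(event)


collected = []
entrance = EntranceProcessingItem([1, 2, 3])
square = ProcessingItem(lambda x: x * x)
sink = ProcessingItem(collected.append)
entrance.connect(square)
square.connect(sink)
entrance.start()
```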

16
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data
    Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

17
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration