Apache MADlib AI/ML - PowerPoint PPT Presentation

About This Presentation
Title:

Apache MADlib AI/ML

Description:

This presentation gives an overview of the Apache MADlib AI/ML project. It explains Apache MADlib AI/ML in terms of its functionality, architecture, and dependencies, and also gives an SQL example. Links for further information and for connecting with the author are included.

Slides: 19
Provided by: semtechs


Transcript and Presenter's Notes



1
What Is Apache MADlib?
  • For scalable in-database analytics
  • Open source Apache 2.0 license
  • For machine learning in SQL
  • At big data scale
  • Offers graph, statistics, analytics, and deep learning
  • Provides data-parallel implementations
  • For structured and unstructured data

2
MADlib Prerequisites
  • Currently supported databases
  • PostgreSQL
  • Needs Python extension specified
  • Greenplum (distributed db)
  • Apache HAWQ (v1.12) (distributed db)
  • Requires the GNU M4 Unix macro processor
  • Works with Python 2.6 and 2.7

3
MADlib Architecture
4
MADlib Architecture
  • MADlib has three main layers
  • Python driver functions
  • Main entry point from user input
  • Largely responsible for algorithm flow control
  • Validating input parameters
  • Executing SQL statements
  • Evaluating the results
  • Potentially looping to execute more SQL statements
  • Until a convergence criterion has been met
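
The driver flow listed above (validate, execute SQL, evaluate, loop until convergence) can be sketched in a few lines. This is only an illustration and not MADlib code: it uses Python 3 with the standard-library sqlite3 module in place of PostgreSQL or Greenplum, and the function name `fit_slope`, the table `points`, and the step and tolerance values are all assumptions made for the example.

```python
import sqlite3


def fit_slope(conn, table, x_col, y_col, step=0.01, tol=1e-9, max_iter=1000):
    """Hypothetical driver: fit y ~ w * x by gradient descent, one SQL aggregate per pass."""
    # 1. Validate input parameters before touching the database.
    for name in (table, x_col, y_col):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    if step <= 0 or tol <= 0 or max_iter < 1:
        raise ValueError("step, tol and max_iter must be positive")

    w = 0.0
    for iteration in range(1, max_iter + 1):
        # 2. Execute SQL: the database computes the gradient as an aggregate.
        (grad,) = conn.execute(
            f"SELECT AVG(2.0 * (? * {x_col} - {y_col}) * {x_col}) FROM {table}",
            (w,),
        ).fetchone()
        if grad is None:
            raise ValueError("table is empty")
        # 3. Evaluate the result and decide whether another pass is needed.
        new_w = w - step * grad
        if abs(new_w - w) < tol:
            return new_w, iteration
        w = new_w
    return w, max_iter


# Toy data following y = 3x (assumed for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)", [(1, 3), (2, 6), (3, 9), (4, 12)])
w, iterations = fit_slope(conn, "points", "x", "y")
print(f"slope={w:.4f} after {iterations} iterations")
```

The heavy work (the aggregate over all rows) stays inside the database, while the Python side only carries a single number between passes. That division of labour is what the bullets describe.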

5
MADlib Architecture
  • MADlib has three main layers
  • C++ implementation functions
  • C++ definitions of the core functions/aggregates
  • Needed for particular algorithms
  • Implemented in C++ rather than Python
  • For performance reasons

6
MADlib Architecture
  • MADlib has three main layers
  • C++ database abstraction layer
  • Provides a programming interface
  • Abstracts all the Postgres internal details
  • Provides support for different back end platforms
  • Focuses on the internal functionality
  • Rather than the platform integration logic

7
MADlib Data Types and Transformations
  • Arrays and Matrices
  • Encoding Categorical Variables
  • Path
  • Pivot
  • Sessionize
  • Stemming

8
MADlib Graph Functionality
  • All Pairs Shortest Path
  • Breadth-First Search
  • HITS
  • Measures
  • PageRank
  • Single Source Shortest Path
  • Weakly Connected Components

9
MADlib Model Selection / Sampling
  • Model Selection
  • Cross Validation
  • Prediction Metrics
  • Train-Test Split
  • Sampling
  • Balanced Sampling
  • Stratified Sampling

10
MADlib Statistics / Supervised Learning
  • Statistics
  • Descriptive Statistics
  • Inferential Statistics
  • Probability Functions
  • Supervised Learning
  • Conditional Random Field
  • k-Nearest Neighbors
  • Neural Network
  • Regression Models
  • Support Vector Machines
  • Tree Methods

11
MADlib Time Series / Unsupervised Learning
  • Time Series Analysis
  • ARIMA
  • Unsupervised Learning
  • Association Rules
  • Clustering
  • Dimensionality Reduction
  • Topic Modelling

12
MADlib Utilities
  • Columns to Vector
  • Database Functions
  • Linear Solvers
  • Mini-Batch Preprocessor
  • PMML Export
  • Term Frequency
  • Vector to Columns

13
MADlib Deep Learning Example SQL
  • First define the model configurations to train
  • Meaning either model architectures or hyperparameters
  • Load them into a model selection table
  • The combination of model architectures and hyperparameters
  • Constitutes the model configurations to train
  • In the picture there are three model configurations
  • Represented by the three different purple shapes
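
As a rough illustration of a model selection table, the sketch below crosses a list of architecture ids with lists of hyperparameter strings and stores one row per combination. It is not MADlib's own loader: it uses Python 3 with the standard-library sqlite3 module, and the function name `build_config_table`, the table and column names, and the parameter strings are all hypothetical.

```python
import itertools
import sqlite3

# Hypothetical architecture ids and hyperparameter strings, for illustration only.
arch_ids = [1]
compile_params = ["optimizer=adam,lr=0.1", "optimizer=adam,lr=0.01", "optimizer=adam,lr=0.001"]
fit_params = ["batch_size=32,epochs=1"]


def build_config_table(conn, arch_ids, compile_params, fit_params):
    """Store one row per (architecture, compile params, fit params) combination."""
    conn.execute(
        "CREATE TABLE model_selection ("
        "config_id INTEGER PRIMARY KEY, arch_id INTEGER, "
        "compile_params TEXT, fit_params TEXT)"
    )
    # The cross product of the three lists gives the configurations to train.
    rows = itertools.product(arch_ids, compile_params, fit_params)
    conn.executemany(
        "INSERT INTO model_selection (arch_id, compile_params, fit_params) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return conn.execute("SELECT COUNT(*) FROM model_selection").fetchone()[0]


conn = sqlite3.connect(":memory:")
n_configs = build_config_table(conn, arch_ids, compile_params, fit_params)
for row in conn.execute("SELECT * FROM model_selection ORDER BY config_id"):
    print(row)
```

With one architecture and three hyperparameter settings this yields three rows, matching the three configurations mentioned for the picture. Adding a second architecture id would double the row count.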

14
MADlib Deep Learning Example SQL
15
MADlib Deep Learning Example SQL
  • Once we have model combinations
  • In the model selection table
  • Call the fit function to train the models
  • In parallel
  • In the picture the three orange shapes
  • Represent the three models that have been trained
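
The idea of one fit call training several configurations at once can be sketched as below. This is a toy stand-in and not MADlib's fit function: Python 3 threads play the role of whatever parallel workers the database provides, the "model" is a single slope fitted by gradient descent, and the names `fit_one`, `fit_all`, `DATA`, and `CONFIGS` are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data following y = 2x, and three hypothetical configurations (learning rate only).
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
CONFIGS = [
    {"config_id": 1, "lr": 0.01},
    {"config_id": 2, "lr": 0.05},
    {"config_id": 3, "lr": 0.1},
]


def fit_one(config, epochs=50):
    """Train one toy model y ~ w * x with gradient descent and report its loss."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in DATA) / len(DATA)
        w -= config["lr"] * grad
    loss = sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)
    return {"config_id": config["config_id"], "w": w, "loss": loss}


def fit_all(configs):
    """Train every configuration concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(configs)) as pool:
        return list(pool.map(fit_one, configs))


results = fit_all(CONFIGS)
best = min(results, key=lambda r: r["loss"])
for r in results:
    print(r["config_id"], round(r["w"], 4), r["loss"])
print("best config:", best["config_id"])
```

Each configuration yields one trained model, so three configurations in give three models out, as in the picture. Picking the lowest-loss result afterwards is an extra step added here for illustration.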

16
MADlib Deep Learning Example SQL
17
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

18
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology-based issues
  • Big data integration