Machine Learning Using Spark Online Training - PowerPoint PPT Presentation

About This Presentation
Title:

Machine Learning Using Spark Online Training

Description:

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management courses. We are dedicated to designing, developing, and implementing training programs for students, corporate employees, and business professionals.

Learn more at: http://www.learntek.org


Transcript and Presenter's Notes



1
  • MACHINE LEARNING USING SPARK

2
  • The following topics will be covered in our
    Machine Learning Using Spark online training.

3
What is Machine Learning?
  • Machine learning is an application of artificial
    intelligence (AI) that gives systems the ability
    to learn and improve from experience
    automatically, without being explicitly
    programmed. It focuses on the development of
    computer programs that can access data and use
    it to learn for themselves.

4
Intro to Machine Learning Using Spark
  • MLlib is Spark's machine learning (ML) library.
    Its goal is to make practical machine learning
    scalable and easy. At a high level, it provides
    tools such as:
  • ML Algorithms: common learning algorithms such as
    classification, regression, clustering, and
    collaborative filtering
  • Featurization: feature extraction,
    transformation, dimensionality reduction, and
    selection
  • Pipelines: tools for constructing, evaluating,
    and tuning ML Pipelines
  • Persistence: saving and loading algorithms,
    models, and Pipelines
  • Utilities: linear algebra, statistics, data
    handling, etc.

5
Tools
  • This course will be delivered using the Scala and
    Python APIs. The R language will also be used to
    explain statistical concepts. Visualization will
    be covered using the Bokeh and ggplot libraries.

6
Introduction to Apache Spark
  • Spark Programming model
  • RDD and DataFrame
  • Transformations and Actions
  • Broadcast Variables and Accumulators
  • Running HDP on a local machine
  • Launching a Spark cluster

7
Basic Statistics 
  • Mean, Mode, Median, Range, Variance, Standard
    Deviation, Quartiles, Percentiles
  • Sampling, Sampling Methods, Sampling Errors
  • Probability Distributions: Normal distribution,
    t-distribution, Chi-square, F
  • Margin of Error, Confidence Interval,
    Significance Level, Degrees of Freedom
  • Hypothesis concept, Type I and Type II errors
  • P-value, t-Test, Chi-square Test
  • Correlation Coefficient

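The descriptive statistics listed above can be tried out in a few lines. This is a minimal sketch in plain Python using the standard library `statistics` module, not the Spark API; the sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)             # arithmetic mean
median = statistics.median(sample)         # middle value of the sorted data
mode = statistics.mode(sample)             # most frequent value
value_range = max(sample) - min(sample)    # largest minus smallest value
variance = statistics.pvariance(sample)    # population variance
std_dev = statistics.pstdev(sample)        # population standard deviation
quartiles = statistics.quantiles(sample, n=4)  # the three quartile cut points

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("quartiles:", quartiles)
```

The population forms (`pvariance`, `pstdev`) divide by n; `statistics.variance` and `statistics.stdev` divide by n - 1 and are the usual choice when the data is a sample drawn from a larger population.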
8
Machine Learning Using Spark
  • Introduction to Spark MLlib
  • Data types: Vector, LabeledPoint
  • Feature Extraction
  • Feature Transformation, Normalization
  • Feature Selectors
  • Locality Sensitive Hashing (LSH)

9
Regression Analysis with Spark
  • Types of Regression Models
  • Gradient Descent
  • Linear Regression, Generalized Linear Regression
  • MSE, RMSE, MAE, R-squared coefficient
  • Transforming the target variable
  • Tuning Model Parameters
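
To make gradient descent and the error metrics above concrete, here is a minimal sketch in plain Python rather than Spark. It fits a one-variable linear model by batch gradient descent and then reports MSE, RMSE, and MAE. The data, learning rate, step count, and function names are assumptions made for illustration.

```python
import math

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_linear(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def evaluate(xs, ys, w, b):
    """Return MSE, RMSE and MAE of the fitted line on the given data."""
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


w, b = fit_linear(xs, ys)
metrics = evaluate(xs, ys, w, b)
print("w =", round(w, 4), "b =", round(b, 4), metrics)
```

The learning rate is the main parameter to tune here: too large and the updates diverge, too small and convergence needs many more steps.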

10
Classification Model with Spark
  • Linear Models, Naive Bayes Model, Decision Tree
  • Logistic Regression
  • Linear Support Vector Machine
  • Random Forest
  • Gradient-Boosted Trees
  • Training Classification Models
  • Accuracy and prediction error
  • Precision and Recall
  • ROC curve and AUC
  • Cross validation

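Accuracy, precision, and recall all come from the same four confusion-matrix counts. The sketch below computes them in plain Python for a binary classifier. The label lists and the function name are hypothetical, and this is not the Spark evaluator API.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical ground-truth labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)   # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are real
recall = tp / (tp + fn)              # share of real positives that were found

print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```

Prediction error is simply 1 minus accuracy. Precision and recall matter most when the classes are imbalanced, where accuracy alone can look good for a model that never predicts the rare class.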
11
Clustering 
  • Hierarchical clustering
  • K-means clustering
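
K-means alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The following is a minimal plain-Python sketch of that loop, not Spark's implementation. The points, starting centroids, and function name are assumptions for illustration.

```python
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds and return centroids and clusters."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]   # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial_centroids = [(1, 1), (8, 8)]

centroids, clusters = k_means(points, initial_centroids)
print("centroids:", centroids)
print("clusters:", clusters)
```

Real implementations usually pick the starting centroids randomly or with a seeding scheme, and stop when the assignments no longer change instead of running a fixed number of rounds.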

12
Dimensionality Reduction
  • Principal Component Analysis
  • Singular Value Decomposition
  • Clustering as dimensionality reduction
  • Training a dimensionality reduction model
  • Evaluating dimensionality reduction models

13
Recommendation Engine
  • Content based filtering
  • Collaborative filtering
  • Overview of MovieLens data
  • Training a recommendation model
  • Using the recommendation model
  • Performance Evaluation
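
The idea behind collaborative filtering can be sketched without Spark: predict a user's rating for an unseen movie from the ratings of similar users. The example below is a neighborhood-based sketch using cosine similarity in plain Python. It is not the ALS matrix factorization that Spark MLlib provides, and the users, movies, and ratings are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie id: rating}.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob": {"m1": 4.0, "m2": 3.0, "m3": 5.0, "m4": 4.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts, treating missing ratings as zero."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    total_similarity = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[movie]
        total_similarity += similarity
    return weighted_sum / total_similarity if total_similarity else None


sim_bob = cosine_similarity(ratings["alice"], ratings["bob"])
sim_carol = cosine_similarity(ratings["alice"], ratings["carol"])
prediction = predict("alice", "m4")
print("similarity to bob:", sim_bob, "to carol:", sim_carol)
print("predicted rating of m4 for alice:", prediction)
```

Because alice's ratings resemble bob's more than carol's, the prediction lands closer to bob's rating of the movie than to carol's.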

14
Text Processing
  • Feature Hashing
  • TF-IDF model
  • Tokenization
  • Stop words
  • TF-IDF Weightings
  • Training a TF-IDF model
  • Usage of TF-IDF model
  • Evaluating TF-IDF models

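Tokenization, stop-word removal, and TF-IDF weighting fit together as a short pipeline. The sketch below shows that pipeline in plain Python, not the Spark API. It uses the smoothed inverse document frequency log((m + 1) / (df + 1)), which is the form documented for Spark MLlib's IDF. The documents, stop-word list, and function names are hypothetical.

```python
import math
from collections import Counter

# Tiny hypothetical stop-word list, for illustration only.
STOP_WORDS = {"is", "a", "the", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document."""
    m = len(tokenized_docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    # Smoothed IDF, so a term found in every document gets weight 0.
    idf = {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized_docs
    ]


docs = [
    "Spark is a fast engine",
    "Spark MLlib is a library for machine learning",
    "machine learning is fun",
]
tokenized = [tokenize(doc) for doc in docs]
weights = tf_idf(tokenized)

for doc, doc_weights in zip(docs, weights):
    print(doc, "->", doc_weights)
```

Terms that appear in fewer documents, such as "fast", receive a higher weight than terms shared across documents, such as "spark".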
15
Prerequisites
  • Prior understanding of exploratory data analysis
    and data visualization will help immensely in
    learning machine learning concepts and
    applications. This includes basic statistical
    techniques for data analysis. Some knowledge of
    R programming or of Python packages such as
    scikit-learn and NumPy will be useful. However,
    we will cover basic statistical techniques as
    part of this course before going deep into
    machine learning, so that everyone can gain the
    most from the course.
