Data Science Concept by Raj Krishna Paul - PowerPoint PPT Presentation

Description:

Gives a clear overview of what data science is, why it is an emerging area of research and jobs, and how to get started.

Transcript and Presenter's Notes


1
Data Science Concept
  • Raj Krishna Paul, B.S. Engg. (USA)
  • Team Lead, Verizon Data Service, India
  • Email: rajkr.paul_at_gmail.com
  • Subir Paul, B.Tech., M.Tech., Ph.D.
  • Professor, Faculty of Engg., Jadavpur University,
    India
  • Email: spaul_at_metal.juvu.ac.in

2
Data Science Visualization
3
Why Data Science
  • The mathematical relationship between an output and
    its several input parameters is often not known, e.g.
    for the stock market, health data, human activity,
    and mobile activity
  • The relationship is very complicated because many
    interrelated parameters are involved
  • With the advent of Big Data, statistics, and
    programming, we model a hypothesis, test it, and
    train it until the output is predicted with minimum
    error, developing a predictive relationship
  • The larger the volume of available data, the higher
    the accuracy generally is
  • It is an emerging area of study, as Big Data is
    available in all spheres of science, engineering,
    economics, and social affairs

4
What it can Predict
  • Stock market share prices by date and time, type of
    industry, commodity, people, country, and city
  • People's behavior and trends in buying commodities,
    use of mobile data plans, and investment
  • Damage and loss due to natural calamities
  • Relationship between bank products and types of
    people in different regions, countries, and cities
  • Life prediction of big structures in corrosive
    environments

5
How does it help Big Industries
  • Guides big entrepreneurs in planning and deciding
    which way to go, and on which products they can
    raise prices while still making a profit
  • Identifies measures a government can take to reduce
    the loss of life and property due to natural calamities
  • Supports development of high-strength, resistant
    future materials to prevent unpredictable structural
    failures in aeroplanes, bridges, and ships

6
Data Science: How it is done
  • Collection of Big Data from the web and various
    other data sources
  • Data integrity: manage missing data, duplicate
    data, out-of-date data, and inconsistent data (e.g.
    multiple addresses for one person, negative salary),
    and convert date-time values from character format
    to numeric
  • Data cleaning: missing files, smoothing data,
    filtering, sampling
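
The integrity checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' own procedure: the record fields (`name`, `salary`, `joined`), the date format, and the choice to fill missing salaries with the median are all assumptions made for the example.

```python
from datetime import datetime, timezone
from statistics import median


def clean_records(records):
    """Apply a few integrity checks to a list of record dicts (hypothetical schema)."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["name"], rec["joined"])
        if key in seen:  # duplicate data: keep only the first copy
            continue
        seen.add(key)

        salary = rec.get("salary")
        if salary is not None and salary < 0:  # inconsistent data: negative salary
            continue

        # Date-time in character format -> numeric (seconds since epoch, UTC assumed).
        joined = datetime.strptime(rec["joined"], "%Y-%m-%d %H:%M")
        joined_ts = joined.replace(tzinfo=timezone.utc).timestamp()
        cleaned.append({"name": rec["name"], "salary": salary, "joined_ts": joined_ts})

    # Missing data: fill absent salaries with the median of the known ones.
    known = [r["salary"] for r in cleaned if r["salary"] is not None]
    if known:
        fill = median(known)
        for r in cleaned:
            if r["salary"] is None:
                r["salary"] = fill
    return cleaned


# Made-up records for illustration only.
raw = [
    {"name": "A", "salary": 50000, "joined": "2020-01-01 09:00"},
    {"name": "A", "salary": 50000, "joined": "2020-01-01 09:00"},  # duplicate
    {"name": "B", "salary": -100, "joined": "2021-03-15 10:30"},   # negative salary
    {"name": "C", "salary": None, "joined": "2019-07-01 08:00"},   # missing salary
    {"name": "D", "salary": 70000, "joined": "2018-05-20 12:00"},
]
cleaned = clean_records(raw)
```

Dropping an inconsistent record, as done here for negative salaries, is only one option; flagging it for manual review would be equally reasonable.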

7
Big Data Sources
It's All Happening On-line
  • User generated (web and mobile)
  • Every click, ad impression, billing event, fast
    forward, pause, server request, transaction,
    network message, fault, ...
  • Internet of Things / M2M
  • Health / scientific computing
8
  • Make a subset of the Big Data that is of interest
    to the party or firm commissioning the investigation
  • Randomly select samples of data frames
  • Apply statistical laws and equations to fit the
    scattered data to a known distribution, e.g. the
    normal or Poisson distribution
  • Make graphics and visual representations of the
    results to study them and find linear or nonlinear
    relationships
  • Make a hypothesis relating input and output
  • Test the hypothesis with data; if it fails, modify
    it
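
The sampling and distribution-fitting steps above might look like the following sketch. The population is synthetic (drawn from a normal distribution with mean 100 and standard deviation 15), and the sample size of 500 is arbitrary. Both are assumptions for illustration, standing in for a real subset of Big Data.

```python
import random
from statistics import NormalDist


def sample_and_fit(population, sample_size, seed=0):
    """Draw a random sample and fit a normal distribution to it."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    fitted = NormalDist.from_samples(sample)  # estimates mean and standard deviation
    return sample, fitted


# Synthetic "big data": 10,000 values from a known normal distribution.
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

sample, fitted = sample_and_fit(population, 500)

# Simple check of the hypothesis "the data are roughly normal":
# about 68% of values should lie within one standard deviation of the mean.
within_one_sd = sum(abs(x - fitted.mean) <= fitted.stdev for x in sample) / len(sample)
print(f"mean={fitted.mean:.1f} sd={fitted.stdev:.1f} within 1 sd={within_one_sd:.0%}")
```

If the share within one standard deviation were far from 68%, the normal hypothesis would be rejected and another distribution tried, following the "if it fails, modify it" step.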

9
Statistical Tests
  • t-test, chi-square tests, identity of
    samples
  • Distributions: normal, binomial, Poisson
  • Mean, mode, median, variance, standard deviation (SD)
  • Correlation, regression
  • ANOVA/MANOVA, fitting a model
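
Several of the quantities listed above can be computed with Python's standard library alone. The sketch below covers the descriptive statistics, the Pearson correlation coefficient, and a least-squares regression line; the numbers are invented for the example, and the helper names `describe` and `correlate_and_fit` are hypothetical.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance


def describe(values):
    """Descriptive statistics: mean, median, mode, sample variance, and SD."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "variance": variance(values),
        "sd": stdev(values),
    }


def correlate_and_fit(xs, ys):
    """Pearson correlation r and the least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# Illustrative data only.
summary = describe([2, 4, 4, 5, 7, 9])
r, slope, intercept = correlate_and_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
print(summary)
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
```

A correlation close to 1 suggests a linear model is a sensible first hypothesis for these points. The t-test, chi-square test, and ANOVA are not shown here.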

10
Tools for Statistical Modeling
  • Genetic Algorithm (GA): the inputs are the genes and
    the outputs the chromosomes, i.e. the combination of
    the best genes to produce a product
  • Artificial Neural Network (ANN): a model is
    trained with, say, 60% of the data, tested with 20%,
    and used to predict the remaining 20%. Each time,
    the error between prediction and actual value is
    reduced by modifying the NN architecture until a
    global minimum is reached
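
The train/test/predict workflow described for the ANN can be illustrated with the smallest possible network, a single linear neuron. This is a sketch under several assumptions: the 60/20/20 figures are read as percentages of the data, the data are synthetic (y = 2x + 1), and the error is reduced by adjusting the neuron's weight and bias with gradient descent rather than by changing the architecture.

```python
import random


def split_60_20_20(data, seed=0):
    """Shuffle and split into 60% training, 20% test, 20% held-out prediction sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(0.6 * len(shuffled))
    n_test = int(0.2 * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


def mse(w, b, rows):
    """Mean squared error between the neuron's prediction w*x + b and the actual y."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def train_neuron(train, lr=0.3, epochs=2000):
    """Fit one linear neuron by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w  # each step reduces the prediction error
        b -= lr * grad_b
    return w, b


# Synthetic data: 100 points on the line y = 2x + 1, with x in [0, 1).
data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]
train, test, holdout = split_60_20_20(data)

w, b = train_neuron(train)
test_error = mse(w, b, test)
predictions = [w * x + b for x, _ in holdout]
print(f"w={w:.3f} b={b:.3f} test MSE={test_error:.2e}")
```

For a single linear neuron the squared error has only one minimum, so gradient descent reaches the global minimum. Multi-layer networks usually have many local minima, and a global minimum is not guaranteed.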

11
Data Science Programming
  • R Programming
  • Python Programming
  • SAS Programming

12
Final Delivery
  • Finally, a predictive model is delivered
  • It accurately predicts the output of a commodity or
    product
  • It helps the production and marketing units of a
    company take the right steps to carry the business
    forward with higher profits