Big Data Analytics - PowerPoint PPT Presentation

Title:

Big Data Analytics

Description:

Big Data Analytics research issues – PowerPoint PPT presentation

Number of Views:3298
Slides: 37
Provided by: vtyagi4u


Title: Big Data Analytics


1
Big Data Analytics
Ph.D. Research Scholar: Vikas Kumar (201651001),
SEAS, Ahmedabad University, Ahmedabad,
Vikas.tyagi_at_ahduni.edu.in
Supervisor: Prof. Sanjay Chaudhary, SEAS,
Ahmedabad University, Ahmedabad,
sanjay.chaudhary_at_ahduni.edu.in
2
OUTLINE
  • INTRODUCTION
  • DATA ANALYTICS
  • BIG DATA ANALYTICS
  • OPEN RESEARCH ISSUES
  • CONCLUSIONS
  • REFERENCES

3
INTRODUCTION
  • (Big data analytics)

4
Big Data Definition
  • (Fisher et al.)
  • Big data is data that cannot be handled and
    processed by most current information systems or
    methods.
  • Most traditional data mining and data analytics
    methods were developed for a centralized data
    analysis process and may not be directly
    applicable to big data.

5
Big Data Definition (Cont)
  • (Laney et al.)
  • A well-known definition of Big Data is the 3Vs:
  • Volume (the data is huge)
  • Velocity (the data changes over time and arrives
    at high speed)
  • Variety (the data comes from multiple sources in
    multiple forms)

6
Big Data Definition (Cont)
  • (Latest Enhanced Definition)
  • The 3Vs definition was incomplete, so the
    following dimensions were added to it:
  • Veracity
  • Validity
  • Value
  • Variability
  • Venue
  • Vocabulary and
  • Vagueness
  • Data that satisfies all of these properties is
    known as Big Data.

7
Sources of Big Data
  • (Big data analytics)

8
https//www.google.de/search?qevolutionofbusine
ssintelligencenewwindow1tbmischtbousource
univsaXeigEGoU5KXBuTb4QSGsoH4BQved0CDsQsAQb
iw1366bih64
10
Application domains of Big Data
11
http://www.meltinfo.com/ppt/ibm-big-data
12
Big Data in Business Intelligence
13
The Evolution of Business Intelligence
(Figure: scale of business intelligence across the
1990s, 2000s and 2010s)
https//www.google.de/search?qevolutionofbusine
ssintelligencenewwindow1tbmischtbousource
univsaXeigEGoU5KXBuTb4QSGsoH4BQved0CDsQsAQb
iw1366bih64
14
  • OLTP: Online Transaction Processing (DBMSs)
  • OLAP: Online Analytical Processing (Data
    Warehousing)
  • RTAP: Real-Time Analytics Processing (Big Data
    architecture and technology)
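The three processing styles above can be contrasted in a few lines. This is only an illustrative sketch: the `sales` table, its columns, and the sample events are hypothetical, and an in-memory SQLite database stands in for a real DBMS, warehouse, or streaming engine.

```python
import sqlite3

# Hypothetical sample events: (region, amount)
events = [("north", 10.0), ("south", 20.0), ("north", 5.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# OLTP style: many small transactional writes
for region, amount in events:
    with conn:  # each insert is committed as its own transaction
        conn.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))

# OLAP style: one analytical query over the accumulated data
olap_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# RTAP style: the aggregate is updated as each event arrives,
# without waiting for a later batch query
rtap_totals = {}
for region, amount in events:
    rtap_totals[region] = rtap_totals.get(region, 0.0) + amount

conn.close()
```

Both `olap_totals` and `rtap_totals` end up with the same per-region sums; the difference lies in when the aggregation work is done.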
15
Big data in design and engineering
  • Engineering departments of manufacturing
    companies.
  • Boeing's new 787 aircraft is perhaps the best
    example of Big Data in the design and manufacture
    of a plane.
  • Big Data needs to be transferred for conversion
    into machining-related information to allow the
    product to be manufactured.

16
Reasons for the importance of Big Data
  • Increase innovation and development of next
    generation product
  • Improve customer satisfaction
  • Sharpen competitive advantages
  • Create more narrow segmentation of customers
  • Reduce downtime

17
Cloud and big data
  • In fact, from a cloud perspective, I believe that
    the transfer and archiving of Big Data will
    become a key capability of a
    manufacturing-focused cloud environment.
  • Servers based on the Intel Xeon processor E5
    and E7 families are at the heart of
    infrastructure that supports both cloud and big
    data environments.
  • Ideal for storing and processing large volumes of
    data
  • Web based tools will allow you to upload your Big
    Data to the manufacturing cloud, 

18
Big data in Ecommerce
  • Collect, store and organize data from multiple
    data sources.
  • Track and better understand a variety of
    information from many different sources (e.g.,
    inventory management system, CRM, AdWords/AdSense
    analytics, email service provider statistics,
    etc.).

19
Big Data and HPC Software systems
20
There are a lot of Big Data and HPC software
systems in 17 (21) layers. Build on, do not
compete with, the 293 HPC-ABDS systems.
21
Functionality of 21 HPC-ABDS Layers
  1. Message Protocols
  2. Distributed Coordination
  3. Security & Privacy
  4. Monitoring
  5. IaaS Management from HPC to hypervisors
  6. DevOps
  7. Interoperability
  8. File systems
  9. Cluster Resource Management
  10. Data Transport
  11. A) File management B) NoSQL C) SQL
  12. In-memory databases & caches / Object-relational
    mapping / Extraction Tools
  13. Inter-process communication: collectives,
    point-to-point, publish-subscribe, MPI
  14. A) Basic Programming model and runtime, SPMD,
    MapReduce B) Streaming
  15. A) High level Programming B) Frameworks
  16. Application and Analytics
  17. Workflow-Orchestration

Here are 21 functionalities (including the subparts
of 11, 14 and 15). Let's discuss how these are used
in particular applications: 4 cross-cutting at top,
17 in order of the layered diagram starting at
bottom.
23
Software for a Big Data Initiative
  • Functionality of ABDS and Performance of HPC
  • Workflow: Apache Crunch, Python or Kepler
  • Data Analytics: Mahout, R, ImageJ, ScaLAPACK
  • High level Programming: Hive, Pig
  • Batch Parallel Programming model: Hadoop, Spark,
    Giraph, Harp, MPI
  • Streaming Programming model: Storm, Kafka or
    RabbitMQ
  • In-memory: Memcached

24
Software for a Big Data Initiative (Cont)
  • Data Management: HBase, MongoDB, MySQL
  • Distributed Coordination: ZooKeeper
  • Cluster Management: YARN, Slurm
  • File Systems: HDFS, Object store (Swift), Lustre
  • DevOps: Cloudmesh, Chef, Puppet, Docker, Cobbler
  • IaaS: Amazon, Azure, OpenStack, Docker, SR-IOV
  • Monitoring: Inca, Ganglia, Nagios

25
SIX Forms of MapReduce
  • MR: Basic Statistics
  • PP: Local Analytics
  • Iterative
  • Graph
  • Streaming
  • Shared Memory
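As a rough illustration of the first form (basic statistics with MapReduce), the sketch below computes a per-key mean in three explicit phases. It runs in a single process; the function names and the (crop, yield) sample records are hypothetical and not taken from any framework.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: turn one (key, value) record into (key, (sum, count))."""
    key, value = record
    return key, (value, 1)


def shuffle(pairs):
    """Shuffle: group the mapped partial results by key."""
    groups = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_phase(partials):
    """Reduce: add up partial sums and counts, then return the mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)
    return total / count


def mean_by_key(records):
    groups = shuffle(map(map_phase, records))
    return {key: reduce_phase(partials) for key, partials in groups.items()}


# Hypothetical sample: (crop, yield) pairs
records = [("wheat", 3.0), ("rice", 4.0), ("wheat", 5.0), ("rice", 6.0)]
means = mean_by_key(records)
```

In a real deployment the map and reduce functions would run on different nodes and the shuffle would move data over the network; here everything is local to keep the structure visible.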
26
Big Data in Agriculture Recommendation System
27
Required BDA of ARS
  • Data Sources:
  • Geo Spatial Data Analytics (Agro-Eco zones and
    water resources)
  • Price Data from different APMCs
  • Crop yield Data from government agencies
  • Knowledge bases (Ontologies)
  • Analytics Required:
  • Suitable crop pattern identification in a region
  • Disease Identification in crop
  • Recommendations based on observations

28
  • Analytics Required (cont)
  • Machine Learning algorithms development for BDA
  • Inferencing engine for recommendation generation
  • Different Analytics service development for Data
    integration and communication.

29
Open Research Issues
  • Service development and advanced machine
    learning for a Big Data Analytics system will be
    entirely different from the development of a
    conventional information system.
  • Big Data cannot be handled on a centralized
    system; hence parallel algorithms should be
    designed to run in a BDA environment.
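To make the idea of a parallel algorithm concrete, here is a minimal sketch of a mean and variance computation split across workers. It assumes that threads stand in for separate nodes, and the function names are illustrative only. Each worker sees just its own chunk and returns a small summary, and only the summaries are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Local step on one worker: count, sum and sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def parallel_mean_variance(data, workers=4):
    """Split data into chunks, summarize each in parallel, merge summaries."""
    if not data:
        raise ValueError("data must not be empty")
    size = -(-len(data) // workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance


mean, variance = parallel_mean_variance([1, 2, 3, 4, 5, 6, 7, 8])
```

The sum-of-squares formula is used because it merges trivially; it can lose precision on large or badly scaled data, where a numerically stabler merge would be preferable.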

30
Open Research Issues (Cont)
  • Platform and framework perspective
  • Input and output ratio of platform: the
    assumption of infinite computing resources is
    thoroughly impractical.
  • Communication between systems: a Big Data
    Analytics system should be able to integrate the
    data and analytics from different subsystems, and
    the communication cost needs to be optimized (a
    typical cost optimization problem).
  • Bottleneck on data analytics systems: the data
    deluge of big data will fill up the input of the
    data analytics system and increase the
    computation load of data analysis.
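One common way to cut communication cost between nodes is to aggregate locally before sending anything. The sketch below is a toy model under stated assumptions: a "message" is one (word, count) pair crossing the network, and the partitions are hypothetical. It compares the number of messages with and without such a local combiner.

```python
from collections import Counter


def shuffle_without_combiner(partitions):
    """Every (word, 1) pair is sent to the reducer individually."""
    return [(word, 1) for part in partitions for word in part]


def shuffle_with_combiner(partitions):
    """Each node pre-aggregates and sends one pair per distinct word."""
    messages = []
    for part in partitions:
        messages.extend(Counter(part).items())
    return messages


def reduce_counts(messages):
    """Reducer: add up the counts received for each word."""
    totals = Counter()
    for word, n in messages:
        totals[word] += n
    return dict(totals)


# Hypothetical data held by two nodes
partitions = [["a", "b", "a", "a"], ["b", "b", "c", "b"]]
raw = shuffle_without_combiner(partitions)
combined = shuffle_with_combiner(partitions)
```

Here `raw` holds 8 messages and `combined` holds 4, yet both reduce to the same totals. The saving grows with the number of repeated keys per node.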

31
Open Research Issues (Cont)
  • Platform and framework perspective
  • Bottleneck on data analytics systems: one
    current solution for avoiding bottlenecks in a
    data analytics system is to add more computing
    resources, while the other is to split the
    analysis work across different computation nodes.
    BDA needs a complete consideration of the whole
    data analytics process to avoid the bottleneck.
  • Security Issues

32
Open Research Issues (Cont)
  • Data Mining Perspective
  • Data mining algorithms for MapReduce solutions:
    most traditional data mining algorithms are not
    designed for parallel computing; therefore, they
    are not particularly useful for Big Data mining.
    We need to design new algorithms or modify
    existing ones to be compatible with the
    MapReduce architecture.
  • Noise, outliers, incomplete and inconsistent
    data: these problems, inherited from conventional
    systems, will be scaled up in BDA, and thus their
    effect needs to be controlled in a distributed
    environment.
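As an example of adapting a traditional data mining algorithm to the map-reduce style, the sketch below recasts one k-means iteration as a map step (each node assigns its points to the nearest centroid and keeps partial sums) and a reduce step (partial sums are merged into new centroids). It assumes 1-D points and hypothetical function names, and it is only one possible decomposition.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to a 1-D point."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))


def kmeans_map(partition, centroids):
    """Map on one node: partial [sum, count] per nearest centroid."""
    partial = defaultdict(lambda: [0.0, 0])
    for p in partition:
        i = nearest(p, centroids)
        partial[i][0] += p
        partial[i][1] += 1
    return partial


def kmeans_reduce(partials, centroids):
    """Reduce: merge partial sums; keep a centroid that got no points."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for i, (s, n) in partial.items():
            totals[i][0] += s
            totals[i][1] += n
    return [
        totals[i][0] / totals[i][1] if totals[i][1] else c
        for i, c in enumerate(centroids)
    ]


# Hypothetical data held by two nodes, and two starting centroids
partitions = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
centroids = [0.0, 8.0]
partials = [kmeans_map(part, centroids) for part in partitions]
new_centroids = kmeans_reduce(partials, centroids)
```

Only the small per-centroid summaries leave each node, not the points themselves, which is what makes the iteration suitable for a distributed setting.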

33
Open Research Issues (Cont)
  • Data Mining Perspective
  • Bottlenecks on data mining algorithms:
    synchronization issues arise from differences in
    the speed and process completion time of
    different processing nodes. These bottlenecks
    will become an open issue for BDA, so we need to
    take them into account while developing a new
    data mining algorithm for BDA.
  • Privacy Issue
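The synchronization bottleneck described above can be illustrated with a simple model. It assumes that every node must reach a barrier before the next step starts, so one step takes as long as its slowest node. The timings below are hypothetical.

```python
def iteration_time(node_times):
    """With a barrier, one step lasts as long as the slowest node."""
    return max(node_times)


def idle_times(node_times):
    """Time each node spends waiting at the barrier."""
    slowest = max(node_times)
    return [slowest - t for t in node_times]


def utilization(node_times):
    """Fraction of the total node-time that is spent on useful work."""
    return sum(node_times) / (len(node_times) * max(node_times))


# Hypothetical per-node completion times in seconds
node_times = [2.0, 2.5, 6.0]
step = iteration_time(node_times)
waiting = idle_times(node_times)
used = utilization(node_times)
```

With these numbers a single slow node stretches the step to 6.0 seconds and leaves the other two nodes idle for 4.0 and 3.5 seconds, which is why balancing work across nodes matters when designing the algorithm.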

34
Conclusions
  • While developing a BDA system we need to take
    care of the input data, the analytics
    requirements, parallel processing, and the
    distribution of computing tasks.
  • BDA opens opportunities for developing scalable
    algorithms for machine learning and data mining.
  • BDA has wide scope in the agriculture domain, and
    we found only a small contribution of big data
    analytics to it in the literature.

35
References
  • Tsai, C.-W., Lai, C.-F., Chao, H.-C., Vasilakos,
    A. V. Big data analytics: a survey. Journal of
    Big Data, Springer Open, 2015.
  • Russom, P. et al. Big data analytics. TDWI Best
    Practices Report, Fourth Quarter, 2011, 1-35.
  • Assunção, M. D., Calheiros, R. N., Bianchi, S.,
    Netto, M. A., Buyya, R. Big Data computing and
    clouds: Trends and future directions. Journal of
    Parallel and Distributed Computing, Elsevier,
    2015, 79, 3-15.
  • Chen, M., Mao, S., Liu, Y. Big data: a survey.
    Mobile Networks and Applications, Springer, 2014,
    19, 171-209.
  • Witten, I., Frank, E., Hall, M. Data Mining:
    Practical Machine Learning Tools and Techniques.
    Morgan Kaufmann, San Mateo, CA, 2011.

36
Thanks