The New BI Ecosystem: How Big Data Merges Top Down and Bottom up Computing

1
The New BI Ecosystem: How Big Data Merges Top Down and Bottom up Computing
  • Wayne W. Eckerson
  • Director of Research
  • Founder, BI Leadership Forum

2
Agenda
  • Big data platforms
  • Relational databases
  • Analytical databases
  • Hadoop
  • New analytical ecosystem

3
What comes next?
  • Kilobyte (KB) 10^3 bytes
  • Megabyte (MB) 10^6 bytes
  • Gigabyte (GB) 10^9 bytes
  • Terabyte (TB) 10^12 bytes
  • Petabyte (PB) 10^15 bytes
  • Exabyte (EB) 10^18 bytes
  • Zettabyte (ZB) 10^21 bytes
  • Yottabyte (YB) 10^24 bytes

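The unit ladder on this slide can be turned into a small conversion helper. The following is a minimal sketch, assuming the decimal (power-of-ten) definitions shown on the slide; the names UNITS and format_bytes are hypothetical and chosen only for illustration.

```python
# Decimal byte units as listed on the slide, largest first.
UNITS = [
    ("YB", 10**24),
    ("ZB", 10**21),
    ("EB", 10**18),
    ("PB", 10**15),
    ("TB", 10**12),
    ("GB", 10**9),
    ("MB", 10**6),
    ("KB", 10**3),
]


def format_bytes(n: int) -> str:
    """Render a byte count using the largest unit that fits (illustrative helper)."""
    for name, size in UNITS:
        if n >= size:
            return f"{n / size:.1f} {name}"
    return f"{n} bytes"


print(format_bytes(2_500_000_000_000_000))  # 2.5 PB
print(format_bytes(10**18))                 # 1.0 EB
```
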
4
What is big data?
  1. Lots of data
  2. Different types of data
  3. More data than you can handle
  4. Purpose-built analytical systems
  5. Distributed file system
  6. New staging area and archive
  7. A Java developer's employment act
  8. A replacement for the RDBMS
  9. A club for hip data people

Yes!
5
Information explosion
Source: IDC Digital Universe 2009 White Paper, sponsored by EMC, May 2009
Every 18 months, non-rich structured and unstructured enterprise data doubles
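
To make the doubling claim concrete, the arithmetic can be sketched as follows. This assumes the 18-month doubling period stated on the slide holds steadily; the function names and the 100 TB starting volume are hypothetical.

```python
DOUBLING_MONTHS = 18  # from the slide: data doubles every 18 months


def growth_factor(months: float) -> float:
    """Multiplier on data volume after the given number of months."""
    return 2 ** (months / DOUBLING_MONTHS)


def projected_volume(start_tb: float, months: float) -> float:
    """Projected volume in TB, assuming steady doubling (illustrative only)."""
    return start_tb * growth_factor(months)


print(growth_factor(36))  # 4.0, i.e. two doublings in three years
print(round(projected_volume(100, 60)))  # a hypothetical 100 TB grows roughly tenfold in five years
```
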
6
Data deluge
  • Structured data
    • Call detail records
    • Point of sale records
    • Claims data
  • Semi-structured data
    • Web logs
    • Sensor data
    • Email, Twitter
  • Unstructured data
    • Video, audio, images, text

Source: A Sea of Sensors, The Economist, Nov 4, 2010
7
From transactions to observations
Structured → Semi-Structured → Unstructured
8
Three big data platforms (systems)
  • General purpose relational database
  • Analytical database
  • Hadoop

9
1. General purpose RDBMS: Powers first-generation DW
  • Benefits
    • RDBMS already in-house
    • SQL-based
    • Trained DBAs
  • Challenges
    • Cost to deploy and upgrade
    • Doesn't support complex analytics
    • Scalability and performance

[Diagram labels: Operational Systems, ETL, Data Warehouse, Data Mart, BI Server, Reports / Dashboards]
10
2. Analytical platforms
1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
  • Purpose-built database management systems
    designed explicitly for query processing and
    analysis that provide dramatically higher
    price/performance and availability compared to
    general purpose solutions.
  • Deployment options
    • Software only (Paraccel, Vertica)
    • Appliance (SAP, Exadata, Netezza)
    • Hosted (1010data, Kognitio)

11
Game-changing technology
  • Quicker to deploy
    • Preconfigured and tuned
    • Fast ROI
  • Faster and more scalable
    • Faster query response times
    • Linear performance
  • Built-in analytics
    • Libraries of functions
    • Extensible SDK
  • Less costly
    • Less power, cooling, space
    • Fewer people to maintain

12
Business value of analytic platforms
  • Kelley Blue Book: Consolidates millions of auto
    transactions each week to calculate car
    valuations
  • AT&T Mobility: Tracks purchasing patterns for
    80M customers daily to optimize targeted
    marketing

13
3. Hadoop
  • Ecosystem of open source projects
  • Hosted by Apache Foundation
  • Google developed and shared concepts
  • Distributed file system that scales out on
    commodity servers with direct attached storage
    and automatic failover.
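
The best known of the concepts Google shared is the MapReduce programming model, which Hadoop implements on top of its distributed file system. The sketch below imitates the map, shuffle, and reduce phases in a single Python process to show the shape of the computation; it is not Hadoop's API, and all function names and the sample log lines are hypothetical. A real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (single process, for illustration)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


logs = ["GET /home", "GET /cart", "POST /cart"]
print(word_count(logs))  # {'get': 2, '/home': 1, '/cart': 2, 'post': 1}
```
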

14
Hadoop distilled: What's new?
  • Benefits
    • Comprehensive
    • Agile
    • Expressive
    • Affordable

BIG DATA
  • Drawbacks
    • Immature
    • Batch oriented
    • Expertise
    • TCO

15
Hadoop ecosystem
Source: Hortonworks
16
Hadoop use cases
  • Sabre Holdings
    • Analyze airline shopping data
  • Vestas
    • Site wind turbines by modeling larger volumes of
      weather data
  • CBS Interactive
    • Optimize ad placement and pricing
  • Nokia
    • Identify new data services

17
Hadoop hype
Overheard:
  • "Hadoop will replace relational databases."
  • "Hadoop will replace data warehouses."
  • "Hadoop has a superior query engine compared to analytical platforms."
  • "Use Hadoop for any application that requires more than one node."

Gartner Group Hype Cycle
18
Hadoop adoption rates
Based on 158 respondents, BI Leadership Forum, April 2012
19
Hadoop workloads
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012
20
Which platform do you choose?
Hadoop
Analytic Database
General Purpose RDBMS
Structured → Semi-Structured → Unstructured
21
Big data platform comparison
Attribute   | RDBMS               | Analytical Database | Hadoop
Purpose     | OLTP                | Analytics           | Anything
Volume      | Low                 | Moderate            | High
Variety     | Relational          | Relational          | Variable
Access      | SQL                 | SQL                 | Java
Latency     | Low                 | Moderate            | High
Concurrency | High                | Moderate            | Low
Cost per GB | High                | Moderate            | Low
Role        | DW Hub or data mart | DW or Sandbox       | Staging area and archive
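
The comparison above can also be held as a small lookup structure and filtered by requirement. This is an illustrative sketch only: the values are copied from the slide, while the dictionary layout, the key names, and the shortlist function are assumptions made for the example, not part of the presentation.

```python
# Trait values copied from the comparison slide; the key names are hypothetical.
PLATFORMS = {
    "RDBMS": {
        "purpose": "OLTP", "volume": "Low", "variety": "Relational",
        "access": "SQL", "latency": "Low", "concurrency": "High",
        "cost_per_gb": "High",
    },
    "Analytical Database": {
        "purpose": "Analytics", "volume": "Moderate", "variety": "Relational",
        "access": "SQL", "latency": "Moderate", "concurrency": "Moderate",
        "cost_per_gb": "Moderate",
    },
    "Hadoop": {
        "purpose": "Anything", "volume": "High", "variety": "Variable",
        "access": "Java", "latency": "High", "concurrency": "Low",
        "cost_per_gb": "Low",
    },
}


def shortlist(**requirements):
    """Return the platforms whose traits match every stated requirement."""
    return [
        name
        for name, traits in PLATFORMS.items()
        if all(traits.get(key) == value for key, value in requirements.items())
    ]


print(shortlist(access="SQL"))                           # ['RDBMS', 'Analytical Database']
print(shortlist(variety="Variable", cost_per_gb="Low"))  # ['Hadoop']
```

An exact-match filter like this is deliberately crude; it only shows that the slide's table reads naturally as a decision aid, with each row acting as one selection criterion.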
22
The New BI Ecosystem
23
BI Framework 2020
Business Intelligence
End-User Tools
Operational Dashboards (DW-driven dashboards)
Design Framework
Keyword search, faceted navigation
Architecture
Data Warehousing
Reporting & Analysis
Content Intelligence
Keyword search, BI tools, XQuery, Hive, Java, etc.
MapReduce, XML schema, key-value pairs, graph notation, etc.
HDFS, NoSQL databases
Event-Driven Alerts and Dashboards
Continuous Intelligence
Event-driven
Analytic Sandboxes
Ad hoc query, Spreadsheets, OLAP, Visual Analysis, Analytic Workbenches, Hadoop
Decision Automation
Non-relational queries
Excel, Access, OLAP, Data mining, visual exploration
Analytics Intelligence
24
BI Framework
Data Warehousing Architecture (Non-volatile Data)
  • Pros
    • Alignment
    • Consistency
  • Cons
    • Hard to build
    • Politically charged
    • Hard to change
    • Expensive
    • Schema heavy

Analytics Architecture (Volatile Data)
  • Pros
    • Quick to build
    • Politically uncharged
    • Easy to change
    • Low cost
  • Cons
    • Alignment
    • Consistency
    • Schema light

25
The new analytical ecosystem
26
Analytical sandboxes
[Diagram labels]
  • Data sources: Operational Systems (structured data), Machine Data, Web Data, Audio/video Data, External Data, Documents & Text
  • Components: Extract, Transform, Load (batch, near real-time, or real-time); Streaming/CEP Engine; Alerts; BI Server; Reports / Dashboards; Data Warehouse; Virtual Sandboxes; Dept Data Mart; Hadoop Cluster; In-memory Sandbox; Free-Standing Sandbox; Analytic platform or non-relational database
  • Users: Casual User, Power User
  • Architectures: Top-down Architecture, Bottom-up Architecture
  • Arrow labels: Query, Upload query

27
Workflows
Capture only what's needed
Analytical database (DW)
Source Systems
Capture in case it's needed
9. Report and mine data
Analytical tools
6. Parse, aggregate
28
Recommendations
  • Explore applications for multi-structured data
  • Apply the right tool for the job
    • RDBMS, Analytical platform, Hadoop, NoSQL
  • Make power users full-fledged members of your BI
    environment
  • Reconcile top-down and bottom-up BI environments
  • → Create an analytical ecosystem!

29
Questions?
  • Wayne Eckerson
  • Analytical thought leader
  • Founder, BI Leadership Forum
  • Director of Research, TechTarget
  • Former director of research at TDWI
  • Author
  • weckerson_at_bileadership.com

30
Deployment options
Option        | Pros                                       | Cons                                             | Best Suited For
Software-only | Runs on any hardware; tuning options       | Potential hardware-software compatibility issues | Established data centers with strong hardware policies
Appliance     | Load and go; minimal DBA oversight needed  | Forklift upgrades; proprietary hardware          | SMB companies, depts. at larger firms, augment an EDW
Hosted        | Good for prototyping; subscription based   | Expensive long term; security                    | Companies with minimal IT expertise or space in data center
  • Challenges versus RDBMS
    • Workload management
    • Tools integration
    • Management and administration

31
Comparison of platforms
Courtesy Michael Embry, Sabre Holdings
32
Hadoop's impact on the data warehouse
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012