Title: The New BI Ecosystem: How Big Data Merges Top Down and Bottom up Computing
1 The New BI Ecosystem: How Big Data Merges Top-Down and Bottom-Up Computing
- Wayne W. Eckerson
- Director of Research, TechTarget
- Founder, BI Leadership Forum
2 Agenda
- Big data platforms
- Relational databases
- Analytical databases
- Hadoop
- New analytical ecosystem
3 What comes next?
- Kilobyte (KB): 10^3 bytes
- Megabyte (MB): 10^6 bytes
- Gigabyte (GB): 10^9 bytes
- Terabyte (TB): 10^12 bytes
- Petabyte (PB): 10^15 bytes
- Exabyte (EB): 10^18 bytes
- Zettabyte (ZB): 10^21 bytes
- Yottabyte (YB): 10^24 bytes
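The unit ladder on this slide can be turned into a small conversion helper. The following is a minimal sketch, assuming decimal (powers of 1,000) units as listed on the slide; the function name `humanize` is hypothetical and chosen only for illustration.

```python
# Decimal byte units from the slide: each step up is a factor of 10^3.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    power = 0
    # Climb the ladder while the count is at least one of the next unit.
    while power < len(UNITS) - 1 and num_bytes >= 1000 ** (power + 1):
        power += 1
    value = num_bytes / 1000 ** power
    return f"{value:g} {UNITS[power]}"

print(humanize(1500))      # 1.5 KB
print(humanize(10 ** 18))  # 1 EB
print(humanize(10 ** 24))  # 1 YB
```

Integer comparisons are used to pick the unit so that very large counts are not affected by floating-point rounding.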
4 What is big data?
- Lots of data
- Different types of data
- More data than you can handle
- Purpose-built analytical systems
- Distributed file system
- New staging area and archive
- A Java developers' employment act
- A replacement for the RDBMS
- A club for hip data people
Yes!
5 Information explosion
Source: IDC Digital Universe 2009 White Paper, sponsored by EMC, May 2009
Every 18 months, non-rich structured and unstructured enterprise data doubles
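The doubling claim implies exponential growth: volume after t months is the starting volume times 2^(t/18). A minimal sketch of that arithmetic follows; the 100 TB starting point is a made-up figure for illustration, not a number from the IDC paper.

```python
def projected_volume(initial_tb: float, months: float,
                     doubling_months: float = 18.0) -> float:
    """Volume after `months` if data doubles every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)

# Hypothetical starting volume of 100 TB, used only to show the curve.
for years in (0, 3, 6):
    print(years, "years:", projected_volume(100, years * 12), "TB")
```

At this rate, three years means two doublings (4x) and six years means four doublings (16x).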
6 Data deluge
- Structured data
- Call detail records
- Point of sale records
- Claims data
- Semi-structured data
- Web logs
- Sensor data
- Email, Twitter
- Unstructured data
- Video, audio, images, text
Source: "A Sea of Sensors," The Economist, Nov 4, 2010
7 From transactions to observations
Structured → Semi-structured → Unstructured
8 Three big data platforms (systems)
- General purpose relational database
- Analytical database
- Hadoop
9 1. General-purpose RDBMS: Powers first-generation DW
- Benefits
- RDBMS already in-house
- SQL-based
- Trained DBAs
[Diagram: operational systems, ETL, data warehouse, data mart, BI server, reports/dashboards]
- Challenges
- Cost to deploy and upgrade
- Doesn't support complex analytics
- Scalability and performance
10 2. Analytical platforms
1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
- Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability than general-purpose solutions.
- Deployment options
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)
11 Game-changing technology
- Quicker to deploy
- Preconfigured and tuned
- Fast ROI
- Faster and more scalable
- Faster query response times
- Linear performance
- Built-in analytics
- Libraries of functions
- Extensible SDK
- Less costly
- Less power, cooling, space
- Fewer people to maintain
12 Business value of analytic platforms
- Kelley Blue Book: Consolidates millions of auto transactions each week to calculate car valuations
- AT&T Mobility: Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
13 3. Hadoop
- Ecosystem of open source projects
- Hosted by Apache Foundation
- Google developed and shared concepts
- Distributed file system that scales out on
commodity servers with direct attached storage
and automatic failover.
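Among the concepts Google shared is the MapReduce programming model, which this deck lists later alongside HDFS. The following is a minimal single-process sketch of the map, shuffle, and reduce phases, not Hadoop's actual API; all function names and the sample input are hypothetical, and a real cluster would distribute these phases across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

sample = ["big data big systems", "data everywhere"]  # made-up input
print(word_count(sample))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across commodity servers without shared state.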
14 Hadoop distilled: What's new?
- Benefits
- Comprehensive
- Agile
- Expressive
- Affordable
BIG DATA
- Drawbacks
- Immature
- Batch oriented
- Expertise
- TCO
15 Hadoop ecosystem
Source: Hortonworks
16 Hadoop use cases
- Sabre Holdings
- Analyze airline shopping data
- Vestas
- Site wind turbines by modeling larger volumes of weather data
- CBS Interactive
- Optimize ad placement and pricing
- Nokia
- Identify new data services
17 Hadoop hype
Overheard:
- "Hadoop will replace relational databases."
- "Hadoop will replace data warehouses."
- "Hadoop has a superior query engine compared to analytical platforms."
- "Use Hadoop for any application that requires more than one node."
Gartner Group Hype Cycle
18 Hadoop adoption rates
Based on 158 respondents, BI Leadership Forum, April 2012
19 Hadoop workloads
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012
20 Which platform do you choose?
Hadoop
Analytic Database
General Purpose RDBMS
Structured → Semi-structured → Unstructured
21 Big data platform comparison
Criterion   | RDBMS               | Analytical Database | Hadoop
Purpose     | OLTP                | Analytics           | Anything
Volume      | Low                 | Moderate            | High
Variety     | Relational          | Relational          | Variable
Access      | SQL                 | SQL                 | Java
Latency     | Low                 | Moderate            | High
Concurrency | High                | Moderate            | Low
Cost per GB | High                | Moderate            | Low
Role        | DW hub or data mart | DW or sandbox       | Staging area and archive
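The comparison can also be held as a small lookup structure for shortlisting platforms against requirements. This is a minimal sketch: the ratings are copied from the table above, while the dictionary layout and the `candidates` helper are hypothetical names introduced for illustration.

```python
# Ratings transcribed from the comparison table on this slide.
PLATFORMS = {
    "RDBMS": {
        "purpose": "OLTP", "volume": "Low", "variety": "Relational",
        "access": "SQL", "latency": "Low", "concurrency": "High",
        "cost_per_gb": "High",
    },
    "Analytical Database": {
        "purpose": "Analytics", "volume": "Moderate", "variety": "Relational",
        "access": "SQL", "latency": "Moderate", "concurrency": "Moderate",
        "cost_per_gb": "Moderate",
    },
    "Hadoop": {
        "purpose": "Anything", "volume": "High", "variety": "Variable",
        "access": "Java", "latency": "High", "concurrency": "Low",
        "cost_per_gb": "Low",
    },
}

def candidates(**requirements: str) -> list:
    """Return platforms whose rating matches every stated requirement."""
    return [
        name for name, traits in PLATFORMS.items()
        if all(traits.get(key) == wanted for key, wanted in requirements.items())
    ]

print(candidates(access="SQL"))                      # ['RDBMS', 'Analytical Database']
print(candidates(volume="High", cost_per_gb="Low"))  # ['Hadoop']
```

Exact matching is deliberately crude; it only shows that no single platform satisfies every criterion, which is the point of the slide.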
22 The New BI Ecosystem
23 BI Framework 2020
Business Intelligence
End-User Tools
Operational Dashboards (DW-driven dashboards)
Design Framework
Keyword search, faceted navigation
Architecture
Data Warehousing
Reporting & Analysis
Content Intelligence
Keyword search, BI tools, XQuery, Hive, Java, etc.
MapReduce, XML schema, key-value pairs, graph notation, etc.
HDFS, NoSQL databases
Event-Driven Alerts and Dashboards
Continuous Intelligence
Event-driven
Analytic Sandboxes
Ad hoc query, Spreadsheets, OLAP, Visual
Analysis, Analytic Workbenches, Hadoop
Decision Automation
Non-relational queries
Excel, Access, OLAP, Data mining, visual
exploration
Analytics Intelligence
24 BI Framework
Data Warehousing Architecture (non-volatile data)
- Pros
- Alignment
- Consistency
- Cons
- Hard to build
- Politically charged
- Hard to change
- Expensive
- Schema heavy
Analytics Architecture (volatile data)
- Pros
- Quick to build
- Politically uncharged
- Easy to change
- Low cost
- Cons
- Alignment
- Consistency
- Schema light
25 The new analytical ecosystem
26 Analytical sandboxes
[Diagram: analytical ecosystem combining a top-down architecture and a bottom-up architecture]
- Data sources: operational systems (structured data), machine data, web data, audio/video data, external data, documents and text
- Processing: extract, transform, load (batch, near real-time, or real-time); streaming/CEP engine; alerts
- Platforms: Hadoop cluster, data warehouse, departmental data mart, analytic platform or non-relational database, BI server
- Sandboxes: virtual sandboxes (in the data warehouse), in-memory sandbox, free-standing sandbox
- Delivery: reports/dashboards, queries, uploads
- Users: casual user, power user
27 Workflows
Capture only what's needed
Analytical database (DW)
Source Systems
Capture in case it's needed
9. Report and mine data
Analytical tools
6. Parse, aggregate
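The "capture in case it's needed" path keeps raw data and structures it only later, in the parse-and-aggregate step, before reporting. The following is a minimal sketch of that step, assuming a simplified three-field log format (timestamp, path, status) invented for illustration; the sample lines and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Made-up raw web log lines, kept as captured (including a bad record).
RAW_LOGS = [
    "2012-04-01T10:00:00 /pricing 200",
    "2012-04-01T10:00:05 /pricing 200",
    "2012-04-01T10:00:09 /docs 404",
    "corrupted line",
]

def parse(line: str) -> Optional[Tuple[str, str, int]]:
    """Turn one raw line into (timestamp, path, status), or None if malformed."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    timestamp, path, status = parts
    return timestamp, path, int(status)

def aggregate(lines: Iterable[str]) -> Counter:
    """Count hits per (path, status), skipping lines that fail to parse."""
    hits: Counter = Counter()
    for line in lines:
        record = parse(line)
        if record is not None:
            _, path, status = record
            hits[(path, status)] += 1
    return hits

summary = aggregate(RAW_LOGS)
print(summary)
```

The aggregated counts are the kind of compact, structured result that could then be loaded into an analytical database for reporting and mining.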
28 Recommendations
- Explore applications for multi-structured data
- Apply the right tool for the job
- RDBMS, analytical platform, Hadoop, NoSQL
- Make power users full-fledged members of your BI environment
- Reconcile top-down and bottom-up BI environments
- → Create an analytical ecosystem!
29 Questions?
- Analytical thought leader
- Founder, BI Leadership Forum
- Director of Research, TechTarget
- Former director of research at TDWI
- Author
- Wayne Eckerson
- weckerson_at_bileadership.com
30 Deployment options
Option        | Pros                                       | Cons                                             | Best suited for
Software-only | Runs on any hardware; tuning options       | Potential hardware-software compatibility issues | Established data centers with strong hardware policies
Appliance     | Load and go; minimal DBA oversight needed  | Forklift upgrades; proprietary hardware          | SMB companies, depts. at larger firms, augment an EDW
Hosted        | Good for prototyping; subscription based   | Expensive long term; security                    | Companies with minimal IT expertise or space in data center
- Challenges versus RDBMS
- Workload management
- Tools integration
- Management and administration
31 Comparison of platforms
Courtesy of Michael Embry, Sabre Holdings
32 Hadoop's impact on the data warehouse
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012