Building Scalable OLAP Applications with Mondrian and MySQL - PowerPoint PPT Presentation

Loading...

PPT – Building Scalable OLAP Applications with Mondrian and MySQL PowerPoint presentation | free to download - id: d5c2c-ZDc1Z



Loading


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation
Title:

Building Scalable OLAP Applications with Mondrian and MySQL

Description:

and Mondrian Project Founder. Tuesday, April 24th 2007. Agenda. Pentaho Introduction ... mondrian.pentaho.org/documentation/mdx.php ) Scalability & Tuning ... – PowerPoint PPT presentation

Number of Views:892
Avg rating:3.0/5.0
Slides: 42
Provided by: julia60
Learn more at: http://www.hydromatic.net
Category:

less

Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Building Scalable OLAP Applications with Mondrian and MySQL


1
  • Building Scalable OLAP Applications with Mondrian
    and MySQL

Julian Hyde Chief Architect, OLAP, at
Pentaho and Mondrian Project Founder Tuesday,
April 24th 2007
2
Agenda
  • Pentaho Introduction
  • What is OLAP?
  • Key features and architecture
  • Getting started
  • Schemas queries
  • Tuning scalability
  • Active Data Warehousing
  • Case Studies
  • Business Intelligence suite
  • Q A

3
Pentaho Introduction
  • Worlds most popular enterprise open source BI
    Suite
  • 2 million lifetime downloads, averaging 100K /
    month
  • Founded in 2004 Pioneer in professional open
    source BI
  • Management - proven BI and open source veterans
  • from Business Objects, Cognos, Hyperion, JBoss,
    Oracle, Red Hat, SAS
  • Board of Directors deep expertise and proven
    success in open source
  • Larry Augustin - founder, VA Software, helped
    coin the phrase open source
  • New Enterprise Associates investors in
    SugarCRM, Xensource, others
  • Index Ventures investors in MySQL, Zend, others
  • Widely recognized as the leader in open source BI
  • Distributed worldwide by Red Hat via the Red Hat
    Exchange
  • Embedded in next release of OpenOffice (40
    million users worldwide)

4
What is OLAP?
  • View data dimensionally
  • i.e. Sales by region, by channel, by time period
  • Navigate and explore
  • Ad Hoc analysis
  • Drill-down from year to quarter
  • Pivot
  • Select specific members for analysis
  • Interact with high performance
  • Technology optimized for rapid interactive
    response

5
A brief history of OLAP
2010
Pentaho
BEE
Open-source BI suites
OpenI
JRubik
JPivot
Palo
Mondrian
Open-source OLAP
MySQL
2000
Microsoft OLAP Services
Oracle partitions, bitmap indexes
RDBMS support for DW
Informix MetaCube
Sybase IQ
Codds 12 Rules of OLAP
Business Objects
1990
MicroStrategy
Desktop OLAP
Essbase
Cognos PowerPlay
Spreadsheets
1980
Comshare
Mainframe OLAP
Express
1970
APL
6
Mondrian features and architecture
7
Key Features
  • On-Line Analytical Processing (OLAP) cubes
  • automated aggregation
  • speed-of-thought response times
  • Open Architecture
  • 100 Java
  • J2EE
  • Supports any JDBC data source
  • MDX and XML/A
  • Analysis Viewers
  • Enables ad-hoc, interactive data exploration
  • Ability to slice-and-dice, drill-down, and pivot
  • Provides insights into problems or successes

8
How Mondrian Extends MySQL for OLAP Applications
  • MySQL Provides
  • Data storage
  • SQL query execution
  • Heavy-duty sorting, correlation, aggregation
  • Integration point for all BI tools
  • Mondrian Provides
  • Dimensional view of data
  • MDX parsing
  • SQL generation
  • Caching
  • Higher-level calculations
  • Aggregate awareness

9
Open Architecture
Microsoft Excel (via Spreadsheet Services)
Viewers
  • Open Standards (Java, XML, MDX, XML/A, SQL)
  • Cross Platform (Windows Unix/Linux)
  • J2EE Architecture
  • Server Clustering
  • Fault Tolerance
  • Data Sources
  • JDBC
  • JNDI

Web Server
JPivot servlet
J2EE Application Server
File or RDBMS Repository
JPivot servlet
XML/A servlet
Mondrian
Cube Schema XML
Cube Schema XML
Cube Schema XML
cube
cube
cube
JDBC
JDBC
JDBC
RDBMS
RDBMS
10
Getting started
11
Getting started with Mondrian
  • Download mondrian from http//mondrian.pentaho.org
    and unzip
  • Copy mondrian.war to TOMCAT_HOME/webapps.
  • Create a database
  • mysqladmin create foodmart mysql mysqlgt
    grant all privileges on . to 'foodmart'_at_'localho
    st' identified by 'foodmart' Query OK, 0 rows
    affected (0.00 sec) mysqlgt quit Bye
  • Install demo dataset (300,000 rows)
  • java -cp "/mondrian/lib/mondrian.jar/mondrian/l
    ib/log4j-1.2.9.jar/mondrian/lib/eigenbase-xom.jar
    /mondrian/lib/eigenbase-resgen.jar/mondrian/lib/
    eigenbase-properties.jar/usr/local/mysql/mysql-co
    nnector-java-3.1.12-bin.jar"     
    mondrian.test.loader.MondrianFoodMartLoader     
    -verbose -tables -data -indexes     
    -jdbcDriverscom.mysql.jdbc.Driver     
    -inputFile/mondrian/demo/FoodMartCreateData.sql
         -outputJdbcURL "jdbcmysql//localhost/food
    mart?userfoodmartpasswordfoodmart
  • Start tomcat, and hit http//localhost8080/mondri
    an

12
demonstration
13
Schemas and queries
14
A Mondrian schema consists of
  • A dimensional model (logical)
  • Cubes virtual cubes
  • Shared private dimensions
  • Calculated measures in cube and in query language
  • Parent-child hierarchies
  • mapped onto a star/snowflake schema (physical)
  • Fact table
  • Dimension tables
  • Joined by foreign key relationships

15
Writing a Mondrian Schema
  • Regular cubes, dimensions, hierarchies
  • Shared dimensions
  • Virtual cubes
  • Parent-child hierarchies
  • Custom readers
  • Access-control

lt!-- Shared dimensions --gt ltDimension
name"Region"gt ltHierarchy hasAll"true
allMemberName"All Regions"gt ltTable
name"QUADRANT_ACTUALS"/gt ltLevel
name"Region" column"REGION
uniqueMembers"true"/gt lt/Hierarchygt
lt/Dimensiongt ltDimension name"Department"gt
ltHierarchy hasAll"true allMemberName"All
Departments"gt ltTable name"QUADRANT_ACTUALS"
/gt ltLevel name"Department
column"DEPARTMENT uniqueMembers"true"/gt
lt/Hierarchygt lt/Dimensiongt
  • (Refer to http//mondrian.pentaho.org/documentatio
    n/schema.php )

16
Tools
  • Schema Workbench
  • Pentaho cube designer
  • cmdrunner

17
MDX Multi-Dimensional Expressions
  • A language for multidimensional queries
  • Plays the same role in Mondrians API as SQL does
    in JDBC
  • SQL-like syntax
  • but un-SQL-like semantics

SELECT Measures.Unit Sales ON COLUMNS,
Store.USA, Store.USA.CA ON ROWS FROM
Sales WHERE Time.1997.Q1
  • (Refer to http//mondrian.pentaho.org/documentatio
    n/mdx.php )

18
Scalability Tuning
19
Tuning is a Process
  • Some techniques
  • Test performance on a realistic data set!
  • Use tracing to identify long-running SQL
    statements
  • mondrian.trace.level2
  • Tune SQL queries
  • Run the same query several times, to examine
    cache effectiveness
  • When Mondrian starts, prime the cache by running
    queries
  • Get to know Mondrians system properties

20
Tuning MySQL for Mondrian
  • Fact table
  • Index foreign keys
  • Try indexing multiple combinations of foreign
    keys
  • Use MyISAM or MERGE (also known as MRG_MyISAM)
  • Partitioned tables
  • Dimension tables
  • Index primary and foreign keys
  • Use MyISAM for large dimension tables, MEMORY for
    smaller dimension tables
  • Create aggregate tables
  • Contain pre-computed summaries
  • Prevent full table scan followed by massive GROUP
    BY
  • Parent-child hierarchies
  • Create closure tables

21
Scaling to large datasets
  • Tune MySQL
  • Partitions allow parallelism
  • Aggregate tables allow you to do the hard work to
    load-time, not runtime
  • Increase cache size

22
Scaling to large numbers of concurrent users
  • Tune MySQL
  • Increase JVM memory
  • Make sure that connections are using the same
    cache
  • Multiple web servers
  • Multiple mondrian instances
  • Multiple DBMS instances with read-only copies of
    the data

23
(Diagrams -- hidden)
Viewers
24
Active Data Warehousing
25
Active Data Warehousing
  • An Active Data Warehouse
  • Extends the traditional DW to support operational
    processing
  • Provides a single view of the business
  • Is updated in near real-time
  • Cost/benefit
  • High cost (technical challenges)
  • High reward (time is money)

26
Technical challenges of Active Data Warehousing
  • DBMS has mixed work-load
  • Short, fast queries to query specific records
  • Longer queries to look at changing patterns
  • Emphasis on recent data
  • Background DML to keep database up to date
  • Continuous, incremental ETL
  • Continuous or near real-time
  • Incremental
  • Consider message-driven ETL
  • OLAP cache
  • OLAP servers use cache and/or MOLAP database for
    performance
  • Cache consistency

27
Mondrian and Active Data Warehouse
  • ROLAP with caching
  • No MOLAP data store to maintain
  • Disable aggregate tables
  • Unless your ETL process can maintain these
    incrementally
  • Either turn off cache
  • mondrian.rolap.star.disableCachingtrue
  • Pure ROLAP mode
  • Results are cached for the lifetime of a single
    MDX statement
  • Or use new CacheControl API to selectively flush
    mondrians cache

28
Cache control
  • In an Active Warehouse, the most recent data are
    modified much more often than historic data
  • Cache control API allows the application to
    selectively flush the cache

year quarter month day unit_sales
2007 Q1 1 1 1500
2007 Q1 1 2 1700
2007 Q1 1 3 1250
2007 Q1 1 4 1800

2007 Q2 4 22 1378
2007 Q2 4 23 1920
2007 Q2 4 24 350
2007 Q2 4 24 450
2007 Q2 4 24 525
  • (Refer to http//mondrian.pentaho.org/documentatio
    n/cache_control.php )

29
Cache control A simple example
  • Lets run a simple MDX query

SELECT Time.1997,  Time.1997.Q1,
Time.1997.Q2 ON COLUMNS, Customers.USA
,  Customers.USA.OR,  Customers.USA.WA
ON ROWS FROM Sales
30
Cache contents after running query
  • year1997, nationUSA, measureunit_sales
  • year1997, nationUSA, stateOR, WA,
    measureunit_sales

year nation state unit_sales
1997 USA OR 67,659
1997 USA WA 124,366
year nation unit_sales
1997 USA 266,773
  • year1997, quarter, nationUSA, stateOR, WA,
    measureunit_sales
  • year1997, quarterany, nationUSA,
    measureunit_sales

year quarter nation unit_sales
1997 Q1 USA 66,291
1997 Q2 USA 62,610
year quarter nation state unit_sales
1997 Q1 USA OR 19,287
1997 Q1 USA WA 30,114
1997 Q2 USA OR 15,079
1997 Q2 USA WA 29,479
31
Flushing part of a segment
Cache manager
OR WA
Q1 19,287 30,114
Q2 15,079 29,479
  • year1997, quarter, nationUSA, stateOR, WA,
    measureunit_sales
  • year1997, quarter, nationUSA, stateOR, WA,
    measureunit_sales exclude stateWA and
    quarterQ2

year quarter nation state unit_sales
1997 Q1 USA OR 19,287
1997 Q1 USA WA 30,114
1997 Q2 USA OR 15,079
1997 Q2 USA WA 29,479
  • year1997, quarter, nationUSA, stateCA, WA,
    measureunit_sales

CA WA
Q1 16,890 30,114
Q2 18,052 30,079
32
Code using CacheControl to flush part of a segment
  • Connection connection
  • CacheControl cacheControl connection.getCacheCon
    trol(null)
  • Cube salesCube connection.getSchema().lookupCube
    ("Sales", true)
  • SchemaReader schemaReader salesCube.getSchemaRea
    der(null)
  • // Create region containing Customer.USA.OR.
  • Member memberOregon schemaReader.getMemberByUniq
    ueName(
  • new StringCustomer", USA", OR", true)
  • CacheControl.CellRegion regionOregon
  • cacheControl.createMemberRegion(memberOregon,
    true)
  • // Create region containing Customer.USA.OR.
  • Member memberTimeQ2 schemaReader.getMemberByUniq
    ueName(
  • new StringTime", 1997", Q2", true)
  • CacheControl.CellRegion regionTimeQ2
  • cacheControl.createMemberRegion(memberTimeQ2,
    true)
  • // Create region containing (Customer.USA.OR
    , Time.1997.Q2)
  • CacheControl.CellRegion regionOregonQ2
  • cacheControl.createCrossjoinRegion(regionOregon,
    regionTimeQ2)

33
Case studies
34
Case Study Frontier Airlines
  • Frontier Airlines
  • Key Challenges
  • Understanding and optimizing fares to ensure
  • Maximum occupancy (no empty seats)
  • Maximum profitability (revenue per seat)
  • Pentaho Solution
  • Pentaho Analysis (Mondrian)
  • Chose Open Source RDBMS and Mondrian over Oracle
  • 500 GB of data, 6 server cluster
  • Results
  • Comprehensive, integrated analysis to set
    strategic pricing
  • Improved per-seat profitability (amount not
    disclosed)
  • Why Pentaho
  • Rich analytical and MDX functionality
  • Cost of ownership

The competition is intense in the airline
industry and Frontier is committed to staying
ahead of the curve by leveraging technology that
will help us offer the best prices and the best
flight experience. the application fits right
in with our philosophy of providing world-class
performance at a low price.
35
Case Study U.S.-based, Public Software Company
  • OEM Example
  • Key Requirements
  • Embedding analysis within a pricing planning
    application
  • Incumbent Vendor
  • MicroStrategy
  • Decision Criteria
  • Embeddability
  • Extensibility
  • Cost of ownership
  • Best Practices
  • Evaluate replacement costs holistically
  • Treat migrations as an opportunity to improve a
    deployment, not just move it
  • Understand and prioritize requirements
    functionality, usability, architecture, etc.

The old solution (MicroStrategy) was built to be
a standalone application. It expected to own the
browser session, to have separate security, and
its own user interface standards. We were able
to make Mondrian a seamless part of the
application, and it was much easier to do so.
36
The big picture
37
Business Intelligence Suite
  • Mondrian OLAP
  • Analysis tools
  • Pivot table
  • Charting
  • Dashboards
  • ETL (extract/transform/load)
  • Integration with operational reporting
  • Integration with data mining
  • Actions on operational data
  • Design/tuning tools

38
Pentaho Open Source BI Offerings
All available in a Free Open Source license
39
A Sample of Joint MySQL-Pentaho Users
Pentaho provided a robust, open source platform
for our sales reporting application, and the
ongoing support we needed. The experts at OpenBI
provided outstanding services and training, and
allowed us to deploy and start generating results
very quickly.
We selected Pentaho for its ease-of-use.
Pentaho addressed many of our requirements --
from reporting and analysis to dashboards, OLAP
and ETL, and offered our business users the
Excel-based access that they wanted.
40
Next Steps and Resources
  • Contact Information
  • Julian Hyde, Chief Architect, jhyde_at_pentaho.org
  • More information http//www.pentaho.org and
    http//mondrian.pentaho.org
  • Pentaho Community Forum http//community.pentaho.o
    rg
  • Go to Developer Zone
  • Discussions
  • Pentaho BI Platform including Mondrian
    http//www.pentaho.org/download/latest
  • Mondrian OLAP Library only
  • http//sourceforge.net/project/showfiles.php?group
    _id35302

41
Birds of a Feather session MySQL Data
Warehousing and BI
  • Speakers
  • James Dixon, Senior Architect and Chief
    Technology (a.k.a. "Chief Geek"), Pentaho
    Corporation
  • Roland Bouman, Certification Developer, MySQL
    A.B.
  • Matt Casters, Data Integration Architect and
    Kettle Project Founder, Pentaho
  • Julian Hyde, OLAP Architect and Mondrian Project
    Founder, Pentaho
  • Brian Miezejewski, Principal Consultant, MySQL
    inc.
  • Track Data Warehousing and Business Intelligence
  • Date TONIGHT!!
  • Time 730pm - 900pm
  • Location Camino Real

42
Thank you for attending!
About PowerShow.com