Extreme Performance with Oracle Data Warehousing - PowerPoint PPT Presentation

1 / 49
About This Presentation
Title:

Extreme Performance with Oracle Data Warehousing

Description:

This presentation contains information proprietary to Oracle Corporation ... Extreme Performance with Oracle Data Warehousing Andreas Katsaris Data warehouse and BI ... – PowerPoint PPT presentation

Number of Views:548
Avg rating:3.0/5.0
Slides: 50
Provided by: arisantCo
Category:

less

Transcript and Presenter's Notes

Title: Extreme Performance with Oracle Data Warehousing


1
Extreme Performancewith Oracle Data Warehousing
ltInsert Picture Heregt
Andreas Katsaris Data warehouse and BI Practice
Manager, Arisant LLC
2

Oracle 1 for Data Warehousing
Source IDC, July 2009 Worldwide Data
Warehouse Management Tools 2008 Vendor Shares
3
Gartner Magic Quadrant for BI Platforms
2008
2009
4
Market Position Recognized as a leader since OWB
10.2
  • 10,000 companies rely on OWB
  • Optimized for Oracle environments

2005
5
Oracle Data Warehousing Complete, Open and
Integrated
BI Applications
  • Standard components
  • Certified configurations
  • Comprehensive security
  • Higher availability
  • Easier to manage
  • Lower cost of ownership

BI Tools
Data Models
Database
Operating System
Smart Storage
6
Distributed Data Marts and ServersExpensive Data
Warehouse Architecture
7
Consolidate onto a Data WarehouseSingle Source
of Truth on Low-Cost Servers and Storage
Oracle Database 11g
8
Choice of Data Warehouse Solutions
Reference Configurations
Database Machine
Custom Solutions
Complete system, including software, servers,
networking and storage
Flexibility for the most demanding data warehouse
Best-practice configurations for data warehousing
9
Drastically Simplified Deployments
  • Database Machine eliminates the complexity of
    deploying database systems
  • Months of configuration, troubleshooting, tuning
  • Database Machine is ready on day one
  • Pre-built, tested, standard, supportable
    configuration
  • Runs existing applications unchanged
  • Extreme performance out of the box

Months to Days
http//www.oracle.com/technology/products/bi/db/db
machine/ds_db_machine.pdf
10
Best Data Warehouse Machine
  • Massively parallel high volume hardware to
    quickly process vast amounts of data
  • Exadata runs data intensive processing directly
    in storage
  • Most complete analytic capabilities
  • OLAP, Statistics, Spatial, Data Mining, Real-time
    transactional ETL, Efficient point queries
  • Powerful warehouse specific optimizations
  • Flexible Partitioning, Bitmap Indexing, Join
    indexing, Materialized Views, Result Cache
  • Dramatic new warehousing capabilities

OLAP
ETL
Data Mining
New
11
Sun Oracle Database Machine Hardware Improvements
  • Same architecture as previous Database Machine
  • Same number and type of Servers, CPUs, Disks

New
Latest Technologies
Faster
Xeon 5500 Nehalem
80 Faster CPUs
100 Faster Networking
40 Gb InfiniBand
50 Faster Disk Throughput
6 Gb SAS Links
200 Faster Memory
DDR3 DRAM
Better
33 More SAS Disk Capacity
600 GB SAS Disks
100 More SATA Disk Capacity
2 TB SATA Disks
125 More Memory
72 GB per DB Node
100 More Ethernet Connectivity
4 Ethernet links per DB Node
New
Plus Flash Storage!
12
Exadata Database Processing in Storage
  • Exadata storage servers implement data intensive
    processing in storage
  • Row filtering based on where predicate
  • Column filtering
  • Join filtering
  • Incremental backup filtering
  • Storage Indexing
  • Scans on encrypted data
  • Data Mining model scoring
  • 10x reduction in data sent to DB servers is
    common
  • No application changes needed
  • Processing is automatic and transparent
  • Even if cell or disk fails during a query

13
Simple Query Example
What were my sales yesterday?
Exadata Storage Grid
Optimizer Chooses Partitions and Indexes to Access
Oracle Database Grid
Scan compressed blocks in partitions/indexes Retr
ieve sales amounts forSept 24
Select sum(sales) where Date24-Sept
SUM
10 TB scanned 1 GB returned to servers
14
Flash
50
Query Throughput Uncompressed Data
  • Flash storage more than doubles scan throughput
  • 50 GB/sec
  • Smart
  • Knows when to avoid caching based on object
    reusability and size
  • Accepts user directives at table, index and
    segment level
  • Combined with Columnar compression
  • Up to 50 TB of data fits in flash
  • Queries on compressed data run up to
  • 500 GB/sec

HITACHI USP V
TERADATA 2550
NETEZZA TwinFin 12
SUN ORACLE Database Machine
15
Exadata Hybrid Columnar Compression
  • Data is stored by columnand then compressed
  • the decompression of the data is offloaded to
    Exadata eliminating CPU overhead on the database
    servers
  • Query Mode for data warehousing
  • Optimized for speed
  • 10X compression ratio is typical
  • Scans improve proportionally
  • Archival Mode for infrequently accessed data
  • Optimized to reduce space
  • 15X compression is typical
  • Up to 50X for some data

Up To
50X
16
Exadata Storage IndexTransparent I/O Elimination
with No Overhead
Table
Index
  • Exadata Storage Indexes maintain summary
    information about table data in memory
  • Store MIN and MAX values of columns
  • Typically one index entry for every MB of disk
  • Eliminates disk I/Os if MIN and MAX can never
    match where clause of a query
  • Completely automatic and transparent

Min B 1 Max B 5
Min B 3 Max B 8
Select from Table where Blt2 - Only first
set of rows can match
17
Benefits Multiply
1 TB with compression
10 TB of user data Requires 10 TB of IO
100 GB with partition pruning
Subsecond On Database Machine
20 GB with Storage Indexes
5 GB Smart Scan on Memory or Flash
Data is 10x Smaller, Scans are 2000x faster
18
Database Machine Success
  • Database Machine is succeeding in all geographies
    and industries against every competitor

19
Oracles Data Integration Strategy Pervasive
Data Integration
  • Embed Data Integration within Oracle Database
  • Integrated, optimized and Best for Oracle
    Database
  • Easiest way to load external information into
    Oracle Database
  • Provide Comprehensive Data Integration
  • Comprehensive Heterogeneous Technology Foundation
  • Integrated Runtime, Data Management Tools and
    Administration
  • Best of Breed significant architectural
    differentiators vs. competitors
  • Hot Pluggable broad support of sources
    packaged applications
  • Pre-Integrate Solutions for Oracle Portfolio
  • Make data integration pervasive with Lower Cost
    Complexity

20
ODI Enterprise EditionBundling E-LT Forces
Legacy Sources
Oracle Data Integrator Oracle Warehouse
Builder
Data Warehouse
E-LT Transformation vs. E-T-L
Application Sources
Real Time Change Data Capture
Planning System
High Speed Batch Data Movement
Set-based DataTransformations
OLTP Sources
21
OWB 11g Release 2Heterogeneous connectivity with
high performance native code
  • Extensible heterogeneous connectivity
  • Connect to any JDBC, ODBC or gateway enabled data
    store
  • Enhanced support for Change Data Capture and
    real-time extracts
  • Advanced ERP/CRM/MDM connectors
  • Call any web service as a data provider, publish
    extracts as web services
  • High-performance native code for any platform
  • Use extensible code templates to create native
    code
  • Leverage the database engine for best performance
  • Use native un-loaders/loaders for extremely fast
    bulk loading
  • Enhanced Developer Productivity
  • New declarative user interface based on Fusion
    Client Platform
  • Rich metadata support for end-to-end data lineage
    and impact analysis
  • Oracle BI EE metadata generation

22
High-Level Roadmap Unified platform that is a
superset of OWB and ODI
  • OWB/ODI investments are fully protected
  • No forced migrations
  • Natural upgrade path
  • No regressions in functionality
  • Training and support to continue for both
    product releases

10gR2
11gR1
10gR1
11gR2
OWB
ODI
11gR2
11gR1
10gR3
23
Oracle Runs the Largest Databases
  • Website Personalization
  • 500,000,000 unique users
  • 200 Terabyte Data Warehouse
  • Meteorological Research
  • 220 Terabyte Oracle database
  • Worlds largest database on Linux

How are these companies successful using Oracle?
24
Star Query OptimizationSpecific Data Warehouse
Access Methods
Q What was the total number of umbrellas sold in
Boston during the month of May 2008 ?
Customers
Times
Sales
Products
Channel
25
Oracle Partitioning
  • Optimized performance
  • Comprehensive partitioning strategies to address
    business problems
  • One consistent way to manage all your data
  • Not just for data warehouse and high-end OLTP
  • New partitioning features bring partitioning to
    every application
  • Reduced total cost of ownership
  • Place less used data on lower cost storage
  • Proven functionality in 8th generation
  • Experience comes with age and customer usage

26
The Concept of PartitioningSimple yet powerful
SALES
SALES
SALES
Europe
USA
Jan
Feb
Jan
Feb
Large Table Difficult to Manage
Partition Divide and Conquer Easier to
Manage Improve Performance
Composite Partition Higher Performance More
flexibility to match business needs
27
Partition for PerformancePartition Pruning
Q What was the total sales amount for May 20 and
May 21 2009?
Sales Table
5/17
5/18
Select sum(sales_amount) From SALES Where
sales_date between to_date(05/20/2009,MM/DD/YYY
Y) And to_date(05/22/2009,MM/DD/YYYY)
5/19
5/20
5/21
5/22
28
Partition to Manage Data Growth
SALES TABLE (7 years)
2003
2008
2009




5 Active
95 Less Active
29
Partitioning - 11g enhancements
  • Partition Advisor
  • New composite combinations
  • list/range, range/range, list/hash, list/list
  • Interval Partitioning
  • Automatic creation of range-based partitions
  • REF Partitioning
  • Partition detail table based on the master-table
    key
  • Virtual-Column Based Partitioning
  • Partition based on an expression

30
Interval Partitioning
  • Minimizes periodic partition maintenance (No need
    to create new partitions)
  • Partition segments allocated as soon as new data
    arrives
  • Local indexes are created and maintained as well
  • Requires at least one range partition
  • Range key value determines the range high point
  • Partitioning key can only be a single column, and
    either DATE or NUMBER datatype
  • CREATE TABLE sales()
  • PARTITION BY RANGE (sales_date)
  • INTERVAL(NUMTOYMINTERVAL(1, 'MONTH'))
  • ( PARTITION p1 VALUES LESS THAN
    (TO_DATE('1-2-2006', 'DD-MM-YYYY')) )

31
ILM Information Lifecycle Management
  • Why Bother
  • Compliance
  • Performance
  • Cost
  • Data Maintenance

5 - most active
35 - less active
60 - historical
32
ILM Information Lifecycle Management
  • Implementation of different tiers of storage
  • Consider Oracle ILM Assistant (free!)
  • Leverages Oracle Partitioning
  • Uses Lifecycle Definitions
  • Calculates storage costs savings
  • Simulates the impact of partitioning on a table
  • Advises how to partition a table
  • Generates scripts to move data when required

33
In-Memory Parallel Execution
New
  • Challenge
  • Traditionally Parallel Execution takes advantage
    of the IO capacity of a system
  • Disk speeds are not keeping up with Moores law
    while CPU and Memory are
  • Solution
  • In-Memory Parallel Execution harness the memory
    capacity of the entire system
  • An affinity algorithm places fragments of a
    object in memory on different RAC nodes

34
In-Memory Parallel Execution
Determine the size of the table being looked at
SQLstatement
35
In-Memory Parallel Execution
QphH 1 TB TPC-H
  • A single database machine has over 400GB of
    memory usable for caching
  • Database release 11.2 introduces parallel query
    processing on memory cached data
  • Harnesses memory capacity of entire database
    cluster for queries
  • Foundation for world record 1TB TPC-H
  • Exadata Hybrid Columnar Compression enables
    multi-terabyte tables or partitions to be cached
    in memory

Faster than in-memory specialized startups
Memory has 100x more bandwidth than Disk
Source Transaction Processing Council, as of
9/14/2009 Oracle on HP Bladesystem c-Class 128P
RAC, 1,166,976 QphH_at_1000GB, 5.42/QphH_at_1000GB,
available 12/1/09. Exasol on PRIMERGY RX300 S4,
1,018,321 QphH_at_1000GB, 1.18/QphH_at_1000GB,
available 08/01/08. ParAccel on SunFire X4100
315,842 QphH_at_1000GB, 4.57 /QphH_at_1000GB,
available 10/29/07.
36
Automatic Degree of ParallelismAuto DOP
  • Optimizer derives the DOP from the statement
    based on resource requirements for all scans
    operations
  • Applies to all types of statements Query, DML, or
    DDL
  • Explain plan has been enhanced to show DOP
    selected
  • SQL Tune now uses Auto DOP to recommend
    parallelism

37
Parallel Statement Queuing
Statement is parsedand oracle automatically
determines DOP
SQLstatements
If not enough parallel servers available
queuestatements
FIFO Queue
If enough parallel servers available execute
immediately
38
Summary managementMaterialized Views
  • Separate database object
  • Stores pre-calculated and aggregated results
  • Database supports sophisticated transparent query
    rewrite
  • Queries don't change run against base tables
  • Join-backs, additional aggregations etc.
    supported
  • Database supports incremental fast refresh
  • Based on the query definition


39
Enhanced MV RefreshRefresh time for conventional
insert, aggregate MV
  • On average 30 40 better refresh performance in
    11g Release 2

40
Summary Management Today
41
Cube Organized Materialized Views
Summaries
SQL Query
Region
Date
Product
Channel
OLAP Cube
42
Cubes and Cube Organized MViews
  • Stored in special db areas called Analytic
    Workspaces (which are stored in BLOBs)
  • Manipulated via Analytic Workspaces Manager tool
  • 11g- Best of both worlds
  • Rewrite and refresh features of regular MVs
  • Performance benefits of OLAP cubes
  • Expose OLAP cube as a relational object access
    via SQL
  • CUBE_TABLE function searches cube using SQL

43
Cubes and Cube Organized MViews
  • Query cube as if it was a relational object
  • SQLgt explain plan for
  • 2 select from table(cube_table('GLOBAL.PRICE_
    CUBE'))
  • PLAN_TABLE_OUTPUT
  • --------------------------------------------------
    -------------------------------------
  • Plan hash value 3184667476
  • --------------------------------------------------
    ------------------------------------
  • Id Operation Name
    Rows Bytes Cost (CPU) Time
  • --------------------------------------------------
    ------------------------------------
  • 0 SELECT STATEMENT
    2000 195K 29 (0) 000001
  • 1 CUBE SCAN PARTIAL OUTER PRICE_CUBE
    2000 195K 29 (0) 000001
  • --------------------------------------------------
    ------------------------------------
  • Refresh using DBMS_MVIEW.REFRESH

44
Customer Case Study
  • Reporting Application
  • Cube-organized MVs replaced table-based MVs
  • Time to build aggregate data reduced by 89
  • Longer running queries reduced from 5 minutes to
    12 seconds
  • Transparent access to cube-MV
  • No changes to reporting applications

45
Advanced Compression
  • Table Compression
  • Table Scan Performance 2x faster
  • Storage Savings 2x smaller
  • DML Performance 5 slower
  • CREATE TABLE SALES_FACT () COMPRESS FOR ALL
    OPERATIONS
  • RMAN Compression
  • 40 faster than compressed backups in 10g
  • Slightly better compression ratio than in 10g
  • RMANgt CONFIGURE COMPRESSION ALGORITHM zlib
  • RMANgt backup as COMPRESSED BACKUPSET database
    archivelog all
  • Data Pump Compression
  • expdp hr FULLy DUMPFILEdpump_dirfull.dmp
    COMPRESS

46
Advanced Compression
47
SQL Query Result Cache
  • Caching of query results or PL/SQL function calls
  • DML/DDL against dependent database objects
    invalidates cache
  • Candidate queries
  • access many, many rows
  • return few rows (small result set)
  • executed many times

48
SQL Query Result Cache
  • result_cache_mode init.ora parameter
  • AUTO (optimizer uses repetitive executions to
    determine if query will be cached)
  • MANUAL (need use / RESULT_CACHE / hint in
    queries)
  • FORCE (All results are stored in cache)
  • result_cache_max_size init.ora parameter
  • default is dependent on other memory settings
  • (0.25 of memory_target or 0.5 of sga_target or
    1 of shared_pool_size)
  • 0 disables result cache
  • never gt75 of shared pool (built-in restriction)

49
QA
  • Thank you for attending
  • If you have follow-up questions I will be here
    for the rest of the day or can be contacted by
    email -andreas.katsaris_at_arisant.com
Write a Comment
User Comments (0)
About PowerShow.com