Oracle9i for data warehousing - PowerPoint PPT Presentation

1 / 55
About This Presentation
Title:

Oracle9i for data warehousing

Description:

Oracle has over a dozen patents for bitmap index technology ... Bitmap indexes are 3-20x smaller than b-tree indexes ... so that each bitmap can be located ... – PowerPoint PPT presentation

Number of Views:64
Avg rating:3.0/5.0
Slides: 56
Provided by: johnab9
Category:

less

Transcript and Presenter's Notes

Title: Oracle9i for data warehousing


1
(No Transcript)
2
Why Oracle9i for Data Warehousing
3
The Old WayFragmented Information Supply Chain
OLAPEngine
DataIntegrationEngine
Data Warehouse Engine
MiningEngine
  • Protracted implementation and maintenance cycle
  • Synchronization and currency issues
  • Information Management chaos

4
The New Way Oracle9i
Oracle9i
Data Warehousing
ETL
OLAP
Data Mining
  • Single business intelligence platform
  • Reduce administration and implementation costs
  • Faster deployment
  • Improved scalability and reliability

5
Oracle9iComplete, Therefore Simple
6
Oracle9i DatabaseSingle business-intelligence
data server
Metadata
Relational
OLAP
Data Mining
ETL
7
Oracle9i Application ServerRuns All Your
Business Intelligence Applications
Portal
Metadata
Query Reporting
BI Components
Web Site Analysis
8
Oracle Database for Data Warehousing Continuous
Innovation
Oracle 7.3
Oracle 8.0
  • Hash Join
  • Bitmap Indexes
  • Parallel-Aware Optimizer
  • Partition Views
  • Instance Affinity Function Shipping
  • Parallel Union All
  • Asynchronous Read-Ahead
  • Histograms
  • Anti-Join

Oracle8i
  • Partitioned Tables and Indexes
  • Partition Pruning
  • Parallel Index Scans
  • Parallel Insert, Update, Delete
  • Parallel Bitmap Star Query
  • Parallel ANALYZE
  • Parallel Constraint Enabling
  • Server Managed Backup/Recovery
  • Point-in-Time Recovery
  • Summary Management
  • New Partitioning Schemes
  • Resource Manager
  • Progress Monitor
  • Adaptive Parallel Query
  • Server-based Analytic Functions
  • Transportable Tablespaces
  • Direct Loader API
  • Functional Indexes
  • Partition-wise Joins
  • Security Enhancements
  • and more ...

9
New Oracle9i RDBMS featuresExtending Oracles
leadership
  • Automatic Memory Tuning
  • ETL Infrastructure
  • Change data capture
  • External tables
  • Table functions
  • Upserts
  • Multi-table INSERTs
  • Resumable statements
  • Transportable tablespace enhancements
  • List Partitioning
  • Internal enhancements for
  • parallel query
  • aggregation
  • cost-based optimization
  • Bitmap Join Indexes
  • Analytic SQL fns
  • Grouping sets
  • FIRST/LAST aggregates
  • Inverse distribution
  • Hypothetical rank
  • Proactive query governing
  • Enhancements to MVs
  • Broader refresh and rewrite capabilities
  • More sophisticated summary advisor
  • Full Outer Joins
  • WITH-clause

10
Real Application Clusters(RAC)
  • Exploiting clustered systems
  • Shared Cache ArchitectureOne database
  • Avoiding application downtime by single node
    failure
  • Allows applications to become
  • Highly scaleable
  • Highly available

11
Cache Fusion means Scalability
  • Protocol that allows instances to combine their
    data caches into a shared global cache
  • Global Cache Service (GCS) coordinates sharing
  • Key features are
  • Direct sharing of volatile buffer caches
  • Efficient inter-node messaging framework
  • Fast recovery from node failures using cache and
    CPU resources from all surviving nodes

12
Manage Large Volumes of Data
13
Managing Large Volumes of Data
  • Partitioning and parallelism are crucial for VLDB
  • Parallelism for all operations
  • DBA operations loading, index-creation,
    table-creation, data-modification, backup and
    recovery
  • End-user operations Queries
  • Unbounded scalability Real Application Clusters
  • Partitioning provides incremental operations
    for
  • Data loading
  • Indexing
  • Referential Integrity
  • Backup and recovery

14
How Parallel Execution Works
  • With serial execution only one process is used
  • With parallel execution
  • One parllel execution coordinator process
  • Many parallel execution servers
  • Table is dynamically partitioned into granules

15
VLDB Manageability and Performance Constraints
  • Table availability
  • Large tables are more vulnerable to disk failure
  • It is too costly to have a large table
    inaccessible for hours due to recovery
  • Large table manageability
  • They take to long to be loaded
  • Indexes take too long to be built
  • Partial deletes take hours
  • Performance considerations
  • Large table and index scans are costly
  • Scanning a subset improves performance

16
Benefits of Partioning
  • Availability
  • Partions can be independently managed
  • Backup and restore operations can be done on
    individual partitions
  • Partitions that are unavailable do not affect
    queries on DML operations on other paritions that
    use the same table or index
  • Manageability
  • A partition can be moved from one tablespace to
    another
  • A partition can be dropped, truncated, added
  • A partition can be divided at user-defined value

17
Benefits of Partioning (2)
  • Performance
  • The optimizer eliminates partitions that not have
    to be scanned
  • Partitions can be scanned in parallel
  • Partitions can be load-balanced across physical
    devices
  • Join operations can be optimized to join by the
    partition

18
Partitioning Methods
  • Range Partitioning
  • Hash partitioning
  • List Partioning
  • Composite Partioning

Hashpartitioning
Rangepartitioning
List Partioning
Composite partitioning
19
Partitioned Indexes
  • Indexes can be partitioned like tables
  • Partitioned or nonpartioned indexes can be used
    with partitioned or nonpartitioned tables
  • Partioned indexes can be
  • Global or local
  • Prefixed or nonprefixed

20
Guidelines for Partioning Indexes
21
Rolling Window Operations
Partitioned Tables with Local Indexes
...
01-Feb
96-Jun
01-Mar
96-May
96-Apr
22
Rolling Window Operations
Partitioned Tables with Local Indexes 1. Load
and index new month
...
01-Feb
96-May
96-Apr
96-Jun
01-Apr
01-Mar
23
Rolling Window Operations
Partitioned Tables with Local Indexes 1. Load
and index new month 2. Add new month to table
...
01-Apr
01-Feb
96-May
96-Apr
96-Jun
01-Mar
24
Rolling Window Operations
Partitioned Tables with Local Indexes 1. Load
and index new month 2. Add new month to table 3.
Remove old month from table
...
01-Apr
01-Feb
96-May
96-Apr
96-Jun
01-Mar
25
Rolling Window Operations
  • Partitioned Tables with Local Indexes
  • New data has been loaded with virtually no
    disruption
  • Powerful methodology for managing time-based
    updates to the Warehouse

...
01-Apr
01-Feb
96-May
96-Jun
01-Mar
26
List Partitioning
Same benefits as rolling window data is
partitioned according to business requirements
Online Queries
Europe Region
Americas Region
Asia Region
1200 GMT
Maintenance
27
List Partitioning
Same benefits as rolling window data is
partitioned according to business requirements
Online Queries
Europe Region
Americas Region
Asia Region
Maintenance
28
List Partitioning
Same benefits as rolling window data is
partitioned according to business requirements
Online Queries
Europe Region
Americas Region
Asia Region
Maintenance
29
Table Compression What is it?
  • Tables can be compressed
  • Compression can also be specified at the
    partition level
  • Indexes and index-organized tables are not
    compressed
  • Typical compression ratios are 31 - 51
  • Compression is dependent upon the actual data
  • Compression algorithm based on removing data
    redundancy
  • All DDL/DML commands are supported on compressed
    tables

30
Table CompressionWhat isnt it?
  • This is not a generic zip-style compression
  • Not all tables will have good compression
  • Compression algorithm guarantees that compression
    will never increase size of table
  • Most large DW tables seem to compress well
  • Compression happens between column/row values,
    not within column/row values
  • Long character strings are not compressed unless
    the exact same string appear multiple times
  • LOB/BLOB columns are not compressed

31
Table CompressionHow it works
  • Duplicate values are stored in symbol table for
    each block

Uncompressed
Compressed
  • ltrowidgt 650-506-7000 650-123-4567
  • ltrowidgt 650-506-7000 650-506-7001
  • ltrowidgt 650-506-7000 650-456-7890
  • ltrowidgt 650-506-7000 650-098-7654
  • ltrowidgt 650-506-7000 650-123-4567
  • ltrowidgt 650-506-7001 650-123-4567
  • ltrowidgt 650-506-7001 650-123-4567

ltsymbol table ltAgt 650-506-7000,
ltBgt650-506-7001, ltCgt650-123-4567gt ltrowidgt
ltAgt 650-123-4567 ltrowidgt ltAgt ltBgt ltrowidgt ltAgt 6
50-456-7890 ltrowidgt ltAgt 650-098-7654 ltrowidgt ltA
gt ltCgt ltrowidgt ltBgt ltCgt ltrowidgt ltBgt ltCgt
32
Table CompressionUsage
  • Creating compressed tables
  • CREATE TABLE T1(id integer) COMPRESS
  • Converting tables to compressed tables
  • ALTER TABLE T3 MOVE COMPRESS
  • Creating compressed tablespaces
  • CREATE TABLESPACE tabspace_2
  • DATAFILE 'diskatabspace_file2.dat' SIZE 20M
  • DEFAULT COMPRESS STORAGE ( )

33
Table Compression Performance Impact
  • Queries on compressed tables may observe minor
    performance degradation
  • Performance impact depends upon the query
  • Many queries will be faster
  • Compression reduces IO but increases CPU
    utilization
  • For a set of heterogeneous queries, performance
    should degrade by no more than 5
  • Load and direct-path INSERT performance will be
    slower
  • Data must be compressed as it is added to the
    table

34
Table Compression When to Use it
  • Data warehouses containing large volumes of
    historical data
  • Compress all of the older data in a data
    warehouse
  • Integrate compression into the rolling window
    paradigm
  • For example, most recent 3 months of data could
    be stored uncompressed and the previous 21 months
    could be stored compressed
  • Materialized views and other derived data sets
  • Generally, compression should be applied to data
    that is infrequently updated

35
Manage large numbers of concurrent users
36
Manage large numbers of users
  • Key requirements
  • Guarantee optimal resource utilization all the
    time
  • Provide the appropriate amount of resources to
    every job or query based on priority and system
    load
  • Pro-actively prevent runaway queries
  • Pro-actively prevent system overloading
  • Managing large numbers of users should be simple
    and automated

37
Appropriate Resources to Each Query
  • CPU
  • Business-critical processes receive more CPU
  • Database Resource Manager allows DBA to assign
    CPU resources to groups of users
  • Memory
  • Oracle9i dynamically allocates runtime memory
    based on current available memory and each
    querys requirements
  • Parallelism
  • Degree of parallelism is dynamically chosen based
    on available resources and each querys
    requirements

38
Automatic Runtime Memory Tuning
  • One parameter
  • PGA_AGGREGATE_SIZE ltsizegt
  • Dynamic allocation of runtime memory based upon
    each querys requirements
  • In data-warehouse environments, gt50 of a
    servers physical memory is typically used for
    query runtime memory
  • Benefits
  • Reduced overall memory usage
  • Improved throughput
  • Simplified tuning

39
Pro-active management of DW Workloads
  • Predictive Query Governing and Dynamic
    Re-prioritization
  • Queries which are estimated to take longer than
    an DBA-specified limit will abort or be
    de-prioritized
  • Automatic Queuing
  • A limit can be set on the number of active
    session for each group of users queries
    submitted which exceed this limit will be queued
  • Via Database Resource Manager

40
Example Scenario
  • Power Users
  • Up to 70 of the CPU resources
  • Any degree of parallelism
  • Any query which is expected to take over one hour
    will be migrated to background
  • Report Users
  • Up to 20 of the CPU resources
  • No parallelism
  • Limit of 40 concurrent queries
  • Any query which is expected to take over 20
    minutes will be aborted
  • Background Jobs
  • Up to 10 of the CPU resources
  • Any degree of parallelism
  • Limit of 5 concurrent queries

41
Fast Query Performance
  • The best approach for every query
  • Integrated
  • Comprehensive

Materialized Views
Parallel Operations
Query Optimizer
Partitioning
Access Join Methods
42
Bitmap Indexes
  • The most common index type in Oracle DW
    environments
  • Bitmap indexes introduced in Oracle 7.3
  • Bitmap join indexes introduced in Oracle9i
  • Oracle has over a dozen patents for bitmap index
    technology
  • Oracle provides patented compression technique
    for bitmap indexes
  • Bitmap indexes are 3-20x smaller than b-tree
    indexes
  • Less storage yields better query performance and
    more indexed columns

43
Bitmap indexes introduction
Structure of a bitmap index
CREATE BITMAP INDEX PROD_COLOR ON PROD(COLOR)
Separate bitmap created for each value of the
color column A high-level b-tree structure is
created so that each bitmap can be located
44
Bitmap indexes characteristics
  • Columns with Low-to-Medium Cardinality
  • Set-based manipulation of data
  • Especially good for large, complex queries
  • Orders of magnitude performance improvement
  • Fully integrated within Oracle9i
  • Created and managed similar to other Oracle9i
    indexes
  • Used to accelerate single-table access, joins,
    and aggregation
  • Transparently selected by the query optimizer

45
Bitmap join indexes
Sales
Customer
CREATE BITMAP INDEX cust_sales_bji ON
Sales(Customer.state) FROM Sales, Customer WHERE
Sales.cust_id Customer.cust_id
46
Bitmap join indexes
Sales
Customer
CREATE BITMAP INDEX cust_sales_bji ON
Sales(Customer.state) FROM Sales, Customer WHERE
Sales.cust_id Customer.cust_id
Sales(Customer.state)
Index key is Customer.State
47
Bitmap join indexes
Sales
Customer
CREATE BITMAP INDEX cust_sales_bji ON
Sales(Customer.state) FROM Sales, Customer WHERE
Sales.cust_id Customer.cust_id
Sales(Customer.state)
Sales(Customer.state)
SELECT SUM(SALES.DOLLAR_AMOUNT FROM Sales,
Customer WHERE Sales.cust_id Customer.cust_id AN
D CUSTOMER.STATE California
Index key is Customer.State
Indexed table is Sales
48
Materialized Views
  • Currently, indexes provide fast path access to
    specific data
  • Materialized views work on the same principle
  • A Materialized view is an instantiation of a SQL
    statement - a view with data storage
  • Materialized views can be partitioned, indexed
    separately
  • Used for query rewrite to increase performance
  • Rewrites are transparent to applications
  • Rewrites do not require any special privileges

CREATE MATERIALIZED VIEW sf_sales AS SELECT
FROM sales WHERE city_name SAN FRANCISCO
CREATE
Sales
SF_Sales
SELECT prod_code FROM sales WHERE city_name
SAN FRANCISCO
SELECT prod_code FROM sf_sales
49
SQL support for analytic calculations
  • Why enhance the RDBMS for analytic calculations?
  • Benefits
  • Performance
  • Scalability
  • Simpler SQL development

50
Analytic Functions Examples
  • Rank
  • Top 10 sales-reps in each region
  • Moving Window
  • Todays stock price minus 200-day moving average
  • Period-over-period comparisons
  • Percentage growth of Jan-99 sales over Jan-98
  • Ratio-to-report
  • Januarys sales as a percentage of the entire
    years

51
Platform for Business IntelligenceData
Warehousing
Data Warehousing
  • Foundation of the Business Intelligence Platform
  • More data
  • More users
  • Faster
  • Simple management
  • Oracle9i introduces dozens of new DW features

ETL
OLAP
Data Mining
Oracle9i
52
Platform for Business IntelligenceETL
Data Warehousing
ETL
OLAP
Data Mining
Oracle9i
53
Platform for Business IntelligenceOLAP
OLAP Services Analysis-ready Oracle
database Support for complex, multidimensional
queries Highly scalable Development platform for
Internet-ready analytical applications Java OLAP
API Business Intelligence Beans and JDeveloper
Data Warehousing
ETL
OLAP
Data Mining
Oracle9i
54
Oracle9i OLAP Services
Business Intelligence Beans
OLAP Services
Java OLAP API
Query Processor
Metadata Provider
SQL Generator
Multidimensional Engine
Metadata Provider
Oracle Relational Database
Analytic Workspace
Metadata
Data
Data
Metadata
Forecasts Models AllocationsConsolidations
Scenarios Custom Functions
Data Warehouse - Query and Reporting
55
Platform for Business IntelligenceData Mining
Data Warehousing
ETL
OLAP
Data Mining
Oracle9i
56
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com