Top ten DW features - PowerPoint PPT Presentation

About This Presentation
Title:

Top ten DW features

Description:

Oracle10g for data warehousing - short trip back in the history ... NetZero Oracle 2.47. Telecom Italia Informix 2.32. 2001 Survey. 1998 Survey ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 49
Provided by: noc8
Learn more at: http://www.nocoug.org
Category:
Tags: features | netzero | ten | top

less

Transcript and Presenter's Notes

Title: Top ten DW features


1
Oracle 10g for Data Warehousing
Hermann Baer, Oracle Product Management Data
Warehousing Server Technologies
NoCOUG Winter Conference, Feb 8th 2005
2
Agenda
  • Oracle10g for data warehousing - short trip back
    in the history
  • Continuous innovation over decades
  • Adoption trends and drivers
  • What do we see in the market
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 sneak preview

3
The way to Oracle10g
  • Data Warehousing development started decades ago
    with Oracle 7.0
  • Primary focus on performance and scalability
  • Extended with Manageability and the BI platform
    vision in the Oracle8i time frame
  • Data Warehousing Imperatives
  • Efficient Extract, Transform, Load (ETL)
  • Managing Large Data Volumes
  • Fast Query Response
  • Supporting Large User Population
  • Managing Simply

4
Oracle10g for Data Warehousing Continuous
Innovation
Oracle 7.3
Oracle8i
Oracle9i
  • Hash and Composite Partitioning
  • Resource Manager
  • Progress Monitor
  • Adaptive Parallel Query
  • Server-based Analytic Functions
  • Materialized Views
  • Transportable Tablespaces
  • Direct Loader API
  • Functional Indexes
  • Partition-wise Joins
  • Security Enhancements

Oracle10g
  • List and Range-List Partitioning
  • Table Compression
  • Bitmap Join Index
  • Self-Tuning Runtime Memory
  • New Analytic Functions
  • Grouping Sets
  • External Tables
  • MERGE
  • Multi-Table Insert
  • Proactive Query Governing
  • System Managed Undo
  • Self-tuning SQL Optimization
  • SQL Access Advisor
  • Automatic Storage Manager
  • Self-tuning Memory
  • Change Data Capture
  • SQL Models
  • SQL Frequent Itemsets
  • SQL Partition Outer Joins
  • Statistical functions
  • and much more ...

5
Agenda
  • Oracle10g for data warehousing - short trip back
    in the history
  • Continuous innovation over decades
  • Adoption trends and drivers
  • What do we see in the market
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 sneak preview

6
Main Trends and Drivers
  • Oracle VLDWs are growing
  • Less systems, more data
  • DW systems are consolidated
  • Global view of the business
  • Importance of Data Warehousing increases
    dramatically
  • Growing operational/tactical importance
  • Cost Effectiveness becomes more important
  • Better decisions, lower cost

7
Oracle VLDWs are growing
Winter 2003 VLDB SurveyLargest Database Size,
Decision Support
  • Sears Teradata 4.63
  • HCIA Informix 4.50
  • Wal-Mart Teradata 4.42
  • Tele Danmark DB2 2.84
  • CitiCorp DB2 2.47
  • MCI Informix 1.88
  • NDC Health Oracle 1.85
  • Sprint Teradata 1.30
  • Ford Oracle 1.20
  • Acxiom Oracle 1.13

8
Oracle VLDWs are growing
  • Powerful RDBMS functionality becomes more
    important and visible, e.g.
  • Partitioning
  • Table compression
  • Automatic Storage Management (ASM)
  • Parallel processing

9
Increasing Importance of DW
  • Latency between operational and analytical data
    must be minimized
  • Intelligence when you need it
  • Need for new and enhanced analytical capabilities
  • More value from your data
  • Classical strengths of an RDBMS become more
    important
  • E.g. Security, B/R, Availability, Concurrency

10
Cost Effectiveness
  • Safe money whenever possible
  • Commodity servers
  • Commodity disks
  • Software manageability
  • Example Amazon
  • 16 low cost Intel boxes replaced one SuperDome
  • Low cost storage arrays replaced high end storage
    arrays
  • 2 DBAs

11
Cost Effectiveness
Pay and Scale Incrementally
12
Cost Effectiveness
Pay and Scale Incrementally ... with RAC
13
Cost Effectiveness
  • Commodity components make specific database
    functionality more important
  • RAC for Scalability and Availability
  • Resource Manager
  • Automatic Storage Management (ASM)
  • RMAN / Oracle Backup (Oracle10gR2)

14
Oracle Database 10gDW Major Feature Summary
  • Reduced Total Cost of Ownership
  • Manageability
  • Workload Repository
  • Automatic SQL Tuning
  • Self-Tuning Global Memory
  • ASM
  • ULDB support
  • Database size extended to Exabytes (BIGFILES)
  • Unlimited size LOBs
  • Hash Partitioned Global Indexes
  • ASM removes file system limits
  • More Value From Your Data
  • Many New OLAP Features
  • New Data Mining algorithms
  • Stand-alone Data Mining Tool
  • Advanced Statistics
  • SQL Model Clause
  • Frequent Item Sets
  • Partition Outer Join
  • Intelligence When You Need It
  • Cross Platform Transportable Tablespaces
  • Data Pump
  • Async Change Data Capture
  • Enhancements to MERGE

15
Agenda
  • Oracle10g for data warehousing - short trip back
    in the history
  • Continuous innovation over decades
  • Adoption trends and drivers
  • What do we see in the market
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 sneak preview

16
Build the foundation for Success
  • Even after decades of innovation, a computer
    still consists of three main components
  • CPU provides the computing power
  • Memory stores the transient data for
    computational operations
  • Disks (I/O) store the persistent information
  • Getting the best performance is finding the right
    balance of all these components and use them
    optimally
  • Size your system appropriately
  • Design your database appropriately
  • Use the database appropriately
  • Data Warehousing is just a special kind of
    application

17
Configuring for your Workload
  • CPU requirements depend on user workload
  • Concurrency of users, ratio of CPU-related tasks
  • Memory requirement mostly user-process driven
  • IO requirements depend on query-mix
  • CPU vs. IO
  • Relative CPU power for IO related tasks
  • Logically Random IOs (predominant in star schema)
  • required for index driven queries, e.g. Index
    lookups, Index driven joins, Index scans
  • Logically Sequential IOs (predominant in 3rd NF
    schema)
  • required for table scans, e.g. Hash Joins
  • Find the balance between CPU and IO

18
Configuring for Throughput
Sizing Guidelines
  • Oracle can read 300MB/sec per GHz/CPU power
  • Direct Read, multi-block IO,
  • e.g, parallel full table scan ('lab environment')
  • An average DW system should plan for 75
    -100MB/sec per GHz/CPU
  • Typical mixture of IO and CPU intensive
    operations
  • Ball park number, adjust accordingly
  • TPC-H plans for appr. 200MB per 3GHz Xeon

19
Configuring for Throughput
FC-Switch1
FC-Switch2
Disk Array 1
Disk Array 2
Disk Array 3
Disk Array 4
Disk Array 5
Disk Array 6
Disk Array 7
Disk Array 8
20
Configuring for ThroughputBit is not Byte
21
Configuring for Throughput
FC-Switch1
FC-Switch2
Disk Array 1
Disk Array 2
Disk Array 3
Disk Array 4
Disk Array 5
Disk Array 6
Disk Array 7
Disk Array 8
22
Configuring the Storage
  • Design for throughput, not capacity
  • Keep it simple
  • Try using RAID 01
  • Use S.A.M.E. methodology
  • Stripe And Mirror Everything
  • At the HW level, if available
  • Using ASM capabilities
  • Leverage ASM whenever possible
  • Striping and Mirroring capabilities
  • Automatic rebalancing
  • Enables low cost storage

23
Calibrate your System
  • You can easily compute the theoretical I/O
    performance of your system
  • Typically measured by the minimum of I/O
    channel capacity, I/O controller capacity, disk
    I/O capacity
  • Verify the I/O performance limits using OS-level
    commands
  • Do this prior to using the database
  • Cover basic IO operations and the average future
    load pattern
  • Random single block IO vs. sequential multi block
    IO
  • Concurrency

24
Calibrate your SystemThroughput dd vs. ORCL
DIRECT READ
  • Oracle drives about 90 of what dd can drive with
    a table scan
  • If you do not get the expected throughput fix the
    hardware

25
Agenda
  • Oracle10g for data warehousing - short trip back
    in the history
  • Continuous innovation over decades
  • Adoption trends and drivers
  • What do we see in the market
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 sneak preview

26
Schema which way to go?
  • Dont get lost in theory and academia
  • Philosophical discussions wont help (Star
    fights 3NF)
  • None of the two extremes will work (RedBrick?,
    Teradata?)
  • Design according to your business needs
  • Reality shows that most of the customers are
    doing a mix and match
  • 3NF more in an ODS layer
  • Denormalized 3NF in DW/Stage for general
    purposes
  • Dimensional model for subject areas, e.g. sales,
    marketing (remember shared dimensions!)

OLAP will not be covered in this presentation
27
Schema which way to go?
  • The chosen schema approach determines used Oracle
    functionality
  • The chosen schema approach determines IO pattern
  • Logically Random IOs (predominant in star schema)
  • required for index driven queries, e.g. Index
    lookups, Index driven joins, Index scans
  • Logically Sequential IOs (predominant in 3rd NF
    schema)
  • required for table scans, e.g. Hash Join
  • Oracle has both functionality to
  • Push the IO to the limit
  • Optimize the IO requirements

28
Schema which way to go?
Star Schema
  • Leading performance for dimensional schemas
  • Innovative usage of bitmap indexes and bitmap
    join indexes
  • Index access instead of large table access
  • Bitmap indexes 3 to 20 times smaller than btree
    indexes
  • Support for complex star schemas
  • Multiple fact tables
  • Snowflake schemas
  • Large number of dimensions
  • Fully integrated
  • Parallel execution
  • Partition Pruning

29
I/O Minimize Requests
Partition Pruning
  • Only the relevant partitions will be accessed
  • Optimizer knows or finds the relevant partitions
  • Static pruning with known values in advance
  • Dynamic pruning uses internal recursive SQL to
    find the relevant partitions
  • Minimizes I/O operations
  • Also provides order of magnitude performance gains

30

Materialized Views
Monthly Sales by Region
Query
What were the sales in the West and South regions
for the past three Quarters?
Detail
QueryRewrite
31
Schema which way to go?
3NF example
Jan Feb Mar Apr
Jan Feb Mar Apr
CUSTOMER_ORDERS
CUSTOMER_ORDER_PRODUCTS
...
...
...
...
Jan
Jan
32
Schema which way to go?
Schema Agnostic - Parallel Execution
  • Use parallelism to enable single process
    scalability
  • Unrestricted parallelism
  • No data layout requirement or restriction (as in
    shared nothing systems)
  • All operations can be parallelized

33
Schema which way to go?
Schema Agnostic - Parallel Execution
  • I/O bandwidth requirement increases with single
    process parallelism and multi-user concurrency
  • Plan for your systems expected I/O throughput
    based on average concurrent users and parallelism

34
Schema which way to go?
Schema Agnostic - Parallel Execution
  • I/O bandwidth requirement increases with single
    process parallelism and multi-user concurrency
  • Plan for your systems expected I/O throughput
    based on average concurrent users and parallelism

35
Schema which way to go?
Schema Agnostic - Parallel Execution
  • I/O bandwidth requirement increases with single
    process parallelism and multi-user concurrency
  • Plan for your systems expected I/O throughput
    based on average concurrent users and parallelism

36
Schema which way to go?
Oracles functionality
  • Star schema
  • Range-partition fact tables by time
  • Bitmap indexes on dimension-key columns of fact
    table
  • Star transformation for end-user queries
  • Materialized views for pre-aggregated cubes
  • 3NF or normalized schema
  • Composite range-hash partitioning on large tables
  • Partition-wise joins and parallel execution are
    key performance enabler for joining large tables
  • Hybrid environments
  • Use both dogmas concurrently in the same system
    without affecting each other

37
Init.ora less is more
Lessons learned from History
  • Do not de-tune Oracle
  • Very often, our performance engineers are getting
    improvements just by removing parameters
  • Results can be poor optimizer plans, wasted
    memory, and serialization points
  • Trust Oracle
  • Dont try and second guess the software
  • With the exception of buffer and subject area
    related parameters, the system defaults are
    usually optimum

38
Init.ora less is more
Basic Rules
  • Ensure that data warehouse relevant parameters
    are set
  • Not all parameters are enabled by default in
    older database releases prior to Oracle10g
  • Size and set buffer and memory related parameters
  • Two parameters are enough
  • Do not touch other parameters unless necessary

39
Init.ora less is more
Data Warehouse relevant parameters
  • COMPATIBLE
  • Database release version to enable new
    functionality
  • OPTIMIZER_FEATURES_ENABLED
  • Database release version to enable new
    functionality
  • DB_MULTIBLOCK_READ_COUNT
  • Maximize multiblock I/O (use multiple of OS I/O
    size)
  • DISK_ASYNCH_IO
  • Set to TRUE (Only relevant for older Linux
    versions)
  • PARALLEL_MAX_SERVERS
  • Adjust to system capabilities (default to 5 prior
    to Oracle10g)
  • QUERY_REWRITE_ENABLED
  • Set to TRUE, enabled by default with Oracle10g
  • QUERY_REWRITE_INTEGRITY
  • ENFORCED by default, can be potentially lowered
  • STAR_TRANSFORMATION_ENABLED
  • Set to TRUE

40
Build the foundation for SuccessSummary
  • Data Warehousing is just a special kind of
    application
  • Ensure a well-tuned I/O subsystem
  • Size for I/O throughput, not for disk capacity
  • Use appropriate hardware / storage
  • Find a schema balance
  • Design according your needs using the appropriate
    model, not the other way around
  • Init.ora settings less is more

41
Agenda
  • Oracle10g for data warehousing - short trip back
    in the history
  • Continuous innovation over decades
  • Adoption trends and drivers
  • What do we see in the market
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 sneak preview

42
ETL Enhancements
  • DML error logging
  • Column values that are too large
  • Constraint violations (NOT NULL, unique,
    referential, check constraints)
  • Errors raised during trigger execution
  • Type conversion errors
  • Partition mapping errors
  • Distributed Change Data Capture
  • Enables 9.2 as source for asynchronous CDC

43
DML Error Logging (example)
INSERT INTO sales SELECT product_id,
customer_id , TRUNC(sales_date), 3,
promotion_id , quantity, amount FROM
sales_activity_direct LOG ERRORS
INTO sales_activity_errors('load_20050801') REJECT
LIMIT UNLIMITED
44
Performance Enhancements
  • Sort
  • ORDER BY statements
  • (B-tree) index creation
  • Up to 5 times performance improvement
  • Aggregation
  • GROUP BY statements
  • Materialized views using aggregations
  • Implicit use of aggregations, e.g. statistics
    gathering
  • Two to three times performance improvement
  • Query rewrite using multiple materialized views

45
Partitioning Enhancements
  • Scalability
  • Maximum number of partitions 64K - 1M
  • Resource optimization for DROP TABLE of a
    partitioned table
  • Support for partitioning on index-organized
    tables
  • Support for hash-partitioned global indexes
  • Performance
  • Support for Multi dimensional partition pruning

46
Other Enhancements
  • Manageability
  • SQL Access Advisor improvements
  • Materialized view refresh improvements
  • Analytics
  • SQL model clause enhancements

47
Summary
  • Oracle10g for data warehousing - short trip back
    in the history
  • The most powerful and successful DW platform
  • Adoption trends and drivers
  • Be visionary, though conservative
  • Guarantee success and protect investments
  • Design and build a Data Warehouse
  • Ensure a well-balanced system
  • Optimize Oracle
  • Oracle Database 10gR2 Beta Interested?

48
A
Write a Comment
User Comments (0)
About PowerShow.com