Practical Space Management in Data Warehouse Environments - PowerPoint PPT Presentation

1 / 76
About This Presentation
Title:

Practical Space Management in Data Warehouse Environments

Description:

Alternative way CTAS compress: CREATE TABLE T1 compress. AS SELECT * FROM T1_UNCOMPRESSED; ... SUV Black Rock Climber 55. Index Key Compression Example (continued) ... – PowerPoint PPT presentation

Number of Views:88
Avg rating:3.0/5.0
Slides: 77
Provided by: lisal5
Category:

less

Transcript and Presenter's Notes

Title: Practical Space Management in Data Warehouse Environments


1
Practical Space Management in Data Warehouse
Environments
Hamid Minoui Database Specialists,
Inc. www.dbspecialists.com hminoui_at_dbspecialists.
com
2
Objectives
  • To point out data warehouse space management
    issues
  • Suggest resolutions
  • Recommend space management methodologies
  • Provide proactive prevention strategies
  • Cover both Oracle 9i and Oracle 10g space
    management features

3
Characteristics of a Data Warehouse
  • The data
  • Large amount of data loads and ETL operations
  • Very large size (Terabytes)
  • Change in structure of source data
  • Contains lots of historical data
  • Data massaging and aggregations
  • Multiple sources of data
  • Dynamic nature of data

4
Characteristics of a Data Warehouse (continued)
  • Maintenance activities
  • Space management
  • Table re-organizations
  • Index rebuilds
  • Partition maintenance
  • Refresh maintenance on materialized views
  • Job and scheduling management

5
Characteristics of a Data Warehouse (continued)
  • Typical issues
  • Data integrity issues
  • Data security issues
  • Space issues
  • Query performance issues
  • Duplicate rows

6
Characteristics of a Data Warehouse (continued)
  • Database features frequently used
  • Materialized views (MV)
  • Bitmap indexes and bitmap-join indexes
  • Index organized tables (IOT)
  • Parallel execution
  • Table and index partitioning
  • Table and index compression
  • Load utilities and facilities

7
Other Characteristics
  • Star schemas, snow flakes or 3rd Normal Form
  • Have dimensions and hierarchy
  • Frequent need to collect statistics
  • Use of bulk and parallel loads
  • Variety in the generated queries
  • Dynamic nature of queries
  • Divided into areas (staging, ODS, and target
    area)
  • Often associated with smaller data marts

8
Performance Tuning and Resolutions
  • Frequent query tuning
  • Star transformation
  • De-normalization
  • Pre-aggregations via materialized views
  • BTree, IOT, function based, bitmap, bitmap-join
    indexes
  • Use of database resource management

9
Why is Space a Coveted Resource in a Data
Warehouse?
  • Lots of disk space is consumed
  • Stores all enterprise data
  • Segments are mostly large
  • Many indexes
  • Years of historical data kept online
  • Many versions of the same data
  • Duplicated and de-normalized data
  • Various levels and dimensions of data (monthly,
    weekly, daily)

10
Why is Space a Coveted Resource in a Data
Warehouse?
  • Enough reserve space needed
  • For daily/weekly/monthly growth
  • Recall offline old data when needed
  • Data correction
  • Materialized views and their growth
  • Emergency needs
  • Data files and tablespace growth
  • Temporary tablespaces

11
Reacting to Space Issues
  • Down sides
  • Often, not enough time to react
  • Delay in the load
  • Wasted resources to reload
  • Up sides
  • Loads are usually scheduled
  • Once data is loaded, most of it wont change

12
Issues with Database Backups in a Data Warehouse
  • Too many files to backup every night
  • Backup takes a long time to complete
  • System resources busy during backup
  • Possible licensing issues with third-party backup
    software
  • Restoring and recovery after a failure can take a
    long time

13
A Typical Backup Strategy
  • Make non-current table spaces READONLY every
    month
  • Perform a special backup of READONLY tablespaces
  • Exclude the READONLY table spaces from regular
    hot backups
  • Never backup temporary tablespaces
  • Caveat You must wait until all transactions are
    committed

14
Avoiding Unnecessary Redo Log Generation
  • Create some tables and all indexes with NOLOGGING
    for any segment that can be re-generated without
    doing a database recovery
  • SQLLoader with direct path load
  • CREATE TABLE AS SELECT from external or transient
    tables
  • INSERT using append hint
  • Use global temporary tables
  • insert / append / into transiant_table selec
    t from source_table
  • create table transient_table as select from
    source_table

15
Speeding Up Bulk Load Operations
  • Before the load
  • Make all non-unique indexes unusable
  • Disable the primary and unique constraints if the
    source data is trusted
  • Disable all triggers on the table
  • Set the session to skip unusable indexes

16
Speeding Up Bulk Load Operations
  • Implement the load
  • Use append and parallel hints with insert
  • Commit the transaction
  • After the load
  • Rebuild indexes
  • Enable triggers and constraints

17
Space Issues in Data Warehouses
  • Permanent tablespaces (data, indexes)
  • Temporary tablespaces (temp segments)
  • UNDO segments and tablespace

18
Space Issues with Permanent Tablespaces
  • Caused by
  • Poor extent sizing
  • Setting maxextents
  • PCT_INCREASE gt 0
  • Small data files (tablespaces)
  • User quota on tablespace

19
Space Issues with Temporary Tablespace
  • Caused by
  • Not enough space for the sort segments
  • Other temp segments such as global temporary
    tables
  • Multiple users sharing the same temporary space
  • Multiple queries with sort requirements running
    at any time

20
Space Issues with Temporary Tablespace
  • Partially resolved by
  • Oracle 9i - Dynamic PGA memory allocation
  • PGA_AGGREGATE_TARGETltinteger valuegt
  • WORKAREA_SIZE_POLICYAUTO
  • Oracle 10g - Tablespace Group assignment

21
Space Issues Associated with Undo Segments
  • Long running queries causing ORA-1555 (snapshot
    too old)
  • Small UNDO tablespace
  • Small rollback segments

22
Database Block Size (DB_BLOCK_SIZE)
  • Should seriously be considered
  • An important decision with new data warehouse
    projects
  • Inappropriate value can be disastrous and
    detrimental
  • Small value can
  • Impact I/O efficiency for majority of queries
  • Negatively influence overall database performance

23
Appropriate DB_BLOCK_SIZE Value
  • Multiple of the OS block size
  • As large as your I/O subsystem can handle in a
    single read
  • As large as supported by Oracle
  • Best benefit from larger block size if
  • Database is configured on raw devices, or
  • Direct I/O is available to you.

24
Benefits of Larger DB_BLOCK_SIZE Value
  • Efficiency with index scan
  • A larger block size reduces the number of reads
    required to probe an index and scan a range of
    values from its leaf blocks
  • Less memory requirement for buffer cache
  • Fewer buffers needed for index branch blocks
  • Better compression ratio for tables, indexes
  • Improvement in block density
  • Amount of space used by fixed portion
  • of bock header is reduced

25
Benefits of Larger DB_BLOCK_SIZE Value
  • Blocks can accommodate longer rows less chance
    for row chaining
  • Less occurrence of ORA-1555
  • Increase in size of the transaction table in undo
    segments header blocks
  • Fewer writes required for data loads
  • Because of the reduced block level overhead, less
    redo logs are generated when blocks are modified
    sequentially

26
Disks, I/O and Database Files Configuration
  • A poorly configured I/O subsystem can badly
    impact I/O performance
  • Poor I/O performance can impair a data warehouse
  • Configure disk and distribute data for read and
    write efficiency
  • Use raw I/O if possible, otherwise use direct I/O
  • Make use of asynchronous I/O, parallel read and
    parallel writes

27
Disks, I/O and Database Files Configuration
  • Stripe and Mirror or Mirror and Stripe the disks
  • RAID-10 or RAID-01
  • Evenly spread your data and Stripe And Mirror
    Everything (SAME) on many disks
  • Reserve room on file systems for auto extendable
    files

28
Managing the UNDO Segments
  • Manual undo (rollback segments) management
  • Pre Oracle 9i practices
  • Too many manual interventions by DBA

29
Managing UNDO (continued)
  • Automatic Undo Management (AUM)
  • Much better Highly recommended
  • Allows controlling retention of committed
    transactions undo information (UNDO_RETENTION)
  • Better monitoring statistics
  • Infrequent occurrence of ORA-1555
  • SMON periodically manage space and shrinks undo
    segments

30
UNDO_RETENTION Parameter Setting
  • Set to a value equal to the time used by the
    longest running query
  • Undo is expired when retention time is reached
  • Expired undo will be de-allocated if needed by
    new transactions
  • Unexpired undo are re-used if space is needed
    (undo reuse)
  • Default value is 300 seconds

31
Undo Reuse and Undo Stealing
  • Undo ReuseUnexpired undo of the same segment
    will be reused
  • Undo StealingUnexpired undo of another segment
    is used
  • Undo reuse is more common. Occurs when
  • UNDO tablespace is too small, or
  • UNDO_RETENTION value is too large

32
Monitoring the UNDO Segments Statistics
  • Statistics are gathered in VUNDOSTAT every 10
    minutes
  • Helps sizing UNDO tablespaces and tune
    UNDO_RETENTION
  • Statistics are retained for 7 days

33
VUNDOSTAT
BEGIN_TIME Beginning time for this interval
END_TIME Ending time for this interval
UNDOTSN Tablespace ID of the last active undo within the interval
UNDOBLKS Number of consumed undo blocks within the period
MAXQUERYLEN The longest length of time (in seconds) a query took to complete within this period
TXNCOUNT Total number of transactions executed with the period
34
VUNDOSTAT (continued)
UNXPSTEALCNT Number of attempts to obtain undo space by stealing unexpired extents from other undo segments
UNXPBLKRELCNT Number of unexpired blocks released from undo segments to be used by other transactions
SSOLDERRNT Number of times ORA-1555 occurred with the period
NOSPACERRCNT Number of times space was unavailable in the undo tablespace when requested and failed
35
Tuning UNDO_RETENTION
  • Oracle 9i
  • Manually adjust to the time taken by the longest
    query
  • SELECT MAX (MAXQUERYLEN) FROM VUNDOSTAT
  • Oracle 10g
  • Automatically tracked and tuned by RDBMS

36
The UNDO Tablespace
  • Created at DB creation or with CREATE UNDO
    TABLESPACE
  • Use VUNDOSTAT for sizing and monitoring
  • Space issues if UNDO_RETENTION is too large
  • Use AUTOEXTEND
  • RETENTION_GUARANTEE clause
  • Sizing formulaUndo Segment Space Required (MB)
    (undo_retention undo_blcks/secs
    DB_BLOCK_Size)/1024

37
Database Fragmentation Issues
  • Best to reduce or eliminate fragmentation to
    avoid wastage and improve performance
  • Tablespace level (or file level) fragmentation
  • Segment level fragmentation
  • Block level fragmentation

38
Tablespace Level Fragmentation
  • Bubble Fragmentation
  • Free block of space not large enough for another
    extent
  • Honeycomb Fragmentation
  • Free un-coalesced space next to each other but
    considered separate

39
Segment Level Fragmentation
  • Space allocated to segment is not fully utilized
    (wasted)
  • Space above the high water mark (unused blocks)
  • Free segment blocks below the high water mark

40
Block Level Fragmentation
  • Blocks are not empty but there is space within a
    block that is not used
  • Caused by
  • Setting of PCTFREE and PCTUSED
  • Deletions
  • Row migrations

41
Tablespace Planning
  • Use locally managed tablespaces (LMTs) with
    UNIFORM size extents
  • 64K bitmaps on file header are used to manage
    extents
  • Improves performance and significantly reduces
    overhead associated with updating dictionary
    tables (recursive SQL)
  • No need to use ST enqueue
  • No more tablespace fragmentation

42
Tablespace Planning
  • Use Automatic Segment Space Management (ASSM)
  • Set at the tablespace level
  • Tablespace must be locally managed
  • Uses bitmap instead of freelist to manage space
    within segments

43
Benefits of ASSM
  • No more need for FREELISTS, FREELIST GROUPS and
    PCTUSED
  • Reduces segment level and block level
    fragmentations
  • Reduces the number of buffer free waits
  • Adds efficiency to space usage
  • Provides better use of space within the blocks

44
LMT Considerations
  • The bitmap is 64K
  • Make the size of each file a multiple of UNIFORM
    extent64K
  • Storage parameters
  • Avoid setting them
  • If already defined on segments reorganize, or
    rebuild with storage parameters matching
    tablespace

45
Multiple Tablespace Size Models
  • SAFE (methodology)
  • Group segments according to size (3 groups)
  • Use 3 tablespace model having different UNIFORM
    extents
  • Assign each group to one of the size model
  • Develop a naming convention

Segment Size Extent Size Size Model
lt 128 M 128 KB Small
gt 128 M lt 4 GB 4 MB Medium
gt 4 GB 128 MB Large
46
Tablespaces for Different Types of Segments
  • Separate indexes and tables
  • Better manageability
  • Different type of usage
  • Reduces wastage (indexes are rebuilt often in
    data warehouses)

47
Adjust Settings of PCTFREE and PCTUSED Parameters
  • Avoid using default values
  • Set according to usage
  • Most of the times PCTFREE0 and PCTFREE99 should
    be enough
  • If ASSM, no need for PCTUSED
  • More compact data in blocks reduces waste and
    improves I/O

48
Use Index Organized Tables (IOTs)
  • When most of the columns are indexed
  • When associated tables are used
  • Columns are pre-sorted
  • Makes better use of space and improve performance
  • Good for certain data warehouse tables

49
Table Compression
  • Introduced in Oracle 9i R2
  • Improves read only operations and factors out
    repetitive values within a block
  • Replaces duplicate values in a block with a
    reference to a symbol table in the block
  • Very low CPU overhead to reconstruct the block
  • Significantly fewer blocks, leading to better I/O
  • Very flexible (not all blocks are compressed)
  • Associated with bulk load operations

50
Table Compression
  • To compress a table useALTER TABLE t1 MOVE
    compress
  • To compress a table partition useALTER TABLE T1
    MOVE PARTTION P1 compress
  • Alternative way CTAS compressCREATE TABLE T1
    compressAS SELECT FROM T1_UNCOMPRESSED
  • Table or partition not available (locked) during
    move
  • Use DBMS_REDEFINITION for online move

51
To Get the Best Results
  • To achieve the best compression ratio
  • Analyze the table to get column statistics
    SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS,
    AVG_COL_LENFROM DBA_TAB_COLUMNS
  • Identify best candidate columns for sorting as
    columns with
  • Lowest number of distinct values (low
    NUM_DISTINCT)
  • Least amount of null values (low NUM_NULLS)
  • Longest average length (high AVG_COL_LEN)
  • CTAS compress and use order by candidate_column

52
Table Compression Limitations
  • Can not be used on LOB field
  • Can not be used for IOTs
  • Can not compress tables with bitmap indexes
  • With Oracle 9i, cannot drop or add columns to
    compressed tables

53
Index Key Compression
  • Introduced in Oracle 8i
  • Compression of leading index columns
  • Indexes are grouped into a suffix and prefix
    entry
  • Suffix entry made out of unique pieces
  • Prefix entry consist of the grouping piece
  • Can offer significant space savings and better
    I/O performance

54
Index Key Compression Example
  • Current years Car Inventory table, index CAR_IND
    indexes columns are Type, Color, Model
  • Before compression

ltSUVgtltBlackgtltRock Climbergt ltSedangtltBluegtltCharismagt
ltSUVgtltBlackgtltJungle Cruisergt ltSedangtltBluegtltFantasygt
ltSUVgtltBlackgtltMountaineergt ltSedangtltBluegtltStarletgt
.
55
Index Key Compression Example (continued)
  • ALTER INDEX CAR_IND compress 3
  • After compression

ltSUVgtltBlackgt ltRock Climbergt ltJungle Cruisergt ltMountaineergt
ltSedangtltBluegt ltCharismagtltFantasygtltStarletgt
.
56
Index Key Compression
  • Partitioned indexes cannot be compressed
  • Bitmap indexes cannot be compressed
  • Can be defined on IOT
  • Slight CPU overhead during index scan
  • Consumes much less space
  • Increases I/O throughput and buffer cache
    efficiency
  • Ideal for data warehouses

57
Identifying Keys to Compress
  1. Validate or analyze the indexVALIDATE INDEX
    INDX1
  2. Query the index_stats viewSELECT NAME,
    OPT_CMPR_COUNT, OPT_CMPR_PCTSAVEFROM
    index_stats
  3. Examine outputNAME OPT_CMPR_COUNT OPT_CMPR
    _PCTSAVE------- -------------- ------------------
    ----INDX1 2
    57

58
De-Allocating Unused Space
  • Segment Level
  • Blocks above the segment high water mark (unused
    blocks)
  • Space below the segment high water mark (free
    blocks)
  • Tablespace Level
  • Free space within tablespace
  • Data file level
  • Unallocated space above the highest allocated
    extent (file high water mark)

59
Identify Segment Space Usage
  • DBMS_SPACE.UNUSED_SPACE
  • Information about amount of unused space in
    segment and position of high water mark
  • DBMS_SPACE.FREE_BLOCKS
  • Information about the number of blocks on the
    freelist groups
  • DBMS_SPACE.SPACE_USAGE
  • Information about the space usage of blocks under
    the high water mark

60
De-Allocate Segment Free Space
  • Unused blocks-
  • ALTER TABLE INDEX CLUSTER segment_name
  • DEALLOCATE UNUSED KEEP nK
  • De-allocates only space above segment high water
    mark, retaining space specified by KEEP
  • Other Unused space-
  • Pre Oracle 10g Reorganize table, rebuild index
  • Table move, export/import, DBMS_REDEFINITION
    interface)
  • Oracle 10g Online segment shrink

61
Two-Phase Online Segment Shrink
  • ALTER TABLE table SHRINK SPACE
  • Phase 1
  • A series of DELETE and INSERT statements applied
    to move data to the beginning of the segment
  • DML-compatible changes are held on rows and
    blocks
  • Phase 2
  • High water mark adjusted to the appropriate
    location.
  • Exclusive lock is held
  • Unused blocks (above high water mark) are
    de-allocated

62
One-Phase Online Segment Shrink
  • ALTER TABLE table SHRINK SPACE COMPACT
  • With COMPACT keyword only the first phase is
    executed.
  • To implement phase 2, issue it without COMPACT
    keyword at a later time

63
One-Phase Online Segment Shrink (continued)
  • Restrictions
  • Row movement must be enabled
  • Triggers based on ROWID of table must be disabled
  • In data warehouses, locking might not be a
    problem on some tables

64
De-allocating Space at the Tablespace Level
  • Caused by tablespace fragmentation
  • Index rebuilds, table moves, partition move, etc.
  • Not having UNIFORM size extents

65
De-allocating Space at the Data File Level
  • File size larger than the last block used in the
    file
  • Size over-estimated
  • Auto extended

66
Shrinking Data Files
  • The statement
  • ALTER DATABASE DATAFILE file_name resize n (K
    M)
  • Attempts to size the data file to exactly n K (or
    M)
  • It is safe. It will fail with ORA-03297, if there
    are blocks of data beyond the requested resize
    value
  • ORA-03297 File contains nnn blocks of data
    beyond requested resize value.

67
Steps to Shrink Data Files to High Water Mark
Position
  • 1) Create a temporary table preferably a GTT
  • CREATE global temporary table SPACE_ADMIN_GTTON
    COMMIT PRESERVE ROWS ASSELECT FILE_NAME,
    TABLESPACE_NAME, BYTES, BYTES, BYTESFROM
    DBA_DATAFILES WHERE 10
  • Create another table with name of tablespace to
    shrink
  • CREATE GLOBAL TEMPORAY TABLE SHRINKING_TBS_GTTON
    COMMIT PRESERVE ROWSASSELECT TABLESPACE_NAME
    FROM DBA_TABLESPACESWHERE TABLESPACE_NAME in
    (TBS1,TBS2,TBS3)COMMIT

68
Steps to Shrink Data Files to High Water Mark
Position (continued)
  • 3) Get DB_BLOCK_SIZE
  • column value new_val blksizeselect value from
    vparameterwhere name 'db_block_size'

69
Steps to Shrink Data Files to High Water Mark
Position (continued)
  • 4) Calculate the files high water mark and save
  • INSERT INTO SPACE_ADMIN_GTTSELECT file_name,
    tablespace_name, ceil( (nvl(hwm,1)blksiz
    e)/1024/1024 ) smallest, ceil(
    blocksblksize/1024/1024) currsize,
    ceil( blocksblksize/1024/1024) - ceil(
    (nvl(hwm,1)blksize)/1024/1024 ) savingsFROM
    DBA_DATA_FILES a, ( SELECT file_id,
    max(block_idblocks-1) hwm FROM
    DBA_EXTENTS GROUP BY file_id ) bWHERE
    a.file_id b.file_id()AND a.tablespace_name
    IN (SELECT tablespace_name FROM
    SHRINKING_TBS_GTT)COMMIT

70
Steps to Shrink Data Files to High Water Mark
Position (continued)
  • 5) Generate ALTER DATABASE commands
  • column cmd format a95 word_wrappedset trimspool
    onSPOOL c\TMP\dbf_resize.sqlSELECT 'alter
    database datafile '''file_name''' resize '
    smallest 'm' cmdFROM
    SPACE_ADMIN_GTTWHERE savings gt 5SPOOL OFF

71
Automatically Resolving Space Issues
  • Oracle 9i Feature called RESUMABLE SPACE
    ALLOCATION
  • Allows an active session to be suspended if a
    space issue is encountered
  • The session resumes automatically when
  • Space issue is fixed
  • A timeout period (default 2 hours) is reached
  • Beneficial for data warehouse environments

72
Steps for Resumable Space Allocation
  • DBA grants RESUMABLE privilege to user
  • User makes session resumable with
  • ALTER SESSION ENABLE RESUMABLE
  • 3. If session encounters space problem, it is
    suspended

73
Steps for Resumable Space Allocation
  • 4. If AFTER SUPSPEND TRIGGER exists, it gets
    executed
  • 5. If trigger does not exit (or disabled) or if
    the trigger does not fix the space problem,
    session remains suspended
  • 6. Session resumes when space problem is fixed or
    timeout value is reached

74
Other Helpful Space-Related Features
  • Oracle-Managed Datafiles (OMF)
  • DBA_ADVISOR family of views
  • Oracle10g Workload Repository (AWR) and segment
    advisor
  • Oracle 10g Grid Control for monitoring

75
Conclusion
  • Oracle is consistent in offering new space
    management related features in every release
  • Should be used by DBAs for best practices
  • They enhance performance, reduce waste, improve
    availability, reduce frequency of failures, and
    provide better monitoring
  • Data warehouse operations that rely heavily on
    space and I/O performance benefit the most from
    these features

76
Contact Information
  • Hamid Minoui
  • Database Specialists, Inc.
  • 388 Market Street, Suite 400
  • San Francisco, CA 94111
  • Tel 415/344-0500
  • Email hminoui_at_dbspecialists.com
  • Web www.dbspecialists.com
Write a Comment
User Comments (0)
About PowerShow.com