A Simple Approach to DB2 for LUW Index Redesign Session 1302A - PowerPoint PPT Presentation

1 / 55
About This Presentation
Title:

A Simple Approach to DB2 for LUW Index Redesign Session 1302A

Description:

Almost all index tuning involves NO changes to SQL text 80% of benefits ... Uses as many columns as possible to delimit index scans. Can use inequalities, e.g. ... – PowerPoint PPT presentation

Number of Views:139
Avg rating:3.0/5.0
Slides: 56
Provided by: benha8
Category:

less

Transcript and Presenter's Notes

Title: A Simple Approach to DB2 for LUW Index Redesign Session 1302A


1
A Simple Approach to DB2 for LUW Index
RedesignSession 1302A
  • Rob Williams, MHC Inc
  • rob_at_mhubel.com

2
Agenda
  • Objectives
  • Benefits of index redesign
  • Index structure
  • B index structure
  • Definition of cardinality
  • Column that should/ should not be indexed
  • The Approach
  • The 7 steps to index redesign
  • Tools and verification
  • DB2 Catalog
  • DB2 Explain
  • Index advisor

3
Tuning Index Design
  • Almost all index tuning involves NO changes to
    SQL text
  • gt80 of benefits of tuning lt5 of SQL
  • Get physical design in order
  • Index changes
  • RUNSTATS
  • Buffer pools

4
Index Changes
  • Primary physical design technique available for
    existing applications
  • Table and SQL changes are more time consuming
  • Application testing is required
  • Index changes often require no functional testing
  • May wish to change clustering index
  • At most require rebind of static SQL
  • J2EE servers sometimes need a reboot

5
Logical I/O
DB2 Table
  • Hopefully, answer sets are small
  • Design indexes and SQL to touch less data when
    building result table
  • Concurrency issues

DB2 Processing
Answer Set
6
A Key Issue
  • One key issue for SQL performance
  • Result set size
  • Whenever you write or review an SQL statement,
    ask yourself
  • How many rows should be returned from this
    statement?
  • Compare the answer set size to the size of the
    tables
  • Optimal indexing should reduce the number of rows
    touched to produce the result
  • Event monitor

7
Indexes
  • Faster access to data
  • Enforce primary keys in DB2 referential integrity
  • Strongly consider an index on each foreign key
  • Ensure uniqueness of values in non-key columns
  • Indexes can be added or dropped at any time (e.g.
    add indexes for special time of year processing)

8
Physical Characteristics of Indexes
  • Indexes are stored in index space
  • Can be stored in the same tablespace as your data
    or in a separate database
  • Because a bufferpool is attached at a tablespace
    level having the index in a separate tablespace
    gives you more control over memory
  • All indexes are B-Trees
  • Logarithmic amortized time for search, insert,
    update, delete
  • B-Trees are not optimized for in memory
    processing, they designed to minimize the number
    of IO requests
  • Look at solidDB
  • A table can have multiple indexes

9
DB2 Index Structure
  • Indexes can be built using multiple columns
  • Indexes can be multi-column
  • Uses as many columns as possible to delimit index
    scans
  • Can use inequalities, e.g. gt, gt, lt, lt
  • Can use index columns to avoid sort
  • Index clustering
  • Restored by REORG to specified index
  • Can specify clustering index
  • Statistic collected by RUNSTATS

10
Index Structure
  • Each time root page splits, a new index level is
    added - requires one additional I/O

11
The Row Identifier (RID)
  • The RID is used within indexes to point to the TS
    location of a row
  • Stores RIDs of duplicate keys in order
  • Normal RID is 4 bytes Large is 6 bytes in LUW
  • Large now the default as of 9.5 LUW
  • 4 Bytes for page number 2 bytes for ID in page
    (4294967296 pages,65536 rows)

12
Columns Which Should be Indexed
  • Columns frequently used in WHERE clause in
    queries (searches)
  • Columns that are not frequently updated
  • Column must have sufficient cardinality to be a
    viable choice for DB2 Optimizer
  • Definition of Cardinality - Number of elements of
    the set
  • Tables Number of rows in table
  • Indexes Number of distinct values in index key
  • Column Number of distinct values in column
  • Unique index cardinality matches table
    cardinality

13
Disadvantages of Indexes
  • Increased disk storage
  • 6/4 Bytes per RIDsizeof(columns indexed)1
    B-Tree Map overhead
  • pseudo-deleted, possibly uncommitted, and RID
    hole
  • Increased overhead for update activity
  • Each insert and delete causes index update
  • Updates to indexed columns
  • Locking problems more likely with indexes if
    concurrent write access
  • Increased memory usage

14
Index Tuning Myths
  • Many redundant indexes
  • Performance problem? Add this index
  • Results in more indexes to update
  • Possible buffer pool contention
  • Columns added to achieve
  • Index keys unique or with higher cardinality
  • Index only access
  • Do not need to go to the tablespace to fetch a
    row
  • Another reason NOT to do select all the time
  • Issue with ORM layers such as hibernate
  • Fewer index entries per page
  • More index I/O

15
Index Change Guidelines
  • Do not change the primary index
  • Unless you have defined a superkey and you are
    really sure
  • Index only access
  • Reorder indexes
  • Low card and no skew
  • Check clustering index placement
  • By default it is the primary key unless specified
  • Fact tables typically have the wrong cluster
  • Remove redundant indexes
  • Ways to handle this will be discussed later
  • Remove unnecessary index columns
  • Dont aim for index only access unless its
    necessary
  • Add index columns if needed
  • Add indexes carefully when necessary

16
1. Primary Index
  • Do not alter primary index columns
  • Column addition or removal will affect uniqueness
    of key, affect the integrity of the application
  • Only two cases when you adjust your primary index
  • Non optimal order of columns
  • Dont need to worry about application problems as
    it is still the same primary key
  • You have defined a superkey that is not the
    candidate key
  • Reduce the columns in your primary key

17
Unique or Duplicates Allowed
  • Primary indexes
  • Unique
  • Unique Indexes
  • Might break application
  • Duplicate indexes
  • Ensure sufficient cardinality for index to exist
  • Avoid measures to artificially increase
    cardinality
  • DB2 indexes do not require artificial measures to
    handle duplicate RIDs

18
2. Index Only Access
  • Index only access is useful for on-line
    transaction processing
  • When single or very few columns are needed
  • Often overused
  • Results in long keys
  • May still need more columns from table
  • Fewer keys on each index page

19
3. Reorder indexes
20
4. Clustering Index
  • Key physical design consideration
  • Sometimes placed on primary key by mistake
  • Meaning of clustering differs between DBMS
  • Place clustering index on the most time-sensitive
    process
  • Valuable low cardinality index
  • Highest use with range predicates or large joins
  • Higher chance of a merge join
  • To avoid sorts
  • To uniformly spread INSERTs
  • Actively choose a clustering index

21
Clustering Index
  • Clustering index stores data in a table in the
    order of the index
  • This index is best for retrieving multiple rows
  • More qualified values found per page
  • Great for joins that touch a large number of rows
  • Clustering defined as CLUSTER
  • Clustered describes physical state
  • Clusterratio
  • Respected by inserts
  • Restored by REORG

22
Clustering Order
23
Candidates For Clustering Index
  • Column(s) with low cardinality
  • May be only way for index to be chosen
  • Large joins
  • Columns frequently used in range predicates
  • lt, gt, BETWEEN, LIKE, IN, or non-unique values
  • Columns frequently processed in sequence
  • GROUP BY, ORDER BY
  • Primary key for cursor repositioning / batch
    restart
  • Clustering does not help when unique column(s)
    are used with equal predicates
  • Return of 1 row

24
Changing Clustering Example
  • Table Name Database Tblspace Cols
    Rows Pages
  • F6.PS_PYMNT_VCHR_XREF FSPF6AP APLARG2 53
    398603 65696
  • Index Name Table Name UCls
    Col Keys
  • F6.PS_PYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF U NN
    3 398603
  • F6.PSPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
    5 398580
  • F6.PSAPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D YY
    3 206
  • F6.PSBPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
    4 84832

25
Changing Clustering Example
  • Index Name Column Seq
    O Lth DatTyp
  • F6.PS_PYMNT_VCHR_XREF BUSINESS_UNIT 1
    A 5 CHAR
  • F6.PS_PYMNT_VCHR_XREF VOUCHER_ID 2
    D 8 CHAR
  • F6.PS_PYMNT_VCHR_XREF PYMNT_CNT 3
    A 4 INT
  • F6.PSPYMNT_VCHR_XREF BUSINESS_UNIT 1
    A 5 CHAR
  • F6.PSPYMNT_VCHR_XREF VOUCHER_ID 2
    D 8 CHAR
  • F6.PSPYMNT_VCHR_XREF PYMNT_ID 3
    A 10 CHAR
  • F6.PSPYMNT_VCHR_XREF BANK_CD 4
    A 5 CHAR
  • F6.PSPYMNT_VCHR_XREF BANK_ACCT_KEY 5
    A 4 CHAR
  • F6.PSAPYMNT_VCHR_XREF PAY_CYCLE 1
    A 6 CHAR
  • F6.PSAPYMNT_VCHR_XREF PAY_CYCLE_SEQ_NUM 2
    D 4 INT
  • F6.PSAPYMNT_VCHR_XREF PYMNT_SELCT_STATUS 3
    A 1 CHAR
  • F6.PSBPYMNT_VCHR_XREF BANK_SETID 1
    A 5 CHAR
  • F6.PSBPYMNT_VCHR_XREF BANK_CD 2
    A 5 CHAR
  • F6.PSBPYMNT_VCHR_XREF BANK_ACCT_KEY 3
    A 4 CHAR
  • F6.PSBPYMNT_VCHR_XREF PYMNT_ID 4
    A 10 CHAR

26
Changing Clustering Index
  • Drop and redefine indexes
  • Tougher if primary index is involved
  • No alter index like mainframe
  • Or Reorg with INDEX command via desired index

27
MDC Indexes
  • Dimension
  • Block index column
  • eg, year, region, itemId
  • Slice
  • Key value in one dimension, eg. year 1997
  • Cell
  • Unique set of dimension values,
  • eg, year 1997,
  • region Canada, AND
  • itemId 1

28
MDC Design
  • Choose dimensions that are not too granular
  • More rows per cell
  • Consider generated columns
  • Can even build only one dimension
  • Maybe alternative to standard clustering index
  • Choice of extent size is important
  • Larger extent size reduces I/O
  • Smaller block indexes, inserts quicker

29
Generated Columns (LUW)
  • Stored value is computed using an expression
  • Rather than through an INSERT or UPDATE
  • Generated columns for frequently used expressions
  • Optimization improved by query rewrite
  • Can index generated columns (non-unique)
  • Maybe improve join strategies
  • Fragment from table definition
  • DISC_TIME TIMESTAMP,
  • DISC_month integer generated always as
    (Month(DISC_TIME)),
  • DISC_YEAR integer generated always as
    (YEAR(DISC_TIME))
  • Adding generated column via ALTER TABLE requires
    constraint checking

30
MDC Design
  • Choose dimensions that are not too granular
  • More rows per cell
  • Consider generated columns
  • Can even build only one dimension
  • Maybe alternative to standard clustering index
  • Choice of extent size is important
  • Larger extent size reduces I/O
  • Smaller block indexes, inserts quicker
  • Slower load time
  • MDC Async Delete in 9.5

31
4. Remove Redundant Indexes
  • Throwing mud at a brick wall design
  • Competition for buffer pool space
  • Two indexes, very similar in design, might both
    be chosen when one would do
  • Let cardinality in first columns of one index be
    used for both
  • Capture all the SQL against your system and run
    it against the design advisor

32
Vendor Table Example
33
Improved Vendor Indexes
34
Indexing Recommendations
35
5. Unnecessary Index Columns
  • Some columns do not help narrow search
  • Redundant if front columns have narrowed search
    already
  • Remove columns to make index key smaller and have
    more entries per page
  • Additional columns may help with joins
  • Leave columns in only if no other columns are
    needed from the table.

36
Number Order of Columns
  • Myth place columns with highest cardinality as
    first column
  • Reality Place columns most often known first in
    an index
  • Of course, must have sufficient cardinality to be
    useful

37
6. Add Columns To Indexes Where Helpful
  • Worst case to date
  • 20 indexes on 1 table, each with a single column
  • Many columns had low cardinality not useful at
    all
  • Add columns to
  • Further narrow a search
  • Avoid a sort
  • Give index only access in special cases
  • Consider multi-column indexes versus multiple
    indexes

38
7. Add Indexes Only If Necessary
  • Sometimes, insufficient indexes have been defined
    and used
  • Seen as many table scans and prefetch
  • I/O is the enemy
  • Have properly designed indexes that stay in the
    buffer pool
  • Avoid extra demand for buffer pool space

39
Verification Tools
  • The DB2 Catalog
  • Explain
  • Visual Explain
  • SQL Capture
  • Methods / tools to capture SQL
  • Design Advisor
  • A DB2 LUW feature, but usable for all platforms

40
The DB2 Catalog
  • DB2 catalog should be queried for relevant index
    definitions and statistics
  • Number of columns and order
  • Statistics
  • Cardinality
  • Number of levels
  • Index size versus table

41
Comparing Table and Index Statistics
  • SELECT I.NAME, NPAGES, CARD, FIRSTKEYCARD AS
    FIRSTK, FULLKEYCARD AS FULLKEY, NLEAF,
  • NLEVELS AS NLEV, CLUSTERING CLUSTERED AS CL,
    UNIQUERULE AS U, T.COLCOUNT AS TBCOL,
  • I.COLCOUNT AS IXCOL, RECLENGTH AS RECLEN FROM
    SYSCAT.TABLES T, SYSCAT.INDEXES I
  • WHERE T.CREATOR I.TBCREATOR AND T.CREATOR
    'F6' AND T.NAME I.TBNAME AND CARD gt20000
  • ORDER BY CARD DESC, 1
  • --------------------------------------------
    -------------------------------------
  • NAME NPAGES CARD FIRSTK
    FULLKEY NLEAF NLEV CL U TBCOL IXCOL RECLEN
  • --------------------------------------------
    -------------------------------------
  • PS_JRNL_LN 132381 6284688 6
    6284688 77261 4 YY U 28 6 199
  • PSAJRNL_LN 132381 6284688 1712
    2357 11740 3 NN D 28 3 199
  • PSBJRNL_LN 132381 6284688 36110
    6282385 58064 4 NY D 28 4 199
  • PSCJRNL_LN 132381 6284688 1
    50442 9061 3 NN D 28 4 199
  • PSDJRNL_LN 132381 6284688 721
    1183 11534 3 NY D 28 2 199
  • PS_DEPRECIATION 61952 3647937 4
    3647937 93694 4 YY U 24 16 137
  • PSADEPRECIATION 61952 3647937 33
    242 5794 3 NN D 24 3 137
  • PSBDEPRECIATION 61952 3647937 4
    156613 7478 3 NY D 24 5 137
  • PS_VCHR_ACCTG_LINE 85189 2679644 7
    2679644 39694 4 YY U 102 9 649
  • PSAVCHR_ACCTG_LINE 85189 2679644 3
    3117 4310 3 NN D 102 5 649
  • PSBVCHR_ACCTG_LINE 85189 2679644 317
    317 4092 3 NN D 102 1 649

42
Explain
  • Compare key columns against columns used
  • UNIX and Windows
  • db2expln and dynexpln show number used and
    defined
  • Plan table is optional but useful

43
SQL Capture
  • UNIX Windows
  • Snapshot monitor
  • Event monitor
  • Administrative views
  • Recent statements cache
  • Software products are available

44
Design Advisor
  • Callable via command line or Control Center

45
Design Advisor Calculate
46
Design Advisor Recommendations
47
Index Advisor Sample
  • db2advis -d pdss -t 3 -s " SELECT
    DCD_AMT_CLAIMED, DCD_AMT_PAID, DCD_AMT_COPAY,
    DCD_AMT_DEDUCT, DCD_AMT_PFA, DCD_AMT_RESERVES,
    DCD_AMT_DISALLOW, DCD_AUDNBR, DCD_AUDSUB,
    DCD_DTL_FLG, DCD_PROC_YMD, DCD_PAID_YMD,
    DCD_PROCEDURE, DCD_PROC, DCD_PROC_MOD, DCD_RECID,
    DCD_SITE_CDE, DCD_SERV_PAR, DCD_RSN_CDE,
    DCD_SYS_TIME
  • FROM QDSPR.VRG_DRDETL
  • WHERE DCD_SITE_ID '?' AND ( DCD_PROC_YMD
    BETWEEN '01/01/1000' AND '01/01/1000') AND
    DCD_PROC IN ('?','?','?')"
  • execution started at timestamp 2001-03-15-09.02.11
    .017097
  • Calculating initial cost (without recommmended
    indexes) 1017572.750000 timerons
  • Initial set of proposed indexes is ready.
  • Found maximum set of 1 recommended indexes
  • Cost of workload with all indexes included
    6581.913086 timerons
  • total disk space needed for initial set 2302 MB
  • total disk space constrained to -1 MB

48
Index Advisor Sample
  • 1 indexes in current solution
  • 1017572.7500 timerons (without indexes)
  • 6581.9136 timerons (with current solution)
  • 99.35 improvement
  • Trying variations of the solution set.
  • --
  • -- execution finished at timestamp
    2001-03-15-09.02.21.409868
  • --
  • -- LIST OF RECOMMENDED INDEXES
  • --
  • -- index1, 2302MB
  • CREATE INDEX WIZ2 ON "QDSPR
    "."TDS00E00_DRCLDETLA" ("DCD_SITE_NBR" ASC,
    "DCD_SITE_ID" ASC, "DCD_PROC_YMD" ASC, "DCD_PROC"
    ASC, "DCD_RECID" ASC)
  • --
  • Index Advisor tool is finished.

49
Design Advisor
  • Strengths
  • Good to review indexes on tables - easier in v8.2
  • Considerations
  • Where to get workload statistics
  • Event monitor data / statement level trace /
    recent statements history / product
  • Still need to choose the clustering index
  • Single statement use
  • Could lead to over-indexing
  • Likes to recommend index-only

50
The Real World Isnt So Simple
  • Total Total Total Total
    Sort Rows Rows Count
  • Application NameCPU Sort Sorts
    ElapsedOverfloRead Written
  • Used Time Time
  • cfCannedSearch 24.580 6416
    31591.11 3 460K 374K 2
  • Package Name
  • QUERY011 24.580 6416
    31591.11 3 460K 374K 1
  • Section Number
  • 3 24.580 6416
    31591.11 3 460K 374K 1
  •  
  • SELECT DISTINCT A.PROJECTNO
  • FROM AGENCY_VIEW V, FILINGGENERALINFO A,
    FILINGSUBMISSIONS B
  • WHERE V.USERID H00001 AND V.PROJECTNO
    A.PROJECTNO AND
  • A.PROJECTNO B.PROJECTNO AND (
    (B.SUBMITDATE H00002 AND
  • B.SUBMITTIME gt H00003 ) OR (B.SUBMITDATE
    gt H00002 ) )
  • AND ( (B.SUBMITDATE H00004 AND
    B.SUBMITTIME lt H00005 )
  • OR (B.SUBMITDATE lt H00004 ) ) AND
    A.CATFILING IN ( H00006, H00007 , H00008
  • , H00009 , H00010 , H00011 ,
    H00012 , H00013 , H00014 , H00015
  • , H00016 , H00017 , H00018 ,
    H00019 , H00020 , H00021 , H00022
  • , H00023 , H00024 , H00025 ) AND (
    (A.LASTUPDATE H00026

51
Cardinality Exercise
  • Objective Using simple calculations, determine
    whether if indexes will improve performance
  • Table 500,000 rows on 24,000 pages
  • What is the minimum usable cardinality for a
    non-unique, non-clustered index?
  • Consider
  • Number of prefetch I/Os assuming prefetch
    quantity of 32
  • Versus
  • Number of index and data page I/Os

52
Cardinality Calculations
  • Prefetch I/Os to read whole table
  • 24,000 / 32 750 prefetch I/Os
  • Non-clustered
  • Index keys/RIDs located
  • Reverse solving for data I/O of 750 pages read
    500,000 / 750 667 minimum fullkeycard
  • Reality in this case, be suspicious of any index
    with lt 2,000 fullkeycard
  • Verify results with Explain
  • 100 Clustered index
  • Rows per page 500,000 / 24,000 20 RPP
  • 20 times fewer I/O on full key

53
Summary
  • Index redesign is done to correct initial
    physical design errors
  • Too many indexes and/or columns
  • Too few indexes and/or columns
  • Usually same problem exists throughout design
  • Careful analysis is required
  • Benefits justify the effort

54
Thanks for coming
Session 1302A A Simple approach to DB2 for LUW
Index Redesign
  • Rob WilliamsMartin Hubel Consulting Inc.
  • www.mhubel.com martin_at_mhubel.comWith Thanks
    to Embarcadero Technologies

55
IBM and IDUG announce Worldwide XML Contest
The largest Investment in XML community
worldwide 200,000 Participants Worldwide from
more than 30 countries
Starts September 2008
Winners announced December 2008
Many ways to participate Create a video Write
Xquery and SQL queries Submit a database
utility Port an application Write a pureXML
application
Open to students and professionals
Dont forget to pick up contest bookmarks at the
DB2 ped
Write a Comment
User Comments (0)
About PowerShow.com