Title: A Simple Approach to DB2 for LUW Index Redesign Session 1302A
1A Simple Approach to DB2 for LUW Index
RedesignSession 1302A
- Rob Williams, MHC Inc
- rob_at_mhubel.com
2Agenda
- Objectives
- Benefits of index redesign
- Index structure
- B index structure
- Definition of cardinality
- Column that should/ should not be indexed
- The Approach
- The 7 steps to index redesign
- Tools and verification
- DB2 Catalog
- DB2 Explain
- Index advisor
3Tuning Index Design
- Almost all index tuning involves NO changes to
SQL text - gt80 of benefits of tuning lt5 of SQL
- Get physical design in order
- Index changes
- RUNSTATS
- Buffer pools
4Index Changes
- Primary physical design technique available for
existing applications - Table and SQL changes are more time consuming
- Application testing is required
- Index changes often require no functional testing
- May wish to change clustering index
- At most require rebind of static SQL
- J2EE servers sometimes need a reboot
5Logical I/O
DB2 Table
- Hopefully, answer sets are small
- Design indexes and SQL to touch less data when
building result table - Concurrency issues
DB2 Processing
Answer Set
6A Key Issue
- One key issue for SQL performance
- Result set size
- Whenever you write or review an SQL statement,
ask yourself - How many rows should be returned from this
statement? - Compare the answer set size to the size of the
tables - Optimal indexing should reduce the number of rows
touched to produce the result - Event monitor
7Indexes
- Faster access to data
- Enforce primary keys in DB2 referential integrity
- Strongly consider an index on each foreign key
- Ensure uniqueness of values in non-key columns
- Indexes can be added or dropped at any time (e.g.
add indexes for special time of year processing)
8Physical Characteristics of Indexes
- Indexes are stored in index space
- Can be stored in the same tablespace as your data
or in a separate database - Because a bufferpool is attached at a tablespace
level having the index in a separate tablespace
gives you more control over memory - All indexes are B-Trees
- Logarithmic amortized time for search, insert,
update, delete - B-Trees are not optimized for in memory
processing, they designed to minimize the number
of IO requests - Look at solidDB
- A table can have multiple indexes
9DB2 Index Structure
- Indexes can be built using multiple columns
- Indexes can be multi-column
- Uses as many columns as possible to delimit index
scans - Can use inequalities, e.g. gt, gt, lt, lt
- Can use index columns to avoid sort
- Index clustering
- Restored by REORG to specified index
- Can specify clustering index
- Statistic collected by RUNSTATS
10Index Structure
- Each time root page splits, a new index level is
added - requires one additional I/O
11The Row Identifier (RID)
- The RID is used within indexes to point to the TS
location of a row - Stores RIDs of duplicate keys in order
- Normal RID is 4 bytes Large is 6 bytes in LUW
- Large now the default as of 9.5 LUW
- 4 Bytes for page number 2 bytes for ID in page
(4294967296 pages,65536 rows)
12Columns Which Should be Indexed
- Columns frequently used in WHERE clause in
queries (searches) - Columns that are not frequently updated
- Column must have sufficient cardinality to be a
viable choice for DB2 Optimizer - Definition of Cardinality - Number of elements of
the set - Tables Number of rows in table
- Indexes Number of distinct values in index key
- Column Number of distinct values in column
- Unique index cardinality matches table
cardinality
13Disadvantages of Indexes
- Increased disk storage
- 6/4 Bytes per RIDsizeof(columns indexed)1
B-Tree Map overhead - pseudo-deleted, possibly uncommitted, and RID
hole - Increased overhead for update activity
- Each insert and delete causes index update
- Updates to indexed columns
- Locking problems more likely with indexes if
concurrent write access - Increased memory usage
14Index Tuning Myths
- Many redundant indexes
- Performance problem? Add this index
- Results in more indexes to update
- Possible buffer pool contention
- Columns added to achieve
- Index keys unique or with higher cardinality
- Index only access
- Do not need to go to the tablespace to fetch a
row - Another reason NOT to do select all the time
- Issue with ORM layers such as hibernate
- Fewer index entries per page
- More index I/O
15Index Change Guidelines
- Do not change the primary index
- Unless you have defined a superkey and you are
really sure - Index only access
- Reorder indexes
- Low card and no skew
- Check clustering index placement
- By default it is the primary key unless specified
- Fact tables typically have the wrong cluster
- Remove redundant indexes
- Ways to handle this will be discussed later
- Remove unnecessary index columns
- Dont aim for index only access unless its
necessary - Add index columns if needed
- Add indexes carefully when necessary
161. Primary Index
- Do not alter primary index columns
- Column addition or removal will affect uniqueness
of key, affect the integrity of the application - Only two cases when you adjust your primary index
- Non optimal order of columns
- Dont need to worry about application problems as
it is still the same primary key - You have defined a superkey that is not the
candidate key - Reduce the columns in your primary key
17Unique or Duplicates Allowed
- Primary indexes
- Unique
- Unique Indexes
- Might break application
- Duplicate indexes
- Ensure sufficient cardinality for index to exist
- Avoid measures to artificially increase
cardinality - DB2 indexes do not require artificial measures to
handle duplicate RIDs
182. Index Only Access
- Index only access is useful for on-line
transaction processing - When single or very few columns are needed
- Often overused
- Results in long keys
- May still need more columns from table
- Fewer keys on each index page
193. Reorder indexes
204. Clustering Index
- Key physical design consideration
- Sometimes placed on primary key by mistake
- Meaning of clustering differs between DBMS
- Place clustering index on the most time-sensitive
process - Valuable low cardinality index
- Highest use with range predicates or large joins
- Higher chance of a merge join
- To avoid sorts
- To uniformly spread INSERTs
- Actively choose a clustering index
21Clustering Index
- Clustering index stores data in a table in the
order of the index - This index is best for retrieving multiple rows
- More qualified values found per page
- Great for joins that touch a large number of rows
- Clustering defined as CLUSTER
- Clustered describes physical state
- Clusterratio
- Respected by inserts
- Restored by REORG
22Clustering Order
23Candidates For Clustering Index
- Column(s) with low cardinality
- May be only way for index to be chosen
- Large joins
- Columns frequently used in range predicates
- lt, gt, BETWEEN, LIKE, IN, or non-unique values
- Columns frequently processed in sequence
- GROUP BY, ORDER BY
- Primary key for cursor repositioning / batch
restart - Clustering does not help when unique column(s)
are used with equal predicates - Return of 1 row
24Changing Clustering Example
- Table Name Database Tblspace Cols
Rows Pages - F6.PS_PYMNT_VCHR_XREF FSPF6AP APLARG2 53
398603 65696 - Index Name Table Name UCls
Col Keys - F6.PS_PYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF U NN
3 398603 - F6.PSPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
5 398580 - F6.PSAPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D YY
3 206 - F6.PSBPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
4 84832
25Changing Clustering Example
- Index Name Column Seq
O Lth DatTyp - F6.PS_PYMNT_VCHR_XREF BUSINESS_UNIT 1
A 5 CHAR - F6.PS_PYMNT_VCHR_XREF VOUCHER_ID 2
D 8 CHAR - F6.PS_PYMNT_VCHR_XREF PYMNT_CNT 3
A 4 INT - F6.PSPYMNT_VCHR_XREF BUSINESS_UNIT 1
A 5 CHAR - F6.PSPYMNT_VCHR_XREF VOUCHER_ID 2
D 8 CHAR - F6.PSPYMNT_VCHR_XREF PYMNT_ID 3
A 10 CHAR - F6.PSPYMNT_VCHR_XREF BANK_CD 4
A 5 CHAR - F6.PSPYMNT_VCHR_XREF BANK_ACCT_KEY 5
A 4 CHAR - F6.PSAPYMNT_VCHR_XREF PAY_CYCLE 1
A 6 CHAR - F6.PSAPYMNT_VCHR_XREF PAY_CYCLE_SEQ_NUM 2
D 4 INT - F6.PSAPYMNT_VCHR_XREF PYMNT_SELCT_STATUS 3
A 1 CHAR - F6.PSBPYMNT_VCHR_XREF BANK_SETID 1
A 5 CHAR - F6.PSBPYMNT_VCHR_XREF BANK_CD 2
A 5 CHAR - F6.PSBPYMNT_VCHR_XREF BANK_ACCT_KEY 3
A 4 CHAR - F6.PSBPYMNT_VCHR_XREF PYMNT_ID 4
A 10 CHAR
26Changing Clustering Index
- Drop and redefine indexes
- Tougher if primary index is involved
- No alter index like mainframe
- Or Reorg with INDEX command via desired index
27MDC Indexes
- Dimension
- Block index column
- eg, year, region, itemId
- Slice
- Key value in one dimension, eg. year 1997
- Cell
- Unique set of dimension values,
- eg, year 1997,
- region Canada, AND
- itemId 1
28MDC Design
- Choose dimensions that are not too granular
- More rows per cell
- Consider generated columns
- Can even build only one dimension
- Maybe alternative to standard clustering index
- Choice of extent size is important
- Larger extent size reduces I/O
- Smaller block indexes, inserts quicker
29Generated Columns (LUW)
- Stored value is computed using an expression
- Rather than through an INSERT or UPDATE
- Generated columns for frequently used expressions
- Optimization improved by query rewrite
- Can index generated columns (non-unique)
- Maybe improve join strategies
- Fragment from table definition
- DISC_TIME TIMESTAMP,
- DISC_month integer generated always as
(Month(DISC_TIME)), - DISC_YEAR integer generated always as
(YEAR(DISC_TIME)) - Adding generated column via ALTER TABLE requires
constraint checking
30MDC Design
- Choose dimensions that are not too granular
- More rows per cell
- Consider generated columns
- Can even build only one dimension
- Maybe alternative to standard clustering index
- Choice of extent size is important
- Larger extent size reduces I/O
- Smaller block indexes, inserts quicker
- Slower load time
- MDC Async Delete in 9.5
314. Remove Redundant Indexes
- Throwing mud at a brick wall design
- Competition for buffer pool space
- Two indexes, very similar in design, might both
be chosen when one would do - Let cardinality in first columns of one index be
used for both - Capture all the SQL against your system and run
it against the design advisor
32Vendor Table Example
33Improved Vendor Indexes
34Indexing Recommendations
355. Unnecessary Index Columns
- Some columns do not help narrow search
- Redundant if front columns have narrowed search
already - Remove columns to make index key smaller and have
more entries per page - Additional columns may help with joins
- Leave columns in only if no other columns are
needed from the table.
36Number Order of Columns
- Myth place columns with highest cardinality as
first column - Reality Place columns most often known first in
an index - Of course, must have sufficient cardinality to be
useful
376. Add Columns To Indexes Where Helpful
- Worst case to date
- 20 indexes on 1 table, each with a single column
- Many columns had low cardinality not useful at
all - Add columns to
- Further narrow a search
- Avoid a sort
- Give index only access in special cases
- Consider multi-column indexes versus multiple
indexes
387. Add Indexes Only If Necessary
- Sometimes, insufficient indexes have been defined
and used - Seen as many table scans and prefetch
- I/O is the enemy
- Have properly designed indexes that stay in the
buffer pool - Avoid extra demand for buffer pool space
39Verification Tools
- The DB2 Catalog
- Explain
- Visual Explain
- SQL Capture
- Methods / tools to capture SQL
- Design Advisor
- A DB2 LUW feature, but usable for all platforms
40The DB2 Catalog
- DB2 catalog should be queried for relevant index
definitions and statistics - Number of columns and order
- Statistics
- Cardinality
- Number of levels
- Index size versus table
41Comparing Table and Index Statistics
- SELECT I.NAME, NPAGES, CARD, FIRSTKEYCARD AS
FIRSTK, FULLKEYCARD AS FULLKEY, NLEAF, - NLEVELS AS NLEV, CLUSTERING CLUSTERED AS CL,
UNIQUERULE AS U, T.COLCOUNT AS TBCOL, - I.COLCOUNT AS IXCOL, RECLENGTH AS RECLEN FROM
SYSCAT.TABLES T, SYSCAT.INDEXES I - WHERE T.CREATOR I.TBCREATOR AND T.CREATOR
'F6' AND T.NAME I.TBNAME AND CARD gt20000 - ORDER BY CARD DESC, 1
- --------------------------------------------
------------------------------------- - NAME NPAGES CARD FIRSTK
FULLKEY NLEAF NLEV CL U TBCOL IXCOL RECLEN - --------------------------------------------
------------------------------------- - PS_JRNL_LN 132381 6284688 6
6284688 77261 4 YY U 28 6 199 - PSAJRNL_LN 132381 6284688 1712
2357 11740 3 NN D 28 3 199 - PSBJRNL_LN 132381 6284688 36110
6282385 58064 4 NY D 28 4 199 - PSCJRNL_LN 132381 6284688 1
50442 9061 3 NN D 28 4 199 - PSDJRNL_LN 132381 6284688 721
1183 11534 3 NY D 28 2 199 - PS_DEPRECIATION 61952 3647937 4
3647937 93694 4 YY U 24 16 137 - PSADEPRECIATION 61952 3647937 33
242 5794 3 NN D 24 3 137 - PSBDEPRECIATION 61952 3647937 4
156613 7478 3 NY D 24 5 137 - PS_VCHR_ACCTG_LINE 85189 2679644 7
2679644 39694 4 YY U 102 9 649 - PSAVCHR_ACCTG_LINE 85189 2679644 3
3117 4310 3 NN D 102 5 649 - PSBVCHR_ACCTG_LINE 85189 2679644 317
317 4092 3 NN D 102 1 649
42Explain
- Compare key columns against columns used
- UNIX and Windows
- db2expln and dynexpln show number used and
defined - Plan table is optional but useful
43SQL Capture
- UNIX Windows
- Snapshot monitor
- Event monitor
- Administrative views
- Recent statements cache
- Software products are available
44Design Advisor
- Callable via command line or Control Center
45Design Advisor Calculate
46Design Advisor Recommendations
47Index Advisor Sample
- db2advis -d pdss -t 3 -s " SELECT
DCD_AMT_CLAIMED, DCD_AMT_PAID, DCD_AMT_COPAY,
DCD_AMT_DEDUCT, DCD_AMT_PFA, DCD_AMT_RESERVES,
DCD_AMT_DISALLOW, DCD_AUDNBR, DCD_AUDSUB,
DCD_DTL_FLG, DCD_PROC_YMD, DCD_PAID_YMD,
DCD_PROCEDURE, DCD_PROC, DCD_PROC_MOD, DCD_RECID,
DCD_SITE_CDE, DCD_SERV_PAR, DCD_RSN_CDE,
DCD_SYS_TIME - FROM QDSPR.VRG_DRDETL
- WHERE DCD_SITE_ID '?' AND ( DCD_PROC_YMD
BETWEEN '01/01/1000' AND '01/01/1000') AND
DCD_PROC IN ('?','?','?')" - execution started at timestamp 2001-03-15-09.02.11
.017097 - Calculating initial cost (without recommmended
indexes) 1017572.750000 timerons - Initial set of proposed indexes is ready.
- Found maximum set of 1 recommended indexes
- Cost of workload with all indexes included
6581.913086 timerons - total disk space needed for initial set 2302 MB
- total disk space constrained to -1 MB
48Index Advisor Sample
- 1 indexes in current solution
- 1017572.7500 timerons (without indexes)
- 6581.9136 timerons (with current solution)
- 99.35 improvement
- Trying variations of the solution set.
- --
- -- execution finished at timestamp
2001-03-15-09.02.21.409868 - --
- -- LIST OF RECOMMENDED INDEXES
- --
- -- index1, 2302MB
- CREATE INDEX WIZ2 ON "QDSPR
"."TDS00E00_DRCLDETLA" ("DCD_SITE_NBR" ASC,
"DCD_SITE_ID" ASC, "DCD_PROC_YMD" ASC, "DCD_PROC"
ASC, "DCD_RECID" ASC) - --
- Index Advisor tool is finished.
49Design Advisor
- Strengths
- Good to review indexes on tables - easier in v8.2
- Considerations
- Where to get workload statistics
- Event monitor data / statement level trace /
recent statements history / product - Still need to choose the clustering index
- Single statement use
- Could lead to over-indexing
- Likes to recommend index-only
50The Real World Isnt So Simple
- Total Total Total Total
Sort Rows Rows Count - Application NameCPU Sort Sorts
ElapsedOverfloRead Written - Used Time Time
- cfCannedSearch 24.580 6416
31591.11 3 460K 374K 2 - Package Name
- QUERY011 24.580 6416
31591.11 3 460K 374K 1 - Section Number
- 3 24.580 6416
31591.11 3 460K 374K 1 -
- SELECT DISTINCT A.PROJECTNO
- FROM AGENCY_VIEW V, FILINGGENERALINFO A,
FILINGSUBMISSIONS B - WHERE V.USERID H00001 AND V.PROJECTNO
A.PROJECTNO AND - A.PROJECTNO B.PROJECTNO AND (
(B.SUBMITDATE H00002 AND - B.SUBMITTIME gt H00003 ) OR (B.SUBMITDATE
gt H00002 ) ) - AND ( (B.SUBMITDATE H00004 AND
B.SUBMITTIME lt H00005 ) - OR (B.SUBMITDATE lt H00004 ) ) AND
A.CATFILING IN ( H00006, H00007 , H00008 - , H00009 , H00010 , H00011 ,
H00012 , H00013 , H00014 , H00015 - , H00016 , H00017 , H00018 ,
H00019 , H00020 , H00021 , H00022 - , H00023 , H00024 , H00025 ) AND (
(A.LASTUPDATE H00026
51Cardinality Exercise
- Objective Using simple calculations, determine
whether if indexes will improve performance - Table 500,000 rows on 24,000 pages
- What is the minimum usable cardinality for a
non-unique, non-clustered index? - Consider
- Number of prefetch I/Os assuming prefetch
quantity of 32 - Versus
- Number of index and data page I/Os
52Cardinality Calculations
- Prefetch I/Os to read whole table
- 24,000 / 32 750 prefetch I/Os
- Non-clustered
- Index keys/RIDs located
- Reverse solving for data I/O of 750 pages read
500,000 / 750 667 minimum fullkeycard - Reality in this case, be suspicious of any index
with lt 2,000 fullkeycard - Verify results with Explain
- 100 Clustered index
- Rows per page 500,000 / 24,000 20 RPP
- 20 times fewer I/O on full key
53Summary
- Index redesign is done to correct initial
physical design errors - Too many indexes and/or columns
- Too few indexes and/or columns
- Usually same problem exists throughout design
- Careful analysis is required
- Benefits justify the effort
54Thanks for coming
Session 1302A A Simple approach to DB2 for LUW
Index Redesign
- Rob WilliamsMartin Hubel Consulting Inc.
- www.mhubel.com martin_at_mhubel.comWith Thanks
to Embarcadero Technologies
55IBM and IDUG announce Worldwide XML Contest
The largest Investment in XML community
worldwide 200,000 Participants Worldwide from
more than 30 countries
Starts September 2008
Winners announced December 2008
Many ways to participate Create a video Write
Xquery and SQL queries Submit a database
utility Port an application Write a pureXML
application
Open to students and professionals
Dont forget to pick up contest bookmarks at the
DB2 ped