A Simple Approach to DB2 for LUW Index Redesign Session 1302A - PowerPoint PPT Presentation

1 / 55

About This Presentation

Title:

A Simple Approach to DB2 for LUW Index Redesign Session 1302A

Description:

Almost all index tuning involves NO changes to SQL text 80% of benefits ... Uses as many columns as possible to delimit index scans. Can use inequalities, e.g. ... – PowerPoint PPT presentation

Number of Views:139

Avg rating:3.0/5.0

Slides: 56

Provided by: benha8

Category:

more less

Transcript and Presenter's Notes

Title: A Simple Approach to DB2 for LUW Index Redesign Session 1302A

1
A Simple Approach to DB2 for LUW Index
RedesignSession 1302A

Rob Williams, MHC Inc
rob_at_mhubel.com

2
Agenda

Objectives
Benefits of index redesign
Index structure
B index structure
Definition of cardinality
Column that should/ should not be indexed
The Approach
The 7 steps to index redesign
Tools and verification
DB2 Catalog
DB2 Explain
Index advisor

3
Tuning Index Design

Almost all index tuning involves NO changes to
SQL text
gt80 of benefits of tuning lt5 of SQL
Get physical design in order
Index changes
RUNSTATS
Buffer pools

4
Index Changes

Primary physical design technique available for
existing applications
Table and SQL changes are more time consuming
Application testing is required
Index changes often require no functional testing
May wish to change clustering index
At most require rebind of static SQL
J2EE servers sometimes need a reboot

5
Logical I/O
DB2 Table

Hopefully, answer sets are small
Design indexes and SQL to touch less data when
building result table
Concurrency issues

DB2 Processing
Answer Set
6
A Key Issue

One key issue for SQL performance
Result set size
Whenever you write or review an SQL statement,
ask yourself
How many rows should be returned from this
statement?
Compare the answer set size to the size of the
tables
Optimal indexing should reduce the number of rows
touched to produce the result
Event monitor

7
Indexes

Faster access to data
Enforce primary keys in DB2 referential integrity
Strongly consider an index on each foreign key
Ensure uniqueness of values in non-key columns
Indexes can be added or dropped at any time (e.g.
add indexes for special time of year processing)

8
Physical Characteristics of Indexes

Indexes are stored in index space
Can be stored in the same tablespace as your data
or in a separate database
Because a bufferpool is attached at a tablespace
level having the index in a separate tablespace
gives you more control over memory
All indexes are B-Trees
Logarithmic amortized time for search, insert,
update, delete
B-Trees are not optimized for in memory
processing, they designed to minimize the number
of IO requests
Look at solidDB
A table can have multiple indexes

9
DB2 Index Structure

Indexes can be built using multiple columns
Indexes can be multi-column
Uses as many columns as possible to delimit index
scans
Can use inequalities, e.g. gt, gt, lt, lt
Can use index columns to avoid sort
Index clustering
Restored by REORG to specified index
Can specify clustering index
Statistic collected by RUNSTATS

10
Index Structure

Each time root page splits, a new index level is
added - requires one additional I/O

11
The Row Identifier (RID)

The RID is used within indexes to point to the TS
location of a row
Stores RIDs of duplicate keys in order
Normal RID is 4 bytes Large is 6 bytes in LUW
Large now the default as of 9.5 LUW
4 Bytes for page number 2 bytes for ID in page
(4294967296 pages,65536 rows)

12
Columns Which Should be Indexed

Columns frequently used in WHERE clause in
queries (searches)
Columns that are not frequently updated
Column must have sufficient cardinality to be a
viable choice for DB2 Optimizer
Definition of Cardinality - Number of elements of
the set
Tables Number of rows in table
Indexes Number of distinct values in index key
Column Number of distinct values in column
Unique index cardinality matches table
cardinality

13
Disadvantages of Indexes

Increased disk storage
6/4 Bytes per RIDsizeof(columns indexed)1
B-Tree Map overhead
pseudo-deleted, possibly uncommitted, and RID
hole
Increased overhead for update activity
Each insert and delete causes index update
Updates to indexed columns
Locking problems more likely with indexes if
concurrent write access
Increased memory usage

14
Index Tuning Myths

Many redundant indexes
Performance problem? Add this index
Results in more indexes to update
Possible buffer pool contention
Columns added to achieve
Index keys unique or with higher cardinality
Index only access
Do not need to go to the tablespace to fetch a
row
Another reason NOT to do select all the time
Issue with ORM layers such as hibernate
Fewer index entries per page
More index I/O

15
Index Change Guidelines

Do not change the primary index
Unless you have defined a superkey and you are
really sure
Index only access
Reorder indexes
Low card and no skew
Check clustering index placement
By default it is the primary key unless specified
Fact tables typically have the wrong cluster
Remove redundant indexes
Ways to handle this will be discussed later
Remove unnecessary index columns
Dont aim for index only access unless its
necessary
Add index columns if needed
Add indexes carefully when necessary

16
1. Primary Index

Do not alter primary index columns
Column addition or removal will affect uniqueness
of key, affect the integrity of the application
Only two cases when you adjust your primary index
Non optimal order of columns
Dont need to worry about application problems as
it is still the same primary key
You have defined a superkey that is not the
candidate key
Reduce the columns in your primary key

17
Unique or Duplicates Allowed

Primary indexes
Unique
Unique Indexes
Might break application
Duplicate indexes
Ensure sufficient cardinality for index to exist
Avoid measures to artificially increase
cardinality
DB2 indexes do not require artificial measures to
handle duplicate RIDs

18
2. Index Only Access

Index only access is useful for on-line
transaction processing
When single or very few columns are needed
Often overused
Results in long keys
May still need more columns from table
Fewer keys on each index page

19
3. Reorder indexes
20
4. Clustering Index

Key physical design consideration
Sometimes placed on primary key by mistake
Meaning of clustering differs between DBMS
Place clustering index on the most time-sensitive
process
Valuable low cardinality index
Highest use with range predicates or large joins
Higher chance of a merge join
To avoid sorts
To uniformly spread INSERTs
Actively choose a clustering index

21
Clustering Index

Clustering index stores data in a table in the
order of the index
This index is best for retrieving multiple rows
More qualified values found per page
Great for joins that touch a large number of rows
Clustering defined as CLUSTER
Clustered describes physical state
Clusterratio
Respected by inserts
Restored by REORG

22
Clustering Order
23
Candidates For Clustering Index

Column(s) with low cardinality
May be only way for index to be chosen
Large joins
Columns frequently used in range predicates
lt, gt, BETWEEN, LIKE, IN, or non-unique values
Columns frequently processed in sequence
GROUP BY, ORDER BY
Primary key for cursor repositioning / batch
restart
Clustering does not help when unique column(s)
are used with equal predicates
Return of 1 row

24
Changing Clustering Example

Table Name Database Tblspace Cols
Rows Pages
F6.PS_PYMNT_VCHR_XREF FSPF6AP APLARG2 53
398603 65696
Index Name Table Name UCls
Col Keys
F6.PS_PYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF U NN
3 398603
F6.PSPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
5 398580
F6.PSAPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D YY
3 206
F6.PSBPYMNT_VCHR_XREF F6.PS_PYMNT_VCHR_XREF D NN
4 84832

25
Changing Clustering Example

Index Name Column Seq
O Lth DatTyp
F6.PS_PYMNT_VCHR_XREF BUSINESS_UNIT 1
A 5 CHAR
F6.PS_PYMNT_VCHR_XREF VOUCHER_ID 2
D 8 CHAR
F6.PS_PYMNT_VCHR_XREF PYMNT_CNT 3
A 4 INT
F6.PSPYMNT_VCHR_XREF BUSINESS_UNIT 1
A 5 CHAR
F6.PSPYMNT_VCHR_XREF VOUCHER_ID 2
D 8 CHAR
F6.PSPYMNT_VCHR_XREF PYMNT_ID 3
A 10 CHAR
F6.PSPYMNT_VCHR_XREF BANK_CD 4
A 5 CHAR
F6.PSPYMNT_VCHR_XREF BANK_ACCT_KEY 5
A 4 CHAR
F6.PSAPYMNT_VCHR_XREF PAY_CYCLE 1
A 6 CHAR
F6.PSAPYMNT_VCHR_XREF PAY_CYCLE_SEQ_NUM 2
D 4 INT
F6.PSAPYMNT_VCHR_XREF PYMNT_SELCT_STATUS 3
A 1 CHAR
F6.PSBPYMNT_VCHR_XREF BANK_SETID 1
A 5 CHAR
F6.PSBPYMNT_VCHR_XREF BANK_CD 2
A 5 CHAR
F6.PSBPYMNT_VCHR_XREF BANK_ACCT_KEY 3
A 4 CHAR
F6.PSBPYMNT_VCHR_XREF PYMNT_ID 4
A 10 CHAR

26
Changing Clustering Index

Drop and redefine indexes
Tougher if primary index is involved
No alter index like mainframe
Or Reorg with INDEX command via desired index

27
MDC Indexes

Dimension
Block index column
eg, year, region, itemId
Slice
Key value in one dimension, eg. year 1997
Cell
Unique set of dimension values,
eg, year 1997,
region Canada, AND
itemId 1

28
MDC Design

Choose dimensions that are not too granular
More rows per cell
Consider generated columns
Can even build only one dimension
Maybe alternative to standard clustering index
Choice of extent size is important
Larger extent size reduces I/O
Smaller block indexes, inserts quicker

29
Generated Columns (LUW)

Stored value is computed using an expression
Rather than through an INSERT or UPDATE
Generated columns for frequently used expressions
Optimization improved by query rewrite
Can index generated columns (non-unique)
Maybe improve join strategies
Fragment from table definition
DISC_TIME TIMESTAMP,
DISC_month integer generated always as
(Month(DISC_TIME)),
DISC_YEAR integer generated always as
(YEAR(DISC_TIME))
Adding generated column via ALTER TABLE requires
constraint checking

30
MDC Design

Choose dimensions that are not too granular
More rows per cell
Consider generated columns
Can even build only one dimension
Maybe alternative to standard clustering index
Choice of extent size is important
Larger extent size reduces I/O
Smaller block indexes, inserts quicker
Slower load time
MDC Async Delete in 9.5

31
4. Remove Redundant Indexes

Throwing mud at a brick wall design
Competition for buffer pool space
Two indexes, very similar in design, might both
be chosen when one would do
Let cardinality in first columns of one index be
used for both
Capture all the SQL against your system and run
it against the design advisor

32
Vendor Table Example
33
Improved Vendor Indexes
34
Indexing Recommendations
35
5. Unnecessary Index Columns

Some columns do not help narrow search
Redundant if front columns have narrowed search
already
Remove columns to make index key smaller and have
more entries per page
Additional columns may help with joins
Leave columns in only if no other columns are
needed from the table.

36
Number Order of Columns

Myth place columns with highest cardinality as
first column
Reality Place columns most often known first in
an index
Of course, must have sufficient cardinality to be
useful

37
6. Add Columns To Indexes Where Helpful

Worst case to date
20 indexes on 1 table, each with a single column
Many columns had low cardinality not useful at
all
Add columns to
Further narrow a search
Avoid a sort
Give index only access in special cases
Consider multi-column indexes versus multiple
indexes

38
7. Add Indexes Only If Necessary

Sometimes, insufficient indexes have been defined
and used
Seen as many table scans and prefetch
I/O is the enemy
Have properly designed indexes that stay in the
buffer pool
Avoid extra demand for buffer pool space

39
Verification Tools

The DB2 Catalog
Explain
Visual Explain
SQL Capture
Methods / tools to capture SQL
Design Advisor
A DB2 LUW feature, but usable for all platforms

40
The DB2 Catalog

DB2 catalog should be queried for relevant index
definitions and statistics
Number of columns and order
Statistics
Cardinality
Number of levels
Index size versus table

41
Comparing Table and Index Statistics

SELECT I.NAME, NPAGES, CARD, FIRSTKEYCARD AS
FIRSTK, FULLKEYCARD AS FULLKEY, NLEAF,
NLEVELS AS NLEV, CLUSTERING CLUSTERED AS CL,
UNIQUERULE AS U, T.COLCOUNT AS TBCOL,
I.COLCOUNT AS IXCOL, RECLENGTH AS RECLEN FROM
SYSCAT.TABLES T, SYSCAT.INDEXES I
WHERE T.CREATOR I.TBCREATOR AND T.CREATOR
'F6' AND T.NAME I.TBNAME AND CARD gt20000
ORDER BY CARD DESC, 1
--------------------------------------------
-------------------------------------
NAME NPAGES CARD FIRSTK
FULLKEY NLEAF NLEV CL U TBCOL IXCOL RECLEN
--------------------------------------------
-------------------------------------
PS_JRNL_LN 132381 6284688 6
6284688 77261 4 YY U 28 6 199
PSAJRNL_LN 132381 6284688 1712
2357 11740 3 NN D 28 3 199
PSBJRNL_LN 132381 6284688 36110
6282385 58064 4 NY D 28 4 199
PSCJRNL_LN 132381 6284688 1
50442 9061 3 NN D 28 4 199
PSDJRNL_LN 132381 6284688 721
1183 11534 3 NY D 28 2 199
PS_DEPRECIATION 61952 3647937 4
3647937 93694 4 YY U 24 16 137
PSADEPRECIATION 61952 3647937 33
242 5794 3 NN D 24 3 137
PSBDEPRECIATION 61952 3647937 4
156613 7478 3 NY D 24 5 137
PS_VCHR_ACCTG_LINE 85189 2679644 7
2679644 39694 4 YY U 102 9 649
PSAVCHR_ACCTG_LINE 85189 2679644 3
3117 4310 3 NN D 102 5 649
PSBVCHR_ACCTG_LINE 85189 2679644 317
317 4092 3 NN D 102 1 649

42
Explain

Compare key columns against columns used
UNIX and Windows
db2expln and dynexpln show number used and
defined
Plan table is optional but useful

43
SQL Capture

UNIX Windows
Snapshot monitor
Event monitor
Administrative views
Recent statements cache
Software products are available

44
Design Advisor

Callable via command line or Control Center

45
Design Advisor Calculate
46
Design Advisor Recommendations
47
Index Advisor Sample

db2advis -d pdss -t 3 -s " SELECT
DCD_AMT_CLAIMED, DCD_AMT_PAID, DCD_AMT_COPAY,
DCD_AMT_DEDUCT, DCD_AMT_PFA, DCD_AMT_RESERVES,
DCD_AMT_DISALLOW, DCD_AUDNBR, DCD_AUDSUB,
DCD_DTL_FLG, DCD_PROC_YMD, DCD_PAID_YMD,
DCD_PROCEDURE, DCD_PROC, DCD_PROC_MOD, DCD_RECID,
DCD_SITE_CDE, DCD_SERV_PAR, DCD_RSN_CDE,
DCD_SYS_TIME
FROM QDSPR.VRG_DRDETL
WHERE DCD_SITE_ID '?' AND ( DCD_PROC_YMD
BETWEEN '01/01/1000' AND '01/01/1000') AND
DCD_PROC IN ('?','?','?')"
execution started at timestamp 2001-03-15-09.02.11
.017097
Calculating initial cost (without recommmended
indexes) 1017572.750000 timerons
Initial set of proposed indexes is ready.
Found maximum set of 1 recommended indexes
Cost of workload with all indexes included
6581.913086 timerons
total disk space needed for initial set 2302 MB
total disk space constrained to -1 MB

48
Index Advisor Sample

1 indexes in current solution
1017572.7500 timerons (without indexes)
6581.9136 timerons (with current solution)
99.35 improvement
Trying variations of the solution set.
--
-- execution finished at timestamp
2001-03-15-09.02.21.409868
--
-- LIST OF RECOMMENDED INDEXES
--
-- index1, 2302MB
CREATE INDEX WIZ2 ON "QDSPR
"."TDS00E00_DRCLDETLA" ("DCD_SITE_NBR" ASC,
"DCD_SITE_ID" ASC, "DCD_PROC_YMD" ASC, "DCD_PROC"
ASC, "DCD_RECID" ASC)
--
Index Advisor tool is finished.

49
Design Advisor

Strengths
Good to review indexes on tables - easier in v8.2
Considerations
Where to get workload statistics
Event monitor data / statement level trace /
recent statements history / product
Still need to choose the clustering index
Single statement use
Could lead to over-indexing
Likes to recommend index-only

50
The Real World Isnt So Simple

Total Total Total Total
Sort Rows Rows Count
Application NameCPU Sort Sorts
ElapsedOverfloRead Written
Used Time Time
cfCannedSearch 24.580 6416
31591.11 3 460K 374K 2
Package Name
QUERY011 24.580 6416
31591.11 3 460K 374K 1
Section Number
3 24.580 6416
31591.11 3 460K 374K 1
SELECT DISTINCT A.PROJECTNO
FROM AGENCY_VIEW V, FILINGGENERALINFO A,
FILINGSUBMISSIONS B
WHERE V.USERID H00001 AND V.PROJECTNO
A.PROJECTNO AND
A.PROJECTNO B.PROJECTNO AND (
(B.SUBMITDATE H00002 AND
B.SUBMITTIME gt H00003 ) OR (B.SUBMITDATE
gt H00002 ) )
AND ( (B.SUBMITDATE H00004 AND
B.SUBMITTIME lt H00005 )
OR (B.SUBMITDATE lt H00004 ) ) AND
A.CATFILING IN ( H00006, H00007 , H00008
, H00009 , H00010 , H00011 ,
H00012 , H00013 , H00014 , H00015
, H00016 , H00017 , H00018 ,
H00019 , H00020 , H00021 , H00022
, H00023 , H00024 , H00025 ) AND (
(A.LASTUPDATE H00026

51
Cardinality Exercise

Objective Using simple calculations, determine
whether if indexes will improve performance
Table 500,000 rows on 24,000 pages
What is the minimum usable cardinality for a
non-unique, non-clustered index?
Consider
Number of prefetch I/Os assuming prefetch
quantity of 32
Versus
Number of index and data page I/Os

52
Cardinality Calculations

Prefetch I/Os to read whole table
24,000 / 32 750 prefetch I/Os
Non-clustered
Index keys/RIDs located
Reverse solving for data I/O of 750 pages read
500,000 / 750 667 minimum fullkeycard
Reality in this case, be suspicious of any index
with lt 2,000 fullkeycard
Verify results with Explain
100 Clustered index
Rows per page 500,000 / 24,000 20 RPP
20 times fewer I/O on full key

53
Summary

Index redesign is done to correct initial
physical design errors
Too many indexes and/or columns
Too few indexes and/or columns
Usually same problem exists throughout design
Careful analysis is required
Benefits justify the effort

54
Thanks for coming
Session 1302A A Simple approach to DB2 for LUW
Index Redesign

Rob WilliamsMartin Hubel Consulting Inc.
www.mhubel.com martin_at_mhubel.comWith Thanks
to Embarcadero Technologies

55
IBM and IDUG announce Worldwide XML Contest
The largest Investment in XML community
worldwide 200,000 Participants Worldwide from
more than 30 countries
Starts September 2008
Winners announced December 2008
Many ways to participate Create a video Write
Xquery and SQL queries Submit a database
utility Port an application Write a pureXML
application
Open to students and professionals
Dont forget to pick up contest bookmarks at the
DB2 ped

Write a Comment

User Comments (0)