Title: Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com
1Indexing Strategies for DB2 UDB for
iSeriesShantan Kethireddyshantank_at_us.ibm.com
2Agenda
- Types of indexes
- How are indexes used
- General approach to creating indexes
- The perfect index
- Examples
- The material in this presentation was taken from
the white paper Indexing and Statistics
Strategies for DB2 UDB for the iSeries. - ibm.com/servers/enable/site/education/abstracts/i
ndxng_abs.html
3- How are Indexes used
- Why are they important
4Binary Radix Index
- Key values are compressed
- Common patterns are stored once
- Unique portion stored in leaf pages
- Positive impact on size and depth of the index
tree - Algorithm used to find values
- Binary search
- Very efficient process to find a unique value
- Modified to fit the data structure
- Maintenance
- Index data is automatically spread across all
available disk units - Tree is automatically rebalanced to maintain an
efficient structure
5Binary Radix Index
ROOT
Test Node
MISS
AR
ISSIPPI 002
OURI 003
IOWA 004
IZONA 005
KANSAS 001
- DISADVANTAGES
- Table rows retrieved in order of key values (not
physical order) which equates to many random
I/Os when selecting a large number of keys (high
cardinality) - No way to predict which physical index pages are
next when traversing the index for large number
of key values
- ADVANTAGES
- Quick access to a single key value (million-entry
index, on average, only 20 tests) - Also efficient for small, selected range of key
values (low cardinality)
6Encoded Vector Index (EVI)
- New index object for delivering fast data access
in decision support and query reporting
environments - Complementary alternative to existing index
object (binary radix tree structure keyed
logical file or SQL index) - Advanced technology from IBM Research, that is
variation on bitmap indexing - Easy to access data statistics improve query
optimizer decision making - Can only be created through an SQL interface
- CREATE ENCODED VECTOR INDEX Library/EVI_Name on
Library/Table_Name (Column) WITH n DISTINCT VALUES
7Encoded Vector Index (EVI)
- What is it?
- New type of index
- FILE object type, LF attribute
- Composed of two parts
- Symbol table contains information for each
distinct key value. Each key value is assigned a
unique code - Code is 1, 2, or 4 bytes depending on number of
distinct key values - Rather then a bit array for each distinct key
value, the index has one array of codes (The
Vector)
8Index Selection
- Selection criteria is applied to ranges of index
entries to quickly get a subset of rows before
the table is retrieved. - Advantages
- Only those index entries that are within a
selected range are processed - Provides quick access to rows in an OLTP
environment - Potential Disadvantages
- Can perform poorly when a large number of rows
are selected - Requires a separate Random I/O against the table
to extract the values - Rule of Thumb
- Used when only asking for or expecting a few rows
returned from the index - Used when sequencing the rows is required for
ordering or grouping - The selection columns match the first (n) key
fields of the index
9- General Approach to Creating Indexes
10Proactive Query Tuning
- Remember the goal of creating indexes is to give
the optimizer the statistics and implementation
choices it needs while it is choosing an access
plan for the query. - Requires an understanding of the database model
and types of queries that will be run against it - Build indexes for the largest or most commonly
used queries - For ad-hoc (OLAP) or less frequently used queries
build single key EVIs over the local selection
columns used in the queries - Make sure that statistics exist for the most and
least selective columns for the query - This may mean creating an index that will never
be used to implement the query but only to
provide the correct statistics - Customize this approach to your own environment
and query needs
11Reactive Query Tuning
- Reactive query tuning really means, develop the
application and any initial indexes and then run
the application to see what gets used or created
by the optimizer. - Usually highlights the slower running queries,
even on a subset of the entire database records
(test database) - Useful for tuning existing applications that are
not performing as expected - Use the feedback from the optimizer to discover
- Any indexes or statistics the optimizer
recommends for local selection - Any temporary indexes used for the query
- The implementation method(s) that the optimizer
has chosen to run the queries - Use the index advisor to help guide you as to
what local selection columns may provide the best
index coverage - Create permanent indexes over the same columns
that any temporary indexes were created upon. Try
to eliminate the temporary index builds - This also applies to temporary hash tables built
over the entire table with no selection applied
12Other Indexing Tips
- Avoid null capable columns if expecting to use
index only access. Index only access is not
available when a key column in the index is null
capable - Avoid derived expressions in local selection.
Access via an index may not be used for
predicates that have derived values. - T1.ShipDate gt (CURRENT DATE 10 DAYS)
- UPPER(T1.CustomerName) SMITH
- Index access is not used for predicates where
both operands come from the same table - T1.ShipDate gt T1.OrderDate
- Consider index only access if all of the columns
used in the query are represented in the index as
key columns - Use the most selective columns as keys in the
index - Preference should be given to columns used in
equal comparisons - For key columns that are unique, specify the
UNIQUE keyword when creating the index
13 14Perfect Index Guidelines
- Order of the columns in an index is very
important. Optimizer may not use an index if the
columns are in an incorrect order. Use the
following guideline - Equal predicates first. Predicates using the
operator generally eliminate the largest number
of non-participating rows and should therefore be
first in the index - If all of the predicates have an equal operator,
then order the columns as follows - Selection predicates join predicates
- Join predicates selection predicates
- Selection predicates group by columns
- Selection predicates order by columns
- Always place the most selective columns as the
first key in the index - Create perfect indexes ahead of time for
pre-determined queries or queries that produce a
standard report - Indexes will take up system resources, find a
balance between query performance and system
(index) maintenance - A binary radix index is the fastest data access
method available for a query that is highly
selective and returns a small number of rows
15Using a Query Graph
- Queries can also be represented as a graph to
help visualize what columns should be considered
for index creation - Separate all of the tables and major functions in
the query into different nodes of the graph - Create a different node for each table, grouping,
ordering or join requirement - Push all of the selection to the lowest level
possible in the graph - Process the columns starting at the bottom of the
graph to determine which ones should be included
into any indexes - Use the Perfect Index Guidelines to determine
what should be included into the index - Columns at the top of the graph may be better
suited for a column stat rather than a permanent
index.
16 17One-Table Query
Query has four local selection predicates and two
ORDER BY columns. Follow the general guidelines
of Selection predicates order by columns
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE INDEX
Perfect_Index ON Items (Year, Quarter, ShipMode,
ReturnFlag, Customer_Number, Item_Number)
Place the most selective selection predicates
first in this index based upon the database model
and other queries use of the same columns.
18One-Table Query Graph
Always start looking at the bottom of the graph
for the columns to place into the index first.
Then work your way up the graph looking for
additional columns.
Customer_Number, Item_Number
WHERE Year 2000 AND Quarter 4 AND
ReturnFlag R AND ShipMode AIR
19Three-Table Join Query
Query has two join predicates and six selection
predicates. Focus first on the selection
predicates for each table in the query.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year
20Three-Table Join Query Graph
The columns at the top of the graph may be better
suited for a column stat rather than an index.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year IN (2000, 2001) AND T3.Quarter
1
21Three-Table Join Query
The TimeDim table has two equal selection
predicates and one join predicate. Use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Year, Quarter,
TimeKey)
22Three-Table Join Query
The CustDim table has four equal selection
predicates and one join predicate. Again use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_CustDim_Index ON CustDim (Continent,
Country, Region, Territory, CustKey)
23Three-Table Join Query
The SalesFact table only has two join predicates.
Since we dont know the order in which the tables
will be joined we must provide the optimizer with
the flexibility of having indexes with both
combinations of the join predicates.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_SalesFact_Index1 ON SalesFact
(CustKey) CREATE INDEX Perfect_SalesFact_Index2
ON SalesFact (TimeKey)
24Non-Equal Predicates
Inequalities tend to return more rows because
they deal with a range of values rather then a
specific value for an equal operator. Thus they
should be placed at the end of the index.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year gt 2000 AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Quarter,
TimeKey, Year)
25Non-Equal Predicates Graph
The non-equal predicate can be added as the last
key in the index, or a column stat could be
created.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year gt 2000 AND T3.Quarter 1
26EVI One-Table Query
When a query is not very selective (20 - 70)
then skip sequential is usually the best method.
EVIs can be scanned more efficiently. EVIs can
be used for selection in these queries, however
they cannot not be used for ordering, grouping or
joins.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE ENCODED
VECTOR INDEX Perfect_EVI1 ON Items (Year) CREATE
ENCODED VECTOR INDEX Perfect_EVI2 ON Items
(Quarter) CREATE ENCODED VECTOR INDEX
Perfect_EVI3 ON Items (ReturnFlag) CREATE ENCODED
VECTOR INDEX Perfect_EVI4 ON Items (ShipMode)
Dynamic bitmaps will be created from the EVI
indexes and the results will be ANDed together to
satisfy the query request.
27EVI One-Table Query
In this case, the EVI indexes we already created
on Year, ReturnFlag and ShipMode can be reused
for this new query.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Month IN (1, 2, 3) AND ReturnFlag
R AND ShipMode RAIL CREATE ENCODED
VECTOR INDEX Perfect_EVI5 ON Items (Month)
Either a new index over the Month column could be
created or we can leave that selection to be
processed as post dynamic bitmap selection.
28Indexing Strategy Summary
- Indexes dont have to be used by the optimizer to
be considered helpful - Proactively create indexes that you know will be
useful based upon the database model - Use tools to help you reactively create indexes
that are still required - Remember, the perfect index can be different for
every single query, try to find the right blend
of indexes for your environment - Read the white paper Indexing and Statistics
Strategies for DB2 UDB for the iSeries - ibm.com/servers/enable/site/education/abstracts/in
dxng_abs.html
29Additional Information
- DB2 UDB for iSeries home page
http//www.iseries.ibm.com/db2 - Newsgroups
- USENET comp.sys.ibm.as400.misc,
comp.database.ibm-db2 - iSeries Network (NEWS/400 Magazine) SQL DB2
Forum http//www.iseriesnetwork.com/Forums/main.
cfm?CFApp59 - Education Resources Classroom Online
- http//www.iseries.ibm.com/db2/db2educ_m.htm
- http//www.iseries.ibm.com/developer/education/ibo
/index.html - DB2 UDB for iSeries Publications
- Online Manuals http//www.iseries.ibm.com/db2/boo
ks.htm - Porting Help http//www.iseries.ibm.com/developer
/db2/porting.html - DB2 UDB for iSeries Redbooks (http//ibm.com/redbo
oks ) - Stored Procedures Triggers on DB2 UDB for
iSeries (SG24-6503) - DB2 UDB for AS/400 Object Relational Support
(SG24-5409) - SQL Query Engine
- (http//publib-b.boulder.ibm.com/Redbooks.nsf/Redp
ieceAbstracts/sg2456598.html) - SQL/400 Developers Guide by Paul Conte Mike
Cravitz - http//iseriesnetwork.com/str/books/Uniquebook2.cf
m?NextBook183 - iSeries and AS/400 SQL at Work by Howard Arner
- http//www.sqlthing.com/books.htm