Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com

About This Presentation

Title:

Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com

Description:

Binary Radix Index. Key values are compressed. Common patterns are stored once ... A binary radix index is the fastest data access method available for a query ... – PowerPoint PPT presentation

Number of Views:208

Avg rating:3.0/5.0

Slides: 30

Provided by: IBMU486

Category:

more less

Transcript and Presenter's Notes

Title: Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com

1
Indexing Strategies for DB2 UDB for
iSeriesShantan Kethireddyshantank_at_us.ibm.com
2
Agenda

Types of indexes
How are indexes used
General approach to creating indexes
The perfect index
Examples

The material in this presentation was taken from
the white paper Indexing and Statistics
Strategies for DB2 UDB for the iSeries.
ibm.com/servers/enable/site/education/abstracts/i
ndxng_abs.html

How are Indexes used
Why are they important

4
Binary Radix Index

Key values are compressed
Common patterns are stored once
Unique portion stored in leaf pages
Positive impact on size and depth of the index
tree
Algorithm used to find values
Binary search
Very efficient process to find a unique value
Modified to fit the data structure
Maintenance
Index data is automatically spread across all
available disk units
Tree is automatically rebalanced to maintain an
efficient structure

5
Binary Radix Index
ROOT
Test Node
MISS
AR
ISSIPPI 002
OURI 003
IOWA 004
IZONA 005
KANSAS 001

DISADVANTAGES
Table rows retrieved in order of key values (not
physical order) which equates to many random
I/Os when selecting a large number of keys (high
cardinality)
No way to predict which physical index pages are
next when traversing the index for large number
of key values

ADVANTAGES
Quick access to a single key value (million-entry
index, on average, only 20 tests)
Also efficient for small, selected range of key
values (low cardinality)

6
Encoded Vector Index (EVI)

New index object for delivering fast data access
in decision support and query reporting
environments
Complementary alternative to existing index
object (binary radix tree structure keyed
logical file or SQL index)
Advanced technology from IBM Research, that is
variation on bitmap indexing
Easy to access data statistics improve query
optimizer decision making
Can only be created through an SQL interface
CREATE ENCODED VECTOR INDEX Library/EVI_Name on
Library/Table_Name (Column) WITH n DISTINCT VALUES

7
Encoded Vector Index (EVI)

What is it?
New type of index
FILE object type, LF attribute
Composed of two parts

Symbol table contains information for each
distinct key value. Each key value is assigned a
unique code
Code is 1, 2, or 4 bytes depending on number of
distinct key values
Rather then a bit array for each distinct key
value, the index has one array of codes (The
Vector)

8
Index Selection

Selection criteria is applied to ranges of index
entries to quickly get a subset of rows before
the table is retrieved.
Advantages
Only those index entries that are within a
selected range are processed
Provides quick access to rows in an OLTP
environment
Potential Disadvantages
Can perform poorly when a large number of rows
are selected
Requires a separate Random I/O against the table
to extract the values
Rule of Thumb
Used when only asking for or expecting a few rows
returned from the index
Used when sequencing the rows is required for
ordering or grouping
The selection columns match the first (n) key
fields of the index

General Approach to Creating Indexes

10
Proactive Query Tuning

Remember the goal of creating indexes is to give
the optimizer the statistics and implementation
choices it needs while it is choosing an access
plan for the query.
Requires an understanding of the database model
and types of queries that will be run against it
Build indexes for the largest or most commonly
used queries
For ad-hoc (OLAP) or less frequently used queries
build single key EVIs over the local selection
columns used in the queries
Make sure that statistics exist for the most and
least selective columns for the query
This may mean creating an index that will never
be used to implement the query but only to
provide the correct statistics
Customize this approach to your own environment
and query needs

11
Reactive Query Tuning

Reactive query tuning really means, develop the
application and any initial indexes and then run
the application to see what gets used or created
by the optimizer.
Usually highlights the slower running queries,
even on a subset of the entire database records
(test database)
Useful for tuning existing applications that are
not performing as expected
Use the feedback from the optimizer to discover
Any indexes or statistics the optimizer
recommends for local selection
Any temporary indexes used for the query
The implementation method(s) that the optimizer
has chosen to run the queries
Use the index advisor to help guide you as to
what local selection columns may provide the best
index coverage
Create permanent indexes over the same columns
that any temporary indexes were created upon. Try
to eliminate the temporary index builds
This also applies to temporary hash tables built
over the entire table with no selection applied

12
Other Indexing Tips

Avoid null capable columns if expecting to use
index only access. Index only access is not
available when a key column in the index is null
capable
Avoid derived expressions in local selection.
Access via an index may not be used for
predicates that have derived values.
T1.ShipDate gt (CURRENT DATE 10 DAYS)
UPPER(T1.CustomerName) SMITH
Index access is not used for predicates where
both operands come from the same table
T1.ShipDate gt T1.OrderDate
Consider index only access if all of the columns
used in the query are represented in the index as
key columns
Use the most selective columns as keys in the
index
Preference should be given to columns used in
equal comparisons
For key columns that are unique, specify the
UNIQUE keyword when creating the index

The Perfect Index

14
Perfect Index Guidelines

Order of the columns in an index is very
important. Optimizer may not use an index if the
columns are in an incorrect order. Use the
following guideline
Equal predicates first. Predicates using the
operator generally eliminate the largest number
of non-participating rows and should therefore be
first in the index
If all of the predicates have an equal operator,
then order the columns as follows
Selection predicates join predicates
Join predicates selection predicates
Selection predicates group by columns
Selection predicates order by columns
Always place the most selective columns as the
first key in the index
Create perfect indexes ahead of time for
pre-determined queries or queries that produce a
standard report
Indexes will take up system resources, find a
balance between query performance and system
(index) maintenance
A binary radix index is the fastest data access
method available for a query that is highly
selective and returns a small number of rows

15
Using a Query Graph

Queries can also be represented as a graph to
help visualize what columns should be considered
for index creation
Separate all of the tables and major functions in
the query into different nodes of the graph
Create a different node for each table, grouping,
ordering or join requirement
Push all of the selection to the lowest level
possible in the graph
Process the columns starting at the bottom of the
graph to determine which ones should be included
into any indexes
Use the Perfect Index Guidelines to determine
what should be included into the index
Columns at the top of the graph may be better
suited for a column stat rather than a permanent
index.

Examples

17
One-Table Query
Query has four local selection predicates and two
ORDER BY columns. Follow the general guidelines
of Selection predicates order by columns
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE INDEX
Perfect_Index ON Items (Year, Quarter, ShipMode,
ReturnFlag, Customer_Number, Item_Number)
Place the most selective selection predicates
first in this index based upon the database model
and other queries use of the same columns.
18
One-Table Query Graph
Always start looking at the bottom of the graph
for the columns to place into the index first.
Then work your way up the graph looking for
additional columns.
Customer_Number, Item_Number
WHERE Year 2000 AND Quarter 4 AND
ReturnFlag R AND ShipMode AIR
19
Three-Table Join Query
Query has two join predicates and six selection
predicates. Focus first on the selection
predicates for each table in the query.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year
20
Three-Table Join Query Graph
The columns at the top of the graph may be better
suited for a column stat rather than an index.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year IN (2000, 2001) AND T3.Quarter
1
21
Three-Table Join Query
The TimeDim table has two equal selection
predicates and one join predicate. Use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Year, Quarter,
TimeKey)
22
Three-Table Join Query
The CustDim table has four equal selection
predicates and one join predicate. Again use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_CustDim_Index ON CustDim (Continent,
Country, Region, Territory, CustKey)
23
Three-Table Join Query
The SalesFact table only has two join predicates.
Since we dont know the order in which the tables
will be joined we must provide the optimizer with
the flexibility of having indexes with both
combinations of the join predicates.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_SalesFact_Index1 ON SalesFact
(CustKey) CREATE INDEX Perfect_SalesFact_Index2
ON SalesFact (TimeKey)
24
Non-Equal Predicates
Inequalities tend to return more rows because
they deal with a range of values rather then a
specific value for an equal operator. Thus they
should be placed at the end of the index.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year gt 2000 AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Quarter,
TimeKey, Year)
25
Non-Equal Predicates Graph
The non-equal predicate can be added as the last
key in the index, or a column stat could be
created.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year gt 2000 AND T3.Quarter 1
26
EVI One-Table Query
When a query is not very selective (20 - 70)
then skip sequential is usually the best method.
EVIs can be scanned more efficiently. EVIs can
be used for selection in these queries, however
they cannot not be used for ordering, grouping or
joins.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE ENCODED
VECTOR INDEX Perfect_EVI1 ON Items (Year) CREATE
ENCODED VECTOR INDEX Perfect_EVI2 ON Items
(Quarter) CREATE ENCODED VECTOR INDEX
Perfect_EVI3 ON Items (ReturnFlag) CREATE ENCODED
VECTOR INDEX Perfect_EVI4 ON Items (ShipMode)
Dynamic bitmaps will be created from the EVI
indexes and the results will be ANDed together to
satisfy the query request.
27
EVI One-Table Query
In this case, the EVI indexes we already created
on Year, ReturnFlag and ShipMode can be reused
for this new query.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Month IN (1, 2, 3) AND ReturnFlag
R AND ShipMode RAIL CREATE ENCODED
VECTOR INDEX Perfect_EVI5 ON Items (Month)
Either a new index over the Month column could be
created or we can leave that selection to be
processed as post dynamic bitmap selection.
28
Indexing Strategy Summary

Indexes dont have to be used by the optimizer to
be considered helpful
Proactively create indexes that you know will be
useful based upon the database model
Use tools to help you reactively create indexes
that are still required
Remember, the perfect index can be different for
every single query, try to find the right blend
of indexes for your environment
Read the white paper Indexing and Statistics
Strategies for DB2 UDB for the iSeries
ibm.com/servers/enable/site/education/abstracts/in
dxng_abs.html

29
Additional Information

DB2 UDB for iSeries home page
http//www.iseries.ibm.com/db2
Newsgroups
USENET comp.sys.ibm.as400.misc,
comp.database.ibm-db2
iSeries Network (NEWS/400 Magazine) SQL DB2
Forum http//www.iseriesnetwork.com/Forums/main.
cfm?CFApp59
Education Resources Classroom Online
http//www.iseries.ibm.com/db2/db2educ_m.htm
http//www.iseries.ibm.com/developer/education/ibo
/index.html
DB2 UDB for iSeries Publications
Online Manuals http//www.iseries.ibm.com/db2/boo
ks.htm
Porting Help http//www.iseries.ibm.com/developer
/db2/porting.html
DB2 UDB for iSeries Redbooks (http//ibm.com/redbo
oks )
Stored Procedures Triggers on DB2 UDB for
iSeries (SG24-6503)
DB2 UDB for AS/400 Object Relational Support
(SG24-5409)
SQL Query Engine
(http//publib-b.boulder.ibm.com/Redbooks.nsf/Redp
ieceAbstracts/sg2456598.html)
SQL/400 Developers Guide by Paul Conte Mike
Cravitz
http//iseriesnetwork.com/str/books/Uniquebook2.cf
m?NextBook183
iSeries and AS/400 SQL at Work by Howard Arner
http//www.sqlthing.com/books.htm

Write a Comment

User Comments (0)

About PowerShow.com

Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com - PowerPoint PPT Presentation

Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com

Binary Radix Index. Key values are compressed. Common patterns are stored once ... A binary radix index is the fastest data access method available for a query ... – PowerPoint PPT presentation