Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com

Description:

Binary Radix Index. Key values are compressed. Common patterns are stored once ... A binary radix index is the fastest data access method available for a query ... – PowerPoint PPT presentation

Number of Views:208
Avg rating:3.0/5.0
Slides: 30
Provided by: IBMU486
Category:

less

Transcript and Presenter's Notes

Title: Indexing Strategies for DB2 UDB for iSeries Shantan Kethireddy shantankus.ibm.com


1
Indexing Strategies for DB2 UDB for
iSeriesShantan Kethireddyshantank_at_us.ibm.com
2
Agenda
  • Types of indexes
  • How are indexes used
  • General approach to creating indexes
  • The perfect index
  • Examples
  • The material in this presentation was taken from
    the white paper Indexing and Statistics
    Strategies for DB2 UDB for the iSeries.
  • ibm.com/servers/enable/site/education/abstracts/i
    ndxng_abs.html

3
  • How are Indexes used
  • Why are they important

4
Binary Radix Index
  • Key values are compressed
  • Common patterns are stored once
  • Unique portion stored in leaf pages
  • Positive impact on size and depth of the index
    tree
  • Algorithm used to find values
  • Binary search
  • Very efficient process to find a unique value
  • Modified to fit the data structure
  • Maintenance
  • Index data is automatically spread across all
    available disk units
  • Tree is automatically rebalanced to maintain an
    efficient structure

5
Binary Radix Index
ROOT
Test Node
MISS
AR
ISSIPPI 002
OURI 003
IOWA 004
IZONA 005
KANSAS 001
  • DISADVANTAGES
  • Table rows retrieved in order of key values (not
    physical order) which equates to many random
    I/Os when selecting a large number of keys (high
    cardinality)
  • No way to predict which physical index pages are
    next when traversing the index for large number
    of key values
  • ADVANTAGES
  • Quick access to a single key value (million-entry
    index, on average, only 20 tests)
  • Also efficient for small, selected range of key
    values (low cardinality)

6
Encoded Vector Index (EVI)
  • New index object for delivering fast data access
    in decision support and query reporting
    environments
  • Complementary alternative to existing index
    object (binary radix tree structure keyed
    logical file or SQL index)
  • Advanced technology from IBM Research, that is
    variation on bitmap indexing
  • Easy to access data statistics improve query
    optimizer decision making
  • Can only be created through an SQL interface
  • CREATE ENCODED VECTOR INDEX Library/EVI_Name on
    Library/Table_Name (Column) WITH n DISTINCT VALUES

7
Encoded Vector Index (EVI)
  • What is it?
  • New type of index
  • FILE object type, LF attribute
  • Composed of two parts
  • Symbol table contains information for each
    distinct key value. Each key value is assigned a
    unique code
  • Code is 1, 2, or 4 bytes depending on number of
    distinct key values
  • Rather then a bit array for each distinct key
    value, the index has one array of codes (The
    Vector)

8
Index Selection
  • Selection criteria is applied to ranges of index
    entries to quickly get a subset of rows before
    the table is retrieved.
  • Advantages
  • Only those index entries that are within a
    selected range are processed
  • Provides quick access to rows in an OLTP
    environment
  • Potential Disadvantages
  • Can perform poorly when a large number of rows
    are selected
  • Requires a separate Random I/O against the table
    to extract the values
  • Rule of Thumb
  • Used when only asking for or expecting a few rows
    returned from the index
  • Used when sequencing the rows is required for
    ordering or grouping
  • The selection columns match the first (n) key
    fields of the index

9
  • General Approach to Creating Indexes

10
Proactive Query Tuning
  • Remember the goal of creating indexes is to give
    the optimizer the statistics and implementation
    choices it needs while it is choosing an access
    plan for the query.
  • Requires an understanding of the database model
    and types of queries that will be run against it
  • Build indexes for the largest or most commonly
    used queries
  • For ad-hoc (OLAP) or less frequently used queries
    build single key EVIs over the local selection
    columns used in the queries
  • Make sure that statistics exist for the most and
    least selective columns for the query
  • This may mean creating an index that will never
    be used to implement the query but only to
    provide the correct statistics
  • Customize this approach to your own environment
    and query needs

11
Reactive Query Tuning
  • Reactive query tuning really means, develop the
    application and any initial indexes and then run
    the application to see what gets used or created
    by the optimizer.
  • Usually highlights the slower running queries,
    even on a subset of the entire database records
    (test database)
  • Useful for tuning existing applications that are
    not performing as expected
  • Use the feedback from the optimizer to discover
  • Any indexes or statistics the optimizer
    recommends for local selection
  • Any temporary indexes used for the query
  • The implementation method(s) that the optimizer
    has chosen to run the queries
  • Use the index advisor to help guide you as to
    what local selection columns may provide the best
    index coverage
  • Create permanent indexes over the same columns
    that any temporary indexes were created upon. Try
    to eliminate the temporary index builds
  • This also applies to temporary hash tables built
    over the entire table with no selection applied

12
Other Indexing Tips
  • Avoid null capable columns if expecting to use
    index only access. Index only access is not
    available when a key column in the index is null
    capable
  • Avoid derived expressions in local selection.
    Access via an index may not be used for
    predicates that have derived values.
  • T1.ShipDate gt (CURRENT DATE 10 DAYS)
  • UPPER(T1.CustomerName) SMITH
  • Index access is not used for predicates where
    both operands come from the same table
  • T1.ShipDate gt T1.OrderDate
  • Consider index only access if all of the columns
    used in the query are represented in the index as
    key columns
  • Use the most selective columns as keys in the
    index
  • Preference should be given to columns used in
    equal comparisons
  • For key columns that are unique, specify the
    UNIQUE keyword when creating the index

13
  • The Perfect Index

14
Perfect Index Guidelines
  • Order of the columns in an index is very
    important. Optimizer may not use an index if the
    columns are in an incorrect order. Use the
    following guideline
  • Equal predicates first. Predicates using the
    operator generally eliminate the largest number
    of non-participating rows and should therefore be
    first in the index
  • If all of the predicates have an equal operator,
    then order the columns as follows
  • Selection predicates join predicates
  • Join predicates selection predicates
  • Selection predicates group by columns
  • Selection predicates order by columns
  • Always place the most selective columns as the
    first key in the index
  • Create perfect indexes ahead of time for
    pre-determined queries or queries that produce a
    standard report
  • Indexes will take up system resources, find a
    balance between query performance and system
    (index) maintenance
  • A binary radix index is the fastest data access
    method available for a query that is highly
    selective and returns a small number of rows

15
Using a Query Graph
  • Queries can also be represented as a graph to
    help visualize what columns should be considered
    for index creation
  • Separate all of the tables and major functions in
    the query into different nodes of the graph
  • Create a different node for each table, grouping,
    ordering or join requirement
  • Push all of the selection to the lowest level
    possible in the graph
  • Process the columns starting at the bottom of the
    graph to determine which ones should be included
    into any indexes
  • Use the Perfect Index Guidelines to determine
    what should be included into the index
  • Columns at the top of the graph may be better
    suited for a column stat rather than a permanent
    index.

16
  • Examples

17
One-Table Query
Query has four local selection predicates and two
ORDER BY columns. Follow the general guidelines
of Selection predicates order by columns
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE INDEX
Perfect_Index ON Items (Year, Quarter, ShipMode,
ReturnFlag, Customer_Number, Item_Number)
Place the most selective selection predicates
first in this index based upon the database model
and other queries use of the same columns.
18
One-Table Query Graph
Always start looking at the bottom of the graph
for the columns to place into the index first.
Then work your way up the graph looking for
additional columns.
Customer_Number, Item_Number
WHERE Year 2000 AND Quarter 4 AND
ReturnFlag R AND ShipMode AIR
19
Three-Table Join Query
Query has two join predicates and six selection
predicates. Focus first on the selection
predicates for each table in the query.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year
20
Three-Table Join Query Graph
The columns at the top of the graph may be better
suited for a column stat rather than an index.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year IN (2000, 2001) AND T3.Quarter
1
21
Three-Table Join Query
The TimeDim table has two equal selection
predicates and one join predicate. Use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Year, Quarter,
TimeKey)
22
Three-Table Join Query
The CustDim table has four equal selection
predicates and one join predicate. Again use the
general guideline for Selection predicates join
predicates
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_CustDim_Index ON CustDim (Continent,
Country, Region, Territory, CustKey)
23
Three-Table Join Query
The SalesFact table only has two join predicates.
Since we dont know the order in which the tables
will be joined we must provide the optimizer with
the flexibility of having indexes with both
combinations of the join predicates.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year IN (2000, 2001) AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_SalesFact_Index1 ON SalesFact
(CustKey) CREATE INDEX Perfect_SalesFact_Index2
ON SalesFact (TimeKey)
24
Non-Equal Predicates
Inequalities tend to return more rows because
they deal with a range of values rather then a
specific value for an equal operator. Thus they
should be placed at the end of the index.
SELECT T3.Year, T1.Customer_Name,
SUM(T2.Revenue_WO_Tax) FROM CustDim T1,
SalesFact T2, TimeDim T3 WHERE T2.CustKey
T1.CustKey AND T2.TimeKey T3.TimeKey AND T3.
Year gt 2000 AND T3.Quarter 1 AND
T1.Continent America AND T1.Country
United States AND T1.Region
Central AND T1.Territory Five GROUP BY
T3.Year, T1.Customer_Name ORDER BY
T1.Customer_Name, T3.Year CREATE INDEX
Perfect_TimeDim_Index ON TimeDim (Quarter,
TimeKey, Year)
25
Non-Equal Predicates Graph
The non-equal predicate can be added as the last
key in the index, or a column stat could be
created.
T1.Customer_Name, T3.Year
T2.TimeKey T3.TimeKey
T3.Year, T1.Customer_Name
T2.CustKey T1.CustKey
WHERE T1.Continent America AND T1.Country
United States AND T1.Region Central AND
T1.Territory Five
WHERE T3.Year gt 2000 AND T3.Quarter 1
26
EVI One-Table Query
When a query is not very selective (20 - 70)
then skip sequential is usually the best method.
EVIs can be scanned more efficiently. EVIs can
be used for selection in these queries, however
they cannot not be used for ordering, grouping or
joins.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Quarter 4 AND ReturnFlag
R AND ShipMode AIR ORDER BY
Customer_Number, Item_Number CREATE ENCODED
VECTOR INDEX Perfect_EVI1 ON Items (Year) CREATE
ENCODED VECTOR INDEX Perfect_EVI2 ON Items
(Quarter) CREATE ENCODED VECTOR INDEX
Perfect_EVI3 ON Items (ReturnFlag) CREATE ENCODED
VECTOR INDEX Perfect_EVI4 ON Items (ShipMode)
Dynamic bitmaps will be created from the EVI
indexes and the results will be ANDed together to
satisfy the query request.
27
EVI One-Table Query
In this case, the EVI indexes we already created
on Year, ReturnFlag and ShipMode can be reused
for this new query.
SELECT Customer, Customer_Number,
Item_Number FROM Items WHERE Year
2000 AND Month IN (1, 2, 3) AND ReturnFlag
R AND ShipMode RAIL CREATE ENCODED
VECTOR INDEX Perfect_EVI5 ON Items (Month)
Either a new index over the Month column could be
created or we can leave that selection to be
processed as post dynamic bitmap selection.
28
Indexing Strategy Summary
  • Indexes dont have to be used by the optimizer to
    be considered helpful
  • Proactively create indexes that you know will be
    useful based upon the database model
  • Use tools to help you reactively create indexes
    that are still required
  • Remember, the perfect index can be different for
    every single query, try to find the right blend
    of indexes for your environment
  • Read the white paper Indexing and Statistics
    Strategies for DB2 UDB for the iSeries
  • ibm.com/servers/enable/site/education/abstracts/in
    dxng_abs.html

29
Additional Information
  • DB2 UDB for iSeries home page
    http//www.iseries.ibm.com/db2
  • Newsgroups
  • USENET comp.sys.ibm.as400.misc,
    comp.database.ibm-db2
  • iSeries Network (NEWS/400 Magazine) SQL DB2
    Forum http//www.iseriesnetwork.com/Forums/main.
    cfm?CFApp59
  • Education Resources Classroom Online
  • http//www.iseries.ibm.com/db2/db2educ_m.htm
  • http//www.iseries.ibm.com/developer/education/ibo
    /index.html
  • DB2 UDB for iSeries Publications
  • Online Manuals http//www.iseries.ibm.com/db2/boo
    ks.htm
  • Porting Help http//www.iseries.ibm.com/developer
    /db2/porting.html
  • DB2 UDB for iSeries Redbooks (http//ibm.com/redbo
    oks )
  • Stored Procedures Triggers on DB2 UDB for
    iSeries (SG24-6503)
  • DB2 UDB for AS/400 Object Relational Support
    (SG24-5409)
  • SQL Query Engine
  • (http//publib-b.boulder.ibm.com/Redbooks.nsf/Redp
    ieceAbstracts/sg2456598.html)
  • SQL/400 Developers Guide by Paul Conte Mike
    Cravitz
  • http//iseriesnetwork.com/str/books/Uniquebook2.cf
    m?NextBook183
  • iSeries and AS/400 SQL at Work by Howard Arner
  • http//www.sqlthing.com/books.htm
Write a Comment
User Comments (0)
About PowerShow.com