Overview of Indexing - PowerPoint PPT Presentation

About This Presentation
Title:

Overview of Indexing

Description:

Title: Overview of Storage and Indexing Subject: Database Management Systems Author: Raghu Ramakrishnan and Johannes Gehrke Keywords: Chapter 8 Last modified by – PowerPoint PPT presentation

Number of Views:104
Avg rating:3.0/5.0
Slides: 25
Provided by: RaghuRamak266
Learn more at: http://web.cs.wpi.edu
Category:

less

Transcript and Presenter's Notes

Title: Overview of Indexing


1
Overview of Indexing
  • Chapter 8 Part II.
  • Introduction to indexing
  • First glimpse at indices and workloads

2
Understanding the Workload
  • For each query in workload
  • Which relations does it access?
  • Which attributes are retrieved?
  • Which attributes are involved in selection/join
    conditions?
  • How selective are these conditions likely to be?
  • For each update in workload
  • Which attributes are involved in selection/join
    conditions?
  • How selective are these conditions likely to be?
  • The type of update (INSERT/DELETE/UPDATE), and
    the attributes that are affected.

3
Choice of Indexes
  • What indexes should we create?
  • Which relations should have indexes?
  • What field(s) should be the search key?
  • Should we build several indexes?
  • For each index, what kind of an index should it
    be?
  • Clustered vs. unclustered? Hash vs. tree?
  • Clustering must be used sparingly and only when
    justified by frequent queries that benefit from
    clustering.
  • At most one index can be clustered. (Why?)
  • Consider utilizing index-only evaluation. (e.g.,
    avg(age))

4
Choice of Indexes One Approach
  • Consider most important queries in turn.
  • Consider best plan using current indexes, and
    see if a better plan possible with additional
    index. If so, create it.
  • Consider impact on updates in workload!
  • Trade-off Indexes can make queries go faster,
    updates slower. Require disk space, too.
  • Obviously, we must understand how DBMS evaluates
    queries and creates query evaluation plans

5
Choice of Indexes Simple Approach
  • For now, we discuss simple 1-table queries.

6
Index Selection Guidelines
  • Attributes in WHERE clause are candidates for
    index keys.
  • Exact match condition suggests hash index.
  • Range query suggests tree index.
  • Clustering is especially useful for range queries
  • Clustering can also help equality queries if
    there are many duplicates.

7
Index Selection Guidelines
  • Multi-attribute search keys considered when WHERE
    clause contains several conditions.
  • Order of attributes is important for range
    queries.
  • Such indexes can sometimes enable index-only
    strategies for important queries.
  • Question For index-only strategies, is
    clustering important ?

8
Index Selection Guidelines
  • Try to choose indexes that benefit as many
    queries as possible.
  • Since only one index can be clustered per
    relation, choose it based on important queries
    that would benefit the most from clustering.

9
Examples of Clustered Indexes
SELECT E.dno FROM Emp E WHERE E.agegt40
  • B tree index on E.dno?
  • B tree index on E.age ?
  • Trade-offs
  • How selective is the condition? (all gt 40?) or
    (only some gt 40)
  • Is the index clustered?

10
Examples of Clustered Indexes
SELECT E.dno, COUNT () FROM Emp E WHERE
E.agegt10 GROUP BY E.dno
  • Consider the GROUP BY query.
  • Index on E.age ? E.dno ?
  • Issues
  • Use Index on E.age ?
  • If many tuples have E.age gt 10, using E.age index
    and sorting the retrieved tuples may be costly.
  • Use Index on E.dno ?
  • Clustered E.dno index may be better!
  • What about without WHERE condition?

11
Examples of Clustered Indexes
SELECT E.dno FROM Emp E WHERE E.hobbyStamps
  • B tree index on E.hobby?
  • NOTE is equality query.
  • NOTE may contain many duplicates.
  • Clustered or Unclustered index ?
  • CONCLUDE Clustering on E.hobby helps!
  • QUESTION what if index is unclustered ?
  • CONCLUDE may prefer to do a full scan.

12
Indexes with Composite Search Keys
Composite Search Keys Search on combination of
fields (sal and age).
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
13
Equality and Composite Search Keys
  • Equality query Every field value is equal to a
    constant value.
  • Examples
  • age20
  • sal 75
  • age20 and sal 75
  • sal 75 and age20

11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
14
Composite Search Keys
  • If retrieve Emp records with age30 AND sal4000
  • Index on ltage,salgt would be better than an index
    on age or an index on sal.
  • Choice of index key orthogonal to clustering

15
Ranges and Composite Search Keys
Examples of composite key indexes using
lexicographic order.
  • Range query Some field value is not a constant
    but a range.
  • Examples
  • age12 and sal gt 10
  • age 12

11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
16
Composite Search Keys
  • If condition is 20ltagelt30 AND 3000ltsallt5000
  • Clustered tree index on ltage,salgt or ltsal,agegt is
    best.
  • If condition is age30 AND 3000ltsallt5000
  • Clustered ltage,salgt index much better than
    ltsal,agegt index!
  • Composite indexes are larger, updated more often.

17
Index-Only Plans
  • Answer a query without retrieving actual tuples
  • Is that possible ?
  • If index with suitable information is available.
  • Why is it a good idea ?

18
Index-Only Plans
SELECT D.mgr FROM Dept D, Emp E WHERE
D.dnoE.dno
ltE.dnogt
SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE
D.dnoE.dno
  • A number of queries can be answered without
    retrieving any tuples from one or more of the
    relations involved if a suitable index is
    available.

ltE.dno,E.eidgt
Tree index!
SELECT E.dno, COUNT() FROM Emp E GROUP BY
E.dno
ltE.dnogt
SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY
E.dno
ltE.dno,E.salgt
Tree index!
ltE. age,E.salgt or ltE.sal, E.agegt
SELECT AVG(E.sal) FROM Emp E WHERE E.age25
AND E.sal BETWEEN 3000 AND 5000
Tree!
19
Index-Only Plans
Does index-only evaluation make sense?
SELECT E.dno, COUNT() FROM Emp E GROUP BY
E.dno
ltE.dnogt ?
ltE.dnogt ? ltE.salgt ? ltE.dno,E.salgt ?
SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY
E.dno
SELECT AVG(E.sal) FROM Emp E WHERE E.age25
AND E.sal BETWEEN 3000 AND 5000
ltE. age,E.salgt or ltE.sal, E.agegt?
Tree index!
20
Index-Only Plans Multi-Key Index
  • PROS
  • The chance for index-only evaluation is
    increased.
  • CONS
  • - Index size larger.
  • - Update response for any field.

21
Index-Only Plans
  • Tree index on ltdno,agegt, or on
  • ltage,dnogt
  • Which is better?

SELECT E.dno, COUNT () FROM Emp E WHERE
E.age30 GROUP BY E.dno
22
Index-Only Plans
  • Tree index on ltdno,agegt, or on
  • ltage,dnogt
  • Which is better?

SELECT E.dno, COUNT () FROM Emp E WHERE
E.age30 GROUP BY E.dno
SELECT E.dno, COUNT () FROM Emp E WHERE
E.agegt30 GROUP BY E.dno
What if we consider the second query?
23
Index-Only Plans Multiple Relations
ltE.dnogt Or ltD.dnogt Or ltD.mgrgt
SELECT DISTINCT ( D.mgr ) FROM Dept D, Emp
E WHERE D.dnoE.dno
24
Index-Only Plans
ltE.dnogt Or ltD.dnogt Or ltD.mgrgt
SELECT D.mgr FROM Dept D, Emp E WHERE
D.dnoE.dno
SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE
D.dnoE.dno
ltE.dno,E.eidgt
Tree index!
25
Summary
  • Understanding nature of workload for application,
    and the performance goals is essential to
    developing a good design.
  • What are the important queries and updates?
  • What attributes/relations are involved?

26
More Summary
  • Indexes must be chosen to speed up important
    queries
  • Index maintenance overhead on updates to key
    fields.
  • Choose indexes that can help many queries, if
    possible.
  • Build indexes to support index-only strategies.
  • Clustering is an important decision only one
    index on a given relation can be clustered!
  • Order of fields in composite index key can be
    important.
Write a Comment
User Comments (0)
About PowerShow.com