Title:Oracle DB Statistics What do they actually do and why do I care about the details
Description:
insert into patient_list select ' ', ' ','M',NULL from dba_tables where rownum 101; ... insert into gender select 'F' from dba_objects where rownum 10. commit; ... – PowerPoint PPT presentation
The CARD (CARDINALITY) is 50. Cardinality is the number of rows the optimizer expects to come back
TEST TIME! What is the explanation (dont worry multiple choice)
8 What does this Mean
(a) MEN REVOLT!!!! We no longer need women to reproduce!
(b) The optimizer is flawed file an SR!
(c) statistics simply have their limitations
9 How did Oracle calculate 50
200 Rows in the Table
50 have first_preg_date NULL
2 Distinct values for Gender 50
200 Rows 50 50 50 Rows
Not Rocket science but it is obviously way off
If the table happened to 10TB the calculation would have occurred the same way
Sounds like a mathematical statistical limitation
10 Limitation Oracle Should fix that
Statistics collects only very rudimentary metadata about tables indexes and table columns
To blindly widen the net could cause your statistical metadata to grow exponentially causing the optimizer to slow significantly (not to mention the stats job) for the sake of only a relatively small number of queries
11 In a nutshell how does the optimizer work
Upon a hard parse oracle comes up with a large number of possible execution plans
The optimizer simply uses basic mathematical statistics (and limited Oracle proprietary algorithms) to assign cardinalities to each step of these execution plans (stats are used here)
Then the optimizer assigns a cost to each step (how much work to collect that cardinality) (stats are used here)
Then for each plan the optimizer sums the total cost and picks the plan with the lowest number
12 What are Table Level Statistics
ALL_TABLES
The Biggies
NUM_ROWS - Number of rows in the table
BLOCKS - Number of used data blocks in the table
Considerations
DEGREE - Number of threads per instance for scanning the table (parallelism)
PARTITIONED - Indicates whether this table is partitioned. Set to YES if it is partitioned.
AVG_ROW_LEN - Average length of a row in the table (in bytes)
13 What are Column Statistics
This is where method_opt comes in
FOR ALL COLUMNS
FOR ALL INDEXED COLUMS
ALL_TAB_COL_STATISTICS
The Biggies
NUM_DISTINCT - Number of distinct values in the column
LOW_VALUE RAW(32) - Low value in the column
HIGH_VALUE RAW(32) - High value in the column
NUM_NULLS - Number of nulls in the column
Considerations
AVG_COL_LEN NUMBER Average length of the column (in bytes)
HISTOGRAM - Indicates existence/type of histogram (we shall discuss this later)
Calculated based on other stats
DENSITY NUMBER Density of the column
14 What is out of Range
This can occur when the a where clause predicate on a column is less then LOW_VALUE or greater then HIGH_VALUE
Can be caused by
big data shifts dumped into the table after the most recent analyze
Sample rate too low
Problem can appear random when using bind variables
Nice to know but the values in ALL_TAB_COL_STATIST ICS
LOW_VALUE
HIGH _VALUE
Are Raw data type how do I translate
15 Why do I care about Out of Range
The further you get out of range the lower the expected cardinality
The optimizer can greatly underestimate the cardinality of the rows coming back causing a suboptimal plan (plan starts in the wrong place wrong index is used)
16 How do you detect Out of Range Issues
This is tough if you keep your stats from getting stale look at the slow sql plan in vsql_plan. Look at the cardinalities if one appears way off this might be it (but just because you dont see the symptoms doesnt mean this isnt the prob). (Much worse w/ binds).
When using binds try to extract first parse bind variables from the data dictionary or AWR. You may have to trust your gut.
17 LOW_VALUE - HIGH_VALUEconversion
create or replace function display_raw (rawval raw type varchar2)
return varchar2
is
cn number
cv varchar2(32)
cd date
cnv nvarchar2(32)
cr rowid
cc char(32)
begin
if (type NUMBER) then
dbms_stats.convert_raw_value(rawval cn)
return to_char(cn)
elsif (type VARCHAR2) then
dbms_stats.convert_raw_value(rawval cv)
return to_char(cv)
elsif (type DATE) or (type LIKE TIMESTAMP) then
dbms_stats.convert_raw_value(rawval cd)
return to_char(cd)
18 Here is the SQL
select
a.column_name
display_raw(a.low_valueb.data_type) as low_val
display_raw(a.high_valueb.data_type) as high_val
b.data_type a.NUM_DISTINCT a.DENSITY a.NUM_NULLS
from
all_tab_col_statistics a all_tab_cols b
where
a.table_nameACAT_CTRL_FACT and
a.table_nameb.table_name and
a.column_nameb.column_name and
a.owner b.owner
19 What are Index Stats
ALL_IND_STATISTICS
BLEVEL- B-Tree level
LEAF_BLOCKS - Number of leaf blocks in the index
DISTINCT_KEYS - Number of distinct keys in the index key
CLUSTERING_FACTOR - Indicates the amount of order of the rows in the table based on the values of the index.
20 Why do I have to sometime run stats in the middle of the day to fix a query
Likely candidates
Cursor Invalidation (for reused sql)
Simply a bad peaked bind variable
Out of Range issue due to new data or stats sampling issue
Significant data change at the column level (update) since last analyze
Significant change at the number of rows (or table size) typically because stats havent been run for a long time since last analyze
21 Statistics Types on Partitioned tables
Statistics are collected two ways on a partitioned table (or 3 on subpartioned)
Global (Same as a normal table)
ALL_TABLES
ALL_TAB_COL_STATISTICS
ALL_TAB_HISTOGRAMS
ALL_INDEXES
Partitioned
ALL_TAB_PARTITIONS
ALL_PART_COL_STATISTICS
ALL_PART_HISTOGRAMS
ALL_IND_PARTITIONS
Subpartitions
ALL_SUBPART_COL_STATISTICS
ALL_SUBPART_HISTOGRAMS
ALL_TAB_SUBPARTITIONS
22 Why so many statistics on Partitioned Tables
When oracle received a SQL statement that references a partitioned table it will use the values in the where clause to use the lowest degree of stats possible
Select from part_table (global stats)
Select from part_table where partitioned_key X (partition stats)
Select from part_table where sub_partitioned_key X (sub_partition stats)
Select from part_table where partitioned_key in ( X Y) (global stats)
23 Gotchas with stats on partitioned tables
If you use bind variables (i.e. partitioned_key (X) and gender Y) the database will find the best plan for that value for the data in Xs partition. Maybe that is a bad plan in another partition.
Imagine if the first hard parsed query in the morning hits an empty partition (cheap FTS). Then the next query hits a big partition. Same plan (supposedly some work is done on this issue in 11G)
24 Gotchas with stats on partitioned tables
Image a Credit Card transaction table partitioned by load date (1000 Days)
Each load date will contain 4 partial transaction dates
No histograms
Local Stats on any given partition say 4 distinct values for load date
Global Stats say 1004 distinct values
Index on Load Data
All input values are in range
Select from CC_TRANS where load_date X and trans_date Y
Optimizer says FTS (full partion scan) best plan GOOD
Select from CC_TRANS where load_date in (XZ)
and trans_date Y
Optimizer says Index Lookup best plan BAD
Select from CC_TRANS where load_date gt X
and trans_date Y
25 Out of Range issues
Partitioned tables tend to be big tables so adding 1M rows may not be a big change (stale).
If you are adding a new partition every day (partitioned by transaction date) every day the max(transaction date) for the global table grows. Global stats doesnt know. Everyday will grow to be further out of range. Execution plans against global stats may perform terribly when sql tries to hit recent out-of-range values.
In 10.2.0.4 and after new copy table statistics procedure is available to counteract some of these issues
26 Histograms what are they
Dan Hotka Histograms are a cost-based optimizer (CBO) feature that allows for Oracle to see the possible number of values of a particular column. This is known as data skewing and histograms can track the number of occurrences of a particular data value when deciding on what type of index to use or even whether to use an index at all. Histograms give the CBO the ability to see the uniqueness of a particular data value.
27 Histogram metadata
ALL_TAB_HISTOGRAMS
ENDPOINT_NUMBER - Histogram bucket number
ENDPOINT_VALUE - Normalized endpoint value for this bucket
ENDPOINT_ACTUAL_VALUE - Actual (not normalized) string value of the endpoint for this bucket
28 Lets give Histograms a look
We will create a frequency based histogram
create table gender as select M gender from dba_objects
insert into gender select F from dba_objects where rownum lt 10
commit
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname gt user tabname gt GENDER ESTIMATE_PERCENT gt100 method_opt gt FOR ALL COLUMNS SIZE 10)
select num_rows from all_tables where owner user and table_name GENDER
Oracle implementation will break up the numbers evenly by the range (for instance 10 buckets 10 of the range will be in each bucket)
32 Histograms in 10G
10G defaults to size auto in method_opt
This means there is an algorithm in oracle to view recent behavior in the database to determine if histograms are needed
I believe 10G size auto is too aggressive in creating histograms. Creates histograms on too many columns
And histogram collection time is significant and resource intensive
33 10G Histogram Oddities
I have seen 10G put height based histograms on Last Name columns
If histograms tell use there are 1M rows with last names between Pallack and Phillips how can histograms help the optimizer guess how many Penellopeys (uncommon) there are..
Clearly there are Common last names but range based histograms cant hold them (too many) and height based dont make sense.
34 Histogram Bind Variables
The combination of histograms and bind variable usage can get you in trouble (11G has some advances in this)
The first hard parse determines your plan. What if the bind variable value references infrequent skewed data which is not used again. Subsequent queries go after data on the other side of the skew
Im not saying lack of histograms doesnt have its downfalls but the existence of histograms can cause large volatility which brings the phone calls.
If a small discrete set of values consider using literals in where predicate for histogram columns for the best plan every time.
35 But 10G takes care of Stale Stats
Oracle Documentation - If a monitored table has been modified more than 10 then these statistics are considered stale and gathered again
A good solid rule for 80 of all tables.
But staleness truly cannot be measured by number of modified rows alone. Internal data swings (most likely on bigger tables) can cause HUGE issues. Consider a transaction table with a new distinct ORDR CODE for every day.
If there is a frequency histogram on this column a new ORDR CODE will not be in it. The optimizer using the histogram data in combination of the columns distinct values (all_tab_cols) and the number of rows in the all_tables the optimizer may guess as loww as 1 row.
36 Statistics Guidelines
Many tables fall into batch processes If possible the app teams should run stats after each batch.
Partitioned tables should collect global stats each week
Tables partitioned by month should additionally collect global stats immediately after the first load of monthly data.
If you periodically find bad plans based on wrong statistics even after an analyze up the sample size.
Sure 10G has a default daily stats job dont hesitate to use your own
About PowerShow.com
PowerShow.com is a leading presentation/slideshow sharing website. Whether your application is business, how-to, education, medicine, school, church, sales, marketing, online training or just for fun, PowerShow.com is a great resource. And, best of all, most of its cool features are free and easy to use.
You can use PowerShow.com to find and download example online PowerPoint ppt presentations on just about any topic you can imagine so you can learn how to improve your own slides and presentations for free. Or use it to find and download high-quality how-to PowerPoint ppt presentations with illustrated or animated slides that will teach you how to do something new, also for free. Or use it to upload your own PowerPoint slides so you can share them with your teachers, class, students, bosses, employees, customers, potential investors or the world. Or use it to create really cool photo slideshows - with 2D and 3D transitions, animation, and your choice of music - that you can share with your Facebook friends or Google+ circles. That's all free as well!
For a small fee you can get the industry's best online privacy or publicly promote your presentations and slide shows with top rankings. But aside from that it's free. We'll even convert your presentations and slide shows into the universal Flash format with all their original multimedia glory, including animation, 2D and 3D transition effects, embedded music or other audio, or even video embedded in slides. All for free. Most of the presentations and slideshows on PowerShow.com are free to view, many are even free to download. (You can choose whether to allow people to download your original PowerPoint presentations and photo slideshows for a fee or free or not at all.) Check out PowerShow.com today - for FREE. There is truly something for everyone!