Teradata%20Architectural%20Summit%202007 - PowerPoint PPT Presentation

About This Presentation
Title:

Teradata%20Architectural%20Summit%202007

Description:

Why I cannot insert 50 GB evenly distributed data in a database having 500 GB unused space? ... Almost all allocated space for a database on at least one of ... – PowerPoint PPT presentation

Number of Views:387
Avg rating:3.0/5.0
Slides: 6
Provided by: teradat
Category:

less

Transcript and Presenter's Notes

Title: Teradata%20Architectural%20Summit%202007


1
Teradata Architectural Summit 2007
  • Yasir Hassan Maken
  • Professional Services EMEA (GCC Pakistan)

2
  • What Data Skew can cause?

Why I cannot insert 50 GB evenly distributed data
in a database having 500 GB unused space?
Why we are not able to move only 10 of the
unused space from a database?
Why most of my requests are Spooling out?
Why most of the queries running on this database
usually have high CPU SKEW?
3
Because
  • Most of the database space is being wasted
  • Almost all allocated space for a database on at
    least one of the AMPs has been occupied
  • The chunk of upcoming data that needs to go on
    the highly congested AMP, after distribution, do
    not finds a space for accommodation
  • If space needs to be moved, the chunk of space
    that is supposed to be fetched from the congested
    AMP would not be allowed. As, equal space will be
    fetched from each of the AMPs and the activity
    will fail if one of the AMPs does not have FREE
    SPACETotal Space to be moved/ of AMPs
  • During processing of any request on the database,
    there would be high chances of spooling out on
    the congested AMP as MaxPerm CurrPerm will be
    the available Spool which will be less due to
    high Skew on the AMP
  • In some cases, the CPU Skew of the sql request on
    the skewed tables of the database will be more
    due to uneven distribution of data and different
    CPU required on different AMPs

4
How to make things work normally?
Identify the Skewed tables
Check the row distribution per amp and find the
Skew with the suggested Index
SkewFactor 100 - (AVG(CurrentPerm)/MAX(CurrentPe
rm)100)
Check for Hash collisions
5
Improvements
Now I can easily insert enough data in the
database
Now the Spool out issue has improved a lot
Now I can easily move space from the database if
there is any available space
The CPU Skew has decreased a lot
BEFORE
AFTER
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
Write a Comment
User Comments (0)
About PowerShow.com