Administering SQL Server 2000: VLDB Best Practices
Transcript
1
DATABASE INDEXES, BLOB, VLDB Best Practices
Bernt Castman, KTH Kista
Ryan Trout, Microsoft Corporation
2
What is an index?
  • A specially shaped data structure that makes data
    easier to access
  • Stored together with the data or separately
  • Normally maintained by the system (full-text
    indexes excepted)
  • Without indexes, all data access must go through
    table scans

3
cont.
  • Using indexes reduces the number of I/O
    operations, speeds up other operations as well,
    and reduces blocking and table locks
  • If a query accesses more than 20% of the rows in a
    table, a table scan is better than an index.
    This holds even when the same index column value
    appears in many rows

4
Relationship Between an Index and a Table
5
Indexes work best for
  • Queries that specify a narrow search criterion,
    returning only a few rows
  • Queries that specify a range of values, returning
    only a few rows
  • Searches within a JOIN; columns frequently used in
    JOINs are good index candidates
  • Searches that want data in a specific order,
    without having to sort
  • e.g., last name, first name: sorting does not require ORDER BY

6
cont.
  • Index in moderation
  • Do not index small tables
  • Use "covering queries," where the desired answer is
    contained entirely within the index's own storage;
    more on this later (see the sketch below)
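
A minimal sketch of a covering query, assuming a hypothetical
Employees table: the index leaf pages alone can answer the query,
so the data pages are never touched.

    -- Composite index covers both columns in the SELECT list
    CREATE INDEX ix_emp_name ON Employees (LastName, FirstName)

    -- Resolved entirely from the index: no lookup into the table
    SELECT LastName, FirstName
    FROM Employees
    WHERE LastName = 'Castman'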

7
How are indexes organized?
  • Indexes are usually built as tree structures
  • The root node is the starting point of the index
  • It contains a number of index rows
  • These rows contain keys and a pointer to an
    index page
  • The tree can have several index levels

8
Index Updates
  • The index key is the column or columns of a table
    used to build the index
  • Indexes are updated continuously
  • Inserting a new data row into the table, for
    example, can require extensive changes to the
    index tree
  • Because of their structure (fixed length, etc.),
    indexes update quickly even when room must be
    made for a new index page
  • This is called a page split, and it often cascades
    through the rest of the tree
  • It can be avoided by balancing the index with a
    tuned fill factor (see the sketch below)

9
Simple Index
  • Defined on a single column of a table
  • Referenced via the SQL WHERE clause
  • If the chosen column holds the same value in many
    rows (e.g., "state"), the index will not be
    particularly effective

10
Composite Index
  • A composite index is defined on more than one
    column, e.g., state, street, city
  • A table can be addressed through several
    different indexes
  • Can consist of at most 16 columns holding at most
    900 bytes of key data
  • Queries against a composite index do not need to
    place all index key columns in the WHERE clause

11
Example
  • An index is created on columns a, b, and c of a table
  • It can be searched with SELECT ... WHERE
  • Equality on a AND b AND c is fastest
  • Equality on a AND b also seeks
  • Equality on a alone always gives an INDEX SCAN
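
A minimal sketch of the three cases, assuming a hypothetical table
t1 with a composite index on (a, b, c):

    CREATE INDEX ix_abc ON t1 (a, b, c)

    -- Fastest: all three key columns constrained
    SELECT * FROM t1 WHERE a = 1 AND b = 2 AND c = 3
    -- Still an index seek on the two leading columns
    SELECT * FROM t1 WHERE a = 1 AND b = 2
    -- Only the leading column: a range of index pages is scanned
    SELECT * FROM t1 WHERE a = 1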

12
INDEX SCAN REAL WORLD A Customer Location Table
  • Suppose we have a table that contains information
    about the location of customers of our business.
    An index is created on the columns state, county,
    and city and is stored in the tree structure in
    the following order: state, county, city.
  • If a query specifies the state column value as
    Texas in the WHERE clause, the index will be
    used. Because values for the county and city
    columns are not given in the query, the index
    will return a number of rows based on all of the
    index records that contain Texas as the state
    column value.
  • An index scan is used to retrieve a range of
    index pages and then a number of data pages based
    on values in the state column.
  • The index pages are read sequentially in the same
    way that a table scan accesses data pages.

13
Index Uniqueness
  • An index defined as unique is always unique. If
    the column the index covers is unique, the index
    becomes a unique index
  • Fast and efficient, since only one more I/O is
    needed to find the data. An attempt to insert a
    duplicate key value raises an error
  • A non-unique index works the same way, but here
    duplicate index entries with the same value in
    the data column are allowed
  • Used when a unique index is not possible

14
Clustered Indexes
  • A clustered index stores the actual data rows in
    its own leaf nodes
  • As soon as the right key is found, the data can
    be read without further I/O
  • Higher performance
  • All data is kept in index-sorted order
  • With a clustered index on state/county/city, a
    search on state also returns rows sorted by
    county and city
  • Reaching other data in the table costs one
    additional access
  • Only one clustered index per table

15
  • A nonclustered index points to the data rows of a table
  • At most 249 nonclustered indexes per table
  • Indexes can be created via the "Create Index Wizard"
  • Or with T-SQL:

    CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
    INDEX index_name ON table_name
    (column_name [, column_name, ...])
    [WITH options] [ON filegroup_name]
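
A minimal sketch, assuming a hypothetical Customers table and a
filegroup named FG_Indexes:

    -- Unique clustered index on the key column
    CREATE UNIQUE CLUSTERED INDEX ix_cust_id
    ON Customers (CustomerID)

    -- Nonclustered composite index placed on a named filegroup
    CREATE NONCLUSTERED INDEX ix_cust_location
    ON Customers (State, County, City)
    WITH FILLFACTOR = 80 ON FG_Indexes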

16
Full-Text Indexes
  • More like a catalog; not a tree structure
  • A full-text index must include a column that
    uniquely identifies each row
  • One full-text index per table
  • Not updated automatically by the system
  • Often more than a text-search engine: phrases,
    groups of words (see the sketch below)
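
A minimal sketch of setting one up in SQL Server 2000, where
full-text indexing is driven by system stored procedures; the
catalog name FT_Cat and the unique index name PK_Customers are
assumptions:

    EXEC sp_fulltext_database 'enable'
    EXEC sp_fulltext_catalog 'FT_Cat', 'create'
    -- Register the table, naming its unique index
    EXEC sp_fulltext_table 'Customers', 'create', 'FT_Cat', 'PK_Customers'
    EXEC sp_fulltext_column 'Customers', 'ContactName', 'add'
    -- Population must be started explicitly
    EXEC sp_fulltext_table 'Customers', 'start_full'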

17
Examples
  • SELECT * FROM Customers WHERE ContactName
    LIKE '%PETE%'
  • SELECT * FROM Customers WHERE
    CONTAINS(ContactName, ' "PETE" ')
  • The CONTAINS predicate can find text strings that
    contain the desired string, such as "PETER" or
    "PETEY," by using the full-text index.
  • SELECT CategoryName FROM Categories WHERE
    FREETEXT(Description, 'Sweets candy bread')
  • This query might find category names containing
    such words as "sweetened," "candied," or "breads."

18
Hints
  • When the Query Optimizer generates a query
    execution plan, it chooses an index based on
    which index will provide the best performance,
    usually the index that will use the fewest I/O
    operations and retrieve the smallest number of rows.
  • Although the Query Optimizer usually chooses the most
    efficient query execution plan and access path
    for your query, you might be able to do better if
    you know more about your data than the Query
    Optimizer does. For example, suppose you want to
    retrieve data about a person named "Smith" from a
    table with a column listing last names. Index
    statistics generalize based on a column. Suppose
    the statistics show that each last name appears
    three times on average in the column. This
    information provides fairly good selectivity;
    however, you know that the name "Smith" appears
    much more often than average. If you have an idea
    about how to better conduct an SQL operation, you
    can use a hint. A hint is simply advice you give
    to the Query Optimizer specifying that it should not
    make the automatic choice (see the sketch below).
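
A minimal sketch of an index hint, assuming a hypothetical People
table with a nonclustered index ix_lastname:

    -- Force the named index instead of the optimizer's choice
    SELECT *
    FROM People WITH (INDEX (ix_lastname))
    WHERE LastName = 'Smith'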

19
Hints, cont.
  • Table scan: In some cases, you might decide that a
    table scan would be more efficient than an index
    lookup or an index scan. A table scan is more
    efficient when the index scan would retrieve more
    than 20 percent of the rows in the table, such as
    when 70 percent of the data is highly selectable
    and the other 30 percent is "Smith."
  • Which index to use: You can specify a particular
    index to be the only index considered. You might
    not know which index the SQL Server Query Optimizer
    would choose without your hint, but you feel the
    hinted index will perform best.
  • Which group of indexes to select from: You can
    suggest several indexes to the Query Optimizer, and
    it will use all of them (ignoring duplicates).
    This option is useful when you know that a set of
    indexes will work well.
  • Locking method: You can tell the Query Optimizer which
    type of lock to use when it is accessing the data
    from a particular table. If you feel that the
    wrong type of lock might be chosen for this
    table, you can specify that the Query Optimizer
    should use a row lock, a page lock, or a table
    lock (see the sketch below).
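
Minimal sketches of the table-scan and locking hints, against the
same hypothetical People table:

    -- INDEX(0) forces a table (or clustered index) scan
    SELECT * FROM People WITH (INDEX (0))
    WHERE LastName = 'Smith'

    -- Override the optimizer's lock granularity choice
    SELECT * FROM People WITH (TABLOCK) WHERE LastName = 'Smith'
    UPDATE People WITH (ROWLOCK) SET Phone = NULL WHERE LastName = 'Smith'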

20
Hash Files
  • Table rows are not stored sequentially on disk
  • A hash function computes the address based on one
    or more columns of the row
  • How do you guarantee a unique address?
  • Precondition: you know something about which
    values can occur in a column
  • An overflow area is needed

21
E.g., Swedish personal identity numbers
  • year: all values 1800-2001 possible
  • month: 1-12
  • day: 1-31
  • serial number: 0001-9999
  • What does a company registration number look like?
    556380-4524, i.e., the month and day pattern
    breaks the hash formula
  • Solution: an overflow area; rows whose hash
    computation yields the same index as another row
    are stored there sequentially

22
Example
  • This example uses Oracle hash cluster syntax:

    CREATE CLUSTER PropertyCluster
    (propertyNo VARCHAR2(5))
    HASH IS propertyNo HASHKEYS 300000

  • Once the cluster has been created, we can create
    a table in it, e.g.:

    CREATE TABLE PropertyForRent
    (propertyNo VARCHAR2(5) PRIMARY KEY, .....)
    CLUSTER PropertyCluster (propertyNo)

23
Using Bulk Copy Program (BCP)
  • BCP is an external program provided with
    Microsoft SQL Server 2000 to facilitate the
    loading of data files into a database. BCP can
    also be used to copy data out of a SQL Server
    table into a data file.
  • Using the BULK INSERT command: The BULK INSERT
    Transact-SQL (T-SQL) command lets you copy large
    quantities of data from a data file into a SQL
    Server table, from within SQL Server. Because
    this command is an SQL statement (run within
    ISQL, OSQL, or Query Analyzer), the process runs
    as a SQL Server thread. You cannot use this
    command to copy data from SQL Server into a data
    file.
  • Using Data Transformation Services (DTS): DTS is a
    set of tools provided with SQL Server that makes
    copying data into and out of SQL Server an easy
    task. DTS includes a wizard for importing data
    and a wizard for exporting data.
  • Recovering a database from a backup file can also
    be considered a form of database loading.
  • Although staging tables do not provide a method
    of loading data, they are commonly used in
    database loading (see the sketch below).
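
Minimal sketches of both loading paths, assuming a hypothetical
Sales database, a dbo.Orders table, a server named MyServer, and a
data file D:\orders.dat:

    REM bcp from the command line; -c = character mode, -T = trusted connection
    bcp Sales.dbo.Orders in D:\orders.dat -c -T -S MyServer

    -- The equivalent BULK INSERT, run inside SQL Server
    BULK INSERT Sales.dbo.Orders
    FROM 'D:\orders.dat'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)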

24
User table data DB2
  • By default, table data is stored on 4 KB pages.
    Each page (regardless of page size) contains 76
    bytes of overhead for the database manager. This
    leaves 4020 bytes to hold user data (or rows),
    although no row on a 4 KB page can exceed 4005
    bytes in length. A row will not span multiple
    pages. You can have a maximum of 500 columns when
    using a 4 KB page size

25
Long field data DB2
  • Long field data is stored in a separate table
    object that is structured differently from other
    data types
  • Data is stored in 32 KB areas that are broken up
    into segments whose sizes are "powers of two"
    times 512 bytes. (Hence these segments can be 512
    bytes, 1024 bytes, 2048 bytes, and so on, up to
    32 700 bytes.)
  • Long field data types (LONG VARCHAR or LONG
    VARGRAPHIC) are stored in a way that enables free
    space to be reclaimed easily. Allocation and free
    space information is stored in 4 KB allocation
    pages, which appear infrequently throughout the
    object.
  • The amount of unused space in the object depends
    on the size of the long field data, and whether
    this size is relatively constant across all
    occurrences of the data. For data entries larger
    than 255 bytes, this unused space can be up to 50
    percent of the size of the long field data.
  • If character data is less than the page size, and
    it fits into the record along with the rest of
    the data, the CHAR, GRAPHIC, VARCHAR, or
    VARGRAPHIC data types should be used instead of
    LONG VARCHAR or LONG VARGRAPHIC.

26
LOB Data Objects
  • Data is stored in 64 MB areas that are broken up
    into segments whose sizes are "powers of two"
    times 1024 bytes. (Hence these segments can be
    1024 bytes, 2048 bytes, 4096 bytes, and so on, up
    to 64 MB.)
  • To reduce the amount of disk space used by LOB
    data, you can specify the COMPACT option on the
    lob-options clause of the CREATE TABLE and the
    ALTER TABLE statements. The COMPACT option
    minimizes the amount of disk space required by
    allowing the LOB data to be split into smaller
    segments. This process does not involve data
    compression, but simply uses the minimum amount
    of space, to the nearest 1 KB boundary. Using the
    COMPACT option may result in reduced performance
    when appending to LOB values.
  • The amount of free space contained in LOB data
    objects is influenced by the amount of update and
    delete activity, as well as the size of the LOB
    values being inserted.

27
LOGGING for BLOBs
  • NOLOG
  • Set at definition time for the objects (columns)
  • Does not affect other objects (columns) in the
    table where the data are used

28
Windows DNA 2000: Next Generation Web Application Platform
  [Diagram: three tiers. Client Tier: browser, rich clients,
  devices. Application Logic Tier: application services (IIS/ASP,
  COM/MTS, MSMQ, ADO/OLE DB data access, Active Directory, XML,
  Commerce) hosting ASP/COM applications. Data Tier: data access
  and storage with SQL Server. Tiers connected over the Internet
  (TCP/IP, HTTP, HTML, XML).]
29
SQL Server 2000 VLDB
  • Higher Availability
  • Lower Administration Costs
  • Larger Databases
  • Based on customer feedback
  • Simplified Planning
  • Higher Performance
  • Smooth Operations

Keep the Ideas Coming!
30
Overview
VLDB Techniques
Performance
Architecture
Features
  • Backup and Recovery
  • Index Maintenance
  • Loading and Resizing
  • Verification

31
Backup and Recovery
  • Recovery Models
  • Recovery Models and Backup
  • Backup Enhancements
  • Snapshot Backups

32
Log Basics
  • Conceptually an infinite stream of log records
  • Log files on disk provide a circular cache
  • Optimized for sequential writes
  • Not an audit-trail
  • A 10-byte Log Sequence Number (LSN) identifies the
    location and sequence of a log record.
    Monotonically increasing

33
Page Structure
  • Header: LSN, ID, record count, ...
  • The LSN identifies the log record corresponding to
    the last page modification
  • Data records
  • Offset table

  [Diagram: a page holding Record 1, Record 2, Record 3.]
34
Transaction Log Example 1
  [Diagram: a transaction roll-back traced against the log.]
35
Transaction Log Example 2
  [Diagram: record-level locking; LSNs tie page changes to the log.]
36
Checkpoint
  • Anchor in log for recovery
  • Lists active transactions
  • Flushes hot and old dirty pages to disk
  • Limits recovery time
  • Allows log to be reused

37
Recovery Overview
  • 3 passes (phases) to recovery

  [Diagram: the three recovery passes shown along the log/time
  axis, starting from the log start.]
38
Recovery Models
  • Simplify recovery planning
  • Simplify backup and recovery procedures
  • Clarify tradeoffs
  • Performance
  • vs. Data loss exposure
  • vs. Log space consumption
  • vs. Simplicity
  • Simplify transitions between common operational
    scenarios
  • Maintain data protection
  • Maintain availability
  • Minimize disruption of automated maintenance
    procedures

39
Recovery Models
  • Control Media Recovery Only
  • Choosing a Model
  • Key Benefits
  • Work Loss Exposure
  • Functional Limitations
  • Setting the model
  • How it Works

Recovery Models: Full, Simple, Bulk_Logged

ALTER DATABASE database_name SET RECOVERY FULL
40
Recovery Models
  • Full
  • No work loss
  • Supports recovery to any arbitrary point-in-time
  • Simple
  • Simplest backup/restore strategy
  • Less log space required
  • Greatest work loss possible
  • Bulk_Logged
  • High performance bulk operations
  • Minimal log space for bulk operations
  • Some work loss exposure

41
How Recovery Models Work
  • Full: everything is fully logged
  • Bulk_Logged: minimal logging for some operations
  • CREATE INDEX
  • Bulk Load
  • SELECT INTO
  • WRITETEXT, UPDATETEXT
  • Not settable per operation due to administration
    impact
  • Simple: log truncation on checkpoint

42
Log Backups in Bulk_Logged Mode
  • Log backup contains extents modified by bulk
    logged operations since the last log backup
  • Avoids breaking the log chain
  • Eliminates need for a special full backup
  • Minimal impact on automated backups
  • Log shipping works!

  [Diagram: a log backup captures the transaction log plus the
  modified data extents.]
43
Available Backup Types For Each Model
44
Changing Recovery Model
  • FULL to BULK_LOGGED doesn't break automated backup
    procedures
  • Transitions to and from SIMPLE require more care
    but are uncommon (see the sketch below)
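
A minimal sketch of the round trip, assuming a hypothetical Sales
database:

    -- Drop to Bulk_Logged for the bulk operation
    ALTER DATABASE Sales SET RECOVERY BULK_LOGGED
    -- ... bulk load or CREATE INDEX here ...
    ALTER DATABASE Sales SET RECOVERY FULL
    -- Back up the log to restore point-in-time coverage
    BACKUP LOG Sales TO DISK = 'D:\backup\sales_log.bak'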

45
Recovery Models: VLDB Techniques
  • Use
  • Full
  • Bulk_Logged
  • Or both
  • Transitions are easy
  • Don't use Simple for VLDB
  • Limited to database or database differential
    backup
  • Precludes file / filegroup backup

46
Database Backup: How It Works
  • Fuzzy backup
  • Writes an inconsistent set of pages
  • Includes the transaction log to make it consistent
  • Minimal impact on running transactions
  • Fast parallel sequential scans, no
    transformations to data

  [Diagram: along the time axis, the log is captured from the start
  of the oldest active transaction to the end of the data backup;
  data pages are shown as old (not backed up), written after
  backup, or written during the data backup.]
47
Log and File Backup
  • Flexibility
  • Log and file backups don't block each other
  • Files are independent of each other
  • Concurrent file backups must be done as a single
    operation

  [Diagram: a timeline of interleaved log backups and file backups.]
48
Differential Backups
  • Fast: bitmap driven
  • New: differential file backup for VLDB

  [Diagram: a bitmap tracks extents changed since the database
  backup; only those extents are backed up.]
49
Snapshot Backup/Restore
  • Equivalent to full database or file backup
  • Roll forward using regular differential and log
    backups
  • History maintained in MSDB
  • Third party VDI application with storage system
    support (split mirror or copy-on-write)

  [Diagram: production storage on a 3-way mirror; one mirror is
  split off as a passive backup copy and written to a tape library
  through the Virtual Device Interface for Backup.]
50
Snapshot Benefits
  • Very fast restore
  • From mirror in seconds
  • Longer from tape
  • Backup with the minimum possible impact on
    production server
  • Very fast backup
  • Create reporting or test database
  • Initialize warm standby

51
Backup and Recovery: VLDB Techniques
  • Online backup: no window needed
  • Use file and log backups
  • Consider file differential to reduce recovery
    time
  • Back up infrequently modified filegroups less
    often
  • Consider snapshots for high availability
  • Very fast restore from disk
  • Very fast backup to disk

52
Index Maintenance
  • Index Creation
  • Analysis
  • Online Reorganization

53
Parallel Index Creation
  • Fast
  • Degree of parallelism controllable
  • How it works
  • A separate thread builds a sub-index for each range
  • Each thread is fed by a parallel scan
  • Sub-indexes are stitched together at the end
  • Optional use of TEMPDB

  [Diagram: parallel scans feed one sub-index per range; the
  sub-indexes are combined into the complete index.]
54
Fragmentation Analysis
  • Fast mode avoids reading leaf level pages
  • Optional tabular results
  • Optional check of all index levels
  • Optional check of all indexes
    (see the sketch below)
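
A minimal sketch with DBCC SHOWCONTIG, assuming a hypothetical
Orders table; the options map onto the bullets above:

    -- Fast mode: skips the leaf-level pages
    DBCC SHOWCONTIG ('Orders') WITH FAST
    -- Full tabular check of all indexes and all index levels
    DBCC SHOWCONTIG ('Orders') WITH TABLERESULTS, ALL_INDEXES, ALL_LEVELS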
55
Online Reorg
  • Low impact
  • About a 20% degradation of a medium OLTP workload
  • Uses minimal data space
  • May be stopped and restarted
  • Use analysis to decide when to reorg

56
Online Reorg
  • Reorders the leaf level of an index
  • Reestablishes fill factor
  • Pages remain in the same file
  • Reorders one index at a time
    (see the sketch below)
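
In SQL Server 2000 this is exposed as DBCC INDEXDEFRAG; a minimal
sketch with hypothetical names:

    -- Online defrag of one index; can be stopped without losing work
    DBCC INDEXDEFRAG (Sales, 'dbo.Orders', ix_orders_custid)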

57
Index Maintenance: VLDB Techniques
  • Index Creation - Parallel and Concurrent
  • Parallel for largest indexes
  • Default for SMP
  • Concurrent for smaller indexes
  • Combination for mixed indexes
  • Consider Bulk_Logged recovery model

58
Index Maintenance: VLDB Techniques
  • Maintenance
  • Fast analysis
  • Online reorganization at appropriate threshold
  • Allow for larger log backups
  • Slower than rebuilding index

59
Loading and Resizing
  • Data Loading
  • Shrinking the Log

60
Bulk Data Loading
  • New data types, per-column collation
  • Optional firing of triggers during load
  • Constraint checks
  • During loading
  • New command to check after loading
  • Biggest change is operational
  • Recovery models
  • Easy to trade off performance vs. concurrency
  • Easy to maintain recoverability

61
Bulk Load Performance
  • Scales linearly with number of CPUs
  • Improvement: nearly 100% for each CPU

  [Chart: throughput vs. CPU count on a typical 500 MHz Xeon system.]
62
Bulk Data Load: VLDB Techniques
  • Initial load into empty tables
  • Load data
  • No indexes
  • Use BULK INSERT
  • Parallel load from partitioned data files
  • One load stream per CPU
  • Bulk_Logged or Simple recovery model
  • TABLOCK option
  • Create indexes
  • Switch to appropriate recovery model
  • Perform backups
    (see the sketch below)
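
A minimal sketch of one load stream, with hypothetical file and
table names; each CPU would run one of these against its own
partition of the data files:

    ALTER DATABASE Sales SET RECOVERY BULK_LOGGED

    BULK INSERT Sales.dbo.Orders
    FROM 'D:\load\orders_part1.dat'
    WITH (TABLOCK, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')

    -- Then create indexes, switch the recovery model back, and back up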

63
Bulk Data Loading: VLDB Techniques
  • For incremental loads
  • Load data with indexes in place
  • Performance and concurrency requirements
    determine locking granularity
  • Recovery model changes
  • Full to Bulk_Logged recovery model
  • Unless you need to preserve point-in-time recovery
  • Simple (no change)

64
Shrinking the Log
  • Does what it can immediately
  • Manually shrink again after a log backup
    (see the sketch below)

  [Diagram: the head of the log moves back toward the target size
  after each log backup and shrink.]
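
A minimal sketch, assuming a Sales database whose log file is
named Sales_Log:

    -- First attempt; only space beyond the head of the log is freed
    DBCC SHRINKFILE (Sales_Log, 500)    -- target size in MB
    -- Free more of the log, then shrink again
    BACKUP LOG Sales TO DISK = 'D:\backup\sales_log.bak'
    DBCC SHRINKFILE (Sales_Log, 500)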
65
Shrink Database: VLDB Techniques
  • Long-running shrink database
  • May be stopped without losing work
  • Reclaim space by using TRUNCATEONLY
    (see the sketch below)
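
A minimal sketch against the same hypothetical Sales database:

    -- Shrink, leaving 10% free space; resumable if interrupted
    DBCC SHRINKDATABASE (Sales, 10)
    -- Release unused end-of-file space without moving pages
    DBCC SHRINKDATABASE (Sales, TRUNCATEONLY)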

66
Verification
  • Optional online check (allows updates)
  • Performance: about a 15% transaction throughput
    drop during a medium workload
  • Parallel check on SMP systems
  • Performance scales linearly at >80%
  • Physical check
  • Very fast page-level check will catch problems
    caused by hardware

CHECKDB, CHECKTABLE, CHECKFILEGROUP (see the sketch below)
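
A minimal sketch of the physical check plus a TempDB space
estimate for a full check, assuming a Sales database:

    -- Fast page-level check for hardware-induced corruption
    DBCC CHECKDB ('Sales') WITH PHYSICAL_ONLY
    -- Estimate the TempDB space a full check would need
    DBCC CHECKDB ('Sales') WITH ESTIMATEONLY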
67
Verification: VLDB Techniques
  • Use the physical check
  • For a full check, allow enough TempDB
  • Space: use the ESTIMATEONLY option
  • Read / write throughput
  • Check part of the database at a time
  • Verify individual filegroups separately
  • Check infrequently modified data less often
68
Verification: When?
  • Not a part of regular maintenance
  • Run at critical times
  • Hardware upgrade
  • Software upgrade
  • Whenever you suspect a problem, regardless of
    cause
  • Run it occasionally for peace of mind

69
Distributed Partitioned Views
  • Create identical tables on multiple servers
  • Define a constraint on each table
  • Create a UNION view over all tables
  • QP optimizes access based on constraints
  • View is updateable
  • Recoverable to a consistent state
    (see the sketch below)
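
A minimal sketch over two member servers, with all names
hypothetical; the CHECK constraints let the query processor prune
member tables:

    -- On each server: same schema, disjoint key ranges
    CREATE TABLE Orders_1 (
        OrderID INT PRIMARY KEY CHECK (OrderID BETWEEN 1 AND 99999),
        CustomerID INT NOT NULL )
    -- On the other server, the CHECK covers OrderID 100000 to 199999

    -- The partitioned view, using linked-server names
    CREATE VIEW AllOrders AS
        SELECT * FROM Server1.Sales.dbo.Orders_1
        UNION ALL
        SELECT * FROM Server2.Sales.dbo.Orders_2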
70
Smooth Operations
  • Simple transitions between recovery models
  • Operation conflicts eliminated
  • E.g., CREATE INDEX vs. Database Backup
  • All operations recoverable
  • E.g., add file
  • Maintain data protection
  • Maintain availability
  • No disruption of automated maintenance

71
Putting It All Together
  [Timeline: the database runs OLTP under the Full recovery model,
  switches to Bulk_Logged for bulk loads and index creation, then
  returns to Full; log backups run continuously throughout, while
  file backups, online checks, and online reorgs are interleaved
  with normal operation.]