1
Self-management features in SQL Anywhere server
  • Glenn Paulley
  • Director, Engineering
  • http://iablog.sybase.com/paulley

2
Sybase iAnywhere products
  • SQL Anywhere
  • Full-function, small-footprint relational DBMS
    with support for triggers, stored procedures,
    materialized views, intra-query parallelism, hot
    failover, ANSI SQL 2003 support including OLAP
    queries, multidatabase capability
  • Mobilink/SQL Remote
  • Two-way data replication/synchronization
    technologies for replicating data through
    different mechanisms to support
    occasionally-connected devices
  • Ultralite
  • fingerprint database supports ad-hoc SQL on
    very small devices
  • UltraliteJ
  • 100% Java fingerprint database for BlackBerry and
    iPhone

3
Design goals of SQL Anywhere
  • Ease of administration
  • Comprehensive yet comprehensible tools
  • Good out-of-the-box performance
  • Embeddability features → self-tuning
  • Many environments have no DBAs
  • Cross-platform support
  • 32- and 64-bit Windows (XP, Server 2003, 2000,
    9x), Windows CE/Pocket PC, Linux 32- and 64-bit,
    HP-UX, AIX, Solaris (SPARC and Intel), Mac OS X,
    Compaq Tru64
  • Interoperability

4
Why is self-management important?
  • In a word: complexity
  • Application development is becoming more complex:
    new development paradigms such as ORM toolkits,
    distributed computation with synchronization
    amongst database replicas, and so on
  • Databases are now ubiquitous in IT because they
    solve a variety of difficult problems
  • Yet most companies continue to own and manage a
    variety of different DBMS products, which
    increases administrative costs
  • Ubiquity brings scale, in several ways
  • To keep TCO constant, one must improve the
    productivity of each developer

5
Embeddability
6
Physical database design
7
Physical database design
  • Logical and physical database design tradeoffs
    have a significant impact on performance
  • The fundamental problem is that the potential
    benefits of design changes must be amortized over
    the execution characteristics of a workload
  • Representative workloads are difficult to create
  • Issues include data skew, correlation, and
    mitigating the artificial contention that arises
    from simulating the actual system
  • Most DBMS vendors now offer tools to assist in
    index or materialized view creation (more on this
    later)
  • However, not all implications are covered by any
    of these tools

8
Physical database design with SQL Anywhere
  • A SQL Anywhere database is composed of up to 15
    dbspaces (see the example after this list)
  • Each dbspace is an OS file
  • One each is used for the temporary file and the
    transaction log
  • Others are for user data
  • UNIX raw partitions are not supported
  • Image copy is done via simple OS file copy
  • No image copy utility is necessary
  • Easy to deploy a database to a separate machine
    or CE device
  • Files are interoperable on all supported
    platforms without user intervention
  • on-the-fly data conversion done when necessary
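
A minimal sketch of how an additional dbspace is created and used,
assuming the CREATE DBSPACE and CREATE TABLE ... IN dbspace syntax;
the file, dbspace, and table names are hypothetical:

    CREATE DBSPACE history_space AS 'history.db';

    CREATE TABLE sales_history (
        id      INTEGER PRIMARY KEY,
        sold_on DATE,
        amount  DECIMAL(12,2)
    ) IN history_space;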

9
Physical database design
  • By default, databases are created with the
    following settings (see the example after this
    list)
  • Case-insensitive searching enabled
  • Not blank-padded: no difference in the storage of
    CHAR vs. VARCHAR attributes on disk
  • SQL Anywhere supports different character sets,
    collations for international contexts
  • Includes support for NCHAR data type using
    multi-byte UTF8 charset
  • Supported page sizes are 2K, 4K, 8K, 16K, 32K
  • All dbspaces must use the same page size
  • No hard limits on row length, number of BLOB
    columns
  • BLOB values (up to 2GB each) stored in separate
    page arena
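
A sketch of creating a database with these physical characteristics
spelled out explicitly, assuming the PAGE SIZE, COLLATION, CASE, and
BLANK PADDING clauses of SQL Anywhere's CREATE DATABASE statement;
clause names and defaults should be checked against the documentation
for your release, and the file name is hypothetical:

    CREATE DATABASE 'demo.db'
        PAGE SIZE 4096
        COLLATION '1252LATIN1'
        CASE IGNORE
        BLANK PADDING OFF;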

10
Physical database design
  • Database files grow automatically as necessary to
    accommodate any increase in data volume
  • Server can execute a user call-out procedure when
    a disk-full panic occurs
  • Server offers a temporary file governor to ensure
    that connections cannot be starved when they
    require space for intermediate results
  • Indexes on primary, foreign keys created
    automatically
  • Server will automatically detect redundant
    (duplicate or subsumed) indexes
  • two or more logical indexes share the same
    physical structure

11
Physical database design
  • Transaction log
  • SQL Anywhere uses a logical logging and recovery
    scheme at the row level (not physiological, or
    page-based)
  • Transaction log can be translated directly into
    INSERT, UPDATE, DELETE SQL statements
  • DBTRAN utility ships with the software
  • Assists in recovery from catastrophic failures
  • Provides the backbone for two-way synchronization
    with Mobilink, SQL Remote

12
Self-management features
13
Memory management
  • SQL Anywhere uses a single heterogeneous buffer
    pool with few predefined limits
  • Buffer pool can automatically grow and shrink
    depending on database workload and OS
    requirements for other applications
  • Buffer pool comprises
  • Table data pages
  • Index pages
  • Checkpoint log pages
  • Bitmap pages
  • Heap pages (data structures for connections,
    stored procedures, triggers)
  • Free (available) pages
  • All page frames are the same size

14
Page replacement scheme
  • Buffer pool has no preset bounds for sort space,
    heap space, table pages
  • Buffer pool's page replacement scheme must be
    able to manage various categories of pages and
    requests on a demand basis
  • Workloads change over time periods
  • Static configurations must trade off service
    levels against administrative effort
  • One basic requirement: avoid polluting the cache
    with pages from sequential scans
  • Identify pages that are frequently used

15
Page replacement scheme
  • Pages flow through the buffer pool over time
  • Window (the size of the cache) is stratified into
    eight segments based on time of reference
  • A page's score is incremented as it moves from
    segment to segment
  • Need to differentiate reference locality (i.e.
    multiple references to a single page based on row
    fetch) from other forms of reuse
  • Replacement based on individual page score
  • A modified Clock algorithm goes through each
    page in turn; if a page's score is less than a
    threshold value, the page is reused
  • If page is retained, score is decayed
  • LRU-K and 2Q algorithms are frequently referenced
    in the literature; however, we have found their
    overheads to be too high

16
Task scheduling model
  • Basic idea: reduce server resource requirements
    by establishing a limited pool of worker
    threads to execute requests
  • Reduces address space, virtual and real memory
    requirements
  • Introduces possibility of thread deadlock
  • Task implementation differs depending on the
    platform; fibers are used on Windows and Linux
    platforms
  • Scheduler assigns work across a pool of workers
  • Worker thread pool typically small, default is 20
  • Size of pool establishes the multiprogramming
    level
  • Workers are assigned dynamically to connections
    on demand

17
Task scheduling model
  • Unassigned requests wait for an available thread
  • Server uses cooperative multitasking to switch
    between requests
  • Server supports intra-query parallelism
  • For queries, the optimizer determines the degree
    of parallelism at optimization time

18
Task scheduling model
  • Waiting requests are candidates for swapping
  • Try to maintain constant memory footprint by
    using a virtual memory technique
  • Heaps for user connections can be swapped to
    disk if server load demands it
  • Originally implemented in the first SQL Anywhere
    release in 1989
  • Pointers within the heap are swizzled to permit
    reloading of heaps on a demand basis
  • Diagnostic connections can use a special,
    reserved thread to execute their requests
  • Enables diagnosis of a hung engine

19
Dynamic memory management
  • A SQL Anywhere server will grow and shrink the
    buffer pool as necessary to accommodate both
  • Database server load
  • Physical memory requirements of other
    applications
  • Enabled by default on all supported platforms
  • User can set lower and upper bounds and the
    default initial size (see the example below)
  • OS provides API to determine memory size
    (requires virtual memory support)
  • Implementation varies slightly by platform
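
The bounds are typically given as server command-line switches (for
example -c, -cl, and -ch in current releases; treat the exact switch
names as an assumption to verify). The effective sizes can be inspected
from SQL, in a sketch assuming the CurrentCacheSize, MinCacheSize, and
MaxCacheSize server properties (reported in kilobytes):

    SELECT PROPERTY( 'CurrentCacheSize' ) AS current_kb,
           PROPERTY( 'MinCacheSize' )     AS min_kb,
           PROPERTY( 'MaxCacheSize' )     AS max_kb;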

20
Dynamic memory management
  • Basic idea: match buffer pool size to the
    operating system's working set size
  • Feedback control loop

21
Memory allocation to server threads
  • Query optimizer assumes a given amount of memory
    is available for each computation
  • Some execution operators (notably hash-based
    operators) have the ability to free memory
    without changing the computation (just making it
    less efficient)
  • Release of memory is done partition-at-a-time so
    effect is not abrupt
  • Other operators can be constructed with
    alternative plans that offer a cheaper execution
    technique if either
  • the optimizer's choices are found to be
    suboptimal at run-time, or
  • the operator requires a low-memory strategy at
    the behest of the memory governor

22
Other self-management features
  • On startup, buffer pool is automatically loaded
    with those database pages that were first
    requested on the last startup
  • SQL Anywhere supports events (see the example
    after this list)
  • fired on a schedule, or when specific server
    events occur, such as
  • Idle time greater than a threshold
  • Backup
  • Connect/disconnect
  • File size
  • Disk space
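
A sketch of two events, assuming SQL Anywhere's CREATE EVENT syntax with
a schedule clause and a DBDiskSpace event type; the event names, backup
directory, and threshold are hypothetical, and the type and condition
names should be checked against the documentation:

    CREATE EVENT nightly_backup
    SCHEDULE nightly START TIME '23:00' EVERY 24 HOURS
    HANDLER
    BEGIN
        BACKUP DATABASE DIRECTORY 'backup' TRANSACTION LOG TRUNCATE;
    END;

    CREATE EVENT low_disk
    TYPE DBDiskSpace
    WHERE event_condition( 'DBFreePercent' ) < 10
    HANDLER
    BEGIN
        MESSAGE 'Database dbspace is nearly full' TO CONSOLE;
    END;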

23
Self-tuning query optimization
  • and adaptive query execution strategies

24
Automatic statistics management
  • A feature of SQL Anywhere since 1992
  • Early implementations used a hash-based structure
    to manage column density and frequent-value
    statistics
  • See Lynch, 14th VLDB (1988) for a similar
    approach
  • Today
  • Self-tuning column histograms
  • On both base and temporary tables
  • Statistics are updated on-the-fly automatically
  • Join histograms built for intermediate result
    analysis
  • Server maintains index statistics in real-time
  • Index sampling during optimization

25
Column histograms
  • Self-tuning implementation
  • Incorporates both standard range buckets and
    frequent-value statistics
  • Updated in real-time with the results of
    predicate evaluation and update DML statements
  • By default, statistics are computed during the
    execution of every DML request
  • Novel technique used to capture statistics on
    strings for use in optimizing LIKE predicates
  • Histograms computed automatically on LOAD TABLE
    or CREATE INDEX statements
  • Can be created/dropped explicitly if necessary
    (see the example below)
  • But retained by default across unload/reload
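
A sketch of explicit histogram management, assuming the CREATE STATISTICS
and DROP STATISTICS statements; the table and column names are
hypothetical:

    CREATE STATISTICS sales_order_items ( ship_date );
    DROP STATISTICS sales_order_items ( ship_date );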

26
Query optimizer
  • SQL Anywhere optimizes requests each time they
    are executed
  • Optimizer takes into account server context
  • Working set
  • Available cache
  • Values of procedure, host variables
  • Assumption: optimization is cheap
  • SQL Anywhere uses a proprietary, cost-based join
    enumeration algorithm that primarily constructs
    left-deep trees
  • Optimization process includes both heuristic and
    cost-based complex rewritings
  • Advantages: plans are responsive to the server
    environment, buffer pool contents/size, and data
    skew; no need to administer packages
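
Because statements are optimized at each execution, connection-level
option changes take effect immediately; a minimal sketch assuming the
optimization_goal option (other option names are release-specific):

    SET TEMPORARY OPTION optimization_goal = 'First-row';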

27
Join enumeration
  • Optimization space is over left-deep trees
  • Optimization process includes both heuristic and
    cost-based complex rewritings
  • Exceptions for complex, right-deep nested LEFT
    OUTER JOINS
  • Space pruning based on a governor that
    allocates quota to different portions of the
    search space
  • Join method, degree of parallelism, physical scan
    method (index selection), and expensive predicate
    evaluation are also part of search space
  • Each choice is re-evaluated at each step
  • Superb optimization times even for complex
    queries
  • No hard limits; tested with
  • 500 quantifiers in a single block
  • 100-way join on a CE device with 3MB buffer pool

28
Bypassing the query optimizer
  • Simple, single-table queries are optimized
    heuristically
  • Access plans for queries in stored
    procedures/triggers/events are cached
  • Plans undergo a training period where plan
    variance is determined
  • If no variance (even without variable values),
    plan is cached and reused
  • Query is periodically re-optimized on a
    logarithmic scale to ensure plan does not become
    sub-optimal
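
The number of cached plans per connection is controlled by an option; a
sketch assuming the max_plans_cached option (treat the option name as an
assumption to verify against the documentation for your release):

    SET TEMPORARY OPTION max_plans_cached = 40;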

29
Adaptive query processing
  • In some cases the optimizer will generate
    alternative access plans that can be executed if
    actual intermediate result sizes are poorly
    estimated
  • Server switches to alternative plan automatically
    at run time
  • Memory-intensive operators, such as hash join,
    have low-memory strategies that are used when
    buffer pool utilization is high

30
Adaptive query processing
  • Some operations, such as database backup, contain
    sampling processes to determine the
    characteristics of the I/O device used for
    storage
  • Primary goal is to determine the number of disk
    heads available
  • Processes can utilize the right number of CPUs
    to maximize throughput
  • Algorithms are sensitive to CPU requirements of
    the rest of the system, and automatically scale
    down CPU usage as necessary

31
Intra-query parallelism
  • SQL Anywhere's approach is to parallelize a
    left-deep plan when doing so is advantageous
  • There is an upper bound on the number of EXCHANGE
    operators in a data-flow tree; it can be
    overridden by the user (see the example below)
  • Work is partitioned independently of the
    availability of worker threads at run-time
  • Plans are largely self-tuning with respect to
    degree of parallelism
  • Prevents starvation of any specific subplan when
    the number of available workers is less than
    optimal for some period
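
A minimal sketch of the user override, assuming the max_query_tasks
option (0 lets the server choose; 1 disables intra-query parallelism for
the connection):

    SET TEMPORARY OPTION max_query_tasks = 1;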

32
Management tools
33
Management tools
  • Framework
  • DBISQL: interactive SQL query tool
  • Sybase Central: administration/development tool
  • Tools
  • Graphical query plans
  • Stored procedure debugger
  • Stored procedure profiler
  • Index consultant
  • Request-level logging
  • Optimization logging

34
Graphical query plans
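
Graphical plans are rendered by the DBISQL plan viewer; the underlying
plan can also be retrieved from SQL, in a sketch assuming the
GRAPHICAL_PLAN() function (PLAN() returns a short text form); the query
and table name are hypothetical:

    SELECT GRAPHICAL_PLAN( 'SELECT COUNT(*) FROM sales_order_items' );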
35
Stored procedure debugger
  • Enables debugging of a PSM or Java stored
    procedure
  • Features
  • set/unset breakpoints and watch settings
  • determine value of variables
  • traverse the call stack
  • execute queries with the values of variables at
    the time of the break

36
Application Profiling scenarios
37
Application Profiling architecture
38
Stored procedure profiler
  • Offers the ability to analyze the performance of
    a stored procedure/trigger/event
  • Computes the cumulative invocations and elapsed
    time for each statement in a procedure
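
A sketch of enabling the profiler from SQL, assuming the sa_server_option
and sa_procedure_profile_summary system procedures (names should be
verified against the documentation for your release):

    CALL sa_server_option( 'ProcedureProfiling', 'YES' );
    -- ... run the application workload ...
    CALL sa_procedure_profile_summary( );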

39
Index consultant
  • Recommends indexes to improve query performance
  • Main idea is to improve application performance
  • Particularly useful when DBA has limited
    experience
  • Permits what-if analysis on an existing query
    load and database instance
  • Allows DBA to find statements that are the most
    sensitive to the presence of an index
  • Can be used to find indexes that are unnecessary,
    i.e. those that are not utilized by the optimizer
  • Can be used to estimate disk storage overhead of
    additional indexes

40
Future plans
  • Increasingly difficult to keep optimization costs
    low in the face of increasingly challenging
    workloads
  • Trade off time for space (cached access plans)
  • Advantages of plan caching are workload-specific;
    the caching expense pays off only if optimization
    costs can be amortized over several executions
  • Improve statistics, cost model management to
    permit self-repair, which will improve access
    plan quality
  • Numerous other opportunities for self-tuning
    using feedback control mechanisms
  • Memory allocation, multiprogramming level,
    intra-query parallelism controls, additional
    adaptive query optimization techniques
  • Integrate server internals management with VM
    hypervisors to improve operation within guest VMs