Dell Confidential - PowerPoint PPT Presentation


PPT – Dell Confidential PowerPoint presentation | free to download - id: aacf-N2ZmN


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

Dell Confidential


Dell Confidential. 2. Overview. Overview of OS (Linux) ... Dell Confidential. 3. Benefits of Real Application Clusters. New Shared Cache Architecture ... – PowerPoint PPT presentation

Number of Views:94
Avg rating:3.0/5.0
Slides: 32
Provided by: dellcorp
Learn more at:


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Dell Confidential

Oracle 9i RAC ByRamesh Malayappan Gautam
  • Overview of OS (Linux)
  • Overview of Oracle9i Real Application Clusters
  • Oracle9i RAC on Linux
  • Tuning Tips
  • Issues to deal in RAC
  • Going Forward (Oracle 10G)
  • Q A

Benefits of Real Application Clusters
  • New Shared Cache Architecture
  • Exploits New Hardware and Software Technologies
  • Most Flexible Clustering technology
  • Provides scalability and high availability

Real Application Clusters
  • Real Application Clusters (RAC)
  • Cache Fusion
  • True scalability
  • Transparent Scalability
  • All Applications Scale No tuning required
  • No Physical Data Partitioning required
    (Application user partitioning is needed though)
  • ISV Applications Scale out of the box
  • High Availability Loss of single node on
    cluster will not stop the database
  • Ability to add additional hardware transparently
    to users

Overview of Oracle9i RAC
  • Many instances of Oracle running on many nodes
  • All instances share a single physical database
    and have common data control files
  • Each instance has its own log files and rollback
  • All instances can simultaneously execute
    transactions against the single database
  • Caches are synchronized using Oracles Global
    Cache Management technology (Cache Fusion)
  • No Single Point of Failure (Server side)

Oracle9i RAC on Linux
  • Clustering consists of 2 Oracle-supplied
  • Cluster Manager (oracm)
  • Provides consistent view of Oracle instances
  • Accepts registration of Oracle instances
  • Responsible for process level cluster status
  • Hangcheck-timer
  • New in, replaces watchdogd
  • Monitors the Linux kernel for system hangs
  • Implemented as a kernel module so it much less
    affected by system load
  • Resets node from within kernel if abnormal hangs

Internal Workings Of RAC..
  • Multi-Instance with Single Database
  • Cache Fusion (aggregation of cache from each
  • Inter-Instance Transfers
  • GES and GCS
  • Resources Co-ordination
  • Ownership (conversions)
  • Status and Roles
  • RAC Processes
  • Fail-over Recovery

Cache Fusion
Contents of SGA
  • The shared pool portion of the SGA
  • Library Cache
  • Dictionary cache
  • Buffers for parallel exec mesg and control
  • Library Cache
  • Shared SQL areas, private SQL areas (MTS), PL/SQL
    procedures and packages, and control
  • structures such as locks and library cache
  • Data Dictionary
  • Collection of database tables and views
    containing reference information about the
    database, its structures, and its users.

Shared Pool
Buffer Cache
Redo Buffer
Large Pool
Shared Across Instances
Synchronized Across Instances
Remains Local to each Instance
Cache Fusion
  • Cache Fusion is a fundamental component of Real
    Application Cluster
  • Cache Fusion allows individual nodes to share the
    contents of their buffer caches through the
    inter-connect cluster Interprocess Communication
    (IPC) eliminating the need for extra disk I/Os.
  • This greatly improves the performance and
    scalability characteristics of shared-disk
  • Cache fusion only works with the default resource
    control scheme. If GC_FILES_TO_LOCKS is set, the
    old pre-cache fusion behavior is utilized. In
    other words, forced disk-writes will be used.

Dirty Blocks Past Image
  • In a non-RAC instance
  • User A selects say 10 rows (10 blocks)
  • User B selects same 10 rows (10 blocks)
  • User B updates those 10 rows (10 blocks)
  • Dirtied blocks
  • Not committed
  • User C selects same 10 rows
  • Rollback segment buffer provide read consistent
  • Now User B performs Commit
  • Now User D updates same blocks
  • gets the same dirtied blocks
  • If it is a RAC
  • When user D updates on second Instance, PAST
    Image is created for those blocks sent out

Cache Coherency Lock Management
  • Transfer of blocks among the individual node
  • Global Concurrency of the data blocks / pages
  • Global Control mechanism
  • Cluster Interconnects
  • Connect nodes
  • Can be a simple private network connection
  • Can be a specialized cables with Hub/Switch
  • Functions
  • Monitors Health, Status of nodes, Accessing
    remote file systems
  • Cluster alias routing
  • Application-specific traffic
  • Distributed lock manager (DLM) messages / GCS

Cluster Interconnects
  • Essential Requirements
  • Low latency for short messages
  • High speed and sustained data rates for large
  • Low Host-CPU utilization per message.
  • Flow Control, Error Control and Heart-beat
    Continuity monitoring
  • Host Interfaces to interact directly with host
    processes (OS bypass)
  • Switch Networks that scale well

Resources and Coordination
  • Synchronization
  • Data Blocks and Enqueues
  • Nodes acquire and release ownership of resources
  • Co-ordination of concurrent tasks within shared

Enqueue is a shared memory structure Serializes
access to database resources Associated with a
session or transaction.. E.g. Update to a row
Local Concurrency Controls Latches, Row Locks,
Local Enqueues
Resource Coordination
  • Resources have
  • Roles Locally Managed and Globally Managed
  • Modes Null , Shared, Exclusive

Most important Resource DATA BLOCK
  • Global Resource Directory
  • Data Block Identifiers - DBA
  • Location of most current status
  • Modes of Data Blocks
  • Roles of the Blocks

Past Image When a dirty block is sent to other
node using CF, it keeps a copy (data integrity in
case of failures) Consistent Record
(CR) Consistent snapshot at a previous point in
Resource Modes and Roles
When referring to a lock mode in RAC, there are
three characters to distinguish E.g. ABC A
Represents lock mode with values Null, Shared,
Exclusive B Represents Lock Role, Local,
Global C Shows if Past Image exists or not
(1) PI exists , (0) No PI exists
NL0 Null Local and No past Images SL0 Shared
Local with no past image XL0 Exclusive Local with
no past image NG0 Null Global - Instance owns
current block image SG0 Global Shared Lock -
Instance owns current image XG0 Global Exclusive
Lock - Instance own current image NG1 Global
Null - Instance Owns the Past mage
Block. SG1 Shared Global - Instance owns past
Image XG1 Global Exclusive Lock - Instance owns
Past Image.
Global Enqueue Service
Controls Library Cache Library Cache Locks
during parsing of SQL, DML, DDL, PL/SQL Controls
Data Dictionary Cache (Table Locks etc) Manages
synchronization through latches Handles the
message between instances (for changes)
  • Oracle Processes
  • LMON Monitors the enqueues and resources
  • LMD Lock agent process
  • GSD Diagnosability Daemon
  • LCK manages global eqnueue requests
  • LMSn GCS processes - handles blocking
    interrupts from the remote instance, cross
    instance calls
  • Usual Process like SMON, PMON, LGWR, CKPT, DBWR

Failover Basics
  • Detection of failure, by way of its LMON process
  • One of the Instances (Recovering Instance)
    controls the recovery of the failed instance by
    taking over its redo log files.
  • All in-progress transactions are rolled back
    (transaction recovery)
  • Instance recovery does not include restarting the
    failed instance
  • Only the resources mastered by GSC are re-built
  • SMON process of a surviving Instance performs
    recovery of failed instance

Fusion Recovery
  • Recovery
  • The instance, or instances dies
  • Failure detected by cluster manager or GCS.
  • Reconfiguration occurs and all locks owned by the
    departed instance are remastered and the first
    pass read of threads of failed instances done by
  • SMON claims locks needed to recover blocks found
    by the first pass read.
  • Locks are obtained and second pass of redo theads
    of failed instances is performed and blocks
    become available as they have been recovered.
  • Predecessor blocks can be in past image block in
    a different instance or on disk.

Client Connectivity Server Fail
  • Add failover options manually to TNS
    configuration files
  • They are part of the CONNECT_DATA section of a
    connect descriptor
  • Failover options include
  • TYPE Identify the nature of TAF, if any
  • METHOD Configure how quickly failover can occur
  • BACKUP Identify an alternate net service name
  • RETRIES Limit the number of times a reconnection
    will be attempted
  • DELAY Specify how long to wait between
    reconnection attempts

Oracle9i RAC on Linux (cont.)
  • Install Flowchart

Verification of Hardware and Software
Configure Kernel Parameters
Configure Start Cluster Manager
Install Oracle9i RAC Option
Enable rsh rcp on each node
Create DBA group and Oracle Account
Start GSD Configure Listener
Remove IBM Java Package
Configure Network
Create database
Install Cluster Manager
Configure Storage
Linux kernel parameters
  • Set /proc/sys/kernel/shmmax to 3GB
  • Using multiple DBWRs with async I/O is usually
    better than using I/O slaves
  • Must re-link to use libaio i.e. ASYNC I/O
  • make -f async_on
  • init.ora disk_asynch_iotrue by default
  • init.ora filesystemio_optionsasynch set this as
    well if datafiles are on a filesystem (e.g. ext2)
  • 2 DBWRs is a good default for a large buffer
    cache areas
  • If large read sizes occur, increase
    /proc/sys/fs/aio-max-size to the largest read
    size (default is 128KB)

Larger Buffer Cache
  • Oracle has the capability to use an extended
    buffer cache greater than 4GB
  • Using Indirect Data Buffers has some overhead, so
    use this option only if you have enough RAM to
    create a buffer cache greater than 4GB
  • Steps to enable Indirect Data Buffers (from
    Oracle9i Administrators Reference, Rel 2 for
  • mount -t shm -o size8g shmfs /dev/shm (can put
    this in /etc/fstab)
  • init.ora use_indirect_data_bufferstrue
  • init.ora use only db_block_buffers and
    db_block_size (no db_cache_size)
  • For OLTP Apps, small blocks (e.g. 2KB) typically
    work better

Increasing Address Space
  • Oracle defaults to use about 1.7GB of address
    space for its SGA
  • Its possible to increase the SGA address space
    to about 2.6GB (Note 200266.1)
  • genksms -s 0x15000000 ksms.s
  • make -f ksms.o
  • make -f ioracle
  • echo 268435456 /proc//mapped_base (as
    root), where is the pid of the session
    running SQLPlus

Increasing Address Space (cont.)

After Relink
Reserved for kernel
Reserved for kernel
Variable SGA
Variable SGA
DB Buffers (SGA)
DB Buffers (SGA)
sga_base (relink Oracle)
mapped_base (/proc//mapped_base)
Code, etc.
Code, etc.
  • It is a feature in Red Hat Advance Server that
    provides applications access to large memory
    pages on Intel 32-bit CPUs
  • The default memory page size is 4KB.
  • Requires OS support to enable
  • Large pages used for the SGA reduces the number
    of page table entries that Linux and the CPU need
    to keep track of
  • Reduces the CPUs Translation Look-aside Buffer
    (TLB) miss rate
  • Bigpage settings
  • /proc/sys/kernel/shm-use-bigpages0 bigpage
    pool is not used
  • /proc/sys/kernel/shm-use-bigpages1 bigpage
    memory is useable by Oracle except in the
    Indirect Data Buffers case
  • /proc/sys/kernel/shm-use-bigpages2 same as 1,
    but memory is also useable in the Indirect Data
    Buffers case

Real Cluster Design Issues
Performance Monitoring
  • There are many views that help to monitor the
    inter-instance transfers and RAC performance
  • vclass_cache_transfer, vcache_transfer,
    vcache, vlock_activity, vges_statistics, vbh
    , vsysstat and VSYSTEM_EVENT
  • The above views help diagnose the following
  • The most significant statistics are in vsysstat
  • Cache-related statistics such as consistent gets,
    db block gets, and db block changes
  • Cache Fusion related statistics, such as global
    cache current block receive time or global cache
    current block send time, global cache lock open
  • Convert requests, and global cache wait times,
    such as global cache gets, global cache converts,
    and waits for events such as Null-to-X
  • I/O statistics such as physical reads, physical
    writes, DBWR cross-instance writes, and wait
    times for reads and writes

Oracle Parallel Execution ..
  • RAC can engage multiple processors from different
    nodes for a given task execution
  • Achieve additional parallelism, not possible by a
    single SMP node.
  • For instance, in a two node ORAC, set up a
    parallel query with Parallel Hint to utilize
    the CPUs from the both the instances.
  • SELECT / FULL(nydata) PARALLEL(nydata, 3,2) /
    count() FROM nysales
  • In this example, Degree of Parallelism (DOP) is 3
    and use Two instances. It is executed with total
    6 processes, 3 on each instance

Issues Faced
  • There were multiple issues since we started to
    work on RAC. Many of them have been resolved
    through upgrades and minor patches.
  • Major issues
  • NTP Issues- Problem appears to have been due to
    NTP (Network Time Protocol) settings on the
    server that allowed the time to be automatically
    set backwards by the NTP server. This time change
    caused it to look like a checkin had been missed.
    Changing NTP settings so that setting the time
    backwards is disallowed appears to have resolved
    the problem
  • Split-brain condition - This should never happen
    but we ran into this issues also . The Cluster
    software should take care of this issue.
  • Fork-Process Hanging - we started seeing unable
    to receive acknowledgement from forked process
    in alert log

10G RAC Features
  • Dynamic affinity policy enhancements for
    optimizing the Cache Fusion protocol to enhance
    the performance of several kinds of workloads
  • Better Workload management
  • Improvements to adding a node. Oracle introduces
    portable cluster-ware that makes adding a node
  • Cluster application availability subsystem

  • Q A