Oracle 9i RAC By Ramesh Malayappan - PowerPoint PPT Presentation

About This Presentation
Title:

Oracle 9i RAC By Ramesh Malayappan

Description:

Dell Confidential. 2. Overview. Overview of OS (Linux) ... Dell Confidential. 3. Benefits of Real Application Clusters. New Shared Cache Architecture ... – PowerPoint PPT presentation

Number of Views:111
Avg rating:3.0/5.0
Slides: 32
Provided by: dellcorp
Category:

less

Transcript and Presenter's Notes

Title: Oracle 9i RAC By Ramesh Malayappan


1
Oracle 9i RAC ByRamesh Malayappan Gautam
Mekala
2
Overview
  • Overview of OS (Linux)
  • Overview of Oracle9i Real Application Clusters
  • Oracle9i RAC on Linux
  • Tuning Tips
  • Issues to deal in RAC
  • Going Forward (Oracle 10G)
  • Q A

3
Benefits of Real Application Clusters
  • New Shared Cache Architecture
  • Exploits New Hardware and Software Technologies
  • Most Flexible Clustering technology
  • Provides scalability and high availability

4
Real Application Clusters
  • Real Application Clusters (RAC)
  • Cache Fusion
  • True scalability
  • Transparent Scalability
  • All Applications Scale No tuning required
  • No Physical Data Partitioning required
    (Application user partitioning is needed though)
  • ISV Applications Scale out of the box
  • High Availability Loss of single node on
    cluster will not stop the database
  • Ability to add additional hardware transparently
    to users

5
Overview of Oracle9i RAC
  • Many instances of Oracle running on many nodes
  • All instances share a single physical database
    and have common data control files
  • Each instance has its own log files and rollback
    segments
  • All instances can simultaneously execute
    transactions against the single database
  • Caches are synchronized using Oracles Global
    Cache Management technology (Cache Fusion)
  • No Single Point of Failure (Server side)

6
Oracle9i RAC on Linux
  • Clustering consists of 2 Oracle-supplied
    components
  • Cluster Manager (oracm)
  • Provides consistent view of Oracle instances
  • Accepts registration of Oracle instances
  • Responsible for process level cluster status
  • Hangcheck-timer
  • New in 9.2.0.2, replaces watchdogd
  • Monitors the Linux kernel for system hangs
  • Implemented as a kernel module so it much less
    affected by system load
  • Resets node from within kernel if abnormal hangs
    occur

7
Internal Workings Of RAC..
  • Multi-Instance with Single Database
  • Cache Fusion (aggregation of cache from each
    node)
  • Inter-Instance Transfers
  • GES and GCS
  • Resources Co-ordination
  • Ownership (conversions)
  • Status and Roles
  • RAC Processes
  • Fail-over Recovery

Cache Fusion
database
8
Contents of SGA
  • The shared pool portion of the SGA
  • Library Cache
  • Dictionary cache
  • Buffers for parallel exec mesg and control
    structures.
  • Library Cache
  • Shared SQL areas, private SQL areas (MTS), PL/SQL
    procedures and packages, and control
  • structures such as locks and library cache
    handles.
  • Data Dictionary
  • Collection of database tables and views
    containing reference information about the
    database, its structures, and its users.

Shared Pool
Buffer Cache
Redo Buffer
Large Pool
Shared Across Instances
Synchronized Across Instances
PGA
Remains Local to each Instance
9
Cache Fusion
  • Cache Fusion is a fundamental component of Real
    Application Cluster
  • Cache Fusion allows individual nodes to share the
    contents of their buffer caches through the
    inter-connect cluster Interprocess Communication
    (IPC) eliminating the need for extra disk I/Os.
  • This greatly improves the performance and
    scalability characteristics of shared-disk
    clusters
  • Cache fusion only works with the default resource
    control scheme. If GC_FILES_TO_LOCKS is set, the
    old pre-cache fusion behavior is utilized. In
    other words, forced disk-writes will be used.

10
Dirty Blocks Past Image
  • In a non-RAC instance
  • User A selects say 10 rows (10 blocks)
  • User B selects same 10 rows (10 blocks)
  • User B updates those 10 rows (10 blocks)
  • Dirtied blocks
  • Not committed
  • User C selects same 10 rows
  • Rollback segment buffer provide read consistent
    image
  • Now User B performs Commit
  • Now User D updates same blocks
  • gets the same dirtied blocks
  • If it is a RAC
  • When user D updates on second Instance, PAST
    Image is created for those blocks sent out

11
Cache Coherency Lock Management
  • Transfer of blocks among the individual node
    caches
  • Global Concurrency of the data blocks / pages
  • Global Control mechanism
  • Cluster Interconnects
  • Connect nodes
  • Can be a simple private network connection
  • Can be a specialized cables with Hub/Switch
  • Functions
  • Monitors Health, Status of nodes, Accessing
    remote file systems
  • Cluster alias routing
  • Application-specific traffic
  • Distributed lock manager (DLM) messages / GCS
    messages

12
Cluster Interconnects
  • Essential Requirements
  • Low latency for short messages
  • High speed and sustained data rates for large
    messages
  • Low Host-CPU utilization per message.
  • Flow Control, Error Control and Heart-beat
    Continuity monitoring
  • Host Interfaces to interact directly with host
    processes (OS bypass)
  • Switch Networks that scale well

13
Resources and Coordination
Resources
  • Synchronization
  • Data Blocks and Enqueues
  • Nodes acquire and release ownership of resources
  • Co-ordination of concurrent tasks within shared
    cache

Enqueue is a shared memory structure Serializes
access to database resources Associated with a
session or transaction.. E.g. Update to a row
Local Concurrency Controls Latches, Row Locks,
Local Enqueues
14
Resource Coordination
  • Resources have
  • Roles Locally Managed and Globally Managed
  • Modes Null , Shared, Exclusive

Most important Resource DATA BLOCK
  • Global Resource Directory
  • Data Block Identifiers - DBA
  • Location of most current status
  • Modes of Data Blocks
  • Roles of the Blocks

Past Image When a dirty block is sent to other
node using CF, it keeps a copy (data integrity in
case of failures) Consistent Record
(CR) Consistent snapshot at a previous point in
time
15
Resource Modes and Roles
When referring to a lock mode in RAC, there are
three characters to distinguish E.g. ABC A
Represents lock mode with values Null, Shared,
Exclusive B Represents Lock Role, Local,
Global C Shows if Past Image exists or not
(1) PI exists , (0) No PI exists
NL0 Null Local and No past Images SL0 Shared
Local with no past image XL0 Exclusive Local with
no past image NG0 Null Global - Instance owns
current block image SG0 Global Shared Lock -
Instance owns current image XG0 Global Exclusive
Lock - Instance own current image NG1 Global
Null - Instance Owns the Past mage
Block. SG1 Shared Global - Instance owns past
Image XG1 Global Exclusive Lock - Instance owns
Past Image.
16
Global Enqueue Service
Controls Library Cache Library Cache Locks
during parsing of SQL, DML, DDL, PL/SQL Controls
Data Dictionary Cache (Table Locks etc) Manages
synchronization through latches Handles the
message between instances (for changes)
  • Oracle Processes
  • LMON Monitors the enqueues and resources
  • LMD Lock agent process
  • GSD Diagnosability Daemon
  • LCK manages global eqnueue requests
  • LMSn GCS processes - handles blocking
    interrupts from the remote instance, cross
    instance calls
  • Usual Process like SMON, PMON, LGWR, CKPT, DBWR
    etc

17
Failover Basics
  • Detection of failure, by way of its LMON process
  • One of the Instances (Recovering Instance)
    controls the recovery of the failed instance by
    taking over its redo log files.
  • All in-progress transactions are rolled back
    (transaction recovery)
  • Instance recovery does not include restarting the
    failed instance
  • Only the resources mastered by GSC are re-built
  • SMON process of a surviving Instance performs
    recovery of failed instance

18
Fusion Recovery
  • Recovery
  • The instance, or instances dies
  • Failure detected by cluster manager or GCS.
  • Reconfiguration occurs and all locks owned by the
    departed instance are remastered and the first
    pass read of threads of failed instances done by
    SMON
  • SMON claims locks needed to recover blocks found
    by the first pass read.
  • Locks are obtained and second pass of redo theads
    of failed instances is performed and blocks
    become available as they have been recovered.
  • Predecessor blocks can be in past image block in
    a different instance or on disk.

19
Client Connectivity Server Fail
  • Add failover options manually to TNS
    configuration files
  • They are part of the CONNECT_DATA section of a
    connect descriptor
  • Failover options include
  • TYPE Identify the nature of TAF, if any
  • METHOD Configure how quickly failover can occur
  • BACKUP Identify an alternate net service name
  • RETRIES Limit the number of times a reconnection
    will be attempted
  • DELAY Specify how long to wait between
    reconnection attempts

20
Oracle9i RAC on Linux (cont.)
  • Install Flowchart

Verification of Hardware and Software
Configure Kernel Parameters
Configure Start Cluster Manager
Install Oracle9i RAC Option
Enable rsh rcp on each node
Create DBA group and Oracle Account
Start GSD Configure Listener
Remove IBM Java Package
Configure Network
Create database
Install Cluster Manager
Configure Storage
21
Linux kernel parameters
  • Set /proc/sys/kernel/shmmax to 3GB
  • Using multiple DBWRs with async I/O is usually
    better than using I/O slaves
  • Must re-link to use libaio i.e. ASYNC I/O
  • make -f ins_rdbms.mk async_on
  • init.ora disk_asynch_iotrue by default
  • init.ora filesystemio_optionsasynch set this as
    well if datafiles are on a filesystem (e.g. ext2)
  • 2 DBWRs is a good default for a large buffer
    cache areas
  • If large read sizes occur, increase
    /proc/sys/fs/aio-max-size to the largest read
    size (default is 128KB)

22
Larger Buffer Cache
  • Oracle has the capability to use an extended
    buffer cache greater than 4GB
  • Using Indirect Data Buffers has some overhead, so
    use this option only if you have enough RAM to
    create a buffer cache greater than 4GB
  • Steps to enable Indirect Data Buffers (from
    Oracle9i Administrators Reference, Rel 2 for
    Linux)
  • mount -t shm -o size8g shmfs /dev/shm (can put
    this in /etc/fstab)
  • init.ora use_indirect_data_bufferstrue
  • init.ora use only db_block_buffers and
    db_block_size (no db_cache_size)
  • For OLTP Apps, small blocks (e.g. 2KB) typically
    work better

23
Increasing Address Space
  • Oracle defaults to use about 1.7GB of address
    space for its SGA
  • Its possible to increase the SGA address space
    to about 2.6GB (Note 200266.1)
  • genksms -s 0x15000000 gtksms.s
  • make -f ins_rdbms.mk ksms.o
  • make -f ins_rdbms.mk ioracle
  • echo 268435456 gt/proc/ltpidgt/mapped_base (as
    root), where ltpidgt is the pid of the session
    running SQLPlus

24
Increasing Address Space (cont.)

Default
After Relink
0xFFFFFFFF
0xFFFFFFFF
Reserved for kernel
Reserved for kernel
0xC0000000
0xC0000000
Variable SGA
Variable SGA
DB Buffers (SGA)
DB Buffers (SGA)
sga_base (relink Oracle)
0x50000000
mapped_base (/proc/ltpidgt/mapped_base)
0x40000000
Code, etc.
0x15000000
0x10000000
Code, etc.
0x00000000
0x00000000
25
Bigpages
  • It is a feature in Red Hat Advance Server that
    provides applications access to large memory
    pages on Intel 32-bit CPUs
  • The default memory page size is 4KB.
  • Requires OS support to enable
  • Large pages used for the SGA reduces the number
    of page table entries that Linux and the CPU need
    to keep track of
  • Reduces the CPUs Translation Look-aside Buffer
    (TLB) miss rate
  • Bigpage settings
  • /proc/sys/kernel/shm-use-bigpages0 bigpage
    pool is not used
  • /proc/sys/kernel/shm-use-bigpages1 bigpage
    memory is useable by Oracle except in the
    Indirect Data Buffers case
  • /proc/sys/kernel/shm-use-bigpages2 same as 1,
    but memory is also useable in the Indirect Data
    Buffers case

26
Real Cluster Design Issues
Result Component Effect of Failure
Ok CPU panic / crash Node Failed, other node still active
Ok Memory crash Node Failed, other node still active
Ok Interconnect With dual Interconnects, OK
Down Interconnect Switch Nodes can not communicate
Ok OS failure / freeze Node Failed, other node still active
Down Cluster Manager s/w Custer freezes, all nodes go down
Ok DB Instance Crash Instance running on other node provides database service
Ok Control File (Corrupt / Lost) Multiplexed control file will be used
Ok Redo log file Multiplexed redo file
Down Lost Data File Requires Media recovery
Down Human Error Depends on type of mistake
Down Dropped Object DB is available but applications stall
Down DB software bug DB may stall on all instances.
27
Performance Monitoring
  • There are many views that help to monitor the
    inter-instance transfers and RAC performance
  • vclass_cache_transfer, vcache_transfer,
    vcache, vlock_activity, vges_statistics, vbh
    , vsysstat and VSYSTEM_EVENT
  • The above views help diagnose the following
    issues
  • The most significant statistics are in vsysstat
  • Cache-related statistics such as consistent gets,
    db block gets, and db block changes
  • Cache Fusion related statistics, such as global
    cache current block receive time or global cache
    current block send time, global cache lock open
  • Convert requests, and global cache wait times,
    such as global cache gets, global cache converts,
    and waits for events such as Null-to-X
    conversions
  • I/O statistics such as physical reads, physical
    writes, DBWR cross-instance writes, and wait
    times for reads and writes

28
Oracle Parallel Execution ..
  • RAC can engage multiple processors from different
    nodes for a given task execution
  • Achieve additional parallelism, not possible by a
    single SMP node.
  • For instance, in a two node ORAC, set up a
    parallel query with Parallel Hint to utilize
    the CPUs from the both the instances.
  • SELECT / FULL(nydata) PARALLEL(nydata, 3,2) /
    count() FROM nysales
  • In this example, Degree of Parallelism (DOP) is 3
    and use Two instances. It is executed with total
    6 processes, 3 on each instance

29
Issues Faced
  • There were multiple issues since we started to
    work on RAC. Many of them have been resolved
    through upgrades and minor patches.
  • Major issues
  • NTP Issues- Problem appears to have been due to
    NTP (Network Time Protocol) settings on the
    server that allowed the time to be automatically
    set backwards by the NTP server. This time change
    caused it to look like a checkin had been missed.
    Changing NTP settings so that setting the time
    backwards is disallowed appears to have resolved
    the problem
  • Split-brain condition - This should never happen
    but we ran into this issues also . The Cluster
    software should take care of this issue.
  • Fork-Process Hanging - we started seeing unable
    to receive acknowledgement from forked process
    in alert log

30
10G RAC Features
  • Dynamic affinity policy enhancements for
    optimizing the Cache Fusion protocol to enhance
    the performance of several kinds of workloads
  • Better Workload management
  • Improvements to adding a node. Oracle introduces
    portable cluster-ware that makes adding a node
    easier.
  • Cluster application availability subsystem

31
Q A
  • Q A
Write a Comment
User Comments (0)
About PowerShow.com