
1
Oracle 10g RAC Scalability Lessons Learned
Bert Scalzo, Ph.D. Bert.Scalzo@Quest.com
2
About the Author
  • Oracle Dev & DBA for 20 years, versions 4 through 10g
  • Worked for Oracle Education & Consulting
  • Holds several Oracle Masters (DBA & CASE)
  • BS, MS, PhD in Computer Science and also an MBA
  • LOMA insurance industry designations FLMI and ACS
  • Books
  • The TOAD Handbook (March 2003)
  • Oracle DBA Guide to Data Warehousing and Star
    Schemas (June 2003)
  • TOAD Pocket Reference 2nd Edition (June 2005)
  • Articles
  • Oracle Magazine
  • Oracle Technology Network (OTN)
  • Oracle Informant
  • PC Week (now eWeek)
  • Linux Journal
  • www.Linux.com

3
About Quest Software
Products used in this paper
4
Project Formation
  • This paper is based upon collaborative RAC research efforts between Quest Software and Dell Computers.
  • Quest
  • Bert Scalzo
  • Murali Vallath, author of RAC articles and books
  • Dell
  • Anthony Fernandez
  • Zafar Mahmood
  • Also, an extra special thanks to Dell for allocating a million dollars' worth of equipment to make such testing possible :-)

5
Project Purpose
  • Quest
  • To partner with a leading hardware vendor
  • To field test and showcase our RAC-enabled software
  • Spotlight on RAC
  • Benchmark Factory
  • TOAD for Oracle with DBA module
  • Dell
  • To write a Dell Power Edge Magazine article about the OLTP scalability of Oracle 10g RAC running on typical Dell servers and EMC storage arrays
  • To create a standard methodology for all benchmarking of database servers, to be used for future articles and for lab testing and demonstration purposes

6
OLTP Benchmarking
TPC benchmark (www.tpc.org): TPC Benchmark C (TPC-C) is an OLTP workload. It is a mixture of read-only and update-intensive transactions that simulate the activities found in complex OLTP application environments. It does so by exercising a breadth of system components associated with such environments, which are characterized by:
  • The simultaneous execution of multiple transaction types that span a breadth of complexity
  • On-line and deferred transaction execution modes
  • Multiple on-line terminal sessions
  • Moderate system and application execution time
  • Significant disk input/output
  • Transaction integrity (ACID properties)
  • Non-uniform distribution of data access through primary and secondary keys
  • Databases consisting of many tables with a wide variety of sizes, attributes, and relationships
  • Contention on data access and update
Excerpt from TPC BENCHMARK C Standard Specification, Revision 3.5
7
Create the Load - Benchmark Factory
The TPC-C-like benchmark measures on-line transaction processing (OLTP) workloads. It combines read-only and update-intensive transactions, simulating the activities found in complex OLTP enterprise environments.
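To make the transaction mix concrete, here is a minimal sketch of one TPC-C-style "Payment" transaction driven from a shell script. The schema and column names follow the TPC-C specification, but the connect string and literal values are illustrative assumptions, not Benchmark Factory's actual implementation.

  #!/bin/bash
  # Minimal sketch of one TPC-C-style "Payment" transaction.
  # Connect string and literal values are illustrative assumptions.
  sqlplus -s tpcc/tpcc@racdb <<'EOF'
  -- update-intensive work plus a read, committed as one transaction
  UPDATE warehouse SET w_ytd = w_ytd + 100 WHERE w_id = 1;
  UPDATE district  SET d_ytd = d_ytd + 100 WHERE d_w_id = 1 AND d_id = 5;
  SELECT c_balance FROM customer WHERE c_w_id = 1 AND c_d_id = 5 AND c_id = 42;
  COMMIT;
  EOF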
8
Monitor the Load - Spotlight on RAC
9
Hardware & Software
Oracle 10g RAC Cluster Servers
  • 10 x 2-CPU Dell PowerEdge 1850, 3.8 GHz P4 processors with HT
  • 4 GB RAM (later expanded to 8 GB RAM)
  • 1 x 1 Gb NIC (Intel) for LAN
  • 2 x 1 Gb LOM, teamed for RAC interconnect
  • 1 x two-port HBA (QLogic 2342)
  • DRAC
  • RHEL AS 4 QU1 (32-bit)
  • EMC PowerPath 4.4
  • EMC Navisphere agent
  • Oracle 10g R1 10.1.0.4
  • Oracle ASM 10.1.0.4
  • Oracle Cluster Ready Services 10.1.0.4
  • Linux bonding driver for interconnect
  • Dell OpenManage
Benchmark Factory Servers
  • 2 x 4-CPU Dell PowerEdge 6650
  • 8 GB RAM
  • Windows 2003 Server
  • Quest Benchmark Factory Application
  • Quest Benchmark Factory Agents
  • Quest Spotlight on RAC
  • Quest TOAD for Oracle
Storage
  • 1 x Dell EMC CX700 + 1 x DAE unit, total 30 x 73 GB 15K RPM disks
  • RAID Group 1: 16 disks having 4 x 50 GB RAID 1/0 LUNs for data and backup
  • RAID Group 2: 10 disks having 2 x 20 GB RAID 1/0 LUNs for redo logs
  • RAID Group 3: 4 disks having 1 x 5 GB RAID 1/0 LUN for voting disk, OCR, and spfiles
  • 2 x Brocade SilkWorm 3800 Fibre Channel switches (16 port)
  • Configured with 8 paths to each logical volume
  • Flare Code Release 16
Network
  • 1 x Gigabit 5224 Ethernet switch (24 port) for private interconnect
  • 1 x Gigabit 5224 Ethernet switch for public LAN
  • Linux bonding driver used to team dual onboard NICs for private interconnect (see the sketch below)
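As an illustration of that last bullet, interconnect bonding on RHEL 4 is typically configured as follows; the interface names, bond mode, and IP address here are assumptions for the sketch, not the project's documented values.

  # /etc/modprobe.conf -- load the Linux bonding driver for the interconnect
  alias bond0 bonding
  options bond0 miimon=100 mode=1    # mode=1 = active-backup failover

  # /etc/sysconfig/network-scripts/ifcfg-bond0 -- the teamed interface
  DEVICE=bond0
  IPADDR=192.168.10.1
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth1 -- one slave (repeat for eth2)
  DEVICE=eth1
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none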
11
Setup Planned vs. Actual
  • Planned
  • Red Hat 4 Update 1 64-bit
  • Oracle 10.2.0.1 64-bit
  • Actual
  • Red Hat 4 Update 1 32-bit
  • Oracle 10.1.0.4 32-bit
  • Issues
  • Driver problems with 64-bit (no real surprise)
  • Some software incompatibilities with 10g R2
  • Known ASM issues require 10.1.0.4, not earlier

12
Testing Methodology Steps 1 A-C
13
Testing Methodology Steps 1 D-E
14
Step 1B - Optimize Linux Kernel
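The slide itself is a graphic; as a rough illustration, kernel tuning for Oracle 10g on RHEL 4 generally means setting shared memory, semaphore, and network parameters in /etc/sysctl.conf along these lines. The values shown are the generic ones from Oracle's 10g installation guidance, not necessarily the exact values used in this test.

  # /etc/sysctl.conf -- typical Oracle 10g settings on RHEL 4
  kernel.shmmax = 2147483648        # max shared memory segment (bytes)
  kernel.shmall = 2097152           # total shared memory pages
  kernel.sem = 250 32000 100 128    # semaphores: semmsl semmns semopm semmni
  fs.file-max = 65536               # max open file handles
  net.ipv4.ip_local_port_range = 1024 65000
  net.core.rmem_default = 262144    # socket buffer defaults/maximums
  net.core.rmem_max = 262144
  net.core.wmem_default = 262144
  net.core.wmem_max = 262144

  # apply the settings without a reboot
  sysctl -p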
15
Step 1C - Optimize Oracle Binaries
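The transcript does not preserve this slide's content; as a general illustration only, the standard way to rebuild Oracle 10g binaries on Linux after OS-level changes is the relink utility:

  # Rebuild all Oracle executables against the current OS libraries
  # (run as the oracle software owner with ORACLE_HOME set)
  $ORACLE_HOME/bin/relink all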
16
Step 1C - Optimize Oracle SPFILE
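This slide is likewise a chart; purely as a hedged sketch, OLTP-oriented SPFILE changes for 4 GB RAM RAC nodes of this vintage might look like the following. The parameter values are illustrative assumptions, not the settings actually used in the test.

  # Illustrative RAC-wide SPFILE changes via SQL*Plus (values are assumptions)
  sqlplus -s / as sysdba <<'EOF'
  ALTER SYSTEM SET sga_target = 1600M SCOPE=SPFILE SID='*';
  ALTER SYSTEM SET pga_aggregate_target = 400M SCOPE=SPFILE SID='*';
  ALTER SYSTEM SET processes = 800 SCOPE=SPFILE SID='*';
  ALTER SYSTEM SET db_writer_processes = 2 SCOPE=SPFILE SID='*';
  EOF
  # Restart the instances for the static parameters to take effect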
17
Step 1D Find Per Node Sweet Spot
18
Sweet Spot Lessons Learned
  • Cannot rely solely on the BMF transactions-per-second graph
  • Throughput can still be increasing while the server is beginning to thrash
  • Need to monitor the database server with vmstat and other tools (see the sketch after this list)
  • Must stop just shy of bandwidth challenges (RAM, CPU, IO)
  • Must factor in multi-node overhead, and reduce accordingly
  • Prior to 10g R2, better to rely on app (BMF) load balancing
  • If you're not careful on this step, you'll run into roadblocks which either invalidate your results or simply cannot scale!!!
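A minimal sketch of the kind of monitoring loop meant above, capturing vmstat from each node during a run; the ssh-based loop and log file names are illustrative assumptions (the racdb* node names follow the naming seen later in the paper):

  #!/bin/bash
  # Sample memory/CPU pressure on each RAC node every 5 seconds for 10 minutes.
  for node in racdb1 racdb2 racdb3 racdb4; do
      ssh $node "vmstat 5 120" > vmstat_$node.log &
  done
  wait
  # Watch the si/so (swap-in/swap-out) columns: sustained nonzero values mean
  # the node is paging and the per-node user load is past its sweet spot.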

19
Testing Methodology Steps 2 A-C
20
Step 2C Run OLTP Test per Node
21
Some Speed Bumps Along the Way
As illustrated below, when we reached our four-node tests we identified that CPU utilization on nodes racdb1 and racdb3 reached 84% and 76%, respectively. Analyzing the root cause, the problem was related to a temporary overload of users on these servers and to ASM response time.
22
Some ASM Fine Tuning Necessary
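The tuning detail on this slide is a chart; as a hedged illustration, ASM I/O service times of the sort discussed here can be checked with a query like the following against V$ASM_DISK on the ASM instance (the exact tuning steps taken in the project are not preserved in the transcript):

  # Hedged example: inspect cumulative ASM I/O counts and service times
  sqlplus -s / as sysdba <<'EOF'
  SELECT path, reads, writes, read_time, write_time
  FROM   v$asm_disk
  ORDER  BY path;
  EOF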
23
Smooth Sailing After That
Shown below are the cluster-level latency charts from Spotlight on RAC during our eight-node run. They indicated that the interconnect latency was well within expectations and on par with typical industry network latency numbers.
24
Full Steam Ahead!
As shown below, ASM was performing extremely well at this user load. Ten instances with over 5,000 users showed excellent service times from ASM; in fact, the I/O rate was notably high, topping 2,500 I/Os per second!
25
Final Results
Other than some basic monitoring to make sure that all is well and the tests are working, there's really not very much to do while these tests run, so bring a good book to read. The final results are shown below.
26
Interpreting the Results
27
Projected RAC Scalability
Using the six-node graph results to project forward, the figure below shows a reasonable expectation of realizable scalability: 17 nodes should yield nearly 500 TPS and support about 10,000 concurrent users (roughly 29 TPS and 590 users per node, assuming near-linear scaling).
28
Next Steps
  • Since the first iteration of the test was limited by memory, we upgraded each database server from 4 to 8 GB RAM
  • Now able to scale up to 50% more users per node
  • Now doing zero percent paging and/or swapping
  • But now CPU bound
  • Next step: replace each CPU with a dual-core Pentium
  • Increase from 4 CPUs (2 real / 2 virtual) to 8 CPUs
  • Should be able to double users again???
  • Will we now reach IO bandwidth limits???
  • Will be writing about those results in future Dell articles

29
Conclusions
30
Questions
Thanks for coming :-)