Title: Oracle 10g RAC Scalability
1. Oracle 10g RAC Scalability Lessons Learned
Bert Scalzo, Ph.D. Bert.Scalzo_at_Quest.com
2. About the Author
- Oracle developer and DBA for 20 years, versions 4 through 10g
- Worked for Oracle Education and Consulting
- Holds several Oracle Masters certifications (DBA, CASE)
- BS, MS, and PhD in Computer Science, and also an MBA
- LOMA insurance industry designations: FLMI and ACS
- Books
  - The TOAD Handbook (March 2003)
  - Oracle DBA Guide to Data Warehousing and Star Schemas (June 2003)
  - TOAD Pocket Reference, 2nd Edition (June 2005)
- Articles
  - Oracle Magazine
  - Oracle Technology Network (OTN)
  - Oracle Informant
  - PC Week (now eWeek)
  - Linux Journal
  - www.Linux.com
3. About Quest Software
Used in this paper
4Project Formation
- This paper is based upon collaborative RAC
research efforts between Quest Software and Dell
Computers. - Quest
- Bert Scalzo
- Murali Vallath author of RAC articles and books
- Dell
- Anthony Fernandez
- Zafar Mahmood
- Also an extra special thanks to Dell for
allocating a million dollars worth of equipment
to make such testing possible ?
5. Project Purpose
- Quest
  - To partner with a leading hardware vendor
  - To field test and showcase our RAC-enabled software:
    - Spotlight on RAC
    - Benchmark Factory
    - TOAD for Oracle with the DBA module
- Dell
  - To write a Dell PowerEdge Magazine article about the OLTP scalability of Oracle 10g RAC running on typical Dell servers and EMC storage arrays
  - To create a standard methodology for all benchmarking of database servers, to be used for future articles and for lab testing and demonstration purposes
6. OLTP Benchmarking
The TPC Benchmark C (TPC-C, www.tpc.org) is an OLTP workload. It is a mixture of read-only and update-intensive transactions that simulate the activities found in complex OLTP application environments. It does so by exercising a breadth of system components associated with such environments, which are characterized by:
- The simultaneous execution of multiple transaction types that span a breadth of complexity
- On-line and deferred transaction execution modes
- Multiple on-line terminal sessions
- Moderate system and application execution time
- Significant disk input/output
- Transaction integrity (ACID properties)
- Non-uniform distribution of data access through primary and secondary keys
- Databases consisting of many tables with a wide variety of sizes, attributes, and relationships
- Contention on data access and update

Excerpt from the TPC Benchmark C Standard Specification, Revision 3.5
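A workload generator like Benchmark Factory's TPC-C-like test drives the database with a weighted mix of these transaction types. A minimal sketch of such a mix selector, using the customary TPC-C weighting (New-Order about 45%, Payment 43%, and roughly 4% each for the rest; the helper name is ours, not Benchmark Factory's API):

```python
import random

# Approximate TPC-C transaction mix: per the spec, New-Order makes up the
# remainder once the minimum percentages of the other types are satisfied.
TRANSACTION_MIX = {
    "New-Order": 45,
    "Payment": 43,
    "Order-Status": 4,
    "Delivery": 4,
    "Stock-Level": 4,
}

def next_transaction(rng=random):
    """Pick the next transaction type according to the weighted mix."""
    names = list(TRANSACTION_MIX)
    weights = list(TRANSACTION_MIX.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

Each simulated terminal session would loop on `next_transaction()`, which is what produces the "simultaneous execution of multiple transaction types" the spec calls for.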
7. Create the Load - Benchmark Factory
The TPC-C-like benchmark measures on-line transaction processing (OLTP) workloads. It combines read-only and update-intensive transactions, simulating the activities found in complex OLTP enterprise environments.
8. Monitor the Load - Spotlight on RAC
9. Hardware & Software
Oracle 10g RAC Cluster Servers
- 10 x 2-CPU Dell PowerEdge 1850, 3.8 GHz P4 processors with HT
- 4 GB RAM (later expanded to 8 GB RAM)
- 1 x 1 Gb NIC (Intel) for LAN
- 2 x 1 Gb LOM, teamed for RAC interconnect
- 1 x two-port HBA (QLogic 2342)
- DRAC
- RHEL AS 4 QU1 (32-bit)
- EMC PowerPath 4.4
- EMC Navisphere agent
- Oracle 10g R1 10.1.0.4
- Oracle ASM 10.1.0.4
- Oracle Cluster Ready Services 10.1.0.4
- Linux bonding driver for interconnect
- Dell OpenManage

Benchmark Factory Servers
- 2 x 4-CPU Dell PowerEdge 6650, 8 GB RAM
- Windows 2003 Server
- Quest Benchmark Factory application and agents
- Quest Spotlight on RAC
- Quest TOAD for Oracle

Storage
- 1 x Dell/EMC CX700 with 1 x DAE unit, 30 x 73 GB 15K RPM disks total
- RAID Group 1: 16 disks, 4 x 50 GB RAID 1/0 LUNs for data and backup
- RAID Group 2: 10 disks, 2 x 20 GB RAID 1/0 LUNs for redo logs
- RAID Group 3: 4 disks, 1 x 5 GB RAID 1/0 LUN for the voting disk, OCR, and spfiles
- 2 x Brocade SilkWorm 3800 Fibre Channel switches (16-port)
- Configured with 8 paths to each logical volume
- FLARE code release 16

Network
- 1 x Gigabit 5224 Ethernet switch (24-port) for the private interconnect
- 1 x Gigabit 5224 Ethernet switch for the public LAN
- Linux bonding driver used to team the dual onboard NICs for the private interconnect
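The deck does not reproduce the bonding setup itself. On RHEL 4, teaming two onboard NICs for a private interconnect would typically look like the sketch below; the device names, addresses, and the active-backup mode choice are illustrative assumptions, not the study's recorded settings:

```
# /etc/modprobe.conf -- load the bonding driver for the interconnect
alias bond0 bonding
options bonding miimon=100 mode=1   # mode=1: active-backup failover

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- the bonded interface
DEVICE=bond0
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- one slave (repeat for eth2)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

With `miimon=100` the driver checks link state every 100 ms, so a failed interconnect NIC fails over without dropping the RAC cache-fusion traffic.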
10. (No transcript)
11. Setup - Planned vs. Actual
- Planned
  - Red Hat 4 Update 1, 64-bit
  - Oracle 10.2.0.1, 64-bit
- Actual
  - Red Hat 4 Update 1, 32-bit
  - Oracle 10.1.0.4, 32-bit
- Issues
  - Driver problems with 64-bit (no real surprise)
  - Some software incompatibilities with 10g R2
  - Known ASM issues require 10.1.0.4, not earlier
12. Testing Methodology - Steps 1 A-C
13. Testing Methodology - Steps 1 D-E
14. Step 1B - Optimize Linux Kernel
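The slide's kernel-tuning details live in a screenshot. For reference, the sysctl settings Oracle documented as minimums for 10g on RHEL 4 look like the following; this is a sketch of typical values, not the study's exact configuration:

```
# /etc/sysctl.conf excerpt -- Oracle 10g on RHEL 4, documented minimums
kernel.shmmax = 2147483648          # max shared memory segment (holds the SGA)
kernel.shmall = 2097152             # total shared memory, in pages
kernel.sem = 250 32000 100 128      # semmsl semmns semopm semmni
fs.file-max = 65536                 # system-wide open file descriptors
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144      # socket receive/send buffers, sized up
net.core.rmem_max = 262144          # for interconnect and client traffic
net.core.wmem_default = 262144
net.core.wmem_max = 262144
```

Values are applied with `sysctl -p` and should be rechecked against the Oracle installation guide for the exact release in use.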
15. Step 1C - Optimize Oracle Binaries
16. Step 1C - Optimize Oracle SPFILE
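The actual SPFILE also appears only as a screenshot. A minimal sketch of the RAC-relevant parameters for a 10g cluster of this shape (all values below are illustrative placeholders, not the study's settings):

```
# Illustrative RAC-related SPFILE entries (placeholder values)
*.cluster_database=true
*.sga_target=1600M                 # 10g automatic shared memory management
*.pga_aggregate_target=400M
*.undo_management=AUTO
racdb1.instance_number=1           # per-instance settings, one set per node
racdb1.thread=1
racdb1.undo_tablespace=UNDOTBS1
```

On a 32-bit build like the one actually used, the SGA sizing in particular is constrained by the per-process address space, which is part of why per-node memory became the first bottleneck.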
17. Step 1D - Find Per-Node Sweet Spot
18Sweet Spot Lessons Learned
- Cannot solely rely on BMF transactions per second
graph - Can still be increasing throughput while
beginning to trash - Need to monitor database server with vmstat and
other tools - Must stop just shy of bandwidth challenges (RAM,
CPU, IO) - Must factor in multi-node overhead, and reduce
accordingly - Prior to 10g R2, better to rely on app (BMF) load
balancing - If youre not careful on this step, youll run
into roadblocks which either invalidate your
results or simply cannot scale!!!
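The sweet-spot hunt described above amounts to a simple search loop: push the user count up in steps, watch both throughput and resource utilization, and stop just shy of the ceiling. A sketch under assumed interfaces (`run_benchmark` is a stand-in for a Benchmark Factory run plus vmstat-style monitoring; the thresholds are illustrative):

```python
def find_sweet_spot(run_benchmark, step=100, max_users=2000,
                    min_gain=0.05, headroom=0.85):
    """Raise the simulated user count until TPS gains flatten or any
    resource (CPU, RAM, or I/O utilization, as a 0-1 fraction) nears
    its ceiling; return the last good (users, tps) pair.

    run_benchmark(users) -> (tps, max_resource_utilization)
    """
    best_users, best_tps = 0, 0.0
    for users in range(step, max_users + 1, step):
        tps, util = run_benchmark(users)
        if util > headroom:
            break                    # stop just shy of a bandwidth limit
        if best_tps and (tps - best_tps) / best_tps < min_gain:
            break                    # throughput gain has flattened out
        best_users, best_tps = users, tps
    return best_users, best_tps
```

Stopping on the utilization check, not just the TPS curve, is the point of the lessons above: throughput can still be rising while the node has already begun to thrash.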
19. Testing Methodology - Steps 2 A-C
20. Step 2C - Run OLTP Test per Node
21Some Speed Bumps Along the Way
As illustrated below when we reached our four
node tests we did identify that CPUs on node
racdb1 and racdb3 reached 84 and 76
respectively. Analyzing the root cause of the
problem it was related to temporary overload of
users on these servers, and the ASM response time.
22. Some ASM Fine-Tuning Necessary
23Smooth Sailing After That
As shown below, the cluster level latency charts
from Spotlight on RAC during our eight node run.
This indicated that the interconnect latency was
well within expectations and in par with any
industry network latency numbers.
24. Full Steam Ahead!
As shown below, ASM performed excellently at this user load. Ten instances with over 5,000 users showed excellent service times from ASM; in fact, the I/O rate was notably high, topping 2,500 I/Os per second.
25. Final Results
Other than some basic monitoring to make sure that all is well and the tests are working, there's really not very much to do while these tests run, so bring a good book to read. The final results are shown below.
26. Interpreting the Results
27. Projected RAC Scalability
Using the six-node graph results to project forward, the figure below shows a reasonable expectation of realizable scalability: 17 nodes should deliver nearly 500 TPS and support about 10,000 concurrent users.
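That projection is a straight linear extrapolation. A sketch of the arithmetic, with illustrative per-node averages chosen to land near the deck's 17-node figures (about 29.5 TPS and 590 users per node are our assumptions, not measured values), plus an efficiency knob for multi-node overhead:

```python
def project(nodes, tps_per_node=29.5, users_per_node=590, efficiency=1.0):
    """Naive linear projection of cluster throughput and user capacity.

    tps_per_node / users_per_node: illustrative per-node averages.
    efficiency < 1.0 models multi-node (interconnect) overhead.
    """
    factor = nodes * efficiency
    return factor * tps_per_node, factor * users_per_node

tps, users = project(17)   # roughly 500 TPS and 10,000 users
```

Dropping `efficiency` below 1.0 gives a more conservative curve; real clusters rarely scale perfectly linearly as node count grows.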
28. Next Steps
- Since the first iteration of tests was limited by memory, we upgraded each database server from 4 to 8 GB RAM
  - Now able to scale up to 50% more users per node
  - Now doing zero paging and/or swapping
  - But now CPU bound
- Next step: replace each CPU with a dual-core Pentium
  - Increase from 4 CPUs (2 real / 2 virtual) to 8 CPUs
  - Should be able to double users again?
  - Will we now reach I/O bandwidth limits?
- Will be writing about those results in future Dell articles
29. Conclusions
30. Questions
Thanks for coming!