Title: ASE133: Performance Tuning of ASE with special emphasis on Linux
1ASE133 Performance Tuning of ASE with special
emphasis on Linux
Girish VaitheeswaranStaff Software
EngineerSybase, Inc. girish_at_sybase.com
2Contents
- No Introduction to Linux
- System Performance 101
- De-mystifying Processors
- Lets not forget Memory
- Reading and Writing Disk I/O
- Sending and Receiving Network I/O
- Conclusion
3No Introduction to Linux
- Does Linux need an introduction!! ?
- Non Microkernel based architecture (monolithic
kernel) - Multitasking, Secure virtual memory OS
- Different flavors of Linux available
- Redhat, Suse, Redflag etc
- Supported on various hardware platforms.
- Intel (Xeon, Pentium, Itanium), AMD (Athlon),
IBM, SGI, SUN etc - Importantly, appealing to Bean Counters
4Linux Trends
- Consolidation from big boxes to commodity
hardware - Migration to 4CPU P3s with 4/8GB RAM running
some flavor of Linux - Exploiting Moores law
- Moving from 750Mhz CPUs to 1.5GHz CPUs with
hyperthreading - Sizing based on Increased clock speeds.
5ASE on linux
- ASE on Linux supported since 11.0.3.3
- ASE 12.5.0.3 supports RH2.1
- ASE 12.5.1 will support RH 3.0
- SuSE/RedFlag supported as well.
- More info available at www.sybase.com/linux/ase
6System Performance 101
- How do I get my hardware and software to do more
work without buying more hardware ? - System Performance mainly depends on 3 areas
7De-mystifying Processors
- Classes of processors
- Xeon with HT Technology
- Pentium with HT Technology
- What is hyper threading
- Doing more work in each clock cycle by providing
thread level parallelism in each processor - Advantages of hyper threading
- Support for multi-threaded code and multi-tasking
operations through better utilization of
processor resources. - Multiple threads/tasks running simultaneously to
increase the number of transactions that can be
executed. - Improved reaction and response times for end
users. - Increased number of users a server system can
support.
8Consolidation
- Consolidation Guidelines
- One 1.5GHz CPU does not necessarily yield the
same performance as two 750MHz CPUs. Various
parameters to account for are - L1/L2/L3 cache sizes (Internal CPU caches)
- Memory Latencies
- Cycles per instruction
- Number of engines to run
- On HT enabled processors 1 physical CPU can run 2
engines - Having a HT enabled processor is not equivalent
to having 2 physical processors
9Processor Tips
- Identifying Number of processors/Clock
speed/Hyper threading - cat /proc/cpuinfo
Processor 0
Vendor_id GenuineIntel
Cpu family 15
Model name Intel Xeon CPU 2.8GHz
Cache size 512KB
Flags fpu sse ht
Processor 1
Vendor_id GenuineIntel
Cpu family 15
Model name Intel Xeon CPU 2.8GHz
Cache size 512KB
Flags fpu sse ht
10Processor Tips
- Determining cpu usage (mpstat, top, vmstat)
- mpstat 5 5
010333PM CPU user nice system idle Intr/s
010333PM All 83.40 0.00 11.80 4.80 1485.
010338PM All 81.20 0.00 14.20 4.60 1433
010343PM All 83.40 0.00 11.60 5.00 1450
010348PM All 84.60 0.00 10.40 5.00 1457
010353PM All 82.60 0.00 13.40 4.00 1429
010358PM all 83.04 0.00 12.28 4.68 1451
11Processor Tips
- Enabling Hyperthreading
- Enabled by default in most processors
- Note that if ht is shown in the cat
/proc/cpuinfo output does not mean HT is enabled. - Disabling Hyperthreading
- Can be disabled during BIOS setup
- This has been useful in cases where there are
multiple engines and the load on the cpus is
close to 100
12ASE Engines Unleashed
- ASE Engines are Linux processes that schedule
tasks - ASE Engines are multi-threaded
- ASE Performs automatic load balancing of tasks
- ASE has automatic task affinity management
- Tasks tend to run on the same engine that they
last ran on to improve locality - Linux does not have an ability to explicitly bind
engines to processors. (Not Yet) - RH AS 2.1 has built in process-processor affinity
- Internal benchmarks have demonstrated ASEs
ability to scale to 64 engines.
13ASE Engines Unleashed
- 2 configuration parameters control the number of
engines -
- The sp_engine stored procedure can be used to
online or offline engines dynamically - Tune the number of engines based on the Engine
Busy Utilization values presented by sp_sysmon - Extra dataserver threads RH 7.2 only
- For Posix aio support
processors max online engines 4 number of engines at startup 2
14Monitoring and Tuning Engines
- sp_sysmons kernel section reports utilization as
shown - sp_sysmon 000200, kernel
-
Engine busy Utilization CPU Busy I/O Busy Idle
Engine 0 97.3 3.7 0.0
Engine 1 97.5 3.5 0.0
Average 97.4 3.6 0.0
15Monitoring and Tuning Engines
- Influencing kernel utilization
- CPU bound tasks
- I/O bound tasks
- Tuning runnable process search count
- I/O polling process count
16Logical Process Management
- Logical process management can be used to
influence the priority of tasks or to do load
balancing by using engine groups. - E.g.Housekeeper tuning for aggressive garbage
collection
Identify spid for HK GC sp_who --------------------------------------------- 7 sleeping HK GC Set the garbage collection to high priority sp_setpsexe 7, priority, high Validate the priority has been set correctly sp_showpsexe spid current_priority ----------------------------------- 7 HIGH
17Logical Process Management
- I/O bound tasks and cpu bound tasks can be
balanced by using engine groups. - E.g. Mixed work load scenario running a resource
hogging reporting application and an Online
reservation at the same time. - Step1 Create 2 engine groups and associate
engines to engine groups -
exec sp_addengine 0, onl_reservation_engroup exec sp_addengine 1, onl_reservation_engroup exec sp_addengine 2, reporting_engroup exec sp_addengine 3, reporting_engroup
18Logical Process Management
- Step 2 Display information about execution
objects - exec sp_showcontrolinfo
-
- Step 3 Create 2 execution classes
onl_reservation_execlass and reporting_execlass - exec sp_addexeclass onl_reservation_execlass,
MEDIUM, 0, onl_reservation_engroup - exec sp_addexeclass reporting_execlass, MEDIUM,
0, reporting_engroup -
Type Engines Engine_group
EG 0 Onl_reservation_engroup
EG 1 Onl_reservation_engroup
EG 2 Reporting_engroup
EG 3 Reporting_engroup
19Logical Process Management
- Step 4 Bind application logins to the respective
execution class - exec sp_bindexeclass onl_sa, LG, NULL,
onl_reservation_execlass - exec sp_bindexeclass reporting_sa, LG, NULL,
reporting_execlass - Step 5 Validate binding information
- exec sp_showexeclass
-
Classname Priority Engine_group Engines
onl_reservation_execlass MEDIUM onl_reservation_engroup 0,1
reporting_execlass MEDIUM reporting_engroup 2,3
20Some more Engine related Tunes
- Runnable process search count determines the
number of times ASE engines loop looking for
runnable tasks before yielding to the OS. - Default value is good in general
- Tune this parameter only if all of the below are
true - There are multiple applications running on the
same machine and you require ASE to yield to the
OS so that the other applications can be
scheduled - The average cpu busy utilization is lt 5
21Some more Engine related Tunes
- I/O polling process count determines the number
of processes ASE runs before checking for Network
or Disk I/O. - Tune this parameter only if all the following
conditions are met - Increase the value if the total I/O checks is
very high and the Avg Disk I/Os per check or Avg
Net I/Os per check is very low. - If the avg cpu utilization is between 70-90
22Lets take a Checkpoint
- Hyperthreading is not equivalent to having a
physical processor - Just clock speed does not give performance
- Add more engines if Engine busy utilization is
high - Logical process management for priority
scheduling and mixed workloads
23Lets not forget Memory
- Memory is a very critical parameter to obtain
overall system performance - Every disk I/O saved is performance gained.
- Tools to monitor and manage memory
24Using Large Memory
- Users can use 2.7G of memory out of the box by
just changing max memory parameter in ASE
12.5.1
Physical Memory max memory 1380000 allocate max shared memory 1
Key Shmid Owner Perms Bytes nattach
0xac03b5d9 7208963 girish 600 2126512128 3
0xac03b5da 7241732 girish 600 699727872 3
Note that 2 shared memory segments have been
created one for 1.98G and one for 667M
25Using Large memory 12.5.0.3 and below
- In ASE 12.5.0.3 and below, to use Large memory on
Linux do the following - Max configurable shared memory
- 2.7GB addressable memory
Add in rc.local echo ltshm in bytesgt gt /proc/sys/kernel/shmmax Or add the following line in /etc/sysctl.conf kernel.shmmaxltmax_memory_in_bytesgt /sbin/sysctl p
In RUN_SERVER add the following echo 268435456 gt /proc//mapped_base
26Monitoring Memory
- View Memory parameters in kb
- free k
- cat /proc/meminfo
Total Used Free Shared Buffers cached
5855528 2596348 3259180 1979904 179260 120084
MemTotal 5855528 kB MemFree 3256604 kB MemShared 1979904 kB Buffers 179580 kB Cached 122636 kB SwapCached 0 kB Active 1988108 kB Inact_dirty 293532 kB Inact_clean 480 kB
27Configuring ASE memory
- sp_configure max memory to tune memory
configured for ASE. Dynamic option since 12.5 - Tune this parameter based on ASE resource
requirements - Remaining memory does not go to default data
cache starting ASE 12.5 - Do I have extra memory ?
sp_configure memory .. An additional 6020480 K bytes of memory is available for reconfiguration. This is the difference between 'max memory' and 'total logical memory'.
28Monitoring and Tuning ASE Parameters
- To tune various ASE memory parameters
- sp_monitorconfig all
-
- If Reused column has yes watch out.
Name Num_free Num_active Pct_act Max_used Reused
Additional network memory 2611200 0 0.00 0 NA
Number of open indexes 15 480 96.96 510 Yes
Number of open objects 491 9 1.80 9 No
29Monitoring and Tuning ASE Parameters
Config parameter How to tune Applications
Procedure cache size sp_monitorconfig, sp_sysmon Triggers/Stored procedures
Heap memory per user sp_monitorconfig Wide columns
Number of sort buffers Size of tables involved Create index, sort, group by, order by
Number of locks sp_monitorconfig Row locks, high throughput OLTP
Number of large I/o buffers sp_monitorconfig create database/alter database
Number of worker processes sp_monitorconfig Parallel Sort/Parallel Query
Additional network memory sp_monitorconfig Text/Image
30Memory Tuning Tips
- Lock Shared memory
- Guarantees shared memory to be in RAM
- Improves performance
- Tune through sp_configure interface
-
- Static option
- Validate through message in errorlog
- 112308.33 kernel Locking shared memory into
physical memory.
sp_configure lock shared memory, 1
31Named Caches
- Sizing Caches key to improved performance
- Cache Partitions improve performance and
scalability - How ?
- Create the Named cache with required size
- Bind the cache to the hot table
sp_cacheconfig stock_cache, 5M 0200000000122003/07/10 103645.18 server Size of the 2K memory pool 5120 Kb The change is completed. The option is dynamic and the SQL Server need not be rebooted for the change to take effect.
sp_bindcache "stock_cache","production","stock"
32Named Caches
- What to bind
- Transaction Log
- Tempdb
- Hot objects
- Hot indexes
- When to use Named caches
- sp_sysmon Data Cache Management section reports
gt 10 spinlock contention - sp_sysmon provides the recommendation to do so.
- Hot lookup tables, frequently used indexes,
tempdb activity, high transaction throughput
applications are all good scenarios for using
named caches. - How to determine what is hot ??
- Cache Wizard
33Cache Wizard
- A new option to sp_sysmon cache wizard has been
added in 12.5.1 to help in - Identifying hot objects in a cache
- Evaluating effectiveness of Large buffer pools
- Sizing data caches.
- Evaluating effectiveness of APF
34Cache Wizard Usage
- Usage
- sp_sysmon interval , cache wizard , top_N ,
filter - Ranking Criterion
- LogicalReads / sec
- Always in decreasing order of PhysicalReads / sec
- Filter clause
- Caches containing filter pattern
35Cache Wizard Examples
- sp_sysmon 000500, cache wizard, 2,
default data cache - default data cache
- Buffer Pool Information
-
- Object Statistics Cache Occupancy
Information -
Run Size 100 Mb Usage 80 LR/sec 2500 PR/sec 1500 Hit 40
IO Size Wash Run Size APF LR/Sec PR/Sec Hit APF Eff Usage
4 Kb 17202 Kb 16 Mb 10 800 100 87.50 75 80
2 Kb 3276 Kb 84 Mb 10 1700 1400 17.65 20 80
Obj_Size Size in Cache Obj_Cache Cache_Occp
102400 Kb 40960 Kb 40 40
2048 Kb 1024 Kb 50 1
Object LR/sec PR/sec Hit
db.dbo.cost_cutting 1800 1150 36.11
db.dbo.emp_perks 500 200 60.00
36Cache Wizard Recommendations
- Identifying hot objects
- default data cache
- Object Statistics Cache Occupancy
Information - If Cache Hit is low
- For each Object
- If LR/sec is high and Obj hit is low, move
object to a new cache - OR add memory to the cache.
Run Size 100 Mb Usage 88.40 LR/sec 2500 PR/sec 1500 Hit 40
Object LR/sec PR/sec Hit
master.dbo.cost_cutting 1800 1150 36.11
master.dbo.emp_perks 500 200 60.00
master.dbo.emp_info 100 90 10.00
master.dbo.revenue 100 60 40.00
Obj_Size Size in Cache Obj_Cache Cache_Occp
102400 Kb 40960 Kb 40 40
2048 Kb 1024 Kb 50 1
8192 Kb 1024 Kb 12.5 1
16384 Kb 8192 Kb 50 8
37Cache Wizard Recommendations
- Effectiveness of large buffer pools, apf
- default data cache
- Buffer Pool Information
-
- If Pool usage is high and Pool Hit is low, add
memory to the buffer pool - APF effectiveness provides information on how
many pages brought in on account of APF got used. - If Pool hit is low and APF effectiveness is
high, then consider increasing APF percentage.
Run Size 100 Mb Usage 88.40 LR/sec 2500 PR/sec 1500 Hit 40.00
IO Size Wash Run Size APF LR/Sec PR/Sec Hit APF Eff Usage
4 Kb 17202 Kb 16 Mb 10 800 100 87.50 75 80
2 Kb 3276 Kb 84 Mb 10 1700 1400 17.65 20 90
38Cache Partitions
- Cache Partitions help improve scaling
- Decomposes the cache spinlock
- Recommendation is to use as many cache partitions
as there are engines. - How
-
sp_cacheconfig "order_index_cache", "5M", "cache_partition2 104420.85 server Size of the 2K memory pool 2560 Kb 104420.85 server Memory allocated for the order_index_cache cachelet 1 2560 Kb 104420.85 server Size of the 2K memory pool 2560 Kb 104420.85 server Memory allocated for the order_index_cache cachelet 2 2560 Kb The change is completed. The option is dynamic and the SQL Server need not be rebooted for the change to take effect.
39Named Caches Vs Cache Partitions
- Which one should I use ?
- Named caches
- Easily identifiable hot objects, indexes
- Transaction Log
- Tempdb
- Cache Partitions
- Complex applications with many objects
- Named cache with heavy spinlock contention
- Both
- Best fit is to have named caches with cache
partitions
40Named Caches Bottom Line
79
41Lets take a Checkpoint
- Every disk I/O saved is performance gained
- sp_monitorconfig to tune procedure cache, worker
threads etc - Named caches help improve performance and
scalability - Cache partitions Named Caches best combination
- If application has large number of objects have
as many cache partitions as engines - Tempdb, transaction log, hot indexes, hot objects
are ideal candidates
42Reading n Writing Disk I/O
- I/O avoided is Performance Gained
- ASE buffer cache has algorithms to
avoid/delay/Optimize I/Os whenever possible - LRU replacement
- MRU replacement
- Tempdb writes delayed Improved select into
performance - Write ahead logging Only Log is written
immediately - Group commit to batch Log writes
- Coalescing I/O using Large buffer pools
- UFS support
- Raw devices and File systems supported
- Asynchronous I/O is supported
43Raw Devices
- Raw devices provide exceptional write performance
and good read performance - Recommended for Transaction Log
44File Systems
- File System Caching can be effectively used to
improve performance (especially reads). - File Systems as Secondary cache for ASE
- Enables ASE to use gt 2.7GB
- Very useful as pages not fitting in ASE cache are
accommodated in FS Cache - Helps avoid expensive disk I/O
- Many File System Flavors on Linux
- extfs, xfs. IBM's JFS and the Reiserfs
- Recommended file system
- EXT2
- EXT3 with journaling disabled
45What file system to use
- EXT3 with journaling disabled
24
46File system vs Raw devices
- When to use File system
- Frequent reads
- Infrequent writes
- E.g. tempdb WITH DSYNC off using
sp_deviceattr stored procedure - 1gt sp_deviceattr tmpdbdev","dsync","false"
- 2gt go
- 'dsync' attribute of device tmpdbdev' turned
'off'. Restart Adaptive Server for the - change to take effect.
- When to use Raw devices
- Frequent writes
- Infrequent reads
- E.g Transaction log
- How does one compare against the other ?
47And the winner is.
- Bottom line Use mix of File System and Raw
devices.
60
48Asynchronous I/O
- Enables ASE to service user tasks after I/O is
issued - The recommended scheme for doing I/O
- AIO supported on Raw devices and File Systems on
Linux - Enabled by default
- Posix aio and Kernel Supported AIO RH AS 2.1
are supported - What should I use ??
49And the Winner is
95
50Asynchronous I/O tunes
- fs.aio.max-size specifies the maximum block size
performed by one aio read or aio write - For optimal create database, alter database
performance this should be tuned to 1048676(1MB) - To tune this parameter
Add the following in /etc/rc.local echo 1048576 gt /proc/sys/fs/aio-max-size Or Add fs.aio-max-size1048576 in /etc/sysctl.conf and run /bin/sysctl -p
51Disk Tuning Recommendations
- More disks in an array improve I/O parallelism
- Stripe size for OLTP applications
- Data 64K
- Log 8K
- Service Times
- Data 12-15 ms
- Log 2-3 ms
52Disk Tuning Recommendations
- RAID 5 for data
- Independent Data disks with distributed parity
blocks - Space utilization is lesser than 01
- Read Throughput is high
- Reliability is high
- Write Ahead Logging helps delaying data writes
- RAID 01 for Transaction Log
- RAID 0 for the striping
- RAID 1 for mirroring
- Very reliable
- High write throughput
53I/O Commands
- iostat for device statistics.
- iostat -d -x /dev/sdb1 /dev/sdc1 /dev/sdd1 5 5
- Linux 2.4.9-e.3 (vayu) 06/11/2003
Device Rrqm/s Wrqm/s R/s W/s Avrq-sz Avgqu-sz Await Svctm util
Sdb1 0.00 0.20 86.60 68.80 8.04 11.58 74.13 34.48 59.80
Sdc1 0.00 0.20 24.40 48.80 8.02 3.86 52.73 38.25 28.00
Sdd1 0.00 0.20 26.80 45.00 8.02 3.42 47.63 36.77 26.40
54Linux I/O Commands
- Identify file system types
- /sbin/mount
- Create an ext3 file system
- /sbin/mke2fs b lt2K4Kgt -j /dev/sdg2
- View existing raw devices
- raw -qa
- Create a raw device
- raw /dev/raw/raw1 /dev/sdb1
- And add the following in /etc/sysconfig/rawdevices
- /dev/raw/raw1 /dev/sdb1
- Tweaking the file system options
- mount -t ext3 -o rw,noatime,datawriteback
/dev/sde1 /tpcc
55ASE Disk Monitoring
- sp_sysmon 000001, diskio
- Device
- /tpcc_2/data/data1.dat
-
Data1 Per sec Per xacts Count of total
Reads
APF 0.0 0.0 0 0.0
Non-APF 0.4 0.1 2 0.3
Writes 154.4 22.1 772 99.7
Total I/O 154.8 22.1 774 23.9
56Lets take a Checkpoint
- Use Kernel Supported aio
- Use Mixture of File systems and Raw devices
- Raid 01 for transaction log, RAID 5 for data
devices - High Speed device for transaction log and tempdb
- I/O parallelism by having enough disks per array
- 64KB stripe for data and 8KB stripe for
transaction log
57Sending and Receiving Network I/O
- ASE is capable of handling thousands of users
with good throughput and response time - 100BASE-T and Gigabit ethernet cards help improve
network throughput and response time - Full duplex for simultaneous reading and writing
to the network is recommended - One could also use multiple NIC cards for load
balancing
58Load Balancing using Multiple NICs
- Client Machine 1
-
- query ipaddr1 port1
Server Machine master ipaddr1 port1 master
ipaddr2 port2 query ipaddr1 port1
Serviced by listener1
Client Machine 2 query ipaddr2 port2
Serviced by listener2
59Tuning ASE Network I/O
- sp_sysmon Network I/O Management section
provides information on ASE network performance - sp_configure send doneinproc tokens, 0 send
doneinproc tokens only for select statements. - Network packet size tuning both on the client
side and on the server side. - Server Side
- default network packet size
- max network packet size
- Client Side
- -A option for isql
- CS_PACKETSIZE using ct-library
- PACKETSIZE property using JConnect
- The default value for tcp no delay of 1 should
improve network performance.
60Network Commands
- netstat -ts
- Optimize tcp performance by using port numbers
between 1024 and 65000 - echo 1024 65000 gt /proc/sys/net/ipv4/ip_local_p
ort_range - Some more tcp tunes to improve performance
- echo 0 gt /proc/sys/net/ipv4/tcp_sack
- echo 0 gt /proc/sys/net/ipv4/tcp_timestamps
- echo 0 gt /proc/sys/net/ipv4/tcp_window_scaling
- Full duplex information
- cat /proc/net/nicinfo/eth0.info grep Duplex
- Duplex full
61Lets take a Checkpoint
- 100 BASE-T and Giga bit ethernet help improve
network throughput - Large packet sizes for bcp, text/image or for any
other large data transfer needs - Setting send doneinproc tokens to 0 helps improve
network performance