1
Tuning the Java Virtual Machine for Stability and
Speed
  • John C. Calvin
  • Senior Systems Architect
  • University of Toronto

2
Toronto, Ontario CANADA (YYZ)
  • population 5.6 million
  • about the population of Chicago
  • area 2,751 mi² (7,125 km²)
  • 4 universities, 7 colleges

3
University of Toronto
  • 3 campuses in the Greater Toronto Area
    • Downtown: 52,296 students, 175 acres
    • Eastern: 10,465 students, 300 acres
    • Western: 10,924 students, 200 acres
  • $1.4 billion annual operating budget
  • Degree programs
    • 840 undergraduate
    • 520 graduate
    • 75 doctoral
  • 14,500 annual undergraduate intake
  • 10,000 students in residence
  • 247 buildings
  • 200 acres net assignable space
  • 7,591 parking spaces

4
University of Toronto
  • Students
    • 55,352 (46,940 FTEs) undergraduate degree-seeking
      students
    • 13,702 (12,499 FTEs) graduate degree-seeking
      students
    • 2,038 (902 FTEs) certificate, diploma and special
      students
    • 2,593 residents and post-graduate medical
      students
  • International students
    • 5,182 undergraduate degree-seeking students
    • 1,579 graduate degree-seeking students
    • 326 certificate, diploma and special students
    • 779 residents and post-graduate medical students
  • Faculty
    • 2,260 (2,185 FTEs) Professorial
    • 432 (378 FTEs) Teaching Stream
    • 1,079 (216 FTEs) Term-limited Sessional and
      Stipendiary
    • 3,913 (2,351 FTEs) Clinical
    • 2,707 (1,071 FTEs) Other

5
Agenda
  • Opening Remarks
  • System Operational Goals
  • The Java Runtime Environment
  • Generations and Memory Spaces
  • Garbage Collection Times and Algorithms
  • fork() and exec() under Linux and Solaris

6
My bias stated clearly and up-front
  • Operating system virtualization multiplies
    operating system overhead.
  • Tomcat clustering multiplies Java Virtual Machine
    (JVM) overhead.
  • Both OS virtualization and Tomcat clustering
    multiply the configuration complexity.

7
The Basis of the Work
  • A fast JVM is only good if it's stable.
  • A big JVM is only good if it's stable.
  • Big fast JVMs can be stable.
  • The Sun 64-bit JVM v1.6.x is very stable.
  • Blackboard is a large Java application.
  • A big, fast JVM must be good for Blackboard.
  • Our JVMs were not stable.

8
System Operational Goals
  • Survive the worst-case single-user demand placed
    on a single JVM by any administrative, faculty,
    or student use case, without JVM failure.
  • Increase the stability of the JVM.
  • Increase the performance of the JVM.
  • Reduce overall configuration complexity.
  • Fully exploit application server resources.
  • Reduce the cost of Blackboard hardware.

9
The Results
  • One 64-bit JVM can serve 60,000 actives.
  • One 64-bit JVM can serve 2000 users.
  • One 64-bit JVM can serve 570,000 pages/hr.
  • One 64-bit JVM can run 900 threads.
  • One application server was enough.
  • Two servers offer hardware redundancy.

10
(No Transcript)
11
Java Runtime Environment
  • The JRE is an application running under a host
    operating system: Windows, Linux, Solaris, etc.
  • The JRE is written in C and C++ and has been
    compiled specifically for each operating system.
  • The JRE must interpret or compile the Java code
    in Blackboard in order to run it.
  • The JRE demands memory from the host operating
    system and manages that memory as its heap
    memory spaces, where it stores the Java objects
    that the running JVM threads instantiate.

12
JVM Heap Size
  • Upper bound (e.g. 16GB)
    • bbconfig.max.heapsize.tomcat=16g
    • A.K.A. JVM command-line option -Xmx16g
    • The size above which the adaptive JVM memory
      sizing algorithm will not increase the size of
      the heap.
  • Lower bound (e.g. 4GB)
    • bbconfig.min.heapsize.tomcat=4g
    • A.K.A. JVM command-line option -Xms4g
    • The size below which the adaptive JVM memory
      sizing algorithm will not reduce the size of the
      heap.

13
Downward Pressures on JVM max.heapsize
  • Amount of installed physical memory
  • Platform choice
    • AMD, Intel, SPARC, speed, # of cores, etc.
  • Operating system choice
    • Windows, Linux, Solaris
  • JVM choice
    • 32-bit vs. 64-bit, 1.5.x or 1.6.x
  • Garbage Collection (GC) algorithm choice
    • Concurrent Mark Sweep (CMS) vs. ParallelOld
  • Max. tolerable stop-the-world times
  • Headroom for other applications, OS, etc.
  • fork()/exec() swap-space/memory demands

14
Upward Pressures on JVM max.heapsize
  • Worst-case application use cases
    • Largest GradeCenter (# of columns × enrollment)
    • Unqualified searches (course catalog, users)
  • Desired number of concurrent users
    • MaxThreads × 10MB (approx., measured)
  • Idle JVM heap low-water mark after GC
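The thread-related pressure above is a simple product, so it can be sketched in a few lines. This is only a back-of-the-envelope illustration: the 10MB figure is the slide's approximate measurement, and the function name `thread_heap_demand_mb` is a hypothetical helper, not part of Blackboard or Tomcat.

```python
# Rough heap demand from request threads, using the slide's approximate
# measured figure of 10MB per thread. Both inputs are estimates.
MB_PER_THREAD = 10


def thread_heap_demand_mb(max_threads: int, mb_per_thread: int = MB_PER_THREAD) -> int:
    """Return the approximate heap (in MB) needed if every thread is busy."""
    if max_threads < 0:
        raise ValueError("max_threads must be non-negative")
    return max_threads * mb_per_thread


if __name__ == "__main__":
    # 200 is the Tomcat default quoted later in the deck; 800 is the tuned value.
    for threads in (200, 800):
        print(f"{threads} threads -> ~{thread_heap_demand_mb(threads)} MB")
```

With the tuned maxThreads of 800 that appears later in the deck, this estimate alone comes to roughly 8GB, before any worst-case use case is considered.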

15
What Happens When
  • common use-cases exhaust the heap?
  • admin use-cases exhaust the heap?
  • a spike in demand exhausts max.threads?
  • the required number of concurrent users cannot
    be supported by the max.heapsize?

16
Solving Memory Problems: Solution 1
  • Reduce the application's demand
  • Reduce the size of organizations
  • Reduce course enrollment sizes
  • Reduce the number of columns in GradeCenter
  • Reduce the size of the course catalog
  • Reduce the number of user accounts

17
Solving Memory Problems: Solution 2
  • Reduce the number of concurrent users
  • Load-balance across multiple servers
  • Add more physical application servers
  • More pizza-boxes, power, A/C
  • Virtualize the application servers
  • Solaris Zones, Logical Domains, VMware, etc.
  • Network load-balance between them
  • Hardware, round-robin DNS, etc.
  • Load-balance across multiple JVMs
  • Tomcat clustering

18
Solving Memory Problems: Solution 3
  • Increase the JVM's heap size.
  • Install more physical memory
  • Too much memory has rarely caused a big problem
  • Run a 64-bit JVM and configure a large heap
  • A 2GB or larger heap requires a 64-bit JVM
  • Tune Garbage Collection algorithms
  • JVM 1.6.0_13 required
  • -XX:+AlwaysPreTouch
  • Eliminate the Java Runtime.exec() method
  • Some Blackboard code changes required

19
Reasons to Avoid a Large Heap
  • Because it's a bad idea.
  • Because nobody else is doing it.
  • Because Blackboard doesn't support it.
  • Because you can't run a 64-bit JVM.
  • Because you don't know how to do it.
  • Because you haven't enough memory.
  • Because you aren't using the CMS collector.
  • Because you just don't want to break it.

20
Reasons to Embrace a Large Heap
  • Because you have infinite GC loops.
  • Because you get out-of-memory errors.
  • Because JVMs are dying for no reason.
  • Because you're already running a 64-bit JVM.
  • Because it just feels like the right solution.
  • Because you've got to try something else.
  • Because you're lazy like me and you want a
    simpler solution than Tomcat Clustering, Solaris
    Zones, and Logical Domains.
  • Because RAM is the cheapest upgrade.

21
Non-Java Memory Demands
  • Java isn't the only thing that is going to demand
    memory!
  • Apache will need 10MB × http.max.clients.
  • ModPerl needs memory.
  • NFS needs memory.
  • The operating system needs memory.

22
Generations and Memory Spaces
  • New (Young) Generation (heap)
    • Eden Space
    • Survivor Space
  • Old Generation (heap)
    • Tenured Space
  • Permanent Generation (non-heap)
  • Code Space (non-heap)

23
Default JVM Generational Model
[Figure: Young Generation (Eden Space, Survivor Space) and Old Generation (Tenured Space)]
24
Garbage Collection Basics
[Figure: Young Generation and Old Generation]
25
Garbage Collection Algorithms
  • -XX:+UseParallelOldGC (the Throughput collector)
    • -XX:+UseParallelGC
      Invokes the multi-threaded New Generation
      collector
    • -XX:-UseParallelGC
      Invokes the single-threaded New Generation
      collector
  • -XX:+UseConcMarkSweepGC (the Low-pause collector)
    • -XX:+UseParNewGC
      Invokes the multi-threaded New Generation
      collector
    • -XX:-UseParNewGC
      Invokes the single-threaded New Generation
      collector
  • -XX:+UseG1GC (the Garbage First collector)

26
GC Terminology
  • Minor collection (YGC)
    • A garbage collection event that clears the Eden
      Space into the Survivor Space and promotes old
      survivors into the Tenured Space; it is a
      stop-the-world collection.
  • Major collection (CMS)
    • A garbage collection event that clears dead
      objects from the Tenured Space and coalesces the
      JVM's free lists into larger memory pages,
      concurrently with the execution of the
      application; small portions are stop-the-world
      events.
  • Full collection (Compacting Collector)
    • A garbage collection event that clears and
      compacts the heap memory spaces; it is a
      stop-the-world collection.

27
Single Server Stop-the-World Times
28
GC Terminology
  • Memory Allocation Rate
    • The rate of consumption of the Eden Space in MB/s
      (GB/hr.)
  • Survival Rate
    • Percentage of the Eden Space (or Survivor Space)
      that must be copied with the next young
      generation garbage collection
  • Infant Mortality Rate
    • Percentage of nascent objects that die before
      their first YGC
  • Promotion Rate
    • Percentage of the New Generation that must be
      copied into the Old Generation with the next
      young generation collection
  • Collection Rate (Young or Old)
    • Amount of memory reclaimed by minor and major
      garbage collections, measured in MB/s (GB/hr.)

29
UofT Blackboard JVM Heap
30
Heap Memory Allocation Rate
  • Allocation rate = YGC/hour × Eden size
  • Example memory allocation rate
    • 4GB Eden Space with a 15-second YGC interval
    • 240 YGC/hour × 4GB = 960GB per hour
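The example above can be checked with a short calculation. This is a minimal sketch that assumes the Eden Space fills completely between young collections; the function names are illustrative only.

```python
def ygc_per_hour(ygc_interval_s: float) -> float:
    """Number of young collections per hour for a given interval in seconds."""
    if ygc_interval_s <= 0:
        raise ValueError("interval must be positive")
    return 3600.0 / ygc_interval_s


def allocation_rate_gb_per_hour(eden_gb: float, ygc_interval_s: float) -> float:
    """Allocation rate in GB/hour, assuming Eden fills once per YGC interval."""
    return ygc_per_hour(ygc_interval_s) * eden_gb


def allocation_rate_mb_per_s(eden_gb: float, ygc_interval_s: float) -> float:
    """The same rate expressed in MB/s (1GB taken as 1024MB)."""
    if ygc_interval_s <= 0:
        raise ValueError("interval must be positive")
    return eden_gb * 1024.0 / ygc_interval_s


if __name__ == "__main__":
    # Slide example: 4GB Eden Space, one YGC every 15 seconds.
    print(allocation_rate_gb_per_hour(4, 15), "GB/hour")
    print(round(allocation_rate_mb_per_s(4, 15), 1), "MB/s")
```

In practice the YGC interval would be read from GC logs or jstat samples rather than assumed.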

31
/usr/java/bin/jmap -J-d64 -heap <pid>
    /usr/java/bin/jmap -J-d64 -heap 29259
    Attaching to process ID 29259, please wait...
    Debugger attached successfully.
    Server compiler detected.
    JVM version is 1.5.0_14-b03

    using parallel threads in the new generation.
    using thread-local object allocation.
    Concurrent Mark-Sweep GC

    Heap Configuration:
       MinHeapFreeRatio = 40
       MaxHeapFreeRatio = 70
       MaxHeapSize      = 21474836480 (20480.0MB)
       NewSize          = 4294967296 (4096.0MB)
       MaxNewSize       = 4294967296 (4096.0MB)
       OldSize          = 17179869184 (16384.0MB)
       NewRatio         = 4
       SurvivorRatio    = 2048

    Heap Usage:
    New Generation (Eden + 1 Survivor Space):
       capacity = 4292935680 (4094.0625MB)
       used     = 3224075136 (3074.7176513671875MB)
       free     = 1068860544 (1019.3448486328125MB)
       75.10187378348981% used
    Eden Space:
       capacity = 4290904064 (4092.125MB)
       used     = 3224075136 (3074.7176513671875MB)
       free     = 1066828928 (1017.4073486328125MB)
       75.13743229659865% used
    From Space:
       capacity = 2031616 (1.9375MB)
       used     = 0 (0.0MB)
       free     = 2031616 (1.9375MB)
       0.0% used
    To Space:
       capacity = 2031616 (1.9375MB)
       used     = 0 (0.0MB)
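The byte counts in this jmap output line up with heap sizes written in the usual `g`/`m`/`k` shorthand, and the "used" percentage is simply used divided by capacity. The sketch below checks both; `size_to_bytes` and `percent_used` are hypothetical helper names, not part of any JDK tool.

```python
# Binary multipliers for the k/m/g suffixes used in JVM size options.
UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def size_to_bytes(text: str) -> int:
    """Convert a size such as '20g' or '256m' to bytes; plain digits pass through."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def percent_used(used_bytes: int, capacity_bytes: int) -> float:
    """Occupancy as a percentage of capacity."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_bytes / capacity_bytes


if __name__ == "__main__":
    print(size_to_bytes("20g"))   # compare with MaxHeapSize in the output above
    print(size_to_bytes("4g"))    # compare with NewSize / MaxNewSize
    print(percent_used(3224075136, 4292935680))  # New Generation occupancy
```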

32
/usr/java/bin/jstat -gccause <pid>
    /usr/java/bin/jstat -gccause 2553 60s
      S0     S1     E      O      P     YGC   YGCT    FGC   FGCT     GCT      LGCC             GCC
      0.00   0.00  94.51  13.16  34.54   40   23.979   32   47.625   71.605   unknown GCCause  No GC
      0.00   0.00  63.01  13.64  34.79   41   24.211   33   51.549   75.760   unknown GCCause  No GC
      0.00   0.00  20.98  13.97  34.87   42   24.414   33   51.549   75.963   unknown GCCause  No GC
      0.00   0.00  97.00  13.15  34.98   42   24.414   34   55.990   80.404   unknown GCCause  No GC
      0.00   0.00  65.48  13.48  35.01   43   24.596   35   62.001   86.597   unknown GCCause  No GC
      0.00   0.00  58.29  13.91  35.05   44   24.816   35   62.001   86.817   unknown GCCause  No GC
      0.00   0.00  40.44  14.40  35.06   45   25.126   35   62.001   87.127   unknown GCCause  No GC
      0.00   0.00  87.47  13.83  35.07   45   25.126   36   67.605   92.732   unknown GCCause  No GC
      0.00   0.00  35.01  14.42  35.09   46   25.379   37   67.605   92.984   unknown GCCause  No GC
      0.00   0.00  84.45  14.42  35.11   46   25.379   37   80.853  106.232   unknown GCCause  No GC
      0.00   0.00  37.64  14.80  35.12   47   25.655   37   80.853  106.508   unknown GCCause  No GC
      0.00   0.00  88.85  13.45  35.12   47   25.655   38   86.310  111.965   unknown GCCause  No GC
      0.00   0.00  45.96  13.96  37.68   48   25.913   39   93.301  119.214   unknown GCCause  No GC
      0.00   0.00  36.45  14.50  37.69   49   26.227   39   93.301  119.528   unknown GCCause  No GC
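Because YGC and YGCT are cumulative counters, the average young-collection pause over a sampling window is the change in YGCT divided by the change in YGC. The sketch below applies that to the first and last samples shown above; it is an illustration of the arithmetic only, and the `mean_pause_s` helper is hypothetical.

```python
from typing import Tuple

# (YGC count, YGCT seconds) taken from the first and last jstat samples above.
FIRST_SAMPLE = (40, 23.979)
LAST_SAMPLE = (49, 26.227)


def mean_pause_s(first: Tuple[int, float], last: Tuple[int, float]) -> float:
    """Average pause per collection between two cumulative jstat samples."""
    collections = last[0] - first[0]
    if collections <= 0:
        raise ValueError("no collections occurred between the samples")
    return (last[1] - first[1]) / collections


if __name__ == "__main__":
    pause = mean_pause_s(FIRST_SAMPLE, LAST_SAMPLE)
    print(f"mean young GC pause: {pause:.3f} s")
```

The same subtraction works for the FGC and FGCT columns, although under the CMS collector those counters do not map one-to-one onto stop-the-world pauses, so the result needs more careful interpretation.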

33
Some Logic About GC Times
  • More users on an application server
  • → More threads creating objects
  • → Increasing memory allocation rate
  • → Decreasing YGC interval
  • → Less time for nascent objects to die
  • → Higher promotion/survival rate
  • → More physical memory to copy
  • → Longer Young Generation collections
  • → Longer stop-the-world events
  • → Longer server response times

34
Sun Default 64-bit JVM Heap
35
The New Ratio
  • Ratio of the Old Generation to the Young
    Generation
  • A New Ratio of 3 means that the Old Generation
    is 3 times the size of the Young Generation.
  • Example with a 16GB heap
    • 12GB Old Generation
    • 4GB Young Generation
  • Example with a 20GB heap
    • 15GB Old Generation
    • 5GB Young Generation
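The two examples above follow from dividing the heap into (New Ratio + 1) equal parts, one of which goes to the Young Generation. A minimal sketch of that split, ignoring any alignment or rounding the JVM applies (`split_heap` is an illustrative name):

```python
from typing import Tuple


def split_heap(heap_gb: float, new_ratio: int) -> Tuple[float, float]:
    """Return (young_gb, old_gb) for a heap where old = new_ratio * young."""
    if new_ratio < 1:
        raise ValueError("new_ratio must be at least 1")
    young = heap_gb / (new_ratio + 1)
    old = heap_gb - young
    return young, old


if __name__ == "__main__":
    for heap, ratio in ((16, 3), (20, 3), (20, 4)):
        young, old = split_heap(heap, ratio)
        print(f"{heap}GB heap, NewRatio={ratio}: young={young}GB old={old}GB")
```

The last case, a 20GB heap with a New Ratio of 4, gives the 4GB new and 16GB old sizes used in the heap size options later in the deck.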

36
JVM Memory Spaces
37
UofT Heap vs. Default Heap
38
Understanding the Young Generation
[Figure: Eden Space and Survivor Space (From Space, To Space)]
39
Tenuring Threshold
[Figure: Survivor Space (From Space, To Space) and Tenured Space]
40
GC Times Can Vary
  • Young GC events
    • The number of objects
    • Size of the surviving objects
  • Old GC events
    • Occupancy of the Young Generation
    • Fragmentation of the Old Generation
  • Configuration
    • Size of the Eden Space
    • Size of the Survivor Spaces
    • Tenuring Threshold
    • Parallelism in the collectors

41
If You Can't Measure the Application, Measure the
Machine.
  • Tune the JVM for the largest heap that the
    hardware, operating system, and GC algorithm will
    support.
  • A larger heap means fewer stop-the-world events.
  • There is no harm in having a larger heap if the
    upper bound on the stop-the-world times is known,
    fixed, and acceptable for the application.
  • For a fixed number of threads, both CPU and
    memory demands can be reduced by running a single
    JVM instead of clustering.
  • More users means more threads; more threads means
    more objects; more objects means more heap.

42
Xms and Xmx Misconceptions
  • When min.heap = max.heap, all of the JVM's heap is
    allocated at startup.
  • FALSE
  • Only the Eden Space is allocated at startup,
    regardless of the min.heap setting.
  • If adaptive sizing is used, the JVM will not
    reduce the total size of the JVM heap below
    min.heap.
  • The heap size will not be reduced until it first
    exceeds min.heap.

43
JVM Non-heap Size Options
  • -XX:PermSize=256m
  • -XX:MaxPermSize=256m
  • -XX:InitialCodeCacheSize=128m
  • -XX:ReservedCodeCacheSize=128m

44
JVM Heap Size Options
  • -Xms20g
  • -Xmx20g
  • -XX:OldSize=16g
  • -XX:NewSize=4g
  • -XX:MaxNewSize=4g
  • -XX:NewRatio=4
  • -XX:SurvivorRatio=4096
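In HotSpot, the survivor ratio is the size of the Eden Space relative to one survivor space, so the Young Generation is split into (SurvivorRatio + 2) parts: two survivor spaces and the rest Eden. The sketch below shows how small the survivor spaces become with the very large ratios used here. It ignores the alignment the JVM applies to real sizes, and `young_gen_layout_mb` is a hypothetical helper.

```python
from typing import Tuple


def young_gen_layout_mb(young_mb: float, survivor_ratio: int) -> Tuple[float, float]:
    """Return (eden_mb, survivor_mb), where eden = survivor_ratio * survivor."""
    if survivor_ratio < 1:
        raise ValueError("survivor_ratio must be at least 1")
    survivor = young_mb / (survivor_ratio + 2)   # one of the two survivor spaces
    eden = young_mb - 2 * survivor
    return eden, survivor


if __name__ == "__main__":
    for ratio in (2048, 4096):
        eden, survivor = young_gen_layout_mb(4096, ratio)
        print(f"SurvivorRatio={ratio}: eden ~{eden:.1f}MB, each survivor ~{survivor:.2f}MB")
```

With a 4GB Young Generation, a ratio of 2048 gives survivor spaces of about 2MB, close to the 1.9375MB reported by jmap earlier in the deck. Survivor spaces this small effectively hand the whole Young Generation to Eden, which is consistent with the tenuring threshold of 0 listed in the gross tuning options.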

45
Gross JVM Tuning
  • -Xss256k
  • -XX:+UseTLAB
  • -XX:+MaxFDLimit
  • -XX:+AlwaysPreTouch
  • -XX:+DisableExplicitGC
  • -XX:MaxTenuringThreshold=0

46
CMS Collector Tuning Options
  • -XX:+UseConcMarkSweepGC
  • -XX:ParallelGCThreads=16
  • -XX:ParallelCMSThreads=4
  • -XX:+ParallelRefProcEnabled
  • -XX:CMSMarkStackSize=8M
  • -XX:CMSMarkStackSizeMax=8M
  • -XX:CMSInitiatingOccupancyFraction=60
  • -XX:+CMSScavengeBeforeRemark
  • -XX:+CMSParallelRemarkEnabled
  • -XX:+CMSPermGenSweepingEnabled
  • -XX:+CMSClassUnloadingEnabled

47
(No Transcript)
48
CMS Collection Close-up
49
Full CMS Cycle
50
An Evening of CMS Collections
51
The YGC Tipping Point
  • YGC Events occurring too frequently
  • → Too many objects surviving
  • → Too much memory being promoted
  • → Old Generation collections too frequent
  • → Objects survive too many epochs
  • → Old Generation collection rate too low
  • → Old Generation runs out of memory
  • → Infinite GC loops are inevitable

52
Some Facts About Memory
  • A process address space is usually larger than
    the total amount of physical memory (RAM)
    installed.
  • Any program might demand more memory than is
    currently uncommitted and available.
  • When the OS is starved for physical memory it
    uses swap space (disk storage) to hold older
    pages.
  • Physical memory is 500k times faster than disk.
  • Swapping to disk is memory starvation.
  • Swapping to disk is evil!

53
The fork() and exec() Problem in Solaris and Linux
  • When a process forks, it creates a copy of its
    address space (not its memory footprint).
  • As the forked process runs, it modifies the
    virtual memory in its own address space, and the
    OS must allocate physical memory to hold the
    modified pages (copy-on-write).
  • The exec() causes a lot of copy-on-write faults
    to occur very rapidly, demanding large amounts of
    memory from the OS.

54
Some Solaris Facts About Memory
  • If there is insufficient virtual memory to
    back-stop a forking process, the fork() call will
    fail.
  • This is a fail-safe fork. It prevents the OS
    from over-committing virtual memory.
  • It can be annoying, because it means you might
    need a huge swap-partition (2xRAM) that may get
    committed but is never touched.

55
More Xms and Xmx Misconceptions
  • As the JVM forks, at least max.heapsize swap
    space must be available or the fork() will fail.
  • FALSE
  • Virtual memory = physical memory + swap space
  • Virtual memory equal to the resident set size in
    use by the parent process, at the time of the
    fork(), will be allocated.
  • Linux will over-commit virtual memory (kernel 2.6
    default).
  • A JVM call to malloc() succeeds and the JVM is
    promised memory; later, the OS may not deliver on
    that promise!
  • This can lead to out-of-memory errors.
  • Solaris will not over-commit virtual memory.
  • As fails the malloc(), so fails the fork().

56
More Xms and Xmx Misconceptions
  • When the JVM forks a new process, the forked
    process will use max.heapsize.
  • FALSE
  • fork1() uses a copy-on-write technique to reduce
    the time taken to fork. Physical memory pages from
    the OS are mapped only as virtual memory pages in
    the forked process are modified.

57
Some Bad Examples
  • The machine has 8GB of RAM.
  • The machine has an 8GB swap partition.
  • The application has a 6GB footprint.
  • There is 9GB of virtual memory available.
  • The application does a fork() and then exec().
  • exec() zeros 6GB of memory in the forked process,
    demanding 6GB of RAM from the OS.
  • The OS starts swapping memory pages to disk!

58
Some Bad Examples
  • The machine has 16GB of RAM.
  • The machine has a 4GB swap partition.
  • The application has an 8GB footprint.
  • There is 6GB of virtual memory available.
  • The application tries to fork() and exec().
  • Solaris: the fork fails (out of memory).
  • Linux: the fork succeeds, the exec fails.

59
A Good Example
  • The machine has 64GB of RAM.
  • The machine has a 64GB swap partition
  • The application has a 20GB footprint.
  • There is 86GB of virtual memory available.
  • The application does a fork() and exec().
  • Solaris and Linux both work fine.
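The bad and good examples can be compared with a deliberately simplified model. The sketch below assumes a strict (Solaris-style, no over-commit) fork needs free virtual memory at least equal to the parent's footprint, and that in the worst case the child touches a full copy of that footprint. Both rules are simplifications for illustration, and the `Host` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    ram_gb: int            # installed physical memory
    available_vm_gb: int   # virtual memory still uncommitted (from the slides)
    footprint_gb: int      # footprint of the process that calls fork()


def strict_fork_succeeds(host: Host) -> bool:
    """No-over-commit rule: the fork must be fully backed by free virtual memory."""
    return host.available_vm_gb >= host.footprint_gb


def worst_case_fits_in_ram(host: Host) -> bool:
    """Assumed worst case: parent plus a fully touched child copy stay in RAM."""
    return 2 * host.footprint_gb <= host.ram_gb


if __name__ == "__main__":
    examples = {
        "bad example 1": Host(ram_gb=8, available_vm_gb=9, footprint_gb=6),
        "bad example 2": Host(ram_gb=16, available_vm_gb=6, footprint_gb=8),
        "good example": Host(ram_gb=64, available_vm_gb=86, footprint_gb=20),
    }
    for name, host in examples.items():
        print(name, strict_fork_succeeds(host), worst_case_fits_in_ram(host))
```

Under this model the first bad example forks but spills into swap, the second cannot fork at all on a strict system, and the good example passes both checks.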

60
Tomcat Connector Settings
  • maxThreads
    • Maximum number of threads that can be open in the
      pool.
    • 800 (200 default)
  • minSpareThreads
    • Minimum number of idle threads to leave in the
      pool.
    • 16 (4 default, possibly 10)
  • maxSpareThreads
    • Maximum number of idle threads to leave in the
      pool.
    • 100 (50 default)
  • acceptCount
    • Maximum number of requests to queue if no threads
      are free.
    • 50 (10 default)

61
Reasons Not to Use Tomcat Clustering
  • A 32-bit JVM is too small anyway: < 200 threads max
  • 4 JVMs of 2GB each = 8GB
  • That's 4 JVMs, and each still has an overhead of 60
    threads and 256MB of executable and non-heap
    memory!
  • → 180 fewer threads available and 768MB more
    non-heap memory
  • → 1/3 less work for the same resource
    commitment
  • → further constrains the worst-case application
    use cases
  • (course enrollment × Grade-book columns)
  • Each JVM has only ¼ the memory available to an
    8GB JVM
  • → can support only ¼ of the peak threads of an
    8GB JVM
  • → can support less than ¼ of the peak users of an
    8GB JVM
  • → more likely to enter infinite GC loops.
  • CPU time lost to process-switching over
    thread-switching
  • Blackboard configuration is more complicated than
    with a single JVM.
  • SNMP monitoring and metering of clustered JVMs is
    more complicated.
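The overhead figures on this slide are the per-JVM costs multiplied by the number of additional JVMs a cluster introduces. A minimal sketch of that arithmetic, using the slide's 60 threads and 256MB per JVM (the function name is illustrative):

```python
from typing import Tuple

THREADS_PER_JVM_OVERHEAD = 60   # per-JVM thread overhead quoted on the slide
NONHEAP_MB_PER_JVM = 256        # per-JVM executable and non-heap memory


def clustering_overhead(jvm_count: int) -> Tuple[int, int]:
    """Extra (threads, non-heap MB) consumed compared with a single JVM."""
    if jvm_count < 1:
        raise ValueError("jvm_count must be at least 1")
    extra_jvms = jvm_count - 1
    return (extra_jvms * THREADS_PER_JVM_OVERHEAD,
            extra_jvms * NONHEAP_MB_PER_JVM)


if __name__ == "__main__":
    threads, nonheap_mb = clustering_overhead(4)
    print(f"4 JVMs cost {threads} extra threads and {nonheap_mb}MB extra non-heap memory")
```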

62
Computing is All About Memory
  • A memory upgrade is often the most cost-effective
    hardware upgrade that one can perform.
  • Not CPU-bound? Not network-bound? Not database
    bound? Then what is the problem?
  • 4 machines with too little memory cost far more
    than does one machine with too much.
  • A slow machine with lots of memory is faster than
    a fast machine that's out of memory.
  • Increase the heap and use the CMS collector!

63
Thank you for your time.
  • John C. Calvin
  • Senior Systems Architect
  • Information Technology Services
  • University of Toronto
  • john.calvin_at_utoronto.ca