John C. Calvin - PowerPoint PPT Presentation

1
Design to Debug, Build to Last
  • John C. Calvin
  • Senior Systems Architect
  • University of Toronto

2
Toronto, Ontario CANADA (YYZ)
  • population 5.6 million
  • about the population of Chicago
  • area 2,751 mi² (7,125 km²)
  • 4 universities, 7 colleges

3
University of Toronto
  • 3 campuses in the Greater Toronto Area
    • Downtown: 52,296 students, 175 acres
    • Eastern: 10,465 students, 300 acres
    • Western: 10,924 students, 200 acres
  • $1.4 billion annual operating budget
  • Degree programs
    • 840 undergraduate
    • 520 graduate
    • 75 doctoral
  • 14,500 annual undergraduate intake
  • 10,000 students in residence
  • 247 buildings
  • 200 acres net assignable space
  • 7,591 parking spaces

4
University of Toronto
  • Students
    • 55,352 (46,940 FTEs) undergraduate degree-seeking students
    • 13,702 (12,499 FTEs) graduate degree-seeking students
    • 2,038 (902 FTEs) certificate, diploma and special students
    • 2,593 residents and post-graduate medical students
  • International students
    • 5,182 undergraduate degree-seeking students
    • 1,579 graduate degree-seeking students
    • 326 certificate, diploma and special students
    • 779 residents and post-graduate medical students
  • Faculty
    • 2,260 (2,185 FTEs) Professorial
    • 432 (378 FTEs) Teaching Stream
    • 1,079 (216 FTEs) Term-limited Sessional and Stipendiary
    • 3,913 (2,351 FTEs) Clinical
    • 2,707 (1,071 FTEs) Other

5
Agenda
  • System Design
  • Implementation
  • Observations

6
Blackboard System Overview
7
Raw Systems Statistics
  • System Activity
    • 225,703 average pages per day
    • 414,415 peak pages per day
  • User Accounts
    • 224,192 defined
    • 59,813 active

8
Raw System Statistics
  • 600,000 hits/hour (HTTPS GET/POST)
  • 1,200-1,900 concurrent users (10-minute interval)
  • 1 Terabyte/hour (JVM Memory Usage)
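
As a rough sanity check on the hit rate above, the hourly figure can be converted to per-second and per-server rates. This is only a sketch: it assumes the load is spread evenly over the 3 application servers listed under Production Hardware Components, which the slides do not state.

```python
HITS_PER_HOUR = 600_000   # from the slide
APP_SERVERS = 3           # production T2000 application servers


def per_second(per_hour: float) -> float:
    """Convert an hourly rate to a per-second rate."""
    return per_hour / 3600


total_hits_per_sec = per_second(HITS_PER_HOUR)
# Assumption: even distribution across servers (not stated in the slides).
hits_per_server_per_sec = total_hits_per_sec / APP_SERVERS

print(f"{total_hits_per_sec:.1f} hits/s in total")
print(f"{hits_per_server_per_sec:.1f} hits/s per application server")
```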

9
Raw System Statistics
10
Production Hardware Components
  • F5 BIG-IP LTM load-balancers (redundant)
  • Filbert firewalls (redundant)
  • 3 x Sun T2000 application servers
  • 1 x Sun T2000 NFS/collaboration services
  • 1 x Sun v890 database server
  • 9.4TB Sun 9985v storage array
  • 2 x Brocade 200E 16-port FC switches
  • 3 x 3Com 5500G 48-port gigabit switches

11
Support Systems
  • Warm Standby/Backup System
    • Sun T5220 database server
    • Sun T2000 application/content server
  • QA Pre-production Test System (8.0)
    • Sun T2000 application server
    • Sun T2000 content server
    • Sun T5220 database server
    • Sun StorageTek 6140
  • STAGING Upgrade and Migration Test (9.x)
    • Sun T1000 content server
    • Sun T5220 database server
    • Sun StorageTek 6140

12
Blackboard Software Components
  • Solaris 10 Update 5 (SPARC)
  • Sun JDK 1.6.0_14 (Java)
  • Apache/Tomcat/ModPerl/Xythos/Pubcookie
  • Blackboard v8.0.422.0
    • Blackboard Learning System
    • Blackboard Community System
    • Blackboard Content System
  • Oracle 10GR2

13
Application Server Basics
  • User requests come to Apache on port 443.
  • Java requests go to Tomcat.
  • Perl requests go to Apache with ModPerl.
  • Apache delivers results to users.
  • Oracle provides everything but file content:
    • Content and course permissions
    • Grade-book and session history
    • User-interface layout customizations
  • NFS file system, mounted on the application
    servers, provides Apache with access to shared
    file content.

14
Blackboard System Internals
15
T2000 Application Servers
  • 1.2GHz UltraSPARC T1 (Niagara) processor
  • 8 cores, 32 virtual processors
  • 64GB DDR2 ECC 400MHz memory
  • 2 x HW RAID-1 pairs of 73GB SAS disks
    • One pair dedicated to /usr/local/blackboard
    • One pair for all other file systems
  • 4 x 10/100/1000 network interfaces
  • 2 x redundant, hot-swap power supplies
  • Advanced Lights Out Manager (ALOM)

16
Blackboard System Old Design
17
Blackboard System New Design
18
2 Front-side Networks
  • After the load-balancer
  • Inside and outside the firewall
  • 1514-byte frames
  • SSH and HTTPS traffic share single NIC
  • Production Collaboration Services
  • QA and management systems

19
6 Back-end Networks
  • 3 NFS backend networks
  • 3 SQL backend networks
  • Jumbo frames: 8192-byte frames
  • /etc/hosts overrides DNS
  • Each application server talks to a dedicated
    interface

20
Switch VLAN Configuration
  VLAN ID  Description     IP Address Range
  1        LMS Front-side  128.100.87.0/24
  666      Test Network    192.168.1.0/24
  1103     CNS2 Network    128.100.103.0/24
  4011     NFS Channel 1   172.16.11.0/24
  4012     NFS Channel 2   172.16.12.0/24
  4013     NFS Channel 3   172.16.13.0/24
  4021     SQL Channel 1   172.16.21.0/24
  4022     SQL Channel 2   172.16.22.0/24
  4023     SQL Channel 3   172.16.23.0/24
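
To illustrate how the VLAN plan maps addresses to channels, here is a minimal sketch using Python's standard `ipaddress` module. The table is transcribed from the slide; the `vlan_for` helper and the sample host address are hypothetical and only serve the example.

```python
import ipaddress

# VLAN table transcribed from the slide: (VLAN ID, description, subnet).
VLANS = [
    (1, "LMS Front-side", "128.100.87.0/24"),
    (666, "Test Network", "192.168.1.0/24"),
    (1103, "CNS2 Network", "128.100.103.0/24"),
    (4011, "NFS Channel 1", "172.16.11.0/24"),
    (4012, "NFS Channel 2", "172.16.12.0/24"),
    (4013, "NFS Channel 3", "172.16.13.0/24"),
    (4021, "SQL Channel 1", "172.16.21.0/24"),
    (4022, "SQL Channel 2", "172.16.22.0/24"),
    (4023, "SQL Channel 3", "172.16.23.0/24"),
]


def vlan_for(address: str):
    """Return (vlan_id, description) for an IP address, or None if unmatched."""
    ip = ipaddress.ip_address(address)
    for vlan_id, description, subnet in VLANS:
        if ip in ipaddress.ip_network(subnet):
            return vlan_id, description
    return None


# Hypothetical host address on the second SQL back-end network.
print(vlan_for("172.16.22.10"))
```

A lookup like this can be handy when checking that an interface address entered in /etc/hosts really sits on the intended back-end channel.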

21
Redundant Load-Balancers and Firewalls
22
(No Transcript)
23
Physical Storage Components
  • Sun StorageTek 9985v (15k RPM disks)
    • 34 x 300GB disk drives
    • 4 x 1.6TB RAID6 (6D2P) arrays
    • 2 x 3.2TB pools, each composed of 2 x 1.6TB arrays
    • 2 x hot-spare drives
  • Sun StorageTek 6140 (15k RPM disks)
    • 16 x 300GB disk drives
    • 3.5TB RAID6 (13D2P)
    • 1 x hot-spare

24
Physical Storage Components
  • Sun StorageTek 6140 (10k RPM disks)
    • 16 x 300GB disk drives
    • 3.5TB RAID6 (13D2P)
    • 1 x hot-spare
  • Sun StorageTek 6130 (15k RPM disks)
    • 14 x 68GB disk drives
    • 1 x 814GB RAID5 (12D1P)
    • 1 x hot-spare
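
The disk counts on the two Physical Storage Components slides can be cross-checked against the RAID layouts: each array should hold (data + parity) disks per RAID group, plus its hot spares. The sketch below does that arithmetic with the figures from the slides; the tuple layout and the function name are illustrative only.

```python
# (name, total disks, RAID groups, data disks, parity disks, hot spares)
ARRAYS = [
    ("StorageTek 9985v", 34, 4, 6, 2, 2),          # 4 x RAID6 (6D2P)
    ("StorageTek 6140 (15k)", 16, 1, 13, 2, 1),    # RAID6 (13D2P)
    ("StorageTek 6140 (10k)", 16, 1, 13, 2, 1),    # RAID6 (13D2P)
    ("StorageTek 6130", 14, 1, 12, 1, 1),          # RAID5 (12D1P)
]


def disks_accounted(groups: int, data: int, parity: int, spares: int) -> int:
    """Disks consumed by the RAID groups plus the hot spares."""
    return groups * (data + parity) + spares


for name, total, groups, data, parity, spares in ARRAYS:
    used = disks_accounted(groups, data, parity, spares)
    print(f"{name}: {used} of {total} disks accounted for")
```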

25
RAID-5 (7D1P): 300GB Disks, 1.9TB
26
RAID-6 (6D2P): 300GB Disks, 1.6TB
27
14.2 TB of Usable Storage
  • 15,000 RPM Drives
    • 4 x 1.6TB - 300GB RAID6 (6D2P)
    • 1 x 3.5TB - 300GB RAID6 (13D2P)
    • 1 x 814GB - 68GB RAID5 (12D1P)
  • 10,000 RPM Drives
    • 1 x 3.5TB - 300GB RAID6 (13D2P)
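
The 14.2 TB total follows from adding up the arrays listed above. The sketch below also shows one plausible reading of the per-array figures for the 300GB disks: data disks only, with decimal gigabytes converted to binary terabytes. That conversion is an assumption; it matches the 1.6TB and 3.5TB figures but is not stated in the slides, and it is not applied to the 68GB array.

```python
def usable_tib(data_disks: int, disk_gb: int) -> float:
    """Capacity of the data disks in binary terabytes (2**40 bytes),
    assuming the disk size is given in decimal gigabytes (10**9 bytes)."""
    return data_disks * disk_gb * 10**9 / 2**40


raid6_6d2p = round(usable_tib(6, 300), 1)     # compare with 1.6TB on the slide
raid6_13d2p = round(usable_tib(13, 300), 1)   # compare with 3.5TB on the slide

# Total usable storage, summed from the slide's own figures (in TB).
slide_figures_tb = [1.6] * 4 + [3.5, 0.814, 3.5]
total_tb = round(sum(slide_figures_tb), 1)

print(raid6_6d2p, raid6_13d2p, total_tb)
```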

28
2 Parity Groups per Pool: 3.2TB
29
2 Pools: 6.4TB Internal Total
30
Parity Groups and Pools
31
Now, forget about the disk arrays. Think about
pools of storage.
32
Each pool serves a different purpose.
33
Each pool is carved into 50 x 61GB LDEVs.
34
The Storage Plan
35
ShadowImage LDEV Duplication
36
ShadowImage LDEV Split
37
ShadowImage to Multiple LDEVs
38
ShadowImage to Cascading LDEVs
39
ShadowImage Consistent Split
40
Normal Paired Operation
41
Split for Online Backups
42
Resync Standby from Production
43
Resume Paired Operation
44
Split for Standby Server Takeover
45
Recover Production from Standby
46
Resume Paired Operation
47
Scope of the Upgrade
  • Software updates
    • Raidctrl support
    • Upgrade Solaris
    • Upgrade Blackboard
    • Upgrade Oracle
  • Sun 9985v upgrade
    • Install and configure
    • Benchmark SAN
    • Migrate content and DB
  • BIOS upgrades
    • 8 x T2000
    • 2 x T1000
    • 3 x T5220
    • 1 x v890
  • Firmware upgrades
    • 3 x 3Com switches
    • 2 x Brocade 200E

48
Physical System Size
  • 4 x 19-inch 42U racks
  • 22 pairs of mounting rails
  • 18 custom cable harnesses
  • 35 serial console ports
  • 41 DB9 to RJ45 serial adapters
  • 175 Ethernet cables
  • 54 220V AC power cords

49
System Availability Restrictions
  • June 27: start date
  • July long weekend: 1 x 56-hour window
    • 1700 Friday to 0001 Monday
  • August 8, 15, 22: 3 x 2-hour windows
    • 1800 to 2000 Fridays
  • August long weekend: 1 x 56-hour window
    • 1700 Friday to 0001 Monday
  • August 8: Sun StorageTek 9985v installation
  • August 12: Blackboard consulting engagement
  • August 15: PRODUCTION LOCKDOWN!!

50
Cable Colour Scheme
  • NET0 (1st network interface)
  • NET1 (2nd network interface)
  • NET2 (3rd network interface)
  • NET3 (4th network interface)
  • MGT (management interface)
  • SER (serial console port)

51
That looks simple enough.
52
(No Transcript)
53
(No Transcript)
54
Default JVM Generational Model
  • Young Generation
    • Eden Space
    • Survivor Space
  • Old Generation
    • Tenured Space
55
Generational Memory Model
Young Generation
Old Generation
56
64-Bit JVM
  • Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
  • Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-b03, mixed mode)
  • Java(TM) Platform, Standard Edition for Business (build 1.6.0_14-b08)
  • Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

57
Java on Sun Supported Platforms
  • Java(TM) 2 Runtime Environment, Standard Edition
    (build 1.5.0-2008-11-17-065212.va203678.j2se-jprtadm_16_Nov_2008_23_48)
  • Java HotSpot(TM) 64-Bit Server VM (build
    1.5.0_17rev_TEST_150_17revcr6786503cr6787254cr5070073_AlwaysPreTouch_03_chrisphi_2009.02.04_1357,
    mixed mode)

58
JVM Memory Management
  • Default JVM Settings
    • MaxHeapSize 1GB
    • NewRatio 2
    • SurvivorRatio 6
    • TenuringThreshold 16
    • MaxPermSize 84MB
  • Memory Spaces
    • New Generation
      • Eden Space
      • Survivor Spaces
    • Old Generation
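
The default settings above determine how the heap is split between spaces. In HotSpot, NewRatio is the ratio of old to young generation size, and SurvivorRatio is the ratio of eden to one survivor space (the young generation holds eden plus two survivor spaces). The following sketch applies those two ratios to the slide's defaults; it ignores the alignment and rounding a real JVM performs, so the numbers are approximate.

```python
def generation_sizes(heap_mb: float, new_ratio: int, survivor_ratio: int) -> dict:
    """Approximate space sizes in MB from NewRatio and SurvivorRatio.

    young = heap / (NewRatio + 1)
    survivor = young / (SurvivorRatio + 2)   # two survivor spaces plus eden
    """
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}


# Defaults from the slide: 1GB heap, NewRatio 2, SurvivorRatio 6.
sizes = generation_sizes(heap_mb=1024, new_ratio=2, survivor_ratio=6)
for name, mb in sizes.items():
    print(f"{name:9s} {mb:7.1f} MB")
```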

59
Sun Default 64-bit JVM Heap
60
UofT Blackboard JVM Heap
61
UofT Blackboard JVM Heap
62
JVM Options - Memory Sizing
  • Reserved, Perm., Stack
    • -Xss256k
    • -XX:InitialCodeCacheSize=128m
    • -XX:ReservedCodeCacheSize=128m
    • -XX:PermSize=256m
    • -XX:MaxPermSize=256m
  • Misc
    • -XX:+UseTLAB
    • -XX:+AlwaysPreTouch
    • -XX:+UseNiagaraIntrs
  • Young & Old Gen.
    • -Xms16g
    • -Xmx16g
    • -XX:NewSize=4g
    • -XX:MaxNewSize=4g
    • -XX:OldSize=12g
    • -XX:NewRatio=4
    • -XX:SurvivorRatio=4096
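
A quick way to see what the sizing options above imply is to check that the young and old generation sizes add up to the heap, and to work out how small the survivor spaces become with a SurvivorRatio of 4096. This is a sketch only: `parse_size` is a hypothetical helper, and a real JVM aligns and rounds these sizes.

```python
UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_size(text: str) -> int:
    """Parse a JVM-style size such as '256k', '128m' or '16g' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


heap = parse_size("16g")     # -Xms / -Xmx
young = parse_size("4g")     # NewSize / MaxNewSize
old = parse_size("12g")      # OldSize
survivor_ratio = 4096        # SurvivorRatio

heap_is_consistent = (young + old == heap)

# The young generation holds eden plus two survivor spaces.
survivor = young // (survivor_ratio + 2)
eden = young - 2 * survivor

print(f"young + old == heap: {heap_is_consistent}")
print(f"each survivor space: about {survivor / 2**20:.2f} MB")
print(f"eden share of young generation: {eden / young:.4%}")
```

With survivor spaces this small, nearly the whole young generation is eden, which fits a configuration where surviving objects are promoted to the old generation right away rather than aged in the survivor spaces.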

63
JVM Options - Garbage Collection
  • -XX:+DisableExplicitGC
  • -XX:+UseParNewGC
  • -XX:+ParallelRefProcEnabled
  • -XX:+UseConcMarkSweepGC
  • -XX:+CMSClassUnloadingEnabled
  • -XX:+CMSParallelRemarkEnabled
  • -XX:+CMSPermGenSweepingEnabled
  • -XX:+CMSScavengeBeforeRemark
  • -XX:CMSMarkStackSize=8M
  • -XX:CMSMarkStackSizeMax=8M
  • -XX:CMSInitiatingOccupancyFraction=60
  • -XX:MaxTenuringThreshold=0
  • -XX:ParallelGCThreads=16
  • -XX:ParallelCMSThreads=16
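
As a small worked example of the initiating occupancy setting above, the sketch below computes the old generation occupancy at which a concurrent collection would be triggered. It assumes the 12g old generation from the memory sizing slide and treats the 60 percent fraction as a simple threshold; HotSpot may also start cycles based on its own heuristics, so this is an approximation.

```python
GIB = 2**30


def cms_trigger_bytes(old_gen_bytes: int, occupancy_percent: int) -> float:
    """Old generation occupancy (bytes) at which a CMS cycle would start,
    treating the occupancy fraction as a plain percentage threshold."""
    return old_gen_bytes * occupancy_percent / 100


old_gen = 12 * GIB   # assumption: OldSize of 12g from the memory sizing slide
trigger = cms_trigger_bytes(old_gen, 60)

print(f"CMS cycle starts near {trigger / GIB:.1f} GiB of old generation use")
print(f"headroom left for promotion: {(old_gen - trigger) / GIB:.1f} GiB")
```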

64
JVM Options - Logging
  • -XX:+PrintVMOptions
  • -XX:+PrintCommandLineFlags
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -XX:+PrintGCTaskTimeStamps
  • -XX:+PrintGCApplicationStoppedTime
  • -XX:+PrintGCApplicationConcurrentTime
  • -XX:+PrintHeapAtGC
  • -XX:+PrintTenuringDistribution
  • -XX:PrintCMSStatistics=2
  • -Xloggc:/var/log/gc.<pid>.log
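
With the stopped-time logging option listed above (PrintGCApplicationStoppedTime), HotSpot writes one line per pause of the form "Total time for which application threads were stopped: N seconds". The sketch below pulls those pauses out of a log so they can be graphed or summarized. The sample lines are hypothetical, not taken from the presentation, and the function name is illustrative.

```python
import re

STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds"
)


def stopped_times(lines):
    """Yield each stop-the-world pause, in seconds, found in GC log lines."""
    for line in lines:
        match = STOPPED.search(line)
        if match:
            yield float(match.group(1))


# Hypothetical sample lines for illustration only.
sample = [
    "Application time: 1.2345678 seconds",
    "Total time for which application threads were stopped: 0.0123456 seconds",
    "Total time for which application threads were stopped: 0.2500000 seconds",
]

pauses = list(stopped_times(sample))
print(f"{len(pauses)} pauses, max {max(pauses):.3f}s, total {sum(pauses):.3f}s")
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.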

65
JVM Options - Monitoring
  • -Dcom.sun.management.snmp.interface=0.0.0.0
  • -Dcom.sun.management.snmp.port=8161
  • -Dcom.sun.management.snmp.acl=false
  • -Dcom.sun.management.jmxremote.authenticate=false
  • -Dcom.sun.management.jmxremote.port=9161
  • -Dcom.sun.management.jmxremote.ssl=false

66
Instrumentation
  • 1500 monitored parameters
  • 7 production JVMs
  • 52 server network interfaces
  • 13 server CPUs
  • 70 disk sub-systems
  • 144 gigabit switch ports
  • 32 FC switch ports
  • Load-balancers and firewalls

67
Best Indicators of System Stability
  • Operating System
    • CPU Usage
    • Swap Space Usage
    • TCP Connections
  • JVM
    • Thread Counts
    • Memory Usage Pattern
    • Garbage Collection Pattern

68
Tenured Generation Usage
69
Eden Space Usage
70
The fork()/exec() Problem
71
The fork()/exec() Problem (contd)
72
Problems Become Obvious
73
Peak Week Portal Connections
74
JVM Memory Allocation Rate
75
Peak Database Disk
76
Single Application Server: 450k hits/hour
77
Single Server Thread Counts
78
Single Server Stop-the-World Times
79
Single Server Memory Usage
80
Midterm Marks Peak
81
Useful URLs
  • http://www.cacti.net/
  • http://research.sun.com/techrep/2000/smli_tr-2000-88.pdf
  • http://research.sun.com/jtech/pubs/04-g1-paper-ismm.pdf
  • http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

82
Summary
  • Design to debug and build to last.
  • Know what to cut and where.
  • It's all about the cabling.
  • SANs fall harder than LANs.
  • Big, fast, or stable. Pick any two?
  • Graph everything. Always!

83
Question & Answer
  • John C. Calvin
  • Senior Systems Architect
  • University of Toronto
  • john.calvin_at_utoronto.ca