Title: Automated Testing of Massively Multi-Player Games: Lessons Learned from The Sims Online

1. Automated Testing of Massively Multi-Player Games
Lessons Learned from The Sims Online
2. Context: What Is Automated Testing?
3. Classes Of Testing
- System Stress
- Feature Regression
- Load
- QA
- Developer
4. Automation Components
- Startup Control
- Repeatable, Synchronized Inputs
- Results Analysis

5. What Was Not Automated?
- Visual Effects
6. Talk Outline: Automated Testing
- Design & Initial Implementation (1/3)
  - Architecture, Scripting Tests, Test Client
  - Initial Results
- Fielding: Analysis & Adaptations (1/3)
- Wrap-up & Questions (1/3)
  - What worked best, what didn't
  - Tabula Rasa: MMP / SPG
Time: 60 minutes
7. Requirements
- Load Testing
- Regression Testing
- High Code Churn Rate
8. Design Constraints
- Load
- Regression
- Churn Rate
9. Single, Data-Driven Test Client
- Regression and Load both drive:
  - Reusable Scripts & Data
  - a Single API
  - one Test Client
10. Data-Driven Test Client
- Regression: testing feature correctness
- Load: testing system performance
- Both share reusable scripts & data, a single API, and one test client
- The single API exposes:
  - Key game states
  - Pass/fail & responsiveness
  - Configurable logs & metrics
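The single-API idea can be sketched as follows. This is a hypothetical illustration (the class and method names are invented, not from the TSO toolkit): one client surface serves both regression scripts, which check pass/fail on feature correctness, and load scripts, which only care about metrics.

```python
# Hypothetical sketch: one test-client API for both regression and load.
# All names (TestClient, run, game_state) are illustrative assumptions.
class TestClient:
    def __init__(self):
        self.state = "offline"
        self.metrics = {"commands": 0}   # configurable metrics, shared by load tests

    def run(self, command, *args):
        """Single entry point used by both regression and load scripts."""
        self.metrics["commands"] += 1
        if command == "login":
            self.state = "online"
        elif command == "enter_lot":
            self.state = "inlot"
        return self.state

    def game_state(self):
        """Key game state, probed by regression scripts for pass/fail."""
        return self.state

client = TestClient()
client.run("login")
client.run("enter_lot")
print(client.game_state(), client.metrics["commands"])  # inlot 2
```

Regression reads `game_state()` for correctness; load reads `metrics` for throughput. The point of the design is that both reuse the same scripts and the same client.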
11. Problem: Testing Accuracy
- Load & Regression inputs must be:
  - Accurate
  - Repeatable
- Churn rate: logic/data in constant motion
- How to keep the test client accurate?
- Solution: the game client becomes the test client
  - Exact mimicry
  - Lower maintenance costs
12. Test Client = Game Client
13. Game Client: How Much To Keep?
Game Client layers: View → Presentation Layer → Logic
14. What Level To Test At?
- Drive the client at the View layer (raw mouse clicks)
- Regression: too brittle (a pixel shift breaks scripts)
- Load: too bulky
15. What Level To Test At?
- Drive the client with internal events, below the View
- Regression: still too brittle (churn rate vs. logic & data)
16. Gameplay Semantic Abstractions
Basic gameplay changes less frequently than UI or protocol implementations.
NullView Client: drop the View; keep the Presentation Layer and Logic.
17. Scriptable User Play Sessions
- SimScript
  - A collection of Presentation Layer primitives
  - Synchronization: wait_until, remote_command
  - State probes: arbitrary game state (an avatar's body skill, a lamp's on/off state, ...)
- Test scripts: specific, ordered inputs
  - Single-user play sessions
  - Multi-user play sessions
18. Scriptable User Play Sessions
- Scriptable play sessions were a big win
  - Load: tunable based on actual play
  - Regression: constantly repeat hundreds of play sessions, validating correctness
- Gameplay semantics were very stable
  - UI / protocols shifted constantly
  - Gameplay remained (about) the same
19. SimScript: Abstract User Actions
- include_script setup_for_test.txt
- enter_lot alpha_chimp
- wait_until game_state inlot
- chat "I'm an Alpha Chimp, in a Lot."
- log_message "Testing object purchase."
- log_objects
- buy_object chair 10 10
- log_objects
20. SimScript: Control & Sync
- Have a remote client use the chair:
  - remote_cmd monkey_bot use_object chair sit
- set_data avatar reading_skill 80
- set_data book unlock
- use_object book read
- wait_until avatar reading_skill 100
- set_recording on
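As a rough illustration of what a synchronization primitive like `wait_until` must do under the hood, here is a minimal polling sketch in Python. The mechanism, names, and timeout policy are assumptions for illustration, not TSO's actual implementation:

```python
import time

class ScriptTimeout(Exception):
    """Raised when a scripted wait never reaches its target state."""

def wait_until(probe, expected, timeout_s=30.0, poll_s=0.5):
    """Block the script until probe() == expected, or raise on timeout.
    'probe' stands in for a SimScript state probe (e.g. a skill value)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe() == expected:
            return
        time.sleep(poll_s)
    raise ScriptTimeout(f"state never reached {expected!r}")

# Usage with a fake probe that reaches the target after a couple of polls,
# mimicking 'wait_until avatar reading_skill 100':
state = {"reading_skill": 80}
def probe():
    state["reading_skill"] = min(100, state["reading_skill"] + 10)
    return state["reading_skill"]

wait_until(probe, 100, timeout_s=5.0, poll_s=0.01)
print(state["reading_skill"])  # 100
```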
21. Client Implementation
22. Composable Client
- Scripts / Cheat Console / GUI drive the Presentation Layer, which drives the Game Logic.
23. Composable Client
- One instance may load Console / Lurker / GUI; another Scripts / Console / GUI.
- Any / all components may be loaded per instance.
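A minimal sketch of per-instance composition, assuming a simple component registry. The registry mechanism is invented for illustration; only the component names (Scripts, Console, GUI, Lurker) come from the slide:

```python
# Hypothetical component registry: each client instance loads only the
# drivers its role needs. Values stand in for real component objects.
AVAILABLE = {
    "scripts": lambda: "script driver",
    "console": lambda: "cheat console",
    "gui":     lambda: "full GUI view",
    "lurker":  lambda: "read-only observer",
}

def build_client(component_names):
    """Instantiate only the requested components for this instance."""
    return {name: AVAILABLE[name]() for name in component_names}

load_client = build_client(["scripts", "console"])         # headless load client
dev_client  = build_client(["scripts", "console", "gui"])  # developer client
print(sorted(load_client))  # ['console', 'scripts']
```

The payoff is that load clients skip the GUI entirely, while developers run the same client with the view attached.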
24. Lesson: View & Logic Entangled
- In the original game client, View and Logic were tightly interwoven.
25. Few Clean Separation Points
- Boundaries between View, Presentation Layer, and Logic were blurred.
26. Solution: Refactored for Isolation
- View, Presentation Layer, and Logic split along clean interfaces.
27. Lesson: NullView Debugging
Without the (legacy) view system attached, tracing was difficult.
28. Solution: Embedded Diagnostics
- Timeout handlers
- Diagnostics embedded throughout the Presentation Layer and Logic
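One way such embedded diagnostics can work, sketched with invented names: each subsystem registers a state probe, and the timeout handler dumps them all instead of letting a headless client hang silently. This is an assumed design for illustration, not the TSO code:

```python
# Hypothetical embedded-diagnostics registry for a NullView client.
diagnostics = {}

def register_diagnostic(fn):
    """Subsystems register a callback that reports their internal state."""
    diagnostics[fn.__name__] = fn
    return fn

def on_timeout(step_name):
    """Timeout handler: collect a state report from every subsystem."""
    report = {name: fn() for name, fn in sorted(diagnostics.items())}
    return f"TIMEOUT in {step_name}: {report}"

@register_diagnostic
def logic_state():
    return "inlot"   # stub; a real client would probe game logic

@register_diagnostic
def presentation_queue_depth():
    return 3         # stub; depth of pending internal events

print(on_timeout("buy_object"))
```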
29. Talk Outline: Automated Testing
- Design & Initial Implementation (1/3)
  - Architecture & Design
  - Test Client
  - Initial Results
- Lessons Learned: Fielding (1/3)
- Wrap-up & Questions (1/3)
Time: 60 minutes
30. Mean Time Between Failure
- Random events: log & execute
- Record client lifetime / RAM
- Worked, just not relevant in early stages of development
- Most failures / leaks found were not high-priority at that time, when weighed against server crashes
31. Monkey Tests
- Constant repetition of simple, isolated actions against servers
- Very useful
  - Direct observation of servers while under constant, simple input
  - Server processes aged all day
- Examples:
  - Login / Logout
  - Enter House / Leave House
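A monkey test of the login/logout kind can be sketched as a simple aggregate-counting loop. The client API here is hypothetical; the structural point is that a failure never stops the run, since aggregate counts are the product:

```python
# Sketch of a monkey test: hammer one isolated action pair against a
# server all day and count outcomes. The client API is an assumption.
def monkey_login_logout(client, iterations):
    passes = failures = 0
    for _ in range(iterations):
        try:
            client.login("monkey_bot")
            client.logout()
            passes += 1
        except Exception:
            failures += 1   # keep going: only aggregate results matter
    return passes, failures

class FakeClient:
    """Stand-in for a real NullView client connection."""
    def login(self, who): pass
    def logout(self): pass

print(monkey_login_logout(FakeClient(), 400))  # (400, 0)
```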
32. QA Test Suite Regression
- High false-positive rate = high maintenance
  - New bugs / old bugs
  - Shifting game design
  - Unknown failures
- Not helping in day-to-day work.
33. Talk Outline: Automated Testing
- Design & Initial Implementation (1/4)
- Fielding: Analysis & Adaptations (1/2)
  - Non-Determinism
  - Maintenance Overhead
  - Solutions & Results
  - Monkey / Sniff / Load / Harness
- Wrap-up & Questions (1/4)
Time: 60 minutes
34. Analysis: Testing Isolated Features
35. Analysis: Critical Path
Test case: can an Avatar sit in a chair?
Failures on the Critical Path block access to much of the game:
login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
36. Solution: Monkey Tests
- Primitives placed in Monkey Tests
  - Isolate as much as possible, repeat 400x
  - Report only aggregate results
  - Create Avatar: 93% pass (375 of 400)
- Poor Man's Unit Test
  - Feature-based, not class-based
  - Limited isolation
  - Easy failure analysis / reporting
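The aggregate report line on this slide follows a simple format; a small helper reproduces it. Rounding the percentage down matches 375/400 being reported as 93%:

```python
# Formatting helper for the slide's aggregate monkey-test report line.
def summarize(feature, passed, total):
    pct = passed * 100 // total   # round down to a whole percent
    return f"{feature}: {pct}% pass ({passed} of {total})"

print(summarize("Create Avatar", 375, 400))
# Create Avatar: 93% pass (375 of 400)
```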
37. Talk Outline: Automated Testing
- Design & Initial Implementation (1/3)
- Lessons Learned: Fielding (1/3)
  - Non-Determinism
  - Maintenance Costs
  - Solution Approaches
  - Monkey / Sniff / Load / Harness
- Wrap-up & Questions (1/3)
Time: 60 minutes
38. Analysis: Maintenance Cost
- High defect rate in game code
  - Code coupling: side effects
  - Churn rate: frequent changes
  - Critical path: fatal dependencies
- High debugging cost
  - Non-deterministic, distributed logic
39. Turnaround Time
Tests were too far removed from the introduction of defects.
40. Critical Path Defects Were Very Costly
41. Solution: Sniff Test
42. Solution: Hourly Diagnostics
- SniffTest Stability Checker
  - Emulates a developer: every hour, sync / build / test
  - Critical Path monkeys ran non-stop
  - Constant baseline
- Traffic Generation
  - Keep the pipes full, servers aging
  - Keep the DB growing
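The hourly "emulated developer" cycle can be sketched as follows. The concrete commands (p4 sync, make, run_monkeys) are illustrative assumptions, not the actual TSO tool names:

```python
# Sketch of one sniff-test cycle: sync the tree, build, run the
# critical-path monkeys, report the first failing step.
import subprocess

STEPS = [["p4", "sync"], ["make", "all"], ["run_monkeys", "--critical-path"]]

def sniff_once(runner=subprocess.run):
    """One sync/build/test cycle; returns 'PASS' or the failing step."""
    for cmd in STEPS:
        if runner(cmd).returncode != 0:
            return "FAIL at " + " ".join(cmd)
    return "PASS"

# The real checker would loop: run sniff_once() every hour and publish
# the result as the team's stability baseline.
```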
43. Analysis: CONSTANT SHOUTING IS REALLY IRRITATING
- Bugs spawned many, many emails
- Solution: Report Managers
  - Aggregate / correlate across tests
  - Filter known defects
  - Translate common failure reports to their root causes
- Solution: Data Managers
- Information overload: automated workflow tools are mandatory
44. ToolKit Usability
- Workflow automation
- Information management
- Developer / tester push-button ease of use
- XP flavour: increasingly easy to run tests
  - Must be easier to run than to avoid running
  - Must solve problems on the ground, now
45. Sample Testing Harness Views
46. Load Testing Goals
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that response is playable at scale
- Emulate user behaviour
  - Use server-side metrics to tune test scripts against observed Beta behaviour
- Run full-scale load tests daily
47. Load Testing Data Flow
[Diagram] The Load Testing Team operates a Load Control Rig, which fans tests out across multiple Test Driver CPUs, each hosting many test clients. The clients send game traffic to the Server Cluster; internal system probes and monitors feed client metrics, resource data, and debugging data back to the team.
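The fan-out across test driver CPUs is at bottom a partitioning problem; here is a sketch. The driver count is illustrative, though the talk does report runs of up to 4,000 clients:

```python
# Sketch: a load control rig splitting a target client count as evenly
# as possible across test-driver machines.
def partition(total_clients, drivers):
    base, extra = divmod(total_clients, drivers)
    return [base + (1 if i < extra else 0) for i in range(drivers)]

print(partition(4000, 3))  # [1334, 1333, 1333]
```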
48. Load Testing: Lessons Learned
- Very successful
  - Scale-break testing up to 4,000 clients
- Some conflicting requirements w/ Regression
  - Continue-on-fail
  - Transaction tracking
- NullView client a little chunky
49. Current Work
- QA test suite automation
- Workflow tools
- Integrating testing into the new-feature design/development process
- Planned work
  - Extend Esper Toolkit for general use
  - Port to other Maxis projects
50. Talk Outline: Automated Testing
- Design & Initial Implementation (1/3)
- Lessons Learned: Fielding (1/3)
- Wrap-up & Questions (1/3)
  - Biggest Wins / Losses
  - Reuse
  - Tabula Rasa: MMP / SSP
Time: 60 minutes
51. Biggest Wins
- Presentation Layer abstraction
  - NullView client
  - Scripted play sessions: powerful for regression & load
- Pre-checkin SniffTest
- Load Testing
- Continual usability enhancements
- Team
  - Upper management commitment
  - Focused group, senior developers
52. Biggest Issues
- Order of testing
  - MTBF / QA test suites should have come last
  - Not relevant when the early game was too unstable
  - Find/fix lag too distant from development
- Changing TSO's development process
  - Tool adoption was slow, unless mandated
- Noise
  - Constant flood of test results
  - Number of game defects, testing defects
  - Non-determinism / false positives
53. Tabula Rasa
How would I start the next project?
54. Tabula Rasa
- Pre-checkin Sniff Test
  - There's just no reason to let code break.
55. Tabula Rasa
- Pre-checkin SniffTest: keep Mainline working
- Hourly Monkey Tests
  - Useful baseline; keeps servers aging.
56. Tabula Rasa
- Pre-checkin SniffTest: keep Mainline working
- Hourly Stability Checkers: baseline for developers
- Dedicated Tools Group
  - Continual usability enhancements adapted tools to meet on-the-ground conditions.
57. Tabula Rasa
- Pre-checkin SniffTest: keep Mainline working
- Hourly Stability Checkers: baseline for developers
- Dedicated Tools Group: easy to use = used
- Executive-Level Support
  - Mandates required to shift how entire teams operated.
58. Tabula Rasa
- Pre-checkin SniffTest: keep Mainline working
- Hourly Stability Checkers: baseline for developers
- Dedicated Tools Group: easy to use = used
- Executive Support: radical shifts in process
- Load Test Early & Often
59. Tabula Rasa
- Pre-checkin SniffTest: keep Mainline working
- Hourly Stability Checkers: baseline for developers
- Dedicated Tools Group: easy to use = used
- Executive Support: radical shifts in process
- Load Test Early & Often: break it before Live
- Distribute test development ownership across the full team
60. Next Project: Basic Infrastructure
- Control Harness for Clients & Components
- Reference Client
  - Self Test
  - Living Doc
- Reference Feature
- Regression Engine
61. Building Features: NullView First
- Control Harness
- Reference Client (Self Test / Living Doc)
- Reference Feature
- Regression Engine
- NullView Client
62. Build The Tests With The Code
- Control Harness
- Reference Client (Self Test)
- Reference Feature
- Regression Engine
- NullView Client
- Login Monkey Test
Nothing gets checked in without a working Monkey Test.
63. Conclusion
- Estimated impact on MMP: high
  - Sniff Test kept developers working
  - Load Test identified critical failures pre-launch
  - Presentation Layer: scriptable play sessions
- Cost to implement: medium
  - Much lower for SSP games
Repeatable, coordinated inputs at scale and pre-checkin regression were very significant schedule accelerators.
64. Conclusion
Go For It.
65. Talk Outline: Automated Testing
- Design & Initial Implementation (1/3)
- Lessons Learned: Fielding (1/3)
- Wrap-up & Questions (1/3)
Time: 60 minutes