1
Automated Testing of Massively Multi-Player Games
Lessons Learned from The Sims Online
  • Larry Mellon
  • Spring 2003

2
Context: What Is Automated Testing?
3
Classes Of Testing
System Stress
Feature Regression
Load
QA
Developer
4
Automation Components
5
What Was Not Automated?
Startup Control
Repeatable, Synchronized Inputs
Results Analysis
Visual Effects
6
Lessons Learned: Automated Testing
  • Design & Initial Implementation: Architecture, Scripting Tests, Test Client; Initial Results (1/3 of time)
  • Fielding: Analysis & Adaptations (1/3)
  • Wrap-up & Questions: what worked best, what didn't; Tabula Rasa; MMP / SPG (1/3)

Time: 60 Minutes
7
Requirements
  • Load Testing
  • Regression Testing
  • High Code Churn Rate
8
Design Constraints
Load
Regression
Churn Rate
9
Single, Data-Driven Test Client
[Diagram: Regression and Load both drive one Test Client through a Single API, sharing Reusable Scripts & Data]
10
Data-Driven Test Client
[Diagram: one Test Client serves both feature-correctness testing (Regression) and system-performance testing (Load); its Single API exposes Key Game States, Pass/Fail, Responsiveness, and Configurable Logs & Metrics, driven by Reusable Scripts & Data]
11
Problem: Testing Accuracy
  • Load & Regression inputs must be accurate and repeatable
  • Churn rate: logic and data are in constant motion
  • How to keep the test client accurate?
  • Solution: the game client becomes the test client
  • Exact mimicry
  • Lower maintenance costs

12
Test Client = Game Client
13
Game Client: How Much To Keep?
Game Client
View
Presentation Layer
Logic
14
What Level To Test At?
Game Client
View
Mouse Clicks
Presentation Layer
Logic
Regression: too brittle (pixel shift). Load: too bulky.
15
What Level To Test At?
Game Client
View
Internal Events
Presentation Layer
Logic
Regression: too brittle (churn rate vs. logic & data).
16
Gameplay Semantic Abstractions
Basic gameplay changes less frequently than UI or
protocol implementations.
[Diagram: NullView Client: the View (about ¾ of the client) is removed; the Presentation Layer and Logic (about ¼) are kept]
17
Scriptable User Play Sessions
  • SimScript
  • Collection of Presentation Layer primitives
  • Synchronization: wait_until, remote_command
  • State probes: arbitrary game state (avatar's body skill, lamp on/off, ...)
  • Test Scripts: specific, ordered inputs
  • Single-user play session
  • Multiple-user play session

18
Scriptable User Play Sessions
  • Scriptable play sessions: big win
  • Load: tunable based on actual play
  • Regression: constantly repeat hundreds of play sessions, validating correctness
  • Gameplay semantics: very stable
  • UI / protocols shifted constantly
  • Game play remained (about) the same

19
SimScript: Abstract User Actions
  • include_script setup_for_test.txt
  • enter_lot alpha_chimp
  • wait_until game_state inlot
  • chat I'm an Alpha Chimp, in a Lot.
  • log_message Testing object purchase.
  • log_objects
  • buy_object chair 10 10
  • log_objects
  • ...

20
SimScript: Control & Sync
  • Have a remote client use the chair (a polling sketch of wait_until follows the script):
  • remote_cmd monkey_bot
  • use_object chair sit
  • ...
  • set_data avatar reading_skill 80
  • set_data book unlock
  • use_object book read
  • wait_until avatar reading_skill 100
  • set_recording on

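The deck treats wait_until as a given; here is a minimal sketch of how such a polling synchronization primitive could work, assuming a hypothetical zero-argument probe callable that reads the relevant piece of game state:

    import time

    def wait_until(probe, expected, timeout=30.0, poll_interval=0.5):
        # Block until probe() returns the expected value, or raise on timeout.
        # probe is a hypothetical accessor, e.g. lambda: client.game_state.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if probe() == expected:
                return
            time.sleep(poll_interval)
        raise TimeoutError(f"state never reached {expected!r} within {timeout}s")

    # Mirrors the script above: wait_until(lambda: avatar.reading_skill, 100)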
21
Client Implementation
22
Composable Client
- Scripts - Cheat Console - GUI
Presentation Layer
Game Logic
23
Composable Client
- Console - Lurker - GUI
- Scripts - Console - GUI
Presentation Layer
Game Logic
Any or all components may be loaded per instance (sketch below).
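The composition idea lends itself to a registry of optional components; a minimal sketch, assuming hypothetical component classes (the game logic and presentation layer are always present):

    class Component:
        def attach(self, client):
            pass  # hook a component onto the running client

    class ScriptRunner(Component): pass   # drives SimScript files
    class CheatConsole(Component): pass   # interactive commands
    class Gui(Component): pass            # full view, for developers
    class Lurker(Component): pass         # read-only observer

    REGISTRY = {"scripts": ScriptRunner, "console": CheatConsole,
                "gui": Gui, "lurker": Lurker}

    class Client:
        def __init__(self, component_names):
            # Only the named components get loaded into this instance.
            self.components = [REGISTRY[n]() for n in component_names]
            for c in self.components:
                c.attach(self)

    test_client = Client(["scripts", "console"])         # NullView test client
    dev_client = Client(["scripts", "console", "gui"])   # developer build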
24
Lesson: View & Logic Entangled
Game Client
View
Logic
25
Few Clean Separation Points
Game Client
View
Presentation Layer
Logic
26
Solution: Refactored for Isolation
Game Client
View
Presentation Layer
Logic
27
Lesson: NullView Debugging
Without the (legacy) view system attached, tracing was difficult.
Presentation Layer
Logic
28
Solution: Embedded Diagnostics
[Diagram: timeout handlers trigger diagnostics embedded in both the Presentation Layer and the Logic]
A sketch of this pattern follows.
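The slide names the pattern without showing it; here is a minimal sketch of embedded diagnostics, assuming each layer registers a dump callback that a timeout handler invokes, so a NullView run can explain where it is stuck:

    diagnostic_dumps = []

    def register_diagnostics(name, dump_fn):
        diagnostic_dumps.append((name, dump_fn))

    def on_timeout(op_name):
        # Fired by a timeout handler such as the wait_until sketch above.
        print(f"TIMEOUT in {op_name}; embedded diagnostics:")
        for name, dump in diagnostic_dumps:
            print(f"  [{name}] {dump()}")

    # Placeholder state dumps; a real client would report queue depths,
    # outstanding requests, the last server message, and so on.
    register_diagnostics("presentation", lambda: {"pending_requests": 3})
    register_diagnostics("logic", lambda: {"sim_tick": 1042})
    on_timeout("enter_lot")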
29
Talk Outline: Automated Testing
  • Design & Initial Implementation: Architecture & Design, Test Client, Initial Results (1/3 of time)
  • Lessons Learned: Fielding (1/3)
  • Wrap-up & Questions (1/3)

Time: 60 Minutes
30
Mean Time Between Failure
  • Random events: log & execute
  • Record client lifetime / RAM
  • Worked, just not relevant in early stages of development
  • Most failures / leaks found were not high-priority at that time, when weighed against server crashes

31
Monkey Tests
  • Constant repetition of simple, isolated actions against servers (loop sketch below)
  • Very useful
  • Direct observation of servers while under constant, simple input
  • Server processes aged all day
  • Examples:
  • Login / Logout
  • Enter House / Leave House

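A minimal sketch of such a monkey loop, assuming hypothetical login()/logout() wrappers around the presentation-layer primitives:

    import itertools
    import time

    def login_logout_monkey(client, report_every=100):
        passes = failures = 0
        for i in itertools.count(1):
            try:
                client.login()
                client.logout()
                passes += 1
            except Exception:
                failures += 1  # keep going: the point is constant input
            if i % report_every == 0:
                print(f"login/logout: {passes} pass, {failures} fail")
            time.sleep(1.0)  # steady, simple traffic that ages the servers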
32
QA Test Suite: Regression
  • High false positive rate = high maintenance
  • New bugs / old bugs
  • Shifting game design
  • Unknown failures

Not helping in day-to-day work.
33
Talk Outline: Automated Testing
  • Design & Initial Implementation (1/4 of time)
  • Fielding: Analysis & Adaptations: Non-Determinism, Maintenance Overhead, Solutions & Results (Monkey / Sniff / Load / Harness) (1/2)
  • Wrap-up & Questions (1/4)

Time: 60 Minutes
34
Analysis: Testing Isolated Features
35
Analysis: Critical Path
Test Case: Can an Avatar Sit in a Chair?
Failures on the Critical Path block access to much of the game.
Critical path: login() → create_avatar() → buy_house() → enter_house() → buy_object() → use_object()
A sketch of this blocking behaviour follows.
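A minimal sketch of why this matters, with stand-in step functions: the first failing primitive blocks every test behind it:

    def run_critical_path(step_fns):
        # step_fns: ordered mapping of name -> callable returning True on pass.
        names = list(step_fns)
        for i, name in enumerate(names):
            if not step_fns[name]():
                print(f"FAIL at {name}; blocked: {', '.join(names[i + 1:])}")
                return False
        return True

    # Demo: enter_house fails, so buy_object and use_object never run.
    run_critical_path({
        "login": lambda: True,
        "create_avatar": lambda: True,
        "buy_house": lambda: True,
        "enter_house": lambda: False,
        "buy_object": lambda: True,
        "use_object": lambda: True,
    })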
36
Solution: Monkey Tests
  • Primitives placed in Monkey Tests
  • Isolate as much as possible, repeat 400x
  • Report only aggregate results (reporting sketch below)
  • Create Avatar: 93% pass (375 of 400)
  • Poor Man's Unit Test
  • Feature-based, not class-based
  • Limited isolation
  • Easy failure analysis / reporting

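A minimal sketch of the aggregate reporting style quoted above, assuming a hypothetical action callable for the feature under test:

    import random

    def monkey_report(name, action, runs=400):
        passed = 0
        for _ in range(runs):
            try:
                action()
                passed += 1
            except Exception:
                pass  # individual failures only show up in the aggregate
        print(f"{name}: {100 * passed // runs}% pass ({passed} of {runs})")

    def flaky_create_avatar():
        if random.random() < 0.07:  # stand-in failure rate for the demo
            raise RuntimeError("create failed")

    monkey_report("Create Avatar", flaky_create_avatar)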
37
Talk Outline: Automated Testing
  • Design & Initial Implementation (1/3 of time)
  • Lessons Learned: Fielding: Non-Determinism, Maintenance Costs, Solution Approaches (Monkey / Sniff / Load / Harness) (1/3)
  • Wrap-up & Questions (1/3)

Time: 60 Minutes
38
Analysis: Maintenance Cost
  • High defect rate in game code
  • Code coupling: side effects
  • Churn rate: frequent changes
  • Critical path: fatal dependencies
  • High debugging cost
  • Non-deterministic, distributed logic

39
Turnaround Time
Tests were too far removed from the introduction of defects.
40
Critical Path Defects Were Very Costly
41
Solution: Sniff Test
42
Solution: Hourly Diagnostics
  • SniffTest Stability Checker
  • Emulates a developer: every hour, sync / build / test (loop sketch below)
  • Critical Path monkeys ran non-stop
  • Constant baseline
  • Traffic Generation
  • Keep the pipes full; keep the servers aging
  • Keep the DB growing

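A minimal sketch of the hourly checker, with placeholder sync/build/test commands standing in for whatever source control and build system the project actually uses:

    import subprocess
    import time

    STEPS = [
        ["p4", "sync"],                                   # placeholder sync
        ["make", "client"],                               # placeholder build
        ["testclient", "--script", "critical_path.txt"],  # placeholder test run
    ]

    def run_step(cmd):
        try:
            return subprocess.run(cmd).returncode == 0
        except FileNotFoundError:
            print(f"(placeholder command not present: {cmd[0]})")
            return False

    def sniff_once():
        for cmd in STEPS:
            if not run_step(cmd):
                print(f"SNIFF FAIL at: {' '.join(cmd)}")
                return False
        print("sniff pass: mainline still builds and plays")
        return True

    if __name__ == "__main__":
        while True:
            sniff_once()
            time.sleep(3600)  # constant hourly baseline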
43
Analysis: CONSTANT SHOUTING IS REALLY IRRITATING
  • Bugs spawned many, many emails
  • Solution: Report Managers (sketch below)
  • Aggregate / correlate across tests
  • Filter known defects
  • Translate common failure reports to their root causes
  • Solution: Data Managers
  • Information overload: automated workflow tools are mandatory

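A minimal sketch of a report manager along those lines (the deck states the responsibilities, not the implementation); the defect list and root-cause table here are invented for illustration:

    from collections import Counter

    KNOWN_DEFECTS = {"DB timeout on lot 99"}  # already filed; filter out
    ROOT_CAUSES = {"socket reset during login": "login server restart"}

    def summarize(failure_signatures):
        fresh = [f for f in failure_signatures if f not in KNOWN_DEFECTS]
        for signature, count in Counter(fresh).most_common():
            cause = ROOT_CAUSES.get(signature, "unknown")
            print(f"{count:4d}x {signature} (root cause: {cause})")

    # 340 raw failures collapse to one correlated line instead of 340 emails.
    summarize(["socket reset during login"] * 40 + ["DB timeout on lot 99"] * 300)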
44
Toolkit Usability
  • Workflow automation
  • Information management
  • Developer / tester push-button ease of use
  • XP flavour: increasingly easy to run tests
  • Must be easier to run the tests than to avoid running them
  • Must solve problems on the ground, now

45
Sample Testing Harness Views
46
Load Testing Goals
  • Expose issues that only occur at scale
  • Establish hardware requirements
  • Establish that response is playable at scale
  • Emulate user behaviour
  • Use server-side metrics to tune test scripts against observed Beta behaviour (sketch below)
  • Run full-scale load tests daily

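A minimal sketch of tuning a load script against observed behaviour: server-side metrics give each user action an observed frequency, and each load client samples its next action from that mix (the weights here are invented):

    import random

    ACTION_MIX = {"chat": 0.40, "use_object": 0.30, "enter_house": 0.15,
                  "buy_object": 0.10, "idle": 0.05}

    def next_action():
        actions, weights = zip(*ACTION_MIX.items())
        return random.choices(actions, weights=weights)[0]

    print([next_action() for _ in range(10)])  # one client's next ten actions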
47
Load Testing Data Flow
[Diagram: a Load Control Rig drives many Test Clients spread across Test Driver CPUs; the clients send game traffic to the Server Cluster; internal system probes and monitors, plus client metrics, feed resource and debugging data back to the Load Testing Team]
A fan-out sketch follows.
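A minimal sketch of the rig's fan-out on a single driver machine, with multiprocessing standing in for the many test-driver CPUs and a hypothetical run_test_client entry point:

    from multiprocessing import Process

    def run_test_client(client_id, script="play_session.txt"):
        # Placeholder: a real client would play the scripted session and
        # report its metrics back to the load control rig.
        print(f"client {client_id} running {script}")

    def spawn_clients(count):
        procs = [Process(target=run_test_client, args=(i,)) for i in range(count)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

    if __name__ == "__main__":
        spawn_clients(8)  # the real rig drove thousands across many CPUs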
48
Load Testing Lessons Learned
  • Very successful
  • Scale break: up to 4,000 clients
  • Some conflicting requirements w/ Regression
  • Continue-on-fail
  • Transaction tracking
  • NullView client a little chunky

49
Current Work
  • QA test suite automation
  • Workflow tools
  • Integrating testing into the design/development process for new features
  • Planned work:
  • Extend Esper Toolkit for general use
  • Port to other Maxis projects

50
Talk Outline: Automated Testing
  • Design & Initial Implementation (1/3 of time)
  • Lessons Learned: Fielding (1/3)
  • Wrap-up & Questions: Biggest Wins / Losses, Reuse, Tabula Rasa, MMP & SSP (1/3)

Time: 60 Minutes
51
Biggest Wins
  • Presentation Layer Abstraction
  • NullView client
  • Scripted play sessions: powerful for regression & load
  • Pre-Checkin Sniff Test
  • Load Testing
  • Continual Usability Enhancements
  • Team
  • Upper management commitment
  • Focused group of senior developers

52
Biggest Issues
  • Order of testing
  • MTBF / QA test suites should have come last
  • Not relevant when the early game is too unstable
  • Find / fix lag: too distant from development
  • Changing TSO's development process
  • Tool adoption was slow, unless mandated
  • Noise
  • Constant flood of test results
  • Number of game defects, testing defects
  • Non-determinism / false positives

53
Tabula Rasa
How Would I Start The Next Project?
54
Tabula Rasa
PreCheckin Sniff Test
There's just no reason to let code break.
55
Tabula Rasa
PreCheckin SniffTest
Keep Mainline working
Hourly Monkey Tests
A useful baseline; keeps the servers aging.
56
Tabula Rasa
PreCheckin SniffTest
Keep Mainline working
Hourly Stability Checkers
Baseline for Developers
Dedicated Tools Group
Continual usability enhancements adapted the tools to meet on-the-ground conditions.
57
Tabula Rasa
PreCheckin SniffTest
Keep Mainline working
Hourly Stability Checkers
Baseline for Developers
Dedicated Tools Group
Easy to Use = Used
Executive Level Support
Mandates required to shift how entire teams
operated.
58
Tabula Rasa
PreCheckin SniffTest
Keep Mainline working
Hourly Stability Checkers
Baseline for Developers
Easy to Use = Used
Dedicated Tools Group
Executive Support
Radical Shifts in Process
Load Test Early & Often
59
Tabula Rasa
PreCheckin SniffTest
Keep Mainline working
Hourly Stability Checkers
Baseline for Developers
Easy to Use = Used
Dedicated Tools Group
Executive Support
Radical shifts in Process
Load Test Early & Often
Break it before Live
Distribute test development & ownership across the full team.
60
Next Project: Basic Infrastructure
Control Harness For Clients & Components
Reference Client
Self Test
Living Doc
Reference Feature
Regression Engine
61
Building Features: NullView First
Control Harness
Reference Client
Self Test
Reference Feature
Living Doc
NullView Client
Regression Engine
62
Build The Tests With The Code
Control Harness
Self Test
Reference Client
Reference Feature
Regression Engine
NullView Client
Login
Monkey Test
Nothing Gets Checked In Without A Working Monkey
Test.
63
Conclusion
  • Estimated impact on MMP: High
  • Sniff Test kept developers working
  • Load Test ID'd critical failures pre-launch
  • Presentation Layer: scriptable play sessions
  • Cost to implement: Medium
  • Much lower for SSP games

Repeatable, coordinated inputs at scale and pre-checkin regression were very significant schedule accelerators.
64
Conclusion
Go For It
65
Talk Outline: Automated Testing
  • Design & Initial Implementation (1/3 of time)
  • Lessons Learned: Fielding (1/3)
  • Wrap-up & Questions (1/3)

Time: 60 Minutes