1
Automated Testing: Better, Cheaper, Faster, For Everything
  • Larry Mellon, Steve Keller
  • Austin Game Conference
  • Sept, 2004

2
About This Talk
  • Highly visual slides are often followed by a Key
    Points text slide that provides additional
    details. For smoother flow, such slides are
    hidden in presentation mode.
  • Some animations are not compatible with older
    versions of PowerPoint.

3
What Is An MMP Automated Testing System?
  • Push-button ability to run large-scale,
    repeatable tests
  • Cost
    • Hardware / Software
    • Human resources
    • Process changes
  • Benefit
    • Accurate, repeatable, measurable tests during
      development and operations
    • Stable software, faster, measurable progress
    • Base key decisions on fact, not opinion

4
Key Points
  • Comfort and confidence level
    • Managers / Producers can easily judge how
      development is progressing
    • Just like bug count reports, test reports
      indicate the overall quality of the current
      state of the game
    • Frequent, repeatable tests show progress and
      backsliding
    • Investing developers in the test process helps
      prevent QA vs. Development shouting matches
    • Smart developers like numbers and metrics just
      as much as producers do
  • Making your goals: you will ship cheaper,
    better, sooner
    • Cheaper: even though initial costs may be
      higher, issues get exposed when it's cheaper
      to fix them (and developer efficiency
      increases)
    • Better: robust code
    • Sooner: "it's OK to ship now" is based on real
      data, not supposition

5
MMP Requires A Strong Commitment To Testing
  • System complexity, non-determinism, scale
  • Tests provide hard data in a confusing sea of
    possibilities
  • Increase comfort and confidence of the entire
    team
  • Tools augment your team's ability to do their
    jobs
  • Find problems faster
  • Measure / change / measure: repeat as necessary
  • Production / exec teams come to depend on this
    data to a high degree

6
How To Get There
  • Plan for testing early
  • Non-trivial system
  • Architectural implications
  • Make sure the entire team is on board
  • Be willing to devote time and money

7
Automation Architecture
(Architecture diagram, summarized)
  • Test Manager: test selection / setup, control
    of N clients, real-time probes
  • Scripted test clients: emulated user play
    sessions, multi-client synchronization
  • Repeatable, synced test inputs driving the
    systems under test
  • Report Managers: raw data collection,
    aggregation / summarization, alarm triggers
  • Startup control, collection, analysis
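A minimal sketch of this architecture in Python (hypothetical names; a real system would drive actual game client processes over the network): a test manager starts N scripted clients from one script, collects their raw results, aggregates a summary, and fires an alarm trigger when the pass rate drops.

```python
# Sketch only: test manager + scripted clients + report aggregation + alarm.
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def scripted_client(client_id, script):
    """Emulated play session: run each scripted step, record pass/fail."""
    steps = [{"step": s["name"], "ok": s["action"](client_id)} for s in script]
    return {"client": client_id, "steps": steps}

class TestManager:
    """Test selection/setup, control of N clients, collection, aggregation."""
    def __init__(self, n_clients, alarm_threshold=0.95):
        self.n_clients = n_clients
        self.alarm_threshold = alarm_threshold

    def run(self, script):
        with ThreadPoolExecutor(max_workers=self.n_clients) as pool:
            raw = list(pool.map(lambda i: scripted_client(i, script),
                                range(self.n_clients)))
        pass_rate = mean(all(s["ok"] for s in r["steps"]) for r in raw)
        if pass_rate < self.alarm_threshold:          # alarm trigger
            print(f"ALARM: pass rate {pass_rate:.0%} below threshold")
        return {"clients": self.n_clients, "pass_rate": pass_rate}, raw

if __name__ == "__main__":
    script = [{"name": "login",     "action": lambda cid: True},
              {"name": "enter_lot", "action": lambda cid: cid % 10 != 0}]
    summary, _ = TestManager(n_clients=50).run(script)
    print(summary)
```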
8
Key Points
  • Scriptable test clients
    • Lightweight subset of the shipping client
    • Instrumented: spits out lots of useful
      information
    • Repeatable
  • Bots help you understand the test results
  • Log both server and client output (common
    format), with timestamps! (see the sketch below)
  • Automated metrics collection & aggregation
  • High-level, at-a-glance reports with detail
    drill-down
  • Push-button application for both running and
    analyzing a test
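As a small illustration of the common-format, timestamped logging point above (the field names are hypothetical), both server and client can emit one-line records that a report manager later merges and sorts by time:

```python
import json, time

def log_event(source, event, **fields):
    # One shared record format for client and server output, with a timestamp.
    record = {"ts": time.time(), "src": source, "event": event, **fields}
    print(json.dumps(record))   # in practice, appended to a central log store

log_event("server", "db_request", table="objects", bytes=18234)
log_event("client-07", "enter_lot", lot_id=412, ok=True)
```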

9
Outline
  • Overview: Automated Testing
    • Definition, Value, High-Level Approach
  • Applying Automated Testing
    • Mechanics, Applications
  • Process Shifts: Stability, Scale & Metrics
  • Implementation: Key Risks
  • Summary & Questions

10
Scripted Test Clients
  • Scripts are emulated play sessions, just like
    somebody playing the game
  • Command steps: what the player does to the game
  • Validation steps: what the game should do in
    response (see the sketch below)
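A minimal sketch of such a script (the client API names are hypothetical): command steps drive the game, validation steps assert the expected response.

```python
class FakeClient:
    """Stand-in for a scripted test client; tracks state locally for the demo."""
    def __init__(self):
        self.state = {}
    def login(self, account):
        self.state["connected"] = True
    def use_object(self, obj, interaction):
        self.state["posture"] = "sitting" if interaction == "sit" else "standing"
    def expect_state(self, key, value):
        assert self.state.get(key) == value, f"validation failed: {key}"

def chair_sit_script(client):
    client.login("test_account_01")            # command: what the player does
    client.expect_state("connected", True)     # validation: what the game should do
    client.use_object("chair_01", "sit")       # command
    client.expect_state("posture", "sitting")  # validation

chair_sit_script(FakeClient())
```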

11
Scripts Tailored To Each Test Application
  • Unit testing: 1 feature = 1 script
  • Load testing: representative play sessions
    • The average Joe, times thousands
  • Shipping quality: corner cases, feature
    completeness
  • Integration: test code changes for catastrophic
    failures

12
Bread Crumbs: Aggregated Instrumentation Flags Trouble Spots
Server Crash
13
Quickly Find Trouble Spots
DB byte count oscillates out of control
14
Drill Down For Details
A single DB Request is clearly at fault
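A hedged sketch of that aggregation and drill-down flow (the data and request names are made up): per-request instrumentation is summed per time bucket to flag the runaway metric, then the raw records in the flagged window point at the offending request.

```python
from collections import defaultdict

# Hypothetical raw instrumentation records: (minute, db_request, bytes)
records = [
    (1, "load_lot", 2_000), (1, "save_avatar", 1_000),
    (2, "load_lot", 2_100), (2, "fetch_inventory", 950_000),
    (3, "load_lot", 2_050), (3, "fetch_inventory", 1_400_000),
]

per_minute = defaultdict(int)
for minute, _, nbytes in records:
    per_minute[minute] += nbytes

# High-level report: flag minutes where the DB byte count blows up.
flagged = [m for m, total in per_minute.items() if total > 100_000]
print("flagged minutes:", flagged)

# Drill down: which DB request dominates each flagged window?
for m in flagged:
    worst = max((r for r in records if r[0] == m), key=lambda r: r[2])
    print(f"minute {m}: {worst[1]} accounts for {worst[2]} bytes")
```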
15
Process Shift: Applying Automation to Development
(Chart: earlier tools investment equals more gain;
late investment is labeled "not good enough")
16
Process Shifts: Automated Testing Can Change the
Shape of the Development Progress Curve
  • Stability: keep developers moving forward, not
    bailing water
  • Scale: focus developers on key, measurable
    roadblocks
17
Process Shift: Measurable Targets, Projected Trend Lines
(Chart: target completion of core functionality
tests, or any feature such as client counts,
plotted against time and projected out to any
milestone such as Alpha)
Actionable progress metrics, early enough to react
18
Stability Analysis: What Brings Down the Team?
Test case: can an avatar sit in a chair?
Failures on the critical path block access to much
of the game. Worse: unreliable failures.
Critical path: login() -> create_avatar() ->
buy_house() -> enter_house() -> buy_object() ->
use_object()
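A small sketch of such a critical-path check (the step functions are hypothetical): the test walks the chain in dependency order and reports where access to the rest of the game gets blocked.

```python
CRITICAL_PATH = ["login", "create_avatar", "buy_house",
                 "enter_house", "buy_object", "use_object"]

def run_critical_path(client):
    for i, step in enumerate(CRITICAL_PATH):
        if not getattr(client, step)():
            return {"passed": i, "blocked_at": step, "total": len(CRITICAL_PATH)}
    return {"passed": len(CRITICAL_PATH), "blocked_at": None,
            "total": len(CRITICAL_PATH)}

class FakeClient:
    # Demo stand-in: pretend every step works except buy_object.
    def __getattr__(self, name):
        return lambda: name != "buy_object"

print(run_critical_path(FakeClient()))   # reports blocked_at: buy_object
```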
19
Impact On Others
20
(No Transcript)
21
Key Points
  • Build stability slowed forward progress
    (especially on the critical path)
  • People were blocked from getting work done
  • Uncertainty: did I break that, or did it just
    happen?
  • A lot of developers just didn't get
    non-determinism
  • Backsliding: things kept breaking
  • Monkey tests: an always-current baseline for
    developers
  • A common measuring stick across builds &
    deployments is extremely valuable

22
Monkey Test: EnterLot
23
Non-Deterministic Failures
24
Key Points
  • 30 test runs, 4 behaviours:
    • Successful entry
    • Hang or crash
    • Owner evicted, all possessions stolen
  • Random results observed in all major features
  • Critical path: random failures outside of unit
    tests are very difficult to track

25
Stability Via Monkey Tests
Continual Repetition of Critical Path Unit Tests
26
Key Points
  • Hourly stability checkers (sketched below)
    • Aging (dirty processes, growing datasets,
      leaking memory)
    • Moving parts (race conditions)
  • Stability: measure what works, right now
  • Flares go off, etc.
  • Unit tests (against features)
    • Minimal noise / side effects
  • Reference point: what should work?
  • Clarity in reporting / triaging
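A minimal sketch of an hourly monkey-test pass (the registry and test bodies are hypothetical): the same reference tests are rerun on a schedule so aging and race conditions surface between builds.

```python
import time

MONKEY_TESTS = {}                 # name -> callable returning True/False

def monkey_test(fn):
    MONKEY_TESTS[fn.__name__] = fn
    return fn

@monkey_test
def enter_lot():
    return True                   # placeholder; a real test drives a scripted client

def run_stability_pass():
    results = {name: test() for name, test in MONKEY_TESTS.items()}
    broken = [name for name, ok in results.items() if not ok]
    print(f"{time.strftime('%H:%M')} stability: "
          f"{len(results) - len(broken)}/{len(results)} passing, broken: {broken}")
    return results

if __name__ == "__main__":
    run_stability_pass()          # in production, scheduled hourly (cron or similar)
```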

27
Process Shift: Comb Filter Testing
  • Sniff Test / Monkey Tests: fast to run, catch
    major errors, keep coders working
    (gate: new code ready for checkin)
  • Smoke Test / Server Sniff: is the game playable?
    are the servers stable under a light load? do
    all key features work?
    (gate: full system build promotable to full
    testing)
  • Full Feature Regression / Full Load Test: do all
    test suites pass? are the servers stable under
    peak load conditions?
    (gate: promotable to paying customers)
  • Cheap tests catch gross errors early in the
    pipeline
  • More expensive tests only run on known
    functional builds
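A hedged sketch of the comb filter as a promotion pipeline (stage and gate names follow the slide; the stage bodies are placeholders): a build only pays for the next, more expensive stage after it passes the cheaper one.

```python
STAGES = [
    ("sniff_test",      "new code ready for checkin"),
    ("smoke_test",      "promotable to full testing"),
    ("full_regression", "promotable to paying customers"),
]

def run_stage(name, build):
    # Placeholder: a real stage runs the matching test suite against the build.
    return True

def comb_filter(build):
    for stage, gate in STAGES:
        if not run_stage(stage, build):
            return f"build {build} stopped at {stage}"
        print(f"build {build}: passed {stage} -> {gate}")
    return f"build {build} promotable to paying customers"

print(comb_filter("2004.09.15"))
```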

28
Key Points
  • Much faster progress after stability checkers
    were added
  • Sniff: hourly reference tests (sniff & monkey,
    unit & monkey)
  • Comb filters kept the manpower overhead low on
    both sides of the process and gave quick
    feedback: fewer redos for engineers, fewer bugs
    for QA to find
  • Extra post-checkin testing story (optional)
    • The size of the team makes a broken build very
      costly
    • Fewer redos
    • Fewer side-effect bugs

29
Process Shift: Who Tests What?
  • Automation: simple tasks (repetitive or
    large-scale)
    • Load @ scale
    • Workflow (information management)
    • Full weapon damage assessment; broad, shallow
      feature coverage
  • Manual: judgment / innovative tasks
    • Visuals, playability, creative bug hunting
  • Combined
    • Tier 1 / Tier 2: automation flags potential
      errors, manual investigates
    • Within a single test: automation snapshots key
      game states, manual evaluates results
    • Augmented / accelerated: complex build steps,
      ...

30
Process Shift: Load Testing (Before Paying
Customers Show Up)
  • Expose issues that only occur at scale
  • Establish hardware requirements
  • Establish that play is acceptable @ scale
31
(No Transcript)
32
Client-Server Comparison
33
Highly Accurate Load Testing: Monkey See / Monkey Do
(Diagram: sim actions, player controlled vs. script
controlled)
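One way to read "monkey see / monkey do" is record-and-replay; a hedged sketch (the event format and client API are hypothetical): sim actions captured from player-controlled sessions are replayed by script-controlled clients so the synthetic load matches real play.

```python
import json

def record(session_log, path):
    # session_log: captured (timestamp, action, args) tuples from a real player.
    with open(path, "w") as f:
        for ts, action, args in session_log:
            f.write(json.dumps({"ts": ts, "action": action, "args": args}) + "\n")

def replay(path, client):
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            getattr(client, event["action"])(*event["args"])   # script-controlled

class FakeClient:
    def __getattr__(self, name):
        return lambda *args: print(f"replay {name}{args}")

record([(0.0, "enter_lot", [412]), (3.5, "use_object", ["chair_01", "sit"])],
       "session_001.jsonl")
replay("session_001.jsonl", FakeClient())
```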
34
Outline
  • Overview: Automated Testing
    • Definition, Value, High-Level Approach
  • Applying Automated Testing
    • Mechanics, Applications
  • Process Shifts: Stability, Scale & Metrics
  • Implementation: Key Risks
  • Summary & Questions

35
Data Driven Test Client
(Diagram: reusable scripts & data for both
regression and load drive the test client through
a single API; a single API out reports key game
states, pass/fail, responsiveness, and
script-specific logs & metrics)
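A minimal sketch of the single-API, data-driven idea (names are hypothetical): one client interface runs the same reusable script for regression (pass/fail) or load (responsiveness), with the mode only changing what gets reported.

```python
import time

def run_script(script, client, mode="regression"):
    timings, failures = [], []
    for step in script:
        start = time.perf_counter()
        ok = getattr(client, step["cmd"])(*step.get("args", ()))
        timings.append((step["cmd"], time.perf_counter() - start))
        if not ok:
            failures.append(step["cmd"])
    if mode == "regression":
        return {"pass": not failures, "failures": failures}   # pass/fail
    return {"responsiveness": timings}                         # load metrics

class FakeClient:
    def login(self, account): return True
    def enter_lot(self, lot_id): return lot_id != 0

SCRIPT = [{"cmd": "login", "args": ("bot_01",)},
          {"cmd": "enter_lot", "args": (412,)}]

print(run_script(SCRIPT, FakeClient(), mode="regression"))
print(run_script(SCRIPT, FakeClient(), mode="load"))
```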
36
Scripted Players Implementation
(Diagram: scripted players issue commands into the
presentation layer)
37
What Level To Test At?
(Diagram: driving the game client at the View
layer via mouse clicks)
Regression: too brittle (UI / pixel shift).
Load: too bulky.
38
What Level To Test At?
(Diagram: driving the game client via internal
events between the View and the presentation
layer / logic)
Regression & load: too brittle (churn rate vs.
logic & data)
39
Automation Scripts & QA Tester Scripts
Basic gameplay changes less frequently than UI or
protocol implementations.
(Diagram: the NullView client exercises the
presentation layer and logic with no View attached)
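A small sketch of the NullView idea (the interfaces are hypothetical): scripts post the same internal events the presentation layer would, so tests survive UI and pixel changes and the view-less client stays light enough for load runs.

```python
class GameLogic:
    """Stand-in for the logic layer; consumes internal events."""
    def __init__(self):
        self.avatar_posture = "standing"
    def handle(self, event, **args):
        if event == "use_object" and args.get("interaction") == "sit":
            self.avatar_posture = "sitting"

class NullViewClient:
    """Presentation layer with no View attached: scripts inject events directly."""
    def __init__(self):
        self.logic = GameLogic()
    def post(self, event, **args):
        self.logic.handle(event, **args)

client = NullViewClient()
client.post("use_object", object_id="chair_01", interaction="sit")
assert client.logic.avatar_posture == "sitting"
```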
40
Key Points
  • Support costs: one (data-driven) client is
    better than N clients
  • Tailorable validation & output turned out to be
    a very powerful construct
    • Each test script contains its required
      validation steps (flexible, tunable, ...)
  • Minimize the state to regress against: fewer
    false positives

41
Common Gotchas
  • Setting the test bar too high, too early
    • Feature drift: expensive test maintenance
    • Code is built incrementally: reporting
      failures nobody is prepared to deal with
      wastes everybody's time
  • Non-determinism
    • Race conditions, dirty buffers / process
      state, ...
    • Developers test with a single client against a
      single server: no chance to expose race
      conditions
  • Not designing for testability
    • Testability is an end requirement
    • Retrofitting is expensive
  • No senior engineering commitment to the testing
    problem

42
Outline
  • Overview: Automated Testing
    • Definition, Value, High-Level Approach
  • Applying Automated Testing
    • Mechanics, Applications
  • Process Shifts: Stability & Scale
  • Implementation: Key Risks
  • Summary & Questions

43
Summary: Mechanics & Implications
  • Scripted test clients and instrumented code rock!
  • Collection, aggregation and display of test data
    are vital to making decisions on a day-to-day
    basis
    • Lessen the panic
  • Scale & break is a very clarifying experience
  • Stable code & servers in development greatly
    ease the pain of building an MMP game
  • Hard data (not opinion) is both illuminating and
    calming
  • Long-term operations testing is a recurring cost

44
Summary: Process
  • Integrate automated testing at all levels
  • Don't just throw testing over the wall to QA
    monsters
  • Use automation to speed & focus development
    • Stability: Sniff Test, Monkey Tests
    • Scale: Load Test

45
Summary: Key Points
  • Ship a better game
    • Lessen the panic
    • Constant testing for stability prevents
      backsliding during development and operations,
      keeps the team moving forward and roadblock
      free, and keeps the player experience smooth
    • Early load testing exposes critical server
      costs and failures in time to be addressed
    • Everybody knows what works, every day
  • Testing: it's not just for QA anymore
    • Continual content extensions while keeping
      previous features stable, over years of
      operations
    • Stable systems keep customers happy and
      developers working on new features, not
      fire-fighting
    • Recurring cost: an excellent fit for tool
      investment

46
Tabula Rasa
  • Pre-checkin sniff test: keep mainline working
  • Hourly monkey tests: baseline for developers
  • Dedicated tools group: easy to use & used
  • Executive support: radical shifts in process
  • Load test early & often: break it before live
  • Distribute test development & ownership across
    the full team
47
Cautionary Tales
Flexible Game Development Requires Flexible Tests
Signal-To-Noise Ratio
Defects & Variance In The Testing System
48
Key Points
  • Initial development phase: game design is in
    constant flux
    • Tests usually start by not working
  • Noise makes it hard to find results
    • "Boy who cried wolf" syndrome
  • Business decisions get made off testing results:
    make sure they're accurate (load-testing inputs,
    report generators, probing system, script
    errors, ...)
  • Team trust is another factor
  • A complex system with a high degree of flex
    requires:
    • Senior engineers, full time
    • Team & management commitment

49
Questions (15 Minutes)
  • Overview: Automated Testing
    • Definition, Value, High-Level Approach
  • Applying Automated Testing
    • Mechanics, Applications
  • Process Shifts: Stability, Scale & Metrics
  • Implementation: Key Risks

Slides online @ www.maggotranch.com/MMP