Title: Metrics for MMP Development and Operations: Lessons Learned from The Sims Online

1. Metrics for MMP Development and Operations: Lessons Learned from The Sims Online
- Larry Mellon
- GDC, Spring 2004
2. Metrics Catch-22
- Useful
- Careful
- GI / GO
- Hard Data
- Optimization Tool
- Expensive
3. Key Points
- What you measure becomes what people base critical decisions on: pick carefully (example: refactoring)
- Red herrings: tracking the wrong metric produces the wrong result. Cross-check the numbers
- Metrics bugs cause great confusion (GI/GO: garbage in, garbage out)
- Mis-information: bad data that people then take action against leads to bad results
4. Importance of Metrics Is Relative
- Lord Kelvin: Measure Everything
- Mark Twain: Measure Just Enough
5. Key Points
- Need some form of scale to estimate how big a club to use on the metrics problem for your project
- Everybody needs some level of metrics, but how much to spend?
- Two fans of metrics make good icons for the end points
- Lord Kelvin: smart enough to get a scale named after him; a measure-everything, spare-no-expense kind of guy
- Mark Twain: metrics have their uses, but he viewed them with a jaundiced eye
- Be careful (pick well, cross-check)
- Don't go overboard (metrics have uses, but are no silver bullet, and can hurt you)
6. Pro-Metrics: Lord Kelvin
- "I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
- Institution of Civil Engineers, 1883
7. Mark Twain: Caveat Emptor / Siren Song
- "Figures often beguile me, particularly when I have the arranging of them myself."
- "There are three kinds of lies: lies, damned lies, and statistics."
8. Invest No More Than You Need
- Kelvin
- Twain
9. Key Points
- Notional scale: Lord Kelvin vs. Mark Twain
- LK: the complexity of a system under study requires fine-grain visibility into many variables
- MT: the practical man; measurements cut to fit, good enough, roughly correct
- What the scale tells you about your problem is the size and complexity of your metrics system
- Big metrics systems are expensive
- Don't go postal (unless you need to)
- Build no more than you need (why measure beyond what you care about, in precision, frequency, depth, or breadth?)
10. MMP Measurement Focal Points
- Operational Costs
- Infrastructure
- Player Actions
- Economy
11. Key Points: MMP Infrastructure
- Complexity of implementation
- Butterfly effect / non-determinism: what went wrong??
- Number of moving parts: tens of interacting server processes, hundreds to thousands of (highly variable) user inputs
- Many interacting developer teams
- Scale: repeat the above, to support 50,000 butterflies
- Quality of Service requirements are high
  - Reliability
  - Performance
12. Key Points (2): Operations
- Service business, not packaged goods
- Driving requirements: reliability / performance / fun
- ROI (value to customer vs. cost to build and run)
- Player base
  - Who costs money
  - Who generates money
- Minimize overhead
  - Anything you can measure, you can optimize
- Where do the operational costs go?
  - E.g. bandwidth, service calls, crashes
  - What costs money
  - What generates money
- Customer Service
  - Who's being naughty?
  - Who is a loyal customer?
13. Key Points (3): Social / Economic
- What do people do in-game?
- Where does their in-game money come from?
- What do they spend it on?
- Why?
- The need to please
- What aspects of the game are used the most?
- Are people having fun, right now?
- Tuning the gameplay
14. Key Points
- Profiler captures tons of data
- Essential to have report generators that automatically create high-level views/summaries of data
- Each view is tailored to a particular class of user
  - Designers / community managers: daily summaries
  - Eng / ops: time-driven charts
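The per-audience split above can be sketched as two report generators over the same aggregated probe data. This is an illustrative sketch, not TSO's actual Esper code; the row layout and function names are assumptions.

```python
# Two views over the same aggregated probe rows:
# a daily summary for designers / community managers, and an
# hour-by-hour series for engineering / ops. Row layout is
# illustrative: (probe_name, hour, count).

def daily_summary(rows):
    """Designer view: one line per probe, totals for the day,
    most-used first."""
    totals = {}
    for probe, _hour, count in rows:
        totals[probe] = totals.get(probe, 0) + count
    return sorted(totals.items(), key=lambda kv: -kv[1])

def time_series(rows, probe):
    """Eng/ops view: hour-by-hour counts for one probe."""
    return [(hour, count) for p, hour, count in rows if p == probe]
```

The point of the split is that designers want "what happened today, ranked," while ops wants "when did it happen" for correlation with load.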
15. Similar Use Case: Casinos (Harrah's Total Rewards)
(Diagram: every player action at every table / machine in every casino feeds a unified Player Action DB)
- Track every Player Action
16. Highly Profitable, Highly Popular
(Diagram: analyze the unified Player Action DB for profit (per casino, per player) and patterns of play; modify casino operations and the player awards program accordingly)
- "This is one of the best investments that we have ever made as a corporation and will prove to forge key new business strategies and opportunities in the future." John Boushy (Harrah's CIO, 2000)
17. Key Points: Harrah's Total Rewards
- One of the biggest success stories for CRM is in fact in a sibling game industry: casinos. It is, in fact, the only visible sign of one of the most successful computer-based loyalty schemes ever seen.
- Well on the way to becoming a classic business-school story to illustrate the transformational use of information technology
- 26% of customers generate 82% of revenues
- "Millionaire Maker," which ties regional properties to select "destination" properties through a slot machine contest held at all of Harrah's sites. Satre makes a personal invitation to the company's most loyal customers to participate, and winners of the regional tournaments then fly out to a destination property, such as Lake Tahoe, to participate in the finals. Each one of these contests is independently a valuable promotion and a profitable event for each property
- $286.3 million in such comps. Harrah's might award hotel vouchers to out-of-state guests, while free show tickets would be more appropriate for customers who make day trips to the casino
- At a Gartner Group conference on CRM in Chicago in September 1999, Tracy Austin highlighted the key areas of benefit and the ROI achieved in the first several years of utilizing the 'patron database' and the 'marketing workbench' (data warehouse): "We have achieved over $74 million in returns during our first few years of utilizing these exciting new tools and CRM processes within our entire organization"
- John Boushy, CIO of Harrah's, in a speech at the DCI CRM Conference in Chicago in February 2000, stated: "We are achieving over 50% annual return-on-investment in our data warehousing and patron database activities. This is one of the best investments that we have ever made as a corporation and will prove to forge key new business strategies and opportunities in the future."
18. TSO Live Monitors, Summary Views
- Embedded Profiler (Server Side)
- Automated Report Generators
19. Outline
- Background: Metrics and MMPs
- Implementation Overview
- Metrics in TSO: Applications and Sample Charts
- Wrap-up
  - Lessons Learned
  - Conclusions
  - Questions
20. Implementation: Driving Requirements
- Low overhead
- Common infrastructure
- Ease of use
21. Key Points: Driving Requirements
- Ease of use / information management
  - Adding probes
  - Point-and-click to find things; speed
  - Automated collection and aggregation of data
  - Volume of data quickly becomes unmanageable and people stop looking
  - Metrics are high entropy: if you rely on a person, some part will eventually become unreliable
  - If you can't rely on metrics, they become useless, and then nobody bothers to look any more
  - If you can't get the information you _need_ out of the information _available_, it isn't _useful_
- Low run-time overhead
  - Don't disrupt the servers under study
  - Positive feedback loops
  - Schrödinger's cat dilemma
  - But still need massive volumes of information
- Common infrastructure
  - Less code (than N separately targeted systems)
  - Bonus: allows direct comparison of user actions to load spikes
22. Esper Architecture
- Live Server CPUs
23. Key Points
- Visualization tool + parallel simulation
- Entity actions heavily drove CPU performance, but the scale (of both) made finding patterns in problems very difficult
- Esper: golden-age SF term for one with ESP (Andre Norton, etc.)
  - Peers into the internal workings of the (distributed) mind, giving a high-level view of the data
- TSO Esper (v. 4): eliminate raw data
  - Most of the time, detailed data is never used
  - Probes collect _at_ the aggregate level
  - Repeatable tests could be done with detailed metrics, when required
- Data capture is server-side only
  - Allows a single infrastructure for engine and player data
  - Untrusted client: privacy and spoofing concerns
- Summary-only data views mean we can collect aggregate-only data: it's most of what you need, and is far cheaper
- A probe is internal to every server process
  - Count/average/min/max values inside a fixed time window
  - Log out values at the end of the time window, reset probes
24. Event-Level Sampling, Aggregated Reporting
- Pipeline: esperProbes -> esperStore (min, max, avg, count) -> DBImporter -> esperFetch
25. Esper Probes
- Self-organizing class hierarchy
- Data-driven: new probes and/or new game content immediately visible on the web
- Example: ESPER_PROBE
  - (Object.interaction.s, chair->picked)
  - (Object.interaction.puppet.s, self->picked)
- Human-readable intermediate files
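The "self-organizing hierarchy" idea is that dotted probe names form a tree, so new probes (new game content) appear under the right category with no code changes. A sketch, assuming a simple registry keyed by dotted paths; the probe names mirror the slide's example, but the registry itself is illustrative.

```python
from collections import defaultdict

class ProbeRegistry:
    """Data-driven probe namespace: dotted names self-organize into
    a hierarchy that a web UI can drill into by prefix."""

    def __init__(self):
        self.counts = defaultdict(int)

    def probe(self, path, event):
        # New (path, event) pairs need no registration step:
        # e.g. probe("Object.interaction.chair", "picked")
        self.counts[(path, event)] += 1

    def subtree(self, prefix):
        """All probes at or under a dotted prefix; this is what lets
        the presentation layer aggregate and drill down by category."""
        return {k: v for k, v in self.counts.items()
                if k[0] == prefix or k[0].startswith(prefix + ".")}
```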
26. EsperView: Web-Driven Presentation
- Daily Reports
- Report Generator
- Graph Caching and Archiving
- Filtering / Meta Data
27. Key Points
- Standard set of views posted to the Daily Reports page
- Flexible report generator to generate new charts
- Caching of large graphs (used in turn for archiving historical views)
- Noise filters (hide something big you just don't care about right now)
- Open-source graphing system
28. EsperView: Hierarchical Presentation
- Process-Level Collection
- Server Cluster / Process Class / Process Instance
29. Key Points: Hierarchical Presentation
- Metrics were collected and tracked at the process level, with two aggregated views
  - Server farm (any probe, averaged across all processes)
  - Process class (any probe, averaged across just the simulators within a system)
  - Process instance (any probe, averaged within a single simulator)
- Data collections viewable at three levels
  - Server farm (all processes)
  - Process class (all simulators within a system)
  - Process instance (a single login server)
- Viewable in time order or daily summary, with drill-down
30. Outline
- Background: Metrics and MMPs
- Implementation Overview
- Metrics in TSO: Applications and Sample Charts
- Wrap-up
  - Lessons Learned
  - Conclusions
  - Questions
31. Applications of Metrics
- Load Testing (Realistic Inputs)
- Beta Testing and Live Operations (Game Tuning, Community Management)
- Load Testing and Live Operations (Server Performance)
32. Load Testing: Monkey See / Monkey Do
- Sim Actions (Player Controlled)
- Sim Actions (Script Controlled)
33. Key Points
- Measure user load _at_ peak in a live city
- Change user behaviour in the load-testing script (automated testing), using Esper to measure emulated load against live load
- Re-calibration as required (constant protocol / code shifts)
- Example: WAH.txt
- Used in turn to:
  - Measure the infrastructure for completeness: is the infrastructure ready for launch?
  - Find and fix bugs
- Very realistic load testing!
  - "Oh, that's what happens when 1,000 simulators all start up at the same time"
- Client-side response metrics tracked separately
34. Applications of Metrics
- Load Testing: Realistic Inputs
- Beta Testing and Live Operations: Tuning / Management
- Load Testing and Live Operations: Server Performance
35. Key Points: Game Play Analysis
- Game designers were heavy Esper users
- Validated metrics against community boards, tests, ...
- Most popular interactions / objects / places
- Trends
  - Length of time in a house
  - Chat rate
  - Types of characters chosen
  - ...
- Direct cycle, repeated N times: observe behaviour, tune play, observe changes
36. History
- "Make Friends" / "Shake Hands" beats out "Give Money" / "Get Money"
- Least used: Disco Dancing
- Meta Data
37. Top: TD Dance, Woohoo; Bottom: Dance
38. Players per Lot
- 0 to 70 players: < 2 per lot
- 70 to 400 players: > 3 per lot
39. Top: Metrics Bug (sorta); Next: Garden Gnome, Toilet; Bottom: Buffet Table
40. Beta: numPlayers by numRMates
41. Key Points: Economy Analysis
- Where did the money come from?
- Where did it go?
- How much did users play the money sub-game?
- Average amount of money made per player over the first 10 days
42. (No Transcript)
43. Economy: Detailed View
44. Visitor Bonus: Who Makes Money?
45. 4 of top 5: windows??
46. House Categories (Beta Test)
47. Community Management
- Community Actions and Trends
- Influencing Player Activity
- Free Content
- Tracking Problem Players
48. Key Points: Community Management
- Observing community behaviour
  - Metrics that matter
- Influencing player behaviour by publishing selected metrics
  - Example: shifting users to Calvin's Creek
  - Cheap content
- Customer Service
  - Who's being a pain?
  - Cheaters / griefers / ...
49. Marketing
- In-Game Brand Exposure
- Special Events
- Press Release Teasers
50. Key Points: Marketing
- Press releases
  - Teasers to catch media / free publicity
- Paid sponsorship
  - How many eyes on their brand, and for how long?
- Tracking special objects / events
51. New Year's Eve Kiss Count

Action              Esper Cities   All Cities (extrapolated)
New Year's Kiss           32,560                     271,333
Be Kissed Hotly            7,674                      63,950
Be Kissed                  5,658                      47,150
Be Kissed Sweetly          2,967                      24,725
Blow a Kiss                1,639                      13,658
Be Kissed Hello            1,161                       9,675
Have Hand Kissed             415                       3,458
Total                     52,074                     433,949
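The "All Cities" column scales counts from the Esper-instrumented cities up to the whole service. The scale factor (about 8.33, i.e. roughly 12% of cities monitored) is inferred here from the published numbers, not stated in the talk; a minimal sketch:

```python
def extrapolate(counts, scale=271333 / 32560):
    """Scale per-action counts from the monitored (Esper) cities to
    the full service. Default scale is inferred from the table's
    first row (271,333 / 32,560 ~= 8.33)."""
    return {action: round(n * scale) for action, n in counts.items()}
```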
52. Applications of Metrics
- Load Testing: Realistic Inputs
- Beta Testing and Live Operations: Tuning / Management
- Load Testing and Live Operations: Server Performance
53. (No Transcript)
54. DB Byte Count Oscillates Out of Control
55. A Single DB Request Is Clearly at Fault
56. (No Transcript)
57. Most-Used DB Queries (Unfiltered)
- Queries at the 11,000,000 level need attention, and drown out the others
58. DB Queries (Filtered)
- Filtering out the 11,000,000-level queries reveals patterns in the 7,000-level queries
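The noise filter between these two charts is just a threshold split: pull out the few dominant queries so the long tail becomes visible on the same axis. The cutoff and query names below are illustrative assumptions.

```python
def filter_dominant(query_counts, cutoff=1_000_000):
    """Split query counts into dominant queries (hidden from the
    chart so they stop flattening the y-axis) and the visible rest."""
    dominant = {q: n for q, n in query_counts.items() if n >= cutoff}
    visible = {q: n for q, n in query_counts.items() if n < cutoff}
    return dominant, visible
```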
59. Incoming and Outgoing Packets
60. Outline
- Background: Metrics and MMPs
- Implementation Overview
- Metrics in TSO: Applications and Sample Charts
- Wrap-up
  - Lessons Learned
  - Conclusions
  - Questions
61. Lessons Learned
- Implement early
- Ownership: senior engineers
- Aggregated probes vs. event-level tracking
- Automation: collect / summarize / alarms
- There can be only one
62. Key Points
- When to implement: (a) when the system is complex, (b) before you need it
- Implementation notes
  - Easier to write our own report generator than to use a commercial data reporting tool (flirted, no follow-through)
  - Fully automated metrics: engineer yourself out of a job
  - Complex sub-system: a senior engineer needs to own and drive it
  - Ease of use: UI, UI, UI. Speed, speed, speed.
  - Automate error checking on inputs
  - Fast/easy turnaround on new metrics
- Integration with server logs
  - Allows drill-downs by finding logs in the same time window via a quick, easy web UI
- Excellent complement to automated testing
  - Repeatable inputs and accurate measurements allow cut-and-fit _at_ scale
  - Scale: break, cycle fast, repeatable
- Closer integration with cityDB (lots of useful data)
- Too many metrics collection systems
  - Lack of a useful central system meant N people went and built one for their (narrowly targeted) needs
- Categories of players vs. playerEvent tracking
  - Debatable which to pick, but event tracking did not work out
63. Conclusion: Very Useful!
- Game Design
- Data Mining on Players: Untapped Gold
- Realistic Load Testing
- Engine Fixes, Optimization
- Server Cost, Launch Timing
- Critical Feature: Accessibility
64. Key Points
- Critical feature: accessibility
  - Collecting data is easy; doing something useful with it is much harder
  - High-level views of data
  - Ease of use
  - Fully automated collection / display / error checking
- Very useful!
  - Game design
  - Engine optimization
  - Load testing accuracy
  - Server internals
  - Release planning (server capacity, launch timing)
- The DB was least tested. Guess what was the sole real problem _at_ launch? Guess why.
  - Real user data is different from load-testing data
  - We assumed stress on the DB would come from the number of queries, not relationships. Thus no DB-specific stress tests.
  - Oops. The DB has the most variable inputs _and_ holds what users care most about: persistent results. Should have pounded the snot out of it pre-launch.
- Build it early
- Data mining on players is very, _very_ cool
  - They are the source of your costs and revenue: an analyze / optimize loop to maximize profit
  - They shape your game: tune your game based on direct observation
65. Questions
- Slides available _at_ www.maggotranch.com/MMP