Summary
Probability is an essential quantitative tool for game designers — from balancing loot drop rates to predicting how often a complex multi-step mechanic will trigger. Hiwiller (Players Making Decisions, Chs. 29–31) argues it should be treated as a core design competency, not a specialist skill: “Those who shy away from math should be comforted in knowing that probability is just advanced counting.”
The second half of the argument, covered in Chs. 30–31, is that modern spreadsheet tools (Excel, Google Sheets) remove most of the mathematical barrier: a Monte Carlo simulation in a spreadsheet can answer in minutes questions that would take hours or days to solve analytically.
(Hiwiller, Players Making Decisions, see source-players-making-decisions)
Fundamentals
Probability is fancy counting
Probability = Count of Specific Event / Count of All Events
This counting formula assumes every outcome is equally likely. A probability always falls between 0 (never happens) and 1 (always happens). It can be expressed as a fraction, decimal, or percentage; these are equivalent representations.
Example: drawing an ace from a standard 52-card deck:
P(Ace) = 4 aces / 52 cards = 0.077 = 7.7%
Design usage: use this formula whenever you need to verify that a loot drop, random encounter, critical hit, or special event is occurring at the rate you intended.
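As a quick check of the counting formula, the sketch below builds a 52-card deck and counts aces. The rank and suit labels and the helper name `probability` are illustrative choices, not taken from the source.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

def probability(outcomes, is_event):
    """Matching outcomes over all outcomes (assumes equally likely outcomes)."""
    outcomes = list(outcomes)
    hits = sum(1 for outcome in outcomes if is_event(outcome))
    return Fraction(hits, len(outcomes))

deck = list(product(RANKS, SUITS))  # 13 ranks x 4 suits = 52 cards
p_ace = probability(deck, lambda card: card[0] == "A")
print(p_ace, f"{float(p_ace):.1%}")  # 1/13 7.7%
```

The same helper can be pointed at a loot table or encounter list, provided each entry in the list is equally likely to be picked.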
Joint probability (AND events)
When two or more independent events must all occur, multiply their probabilities:
P(A AND B) = P(A) × P(B)
Example: three fair coins must all land heads:
P(H AND H AND H) = 0.5 × 0.5 × 0.5 = 0.125 = 12.5%
Design usage: whenever a mechanic requires multiple conditions to be true simultaneously (hit AND crit AND target is debuffed), multiply the probabilities. Complex compound conditions rapidly become very rare.
Independence requirement: this formula only works if the events are independent — if one event’s outcome does not affect the other’s probability. See dependent events below.
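A minimal sketch of the multiplication rule, cross-checked by brute-force counting. The helper name `p_all` and the hit/crit/debuff rates are hypothetical values chosen for illustration.

```python
from itertools import product
from math import prod

def p_all(*probabilities):
    """Joint probability of independent events: multiply them together."""
    return prod(probabilities)

# Three fair coins all landing heads.
print(p_all(0.5, 0.5, 0.5))  # 0.125

# Cross-check by counting all 2**3 equally likely outcomes.
flips = list(product("HT", repeat=3))
print(flips.count(("H", "H", "H")) / len(flips))  # 0.125

# Hypothetical rates: 80% hit, 20% crit, 30% chance the target is debuffed.
print(round(p_all(0.8, 0.2, 0.3), 3))  # 0.048
```

Even with generous individual rates, the compound condition in the last line fires on fewer than 1 attack in 20.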
Dependent events
When one event changes the probability of another (because the outcome space is altered), the events are dependent and must not be multiplied naively.
Example: drawing two aces from the same deck. After drawing the first ace, only 3 aces remain in 51 cards:
P(first ace) = 4/52 = 7.7%
P(second ace | first ace drawn) = 3/51 = 5.9%
P(two aces) = (4/52) × (3/51) = 0.0045 = 0.45%
This is notably different from the naive (incorrect) calculation of (4/52)² = 0.6%.
Design usage: deck-building mechanics, draw-without-replacement systems (Slay the Spire), and any mechanic where outcomes are drawn from a finite pool require dependent event calculations.
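The two-ace calculation can be verified by enumerating every ordered pair of cards. This is a sketch that models the deck only as aces versus non-aces, which is an illustrative simplification.

```python
from fractions import Fraction
from itertools import permutations

# Analytic: the second draw is conditioned on the first ace being gone.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)
naive = Fraction(4, 52) ** 2  # wrongly treats the two draws as independent

# Brute force: every ordered pair of distinct cards (52 * 51 of them).
deck = ["A"] * 4 + ["x"] * 48  # only ace vs. non-ace matters here
pairs = list(permutations(range(52), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")

print(p_two_aces, f"{float(p_two_aces):.2%}")  # 1/221 0.45%
print(Fraction(both_aces, len(pairs)))          # 1/221
print(f"{float(naive):.2%}")                    # 0.59%
```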
Conditional probability
Conditional probability is the probability of event A occurring given that event B is known to have occurred:
P(A | B) = P(A and B) / P(B)
The vertical pipe “|” reads as “given.”
Example: from a survey of 10 children (7 boys, 3 girls; 6 chose spaceships, 4 chose ponies; 5 boys chose spaceships):
P(Spaceships | Boy) = P(Spaceships AND Boy) / P(Boy)
= (5/10) / (7/10)
= 5/7 = 71.4%
This differs from P(Spaceships) = 6/10 = 60%: knowing the child is a boy changes the probability.
Design usage: conditional probability matters when player demographics, choices, or prior game states affect the probability of subsequent events. It is also the foundation for Bayesian player modelling in analytics.
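The survey example can be reproduced from a small table of counts. The three cells not stated in the example (boys choosing ponies, girls choosing either option) are derived from the stated totals; the helper `p` is a hypothetical name.

```python
from fractions import Fraction

# Survey counts; only ("boy", "spaceships") is given directly, the rest
# follow from the totals (7 boys, 3 girls, 6 spaceships, 4 ponies).
counts = {
    ("boy", "spaceships"): 5,
    ("boy", "ponies"): 2,       # 7 boys - 5
    ("girl", "spaceships"): 1,  # 6 spaceships - 5
    ("girl", "ponies"): 2,      # 3 girls - 1
}
total = sum(counts.values())

def p(predicate):
    """Probability that a randomly chosen child satisfies the predicate."""
    return Fraction(sum(n for key, n in counts.items() if predicate(key)), total)

p_boy = p(lambda key: key[0] == "boy")
p_both = p(lambda key: key == ("boy", "spaceships"))
p_spaceships = p(lambda key: key[1] == "spaceships")

print(p_both / p_boy)  # 5/7
print(p_spaceships)    # 3/5
```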
Adding dice: why bell curves matter
Flat vs bell-curve distributions
A single die (d20, d6) produces a flat (uniform) distribution — every outcome is equally likely.
Multiple dice summed produce a bell-curve (normal-like) distribution — middle values are much more likely than extremes.
With 3d6 (three six-sided dice summed), there is only one way to roll an 18 (6+6+6), but 27 ways to roll a 10 and another 27 ways to roll an 11. Rolling a 10 is 27× more likely than rolling an 18.
Naive estimate: P(16 or higher with 3d6) = 3/16 = 18.75% ← WRONG
Correct count: P(16 or higher with 3d6) = 10/216 = 4.6%
The naive estimate treats each of the 16 possible sums (3 through 18) as equally likely and is dramatically wrong. The correct calculation requires counting all the ways each sum can occur among the 6 × 6 × 6 = 216 equally likely rolls.
The key rule: summing N dice does not produce the same distribution as rolling a single die that covers the same range. 3d6 and a flat roll over 3–18 share a range but not a shape. More dice = narrower bell curve = greater clustering around the average.
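The 3d6 counts quoted above can be checked by enumerating all 216 rolls. This is a small sketch; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Enumerate all 6**3 = 216 equally likely ordered rolls and tally the sums.
ways = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

print(ways[18], ways[10], ways[11])  # 1 27 27

at_least_16 = sum(n for total, n in ways.items() if total >= 16)
print(at_least_16, f"{at_least_16 / 216:.1%}")  # 10 4.6%
```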
| Distribution | Shape | Use case |
|---|---|---|
| 1d20 | Flat — all values equally likely | Maximum variance; anything can happen |
| 2d10 | Triangular peak, a gentle bell | Moderate clustering around the average |
| 3d6 | Steeper bell — extremes rare | Characteristic ability score distribution |
| 4d4+4 | Even steeper; range 8–20 | Constrained range; near-mean outcomes dominant |
Design applications:
- Stat blocks using 3d6 produce characters clustered around 10–11 rather than equally distributed from 3–18 — a design choice that shapes character diversity
- Combat resolution using multiple dice is more predictable than a single die — streaks are rarer, outcomes are more consistent
- Loot or drop systems driven by a single die have high variance; systems that compare the sum of several dice against a threshold have lower variance
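To put numbers on the table above, the sketch below enumerates each dice expression and reports its range, mean, and standard deviation. The helper `sums` is a hypothetical name, and full enumeration is only practical for small dice pools like these.

```python
from itertools import product
from statistics import mean, pstdev

def sums(count, sides, bonus=0):
    """All equally likely totals for `count` dice with `sides` faces plus a flat bonus."""
    return [sum(roll) + bonus for roll in product(range(1, sides + 1), repeat=count)]

for label, totals in [
    ("1d20", sums(1, 20)),
    ("2d10", sums(2, 10)),
    ("3d6", sums(3, 6)),
    ("4d4+4", sums(4, 4, bonus=4)),
]:
    print(f"{label:6} range {min(totals)}-{max(totals)} "
          f"mean {mean(totals):.1f} sd {pstdev(totals):.2f}")
```

The standard deviation falls from about 5.8 for 1d20 to about 2.2 for 4d4+4, which is the clustering effect expressed as a single number.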
The H/T game: probability surprises
Hiwiller’s H/T game illustrates how probability fools our intuitions. In a perfectly fair 1000-flip coin game (each flip 50/50), the lead changes hands far less often than people expect: one player may hold the lead for hundreds of consecutive flips before it switches, despite the game being completely fair. This is not a broken random number generator — it is the expected behaviour of fair independent random processes over long sequences.
Design implication: players will perceive patterns and streaks in truly random outcomes and conclude the system is broken. This misreading of randomness is closely related to the gambler’s fallacy (see cognitive-biases-in-games). Consider whether to add variance reduction (pity timers, streak tracking) or clearer probability communication.
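A rough sketch of the H/T game for readers who want to see the effect themselves. The scoring rule is an assumption made for illustration: heads and tails each score a point per flip, and a tie does not hand over the lead. The function name `lead_stats` is hypothetical.

```python
import random

def lead_stats(flips=1000, seed=None):
    """Return (lead_changes, longest_lead) for one fair heads-vs-tails game."""
    rng = random.Random(seed)
    score = 0    # heads minus tails so far
    leader = 0   # +1 heads leads, -1 tails leads, 0 nobody has led yet
    lead_changes = 0
    current_run = longest_run = 0
    for _ in range(flips):
        score += 1 if rng.random() < 0.5 else -1
        if score != 0:
            now = 1 if score > 0 else -1
            if leader != 0 and now != leader:
                lead_changes += 1
                current_run = 0
            leader = now
        # Assumption: on a tie the previous leader's run continues.
        if leader != 0:
            current_run += 1
            longest_run = max(longest_run, current_run)
    return lead_changes, longest_run

for seed in range(5):
    print(lead_stats(seed=seed))
```

Each printed pair is the number of lead changes and the longest unbroken lead, in flips, for one simulated game.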
Expected value in game decisions
Expected value (EV) is the average outcome of a random event, weighted by probability:
EV = Σ (P(outcome) × Value(outcome))
Example: a football play call choosing between two defenses (run defense vs pass defense) against an opponent who runs 50% of the time and passes 50% of the time:
- Run defense: EV = 0.5 × 2 + 0.5 × 20 = 11 yards surrendered
- Pass defense: EV = 0.5 × 10 + 0.5 × 5 = 7.5 yards surrendered → better choice
EV is a fast balancing check: if two upgrades have equal expected value, they are roughly equivalent and neither is dominant. Large EV gaps between nominally equal options indicate balance problems.
Limitation: players are not EV maximisers (see cognitive-biases-in-games). They are risk-averse in some contexts (insurance) and risk-seeking in others (lottery). Balance around EV but test player perception through playtesting.
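The defense example can be written as a small helper that applies the EV formula to a list of (probability, value) pairs. The function name and data layout are illustrative assumptions.

```python
def expected_value(outcomes):
    """outcomes: iterable of (probability, value) pairs whose probabilities sum to 1."""
    outcomes = list(outcomes)
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * value for p, value in outcomes)

# Yards surrendered, from the example: the opponent runs 50% and passes 50%.
run_defense = [(0.5, 2), (0.5, 20)]
pass_defense = [(0.5, 10), (0.5, 5)]

print(expected_value(run_defense))   # 11.0
print(expected_value(pass_defense))  # 7.5
```

Lower is better here because the values are yards surrendered; for an upgrade comparison the values would be damage, gold, or whatever the upgrades pay out.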
Monte Carlo simulation
Monte Carlo simulation is the method of answering complex probability questions by generating a large number of random trials and observing the distribution of outcomes. For game designers, this means running a simulated game in a spreadsheet thousands of times and reading off aggregate statistics.
The method is named after the Casino de Monte Carlo. It was developed in the 1940s by Stanislaw Ulam and John von Neumann for nuclear weapons research at Los Alamos; the casino name served as a code name, chosen because the method relies on random numbers.
Why use simulation instead of calculation?
Many game probability questions are tedious or error-prone to solve by hand but quick to estimate in a spreadsheet:
- “How often will the super-attack trigger per round under these conditions?”
- “Does advantage in D&D 5e give too much benefit?”
- “What is the optimal strategy for the Monty Hall problem?”
A few minutes of spreadsheet setup gives a data-backed estimate for each of these questions, removing much of the guesswork from balancing.
The basic workflow
- Identify what to track: list the random events in the mechanic
- Create columns for each random element and derived state
- Use `=RAND()` or `=RANDBETWEEN(a,b)` to generate random values
- Use `=IF()` statements to compute derived states (hit/miss, streak count, trigger condition)
- Drag formulas down for N rows (N = number of rounds/trials per simulation)
- Use a one-way data table to run the simulation M times (e.g., 1000 trials)
- Use `=COUNTIF()`, `=AVERAGE()`, `=MIN()`, `=MAX()` to summarise results
Example 1 — Super attack balance
Problem: a player makes 100 attacks per round at 50% hit chance. Four consecutive hits trigger a “super attack.” The designer wants approximately 10 super attacks per round. Does a 50% hit chance deliver that?
Method: spreadsheet with columns for (1) attack number, (2) random roll, (3) hit/miss, (4) consecutive hits count, (5) super attack flag. Run 1000 trials via a one-way data table. Count super attacks per trial.
Result: at 50% hit rate, most trials produce far fewer than 10 super attacks. By testing different hit rates, the designer finds that 57.5% hit rate produces approximately 10 super attacks per round on average — but with high variance (individual trials range from 0 to 29). The designer may need to change the mechanic design, not just tune the numbers.
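The same experiment can be sketched in Python instead of a spreadsheet. One detail is an assumption, since the text does not spell it out: every attack that brings the current streak to four or more hits counts as a super attack, and the streak is not reset when one triggers. The function names are hypothetical, and the numbers depend on the seed, so they will not match the spreadsheet run exactly.

```python
import random
from statistics import mean

def super_attacks_in_round(hit_chance, attacks=100, streak_needed=4, rng=random):
    """Count super attacks in one round.

    Assumption: any attack that leaves the hit streak at `streak_needed`
    or more counts as a super attack (the streak is not reset on trigger).
    """
    streak = 0
    supers = 0
    for _ in range(attacks):
        if rng.random() < hit_chance:
            streak += 1
            if streak >= streak_needed:
                supers += 1
        else:
            streak = 0
    return supers

def simulate(hit_chance, trials=1000, seed=0):
    """Run `trials` independent rounds and return the per-round counts."""
    rng = random.Random(seed)
    return [super_attacks_in_round(hit_chance, rng=rng) for _ in range(trials)]

for hit_chance in (0.50, 0.575):
    results = simulate(hit_chance)
    print(hit_chance, round(mean(results), 2), min(results), max(results))
```

Under this trigger rule the long-run average is (attacks − 3) × p⁴, which gives a closed-form value to compare the simulated mean against. A rule that resets the streak after each super attack would produce lower counts.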
Example 2 — D&D Advantage/Disadvantage
Problem: D&D 5e’s Advantage mechanic (roll two d20, take the higher) was introduced to replace flat numeric bonuses (+3, etc.). How much benefit does Advantage actually provide?
Method: generate 1000 pairs of d20 rolls, compute straight/advantage/disadvantage averages.
Result: Advantage gives an average of +3.34 over a straight roll; Disadvantage gives -3.30. This is roughly equivalent to the old “+3 / -3 modifier” system in terms of average outcome — a useful calibration point for designers converting between the systems.
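Because two d20s have only 400 combinations, the simulated averages can be cross-checked by exact enumeration rather than random sampling. This sketch is a swapped-in technique for validation, not the spreadsheet method described above.

```python
from fractions import Fraction
from itertools import product

pairs = list(product(range(1, 21), repeat=2))  # all 400 equally likely d20 pairs

straight = Fraction(sum(range(1, 21)), 20)
advantage = Fraction(sum(max(a, b) for a, b in pairs), len(pairs))
disadvantage = Fraction(sum(min(a, b) for a, b in pairs), len(pairs))

print(float(straight), float(advantage), float(disadvantage))       # 10.5 13.825 7.175
print(float(advantage - straight), float(disadvantage - straight))  # 3.325 -3.325
```

The exact shift is ±3.325, so the simulated +3.34 and -3.30 are within the noise expected from 1000 trials.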
Example 3 — Monty Hall problem
Problem: a game show host reveals a goat behind one of three doors after the player picks. Should the player switch?
Method: simulate 5000 trials. Track where the prize is, which door the player chose, which door the host reveals (always a non-prize, non-chosen door), and whether switching results in a win.
Result: switching wins 66.7% of the time, confirming Marilyn vos Savant’s answer. The counterintuitive result becomes obvious from the data: by not switching, you only win if you initially picked correctly (one time in three). The simulation gives sceptics concrete evidence when the analytical argument alone fails to convince them.
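A minimal Python sketch of the method described above, tracking the prize door, the player's choice, and the host's reveal. The function name and seed are illustrative; exact percentages vary from run to run.

```python
import random

def monty_hall(trials=5000, seed=0):
    """Return (stay_win_rate, switch_win_rate) over `trials` simulated games."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        choice = rng.randrange(3)
        # Host opens a door that is neither the player's choice nor the prize.
        revealed = rng.choice([d for d in range(3) if d != choice and d != prize])
        # Switching means taking the one door that is neither chosen nor revealed.
        switched = next(d for d in range(3) if d != choice and d != revealed)
        stay_wins += choice == prize
        switch_wins += switched == prize
    return stay_wins / trials, switch_wins / trials

stay, switch = monty_hall()
print(f"stay {stay:.1%}, switch {switch:.1%}")
```

The two rates always sum to 100%, because in every game exactly one of the two strategies wins.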
Key spreadsheet functions
| Function | Use |
|---|---|
| `=RAND()` | Random decimal between 0 and 1 |
| `=RANDBETWEEN(a, b)` | Random integer between a and b |
| `=IF(condition, "Hit", "Miss")` | Binary event resolution |
| `=COUNTIF(range, "Hit")` | Count occurrences of an outcome |
| `=AVERAGE(range)` | Average outcome across trials |
| `=MIN(range)` / `=MAX(range)` | Range of outcomes across trials |
| One-way Data Table | Re-run the simulation N times automatically |
| Goal Seek / Solver | Find the input value that produces a target output |
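Goal Seek is particularly valuable: rather than manually testing different hit rates to reach 10 super attacks, use Goal Seek to search for the hit rate whose simulated average is closest to 10. Because `=RAND()` recalculates on every change, treat the answer as an estimate rather than an exact value.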
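Outside a spreadsheet, a simple bisection search can stand in for Goal Seek. The sketch below is an illustration under stated assumptions, not a description of how Goal Seek works internally: the random rolls are drawn once and frozen so that the output changes only when the hit chance changes, and a super attack is assumed to trigger on any attack that leaves the hit streak at four or more. The names `mean_supers` and `seek_hit_chance` are hypothetical.

```python
import random

def mean_supers(hit_chance, rolls, streak_needed=4):
    """Average super attacks per round for pre-drawn uniform rolls.

    Assumed rule: any attack that leaves the hit streak at `streak_needed`
    or more triggers a super attack (no reset on trigger).
    """
    total = 0
    for round_rolls in rolls:
        streak = 0
        for roll in round_rolls:
            streak = streak + 1 if roll < hit_chance else 0
            total += streak >= streak_needed
    return total / len(rolls)

def seek_hit_chance(target, rolls, lo=0.0, hi=1.0, tolerance=1e-4):
    """Bisection: shrink [lo, hi] around the hit chance that reaches the target."""
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if mean_supers(mid, rolls) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = random.Random(0)
# Freeze the rolls: 1000 rounds of 100 attacks each.
rolls = [[rng.random() for _ in range(100)] for _ in range(1000)]
best = seek_hit_chance(10, rolls)
print(round(best, 3), mean_supers(best, rolls))
```

Freezing the rolls makes the average a non-decreasing function of the hit chance, which is what lets bisection converge. A different seed gives a slightly different answer.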
In practice
When to use probability calculation (fast, exact):
- Single-event probabilities (drop rates, encounter chances)
- Joint probability of simple compound conditions (hit AND crit AND enemy debuffed)
- Quick EV comparisons between two balanced upgrade paths
When to use Monte Carlo simulation (fast for complex systems, good for validation):
- Multi-step mechanics with dependent state (streak systems, hot/cold momentum)
- Distribution questions (“how often will this trigger?” not just “what is the average?”)
- Verifying that complex systems behave as intended before shipping
- Resolving design debates with data rather than argument
Open questions
- Simulation gives average outcomes and distributions, but not subjective experience of variance. A mechanic that triggers 0–20 times per round with an average of 10 may feel broken to players who hit 0 or 20, even if the average is correct. How do designers decide what level of variance is acceptable?
- Monte Carlo simulation assumes the designer has correctly modelled the game mechanic in the spreadsheet. Errors in the simulation model produce wrong answers with false confidence. What validation practices reduce this risk?
- Conditional probability is the foundation for Bayesian game analytics — predicting what type of player is currently playing based on their choices. Where does this become ethically concerning?
Related
- randomness-in-games — Skill/luck spectrum; how randomness affects player experience and fairness perception
- cognitive-biases-in-games — Why players systematically misunderstand probability; gambler’s fallacy; compound event errors
- game-balance — EV calculations as a mathematical balancing method; Sellers’ four-method taxonomy
- progression-and-power-curves — Power curves and attribute weight coefficients as the analytical layer of intransitive balance
- game-analytics — Post-launch probability and statistics in player data analysis
- source-players-making-decisions