Summary
Playtesting is the structured process of observing real people playing a game in order to gather data about the player experience. It is the primary tool for closing the gap between intended design (the inscribed layer) and actual player experience (the dynamic layer) — the gap that the mda-framework identifies as the central challenge of game design.
Jesse Schell treats playtesting as so fundamental that he devotes multiple chapters to it (Lens #91). Jeremy Gibson Bond structures it around expanding circles of testers, distinguishes the investigator role from the playtester role, and provides formal methods for structured sessions.
(Schell, The Art of Game Design; Bond, Introduction to Game Design, Prototyping, and Development, see source-art-of-game-design, source-introduction-game-design-prototyping)
Investigator vs. playtester
Bond (Ch. 10) distinguishes two distinct roles in a playtest session:
- Investigator — administers and manages the test. Prepares the environment, runs the script, observes, takes notes, and interprets results. The investigator is focused on data collection.
- Playtester — participates in the test. Plays the game and provides feedback. The playtester should not know the investigator’s hypotheses.
In small teams, one person may fill both roles, but the risk is confirmation bias: a designer who also acts as investigator tends to see what they want to see. If possible, separate the roles.
Circles of playtesters (Bond)
Bond (Ch. 10) structures playtesters into four expanding circles, each offering different feedback and carrying different risks:
[ You ]
→ [ Trusted friends / tissue playtesters ]
→ [ Acquaintances / other designers ]
→ [ Internet / public ]
1. You — Test your own game first. Catch the most obvious issues. Risk: you cannot see your own blind spots; you know the design intent too well to experience it as a naïve player.
2. Trusted friends (tissue playtesters) — Close contacts willing to give honest feedback. Critical for early, fragile builds. Risk: may be too supportive, or too harsh to be useful. First-impression value is very high.
3. Acquaintances / other designers — People who represent target players or have relevant expertise. More objective than friends. Willing to critique without social cost. This circle provides the most reliable data on typical player experience.
4. Internet / public — Open beta, crowdfunding campaigns, app store releases. Reaches the widest audience. Risk: feedback is unfiltered, volume is hard to process, and public impressions persist.
Move outward through the circles as the game matures. Testing with the internet before inner circles is almost always a mistake.
Tissue playtesters
A tissue playtester is someone whose first-impression feedback has not yet been used up — they have never seen the game (or the feature being tested). The term comes from the idea that, like a tissue, they are single-use.
Tissue playtesters are irreplaceable for:
- Tutorial and first-level testing — only a naïve player reveals onboarding failures
- Difficulty progression — a playtester who has already learned the mechanic cannot feel the difficulty of first encounter
- Emotional impact — surprise, mystery, and awe are one-time experiences; a repeat playtester cannot recapture them
Once a tissue playtester has seen the game, they can still provide useful feedback — but on different questions (balance, late-game pacing, replayability). They cannot recover their naïve perspective.
Implication for student projects: Treat each person's first impression as a limited, single-use resource. Plan who will test which feature, and in what order, so that first impressions are not wasted.
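One way to make this planning concrete is a small ledger of who has already seen what. The sketch below is hypothetical (the `Tester` class and `pick_tissue_testers` function are illustrative names, not from Bond) and assumes naivety is tracked per feature, as the definition above allows. In practice, having seen one part of a game also colours a tester's view of the rest.

```python
from dataclasses import dataclass, field


@dataclass
class Tester:
    name: str
    # Features this person has already played; their first impression of these is spent.
    seen: set[str] = field(default_factory=set)


def pick_tissue_testers(testers: list[Tester], feature: str, count: int) -> list[Tester]:
    """Choose up to `count` testers who have never seen `feature`, and mark them as used."""
    naive = [t for t in testers if feature not in t.seen]
    chosen = naive[:count]
    for t in chosen:
        t.seen.add(feature)  # once they play it, they are no longer tissue testers for it
    return chosen


team = [Tester("Ana"), Tester("Ben"), Tester("Cy", {"tutorial"})]
print([t.name for t in pick_tissue_testers(team, "tutorial", 2)])  # ['Ana', 'Ben']
print([t.name for t in pick_tissue_testers(team, "tutorial", 2)])  # [] (nobody naive left)
```

Running out of naïve testers for a feature (the empty list in the second call) is the signal to recruit from the next circle outward.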
Key ideas
Types of playtest
- Informal / individual: Play with one person, little structure. Fast and flexible. Bond’s “informal individual testing” is a conversation with minimal formality. Good for early, unstable builds. Risk: low signal-to-noise; tester may feel social pressure.
- Formal individual: Structured session with a single playtester, a prepared script, defined tasks, and post-session survey. Investigator observes and takes notes. High fidelity but time-intensive.
- Formal group: Multiple playtesters, often with a facilitator. Good for multiplayer testing; also useful for gathering diverse perspectives in a single session. Group dynamics can suppress individual reactions.
- Expert test: Experienced game designers or genre experts. Useful for balance and exploit-finding; not representative of typical player experience.
- Target audience test: Playtesters who match the game’s intended audience. Their experience is the ground truth for design decisions.
The observation rule
The single most valuable playtesting skill is observation — watching what players do, not listening to what they say they would do. Players are bad at predicting their own behaviour but excellent at demonstrating it.
- Watch eyes and body language: Where does the player look? Where do they hesitate? What frustrates them physically?
- Watch what they try, not what they say: Players often request features they don’t actually need; they rarely articulate the underlying problem.
- Don’t explain. Don’t defend. If a player is confused, that confusion is data. Explaining your intent removes the signal. Defending your design transforms the playtest into a debate.
The “don’t defend” principle
The most common and damaging mistake in playtesting is the designer explaining or defending their design in response to player confusion or negative feedback. When you explain the mechanic, the player understands it — but the data point (they didn’t understand it on their own) is now lost. The game must work without you present.
What to measure
- Confusion: Where do players get stuck, look uncertain, or ask questions?
- Frustration: Where do players sigh, stop playing, or express negative emotion?
- Engagement: Where are players leaning in, smiling, or playing past the intended session length?
- Session length vs. intended length: Did players play longer or shorter than expected? Why?
- First-time behaviour: What does a player do in the first 30 seconds? The first 5 minutes?
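If notes are kept as timestamped, categorised observations, several of these measures fall out directly. A minimal sketch follows: the category names mirror the list above, while `Observation`, `first_window`, and `length_ratio` are hypothetical helpers, and timestamps are assumed to be seconds since the session began.

```python
from dataclasses import dataclass

CATEGORIES = {"confusion", "frustration", "engagement"}


@dataclass
class Observation:
    t: float        # seconds since the session started
    category: str   # one of CATEGORIES
    note: str       # what the investigator saw

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def first_window(log: list[Observation], seconds: float) -> list[Observation]:
    """Observations from the opening `seconds` of play (e.g. 30 or 300)."""
    return [o for o in log if o.t <= seconds]


def length_ratio(actual_s: float, intended_s: float) -> float:
    """Above 1.0: played longer than intended. Below 1.0: stopped early."""
    return actual_s / intended_s


log = [
    Observation(12, "confusion", "searches for the jump button"),
    Observation(95, "engagement", "leans in at first enemy"),
    Observation(410, "frustration", "sighs, retries the same ledge"),
]
print(len(first_window(log, 30)), len(first_window(log, 300)))  # 1 2
print(length_ratio(18 * 60, 15 * 60))  # 1.2
```

The ratio only says that session length diverged from the plan; the observations inside each window are what explain why.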
In practice
Before the playtest:
- Define what question you are trying to answer. A playtest without a specific question produces vague data.
- Prepare a silent observation protocol — have someone else run the session if possible so you are free to watch.
During the playtest:
- Take timestamped notes. Mark moments of confusion, frustration, and delight.
- Resist every urge to intervene unless the player is completely stuck and the session is collapsing.
After the playtest:
- Distinguish symptoms from causes. “Players didn’t pick up the key” is a symptom. “The key was visually indistinct from the environment” is a cause. Fix causes, not symptoms.
- Prioritise by frequency — issues that every tester hit matter more than issues one tester mentioned.
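Counting distinct testers per issue, rather than raw mentions, keeps one vocal tester from dominating the list. The sketch below assumes observations have already been reduced to cause labels (the symptom-versus-cause step above), so that the same problem carries the same label across testers; `prioritise` is a hypothetical helper.

```python
from collections import Counter


def prioritise(issues_by_tester: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank issues by how many distinct testers hit them; ties broken alphabetically."""
    counts = Counter()
    for issues in issues_by_tester.values():
        counts.update(issues)  # a set, so each tester counts at most once per issue
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


notes = {
    "Ana": {"key blends into floor", "camera too fast"},
    "Ben": {"key blends into floor"},
    "Cy": {"key blends into floor", "menu font too small"},
}
for issue, n in prioritise(notes):
    print(n, issue)
# 3 key blends into floor
# 1 camera too fast
# 1 menu font too small
```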
Formal testing procedures (Bond)
Formal playtests require preparation to produce reliable, repeatable data.
The test script
Before running a formal session, prepare a script covering:
- What to say at the start — introduction, goals, what you’re testing (without revealing your hypotheses)
- How to react to questions — typically: “What would you expect to happen?” or “What are you trying to do?” Do not answer gameplay questions.
- Environment setup — hardware, lighting, recording permissions
- Survey questions — what to ask after the session; standardised so results can be compared across sessions
- Note-taking guidelines — what categories of observation to record
Without a script, informal biases creep in. The investigator unconsciously prompts, guides, or reacts differently across sessions, making results incomparable.
Playtest notes format (Bond)
Bond recommends a standardised note format per observation:
| Field | Content |
|---|---|
| Number | Sequential note ID for this session |
| Player | Name / contact for follow-up |
| Location in game | Where in the game (level, scene, time in session) |
| Feedback verbatim | Exact words or action observed — do not paraphrase yet |
| Underlying issue | Your interpretation of the root cause |
| Severity | High / Medium / Low / As Designed / FoL (Fact of Life — cannot be changed) |
| Proposed solution | Initial design hypothesis for addressing the issue |
Recording verbatim before interpreting is critical. The interpretation step introduces designer bias; keeping it separate preserves the raw data.
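The same separation can be enforced structurally: record the observation fields during the session, then fill in the interpretation fields later without ever overwriting the raw data. The sketch below is one hypothetical Python encoding of Bond's fields (the class, enum, and `interpret` function are illustrative, not from the source); making the record immutable means interpretation always produces a new copy.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Severity(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    AS_DESIGNED = "As Designed"
    FACT_OF_LIFE = "FoL"  # cannot be changed


@dataclass(frozen=True)
class PlaytestNote:
    # Observation fields: filled in during the session.
    number: int
    player: str
    location: str
    verbatim: str  # exact words or action, not paraphrased
    # Interpretation fields: filled in afterwards, during analysis.
    underlying_issue: str = ""
    severity: Optional[Severity] = None
    proposed_solution: str = ""


def interpret(note: PlaytestNote, issue: str, severity: Severity, solution: str) -> PlaytestNote:
    """Return an analysed copy; the original observation is left untouched."""
    return replace(note, underlying_issue=issue, severity=severity, proposed_solution=solution)


raw = PlaytestNote(1, "Ana", "Level 1, 02:10", "Walked past the key twice")
done = interpret(raw, "Key blends into the floor texture", Severity.HIGH, "Add an outline to the key")
print(raw.underlying_issue == "", done.verbatim == raw.verbatim)  # True True
```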
Sellers: playtest methods in practice
Sellers (Advanced Game Design, Ch. 12) provides a practitioner’s account of playtesting that complements Bond’s formal methods with specific techniques for structured data collection.
The silent observation rule
The single most important behaviour during a playtest is silence. The designer-observer must not:
- Explain mechanics the player is struggling with
- Answer questions about rules
- React visibly to player choices (neither approval nor dismay)
Every intervention contaminates the session. The player’s confusion, struggle, and eventual resolution — or failure to resolve — is precisely the data the test is designed to capture. Once you explain the mechanic, the data point is gone.
The only valid intervention: if the player is completely blocked and the session has collapsed entirely (no actions are being taken, the player has expressed frustration and stopped trying), a neutral redirect is acceptable: “What are you trying to do right now?” This surfaces the player’s mental model without resolving the problem for them.
Posttest survey: Likert-scale questions
After the session, a structured posttest survey captures player attitudes that observation cannot directly measure. Sellers recommends Likert-scale questions (typically 1–5 or 1–7 agreement scales) for consistency and comparability across sessions:
Examples:
- “I found the game easy to understand.” (1 = strongly disagree, 5 = strongly agree)
- “I felt in control of my character.”
- “I wanted to keep playing when the session ended.”
- “The difficulty felt fair.”
Why Likert scales: results can be averaged across testers, tracked across iterations, and tested for statistically significant differences (given sufficient sample sizes). Open-ended questions produce richer qualitative data but are harder to aggregate. Use both: Likert for trend data, open-ended for explanation.
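Averaging and tracking across iterations can be sketched in a few lines. The question keys below are shorthand for the example statements above, and `summarise` and `change` are hypothetical helpers. The sketch computes means only and performs no significance test; treating ordinal Likert responses as numbers to average is a common simplification, and a median is a defensible alternative.

```python
from statistics import mean


def summarise(responses: list[dict[str, int]]) -> dict[str, float]:
    """Mean score per question across testers; one dict per completed survey."""
    questions = responses[0].keys()
    return {q: mean(r[q] for r in responses) for q in questions}


def change(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-question shift between two builds (positive means more agreement)."""
    return {q: after[q] - before[q] for q in before if q in after}


build_1 = [{"easy_to_understand": 2, "difficulty_fair": 4},
           {"easy_to_understand": 3, "difficulty_fair": 4}]
build_2 = [{"easy_to_understand": 4, "difficulty_fair": 3},
           {"easy_to_understand": 4, "difficulty_fair": 4}]
print(change(summarise(build_1), summarise(build_2)))
# {'easy_to_understand': 1.5, 'difficulty_fair': -0.5}
```

With only a handful of testers per build, a shift like the 0.5 drop here may be noise; the trend across several builds is more informative than any single delta.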
Directional analysis
Sellers distinguishes between the direction and magnitude of a problem. Raw playtest data tells you that something is wrong; directional analysis tells you which way to fix it.
Method: after each test, classify each observed problem:
- Direction: should this be made easier or harder? More telegraphed or more subtle? Faster or slower?
- Confidence: do multiple testers agree on the direction, or do they diverge?
Divergent directions (half of testers find it too hard, half find it too easy) indicate not a tuning problem but a design problem — the underlying mechanic may not be communicating clearly enough to produce consistent play. Tuning alone will not fix it.
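The classification can be sketched as a vote over testers' directions for a single problem. The 75% agreement threshold and the direction labels are illustrative assumptions, not values from Sellers; `classify` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional


def classify(directions: list[str], agreement: float = 0.75) -> tuple[str, Optional[str]]:
    """
    directions: one label per tester for the same problem, e.g. "easier" or "harder".
    Returns ("tuning", direction) when enough testers agree on one direction,
    else ("design", None): testers diverge, so tuning alone is unlikely to help.
    """
    if not directions:
        raise ValueError("no observations for this problem")
    top, votes = Counter(directions).most_common(1)[0]
    if votes / len(directions) >= agreement:
        return ("tuning", top)
    return ("design", None)


print(classify(["harder", "harder", "harder", "easier"]))  # ('tuning', 'harder')
print(classify(["harder", "easier", "harder", "easier"]))  # ('design', None)
```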
Players diagnose, designers solve
A critical principle for interpreting feedback: players accurately diagnose symptoms; they rarely correctly identify causes or solutions.
When a player says “the jumping feels broken,” they are reporting a symptom. The cause might be input lag, jump arc shape, collider size, ground detection logic, or visual feedback timing. When a player says “you should add a double jump,” they are proposing a solution — but it may not address the actual underlying problem.
Collect player diagnoses (symptoms) faithfully. Treat player-proposed solutions as data about what the player wants to feel, not as prescriptions for what to build.
(Sellers, Advanced Game Design, Ch. 12, see source-advanced-game-design)
Open questions
- How many playtests are enough? Nielsen’s heuristic (5 users find 85% of usability issues) comes from UX research — does it transfer to games?
- Remote playtesting (screen sharing, recorded sessions) has grown in practice. What is lost vs. gained compared to in-person observation?
- When should you playtest? Both Schell and Bond argue for testing as early as possible, with paper prototypes, before building digital systems.
- Tissue playtesters are one-use per feature. How should small student teams plan their use across an 8–12 week semester project?
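The 85% figure comes from a simple model (Nielsen and Landauer): if each tester independently finds a given issue with probability L, then n testers are expected to find a share 1 - (1 - L)^n of all issues, with L of about 0.31 in the usability studies behind the heuristic. Whether L is similar for games, where many issues are experiential rather than usability failures, is the open question; the sketch below only shows how sensitive the answer is to L. `share_found` is a hypothetical helper.

```python
def share_found(n_testers: int, detect_rate: float = 0.31) -> float:
    """Expected fraction of issues found, assuming each tester independently
    finds any given issue with probability `detect_rate`."""
    return 1 - (1 - detect_rate) ** n_testers


print(round(share_found(5), 2))        # 0.84
print(round(share_found(5, 0.15), 2))  # 0.56
```

At a lower per-tester detection rate, five testers find little more than half the issues, which is one reason the heuristic may not transfer directly.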
Related
- flow — Playtesting reveals whether players are in the flow channel
- interest-curves — Playtest data can be mapped to the intended interest curve to find divergences
- holographic-design — Playtesting is the primary tool for observing the skin layer
- layered-tetrad — Playtesting observes the Dynamic layer; identifies gaps between Inscribed intent and Dynamic outcome
- mda-framework — Playtesting closes the mechanics → dynamics → aesthetics gap
- prototyping — Playtesting and prototyping are tightly coupled
- sprints — The Sprint Review is a structured playtesting session with stakeholders every 1–3 weeks; see source-agile-game-development
- scrum-in-game-development — Agile’s inspect-and-adapt cycle treats each Sprint Review as an empirical test of the game’s current state
- player-centric-design — Playtesting is the operational tool for player-centric design
- design-lenses — Lens #91 (Playtesting)
- source-art-of-game-design
- source-introduction-game-design-prototyping
- source-advanced-game-design