Machine Learning for Games

Summary

Machine learning (ML) gives game agents the ability to learn from experience rather than executing hand-authored rules. In games, ML is used for adaptive difficulty, procedural content generation, playtesting automation, and increasingly for NPC behaviour. The most relevant approaches for game AI are reinforcement learning (agents learn from reward signals) and genetic algorithms (populations evolve toward better solutions — see genetic-algorithms for detail). (Taulli, Artificial Intelligence Basics, see source-ai-basics; Yannakakis and Togelius, Artificial Intelligence and Games, see source-ai-and-games)

(Prof Charles, course lecture series, see source-game-ai-lectures)

AI, ML, and deep learning

These three terms are often conflated. They are nested concepts:

Artificial Intelligence (broadest)
  └── Machine Learning
        └── Deep Learning (narrowest)

AI — concerned with intelligent behaviour in artefacts: perception, reasoning, learning, communicating, and acting in complex environments. (Nils Nilsson, Artificial Intelligence: A New Synthesis)
Machine learning — AI systems that learn from data rather than executing hand-authored rules
Deep learning — ML using multi-layer neural networks; particularly effective for pattern recognition, image classification, and sequential prediction

Types of AI

Type	Description	Example
Weak AI	Focused on a single narrow task	Controlling flow in a pipe; a chess engine
Strong AI	Can adapt, problem-solve, or learn across domains	Not yet achieved reliably
AGI (Artificial General Intelligence)	More human-like; can generalise and perform a range of different tasks	Research goal; not yet achieved
Superintelligence	Faster, more accurate, and beyond human understanding	Speculative

Most AI in games today is Weak AI — purpose-built for a specific task (pathfinding, difficulty adjustment, procedural generation). Strong AI and AGI are research horizons, not production tools.

Brief history

Period	Milestone
1950	Turing Test proposed; Dartmouth Conference (John McCarthy coins artificial intelligence); information theory and symbolic reasoning
1964	ELIZA — first chatbot (Joseph Weizenbaum, MIT)
1975–1982	Expert systems (Edward Feigenbaum); neural networks conceptualised; optical character recognition; speech recognition
~1990	AI Winter I — limited compute, storage, and network capacity; combinatorial explosion; disappointing results
1990s–2000s	Machine learning resurgence; deep learning (pattern analysis, classification); big data; fast processors
1997	IBM Deep Blue defeats Garry Kasparov (chess)
2011	IBM Watson wins Jeopardy!; Apple integrates Siri into iPhone
2014	AI learns to recognise cats from YouTube videos (Google); DeepMind defeats Lee Sedol at Go
~2000s	AI Winter II — collapse of dedicated hardware vendors; disappointing scaling results
2015+	Transformers, large language models, Stable Diffusion; modern deep learning era

(Prof Charles, course lecture series, citing Lavenda/Marsden AI timeline)

Uncertainty and probability

Games and AI share a foundational relationship with uncertainty. ML uses probability to model likelihood and predict future events.

Basic probability rules

P(A) = outcomes where A happens / total outcomes
P(B | A) = P(A and B) / P(A)          // conditional probability
P(A or B) = P(A) + P(B) - P(A and B)  // addition rule
P(A and B) = P(A) * P(B|A)            // multiplication rule
P(not A) = 1 - P(A)

Bayes’ Theorem

Updates a belief (prior probability) in light of new evidence:

P(H | E) = ( P(E | H) * P(H) ) / P(E)

Term	Meaning
`P(H)`	Prior — initial belief before seeing evidence
`P(E	H)`
`P(E)`	Marginal — total probability of the evidence across all hypotheses
`P(H	E)`

Game application: Estimating a player’s current ability level from observed performance, then updating the estimate as more data arrives (dynamic difficulty adjustment).

Shannon entropy

Entropy measures the average information content of a probability distribution:

H(X) = -Σ p(x) * log₂(p(x))

Individual information content of event x:

I(x) = -log₂(p(x))

Example: A fair coin flip. P(heads) = 0.5, so I(heads) = -log₂(0.5) = 1 bit.

“Learning that an unlikely event has occurred is more informative than learning that a likely event has occurred.” — Goodfellow et al., Deep Learning

Low probability event → high information (surprising)
High probability event → low information (unsurprising)

Entropy underlies decision tree construction (information gain) and many ML scoring metrics.

ML algorithm taxonomy

Type	Description	Example algorithms	Game use
Supervised learning	Learn from labelled (input, output) pairs; tackles regression and classification	Decision Tree, Linear/Logistic Regression, Neural Networks	Player behaviour classification, cheat detection
Unsupervised learning	Discover structure in unlabelled data; groups similar examples	Clustering (k-means), PCA	Player segmentation, behaviour clustering
Reinforcement learning	Learn through action → reward signals; no labels needed	Q-Learning, SARSA, PPO	Adaptive NPCs, game-playing agents (AlphaGo, OpenAI Five)
Recommender systems	Learn to suggest items matching user preferences	Collaborative filtering	In-game recommendations, matchmaking

Search-based ML

Hill climbing: Iteratively moves toward higher-valued solutions in the search space. Fast but gets stuck in local optima.

Simulated annealing: Like hill climbing, but with a probability of accepting a worse solution (decreasing over time). Allows escape from local optima at the cost of slower convergence.

Reinforcement learning

RL is the most directly applicable ML paradigm for game AI. An agent learns which actions produce rewards by interacting with an environment.

Formal model

States S — all possible configurations the environment can be in
Actions A — all possible actions the agent can take
Rewards R — scalar feedback signal received after taking an action
Policy π — the agent’s mapping from states to actions (what we want to learn)

Temporal difference (TD) learning

The agent learns a Q-table: a mapping from (state, action) pairs to expected cumulative reward. Q-values are updated iteratively using the Bellman equation:

Q(s, a) ← Q(s, a) + α * (r + γ * max_a' Q(s', a') - Q(s, a))

Symbol	Meaning
`α`	Learning rate (0–1): how fast new information overrides old
`γ` (gamma)	Discount factor (0–1): how much future rewards matter vs. immediate rewards
`r`	Reward received on transitioning to new state `s'`
`max Q(s', a')`	Best expected future reward from the next state

Q-Learning is off-policy: the update always uses the best possible next action, regardless of what the agent actually does.

SARSA is on-policy: the update uses the action the agent actually took next (making it more conservative in risky situations).

Action selection

During learning, the agent must balance:

Exploitation (greedy): always take the action with the highest current Q-value
Exploration (probabilistic): sometimes choose randomly to discover better options

ε-greedy: Choose the best action with probability (1-ε); choose randomly with probability ε.

Probabilistic greedy (proportional): The probability of choosing action a equals Q(s,a) / Σ Q(s, a') — every action gets some chance, proportional to its Q-value.

Worked example (three-state world)

States S0, S1, S2. S1 has reward 3, Qmax 5. S2 has reward 7, Qmax 6. α=0.6, γ=0.5.

Time 0:  Q(S0→S1)=0,   Q(S0→S2)=0
Time 1:  Q(S0→S1) = 0 + 0.6*(3 + 0.5*5 - 0) = 3.3
Time 2:  Q(S0→S2) = 0 + 0.6*(7 + 0.5*6 - 0) = 6.0
Time 3:  Q(S0→S2) = 6.0 + 0.6*(7 + 0.5*6 - 6.0) = 8.4
Time 4:  Q(S0→S2) = 8.4 + 0.6*(7 + 0.5*6 - 8.4) = 9.12
Final:   Q(S0→S1) = 3.3 + 0.6*(3 + 0.5*5 - 3.3) = 4.62

The agent converges on preferring S2 (higher reward + higher future value). This generalises to any Markov decision process.

Game applications of RL

Use case	State space	Reward signal
Enemy combat AI	Player position, health, distance	Damage dealt - damage received
Game-playing agent	Board/game state	Win/loss/score delta
Difficulty adjustment	Player performance history	Error from target win rate
Resource management AI	Resource counts, time, territory	Score/survival time

Neural networks

Artificial Neural Networks (ANNs) model computation loosely after biological neurons. Each neuron sums weighted inputs and applies an activation function; layers of neurons compose complex non-linear functions.

Architecture

Input layer → Hidden layer(s) → Output layer

Input: game state features (positions, health, distances, etc.)
Hidden: learned feature representations
Output: action probabilities or Q-values

Backpropagation algorithm

Training a network with labelled data:

Feed input forward through all weights to compute output
Compute loss (error between output and target)
Back-propagate error through the network (chain rule)
Update weights to reduce loss (gradient descent)
Repeat until training error is acceptable

Neural networks + RL = Deep RL

When the state space is too large for a Q-table (e.g. pixels), a neural network approximates the Q-function. The input is the game state; the output is Q-values for each action. This is the approach used in DeepMind’s Atari-playing agents and Unity ML-Agents.

Unity ML-Agents provides a ready-made RL training pipeline that connects to a Python backend (PyTorch), enabling training of neural network policies directly inside Unity. See Unity ML-Agents documentation for setup.

Neuroevolution

Instead of backpropagation, evolve network weights using a genetic algorithm (see genetic-algorithms). Suited to environments where:

No labelled training data exists
Performance can only be measured through interaction
The fitness landscape has many local minima that gradient descent would get stuck in

Convolutional neural networks (CNNs)

When the input is pixel data (an image or game screen), a standard fully-connected network is impractical — the parameter count explodes. CNNs apply learned filters spatially, reducing dimensionality before passing to fully-connected layers.

Canonical game AI architecture (DeepMind Atari agent):

Input:      84×84×4 frames (grayscale; 4 stacked for velocity)
Conv 1:     8×8 filters, 16 feature maps → 20×20×16
Conv 2:     4×4 filters, 32 feature maps → 9×9×32
FC layer:   256 units
Output:     Q-values for each action (4 outputs for Atari)

The stack of frames encodes motion implicitly — two identical frames would look static; four consecutive frames allow the network to detect velocity.

Generative Adversarial Networks (GANs)

A GAN trains two networks adversarially:

Generator (G) — produces fake samples from random noise
Discriminator (D) — tries to distinguish real samples from generated ones

Training loop:

G produces a fake image
D scores it real or fake; backpropagation updates D to be a better detector
Backpropagation also updates G to fool D more effectively
At equilibrium, G produces samples indistinguishable from real data

Game applications: texture generation, concept art, character sprite synthesis, upscaling low-resolution assets. GANs underlie many commercial generative art tools (see generative-ai-game-dev).

genetic-algorithms — evolutionary optimisation; detailed GA mechanics; Knapsack, TSP, Lunar Lander examples
neuroevolution — evolving neural networks rather than training them only with gradients
game-ai-agent-design — applying ML to autonomous agents; Behaviour Trees; GOAP
ai-state-machine-pattern — simpler rule-based alternative to ML for NPC behaviour
player-modelling — one of the major applied uses of ML in games
procedural-generation — PCGML: applying ML to content generation
generative-ai-game-dev — practical generative AI tools (Muse, Firefly, Scenario AI, SoundRaw, Copilot); AI in the development pipeline
overview-artificial-intelligence-in-games — synthesis view across classical AI, learning, PCG, and player models
reward-systems — game design reward schedules echo RL reward signal design
source-game-ai-lectures

GDnD Wiki

Explorer

GDnD Wiki

Machine Learning for Games

Summary

AI, ML, and deep learning

Types of AI

Brief history

Uncertainty and probability

Basic probability rules

Bayes’ Theorem

Shannon entropy

ML algorithm taxonomy

Search-based ML

Reinforcement learning

Formal model

Temporal difference (TD) learning

Action selection

Worked example (three-state world)

Game applications of RL

Neural networks

Architecture

Backpropagation algorithm

Neural networks + RL = Deep RL

Neuroevolution

Convolutional neural networks (CNNs)

Generative Adversarial Networks (GANs)

Graph View

Table of Contents

Backlinks

GDnD Wiki

Explorer

Machine Learning for Games

Summary

AI, ML, and deep learning

Types of AI

Brief history

Uncertainty and probability

Basic probability rules

Bayes’ Theorem

Shannon entropy

ML algorithm taxonomy

Search-based ML

Reinforcement learning

Formal model

Temporal difference (TD) learning

Action selection

Worked example (three-state world)

Game applications of RL

Neural networks

Architecture

Backpropagation algorithm

Neural networks + RL = Deep RL

Neuroevolution

Convolutional neural networks (CNNs)

Generative Adversarial Networks (GANs)

Related

Graph View

Table of Contents

Backlinks