Summary
Simulating large numbers of NPCs, hundreds or thousands, requires fundamentally different architectural choices from designing a small number of high-fidelity agents. The challenge is computational: each agent must think, navigate, and animate, so the total cost grows at least linearly with agent count. The approach described here is drawn from Project Highrise by SomaSim, which simulated 1000 NPCs at 60 FPS on commodity hardware (Zubek, Game AI Pro 360, Ch. 10, see source-game-ai-pro-360-character-behavior).
The core insight is: simplify the AI to match the required fidelity, not to match academic completeness. When NPCs live stereotyped, routine lives (office workers, apartment dwellers), a planner is solving the wrong problem. A daily script scheduler is cheaper, more controllable, and sufficient.
Key ideas
- Match fidelity to need: Detailed internal state (hunger, mood, etc.) is expensive and produces confusing population-level behaviour when multiplied by hundreds. Abstract it unless the game explicitly requires it.
- Open-loop action performance: Only re-evaluate world state when the action queue is empty. This trades responsiveness for dramatic performance savings.
- Domain-specific representation: Generic pathfinding on a raw grid does not scale. Reformulate the problem using domain knowledge to reduce the search space.
- Cheap restart over complex failure handling: When action selection is cheap, abandon a failed action and restart from selection rather than building elaborate recovery logic.
Architecture — two-part separation
Split NPC AI into exactly two responsibilities:
- Action selection: What should I do? — produces a sequence of actions.
- Action performance: How do I carry out this action? — executes the sequence.
This separation keeps both parts simple and independently testable.
Action selection
Option A: Propositional planning (explored and rejected for this use case)
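Project Highrise initially used a propositional planner. World state is a set of boolean propositions packed into a bit vector, and each rule's preconditions and postconditions are bit masks, so testing and applying a rule takes only a few bitwise operations. Plans are cacheable.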
// Propositional rule example
Rule: Preconditions: at-home & is-hungry
Action: go-to-restaurant
Postconditions: ~at-home & at-restaurant
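The rule above can be expressed with bit masks. The following is a minimal sketch, not the Project Highrise implementation: the `Facts` enum, the `Rule` struct, and the split of postconditions into `Adds` and `Removes` masks are all assumptions made for illustration. It only handles positive preconditions, which is all the example rule needs.

```csharp
using System;

// Hypothetical proposition set: one bit per boolean fact about the NPC.
[Flags]
public enum Facts : uint
{
    None = 0,
    AtHome = 1u << 0,
    IsHungry = 1u << 1,
    AtRestaurant = 1u << 2,
}

// Hypothetical rule: preconditions and postconditions stored as bit masks.
public readonly struct Rule
{
    public readonly string Action;
    public readonly Facts Required; // bits that must be set before the action
    public readonly Facts Adds;     // bits the action sets
    public readonly Facts Removes;  // bits the action clears

    public Rule(string action, Facts required, Facts adds, Facts removes)
    {
        Action = action;
        Required = required;
        Adds = adds;
        Removes = removes;
    }

    // Precondition test: a single AND plus a compare.
    public bool IsApplicable(Facts state) => (state & Required) == Required;

    // Postcondition application: clear the removed bits, then set the added ones.
    public Facts Apply(Facts state) => (state & ~Removes) | Adds;
}
```

With `new Rule("go-to-restaurant", Facts.AtHome | Facts.IsHungry, Facts.AtRestaurant, Facts.AtHome)`, a state of `AtHome | IsHungry` passes `IsApplicable`, and `Apply` yields `IsHungry | AtRestaurant`. As in the rule as written, `is-hungry` is left set.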
Performance was good, but the planner worked at the wrong level of abstraction. The team wanted to author entire daily routines, not individual reactive steps. Planning was abandoned in favour of daily scripts.
Key takeaway: planning is excellent for exploring the AI design space during prototyping (it is flexible and declarative), but it is over-engineered for stereotyped NPC lives.
Option B: Daily script schedules (chosen approach)
Each NPC type has a daily schedule: a data structure defining blocks of continuous activity and one-shot events with probabilities.
// Daily script example (adapted from Zubek, Ch. 10)
name "schedule-office.7"
blocks [
  { from 8 to 20 tasks [ go-work-at-workstation ] }
  { from 20 to 8 tasks [ go-stay-offsite ] }
]
oneshots [
  { at 8 prob 1.0 tasks [ go-get-coffee ] }
  { at 12 prob 1.0 tasks [ go-get-lunch ] }
  { at 15 prob 0.5 tasks [ go-get-coffee ] }
  { at 20 prob 0.5 tasks [ go-get-dinner ] }
  { at 20 prob 0.25 tasks [ go-get-drink ] }
]
Blocks define what the NPC does continuously within a time window; a window such as 20 to 8 wraps past midnight. Oneshots are optional interruptions at specific times, each with its own independent probability of firing.
Scripts bottom out in sequences of primitive actions (go to, sit, wait, etc.) that the action performance layer executes.
Unity sketch
using System.Collections.Generic;
using UnityEngine;

[System.Serializable]
public struct ScheduleBlock
{
    public float startHour;
    public float endHour;
    public string taskName;
}

[System.Serializable]
public struct ScheduleOneshot
{
    public float atHour;
    [Range(0, 1)] public float probability;
    public string taskName;
}

[CreateAssetMenu]
public class DailySchedule : ScriptableObject
{
    public ScheduleBlock[] blocks;
    public ScheduleOneshot[] oneshots;
}
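public class NPCScheduler : MonoBehaviour
{
    [SerializeField] private DailySchedule schedule;
    private readonly Queue<string> actionQueue = new();
    private int lastOneshotHour = -1; // whole hour whose one-shots were last rolled

    // Tasks chosen by the most recent PlanCurrentDay call, in execution order.
    public IEnumerable<string> GetPlannedActions() => actionQueue;

    public void PlanCurrentDay(float currentHour)
    {
        actionQueue.Clear();
        // Roll one-shots once per whole hour, so replanning within the same
        // hour does not roll them again. They run before the block task.
        int hour = Mathf.FloorToInt(currentHour);
        if (hour != lastOneshotHour)
        {
            lastOneshotHour = hour;
            foreach (var shot in schedule.oneshots)
            {
                if (Mathf.FloorToInt(shot.atHour) == hour && Random.value < shot.probability)
                    actionQueue.Enqueue(shot.taskName);
            }
        }
        foreach (var block in schedule.blocks)
        {
            // A block such as 20 -> 8 wraps past midnight.
            bool wraps = block.startHour > block.endHour;
            bool active = wraps
                ? currentHour >= block.startHour || currentHour < block.endHour
                : currentHour >= block.startHour && currentHour < block.endHour;
            if (active)
                actionQueue.Enqueue(block.taskName);
        }
    }
}
Action performance — open-loop queues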
Most NPC AI systems are closed-loop: they monitor the world continuously and adapt mid-action. This is expensive. For NPCs in benign game worlds (a building management sim), closed-loop monitoring is largely wasted computation.
Open-loop model
1. Action selection produces a sequence of primitive actions.
2. Actions execute in order from the queue.
3. Each action may define a failure condition (e.g. path not found).
4. On failure: flush queue, optionally enqueue a fallback script.
5. When queue is empty: run action selection again.
The system only evaluates world state at the start of each script, not continuously. If something unexpected happens, restart — action selection is cheap enough to make this viable.
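The five steps can be exercised without any engine code. The sketch below is an illustration under stated assumptions, not the Project Highrise implementation: `OpenLoopRunner`, `Status`, and the representation of an action as a `Func<Status>` delegate are hypothetical names chosen for brevity. Action selection is invoked only when the queue is empty, and a failure flushes the queue and enqueues a single fallback.

```csharp
using System;
using System.Collections.Generic;

public enum Status { Running, Done, Failed }

// Hypothetical open-loop runner: each action is a delegate polled once per tick.
public sealed class OpenLoopRunner
{
    private readonly Queue<Func<Status>> queue = new Queue<Func<Status>>();
    private readonly Func<IEnumerable<Func<Status>>> selectActions;
    private readonly Func<Status> fallback;

    // Number of times action selection has run (useful for checking cost).
    public int SelectionCount { get; private set; }

    public OpenLoopRunner(Func<IEnumerable<Func<Status>>> selectActions, Func<Status> fallback)
    {
        this.selectActions = selectActions;
        this.fallback = fallback;
    }

    public void Tick()
    {
        // Step 5: world state is consulted only when the queue has drained.
        if (queue.Count == 0)
        {
            SelectionCount++;
            foreach (var action in selectActions())
                queue.Enqueue(action);
            if (queue.Count == 0)
                return; // nothing to do this tick
        }

        // Steps 2-4: run the head action; advance on success, flush on failure.
        switch (queue.Peek()())
        {
            case Status.Done:
                queue.Dequeue();
                break;
            case Status.Failed:
                queue.Clear();
                queue.Enqueue(fallback);
                break;
            case Status.Running:
                break;
        }
    }
}
```

Given a selector that returns a "walk" action that succeeds and a "sit" action that fails, four ticks run walk, sit, the fallback, and then walk again, with selection invoked only twice. A fallback that itself keeps failing would be re-enqueued indefinitely in this sketch, so a real fallback should be an action that cannot fail. The Unity version of the same loop follows.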
// NPCAction, NPCActionFactory, ComplainAction, and GameClock are assumed
// project types that are not shown here.
public class OpenLoopActionQueue : MonoBehaviour
{
    [SerializeField] private NPCScheduler scheduler; // assign in the Inspector
    private readonly Queue<NPCAction> queue = new();
    private NPCAction current;

    void Update()
    {
        // Check failure first so a failed action is never treated as complete.
        if (current != null && current.HasFailed)
        {
            // Abandon the rest of the script and fall back.
            queue.Clear();
            EnqueueFallback();
            current = null;
        }
        else if (current == null || current.IsComplete)
        {
            // An empty queue is the only point where the schedule is consulted.
            if (queue.Count == 0)
                RefillQueueFromSchedule();
            current = queue.Count > 0 ? queue.Dequeue() : null;
            current?.Begin(this);
        }
        else
        {
            current.Tick(this);
        }
    }

    void RefillQueueFromSchedule()
    {
        scheduler.PlanCurrentDay(GameClock.CurrentHour);
        foreach (var task in scheduler.GetPlannedActions())
            queue.Enqueue(NPCActionFactory.Create(task));
    }

    void EnqueueFallback()
    {
        // Play "unhappy" animation and complain about missing resources
        queue.Enqueue(new ComplainAction());
    }
}
Pathfinding optimisation — domain-specific graph reduction
A 100-storey, 150-tile-wide building has 15,000 grid cells. Running A* on the raw grid for 1000 NPCs is too expensive. But the domain has structure: movement within a floor is always a straight line, and movement between floors always uses a connector (stairs, escalator, lift).
Graph reduction
Raw: 15,000 grid cells (100 × 150 grid)
High-level: ~100 nodes (floor plates + connectors)
Edges: ~400 (each plate connects to adjacent plates via connectors)
- Floor plate: A contiguous horizontal sequence of tiles — movement inside is straight-line.
- Connector: A staircase, escalator, or lift shaft connecting adjacent floors.
// FloorPlate, Connector, GraphNode, and TileGrid are assumed project types.
public class BuildingGraph
{
    public List<FloorPlate> plates = new();
    public List<Connector> connectors = new();

    // Build graph from level layout
    public void Build(TileGrid grid)
    {
        // Extract floor plates from row-scan
        // Extract connectors from designated tile types
        // Link plates to connectors and neighbours
    }

    // A* over plates and connectors; the search space is ~100 nodes.
    // AStarSearch and HeuristicCost are assumed helpers, not shown here.
    public List<GraphNode> FindPath(FloorPlate from, FloorPlate to)
    {
        return AStarSearch(from, to, HeuristicCost);
    }
}
After reaching the destination plate, movement is a simple straight-line walk. No tile-level pathfinding is needed.
This reduction made A* so fast that algorithm choice (JPS, JPS+, etc.) became irrelevant. Model reduction outperformed algorithmic optimisation.
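The row-scan step that extracts floor plates can be sketched as follows. This is a hedged illustration rather than the shipped code: `PlateSpan`, `PlateExtractor`, and the `bool[floor, x]` walkability grid are hypothetical stand-ins for whatever tile representation the game actually uses. Each maximal run of walkable tiles on a floor becomes one graph node.

```csharp
using System.Collections.Generic;

// Hypothetical floor plate: a contiguous run of walkable tiles on one floor.
public readonly struct PlateSpan
{
    public readonly int Floor;
    public readonly int StartX;
    public readonly int EndX; // inclusive

    public PlateSpan(int floor, int startX, int endX)
    {
        Floor = floor;
        StartX = startX;
        EndX = endX;
    }
}

public static class PlateExtractor
{
    // walkable[floor, x] is true where an NPC can stand.
    public static List<PlateSpan> Extract(bool[,] walkable)
    {
        var plates = new List<PlateSpan>();
        int floors = walkable.GetLength(0);
        int width = walkable.GetLength(1);

        for (int floor = 0; floor < floors; floor++)
        {
            int start = -1; // -1 means no run is currently open
            // Scan one tile past the end so a run touching the edge is closed.
            for (int x = 0; x <= width; x++)
            {
                bool open = x < width && walkable[floor, x];
                if (open && start < 0)
                {
                    start = x;
                }
                else if (!open && start >= 0)
                {
                    plates.Add(new PlateSpan(floor, start, x - 1));
                    start = -1;
                }
            }
        }
        return plates;
    }
}
```

On a two-floor, six-tile-wide grid where the upper floor has one gap, this produces three plates from twelve cells. On a building with mostly unbroken floors the node count is close to the number of floors, which is where the reduction from thousands of cells to roughly a hundred nodes comes from.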
Broader principle: abstraction scales better than fidelity
Project Highrise originally modelled physiological state per NPC (hunger, tiredness). This was removed because:
- Hidden internal state is hard for designers and players to understand.
- At scale, unexpected population-level behaviour emerges from complex individual state.
“The utility of detailed NPC representation is inversely proportional to the number of NPCs the player has to manage.” (Zubek)
Practical guidance:
- Few NPCs (< 10 visible): detailed internal state, planning, rich behaviour.
- Many NPCs (tens to hundreds): schedule scripts, visible routines, limited internal state.
- Crowd-scale (hundreds+): emergent behaviour from simple rules; individual fidelity unimportant.
Other techniques from the anthology
Data-driven character types (The Last of Us, Botta Ch. 1)
All Infected types share a single C++ class, differentiated entirely by data files specifying which skills they have and their tuning values. No character-type conditionals in code. This allows new character types to be added without touching AI code.
Action tokens for group behaviour (Shroff Ch. 4)
Pool-limited tokens prevent too many NPCs performing the same action simultaneously. Token count can scale dynamically with group size. Each token type has a designer-specified cooldown before the same NPC can re-acquire it.
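A token pool of this kind might look like the sketch below. It is an assumption-laden illustration, not the implementation from the chapter: `TokenPool`, its members, the integer NPC ids, and the use of a caller-supplied clock value in seconds are all hypothetical. `Capacity` is settable so that a caller can rescale it with group size.

```csharp
using System.Collections.Generic;

// Hypothetical pool for one token type (for example "melee attack").
public sealed class TokenPool
{
    private readonly double cooldownSeconds;
    private readonly HashSet<int> holders = new HashSet<int>();
    private readonly Dictionary<int, double> readyAt = new Dictionary<int, double>();

    // Maximum simultaneous holders; may be adjusted as the group grows or shrinks.
    public int Capacity { get; set; }

    public TokenPool(int capacity, double cooldownSeconds)
    {
        Capacity = capacity;
        this.cooldownSeconds = cooldownSeconds;
    }

    // Returns true if npcId now holds a token. 'now' is the caller's clock in seconds.
    public bool TryAcquire(int npcId, double now)
    {
        if (holders.Contains(npcId)) return false;   // already holding one
        if (holders.Count >= Capacity) return false; // pool exhausted
        if (readyAt.TryGetValue(npcId, out double ready) && now < ready)
            return false;                            // still cooling down
        holders.Add(npcId);
        return true;
    }

    // Returns the token and starts this NPC's cooldown.
    public void Release(int npcId, double now)
    {
        if (holders.Remove(npcId))
            readyAt[npcId] = now + cooldownSeconds;
    }
}
```

With a capacity of 1 and a five-second cooldown, a second NPC is refused while the first holds the token, and the first NPC is refused again until five seconds after it releases.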
On-screen realization
Allocate interesting behaviour preferentially to NPCs that are currently on screen. Off-screen NPCs can run simpler logic. This is a lightweight LOD (Level of Detail) system for AI.
Trade-offs
| Technique | Gains | Costs |
|---|---|---|
| Open-loop queues | Lower CPU cost (no per-frame world checks) | Responsiveness to world changes |
| Daily scripts | Less design time; predictable population behaviour | Flexibility for emergent NPC decisions |
| Domain graph reduction | Lower pathfinding CPU cost | Generality (requires level-specific abstraction) |
| Propositional planning | Flexibility during prototyping | More authoring effort; poor fit for stereotyped lives |
Evidence
- Zubek (Game AI Pro 360, Ch. 10, see source-game-ai-pro-360-character-behavior) documents in full how Project Highrise ran 1000 NPCs at 60 FPS, including the switch from planning to scripts and the hierarchical pathfinding approach.
- Botta (Ch. 1) describes the data-driven character system that allowed new Infected types (Stalkers) to be added months before ship without touching shared AI code.
Implications
- Start simple and justify complexity. Project Highrise showed that planning can be an overcomplicated solution to what is really a scheduling problem.
- Open-loop design is not laziness — it is a deliberate architectural choice that removes an entire class of failure-handling complexity.
- Domain knowledge is the most powerful optimisation tool. A generic algorithm running on a reduced, domain-specific model will usually outperform a specialised algorithm running on the raw, generic representation.
Open questions
- Unity’s job system and DOTS make open-loop NPC queues much more tractable at crowd scale. How does the daily script scheduler pattern map to DOTS entities and job structs?
- Is there a clean way to implement the floor-plate graph reduction dynamically for procedurally generated buildings?
- At what NPC count does the switch from planning to scripted schedules become worthwhile? Is there a hybrid (planning for emergencies; scripts for routine) that preserves both benefits?
Related
game-ai-agent-design · utility-ai · combat-coordinator-pattern · source-game-ai-pro-360-character-behavior