Summary

Sound design in games encompasses both the creation of sound effects and the sonic shaping of musical instruments. This page covers the technical foundations: how synthesisers and samplers are structured, how filters and envelopes shape sounds, and how to create seamless music loops for game audio. These concepts apply equally to music composition tools (DAW plug-ins) and to runtime audio engines.

(Sweet, Writing Interactive Music for Video Games, Ch. 11 and Ch. 7, see source-writing-interactive-music)

The synthesis chain

Synthesisers and samplers share a common signal flow:

Sound Generator → Filter → Volume Envelope (ADSR) → [LFO Modulation]

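Each stage shapes the sound before it reaches the listener. The LFO, shown in brackets, is optional and is not a further stage in the audio path: it modulates a parameter of one of the other stages. Understanding this chain makes it possible to customise any instrument patch or create entirely new sounds from raw materials.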

Stage 1: Sound generator

The sound generator is the source of the audio signal. Two main types:

Synthesiser oscillator: generates a mathematically-defined waveform. The five basic waveforms differ in harmonic content and timbre:

| Waveform | Character | Typical use |
| --- | --- | --- |
| Sine | Pure, smooth; fundamental frequency only | Sub-bass, pure tones, pads |
| Square | Hollow, buzzy; strong odd harmonics | Retro synth leads, video game bleeps |
| Triangle | Softer than square; fewer high harmonics | Flute-like tones, mellow leads |
| Sawtooth | Bright, rich; all harmonics present | Orchestral string simulation, brass |
| Noise | Random; all frequencies simultaneously | Percussion, wind, ambient textures |
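As a rough illustration of the waveform shapes listed above, the following sketch generates each one from a normalised phase (0.0 to 1.0 per cycle). The function names and the `render` helper are illustrative assumptions, not part of any particular synthesiser. These are naive, non-band-limited shapes, so the square and sawtooth will alias at high pitches; real oscillators usually take extra care there.

```python
import math
import random

def sine(phase):
    """Pure tone: the fundamental only."""
    return math.sin(2.0 * math.pi * phase)

def square(phase):
    """+1 for the first half of each cycle, -1 for the second half."""
    return 1.0 if (phase % 1.0) < 0.5 else -1.0

def triangle(phase):
    """Linear ramp from -1 up to +1 at mid-cycle, then back down."""
    p = phase % 1.0
    return 4.0 * p - 1.0 if p < 0.5 else 3.0 - 4.0 * p

def sawtooth(phase):
    """Linear ramp from -1 towards +1, then an instant reset."""
    return 2.0 * (phase % 1.0) - 1.0

def noise(_phase, rng=random):
    """Uniform white noise; the phase argument is ignored."""
    return rng.uniform(-1.0, 1.0)

def render(waveform, freq_hz, seconds, sample_rate=44100):
    """Render a list of samples for one oscillator at a fixed frequency."""
    count = int(seconds * sample_rate)
    return [waveform(i * freq_hz / sample_rate) for i in range(count)]

# Example: half a second of a 440 Hz sawtooth.
saw_a4 = render(sawtooth, 440.0, 0.5)
```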

Sampler: plays back a digitally recorded audio file (a sample) — a recording of a real instrument, sound effect, or any other audio source. Samplers include pitch mapping: pressing different MIDI keys plays the sample at different pitches.
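Pitch mapping can be sketched as a playback-rate calculation followed by resampling. The version below assumes twelve-tone equal temperament and a hypothetical `root_key` (the MIDI key at which the sample plays unchanged); it uses simple linear interpolation, whereas production samplers typically use higher-quality interpolation.

```python
def playback_rate(midi_key, root_key=60):
    """Rate multiplier for a key, assuming equal temperament.
    One octave up (12 semitones) doubles the rate."""
    return 2.0 ** ((midi_key - root_key) / 12.0)

def resample(sample, rate):
    """Play `sample` back at `rate` using linear interpolation.
    A rate above 1 raises the pitch and shortens the sound;
    a rate below 1 lowers the pitch and lengthens it."""
    out = []
    pos = 0.0
    last = len(sample) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(sample[i] * (1.0 - frac) + sample[i + 1] * frac)
        pos += rate
    return out

# Example: a short ramp played an octave down becomes twice as dense.
octave_down = resample([0.0, 2.0, 4.0], playback_rate(48))
```

Note that this simple approach changes duration together with pitch, which is part of why extreme transpositions alter a sound's character so strongly.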

“You can create myriad unusual sounds just by playing sound effects at different pitches. If this is the only tool that you have at your disposal, you will still be able to create a wide range of sounds.” — Sweet, Ch. 11

Stage 2: Filter

A filter is an equaliser that modifies the frequency content of the sound. The three most common filter types in game audio:

Low Pass Filter (LPF): allows low frequencies to pass through while reducing frequencies above the cutoff frequency. This is the most common filter in games.

  • Game use cases: underwater sequences (muffled, dull sound), post-explosion hearing loss (ringing, dull), player health critically low (the world becoming muffled), sounds heard through walls
  • The LPF makes sounds feel distant, muffled, or submerged
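A minimal sketch of the muffling effect, assuming a simple one-pole low-pass filter (gentler than the filters in most synthesisers and middleware). The `cutoff_for_health` mapping is purely hypothetical: it shows one way a game parameter could drive the cutoff, with the 400 Hz and 20 kHz endpoints chosen only for illustration.

```python
import math

def low_pass(samples, cutoff_hz, sample_rate=44100):
    """One-pole low-pass: each output moves a fraction of the way
    towards the input, which smooths out fast (high-frequency) changes."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def cutoff_for_health(health, low_hz=400.0, high_hz=20000.0):
    """Hypothetical mapping: health 1.0 leaves the mix open,
    health 0.0 closes the filter down to low_hz (exponential sweep)."""
    return low_hz * (high_hz / low_hz) ** health

def tone(freq_hz, seconds=0.5, sample_rate=44100):
    """Test signal: a sine wave at the given frequency."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def rms(samples):
    """Root-mean-square level, a rough measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A 100 Hz tone passes a 500 Hz filter almost untouched; 8 kHz is strongly reduced.
bass_kept = rms(low_pass(tone(100.0), 500.0)) / rms(tone(100.0))
treble_kept = rms(low_pass(tone(8000.0), 500.0)) / rms(tone(8000.0))
```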

High Pass Filter (HPF): allows high frequencies to pass while reducing low frequencies below the cutoff. Removes the “weight” from a sound.

  • Game use cases: simulating sounds heard from altitude or during airborne movement (SSX Tricky removed low frequencies when the player launched into the air), radio or telephone voice simulation (combined with BPF), thin/distant ambient elements

Band Pass Filter (BPF): allows a narrow band of frequencies around the cutoff (centre) frequency to pass, reducing everything above and below. Makes sounds “small.”

  • Game use cases: making audio sound like it is coming from a radio speaker, walkie-talkie, or cheap monitor; diegetic music from a device in the game world

Filter controls:

  • Cutoff frequency: the point at which frequency reduction begins
  • Resonance: a boost at the cutoff frequency that adds a resonant peak, useful for creating characteristic “filter sweep” sounds
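To show how cutoff and resonance interact, here is a sketch of a digital state-variable filter in the widely used Chamberlin form, which produces low-pass, high-pass, and band-pass outputs from the same loop. This particular structure is an assumption chosen for brevity, not the design of any specific product; it is only stable for cutoffs well below the sample rate (roughly under one sixth of it), and the parameter name `resonance_q` is illustrative.

```python
import math

def state_variable_filter(samples, cutoff_hz, resonance_q=0.707, sample_rate=44100):
    """Return (low, high, band) output lists for the input samples.
    Higher resonance_q means less damping and a taller peak at the cutoff."""
    f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)  # frequency coefficient
    damping = 1.0 / resonance_q
    low = band = 0.0
    lows, highs, bands = [], [], []
    for x in samples:
        low += f * band                      # low-pass: integrates the band output
        high = x - low - damping * band      # high-pass: what is left of the input
        band += f * high                     # band-pass: integrates the high output
        lows.append(low)
        highs.append(high)
        bands.append(band)
    return lows, highs, bands

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Feeding a tone at the cutoff frequency through the band output with a high `resonance_q` gives a noticeably louder result than with a low one, which is the resonant peak heard in filter sweeps.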

Stage 3: Volume envelope (ADSR)

An envelope shapes the volume of a sound over time, from the moment a key is pressed (note-on) until after it is released (note-off). Almost every synthesiser and sampler offers ADSR controls, however simple or complex the instrument.

Volume
  │    /\
  │   /  \
  │  /    \─────────────
  │ /                   \
  └──────────────────────────→ Time
     A    D    S (hold)  R

| Parameter | Description | Practical effect |
| --- | --- | --- |
| Attack time | Time from silence to peak volume when key is pressed | Short = percussive, sharp attack; Long = gradual crescendo (strings legato) |
| Decay time | Time from peak volume down to sustain level | Short = punchy (drum hits); Long = slow settling |
| Sustain level | Volume held while key is held down (a level, not a time) | Sets the ongoing volume; 0 = sound dies after attack/decay |
| Release time | Time from sustain level to silence after key is released | Short = abrupt cut; Long = gradual fade (piano resonance, reverb-like tail) |

Warning: If the sustain level equals the peak volume, adjustments to decay time have no audible effect — there is no volume drop to occur. Set sustain below the peak to hear decay shaping.

Practical examples:

  • Orchestral string legato: Increase attack time so strings swell in smoothly; increase release time so they overlap between notes.
  • Drum hit: Zero attack time, short decay (~200ms), sustain at ~50%, zero release. Creates a sharp transient followed by a quick drop.
  • Sustained pad: Medium attack, minimal decay, sustain at near-maximum, long release.
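The envelope stages can be sketched as a function of time. This version assumes linear segments and a peak level of 1.0 (many instruments use curved segments instead); the function names and the `DRUM_HIT` preset, which follows the drum example above, are illustrative.

```python
def held_level(t, attack, decay, sustain):
    """Envelope level t seconds after note-on while the key is still down."""
    if t < attack:
        return t / attack                          # Attack: ramp 0 -> 1
    if t < attack + decay:
        progress = (t - attack) / decay
        return 1.0 - (1.0 - sustain) * progress    # Decay: ramp 1 -> sustain
    return sustain                                 # Sustain: hold the level

def adsr_level(t, note_off, attack, decay, sustain, release):
    """Envelope level at time t for a key released at time note_off.
    Times are in seconds; sustain is a level between 0 and 1."""
    if t < 0.0:
        return 0.0
    if t < note_off:
        return held_level(t, attack, decay, sustain)
    start = held_level(note_off, attack, decay, sustain)  # level when released
    elapsed = t - note_off
    if elapsed >= release:
        return 0.0
    return start * (1.0 - elapsed / release)       # Release: ramp to silence

# Preset following the drum-hit example: instant attack, 200 ms decay.
DRUM_HIT = dict(attack=0.0, decay=0.2, sustain=0.5, release=0.0)
level_mid_decay = adsr_level(0.1, 1.0, **DRUM_HIT)
```

With `sustain=1.0` the decay branch always returns 1.0, whatever the decay time, which is the behaviour the warning above describes.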

LFO (Low-Frequency Oscillator)

An LFO is a second oscillator running below 20 Hz (below audible range) that modifies another parameter — typically filter cutoff, pitch, or volume — rhythmically over time.

| LFO target | Resulting effect |
| --- | --- |
| Pitch | Vibrato (pitch oscillation) |
| Volume | Tremolo (volume oscillation) |
| Filter cutoff | Filter sweep / wah effect |

LFO controls:

  • Waveform: any of the five basic waveforms (sine = smooth oscillation; square = on/off switching; random/noise = random modulation)
  • Frequency: how fast the oscillation cycles (in Hz)
  • Depth/Range: how much the LFO modifies the target parameter

The LFO automates repetitive modulation that would otherwise require manual parameter automation — freeing the composer to focus on musical decisions.
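A small sketch of an LFO driving the three targets in the table above. The function names and the choice of units (depth as a 0 to 1 fraction for volume, semitones for pitch, octaves for cutoff) are assumptions for illustration; instruments differ in how they scale depth.

```python
import math

def lfo(t, rate_hz, shape="sine"):
    """LFO output in the range -1..+1 at time t (seconds)."""
    phase = (t * rate_hz) % 1.0
    if shape == "sine":
        return math.sin(2.0 * math.pi * phase)    # smooth oscillation
    if shape == "square":
        return 1.0 if phase < 0.5 else -1.0       # on/off switching
    raise ValueError("unsupported LFO shape: " + shape)

def tremolo_gain(t, rate_hz, depth):
    """Volume target: gain swings between 1.0 and 1.0 - depth."""
    return 1.0 - depth * 0.5 * (1.0 + lfo(t, rate_hz))

def vibrato_freq(t, base_hz, rate_hz, depth_semitones):
    """Pitch target: frequency swings +/- depth_semitones around base_hz."""
    return base_hz * 2.0 ** (depth_semitones * lfo(t, rate_hz) / 12.0)

def swept_cutoff(t, centre_hz, rate_hz, depth_octaves):
    """Filter target: cutoff swings +/- depth_octaves around centre_hz."""
    return centre_hz * 2.0 ** (depth_octaves * lfo(t, rate_hz))
```

Calling one of these once per audio block (or per sample) and feeding the result into the oscillator, amplifier, or filter is the whole mechanism; the rate and depth arguments correspond to the Frequency and Depth controls listed above.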

Audio signal processing (DSP effects)

DSP (digital signal processing) effects are applied after the synthesis chain to further shape the sound. Audio middleware such as Wwise and FMOD can apply these effects to music at runtime, which is why composers are sometimes asked to deliver dry mixes (without reverb or other room acoustics), leaving spatial treatment to the game engine (see audio-middleware-overview).

Three categories:

Time-based effects

Effects that add copies of the signal at different time offsets:

  • Reverb — simulates acoustic spaces (room, hall, cave). The most transformative single effect in audio. Creates a sense of physical space.
  • Delay/Echo — distinct repetitions of the signal at set intervals
  • Chorus — very short delay with slight pitch variation; makes thin sounds appear fuller
  • Flanging/Phasing — swept comb filtering; produces characteristic “jet engine” or “whooshing” sounds
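The idea of "adding copies of the signal at different time offsets" can be sketched with a feedback delay line, the building block behind echo and (with many short, dense delays) simple reverbs. The parameter names `feedback` and `wet` are illustrative assumptions.

```python
def feedback_delay(samples, delay_samples, feedback=0.5, wet=0.5):
    """Echo effect: each repeat arrives delay_samples later and is
    scaled by `feedback` relative to the previous repeat."""
    buffer = [0.0] * delay_samples   # circular buffer holding the delayed signal
    idx = 0
    out = []
    for x in samples:
        delayed = buffer[idx]                    # signal from delay_samples ago
        buffer[idx] = x + delayed * feedback     # feed part of the echo back in
        idx = (idx + 1) % delay_samples
        out.append(x + wet * delayed)            # dry signal plus echo
    return out

# A single click followed by silence produces a decaying train of echoes.
echoes = feedback_delay([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2, feedback=0.5, wet=1.0)
```

A `feedback` value of 1.0 or more would make the echoes grow rather than decay, so it is normally kept below 1.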

Warning on compression: Many new composers over-apply dynamic compression. Compressing an orchestral mix reduces the dynamic range — forte passages and pianissimo passages converge toward a mezzo-forte midpoint. The music loses all sense of dynamic surprise. Use compression surgically on individual instruments, not the full mix. (Sweet, Ch. 11)

Frequency-based effects

  • Equalisers (EQ) — precise frequency boosts and cuts
  • Pitch shifting — transposing audio up or down without changing speed
  • Vibrato — pitch oscillation (also achievable via LFO)
  • Resonators — emphasise specific harmonics

Volume-based effects

  • Compression — reduces the dynamic range by attenuating signals above a threshold; increases perceived loudness at the cost of dynamics
  • Limiting — hard ceiling on maximum volume; prevents clipping
  • Gating — mutes the signal below a volume threshold; removes background noise between phrases
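The three volume-based effects above can be compared as static level curves, working in decibels. This is a simplified sketch: it ignores attack and release times, knee shape, and make-up gain, all of which real processors add, and the default thresholds are arbitrary illustrative values.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Above the threshold, every `ratio` dB of input yields 1 dB of output."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def limit_db(level_db, ceiling_db=-1.0):
    """Hard ceiling: nothing is allowed above ceiling_db."""
    return min(level_db, ceiling_db)

def gate_db(level_db, threshold_db=-60.0):
    """Below the threshold the signal is muted (minus infinity dB)."""
    return level_db if level_db >= threshold_db else float("-inf")

# A quiet passage at -30 dB and a loud one at -6 dB start 24 dB apart.
quiet, loud = compress_db(-30.0), compress_db(-6.0)
range_after = loud - quiet   # 13.5 dB: the contrast has nearly halved
```

The shrinking gap between the quiet and loud passages is the loss of dynamic contrast that heavy compression of a full mix produces.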

Music loop editing

Most game music consists of audio loops — audio files that play and then seamlessly repeat from the beginning. Creating a seamless loop requires attention to several technical details.

Zero crossing points

A zero crossing point is a moment in the audio waveform where the signal passes through zero amplitude. Starting and ending a loop at zero crossing points minimises the chance of an audible click or pop at the loop boundary. Most DAWs can snap the playhead to zero crossings automatically.
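What "snapping to a zero crossing" involves can be sketched as follows, assuming mono audio held as a list of float samples. The function names are illustrative and do not correspond to any DAW's API.

```python
def zero_crossings(samples):
    """Indices i where the signal crosses or reaches zero between i-1 and i."""
    found = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if (prev < 0.0 <= cur) or (prev > 0.0 >= cur):
            found.append(i)
    return found

def nearest_zero_crossing(samples, target):
    """Zero crossing closest to the desired edit point, or None if there is none."""
    crossings = zero_crossings(samples)
    if not crossings:
        return None
    return min(crossings, key=lambda i: abs(i - target))

# Example: the editor wants to cut at sample 4; the nearest crossing is at 5.
audio = [0.5, 0.2, -0.1, -0.4, -0.2, 0.3, 0.6]
cut_point = nearest_zero_crossing(audio, 4)
```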

Short fade-ins/fade-outs at the loop boundary (under 1 millisecond) can smooth remaining clicks, but are a last resort — it is better to find clean zero crossings.

Waveform direction matching

Even at a zero crossing, a click can occur if the waveform is moving in opposite directions at the loop end and loop start. At the end of the loop the waveform may be moving downward (negative direction); at the start of the loop it may be moving upward (positive direction). The discontinuity creates an audible artefact.

Match both amplitude and direction at the loop point. The larger the mismatch in waveform shape, the louder the click.
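One way to check a candidate loop point is to compare the waveform just before the loop end with the waveform just after the loop start. The sketch below is an assumed, simplified measure (a single-sample slope on each side of the seam); the dictionary keys are illustrative names, and the loop is taken to cover samples `loop_start` up to but not including `loop_end`.

```python
import math

def seam_report(samples, loop_start, loop_end):
    """Describe the join heard when playback wraps from loop_end - 1 to loop_start."""
    out_slope = samples[loop_end - 1] - samples[loop_end - 2]   # direction leaving the loop
    in_slope = samples[loop_start + 1] - samples[loop_start]    # direction entering the loop
    return {
        "amplitude_jump": abs(samples[loop_start] - samples[loop_end - 1]),
        "same_direction": out_slope * in_slope > 0,
        "slope_mismatch": abs(in_slope - out_slope),
    }

# A sine with a 100-sample period: looping whole periods joins cleanly,
# while cutting half a period late reverses the direction at the seam.
wave = [math.sin(2.0 * math.pi * i / 100.0) for i in range(400)]
clean = seam_report(wave, 0, 300)
reversed_seam = seam_report(wave, 0, 250)
```

Both cuts land close to zero amplitude, but only the first keeps the waveform moving in the same direction, which is why a zero crossing alone does not guarantee a silent join.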

Transient hiding: If placing a drum hit or other sharp transient at the loop boundary is musically viable, its initial burst of energy will mask small edit imperfections. Transients hide bad loop points; legato elements (sustained strings, pad sounds) expose them.

Reverb tail baking

Reverb tails do not survive the loop boundary — when the audio file ends and returns to the start, the reverb that was meant to ring out beyond the loop end simply disappears. This produces an unnatural silence exactly at the moment the loop repeats.

Solution: copy the reverb tail from the end of the loop and layer it over the beginning of the loop, then bounce the final mix. When the loop restarts, the tail of the previous pass is already present, so the listener does not notice the transition.

Process (in a DAW):

  1. Add silence before and after the intended loop section
  2. Bounce the section with 5 extra seconds to capture the reverb tail
  3. Copy the reverb tail onto a new track at the very beginning of the loop
  4. Bounce the final loop at the exact intended length
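Outside a DAW, the same steps can be approximated offline on sample data. This sketch assumes mono audio as a list of floats, where `rendered` is the bounce from step 2 (the loop followed by its reverb tail) and `loop_length` is the intended loop length in samples; both names are illustrative.

```python
def bake_reverb_tail(rendered, loop_length):
    """Mix the tail that rings past the loop end over the loop start.
    Returns exactly loop_length samples."""
    loop = list(rendered[:loop_length])     # the loop at its exact intended length
    tail = rendered[loop_length:]           # everything that rang out after the end
    for i, sample in enumerate(tail[:loop_length]):
        loop[i] += sample                   # layer the tail over the loop head
    return loop

# Four samples of loop followed by a two-sample decaying tail.
baked = bake_reverb_tail([1.0, 1.0, 1.0, 1.0, 0.5, 0.25], 4)
```

Because the tail is summed with the loop head, the result can exceed the original peak level, so leaving some headroom before baking is a sensible precaution.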

Long delays raise the same problem, but baking is usually not the best fix for them: it is often cleaner to let instruments end before the loop boundary so their delays fade out naturally, rather than trying to bake delay tails into the loop head.

Tempo and metre

Most loops are edited to begin and end on downbeats — the first beat of a bar. Consistency matters above all: if you choose to cut on beat 3 for musical reasons, cut every loop in the set on beat 3.
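Edit points on a given beat can be located arithmetically when the tempo is constant. The sketch below assumes a fixed tempo and metre, with bars and beats counted from 1; the function name is illustrative.

```python
def beat_position_samples(bar, beat, bpm, beats_per_bar=4, sample_rate=44100):
    """Sample offset of a given beat, counting bars and beats from 1."""
    beats_elapsed = (bar - 1) * beats_per_bar + (beat - 1)
    return beats_elapsed * 60.0 / bpm * sample_rate

# An 8-bar loop in 4/4 at 120 bpm ends on the downbeat of bar 9.
loop_end = beat_position_samples(9, 1, 120)      # 705600.0 samples (16 seconds)

# At 130 bpm the same boundary does not fall on a whole sample.
awkward_end = beat_position_samples(9, 1, 130)
```

When the result is not a whole number, the edit point has to be rounded to a sample, which is one more reason to pick the nearest clean zero crossing rather than trusting the grid alone.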

When using live musicians, slight variations in timing (“playing on top of the beat” vs “in the pocket”) can complicate loop editing. Decide before editing whether to quantise the first attack or accept the performer’s timing, then apply that decision consistently.

Dealing with legato elements at loop boundaries

Sounds without transients (arco strings, synth pads, long delays) are hardest to loop because they have no sharp onset to hide an edit point. Options:

  • Move legato elements to the middle of the loop, away from the boundary
  • Fade them out well before the loop end (1–3 second fades; shorter fades sound like a dropout)
  • Use overlapping dual loops with different fade points so one is always audible (only works when harmonic sync between loops is not required)
  • Write more silence into the music near the loop boundary

Sound creation techniques

Sampling unusual objects

“Some of the most unique sounds might be created by items that are not normally thought of as musical at all, including bowls, basement oil tanks, bowed cymbals, a bouncy ball being rubbed on a gong (a Michael Giacchino trick), or air-conditioning duct work. Be creative.” — Sweet, Ch. 11

Recording unexpected objects and loading them into a sampler produces distinctive sounds that avoid the homogeneous quality of commercial sample libraries. The sampler then allows pitch-mapping the recordings across a keyboard, creating playable instruments from unconventional sources.

Pitch manipulation

Playing any sample at different pitches changes its character significantly. Very low pitches add weight and menace; very high pitches add fragility or tension. This technique alone — playing existing sound effects at extreme pitches — can yield usable instruments and textures.

Implications

  • ADSR understanding is a prerequisite for working with any audio software. A composer who cannot shape envelopes is limited to unmodified preset patches.
  • LPF is the single most useful real-time effect for game state communication — muffling the world when a player is hurt, underwater, or near an explosion is immediately readable.
  • Reverb tail baking must be planned during the mix stage, not as an afterthought. A stereo bounce that stops at the loop point no longer contains the tail, so the problem cannot be fixed from that file afterwards.
  • Dry deliverables (no reverb, no room tone) give the game engine maximum flexibility to apply spatial treatment at runtime. Communicate with the audio director about deliverable specs before mixing.

Open questions

  • As real-time convolution reverb and impulse response-based spatialisation become standard in game engines (Unity’s Audio Spatializer, Wwise Spatial Audio), how does the workflow for “wet vs dry deliverables” change?
  • Procedurally generated sound effects (synthesised at runtime from parameters rather than played from pre-recorded files) are used in some games. What are the design and performance trade-offs compared to sample-based approaches?