Making Two Birds: Audio

The audio in Two Birds One Stone is mostly just playing back a variety of samples, but some audio is generated in real time; this seemed to be the simplest way of having non-annoying audio for things with unpredictable timing. For example, the robot will “think” for as long as it takes to beat the level, and how long that takes can’t be predicted.

I didn’t want to spend too much effort on the audio, as I fully expect that typically the game will be played without audio – the game is primarily supposed to serve as a fast diversion while waiting for something else. At the same time, the game might feel broken if there is no audio. I decided that the game should always start muted, and that there should be no background music (background music seemed like something which would just serve to annoy).

Robot Sounds

There are a few components to the robot sounds – the moving sound, the laser sound, and the “talking” sound. The moving sound and the laser sound can be played separately or together, but the “talking” sound is only played when the robot has stopped moving and the laser is off.

Sample of robot sounds from the game

Moving

I wanted this sound to be very mechanical and “dirty” sounding. I don’t think I can describe the thought process of how this was actually written, but I think that is just the reality of FM synthesis:

static float synth_chug_chug(float t)
{
	const float f0 = 0.02;  /* base frequency, in radians per sample */
	/* slow amplitude wobble */
	float v = 0.4 + sin(t * 0.03) * 0.5 + sin(t * 0.003) * 0.1;
	/* phase modulator */
	float w = sin(t * 0.21);
	/* odd harmonics of f0, each phase-modulated by w at a different depth */
	return ( sin(t * f0 * 1 + (w * 0.400)) * 0.3
	       + sin(t * f0 * 3 + (w * 0.060)) * 0.1
	       + sin(t * f0 * 5 + (w * 0.040)) * 0.06
	       + sin(t * f0 * 7 + (w * 0.043)) * 0.04
	       ) * v * 0.3;
}

Run at 48 kHz (with “t” increasing by 1 per sample), this fit the bill.
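
For concreteness, a driver for a generator like this might look something like the following (a sketch; the function and parameter names here are mine, not the game’s):

#include <math.h>

/* Hypothetical driver: fill a buffer with the moving sound at 48 kHz,
   with t advancing by 1 per sample. */
static void render_chug_chug(float *out, int num_samples, float *t)
{
	for (int i = 0; i < num_samples; i++) {
		out[i] = synth_chug_chug(*t);
		*t += 1.0f;
	}
}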

Laser

The laser was supposed to be a bit dirty too. Again, it is difficult to describe the thought process. I wanted the sound to change based on how fast the robot was moving the laser, so that speed is taken as an input:

static float synth_laser(float t, float laser_speed, float *t0, float *t1)
{
	/* speed-dependent tone; only its square is used, as a small grit term */
	float x = sin(t * 0.05 * (1.0 + (laser_speed / 500.0))) * (0.3 + sin(t * 0.001) * 0.1);
	float xx = x * x;
	/* accumulated phases: t0 advances at a speed-dependent rate, t1 slowly */
	*t0 += 0.05 * (1.0 + (laser_speed / 500.0));
	*t1 += 0.001;
	/* main tone: the two accumulated-phase sines multiplied together */
	x = sin(*t0) * sin(*t1) * 0.1;
	return x + (xx * 0.01);
}

The *t0 and *t1 state variables need to be initialized to something like 0 when this starts.
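
A minimal usage sketch, assuming the state lives in a small struct of my own invention:

/* Hypothetical laser voice state, zeroed when the laser switches on,
   e.g. struct laser_voice v = {0.0f, 0.0f}; */
struct laser_voice { float t0, t1; };

static void render_laser(struct laser_voice *s, float *out, int num_samples,
                         float t, float laser_speed)
{
	for (int i = 0; i < num_samples; i++)
		out[i] = synth_laser(t + i, laser_speed, &s->t0, &s->t1);
}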

Talking

The robot’s sounds after firing were supposed to make it seem happy with its accomplishments. To help show this, the pitch gets higher whenever a bird is hit. It is just a single sine wave with its frequency modulated. Each “outburst” lasts for half a second, and an outburst is triggered by launching or hitting a bird.

Within an outburst, a target frequency is chosen every N samples, for a randomly selected N; the target frequency is chosen randomly from a range determined by the number of birds that have been hit (the more birds hit, the higher it will go). The actual frequency moves towards the target frequency with a one-pole IIR.
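
A sketch of that scheme (all constants and names here are guesses rather than the game’s values, and the half-second outburst gating is assumed to live outside this function):

#include <stdlib.h>
#include <math.h>

/* Hypothetical "talking" voice: a sine whose frequency wanders towards
   randomly chosen targets, smoothed with a one-pole IIR. */
struct talk_state {
	float phase;       /* sine phase accumulator */
	float freq;        /* current frequency, radians per sample */
	float target_freq; /* frequency we are easing towards */
	int   hold;        /* samples left before picking a new target */
};

static float talk_sample(struct talk_state *s, int birds_hit)
{
	if (s->hold-- <= 0) {
		/* pick a new target every N samples, N random; the range
		   climbs as more birds are hit */
		s->hold = 500 + rand() % 2000;
		float base = 0.03f + 0.01f * birds_hit;
		s->target_freq = base + (rand() / (float)RAND_MAX) * 0.02f;
	}
	/* one-pole IIR towards the target frequency */
	s->freq += (s->target_freq - s->freq) * 0.001f;
	s->phase += s->freq;
	return sinf(s->phase) * 0.1f;
}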

Mixing

The three different sounds each have a gain associated with them, and these gains are smoothly adjusted depending on which sound should be playing. I do everything in floating point, and all the levels are low enough that I don’t need to do anything clever for a limiter: I just clip once, after all the sounds (not just the robot sounds) have been mixed.
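
The mixing amounts to roughly this (a sketch, assuming a simple one-pole gain smoother and hard clipping; the smoothing constant is not the game’s):

/* Each robot sound has a gain eased towards its target, then the three
   are summed. */
static float mix_robot_sounds(float move, float laser, float talk,
                              float gains[3], const float targets[3])
{
	const float in[3] = { move, laser, talk };
	float out = 0.0f;
	for (int i = 0; i < 3; i++) {
		gains[i] += (targets[i] - gains[i]) * 0.001f; /* smooth gain changes */
		out += in[i] * gains[i];
	}
	return out;
}

/* Applied once, after everything in the game has been mixed. */
static float clip(float x)
{
	return x > 1.0f ? 1.0f : (x < -1.0f ? -1.0f : x);
}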

Slingshot Sounds

The slingshot sound is a physical model of a string using the Karplus-Strong algorithm. The length of the modeled string changes based on how far the slingshot is pulled back, and the shape of the “pluck” is slightly different for when it is getting tighter vs when it is relaxing.

The Karplus-Strong algorithm in one picture: each output sample is generated by taking the head item out of a delay line. All items in the delay line are then aged, with a filtered version of the output sample fed back into the end of the delay line.

The filter is just a simple one-pole IIR. Since this is floating point, and occasionally compiled with x87 code, there is a special case that flushes values to zero once they are small enough to become denormals.

The length of the string changes dynamically, but I don’t do anything fancy with this (e.g. it would be righteous to resample the delay line, but I don’t do this). As a result, energy is thrown away “unfairly” when slots are removed from the delay line without their energy being distributed anywhere else. It sounds fine though.
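
Put together, a minimal Karplus-Strong string along those lines might look like this (my own simplification using a circular buffer rather than shifting; the buffer size, excitation, filter coefficient, feedback gain and denormal threshold are all assumptions):

#include <math.h>
#include <stdlib.h>

#define KS_MAX_LEN 1024

struct ks_string {
	float buf[KS_MAX_LEN]; /* the delay line */
	int   len;             /* current string length, set from the pull distance */
	int   pos;             /* read/write position in the circular buffer */
	float filt;            /* one-pole IIR state */
};

/* "Pluck" the string by filling the delay line with an excitation shape
   (plain noise here; the game uses different shapes for tightening vs
   relaxing). */
static void ks_pluck(struct ks_string *s, int len)
{
	s->len = len;
	s->pos = 0;
	s->filt = 0.0f;
	for (int i = 0; i < len; i++)
		s->buf[i] = (rand() / (float)RAND_MAX) * 2.0f - 1.0f;
}

static float ks_sample(struct ks_string *s)
{
	float out = s->buf[s->pos];
	/* one-pole IIR on the output, fed back into the end of the delay line */
	s->filt += (out - s->filt) * 0.5f;
	if (fabsf(s->filt) < 1e-15f)
		s->filt = 0.0f; /* flush tiny values so x87 builds never hit denormals */
	s->buf[s->pos] = s->filt * 0.99f;
	s->pos = (s->pos + 1) % s->len;
	return out;
}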

Other Sounds

All the other sounds are wave files loaded at startup. Most sounds have several variants that can be chosen from (with very slight differences) to avoid them becoming fatiguing. Using slight differences instead of large differences helps avoid drawing attention to the fact that there are only a finite number of variants. (This effect is probably more obvious with graphics: if you are making a SimCity-like game and you only have 3 variations on “house”, then making them all different colours makes it really obvious you have 3, but if you make them all similarly coloured with different shapes, it feels less obvious.)