in development

making a more efficient android audio engine

I’ve written a few real-time audio engines for my Android projects. The prospect of writing real-time code in Java would have amused me not that long ago, but the JIT compiler makes it possible. There are still pitfalls to watch out for, of course. Most programmers understand that you shouldn’t allocate objects inside a tight loop, but some potential problems are less obvious.

Here’s a stripped-down version of an engine I wrote very early on.

package com.example;

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

/**
 * audio synthesizer engine
 */
public class Audio {

	// audio track object
	private AudioTrack track;
	
	// length of buffer in shorts
	private int bufferLength;
	
	// sampling rate in Hz
	private int sampleRate;
	
	// audio hardware buffer
	private short[] buffer;
	
	// tone generator
	private Whistler whistler;
	
	/**
	 * ctor
	 */
	public Audio() {
		new Thread() {
			public void run() {
				init();
				while (true) {
					loop();
				}
			}
		}.start();
		
	}

	/**
	 * startup and initialization  
	 */
	private void init() {
		sampleRate = AudioTrack.getNativeOutputSampleRate(
				AudioManager.STREAM_MUSIC);

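		// note: getMinBufferSize() reports a size in bytes; using it as a
		// count of shorts makes the buffer twice the minimum size
		bufferLength = AudioTrack.getMinBufferSize(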
				sampleRate, 
				AudioFormat.CHANNEL_OUT_MONO, 
				AudioFormat.ENCODING_PCM_16BIT);

		buffer = new short[bufferLength];
		
		whistler = new Whistler(0.002f, 0.75f);
		whistler.setPitch(440);
		
		track = new AudioTrack(
				AudioManager.STREAM_MUSIC,
				sampleRate,
				AudioFormat.CHANNEL_OUT_MONO,
				AudioFormat.ENCODING_PCM_16BIT,
				bufferLength * 2, // length in bytes
				AudioTrack.MODE_STREAM);
		
		if (track.getState() != AudioTrack.STATE_INITIALIZED) {
			throw new RuntimeException("Couldn't initialize AudioTrack object");
		}
		
		track.setStereoVolume(1, 1);
		track.play();
	}
	
	/**
	 * main audio pump loop
	 */
	private void loop() {
		for (int i = 0; i < buffer.length; i++) {
			buffer[i] = (short)(whistler.get() * 32767);
		}
		track.write(buffer, 0, buffer.length);
	}

	/**
	 * generates a breathy tone at specified frequency
	 */
	class Whistler {

		float noiseFactor, limiter, qFactor;
		float t0, t1, t2;
		
		public Whistler(float nf, float lm) {
			noiseFactor = nf;
			limiter = lm;
			t2 = 0.001f;
		}
		
		public void setPitch(float f) {
			qFactor = (float)(1 / Math.pow(sampleRate / (2 * Math.PI * f), 2));
		}
		
		public float get() {
			t2 = (float)(-qFactor * t0 + noiseFactor * Math.random() * Math.signum(-t2) - t2 * limiter);
			t1 += t2;
			t0 += t1;
			return t0;
		}
	}
}

The heart of this engine is the Whistler inner class. It’s a noise-driven oscillator that produces a breathy sound, like a person whistling. The code works, but the audio drops out a lot, especially when the CPU is busy with other things. Running a DDMS trace on the app produces this output.

audio1

Notice that the “audio pump” thread is almost completely black: it’s constantly running. That’s bad. What’s going on? Let’s zoom in.

audio2

Each of those colored blocks represents a function call. Though the JIT compiler can transform Java to native code, function calls still have overhead. To optimize our real-time code, we’ll have to reduce the number of functions we call inside our loops. Let’s start by replacing Whistler.get() with something that doesn’t require a function call for every sample we place in the buffer.

public class Audio {

	... snipped ...
	
	private void loop() {
		whistler.fill();
		track.write(buffer, 0, buffer.length);
	}

	class Whistler {

		... snipped ...

		public void fill() {
			for (int i = 0; i < buffer.length; i++) {
				t2 = (float)(-qFactor * t0 + noiseFactor * Math.random() * Math.signum(-t2) - t2 * limiter);
				t1 += t2;
				t0 += t1;
				buffer[i] = (short)(t0 * 32767);
			}
		}
	}
}

Our new function fill accesses the buffer variable of the outer class directly. Now, one function call fills the entire buffer! How does that affect our DDMS trace?

audio3

The audio pump thread is still going like mad. What’s the problem?

audio4

One of the top function calls listed in the trace is Audio.access$3, which you might recognize as a compiler-generated (synthetic) method. Because buffer is a private field of Audio, the Java compiler (not the JIT) generates static accessor methods named access$<number> in the outer class so that the inner class can reach it. In this case, the accessor returns the buffer variable that we allocate in the outer Audio class and access within the inner Whistler class. So, by accessing the outer class variable, we’re still making a function call within the loop. Fortunately, we can eliminate this call simply by passing the buffer as a parameter.

public class Audio {

	... snipped ...

	private void loop() {
		whistler.fill(buffer);
		track.write(buffer, 0, buffer.length);
	}

	class Whistler {

		... snipped ...

		public void fill(short[] b) {
			for (int i = 0; i < b.length; i++) {
				t2 = (float)(-qFactor * t0 + noiseFactor * Math.random() * Math.signum(-t2) - t2 * limiter);
				t1 += t2;
				t0 += t1;
				b[i] = (short)(t0 * 32767);
			}
		}
	}
}
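
If changing the method signature isn’t convenient, another way to keep the accessor out of the loop is to copy the outer field into a local variable once, before the loop starts. Here’s a minimal sketch of that idea. Engine and Filler are hypothetical stand-ins for Audio and Whistler, and the synthesis is replaced by a simple ramp so the example runs without Android. Whether an accessor is generated at all depends on the compiler and toolchain version, so treat this as an illustration rather than a measured fix.

```java
// Hypothetical stand-ins for Audio and Whistler; no Android classes needed.
class Engine {

	// private outer field: inner-class access may go through a synthetic accessor
	private short[] buffer = new short[256];

	class Filler {
		// fills the outer buffer with a ramp, returns the number of samples written
		int fill() {
			// single outer-field access, hoisted out of the loop
			final short[] b = buffer;
			for (int i = 0; i < b.length; i++) {
				b[i] = (short) i;
			}
			return b.length;
		}
	}

	// reads one sample back, for checking the result
	short sampleAt(int i) {
		return buffer[i];
	}

	public static void main(String[] args) {
		Engine engine = new Engine();
		Engine.Filler filler = engine.new Filler();
		int written = filler.fill();
		System.out.println(written + " samples, last = " + engine.sampleAt(written - 1));
	}
}
```

The loop body only ever touches the local b, so the outer field is read once per buffer rather than once per sample.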

Now, our efforts are starting to show results.

audio5

The audio pump thread is no longer a solid black line. We’re cutting down on the amount of work it has to do. However, we can do more. We know that function calls create overhead. Can we eliminate them?

There are two function calls in the fill function: Math.random() and Math.signum(). The first returns a random double value, and the second returns -1 for a negative number, 1 for a positive number, and 0 for zero. Let’s substitute our own code for these functions.

class Whistler {

	float noiseFactor, limiter, qFactor;
	float t0, t1, t2;
	long noise;
	
	public Whistler(float nf, float lm) {
		noiseFactor = nf;
		limiter = lm;
		t2 = 0.001f;
		noise = System.currentTimeMillis();
	}
	
	public void setPitch(float f) {
		qFactor = (float)(1 / Math.pow(sampleRate / (2 * Math.PI * f), 2));
	}
	
	public void fill(short[] b) {
		for (int i = 0; i < b.length; i++) { 
			
			float random = (float)(noise >> 16) / (float)(0xffffffffL);
			noise = (noise * 25214903917L + 11L) & 0xffffffffffffL;

			int signum = (t2 > 0 ? -1 : 1);
			
			t2 = (float)(-qFactor * t0 + noiseFactor * random * signum - t2 * limiter);
			t1 += t2;
			t0 += t1;
			b[i] = (short)(t0 * 32767);
		}
	}
}

Here, we’ve implemented a simple random number generator by taking a seed value (the current time in milliseconds) and, on each trip through the loop, multiplying it by a large constant, adding 11, and keeping the low 48 bits. (This is an example of a linear congruential RNG. The constants don’t need to be prime; the multiplier used here isn’t.) The signum function has been replaced with a ternary expression (we don’t need the zero->0 case). There are no longer any function calls in the loop. How does it look in DDMS?

audio6

Wow! The audio thread looks nearly idle. Removing function calls from inner loops eliminates a rather significant amount of overhead, and allows you to do a lot more in your audio engine.
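
As a footnote on the random number generator in the last listing, here’s a minimal standalone sketch for experimenting with it outside the engine. The class name Lcg and its methods are hypothetical. The multiplier, increment, and 48-bit mask are the ones from the listing, and java.util.Random is specified to use the same recurrence. Unlike the listing, this version scrambles the seed and advances the state before reading it, purely so that its 32-bit output can be compared directly against java.util.Random.

```java
import java.util.Random;

// Hypothetical standalone version of the generator used in fill().
class Lcg {
	static final long MULTIPLIER = 25214903917L; // 0x5DEECE66DL
	static final long INCREMENT = 11L;
	static final long MASK = 0xffffffffffffL;    // keep the low 48 bits

	private long state;

	Lcg(long seed) {
		// java.util.Random scrambles its seed the same way
		state = (seed ^ MULTIPLIER) & MASK;
	}

	// advances the state and returns its top 32 bits as an int
	int nextInt() {
		state = (state * MULTIPLIER + INCREMENT) & MASK;
		return (int) (state >>> 16);
	}

	// maps the top 32 bits (read as unsigned) onto the range 0..1
	float nextFloat() {
		return (nextInt() & 0xffffffffL) / (float) 0xffffffffL;
	}

	public static void main(String[] args) {
		Lcg lcg = new Lcg(42L);
		Random reference = new Random(42L);
		// each line should show the same value twice
		for (int i = 0; i < 5; i++) {
			System.out.println(lcg.nextInt() + " " + reference.nextInt());
		}
	}
}
```

Inside a real-time loop you’d still inline the two lines of arithmetic rather than call nextFloat(), for the reasons shown in the traces above. The class form is only for testing the generator on its own.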