Making Waves Part 1 - Build a Synthesizer

1. Introduction

Let's make some noise! In this codelab we're going to use the AAudio API to build a low latency, touch controlled synthesizer app for Android.

Our app produces sound as quickly as possible after the user touches the screen. The delay between input and output is known as latency. Understanding and minimizing latency is key to creating a great audio experience. In fact, the main reason we're using AAudio is because of its ability to create low latency audio streams.

What you'll learn

  • Basic concepts for creating low latency audio apps
  • How to create audio streams
  • How to handle audio devices being connected and disconnected
  • How to generate audio data and pass it to an audio stream
  • Best practices for communicating between Java and C++
  • How to listen for touch events in your UI

What you'll need

  • Android Studio with C++ (NDK and CMake) support installed
  • An Android device or emulator running Android 8.0 (API 26) or higher

2. Architecture overview

The app produces a synthesized sound when the user taps on the screen. Here's the architecture:

(Image: WaveMaker architecture diagram showing the UI, JNI bridge, audio engine and oscillator)

Our synthesizer app has four components:

  • UI - Written in Java, the MainActivity class is responsible for receiving touch events and forwarding them to the JNI bridge
  • JNI bridge - This C++ file uses JNI to provide a communication mechanism between our UI and C++ objects. It forwards events from the UI to the Audio Engine.
  • Audio engine - This C++ class creates the playback audio stream and sets up the data callback used to supply data to the stream
  • Oscillator - This C++ class generates digital audio data using a simple mathematical formula for calculating a sinusoidal waveform

3. Create the project

Start by creating a new project in Android Studio:

  • File -> New -> New Project...
  • Name your project "WaveMaker"

As you go through the project setup wizard, change the default values to:

  • Include C++ support
  • Phone and Tablet Minimum SDK: API 26: Android O
  • C++ Standard: C++11

Note: If you need to refer to the finished source code for the WaveMaker app it's here.

4. Build the oscillator

Since our oscillator is the object that produces the audio data, it makes sense to start with it. We'll keep it simple and have it create a 440Hz sine wave.

Digital synthesis basics

Oscillators are a fundamental building block of digital synthesis. Our oscillator needs to produce a series of numbers, known as samples. Each sample represents an amplitude value which is converted by audio hardware into a voltage to drive headphones or a speaker.
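
For a pure tone, the value of the n-th sample is given by: amplitude × sin(2π × frequency × n ÷ sampleRate). Our Oscillator class computes exactly this, just expressed as a phase that accumulates by a fixed increment on every sample.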

Here's a plot of samples representing a sine wave:

(Image: plot of sample values tracing one cycle of a sine wave)

Before we get started with implementation here's some important terms for digital audio data:

  • Sample format - The type of data used to represent each sample. Common sample formats include PCM16 and floating point. We'll use floating point because of its 24-bit resolution and improved precision at low volumes, amongst other reasons.
  • Frame - When generating multi-channel audio, the samples are grouped together in frames. Each sample in the frame corresponds to a different channel of audio. For example, stereo audio has 2 channels (left and right) so a frame of stereo audio has 2 samples, one for the left channel and one for the right channel.
  • Frame rate - The number of frames per second, often referred to as the sample rate; the two terms usually mean the same thing and are used interchangeably. Common values are 44,100 and 48,000 frames per second. AAudio uses the term sample rate, so we use that convention in our app.

Create the source and header files

Right click on the /app/cpp folder and go to New->C++ class.

(Screenshot: the New → C++ Class menu option in Android Studio)

Name your class "Oscillator".

(Screenshot: the new class dialog with the name "Oscillator")

Add the C++ source file to the build by adding the following line to CMakeLists.txt, which can be found under the External Build Files section of the Project window.

add_library(...existing source filenames... 
src/main/cpp/Oscillator.cpp)

Make sure your project builds successfully.

Add the code

Add the following code to the Oscillator.h file:

#include <atomic>
#include <stdint.h>

class Oscillator {
public:
    void setWaveOn(bool isWaveOn);
    void setSampleRate(int32_t sampleRate);
    void render(float *audioData, int32_t numFrames);

private:
    std::atomic<bool> isWaveOn_{false};
    double phase_ = 0.0;
    double phaseIncrement_ = 0.0;
};

Next, add the following code to the Oscillator.cpp file:

#include "Oscillator.h"
#include <math.h>

#define TWO_PI (3.14159 * 2)
#define AMPLITUDE 0.3
#define FREQUENCY 440.0

void Oscillator::setSampleRate(int32_t sampleRate) {
    phaseIncrement_ = (TWO_PI * FREQUENCY) / (double) sampleRate;
}

void Oscillator::setWaveOn(bool isWaveOn) {
    isWaveOn_.store(isWaveOn);
}

void Oscillator::render(float *audioData, int32_t numFrames) {

    if (!isWaveOn_.load()) phase_ = 0;

    for (int i = 0; i < numFrames; i++) {

        if (isWaveOn_.load()) {

            // Calculates the next sample value for the sine wave.
            audioData[i] = (float) (sin(phase_) * AMPLITUDE);

            // Increments the phase, handling wrap around.
            phase_ += phaseIncrement_;
            if (phase_ > TWO_PI) phase_ -= TWO_PI;

        } else {
            // Outputs silence by setting sample value to zero.
            audioData[i] = 0;
        }
    }
}

void setSampleRate(int32_t sampleRate) allows us to set the desired sample rate for our audio data (more on why we need this later). Based on sampleRate and FREQUENCY it calculates the value of phaseIncrement_, which is used in render. If you want to change the pitch of the sine wave just update FREQUENCY with a new value.
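
For example, if the stream reports a sample rate of 48,000Hz then phaseIncrement_ = (2π × 440) ÷ 48,000 ≈ 0.0576 radians per sample, so one full cycle of the 440Hz wave spans roughly 48,000 ÷ 440 ≈ 109 samples.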

void setWaveOn(bool isWaveOn) is a setter method for the isWaveOn_ field. This is used in render to determine whether to output the sine wave or silence.

void render(float *audioData, int32_t numFrames) puts floating point sine wave values into the audioData array each time it's called.

numFrames is the number of audio frames which we must render. To keep things simple our oscillator outputs a single sample per frame, i.e. mono.

phase_ stores the current wave phase, and it is incremented by phaseIncrement_ after each sample is generated.

If isWaveOn_ is false we just output zeros (silence).
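
To keep the codelab simple the oscillator stays mono, but it's worth seeing how render() relates to frames and channels. Here's a rough sketch (not part of the WaveMaker code, and assuming the stream was opened with 2 channels) of what a stereo render loop would look like, with the two samples of each frame interleaved:

// Sketch only: a stereo render loop writes numFrames * 2 interleaved samples.
for (int i = 0; i < numFrames; i++) {
    float sample = (float) (sin(phase_) * AMPLITUDE);
    audioData[i * 2] = sample;      // left channel sample of frame i
    audioData[i * 2 + 1] = sample;  // right channel sample of frame i
    phase_ += phaseIncrement_;
    if (phase_ > TWO_PI) phase_ -= TWO_PI;
}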

That's our oscillator done! But how do we get to hear our sine wave? For that we need an audio engine...

5. Create the audio engine

Our audio engine is responsible for:

  • Setting up an audio stream to the default audio device
  • Connecting our oscillator to the audio stream using a data callback
  • Switching the oscillator's wave output on and off
  • Closing the stream when it's no longer required

If you haven't already, it's worth familiarising yourself with the AAudio API, as it covers the key concepts behind building streams and managing stream state.

Create source and headers

As with the previous step, create a C++ class named "AudioEngine".

Add the C++ source file and AAudio library to the build by adding the following lines to CMakeLists.txt

add_library(...existing source files...
src/main/cpp/AudioEngine.cpp )

target_link_libraries(...existing libraries...
aaudio)

Add the code

Add the following code to the AudioEngine.h file:

#include <aaudio/AAudio.h>
#include "Oscillator.h"

class AudioEngine {
public:
    bool start();
    void stop();
    void restart();
    void setToneOn(bool isToneOn);

private:
    Oscillator oscillator_;
    AAudioStream *stream_ = nullptr;
};

Next, add the following code to the AudioEngine.cpp file:

#include <android/log.h>
#include "AudioEngine.h"
#include <thread>
#include <mutex>

// Double-buffering offers a good tradeoff between latency and protection against glitches.
constexpr int32_t kBufferSizeInBursts = 2;

aaudio_data_callback_result_t dataCallback(
        AAudioStream *stream,
        void *userData,
        void *audioData,
        int32_t numFrames) {

    ((Oscillator *) (userData))->render(static_cast<float *>(audioData), numFrames);
    return AAUDIO_CALLBACK_RESULT_CONTINUE;
}

void errorCallback(AAudioStream *stream,
                  void *userData,
                  aaudio_result_t error){
   if (error == AAUDIO_ERROR_DISCONNECTED){
       std::function<void(void)> restartFunction = std::bind(&AudioEngine::restart,
                                                           static_cast<AudioEngine *>(userData));
       new std::thread(restartFunction);
   }
}

bool AudioEngine::start() {
    AAudioStreamBuilder *streamBuilder;
    AAudio_createStreamBuilder(&streamBuilder);
    AAudioStreamBuilder_setFormat(streamBuilder, AAUDIO_FORMAT_PCM_FLOAT);
    AAudioStreamBuilder_setChannelCount(streamBuilder, 1);
    AAudioStreamBuilder_setPerformanceMode(streamBuilder, AAUDIO_PERFORMANCE_MODE_LOW_LATENCY);
    AAudioStreamBuilder_setDataCallback(streamBuilder, ::dataCallback, &oscillator_);
    AAudioStreamBuilder_setErrorCallback(streamBuilder, ::errorCallback, this);

    // Opens the stream.
    aaudio_result_t result = AAudioStreamBuilder_openStream(streamBuilder, &stream_);
    if (result != AAUDIO_OK) {
        __android_log_print(ANDROID_LOG_ERROR, "AudioEngine", "Error opening stream %s",
                            AAudio_convertResultToText(result));
        return false;
    }
    
    // Retrieves the sample rate of the stream for our oscillator.
    int32_t sampleRate = AAudioStream_getSampleRate(stream_);
    oscillator_.setSampleRate(sampleRate);

    // Sets the buffer size. 
    AAudioStream_setBufferSizeInFrames(
           stream_, AAudioStream_getFramesPerBurst(stream_) * kBufferSizeInBursts);

    // Starts the stream.
    result = AAudioStream_requestStart(stream_);
    if (result != AAUDIO_OK) {
        __android_log_print(ANDROID_LOG_ERROR, "AudioEngine", "Error starting stream %s",
                            AAudio_convertResultToText(result));
        return false;
    }

    AAudioStreamBuilder_delete(streamBuilder);
    return true;
}

void AudioEngine::restart(){

   static std::mutex restartingLock;
   if (restartingLock.try_lock()){
       stop();
       start();
       restartingLock.unlock();
   }
}

void AudioEngine::stop() {
    if (stream_ != nullptr) {
        AAudioStream_requestStop(stream_);
        AAudioStream_close(stream_);
    }
}

void AudioEngine::setToneOn(bool isToneOn) {
    oscillator_.setWaveOn(isToneOn);
}

Here's what the code does...

Starting the engine

Our start() method sets up an audio stream. Audio streams in AAudio are represented by the AAudioStream object, and to create one we need an AAudioStreamBuilder:

AAudioStreamBuilder *streamBuilder;
AAudio_createStreamBuilder(&streamBuilder);

We can now use streamBuilder to set various parameters on the stream.

Our audio format is floating point numbers:

AAudioStreamBuilder_setFormat(streamBuilder, AAUDIO_FORMAT_PCM_FLOAT);

We'll be outputting in mono (one channel):

AAudioStreamBuilder_setChannelCount(streamBuilder, 1);

Note: We didn't set some parameters because we want AAudio to take care of them automatically (the sketch after this list shows what setting them explicitly would look like). These include:

  • The audio device ID - we want to use the default audio device, rather than explicitly specifying one, such as the built-in speaker. A list of possible audio devices can be obtained using AudioManager.getDevices().
  • The stream direction - by default an output stream is created. If we wanted to make a recording, we'd specify an input stream instead.
  • The sample rate (more on this later).
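
For reference, here's roughly what setting those parameters explicitly would look like (a sketch only; WaveMaker deliberately leaves them at their defaults, and myDeviceId is a placeholder for an ID obtained from AudioManager.getDevices()):

AAudioStreamBuilder_setDeviceId(streamBuilder, myDeviceId);               // use a specific device instead of the default
AAudioStreamBuilder_setDirection(streamBuilder, AAUDIO_DIRECTION_INPUT);  // for a recording stream; output is the default
AAudioStreamBuilder_setSampleRate(streamBuilder, 48000);                  // forces a rate, which may trigger resampling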

Performance mode

We want the lowest possible latency so we set the low latency performance mode:

AAudioStreamBuilder_setPerformanceMode(streamBuilder, AAUDIO_PERFORMANCE_MODE_LOW_LATENCY);

AAudio does not guarantee that the resulting stream has this low latency performance mode. Reasons it might not obtain this mode include:

  • You specified a non-native sample rate, sample format or samples per frame (more on this below), which may cause resampling or format conversion. Resampling is the process of re-calculating sample values into a different rate. Both resampling and format conversion can add computational load and/or latency.
  • There are no low latency streams available, probably because they're all in use by your app or other apps.

You can check the performance mode of your stream using AAudioStream_getPerformanceMode.
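
For example, a quick check you could add to start() after the stream is opened (a sketch, not part of the codelab code):

if (AAudioStream_getPerformanceMode(stream_) != AAUDIO_PERFORMANCE_MODE_LOW_LATENCY) {
    __android_log_print(ANDROID_LOG_WARN, "AudioEngine",
                        "Low latency mode was not granted, latency may be higher than expected");
}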

Open the stream

Once all the parameters are set (I'll cover the data callback later) we open the stream and check the result:

aaudio_result_t result = AAudioStreamBuilder_openStream(streamBuilder, &stream_);

If the result is anything except AAUDIO_OK we log the error to the Android Monitor window in Android Studio and return false.

if (result != AAUDIO_OK) {
    __android_log_print(ANDROID_LOG_ERROR, "AudioEngine", "Error opening stream %s",
                        AAudio_convertResultToText(result));
    return false;
}

Set the oscillator sample rate

We deliberately didn't set the stream's sample rate, because we want to use its native sample rate - i.e. the rate which avoids resampling and added latency. Now that the stream is open we can query it to find out what the native sample rate is:

int32_t sampleRate = AAudioStream_getSampleRate(stream_);

And then we tell our oscillator to produce audio data using this sample rate:

oscillator_.setSampleRate(sampleRate);

Set the buffer size

The stream's internal buffer size directly affects the latency of the stream: the larger the buffer, the greater the latency.

We'll set our buffer size to be twice the size of a burst. A burst is a discrete amount of data written during each callback. This offers a good trade-off between latency and underrun protection. You can read more about buffer size tuning in the AAudio documentation.

AAudioStream_setBufferSizeInFrames(
           stream_, AAudioStream_getFramesPerBurst(stream_) * kBufferSizeInBursts);
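
If you're curious about the values you actually got, the stream can be queried after setting the buffer size. Here's a sketch for logging them (not required by the app):

int32_t framesPerBurst = AAudioStream_getFramesPerBurst(stream_);
int32_t bufferSize = AAudioStream_getBufferSizeInFrames(stream_);
int32_t bufferCapacity = AAudioStream_getBufferCapacityInFrames(stream_);
__android_log_print(ANDROID_LOG_DEBUG, "AudioEngine",
                    "Burst: %d frames, buffer: %d frames, capacity: %d frames",
                    framesPerBurst, bufferSize, bufferCapacity);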

Start the stream

Now that everything is set up we can start the stream which causes it to start consuming audio data and triggering data callbacks.

result = AAudioStream_requestStart(stream_);
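
Note that AAudioStream_requestStart() is asynchronous: it returns as soon as the request is made, not when the stream has actually started. That's fine for our app, but if you ever need to block until the stream reaches the started state, something like this sketch would work:

aaudio_stream_state_t nextState = AAUDIO_STREAM_STATE_UNINITIALIZED;
// Waits up to 100ms for the stream to leave the STARTING state.
AAudioStream_waitForStateChange(stream_, AAUDIO_STREAM_STATE_STARTING, &nextState, 100 * 1000000L);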

The data callback

So how do we get audio data into our stream? We have two options:

  • Write data to the stream ourselves using AAudioStream_write
  • Register a data callback which the stream calls whenever it needs more audio data

We'll use the second approach because it's better for low latency apps; the data callback function is called from a high priority thread every time the stream requires audio data.
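
For contrast, here's a rough sketch of the first approach (not used in WaveMaker; the frame count and timeout are arbitrary illustration values):

constexpr int32_t kNumFrames = 192;   // arbitrary, for illustration only
float buffer[kNumFrames];             // mono, so one sample per frame
oscillator_.render(buffer, kNumFrames);
// Blocks for up to 100ms; returns the number of frames written or a negative error code.
aaudio_result_t framesWritten = AAudioStream_write(stream_, buffer, kNumFrames, 100 * 1000000L);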

The dataCallback function

We start by defining the callback function in the global namespace:

aaudio_data_callback_result_t dataCallback(
    AAudioStream *stream,
    void *userData,
    void *audioData,
    int32_t numFrames){
        ...
}

The clever part here is that our userData parameter is a pointer to our Oscillator object. So we can use it to render audio data into the audioData array. Here's how:

((Oscillator *)(userData))->render(static_cast<float*>(audioData), numFrames);

Note that we also cast the audioData array to floating point numbers because that's the format expected by our render() method.

Finally, the method returns a value that tells the stream to continue consuming audio data.

return AAUDIO_CALLBACK_RESULT_CONTINUE;

Setting up the callback

Now that we have the dataCallback function, telling the stream to use it from our start() method is simple (the :: indicates that the function is in the global namespace):

AAudioStreamBuilder_setDataCallback(streamBuilder, ::dataCallback, &oscillator_);

Starting and stopping the oscillator

Switching our oscillator's wave output on and off is simple; a single method passes the tone state to the oscillator:

void AudioEngine::setToneOn(bool isToneOn) {
  oscillator_.setWaveOn(isToneOn);
}

It's worth noting that even when the oscillator's wave is off, its render() method still produces audio data filled with zeros. This keeps the stream running, which avoids the warm-up latency we'd otherwise incur by stopping and restarting the stream every time the tone is toggled.

Tidying up

We have provided a start() method which creates our stream, so we should also provide a corresponding stop() method which deletes it. This method can be called whenever the stream is no longer needed (for example, when our app exits). It stops the stream, which stops the callbacks, then closes the stream, causing it to be deleted.

AAudioStream_requestStop(stream_);
AAudioStream_close(stream_);

Handling stream disconnects using the error callback

When our playback stream starts it uses the default audio device. This might be the built-in speaker, headphones, or some other audio device like a USB audio interface.

What happens if the default audio device changes? For example, the user starts playback through the speaker and then connects headphones. In this case, the audio stream becomes disconnected from the speaker and your app will no longer be able to write audio samples to the output. It simply stops playing.

This is probably not what the user expects. The audio should continue to play through the headphones. (However, there are other scenarios where stopping playback might be more appropriate.)

We need a callback to detect the stream disconnect, and a function to restart the stream to the new audio device, when appropriate.

Setting up the error callback

To listen for the stream disconnect event, define a function of type AAudioStream_errorCallback.

void errorCallback(AAudioStream *stream,
                  void *userData,
                  aaudio_result_t error){
   if (error == AAUDIO_ERROR_DISCONNECTED){
       std::function<void(void)> restartFunction = std::bind(&AudioEngine::restart,
                                                           static_cast<AudioEngine *>(userData));
       new std::thread(restartFunction);
   }
}

This function will be called whenever the stream encounters an error. If the error is AAUDIO_ERROR_DISCONNECTED we can restart the stream.

Note that the callback cannot restart the audio stream directly. Instead, to restart the stream we create a std::function which points to AudioEngine::restart(), then invoke the function from a separate std::thread.

Finally, we set the errorCallback the same way we did for dataCallback in start().

AAudioStreamBuilder_setErrorCallback(streamBuilder, ::errorCallback, this);

Restarting the stream

Since the restart function may be called from multiple threads (for instance, if we receive multiple disconnect events in quick succession), we protect the critical sections of code with a std::mutex.

void AudioEngine::restart(){

    static std::mutex restartingLock;
    if (restartingLock.try_lock()){
        stop();
        start();
        restartingLock.unlock();
    }
}

That's it for our audio engine, and there's not much more to do...

6. Create the JNI bridge

We need a way for our UI in Java to talk to our C++ classes; this is where JNI steps in. Its method signatures may not be the nicest thing to look at, but thankfully there are only three of them!

Rename the file native-lib.cpp to jni-bridge.cpp. You could leave the filename as is, but I like to make it absolutely clear that this C++ file is for JNI methods. Be sure to update CMakeLists.txt with the renamed file (but leave the library name as native-lib).

Add the following code to jni-bridge.cpp:

#include <jni.h>
#include <android/input.h>
#include "AudioEngine.h"

static AudioEngine *audioEngine = new AudioEngine();

extern "C" {

JNIEXPORT void JNICALL
Java_com_example_wavemaker_MainActivity_touchEvent(JNIEnv *env, jobject obj, jint action) {
    switch (action) {
        case AMOTION_EVENT_ACTION_DOWN:
            audioEngine->setToneOn(true);
            break;
        case AMOTION_EVENT_ACTION_UP:
            audioEngine->setToneOn(false);
            break;
        default:
            break;
    }
}

JNIEXPORT void JNICALL
Java_com_example_wavemaker_MainActivity_startEngine(JNIEnv *env, jobject /* this */) {
    audioEngine->start();
}

JNIEXPORT void JNICALL
Java_com_example_wavemaker_MainActivity_stopEngine(JNIEnv *env, jobject /* this */) {
    audioEngine->stop();
}

}

Our JNI bridge is fairly simple:

  • We create a static instance of our AudioEngine
  • startEngine() and stopEngine() start and stop the audio engine
  • touchEvent() translates touch events into method calls to switch the tone on and off

7. Create the UI

Finally, let's create our UI and wire it to our back end...

Layout

Our layout is very simple (we'll improve it in subsequent codelabs). It's just a FrameLayout with a TextView in the center:

(Screenshot: the WaveMaker UI, a full-screen touch area with "Tap anywhere" text in the center)

Update res/layout/activity_main.xml to the following:

<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/touchArea"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context="com.example.wavemaker.MainActivity">

    <TextView
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:text="@string/tap_anywhere"
        android:textAppearance="@android:style/TextAppearance.Material.Display1" />
</FrameLayout>

Add our string resource for @string/tap_anywhere to res/values/strings.xml:

<resources>
    <string name="app_name">WaveMaker</string>
    <string name="tap_anywhere">Tap anywhere</string>
</resources>

Main Activity

Now update MainActivity.java with the following code:

package com.example.wavemaker;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.MotionEvent;

public class MainActivity extends AppCompatActivity {

    static {
        System.loadLibrary("native-lib");
    }

    private native void touchEvent(int action);

    private native void startEngine();

    private native void stopEngine();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        startEngine();
    }

    @Override
    public boolean onTouchEvent(MotionEvent event) {
        touchEvent(event.getAction());
        return super.onTouchEvent(event);
    }

    @Override
    public void onDestroy() {
        stopEngine();
        super.onDestroy();
    }
}

Here's what this code does:

  • The private native void methods are all defined in jni-bridge.cpp; we need to declare them here to be able to use them
  • The activity lifecycle events onCreate() and onDestroy() call the JNI bridge to start and stop the audio engine
  • We override onTouchEvent() to receive all touch events for our Activity and pass them directly to the JNI bridge to switch the tone on and off

8. Build and run

Fire up your test device or emulator and run the WaveMaker app on it. When you tap the screen you should hear a clear sine wave being produced!

OK so our app isn't going to win any awards for musical creativity, but it should demonstrate the fundamental techniques required to produce low latency, synthesized audio on Android.

Don't worry, in later codelabs we'll make our app a lot more interesting! Thanks for completing this codelab. If you have questions please ask them in the android-ndk group.

Further reading

  • High-performance audio samples
  • High-performance audio guide in the Android NDK documentation
  • Best Practices for Android Audio video - Google I/O 2017