Speech Recognition Using TensorFlow

This tutorial will show you how to runs a simple speech recognition model built by the audio training tutorial. Listens for a small set of words, and display them in the UI when they are recognized.

It’s important to know that real speech and audio recognition systems are much more complex, but like MNIST for images, it should give you a basic understanding of the techniques involved. Once you’ve completed this tutorial, you’ll have a application that tries to classify a one second audio clip as either silence, an unknown word, “yes”, “no”, “up”, “down”, “left”, “right”, “on”, “off”, “stop”, or “go”.

TensorFow speech recognition model


You can train your model on the desktop or on the laptop or on the server and then you can use that pre-trained model on our mobile device.So there’s no training that would happen on the device the training would happen on our bigger machine either a server or our laptop.You can download a pretrained model from tensorflow.org

2. Adding Dependencies

The TensorFlow Inference Interface is available as a JCenter package and can be included quite simply in your android project with a couple of lines in the project’s build.gradle file:

Add the following dependency in app’s build.gradle

This will tell Gradle to use the latest version of the TensorFlow AAR that has been released to https://bintray.com/google/tensorflow/tensorflow-android. You may replace the + with an explicit version label if you wish to use a specific release of TensorFlow in your app.

3.Add Pre-trained Model to Project

You need the pre-trained model and label file.You can download the model from here.Unzip this zip file, You will get conv_actions_labels.txt(label for objects) and conv_actions_frozen.pb(pre-trained model).

Put conv_actions_labels.txt and conv_actions_frozen.pb into android/assets directory.

4.Microphone Permission

To request microphone, you should be requesting RECORD_AUDIO permission in your manifest file as below:

Since Android 6.0 Marshmallow, the application will not be granted any permission at installation time. Instead, the application has to ask the user for a permission one-by-one at runtime.

5.Recording Audio

The AudioRecord class manages the audio resources for Java applications to record audio from the audio input hardware of the platform. This is achieved by “pulling” (reading) the data from the AudioRecord object. The application is responsible for polling the AudioRecord object in time using read(short[], int, int).

6.Run TensorFlow Model

TensorFlowInferenceInterface class that provides a smaller API surface suitable for inference and summarizing the performance of model execution.

7.Recognize Commands

RecognizeCommands class is fed the output of running the TensorFlow model over time, it averages the signals and returns information about a label when it has enough evidence to think that a recognized word has been found. The implementation is fairly small, just keeping track of the last few predictions and averaging them.

The demo app updates its UI of results automatically based on the labels text file you copy into assets alongside your frozen graph, which means you can easily try out different models without needing to make any code changes. You will need to updaye LABEL_FILENAME and MODEL_FILENAME to point to the files you’ve added if you change the paths though.


You can easily replace it with a model you’ve trained yourself. If you do this, you’ll need to make sure that the constants in the main MainActivity Java source file like SAMPLE_RATE and SAMPLE_DURATION match any changes you’ve made to the defaults while training. You’ll also see that there’s a Java version of the RecognizeCommands module that’s very similar to the C++ version in this tutorial. If you’ve tweaked parameters for that, you can also update them in MainActivity to get the same results as in your server testing.


Download this project from GitHub


Related Post

Android TensorFlow Machine Learning

Google Cloud Speech API in Android APP




Leave a Reply

Your email address will not be published. Required fields are marked *