Many of you have used the “Ok Google” functionality on your phone. The Speech API lets developers integrate that functionality into their own applications, providing speech-to-text transcription in over 80 languages. It also works in streaming mode: you can send it a continuous stream of audio, and it returns transcriptions as the audio comes in. In this tutorial, we will use Streaming Recognition to perform speech recognition. The best way to see how the Speech API works is through a demo.


  • Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use Google Cloud Speech API services in the app, you need a service account key. You can get one by creating a new project in the Google Cloud Platform console. Once the project has been created, go to API Manager > Dashboard and press the Enable API button, then enable the Google Cloud Speech API.

2. Creating a New Android Project

Google provides client libraries to simplify the process of building and sending requests and receiving and parsing responses. Add the following compile dependencies to the app build.gradle:

apply plugin: ''
ext {
    grpcVersion = '1.4.0'
}
android {
    configurations.all {
        resolutionStrategy.force ''
    }
}
protobuf {
    protoc {
        artifact = ''
    }
    plugins {
        javalite {
            artifact = ""
        }
        grpc {
            artifact = "io.grpc:protoc-gen-grpc-java:${grpcVersion}"
        }
    }
    generateProtoTasks {
        all().each { task ->
            task.plugins {
                javalite {}
                grpc {
                    // Options added to --grpc_out
                    option 'lite'
                }
            }
        }
    }
}
dependencies {
    // gRPC
    compile "io.grpc:grpc-okhttp:$grpcVersion"
    compile "io.grpc:grpc-protobuf-lite:$grpcVersion"
    compile "io.grpc:grpc-stub:$grpcVersion"
    compile 'javax.annotation:javax.annotation-api:1.2'
    protobuf ''
    compile group: '', name: 'grpc-google-cloud-speech-v1', version: '0.1.13'
    // OAuth2 for Google API
    compile('') {
        exclude module: 'httpclient'
    }
    compile ''
}

Set Up to Authenticate With Your Project’s Credentials

To get a service account key, visit the Cloud Console and navigate to API Manager > Credentials > Create credentials > Service account key > New service account. Create a new service account and download the JSON credentials file. Put the file in the app resources as app/src/main/res/raw/credential.json.

Streaming Speech API Recognition Requests

A streaming Speech API recognition call is designed for real-time capture and recognition of audio within a bi-directional stream. Your application can send audio on the request stream and receive interim and final recognition results on the response stream in real time. Interim results represent the current recognition result for a section of audio, while the final recognition result represents the last, best guess for that section of audio.

Streaming requests

Unlike non-streaming calls, in which you send both the configuration and the audio within a single request, calling the streaming Speech API requires sending multiple requests. The first StreamingRecognizeRequest must contain a configuration of type StreamingRecognitionConfig without any accompanying audio. Subsequent StreamingRecognizeRequests sent over the same stream then consist of consecutive frames of raw audio bytes. A StreamingRecognitionConfig consists of the following fields:

  • config – (required) contains configuration information for the audio, of type RecognitionConfig.
  • single_utterance – (optional, defaults to false) indicates whether this request should automatically end after speech is no longer detected.
  • interim_results – (optional, defaults to false) indicates that this stream request should return temporary results that may be refined at a later time.
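public void startRecognizing(int sampleRate) {
    if (mApi == null) {
        Log.w(TAG, "API not ready. Ignoring the request.");
        return;
    }
    // Configure the API (assumed settings: LINEAR16 audio, US English, interim results enabled)
    mRequestObserver = mApi.streamingRecognize(mResponseObserver);
    StreamingRecognitionConfig streamingConfig = StreamingRecognitionConfig.newBuilder()
            .setConfig(RecognitionConfig.newBuilder().setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setSampleRateHertz(sampleRate).setLanguageCode("en-US").build())
            .setInterimResults(true).build();
    StreamingRecognizeRequest streamingRecognizeRequest = StreamingRecognizeRequest.newBuilder().setStreamingConfig(streamingConfig).build();
    // The first request on the stream carries only the configuration, no audio
    mRequestObserver.onNext(streamingRecognizeRequest);
}
public void recognize(byte[] data, int size) {
    if (mRequestObserver == null) {
        return;
    }
    // Call the streaming recognition API
    mRequestObserver.onNext(StreamingRecognizeRequest.newBuilder()
            .setAudioContent(ByteString.copyFrom(data, 0, size)).build());
}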
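Since recognize(byte[] data, int size) is meant to be called repeatedly with consecutive frames of raw audio, it can help to see how a buffer of recorded audio might be cut into frames. The following is only a rough sketch in plain Java, not code from the project: AudioFramer, FrameSink, and the 100 ms frame length are hypothetical names and values chosen for illustration, and the audio is assumed to be 16-bit mono PCM.

```java
/** Splits 16-bit mono PCM audio into fixed-size frames for streaming (illustrative only). */
final class AudioFramer {

    /** Receives one frame; mirrors the shape of recognize(byte[] data, int size). */
    interface FrameSink {
        void accept(byte[] data, int size);
    }

    /** Bytes needed for frameMillis of 16-bit mono audio (2 bytes per sample). */
    static int frameSizeBytes(int sampleRate, int frameMillis) {
        return sampleRate * 2 * frameMillis / 1000;
    }

    /** Feeds consecutive frames to the sink and returns how many frames were sent. */
    static int stream(byte[] audio, int frameSize, FrameSink sink) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        int frames = 0;
        byte[] buffer = new byte[frameSize];
        for (int offset = 0; offset < audio.length; offset += frameSize) {
            // The last frame may be shorter than frameSize, so pass its real size.
            int size = Math.min(frameSize, audio.length - offset);
            System.arraycopy(audio, offset, buffer, 0, size);
            sink.accept(buffer, size);
            frames++;
        }
        return frames;
    }
}
```

In the app, the sink would presumably forward each frame to recognize(data, size) once startRecognizing(sampleRate) has sent the configuration request. The buffer is reused between frames, so a sink that keeps the data must copy it first.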

Streaming responses

Streaming speech recognition results are returned within a series of responses of type StreamingRecognizeResponse. Each response consists of the following fields:

  • speechEventType contains events of type SpeechEventType.
  • results contains the list of results, each of type StreamingRecognitionResult, which may be either interim or final. Each result contains the following sub-fields:
    • alternatives contains a list of alternative transcriptions.
    • isFinal indicates whether the results obtained within this list entry are interim or final.
    • stability indicates the volatility of results obtained so far, with 0.0 indicating complete instability while 1.0 indicates complete stability.
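One possible use of the isFinal and stability fields is to decide which results are worth showing to the user. The sketch below is an assumption about how an app might do this, not part of the project: ResultFilter and its threshold are hypothetical, and in the generated Java classes the two inputs would come from result.getIsFinal() and result.getStability().

```java
/** Decides whether a recognition result is worth displaying (illustrative only). */
final class ResultFilter {

    private final float minStability;

    ResultFilter(float minStability) {
        if (minStability < 0.0f || minStability > 1.0f) {
            throw new IllegalArgumentException("minStability must be between 0.0 and 1.0");
        }
        this.minStability = minStability;
    }

    /** Final results always pass; interim results pass only when stable enough. */
    boolean shouldDisplay(boolean isFinal, float stability) {
        return isFinal || stability >= minStability;
    }
}
```

A higher threshold hides more of the early, volatile guesses at the cost of a less responsive display; the right value depends on the application.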
private final StreamObserver<StreamingRecognizeResponse> mResponseObserver = new StreamObserver<StreamingRecognizeResponse>() {
    @Override
    public void onNext(StreamingRecognizeResponse response) {
        String text = null;
        boolean isFinal = false;
        if (response.getResultsCount() > 0) {
            final StreamingRecognitionResult result = response.getResults(0);
            isFinal = result.getIsFinal();
            if (result.getAlternativesCount() > 0) {
                final SpeechRecognitionAlternative alternative = result.getAlternatives(0);
                text = alternative.getTranscript();
            }
        }
        if (text != null) {
            for (Listener listener : mListeners) {
                listener.onSpeechRecognized(text, isFinal);
            }
        }
    }

    @Override
    public void onError(Throwable t) {
        Log.e(TAG, "Error calling the API.", t);
    }

    @Override
    public void onCompleted() {
        Log.i(TAG, "API completed.");
    }
};
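On the receiving side, a listener gets each transcript together with its isFinal flag. As a rough illustration of how those callbacks could be turned into a running transcript, here is a small plain-Java sketch. TranscriptAssembler is a hypothetical helper, not a class from the project; it only assumes the onSpeechRecognized(text, isFinal) callback shape used by the observer.

```java
/** Builds a running transcript from interim and final results (illustrative only). */
final class TranscriptAssembler {

    private final StringBuilder committed = new StringBuilder();
    private String pending = "";

    /** Same shape as the listener callback: the text and whether it is final. */
    void onSpeechRecognized(String text, boolean isFinal) {
        if (isFinal) {
            // A final result replaces the interim guess for this section of audio.
            if (committed.length() > 0) {
                committed.append(' ');
            }
            committed.append(text.trim());
            pending = "";
        } else {
            // An interim result overwrites the previous interim guess.
            pending = text.trim();
        }
    }

    /** Committed text followed by the current interim guess, if there is one. */
    String currentText() {
        if (pending.isEmpty()) {
            return committed.toString();
        }
        if (committed.length() == 0) {
            return pending;
        }
        return committed + " " + pending;
    }
}
```

Keeping interim text separate from committed text means a later, better guess can replace the earlier one without disturbing what has already been finalized.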

Download this project from GitHub
