Google Cloud Speech API in Android APP

So many of you have used the “Ok Google” functionality on your phone. The Speech API lets developers integrate that same functionality into their own applications, with speech-to-text transcription in over 80 languages.

There are a lot of potential use cases where you may want to combine different cloud machine learning APIs. For example, when two people don’t speak the same language, you can transcribe the audio with the Speech API and then translate it into the other person’s language with the Translation API.

The Speech API also works in streaming mode: you send it a continuous stream of audio, and it returns transcriptions as the audio comes in. Once you’re done transcribing, you may want to do further analysis on the text. In this tutorial, we are going to use streaming recognition to perform speech recognition.

So the best way to see how the Speech API works is through a demo.

Prerequisites

  • Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use the Google Cloud Speech API in the app, you need a service account key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Enable Google Cloud Speech API
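If you prefer the command line, a recent Cloud SDK can enable the same API; this assumes gcloud is installed and your project is already set as the default:

```
gcloud services enable speech.googleapis.com
```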

2. Creating a New Android Project

Google provides client libraries in a number of programming languages to simplify the process of building and sending requests and receiving and parsing responses.

Add the following compile dependencies to the app build.gradle:
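As a sketch, the app needs the gRPC transport and stub libraries plus the OAuth2 auth library; the artifact versions below are assumptions from the time of writing, so use the latest available releases. The Speech v1 classes (SpeechGrpc and friends) are generated from the googleapis protos with the protobuf Gradle plugin.

```groovy
dependencies {
    // gRPC transport, stubs, and protobuf-lite support for Android
    compile 'io.grpc:grpc-okhttp:1.4.0'
    compile 'io.grpc:grpc-stub:1.4.0'
    compile 'io.grpc:grpc-protobuf-lite:1.4.0'
    // Attaches OAuth2 credentials to gRPC calls
    compile 'io.grpc:grpc-auth:1.4.0'
    // Service account credentials / access tokens
    compile 'com.google.auth:google-auth-library-oauth2-http:0.7.0'
}
```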

Set Up to Authenticate With Your Project’s Credentials

This Android app uses a JSON credential file stored locally in its resources. We put the service account in the client just for ease of use; the app still obtains an access token using the service account credential and uses that token to call the API.

To get a service account key, visit the Cloud Console and navigate to: API Manager > Credentials > Create credentials > Service account key > New service account. Create a new service account and download the JSON credentials file. Put the file in the app resources as app/src/main/res/raw/credential.json.

Generate Google Cloud Service Account
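With the credential file in place, a minimal sketch for loading it and obtaining an access token might look like the following; the CredentialHelper class name is ours, and token fetching must run off the main thread because it performs network I/O:

```java
import android.content.Context;
import com.google.auth.oauth2.AccessToken;
import com.google.auth.oauth2.GoogleCredentials;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;

public class CredentialHelper {
    private static final String SCOPE = "https://www.googleapis.com/auth/cloud-platform";

    // Loads the service account bundled at res/raw/credential.json and
    // exchanges it for a short-lived access token. Call this from a
    // background thread; it performs network I/O.
    public static AccessToken fetchAccessToken(Context context) throws IOException {
        InputStream stream = context.getResources()
                .openRawResource(R.raw.credential);
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream)
                .createScoped(Collections.singletonList(SCOPE));
        return credentials.refreshAccessToken();
    }
}
```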

Streaming Speech API Recognition Requests

A streaming Speech API recognition call is designed for real-time capture and recognition of audio, within a bi-directional stream. Your application can send audio on the request stream and receive interim and final recognition results on the response stream in real time. Interim results represent the current recognition result for a section of audio, while the final recognition result represents the last, best guess for that section of audio.

Streaming requests

Unlike a synchronous request, in which you send both the configuration and audio within a single request, calling the streaming Speech API requires sending multiple requests. The first StreamingRecognizeRequest must contain a configuration of type StreamingRecognitionConfig without any accompanying audio. Subsequent StreamingRecognizeRequests sent over the same stream will then consist of consecutive frames of raw audio bytes.

StreamingRecognitionConfig consists of the following fields:

  • config – (required) contains configuration information for the audio, of type RecognitionConfig and is the same as that shown within synchronous and asynchronous requests.
  • single_utterance – (optional, defaults to false) indicates whether this request should automatically end after speech is no longer detected. If set, the Speech API will detect pauses, silence, or non-speech audio to determine when to end recognition. If not set, the stream will continue to listen and process audio until either the stream is closed directly, or the stream’s length limit has been exceeded. Setting single_utterance to true is useful for processing voice commands.
  • interim_results – (optional, defaults to false) indicates that this stream request should return temporary results that may be refined at a later time (after processing more audio). Interim results will be noted within responses through the setting of is_final to false.
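Putting this together, a minimal sketch of the request side might look like the following. It assumes the com.google.cloud.speech.v1 classes are available (generated from the Speech API protos), reuses the GoogleCredentials loaded earlier, and defers the responseObserver to the next section; the StreamingRecognizer class name and the 16 kHz LINEAR16 settings are our choices, not requirements:

```java
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.SpeechGrpc;
import com.google.cloud.speech.v1.StreamingRecognitionConfig;
import com.google.cloud.speech.v1.StreamingRecognizeRequest;
import com.google.cloud.speech.v1.StreamingRecognizeResponse;
import com.google.protobuf.ByteString;
import io.grpc.ManagedChannel;
import io.grpc.auth.MoreCallCredentials;
import io.grpc.okhttp.OkHttpChannelBuilder;
import io.grpc.stub.StreamObserver;

public class StreamingRecognizer {
    private StreamObserver<StreamingRecognizeRequest> requestObserver;

    // Opens the bidirectional stream and sends the configuration-only
    // first request. responseObserver receives the transcriptions.
    public void start(GoogleCredentials credentials,
                      StreamObserver<StreamingRecognizeResponse> responseObserver) {
        ManagedChannel channel = OkHttpChannelBuilder
                .forAddress("speech.googleapis.com", 443).build();
        SpeechGrpc.SpeechStub stub = SpeechGrpc.newStub(channel)
                .withCallCredentials(MoreCallCredentials.from(credentials));
        requestObserver = stub.streamingRecognize(responseObserver);

        // First StreamingRecognizeRequest: config only, no audio.
        requestObserver.onNext(StreamingRecognizeRequest.newBuilder()
                .setStreamingConfig(StreamingRecognitionConfig.newBuilder()
                        .setConfig(RecognitionConfig.newBuilder()
                                .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                                .setSampleRateHertz(16000)
                                .setLanguageCode("en-US")
                                .build())
                        .setInterimResults(true)
                        .setSingleUtterance(false)
                        .build())
                .build());
    }

    // Subsequent requests: consecutive frames of raw audio bytes,
    // e.g. buffers read from Android's AudioRecord.
    public void recognize(byte[] audioBuffer, int bytesRead) {
        requestObserver.onNext(StreamingRecognizeRequest.newBuilder()
                .setAudioContent(ByteString.copyFrom(audioBuffer, 0, bytesRead))
                .build());
    }

    // Closes the request stream when the microphone stops.
    public void finish() {
        requestObserver.onCompleted();
    }
}
```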

Streaming responses

Streaming speech recognition results are returned within a series of responses of type StreamingRecognizeResponse (a handling sketch follows the list). Each response consists of the following fields:

  • speechEventType contains events of type SpeechEventType. The value of these events indicates when a single utterance has been determined to have been completed. The speech events serve as markers within your stream’s response.
  • results contains the list of results, which may be either interim or final, of type StreamingRecognitionResult. The results list contains the following sub-fields:

    • alternatives contains a list of alternative transcriptions.
    • isFinal indicates whether the results obtained within this list entry are interim or are final.
    • stability indicates the volatility of results obtained so far, with 0.0 indicating complete instability and 1.0 indicating complete stability. Note that unlike confidence, which estimates whether a transcription is correct, stability estimates whether the given partial result may change. If isFinal is set to true, stability will not be set.
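To make these fields concrete, a response observer along the following lines could pass transcriptions back to the UI; handleTranscript is a hypothetical app callback (e.g. one that updates a TextView), not part of the API:

```java
import com.google.cloud.speech.v1.StreamingRecognitionResult;
import com.google.cloud.speech.v1.StreamingRecognizeResponse;
import io.grpc.stub.StreamObserver;

StreamObserver<StreamingRecognizeResponse> responseObserver =
        new StreamObserver<StreamingRecognizeResponse>() {
            @Override
            public void onNext(StreamingRecognizeResponse response) {
                for (StreamingRecognitionResult result : response.getResultsList()) {
                    if (result.getAlternativesCount() > 0) {
                        // Alternatives are ordered by likelihood; take the top one.
                        String transcript = result.getAlternatives(0).getTranscript();
                        // isFinal separates settled text from interim guesses
                        // that may still be refined (see stability above).
                        // handleTranscript is a hypothetical app callback.
                        handleTranscript(transcript, result.getIsFinal());
                    }
                }
            }

            @Override
            public void onError(Throwable t) {
                // The stream failed, e.g. network loss or an expired token.
            }

            @Override
            public void onCompleted() {
                // The server closed the response stream.
            }
        };
```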


Download this project from GitHub
