Many of you have used the “Ok Google” functionality on your phone. The Speech API lets developers integrate that functionality into their own applications, performing speech-to-text transcription in over 80 languages. It also works in streaming mode: if you send it a continuous stream of audio, it returns transcriptions as that audio comes in. In this tutorial, we are going to use streaming recognition to perform speech recognition. The best way to see how the Speech API works is through a demo.
Prerequisites
- Google Cloud Platform account (you can use the 12-month free trial)
1. Acquiring an API Key
To use Google Cloud Speech API services in the app, you need a service account key. You can get one by creating a new project in the Google Cloud Platform console. Once the project has been created, go to API Manager > Dashboard and press the Enable API button.
2. Creating a New Android Project
Google provides client libraries to simplify the process of building and sending requests, and of receiving and parsing responses. Add the following compile dependencies to the app-level build.gradle:
apply plugin: 'com.google.protobuf'

ext {
    grpcVersion = '1.4.0'
}

android {
    ...
    configurations.all {
        resolutionStrategy.force 'com.google.code.findbugs:jsr305:3.0.2'
    }
}

protobuf {
    protoc {
        artifact = 'com.google.protobuf:protoc:3.3.0'
    }
    plugins {
        javalite {
            artifact = "com.google.protobuf:protoc-gen-javalite:3.0.0"
        }
        grpc {
            artifact = "io.grpc:protoc-gen-grpc-java:${grpcVersion}"
        }
    }
    generateProtoTasks {
        all().each { task ->
            task.plugins {
                javalite {}
                grpc {
                    // Options added to --grpc_out
                    option 'lite'
                }
            }
        }
    }
}

dependencies {
    ....
    // gRPC
    compile "io.grpc:grpc-okhttp:$grpcVersion"
    compile "io.grpc:grpc-protobuf-lite:$grpcVersion"
    compile "io.grpc:grpc-stub:$grpcVersion"
    compile 'javax.annotation:javax.annotation-api:1.2'
    protobuf 'com.google.protobuf:protobuf-java:3.3.1'
    compile group: 'com.google.api.grpc', name: 'grpc-google-cloud-speech-v1', version: '0.1.13'
    // OAuth2 for Google API
    compile('com.google.auth:google-auth-library-oauth2-http:0.7.0') {
        exclude module: 'httpclient'
    }
    compile 'com.android.support:multidex:1.0.0'
}
Set Up to Authenticate With Your Project’s Credentials
To get a service account key, visit the Cloud Console and navigate to: API Manager > Credentials > Create credentials > Service account key > New service account. Create a new service account and download the JSON credentials file. Put the file in the app resources as app/src/main/res/raw/credential.json.
Streaming Speech API Recognition Requests
A streaming Speech API recognition call is designed for real-time capture and recognition of audio within a bidirectional stream. Your application can send audio on the request stream and receive interim and final recognition results on the response stream in real time. Interim results represent the current recognition result for a section of audio, while the final recognition result represents the last, best guess for that section of audio.
Streaming requests
Unlike a synchronous request, in which you send both the configuration and audio within a single request, calling the streaming Speech API requires sending multiple requests. The first StreamingRecognizeRequest must contain a configuration of type StreamingRecognitionConfig without any accompanying audio. Subsequent StreamingRecognizeRequests sent over the same stream will then consist of consecutive frames of raw audio bytes.
A StreamingRecognitionConfig consists of the following fields:
- config – (required) contains configuration information for the audio, of type RecognitionConfig.
- single_utterance – (optional, defaults to false) indicates whether this request should automatically end after speech is no longer detected.
- interim_results – (optional, defaults to false) indicates that this stream request should return temporary results that may be refined at a later time.
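The ordering rule above can be captured in a tiny guard. This is a hypothetical illustration only (StreamOrderChecker is not part of the gRPC client or this tutorial's project); it just encodes "config first, audio afterwards":

```java
// Hypothetical guard (purely illustrative): enforces that the first
// StreamingRecognizeRequest carries only the configuration and every
// subsequent request on the stream carries only audio.
public class StreamOrderChecker {
    private boolean configSent = false;

    // Returns true if a request of this kind is valid at this point in the stream.
    public boolean accept(boolean isConfig) {
        if (!configSent) {
            configSent = isConfig; // the stream must open with the config
            return isConfig;
        }
        return !isConfig; // after the config, only audio requests are valid
    }

    public static void main(String[] args) {
        StreamOrderChecker checker = new StreamOrderChecker();
        System.out.println(checker.accept(true));  // config first: true
        System.out.println(checker.accept(false)); // audio next: true
        System.out.println(checker.accept(true));  // a second config: false
    }
}
```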
public void startRecognizing(int sampleRate) {
    if (mApi == null) {
        Log.w(TAG, "API not ready. Ignoring the request.");
        return;
    }
    // Configure the API
    mRequestObserver = mApi.streamingRecognize(mResponseObserver);
    StreamingRecognitionConfig streamingConfig = StreamingRecognitionConfig.newBuilder()
            .setConfig(RecognitionConfig.newBuilder()
                    .setLanguageCode("en-US")
                    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setSampleRateHertz(sampleRate)
                    .build())
            .setInterimResults(true)
            .setSingleUtterance(true)
            .build();
    StreamingRecognizeRequest streamingRecognizeRequest = StreamingRecognizeRequest.newBuilder()
            .setStreamingConfig(streamingConfig)
            .build();
    mRequestObserver.onNext(streamingRecognizeRequest);
}

public void recognize(byte[] data, int size) {
    if (mRequestObserver == null) {
        return;
    }
    // Call the streaming recognition API
    mRequestObserver.onNext(StreamingRecognizeRequest.newBuilder()
            .setAudioContent(ByteString.copyFrom(data, 0, size))
            .build());
}
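The recognize() method above expects raw LINEAR16 audio as bytes. On Android, AudioRecord can deliver audio as 16-bit samples in a short[] buffer, so a small helper is useful to pack them into the little-endian byte layout LINEAR16 expects. PcmUtil below is a hypothetical helper written for illustration, not part of the tutorial's project:

```java
// Hypothetical helper: convert 16-bit PCM samples (e.g. from
// AudioRecord.read(short[], ...)) into little-endian bytes suitable for
// the LINEAR16 encoding used in the streaming config above.
public class PcmUtil {
    public static byte[] toLittleEndianBytes(short[] samples, int count) {
        byte[] out = new byte[count * 2];
        for (int i = 0; i < count; i++) {
            out[2 * i] = (byte) (samples[i] & 0xFF);            // low byte first
            out[2 * i + 1] = (byte) ((samples[i] >> 8) & 0xFF); // then high byte
        }
        return out;
    }

    public static void main(String[] args) {
        short[] samples = {(short) 0x1234, (short) 0xABCD};
        byte[] b = toLittleEndianBytes(samples, samples.length);
        System.out.println(String.format("%02x %02x %02x %02x",
                b[0] & 0xFF, b[1] & 0xFF, b[2] & 0xFF, b[3] & 0xFF)); // 34 12 cd ab
    }
}
```

The resulting byte[] can be handed straight to recognize(data, data.length).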
Streaming responses
Streaming speech recognition results are returned within a series of responses of type StreamingRecognizeResponse. A response consists of the following fields:
- speechEventType – contains events of type SpeechEventType.
- results – contains the list of results, which may be either interim or final, of type StreamingRecognitionResult.
The results list contains the following sub-fields:
- alternatives – contains a list of alternative transcriptions.
- isFinal – indicates whether the results obtained within this list entry are interim or are final.
- stability – indicates the volatility of results obtained so far, with 0.0 indicating complete instability and 1.0 indicating complete stability.
private final StreamObserver<StreamingRecognizeResponse> mResponseObserver
        = new StreamObserver<StreamingRecognizeResponse>() {
    @Override
    public void onNext(StreamingRecognizeResponse response) {
        String text = null;
        boolean isFinal = false;
        if (response.getResultsCount() > 0) {
            final StreamingRecognitionResult result = response.getResults(0);
            isFinal = result.getIsFinal();
            if (result.getAlternativesCount() > 0) {
                final SpeechRecognitionAlternative alternative = result.getAlternatives(0);
                text = alternative.getTranscript();
            }
        }
        if (text != null) {
            for (Listener listener : mListeners) {
                listener.onSpeechRecognized(text, isFinal);
            }
        }
    }

    @Override
    public void onError(Throwable t) {
        Log.e(TAG, "Error calling the API.", t);
    }

    @Override
    public void onCompleted() {
        Log.i(TAG, "API completed.");
    }
};
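The isFinal and stability fields can drive UI decisions, for example showing final results always but interim hypotheses only once they are reasonably stable. ResultFilter below is a hypothetical helper sketching that idea; the 0.8 cutoff is an assumed value for illustration, not something prescribed by the API:

```java
// Hypothetical helper: decide whether a recognition result is worth
// displaying. Final results are always shown; interim results are shown
// only once their stability crosses an assumed threshold.
public class ResultFilter {
    static final float STABILITY_THRESHOLD = 0.8f; // assumed cutoff, tune as needed

    public static boolean shouldDisplay(boolean isFinal, float stability) {
        return isFinal || stability >= STABILITY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldDisplay(true, 0.0f));  // true: final results always shown
        System.out.println(shouldDisplay(false, 0.3f)); // false: unstable interim hidden
        System.out.println(shouldDisplay(false, 0.9f)); // true: stable interim shown
    }
}
```

Inside onNext() above, such a check could gate the listener callback using result.getIsFinal() and result.getStability().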
Download this project from GitHub