Google Cloud Speech API in Android APP

Many of you have used the “Ok Google” functionality on your phone. The Speech API lets developers integrate that functionality into their own applications, enabling speech-to-text transcription in over 80 languages.

There are many potential use cases where you may want to combine different Cloud Machine Learning APIs. For example, whenever two people don’t speak the same language, you can transcribe the audio with the Speech API and then translate it into the other person’s language with the Translation API.

The API also works in streaming mode: you can send it a continuous stream of audio, and it returns transcriptions as that audio comes in. Once you’re done transcribing, you may want to do further analysis on the text. In this tutorial, we are going to use streaming recognition to perform speech recognition.

The best way to see how the Speech API works is through a demo.

Prerequisites

  • Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use Google Cloud Speech API services in the app, you need a service account key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Enable Google Cloud Speech API

2.Creating a New Android Project

Google provides client libraries in a number of programming languages to simplify the process of building and sending requests and receiving and parsing responses.

Add the following compile dependencies to the app build.gradle:
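The original dependency block is not shown here; a plausible set for the gRPC-based streaming client looks like the following (artifact versions were current around the time of writing and may need updating):

```gradle
// gRPC transport and generated stubs for the streaming Speech API
compile 'io.grpc:grpc-okhttp:1.4.0'
compile 'io.grpc:grpc-protobuf-lite:1.4.0'
compile 'io.grpc:grpc-stub:1.4.0'
// OAuth2 credentials for the bundled service account key
compile 'com.google.auth:google-auth-library-oauth2-http:0.7.0'
```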

Set Up to Authenticate With Your Project’s Credentials

This Android app uses a JSON credential file stored locally in its resources; we put the service account key in the client purely for ease of use. The app obtains an access token using the service account credential and uses that token to call the API.

In order to get Service Account Key, visit the Cloud Console, and navigate to: API Manager > Credentials > Create credentials > Service account key > New service account. Create a new service account, and download the JSON credentials file. Put the file in the app resources as app/src/main/res/raw/credential.json.
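A minimal sketch of the token flow, assuming the google-auth-library dependency shown above (run this off the main thread, since refreshing a token performs network I/O):

```java
// Load the service account key bundled at res/raw/credential.json
InputStream stream = getResources().openRawResource(R.raw.credential);
GoogleCredentials credentials = GoogleCredentials.fromStream(stream)
        .createScoped(Collections.singletonList(
                "https://www.googleapis.com/auth/cloud-platform"));

// Exchange the service account credential for a short-lived access token
AccessToken token = credentials.refreshAccessToken();
```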

Generate Google Cloud Service Account

Streaming Speech API Recognition Requests

A streaming Speech API recognition call is designed for real-time capture and recognition of audio, within a bi-directional stream. Your application can send audio on the request stream and receive interim and final recognition results on the response stream in real time. Interim results represent the current recognition result for a section of audio, while the final recognition result represents the last, best guess for that section of audio.

Streaming requests

Unlike synchronous recognition, where you send both the configuration and audio within a single request, calling the streaming Speech API requires sending multiple requests. The first StreamingRecognizeRequest must contain a configuration of type StreamingRecognitionConfig without any accompanying audio. Subsequent StreamingRecognizeRequests sent over the same stream will then consist of consecutive frames of raw audio bytes, as sketched after the field list below.

StreamingRecognitionConfig consists of the following fields:

  • config – (required) contains configuration information for the audio, of type RecognitionConfig and is the same as that shown within synchronous and asynchronous requests.
  • single_utterance – (optional, defaults to false) indicates whether this request should automatically end after speech is no longer detected. If set, the Speech API will detect pauses, silence, or non-speech audio to determine when to end recognition. If not set, the stream will continue to listen and process audio until either the stream is closed directly or the stream’s length limit has been exceeded. Setting single_utterance to true is useful for processing voice commands.
  • interim_results – (optional, defaults to false) indicates that this stream request should return temporary results that may be refined at a later time (after processing more audio). Interim results will be noted within responses through the setting of is_final to false.
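A minimal request-side sketch, assuming the generated v1 gRPC stub (SpeechGrpc) and an already-configured ManagedChannel with call credentials; responseObserver is defined in the next section:

```java
StreamObserver<StreamingRecognizeRequest> requestObserver =
        SpeechGrpc.newStub(channel).streamingRecognize(responseObserver);

// First request: configuration only, no audio.
requestObserver.onNext(StreamingRecognizeRequest.newBuilder()
        .setStreamingConfig(StreamingRecognitionConfig.newBuilder()
                .setConfig(RecognitionConfig.newBuilder()
                        .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                        .setSampleRateHertz(16000)
                        .setLanguageCode("en-US")
                        .build())
                .setInterimResults(true)
                .setSingleUtterance(false)
                .build())
        .build());

// Subsequent requests: consecutive frames of raw audio bytes,
// e.g. buffers read from AudioRecord.
requestObserver.onNext(StreamingRecognizeRequest.newBuilder()
        .setAudioContent(ByteString.copyFrom(audioBuffer, 0, bytesRead))
        .build());

// Close the request stream when recording stops.
requestObserver.onCompleted();
```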

Streaming responses

Streaming speech recognition results are returned within a series of responses of type StreamingRecognizeResponse. Each response consists of the following fields (a handling sketch follows the list):

  • speechEventType contains events of type SpeechEventType. The value of these events indicates when a single utterance has been determined to have been completed. The speech events serve as markers within your stream’s response.
  • results contains the list of results, which may be either interim or final, of type StreamingRecognitionResult. The results list contains the following sub-fields:

    • alternatives contains a list of alternative transcriptions.
    • isFinal indicates whether the results obtained within this list entry are interim or are final.
    • stability indicates the volatility of results obtained so far, with 0.0 indicating complete instability and 1.0 indicating complete stability. Note that unlike confidence, which estimates whether a transcription is correct, stability estimates whether the given partial result may change. If isFinal is set to true, stability will not be set.
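A minimal handling sketch under the same gRPC assumptions as above; it logs interim results with their stability and final results with their transcript:

```java
StreamObserver<StreamingRecognizeResponse> responseObserver =
        new StreamObserver<StreamingRecognizeResponse>() {
            @Override
            public void onNext(StreamingRecognizeResponse response) {
                for (StreamingRecognitionResult result : response.getResultsList()) {
                    String transcript = result.getAlternatives(0).getTranscript();
                    if (result.getIsFinal()) {
                        Log.d("Speech", "final: " + transcript);
                    } else {
                        Log.d("Speech", "interim (stability "
                                + result.getStability() + "): " + transcript);
                    }
                }
            }

            @Override
            public void onError(Throwable t) {
                Log.e("Speech", "recognition error", t);
            }

            @Override
            public void onCompleted() {
                Log.d("Speech", "stream closed");
            }
        };
```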


Download this project from GitHub

Related Post

Google Cloud Natural Language API in Android APP

Google Cloud Vision API in Android APP

Speech Recognition Using TensorFlow


Google Cloud Natural Language API in Android APP

When you want to do something more with your text once you’ve transcribed it, you might want to analyze it. That’s where the Natural Language API comes into play: it lets you extract entities, sentiment, and syntax from your text. A real-world example is a customer feedback platform, which lets its customers gather feedback from their app’s users as they go through the application. It’s that open-ended text feedback that’s much more difficult to make sense of, and that’s where the Natural Language API helps: entity and syntax annotation pull out the key subjects and terms from the feedback, which can then, if necessary, be routed to the right person in real time to respond.

In this tutorial, I’ll introduce you to the Cloud Natural Language platform and show you how to use it to analyze text.

Prerequisites

  • Google Cloud Platform account (you can use the 12-month free trial)


1. Acquiring an API Key

To use Google Cloud Natural Language API services in the app, you need an API key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Enable Natural Language API

To get an API key, go to the Credentials tab, press the Create Credentials button, and select API key.

Google Cloud Natural Language API Key

2. Creating a New Android Project

Google provides client libraries in a number of programming languages to simplify the process of building and sending requests and receiving and parsing responses.

Add the following compile dependencies to the app build.gradle:
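The original dependency block is not shown here; a plausible set for the Google API Client library looks like this (the exact service revision may differ):

```gradle
compile ('com.google.api-client:google-api-client-android:1.22.0') {
    exclude module: 'httpclient' // avoid conflicts with Android's bundled Apache classes
}
compile ('com.google.apis:google-api-services-language:v1-rev10-1.22.0') {
    exclude module: 'httpclient'
}
```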

Add INTERNET permission in the AndroidManifest.xml file.
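Inside the <manifest> element:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```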


To interact with the API using the Google API Client library, you must create a CloudNaturalLanguage object using the CloudNaturalLanguage.Builder class. Its constructor expects an HTTP transport and a JSON factory.

Furthermore, by assigning a CloudNaturalLanguageRequestInitializer instance to it, you can force it to include your API key in all its requests.
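A minimal sketch, where NL_API_KEY is a placeholder for the key you created earlier:

```java
final CloudNaturalLanguage naturalLanguage = new CloudNaturalLanguage.Builder(
        AndroidHttp.newCompatibleTransport(),  // HTTP transport
        new AndroidJsonFactory(),              // JSON factory
        null
).setCloudNaturalLanguageRequestInitializer(
        new CloudNaturalLanguageRequestInitializer(NL_API_KEY)
).build();
```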

All the text you want to analyze using the API must be placed inside a Document object. The Document object must also contain configuration information, such as the language of the text and whether it is formatted as plain text or HTML. Add the following code:
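The original snippet isn't shown here; a minimal version looks like this (transcript is the text to analyze):

```java
Document document = new Document();
document.setType("PLAIN_TEXT"); // or "HTML"
document.setLanguage("en-US");
document.setContent(transcript);
```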

Next, you must create a Features object specifying the features you are interested in analyzing. The following code shows you how to create a Features object that says you want to extract entities and run sentiment analysis only.
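For example (the setter names come from the generated model classes):

```java
Features features = new Features();
features.setExtractEntities(true);
features.setExtractDocumentSentiment(true);
```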

Use the Document and Features objects to compose an AnnotateTextRequest object, which can be passed to the annotateText() method to generate an AnnotateTextResponse object.
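A sketch of that flow, run off the main thread because execute() performs network I/O:

```java
final AnnotateTextRequest request = new AnnotateTextRequest();
request.setDocument(document);
request.setFeatures(features);

AsyncTask.execute(new Runnable() {
    @Override
    public void run() {
        try {
            AnnotateTextResponse response =
                    naturalLanguage.documents().annotateText(request).execute();
            // Read response.getEntities() and response.getDocumentSentiment() here.
        } catch (IOException e) {
            Log.e("NL", "annotateText failed", e);
        }
    }
});
```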


Entity Analysis

Suppose I take a sentence and send it to the entity extraction endpoint of the Natural Language API, and it returns all of the entities in my text. For each entity we get its name, in this case Google, and its type, in this case organization. Then we get back some metadata: an MID that maps to Google’s Knowledge Graph. If you want more information about the entity, you can call the Knowledge Graph API, passing it this ID. We also get the Wikipedia URL for this particular entity.

You can extract a list of entities from the AnnotateTextResponse object by calling its getEntities() method.
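For example, to log each entity's name and type:

```java
for (Entity entity : response.getEntities()) {
    Log.d("NL", entity.getName() + " (" + entity.getType() + ")");
}
```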

Analyze Entity


Sentiment Analysis

The API can also analyze the sentiment of your text. Suppose we have this restaurant review:

The food at that restaurant was stale, I will not be going back.

If I worked at this restaurant, I’d want to flag this review and potentially follow up with this customer to see why they didn’t like it. But it’s likely that I would have lots and lots of reviews, and I probably wouldn’t want to read each one manually. I might want to flag the most positive and most negative ones and then respond just to those. We get two numbers back from the Natural Language API to help us do this. The first is score, which tells us on a scale from -1 to 1 how positive or negative the text is. In this example, we get -0.8, which is almost fully negative. Then we get magnitude, which tells us, regardless of being positive or negative, how strong the sentiment in the text is. It ranges from 0 to infinity and, unlike score, is not normalized for the length of the text, so we get a fairly small number here, 0.8, because this is just a short piece of text.

You can extract the overall sentiment of the transcript by calling the getDocumentSentiment() method. To get the actual score of the sentiment, however, you must also call the getScore() method, which returns a float.
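A minimal sketch; getMagnitude() works the same way:

```java
Sentiment sentiment = response.getDocumentSentiment();
float score = sentiment.getScore();         // -1.0 (negative) .. 1.0 (positive)
float magnitude = sentiment.getMagnitude(); // 0 .. +inf, grows with text length
Log.d("NL", "score=" + score + ", magnitude=" + magnitude);
```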

Analyze Sentiment


Download this project from GitHub

Related Post

Google Cloud Vision API in Android APP

Google Cloud Speech API in Android APP

Google Cloud Vision API in Android APP

Smartphone users are starting to look for smarter apps. As an Android Developer, you probably lack the resources needed to create automation, artificial intelligence, and machine learning apps from scratch.

Google recently launched the Cloud Machine Learning platform, which offers neural networks that have been pre-trained to perform a variety of tasks. You can use them by simply making a few REST API calls or by using a client library.

In this tutorial, I’ll introduce you to the Cloud Machine Learning platform and show you how to use it to create a smart Android app that can recognize real-world objects.

Prerequisites

  • Device running Android 4.4+
  • Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use Google Vision API services in the app, you need an API key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Enable Google Vision API

To get an API key, go to the Credentials tab, press the Create Credentials button, and select API key.

Google Cloud Vision API

2. Creating a New Android Project

The Google Cloud Vision API lets you understand the content of an image using machine learning models served through a REST API. It quickly classifies images into thousands of categories, detects objects and faces within images, and finds and reads printed words contained within images.

Google provides client libraries in a number of programming languages to simplify the process of building and sending requests and receiving and parsing responses.

Add the following compile dependencies to app build.gradle:
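As before, the original block is not shown; a plausible set (the exact service revision may differ):

```gradle
compile ('com.google.api-client:google-api-client-android:1.22.0') {
    exclude module: 'httpclient'
}
compile ('com.google.apis:google-api-services-vision:v1-rev16-1.22.0') {
    exclude module: 'httpclient'
}
```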

Add INTERNET permission in the AndroidManifest.xml file.
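```xml
<uses-permission android:name="android.permission.INTERNET" />
```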

Step 1: Create an Intent

By creating a new intent with the ACTION_IMAGE_CAPTURE action and passing it to the startActivityForResult() method, you can ask the default camera app of the user’s device to take a picture and pass it on to your app. Add the following code to your Activity class:
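The original snippet isn't shown here; a minimal version (CAMERA_REQUEST_CODE is an arbitrary request code):

```java
private static final int CAMERA_REQUEST_CODE = 1;

private void takePicture() {
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    if (intent.resolveActivity(getPackageManager()) != null) {
        startActivityForResult(intent, CAMERA_REQUEST_CODE);
    }
}
```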

Step 2: Receive the Image

You receive the image captured by the default camera app in the onActivityResult() method of your Activity class, where you have access to a Bundle object containing the image data. You can render the image by simply converting it into a Bitmap and passing it to an ImageView widget.
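A sketch of the receiving side (imageView is assumed to be an ImageView in your layout):

```java
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == CAMERA_REQUEST_CODE && resultCode == RESULT_OK) {
        // The camera app returns a thumbnail in the "data" extra.
        Bitmap bitmap = (Bitmap) data.getExtras().get("data");
        imageView.setImageBitmap(bitmap);
    }
}
```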

Step 3: Encode the Image

The Vision API cannot use Bitmap objects directly. It expects a Base64-encoded string of compressed image data. To compress the image data, you can use the compress() method of the Bitmap class. As its arguments, the method expects the compression format to use, the desired output quality, and a ByteArrayOutputStream object. The following code compresses the bitmap using the JPEG format.
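A sketch using the generated Image model, whose encodeContent() Base64-encodes the byte array for you:

```java
ByteArrayOutputStream stream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.JPEG, 90, stream);

Image base64Image = new Image();
base64Image.encodeContent(stream.toByteArray());
```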

Step 4: Create Feature

A Feature indicates what type of image detection task to perform. Features describe the type of Vision tasks to perform over images; they encode the Vision vertical to operate on and the number of top-scoring results to return.

This is the Java data model class that specifies how to parse/serialize into the JSON that is transmitted over HTTP when working with the Cloud Vision API.
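For example, to request label detection with at most ten results:

```java
Feature labelDetection = new Feature();
labelDetection.setType("LABEL_DETECTION");
labelDetection.setMaxResults(10);
```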

Step 5: Create Request

Create the request for performing Vision tasks over a user-provided image, with user-requested features.
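A sketch that bundles the encoded image and the feature into a batch request:

```java
AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();
annotateImageRequest.setImage(base64Image);
annotateImageRequest.setFeatures(Collections.singletonList(labelDetection));

BatchAnnotateImagesRequest batchRequest = new BatchAnnotateImagesRequest();
batchRequest.setRequests(Collections.singletonList(annotateImageRequest));
```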

Step 6: Process the Image

Now you need to interact with the Vision API. Start by creating an HttpTransport and a VisionRequestInitializer that contains your API key:
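A minimal sketch (VISION_API_KEY is a placeholder; run the call off the main thread because execute() performs network I/O):

```java
Vision vision = new Vision.Builder(
        AndroidHttp.newCompatibleTransport(),
        new AndroidJsonFactory(),
        null
).setVisionRequestInitializer(new VisionRequestInitializer(VISION_API_KEY))
 .build();

BatchAnnotateImagesResponse response =
        vision.images().annotate(batchRequest).execute();

// For a LABEL_DETECTION request, read the labels of the first (only) image.
List<EntityAnnotation> labels =
        response.getResponses().get(0).getLabelAnnotations();
if (labels != null) {
    for (EntityAnnotation label : labels) {
        Log.d("Vision", label.getDescription() + ": " + label.getScore());
    }
}
```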

CLOUD VISION API FEATURES

Derive insight from images with Google’s powerful Cloud Vision API.

1. Label Detection

The Vision API can detect and extract information about entities within an image. Labels can identify objects, locations, activities, animal species, products, and more.

Vision API Label Detection

2. Landmark Detection

Landmark requests detect well-known natural and human-made landmarks and return identifying information such as an entity ID (that may be available in the Google Knowledge Graph), the landmark’s name and location, and the bounding box that surrounds the landmark in the image.

Vision API Landmark Detection

3. Logo Detection

Logo detection requests detect popular product and corporate logos within an image.

4. Safe Search Detection (SAFE_SEARCH_DETECTION)

Safe Search requests examine an image for potentially unsafe or undesirable content. The likelihood of such imagery is returned in four categories:

  • adult indicates content generally suited to ages 18 and over, such as nudity, sexual activity, and pornography (including cartoons or anime).
  • spoof indicates content that has been modified from the original to make it funny or offensive.
  • medical indicates content such as surgeries or MRIs.
  • violent indicates violent content, including but not limited to the presence of blood, war images, weapons, injuries, or car crashes.

Vision API Safe Search

5. Image Properties

An image properties request returns the dominant colors in the image as RGB values and percent of the total pixel count.

Vision API Image Property


Conclusion

In this tutorial, you learned how to use the Cloud Vision API, which is part of the Google Cloud Machine Learning platform, in an Android app. There are many more such APIs offered by the platform. You can learn more about them by referring to the official documentation.



Download this project from GitHub


Related Post

Android TensorFlow Machine Learning

Google Cloud Natural Language API in Android APP

Google Cloud Speech API in Android APP