Google recently launched its Cloud Machine Learning platform, which offers neural networks that have been pre-trained to perform a variety of tasks. You can use them simply by making a few REST API calls or by using a client library.

The Google Cloud Vision API lets you understand the content of an image by putting machine learning models behind a REST API. It quickly classifies images into thousands of categories, detects objects and faces within images, and finds and reads printed words contained within images.

In this tutorial, I’ll introduce you to the Cloud Machine Learning platform and show you how to use it to create a smart Android app that can recognize real-world objects.


  • A device running Android 4.4+
  • Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use the Google Vision API services in the app, you need an API key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Enable Google Vision API

To get an API key, go to the Credentials tab, press the Create Credentials button, and select API key.

Google Cloud Vision API

2. Creating a New Android Project

Google provides client libraries to simplify the process of building and sending requests and receiving and parsing responses.

Add the following compile dependencies to the app module’s build.gradle file:
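A minimal sketch of those dependencies (the version numbers here are assumptions; check for the latest releases of the Google API client and the Vision client library):

```groovy
dependencies {
    // Google API client for Android; exclude the Apache HTTP client,
    // which conflicts with the one bundled in Android
    compile('com.google.api-client:google-api-client-android:1.23.0') {
        exclude module: 'httpclient'
    }
    // Generated client library for the Cloud Vision v1 API
    compile('com.google.apis:google-api-services-vision:v1-rev369-1.23.0') {
        exclude module: 'httpclient'
    }
}
```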

Add the INTERNET permission to the AndroidManifest.xml file.
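The permission entry goes inside the manifest element:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```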

Step 1: Create an Intent

By creating a new intent with the ACTION_IMAGE_CAPTURE action and passing it to the startActivityForResult() method, you can ask the default camera app of the user’s device to take a picture and pass it on to your app. Add the following code to your Activity class:
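A sketch of that step inside the Activity (the request-code constant and method name are assumptions for illustration):

```java
// Arbitrary request code used to identify this result in onActivityResult()
private static final int CAMERA_REQUEST_CODE = 1;

private void takePicture() {
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    // Only launch if a camera app is available to handle the intent
    if (intent.resolveActivity(getPackageManager()) != null) {
        startActivityForResult(intent, CAMERA_REQUEST_CODE);
    }
}
```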

Step 2: Receive the Image

Receive the image captured by the default camera app in the onActivityResult() method of your Activity class. There you’ll have access to a Bundle object containing the image data. You can render the image data by simply converting it into a Bitmap and passing it to an ImageView widget.
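For example (assuming an ImageView field named imageView and the CAMERA_REQUEST_CODE constant from the capture step):

```java
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == CAMERA_REQUEST_CODE && resultCode == RESULT_OK) {
        // The camera app returns a small preview bitmap in the "data" extra
        Bitmap bitmap = (Bitmap) data.getExtras().get("data");
        imageView.setImageBitmap(bitmap);
    }
}
```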

Step 3: Encode the Image

The Vision API cannot use Bitmap objects directly. It expects a Base64-encoded string of compressed image data. To compress the image data, you can use the compress() method of the Bitmap class. As its arguments, the method expects the compression format to use, the desired output quality, and a ByteArrayOutputStream object. The following code compresses the bitmap using the JPEG format.
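A sketch of the compression and encoding, using the Vision client library’s Image model class (the quality value of 90 is an arbitrary choice):

```java
// Compress the bitmap to JPEG bytes
ByteArrayOutputStream stream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.JPEG, 90, stream);
byte[] imageBytes = stream.toByteArray();

// Image is com.google.api.services.vision.v1.model.Image;
// encodeContent() Base64-encodes the bytes for the request
Image base64EncodedImage = new Image();
base64EncodedImage.encodeContent(imageBytes);
```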

Step 4: Create Feature

A Feature indicates what type of image detection task to perform. Features describe the Vision tasks to perform over an image; each one encodes the Vision vertical to operate on and the number of top-scoring results to return.

This is the Java data model class that specifies how to parse/serialize into the JSON that is transmitted over HTTP when working with the Cloud Vision API.
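For example, to request label detection limited to the top ten results (the maxResults value is an arbitrary choice):

```java
// Feature is com.google.api.services.vision.v1.model.Feature
Feature feature = new Feature();
feature.setType("LABEL_DETECTION"); // which Vision vertical to run
feature.setMaxResults(10);          // how many top-scoring results to return
```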

Step 5: Create Request

Create the request for performing Vision tasks over the user-provided image with the requested features.
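A sketch of the request, combining the encoded image and the feature from the previous steps (variable names carried over from those snippets):

```java
// One AnnotateImageRequest per image, carrying the image and its features
AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();
annotateImageRequest.setImage(base64EncodedImage);
annotateImageRequest.setFeatures(Arrays.asList(feature));

// The API accepts a batch of such requests in a single call
BatchAnnotateImagesRequest batchRequest = new BatchAnnotateImagesRequest();
batchRequest.setRequests(Arrays.asList(annotateImageRequest));
```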

Step 6: Process the Image

Now you need to interact with the Vision API. Start by creating an HttpTransport and a VisionRequestInitializer that contains your API key:
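A sketch of that setup, assuming the API key from step 1 is stored in a constant named CLOUD_VISION_API_KEY; the annotate() call performs network I/O, so it must run off the main thread (e.g. inside an AsyncTask):

```java
HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
JsonFactory jsonFactory = GsonFactory.getDefaultInstance();

// Attaches the API key to every Vision request
VisionRequestInitializer requestInitializer =
        new VisionRequestInitializer(CLOUD_VISION_API_KEY);

Vision vision = new Vision.Builder(httpTransport, jsonFactory, null)
        .setVisionRequestInitializer(requestInitializer)
        .build();

// Run this on a background thread
BatchAnnotateImagesResponse response =
        vision.images().annotate(batchRequest).execute();
```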

1. Label Detection

The Vision API can detect and extract information about entities within an image. Labels can identify objects, locations, activities, animal species, products, and more.
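If LABEL_DETECTION was requested, the labels can be read from the response like this (TAG is an assumed logging tag):

```java
List<EntityAnnotation> labels =
        response.getResponses().get(0).getLabelAnnotations();
if (labels != null) {
    for (EntityAnnotation label : labels) {
        // description is the label text, score its confidence
        Log.d(TAG, label.getDescription() + ": " + label.getScore());
    }
}
```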

Vision API Label Detection

2. Landmark Detection

Landmark requests detect well-known natural and human-made landmarks and return identifying information such as an entity ID, the landmark’s name and location, and the bounding box that surrounds the landmark in the image.
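A sketch of reading landmark results, including the location returned with each entity (LatLng here is the Vision model class, not the Maps one):

```java
List<EntityAnnotation> landmarks =
        response.getResponses().get(0).getLandmarkAnnotations();
if (landmarks != null) {
    for (EntityAnnotation landmark : landmarks) {
        LatLng position = landmark.getLocations().get(0).getLatLng();
        Log.d(TAG, landmark.getDescription() + " at "
                + position.getLatitude() + ", " + position.getLongitude());
    }
}
```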

Vision API Landmark Detection

3. Logo Detection

Logo detection requests detect popular product and corporate logos within an image.

4. Safe Search Detection

Safe Search requests examine an image for potentially unsafe or undesirable content. The likelihood of such imagery is returned in four categories:

  • adult indicates content generally suited for 18 years plus, such as nudity, sexual activity, and pornography (including cartoons or anime).
  • spoof indicates content that has been modified from the original to make it funny or offensive.
  • medical indicates content such as surgeries or MRIs.
  • violent indicates violent content, including but not limited to the presence of blood, war images, weapons, injuries, or car crashes.

Vision API Safe Search
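The four likelihoods can be read from the response like this; each field is a likelihood string such as "VERY_UNLIKELY" or "POSSIBLE":

```java
SafeSearchAnnotation safeSearch =
        response.getResponses().get(0).getSafeSearchAnnotation();
if (safeSearch != null) {
    Log.d(TAG, "adult: " + safeSearch.getAdult()
            + ", spoof: " + safeSearch.getSpoof()
            + ", medical: " + safeSearch.getMedical()
            + ", violence: " + safeSearch.getViolence());
}
```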

5. Image Properties

An image properties request returns the dominant colors in the image as RGB values and percent of the total pixel count.
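A sketch of reading the dominant colors (Color here is the Vision model class com.google.api.services.vision.v1.model.Color, not android.graphics.Color):

```java
ImageProperties props =
        response.getResponses().get(0).getImagePropertiesAnnotation();
if (props != null) {
    for (ColorInfo colorInfo : props.getDominantColors().getColors()) {
        Color color = colorInfo.getColor(); // RGB components
        Log.d(TAG, "r=" + color.getRed()
                + " g=" + color.getGreen()
                + " b=" + color.getBlue()
                + " fraction=" + colorInfo.getPixelFraction());
    }
}
```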

Vision API Image Property



In this tutorial, you learned how to use the Cloud Vision API, which is part of the Google Cloud Machine Learning platform, in an Android app. There are many more such APIs offered by the platform. You can learn more about them by referring to the official documentation.


Download this project from GitHub


Related Posts

Android TensorFlow Machine Learning

Google Cloud Natural Language API in Android APP

Google Cloud Speech API in Android APP

2 thoughts on “Google Cloud Vision API in Android APP”

  1. Thank you for your tutorial, working pretty well.
    For anyone who wants to use a text-to-image recognition function, it follows the code I made and is working:

    if (flag.equals("DOCUMENT_TEXT_DETECTION")) {
        labels = response.getResponses().get(0).getTextAnnotations();
        if (labels != null) {
            for (EntityAnnotation label : labels) {
                message += label.getDescription();
                message += "\n";
            }
        } else {
            message += "Nothing found!";
        }
    }
