Google recently launched the Cloud Machine Learning platform, which offers neural networks that have been pre-trained to perform a variety of tasks. You can use them by simply making a few REST API calls or by using a client library. The Google Cloud Vision API uses machine learning models, exposed through a REST API, to understand the content of an image. It quickly classifies images into thousands of categories, detects objects and faces within images, and finds and reads printed words contained within images. In this tutorial, I'll introduce you to the Cloud Machine Learning platform and show you how to use it to create a smart Android app that can recognize real-world objects.


Prerequisites

  • A device running Android 4.4+
  • A Google Cloud Platform account (you can use the 12-month free trial)

1. Acquiring an API Key

To use the Google Vision API services in the app, you need an API key. You can get one by creating a new project in the Google Cloud Platform console.

Once the project has been created, go to API Manager > Dashboard and press the Enable API button.

Search for the Google Cloud Vision API and enable it. To get an API key, go to the Credentials tab, press the Create Credentials button, and select API key.

2. Creating a New Android Project

Google provides client libraries that simplify building and sending requests, as well as receiving and parsing responses.

Add the following compile dependencies to the app module's build.gradle file:

android {
    configurations.all {
        resolutionStrategy.force ''
    }
}
dependencies {
    compile('') { exclude module: 'httpclient' }
    compile('') { exclude module: 'httpclient' }
    compile('') { exclude module: 'httpclient' }
}

Add the INTERNET permission to the AndroidManifest.xml file:

<uses-permission android:name="android.permission.INTERNET"/>

Step 1: Create an Intent

By creating a new intent with the ACTION_IMAGE_CAPTURE action and passing it to the startActivityForResult() method, you can ask the default camera app on the user's device to take a picture and pass it back to your app. Add the following code to your Activity class:

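private static final int CAMERA_REQUEST_CODE = 10; // any request code unique within this activity
public void takePictureFromCamera() {
    // Ask the default camera app to capture a photo
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    startActivityForResult(intent, CAMERA_REQUEST_CODE);
}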

Step 2: Receive the Captured Image

The image captured by the default camera app is delivered to the onActivityResult() method of your activity. There, the returned Intent carries a Bundle with the image data; when no output file is specified, this is a small thumbnail Bitmap stored under the "data" key. You can display it by simply passing the Bitmap to an ImageView widget.

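@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (requestCode == CAMERA_REQUEST_CODE && resultCode == RESULT_OK) {
        // The "data" extra holds a small thumbnail Bitmap of the photo
        bitmap = (Bitmap) data.getExtras().get("data");
        imageView.setImageBitmap(bitmap); // imageView: your layout's ImageView (name assumed)
        callCloudVision(bitmap, feature);
    }
}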

Step 3: Encode the Image

The Vision API cannot use Bitmap objects directly. It expects a Base64-encoded string of compressed image data. To compress the image data, you can use the compress() method of the Bitmap class. As its arguments, the method expects the compression format, the desired output quality (0 to 100), and an output stream, here a ByteArrayOutputStream. The following code compresses the bitmap using the JPEG format.

private Image getImageEncodeImage(Bitmap bitmap) {
    Image base64EncodedImage = new Image();
    // Convert the bitmap to a JPEG, in case it's a format that Android
    // understands but Cloud Vision doesn't
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    bitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
    byte[] imageBytes = byteArrayOutputStream.toByteArray();
    // Base64-encode the JPEG bytes into the Image's content field
    base64EncodedImage.encodeContent(imageBytes);
    return base64EncodedImage;
}
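Under the hood, the Vision API receives the image as a Base64 string inside the JSON request body. To make that concrete, here is a plain-Java sketch (not the client library; the class and method names are hypothetical) that Base64-encodes compressed image bytes and wraps them in the JSON image object. It uses java.util.Base64 from the JDK; on Android that class only exists from API level 26, so an app targeting Android 4.4 would use android.util.Base64 instead.

```java
import java.util.Base64;

// Hypothetical sketch: shows the JSON "image" object that carries
// Base64-encoded image bytes in a Vision API request.
class ImageContentSketch {

    // Base64-encode compressed image bytes (e.g. the JPEG output of Bitmap.compress()).
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Wrap the encoded bytes in a {"content":"..."} object.
    // Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '=', so no JSON escaping is needed.
    static String toImageJson(byte[] imageBytes) {
        return "{\"content\":\"" + encode(imageBytes) + "\"}";
    }
}
```

For example, applying toImageJson to the three ASCII bytes of "abc" yields {"content":"YWJj"}.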

Step 4: Create a Feature

A Feature specifies which type of image detection to perform (for example, label detection) and how many top-scoring results to return. You describe the Vision tasks to run on an image with Features.

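// Request label detection, returning at most 10 results
Feature feature = new Feature().setType("LABEL_DETECTION").setMaxResults(10);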

Feature is one of the library's Java data model classes, which specify how to parse and serialize the JSON that is transmitted over HTTP when working with the Cloud Vision API.
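On the wire, a Feature becomes a small JSON object with a type and a maxResults value. The following plain-Java sketch builds that object by hand; the type strings are the Vision API feature types used in this tutorial's detection sections, while the class name and the validation checks are illustrative assumptions, not part of the client library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the client library: serializes a feature
// the way it appears in a Vision API JSON request.
class FeatureJsonSketch {

    // Feature types covered by the detection sections of this tutorial.
    static final List<String> SUPPORTED_TYPES = Arrays.asList(
            "LABEL_DETECTION", "LANDMARK_DETECTION", "LOGO_DETECTION",
            "SAFE_SEARCH_DETECTION", "IMAGE_PROPERTIES");

    static String toJson(String type, int maxResults) {
        if (!SUPPORTED_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unsupported feature type: " + type);
        }
        if (maxResults < 1) {
            throw new IllegalArgumentException("maxResults must be at least 1");
        }
        return "{\"type\":\"" + type + "\",\"maxResults\":" + maxResults + "}";
    }
}
```

Rejecting unknown type strings early is a design choice of the sketch: a typo such as "LABEL_DETECT" then fails locally instead of producing a confusing API response.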

Step 5: Create a Request

Create an AnnotateImageRequest, which asks the API to perform the requested features on the user-provided image.

final AnnotateImageRequest annotateImageReq = new AnnotateImageRequest()
        .setImage(getImageEncodeImage(bitmap)).setFeatures(Collections.singletonList(feature));
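An AnnotateImageRequest pairs one image with its features, and the batch request sent to the API wraps a list of them in a top-level requests array. The plain-Java sketch below assembles that JSON body for a single image; the class and method names are hypothetical, and it assumes the image and feature objects are already serialized (for example as {"content":"..."} and {"type":"LABEL_DETECTION","maxResults":10}).

```java
import java.util.List;

// Hypothetical sketch: builds the JSON body of an images:annotate call
// for one image. imageJson and featureJsons must already be valid JSON objects.
class AnnotateRequestSketch {

    static String buildBody(String imageJson, List<String> featureJsons) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"requests\":[{\"image\":").append(imageJson);
        sb.append(",\"features\":[");
        for (int i = 0; i < featureJsons.size(); i++) {
            if (i > 0) {
                sb.append(','); // separate multiple features with commas
            }
            sb.append(featureJsons.get(i));
        }
        sb.append("]}]}");
        return sb.toString();
    }
}
```

Because features is an array, a single request can ask for several kinds of detection (say, labels and logos) on the same image.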

Step 6: Process the Image

Now you can call the Vision API. Because network calls must not run on the UI thread, the request runs inside an AsyncTask. Start by creating an HttpTransport, a JsonFactory, and a VisionRequestInitializer that contains your API key:

new AsyncTask<Object, Void, String>() {
    @Override
    protected String doInBackground(Object... params) {
        try {
            HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
            JsonFactory jsonFactory = GsonFactory.getDefaultInstance();
            // Attach the API key to every request
            VisionRequestInitializer requestInitializer = new VisionRequestInitializer(CLOUD_VISION_API_KEY);
            Vision.Builder builder = new Vision.Builder(httpTransport, jsonFactory, null);
            builder.setVisionRequestInitializer(requestInitializer);
            Vision vision = builder.build();
            // Wrap the AnnotateImageRequest from Step 5 in a batch request
            BatchAnnotateImagesRequest batchAnnotateImagesRequest = new BatchAnnotateImagesRequest();
            batchAnnotateImagesRequest.setRequests(Collections.singletonList(annotateImageReq));
            Vision.Images.Annotate annotateRequest = vision.images().annotate(batchAnnotateImagesRequest);
            BatchAnnotateImagesResponse response = annotateRequest.execute();
            return convertResponseToString(response);
        } catch (GoogleJsonResponseException e) {
            Log.d(TAG, "failed to make API request because " + e.getContent());
        } catch (IOException e) {
            Log.d(TAG, "failed to make API request because of other IOException " + e.getMessage());
        }
        return "Cloud Vision API request failed. Check logs for details.";
    }

    @Override
    protected void onPostExecute(String result) {
        // Runs on the UI thread; resultTextView is a hypothetical TextView
        resultTextView.setText(result);
    }
}.execute();
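The introduction mentions that the Vision API can also be called over plain REST instead of through the client library. As a hedged sketch of that route, the class below POSTs a JSON body (a requests array of image/features pairs) to the v1 images:annotate endpoint, passing the API key as the key query parameter, using only java.net.HttpURLConnection. The class and method names are hypothetical, error handling is minimal, and, like the AsyncTask version, annotate() blocks and must run off the UI thread on Android.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: calls the Vision REST API directly, without the client library.
class VisionRestSketch {

    static final String ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";

    // Request URL with the API key passed as the "key" query parameter.
    static String endpointUrl(String apiKey) {
        try {
            return ENDPOINT + "?key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // POSTs the JSON body and returns the raw JSON response. Blocking call.
    static String annotate(String apiKey, String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpointUrl(apiKey)).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            // Error responses (4xx/5xx) are delivered on the error stream.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                throw new IOException("HTTP " + status + " with no response body");
            }
            try (InputStream body = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = body.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                if (status >= 400) {
                    throw new IOException("HTTP " + status + ": " + text);
                }
                return text;
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The client library remains the more convenient choice, since it also parses the response into typed objects; this route trades that convenience for having no extra dependencies.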

1. Label Detection

The Vision API can detect and extract information about entities within an image. Labels can identify objects, locations, activities, animal species, products, and more.

Vision API Label Detection

AnnotateImageResponse imageResponses = response.getResponses().get(0);
List<EntityAnnotation> entityAnnotations = imageResponses.getLabelAnnotations();
String message = "";
if (entityAnnotations != null) {
    for (EntityAnnotation entity : entityAnnotations) {
        // One line per label: description and confidence score (0 to 1)
        message += "    " + entity.getDescription() + " " + entity.getScore() + "\n";
    }
} else {
    message = "Nothing Found";
}
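The loop above reports every label the API returns. If the app should only show confident matches, one option is to filter on the score first. The plain-Java sketch below uses a hypothetical Label class as a stand-in for EntityAnnotation (which exposes getDescription() and getScore()) and a minimum score you would tune yourself; it keeps the tutorial's "Nothing Found" fallback.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for EntityAnnotation: a description and a score in [0, 1].
class Label {
    final String description;
    final float score;

    Label(String description, float score) {
        this.description = description;
        this.score = score;
    }
}

class LabelFormatter {

    // Keeps labels whose score is at least minScore, one "description score" pair per line.
    static String format(List<Label> labels, float minScore) {
        StringBuilder message = new StringBuilder();
        if (labels != null) {
            for (Label label : labels) {
                if (label.score >= minScore) {
                    message.append(label.description)
                           .append(' ')
                           .append(String.format(Locale.US, "%.2f", label.score))
                           .append('\n');
                }
            }
        }
        return message.length() == 0 ? "Nothing Found" : message.toString();
    }
}
```

For example, with labels ("cat", 0.97) and ("pet", 0.40) and a minimum score of 0.5, format returns "cat 0.97" followed by a newline.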

2. Landmark Detection

Landmark requests detect well-known natural and human-made landmarks and return identifying information such as an entity ID, the landmark’s name and location, and the bounding box that surrounds the landmark in the image.

Vision API Landmark Detection

entityAnnotations = imageResponses.getLandmarkAnnotations();
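Landmark results also carry the bounding polygon mentioned above (in the client library, EntityAnnotation.getBoundingPoly() returns a BoundingPoly whose getVertices() gives x/y pixel coordinates). To highlight a landmark, you usually want the axis-aligned rectangle enclosing that polygon. A plain-Java sketch follows; the {x, y} int-array input and the class name are assumptions made to keep it independent of the library.

```java
// Hypothetical helper: computes the axis-aligned rectangle enclosing a
// bounding polygon given as {x, y} vertex pairs in pixel coordinates.
class BoundingBoxSketch {

    // Returns {left, top, right, bottom}.
    static int[] enclosingRect(int[][] vertices) {
        if (vertices == null || vertices.length == 0) {
            throw new IllegalArgumentException("polygon has no vertices");
        }
        int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
        int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
        for (int[] v : vertices) {
            left = Math.min(left, v[0]);
            right = Math.max(right, v[0]);
            top = Math.min(top, v[1]);
            bottom = Math.max(bottom, v[1]);
        }
        return new int[] {left, top, right, bottom};
    }
}
```

The four values are in the same order as the android.graphics.Rect(left, top, right, bottom) constructor, so they can be passed straight to it for drawing. Note that the coordinates refer to the image that was uploaded, here the camera thumbnail.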

3. Logo Detection

Logo detection requests detect popular product and corporate logos within an image.

entityAnnotations = imageResponses.getLogoAnnotations();
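Logo results come back as the same EntityAnnotation objects used for labels, each with a description and a score. If the app should report only the single most likely logo, a small plain-Java sketch like the one below picks the highest-scoring entry; the nested Logo class is a hypothetical stand-in for EntityAnnotation.

```java
import java.util.List;

class LogoPicker {

    // Hypothetical stand-in for EntityAnnotation.
    static final class Logo {
        final String description;
        final float score;

        Logo(String description, float score) {
            this.description = description;
            this.score = score;
        }
    }

    // Returns the description of the highest-scoring logo, or null if there is none.
    static String bestLogo(List<Logo> logos) {
        if (logos == null || logos.isEmpty()) {
            return null;
        }
        Logo best = logos.get(0);
        for (Logo logo : logos) {
            if (logo.score > best.score) {
                best = logo;
            }
        }
        return best.description;
    }
}
```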

4. Safe Search Detection

Safe Search requests examine an image for potentially unsafe or undesirable content. The likelihood of such imagery is returned for four categories:

  • adult indicates content generally suited only for adults (18+), such as nudity, sexual activity, and pornography (including cartoons or anime).
  • spoof indicates content that has been modified from the original to make it funny or offensive.
  • medical indicates content such as surgeries or MRIs.
  • violence indicates violent content, including but not limited to the presence of blood, war images, weapons, injuries, or car crashes.

Vision API Safe Search

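SafeSearchAnnotation annotation = imageResponses.getSafeSearchAnnotation();
// Each getter returns a likelihood string such as "VERY_UNLIKELY" or "LIKELY"
message = String.format("adult: %s\nmedical: %s\nspoofed: %s\nviolence: %s\n",
        annotation.getAdult(), annotation.getMedical(), annotation.getSpoof(), annotation.getViolence());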
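Each category comes back as a likelihood string rather than a number. The Vision API's likelihood values are UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY and VERY_LIKELY, in increasing order. The plain-Java sketch below ranks these strings and flags an image when any category reaches a chosen threshold; the class name and the idea of a single threshold are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for interpreting Safe Search likelihood strings.
class SafeSearchSketch {

    // Likelihood values in increasing order of probability.
    static final List<String> ORDER = Arrays.asList(
            "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY");

    // Rank of a likelihood string; null or unrecognized values count as UNKNOWN (0).
    static int rank(String likelihood) {
        int i = ORDER.indexOf(likelihood);
        return i < 0 ? 0 : i;
    }

    // True if any category is at least as likely as the threshold.
    static boolean isFlagged(String threshold, String... categories) {
        int limit = rank(threshold);
        for (String c : categories) {
            if (rank(c) >= limit) {
                return true;
            }
        }
        return false;
    }
}
```

In the app, you could pass the SafeSearchAnnotation getters getAdult(), getMedical(), getSpoof() and getViolence() as the categories, with "LIKELY" as the threshold.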

5. Image Properties

An image properties request returns the dominant colors in the image as RGB values, together with the fraction of the total pixel count that each color accounts for.

Vision API Image Property

ImageProperties imageProperties = imageResponses.getImagePropertiesAnnotation();
message = getImageProperty(imageProperties);

private String getImageProperty(ImageProperties imageProperties) {
    String message = "";
    DominantColorsAnnotation colors = imageProperties.getDominantColors();
    for (ColorInfo color : colors.getColors()) {
        // Pixel fraction, then the red, green and blue components, one color per line
        message += color.getPixelFraction() + " - " + color.getColor().getRed() + " - " + color.getColor().getGreen() + " - " + color.getColor().getBlue() + "\n";
    }
    return message;
}
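The raw output above is hard to read. A plain-Java sketch (hypothetical class name) can turn each dominant color into a #RRGGBB hex string and the pixel fraction into a percentage. It assumes the red, green and blue components are on a 0 to 255 scale, which is how the Vision API reports dominant colors in its sample responses, and clamps out-of-range values.

```java
import java.util.Locale;

// Hypothetical helper for presenting dominant-color results.
class ColorSketch {

    // Clamps a component to 0..255 and rounds it to the nearest integer.
    static int toByte(float component) {
        return Math.round(Math.max(0f, Math.min(255f, component)));
    }

    // Assumes red/green/blue are on a 0..255 scale.
    static String toHex(float red, float green, float blue) {
        return String.format(Locale.US, "#%02X%02X%02X", toByte(red), toByte(green), toByte(blue));
    }

    // Pixel fraction in [0, 1] shown as a percentage with one decimal place.
    static String toPercent(float pixelFraction) {
        return String.format(Locale.US, "%.1f%%", pixelFraction * 100f);
    }
}
```

For example, toHex(255, 128, 0) returns "#FF8000" and toPercent(0.25f) returns "25.0%". In the loop above, the arguments would be color.getColor().getRed(), getGreen() and getBlue(), plus color.getPixelFraction().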


In this tutorial, you learned how to use the Cloud Vision API, which is part of the Google Cloud Machine Learning platform, in an Android app. The platform offers many more such APIs. You can learn more about them by referring to the official documentation.

Download this project from GitHub
