TensorFlow Lite

What is TensorFlow?


If you want to implement machine learning or AI-powered applications that run on mobile phones, the easiest and fastest way is often to use TensorFlow, the open source library for machine learning. TensorFlow is Google's standard framework for building new ML and AI based products; it was created by the Google Brain team and open-sourced by Google in 2015. TensorFlow is scalable and portable, so you can get started by downloading TensorFlow on your laptop and trying out some sample code, then move your models to production-level use cases using GPUs. After training, the model, typically tens of megabytes of data, can be ported to mobile and embedded systems.

Neural Network for Mobile


If you want to bring TensorFlow into your mobile applications, there are some challenges you have to face. A neural network is big compared with classic machine learning models because deep learning uses multiple layers, so the total number of parameters and the amount of calculation can be large. For example, InceptionV3, one of the popular image classification models, requires about 91 MB. If you use TensorFlow without any changes, the binary code by itself consumes about 12 MB. If you want to bring your mobile application to production, you don't want users downloading 100 MB before they can start using it; you want to compress everything into roughly 10-20 MB. So Google had to think about optimizations for mobile applications: things like freezing the graph, quantization, memory mapping, and selective registration.

Freeze Graph

Freezing the graph means that you remove all the variables from the TensorFlow graph and convert them into constants. TensorFlow holds the weights and biases, the parameters inside the neural network, as variables because you need to update them while training on your training data. Once training is finished, you no longer need those parameters to be variables, so you can put everything into constants. By converting from variables to constants you get a much faster loading time.
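As a rough sketch, freezing a TF 1.x-style graph can be done with the graph_util utilities; the checkpoint files and the output node name below are hypothetical placeholders.

```python
# Sketch: freeze a TF 1.x-style graph by converting its variables to constants.
# "model.ckpt*" and the output node name "output" are hypothetical.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Session() as sess:
    # Restore the trained variables from the checkpoint.
    saver = tf.train.import_meta_graph("model.ckpt.meta")
    saver.restore(sess, "model.ckpt")

    # Replace every variable in the graph with a constant holding its value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"]
    )

    # Serialize the frozen graph to a single .pb file.
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```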

Quantization in TensorFlow

Quantization is another optimization you can apply for mobile apps. Quantization means compressing the precision of each parameter, the weights and biases, into fewer bits. For example, by default TensorFlow uses 32-bit floating point numbers to represent all weights and biases, but with quantization you can compress them into 8-bit integers. Using 8-bit integers shrinks the size of the parameters dramatically, which matters especially for embedded and mobile systems. It is also important to use integer numbers rather than floating point numbers for the calculations themselves, such as the multiplications and additions between matrices and vectors, because floating point hardware requires a much larger footprint to implement. TensorFlow already provides primitive data types to support quantization of parameters and operations: ops for quantizing and de-quantizing, and ops that work directly on quantized values.
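As one concrete illustration, here is a sketch using the newer TFLiteConverter API rather than the original quantization tooling described above; the SavedModel path is a hypothetical placeholder.

```python
# Sketch: post-training weight quantization with the TFLite converter.
# "saved_model_dir" is a hypothetical path to an exported SavedModel.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Ask the converter to quantize weights (e.g. 32-bit float -> 8-bit int) where it can.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_tflite_model)
```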

What is TensorFlow Lite?


We know that machine learning adds great power to your mobile application, and with great power comes great responsibility. TensorFlow Lite is a lightweight ML library for mobile and embedded devices. TensorFlow works well on large devices, while TensorFlow Lite works really well on small devices: it is easier, faster, and smaller to work with on mobile devices.

What is the difference between TensorFlow Mobile and TensorFlow Lite?

You should view TensorFlow Lite as an evolution of TensorFlow Mobile; it is like the next generation. It was created to be really small in size and optimized for smaller devices.

TensorFlow Lite came with three goals. First, it wants to have a very small memory and binary size, even without selective registration. Second, it wants to keep the overhead latency really small; you can't wait 30 seconds for an inference to happen by the time the model is downloaded and processed. Third, quantization is a first-class citizen: it supports quantization, and many of the supported models are quantized models.

TensorFlow Lite Architecture
[Figure: TensorFlow Lite architecture]

This is the high-level architecture. As you can see, it is a simplified architecture, and it works for both Android and iOS. It is lightweight, performs better, and leverages hardware acceleration if available.

To better understand how to use a model, let's consider how to build one with TensorFlow Lite. There are two aspects, the workstation side and the mobile side, so let's walk through the complete lifecycle.
[Figure: TensorFlow Lite lifecycle]

The first step is to decide what model you want to use. If you want to use an already pre-trained model, you can skip this step because the model generation is already done. One option is to use a pre-trained model; another is to retrain just the last layers, as in the earlier post. You can also write your own custom model, train it, and generate a graph. Nothing here is specific to TensorFlow Lite; this is standard TensorFlow, where you build a model and generate graph defs and checkpoints.
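As a rough illustration of this workstation step, a small Keras classifier might be built and exported like this; the architecture is a placeholder, train_images/train_labels stand in for your own data, and the exact export call can vary with your TensorFlow/Keras version.

```python
# Sketch: build and train a small model on the workstation, then export it.
# The layer sizes are placeholders; train_images/train_labels are assumed to
# be your own training data.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_images, train_labels, epochs=5)

# Export as a SavedModel so it can be converted for TensorFlow Lite later
# (the exact export API may differ between TF/Keras versions).
tf.saved_model.save(model, "saved_model_dir")
```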

The next step, which is specific to TensorFlow Lite, is to convert the generated model into a format that TensorFlow Lite understands. A prerequisite to converting is freezing the graph: the checkpoints hold the trained weights and the graphdef holds the graph structure, its variables and tensors, and freezing the graph combines these two results so they can be fed to the converter, which is provided as part of the TensorFlow Lite software. You can use it to convert your model into the format that is needed. Once the conversion step is completed, you will have what is called a .lite (or .tflite) binary file.
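Here is a rough sketch of the conversion step using the TF 1.x converter API to match the frozen-graph workflow above; the file name and tensor names are hypothetical, and newer TensorFlow versions typically convert from a SavedModel or Keras model instead.

```python
# Sketch: convert a frozen graph into a TensorFlow Lite flatbuffer.
# "frozen_model.pb" and the tensor names "input"/"output" are hypothetical.
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_model.pb",
    input_arrays=["input"],
    output_arrays=["output"],
)
tflite_model = converter.convert()

# This binary is what ships with the mobile app.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```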

Now you have a model you can move to the mobile side. You feed this TensorFlow Lite model into the interpreter, and the interpreter executes the model using a set of operators. It supports selective operator loading: without the operators it is only about 70 KB, and with all the operators it is about 300 KB, so you can see how small the binary size is. This is a significant reduction from TensorFlow, which is over 1 MB at this point. You can also implement custom kernels using the API. If the interpreter is running on a CPU, the model is executed directly on the CPU; if hardware acceleration is available, it can be executed on the accelerated hardware instead.
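To make the mobile-side flow concrete, here is a sketch using the interpreter's Python API; the Java, Kotlin, and Swift bindings on Android and iOS follow the same load, fill, invoke, and read pattern. The model path is hypothetical.

```python
# Sketch: run inference with the TensorFlow Lite interpreter.
# "model.tflite" is a hypothetical converted model.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)

interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print(output)
```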

Components of TensorFlow Lite

[Figure: TensorFlow Lite components]

The main components of TensorFlow Lite are the model file format, the interpreter for processing the graph, a set of kernels that the interpreter can invoke, and lastly an interface to the hardware acceleration layer. TensorFlow Lite has a special model file format that is lightweight and has very few dependencies. Most graph calculations are done using 32-bit floats, but neural networks are trained to be robust to noise, and this allows us to explore lower-precision numerics. The advantages of lower precision are lower memory use and faster computation, which is vital for mobile and embedded devices, but using lower precision can result in some loss of accuracy. Depending on the application you want to develop, you can compensate for this by accounting for quantization loss during training and get better accuracy. Quantization is supported as a first-class citizen in TensorFlow Lite. TensorFlow Lite is also built on FlatBuffers, which gives it speed of execution.

1. FlatBuffers

FlatBuffers is an open source Google project comparable to protocol buffers, but much faster to use and much more memory efficient. In the past, when we developed applications, we always thought about optimizing for CPU instructions, but CPUs have moved far ahead, and writing something memory-efficient is more important today. FlatBuffers is a cross-platform serialization library similar to protobufs, but designed to be more efficient: you can access the data without unpacking it, and there is no need for a secondary representation before you access it. It is aimed at speed and efficiency, and it is strongly typed so you can find errors at compile time.

2. Interpreter

The interpreter is engineered to work with low overhead on very small devices. TensorFlow Lite has very few dependencies, and it is easy to build on simple devices. TensorFlow Lite keeps the binary size to about 70 KB without operators and about 300 KB with them.

It uses FlatBuffers, so it can load really fast, but that speed comes at the cost of flexibility: TensorFlow Lite supports only a subset of the operators that TensorFlow has. So if you are building a mobile application and its operators are supported by TensorFlow Lite, the recommendation is to use TensorFlow Lite; if your application needs operators that TensorFlow Lite does not support yet, you should use TensorFlow Mobile. Going forward, though, developers are going to be using TensorFlow Lite as the main standard.

3. Ops/Kernels

TensorFlow Lite supports the operators used in common inference models. The set of operators is smaller, so not every model will be supported. In particular, TensorFlow Lite provides a set of core built-in ops that have been optimized for ARM CPUs using NEON, and they work in both float and quantized form. These ops have been used in Google apps, so they are battle-tested, and Google has hand-optimized them for many common patterns, fusing many operations to reduce memory bandwidth. If there are ops that are unsupported, TensorFlow Lite also provides a C API so you can write your own custom operators.
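Besides writing custom kernels through the C API, current TensorFlow releases also let the converter fall back to a selected subset of full TensorFlow ops; the sketch below shows that option, with a hypothetical SavedModel path.

```python
# Sketch: allow fallback to select TensorFlow ops when a model uses operators
# that the built-in TFLite op set does not cover.
# "saved_model_dir" is a hypothetical path.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # core built-in TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to selected TensorFlow ops
]
tflite_model = converter.convert()

with open("model_with_select_ops.tflite", "wb") as f:
    f.write(tflite_model)
```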

4. Interface to Hardware Acceleration

This component targets custom hardware through the Neural Networks API. TensorFlow Lite comes pre-loaded with hooks for the Neural Networks API: if you have an Android release that supports the NN API, TensorFlow Lite will delegate these operators to the NN API, and if you have an Android release that does not support the NN API, everything is executed directly on the CPU.

Android Neural Networks API


The Android Neural Networks API is supported on Android starting with the 8.1 release of Oreo. It will support the various hardware accelerators you can get from vendors, for GPUs, DSPs, and CPUs. It uses TensorFlow as a core technology, so for now you can keep using TensorFlow to write your mobile app, and your app will get the benefits of hardware acceleration through the NN API. It basically abstracts the hardware layer for ML inference; for example, if a device has an ML DSP, it can transparently map onto it, and it uses NN primitives that are very similar to TensorFlow Lite's.

[Figure: Android Neural Networks API architecture]

The architecture for the Neural Networks API looks like this: essentially there is an Android app on top. Typically there is no need for the Android app to access the Neural Networks API directly; it accesses it through the machine learning interface, which is the TensorFlow Lite interpreter and the NN runtime. The neural network runtime can talk to the hardware abstraction layer, which in turn talks to the devices and runs the various accelerators.

 


 
