If you want to bring TensorFlow into your mobile applications, there are some challenges you have to face. A neural network is big compared with classic machine learning models, because deep learning stacks multiple layers. The total number of parameters and the amount of computation are very large.
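To put a rough number on "very large", the sketch below counts the weights and biases of a fully connected network and the memory they need as 32-bit floats. The layer sizes are made up for illustration. A single-layer classifier (essentially logistic regression) on the same 784-value input is included for contrast.

```python
def dense_param_count(layer_sizes):
    """Count weights + biases in a fully connected network.

    layer_sizes lists the width of every layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # one bias per output unit
    return total


def float32_megabytes(n_params):
    # Each float32 parameter occupies 4 bytes.
    return n_params * 4 / 1_000_000


# Hypothetical deep network: 784 inputs, three hidden layers of 1024, 10 classes.
deep = [784, 1024, 1024, 1024, 10]
# A classic linear classifier on the same input, for comparison.
linear = [784, 10]

n_deep = dense_param_count(deep)
print(n_deep, round(float32_megabytes(n_deep), 2))  # 2913290 11.65
print(dense_param_count(linear))                    # 7850
```

Even this small multilayer network has roughly 370 times as many parameters as the linear model, and every one of them must be stored and multiplied on each inference.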
You can remove all the variables from the TensorFlow graph and convert them into constants. Once you have finished training, you don't need to keep those parameters in variables; you can put everything into constants. Converting variables to constants gives you a much faster loading time.
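The idea of converting variables to constants can be sketched with a toy graph. This is a hypothetical, dict-based stand-in for a TensorFlow graph, not TensorFlow's own data structures (in TensorFlow 1.x the real tools were the `freeze_graph` script and `tf.graph_util.convert_variables_to_constants`). The `freeze` helper swaps every `Variable` node for a `Const` node holding the checkpoint value, so the frozen graph can be evaluated without a separate restore step.

```python
# Hypothetical toy graph for y = x * w + b, where w and b are trainable variables.
GRAPH = {
    "x": {"op": "Placeholder"},
    "w": {"op": "Variable"},
    "b": {"op": "Variable"},
    "xw": {"op": "Mul", "inputs": ["x", "w"]},
    "y": {"op": "Add", "inputs": ["xw", "b"]},
}

# Made-up values that a training run would have written to a checkpoint.
CHECKPOINT = {"w": 0.5, "b": 2.0}


def freeze(graph, checkpoint):
    """Return a copy of the graph with every Variable replaced by a Const."""
    frozen = {}
    for name, node in graph.items():
        if node["op"] == "Variable":
            frozen[name] = {"op": "Const", "value": checkpoint[name]}
        else:
            frozen[name] = dict(node)
    return frozen


def evaluate(graph, name, feeds):
    """Recursively evaluate node `name`; `feeds` supplies Placeholder values."""
    node = graph[name]
    op = node["op"]
    if op == "Placeholder":
        return feeds[name]
    if op == "Const":
        return node["value"]
    if op == "Variable":
        # An unfrozen graph needs its variables restored from a checkpoint first.
        raise RuntimeError(f"variable {name!r} has not been restored")
    left, right = (evaluate(graph, i, feeds) for i in node["inputs"])
    return left * right if op == "Mul" else left + right


frozen = freeze(GRAPH, CHECKPOINT)
print(evaluate(frozen, "y", {"x": 4.0}))  # 4.0 * 0.5 + 2.0 = 4.0
```

Evaluating the original `GRAPH` raises an error because its variables have no values; the frozen copy carries everything it needs in a single structure.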
Quantization in TensorFlow
Quantization is another optimization you can apply for a mobile app. Quantization means compressing the precision of each parameter (the weights and biases) into fewer bits. For example, TensorFlow uses 32-bit floating point numbers to represent weights and biases, but with quantization you can compress each of them into an 8-bit integer.
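A minimal sketch of 8-bit affine quantization, assuming the common scale and zero-point scheme where each real value is approximated as `scale * (q - zero_point)` and `q` lies in 0..255. The helper names and the weight values are made up; TensorFlow's own quantization tooling differs in its details.

```python
def choose_qparams(values):
    """Pick a scale and zero point that map [lo, hi] onto the 8-bit range 0..255."""
    qmin, qmax = 0, 255
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point):
    # real ~= scale * (q - zero_point), so q = round(real / scale) + zero_point
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(qs, scale, zero_point):
    return [scale * (q - zero_point) for q in qs]


weights = [-1.0, -0.25, 0.0, 0.4, 1.55]  # made-up float32 weights
scale, zp = choose_qparams(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
print(q)
print(restored)
```

Each weight now fits in 1 byte instead of 4, and the rounding error per weight is at most `scale / 2`. Keeping zero exactly representable matters because padding and ReLU outputs produce many exact zeros.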
TensorFlow Lite is a lightweight ML library for mobile and embedded devices. TensorFlow works well on large devices, and TensorFlow Lite works really well on small devices: it is easier, faster, and smaller to run on mobile devices.
How to build a model using TensorFlow Lite
There are two sides to this: the workstation side and the mobile side. Let's walk through the complete lifecycle.
The first step is to decide which model you want to use. One option is to use a pre-trained model; another is to retrain just the last layers, as in the earlier post. You can also write a custom model, train it, and generate a graph. None of this is specific to TensorFlow Lite: it is the same as standard TensorFlow, where you build a model and generate GraphDefs and checkpoints.
The next step, which is specific to TensorFlow Lite, is to convert the generated model into a format that TensorFlow Lite understands. A prerequisite for converting it is to freeze the graph.
Freezing the graph is the step where you combine these two results, the graph definition and the checkpoints, and feed them to the converter. The converter is provided as part of the TensorFlow Lite software, and you use it to convert your model into the format that you need. Once the conversion step is completed, you have what is called a .lite binary file (newer TensorFlow releases use the .tflite extension).
Move the model to the mobile side
You feed this TensorFlow Lite model into the interpreter. The interpreter executes the model using a set of operators. If the interpreter is running on a CPU, the operators are executed directly on the CPU; if hardware acceleration is available, they can be executed on the accelerator instead.
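To make "the interpreter executes the model using a set of operators" concrete, here is a toy interpreter. It is a hypothetical sketch, not TensorFlow Lite's implementation: the model is a list of `(op_name, inputs, output)` entries, and each op name is looked up in a registry of CPU kernels. The method names loosely echo `tf.lite.Interpreter` (`set_tensor`, `invoke`, `get_tensor`), but the real class addresses tensors by integer index and needs `allocate_tensors()` before running.

```python
def fully_connected(x, weights, bias):
    # weights holds one row per output unit
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


# Hypothetical op registry: op name -> CPU kernel.
CPU_KERNELS = {"FULLY_CONNECTED": fully_connected, "RELU": relu}


class ToyInterpreter:
    def __init__(self, ops, constants, kernels=None):
        self.ops = ops                  # [(op_name, [input names], output name), ...]
        self.tensors = dict(constants)  # frozen weights and biases from the model file
        self.kernels = CPU_KERNELS if kernels is None else kernels

    def set_tensor(self, name, value):
        self.tensors[name] = value

    def invoke(self):
        # Run the ops in order, dispatching each one to its registered kernel.
        for op_name, inputs, output in self.ops:
            kernel = self.kernels.get(op_name)
            if kernel is None:
                raise NotImplementedError(f"no kernel for op {op_name}")
            self.tensors[output] = kernel(*(self.tensors[i] for i in inputs))

    def get_tensor(self, name):
        return self.tensors[name]


model_ops = [
    ("FULLY_CONNECTED", ["input", "W", "b"], "hidden"),
    ("RELU", ["hidden"], "output"),
]
interp = ToyInterpreter(model_ops, {"W": [[1.0, 1.0], [2.0, -1.0]], "b": [0.5, 0.0]})
interp.set_tensor("input", [1.0, -2.0])
interp.invoke()
print(interp.get_tensor("output"))  # [0.0, 4.0]
```

An op with no registered kernel fails at `invoke()`, which is the toy version of a model using an operator the runtime does not support.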
Components of TensorFlow Lite
The main components of TensorFlow Lite are the model file format, the interpreter that processes the graph, a set of kernels that the interpreter invokes, and an interface to the hardware acceleration layer.
1. Model File Format
TensorFlow Lite has a special model file format. It is lightweight, has very few dependencies, and most graph calculations are done using 32-bit floats.
2. Interpreter
The interpreter is engineered to work with low overhead on small devices. TensorFlow Lite has very few dependencies and is easy to build on simple devices. TensorFlow Lite keeps the binary size to about 70 KB, or about 300 KB with the operators included.
The model format uses FlatBuffers, so models load really fast, but that speed comes at the cost of flexibility: TensorFlow Lite supports only a subset of the operators that TensorFlow has.
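Why FlatBuffers loads fast can be illustrated with a simplified stand-in. The buffer layout below is invented for this sketch and is not the FlatBuffers wire format or the TensorFlow Lite schema; it only shows the shared idea that fields are read in place through offsets, so "loading" a model needs no parsing or unpacking step. A real runtime can go further and memory-map the file.

```python
import struct


def pack_model(tensors):
    """Hypothetical layout (little-endian): uint32 count, then one
    (uint32 offset, uint32 length) pair per tensor, then raw float32 data."""
    header = bytearray(struct.pack("<I", len(tensors)))
    body = bytearray()
    offset = 4 + 8 * len(tensors)  # data starts right after the header
    for values in tensors:
        chunk = struct.pack(f"<{len(values)}f", *values)
        header += struct.pack("<II", offset, len(values))
        body += chunk
        offset += len(chunk)
    return bytes(header + body)


class LazyModel:
    """Reads tensors straight out of the buffer; nothing is decoded up front."""

    def __init__(self, buf):
        self.buf = memoryview(buf)  # a view, not a copy, of the bytes

    def num_tensors(self):
        return struct.unpack_from("<I", self.buf, 0)[0]

    def tensor(self, i):
        # Jump to the i-th index entry, then read only that tensor's data.
        offset, length = struct.unpack_from("<II", self.buf, 4 + 8 * i)
        return struct.unpack_from(f"<{length}f", self.buf, offset)


buf = pack_model([[1.0, 2.0], [0.5, -3.0, 4.0]])
model = LazyModel(buf)
print(model.num_tensors(), model.tensor(1))  # 2 (0.5, -3.0, 4.0)
```

The cost side of the trade-off is also visible here: the reader only understands the fixed layout it was written for, much as the runtime only understands the operators it ships with.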
3. Ops and Kernels
Because the set of operators is smaller, not every model is supported. In particular, TensorFlow Lite provides a set of core built-in ops that have been optimized for ARM CPUs using NEON, and they work in both float and quantized form.
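What "working in quantized form" means for a kernel can be sketched with a dot product, the core of fully connected and convolution ops. This is a simplified illustration assuming the scale and zero-point representation `real = scale * (q - zero_point)`. Products are accumulated entirely in integers and rescaled once at the end. Real kernels accumulate in 32-bit integers and typically replace the final float multiply with a fixed-point multiplier; Python integers here are unbounded.

```python
def quantized_dot(qa, za, sa, qb, zb, sb):
    """Dot product of two quantized vectors using integer-only accumulation."""
    acc = sum((a - za) * (b - zb) for a, b in zip(qa, qb))  # integers only
    return sa * sb * acc  # a single rescale back to a real value


def float_dot(qa, za, sa, qb, zb, sb):
    # Reference path: dequantize every element, then multiply in floating point.
    return sum((sa * (a - za)) * (sb * (b - zb)) for a, b in zip(qa, qb))


# Made-up quantized inputs: a encodes [-5, 0, 5], b encodes [0.25, 0.5, 0.75].
qa, za, sa = [10, 20, 30], 20, 0.5
qb, zb, sb = [1, 2, 3], 0, 0.25
print(quantized_dot(qa, za, sa, qb, zb, sb))  # 2.5
print(float_dot(qa, za, sa, qb, zb, sb))      # 2.5
```

The integer path does one floating point multiply per output instead of one per element, which is what makes 8-bit kernels cheap on mobile CPUs.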
4. Interface to Hardware Acceleration
This interface targets custom hardware through the Neural Network API (NN API): TensorFlow Lite comes pre-loaded with hooks for the NN API. If your device supports the NN API, TensorFlow Lite delegates the supported operators to it; if your device does not support the NN API, the model is executed directly on the CPU.
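The delegation decision can be sketched as a partitioning step. The function and the set of accelerator-supported ops below are hypothetical; roughly speaking, a TensorFlow Lite delegate claims the nodes it can run and the remaining nodes stay on the built-in CPU kernels. The sketch walks a linear sequence of ops and groups consecutive ops by where they would run. A device without NN API support is modeled as an empty supported set, so everything falls back to the CPU.

```python
def partition(ops, accelerator_ops):
    """Split a linear op sequence into runs assigned to 'NNAPI' or 'CPU'."""
    segments = []
    for op in ops:
        target = "NNAPI" if op in accelerator_ops else "CPU"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)  # extend the current run
        else:
            segments.append((target, [op]))  # start a new run on a new target
    return segments


model = ["CONV_2D", "RELU", "MY_CUSTOM_OP", "FULLY_CONNECTED", "SOFTMAX"]
nnapi_ops = {"CONV_2D", "RELU", "FULLY_CONNECTED", "SOFTMAX"}  # hypothetical support set

print(partition(model, nnapi_ops))
# [('NNAPI', ['CONV_2D', 'RELU']), ('CPU', ['MY_CUSTOM_OP']),
#  ('NNAPI', ['FULLY_CONNECTED', 'SOFTMAX'])]
print(partition(model, set()))  # no NN API on the device: one CPU segment
```

Grouping consecutive ops matters because every switch between the accelerator and the CPU costs a data transfer, so fewer, longer segments are preferable.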
Android Neural Network API
The Android Neural Network API (NN API) is available on Android 8.1 (Oreo) and later. It supports various kinds of hardware acceleration and is designed as a base layer for higher-level frameworks such as TensorFlow Lite.
You can use TensorFlow Lite to write your mobile app, and your app will get the benefits of hardware acceleration through the NN API. The NN API abstracts the hardware layer for ML inference: for example, if a device has an ML DSP, the NN API can transparently map operations to it. It uses NN primitives that are very similar to those of TensorFlow Lite.
The architecture of the Neural Network API looks like this. At the top is an Android app. Typically the app does not access the Neural Network API directly; it goes through a machine learning framework such as the TensorFlow Lite interpreter, which calls the NN runtime. The NN runtime talks to the hardware abstraction layer, which in turn talks to the device drivers to run the various accelerators.