Machine learning is now used in many areas: image classification, consumer demand forecasting, personalized film and music recommendations, and data clustering. At the same time, for fairly large models, computing a result (and, to a much greater degree, training the model) can be resource-intensive.
To make trained models usable on devices other than the most powerful ones, Google introduced the TensorFlow Lite framework. To work with it, you train a model built with the TensorFlow framework (not Lite!) and then convert it to the TensorFlow Lite format. After that, the model can easily be used on embedded or mobile devices.
This article will describe all the steps for running a model on Android.
Training and Converting the Model
As an example, take one of the standard MobileNet models. Training will be performed on the ILSVRC-2012-CLS image dataset.
For training, we will download the collection of models onto a reasonably powerful Linux machine:
git clone
Install the Bazel build system according to the instructions on its website.
Launch training of the model:
bazel build -c opt mobilenet_v1_{eval,train}
./bazel-bin/mobilenet_v1_train --dataset_dir <path to the image dataset> --checkpoint_dir <path to the checkpoints>
These commands will train the model and create files with the *.tflite extension required by the TensorFlow Lite interpreter.
In the more general case, if you need to use a model described with the TensorFlow framework, then after training it should be saved and converted to the TensorFlow Lite format with the TensorFlow Lite converter. In our case, this step is not required.
Android Application
The application will contain the following functions:
- Image capture from the camera in real time
- Image classification using a TensorFlow Lite model
- Display of the classification result on screen
The source code of the “image_classification” example from the TensorFlow Lite site can be used as a template for this application.
Real-time image capture
The “android.hardware.camera2” API will be used; it is available starting with Android 5.0 (API level 21). Using the CameraManager system service, we gain access to the camera and receive frames in the onImageAvailable method by implementing the OnImageAvailableListener interface.
Frames will arrive several times per second, at a rate that depends on the hardware of the built-in camera. For clarity, we will also display the image received from the camera on the screen by placing a TextureView component in the layout, sized to fill the screen except for a small area below the TextureView where the classification results will be shown. To do this, we associate this component with the camera output.
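A condensed sketch of this setup is shown below, assuming an Activity that implements ImageReader.OnImageAvailableListener; the fields previewSize, textureView, imageReader, and backgroundHandler, the helper createSession, and the camera selection are illustrative, and permission checks and error handling are trimmed.

```java
// Sketch only: previewSize, textureView, imageReader and backgroundHandler are
// assumed fields of the Activity; permission checks are omitted for brevity.
private void openCamera() throws CameraAccessException {
    CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
    String cameraId = manager.getCameraIdList()[0];           // first (usually rear) camera

    // Frames for classification are delivered to this reader in YUV_420_888 format.
    imageReader = ImageReader.newInstance(previewSize.getWidth(), previewSize.getHeight(),
            ImageFormat.YUV_420_888, 2);
    imageReader.setOnImageAvailableListener(this, backgroundHandler);

    manager.openCamera(cameraId, new CameraDevice.StateCallback() {
        @Override public void onOpened(CameraDevice camera) { createSession(camera); }
        @Override public void onDisconnected(CameraDevice camera) { camera.close(); }
        @Override public void onError(CameraDevice camera, int error) { camera.close(); }
    }, backgroundHandler);
}

private void createSession(CameraDevice camera) {
    try {
        // The same repeating request feeds both the on-screen TextureView and the ImageReader.
        SurfaceTexture texture = textureView.getSurfaceTexture();
        texture.setDefaultBufferSize(previewSize.getWidth(), previewSize.getHeight());
        Surface previewSurface = new Surface(texture);

        final CaptureRequest.Builder request =
                camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
        request.addTarget(previewSurface);                    // on-screen preview
        request.addTarget(imageReader.getSurface());          // frames for the classifier

        camera.createCaptureSession(Arrays.asList(previewSurface, imageReader.getSurface()),
                new CameraCaptureSession.StateCallback() {
                    @Override public void onConfigured(CameraCaptureSession session) {
                        try {
                            session.setRepeatingRequest(request.build(), null, backgroundHandler);
                        } catch (CameraAccessException e) {
                            e.printStackTrace();
                        }
                    }
                    @Override public void onConfigureFailed(CameraCaptureSession session) { }
                }, backgroundHandler);
    } catch (CameraAccessException e) {
        e.printStackTrace();
    }
}
```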
In the onImageAvailable method, we first get the latest available frame by calling acquireLatestImage(). Classifying a frame can take a long time, and during that period the camera may deliver more than one frame; that is why we take the latest frame and skip the unprocessed ones. The frame arrives in the YUV420 format. We convert it to the ARGB8888 format by calling convertYUV420 from the ImageUtils library. Since the model expects an array of single-precision floating-point numbers ranging from -1 to 1, we perform this conversion as well.
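A sketch of this handler, assuming a 224x224 MobileNet input; rgbPixels (an int[] of ARGB pixels), imgData (a float[1][INPUT_SIZE][INPUT_SIZE][3] field later passed to the interpreter), INPUT_SIZE, and classifyFrame are illustrative names, and ImageUtils.convertYUV420 is the helper that ships with the demo sources.

```java
@Override
public void onImageAvailable(ImageReader reader) {
    Image image = reader.acquireLatestImage();      // newest frame; stale frames are skipped
    if (image == null) {
        return;
    }

    // Copy the three YUV planes and convert them to packed ARGB_8888 pixels.
    Image.Plane[] planes = image.getPlanes();
    byte[][] yuv = new byte[3][];
    for (int i = 0; i < 3; i++) {
        ByteBuffer buffer = planes[i].getBuffer();
        yuv[i] = new byte[buffer.remaining()];
        buffer.get(yuv[i]);
    }
    ImageUtils.convertYUV420(yuv[0], yuv[1], yuv[2],
            previewSize.getWidth(), previewSize.getHeight(),
            planes[0].getRowStride(), planes[1].getRowStride(), planes[1].getPixelStride(),
            rgbPixels);
    image.close();                                  // return the buffer to the camera

    // Scale the frame to the model input size (aspect-ratio handling omitted) and
    // map every channel from [0, 255] to [-1, 1].
    Bitmap rgbFrame = Bitmap.createBitmap(previewSize.getWidth(), previewSize.getHeight(),
            Bitmap.Config.ARGB_8888);
    rgbFrame.setPixels(rgbPixels, 0, previewSize.getWidth(), 0, 0,
            previewSize.getWidth(), previewSize.getHeight());
    Bitmap scaled = Bitmap.createScaledBitmap(rgbFrame, INPUT_SIZE, INPUT_SIZE, true);
    int[] pixels = new int[INPUT_SIZE * INPUT_SIZE];
    scaled.getPixels(pixels, 0, INPUT_SIZE, 0, 0, INPUT_SIZE, INPUT_SIZE);
    for (int i = 0; i < pixels.length; i++) {
        int p = pixels[i];
        int row = i / INPUT_SIZE;
        int col = i % INPUT_SIZE;
        imgData[0][row][col][0] = (((p >> 16) & 0xFF) - 128f) / 128f;  // R
        imgData[0][row][col][1] = (((p >> 8) & 0xFF) - 128f) / 128f;   // G
        imgData[0][row][col][2] = ((p & 0xFF) - 128f) / 128f;          // B
    }

    classifyFrame();   // run the interpreter, see the next section
}
```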
Image classification
Before performing classification, at application startup, the model has to be loaded from a file bundled with the application in the Assets directory. Copy there the model file mobilenet_v1_1.0_224.tflite, obtained at the “Training and Converting the Model” stage, together with labels.txt, which describes the classification objects.
After that, a TensorFlow Lite Interpreter object is created, with the model passed as a constructor parameter.
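A minimal sketch of this initialization, following the approach used in the demo sources; the helper loadModelFile is illustrative, and the Interpreter class comes from the org.tensorflow.lite package.

```java
// Memory-map the model file from assets so the interpreter can read it directly.
private MappedByteBuffer loadModelFile(AssetManager assets, String modelFilename)
        throws IOException {
    AssetFileDescriptor fd = assets.openFd(modelFilename);
    FileInputStream inputStream = new FileInputStream(fd.getFileDescriptor());
    FileChannel fileChannel = inputStream.getChannel();
    return fileChannel.map(FileChannel.MapMode.READ_ONLY,
            fd.getStartOffset(), fd.getDeclaredLength());
}

// At startup (for example, in onCreate): create the interpreter and preload the labels.
Interpreter tflite;
List<String> labels = new ArrayList<>();
try {
    tflite = new Interpreter(loadModelFile(getAssets(), "mobilenet_v1_1.0_224.tflite"));
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(getAssets().open("labels.txt")))) {
        String line;
        while ((line = reader.readLine()) != null) {
            labels.add(line);
        }
    }
} catch (IOException e) {
    throw new RuntimeException("Failed to load the model or the labels", e);
}
```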
Classification is performed simply by calling the interpreter’s run method. Its first parameter is the frame to be recognized, in the form of the array of floating-point numbers we obtained from the camera. The second parameter is an array of floating-point arrays, which will be filled with the model’s results. In our case, the result is an array of probabilities: the element with index i is the probability that the i-th object is in the frame.
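As a sketch, assuming the float MobileNet model and the imgData array filled in the capture step:

```java
// imgData is the float[1][INPUT_SIZE][INPUT_SIZE][3] input filled from the camera frame;
// output receives one probability per class (1001 entries for this MobileNet model).
float[][] output = new float[1][labels.size()];
tflite.run(imgData, output);
// output[0][i] is now the probability that the i-th object from labels.txt is in the frame.
```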
Display of classification result on screen
Let’s display the name of the most probable object on the screen, using the object names preloaded from the labels file stored in Assets. In the model we use, the names are in English, but you can translate them into any language beforehand by editing labels.txt.
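For example (resultTextView is an illustrative TextView placed below the camera preview):

```java
// Find the index of the highest probability and show the matching label.
int best = 0;
for (int i = 1; i < output[0].length; i++) {
    if (output[0][i] > output[0][best]) {
        best = i;
    }
}
final String title = labels.get(best);
runOnUiThread(() -> resultTextView.setText(title));   // UI updates must run on the main thread
```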
Alternatively, you can display not just the name of the most probable object but a list of the 3–5 most probable objects along with their probabilities.
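A possible top-3 variant, reusing the same output and labels:

```java
// Rank class indices by descending probability and print the first three with their scores.
PriorityQueue<Integer> ranked = new PriorityQueue<>(output[0].length,
        (a, b) -> Float.compare(output[0][b], output[0][a]));
for (int i = 0; i < output[0].length; i++) {
    ranked.add(i);
}
StringBuilder text = new StringBuilder();
for (int k = 0; k < 3 && !ranked.isEmpty(); k++) {
    int idx = ranked.poll();
    text.append(String.format("%s: %.2f%n", labels.get(idx), output[0][idx]));
}
final String result = text.toString();
runOnUiThread(() -> resultTextView.setText(result));
```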
Conclusion
We have shown how a trained image classification model can be used on Android. In addition, such models can also be used on embedded devices, since the TensorFlow Lite interpreter has a C++ interface and takes up only about 300 kilobytes of memory.
The article was originally published in Embedded Computing.