Applying Machine Learning on Mobile Devices

Machine learning is now used in many areas: image classification, consumer demand forecasting, personalized film and music recommendations, and clustering. At the same time, for fairly large models, computing a result (and, to a much greater degree, training the model) can be a resource-intensive operation.

To make trained models usable on devices other than the most powerful ones, Google introduced its TensorFlow Lite framework. To work with it, you train a model built with the TensorFlow framework (not Lite!) and then convert it to the TensorFlow Lite format. After that, the model can easily be used on embedded or mobile devices.

In this article, we will describe all the steps for running a model on Android.

Training and Transfer of the Model

As an example, take one of the standard MobileNet models. Training will be performed on the ILSVRC-2012-CLS image dataset.

For training, we will download the set of models to a reasonably powerful Linux machine:

git clone

Install the Bazel build system according to the instructions on its site.

Launch training of the model:

bazel build -c opt mobilenet_v1_{eval,train}
./bazel-bin/mobilenet_v1_train --dataset_dir <path to the image dataset> --checkpoint_dir <path to the checkpoints>

These commands train the model and create files with the *.tflite extension required by the TensorFlow Lite interpreter.

In the more general case, when a model described with the TensorFlow framework needs to be used, it should be saved after training and converted to the TensorFlow Lite format using the converter. In our case, this step is not required.

Android Application

The application will contain the following functions:

  • Image capture from camera in real time
  • Image classification using TensorFlow Lite model
  • Display of classification result on screen

The source code for the “image_classification” example from the website can be used as a template for this application.

Real-time image capture

The “android.hardware.camera2” framework will be used, which is available starting with Android 5 (API level 21). Using the CameraManager system service, we will get access to the camera and receive frames in the onImageAvailable method by implementing the OnImageAvailableListener interface.
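A minimal Kotlin sketch of this setup is shown below. The frame size, the imageReader and handler fields, and the createCaptureSession and classifyFrame helpers are illustrative; permission checks and error handling are omitted.

import android.content.Context
import android.graphics.ImageFormat
import android.hardware.camera2.CameraDevice
import android.hardware.camera2.CameraManager
import android.media.ImageReader
import android.os.Handler

private lateinit var imageReader: ImageReader

private fun openCamera(context: Context, handler: Handler) {
    // Requires the CAMERA permission to be granted before this call.
    val manager = context.getSystemService(Context.CAMERA_SERVICE) as CameraManager
    val cameraId = manager.cameraIdList.first()

    // Frames for classification will be delivered here in YUV_420_888 format.
    imageReader = ImageReader.newInstance(640, 480, ImageFormat.YUV_420_888, 2)
    imageReader.setOnImageAvailableListener({ reader -> classifyFrame(reader) }, handler)

    manager.openCamera(cameraId, object : CameraDevice.StateCallback() {
        override fun onOpened(camera: CameraDevice) = createCaptureSession(camera)
        override fun onDisconnected(camera: CameraDevice) = camera.close()
        override fun onError(camera: CameraDevice, error: Int) = camera.close()
    }, handler)
}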

Frames will arrive several times per second, at a rate that depends on the hardware of the built-in camera. For clarity, we will also display the image received from the camera on the screen by placing a TextureView component in the layout, sized to match the screen except for a small strip at the bottom, below the TextureView, where we will show the classification results. To do this, we will associate this component with the camera output, as shown in the sketch below.
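One way to wire the two outputs together is to add both the TextureView surface and the ImageReader surface as targets of a repeating preview request. The Kotlin sketch below assumes the textureView, imageReader, and handler fields introduced earlier:

import android.hardware.camera2.CameraCaptureSession
import android.hardware.camera2.CameraDevice
import android.view.Surface

private fun createCaptureSession(camera: CameraDevice) {
    // Send the camera output both to the on-screen preview and to the classifier.
    val texture = textureView.surfaceTexture!!
    texture.setDefaultBufferSize(640, 480)
    val previewSurface = Surface(texture)

    val request = camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW).apply {
        addTarget(previewSurface)
        addTarget(imageReader.surface)
    }

    camera.createCaptureSession(
        listOf(previewSurface, imageReader.surface),
        object : CameraCaptureSession.StateCallback() {
            override fun onConfigured(session: CameraCaptureSession) {
                session.setRepeatingRequest(request.build(), null, handler)
            }
            override fun onConfigureFailed(session: CameraCaptureSession) = camera.close()
        },
        handler
    )
}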

In the onImageAvailable method, we first get the most recent available frame by calling acquireLatestImage(). Classifying a frame can take a long time, and during that period the camera may deliver more than one frame; that is why we take the latest frame, skipping the unprocessed ones. The frame arrives in the YUV420 format. We convert it to the ARGB8888 format by calling convertYUV420 from the ImageUtils class. Since the model expects an array of single-precision floating-point numbers in the range from -1 to 1, we also perform this conversion.
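A rough Kotlin sketch of this preprocessing step is given below. The yuvToArgb helper is a placeholder for the YUV-to-ARGB conversion and scaling to 224x224 (in the example application this is done by ImageUtils); the 127.5 constants map each channel from [0, 255] to [-1, 1].

import android.media.ImageReader

private const val INPUT_SIZE = 224   // mobilenet_v1_1.0_224 expects a 224x224 input

private fun preprocess(reader: ImageReader): Array<Array<Array<FloatArray>>>? {
    // Take only the most recent frame; older unprocessed frames are skipped.
    val image = reader.acquireLatestImage() ?: return null
    // yuvToArgb is a placeholder for convertYUV420 plus scaling to 224x224.
    val argb: IntArray = yuvToArgb(image, INPUT_SIZE, INPUT_SIZE)
    image.close()

    // One frame as a [1][224][224][3] array of floats in the range [-1, 1].
    val input = Array(1) { Array(INPUT_SIZE) { Array(INPUT_SIZE) { FloatArray(3) } } }
    for (y in 0 until INPUT_SIZE) {
        for (x in 0 until INPUT_SIZE) {
            val pixel = argb[y * INPUT_SIZE + x]
            input[0][y][x][0] = (((pixel shr 16) and 0xFF) - 127.5f) / 127.5f
            input[0][y][x][1] = (((pixel shr 8) and 0xFF) - 127.5f) / 127.5f
            input[0][y][x][2] = ((pixel and 0xFF) - 127.5f) / 127.5f
        }
    }
    return input
}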

Image classification

Before performing the classification, at the start of the application it is necessary to load the model from a file bundled with the application in the assets directory. You need to copy into this directory the model file mobilenet_v1_1.0_224.tflite, obtained at the “Training and Transfer of the Model” stage, and the description of the classification classes, labels.txt.

After that, a TensorFlow Lite Interpreter object is created. The model is passed as a constructor parameter.
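A possible Kotlin sketch of these two steps is shown below. It memory-maps the model from assets (the .tflite file must be stored uncompressed in the APK) and reads the class names from labels.txt; the file names follow the ones mentioned above.

import android.content.res.AssetManager
import org.tensorflow.lite.Interpreter
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Memory-map the model file from assets and build the interpreter.
private fun createInterpreter(assets: AssetManager): Interpreter {
    val fd = assets.openFd("mobilenet_v1_1.0_224.tflite")
    val channel = FileInputStream(fd.fileDescriptor).channel
    val model: MappedByteBuffer =
        channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
    return Interpreter(model)
}

// Read one class name per line from labels.txt.
private fun loadLabels(assets: AssetManager): List<String> =
    assets.open("labels.txt").bufferedReader().readLines()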

Classification is performed simply by calling the interpreter’s run method. The first parameter of the method is the frame to be recognized, in the form of the array of floating-point numbers we obtained from the camera. The second parameter is an array of floating-point arrays, which will be filled with the model’s output. In our case, the result is an array of probabilities: the element at index i is the probability that the i-th class is present in the frame.
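Under the assumptions of the previous sketches (an interpreter, a labels list, and an input array produced by preprocess), the call looks roughly like this:

// One FloatArray per input image; each element is the probability of one class.
val output = Array(1) { FloatArray(labels.size) }
interpreter.run(input, output)
val probabilities: FloatArray = output[0]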

Display of classification result on screen

Let’s display on the screen the name of the most probable class, using the class names preloaded from the labels.txt file stored in assets. In the model we use, the names are in English, but you can translate them into Russian beforehand by changing the labels.txt file.

Alternatively, you can display not just the name of the most probable object but a list of the 3–5 most probable objects along with the corresponding probabilities.
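A small Kotlin helper for this, assuming the probabilities array and the labels list from the previous sketches, could look like the following; the resulting strings can then be joined and shown in the area below the camera preview.

// Return the k most probable class names with their probabilities, formatted for display.
fun topK(probabilities: FloatArray, labels: List<String>, k: Int = 3): List<String> =
    probabilities.withIndex()
        .sortedByDescending { it.value }
        .take(k)
        .map { (index, prob) -> "%s: %.1f%%".format(labels[index], prob * 100) }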

Conclusion

We have looked at how a pretrained image classification model can be used on Android. In addition, such models can also run on embedded devices, since the TensorFlow Lite interpreter also has a C++ interface and takes up only about 300 kilobytes of memory.

The article was initially published at Embedded Computing