
# Object recognition on Android with TensorFlow: from data preparation to running on the device

Training a neural network for pattern recognition is a long and resource-intensive process, especially when all you have is an inexpensive laptop rather than a computer with a powerful video card. In this case Google Colaboratory comes to the rescue: it offers a completely free Tesla K80-level GPU.

This article describes the process of preparing the data, training a TensorFlow model in Google Colaboratory, and running it on an Android device.

## Data preparation

As an example, let’s try to train a neural network to recognize white dice on a black background. So, first of all, we need to create a dataset sufficient for training (for now we will stop at ~100 pics).

For training we will use the TensorFlow Object Detection API. We will prepare all the data we need for training on the laptop. We will need conda, an environment and dependency manager (see its official installation instructions).

Create an environment to work in:

conda create -n object_detection_prepare pip python=3.6

And activate it:

conda activate object_detection_prepare

Install the dependencies we will need:

```shell
pip install --ignore-installed --upgrade tensorflow==1.14
pip install --ignore-installed pandas
pip install --ignore-installed Pillow
pip install lxml
conda install pyqt=5
```

Create a folder object_detection and put all our photos into object_detection/images

Google Colab has a limit on memory usage, so you need to reduce the resolution of the photos before partitioning the data; otherwise you will run into a "tcmalloc: large alloc…" error during training.

Let’s create a folder object_detection/preprocessing and put the prepared scripts into it.

To resize the photos, use the script:

python ./object_detection/preprocessing/image_resize.py -i ./object_detection/images --imageWidth=800 --imageHeight=600

This script will run through the folder with the specified photos, resize them to 800×600 and put them in object_detection/images/resized. You can now replace the original photos in object_detection/images
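For reference, the resize step can be sketched with Pillow, which we installed earlier. The internals and argument names of image_resize.py are assumptions based on the command above, not the actual script:

```python
import argparse
from pathlib import Path

from PIL import Image


def resize_images(input_dir, width, height):
    """Resize every .jpg in input_dir and save the copies to input_dir/resized."""
    out_dir = Path(input_dir) / "resized"
    out_dir.mkdir(exist_ok=True)
    for path in Path(input_dir).glob("*.jpg"):
        with Image.open(path) as img:
            img.resize((width, height), Image.LANCZOS).save(out_dir / path.name)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--inputDir", required=True)
    parser.add_argument("--imageWidth", type=int, default=800)
    parser.add_argument("--imageHeight", type=int, default=600)
    args = parser.parse_args()
    resize_images(args.inputDir, args.imageWidth, args.imageHeight)
```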

To partition the data, we will use the labelImg tool.

Clone the labelImg repository into object_detection

Go to the labelImg folder:

cd [FULL_PATH]/object_detection/labelImg

and run the command:

pyrcc5 -o libs/resources.py resources.qrc

After that you can start partitioning the data (this is the longest and most boring step):

python labelImg.py

In "Open dir" specify the folder object_detection/images and go through all the photos, selecting the objects to recognize and specifying their class. In our case these are the face values of the dice (1, 2, 3, 4, 5, 6). Save the metadata (*.xml files) in the same folder.

Let’s create a folder object_detection/training_demo which we’ll fill in a little later in Google Colab for training.

Let’s divide our photos (with their metadata) into training and test sets in an 80/20 ratio and move them into the corresponding folders object_detection/training_demo/images/train and object_detection/training_demo/images/test
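A sketch of this 80/20 split in Python (the helper name and random seed are illustrative assumptions); each photo is moved together with its .xml metadata file:

```python
import random
import shutil
from pathlib import Path


def split_dataset(images_dir, train_dir, test_dir, train_ratio=0.8, seed=42):
    """Move photos and their labelImg .xml files into train/test folders."""
    photos = sorted(Path(images_dir).glob("*.jpg"))
    random.Random(seed).shuffle(photos)
    cutoff = int(len(photos) * train_ratio)
    for dest, subset in ((train_dir, photos[:cutoff]), (test_dir, photos[cutoff:])):
        Path(dest).mkdir(parents=True, exist_ok=True)
        for photo in subset:
            xml = photo.with_suffix(".xml")  # metadata saved by labelImg
            shutil.move(str(photo), dest)
            if xml.exists():
                shutil.move(str(xml), dest)
```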

Let’s create a folder object_detection/training_demo/annotations, in which we will put the files with metadata needed for training. The first of these is label_map.pbtxt, which maps each object class to an integer id. In our case it is:

label_map.pbtxt

```
item {
  id: 1
  name: '1'
}
item {
  id: 2
  name: '2'
}
item {
  id: 3
  name: '3'
}
item {
  id: 4
  name: '4'
}
item {
  id: 5
  name: '5'
}
item {
  id: 6
  name: '6'
}
```
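With many classes, a label map in this format can also be generated with a few lines of Python (the helper name is illustrative, not part of the article's scripts):

```python
def make_label_map(class_names):
    """Build label_map.pbtxt text; ids start at 1, as the API requires."""
    return "\n".join(
        "item {\n  id: %d\n  name: '%s'\n}" % (i, name)
        for i, name in enumerate(class_names, start=1)
    )


# For the dice example:
# open("label_map.pbtxt", "w").write(make_label_map(["1", "2", "3", "4", "5", "6"]))
```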

Remember the metadata we got during data partitioning? To use it for training, we need to convert it to the TFRecord format. We will use the scripts from source [1] for the conversion.

We will perform the conversion in two steps: xml -> csv and csv -> record

Let’s go to the preprocessing folder:

cd [FULL_PATH]\object_detection\preprocessing

1. From xml to csv

Training data:

python xml_to_csv.py -i [FULL_PATH]/object_detection/training_demo/images/train -o [FULL_PATH]/object_detection/training_demo/annotations/train_labels.csv

Test data:

python xml_to_csv.py -i [FULL_PATH]/object_detection/training_demo/images/test -o [FULL_PATH]/object_detection/training_demo/annotations/test_labels.csv

2. From csv to record

Training data:

python generate_tfrecord.py --label_map_path=[FULL_PATH]\object_detection\training_demo\annotations\label_map.pbtxt --csv_input=[FULL_PATH]\object_detection\training_demo\annotations\train_labels.csv --output_path=[FULL_PATH]\object_detection\training_demo\annotations\train.record --img_path=[FULL_PATH]\object_detection\training_demo\images\train

Test data:

python generate_tfrecord.py --label_map_path=[FULL_PATH]\object_detection\training_demo\annotations\label_map.pbtxt --csv_input=[FULL_PATH]\object_detection\training_demo\annotations\test_labels.csv --output_path=[FULL_PATH]\object_detection\training_demo\annotations\test.record --img_path=[FULL_PATH]\object_detection\training_demo\images\test
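The xml -> csv step essentially flattens each labelImg (Pascal VOC) annotation into one csv row per bounding box. A minimal sketch of that parsing (the helper name and tuple layout are illustrative; the real xml_to_csv.py also writes the rows out with pandas):

```python
import xml.etree.ElementTree as ET


def annotation_to_rows(xml_text):
    """Turn one labelImg annotation into (filename, w, h, class, box) rows."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return rows
```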

At this point we have finished preparing the data, now we have to choose the model we want to train.

Available models for retraining can be found in the TensorFlow detection model zoo.

We will select the ssdlite_mobilenet_v2_coco model so that we can later run the trained model on an Android device. Download the archive with the model and unpack it into object_detection/training_demo/pre-trained-model

You should get something like:
object_detection/training_demo/pre-trained-model/ssdlite_mobilenet_v2_coco_2018_05_09

From the unpacked archive, copy the file pipeline.config to object_detection/training_demo/training and rename it to ssdlite_mobilenet_v2_coco.config

Next we need to configure it for our task. To do this:

1. Let’s specify the number of classes

model.ssd.num_classes: 6

2. Let’s specify the batch size (the amount of data processed per training iteration), the number of iterations, and the path to the saved model from the downloaded archive

```
train_config.batch_size: 18
train_config.num_steps: 20000
train_config.fine_tune_checkpoint: "./training_demo/pre-trained-model/ssdlite_mobilenet_v2_coco_2018_05_09/model.ckpt"
```

3. Let’s specify the number of examples used for evaluation (the photos in object_detection/training_demo/images/test )

eval_config.num_examples: 64

4. Specify the path to the training dataset

```
train_input_reader.label_map_path: "./training_demo/annotations/label_map.pbtxt"
train_input_reader.tf_record_input_reader.input_path: "./training_demo/annotations/train.record"
```

5. Let’s specify the path to the test data set

```
eval_input_reader.label_map_path: "./training_demo/annotations/label_map.pbtxt"
eval_input_reader.tf_record_input_reader.input_path: "./training_demo/annotations/test.record"
```

You should end up with something like this

Next, archive the training_demo folder and upload the resulting training_demo.zip to Google Drive.

As an alternative to working with an archive, you can mount Google Drive directly in Google Colab, but remember to change all the paths in the configs and scripts accordingly.

This concludes the data preparation, let’s move on to training.

## Model Training

In Google Drive, select training_demo.zip, click on "Get shareable link" and save the id of the file from this link.

The easiest way to use Google Colab is to create a new notebook in Google Drive.

By default, training is done on the CPU. To use the GPU, you must change the runtime type.

The training consists of the following steps:

1. Cloning the TensorFlow Models repository:

!git clone https://github.com/tensorflow/models.git

2. Install protobuf and compile the necessary files into object_detection:

```
!apt-get -qq install libprotobuf-java protobuf-compiler
%cd ./models/research/
!protoc object_detection/protos/*.proto --python_out=.
%cd ../..
```

3. Add necessary paths to the PYTHONPATH environment variable:

```python
import os

os.environ['PYTHONPATH'] += ":/content/models/research/"
os.environ['PYTHONPATH'] += ":/content/models/research/slim"
os.environ['PYTHONPATH'] += ":/content/models/research/object_detection"
os.environ['PYTHONPATH'] += ":/content/models/research/object_detection/utils"
```

4. To get the file from Google Drive, install PyDrive and authorize:

```
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
```

5. Download the archive (specify the id of your file) and unzip it:

```
drive_file_id = "[YOUR_FILE_ID_HERE]"
training_demo_zip = drive.CreateFile({'id': drive_file_id})
training_demo_zip.GetContentFile('training_demo.zip')

!unzip training_demo.zip
!rm training_demo.zip
```

6. Start the training process:

```
!python ./models/research/object_detection/legacy/train.py \
    --logtostderr \
    --train_dir=./training_demo/training \
    --pipeline_config_path=./training_demo/training/ssdlite_mobilenet_v2_coco.config
```

Parameter description

--train_dir=./training_demo/training – the path to the directory where the training results will be saved

--pipeline_config_path=./training_demo/training/ssdlite_mobilenet_v2_coco.config – the path to the config

7. Convert the training result into a frozen graph that can be used for inference:

```
!python /content/models/research/object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config \
    --trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] \
    --output_directory /content/training_demo/training/output_inference_graph_v1.pb
```

Parameter description

--pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config – the path to the config

--trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] – the path to the checkpoint we want to convert

--output_directory /content/training_demo/training/output_inference_graph_v1.pb – the name of the converted model

The checkpoint number [CHECKPOINT_NUMBER] can be found in the folder /content/training_demo/training/ after training: files like model.ckpt-1440.index and model.ckpt-1440.meta should appear there. Here 1440 is the [CHECKPOINT_NUMBER], the number of the training iteration.
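The latest checkpoint number can also be picked out programmatically; this helper is illustrative and not part of the article's scripts:

```python
import re
from pathlib import Path


def latest_checkpoint_number(training_dir):
    """Return the largest N among model.ckpt-N.index files, or None."""
    numbers = []
    for path in Path(training_dir).glob("model.ckpt-*.index"):
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", path.name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else None
```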

There is a special script in the notebook to visualize the result of training. The figure below shows the result of image recognition on the test dataset after ~20000 training iterations.

8. Convert the trained model to tflite.
To use TensorFlow Lite, you need to convert the model to the tflite format. To do this, first convert the training result to a frozen graph that supports conversion to tflite (the parameters are the same as for export_inference_graph.py):

```
!python /content/models/research/object_detection/export_tflite_ssd_graph.py \
    --pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config \
    --trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] \
    --output_directory /content/training_demo/training/output_inference_graph_tf_lite.pb
```

Then open it in the Netron tool. We are interested in the names and dimensions of the model’s input and output nodes.

Knowing them, you can convert the pb model to the tflite format:

```
!tflite_convert \
    --output_file=/content/training_demo/training/model_q.tflite \
    --graph_def_file=/content/training_demo/training/output_inference_graph_tf_lite_v1.pb/tflite_graph.pb \
    --input_arrays=normalized_input_image_tensor \
    --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
    --input_shapes=1,300,300,3 \
    --enable_select_tf_ops \
    --allow_custom_ops \
    --inference_input_type=QUANTIZED_UINT8 \
    --inference_type=FLOAT \
    --mean_values=128 \
    --std_dev_values=128
```

Parameter description

--output_file=/content/training_demo/training/model_q.tflite – the path to the conversion result

--graph_def_file=/content/training_demo/training/output_inference_graph_tf_lite_v1.pb/tflite_graph.pb – the path to the frozen graph to convert

--input_arrays=normalized_input_image_tensor – the name of the input node we found above

--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' – the names of the output nodes we found above

--input_shapes=1,300,300,3 – the dimensionality of the input data we found above

--enable_select_tf_ops – enables the extended TensorFlow Lite runtime with select TensorFlow ops

--allow_custom_ops – allows operations without a built-in TensorFlow Lite implementation to be passed through as custom ops

--inference_type=FLOAT – the data type for all arrays in the model except the input arrays

--inference_input_type=QUANTIZED_UINT8 – the data type for the input arrays of the model

--mean_values=128 --std_dev_values=128 – the mean value and standard deviation of the input data, required when using QUANTIZED_UINT8
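To make the last two flags concrete: with QUANTIZED_UINT8 input, the runtime reconstructs the float value the model sees from a quantized byte q as (q - mean) / std, so mean 128 and std 128 map uint8 pixels from [0, 255] to roughly [-1, 1]:

```python
def dequantize(q, mean=128.0, std=128.0):
    """Float value the model sees for a quantized uint8 input byte q."""
    return (q - mean) / std


# dequantize(0) is -1.0, dequantize(128) is 0.0, dequantize(255) is just under 1.0
```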

Archive the folder with the training results and upload it to Google Drive:

```
!zip -r ./training_demo/training.zip ./training_demo/training/

training_result = drive.CreateFile({'title': 'training_result.zip'})
training_result.SetContentFile('training_demo/training.zip')
training_result.Upload()
```

If you get an Invalid client secrets file error, you need to re-authorize in Google Drive.

## Running the model on android device

The Android application is based on the official object detection guide, but it was completely rewritten in Kotlin using CameraX. You can see the full code here.

CameraX already provides a mechanism for analyzing incoming camera frames: ImageAnalysis. The recognition logic lives in ObjectDetectorAnalyzer.

The whole process of image recognition can be broken down into several steps:

1. As input we get an image in YUV format. For further work it has to be converted to RGB:

val rgbArray = convertYuvToRgb(image)

2. Next we need to transform the image (rotate it if necessary and resize it to the model’s input size, in our case 300×300). To do this, we draw the pixel array onto a Bitmap and apply the transformation to it:

```kotlin
val rgbBitmap = getRgbBitmap(rgbArray, image.width, image.height)
val transformation = getTransformation(rotationDegrees, image.width, image.height)
Canvas(resizedBitmap).drawBitmap(rgbBitmap, transformation, null)
```

3. Convert the bitmap back to an array of pixels and feed it to the detector:

```kotlin
ImageUtil.storePixels(resizedBitmap, inputArray)
val objects = detect(inputArray)
```

4. To visualize the result, pass it to RecognitionResultOverlayView and transform the coordinates respecting the aspect ratio:

```kotlin
val scaleFactorX = measuredWidth / result.imageWidth.toFloat()
val scaleFactorY = measuredHeight / result.imageHeight.toFloat()

result.objects.forEach { obj ->
    val left = obj.location.left * scaleFactorX
    val top = obj.location.top * scaleFactorY
    val right = obj.location.right * scaleFactorX
    val bottom = obj.location.bottom * scaleFactorY

    canvas.drawRect(left, top, right, bottom, boxPaint)
    canvas.drawText(obj.text, left, top - 25f, textPaint)
}
```
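The same scaling math, sketched in Python for illustration (the function name and tuple layout are assumptions, not part of the app code):

```python
def scale_box(box, image_size, view_size):
    """Map a (left, top, right, bottom) box from model-image to view coordinates."""
    sx = view_size[0] / image_size[0]
    sy = view_size[1] / image_size[1]
    left, top, right, bottom = box
    return (left * sx, top * sy, right * sx, bottom * sy)
```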

To run our model in the application, replace the model file in the assets folder with training_demo/training/model_q.tflite (renaming it to detect.tflite ) and update the label file labelmap.txt, in our case it is:

labelmap.txt

```
1
2
3
4
5
6
```

Since the official guide uses SSD Mobilenet V1, in which label indexing starts at 1 instead of 0, you need to change labelOffset from 1 to 0 in the collectDetectionResult method of the ObjectDetector class.

That’s all.
The video shows how the trained model performs on an older Xiaomi Redmi 4X:

The following resources were used in the process: