Converting and running neural networks for Coral TPU

JeVois-Pro supports the Google Coral 4-TOPS Tensor Processing Unit (TPU) as an optional hardware neural accelerator. One can use either the standard Coral M.2 2230 A+E PCIe board, a custom JeVois board that combines 2 Coral TPUs + one eMMC flash disk on a single M.2 2230 board, or any number of Coral USB dongles. Note that PCIe is much faster, at 5 Gbits/s data transfer, than USB 2.0 at 480 Mbits/s (the JeVois-Pro processor only has one 5 Gbits/s interface, which we use for PCIe).

Note
JeVois-Pro only. This accelerator is not supported on JeVois-A33.

Supported neural network frameworks

  • TensorFlow / TensorFlow-Lite

The TPU can run models quantized to int8 weights. It does not support float weights, so quantization and conversion are necessary. The hardware supports only a limited set of operations and layer types, which constrains what can run on it. In addition, the accelerator has only a small amount of on-board RAM, which limits the size of networks that can run efficiently on it. But it is many times faster than a standard CPU.

For execution on the TPU, your model is first quantized and converted, on a Linux desktop, into an EdgeTPU-compiled TFLite file that is then transferred to the JeVois-Pro microSD for inference.
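
As a quick illustration of what int8/uint8 quantization means (a standalone sketch, not part of the conversion workflow below; the scale and zero-point values are made up), TFLite uses an affine mapping real ≈ scale * (quantized - zero_point):

import numpy as np
scale, zero_point = 0.0078125, 128  # hypothetical quantization parameters for one tensor
real = np.array([-1.0, 0.0, 0.5], dtype=np.float32)
quantized = np.clip(np.round(real / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = scale * (quantized.astype(np.float32) - zero_point)
print(quantized)    # [  0 128 192]
print(dequantized)  # close to the original values, within quantization error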

Procedure

  • Read and understand the JeVois docs about Running neural networks on JeVois-A33 and JeVois-Pro
  • Make sure you understand the quantization concepts in Converting and Quantizing Deep Neural Networks for JeVois-Pro
  • Check out the official Google Coral docs
  • The TPU only supports a specific set of layer types. If you try to convert a network that contains unsupported layers, the conversion may sometimes seem to succeed but your converted network may fail to run, or run very slowly using CPU-based emulation. Check the compatibility overview before you attempt to convert a network. In particular, note this statement in the Coral docs: Note: Currently, the Edge TPU compiler cannot partition the model more than once, so as soon as an unsupported operation occurs, that operation and everything after it executes on the CPU, even if supported operations occur later.
  • You need to download and install the EdgeTPU compiler to convert/quantize your model on a desktop computer running Linux Ubuntu 20.04.
  • Everything you need for runtime inference (EdgeTPU runtime libraries, kernel drivers, PyCoral) is pre-installed on your JeVois microSD.
  • Obtain a model: train your own, or download a pretrained model.
    • Beware that the TPU has only around 6.5 MBytes of available on-board RAM for model parameters, which acts as a pseudo cache (see https://coral.ai/docs/edgetpu/compiler/). So, best performance is obtained with smaller models that fit into that small RAM all at once. Larger models require constant loading/unloading of weights over the PCIe or USB link. Larger models run much better on the JeVois-Pro integrated NPU, which has direct access to the main RAM (4 GBytes on JeVois-Pro).
    • Google recommends starting from one of their models at https://coral.ai/models/ and retraining it on your own data as explained in https://github.com/google-coral/tutorials
  • Obtain some parameters about the model (e.g., pre-processing mean, stdev, scale, expected input image size, RGB or BGR, packed (NHWC) or planar (NCHW) pixels, etc).
  • Copy model to JeVois microSD card under JEVOIS[PRO]:/share/dnn/custom/
  • Create a JeVois model zoo entry for your model, where you specify the model parameters and the location where you copied your model files. Typically this is a YAML file under JEVOIS[PRO]:/share/dnn/custom/
  • Launch the JeVois DNN module. It will scan the custom directory for any valid YAML file and make your model available as one of the possible values of the pipe parameter of the DNN module's Pipeline component. Select that pipe to run your model.

Setting up the EdgeTPU compiler

Note
Everything below is to be run on a fast x86_64 desktop computer running Ubuntu 20.04 Linux, not on your JeVois-Pro camera. At the end, we will copy the converted model to microSD and then run inference on JeVois-Pro with that model.

Follow the instructions at https://coral.ai/docs/edgetpu/compiler/

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
sudo apt-get update
sudo apt-get install edgetpu-compiler
edgetpu_compiler --help

Example: Object classification using NASNetMobile

1. Install TensorFlow

  • The preferred method is through conda, as detailed at https://www.tensorflow.org/install/pip
  • Here we will instead just get the tensorflow wheel in a python3 virtual env, which has fewer steps:
python3 -m venv tf_for_tpu
source tf_for_tpu/bin/activate
pip install --upgrade pip
pip install tensorflow
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))" # test install

You may see some warnings about missing GPU libs, which we ignore here (CPU is enough to convert a model), and finally something like tf.Tensor(-337.86047, shape=(), dtype=float32), which is the result of our test command (the exact value will vary since it is randomized).

2. Get the trained model

  • We find a Keras/TensorFlow NASNetMobile model pre-trained on ImageNet at https://keras.io/api/applications/
  • We will load it into TensorFlow as explained in https://keras.io/api/applications/nasnet/#nasnetmobile-function
  • So we start a little convert.py script as follows:
    import tensorflow as tf
    import numpy as np
    model = tf.keras.applications.NASNetMobile()
  • This will use all the defaults: 224x224x3 inputs, ImageNet weights, include last fully-connected layer, include final softmax activation.
  • You could retrain the model at this stage. Here we will just use it as is.
  • If we run our convert.py now, it just downloads the model and exits.
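
Optionally, as a quick sanity check (just a sketch, assuming the same Python environment as above), you can print the model's input and output shapes to confirm these defaults before going further:

import tensorflow as tf
model = tf.keras.applications.NASNetMobile()
print(model.input_shape)   # expected: (None, 224, 224, 3)
print(model.output_shape)  # expected: (None, 1000), the ImageNet classes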

3. Get a sample dataset for quantization

  • Since we are using ImageNet, we could get that dataset from some built-in TensorFlow function, but let's do it manually to see how it would be done on a custom dataset.
  • We still want the data to be representative of our training data, so let's download the ImageNet validation set:
    • We go to https://image-net.org but download is by request only even after creating an account
    • So instead we get a torrent file from http://academictorrents.com/details/5d6d0df7ed81efd49ca99ea4737e0ae5e3a5f2e5 and use transmission-gtk (pre-installed on Ubuntu) to download the dataset.
    • We obtain ILSVRC2012_img_val.tar which we untar:
      mkdir dataset
      cd dataset
      tar xvf ~/Downloads/ILSVRC2012_img_val.tar
      cd ..
    • We need to understand how pre-processing works and what mean, stdev, and scale should be applied to the raw pixel values, so that we can later set the correct pre-processing parameters. The TensorFlow docs for NASNetMobile suggest that nasnet.preprocess_input() will scale inputs to [-1 .. 1], but do not mention means. Looking further at the source code, nasnet.preprocess_input() calls imagenet_utils.preprocess_input() defined here, which calls _preprocess_numpy_input() defined here, where we finally learn that in 'tf' mode we will use mean=[127.5 127.5 127.5] and scale=1/127.5 (a quick check of this is sketched after this list).
    • We add the following to our convert.py, modeled after the colab we are following, section "Convert to TFLite" (we just need to change the location of our image files, and pre-processing):
      IMAGE_SIZE = 224
      # A generator that provides a representative dataset
      def representative_data_gen():
          dataset_list = tf.data.Dataset.list_files('dataset/*') # JEVOIS modified
          for i in range(100):
              image = next(iter(dataset_list))
              image = tf.io.read_file(image)
              image = tf.io.decode_jpeg(image, channels=3)
              image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
              image = tf.cast((image - 127.5) / 127.5, tf.float32) # JEVOIS modified
              image = tf.expand_dims(image, 0)
              yield [image]
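
To double-check the mean and scale deduced above, a minimal sketch (same environment; tf.keras.applications.nasnet ships with TensorFlow) compares our manual pre-processing against Keras' own:

import numpy as np
import tensorflow as tf
x = np.random.uniform(0, 255, (1, 224, 224, 3)).astype(np.float32)
manual = (x - 127.5) / 127.5
keras_pp = tf.keras.applications.nasnet.preprocess_input(x.copy())  # may modify its argument in place
print(np.allclose(manual, keras_pp, atol=1e-5))  # expect True if our mean/scale are correct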

4. Quantize the model and convert to TFLite

We again add the following to our convert.py, modeled after the colab we are following, section "Convert to TFLite":

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# This enables quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# This sets the representative dataset for quantization
converter.representative_dataset = representative_data_gen
# This ensures that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# For full integer quantization, even though supported_types defaults to int8 only, we declare it explicitly for clarity.
converter.target_spec.supported_types = [tf.int8]
# These set the input and output tensors to uint8 (added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
with open('NASNetMobile_quant.tflite', 'wb') as f: # JEVOIS modified
    f.write(tflite_model)

We run the full convert.py (collating the 3 above snippets):

python3 convert.py

This takes a while (maybe we should have installed GPU support after all), but eventually we get NASNetMobile_quant.tflite, a quantized version of our original model.

Let's do a quick check and upload our quantized model to Lutz Roeder's great Netron online model inspection tool. Upload NASNetMobile_quant.tflite and inspect the various layers. In particular, if you expand the input, weight, bias, and output details of any Conv layer, you will see that the data is int8 with associated quantization parameters.
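
If you prefer a command-line check over Netron, a small sketch using TensorFlow's built-in TFLite interpreter (same environment as above) prints the input/output types and their quantization parameters:

import tensorflow as tf
interp = tf.lite.Interpreter(model_path='NASNetMobile_quant.tflite')
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
print(inp['dtype'], inp['shape'], inp['quantization'])  # expect uint8, [1 224 224 3], (scale, zero_point)
print(out['dtype'], out['shape'], out['quantization'])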

5. Convert quantized TFLite model to EdgeTPU

  • To convert from quantized TFLite to EdgeTPU, we simply run:
    edgetpu_compiler NASNetMobile_quant.tflite
  • This will port as many layers and operations as possible for execution on the TPU. We see this:
    Edge TPU Compiler version 16.0.384591198
    Started a compilation timeout timer of 180 seconds.
    Model compiled successfully in 5841 ms.
    Input model: NASNetMobile_quant.tflite
    Input size: 6.21MiB
    Output model: NASNetMobile_quant_edgetpu.tflite
    Output size: 8.15MiB
    On-chip memory used for caching model parameters: 6.31MiB
    On-chip memory remaining for caching model parameters: 0.00B
    Off-chip memory used for streaming uncached model parameters: 635.12KiB
    Number of Edge TPU subgraphs: 1
    Total number of operations: 669
    Operation log: NASNetMobile_quant_edgetpu.log
    See the operation log file for individual operation details.
    Compilation child process completed within timeout period.
    Compilation succeeded!
  • And we get NASNetMobile_quant_edgetpu.tflite that we will copy to JeVois-Pro microSD.
    Note
    Just a little bit too big! From the above messages, we are maxing out the on-board RAM, and 635 Kbytes of model parameters will need to be swapped in/out between that RAM and the main processor's RAM on every inference, in addition to streaming images over to the TPU.
  • We can check the generated NASNetMobile_quant_edgetpu.log to confirm that in this case all layers are ported to TPU:
    Edge TPU Compiler version 16.0.384591198
    Input: NASNetMobile_quant.tflite
    Output: NASNetMobile_quant_edgetpu.tflite
    Operator              Count   Status
    PAD                   20      Mapped to Edge TPU
    ADD                   84      Mapped to Edge TPU
    MAX_POOL_2D           4       Mapped to Edge TPU
    MEAN                  1       Mapped to Edge TPU
    QUANTIZE              86      Mapped to Edge TPU
    CONV_2D               196     Mapped to Edge TPU
    CONCATENATION         20      Mapped to Edge TPU
    FULLY_CONNECTED       1       Mapped to Edge TPU
    RELU                  48      Mapped to Edge TPU
    MUL                   4       Mapped to Edge TPU
    SOFTMAX               1       Mapped to Edge TPU
    STRIDED_SLICE         4       Mapped to Edge TPU
    AVERAGE_POOL_2D       40      Mapped to Edge TPU
    DEPTHWISE_CONV_2D     160     Mapped to Edge TPU

6. Create a zoo YAML file for our new model

Now we need to let JeVois know about our model by creating a small YAML file that describes the model and the locations of its files. We just take an entry from the pre-loaded JeVois tpu.yml (in the Config tab of the GUI) for inspiration, and create our new NASNetMobile.yml:

%YAML 1.0
---
NASNetMobile:
  preproc: Blob
  nettype: TPU
  postproc: Classify
  model: "dnn/custom/NASNetMobile_quant_edgetpu.tflite"
  intensors: "NHWC:8U:1x224x224x3:AA:0.0078125:128"
  mean: "127.5 127.5 127.5"
  scale: 0.0078125
  classes: "coral/classification/imagenet_labels.txt"
  classoffset: 1
Note
For classes, we use an existing ImageNet label file that is already pre-loaded on our microSD, since we did not get one from Keras. Because that label file has a first entry for "background", which is not used by our model here, we use a classoffset of 1 to shift the class labels. You can adjust this at runtime in case labels seem off. If you use a custom-trained model, you should also copy a file NASNetMobile.labels to microSD that lists your class names (one class label per line), and then set the classes parameter to that file.

7. Copy to microSD and run

  • Copy NASNetMobile_quant_edgetpu.tflite and NASNetMobile.yml to /jevoispro/share/dnn/custom/ on the JeVois-Pro microSD.
  • Launch the DNN module and select pipe TPU:Classify:NASNetMobile

  • It works! Note that a TPU connected to USB 2.0 was used for this screenshot; speed is higher when using a PCIe TPU board.

Tips

  • Models with more than 6.5 MB of weights can still run quite well on the TPU, but will be slower. The caching is completely transparent to users and works very well.
  • On JeVois-Pro, several Coral TPU pipelines can be simultaneously instantiated for several different models. The models will be automatically and transparently time-multiplexed over the hardware accelerator. For example, in JeVois modules MultiDNN or MultiDNN2, you can set several of the pipelines (in the params.cfg file of the module) to TPU models without any conflicts or issues even if you only have one TPU.
  • If you have several TPUs, YAML parameter tpunum can be used to run a given model on a given TPU.
  • Also see Tips for running custom neural networks