TensorFlow is a popular neural network framework. This module first finds the most conspicuous (salient) object in the scene, then identifies it using a deep neural network. It returns the top scoring candidates.
This module runs a TensorFlow network on an image window around the most salient point and shows the top-scoring results. We alternate, on every other frame, between updating the salient window crop location, and predicting what is in it. Actual network inference speed (time taken to compute the predictions on one image crop) is shown at the bottom right. See below for how to trade-off speed and accuracy.
Note that by default this module runs fast variant of MobileNets trained on the ImageNet dataset. There are 1000 different kinds of objects (object classes) that this network can recognize (too long to list here). It is possible to use bigger and more complex networks, but it will likely slow down the framerate.
For more information about MobileNets, see https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
For more information about the ImageNet dataset used for training, see http://www.image-net.org/challenges/LSVRC/2012/
Sometimes this module will make mistakes! The performance of mobilenets is about 40% to 70% correct (mean average precision) on the test set, depending on network size (bigger networks are more accurate but slower).
Neural network size and speed
This module provides a parameter,
The network actual input size varies depending on which network is used; for example, mobilenet_v1_0.25_128_quant expects 128x128 input images, while mobilenet_v1_1.0_224 expects 224x224. We automatically rescale the cropped window to the network's desired input size. Note that there is a cost to rescaling, so, for best performance, you should match
When using video mappings with USB output, irrespective of
On every frame where detection results were obtained that are above
See Standardized serial messages formatting for more on standardized serial messages, and Helper functions to convert coordinates from camera resolution to standardized for more info on standardized coordinates.
Using your own network
For a step-by-step tutorial, see Training custom TensorFlow networks for JeVois.
This module supports RGB or grayscale inputs, byte or float32. You should create and train your network using fast GPUs, and then follow the instruction here to convert your trained network to TFLite format:
Then you just need to create a directory under JEVOIS:/share/tensorflow/ with the name of your network, and, in there, two files, labels.txt with the category labels, and model.tflite with your model converted to TensorFlow Lite (flatbuffer format). Finally, edit JEVOIS:/modules/JeVois/TensorFlowEasy/params.cfg to select your new network when the module is launched.