Converting and running neural networks for JeVois-Pro NPU
JeVois-Pro includes a 5-TOPS neural processing unit (NPU) integrated into the Amlogic A311D processor. This neural accelerator has the fastest connection to main memory (direct DMA access), so data transfers and network execution can be very fast. It also does not suffer from the limited on-chip memory that constrains some of the other accelerators, so quite large networks can run on the NPU.
Note
JeVois-Pro only. This accelerator is not supported on JeVois-A33.
Supported neural network frameworks
Caffe
TensorFlow
TensorFlow-Lite
ONNX (and pyTorch via conversion to ONNX)
Darknet
Keras
The NPU can run models quantized to uint8, int8, or int16 weights. It does not support float weights, hence quantization and conversion are necessary. Only a limited set of operations and layer types is supported by the hardware, which further constrains what can run on it. But it is many times faster than a standard CPU.
For execution on NPU, your model will be quantized and then converted on a Linux desktop to a proprietary blob format that can then be transferred to JeVois-Pro microSD for execution.
The NPU only supports a specific set of layer types. If you try to convert a network that contains unsupported layers, the conversion may sometimes seem to succeed but your converted network may fail to run. Check the Layer and Operation Support Guide before you attempt to convert a network.
You need to download and install the Amlogic NPU SDK to convert/quantize your model on a desktop computer running Linux Ubuntu 20.04 or later.
Everything you need at runtime (NPU runtime libraries) is pre-installed on your JeVois-Pro microSD.
Obtain a model: train your own, or download a pretrained model.
Obtain some parameters about the model (e.g., pre-processing mean, stdev, scale, expected input image size, RGB or BGR, packed (NHWC) or planar (NCHW) pixels, etc).
Obtain a sample dataset that will be used for quantization, to determine the range of values experienced on each layer of the network during inference, and to set the quantization parameters accordingly. The dataset should be representative, i.e., it should contain the objects you want to detect, along with other things you want to ignore.
Convert and quantize the model using the NPU SDK.
Copy the converted model to JeVois-Pro microSD card under JEVOIS[PRO]:/share/dnn/custom/
Create a JeVois model zoo entry for your model, where you specify the model parameters and the location where you copied your model files. Typically this is a YAML file under JEVOIS[PRO]:/share/dnn/custom/
Launch the JeVois DNN module. It will scan the custom directory for any valid YAML file, and make your model available as one available value for the pipe parameter of the DNN module's Pipeline component. Select that pipe to run your model.
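For example, after these steps a custom detection model would typically occupy three files on microSD (the names below are illustrative and match the Approach 2 example later in this page):

/jevoispro/share/dnn/custom/yolov8n-1024x576-custom.nb        # converted, quantized model blob
/jevoispro/share/dnn/custom/libnn_yolov8n-1024x576-custom.so  # runtime library created by the converter
/jevoispro/share/dnn/custom/yolov8n-1024x576-custom.yml       # your zoo entry describing the model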
Setting up the NPU SDK
Note
Everything below is to be run on a fast x86_64 desktop computer running Ubuntu 20.04 Linux or later, not on your JeVois-Pro camera. At the end, we will copy the converted model to microSD and then run inference on JeVois-Pro with that model.
The Amlogic/VeriSilicon NPU SDK is distributed by Khadas, which manufactures a development board that uses the same Amlogic A311D processor as JeVois-Pro.
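To get the SDK, clone the Khadas repository (a sketch; we assume the SDK is still hosted in the khadas/aml_npu_sdk repository on GitHub, so check the Khadas docs for the current location):

git clone --depth 1 https://github.com/khadas/aml_npu_sdk.git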
No need to install Ubuntu packages or Python wheels, everything is included in that git repo.
Getting a sample dataset for quantization
To quantize a model, we need to know the expected range of values experienced at each layer of the network. For that, we need a representative sample dataset, typically about 1000 images from your training or validation set. This is very important as it will set the quantization parameters. The images should contain some of the targets you want to detect, and also other things that your camera may see (whether or not you want to detect them).
The dataset is just a folder containing some images, plus a text file that contains a list of the image paths, one per line.
If you have retrained the model on a custom dataset, use that dataset instead.
val2017 has 5,000 images which is more than we need. The unix command shuf can randomly shuffle a list of names and take the first n, so let's use it to grab 1000 random images from the dataset. We save a list of those file paths to dataset.txt that will be used for quantization:
ls /path/to/val2017/*.jpg | shuf -n 1000 > dataset.txt
dataset.txt should now contain 1000 lines (your results will vary since shuf is random):
/path/to/val2017/000000382030.jpg
/path/to/val2017/000000436617.jpg
/path/to/val2017/000000138550.jpg
/path/to/val2017/000000226154.jpg
/path/to/val2017/000000319369.jpg
...
You will use this dataset during model conversion.
For best accuracy, we want the quantized network to output separate tensors for class probabilities, bounding box coordinates, segmentation masks, etc. This is because these tensors have very different ranges of values; hence, it is better to quantize each one separately, instead of combining them into one big tensor for post-processing (see Converting and Quantizing Deep Neural Networks for JeVois-Pro for more details).
We provide a script jevoispro-npu-convert.sh to simplify converting YOLOv8 and later models. It supports all variants: detection, instance segmentation, pose/skeleton, oriented bounding boxes (OBB), and classification. Use it as follows:
1. Get the model and export to ONNX
Here, we just get a standard yolov8n model as an example, and resize it to 1024x576 resolution (same aspect ratio as our camera's native video frames). If you have a retrained model, provide its name below instead of yolov8n.pt:
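For example, using the ultralytics CLI (a sketch, assuming the ultralytics package is installed with pip install ultralytics; note that imgsz is given as height,width, and that you may also need an explicit opset, e.g. opset=12 as we use for YOLOv10 further below, if the SDK complains):

yolo export model=yolov8n.pt format=onnx imgsz=576,1024
mv yolov8n.onnx yolov8n-1024x576-custom.onnx   # rename to the name used in the rest of this tutorial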
Everything we run here should be from inside aml_npu_sdk/acuity-toolkit/python/
The script just determines the correct output tensors to use for various YOLO variants and various tasks, for compatibility with the JeVois post-processors. It works as follows:
- dataset.txt: a text file containing a list of images to use for quantization. These should be part of the validation set used to train the model, typically 1000 images or so. If no dataset file is given, the script will try to use dataset-<task>.txt in the current directory.
- iterations: number of images from dataset.txt to use for quantization, or use all if not given.
Note
It is not completely clear from the NPU SDK docs how adding more iterations may improve model accuracy. On your first try converting a model, you may just want to use 1 iteration, which will be fast. Then if the model runs correctly on JeVois-Pro, you can convert it again with more iterations and see whether the accuracy improves.
Set the pipe parameter to NPU:Detect:yolov8n-1024x576-custom
APPROACH 2 (easy): Using the Khadas conversion script
The NPU SDK contains a (closed-source) conversion script that can be used not only for YOLO variants, but for any supported model. See docs on the Khadas web site.
Using the same yolov8n-1024x576-custom.onnx as in approach 1, we can convert the model as follows:
cd aml_npu_sdk/acuity-toolkit/python/
./convert --model-name yolov8n-1024x576-custom \
--platform onnx \
--model yolov8n-1024x576-custom.onnx \
--mean-values '0 0 0 0.00392156' \
--quantized-dtype asymmetric_affine \
--source-files dataset.txt \
--batch-size 1 \
--iterations 1 \
--kboard VIM3 --print-level 1
We obtain 2 results in outputs/yolov8n-1024x576-custom/
libnn_yolov8n-1024x576-custom.so # runtime library for JeVois-Pro that will load the model
yolov8n-1024x576-custom.nb # model weights
For JeVois-Pro to run this model, we need to create a YAML zoo file yolov8n-1024x576-custom.yml as follows:
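Here is a minimal sketch of that zoo entry, assuming the keys used by the NPU entries in the stock npu.yml (compare with an existing entry there and adjust key names if needed; all quoted values below are placeholders, and the intensors/outtensors specs must be copied from the conversion outputs):

yolov8n-1024x576-custom:
  preproc: Blob
  nettype: NPU
  postproc: Detect
  detecttype: YOLOv8                                       # assumption: plain detection variant
  model: "dnn/custom/yolov8n-1024x576-custom.nb"           # adjust path to where you copied the files
  library: "dnn/custom/libnn_yolov8n-1024x576-custom.so"   # key name per the NPU entries in npu.yml
  intensors: "..."                                         # quantized input spec from the converter
  outtensors: "..."                                        # quantized output specs from the converter
  classes: "dnn/custom/my-classes.txt"                     # hypothetical file listing your class names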
Set the pipe parameter to NPU:Detect:yolov8n-1024x576-custom
You may want to try the conversion again with more iterations and see whether the resulting model is more accurate.
APPROACH 3 (most step-by-step): Using the NPU SDK conversion scripts
The NPU SDK provides 3 scripts in aml_npu_sdk/acuity-toolkit/demo/ that will need to be edited and then run in sequence on your desktop:
0_import_model.sh: Converts from the source framework (Caffe, TensorFlow, etc) to an intermediate representation, and then computes some stats about the ranges of values encountered on each layer on a sample dataset. These ranges of values will be used to set quantization parameters.
1_quantize_model.sh: Quantizes the model using asymmetric affine uint8, or dynamic fixed point int8 or int16. This yields the model that we will run on JeVois-Pro, in a .nb proprietary binary blob format.
2_export_case_code.sh: Creates some C code that can be compiled on a target platform (like JeVois-Pro) to create a standalone app that will load and run the model on one image. We will not use that code since the JeVois software provides its own code that directly links the model to the camera sensor and to the GUI. However, we will inspect it so we can get input and output specifications for our YAML zoo file.
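In other words, once the scripts have been edited as described below, the whole conversion boils down to:

cd aml_npu_sdk/acuity-toolkit/demo/
./0_import_model.sh && ./1_quantize_model.sh && ./2_export_case_code.sh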
Let's download YOLOv7-tiny which runs on 416x416 inputs. Either click on the latest Release and then on yolov7-tiny.weights in the list of assets at the bottom, or run this:
The file contains various commented out examples for various frameworks. Here we need to:
change the NAME
enable the Darknet blurb, comment out the other frameworks. Note: pegasus is run twice, first for import, and then to generate metadata about the input. We only replace the first one with the framework we want to import from.
(not needed for Darknet, maybe needed for other frameworks) set the input-size-list to our input size, which here is 1x3x416x416 according to yolov7-tiny.cfg.
we need to know the input pre-processing scale, mean, and stdev. YOLOv7-tiny just expects RGB images with pixel values in [0.0 .. 1.0]. Hence we will use mean=[0 0 0], stdev=[1 1 1], scale=1/255=0.0039215686, rgb=true.
Below, channel-mean-value expects 4 values: 3 means for 3 color channels, and 1 scale.
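For reference, here is a sketch of what the two edited pegasus invocations look like. Treat the exact flag spellings (e.g., --weights vs. --weight, --input-meta-output) as assumptions: keep whatever your copy of 0_import_model.sh already uses and only change the values discussed above.

#Darknet
$pegasus import darknet \
    --model ${NAME}.cfg \ # JEVOIS edited
    --weights ${NAME}.weights \ # JEVOIS edited
    --output-model ${NAME}.json \
    --output-data ${NAME}.data
# ...
# second pegasus run: generate the input metadata
$pegasus generate inputmeta \
    --model ${NAME}.json \
    --input-meta-output ${NAME}_inputmeta.yml \
    --channel-mean-value '0 0 0 0.0039215686' \ # JEVOIS edited: mean=0 0 0, scale=1/255
    --source-file dataset.txt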
Do not cut and paste the '# JEVOIS edited' comments above; they will break the code, so remove them.
Run it, it should complete with no errors:
./0_import_model.sh
You can check out yolov7-tiny_inputmeta.yml for a description of the input parameters, and yolov7-tiny.json for a description of the model graph.
If this step yields errors, some operations may not be supported, which often happens towards the end of the network. In that case, examine the network in Netron to see which layer failed to convert, then add an --outputs /some/layer parameter to truncate the network at that layer.
So we enable asymmetric affine uint8 quantization and also change the model NAME, ending up with this modified 1_quantize_model.sh:
#!/bin/bash
NAME=yolov7-tiny # JEVOIS edited
ACUITY_PATH=../bin/
pegasus=${ACUITY_PATH}pegasus
if [ ! -e "$pegasus" ]; then
pegasus=${ACUITY_PATH}pegasus.py
fi
$pegasus quantize \
--quantizer asymmetric_affine \ # JEVOIS edited
--qtype uint8 \ # JEVOIS edited
--rebuild \
--with-input-meta ${NAME}_inputmeta.yml \
--model ${NAME}.json \
--model-data ${NAME}.data
Run it, it should complete with no errors:
./1_quantize_model.sh
You can check out yolov7-tiny.quantize to see the min/max value range for each layer on the sample dataset, and how every layer was accordingly quantized. This file will be deleted when the next script runs.
5. Edit and run 2_export_case_code.sh
Here, we just need to set the model NAME, and also select the correct NPU model, which for the A311D processor in JeVois-Pro is VIPNANOQI_PID0X88. We end up with:
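A fragment showing just the two edited lines (the rest of the script stays as distributed; in our copy of the SDK the NPU target is selected with the --optimize option, so check your copy if it is named differently):

NAME=yolov7-tiny # JEVOIS edited
# ...
    --optimize VIPNANOQI_PID0X88 \ # JEVOIS edited: NPU variant in the Amlogic A311D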
All right, everything we need is in yolov7-tiny_nbg_unify/
Converted model: yolov7-tiny.nb
C code: vnn_yolov7tiny.c which we will inspect to derive our input and output tensor specifications.
6. Create YAML zoo file
We start with the YoloV4 entry from zoo file /jevoispro/share/dnn/npu.yml already in the JeVois microSD (and available in the Config tab of the GUI).
To get the specs of the quantized input and outputs, we inspect yolov7-tiny_nbg_unify/vnn_yolov7tiny.c and look for tensor definitions. We find this (comments added to explain the next step):
In the YAML file below, we will split this list by YOLO scales (here, the 9 w,h pairs correspond to 3 pairs for each of 3 scales; we separate scales with semicolons below).
Because YOLOv7 uses the "new style" of YOLO box coordinates, we need to disable Post-Processor sigmoid and set Post-Processor scalexy to 2.0. You would want that for YOLOv5/v7; set sigmoid to true and scalexy to 0.0 to use old-style box coordinates for YOLOv2/v3/v4. You can look at the differences in jevois::dnn::PostProcessorDetectYOLO::yolo_one().
We just modify the name and file locations, and bring all the global definitions into our file (like preproc, nettype, etc., which were set globally in npu.yml and hence not repeated in the YoloV4 entry we are copying from). We end up with the following yolov7-tiny.yml (pre-processor mean and scale are as we used them in 0_import_model.sh):
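A sketch of the resulting entry, with the keys discussed above; the intensors/outtensors specs, anchors, and class list are placeholders to fill in from vnn_yolov7tiny.c and yolov7-tiny.cfg:

yolov7-tiny:
  preproc: Blob
  mean: "0 0 0"
  scale: 0.0039215686
  rgb: true
  nettype: NPU
  model: "dnn/custom/yolov7-tiny.nb"   # adjust path to where you copied the .nb file
  intensors: "..."                     # quantized input spec from vnn_yolov7tiny.c
  outtensors: "..."                    # the three quantized output specs from vnn_yolov7tiny.c
  postproc: Detect
  detecttype: RAWYOLO                  # assumption: raw YOLO outputs decoded by the JeVois post-processor
  anchors: "...; ...; ..."             # 9 w,h pairs from yolov7-tiny.cfg, 3 per scale, scales separated by ';'
  sigmoid: false
  scalexy: 2.0
  classes: "..."                       # file listing the 80 COCO class names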
It works! Quite fast too, around 55 fps for the network inference only (the part that runs on NPU).
Note
You can repeat this tutorial using int8-DFP quantization instead in 1_quantize_model.sh. You then only need to change the intensors and outtensors specs in your YAML to the obtained DFP parameters. We did it, and the network is running fine, but about half the speed (about 22.5 fps). So AA quantization is the way to go with this network.
Tips
If you get a "Graph verification failed" when you try to run your network, maybe you entered the tensor specifications incorrectly. Under the Config tab of the GUI you can edit your yolov7-tiny.yml and fix it.
Khadas has their own docs which you may want to check out since their VIM3 board uses the same Amlogic A311D processor as JeVois-Pro. But note that on JeVois we skip using the generated C code as JeVois-Pro already provides all the code we need to run NPU models.
When converting from pyTorch, if you get some strange error like
RuntimeError: [enforce fail at inline_container.cc:208] . file not found: archive/constants.pkl
maybe your model was saved with a version of pyTorch more recent than the one included in the NPU SDK. You may need to save your model to ONNX instead, and then try to convert again from ONNX. Or it might be that the model uses layer types or operations that the NPU does not support, in which case converting to ONNX will not help. You would need to change your network to only include layers and operations that can be mapped to NPU.
Note
Thus far, we have not been able to successfully convert directly from pyTorch using the NPU SDK; we always get some kind of error. But first exporting the source model to ONNX and then running the NPU SDK on that ONNX file works fine, as long as the source model only uses NPU-supported operations.
Go through the installation steps of the model's repository (here, YOLOv10), including creating a conda environment and getting all dependencies.
You can also retrain on a custom dataset at this stage.
To export to ONNX, using opset 13 gave some conversion errors, so we used opset 12. We can also set a custom image resolution in that step using the imgsz parameter:
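For example, with the ultralytics-style CLI provided by the YOLOv10 repository (a sketch; imgsz is height,width for our 512x288 input):

yolo export model=yolov10n.pt format=onnx imgsz=288,512 opset=12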
Some layers towards the end of the network were giving size errors, maybe because of the different opset. In any case, we wanted the raw YOLO output. Thus, we inspected the network in Netron and decided to use /model.23/Transpose_output_0 as output layer in 0_import_model.sh:
# ...
#Onnx
$pegasus import onnx \
--model ${NAME}.onnx \
--output-model ${NAME}.json \
--outputs /model.23/Transpose_output_0 \
--output-data ${NAME}.data
# ...
It converted fine. However, it was not generating any boxes on the display. Upon inspection of the output tensor, confidence values for all classes were always zero. Looking at the yolov10n-512x288.quantize file generated during conversion showed a range of [-108.24 .. 611.29] for the values in the output tensor. Indeed, that output tensor, which is 1x3024x84 for 1x3x288x512 inputs, concatenates the 4 box coordinates (which can vary within 512x288, and beyond when boxes extend partially outside the input image) with the 80 class confidences (in [0..1]). That means that with 8 bits of quantization, any class confidence in [0..1] would always map to the same number (the zero-point of the quantization):
0 quantized maps to -108.24 dequantized
1 quantized maps to -105.43 dequantized (quantization step is (611.29+108.24)/256 = 2.81)
...
255 quantized maps to 611.29 dequantized
So with a quantization step of 2.81, we cannot discriminate values in the [0..1] range.
So we changed the quantization to dynamic_fixed_point and int16. The result used 5 bits for the fractional part, which is enough to represent class confidences reasonably well. Another approach, which we use for YOLOv8 and later, is to output separate tensors for class probabilities and bounding boxes. This is what we did in Approach 1 and Approach 2.
This worked well, and YOLOv10n for NPU is now included in the JeVois distribution and microSD image.
However, a 16-bit quantized model is slower than an 8-bit one. This is why we now prefer to output raw tensors before concatenation, as we did in Approach 1 above.