AlexeyAB has an XNOR version of tiny yolov3 (link attached). Are you looking into a video mode for XNOR?

https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_xnor.cfg

Do you think XNOR is a path for improved performance over NNPACK/NEON methods?
asked Oct 7, 2018 in Programmer Questions by spinoza1791 (170 points)

1 Answer

Certainly of interest!

Looking at the related commits from AlexeyAB, I see a lot of modified CUDA code but not much modified CPU code. Since we don't run CUDA, this may take some work.

Do you know whether OpenCV DNN supports XNOR? This might be an easier path than the original (or AlexeyAB) darknet code.
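
For reference, this is what that easier path would look like if the parser accepts the cfg; a minimal sketch, assuming OpenCV >= 3.4.2 for getUnconnectedOutLayersNames(), with placeholder file names. Whether the xnor=1 layers in AlexeyAB's cfg actually load is exactly the open question:

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Run one forward pass through a darknet model via OpenCV DNN and
// return the raw YOLO output blobs. File names are placeholders.
std::vector<cv::Mat> runDarknetModel(const cv::Mat& frame)
{
    // readNetFromDarknet() is OpenCV's standard entry point for darknet
    // cfg/weights pairs; it will throw if it hits an unsupported layer.
    static cv::dnn::Net net = cv::dnn::readNetFromDarknet(
        "yolov3-tiny_xnor.cfg", "yolov3-tiny_xnor.weights");

    // Standard yolo preprocessing: scale to [0,1], resize to the net size.
    cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0,
                                          cv::Size(416, 416), cv::Scalar(),
                                          /* swapRB = */ true, /* crop = */ false);
    net.setInput(blob);
    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());
    return outs;
}
```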
answered Oct 11, 2018 by JeVois (46,580 points)
Yes, if quantized/XNOR support were bundled into OpenCV DNN, that would be slick!  I have not seen it yet.

However, I know that TensorFlow Lite (mobilenet V2 quantized) can work on a Raspberry Pi (similar CPU arch to JeVois, right?): https://www.tensorflow.org/lite/rpi
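
"Work" here just means the usual TFLite inference loop; a minimal sketch, assuming a quantized uint8 mobilenet model (file name is a placeholder, and header paths moved from tensorflow/contrib/lite/ to tensorflow/lite/ across TF versions):

```cpp
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <memory>

// Classify one 224x224x3 uint8 RGB image with a quantized mobilenet;
// returns the index of the top-scoring ImageNet class (1001 outputs).
int classify(const uint8_t* rgb224)
{
    auto model = tflite::FlatBufferModel::BuildFromFile("mobilenet_v2_quant.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->AllocateTensors();

    // Quantized model: input and output tensors are uint8, no float conversion.
    std::memcpy(interpreter->typed_input_tensor<uint8_t>(0), rgb224, 224 * 224 * 3);
    interpreter->Invoke();
    const uint8_t* scores = interpreter->typed_output_tensor<uint8_t>(0);
    return static_cast<int>(std::max_element(scores, scores + 1001) - scores);
}
```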

Also, here is an even lighter yolo version from Alexey that is meant for CPU and GPU, so CUDA would only be needed for training with Nvidia; inference can use the CPU: https://github.com/AlexeyAB/yolo2_light

I doubt quantized mobilenetv2 with TF would be any faster than tiny-yolo with NNPACK for NEON, but I haven't tested.  I am thinking a DarkFlow implementation of TF Lite would be interesting...

Here is an example of an optimized NNPACK fork (40% faster than the original; I've confirmed this on a Pi) with an interesting (slower) option to use the Pi GPU/QPU: https://github.com/shizukachan/darknet-nnpack

~Andrew
Also, I received this reply from AlexeyAB via GitHub:

Re: [AlexeyAB/darknet] Will XNOR yolo work on ARMv7? (#1751)

Alexey replied:

I didn't test XNOR on an ARM CPU. Also, XNOR currently doesn't use SIMD instructions on ARM, since the XNOR implementation is optimized for AVX2 instructions on Intel CPUs.

It should work on ARM if you use the GCC compiler with the built-in popcnt instruction: https://github.com/AlexeyAB/darknet/blob/7ee4135910624f11e80de36b236208b223f58eb4/src/gemm.c#L1640
But you should compile with OPENMP=1 AVX=0 in the Makefile.
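
To make concrete what that gemm.c line is doing, here is a minimal sketch of the XNOR + popcount inner product (an illustration of the idea, not the darknet code itself; the function name and packing layout are invented for the example). With weights and activations binarized to +/-1 and packed 64 bits per word, a dot product becomes an XNOR followed by a population count; __builtin_popcountll is the GCC builtin Alexey refers to, and on ARM it lowers to NEON vcnt sequences:

```cpp
#include <cstdint>

// Dot product of two +/-1 vectors of length 64 * nwords, each packed as
// bit vectors (bit set = +1, bit clear = -1).
// Build as Alexey suggests: make OPENMP=1 AVX=0
int xnor_dot(const uint64_t* a, const uint64_t* b, int nwords)
{
    int matches = 0;
    for (int i = 0; i < nwords; ++i)
        matches += __builtin_popcountll(~(a[i] ^ b[i]));  // bits where signs agree

    // dot = (#matches) - (#mismatches) = 2 * matches - total bits
    return 2 * matches - 64 * nwords;
}
```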
Excellent, thanks much, we should be able to make it work! Realistically, next week is shot because of ARM TechCon, but this will be next on our todo list after that. Do you have pretrained weights we could use for testing? That would significantly speed up the process.

We already have support for mobilenets v2 in TensorFlowEasy; see the commented-out entries in its params.cfg at http://jevois.org/moddoc/TensorFlowEasy/modinfo.html

But we do not have the quantized weights, and mobilenet v2 with float weights is slower than quantized mobilenet v1, which is why v1 is still the one enabled by default. But remember that these are a different kind of network (recognition only, as opposed to detection plus recognition in yolo). If you have pointers to the quantized mobilenet v2 weights for ImageNet, that would be great too; we could add them to our standard distro.
...