
Integration with tiny-dnn

0 votes
Is there a preferred way or library for converting the image stream between the format that the camera sensor on JeVois produces (YUYV, for the current problem) and the input format that the convolutional neural network supports, tiny_dnn::vec_t?

On a broader note: even a conversion between OpenCV's cv::Mat type and tiny-dnn's vec_t would suffice, since the memory layout of an OpenCV image and of the image type supported by tiny-dnn are different.

Does anybody have any experience with this? Any help would be much appreciated.
asked Aug 17 in Programmer Questions by Bilal (170 points)

1 Answer

0 votes
Have a look at DemoSalGistFaceObj.C in jevoisbase, starting at line 191:

For both greyscale and color, we convert from YUYV to either an RGB or GRAY cv::Mat in there. That cv::Mat is then sent to process() of the ObjectRecognition component.
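For reference, the YUYV to RGB step can be done by hand as well. Below is a minimal, self-contained sketch (not the jevoisbase code, which uses its own conversion helpers): each 4-byte YUYV group Y0 U Y1 V yields two RGB pixels sharing the same chroma, converted with the common BT.601 full-range formulas in fixed-point arithmetic.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Clamp an int to the 0..255 byte range:
static uint8_t clamp8(int v) { return static_cast<uint8_t>(std::min(255, std::max(0, v))); }

// Convert a packed YUYV (YUV 4:2:2) buffer to interleaved RGB.
// Each 4-byte group [Y0 U Y1 V] produces two pixels that share U and V.
// Coefficients are BT.601 full-range scaled by 65536 for integer math.
std::vector<uint8_t> yuyvToRgb(std::vector<uint8_t> const & yuyv, int width, int height)
{
    std::vector<uint8_t> rgb(static_cast<size_t>(width) * height * 3);
    size_t o = 0;
    for (size_t i = 0; i + 3 < yuyv.size(); i += 4)
    {
        int const u = yuyv[i + 1] - 128, v = yuyv[i + 3] - 128;
        for (int k = 0; k < 2; ++k)
        {
            int const y = yuyv[i + 2 * k];
            rgb[o++] = clamp8(y + (91881 * v >> 16));               // R = Y + 1.402 V
            rgb[o++] = clamp8(y - ((22554 * u + 46802 * v) >> 16)); // G = Y - 0.344 U - 0.714 V
            rgb[o++] = clamp8(y + (116130 * u >> 16));              // B = Y + 1.772 U
        }
    }
    return rgb;
}
```

In practice you would just use OpenCV's cvtColor with the appropriate YUV-to-RGB conversion code, but having the formulas explicit makes it easier to sanity-check values when debugging.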

Now if you look at process() in ObjectRecognition.C, you will see how we convert in there from cv::Mat (either gray or RGB) to a tiny_dnn vec_t.

This has worked well for us for MNIST (gray) and CIFAR (color).
answered Aug 18 by JeVois (11,490 points)
I understand this pipeline, but there are significant accuracy differences when using this method with a CIFAR (color) tiny-dnn network.

The trained network by itself is 100% accurate when reading from saved image files.

Looking at those same images through the camera, the results are completely off; recognition does not even happen.

Could there be any other reason for this ?
Hmm, we have not used CIFAR in a while, so if you are using color, maybe a bug has crept into that code. Can you use the OpenCV highgui to display the cv::Mat that is received by ObjectRecognition::process(), and compare that to displaying the vec_t images during training (reversing the code we have in process(), so that you go from vec_t to Mat instead of from Mat to vec_t)? It might also be worth just checking the ranges of values (e.g., 0 to 255, -1.0 to 1.0, or whatever else).
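That reversal can be sketched like this, assuming the network input was scaled from [0, 255] into some [minv, maxv] range with planar channels (both assumptions; match them to whatever your Mat-to-vec_t code actually does). The resulting byte buffer can then be wrapped in a cv::Mat and shown with cv::imshow for a side-by-side comparison.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Stand-in for tiny_dnn::vec_t so the sketch is self-contained:
using vec_t = std::vector<float>;

// Map a planar [minv,maxv] float image back to interleaved 0..255 bytes,
// suitable for wrapping in a cv::Mat (e.g. cv::Mat(h, w, CV_8UC3, buf.data()))
// and displaying with cv::imshow for debugging.
std::vector<uint8_t> vecToBytes(vec_t const & v, int width, int height, int channels,
                                float minv = -1.0f, float maxv = 1.0f)
{
    std::vector<uint8_t> out(v.size());
    float const scale = 255.0f / (maxv - minv);
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
            {
                float const f = (v[(static_cast<size_t>(c) * height + y) * width + x] - minv) * scale;
                out[(static_cast<size_t>(y) * width + x) * channels + c] =
                    static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, std::round(f))));
            }
    return out;
}
```

If the training-time images and the live camera images look visibly different after this round trip (wrong colors, wrong range, channel swap), the conversion is the culprit rather than the network.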

Or maybe what you need is some domain adaptation and fine-tuning. We were quite surprised, after training the MNIST network and getting 99.2% correct on the test set, that we were getting very bad results live if we just passed a greyscale live image to it. So we added some extra machinery in DemoSalGistFaceObj to binarize, crop, center, resize, etc. the digit before passing it to MNIST, and it helped a lot. Deep networks often become very specific to the dataset they were trained on. By adding a bit of training using images captured by the JeVois camera, you may be able to fine-tune the network to perform better on such images. There is quite a bit of literature on "domain adaptation" and "fine-tuning" that might be of interest here (if indeed there is no bug in the Mat to vec_t conversion).