Welcome to JeVois Tech Zone, where you can ask questions and receive answers from other members of the community.

What pixel format for running a simple ported tensorflow network at 120fps?

0 votes

Firstly, thanks for making a great product. It's awesome.

I have a bee detection project that I have running in TF on a Raspberry Pi (I'm battling to get it working on the Pi using a Movidius Neural Compute Stick due to bugs in their TensorFlow support, but that's another story). For now I want to spend some time porting this to the JeVois.

My network is not a standard classification network, so I won't be able to use TensorFlowEasy directly (though I'm sure it's going to be a great reference point for code). Instead it's a fully convolutional network going from an (H, W) camera input to a binary (H/2, W/2) image (representing bee / no bee). See the blog post linked above for more info.

As I understand it, to run at 120fps I can't use RGB (since the conversion from the native camera sensor format to RGB will be too slow). Instead I'm going to have to work in BAYER RGGB (or YUYV). Is this correct?

Since I'm training the network from scratch, and conv nets can take whatever input format, I'm happy to work in BAYER (or whatever). I just need a way of capturing data in this encoding that I can ship to my desktop (to train a model there using that format).

What I've done so far is just capture some sample video (using the PassThrough module & ffmpeg):

YUYV 176 144 120.0 YUYV 176 144 120.0 JeVois PassThrough  # mapping

ffmpeg -f v4l2 -pixel_format yuyv422 -video_size 176x144 -framerate 120 -i /dev/video0 output.mp4

Is there a simple way to convert an output.mp4 like this back to the Bayer format? I know for RGB encoding my tensors are (H, W, 3), and I guess what I'm really trying to understand for Bayer is the equivalent tensor structure... I'm a total pixel-format newb, so any insights much appreciated :)
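To make my question concrete, here is my current (possibly wrong) mental model of the Bayer tensor structure as a numpy sketch, assuming an RGGB mosaic; the four-plane packing at the end is just one option I'm considering, not anything JeVois-specific:

```python
import numpy as np

# Fake 176x144 Bayer frame: a single-channel (H, W) uint8 tensor,
# i.e. 1 byte per pixel, with colour encoded by position in the 2x2 mosaic.
H, W = 144, 176
bayer = (np.arange(H * W) % 256).astype(np.uint8).reshape(H, W)

# RGGB: R at (even row, even col), G at the two mixed positions,
# B at (odd row, odd col). Each plane is (H/2, W/2).
r  = bayer[0::2, 0::2]
g1 = bayer[0::2, 1::2]
g2 = bayer[1::2, 0::2]
b  = bayer[1::2, 1::2]

# One option for feeding a conv net: stack the four planes into
# (H/2, W/2, 4), which conveniently matches my (H/2, W/2) output.
packed = np.stack([r, g1, g2, b], axis=-1)
print(packed.shape)  # (72, 88, 4)
```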

Mat

[edit]: after thinking about this a bit more, it might be that the pixel format conversion is the least of my problems; my network has multiple convolutions, upsampling through nearest-neighbour interpolation (though it could be deconvolution if required), as well as big chunks of channel concatenation... I should get everything going with the simple RGB conversion (even if it's on the CPU) and actually benchmark everything...

asked Jun 5 in Programmer Questions by matpalm (170 points)

1 Answer

+1 vote
 
Best answer

Great project! At 176x144, I think converting to BGR would not in itself be out of the question, but certainly any TensorFlow model will have to be tiny to run at that speed and that resolution. So your idea of using raw Bayer is great. I actually looked into that a while back and there is relevant research about it (search Google for "bayer neural network"), but most of it is about writing a CNN to do demosaicing, not about using raw Bayer as input.

We are the only ones I know of who provide conversion into BAYER. See http://jevois.org/doc/RawImageOps_8C_source.html and look for bgrToBayer() and convertCvBGRtoBayer().

Then check this one out: http://jevois.org/tutorials/ProgrammerPythonSaveImages.html

So you could convert to Bayer and then save to microSD.
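For desktop-side experiments, the BGR-to-Bayer subsampling those functions perform can be approximated in a few lines of numpy. This is just my rough sketch of the idea (assuming the RGGB pattern and simply dropping the non-mosaic samples), not the actual JeVois C++ implementation:

```python
import numpy as np

def bgr_to_bayer_rggb(bgr):
    """Subsample a (H, W, 3) BGR image into a (H, W) RGGB Bayer mosaic."""
    h, w, _ = bgr.shape
    bayer = np.empty((h, w), dtype=bgr.dtype)
    bayer[0::2, 0::2] = bgr[0::2, 0::2, 2]  # R (channel 2 in BGR order)
    bayer[0::2, 1::2] = bgr[0::2, 1::2, 1]  # G
    bayer[1::2, 0::2] = bgr[1::2, 0::2, 1]  # G
    bayer[1::2, 1::2] = bgr[1::2, 1::2, 0]  # B (channel 0 in BGR order)
    return bayer

# Tiny demo: a uniform 4x4 image with B=10, G=20, R=30 everywhere.
bgr = np.dstack([np.full((4, 4), c, dtype=np.uint8) for c in (10, 20, 30)])
bayer = bgr_to_bayer_rggb(bgr)
print(bayer[:2, :2])  # top-left 2x2 block shows the R G / G B pattern
```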

But for the purpose of getting training data, I would recommend a variation on that theme: forget about conversion altogether and grab in Bayer. Configure JeVois as PassThrough with BAYER and program a C++ module to run on your host. In that module, grab the Bayer stream from JeVois into a RawImage (see http://jevois.org/doc/ModuleTutorial.html), use jevois::rawimage::cvImage() to reinterpret it as an OpenCV array (with no copy or conversion), and save it. Bayer has 1 byte per pixel, so it will look like a greyscale image with nasty banding (because of the Bayer mosaic). Your host will have much more horsepower and disk bandwidth to save the greyscale images using OpenCV's imwrite(). So you just run PassThrough on JeVois and then run jevois-daemon on the host to capture BAYER from JeVois and save to disk.

Alternatively, you might be able to just use plain OpenCV on your host: grab the video in Bayer using an OpenCV VideoCapture (if it can support Bayer?) and save it frame by frame using imwrite().
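The reinterpretation step is cheap on the host side too. A minimal Python sketch, assuming 176x144 RGGB at 1 byte per pixel; the fake buffer below stands in for whatever jevois-daemon or a VideoCapture hands you:

```python
import numpy as np

W, H = 176, 144  # JeVois mapping resolution from the question

def bayer_frame_from_bytes(buf, width=W, height=H):
    """Wrap a raw RGGB Bayer buffer (1 byte/pixel) as an (H, W) uint8 array."""
    frame = np.frombuffer(buf, dtype=np.uint8)
    assert frame.size == width * height, "unexpected frame size"
    return frame.reshape(height, width)

# Fake frame standing in for a real USB capture.
raw = bytes(range(256)) * (W * H // 256)
frame = bayer_frame_from_bytes(raw)
print(frame.shape)  # (144, 176): looks like greyscale with Bayer banding
# cv2.imwrite("frame_0001.png", frame) would then save it losslessly.
```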

Once your network is trained and accepts greyscale tensors as input, our TensorFlow code should work out of the box since it works with MNIST (which is also greyscale).

answered Jun 7 by JeVois (30,640 points)
selected Jun 7 by matpalm
Great thanks, that's perfect. I hadn't thought of using jevois-daemon to do this capture on the host, but it makes sense.

I understand the model is going to have to be tiny to run at that framerate but that's part of the interest for me on this one.

I'm quite interested in the tradeoff between fps and accuracy, especially with respect to the ops used in the network, and I have very intentionally implemented this as a simple network that uses minimal operations. See http://matpalm.com/blog/imgs/2018/bnn/network.png

* striding for downsampling (as opposed to pooling)
* nearest neighbour upsampling (as opposed to deconvolution)

and I have other things I can play with to make things even faster:

* removal of skip connections
* converting to depthwise-separable convolutions (e.g. https://arxiv.org/abs/1704.04861)
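A quick back-of-envelope on the depthwise-separable savings (parameter counts only, biases ignored; FLOPs scale roughly the same way, so this is the kind of multiplier I'm hoping for on fps):

```python
def standard_conv_params(k, c_in, c_out):
    """Parameters in a standard k x k convolution (no biases)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise convolution plus a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3, 32 channels in, 64 channels out.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)        # 3*3*32*64  = 18432
sep = depthwise_separable_params(k, c_in, c_out)  # 288 + 2048 = 2336
print(std, sep, round(std / sep, 1))  # roughly 7.9x fewer parameters
```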

Thanks for the pointers!
Looks great! Yes, depthwise does wonders for FPS on MobileNets here, so definitely something to try out. Striding and/or dilated kernels are a great idea as well; who cares about Nyquist when speed is the goal...
...