Can TensorFlow Lite use the GPU?

I am running a custom TensorFlow model using TensorFlow Lite. From the results it seems the GPU is not really used. Is the TensorFlow Easy model using the GPU or not, and if not, can I activate it (and how)?

Edit:

Follow-up question: Is there any framework that allows running custom deep nets on the JeVois using the GPU?

Thanks already :)

asked Jun 6, 2018 in Programmer Questions by phildue (420 points)
edited Jun 7, 2018 by phildue

1 Answer

Best answer

Great question! There is a lot of work these days on hybrid CPU+GPU deep networks on embedded systems, but I don't think TensorFlow implements this yet. On the other hand, the GPU in JeVois is much slower than its CPU: the CPU is 4x1344 MHz but the GPU is only 2x408 MHz (of course, this is hard to compare since those are very different kinds of processing units). So it is not clear that much would be gained for the effort. This is very different from desktops with ~10 CPU cores vs ~1000 GPU cores; there, a big gain is obtained by using the GPU.
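To put the two clock figures side by side, here is a rough back-of-the-envelope sketch in Python. It simply multiplies core count by clock rate, which is a crude assumption: it ignores SIMD width, memory bandwidth, and the fact that CPU and GPU cores do very different amounts of work per cycle. The name `aggregate_mhz` is made up for this illustration.

```python
# Clock figures quoted in the answer above (MHz).
CPU_CORES, CPU_MHZ = 4, 1344
GPU_CORES, GPU_MHZ = 2, 408


def aggregate_mhz(cores: int, mhz: int) -> int:
    """Cores times clock rate: a crude upper bound, not a benchmark."""
    return cores * mhz


cpu_total = aggregate_mhz(CPU_CORES, CPU_MHZ)
gpu_total = aggregate_mhz(GPU_CORES, GPU_MHZ)
ratio = cpu_total / gpu_total

print(f"CPU: {cpu_total} core-MHz, GPU: {gpu_total} core-MHz, ratio: {ratio:.1f}x")
```

By this naive measure the CPU has several times the raw cycle budget of the GPU, which is the opposite of the desktop situation described above.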

answered Jun 7, 2018 by JeVois (46,580 points)
selected Nov 13, 2018 by phildue

Thank you a lot for the answer! This leaves me with another question though: I run a network smaller than TinyYolo using TensorFlow Lite and I get an average inference time of 3.5s although I use 4 processing threads. The Darknet YOLO version, in contrast, achieves ~1.5s on average and is, as far as I understand, running on the GPU. Do you know what leads to this big difference in inference time?

It also seems that there is no difference between choosing 1 and 4 threads in TensorFlow Lite.
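One way to check whether the thread setting has any effect is to time the same inference at several thread counts. The sketch below is a generic harness in Python, not the JeVois code: `make_dummy_runner` is a hypothetical stand-in that ignores its argument, and you would replace it with a closure that configures your interpreter with the given thread count (for instance through `SetNumThreads` on the TensorFlow Lite C++ interpreter, if your build exposes it) and runs one inference.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_inference(run_once: Callable[[], None], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of one call to run_once."""
    run_once()  # warm-up call, not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_thread_counts(
    make_runner: Callable[[int], Callable[[], None]],
    thread_counts: Iterable[int] = (1, 2, 4),
) -> Dict[int, float]:
    """Time the same workload once per requested thread count."""
    return {n: time_inference(make_runner(n)) for n in thread_counts}


def make_dummy_runner(num_threads: int) -> Callable[[], None]:
    """Hypothetical placeholder: swap in a closure that runs your model
    with num_threads threads. This one ignores num_threads on purpose."""
    def run_once() -> None:
        sum(i * i for i in range(20_000))
    return run_once


if __name__ == "__main__":
    for n, seconds in compare_thread_counts(make_dummy_runner).items():
        print(f"{n} thread(s): {seconds * 1000:.2f} ms")
```

If the medians are the same for every thread count with the real model plugged in, the setting is probably not reaching the kernels that dominate the runtime.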

commented Jun 7, 2018 by phildue (420 points)
edited Jun 7, 2018 by phildue

I am certainly with Joseph Redmon (of Darknet YOLO) on that one: what matters is the complexity (number of multiply-accumulate operations) more than just network size.

Have a look at his tests here: https://pjreddie.com/darknet/tiny-darknet/

His Darknet reference model is much bigger than SqueezeNet but runs faster (fewer operations) and is more accurate.

For the threads, we have not played with that too much, but we do see near 400% CPU usage when running TensorFlow MobileNets, so parallelism is happening for sure.

Note that on embedded systems there is another variable, which is ARM NEON acceleration (the equivalent of SSE on Intel processors). Darknet uses the NNPACK package, which uses NEON copiously. I am not sure how far along the TensorFlow people are on this front. You may also be interested in the ARM Compute Library and ARM NN SDK, which have plenty of acceleration with NEON, GPU, etc., but those will require some work to transfer a trained Caffe or TensorFlow model to these frameworks.
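A small Python sketch of why model size and operation count can diverge. The layer shapes are made up for illustration and are not taken from Darknet or SqueezeNet; the point is only that a convolution reuses each weight at every output pixel, while a fully connected layer uses each weight once.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w):
    """Return (parameters, multiply-accumulates) for a bias-free conv layer."""
    params = kernel * kernel * in_ch * out_ch
    macs = params * out_h * out_w  # every weight is applied at every output pixel
    return params, macs


def dense_cost(in_features, out_features):
    """Return (parameters, multiply-accumulates) for a bias-free dense layer."""
    params = in_features * out_features
    return params, params  # every weight is used exactly once


# Illustrative shapes only (assumptions, not from any specific network).
conv_params, conv_macs = conv2d_cost(in_ch=16, out_ch=32, kernel=3, out_h=104, out_w=104)
fc_params, fc_macs = dense_cost(in_features=4096, out_features=1000)

print(f"conv : {conv_params:>9,} params, {conv_macs:>11,} MACs")
print(f"dense: {fc_params:>9,} params, {fc_macs:>11,} MACs")
```

Here the convolution has far fewer parameters than the dense layer yet needs many more multiply-accumulates, so a "bigger" model on disk can still be the cheaper one to run.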

commented Jun 8, 2018 by JeVois (46,580 points)

Totally agree that it is the number of operations rather than parameters. When I was talking about "a smaller network than TinyYolo" I actually referred to a model that has the same architecture as TinyYolo but uses fewer filters on each layer and was trained with a slightly different loss function. So it would definitely have fewer operations than TinyYolo.

That's why I think the main difference is the fact that Darknet uses NEON acceleration already.
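As a rough illustration of that claim, the Python sketch below counts multiply-accumulates for a hypothetical four-layer stack of 3x3 convolutions and for the same stack with half the filters per layer. The widths and feature-map sizes are assumptions chosen for the example, not the actual TinyYolo configuration.

```python
def stack_macs(widths, sizes, in_ch=3, kernel=3):
    """Total multiply-accumulates for a chain of bias-free conv layers.

    widths: number of filters in each layer
    sizes:  side length of each layer's (square) output feature map
    """
    total = 0
    for out_ch, size in zip(widths, sizes):
        total += kernel * kernel * in_ch * out_ch * size * size
        in_ch = out_ch  # this layer's filters are the next layer's input channels
    return total


# Hypothetical configuration (not the real TinyYolo layer list).
sizes = [208, 104, 52, 26]
full_widths = [16, 32, 64, 128]
half_widths = [w // 2 for w in full_widths]

full = stack_macs(full_widths, sizes)
half = stack_macs(half_widths, sizes)

print(f"full width: {full:,} MACs")
print(f"half width: {half:,} MACs")
print(f"reduction : {full / half:.1f}x")
```

Halving the filters cuts the cost of every interior layer by about four, since both its input and output channel counts shrink; only the first layer, whose input is the fixed 3-channel image, drops by two.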

Thanks a lot for the support!

commented Jun 10, 2018 by phildue (420 points)
