Great question! There is a lot of work these days on hybrid CPU+GPU deep networks on embedded systems, but I don't think TensorFlow implements this yet. On the other hand, the GPU in JeVois is much slower than its CPU: the CPU is 4x1344 MHz but the GPU is only 2x408 MHz (of course, this is hard to compare since those are very different kinds of processing units). So it is not clear that much would be gained for the effort. This is very different from desktops with ~10 CPU cores vs ~1000 GPU cores; there, a big gain is obtained by using the GPU.
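
To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. It only multiplies core count by clock rate for the figures quoted above. That product is a crude proxy I am assuming for illustration, not a real throughput measure, since CPU and GPU cores do very different amounts of work per cycle. The names `aggregate_mhz`, `CPU` and `GPU` are made up for this example.

```python
# Figures quoted in the post: (number of cores, clock in MHz).
CPU = (4, 1344)
GPU = (2, 408)


def aggregate_mhz(cores, clock_mhz):
    """Cores times clock: a crude proxy, NOT a measure of real throughput."""
    return cores * clock_mhz


cpu_total = aggregate_mhz(*CPU)
gpu_total = aggregate_mhz(*GPU)
ratio = cpu_total / gpu_total

print(f"CPU: {cpu_total} MHz aggregate, GPU: {gpu_total} MHz aggregate")
print(f"CPU/GPU ratio: {ratio:.1f}x")
```

By this naive measure the CPU side has several times the aggregate clock of the GPU side, which is the opposite of the desktop situation, so offloading to the GPU is unlikely to help much here.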