JeVoisBase  1.20
JeVois Smart Embedded Machine Vision Toolkit Base Modules
TensorFlowSingle.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <TensorFlow.H>

// icon from tensorflow youtube

//! Identify objects using TensorFlow deep neural network
/*! TensorFlow is a popular neural network framework. This module identifies the object in a square region in the
    center of the camera field of view using a deep convolutional neural network.

    The deep network analyzes the image by filtering it using many different filter kernels, and several stacked passes
    (network layers). This essentially amounts to detecting the presence of both simple and complex parts of known
    objects in the image (e.g., from detecting edges in lower layers of the network to detecting car wheels or even
    whole cars in higher layers). The last layer of the network is reduced to a vector with one entry per known kind of
    object (object class). This module returns the class names of the top-scoring candidates in the output vector, if
    any have scored above a minimum confidence threshold. When nothing is recognized with sufficiently high confidence,
    there is no output.

    \youtube{TRk8rCuUVEE}

    This module runs a TensorFlow network and shows the top-scoring results. Larger deep networks can be a bit slow,
    hence the network prediction is only run once in a while. Point your camera towards some interesting object, make
    the object fit in the picture shown at right (which will be fed to the neural network), keep it stable, and wait
    for TensorFlow to tell you what it found. The framerate figures shown at the bottom left of the display reflect
    the speed at which each new video frame from the camera is processed; in this module this just amounts to
    converting the image to RGB, sending it to the neural network for processing in a separate thread, and creating
    the demo display. Actual network inference speed (time taken to compute the predictions on one image) is shown at
    the bottom right. See below for how to trade off speed and accuracy.

    Note that by default this module runs different flavors of MobileNets trained on the ImageNet dataset. There are
    1000 different kinds of objects (object classes) that these networks can recognize (too long to list here). The
    input layer of these networks is 299x299, 224x224, 192x192, 160x160, or 128x128 pixels by default, depending on
    the network used. This module takes a crop at the center of the video image, with size determined by the USB
    video size: the crop size is USB output width - 16 - camera sensor image width. With the default network
    parameters, this module hence requires at least 320x240 camera sensor resolution. The networks provided on the
    JeVois microSD image have been trained on large clusters of GPUs, using 1.2 million training images from the
    ImageNet dataset.

    For more information about MobileNets, see
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

    For more information about the ImageNet dataset used for training, see
    http://www.image-net.org/challenges/LSVRC/2012/

    Sometimes this module will make mistakes! The performance of MobileNets is about 40% to 70% correct (mean average
    precision) on the test set, depending on network size (bigger networks are more accurate but slower).

    Neural network size and speed
    -----------------------------

    When using a video mapping with USB output, the cropped window sent to the network is automatically sized to a
    square whose side is the difference between the USB output video width and the camera sensor input width, minus
    16 pixels (e.g., when the USB video mode is 560x240 and the camera sensor mode is 320x240, the crop will be
    224x224, since 224 = 560 - 16 - 320).

    The network's actual input size varies depending on which network is used; for example,
    mobilenet_v1_0.25_128_quant expects 128x128 input images, while mobilenet_v1_1.0_224 expects 224x224. We
    automatically rescale the cropped window to the network's desired input size. Note that there is a cost to
    rescaling, so, for best performance, you should set the USB output width to the camera sensor width + 16 +
    network input width.

    For example:

    - with USB output 464x240 (crop size 128x128), mobilenet_v1_0.25_128_quant (network size 128x128) runs at about
    8ms/prediction (125 frames/s).

    - with USB output 464x240 (crop size 128x128), mobilenet_v1_0.5_128_quant (network size 128x128) runs at about
    18ms/prediction (55 frames/s).

    - with USB output 560x240 (crop size 224x224), mobilenet_v1_0.25_224_quant (network size 224x224) runs at about
    24ms/prediction (41 frames/s).

    - with USB output 560x240 (crop size 224x224), mobilenet_v1_1.0_224_quant (network size 224x224) runs at about
    139ms/prediction (7 frames/s).

    When using a video mapping with no USB output, the image crop is directly taken to match the network input size,
    so that no resizing occurs.

    Note that the network input dims must always fit inside the camera input image. The sizing arithmetic is
    summarized in the sketch below.
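
    As a quick sanity check, the sizing rules above can be written out in code (a minimal sketch; the function and
    variable names are illustrative only and are not part of this module's API):

    \verbatim
    // Crop size when streaming over USB, with the 16-pixel separator:
    int cropSize(int usbOutWidth, int camWidth) { return usbOutWidth - 16 - camWidth; }

    // USB output width to request so that no rescaling is needed for a given network:
    int usbWidthFor(int camWidth, int netInWidth) { return camWidth + 16 + netInWidth; }

    // Example: with a 320x240 camera mode and mobilenet_v1_0.25_128_quant (128x128 input),
    // cropSize(464, 320) == 128, so USB mode 464x240 feeds the network without rescaling;
    // usbWidthFor(320, 224) == 560, so use 560x240 for the 224x224 networks.
    \endverbatim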

    To easily select one of the available networks, see <B>JEVOIS:/modules/JeVois/TensorFlowSingle/params.cfg</B> on
    the microSD card of your JeVois camera.
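
    For instance, params.cfg may contain a line that selects the network directory, along these lines (shown as an
    assumption here; see the comments inside the file on your microSD for the exact parameter names it supports):

    \verbatim
    # Hypothetical example: select the smallest, fastest MobileNet flavor
    netdir = mobilenet_v1_0.25_128_quant
    \endverbatim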

    Serial messages
    ---------------

    When detections are found with confidence scores above \p thresh, a message containing up to \p top
    category:score pairs will be sent per video frame. Exact message format depends on the current \p serstyle
    setting and is described in \ref UserSerialStyle. For example, when \p serstyle is \b Detail, this module sends:

    \verbatim
    DO category:score category:score ... category:score
    \endverbatim

    where \a category is a category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0
    that this category was recognized. The pairs are in order of decreasing score.
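
    For instance, a frame in which two categories scored above \p thresh might yield a message like the following
    (category names and scores made up for illustration):

    \verbatim
    DO banana:62.4 orange:21.7
    \endverbatim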

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.

    Using your own network
    ----------------------

    For a step-by-step tutorial, see [Training custom TensorFlow networks for
    JeVois](http://jevois.org/tutorials/UserTensorFlowTraining.html).

    This module supports RGB or grayscale inputs, byte or float32. You should create and train your network using
    fast GPUs, and then follow the instructions here to convert your trained network to TFLite format:

    https://www.tensorflow.org/lite/

    Then you just need to create a directory under <B>JEVOIS:/share/tensorflow/</B> with the name of your network,
    and, in there, two files: \b labels.txt with the category labels, and \b model.tflite with your model converted
    to TensorFlow Lite (flatbuffer format). Finally, edit <B>JEVOIS:/modules/JeVois/TensorFlowSingle/params.cfg</B>
    to select your new network when the module is launched.
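
    As a concrete sketch, for a network that you named \b MyNet (a made-up name), the layout would be:

    \verbatim
    JEVOIS:/share/tensorflow/MyNet/labels.txt     # one category label per line
    JEVOIS:/share/tensorflow/MyNet/model.tflite   # your model converted to TensorFlow Lite
    \endverbatim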


    @author Laurent Itti

    @displayname TensorFlow Single
    @videomapping NONE 0 0 0.0 YUYV 320 240 30.0 JeVois TensorFlowSingle
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois TensorFlowSingle
    @videomapping YUYV 464 240 15.0 YUYV 320 240 15.0 JeVois TensorFlowSingle
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois TensorFlowSingle
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class TensorFlowSingle : public jevois::StdModule
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    TensorFlowSingle(std::string const & instance) : jevois::StdModule(instance)
    {
      itsTensorFlow = addSubComponent<TensorFlow>("tf");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~TensorFlowSingle()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      int const w = inimg.width, h = inimg.height;

      // Check input vs network dims, will throw if network not ready:
      int netw, neth, netc;
      try { itsTensorFlow->getInDims(netw, neth, netc); }
      catch (std::logic_error const & e) { inframe.done(); return; }

      if (netw > w || neth > h)
        LFATAL("Network wants " << netw << 'x' << neth << " input, larger than camera " << w << 'x' << h);

      // Take a central crop of the input, with size given by network input:
      int const offx = ((w - netw) / 2) & (~1);
      int const offy = ((h - neth) / 2) & (~1);

      cv::Mat cvimg = jevois::rawimage::cvImage(inimg);
      cv::Mat crop = cvimg(cv::Rect(offx, offy, netw, neth));

      // Convert crop to RGB for predictions:
      cv::cvtColor(crop, itsCvImg, cv::COLOR_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions (do not catch exceptions, we already tested for network ready in this block):
      float const ptime = itsTensorFlow->predict(itsCvImg, itsResults);
      LINFO("Predicted in " << ptime << "ms");

      // Send serial results:
      sendSerialObjReco(itsResults);
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = jevois::async([&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois TensorFlow - input", 3, 3, jevois::yuyv::White);

          // Draw a 16-pixel wide rectangle:
          jevois::rawimage::drawFilledRect(outimg, w, 0, 16, h, jevois::yuyv::MedGrey);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w + 16, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w + 16, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 19, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 19, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste to finish up and let camera know we are done processing the input image:
          paste_fut.get(); inframe.done();

          if (success)
          {
            int const cropw = itsRawInputCv.cols, croph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w + 16, 0, cropw, croph)));
            jevois::rawimage::drawFilledRect(outimg, w + 16, croph, cropw, h - croph, jevois::yuyv::Black);
            jevois::rawimage::drawFilledRect(outimg, w, 0, 16, h, jevois::yuyv::MedGrey);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not
            // enough room below:
            int y = croph + 3; if (y + int(itsTensorFlow->top::get()) * 12 > h - 21) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
                                          w + 19, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sendSerialObjReco(itsResults);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 19, h - 11, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            itsRawPrevOutputCv = cv::Mat(h, cropw, CV_8UC2);
            outimgcv(cv::Rect(w + 16, 0, cropw, h)).copyTo(itsRawPrevOutputCv);

          } else { itsRawPrevOutputCv.release(); } // network is not ready yet
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); inframe.done();
        }
      }
      else // We are not predicting, launch a prediction:
      {
        // Wait for paste to finish up:
        paste_fut.get();

        // In this module, we use square crops for the network, with size given by USB width - camera width:
        if (outimg.width < inimg.width + 16) LFATAL("USB output image must be larger than camera input");
        int const cropw = outimg.width - inimg.width - 16; // 16 pix separator between camera image and network crop
        int const croph = cropw; // square crop

        // Check input vs network dims:
        if (cropw <= 0 || croph <= 0 || cropw > w || croph > h)
          LFATAL("Network crop window must fit within camera frame");

        // Take a central crop of the input:
        int const offx = ((w - cropw) / 2) & (~1);
        int const offy = ((h - croph) / 2) & (~1);
        cv::Mat cvimg = jevois::rawimage::cvImage(inimg);
        cv::Mat crop = cvimg(cv::Rect(offx, offy, cropw, croph));

        // Convert crop to RGB for predictions:
        cv::cvtColor(crop, itsCvImg, cv::COLOR_YUV2RGB_YUYV);

        // Also make a raw YUYV copy of the crop for later displays:
        crop.copyTo(itsRawInputCv);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Rescale the cropped image to network dims if needed:
        try
        {
          int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);
          itsCvImg = jevois::rescaleCv(itsCvImg, cv::Size(netinw, netinh));

          // Launch the predictions:
          itsPredictFut = jevois::async([&]()
              { return itsTensorFlow->predict(itsCvImg, itsResults); });
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network is not ready yet
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<TensorFlow> itsTensorFlow;
    std::vector<jevois::ObjReco> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(TensorFlowSingle);