JeVoisBase  1.20
JeVois Smart Embedded Machine Vision Toolkit Base Modules
TensorFlowSaliency.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/Saliency/Saliency.H>
#include <jevoisbase/Components/ObjectDetection/TensorFlow.H>

// icon from tensorflow youtube

static jevois::ParameterCategory const ParamCateg("TensorFlow Saliency Options");

//! Parameter \relates TensorFlowSaliency
JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
                         "This is the size of the image crop that is taken around the most salient "
                         "location in each frame. The foa size must fit within the camera input frame size. To avoid "
                         "rescaling, it is best to use here the size that the deep network expects as input.",
                         cv::Size(128, 128), ParamCateg);

//! Detect salient objects and identify them using TensorFlow deep neural network
/*! TensorFlow is a popular neural network framework. This module first finds the most conspicuous (salient) object in
    the scene, then identifies it using a deep neural network. It returns the top-scoring candidates.

    See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://www.tensorflow.org for more
    information about the TensorFlow deep neural network framework.

    \youtube{TRk8rCuUVEE}

    This module runs a TensorFlow network on an image window around the most salient point and shows the top-scoring
    results. We alternate, on every other frame, between updating the salient window crop location and predicting what
    is in it. Actual network inference speed (time taken to compute the predictions on one image crop) is shown at the
    bottom right. See below for how to trade off speed and accuracy.

    Note that by default this module runs a fast variant of MobileNets trained on the ImageNet dataset. There are 1000
    different kinds of objects (object classes) that this network can recognize (too long to list here). It is possible
    to use bigger and more complex networks, but that will likely slow down the framerate.

    For more information about MobileNets, see
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

    For more information about the ImageNet dataset used for training, see
    http://www.image-net.org/challenges/LSVRC/2012/

    Sometimes this module will make mistakes! The performance of MobileNets is about 40% to 70% correct (mean average
    precision) on the test set, depending on network size (bigger networks are more accurate but slower).

    Neural network size and speed
    -----------------------------

    This module provides a parameter, \p foa, which determines the size of a region of interest that is cropped around
    the most salient location. This region will then be rescaled, if needed, to the neural network's expected input
    size. To avoid wasting time rescaling, it is hence best to select an \p foa size that is equal to the network's
    input size.

    The network's actual input size varies depending on which network is used; for example, mobilenet_v1_0.25_128_quant
    expects 128x128 input images, while mobilenet_v1_1.0_224 expects 224x224. We automatically rescale the cropped
    window to the network's desired input size. Note that there is a cost to rescaling, so, for best performance, you
    should match the \p foa size to the network input size.

    For example:

    - mobilenet_v1_0.25_128_quant (network size 128x128) runs at about 8ms/prediction (125 frames/s).
    - mobilenet_v1_0.5_128_quant (network size 128x128) runs at about 18ms/prediction (55 frames/s).
    - mobilenet_v1_0.25_224_quant (network size 224x224) runs at about 24ms/prediction (41 frames/s).
    - mobilenet_v1_1.0_224_quant (network size 224x224) runs at about 139ms/prediction (7 frames/s).

    When using video mappings with USB output, irrespective of \p foa, the crop around the most salient image region
    (with size given by \p foa) will always also be rescaled so that, when placed to the right of the input image, it
    fills the desired USB output dims. For example, if the camera mode is 320x240 and the USB output size is 544x240,
    then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320) for display purposes
    only. This is so that one does not need to change the USB video resolution while playing with different values of
    \p foa live.

    Serial messages
    ---------------

    On every frame where detection results were obtained that are above \p thresh, this module sends a standardized 2D
    message as specified in \ref UserSerialStyle:
    + Serial message type: \b 2D
    + `id`: top-scoring category name of the recognized object, followed by ':' and the confidence score in percent
    + `x`, `y`, or vertices: standardized 2D coordinates of the object center or corners
    + `w`, `h`: standardized object size
    + `extra`: any number of additional category:score pairs which had an above-threshold score, in order of
      decreasing score
    where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0.

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.
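
    For illustration only (all values below are hypothetical; see \ref UserSerialStyle for the exact field layout of
    your selected serial style), a Normal-style message for a 224x224 attended crop centered in a 320x240 camera frame,
    recognized as a cat with one additional above-threshold category, might look like:

    \verbatim
    N2 tabby:62.5 0 0 1400 1867 tiger_cat:21.3
    \endverbatim

    Here the standardized center is (0, 0) since the crop is centered, and the standardized size follows from scaling
    by 2000/camera-dims: 224*2000/320 = 1400 wide and 224*2000/240 ≈ 1867 tall.
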
    Using your own network
    ----------------------

    For a step-by-step tutorial, see [Training custom TensorFlow networks for
    JeVois](http://jevois.org/tutorials/UserTensorFlowTraining.html).

    This module supports RGB or grayscale inputs, byte or float32. You should create and train your network using fast
    GPUs, and then follow the instructions here to convert your trained network to TFLite format:

    https://www.tensorflow.org/lite/

    Then you just need to create a directory under <b>JEVOIS:/share/tensorflow/</b> with the name of your network, and,
    in there, two files: \b labels.txt with the category labels, and \b model.tflite with your model converted to
    TensorFlow Lite (flatbuffer format). Finally, edit <b>JEVOIS:/modules/JeVois/TensorFlowEasy/params.cfg</b> to
    select your new network when the module is launched.
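
    For example, a custom network directory (the name \b mynet here is made up for illustration) would be laid out as:

    \verbatim
    JEVOIS:/share/tensorflow/mynet/
        labels.txt     # one category name per line
        model.tflite   # your model converted to TensorFlow Lite flatbuffer format
    \endverbatim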

    @author Laurent Itti

    @displayname TensorFlow Saliency
    @videomapping NONE 0 0 0.0 YUYV 320 240 15.0 JeVois TensorFlowSaliency
    @videomapping YUYV 448 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 128x128
    @videomapping YUYV 512 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 192x192
    @videomapping YUYV 544 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 224x224
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2018 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class TensorFlowSaliency : public jevois::StdModule,
                           public jevois::Parameter<foa>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    TensorFlowSaliency(std::string const & instance) : jevois::StdModule(instance), itsRx(0), itsRy(0),
                                                       itsRw(0), itsRh(0)
    {
      itsSaliency = addSubComponent<Saliency>("saliency");
      itsTensorFlow = addSubComponent<TensorFlow>("tensorflow");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~TensorFlowSaliency()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Helper function: compute saliency ROI in a thread, return top-left corner and size
    // ####################################################################################################
    virtual void getSalROI(jevois::RawImage const & inimg)
    {
      int const w = inimg.width, h = inimg.height;

      // Check whether the input image size is small, in which case we will scale the maps up one notch:
      if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
      else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }

      // Find the most salient location, no gist for now:
      itsSaliency->process(inimg, false);

      // Get some info from the saliency computation:
      int const smlev = itsSaliency->smscale::get();
      int const smfac = (1 << smlev);

      // Find most salient point:
      int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);

      // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
      cv::Size roisiz = foa::get(); itsRw = roisiz.width; itsRh = roisiz.height;
      itsRw = std::min(itsRw, w); itsRh = std::min(itsRh, h); itsRw &= ~1; itsRh &= ~1;
      unsigned int const dmx = (mx << smlev) + (smfac >> 2);
      unsigned int const dmy = (my << smlev) + (smfac >> 2);
      itsRx = int(dmx + 1 + smfac / 4) - itsRw / 2;
      itsRy = int(dmy + 1 + smfac / 4) - itsRh / 2;
      itsRx = std::max(0, std::min(itsRx, w - itsRw));
      itsRy = std::max(0, std::min(itsRy, h - itsRh));
      itsRx &= ~1; itsRy &= ~1;
      if (itsRw <= 0 || itsRh <= 0) LFATAL("Ooops, foa size cannot be zero or negative");
    }
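
    // Worked example of the ROI arithmetic above (illustrative numbers, not from the original source): with a
    // 320x240 input, smscale is set to 4, so smfac = 16. A saliency-map peak at map coords (10, 7) maps to image
    // coords dmx = (10 << 4) + 4 = 164 and dmy = (7 << 4) + 4 = 116. With the default 128x128 foa, the top-left
    // corner is 164 + 5 - 64 = 105 and 116 + 5 - 64 = 57, which already lies within the clamping range
    // [0, 192] x [0, 112] and is then rounded down to even coords: the final ROI is 128x128 at (104, 56).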

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Find the most salient location, no gist for now:
      getSalROI(inimg);

      // Extract a raw YUYV ROI around attended point:
      cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
      cv::Mat rawroi = rawimgcv(cv::Rect(itsRx, itsRy, itsRw, itsRh));

      // Convert the ROI to RGB:
      cv::Mat rgbroi;
      cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw if network is not ready:
      itsResults.clear();
      try
      {
        int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);

        // Scale the ROI if needed:
        cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));

        // Predict:
        float const ptime = itsTensorFlow->predict(scaledroi, itsResults);
        LINFO("Predicted in " << ptime << "ms");

        // Send serial results and switch to next frame:
        sendSerialObjDetImg2D(w, h, itsRx + itsRw/2, itsRy + itsRh/2, itsRw, itsRh, itsResults);
      }
      catch (std::logic_error const & e) { } // network still loading
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for our output frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = jevois::async([&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois TensorFlow Saliency", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
          }
        });

      // On even frames, update the salient ROI; on odd frames, run the deep network on the latest ROI:
      if ((jevois::frameNum() & 1) == 0 || itsRw == 0)
      {
        // Run the saliency model, will update itsRx, itsRy, itsRw, and itsRh:
        getSalROI(inimg);
      }
      else
      {
        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(itsRx, itsRy, itsRw, itsRh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Launch the predictions, will throw if network is not ready:
        itsResults.clear();
        try
        {
          // Get the network input dims:
          int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);

          // Scale the ROI if needed:
          cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));

          // In a thread, also scale the ROI to the desired output size, i.e., USB width - camera width:
          auto scale_fut = jevois::async([&]() {
              float fac = float(outimg.width - w) / float(rgbroi.cols);
              cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
              cv::Mat displayroi = jevois::rescaleCv(rgbroi, displaysize);

              // Convert back the display ROI to YUYV:
              jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);
            });

          // Predict:
          float const ptime = itsTensorFlow->predict(scaledroi, itsResults);

          // Wait for paste and scale to finish up:
          paste_fut.get(); scale_fut.get();

          int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

          // Update our output image: first paste the image we have been making predictions on:
          itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
          jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);

          // Then draw the detections: either below the detection crop if there is room, or on top of it if not
          // enough room below:
          int y = disph + 3; if (y + itsTensorFlow->top::get() * 12 > h - 21) y = 3;

          for (auto const & p : itsResults)
          {
            jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
                                        w + 3, y, jevois::yuyv::White);
            y += 12;
          }

          // Send serial results:
          sendSerialObjDetImg2D(w, h, itsRx + itsRw/2, itsRy + itsRh/2, itsRw, itsRh, itsResults);

          // Draw some text messages:
          jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                      w + 3, h - 11, jevois::yuyv::White);

          // Finally make a copy of these new results so we can display them again on the next frame while we compute
          // saliency:
          itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
          outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network still loading
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Show attended location:
      jevois::rawimage::drawFilledRect(outimg, itsRx + itsRw/2 - 4, itsRy + itsRh/2 - 4, 8, 8, jevois::yuyv::LightPink);
      jevois::rawimage::drawRect(outimg, itsRx, itsRy, itsRw, itsRh, 2, jevois::yuyv::LightPink);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Saliency> itsSaliency;
    std::shared_ptr<TensorFlow> itsTensorFlow;
    std::vector<jevois::ObjReco> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
    int itsRx, itsRy, itsRw, itsRh; // last computed saliency ROI
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(TensorFlowSaliency);