JeVoisBase  1.6
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetSaliency.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

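#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/Saliency/Saliency.H>
#include <jevoisbase/Components/ObjectDetection/Darknet.H>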

// icon from https://pjreddie.com/darknet/

static jevois::ParameterCategory const ParamCateg("Darknet Saliency Options");

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
                         "This is the size of the image crop that is taken around the most salient "
                         "location in each frame. The foa size must fit within the camera input frame size.",
                         cv::Size(128, 128), ParamCateg);

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input "
                         "layer. This is the size to which the image crop taken around the most salient "
                         "location in each frame will be rescaled before feeding to the neural network.",
                         cv::Size(128, 128), ParamCateg);


//! Detect salient objects and identify them using Darknet deep neural network
/*! Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the
    scene, then identifies it using a deep neural network. It returns the top-scoring candidates.

    See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://pjreddie.com/darknet for more
    information about the Darknet deep neural network framework.

    This module runs a Darknet network on an image window around the most salient point and shows the top-scoring
    results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some
    interesting object, and wait for Darknet to tell you what it found. The framerate figures shown at the bottom left
    of the display reflect the speed at which each new video frame from the camera is processed, but in this module this
    just amounts to computing the saliency map from the camera input, converting the input image to RGB, cropping it
    around the most salient location, sending it to the neural network for processing in a separate thread, and creating
    the demo display. Actual network inference speed (time taken to compute the predictions on one image crop) is shown
    at the bottom right. See below for how to trade off speed and accuracy.
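
    As a rough illustration of this non-blocking pattern (a minimal sketch only, not this module's actual code; the
    helper name \c pollFuture is hypothetical), a per-frame loop can poll a std::future with a short timeout and only
    collect the result once the worker thread has finished:

```cpp
#include <chrono>
#include <future>
#include <optional>

// Hypothetical helper, for illustration only (not part of JeVois): poll a future without
// stalling the video loop. Returns the worker's result if it finished within the timeout,
// or std::nullopt if nothing was launched or the worker is still running.
template <typename T>
std::optional<T> pollFuture(std::future<T> & fut, std::chrono::milliseconds timeout)
{
  // A default-constructed or already-consumed future has no shared state:
  if (fut.valid() == false) return std::nullopt;

  // Still computing: give up for this frame and try again on the next one:
  if (fut.wait_for(timeout) != std::future_status::ready) return std::nullopt;

  // Finished: get() returns the value, rethrows any exception from the worker, and
  // leaves the future invalid so that a new prediction can be launched:
  return fut.get();
}
```

    A caller would typically invoke such a helper once per video frame with a timeout of a few milliseconds, and start
    a new asynchronous prediction whenever the future is no longer valid.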

    Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit
    more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes)
    that this network can recognize (too long to list here).

    Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (top-1 accuracy) on
    the test set, and Darknet Reference is about 61.1% correct on the test set. This is when running these networks at
    224x224 network input resolution (see parameter \p netin below).

    \youtube{77VRwFtIe8I}

    Neural network size and speed
    -----------------------------

    When using networks that are fully convolutional (as is the case for the default networks provided with this
    module), one can resize the network to any desired input size. The network size directly affects both speed and
    accuracy. Larger networks run slower but are more accurate.

    This module provides two parameters that allow you to adjust this tradeoff:
    - \p foa determines the size of a region of interest that is cropped around the most salient location
    - \p netin determines the size to which that region of interest is rescaled and fed to the neural network

    For example:

    - with netin = (224 224), this module runs at about 450ms/prediction.
    - with netin = (128 128), this module runs at about 180ms/prediction.
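
    To make the role of \p foa more concrete, the sketch below shows one way to center a crop of that size on a salient
    point, keep it inside the camera frame, and force even coordinates and sizes. It follows the same clamping idea as
    this module but is a simplified stand-alone illustration; \c Roi and \c placeRoi are hypothetical names.

```cpp
#include <algorithm>

// Hypothetical stand-alone types and names, for illustration only:
struct Roi { int x, y, w, h; };

// Center a foaw x foah window on the salient point (cx, cy) within a w x h frame.
// The window is shrunk to the frame size if needed, shifted to stay fully inside the
// frame, and rounded down to even values (with YUYV data, even coordinates avoid
// swapping U and V when a crop is later pasted).
Roi placeRoi(int cx, int cy, int foaw, int foah, int w, int h)
{
  Roi r;
  r.w = std::min(foaw, w) & ~1;
  r.h = std::min(foah, h) & ~1;
  r.x = std::max(0, std::min(cx - r.w / 2, w - r.w)) & ~1;
  r.y = std::max(0, std::min(cy - r.h / 2, h - r.h)) & ~1;
  return r;
}
```

    With a 320x240 frame and a 128x128 focus of attention, a salient point at the frame center (160, 120) gives a crop
    with top-left corner (96, 56), while a salient point near a corner gives a crop shifted to remain inside the frame.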

    Finally note that, when using video mappings with USB output, irrespective of \p foa and \p netin, the crop around
    the most salient image region (with size given by \p foa) will always also be rescaled so that, when placed to the
    right of the input image, it fills the desired USB output dims. For example, if camera mode is 320x240 and USB
    output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320)
    for display purposes only. This is so that one does not need to change USB video resolution while playing with
    different values of \p foa and \p netin live.
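
    The display scaling arithmetic above can be sketched as follows. This is an illustrative helper only, under the
    assumption that the crop is scaled uniformly to fill the width left over to the right of the camera image;
    \c Dims and \c displayDims are hypothetical names.

```cpp
// Hypothetical stand-alone types and names, for illustration only:
struct Dims { int w, h; };

// Compute the size at which a roiw x roih crop is displayed to the right of a camera image
// of width camw, inside a USB output frame of width usbw. The crop is scaled uniformly so
// that its width fills the remaining usbw - camw pixels.
Dims displayDims(int usbw, int camw, int roiw, int roih)
{
  int const dispw = usbw - camw;
  float const fac = float(dispw) / float(roiw);
  return { dispw, int(roih * fac + 0.4999F) };
}
```

    For the example above, a 128x128 crop with a 320-pixel-wide camera image and a 544-pixel-wide USB output is
    displayed at 224x224.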

    Serial messages
    ---------------

    - On every frame where detection results were obtained, this module sends a message
      \verbatim
      DKS framenum
      T2 x y
      \endverbatim
      where \a framenum is the frame number (starts at 0). The T2 message is a standardized message about the location
      and size of the salient region of interest in which the object was found. The message can be customized, see \ref
      UserSerialStyle.
    - In addition, when detections are found which are above threshold, up to \p top messages will be sent, for those
      category candidates that have scored above \p thresh:
      \verbatim
      DKR category score
      \endverbatim
      where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0
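
    On the receiving side, a host program could parse the \c DKR lines roughly as follows. This is a sketch only:
    \c DkrMessage and \c parseDkr are hypothetical names, the default message format shown above is assumed, and the
    score is taken to be the last space-separated field so that category names containing spaces are tolerated.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical stand-alone types and names, for illustration only:
struct DkrMessage { std::string category; float score; };

// Parse one serial line of the form "DKR category score". Returns std::nullopt for any
// other line (including the "DKS framenum" frame marker) or if the score is not a number.
std::optional<DkrMessage> parseDkr(std::string const & line)
{
  std::string const tag = "DKR ";
  if (line.compare(0, tag.size(), tag) != 0) return std::nullopt;

  // The score is the last field; everything between the tag and the score is the category:
  std::size_t const last = line.rfind(' ');
  if (last == std::string::npos || last < tag.size()) return std::nullopt;

  DkrMessage m;
  try { m.score = std::stof(line.substr(last + 1)); }
  catch (std::exception const &) { return std::nullopt; }
  m.category = line.substr(tag.size(), last - tag.size());
  return m;
}
```

    A host would call such a function on every received line and ignore those for which no value is returned.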

    @author Laurent Itti

    @displayname Darknet Saliency
    @videomapping NONE 0 0 0.0 YUYV 320 240 5.0 JeVois DarknetSaliency
    @videomapping YUYV 460 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois DarknetSaliency # set foa param to 256 256
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetSaliency : public jevois::StdModule,
                        public jevois::Parameter<foa, netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetSaliency(std::string const & instance) : jevois::StdModule(instance), itsFrame(0)
    {
      itsSaliency = addSubComponent<Saliency>("saliency");
      itsDarknet = addSubComponent<Darknet>("darknet");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetSaliency()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Send serial messages
    // ####################################################################################################
    void sendAllSerial(int inw, int inh, int salx, int saly, int roiw, int roih)
    {
      // Send frame marker:
      sendSerial("DKS " + std::to_string(itsFrame));

      // Send saliency info to serial port (for arduino, etc):
      sendSerialImg2D(inw, inh, salx, saly, roiw, roih, "sm");

      // Send all detections:
      for (auto const & r : itsResults) sendSerial("DKR " + r.second + ' ' + jevois::sformat("%.1f", r.first));
    }

    // ####################################################################################################
    //! Helper function: compute saliency ROI in a thread, return top-left corner and size
    // ####################################################################################################
    virtual void getSalROI(jevois::RawImage const & inimg, int & rx, int & ry, int & rw, int & rh)
    {
      int const w = inimg.width, h = inimg.height;

      // Check whether the input image size is small, in which case we will scale the maps up one notch:
      if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
      else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }

      // Find the most salient location, no gist for now:
      itsSaliency->process(inimg, false);

      // Get some info from the saliency computation:
      int const smlev = itsSaliency->smscale::get();
      int const smfac = (1 << smlev);

      // Find most salient point:
      int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);

      // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
      cv::Size roisiz = foa::get(); rw = roisiz.width; rh = roisiz.height;
      rw = std::min(rw, w); rh = std::min(rh, h); rw &= ~1; rh &= ~1;
      unsigned int const dmx = (mx << smlev) + (smfac >> 2);
      unsigned int const dmy = (my << smlev) + (smfac >> 2);
      rx = int(dmx + 1 + smfac / 4) - rw / 2;
      ry = int(dmy + 1 + smfac / 4) - rh / 2;
      rx = std::max(0, std::min(rx, w - rw));
      ry = std::max(0, std::min(ry, h - rh));
      rx &= ~1; ry &= ~1;
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      if (itsDarknet->ready())
      {
        // Check input vs network dims:
        int netw, neth, netc;
        itsDarknet->getInDims(netw, neth, netc);
        if (netw > w) netw = w;
        if (neth > h) neth = h;

        // Find the most salient location, no gist for now:
        int rx, ry, rw, rh;
        getSalROI(inimg, rx, ry, rw, rh);

        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, CV_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Scale the ROI if needed:
        cv::Size scaledsize = netin::get();
        cv::Mat scaledroi;
        if (scaledsize.width == rw && scaledsize.height == rh)
          scaledroi = rgbroi;
        else if (scaledsize.width > rw || scaledsize.height > rh)
          cv::resize(rgbroi, scaledroi, scaledsize, 0, 0, cv::INTER_LINEAR);
        else
          cv::resize(rgbroi, scaledroi, scaledsize, 0, 0, cv::INTER_AREA);

        // Launch the predictions (do not catch exceptions, we already tested for network ready in this block):
        float const ptime = itsDarknet->predict(scaledroi, itsResults);
        LINFO("Predicted in " << ptime << "ms");

        // Send serial results and switch to next frame:
        sendAllSerial(w, h, rx + rw/2, ry + rh/2, rw, rh);

        ++itsFrame;
      }
      else inframe.done();
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Make sure the network is ready:
      bool const netready = itsDarknet->ready();
      if (netready == false) itsRawPrevOutputCv.release();

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // Launch the saliency computation in a thread:
      int rx, ry, rw, rh;
      auto sal_fut = std::async(std::launch::async, [&](){ this->getSalROI(inimg, rx, ry, rw, rh); });

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = std::async(std::launch::async, [&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet Saliency", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste and saliency to finish up (saliency reads the input image and sets rx, ry, rw, rh, which
          // are used below even if prediction failed), then let camera know we are done processing the input image:
          paste_fut.get(); sal_fut.get(); inframe.done();

          if (success)
          {
            int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            if (itsRawPrevOutputCv.empty()) itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
            jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not enough
            // room below:
            int y = disph + 13; if (disph + itsResults.size() * 12 > h - 10) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.second.c_str(), p.first),
                                          w + 3, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sendAllSerial(w, h, rx + rw/2, ry + rh/2, rw, rh);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 13, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);

            // Switch to next frame:
            ++itsFrame;
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); sal_fut.get(); inframe.done();
        }
      }
      else if (netready) // We are not predicting but network is ready: start new predictions
      {
        // Wait for paste to finish up. Also wait for saliency to finish up so that rx, ry, rw, rh are available:
        paste_fut.get(); sal_fut.get();

        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, CV_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Scale the ROI if needed to the desired network input dims:
        cv::Size scaledsize = netin::get();
        if (scaledsize.width == rw && scaledsize.height == rh)
          itsCvImg = rgbroi;
        else if (scaledsize.width > rw || scaledsize.height > rh)
          cv::resize(rgbroi, itsCvImg, scaledsize, 0, 0, cv::INTER_LINEAR);
        else
          cv::resize(rgbroi, itsCvImg, scaledsize, 0, 0, cv::INTER_AREA);

        // Also scale the ROI to the desired output size, i.e., USB width - camera width:
        float fac = float(outimg.width - w) / float(rgbroi.cols);
        cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
        cv::Mat displayroi;
        if (displaysize.width == rw && displaysize.height == rh)
          displayroi = rgbroi;
        else if (displaysize.width > rw || displaysize.height > rh)
          cv::resize(rgbroi, displayroi, displaysize, 0, 0, cv::INTER_LINEAR);
        else
          cv::resize(rgbroi, displayroi, displaysize, 0, 0, cv::INTER_AREA);

        // Convert back the display ROI to YUYV and store for later display, while we are still computing the network
        // predictions on that ROI:
        jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);

        // Launch the predictions:
        itsPredictFut = std::async(std::launch::async, [&]() { return itsDarknet->predict(itsCvImg, itsResults); });
      }
      else // We are not predicting and network is not ready - do nothing except drawings of paste_fut:
      {
        paste_fut.get(); sal_fut.get(); inframe.done();
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Show attended location:
      jevois::rawimage::drawFilledRect(outimg, rx + rw/2 - 4, ry + rh/2 - 4, 8, 8, jevois::yuyv::LightPink);
      jevois::rawimage::drawRect(outimg, rx, ry, rw, rh, 2, jevois::yuyv::LightPink);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Saliency> itsSaliency;
    std::shared_ptr<Darknet> itsDarknet;
    std::vector<Darknet::predresult> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
    unsigned long itsFrame;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetSaliency);