JeVoisBase  1.8
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetSaliency.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

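#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/Saliency/Saliency.H>
#include <jevoisbase/Components/ObjectDetection/Darknet.H>
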
// icon from https://pjreddie.com/darknet/

static jevois::ParameterCategory const ParamCateg("Darknet Saliency Options");

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
                         "This is the size of the image crop that is taken around the most salient "
                         "location in each frame. The foa size must fit within the camera input frame size.",
                         cv::Size(128, 128), ParamCateg);

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input "
                         "layer. This is the size to which the image crop taken around the most salient "
                         "location in each frame will be rescaled before feeding to the neural network.",
                         cv::Size(128, 128), ParamCateg);


//! Detect salient objects and identify them using Darknet deep neural network
/*! Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the
    scene, then identifies it using a deep neural network. It returns the top scoring candidates.

    See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://pjreddie.com/darknet for more
    information about the Darknet deep neural network framework.

    This module runs a Darknet network on an image window around the most salient point and shows the top-scoring
    results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some
    interesting object, and wait for Darknet to tell you what it found. The framerate figures shown at the bottom left
    of the display reflect the speed at which each new video frame from the camera is processed, but in this module this
    just amounts to computing the saliency map from the camera input, converting the input image to RGB, cropping it
    around the most salient location, sending it to the neural network for processing in a separate thread, and creating
    the demo display. Actual network inference speed (time taken to compute the predictions on one image crop) is shown
    at the bottom right. See below for how to trade off speed and accuracy.

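As a rough illustration of the "separate thread" step described above, the sketch below runs a slow task with
std::async and polls it briefly on each frame, so that the frame loop never blocks for long. It is a standalone
sketch under stated assumptions, not this module's code: `slowPredict` and `Poller` are hypothetical names, and the
sleep merely stands in for network inference.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a slow network inference; sleeps for workMs and returns that time in ms.
inline float slowPredict(int workMs)
{
  assert(workMs >= 0);
  std::this_thread::sleep_for(std::chrono::milliseconds(workMs));
  return static_cast<float>(workMs);
}

// Hypothetical per-frame poller: keeps at most one prediction in flight and reports when it completes.
class Poller
{
  public:
    // Call once per video frame. Returns true and fills ptime on the frame where a prediction has finished.
    bool onFrame(int workMs, float & ptime)
    {
      if (itsFut.valid() == false)
      {
        // Nothing in flight: start a new prediction and return right away.
        itsFut = std::async(std::launch::async, slowPredict, workMs);
        return false;
      }

      // A prediction is in flight: wait at most 5ms so that the frame loop keeps running.
      if (itsFut.wait_for(std::chrono::milliseconds(5)) != std::future_status::ready) return false;

      // get() returns the result and invalidates the future, so the next frame starts a new prediction.
      ptime = itsFut.get();
      return true;
    }

  private:
    std::future<float> itsFut;
};
```

Calling `onFrame()` once per frame returns false while a prediction is still running, and true on the frame where
its result becomes available.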
    Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit
    more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes)
    that this network can recognize (too long to list here).

    Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (top-1 accuracy) on
    the test set, and Darknet Reference is about 61.1% correct on the test set. This is when running these networks at
    224x224 network input resolution (see parameter \p netin below).

    \youtube{77VRwFtIe8I}

    Neural network size and speed
    -----------------------------

    When using networks that are fully convolutional (as is the case for the default networks provided with this
    module), one can resize the network to any desired input size. The network size directly affects both speed and
    accuracy. Larger networks run slower but are more accurate.

    This module provides two parameters that allow you to adjust this tradeoff:
    - \p foa determines the size of a region of interest that is cropped around the most salient location
    - \p netin determines the size to which that region of interest is rescaled and fed to the neural network

    For example:

    - with netin = (224 224), this module runs at about 450ms/prediction.
    - with netin = (128 128), this module runs at about 180ms/prediction.

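The crop controlled by \p foa can be sketched as follows. This is a simplified, standalone version of the clamping
logic, assuming the salient point is already given in full-resolution pixel coordinates; `Roi` and `clampRoi` are
hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>

struct Roi { int x, y, w, h; };

// Center a foaW x foaH crop on (cx, cy), keep it inside a frameW x frameH image, and force even coordinates and
// dimensions. In YUYV each horizontal pixel pair shares its chroma samples, so an odd x offset would swap U and V.
inline Roi clampRoi(int cx, int cy, int foaW, int foaH, int frameW, int frameH)
{
  assert(frameW >= 2 && frameH >= 2);
  Roi r;
  r.w = std::min(foaW, frameW) & ~1;                              // never larger than the frame, and even
  r.h = std::min(foaH, frameH) & ~1;
  r.x = std::max(0, std::min(cx - r.w / 2, frameW - r.w)) & ~1;   // shift back inside the frame if needed
  r.y = std::max(0, std::min(cy - r.h / 2, frameH - r.h)) & ~1;
  return r;
}
```

With a 320x240 frame and the default 128x128 \p foa, a salient point near a corner yields a crop that is pushed back
inside the frame rather than truncated, so the network always receives a full-size crop.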
    Finally note that, when using video mappings with USB output, irrespective of \p foa and \p netin, the crop around
    the most salient image region (with size given by \p foa) will always also be rescaled so that, when placed to the
    right of the input image, it fills the desired USB output dims. For example, if camera mode is 320x240 and USB
    output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320)
    for display purposes only. This is so that one does not need to change USB video resolution while playing with
    different values of \p foa and \p netin live.

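The display rescaling arithmetic above can be checked with a few lines of code. This is a minimal sketch assuming
the crop is scaled uniformly so that its width fills the USB columns to the right of the camera image;
`DisplaySize` and `displaySize` are hypothetical names, not part of the JeVois API.

```cpp
#include <cassert>

struct DisplaySize { int w, h; };

// Scale a cropW x cropH crop so that its width fills the USB output columns left over next to the camera image.
inline DisplaySize displaySize(int usbW, int camW, int cropW, int cropH)
{
  assert(usbW > camW && cropW > 0 && cropH > 0);
  float const fac = float(usbW - camW) / float(cropW);       // uniform scale factor
  return DisplaySize{ usbW - camW, int(cropH * fac + 0.4999F) };
}
```

For the example in the text, a 128x128 crop with a 320-wide camera image and a 544-wide USB output gives a 224x224
display area.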
    Serial messages
    ---------------

    - On every frame where detection results were obtained, this module sends a message
      \verbatim
      DKS framenum
      T2 x y
      \endverbatim
      where \a framenum is the frame number (starts at 0). The T2 message is a standardized message about the location
      and size of the salient region of interest in which the object was found. The message can be customized, see \ref
      UserSerialStyle.
    - In addition, when detections are found which are above threshold, up to \p top messages will be sent, for those
      category candidates that have scored above \p thresh:
      \verbatim
      DKR category score
      \endverbatim
      where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0.

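On the receiving side, a DKR line could be decoded roughly as shown below. This is a host-side sketch, not part of
this module: `DkrMessage` and `parseDkr` are hypothetical names, the line is assumed to arrive without its end-of-line
characters, and the score is assumed to be the last space-separated token so that category names containing spaces
still parse.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct DkrMessage { std::string category; float score; };

// Parse one "DKR category score" line. Returns false for any other line or for a malformed score.
inline bool parseDkr(std::string const & line, DkrMessage & msg)
{
  if (line.compare(0, 4, "DKR ") != 0) return false;

  // The score is the last token; everything between the "DKR " prefix and that token is the category.
  std::size_t const sep = line.rfind(' ');
  if (sep == std::string::npos || sep <= 4) return false;

  std::istringstream is(line.substr(sep + 1));
  float score;
  if (!(is >> score)) return false;

  msg.category = line.substr(4, sep - 4);
  msg.score = score;
  return true;
}
```

Lines that do not start with the DKR prefix, such as the DKS frame marker, are rejected, so the same function can be
used to filter a mixed serial stream.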
    @author Laurent Itti

    @displayname Darknet Saliency
    @videomapping NONE 0 0 0.0 YUYV 320 240 5.0 JeVois DarknetSaliency
    @videomapping YUYV 460 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency # not for mac (width not multiple of 16)
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois DarknetSaliency # set foa param to 256 256
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetSaliency : public jevois::StdModule,
                        public jevois::Parameter<foa, netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetSaliency(std::string const & instance) : jevois::StdModule(instance), itsFrame(0)
    {
      itsSaliency = addSubComponent<Saliency>("saliency");
      itsDarknet = addSubComponent<Darknet>("darknet");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetSaliency()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      // Wait for any prediction still in flight; ignore errors since we are shutting down:
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Send serial messages
    // ####################################################################################################
    void sendAllSerial(int inw, int inh, int salx, int saly, int roiw, int roih)
    {
      // Send frame marker:
      sendSerial("DKS " + std::to_string(itsFrame));

      // Send saliency info to serial port (for arduino, etc):
      sendSerialImg2D(inw, inh, salx, saly, roiw, roih, "sm");

      // Send all detections:
      for (auto const & r : itsResults) sendSerial("DKR " + r.second + ' ' + jevois::sformat("%.1f", r.first));
    }

    // ####################################################################################################
    //! Helper function: compute saliency ROI in a thread, return top-left corner and size
    // ####################################################################################################
    virtual void getSalROI(jevois::RawImage const & inimg, int & rx, int & ry, int & rw, int & rh)
    {
      int const w = inimg.width, h = inimg.height;

      // Check whether the input image size is small, in which case we will scale the maps up one notch:
      if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
      else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }

      // Find the most salient location, no gist for now:
      itsSaliency->process(inimg, false);

      // Get some info from the saliency computation:
      int const smlev = itsSaliency->smscale::get();
      int const smfac = (1 << smlev);

      // Find most salient point:
      int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);

      // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
      cv::Size roisiz = foa::get(); rw = roisiz.width; rh = roisiz.height;
      rw = std::min(rw, w); rh = std::min(rh, h); rw &= ~1; rh &= ~1;
      unsigned int const dmx = (mx << smlev) + (smfac >> 2);
      unsigned int const dmy = (my << smlev) + (smfac >> 2);
      rx = int(dmx + 1 + smfac / 4) - rw / 2;
      ry = int(dmy + 1 + smfac / 4) - rh / 2;
      rx = std::max(0, std::min(rx, w - rw));
      ry = std::max(0, std::min(ry, h - rh));
      rx &= ~1; ry &= ~1;
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Find the most salient location, no gist for now:
      int rx, ry, rw, rh;
      getSalROI(inimg, rx, ry, rw, rh);

      // Extract a raw YUYV ROI around attended point:
      cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
      cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

      // Convert the ROI to RGB:
      cv::Mat rgbroi;
      cv::cvtColor(rawroi, rgbroi, CV_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw if network is not ready:
      itsResults.clear();
      try
      {
        int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc);

        // Scale the ROI if needed:
        cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));

        // Predict:
        float const ptime = itsDarknet->predict(scaledroi, itsResults);
        LINFO("Predicted in " << ptime << "ms");

        // Send serial results and switch to next frame:
        sendAllSerial(w, h, rx + rw/2, ry + rh/2, rw, rh);
      }
      catch (std::logic_error const & e) { } // network still loading

      ++itsFrame;
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // Launch the saliency computation in a thread:
      int rx, ry, rw, rh;
      auto sal_fut = std::async(std::launch::async, [&](){ this->getSalROI(inimg, rx, ry, rw, rh); });

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = std::async(std::launch::async, [&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet Saliency", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste and saliency to finish up (so that rx, ry, rw, rh are available even if the prediction
          // failed), and let camera know we are done processing the input image:
          paste_fut.get(); sal_fut.get(); inframe.done();

          if (success)
          {
            int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
            jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not enough
            // room below:
            int y = disph + 3; if (y + itsDarknet->top::get() * 12 > h - 21) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.second.c_str(), p.first),
                                          w + 3, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sendAllSerial(w, h, rx + rw/2, ry + rh/2, rw, rh);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 11, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
            outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);

            // Switch to next frame:
            ++itsFrame;
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); sal_fut.get(); inframe.done();
        }
      }
      else // We are not predicting: start new predictions
      {
        // Wait for paste to finish up. Also wait for saliency to finish up so that rx, ry, rw, rh are available:
        paste_fut.get(); sal_fut.get();

        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, CV_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Scale the ROI if needed to the desired network input dims:
        itsCvImg = jevois::rescaleCv(rgbroi, netin::get());

        // Also scale the ROI to the desired output size, i.e., USB width - camera width:
        float fac = float(outimg.width - w) / float(rgbroi.cols);
        cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
        cv::Mat displayroi = jevois::rescaleCv(rgbroi, displaysize);

        // Convert back the display ROI to YUYV and store for later display, while we are still computing the network
        // predictions on that ROI:
        jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);

        // Launch the predictions; will throw if network is not ready:
        try
        {
          int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc); // will throw if not ready
          itsPredictFut = std::async(std::launch::async, [&]() { return itsDarknet->predict(itsCvImg, itsResults); });
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network is not ready yet
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Show attended location:
      jevois::rawimage::drawFilledRect(outimg, rx + rw/2 - 4, ry + rh/2 - 4, 8, 8, jevois::yuyv::LightPink);
      jevois::rawimage::drawRect(outimg, rx, ry, rw, rh, 2, jevois::yuyv::LightPink);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Saliency> itsSaliency;
    std::shared_ptr<Darknet> itsDarknet;
    std::vector<Darknet::predresult> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
    unsigned long itsFrame;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetSaliency);