JeVoisBase  1.11
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetSaliency.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevoisbase/Components/Saliency/Saliency.H>
#include <jevoisbase/Components/ObjectDetection/Darknet.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// icon from https://pjreddie.com/darknet/

static jevois::ParameterCategory const ParamCateg("Darknet Saliency Options");

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
                         "This is the size of the image crop that is taken around the most salient "
                         "location in each frame. The foa size must fit within the camera input frame size.",
                         cv::Size(128, 128), ParamCateg);

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input "
                         "layer. This is the size to which the image crop taken around the most salient "
                         "location in each frame will be rescaled before feeding to the neural network.",
                         cv::Size(128, 128), ParamCateg);


//! Detect salient objects and identify them using Darknet deep neural network
/*! Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the
    scene, then identifies it using a deep neural network. It returns the top scoring candidates.

    See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://pjreddie.com/darknet for more
    information about the Darknet deep neural network framework.

    This module runs a Darknet network on an image window around the most salient point and shows the top-scoring
    results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some
    interesting object, and wait for Darknet to tell you what it found. The framerate figures shown at the bottom left
    of the display reflect the speed at which each new video frame from the camera is processed, but in this module this
    just amounts to computing the saliency map from the camera input, converting the input image to RGB, cropping it
    around the most salient location, sending it to the neural network for processing in a separate thread, and creating
    the demo display. Actual network inference speed (time taken to compute the predictions on one image crop) is shown
    at the bottom right. See below for how to trade off speed and accuracy.

    Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit
    more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes)
    that this network can recognize (too long to list here).

    Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (mean average precision) on
    the test set, and Darknet Reference is about 61.1% correct on the test set. This is when running these networks at
    224x224 network input resolution (see parameter \p netin below).

    \youtube{77VRwFtIe8I}

    Neural network size and speed
    -----------------------------

    When using networks that are fully convolutional (as is the case for the default networks provided with this
    module), one can resize the network to any desired input size. The network size directly affects both speed and
    accuracy. Larger networks run slower but are more accurate.

    This module provides two parameters that allow you to adjust this tradeoff:
    - \p foa determines the size of a region of interest that is cropped around the most salient location
    - \p netin determines the size to which that region of interest is rescaled and fed to the neural network

    For example:

    - with netin = (224 224), this module runs at about 450ms/prediction.
    - with netin = (128 128), this module runs at about 180ms/prediction.

    Finally note that, when using video mappings with USB output, irrespective of \p foa and \p netin, the crop around
    the most salient image region (with size given by \p foa) will always also be rescaled so that, when placed to the
    right of the input image, it fills the desired USB output dims. For example, if camera mode is 320x240 and USB
    output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320)
    for display purposes only. This is so that one does not need to change USB video resolution while playing with
    different values of \p foa and \p netin live.

    Serial messages
    ---------------

    On every frame where detection results were obtained that are above \p thresh, this module sends a standardized 2D
    message as specified in \ref UserSerialStyle:
    + Serial message type: \b 2D
    + `id`: top-scoring category name of the recognized object, followed by ':' and the confidence score in percent
    + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
    + `w`, `h`: standardized object size
    + `extra`: any number of additional category:score pairs which had an above-threshold score, in order of
      decreasing score

    where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0.

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.

    @author Laurent Itti

    @displayname Darknet Saliency
    @videomapping NONE 0 0 0.0 YUYV 320 240 5.0 JeVois DarknetSaliency
    @videomapping YUYV 460 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency # not for mac (width not multiple of 16)
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois DarknetSaliency # set foa param to 256 256
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetSaliency : public jevois::StdModule,
                        public jevois::Parameter<foa, netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetSaliency(std::string const & instance) : jevois::StdModule(instance)
    {
      itsSaliency = addSubComponent<Saliency>("saliency");
      itsDarknet = addSubComponent<Darknet>("darknet");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetSaliency()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Helper function: compute saliency ROI in a thread, return top-left corner and size
    // ####################################################################################################
    virtual void getSalROI(jevois::RawImage const & inimg, int & rx, int & ry, int & rw, int & rh)
    {
      int const w = inimg.width, h = inimg.height;

      // Check whether the input image size is small, in which case we will scale the maps up one notch:
      if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
      else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }

      // Find the most salient location, no gist for now:
      itsSaliency->process(inimg, false);

      // Get some info from the saliency computation:
      int const smlev = itsSaliency->smscale::get();
      int const smfac = (1 << smlev);

      // Find most salient point:
      int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);

      // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
      cv::Size roisiz = foa::get(); rw = roisiz.width; rh = roisiz.height;
      rw = std::min(rw, w); rh = std::min(rh, h); rw &= ~1; rh &= ~1;
      unsigned int const dmx = (mx << smlev) + (smfac >> 2);
      unsigned int const dmy = (my << smlev) + (smfac >> 2);
      rx = int(dmx + 1 + smfac / 4) - rw / 2;
      ry = int(dmy + 1 + smfac / 4) - rh / 2;
      rx = std::max(0, std::min(rx, w - rw));
      ry = std::max(0, std::min(ry, h - rh));
      rx &= ~1; ry &= ~1;
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Find the most salient location, no gist for now:
      int rx, ry, rw, rh;
      getSalROI(inimg, rx, ry, rw, rh);

      // Extract a raw YUYV ROI around attended point:
      cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
      cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

      // Convert the ROI to RGB:
      cv::Mat rgbroi;
      cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw if network is not ready:
      itsResults.clear();
      try
      {
        int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc);

        // Scale the ROI if needed:
        cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));

        // Predict:
        float const ptime = itsDarknet->predict(scaledroi, itsResults);
        LINFO("Predicted in " << ptime << "ms");

        // Send serial results and switch to next frame:
        sendSerialObjDetImg2D(w, h, rx + rw/2, ry + rh/2, rw, rh, itsResults);
      }
      catch (std::logic_error const & e) { } // network still loading
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // Launch the saliency computation in a thread:
      int rx, ry, rw, rh;
      auto sal_fut = std::async(std::launch::async, [&]() { this->getSalROI(inimg, rx, ry, rw, rh); });

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = std::async(std::launch::async, [&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet Saliency", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste to finish up and let camera know we are done processing the input image:
          paste_fut.get(); inframe.done();

          if (success)
          {
            int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
            jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not
            // enough room below:
            int y = disph + 3; if (y + itsDarknet->top::get() * 12 > h - 21) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
                                          w + 3, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sal_fut.get();
            sendSerialObjDetImg2D(w, h, rx + rw/2, ry + rh/2, rw, rh, itsResults);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 11, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
            outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); sal_fut.get(); inframe.done();
        }
      }
      else // We are not predicting: start new predictions
      {
        // Wait for paste to finish up. Also wait for saliency to finish up so that rx, ry, rw, rh are available:
        paste_fut.get(); sal_fut.get();

        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Scale the ROI if needed to the desired network input dims:
        itsCvImg = jevois::rescaleCv(rgbroi, netin::get());

        // Also scale the ROI to the desired output size, i.e., USB width - camera width:
        float fac = float(outimg.width - w) / float(rgbroi.cols);
        cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
        cv::Mat displayroi = jevois::rescaleCv(rgbroi, displaysize);

        // Convert back the display ROI to YUYV and store for later display, while we are still computing the network
        // predictions on that ROI:
        jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);

        // Launch the predictions; will throw if network is not ready:
        try
        {
          int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc); // will throw if not ready
          itsPredictFut = std::async(std::launch::async, [&]() { return itsDarknet->predict(itsCvImg, itsResults); });
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network is not ready yet
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Show attended location:
      jevois::rawimage::drawFilledRect(outimg, rx + rw/2 - 4, ry + rh/2 - 4, 8, 8, jevois::yuyv::LightPink);
      jevois::rawimage::drawRect(outimg, rx, ry, rw, rh, 2, jevois::yuyv::LightPink);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Saliency> itsSaliency;
    std::shared_ptr<Darknet> itsDarknet;
    std::vector<jevois::ObjReco> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetSaliency);