JeVoisBase  1.8
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetYOLO.C
1 // ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
2 //
3 // JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
4 // California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
5 //
6 // This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
7 // redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
8 // Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
9 // without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
10 // License for more details. You should have received a copy of the GNU General Public License along with this program;
11 // if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
12 //
13 // Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
14 // Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
15 // ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
16 /*! \file */
17 
18 #include <jevois/Core/Module.H>
19 #include <jevois/Debug/Timer.H>
21 #include <opencv2/core/core.hpp>
22 #include <opencv2/imgproc/imgproc.hpp>
24 
25 // icon from https://pjreddie.com/darknet/yolo/
26 
27 static jevois::ParameterCategory const ParamCateg("Darknet YOLO Options");
28 
29 //! Parameter \relates DarknetYOLO
30 JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input layer, or [0 0] "
31                          "to make it match camera frame size.",
32                          cv::Size(320, 240), ParamCateg);
33 
34 
35 //! Detect multiple objects in scenes using the Darknet YOLO deep neural network
36 /*! Darknet is a popular neural network framework, and YOLO is a very interesting network that detects all objects in a
37  scene in one pass. This module detects all instances of any of the objects it knows about (determined by the
38  network structure, labels, dataset used for training, and weights obtained) in the image that is given to it.
39 
40  See https://pjreddie.com/darknet/yolo/
41 
42  This module runs a YOLO network and shows all detections obtained. The YOLO network is currently quite slow, hence
43  it is only run once in a while. Point your camera towards some interesting scene, keep it stable, and wait for YOLO
44  to tell you what it found. The framerate figures shown at the bottom left of the display reflect the speed at which
45  each new video frame from the camera is processed, but in this module this just amounts to converting the image to
46  RGB, sending it to the neural network for processing in a separate thread, and creating the demo display. Actual
47  network inference speed (time taken to compute the predictions on one image) is shown at the bottom right. See
48  below for how to trade off speed and accuracy.
49 
50  Note that by default this module runs the Pascal-VOC version of tiny-YOLO, with these object categories:
51 
52  - aeroplane
53  - bicycle
54  - bird
55  - boat
56  - bottle
57  - bus
58  - car
59  - cat
60  - chair
61  - cow
62  - diningtable
63  - dog
64  - horse
65  - motorbike
66  - person
67  - pottedplant
68  - sheep
69  - sofa
70  - train
71  - tvmonitor
72 
73  Sometimes it will make mistakes! The performance of tiny-yolo-voc is about 57.1% mean average precision (mAP) on
74  the test set.
75 
76  \youtube{d5CfljT5kec}
77 
78  Speed and network size
79  ----------------------
80 
81  The parameter \p netin allows you to rescale the neural network to the specified size. Beware that this will only
82  work if the network used is fully convolutional (as is the case with the default tiny-yolo network). This not only
83  allows you to adjust processing speed (and, conversely, accuracy), but also to better match the network to the input
84  images. For example, the default size for tiny-yolo is 416x416, so passing it an input image of size 640x480 will
85  result in first scaling that input to 416x312, then letterboxing it by adding gray borders on top and bottom so that
86  the final input to the network is 416x416. This letterboxing can be completely avoided by just resizing the network
87  to 320x240.
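
 The letterboxing arithmetic described above can be sketched as follows. This is only an illustration: the
 `Letterbox` struct and the `letterbox()` helper are hypothetical names, not part of JeVois or Darknet, and the
 rounding used here is an assumption that may differ from what Darknet actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not a JeVois or Darknet API): size of the scaled image and of the gray
// borders obtained when fitting an imgW x imgH image into a netW x netH network input.
struct Letterbox { int scaledW, scaledH, padX, padY; };

inline Letterbox letterbox(int imgW, int imgH, int netW, int netH)
{
  assert(imgW > 0 && imgH > 0 && netW > 0 && netH > 0);

  // Use the smaller of the two scale factors so that the whole image fits inside the network input:
  double const scale = std::min(double(netW) / imgW, double(netH) / imgH);
  int const sw = int(std::lround(imgW * scale));
  int const sh = int(std::lround(imgH * scale));

  // The remaining space is split evenly between the two opposite borders:
  return { sw, sh, (netW - sw) / 2, (netH - sh) / 2 };
}
```

 With a 640x480 image and a 416x416 network, this gives a 416x312 scaled image with 52-pixel borders on top and
 bottom. With a 320x240 network, the scaled image fills the input exactly and both borders are zero.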
88 
89  Here are expected processing speeds:
90  - when netin = [0 0], processes letterboxed 416x416 inputs, about 2450ms/image
91  - when netin = [320 240], processes 320x240 inputs, about 1350ms/image
92  - when netin = [160 120], processes 160x120 inputs, about 695ms/image
93 
94  \youtube{77VRwFtIe8I}
95 
96  Serial messages
97  ---------------
98 
99  - On every frame where detection results were obtained, this module sends a message
100    \verbatim
101    DKY framenum
102    \endverbatim
103    where \a framenum is the frame number (starts at 0).
104  - In addition, when detections are found which are above threshold, one message will be sent for each detected
105    object (i.e., for each box that gets drawn when USB outputs are used), using a standardized 2D message:
106    + Serial message type: \b 2D
107    + `id`: the category name of the recognized object
108    + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
109    + `w`, `h`: standardized object size
110    + `extra`: recognition score (in percent confidence)
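
 On the receiving side, the per-frame `DKY framenum` message could be recognized with a few lines of code. The
 sketch below is a hypothetical host-side helper (`parseDkyFrame()` is not part of JeVois) and it assumes that each
 serial message arrives as one text line. It does not attempt to parse the standardized 2D messages.

```cpp
#include <sstream>
#include <string>

// Hypothetical host-side helper (not a JeVois API): returns true and sets frame if line is a
// "DKY framenum" message. The value of frame is only meaningful when true is returned.
inline bool parseDkyFrame(std::string const & line, unsigned long & frame)
{
  std::istringstream is(line);
  std::string tag;

  // Extract the message tag, then the frame number; either extraction may fail:
  if ((is >> tag >> frame) && tag == "DKY") return true;
  return false;
}
```

 A host program would call this on every received line, and treat the lines for which it returns false as
 candidate 2D detection messages or other output.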
111 
112 
113  @author Laurent Itti
114 
115  @displayname Darknet YOLO
116  @videomapping NONE 0 0 0.0 YUYV 640 480 0.4 JeVois DarknetYOLO
117  @videomapping YUYV 1280 480 15.0 YUYV 640 480 15.0 JeVois DarknetYOLO
118  @email itti\@usc.edu
119  @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
120  @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
121  @mainurl http://jevois.org
122  @supporturl http://jevois.org/doc
123  @otherurl http://iLab.usc.edu
124  @license GPL v3
125  @distribution Unrestricted
126  @restrictions None
127  \ingroup modules */
128 class DarknetYOLO : public jevois::StdModule,
129                     public jevois::Parameter<netin>
130 {
131   public:
132     // ####################################################################################################
133     //! Constructor
134     // ####################################################################################################
135     DarknetYOLO(std::string const & instance) : jevois::StdModule(instance), itsFrame(0)
136     {
137       itsYolo = addSubComponent<Yolo>("yolo");
138     }
139 
140     // ####################################################################################################
141     //! Virtual destructor for safe inheritance
142     // ####################################################################################################
143     virtual ~DarknetYOLO()
144     { }
145 
146     // ####################################################################################################
147     //! Un-initialization: wait for any pending prediction; get() is only called if one was actually launched
148     // ####################################################################################################
149     virtual void postUninit() override
150     {
151       try { if (itsPredictFut.valid()) itsPredictFut.get(); } catch (...) { }
152     }
153 
154     // ####################################################################################################
155     //! Processing function, no video output
156     // ####################################################################################################
157     virtual void process(jevois::InputFrame && inframe) override
158     {
159       bool ready = true; float ptime = 0.0F;
160 
161       // Wait for next available camera image:
162       jevois::RawImage const inimg = inframe.get();
163       unsigned int const w = inimg.width, h = inimg.height;
164 
165       // Convert input image to RGB for predictions:
166       cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);
167 
168       // Resize the network and/or the input if desired:
169       cv::Size nsz = netin::get();
170       if (nsz.width != 0 && nsz.height != 0)
171       {
172         itsYolo->resizeInDims(nsz.width, nsz.height);
173         itsNetInput = jevois::rescaleCv(cvimg, nsz);
174       }
175       else
176       {
177         itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
178         itsNetInput = cvimg;
179       }
180 
181       cvimg.release();
182 
183       // Let camera know we are done processing the input image:
184       inframe.done();
185 
186       // Launch the predictions, will throw logic_error if we are still loading the network:
187       try { ptime = itsYolo->predict(itsNetInput); } catch (std::logic_error const & e) { ready = false; }
188 
189       if (ready)
190       {
191         LINFO("Predicted in " << ptime << "ms");
192 
193         // Compute the boxes:
194         itsYolo->computeBoxes(w, h);
195 
196         // Send serial results and switch to next frame:
197         itsYolo->sendSerial(this, w, h, itsFrame);
198         ++itsFrame;
199       }
200     }
201 
202     // ####################################################################################################
203     //! Processing function with video output to USB
204     // ####################################################################################################
205     virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
206     {
207       static jevois::Timer timer("processing", 50, LOG_DEBUG);
208 
209       // Wait for next available camera image:
210       jevois::RawImage const inimg = inframe.get();
211 
212       timer.start();
213 
214       // We only handle one specific pixel format, and any image size in this module:
215       unsigned int const w = inimg.width, h = inimg.height;
216       inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);
217 
218       // While we process it, start a thread to wait for out frame and paste the input into it:
219       jevois::RawImage outimg;
220       auto paste_fut = std::async(std::launch::async, [&]() {
221           outimg = outframe.get();
222           outimg.require("output", w * 2, h, inimg.fmt);
223 
224           // Paste the current input image:
225           jevois::rawimage::paste(inimg, outimg, 0, 0);
226           jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - input", 3, 3, jevois::yuyv::White);
227 
228           // Paste the latest prediction results, if any, otherwise a wait message:
229           cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
230           if (itsRawPrevOutputCv.empty() == false)
231             itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));
232           else
233           {
234             jevois::rawimage::drawFilledRect(outimg, w, 0, w, h, jevois::yuyv::Black);
235             jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - loading network - please wait...",
236                                         w + 3, 3, jevois::yuyv::White);
237           }
238         });
239 
240       // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
241       // done and if so draw the results. Otherwise, start predicting using the current input frame:
242       if (itsPredictFut.valid())
243       {
244         // Are we finished predicting?
245         if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
246         {
247           // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
248           // particular, it will throw a logic_error if we are still loading the network:
249           bool success = true; float ptime = 0.0F;
250           try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }
251 
252           // Wait for paste to finish up:
253           paste_fut.get();
254 
255           // Let camera know we are done processing the input image:
256           inframe.done();
257 
258           if (success)
259           {
260             cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
261 
262             // Update our output image: First paste the image we have been making predictions on:
263             if (itsRawPrevOutputCv.empty()) itsRawPrevOutputCv = cv::Mat(h, w, CV_8UC2);
264             itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));
265 
266             // Then draw the detections:
267             itsYolo->drawDetections(outimg, w, h, w, 0);
268 
269             // Send serial messages:
270             itsYolo->sendSerial(this, w, h, itsFrame);
271 
272             // Draw some text messages:
273             jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - predictions", w + 3, 3, jevois::yuyv::White);
274             jevois::rawimage::writeText(outimg, "YOLO predict time: " + std::to_string(int(ptime)) + "ms",
275                                         w + 3, h - 13, jevois::yuyv::White);
276 
277             // Finally make a copy of these new results so we can display them again while we wait for the next round:
278             outimgcv(cv::Rect(w, 0, w, h)).copyTo(itsRawPrevOutputCv);
279 
280             // Switch to next frame:
281             ++itsFrame;
282           }
283         }
284         else
285         {
286           // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
287           // again on the next one...
288           paste_fut.get();
289           inframe.done();
290         }
291       }
292       else
293       {
294         // Note: resizeInDims() could throw if the network is not ready yet.
295         try
296         {
297           // Convert input image to RGB for predictions:
298           cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);
299 
300           // Also make a raw YUYV copy of the input image for later displays:
301           cv::Mat inimgcv = jevois::rawimage::cvImage(inimg);
302           inimgcv.copyTo(itsRawInputCv);
303 
304           // Resize the network if desired:
305           cv::Size nsz = netin::get();
306           if (nsz.width != 0 && nsz.height != 0)
307           {
308             itsYolo->resizeInDims(nsz.width, nsz.height);
309             itsNetInput = jevois::rescaleCv(cvimg, nsz);
310           }
311           else
312           {
313             itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
314             itsNetInput = cvimg;
315           }
316 
317           cvimg.release();
318 
319           // Launch the predictions:
320           itsPredictFut = std::async(std::launch::async, [&](int ww, int hh)
321                                      {
322                                        float pt = itsYolo->predict(itsNetInput);
323                                        itsYolo->computeBoxes(ww, hh);
324                                        return pt;
325                                      }, w, h);
326         }
327         catch (std::logic_error const & e) { }
328 
329         // Wait for paste to finish up:
330         paste_fut.get();
331 
332         // Let camera know we are done processing the input image:
333         inframe.done();
334       }
335 
336       // Show processing fps:
337       std::string const & fpscpu = timer.stop();
338       jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);
339 
340       // Send the output image with our processing results to the host over USB:
341       outframe.send();
342     }
343 
344     // ####################################################################################################
345   protected:
346     std::shared_ptr<Yolo> itsYolo;
347     std::future<float> itsPredictFut;
348     cv::Mat itsRawInputCv;
349     cv::Mat itsRawPrevOutputCv;
350     cv::Mat itsNetInput;
351     unsigned long itsFrame;
352 };
353 
354 // Allow the module to be loaded as a shared object (.so) file:
355 JEVOIS_REGISTER_MODULE(DarknetYOLO);
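
The process() function with video output never blocks on the network: it launches the prediction with std::async, then on each later frame polls the future for a few milliseconds and only collects the result once it is ready. The sketch below isolates that pattern, assuming a build with thread support. `slowPredict()` and `PredictionPoller` are hypothetical stand-ins written for this illustration; they are not part of JeVois and do not run any network.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a slow network: sleeps, then returns a fake prediction time in ms.
inline float slowPredict(int ms)
{
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
  return float(ms);
}

// Hypothetical helper: call step() once per video frame. Returns true when a result was collected.
class PredictionPoller
{
  public:
    bool step(int workMs, float & result)
    {
      if (itsFut.valid())
      {
        // A job is in flight: poll briefly instead of blocking the frame loop.
        if (itsFut.wait_for(std::chrono::milliseconds(5)) != std::future_status::ready) return false;

        // get() returns the value (or rethrows) and invalidates the future, so that the next
        // call to step() launches a new job:
        result = itsFut.get();
        return true;
      }

      // No job in flight: launch one and return immediately.
      itsFut = std::async(std::launch::async, slowPredict, workMs);
      return false;
    }

  private:
    std::future<float> itsFut;
};
```

Calling step() in a loop returns false while the job is running and true exactly once per completed job, which is the point at which the module above draws the detections and sends its serial messages.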