JeVoisBase  1.6
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetYOLO.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

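#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H> // jevois::rawimage conversion and drawing helpers
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/ObjectDetection/Yolo.H> // Yolo component; path assumed from the JeVoisBase layout

#include <chrono>
#include <future>
#include <memory>
#include <stdexcept>
#include <string>

// icon from https://pjreddie.com/darknet/yolo/
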
static jevois::ParameterCategory const ParamCateg("Darknet YOLO Options");

//! Parameter \relates DarknetYOLO
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input layer, or [0 0] "
                         "to keep the network's native input size (camera frames are then letterboxed to fit).",
                         cv::Size(320, 240), ParamCateg);


//! Detect multiple objects in scenes using the Darknet YOLO deep neural network
/*! Darknet is a popular neural network framework, and YOLO is a network that detects all objects in a scene in one
    pass. This module detects all instances of any of the objects it knows about (determined by the network
    structure, labels, dataset used for training, and weights obtained) in the image that is given to it.

    See https://pjreddie.com/darknet/yolo/

    This module runs a YOLO network and shows all detections obtained. The YOLO network is currently quite slow, hence
    it is only run once in a while. Point your camera towards some interesting scene, keep it stable, and wait for YOLO
    to tell you what it found. The framerate figures shown at the bottom left of the display reflect the speed at which
    each new video frame from the camera is processed, but in this module this just amounts to converting the image to
    RGB, sending it to the neural network for processing in a separate thread, and creating the demo display. Actual
    network inference speed (time taken to compute the predictions on one image) is shown at the bottom right. See
    below for how to trade off speed and accuracy.
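
The hand-off to a separate thread described above can be illustrated outside of JeVois with standard C++ only. The
following is a minimal sketch, not the module's code: `slowPredict()` and `framesUntilReady()` are hypothetical
names, and a sleep stands in for network inference. It starts one prediction with `std::async`, then polls the
resulting future for 5ms on each simulated frame, so per-frame work continues while the prediction runs.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Hypothetical stand-in for a slow network inference call; returns the elapsed time in milliseconds.
float slowPredict(int workMs)
{
  auto const start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(workMs));
  std::chrono::duration<float, std::milli> const elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

// Handle up to maxFrames simulated frames while one prediction runs in the background.
// Returns the number of frames handled before the prediction was found to be complete.
int framesUntilReady(int workMs, int maxFrames)
{
  std::future<float> fut = std::async(std::launch::async, slowPredict, workMs);
  int frames = 0;

  while (frames < maxFrames)
  {
    ++frames; // per-frame work (image conversion, display) would go here

    // Poll briefly; if the prediction is done, collect its result:
    if (fut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
    {
      float const ptime = fut.get();
      std::cout << "Prediction took " << ptime << "ms, seen after " << frames << " frames\n";
      return frames;
    }
  }

  fut.get(); // do not abandon the worker: block until it finishes
  return frames;
}
```

For instance, `framesUntilReady(20, 1000)` simulates a 20ms prediction and returns after a handful of frames; the
exact count depends on thread scheduling.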

    Note that by default this module runs the Pascal-VOC version of tiny-YOLO, with these object categories:

    - aeroplane
    - bicycle
    - bird
    - boat
    - bottle
    - bus
    - car
    - cat
    - chair
    - cow
    - diningtable
    - dog
    - horse
    - motorbike
    - person
    - pottedplant
    - sheep
    - sofa
    - train
    - tvmonitor

    Sometimes it will make mistakes! The tiny-yolo-voc network reaches a mean average precision (mAP) of about 57.1%
    on the test set.

    \youtube{d5CfljT5kec}

    Speed and network size
    ----------------------

    The parameter \p netin allows you to rescale the neural network to the specified size. Beware that this will only
    work if the network used is fully convolutional (as is the case for the default tiny-yolo network). This not only
    allows you to adjust processing speed (and, conversely, accuracy), but also to better match the network to the
    input images. For example, the default size for tiny-yolo is 416x416; passing it an input image of size 640x480
    will thus first scale that input to 416x312, then letterbox it by adding gray borders on top and bottom so that
    the final input to the network is 416x416. This letterboxing can be completely avoided by just resizing the
    network to 320x240.
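
The letterbox arithmetic in the example above can be checked with a few lines of integer math. This is a minimal
sketch under the assumption that the image is scaled to fit the network input while preserving its aspect ratio and
is then centered; `Letterbox` and `letterbox()` are hypothetical names, not part of Darknet or JeVois.

```cpp
// Result of fitting an image into a network input (hypothetical helper type).
struct Letterbox
{
  int scaledW, scaledH; // image size after aspect-preserving scaling
  int padX, padY;       // border added on each side, horizontally and vertically
};

// Fit an imgW x imgH image into a netW x netH network input while preserving aspect ratio.
Letterbox letterbox(int imgW, int imgH, int netW, int netH)
{
  Letterbox lb;

  // Compare netW/imgW with netH/imgH by cross-multiplication to stay in integer math:
  if (netW * imgH <= netH * imgW)
  {
    // Width is the limiting dimension:
    lb.scaledW = netW;
    lb.scaledH = (imgH * netW) / imgW;
  }
  else
  {
    // Height is the limiting dimension:
    lb.scaledH = netH;
    lb.scaledW = (imgW * netH) / imgH;
  }

  // Center the scaled image; the remaining space becomes the gray borders:
  lb.padX = (netW - lb.scaledW) / 2;
  lb.padY = (netH - lb.scaledH) / 2;
  return lb;
}
```

With a 640x480 image and a 416x416 network, `letterbox(640, 480, 416, 416)` gives a 416x312 scaled image with
52-pixel borders on top and bottom; with a 320x240 network, no border is needed.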

    Here are expected processing speeds:
    - when netin = [0 0], processes letterboxed 416x416 inputs, about 2450ms/image
    - when netin = [320 240], processes 320x240 inputs, about 1350ms/image
    - when netin = [160 120], processes 160x120 inputs, about 695ms/image

    \youtube{77VRwFtIe8I}

    Serial messages
    ---------------

    - On every frame where detection results were obtained, this module sends a message
      \verbatim
      DKY framenum
      \endverbatim
      where \a framenum is the frame number (starts at 0).
    - In addition, when detections are found which are above threshold, one message will be sent for each detected
      object (i.e., for each box that gets drawn when USB outputs are used), using a standardized 2D message:
      + Serial message type: \b 2D
      + `id`: the category name of the recognized object
      + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
      + `w`, `h`: standardized object size
      + `extra`: recognition score (in percent confidence)
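
On the host side, the per-frame message can be recognized with a small parser. The following is a minimal sketch,
assuming each serial line arrives as one complete string; `parseDKY()` is a hypothetical helper, not a JeVois
function, and it handles only the `DKY framenum` line shown above, not the standardized 2D messages.

```cpp
#include <sstream>
#include <string>

// Parse a "DKY framenum" line. On success, store the frame number in framenum and return true.
// On failure, return false and leave framenum untouched.
bool parseDKY(std::string const & line, unsigned long & framenum)
{
  std::istringstream is(line);
  std::string tag;
  unsigned long n = 0;

  // Expect the literal tag followed by one unsigned integer:
  if (!(is >> tag >> n) || tag != "DKY") return false;

  // Reject lines that carry additional tokens:
  std::string extra;
  if (is >> extra) return false;

  framenum = n;
  return true;
}
```

A caller would typically try `parseDKY()` first on each received line and hand any line it rejects to whatever
code handles the per-object messages.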


    @author Laurent Itti

    @displayname Darknet YOLO
    @videomapping NONE 0 0 0.0 YUYV 640 480 0.4 JeVois DarknetYOLO
    @videomapping YUYV 1280 480 15.0 YUYV 640 480 15.0 JeVois DarknetYOLO
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetYOLO : public jevois::StdModule,
                    public jevois::Parameter<netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetYOLO(std::string const & instance) : jevois::StdModule(instance), itsFrame(0)
    {
      itsYolo = addSubComponent<Yolo>("yolo");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetYOLO()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      // Wait for any pending prediction to finish and discard its result or exception. Calling get() on a
      // future that holds no shared state is undefined behavior, so check valid() first:
      if (itsPredictFut.valid()) { try { itsPredictFut.get(); } catch (...) { } }
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      bool ready = true; float ptime = 0.0F;

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Convert input image to RGB for predictions:
      cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

      // Resize the network if desired:
      cv::Size nsz = netin::get();
      if (nsz.width != 0 && nsz.height != 0)
      {
        itsYolo->resizeInDims(nsz.width, nsz.height);

        // Use the frame as is if it already matches; otherwise interpolate linearly when enlarging and
        // average over pixel areas when shrinking:
        if (nsz.width == cvimg.cols && nsz.height == cvimg.rows)
          itsNetInput = cvimg;
        else if (nsz.width > cvimg.cols || nsz.height > cvimg.rows)
          cv::resize(cvimg, itsNetInput, nsz, 0, 0, cv::INTER_LINEAR);
        else
          cv::resize(cvimg, itsNetInput, nsz, 0, 0, cv::INTER_AREA);
      }
      else itsNetInput = cvimg;

      cvimg.release();

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw logic_error if we are still loading the network:
      try { ptime = itsYolo->predict(itsNetInput); } catch (std::logic_error const &) { ready = false; }

      if (ready)
      {
        LINFO("Predicted in " << ptime << "ms");

        // Compute the boxes:
        itsYolo->computeBoxes(w, h);

        // Send serial results and switch to next frame:
        itsYolo->sendSerial(this, w, h, itsFrame);
        ++itsFrame;
      }
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 50, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = std::async(std::launch::async, [&]() {
          outimg = outframe.get();
          outimg.require("output", w * 2, h, inimg.fmt);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - input", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - loading network - please wait...",
                                        w + 3, 3, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const &) { success = false; }

          // Wait for paste to finish up:
          paste_fut.get();

          // Let camera know we are done processing the input image:
          inframe.done();

          if (success)
          {
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            if (itsRawPrevOutputCv.empty()) itsRawPrevOutputCv = cv::Mat(h, w, CV_8UC2);
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));

            // Then draw the detections:
            itsYolo->drawDetections(outimg, w, h, w, 0);

            // Send serial messages:
            itsYolo->sendSerial(this, w, h, itsFrame);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - predictions", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "YOLO predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 13, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            outimgcv(cv::Rect(w, 0, w, h)).copyTo(itsRawPrevOutputCv);

            // Switch to next frame:
            ++itsFrame;
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get();
          inframe.done();
        }
      }
      else
      {
        // Note: resizeInDims() could throw if the network is not ready yet.
        try
        {
          // Convert input image to RGB for predictions:
          cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

          // Also make a raw YUYV copy of the input image for later displays:
          cv::Mat inimgcv = jevois::rawimage::cvImage(inimg);
          inimgcv.copyTo(itsRawInputCv);

          // Resize the network if desired:
          cv::Size nsz = netin::get();
          if (nsz.width != 0 && nsz.height != 0)
          {
            itsYolo->resizeInDims(nsz.width, nsz.height);

            if (nsz.width == cvimg.cols && nsz.height == cvimg.rows)
              itsNetInput = cvimg;
            else if (nsz.width > cvimg.cols || nsz.height > cvimg.rows)
              cv::resize(cvimg, itsNetInput, nsz, 0, 0, cv::INTER_LINEAR);
            else
              cv::resize(cvimg, itsNetInput, nsz, 0, 0, cv::INTER_AREA);
          }
          else itsNetInput = cvimg;

          cvimg.release();

          // Launch the predictions:
          itsPredictFut = std::async(std::launch::async, [&](int ww, int hh)
                                     {
                                       float pt = itsYolo->predict(itsNetInput);
                                       itsYolo->computeBoxes(ww, hh);
                                       return pt;
                                     }, w, h);
        }
        catch (std::logic_error const &) { } // network still loading: skip prediction on this frame

        // Wait for paste to finish up:
        paste_fut.get();

        // Let camera know we are done processing the input image:
        inframe.done();
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Yolo> itsYolo;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsRawPrevOutputCv;
    cv::Mat itsNetInput;
    unsigned long itsFrame;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetYOLO);