JeVoisBase  1.9
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetYOLO.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// icon from https://pjreddie.com/darknet/yolo/

static jevois::ParameterCategory const ParamCateg("Darknet YOLO Options");

//! Parameter \relates DarknetYOLO
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input layer, or [0 0] "
                         "to make it match camera frame size. NOTE: for YOLO v3 sizes must be multiples of 32.",
                         cv::Size(320, 224), ParamCateg);


//! Detect multiple objects in scenes using the Darknet YOLO deep neural network
/*! Darknet is a popular neural network framework, and YOLO is a very interesting network that detects all objects in a
    scene in one pass. This module detects all instances of any of the objects it knows about (determined by the
    network structure, labels, dataset used for training, and weights obtained) in the image that is given to it.

    See https://pjreddie.com/darknet/yolo/

    This module runs a YOLO network and shows all detections obtained. The YOLO network is currently quite slow, hence
    it is only run once in a while. Point your camera towards some interesting scene, keep it stable, and wait for YOLO
    to tell you what it found. The framerate figures shown at the bottom left of the display reflect the speed at which
    each new video frame from the camera is processed, but in this module this just amounts to converting the image to
    RGB, sending it to the neural network for processing in a separate thread, and creating the demo display. Actual
    network inference speed (time taken to compute the predictions on one image) is shown at the bottom right. See
    below for how to trade off speed and accuracy.
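
    The frame loop described above can be sketched in isolation as follows. This is only an illustrative sketch under
    stated assumptions, not the module's code: `slowPredict()` and `framesUntilPrediction()` are hypothetical names,
    and a 50ms sleep stands in for network inference. Each loop iteration plays the role of one camera frame: it either
    launches a prediction, polls the pending one for at most 5ms, or collects the finished result.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a slow network inference; returns the elapsed time in milliseconds.
float slowPredict()
{
  auto const start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  std::chrono::duration<float, std::milli> const elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

// Run a frame loop until one prediction completes. Returns the number of frames handled in the meantime and stores
// the prediction time in ptime.
int framesUntilPrediction(float & ptime)
{
  std::future<float> fut;
  int frames = 0;

  while (true)
  {
    ++frames; // one camera frame is handled per iteration

    if (fut.valid() == false)
    {
      // No prediction in flight: start one in a separate thread using the current frame
      fut = std::async(std::launch::async, slowPredict);
    }
    else if (fut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
    {
      // Prediction finished: get() returns the result and would rethrow any exception from the thread
      ptime = fut.get();
      return frames;
    }
    // Otherwise the prediction is still running: only the display would be updated for this frame
  }
}
```

    Calling `framesUntilPrediction(ptime)` returns how many frames were handled while a single prediction was running,
    which illustrates why the frame rate shown at the bottom left can stay high even when inference is slow.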

    Note that by default this module runs tiny-YOLO V3 which can detect and recognize 80 different kinds of objects from
    the Microsoft COCO dataset. This module can also run tiny-YOLO V2 for COCO, or tiny-YOLO V2 for the Pascal-VOC
    dataset with 20 object categories. See the module's \b params.cfg file to switch network.

    - The 80 COCO object categories are: person, bicycle, car, motorbike, aeroplane, bus, train, truck, boat, traffic
      light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra,
      giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat,
      baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana,
      apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, sofa, pottedplant, bed,
      diningtable, toilet, tvmonitor, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink,
      refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush.

    - The 20 Pascal-VOC object categories are: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
      diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.

    Sometimes it will make mistakes! The performance of yolov3-tiny is about 33.1% correct (mean average precision) on
    the COCO test set.

    \youtube{d5CfljT5kec}

    Speed and network size
    ----------------------

    The parameter \p netin allows you to rescale the neural network to the specified size. Beware that this will only
    work if the network used is fully convolutional (as is the case of the default tiny-yolo network). This not only
    allows you to adjust processing speed (and, conversely, accuracy), but also to better match the network to the input
    images (e.g., the default size for tiny-yolo is 416x416, and, thus, passing it an input image of size 640x480 will
    result in first scaling that input to 416x312, then letterboxing it by adding gray borders on top and bottom so that
    the final input to the network is 416x416). This letterboxing can be completely avoided by just resizing the network
    to 320x240.
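
    The letterboxing arithmetic in the example above can be checked with a small sketch. The `Letterbox` struct and
    `computeLetterbox()` function are hypothetical helpers written for this illustration; they are not part of JeVois
    or Darknet, whose own implementation may round differently.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type: size of the scaled image and the border added on each side
struct Letterbox { int scaledW, scaledH, padX, padY; };

// Scale a srcW x srcH image to fit inside netW x netH while preserving aspect ratio, then center it
Letterbox computeLetterbox(int srcW, int srcH, int netW, int netH)
{
  // The smaller of the two ratios is the largest scale at which the whole image still fits
  double const scale = std::min(double(netW) / srcW, double(netH) / srcH);
  int const sw = int(std::lround(srcW * scale));
  int const sh = int(std::lround(srcH * scale));

  // Remaining space is split evenly between the two opposite borders
  return Letterbox { sw, sh, (netW - sw) / 2, (netH - sh) / 2 };
}
```

    With a 640x480 input and a 416x416 network, `computeLetterbox(640, 480, 416, 416)` gives a 416x312 scaled image
    with 52-pixel borders on top and bottom. With a 320x240 network the borders are zero, since the aspect ratios match.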

    Here are expected processing speeds for yolov2-tiny-voc:
    - when netin = [0 0], processes letterboxed 416x416 inputs, about 2450ms/image
    - when netin = [320 240], processes 320x240 inputs, about 1350ms/image
    - when netin = [160 120], processes 160x120 inputs, about 695ms/image

    YOLO V3 is faster, more accurate, uses less memory, and can detect 80 COCO categories:
    - when netin = [320 240], processes 320x240 inputs, about 870ms/image

    \youtube{77VRwFtIe8I}

    Serial messages
    ---------------

    When detections are found which are above threshold, one message will be sent for each detected
    object (i.e., for each box that gets drawn when USB outputs are used), using a standardized 2D message:
    + Serial message type: \b 2D
    + `id`: the category of the recognized object, followed by ':' and the confidence score in percent
    + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
    + `w`, `h`: standardized object size
    + `extra`: any number of additional category:score pairs which had an above-threshold score for that box

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.
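
    On the receiving side, the `id` field described above could be split back into a category and a score. The sketch
    below is only an assumption-laden illustration: the `Detection` struct and `parseId()` function are hypothetical
    host-side helpers, and the exact numeric formatting of the score is assumed to be something `std::stof` can read.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical host-side holder for one parsed id field
struct Detection { std::string category; float scorePercent; };

// Split "category:score" at the last ':' and convert the score to a float
Detection parseId(std::string const & id)
{
  std::string::size_type const pos = id.rfind(':');

  // Reject strings with no ':' or with an empty category or an empty score
  if (pos == std::string::npos || pos == 0 || pos + 1 == id.size())
    throw std::invalid_argument("expected category:score, got " + id);

  return Detection { id.substr(0, pos), std::stof(id.substr(pos + 1)) };
}
```

    For example, `parseId("dog:77.5")` yields the category `dog` with a score of 77.5 percent, while a string with no
    ':' separator is rejected with an exception.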

    @author Laurent Itti

    @displayname Darknet YOLO
    @videomapping NONE 0 0 0.0 YUYV 640 480 0.4 JeVois DarknetYOLO
    @videomapping YUYV 1280 480 15.0 YUYV 640 480 15.0 JeVois DarknetYOLO
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetYOLO : public jevois::StdModule,
                    public jevois::Parameter<netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetYOLO(std::string const & instance) : jevois::StdModule(instance)
    {
      itsYolo = addSubComponent<Yolo>("yolo");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetYOLO()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      bool ready = true; float ptime = 0.0F;

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Convert input image to RGB for predictions:
      cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

      // Resize the network and/or the input if desired:
      cv::Size nsz = netin::get();
      if (nsz.width != 0 && nsz.height != 0)
      {
        itsYolo->resizeInDims(nsz.width, nsz.height);
        itsNetInput = jevois::rescaleCv(cvimg, nsz);
      }
      else
      {
        itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
        itsNetInput = cvimg;
      }

      cvimg.release();

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw logic_error if we are still loading the network:
      try { ptime = itsYolo->predict(itsNetInput); } catch (std::logic_error const & e) { ready = false; }

      if (ready)
      {
        LINFO("Predicted in " << ptime << "ms");

        // Compute the boxes:
        itsYolo->computeBoxes(w, h);

        // Send serial results:
        itsYolo->sendSerial(this, w, h);
      }
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 50, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, and any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = std::async(std::launch::async, [&]() {
          outimg = outframe.get();
          outimg.require("output", w * 2, h, inimg.fmt);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - input", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - loading network - please wait...",
                                        w + 3, 3, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste to finish up:
          paste_fut.get();

          // Let camera know we are done processing the input image:
          inframe.done();

          if (success)
          {
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            if (itsRawPrevOutputCv.empty()) itsRawPrevOutputCv = cv::Mat(h, w, CV_8UC2);
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));

            // Then draw the detections:
            itsYolo->drawDetections(outimg, w, h, w, 0);

            // Send serial messages:
            itsYolo->sendSerial(this, w, h);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - predictions", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "YOLO predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 13, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            outimgcv(cv::Rect(w, 0, w, h)).copyTo(itsRawPrevOutputCv);
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get();
          inframe.done();
        }
      }
      else
      {
        // Note: resizeInDims() could throw if the network is not ready yet.
        try
        {
          // Convert input image to RGB for predictions:
          cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

          // Also make a raw YUYV copy of the input image for later displays:
          cv::Mat inimgcv = jevois::rawimage::cvImage(inimg);
          inimgcv.copyTo(itsRawInputCv);

          // Resize the network if desired:
          cv::Size nsz = netin::get();
          if (nsz.width != 0 && nsz.height != 0)
          {
            itsYolo->resizeInDims(nsz.width, nsz.height);
            itsNetInput = jevois::rescaleCv(cvimg, nsz);
          }
          else
          {
            itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
            itsNetInput = cvimg;
          }

          cvimg.release();

          // Launch the predictions:
          itsPredictFut = std::async(std::launch::async, [&](int ww, int hh)
            {
              float pt = itsYolo->predict(itsNetInput);
              itsYolo->computeBoxes(ww, hh);
              return pt;
            }, w, h);
        }
        catch (std::logic_error const & e) { }

        // Wait for paste to finish up:
        paste_fut.get();

        // Let camera know we are done processing the input image:
        inframe.done();
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Yolo> itsYolo;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsRawPrevOutputCv;
    cv::Mat itsNetInput;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetYOLO)