JeVoisBase  1.22
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetSaliency.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/Saliency/Saliency.H>
#include <jevoisbase/Components/ObjectRecognition/Darknet.H>

// icon from https://pjreddie.com/darknet/

static jevois::ParameterCategory const ParamCateg("Darknet Saliency Options");

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
                         "This is the size of the image crop that is taken around the most salient "
                         "location in each frame. The foa size must fit within the camera input frame size.",
                         cv::Size(128, 128), ParamCateg);

//! Parameter \relates DarknetSaliency
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input "
                         "layer. This is the size to which the image crop taken around the most salient "
                         "location in each frame will be rescaled before feeding to the neural network.",
                         cv::Size(128, 128), ParamCateg);


//! Detect salient objects and identify them using Darknet deep neural network
/*! Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the
    scene, then identifies it using a deep neural network. It returns the top-scoring candidates.

    See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://pjreddie.com/darknet for more
    information about the Darknet deep neural network framework.

    This module runs a Darknet network on an image window around the most salient point and shows the top-scoring
    results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some
    interesting object, and wait for Darknet to tell you what it found. The framerate figures shown at the bottom left
    of the display reflect the speed at which each new video frame from the camera is processed, but in this module this
    just amounts to computing the saliency map from the camera input, cropping the input image around the most salient
    location, converting the crop to RGB, sending it to the neural network for processing in a separate thread, and
    creating the demo display. Actual network inference speed (time taken to compute the predictions on one image crop)
    is shown at the bottom right. See below for how to trade off speed against accuracy.

    Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit
    more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes)
    that this network can recognize (too long to list here).

    Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (top-1 accuracy) on
    the test set, and Darknet Reference is about 61.1% correct on the test set. This is when running these networks at
    224x224 network input resolution (see parameter \p netin below).

    \youtube{77VRwFtIe8I}

    Neural network size and speed
    -----------------------------

    When using networks that are fully convolutional (as is the case for the default networks provided with this
    module), one can resize the network to any desired input size. The network size directly affects both speed and
    accuracy. Larger networks run slower but are more accurate.

    This module provides two parameters that allow you to adjust this tradeoff:
    - \p foa determines the size of a region of interest that is cropped around the most salient location
    - \p netin determines the size to which that region of interest is rescaled and fed to the neural network

    For example:

    - with netin = (224 224), this module runs at about 450ms/prediction.
    - with netin = (128 128), this module runs at about 180ms/prediction.
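
To make the role of foa concrete, here is a minimal sketch of how a crop of the requested size might be placed around a salient point while staying inside the camera frame. It mirrors the clamping and even-alignment steps of getSalROI() in this file, but it is only an illustration: the names Roi and clampRoi are hypothetical and are not part of JeVois.

```cpp
#include <algorithm>

// Hypothetical helper type, for illustration only: a crop rectangle in pixels.
struct Roi { int x, y, w, h; };

// Place a crop of size (foaW x foaH) centered on (cx, cy), clamped to an image of size (imgW x imgH).
// Sizes and coordinates are rounded down to even values, as getSalROI() does to avoid swapping U and V
// in YUYV data.
inline Roi clampRoi(int cx, int cy, int foaW, int foaH, int imgW, int imgH)
{
  Roi r;
  r.w = std::min(foaW, imgW) & ~1;
  r.h = std::min(foaH, imgH) & ~1;
  r.x = std::max(0, std::min(cx - r.w / 2, imgW - r.w)) & ~1;
  r.y = std::max(0, std::min(cy - r.h / 2, imgH - r.h)) & ~1;
  return r;
}
```

With a 320x240 frame and foa = (128 128), a salient point near the bottom-right corner, say (310, 230), gives a crop whose top-left corner is (192, 112): the crop is shifted to fit inside the frame rather than shrunk.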

    Finally note that, when using video mappings with USB output, irrespective of \p foa and \p netin, the crop around
    the most salient image region (with size given by \p foa) will always also be rescaled so that, when placed to the
    right of the input image, it fills the desired USB output dims. For example, if camera mode is 320x240 and USB
    output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320)
    for display purposes only. This is so that one does not need to change USB video resolution while playing with
    different values of \p foa and \p netin live.
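
The display rescaling described above comes down to a small piece of arithmetic, sketched here in the same way the process() function with USB output computes it. Dims and displayDims are hypothetical names introduced only for this illustration.

```cpp
// Hypothetical helper type, for illustration only: a width and height in pixels.
struct Dims { int w, h; };

// Size of the displayed crop: as wide as the space to the right of the camera image in the USB frame,
// with the height scaled by the same factor so that the crop keeps its aspect ratio.
inline Dims displayDims(int usbW, int camW, int cropW, int cropH)
{
  float const fac = float(usbW - camW) / float(cropW);
  return Dims{ usbW - camW, int(cropH * fac + 0.4999F) };
}
```

For a 320x240 camera mode, a 544x240 USB output, and a 128x128 crop, displayDims(544, 320, 128, 128) gives 224x224, matching the example in the text.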

    Serial messages
    ---------------

    On every frame where detection results were obtained that are above \p thresh, this module sends a standardized 2D
    message as specified in \ref UserSerialStyle
    + Serial message type: \b 2D
    + `id`: top-scoring category name of the recognized object, followed by ':' and the confidence score in percent
    + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
    + `w`, `h`: standardized object size
    + `extra`: any number of additional category:score pairs which had an above-threshold score, in order of
      decreasing score

    where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.
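
On the receiving side, the `id` field and each `extra` entry can be split back into a category name and a score. The sketch below handles a single category:score token only; how tokens are laid out within a full message depends on the serial style in use and is not covered here. The function name splitCategoryScore is hypothetical, and splitting at the last ':' is an assumption made in case a category name itself contains a colon.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <utility>

// Split a "category:score" token at its last ':' and return {category, score}.
// Returns {token, -1.0F} when no ':' is present or when the score is not a number.
inline std::pair<std::string, float> splitCategoryScore(std::string const & token)
{
  std::size_t const pos = token.rfind(':');
  if (pos == std::string::npos) return { token, -1.0F };
  try { return { token.substr(0, pos), std::stof(token.substr(pos + 1)) }; }
  catch (std::exception const &) { return { token, -1.0F }; }
}
```

For instance, a made-up token such as "tabby:58.7" would be split into the category "tabby" and a score of about 58.7.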


    @author Laurent Itti

    @displayname Darknet Saliency
    @videomapping NONE 0 0 0.0 YUYV 320 240 5.0 JeVois DarknetSaliency
    @videomapping YUYV 460 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency # not for mac (width not multiple of 16)
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois DarknetSaliency # set foa param to 256 256
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class DarknetSaliency : public jevois::StdModule,
                        public jevois::Parameter<foa, netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetSaliency(std::string const & instance) : jevois::StdModule(instance)
    {
      itsSaliency = addSubComponent<Saliency>("saliency");
      itsDarknet = addSubComponent<Darknet>("darknet");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetSaliency()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      // Wait for any pending prediction to finish; get() is only defined on a valid future:
      if (itsPredictFut.valid())
      {
        try { itsPredictFut.get(); } catch (...) { }
      }
    }

    // ####################################################################################################
    //! Helper function: compute saliency ROI in a thread, return top-left corner and size
    // ####################################################################################################
    virtual void getSalROI(jevois::RawImage const & inimg, int & rx, int & ry, int & rw, int & rh)
    {
      int const w = inimg.width, h = inimg.height;

      // Check whether the input image size is small, in which case we will scale the maps up one notch:
      if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
      else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }

      // Find the most salient location, no gist for now:
      itsSaliency->process(inimg, false);

      // Get some info from the saliency computation:
      int const smlev = itsSaliency->smscale::get();
      int const smfac = (1 << smlev);

      // Find most salient point:
      int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);

      // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
      cv::Size roisiz = foa::get(); rw = roisiz.width; rh = roisiz.height;
      rw = std::min(rw, w); rh = std::min(rh, h); rw &= ~1; rh &= ~1;
      unsigned int const dmx = (mx << smlev) + (smfac >> 2);
      unsigned int const dmy = (my << smlev) + (smfac >> 2);
      rx = int(dmx + 1 + smfac / 4) - rw / 2;
      ry = int(dmy + 1 + smfac / 4) - rh / 2;
      rx = std::max(0, std::min(rx, w - rw));
      ry = std::max(0, std::min(ry, h - rh));
      rx &= ~1; ry &= ~1;
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Find the most salient location, no gist for now:
      int rx, ry, rw, rh;
      getSalROI(inimg, rx, ry, rw, rh);

      // Extract a raw YUYV ROI around attended point:
      cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
      cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

      // Convert the ROI to RGB:
      cv::Mat rgbroi;
      cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw if network is not ready:
      itsResults.clear();
      try
      {
        int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc);

        // Scale the ROI if needed:
        cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));

        // Predict:
        float const ptime = itsDarknet->predict(scaledroi, itsResults);
        LINFO("Predicted in " << ptime << "ms");

        // Send serial results and switch to next frame:
        sendSerialObjDetImg2D(w, h, rx + rw/2, ry + rh/2, rw, rh, itsResults);
      }
      catch (std::logic_error const & e) { } // network still loading
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // Launch the saliency computation in a thread:
      int rx, ry, rw, rh;
      auto sal_fut = jevois::async([&](){ this->getSalROI(inimg, rx, ry, rw, rh); });

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = jevois::async([&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet Saliency", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste and saliency to finish up, then let camera know we are done processing the input image.
          // Waiting for saliency here ensures that it no longer reads the input image and that rx, ry, rw, rh are
          // set before they are used below, whether or not the prediction succeeded:
          paste_fut.get(); sal_fut.get(); inframe.done();

          if (success)
          {
            int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
            jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not enough
            // room below:
            int y = disph + 3; if (y + itsDarknet->top::get() * 12 > h - 21) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
                                          w + 3, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sendSerialObjDetImg2D(w, h, rx + rw/2, ry + rh/2, rw, rh, itsResults);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 11, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
            outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); sal_fut.get(); inframe.done();
        }
      }
      else // We are not predicting: start new predictions
      {
        // Wait for paste to finish up. Also wait for saliency to finish up so that rx, ry, rw, rh are available:
        paste_fut.get(); sal_fut.get();

        // Extract a raw YUYV ROI around attended point:
        cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
        cv::Mat rawroi = rawimgcv(cv::Rect(rx, ry, rw, rh));

        // Convert the ROI to RGB:
        cv::Mat rgbroi;
        cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Scale the ROI if needed to the desired network input dims:
        itsCvImg = jevois::rescaleCv(rgbroi, netin::get());

        // Also scale the ROI to the desired output size, i.e., USB width - camera width:
        float fac = float(outimg.width - w) / float(rgbroi.cols);
        cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
        cv::Mat displayroi = jevois::rescaleCv(rgbroi, displaysize);

        // Convert back the display ROI to YUYV and store for later display, while we are still computing the network
        // predictions on that ROI:
        jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);

        // Launch the predictions; will throw if network is not ready:
        try
        {
          int netinw, netinh, netinc; itsDarknet->getInDims(netinw, netinh, netinc); // will throw if not ready
          itsPredictFut = jevois::async([&]() { return itsDarknet->predict(itsCvImg, itsResults); });
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network is not ready yet
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Show attended location:
      jevois::rawimage::drawFilledRect(outimg, rx + rw/2 - 4, ry + rh/2 - 4, 8, 8, jevois::yuyv::LightPink);
      jevois::rawimage::drawRect(outimg, rx, ry, rw, rh, 2, jevois::yuyv::LightPink);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Saliency> itsSaliency;
    std::shared_ptr<Darknet> itsDarknet;
    std::vector<jevois::ObjReco> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsCvImg;
    cv::Mat itsRawPrevOutputCv;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetSaliency);