JeVoisBase  1.21
JeVois Smart Embedded Machine Vision Toolkit Base Modules
TensorFlowSingle.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/ObjectRecognition/TensorFlow.H>

// icon from tensorflow youtube

//! Identify objects using TensorFlow deep neural network
/*! TensorFlow is a popular neural network framework. This module identifies the object in a square region in the center
    of the camera field of view using a deep convolutional neural network.

    The deep network analyzes the image by filtering it using many different filter kernels, and several stacked passes
    (network layers). This essentially amounts to detecting the presence of both simple and complex parts of known
    objects in the image (e.g., from detecting edges in lower layers of the network to detecting car wheels or even
    whole cars in higher layers). The last layer of the network is reduced to a vector with one entry per known kind of
    object (object class). This module returns the class names of the top scoring candidates in the output vector, if
    any have scored above a minimum confidence threshold. When nothing is recognized with sufficiently high confidence,
    there is no output.

    \youtube{TRk8rCuUVEE}

    This module runs a TensorFlow network and shows the top-scoring results. Larger deep networks can be a bit slow,
    hence the network prediction is only run once in a while. Point your camera towards some interesting object, make
    the object fit in the picture shown at right (which will be fed to the neural network), keep it stable, and wait for
    TensorFlow to tell you what it found. The framerate figures shown at the bottom left of the display reflect the
    speed at which each new video frame from the camera is processed, but in this module this just amounts to converting
    the image to RGB, sending it to the neural network for processing in a separate thread, and creating the demo
    display. Actual network inference speed (time taken to compute the predictions on one image) is shown at the bottom
    right. See below for how to trade off speed and accuracy.

    Note that by default this module runs different flavors of MobileNets trained on the ImageNet dataset. There are
    1000 different kinds of objects (object classes) that these networks can recognize (too long to list here). The
    input layer of these networks is 299x299, 224x224, 192x192, 160x160, or 128x128 pixels by default, depending on the
    network used. This module takes a crop at the center of the video image, with size determined by the USB video
    size: the crop size is USB output width - 16 - camera sensor image width. With the default network parameters, this
    module hence requires at least 320x240 camera sensor resolution. The networks provided on the JeVois microSD image
    have been trained on large clusters of GPUs, using 1.2 million training images from the ImageNet dataset.

    For more information about MobileNets, see
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

    For more information about the ImageNet dataset used for training, see
    http://www.image-net.org/challenges/LSVRC/2012/

    Sometimes this module will make mistakes! The performance of mobilenets is about 40% to 70% correct (mean average
    precision) on the test set, depending on network size (bigger networks are more accurate but slower).

    Neural network size and speed
    -----------------------------

    When using a video mapping with USB output, the cropped window sent to the network is automatically sized to a
    square whose side is the USB output video width minus the camera sensor input width minus 16 pixels (e.g., when
    USB video mode is 560x240 and camera sensor mode is 320x240, the crop will be 224x224 since 224=560-16-320).
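
The arithmetic above can be sketched as a small helper. This is only an illustration: the names cropSize and
kSeparator are hypothetical and not part of JeVois, and the 16-pixel separator is taken from the text above.

```cpp
#include <stdexcept>

// Hypothetical helper, not part of JeVois: side of the square crop sent to the network.
// Assumes the 16-pixel separator that this module draws between the camera image and the crop.
constexpr int kSeparator = 16;

inline int cropSize(int usbWidth, int camWidth, int camHeight)
{
  int const side = usbWidth - camWidth - kSeparator;

  // The crop must be non-empty and must fit inside the camera frame:
  if (side <= 0 || side > camWidth || side > camHeight)
    throw std::invalid_argument("crop window must fit within camera frame");

  return side;
}
```

Under these assumptions, cropSize(560, 320, 240) gives 224 and cropSize(464, 320, 240) gives 128, which matches the
crop sizes quoted for the 560x240 and 464x240 video mappings of this module.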

    The network actual input size varies depending on which network is used; for example, mobilenet_v1_0.25_128_quant
    expects 128x128 input images, while mobilenet_v1_1.0_224 expects 224x224. We automatically rescale the cropped
    window to the network's desired input size. Note that there is a cost to rescaling, so, for best performance, you
    should match the USB output width to be the camera sensor width + 16 + network input width.

    For example:

    - with USB output 464x240 (crop size 128x128), mobilenet_v1_0.25_128_quant (network size 128x128), runs at about
      8ms/prediction (125 frames/s).

    - with USB output 464x240 (crop size 128x128), mobilenet_v1_0.5_128_quant (network size 128x128), runs at about
      18ms/prediction (55 frames/s).

    - with USB output 560x240 (crop size 224x224), mobilenet_v1_0.25_224_quant (network size 224x224), runs at about
      24ms/prediction (41 frames/s).

    - with USB output 560x240 (crop size 224x224), mobilenet_v1_1.0_224_quant (network size 224x224), runs at about
      139ms/prediction (7 frames/s).

    When using a videomapping with no USB output, the image crop is directly taken to match the network input size, so
    that no resizing occurs.

    Note that network dims must always be such that they fit inside the camera input image.

    To easily select one of the available networks, see <B>JEVOIS:/modules/JeVois/TensorFlowSingle/params.cfg</B> on the
    microSD card of your JeVois camera.

    Serial messages
    ---------------

    When detections are found with confidence scores above \p thresh, a message containing up to \p top category:score
    pairs will be sent per video frame. Exact message format depends on the current \p serstyle setting and is described
    in \ref UserSerialStyle. For example, when \p serstyle is \b Detail, this module sends:

    \verbatim
    DO category:score category:score ... category:score
    \endverbatim

    where \a category is a category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0 that
    this category was recognized. The pairs are in order of decreasing score.

    See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
    standardized coordinates.
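
On the receiving side, a Detail-style message of the form shown above could be split into pairs roughly as follows.
This is a minimal sketch, not JeVois code: the function name parseDetailMessage is hypothetical, and it assumes that
category names contain no whitespace and that each pair is written exactly as category:score.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side helper, not part of JeVois: parse one message of the form
// "DO category:score category:score ...". Assumes category names contain no whitespace.
inline std::vector<std::pair<std::string, float>> parseDetailMessage(std::string const & line)
{
  std::vector<std::pair<std::string, float>> out;
  std::istringstream is(line);
  std::string tok;

  // Only handle messages that start with the "DO" prefix:
  if (!(is >> tok) || tok != "DO") return out;

  while (is >> tok)
  {
    // Split on the last ':' so that a category name that itself contains ':' is still handled:
    auto const pos = tok.rfind(':');
    if (pos == std::string::npos || pos == 0 || pos + 1 >= tok.size()) continue; // malformed pair, skip it

    try { out.emplace_back(tok.substr(0, pos), std::stof(tok.substr(pos + 1))); }
    catch (std::exception const &) { } // score was not a number, skip this pair
  }

  return out;
}
```

Since the module sends pairs in order of decreasing score, the first element of the returned vector is the top
candidate when the vector is not empty.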

    Using your own network
    ----------------------

    For a step-by-step tutorial, see [Training custom TensorFlow networks for
    JeVois](http://jevois.org/tutorials/UserTensorFlowTraining.html).

    This module supports RGB or grayscale inputs, byte or float32. You should create and train your network using fast
    GPUs, and then follow the instructions here to convert your trained network to TFLite format:

    https://www.tensorflow.org/lite/

    Then you just need to create a directory under <B>JEVOIS:/share/tensorflow/</B> with the name of your network, and,
    in there, two files, \b labels.txt with the category labels, and \b model.tflite with your model converted to
    TensorFlow Lite (flatbuffer format). Finally, edit <B>JEVOIS:/modules/JeVois/TensorFlowSingle/params.cfg</B> to
    select your new network when the module is launched.


    @author Laurent Itti

    @displayname TensorFlow Single
    @videomapping NONE 0 0 0.0 YUYV 320 240 30.0 JeVois TensorFlowSingle
    @videomapping YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois TensorFlowSingle
    @videomapping YUYV 464 240 15.0 YUYV 320 240 15.0 JeVois TensorFlowSingle
    @videomapping YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois TensorFlowSingle
    @email itti\@usc.edu
    @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
    @copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
    @mainurl http://jevois.org
    @supporturl http://jevois.org/doc
    @otherurl http://iLab.usc.edu
    @license GPL v3
    @distribution Unrestricted
    @restrictions None
    \ingroup modules */
class TensorFlowSingle : public jevois::StdModule
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    TensorFlowSingle(std::string const & instance) : jevois::StdModule(instance)
    {
      itsTensorFlow = addSubComponent<TensorFlow>("tf");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~TensorFlowSingle()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      // Wait for any pending prediction to finish; ignore any exception it may have thrown:
      try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      int const w = inimg.width, h = inimg.height;

      // Check input vs network dims, will throw if network not ready:
      int netw, neth, netc;
      try { itsTensorFlow->getInDims(netw, neth, netc); }
      catch (std::logic_error const & e) { inframe.done(); return; }

      if (netw > w || neth > h)
        LFATAL("Network wants " << netw << 'x' << neth << " input, larger than camera " << w << 'x' << h);

      // Take a central crop of the input, with size given by network input:
      int const offx = ((w - netw) / 2) & (~1);
      int const offy = ((h - neth) / 2) & (~1);

      cv::Mat cvimg = jevois::rawimage::cvImage(inimg);
      cv::Mat crop = cvimg(cv::Rect(offx, offy, netw, neth));

      // Convert crop to RGB for predictions:
      cv::cvtColor(crop, itsCvImg, cv::COLOR_YUV2RGB_YUYV);

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions (do not catch exceptions, we already tested for network ready in this block):
      float const ptime = itsTensorFlow->predict(itsCvImg, itsResults);
      LINFO("Predicted in " << ptime << "ms");

      // Send serial results:
      sendSerialObjReco(itsResults);
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 30, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, but any image size in this module:
      int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = jevois::async([&]() {
          outimg = outframe.get();
          outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois TensorFlow - input", 3, 3, jevois::yuyv::White);

          // Draw a 16-pixel wide rectangle:
          jevois::rawimage::drawFilledRect(outimg, w, 0, 16, h, jevois::yuyv::MedGrey);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w + 16, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w + 16, 0, outimg.width - w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "Loading network -", w + 19, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "please wait...", w + 19, 15, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste to finish up and let camera know we are done processing the input image:
          paste_fut.get(); inframe.done();

          if (success)
          {
            int const cropw = itsRawInputCv.cols, croph = itsRawInputCv.rows;
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w + 16, 0, cropw, croph)));
            jevois::rawimage::drawFilledRect(outimg, w + 16, croph, cropw, h - croph, jevois::yuyv::Black);

            // Then draw the detections: either below the detection crop if there is room, or on top of it if not enough
            // room below:
            int y = croph + 3; if (y + int(itsTensorFlow->top::get()) * 12 > h - 21) y = 3;

            for (auto const & p : itsResults)
            {
              jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
                                          w + 19, y, jevois::yuyv::White);
              y += 12;
            }

            // Send serial results:
            sendSerialObjReco(itsResults);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 19, h - 11, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            itsRawPrevOutputCv = cv::Mat(h, cropw, CV_8UC2);
            outimgcv(cv::Rect(w + 16, 0, cropw, h)).copyTo(itsRawPrevOutputCv);

          } else { itsRawPrevOutputCv.release(); } // network is not ready yet
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get(); inframe.done();
        }
      }
      else // We are not predicting, launch a prediction:
      {
        // Wait for paste to finish up:
        paste_fut.get();

        // In this module, we use square crops for the network, with size given by USB width - camera width - 16:
        if (outimg.width < inimg.width + 16) LFATAL("USB output image must be larger than camera input");
        int const cropw = outimg.width - inimg.width - 16; // 16 pix separator to distinguish darknet vs tensorflow
        int const croph = cropw; // square crop

        // Check input vs network dims:
        if (cropw <= 0 || croph <= 0 || cropw > w || croph > h)
          LFATAL("Network crop window must fit within camera frame");

        // Take a central crop of the input:
        int const offx = ((w - cropw) / 2) & (~1);
        int const offy = ((h - croph) / 2) & (~1);
        cv::Mat cvimg = jevois::rawimage::cvImage(inimg);
        cv::Mat crop = cvimg(cv::Rect(offx, offy, cropw, croph));

        // Convert crop to RGB for predictions:
        cv::cvtColor(crop, itsCvImg, cv::COLOR_YUV2RGB_YUYV);

        // Also make a raw YUYV copy of the crop for later displays:
        crop.copyTo(itsRawInputCv);

        // Let camera know we are done processing the input image:
        inframe.done();

        // Rescale the cropped image to network dims if needed:
        try
        {
          int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);
          itsCvImg = jevois::rescaleCv(itsCvImg, cv::Size(netinw, netinh));

          // Launch the predictions:
          itsPredictFut = jevois::async([&]()
                                        { return itsTensorFlow->predict(itsCvImg, itsResults); });
        }
        catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network is not ready yet
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<TensorFlow> itsTensorFlow;
    std::vector<jevois::ObjReco> itsResults;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;      // raw YUYV copy of the crop being predicted on, for display
    cv::Mat itsCvImg;           // RGB crop sent to the network
    cv::Mat itsRawPrevOutputCv; // copy of the last results panel, redrawn while predicting
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(TensorFlowSingle);