Darknet Saliency
Detect salient objects and identify them using Darknet deep neural network.
By Laurent Itti | itti@usc.edu | http://jevois.org | GPL v3
Language: C++ | Supports mappings with USB output: Yes | Supports mappings with NO USB output: Yes
 Video Mapping:   NONE 0 0 0.0 YUYV 320 240 5.0 JeVois DarknetSaliency
 Video Mapping:   YUYV 460 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency # not for mac (width not multiple of 16)
 Video Mapping:   YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency
 Video Mapping:   YUYV 880 480 15.0 YUYV 640 480 15.0 JeVois DarknetSaliency # set foa param to 256 256
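These lines follow the standard JeVois video mapping syntax (USB output format, then camera format, then module name). To try one of these modes, the corresponding line is normally added to JEVOIS:/config/videomappings.cfg on the microSD card, and the matching resolution and frame rate are then selected in the host's video-capture software. For example:

    YUYV 560 240 15.0 YUYV 320 240 15.0 JeVois DarknetSaliency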

Module Documentation

Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the scene, then identifies it using a deep neural network. It returns the top scoring candidates.

See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://pjreddie.com/darknet for more information about the Darknet deep neural network framework.

This module runs a Darknet network on an image window around the most salient point and shows the top-scoring results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some interesting object and wait for Darknet to tell you what it found. The frame rate figures shown at the bottom left of the display reflect the speed at which each new video frame from the camera is processed; in this module, that just amounts to computing the saliency map from the camera input, converting the input image to RGB, cropping it around the most salient location, sending the crop to the neural network for processing in a separate thread, and creating the demo display. The actual network inference speed (time taken to compute the predictions on one image crop) is shown at the bottom right. See below for how to trade off speed and accuracy.
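To make the "separate thread" part concrete, here is a minimal C++ sketch of that structure. This is not the module's actual code (which uses the JeVois framework's own helpers), and runDarknet() is a hypothetical stand-in for the network forward pass:

    #include <chrono>
    #include <future>
    #include <string>
    #include <vector>
    #include <opencv2/core.hpp>

    // Hypothetical stand-in for the Darknet forward pass on one RGB crop:
    std::vector<std::string> runDarknet(cv::Mat const & rgbCrop)
    {
      (void)rgbCrop;
      return { "dalmatian:72.3", "beagle:11.2" }; // dummy predictions for this sketch
    }

    std::future<std::vector<std::string>> pending; // inference in flight, if any

    void onNewFrame(cv::Mat const & salientCropRGB)
    {
      // Start a new inference only when the previous one has been consumed:
      if (!pending.valid())
        pending = std::async(std::launch::async, runDarknet, salientCropRGB.clone());

      // If the background inference just finished, grab its results for display:
      if (pending.valid() &&
          pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
      {
        std::vector<std::string> const results = pending.get();
        // ... overlay the category:score strings onto the demo display ...
        (void)results;
      }
    }

The point of this sketch is only the structure: camera frames keep flowing at the rate shown at the bottom left, while predictions appear whenever the slower background inference completes.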

Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes) that this network can recognize (too long to list here).

Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (mean average precision) on the test set, and that of Darknet Reference is about 61.1%. This is when running these networks at 224x224 network input resolution (see parameter netin below).

Neural network size and speed

When using networks that are fully convolutional (as is the case for the default networks provided with this module), one can resize the network to any desired input size. The network size directly affects both speed and accuracy: larger networks run slower but are more accurate.
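For reference, the input resolution a Darknet network starts from is declared in the [net] section of its .cfg file, roughly as in the illustrative snippet below (values are examples, not the exact contents of tiny.cfg); because the default networks are fully convolutional, the module can then override that input size at run time through netin:

    [net]
    batch=1
    subdivisions=1
    width=224
    height=224
    channels=3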

This module provides two parameters that allow you to adjust this tradeoff:

  • foa determines the size of a region of interest that is cropped around the most salient location
  • netin determines the size to which that region of interest is rescaled and fed to the neural network

For example:

  • with netin = (224 224), this module runs at about 450ms/prediction.
  • with netin = (128 128), this module runs at about 180ms/prediction.

Finally note that, when using video mappings with USB output, irrespective of foa and netin, the crop around the most salient image region (with size given by foa) will always also be rescaled so that, when placed to the right of the input image, it fills the desired USB output dimensions. For example, if the camera mode is 320x240 and the USB output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544-320) for display purposes only. This is so that one does not need to change the USB video resolution while playing with different values of foa and netin live.
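As a minimal sketch of the cropping and rescaling just described (illustrative only, using plain OpenCV calls; the names and the simple square resize are simplifications, not the module's actual code):

    #include <algorithm>
    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Take a foa-sized window around the most salient point, resize it to netin
    // for the network, and to a square of side (USB output width - camera width)
    // for the demo display.
    void cropAndRescale(cv::Mat const & frameRGB, cv::Point salient,
                        cv::Size foa, cv::Size netin, int usbWidth,
                        cv::Mat & netInput, cv::Mat & displayCrop)
    {
      // Clamp the crop so it stays fully inside the camera frame:
      int const x = std::min(std::max(salient.x - foa.width / 2, 0),
                             frameRGB.cols - foa.width);
      int const y = std::min(std::max(salient.y - foa.height / 2, 0),
                             frameRGB.rows - foa.height);
      cv::Mat const crop = frameRGB(cv::Rect(x, y, foa.width, foa.height));

      cv::resize(crop, netInput, netin);                   // network input, e.g. 128x128

      int const side = usbWidth - frameRGB.cols;           // e.g. 544 - 320 = 224
      cv::resize(crop, displayCrop, cv::Size(side, side)); // display-only rescale
    }

With the 320x240 camera and 544x240 output example above, side is 224, matching the 224x224 display crop mentioned in the text.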

Serial messages

On every frame where detection results above thresh were obtained, this module sends a standardized 2D message as specified in UserSerialStyle (an example is given after the list below):

  • Serial message type: 2D
  • id: top-scoring category name of the recognized object, followed by ':' and the confidence score in percent
  • x, y, or vertices: standardized 2D coordinates of object center or corners
  • w, h: standardized object size
  • extra: any number of additional category:score pairs which had an above-threshold score, in order of decreasing score, where category is the category name (from namefile) and score is the confidence score from 0.0 to 100.0
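For illustration, with the default Normal serial style a message for one such detection might look roughly like the line below (the values are made up, and the exact formatting is governed by UserSerialStyle and the serstyle parameter):

    N2 dalmatian:72.3 -238 -123 439 250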

See Standardized serial messages formatting for more on standardized serial messages, and Helper functions to convert coordinates from camera resolution to standardized for more info on standardized coordinates.

Parameter | Type | Description | Default | Valid Values
(DarknetSaliency) foa | cv::Size | Width and height (in pixels) of the focus of attention. This is the size of the image crop that is taken around the most salient location in each frame. The foa size must fit within the camera input frame size. | cv::Size(128, 128) | -
(DarknetSaliency) netin | cv::Size | Width and height (in pixels) of the neural network input layer. This is the size to which the image crop taken around the most salient location in each frame will be rescaled before feeding to the neural network. | cv::Size(128, 128) | -
(Darknet) netw | Net | Network to load. This meta-parameter sets parameters dataroot, datacfg, cfgfile, weightfile, and namefile for the chosen network. | Net::Tiny | Net_Values
(Darknet) dataroot | std::string | Root path for data, config, and weight files. If empty, use the module's path. | JEVOIS_SHARE_PATH /darknet/single | -
(Darknet) datacfg | std::string | Data configuration file (if relative, relative to dataroot) | cfg/imagenet1k.data | -
(Darknet) cfgfile | std::string | Network configuration file (if relative, relative to dataroot) | cfg/tiny.cfg | -
(Darknet) weightfile | std::string | Network weights file (if relative, relative to dataroot) | weights/tiny.weights | -
(Darknet) namefile | std::string | Category names file, or empty to fetch it from the network config file (if relative, relative to dataroot) | (empty) | -
(Darknet) top | unsigned int | Max number of top-scoring predictions that score above thresh to return | 5 | -
(Darknet) thresh | float | Threshold (in percent confidence) above which predictions will be reported | 20.0F | jevois::Range<float>(0.0F, 100.0F)
(Darknet) threads | int | Number of parallel computation threads | 6 | jevois::Range<int>(1, 1024)
(Saliency) cweight | byte | Color channel weight | 255 | -
(Saliency) iweight | byte | Intensity channel weight | 255 | -
(Saliency) oweight | byte | Orientation channel weight | 255 | -
(Saliency) fweight | byte | Flicker channel weight | 255 | -
(Saliency) mweight | byte | Motion channel weight | 255 | -
(Saliency) centermin | size_t | Lowest (finest) of the 3 center scales | 2 | -
(Saliency) deltamin | size_t | Lowest (finest) of the 2 center-surround delta scales | 3 | -
(Saliency) smscale | size_t | Scale of the saliency map | 4 | -
(Saliency) mthresh | byte | Motion threshold | 0 | -
(Saliency) fthresh | byte | Flicker threshold | 0 | -
(Saliency) msflick | bool | Use multiscale flicker computation | false | -
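As a usage sketch, these parameters can be adjusted at run time from the JeVois command-line interface over the serial port, using the standard setpar and getpar commands while the module is streaming (the values below are arbitrary examples):

    setpar thresh 50
    setpar top 3
    getpar thresh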
Detailed docs: DarknetSaliency
Copyright: Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
License: GPL v3
Distribution: Unrestricted
Restrictions: None
Support URL: http://jevois.org/doc
Other URL: http://iLab.usc.edu
Address: University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA