Darknet is a popular neural network framework. This module first finds the most conspicuous (salient) object in the scene, then identifies it using a deep neural network, and returns the top-scoring candidates.
This module runs a Darknet network on an image window around the most salient point and shows the top-scoring results. The network is currently a bit slow, hence it is only run once in a while. Point your camera towards some interesting object, and wait for Darknet to tell you what it found.
Note that by default this module runs the Imagenet1k tiny Darknet (it can also run the slightly slower but a bit more accurate Darknet Reference network; see parameters). There are 1000 different kinds of objects (object classes) that this network can recognize (too long to list here).
Sometimes it will make mistakes! The performance of darknet-tiny is about 58.7% correct (mean average precision) on the test set, and Darknet Reference is about 61.1% correct on the test set. This is when running these networks at 224x224 network input resolution (see parameter netin).
Neural network size and speed
When using networks that are fully convolutional (as is the case for the default networks provided with this module), one can resize the network to any desired input size. The network size directly affects both speed and accuracy: larger networks run slower but are more accurate.
This module provides two parameters that allow you to adjust this tradeoff:
foa determines the size of a region of interest that is cropped around the most salient location
netin determines the size to which that region of interest is rescaled and fed to the neural network
with netin = (224 224), this module runs at about 450ms/prediction.
with netin = (128 128), this module runs at about 180ms/prediction.
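Since the default networks are fully convolutional, inference cost is expected to scale roughly with the number of input pixels, which is broadly consistent with the timings above. A back-of-the-envelope check (plain arithmetic on the numbers quoted above, nothing module-specific):

```python
# For a fully convolutional network, runtime should scale roughly
# linearly with the number of input pixels.
area_ratio = (224 * 224) / (128 * 128)  # pixel-count ratio, ~3.06
measured_ratio = 450 / 180              # measured ms/prediction ratio, 2.5

print(f"area ratio {area_ratio:.2f} vs measured speedup {measured_ratio:.2f}")
```

The measured speedup is somewhat below the pixel-count ratio, as expected: per-frame overhead (capture, saliency computation, display) does not shrink when netin shrinks.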
Finally, note that when using video mappings with USB output, irrespective of foa and netin, the crop around the most salient image region (with size given by foa) will always be rescaled so that, when placed to the right of the input image, it fills the desired USB output dims. For example, if camera mode is 320x240 and USB output size is 544x240, then the attended and recognized object will be rescaled to 224x224 (since 224 = 544 - 320) for display purposes only. This way, one does not need to change the USB video resolution while experimenting live with different values of foa and netin.
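The display-size arithmetic described above can be sketched as a small helper (illustrative only, not part of the module's actual API; the square output matches the 544x240 example in the text):

```python
def display_crop_size(cam_w: int, usb_w: int) -> tuple:
    """Size to which the attended crop is rescaled for USB display.

    The crop is placed to the right of the camera image, so it gets the
    leftover width; it is drawn square here, matching the example where
    a 544x240 output with a 320x240 camera yields 224x224 (224 = 544 - 320).
    """
    side = usb_w - cam_w  # leftover width next to the camera image
    return (side, side)

print(display_crop_size(320, 544))  # -> (224, 224)
```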
On every frame where detection results were obtained, this module sends a message
T2 x y
where x and y are the standardized 2D coordinates of the salient region of interest in which the object was found. The T2 message is a standardized message about the location of that region. The message can be customized; see Standardized serial messages formatting.
In addition, when detections are found that are above threshold, up to top messages will be sent, one for each category candidate that scored above thresh:
DKR category score
where category is the category name (from namefile) and score is the confidence score, from 0.0 to 100.0.
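On the host side, these messages could be parsed as sketched below. This assumes one ASCII message per line on the serial port and the default formats shown above; the parser itself is illustrative and not part of JeVois:

```python
def parse_message(line: str):
    """Parse a "T2 x y" or "DKR category score" line into (type, payload).

    Returns None for lines in neither format. Category names are joined
    with spaces, since some ImageNet class names contain spaces.
    """
    tok = line.strip().split()
    if len(tok) == 3 and tok[0] == "T2":
        # Standardized 2D location of the attended region
        return ("T2", {"x": float(tok[1]), "y": float(tok[2])})
    if len(tok) >= 3 and tok[0] == "DKR":
        # Recognized category candidate with confidence in 0.0 .. 100.0
        return ("DKR", {"category": " ".join(tok[1:-1]),
                        "score": float(tok[-1])})
    return None

print(parse_message("DKR water bottle 63.7"))
```

A real host script would read lines from the serial device (e.g. with pySerial) and feed each one to this function.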
Width and height (in pixels) of the focus of attention. This is the size of the image crop that is taken around the most salient location in each frame. The foa size must fit within the camera input frame size.
Width and height (in pixels) of the neural network input layer. This is the size to which the image crop taken around the most salient location in each frame will be rescaled before feeding to the neural network.
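The interaction between foa and netin can be sketched as a crop-then-rescale step. The helper below is illustrative pseudo-pipeline code, not the module's actual implementation; the clamping keeps the foa window inside the camera frame, as required above:

```python
def foa_crop_rect(salient_x: int, salient_y: int,
                  foa_w: int, foa_h: int,
                  frame_w: int, frame_h: int):
    """Top-left corner of a foa_w x foa_h crop centered (as much as
    possible) on the salient point, clamped to stay inside the frame.

    The resulting crop would then be rescaled to the netin size before
    being fed to the network (rescaling itself not shown here).
    """
    x = min(max(salient_x - foa_w // 2, 0), frame_w - foa_w)
    y = min(max(salient_y - foa_h // 2, 0), frame_h - foa_h)
    return (x, y)

# Salient point near the frame border: the crop is shifted to fit.
print(foa_crop_rect(10, 120, 128, 128, 320, 240))  # -> (0, 56)
```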