Surprise-based surveillance camera

JeVois can directly process video input from its camera sensor and record results to microSD. Here we will develop a new module that records a short video clip to microSD each time a surprising event is detected in the video stream. This is an enhanced version of standard motion-based video surveillance systems, which record video only when things are moving. One drawback of such systems is that repetitive motions, such as foliage moving in the wind, easily trip the motion detection algorithm and can trigger the recording of large amounts of uninteresting data.

This tutorial will show you how to:

- wrap a vision algorithm (Itti & Baldi's Bayesian surprise) into a reusable JeVois component;
- write a module with no USB video output that saves its results to microSD;
- test run and profile the module on your host computer;
- cross-compile it and install it to the JeVois smart camera;
- fine-tune its parameters using pre-recorded video.

This is a fairly long and detailed tutorial. To get you excited, let us look at the payoff upfront: Here is an hour-long surveillance video. It is very boring overall, except that a few brief surprising things occur (a few seconds each). Can you find them?

Here is what the SurpriseRecorder module we develop in this tutorial found (4 true events plus 2 false alarms):

That is, it summarized 1 hour of video into 6 snippets of about 12 seconds each (a 50x reduction: you watch slightly over a minute of surprising snippets instead of 1 hour of mostly boring footage). Upon closer inspection after the results video above was made, the last detected event actually turned out to be a bird flying very quickly across the video frames. So, as far as surprise is concerned, it is a hit rather than the false alarm noted in the results video (i.e., it was a surprising event, although possibly not relevant to a surveillance goal; see here for recent highly related work on relevance). That leaves us with 5 hits and 1 false alarm, and no misses as far as we can tell from watching the full hour-long video: our module detected all the boats (and birds) that passed by. Not bad at all!

Theory of operation

We will use Itti & Baldi's theory of surprise to detect surprising events in video.

They defined surprise in a formal, quantitative manner (for the first time!), as follows: An observation is surprising if it significantly affects the internal (subjective) beliefs of an observer. For example, if I believe that there is a 10% chance of rain today (my prior belief), and then I look outside and see only a few small scattered clouds, I may still believe in that same 10% chance of rain (posterior belief after the observation). My observation was not surprising, and Itti & Baldi say that this is because it did not affect my beliefs. Formally, when my posterior beliefs after an observation are very similar to my prior beliefs before the observation, the observation carries no surprise. In contrast, if I see a sky covered with menacing dark clouds all over, I may revise my belief to an 80% chance of rain today. Because my posterior beliefs are now very different from my prior beliefs (80% vs 10% chance of rain), the observation of clouds is said to carry a high surprise. Itti & Baldi further specify how to compute surprise by using Bayes' theorem to compute posterior beliefs in a principled way, and by using the Kullback-Leibler (KL) divergence to measure the difference between posterior and prior distributions of beliefs. This gives rise to a new quantitative measure of surprise, with a new unit, the wow (one wow of surprise is experienced when your belief in something doubles).
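In symbols (a sketch using our own notation, not necessarily that of the paper): if an observer holds prior beliefs P(M) over possible models M of the world and receives data D, the posterior follows from Bayes' theorem, and surprise is the KL divergence between posterior and prior:

\[ P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}, \qquad S(D) = \mathrm{KL}\big(P(M \mid D)\,\|\,P(M)\big) = \int_{\mathcal{M}} P(M \mid D)\, \log_2 \frac{P(M \mid D)}{P(M)}\, dM \]

With the base-2 logarithm, an observation that doubles the observer's belief in a model contributes one wow of surprise for that model.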

For more information, check out L. Itti, P. F. Baldi, Bayesian Surprise Attracts Human Attention, Vision Research, Vol. 49, No. 10, pp. 1295-1306, May 2009

Here, we will:

- compute surprise over the incoming video frames, using a reusable Surprise component;
- declare a frame surprising when its surprise exceeds a threshold;
- record a short video snippet around each surprising event (including some context frames before and after it) to microSD.

This approach is related to [R. C. Voorhies, L. Elazary, L. Itti, Neuromorphic Bayesian Surprise for Far Range Event Detection, In Proc 9th IEEE AVSS, Beijing, China, Sep 2012](http://ilab.usc.edu/publications/doc/Voorhies_etal12avss.pdf)

Plan of attack

Note
Because the result of this tutorial is expected to be useful to many, the source code has been committed to jevoisbase, so all the code for this tutorial is already there. However, this tutorial was written while that code was being developed and before it was committed, to make sure that all the steps are detailed and explained.

Surprise component

We are done with the Surprise component. Final code is in Surprise.H and Surprise.C of jevoisbase and should be pretty close to what we have developed above, except for small optimizations introduced after this tutorial was written. The details of KLgamma() are also in there.
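For reference, the KL divergence between two Gamma distributions has a closed form, which is presumably what KLgamma() implements (this is the standard formula, using shape \(\alpha\) and rate \(\beta\); check Surprise.C for the exact parameterization used there):

\[ \mathrm{KL}\big(\Gamma(\alpha_1,\beta_1)\,\|\,\Gamma(\alpha_2,\beta_2)\big) = (\alpha_1 - \alpha_2)\,\psi(\alpha_1) - \log\Gamma(\alpha_1) + \log\Gamma(\alpha_2) + \alpha_2(\log\beta_1 - \log\beta_2) + \alpha_1\,\frac{\beta_2 - \beta_1}{\beta_1} \]

where \(\psi\) is the digamma function.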

SurpriseRecorder module

We are now ready to develop a new module, which we will call SurpriseRecorder. It will compute surprise on every frame and record small video snippets to microSD around each detected surprising event.

To get started, we:

- create the new module SurpriseRecorder in jevoisbase;
- give it a Surprise sub-component to compute the surprise of each frame;
- declare its parameters, including the surprise threshold thresh and the number of context frames ctxframes;
- set up a context buffer of recent frames, plus a queue of frames to be saved to microSD by a video writer thread.

A skeleton of the resulting class is sketched below.
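Here is a hypothetical skeleton of the module, inferred from how its members and parameters are used in the process() function shown later in this tutorial; the exact declarations (in particular the type of the frame queue, and the parameter definitions) are in SurpriseRecorder.C of jevoisbase:

#include <jevois/Core/Module.H>
#include <jevois/Types/BoundedBuffer.H>
#include <opencv2/core/core.hpp>
#include <deque>
#include <memory>

// Parameters (thresh, ctxframes, etc.) are omitted here for brevity; in JeVois they
// would be declared with JEVOIS_DECLARE_PARAMETER and listed in a jevois::Parameter<...>
// base class of the module.
class SurpriseRecorder : public jevois::Module
{
  public:
    //! Constructor: instantiate our Surprise sub-component
    SurpriseRecorder(std::string const & instance) :
        jevois::Module(instance), itsBuf(1000), itsToSave(0) // queue size 1000 is an arbitrary choice here
    { itsSurprise = addSubComponent<Surprise>("surprise"); }

    //! Virtual destructor for safe inheritance
    virtual ~SurpriseRecorder() { }

    //! Processing function, version with no USB video output:
    void process(jevois::InputFrame && inframe) override;

  private:
    std::shared_ptr<Surprise> itsSurprise; //!< Our surprise computation component
    std::deque<cv::Mat> itsCtxBuf;         //!< Rolling buffer of context frames before an event
    //! Queue of frames to be saved to microSD by a writer thread (exact type is a guess here):
    jevois::BoundedBuffer<cv::Mat, jevois::BlockingBehavior::Block, jevois::BlockingBehavior::Block> itsBuf;
    int itsToSave;                         //!< Number of context frames still to save after the last event
};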

Test run on the host computer

Let us first try our new module on the host computer. We will use YUYV 640x480 @ 15 fps.

To provide video input, let us use our JeVois camera, configured in "dumb camera mode": We add a video mapping on our microSD that allows it to just output YUYV 640x480 @ 15 fps using the PassThrough module (no processing on JeVois). Then we will run jevois-daemon on our host, grab that format of video, and process it on the host:
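For example (a sketch; the exact device and mapping numbers depend on your setup), we add this line to JEVOIS:/config/videomappings.cfg on the camera's microSD:

YUYV 640 480 15.0 YUYV 640 480 15.0 JeVois PassThrough

Then, on the host, assuming the JeVois camera shows up as /dev/video1 and that the host's videomappings.cfg contains a mapping NONE 0 0 0 YUYV 640 480 15.0 JeVois SurpriseRecorder with index N:

jevois-daemon --cameradev=/dev/video1 --videomapping=N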

Profiling to determine how fast this can run

The JeVois framework provides convenient jevois::Timer and jevois::Profiler classes to help you measure how much time it takes to do things on each frame. This will help us decide which standard videomapping we should suggest for our surprise recorder. Both classes operate in the same way:

- construct the timer or profiler once (for example, as a static variable inside your process() function), giving it a name that will appear in the reports;
- call start() at the beginning of each frame;
- for the profiler only, call checkpoint("name") after each processing step of interest;
- call stop() at the end of each frame.

The timer and profiler classes accumulate statistics and display averages every 100 frames. We do not display a report on every frame, as that could slow us down too much, especially when sending the reports over a serial port.
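For instance, minimal use of jevois::Timer looks like this (a sketch; the timer name "processing" is an arbitrary label that will appear in the reports):

#include <jevois/Debug/Timer.H>

void process(jevois::InputFrame && inframe) override
{
  static jevois::Timer timer("processing"); // construct once, on the first frame

  timer.start(); // beginning of frame

  // ... grab the camera frame, process it, send results ...

  timer.stop(); // end of frame; average stats are displayed every 100 frames
}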

Let us first include the profiler declarations so we can use it:
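#include <jevois/Debug/Profiler.H>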

Let us instrument our process() function with a jevois::Profiler as follows. The new lines below all contain the word prof (look for it), and we also added some //////////////////////////////////////////////////////// markers to help you spot them:

void process(jevois::InputFrame && inframe) override
{
  static jevois::Profiler prof("surpriserecorder"); ////////////////////////////////////////////////////////

  // Wait for next available camera image:
  jevois::RawImage inimg = inframe.get(); unsigned int const w = inimg.width, h = inimg.height;
  inimg.require("input", w, h, V4L2_PIX_FMT_YUYV); // accept any image size but require YUYV pixels

  prof.start(); ////////////////////////////////////////////////////////

  // Compute surprise in a thread:
  std::future<double> itsSurpFut =
    std::async(std::launch::async, [&]() { return itsSurprise->process(inimg); } );

  prof.checkpoint("surprise launched"); ////////////////////////////////////////////////////////

  // Convert the image to OpenCV BGR and push into our context buffer:
  cv::Mat cvimg = jevois::rawimage::convertToCvBGR(inimg);
  itsCtxBuf.push_back(cvimg);
  if (itsCtxBuf.size() > ctxframes::get()) itsCtxBuf.pop_front();

  prof.checkpoint("image pushed"); ////////////////////////////////////////////////////////

  // Wait until our surprise thread is done:
  double surprise = itsSurpFut.get(); // this could throw and that is ok

  //LINFO("surprise = " << surprise << " itsToSave = " << itsToSave);

  prof.checkpoint("surprise done"); ////////////////////////////////////////////////////////

  // Let camera know we are done processing the raw input image:
  inframe.done();

  // If the current frame is surprising, check whether we are already saving. If so, just push the current frame
  // for saving and reset itsToSave to the full context length (after the event). Otherwise, this is a new event:
  // dump the whole context buffer to the writer. If the frame is not surprising but we are still within the
  // context after a previous event, keep saving until that context is exhausted:
  if (surprise >= thresh::get())
  {
    // Draw a rectangle on surprising frames. Note that we draw it in cvimg but, since the pixel memory is shared
    // with the copy of it we just pushed into itsCtxBuf, the rectangle will get drawn in there too:
    cv::rectangle(cvimg, cv::Point(3, 3), cv::Point(w-4, h-4), cv::Scalar(0,0,255), 7);

    if (itsToSave)
    {
      // We are still saving the context after the previous event, just add our new frame:
      itsBuf.push(cvimg);

      // Reset the number of frames we will save after the end of the event:
      itsToSave = ctxframes::get();
    }
    else
    {
      // Start of a new event. Dump the whole itsCtxBuf to the writer:
      for (cv::Mat const & im : itsCtxBuf) itsBuf.push(im);

      // Initialize the number of frames we will save after the end of the event:
      itsToSave = ctxframes::get();
    }
  }
  else if (itsToSave)
  {
    // No more surprising event, but we are still saving the context after the last one:
    itsBuf.push(cvimg);

    // One more context frame after the last event was saved:
    --itsToSave;

    // Last context frame after the event was just pushed? If so, push an empty frame as well to close the current
    // video file. We will open a new file on the next surprising event:
    if (itsToSave == 0) itsBuf.push(cv::Mat());
  }

  prof.stop(); ////////////////////////////////////////////////////////
}

Now, every 100 frames, you will see something like this:

INF Profiler::stop: surpriserecorder overall average (100) duration 15.4445ms [11.2414ms .. 22.1041ms] (64.7478 fps)
INF Profiler::stop: surpriserecorder - surprise launched average (100) delta duration 43.7507us [27.532us .. 77.293us] (22856.8 fps)
INF Profiler::stop: surpriserecorder - image pushed average (100) delta duration 950.279us [501.272us .. 1.96373ms] (1052.32 fps)
INF Profiler::stop: surpriserecorder - surprise done average (100) delta duration 14.4426ms [10.6092ms .. 20.9499ms] (69.2396 fps)

The overall average is the time from start() to stop(). The other lines are for the checkpoints: they report the time from start() to the first checkpoint, then from the first to the second checkpoint, and so on. The durations displayed will depend on how fast your host computer is.

These timings on the host do not tell us much about how the module will perform on JeVois, so let us run this puppy on the JeVois camera now that everything seems to be working well.

Compiling and installing to JeVois smart camera

We basically follow the standard compilation instructions (see Flashing to microSD card).
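In a nutshell (a sketch, assuming jevoisbase and the JeVois build environment are already set up as in those instructions):

cd ~/jevoisbase
./rebuild-platform.sh   # cross-compile everything, including our new module, for the JeVois processor

and then copy the compiled files to microSD as detailed in the flashing instructions.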

Fine-tuning your algorithms using canned data

Sometimes it is useful to be able to run an algorithm on a pre-recorded video sequence to fine-tune it. Here, for example, we might want to tune the threshold, update factor, and channels of the algorithm in a systematic manner, always using the same data. The JeVois framework allows for this, simply by specifying a video file as cameradev when starting jevois-daemon (see The jevois-daemon executable).

Here, we will use an hour-long 320x240 video that was streamed live on the web several years ago as part of the now-defunct blueservo project. These cameras recorded live outdoor video near the border between Texas and Mexico, and citizens were asked to watch the streams and call the sheriff if they saw anything suspicious.

We will run our tests on the host. The same approach would also work on JeVois.
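For example (a sketch; surveillance.mp4 is a placeholder for whatever file you are testing with, and N is the index of the SurpriseRecorder mapping in your host's videomappings.cfg):

jevois-daemon --cameradev=surveillance.mp4 --videomapping=N

jevois-daemon then reads frames from the file instead of from a live camera, and our module processes them exactly as before.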

Additional activities

You could add the following to this module: