JeVois v1.9.0

In this tutorial, we program an Arduino to decode the results of JeVois modules that detect and identify multiple objects in a scene. These modules send one message per detection, with information about the bounding box, object category, and recognition score.

Example modules with these outputs are DarknetYOLO, TensorFlowSaliency, DetectionDNN, and DarknetSaliency.

This tutorial directly builds on JeVois + Arduino: blink for X, which you should go through first.

Setting up

We start with the same Arduino board and hardware hookup as in JeVois + Arduino: blink for X.
The message format that these modules output is described in Standardized serial messages formatting, in the section on Object detection + recognition messages, which itself refers to the section on Two-dimensional (2D) location messages. Indeed, the messages describe the bounding box of the object (2D location message), with information about object category and recognition scores in the id and extra fields of the 2D location messages.
To get a feel for these messages, let's fire up JeVois Inventor and launch DarknetYOLO. In the Console tab of the Inventor, we turn on serial messages to USB and 4-pin, and we select message detail level Normal so we get some information about each bounding box and its top-scoring object category:

In the example above, we are detecting a dog, a bicycle, and a car on each video frame. Hence, JeVois sends 3 messages of type N2 on each frame. Note that here we don't know which message came from which frame. If you need to know, review Standardized serial messages formatting and look for parameter serstamp, which can be set to prepend a frame number to each serial message. We will not use this here.
The messages are as follows (from Standardized serial messages formatting):

N2 category:score left top width height

Note that the coordinates are in the JeVois standardized coordinate system described in Helper functions to convert coordinates from camera resolution to standardized, where:
- center of the camera's field of view is at x=0, y=0
- left edge of the camera image is always at x=-1000
- right edge of the camera image is always at x=1000
- top edge of the camera image is usually at y=-750 (unless camera image aspect ratio is not 4:3)
- bottom edge of the camera image is usually at y=750
This is so that detections reported by JeVois are independent of the camera resolution at which JeVois is grabbing frames (e.g., 320x240 or 640x480).
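To make the convention above concrete, here is a minimal sketch that maps standardized coordinates back to pixel coordinates for a given camera resolution. The function names (stdToPixelX, stdToPixelY, stdToPixelWidth) and the plain linear mapping are assumptions made for illustration only; the actual JeVois helper functions may round or handle edge cases differently, and the y mapping below assumes a 4:3 image.

```cpp
// Hypothetical helpers (not part of JeVois): plain linear mapping from
// standardized coordinates to pixel coordinates.

// Standardized x in [-1000, 1000] maps to a pixel column in [0, imgWidth].
float stdToPixelX(float x, float imgWidth)
{
  return (x + 1000.0F) * imgWidth / 2000.0F;
}

// Standardized y in [-750, 750] maps to a pixel row in [0, imgHeight].
// Assumes a 4:3 image, for which the top edge is at y=-750.
float stdToPixelY(float y, float imgHeight)
{
  return (y + 750.0F) * imgHeight / 1500.0F;
}

// A horizontal length (e.g., a bounding box width) in standardized units,
// converted to a length in pixels. The full field of view is 2000 units wide.
float stdToPixelWidth(float w, float imgWidth)
{
  return w * imgWidth / 2000.0F;
}
```

With these assumptions, the center x=0 falls on column 160 of a 320x240 image and on column 320 of a 640x480 image, which is why an Arduino program written against standardized coordinates does not need to know the camera resolution.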
Note that by default, the floating-point precision of the standardized messages is zero digits after the decimal point, i.e., we get integer scores and coordinates. If you change that using the parameter serprec described in Standardized serial messages formatting, you can get more precise floating-point values (e.g., try setpar serprec 3 in the Console of JeVois Inventor). For the code below, we will assume floating-point values, which could be integers as well.

Writing the code

We will use a state machine approach as in JeVois + Arduino: blink for X, except that it now has a few more states because we have a total of 6 tokens to decode for each message.

For the sake of developing a non-trivial example, let's say we want to turn on the LED of the Arduino when we detect a dog at least 200 units wide (i.e., the bounding box around the dog should be at least as wide as 1/10th of the field of view, and the full field of view is 2000 standardized units wide as explained above).

We extend the state machine code developed in JeVois + Arduino: blink for X as follows:

// JeVois + Arduino blink for X from YOLO
 
// Pin for LED, will turn on as we detect the desired object:
#define LEDPIN 17
 
// Serial port to use: on chips with USB (e.g., 32u4), that usually is Serial1.
// On chips without USB, use Serial:
#define SERIAL Serial1
 
// Buffer for received serial port bytes:
#define INLEN 256
char instr[INLEN + 1];
 
// Our desired object: should be one of the category names that the selected detection network can report
#define CATEGORY "dog"
 
// Our desired minimum object width in standardized coordinates:
#define MIN_WIDTH 200.0F
 
void setup()
{
  SERIAL.begin(115200);
  SERIAL.setTimeout(500);
  
  pinMode(LEDPIN, OUTPUT);
  digitalWrite(LEDPIN, HIGH);
}
 
void loop()
{
  size_t len = SERIAL.readBytesUntil('\n', instr, INLEN); // a byte would wrap to 0 if all 256 bytes are filled
  instr[len] = 0;
 
  char * tok = strtok(instr, " \r\n");
  int state = 0, i; float score, left, top, width, height;
  
  while (tok)
  {
    // State machine:
    // 0: start parsing; if we get N2, move to state 1, otherwise state 1000
    // 1: decode category name; if it is the one we want, move to state 2, otherwise state 1000
    // 2: decode left and move to state 3
    // 3: decode top and move to state 4
    // 4: decode width and move to state 5
    // 5: decode height and move to state 6
    // 6: we got a full message, stay in this state until we run out of tokens
    // 1000: we stay in this state until we run out of tokens
    switch (state)
    {
      // First token should be: N2
    case 0:
      if (strcmp(tok, "N2") == 0) state = 1; else state = 1000;
      // We are done with this token. Break from the switch() statement
      break;
      
      // Second token should be: category:score
    case 1:
      // Find the ':' between category and score:
      i = strlen(tok) - 1;
      while (i >= 0 && tok[i] != ':') --i;
      
      // If i is >= 0, we found a ':'; terminate the tok string at that ':':
      if (i >= 0)
      {
        tok[i] = '\0';
        score = atof(&tok[i+1]);
      }
      
      // Is the category name what we want?
      if (strcmp(tok, CATEGORY) == 0) state = 2; else state = 1000;
      
      // We are done with this token. Break from the switch() statement
      break;
      
      // Third token: left
    case 2:
      left = atof(tok);
      state = 3;
      break;
      
      // Fourth token: top
    case 3:
      top = atof(tok);
      state = 4;
      break;
      
      // Fifth token: width
    case 4:
      width = atof(tok);
      state = 5;
      break;
      
      // Sixth token: height
    case 5:
      height = atof(tok);
      state = 6;
      // We got a whole message!
      break;
      
      // In any other state: do nothing
    default:
      break;
    }
    
    // Move to the next token:
    tok = strtok(0, " \r\n");
  }
  
  // If we are in state 6, we successfully parsed a whole message and the category is a match.
  // We just need to test the width and activate the LED accordingly:
  if (state == 6 && width >= MIN_WIDTH)
    digitalWrite(LEDPIN, LOW); // turn LED on (it has inverted logic)
  else
    digitalWrite(LEDPIN, HIGH); // turn LED off
}

A few notes:

lines 1-36: The preliminaries are as in JeVois + Arduino: blink for X, except that we change the category name to dog (line 15) and we define MIN_WIDTH to be the minimum desired object width (line 18).
lines 39-47: We decide on the various states for our state machine.
line 52: We now look for N2 instead of the DO that we looked for in JeVois + Arduino: blink for X.
lines 76-98: We decode left, top, width and height one at a time. Note that those could be floating point depending on serprec, hence we use atof() to decode them.
line 111: We just check that we are in state 6 (complete decoding went through, and category matched) and that the width is large enough; if so, turn on the LED, otherwise turn it off.
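For experimenting away from the board, the same decoding can also be written as a standalone function that compiles both in an Arduino sketch and on a desktop computer. The sketch below is one possible variant, not the code used above: it replaces the switch-based state machine with sequential strtok() calls, and the names Detection and parseN2 are hypothetical. Unlike the code above, it rejects a second token that has no ':' in it.

```cpp
#include <stdlib.h>
#include <string.h>

// Hypothetical container for one decoded N2 message (not a JeVois type).
struct Detection
{
  const char * category; // points into the line buffer that was parsed
  float score, left, top, width, height;
};

// Decode one "N2 category:score left top width height" line in place.
// Returns true only if all 6 tokens were found. The buffer is modified,
// because strtok() writes string terminators into it.
bool parseN2(char * line, Detection & det)
{
  // First token should be: N2
  char * tok = strtok(line, " \r\n");
  if (tok == NULL || strcmp(tok, "N2") != 0) return false;

  // Second token should be: category:score
  tok = strtok(NULL, " \r\n");
  if (tok == NULL) return false;
  char * colon = strrchr(tok, ':'); // last ':' in the token
  if (colon == NULL) return false;
  *colon = '\0';                    // terminate the category name at the ':'
  det.category = tok;
  det.score = (float)atof(colon + 1);

  // Remaining four tokens: left, top, width, height
  float * fields[4] = { &det.left, &det.top, &det.width, &det.height };
  for (int i = 0; i < 4; ++i)
  {
    tok = strtok(NULL, " \r\n");
    if (tok == NULL) return false;
    *fields[i] = (float)atof(tok);  // works for "600" as well as "600.250"
  }
  return true;
}
```

In loop(), one could call parseN2(instr, det) right after reading a line, and then test strcmp(det.category, CATEGORY) == 0 and det.width >= MIN_WIDTH to decide on the LED state.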

Compile and upload the code to your Arduino and here you go!

Woohoo, the LED turns on when JeVois detects a dog that is big enough!

Note that the code as written updates the LED after every received message, so in scenes where JeVois also detects other things, any message that is not about a large enough dog will turn the LED off. The LED may thus only blink briefly if something else is detected (e.g., the bicycle in the above scene) just after the dog is.
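One possible way to reduce this flicker is to keep the LED on for a short hold period after the last matching detection, instead of switching it off as soon as a non-matching message arrives. The sketch below is only an illustration of that idea: the LedHold struct, its update() function, and the 300 ms hold time in the usage comment are all assumptions, not part of the original tutorial code.

```cpp
// Hypothetical helper: reports "LED on" for holdMs milliseconds after the
// most recent matching detection.
struct LedHold
{
  unsigned long holdMs;      // how long to keep the LED on after a match
  unsigned long lastMatchMs; // time of the most recent match
  bool everMatched;          // false until the first match is seen

  // Call once per pass through loop(). matched: whether the message just
  // parsed was the desired object; nowMs: current time in milliseconds.
  // Returns true if the LED should be on.
  bool update(bool matched, unsigned long nowMs)
  {
    if (matched) { lastMatchMs = nowMs; everMatched = true; }
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    return everMatched && (nowMs - lastMatchMs) < holdMs;
  }
};

// Possible use at the end of loop(), in place of the final if/else:
//   static LedHold hold = { 300UL, 0UL, false };
//   bool on = hold.update(state == 6 && width >= MIN_WIDTH, millis());
//   digitalWrite(LEDPIN, on ? LOW : HIGH); // LED has inverted logic
```

Because the serial timeout is set to 500 ms in setup(), loop() keeps running even when no messages arrive, so the LED still turns off once the hold period has elapsed.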

Going further

Check out these other tutorials. They use similar state machine decoding: