JeVois Tutorials  1.17
JeVois Smart Embedded Machine Vision Tutorials
JeVois + Arduino: Decoding object detection boxes

JeVois v1.9.0

In this tutorial, we program an Arduino to decode the results of JeVois modules that detect and identify multiple objects in scenes. These modules send one message for each detection, with information about the bounding box, object category, and recognition score.

Example modules with these outputs are DarknetYOLO, TensorFlowSaliency, DetectionDNN, and DarknetSaliency.

This tutorial directly builds on JeVois + Arduino: blink for X, which you should go through first.

Setting up

  • We start with the same Arduino board and hardware hookup as in JeVois + Arduino: blink for X
  • The message format that these modules output is described in Standardized serial messages formatting, in the section on Object detection + recognition messages, which itself refers to the section on Two-dimensional (2D) location messages. Indeed, the messages describe the bounding box of the object (2D location message), with information about object category and recognition scores in the id and extra fields of the 2D location messages.
  • To get a feel for these messages, let's fire up JeVois Inventor and launch DarknetYOLO. In the Console tab of the Inventor, we turn on serial messages to both USB and the 4-pin hardware serial port, and we select message detail level Normal so that we get some information about each bounding box and its top-scoring object category.
  • In the example above, we are detecting a dog, a bicycle, and a car on each video frame. Hence, JeVois sends 3 messages of type N2 on each frame. Note that here we do not know which message came from which frame. If you need to know, review Standardized serial messages formatting and look for parameter serstamp, which can be set to prepend a frame number to each serial message. We will not use it here.
  • The messages are as follows (from Standardized serial messages formatting):
N2 category:score left top width height
  • Note that the coordinates are in the JeVois standardized coordinate system described in Helper functions to convert coordinates from camera resolution to standardized, where:

    • center of the camera's field of view is at x=0, y=0
    • left edge of the camera image is always at x=-1000
    • right edge of the camera image is always at x=1000
    • top edge of the camera image is usually at y=-750 (unless the camera image aspect ratio is not 4:3)
    • bottom edge of the camera image is usually at y=750

    This is so that detections reported by JeVois are independent of the camera resolution at which JeVois is grabbing frames (e.g., 320x240 or 640x480).

  • Note that by default, the floating-point precision of the standardized messages is zero digits after the decimal point, i.e., we get integer scores and coordinates. If you change that using the parameter serprec described in Standardized serial messages formatting, you can get more precise floating-point values (e.g., try setpar serprec 3 in the Console of JeVois Inventor). The code below parses all values as floating-point numbers, which also works when they happen to be integers.
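To make the coordinate convention concrete, here is a small C sketch that maps a pixel position in a camera frame to standardized coordinates. It assumes the same scale factor (2000 / image width) on both axes, which reproduces the ±1000 and ±750 ranges listed above for 4:3 images. The names `StdPoint` and `pix_to_std()` are hypothetical, and the actual JeVois helper functions may treat non-4:3 images differently.

```c
/* Hypothetical point type in standardized coordinates. */
typedef struct { float x, y; } StdPoint;

/* Sketch: convert pixel (px, py) in a cam_w x cam_h frame to standardized
   coordinates. Assumes one scale factor (2000 / cam_w) for both axes, so a
   4:3 image spans x in [-1000, 1000] and y in [-750, 750]. */
StdPoint pix_to_std(float px, float py, int cam_w, int cam_h)
{
  const float scale = 2000.0f / (float)cam_w; /* standardized units per pixel */
  StdPoint p;
  p.x = (px - 0.5f * (float)cam_w) * scale;   /* image center maps to x = 0 */
  p.y = (py - 0.5f * (float)cam_h) * scale;   /* image center maps to y = 0 */
  return p;
}
```

For example, the top-left corner (0, 0) of a 320x240 frame maps to (-1000, -750), and so does the top-left corner of a 640x480 frame, which is why the reported detections do not depend on the capture resolution.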

Writing the code

We will use a state machine approach as in JeVois + Arduino: blink for X, except that it now has a few more states, because each message contains 6 tokens to decode: N2, category:score, left, top, width, and height.
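If you want to try the decoding logic on a desktop computer before uploading anything to the Arduino, the same six-token state machine can be written as a plain C function. This is a sketch, not part of JeVois: the `Detection` struct and `parse_n2()` are hypothetical names, and `strrchr()` stands in for the hand-written backward search for ':' used in the Arduino code (both locate the last colon in the token).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one decoded N2 detection. */
typedef struct {
  char category[64];
  float score, left, top, width, height;
} Detection;

/* Parse "N2 category:score left top width height" into det.
   Uses a 6-state machine like the Arduino code; modifies line in place
   (strtok inserts terminators). Returns 1 if all 6 tokens were decoded. */
int parse_n2(char *line, Detection *det)
{
  int state = 0;
  char *tok = strtok(line, " \r\n");

  while (tok && state < 6) {
    switch (state) {
    case 0: /* first token must be N2 */
      if (strcmp(tok, "N2") != 0) return 0;
      state = 1;
      break;
    case 1: { /* category:score, split at the last ':' */
      char *colon = strrchr(tok, ':');
      det->score = 0.0f;
      if (colon) {
        *colon = '\0';
        det->score = (float)atof(colon + 1);
      }
      strncpy(det->category, tok, sizeof(det->category) - 1);
      det->category[sizeof(det->category) - 1] = '\0';
      state = 2;
      break;
    }
    case 2: det->left   = (float)atof(tok); state = 3; break;
    case 3: det->top    = (float)atof(tok); state = 4; break;
    case 4: det->width  = (float)atof(tok); state = 5; break;
    case 5: det->height = (float)atof(tok); state = 6; break;
    }
    tok = strtok(NULL, " \r\n");
  }
  return state == 6;
}
```

Unlike the Arduino version, this sketch does not filter on the category; the caller can compare `det.category` with the desired name afterwards. A message with fewer than 6 tokens, or one that does not start with N2, returns 0.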

For the sake of developing a non-trivial example, let's say we want to turn on the LED of the Arduino when we detect a dog at least 200 units wide (i.e., the bounding box around the dog should be at least as wide as 1/10th of the field of view, and the full field of view is 2000 standardized units wide as explained above).
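As a quick check of that threshold: 200 / 2000 = 1/10 of the image width, i.e., 32 pixels at 320x240 or 64 pixels at 640x480. The hypothetical helper below performs this conversion. It relies only on the fact that the full image width always spans 2000 standardized units.

```c
/* Hypothetical helper: convert a width in standardized units to camera
   pixels, given the camera image width in pixels. The full image width
   is 2000 standardized units regardless of resolution. */
float std_width_to_pix(float w_std, int cam_w)
{
  return w_std * (float)cam_w / 2000.0f;
}
```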

We extend the state machine code developed in JeVois + Arduino: blink for X as follows:

  1 // JeVois + Arduino blink for X from YOLO
  2
  3 // Pin for LED, will turn on as we detect the desired object:
  4 #define LEDPIN 17
  5
  6 // Serial port to use: on chips with USB (e.g., 32u4), that usually is Serial1.
  7 // On chips without USB, use Serial:
  8 #define SERIAL Serial1
  9
 10 // Buffer for received serial port bytes:
 11 #define INLEN 256
 12 char instr[INLEN + 1];
 13
 14 // Our desired object: should be one of the category names reported by the detection network
 15 #define CATEGORY "dog"
 16
 17 // Our desired minimum object width in standardized coordinates:
 18 #define MIN_WIDTH 200.0F
 19
 20 void setup()
 21 {
 22   SERIAL.begin(115200);
 23   SERIAL.setTimeout(500);
 24
 25   pinMode(LEDPIN, OUTPUT);
 26   digitalWrite(LEDPIN, HIGH); // start with LED off (it has inverted logic)
 27 }
 28
 29 void loop()
 30 {
 31   size_t len = SERIAL.readBytesUntil('\n', instr, INLEN); // not byte: up to INLEN (256) bytes may be read
 32   instr[len] = 0;
 33
 34   char * tok = strtok(instr, " \r\n");
 35   int state = 0, i; float score = 0, left = 0, top = 0, width = 0, height = 0;
 36
 37   while (tok)
 38   {
 39     // State machine:
 40     // 0: start parsing; if we get N2, move to state 1, otherwise state 1000
 41     // 1: decode category name; if it is the one we want, move to state 2, otherwise state 1000
 42     // 2: decode left and move to state 3
 43     // 3: decode top and move to state 4
 44     // 4: decode width and move to state 5
 45     // 5: decode height and move to state 6
 46     // 6: we got a full message, stay in this state until we run out of tokens
 47     // 1000: we stay in this state until we run out of tokens
 48     switch (state)
 49     {
 50     // First token should be: N2
 51     case 0:
 52       if (strcmp(tok, "N2") == 0) state = 1; else state = 1000;
 53       // We are done with this token. Break from the switch() statement
 54       break;
 55
 56     // Second token should be: category:score
 57     case 1:
 58       // Find the ':' between category and score:
 59       i = strlen(tok) - 1;
 60       while (i >= 0 && tok[i] != ':') --i;
 61
 62       // If i is >= 0, we found a ':'; terminate the tok string at that ':':
 63       if (i >= 0)
 64       {
 65         tok[i] = '\0';
 66         score = atof(&tok[i+1]);
 67       }
 68
 69       // Is the category name what we want?
 70       if (strcmp(tok, CATEGORY) == 0) state = 2; else state = 1000;
 71
 72       // We are done with this token. Break from the switch() statement
 73       break;
 74
 75     // Third token: left
 76     case 2:
 77       left = atof(tok);
 78       state = 3;
 79       break;
 80
 81     // Fourth token: top
 82     case 3:
 83       top = atof(tok);
 84       state = 4;
 85       break;
 86
 87     // Fifth token: width
 88     case 4:
 89       width = atof(tok);
 90       state = 5;
 91       break;
 92
 93     // Sixth token: height
 94     case 5:
 95       height = atof(tok);
 96       state = 6;
 97       // We got a whole message!
 98       break;
 99
100     // In any other state: do nothing
101     default:
102       break;
103     }
104
105     // Move to the next token:
106     tok = strtok(0, " \r\n");
107   }
108
109   // If we are in state 6, we successfully parsed a whole message and the category is a match.
110   // We just need to test the width and activate the LED accordingly:
111   if (state == 6 && width >= MIN_WIDTH)
112     digitalWrite(LEDPIN, LOW); // turn LED on (it has inverted logic)
113   else
114     digitalWrite(LEDPIN, HIGH); // turn LED off
115 }

A few notes:

  • lines 1-36: The preliminaries are as in JeVois + Arduino: blink for X, except that we change the category name to dog (line 15) and we define MIN_WIDTH to be the minimum desired object width (line 18).
  • lines 39-47: We decide on the various states for our state machine.
  • line 52: We now look for N2 instead of the DO used in JeVois + Arduino: blink for X.
  • lines 76-98: We decode left, top, width and height one at a time. Note that those could be floating point depending on serprec, hence we use atof() to decode them.
  • line 111: We just check that we are in state 6 (complete decoding went through, and category matched) and that the width is large enough; if so, turn on the LED, otherwise turn it off.
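One caveat about atof(): it returns 0.0 for a token that is not a number, which cannot be distinguished from a genuine coordinate of 0. If you want to reject malformed messages, a stricter variant can use the standard C strtod() function and check that the whole token was consumed. The function name below is hypothetical; avr-libc, used by many Arduino boards, also provides strtod().

```c
#include <stdlib.h>

/* Hypothetical strict parser: stores the value and returns 1 only if the
   entire token is a valid number; returns 0 for empty or malformed tokens. */
int parse_float_strict(const char *tok, float *out)
{
  char *end;
  double v = strtod(tok, &end);
  if (end == tok || *end != '\0') return 0; /* no digits, or trailing junk */
  *out = (float)v;
  return 1;
}
```

In the Arduino code, case 2 could then set state = 1000 instead of state = 3 when parse_float_strict() fails, and likewise for the other coordinates.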

Compile and upload the code to your Arduino and there you go!

Woohoo, the LED turns on when JeVois detects a dog that is big enough!

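Note that in scenes where JeVois also detects other things, the code as written turns the LED off as soon as it receives a message about any other object. Because JeVois sends one message per detection, the LED may therefore only blink briefly when something else (e.g., the bicycle in the above scene) is reported right after the dog.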
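One possible workaround, sketched here under our own assumptions rather than taken from JeVois, is to keep the LED on for a fixed time after the last matching detection instead of switching it on every message. The `LedHold` struct and both functions below are hypothetical. They take the current time in milliseconds as an argument so that they can be tested off-board; on the Arduino you would pass millis().

```c
/* Hypothetical state for holding the LED on after a match. */
typedef struct {
  unsigned long last_match_ms; /* time of the last matching detection */
  int has_match;               /* 0 until the first matching detection */
} LedHold;

/* Record the outcome of one parsed message (match = dog wide enough). */
void led_hold_update(LedHold *h, unsigned long now_ms, int match)
{
  if (match) {
    h->last_match_ms = now_ms;
    h->has_match = 1;
  }
}

/* LED should be on if a match occurred less than hold_ms ago. Unsigned
   subtraction yields the correct elapsed time even after the millisecond
   counter wraps around. */
int led_hold_is_on(const LedHold *h, unsigned long now_ms, unsigned long hold_ms)
{
  return h->has_match && (now_ms - h->last_match_ms) < hold_ms;
}
```

In loop(), one would call led_hold_update(&hold, millis(), state == 6 && width >= MIN_WIDTH) after each message and then drive the LED from led_hold_is_on(&hold, millis(), 1000). The 1000 ms hold time is an arbitrary example value.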

Going further

Check out these other tutorials. They use similar state machine decoding:
