JeVois  1.6
JeVois Smart Embedded Machine Vision Toolkit
Standardized serial messages formatting

JeVois v1.3

JeVois provides a standardized set of serial output messages, with the goal of harmonizing serial messages across many machine vision modules, so that different vision modules can control an Arduino or similar embedded controller in the same way. Users can then experiment with different machine vision modules without needing to rewrite and reflash their Arduino code. For example, a motorized pan/tilt head can be directly controlled by a color-based object tracker, a saliency-based visual attention module, an object recognition module, or an ArUco augmented reality marker detection module, all using the same Arduino code to receive the control messages from JeVois and to actuate the pan/tilt motors.

The standardized serial messages focus (at least for now) on sending location and identity information (in 1D, 2D, or 3D) about things that have been detected by JeVois.

Modules that wish to send standardized messages as defined below should derive from jevois::StdModule instead of jevois::Module. The class jevois::StdModule adds parameters that allow end users to control the formatting of the serial messages, and additional member functions that assist programmers in issuing standardized messages.

The parameters serstyle and serprec of jevois::StdModule define the following standardized serial message styles.

Coordinates conventions and precision

All 1D and 2D coordinates use the JeVois standardized coordinates (between -1000 and 1000), as defined in Helper functions to convert coordinates from camera resolution to standardized.

All 3D coordinates are assumed to arise from a calibrated camera and are expressed in millimeters of real-world space.

Parameter serprec defines the precision with which these coordinates will be sent out over serial, as the number of digits after the decimal point. For example:

  • When serprec is 0, coordinates will be sent as integer numbers, e.g., 123 (no trailing decimal point).
  • When serprec is 1, coordinates will be sent with 1 decimal position, e.g., 123.4
  • When serprec is 2, coordinates will be sent with 2 decimal positions, e.g., 123.45
  • And so on.

Given the already large -1000 to 1000 range provided by the Helper functions to convert coordinates from camera resolution to standardized for 1D and 2D coordinates, and the quite high (millimeter) accuracy of 3D coordinates, the expectation is that serprec will be non-zero only in exceptional situations. Most Arduino control software can reasonably be expected to support only serprec=0.

One-dimensional (1D) location messages

1D location messages are used to communicate the location of something on a 1D axis (usually, the horizontal axis). For example, the x coordinate of the vanishing point in the RoadNavigation module.

Inputs from machine vision module:

  • x the standardized 1D position of the center of the reported object.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • size the standardized 1D size of the reported object (thus, object extends from x-size/2 to x+size/2). Note that size will be output with the same precision as the coordinate.
  • extra any additional text string about the reported object.
  • xmin and xmax the standardized 1D positions of the two edges of the reported object. For 1D objects, specifying xmin and xmax is equivalent to specifying x and size (but that is not always true in 2D, 3D, etc).

Serial messages:

serstyle  message
Terse     T1 x
Normal    N1 id x size
Detail    D1 id xmin xmax extra
Fine      N/A - a D1 message will be issued instead.

Two-dimensional (2D) location messages

2D location messages are used to communicate the location of something in 2D space (usually, the plane of the camera image). For example, the x,y standardized coordinates of an object detected by the ObjectDetect module.

Inputs from machine vision module:

  • x,y the standardized 2D position of the center of the reported object.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • w,h the standardized width and height of the reported object (thus, object extends from x-w/2 to x+w/2 horizontally, and from y-h/2 to y+h/2 vertically). Note that size data will be output with the same precision as the coordinate data.
  • extra any additional text string about the reported object.
  • x1,y1 ... x4,y4 the standardized x,y coordinates of the 4 corners of a bounding rectangle around the reported object.
  • x1,y1 ... xn,yn the standardized x,y coordinates of n vertices of a bounding polygon around the reported object. Note that n can vary from object to object.

Serial messages:

serstyle  message
Terse     T2 x y
Normal    N2 id x y w h
Detail    D2 id x1 y1 x2 y2 x3 y3 x4 y4 extra
Fine      F2 id n x1 y1 ... xn yn extra
Note
The tightness of the bounding box for Normal and Detail messages depends on what information a machine vision module produces. Typically, machine vision modules invoke the serial message functions of jevois::StdModule with a list of 2D vertices defining a shape (polygonal contour). An upright bounding rectangle is then fit to those vertices to generate the Normal message, and a rotated rectangle is fit around them to generate the Detail message.

Three-dimensional (3D) location messages

Warning
This is not fully tested yet!

3D location messages are used to communicate the location of something in the real-world's 3D space, and usually are available from modules where the camera has been calibrated and an object of known real-world size is observed by the camera. For example, the ArUco module can recover the full 3D location and pose of markers of known real-world size.

We use the right-hand rule for 3D coordinates as it is the dominant convention in robotics. We also follow the conventions of REP-0103 of the Robot Operating System (ROS) for camera frames (called suffix frames) to ease inter-operability, except that we report distances in millimeters as opposed to meters:

  • X axis points right
  • Y axis points down
  • Z axis points forward

Inputs from machine vision module:

  • x,y,z the real-world 3D position of the center of the reported object in millimeters from the camera's optical center.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • w,h,d the width (X axis), height (Y axis), and depth (Z axis) of the reported object in mm (thus, object extends from x-w/2 to x+w/2 horizontally, from y-h/2 to y+h/2 vertically, and from z-d/2 to z+d/2 along the optical axis of the camera). Note that size data will be output with the same precision as the coordinate data.
  • extra any additional text string about the reported object.
  • q1,q2,q3,q4 a quaternion that relates the object's frame to the camera's frame, if appropriate.
  • x1,y1,z1 ... xn,yn,zn the 3D coordinates of n vertices defining a 3D polyhedron around the object.

Serial messages:

serstyle  message
Terse     T3 x y z
Normal    N3 id x y z w h d
Detail    D3 id x y z w h d q1 q2 q3 q4 extra
Fine      F3 id n x1 y1 z1 ... xn yn zn extra

New machine vision modules can use the convenience functions provided in the jevois::StdModule class to send standardized serial messages.

Note
If you plan to use the quaternion data (Detail message style), you should probably set the serprec parameter to a non-zero value to get enough accuracy in the quaternion values.

Recommendations for embedded controllers and robots

  • A robot that can only orient towards something along one axis (e.g., steering of a robot car) would usually expect T1 messages.
  • A robot that can only orient towards something in the 2D image plane (e.g., a motorized pan/tilt head) would usually expect T2 messages.
  • A robot that can move an end-effector towards an object in 3D space (e.g., robot arm with no gripper) would usually expect T3 messages.
  • A robot that will take different actions depending on what the object is would typically expect N1, N2, or N3 messages, decoding the id field to decide what to do.
  • A robot that also needs to know the size of an object in a controlled setting (e.g., inspection robot that may eject defective items that are coming on a conveyor belt, with the camera placed at a fixed distance above the belt and looking down at it) would typically expect D1, D2, or D3 messages, or, if shape is important, F2 or F3 messages.
  • A robot that can move an end-effector towards an object in 3D space and grasp it (e.g., robot arm with a gripper) would usually expect N3, D3, or F3 messages.