JeVois  1.6
JeVois Smart Embedded Machine Vision Toolkit
Standardized serial messages formatting

JeVois v1.3

JeVois provides a standardized set of serial output messages, with the goal of harmonizing serial messages across many machine vision modules, so that different vision modules can control an Arduino or similar embedded controller in the same way. Users can then experiment with different machine vision modules without needing to rewrite and reflash their Arduino code. For example, a motorized pan/tilt head can be directly controlled by a color-based object tracker, a saliency-based visual attention module, an object recognition module, or an ArUco augmented reality marker detection module, all using the same Arduino code to receive the control messages from JeVois and to actuate the pan/tilt motors.

The standardized serial messages focus (at least for now) on sending location and identity information (in 1D, 2D, or 3D) about things that have been detected by JeVois.

Modules that wish to send standardized messages as defined below should derive from jevois::StdModule instead of jevois::Module. The class jevois::StdModule adds parameters that allow end users to control the formatting of the serial messages, and additional member functions that assist programmers in issuing standardized messages.

The parameters serstyle and serprec of jevois::StdModule define the following standardized serial message styles.

Coordinates conventions and precision

All 1D and 2D coordinates use the JeVois standardized coordinates (between -1000 and 1000), as defined in Helper functions to convert coordinates from camera resolution to standardized.

All 3D coordinates are assumed to arise from a calibrated camera and are expressed in millimeters of real-world space.

Parameter serprec defines the precision with which these coordinates will be sent out over serial, as the number of digits after the decimal point. For example:

  • When serprec is 0, coordinates will be sent as integer numbers, e.g., 123 (no trailing decimal point).
  • When serprec is 1, coordinates will be sent with 1 decimal position, e.g., 123.4
  • When serprec is 2, coordinates will be sent with 2 decimal positions, e.g., 123.45
  • And so on.

Given the already large -1000 to 1000 range provided by the Helper functions to convert coordinates from camera resolution to standardized for 1D and 2D coordinates, and the quite high (millimeter) accuracy of 3D coordinates, the expectation is that serprec will be non-zero only in exceptional situations. Most Arduino control software can reasonably be expected to support only serprec=0.

One-dimensional (1D) location messages

1D location messages are used to communicate the location of something on a 1D axis (usually, the horizontal axis). For example, the x coordinate of the vanishing point in the RoadNavigation module.

Inputs from machine vision module:

  • x the standardized 1D position of the center of the reported object.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • size the standardized 1D size of the reported object (thus, object extends from x-size/2 to x+size/2). Note that size will be output with the same precision as the coordinate.
  • extra any additional text string about the reported object.
  • xmin and xmax the standardized 1D positions of the two edges of the reported object. For 1D objects, specifying xmin and xmax is equivalent to specifying x and size (but that is not always true in 2D, 3D, etc).

Serial messages:

serstyle  message
Terse     T1 x
Normal    N1 id x size
Detail    D1 id xmin xmax extra
Fine      N/A - a D1 message will be issued instead.

Two-dimensional (2D) location messages

2D location messages are used to communicate the location of something in 2D space (usually, the plane of the camera image). For example, the x,y standardized coordinates of an object detected by the ObjectDetect module.

Inputs from machine vision module:

  • x,y the standardized 2D position of the center of the reported object.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • w,h the standardized width and height of the reported object (thus, object extends from x-w/2 to x+w/2 horizontally, and from y-h/2 to y+h/2 vertically). Note that size data will be output with the same precision as the coordinate data.
  • extra any additional text string about the reported object.
  • x1,y1 ... x4,y4 the standardized x,y coordinates of the 4 corners of a bounding rectangle around the reported object.
  • x1,y1 ... xn,yn the standardized x,y coordinates of n vertices of a bounding polygon around the reported object. Note that n can vary from object to object.

Serial messages:

serstyle  message
Terse     T2 x y
Normal    N2 id x y w h
Detail    D2 id x1 y1 x2 y2 x3 y3 x4 y4 extra
Fine      F2 id n x1 y1 ... xn yn extra
Note
The tightness of the bounding box for Normal and Detail messages depends on what information a machine vision module produces. Typically, machine vision modules invoke the serial message functions of jevois::StdModule with a list of 2D vertices defining a shape (polygonal contour). An upright bounding rectangle is then fit to those vertices to generate the Normal message, and a rotated rectangle is fit around them to generate the Detail message.

Three-dimensional (3D) location messages

Warning
This is not fully tested yet!

3D location messages are used to communicate the location of something in the real-world's 3D space, and usually are available from modules where the camera has been calibrated and an object of known real-world size is observed by the camera. For example, the ArUco module can recover the full 3D location and pose of markers of known real-world size.

We use the right-hand rule for 3D coordinates as it is the dominant convention in robotics. We also follow the conventions of REP-0103 of the Robot Operating System (ROS) for camera frames (called suffix frames) to ease inter-operability, except that we report distances in millimeters as opposed to meters:

  • X axis points right
  • Y axis points down
  • Z axis points forward

Inputs from machine vision module:

  • x,y,z the real-world 3D position of the center of the reported object in millimeters from the camera's optical center.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • w,h,d the width (X axis), height (Y axis), and depth (Z axis) of the reported object in mm (thus, object extends from x-w/2 to x+w/2 horizontally, from y-h/2 to y+h/2 vertically, and from z-d/2 to z+d/2 along the optical axis of the camera). Note that size data will be output with the same precision as the coordinate data.
  • extra any additional text string about the reported object.
  • q1,q2,q3,q4 a quaternion that relates the object's frame to the camera's frame, if appropriate.
  • x1,y1,z1 ... xn,yn,zn the 3D coordinates of n vertices defining a 3D polyhedron around the object.

Serial messages:

serstyle  message
Terse     T3 x y z
Normal    N3 id x y z w h d
Detail    D3 id x y z w h d q1 q2 q3 q4 extra
Fine      F3 id n x1 y1 z1 ... xn yn zn extra

New machine vision modules can use the convenience functions provided in the jevois::StdModule class to send standardized serial messages.

Note
If you plan to use the quaternion data (Detail message style), you should probably set the serprec parameter to a non-zero value to get enough accuracy in the quaternion values.

Recommendations for embedded controllers and robots

  • A robot that can only orient towards something along one axis (e.g., steering of a robot car) would usually expect T1 messages.
  • A robot that can only orient towards something in the 2D image plane (e.g., a motorized pan/tilt head) would usually expect T2 messages.
  • A robot that can move an end-effector towards an object in 3D space (e.g., robot arm with no gripper) would usually expect T3 messages.
  • A robot that will take different actions depending on what the object is would typically expect N1, N2, or N3 messages, decoding the id field to decide what to do.
  • A robot that also needs to know the size of an object in a controlled setting (e.g., inspection robot that may eject defective items that are coming on a conveyor belt, with the camera placed at a fixed distance above the belt and looking down at it) would typically expect D1, D2, or D3 messages, or, if shape is important, F2 or F3 messages.
  • A robot that can move an end-effector towards an object in 3D space and grasp it (e.g., robot arm with a gripper) would usually expect N3, D3, or F3 messages.