Because of un-manageable amounts of spam despite our use of CAPTCHAs, email authorization, and other tools, we have discontinued this forum (see the 700k+ registered users with validated email addresses at right?). Please email us any questions or post bug reports and feature requests on GitHub at https://github.com/jevois -- The content below remains available for future reference.
Welcome to JeVois Tech Zone, where you can ask questions and receive answers from other members of the community.

Yolo Coordinate output to serial. What data is output?

When I use Yolo, I am getting serial output like:

D2 person -888 -2067 -124 -2067 -124 5 -124 -2067 27.1

The docs and code comments say this is

D2 type   x     y     w    h  ?   ?  ?  ?   confidence

But how can the coordinates be negative and larger than the height and width of the image?

I found in YOLO.H the code that appears to be outputting this, but I don't understand what it's doing or why.

case jevois::module::SerStyle::Detail:
    oss << "D2 ";
    if (id.empty()) oss << "unknown "; else oss << jevois::replaceWhitespace(id) << ' ';
    oss << x - 0.5F * w << ' ' << y - 0.5F * h << ' ';
    oss << x + 0.5F * w << ' ' << y - 0.5F * h << ' ';
    oss << x + 0.5F * w << ' ' << y + 0.5F * h << ' ';
    oss << x + 0.5F * w << ' ' << y - 0.5F * h;
    if (extra.empty() == false) oss << ' ' << extra;
    break;

I suppose I could try to back out what x, y, w, and h are from this output, but I don't understand the logic. I'm not a C++ programmer, so I'm guessing a bit about what's going on. It looks like it's calculating the x,y coordinates of the corners of the bounding box. If that's the case, why are these such large negative numbers?

Update: I did back out x, y, w and h from this code and I still can't figure it out. The numbers are large and negative, and I've tried to massage them into x,y coordinates that I can use. Still no luck. Are these values being transformed in some other code that I'm not seeing?

asked Jun 12, 2018 in Programmer Questions by PeterQuinn (1,020 points)
edited Jun 22, 2018 by PeterQuinn

1 Answer

It depends on what level of detail you want to use for the messages. Please see

http://jevois.org/doc/UserSerialStyle.html

YOLO gives us the coordinates of the 4 corners of each rectangular box around each detected object. Then, depending on the serstyle parameter, these are converted to a serial message.

From the YOLO doc at http://jevois.org/moddoc/DarknetYOLO/modinfo.html you will see:

Serial messages 

  • On every frame where detection results were obtained, this module sends a message 
      DKY framenum
    where framenum is the frame number (starts at 0).
  • In addition, when detections are found which are above threshold, one message will be sent for each detected object (i.e., for each box that gets drawn when USB outputs are used), using a standardized 2D message:
    • Serial message type: 2D
    • id: the category name of the recognized object
    • x, y, or vertices: standardized 2D coordinates of object center or corners
    • w, h: standardized object size
    • extra: recognition score (in percent confidence)

So it sends two types of messages, a frame marker, and then a standardized 2D message. That doc then tells you that the id field contains the recognized object's name (category name), and the extra field contains the recognition score.

So, going back to http://jevois.org/doc/UserSerialStyle.html, let's take a look at 2D messages and we find:

Two-dimensional (2D) location messages 

2D location messages are used to communicate the location of something in 2D space (usually, the plane of the camera image). For example, the x,y standardized coordinates of an object detected by the ObjectDetect module.

Inputs from machine vision module:

  • x,y the standardized 2D position of the center of the reported object.
  • id a text string describing what the reported object is (e.g., which ArUco marker ID it has). To ease parsing by the receiving Arduino, ID should have no white space. Any white space will be replaced by underscores.
  • w,h the standardized width and height of the reported object (thus, object extends from x-w/2 to x+w/2 horizontally, and from y-h/2 to y+h/2 vertically). Note that size data will be output with the same precision as the coordinate data.
  • extra any additional text string about the reported object.
  • x1,y1 ... x4,y4 the standardized x,y coordinates of the 4 corners of a bounding rectangle around the reported object.
  • x1,y1 ... xn,yn the standardized x,y coordinates of n vertices of a bounding polygon around the reported object. Note that n can vary from object to object.
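
To make the relation between the center/size fields and the corner fields concrete, here is a minimal C++ sketch. Point2D and boxCorners are hypothetical names used only for illustration; they are not part of the JeVois API. The sketch only applies the extents stated above (x-w/2 to x+w/2 horizontally, y-h/2 to y+h/2 vertically) and lists the corners going around the box.

```cpp
#include <array>

// A 2D point in standardized coordinates (hypothetical type, for illustration).
struct Point2D { float x; float y; };

// Hypothetical helper: the four corners of an axis-aligned box centered at
// (x, y) with size (w, h), listed going around the box starting from the
// corner with the smallest x and y.
std::array<Point2D, 4> boxCorners(float x, float y, float w, float h)
{
  float const hw = 0.5F * w; // half width
  float const hh = 0.5F * h; // half height
  return {{ {x - hw, y - hh}, {x + hw, y - hh}, {x + hw, y + hh}, {x - hw, y + hh} }};
}
```

Corners 1 and 3 (indices 0 and 2) are diagonally opposite, so either diagonal pair is enough to recover the center and the size.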

Serial messages:

serstyle   message
Terse      T2 x y
Normal     N2 id x y w h
Detail     D2 id x1 y1 x2 y2 x3 y3 x4 y4 extra
Fine       F2 id n x1 y1 ... xn yn extra

So if you setpar serstyle Detail you will get

D2 id x1 y1 x2 y2 x3 y3 x4 y4 extra
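
On the receiving side, a Detail message can be split back into its fields. The following is a rough C++ sketch, assuming the message arrives as one whitespace-separated line in the D2 layout shown above. Detection2D and parseD2 are made-up names for illustration, not JeVois functions. It also assumes corners 1 and 3 are diagonally opposite, which is enough to recover the center and size of the box. All values stay in standardized coordinates, not pixels.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for one parsed Detail-style (D2) message.
struct Detection2D
{
  std::string id;    // category name (no white space)
  float x = 0.0F;    // box center x, standardized coordinates
  float y = 0.0F;    // box center y, standardized coordinates
  float w = 0.0F;    // box width, standardized units
  float h = 0.0F;    // box height, standardized units
  std::string extra; // remaining text, e.g. the recognition score
};

// Parse "D2 id x1 y1 x2 y2 x3 y3 x4 y4 extra". Returns false if the line is
// not a well-formed D2 message; in that case d may be partially filled.
bool parseD2(std::string const & line, Detection2D & d)
{
  std::istringstream iss(line);
  std::string tag;
  float cx[4], cy[4];

  if (!(iss >> tag) || tag != "D2") return false;
  if (!(iss >> d.id)) return false;
  for (int i = 0; i < 4; ++i)
    if (!(iss >> cx[i] >> cy[i])) return false;

  // Corners 1 and 3 are assumed diagonally opposite: they give center and size.
  d.x = 0.5F * (cx[0] + cx[2]);
  d.y = 0.5F * (cy[0] + cy[2]);
  d.w = cx[2] - cx[0];
  d.h = cy[2] - cy[0];

  // Whatever follows the eight numbers is the optional extra field.
  d.extra.clear();
  std::getline(iss >> std::ws, d.extra);
  return true;
}
```

A caller would read one line from the serial port, pass it to parseD2, and ignore lines for which it returns false (for example the DKY frame markers).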

JeVois Inventor, released yesterday, should make this much easier to understand. Please see http://jevois.org/doc/JeVoisInventor.html

answered Jun 22, 2018 by JeVois (46,580 points)
Thanks for the reply. What I really need is serstyle Normal, so that was helpful. However, I'm still seeing coordinate data that makes no sense. I get negative coordinates and w/h values that are too large, such as:
N2 person -388 -550 1316 1981
Is there any transformation that's happening to the data that comes back from Darknet/Yolo?

I'm trying out Inventor and I just sent you an email about it.
Yes, you need one more (maybe last?) piece. Going back to http://jevois.org/doc/UserSerialStyle.html, let's have a look at the first section:

--------
Coordinates conventions and precision

All 1D and 2D coordinates use the JeVois standardized coordinates (between -1000 and 1000), as defined in Helper functions to convert coordinates from camera resolution to standardized.

All 3D coordinates are assumed to arise from a calibrated camera and are expressed in millimeters of real-world space.
-----

There is a link in that text, which has more info about the standardized 2D coordinates:

http://jevois.org/doc/group__coordhelpers.html
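
To get from standardized coordinates back to pixels on the receiving side, something like the following C++ sketch could be used. It is not the JeVois implementation: stdToPixel and stdToPixelSize are hypothetical helpers that assume a plain linear mapping with (0, 0) at the image center and the left and right edges at -1000 and +1000. The vertical half-range is left as a parameter; the default of 750 is an assumption that should be checked against the coordinate helper documentation linked above.

```cpp
#include <cassert>

// A point in pixel coordinates (hypothetical type, for illustration).
struct PixelPoint { float x; float y; };

// Hypothetical helper: linear map from standardized coordinates to pixels.
// Assumes -xHalf/+xHalf are the left/right image edges and -yHalf/+yHalf the
// top/bottom edges. yHalf = 750 is an assumption, not taken from this thread.
PixelPoint stdToPixel(float sx, float sy, unsigned int width, unsigned int height,
                      float xHalf = 1000.0F, float yHalf = 750.0F)
{
  assert(xHalf > 0.0F && yHalf > 0.0F);
  return { (sx + xHalf) * width / (2.0F * xHalf),
           (sy + yHalf) * height / (2.0F * yHalf) };
}

// Hypothetical helper: convert a standardized size (w or h) to pixels along
// one axis, given the image extent in pixels and the half-range of that axis.
float stdToPixelSize(float s, unsigned int extent, float half)
{
  assert(half > 0.0F);
  return s * extent / (2.0F * half);
}
```

For example, with a 640x480 image and these assumed ranges, standardized (0, 0) maps to pixel (320, 240). Values outside the range simply map to pixel positions outside the image, which can happen when a detected box extends past the frame.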
Damn.  It was right there in the documentation: http://jevois.org/doc/group__coordhelpers.html
I thought I was looking at pixels. My only suggestion is to add a link to the coordinate doc on the YOLO page (and maybe the other module pages): http://jevois.org/moddoc/DarknetYOLO/modinfo.html
Thanks for all your support.