JeVois
1.23
JeVois Smart Embedded Machine Vision Toolkit
#include <jevois/DNN/CLIP.H>
Interface to a CLIP model used to compute text and image embeddings.
The CLIP model runs on the CPU using clip.cpp and ggml. It is used to compute text or image embeddings for open-world object detection models such as YOLO-JeVois. Each embedding is returned as a float cv::Mat of size 1x512 (the exact size depends on the CLIP model version), which makes it easy to concatenate several embeddings into the 1xCx512 tensor that YOLO-JeVois expects as input for C object detection classes.
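Concatenating C embeddings into a 1xCx512 tensor amounts to stacking the 1x512 rows one after another in memory. The following stdlib-only C++ sketch illustrates that layout. It uses std::vector<float> as a stand-in for the 1x512 cv::Mat returned by the CLIP class; packEmbeddings, tensorAt and kEmbDim are hypothetical names, not part of the JeVois API.

```cpp
// Hypothetical sketch: pack C per-class embeddings (kEmbDim floats each) into one
// contiguous 1 x C x kEmbDim buffer. std::vector<float> stands in for a 1x512 cv::Mat.
#include <cassert>
#include <cstddef>
#include <vector>

// Typical CLIP embedding size (assumed); the real value depends on the CLIP model version.
constexpr std::size_t kEmbDim = 512;

// Concatenate embeddings row by row into a row-major 1 x C x kEmbDim tensor.
std::vector<float> packEmbeddings(std::vector<std::vector<float>> const & embs)
{
  std::vector<float> tensor;
  tensor.reserve(embs.size() * kEmbDim);
  for (std::vector<float> const & e : embs)
  {
    assert(e.size() == kEmbDim && "every embedding must have kEmbDim floats");
    tensor.insert(tensor.end(), e.begin(), e.end());
  }
  return tensor;
}

// Element [0][c][k] of the 1 x C x kEmbDim tensor sits at flat offset c * kEmbDim + k.
float tensorAt(std::vector<float> const & tensor, std::size_t c, std::size_t k)
{
  return tensor[c * kEmbDim + k];
}
```

With OpenCV itself, stacking C matrices of size 1x512 can be done with cv::vconcat, which produces a Cx512 matrix with the same row-major data layout as this sketch.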
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | CLIP (std::string const &modelpath) | Construct and load a model from disk. |
| virtual | ~CLIP () | Virtual destructor for safe inheritance. |
| void | freeze (bool doit) | Freeze/unfreeze parameters that users should not change while running. |
| cv::Mat | textEmbedding (std::string const &txt) | Get the embedding for some text, typically as a 1x512 float matrix (the size depends on the CLIP model version). |
| int | textEmbeddingSize () const | Get the text embedding size (useful to know before computing an embedding); returns 0 if there is no text encoder. |
| cv::Mat | imageEmbedding (cv::Mat const &img) | Get the embedding for some packed RGB uint8 image, typically as a 1x512 float matrix. |
| int | imageEmbeddingSize () const | Get the image embedding size (useful to know before computing an embedding); returns 0 if there is no image encoder. |
| float | similarity (cv::Mat const &emb1, cv::Mat const &emb2) const | Compute the cosine similarity between two embeddings. |
jevois::dnn::CLIP::CLIP(std::string const & modelpath)
Construct and load a model from disk.
virtual jevois::dnn::CLIP::~CLIP()
Virtual destructor for safe inheritance.
void jevois::dnn::CLIP::freeze(bool doit)
Freeze/unfreeze parameters that users should not change while running.
cv::Mat jevois::dnn::CLIP::imageEmbedding(cv::Mat const & img)
Get the embedding for some packed RGB uint8 image, typically as a 1x512 float matrix.
Any image size is accepted; the image will be rescaled and normalized to match what the CLIP model expects.
Definition at line 74 of file CLIP.C.
References CLIP_THREADS and LFATAL.
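CLIP models expect a fixed-size, normalized input image, which is why imageEmbedding() can accept any image size. The hypothetical C++ sketch below shows what such preprocessing typically looks like for a packed RGB uint8 image: resample to size x size (224 is the input size of the ViT-B/32 CLIP model, assumed here as the default) and normalize each channel with the mean and standard deviation published with OpenAI's CLIP. It is not the clip.cpp code: clip.cpp performs its own resizing, real pipelines usually preserve aspect ratio with a resize plus center crop, and the simple index-scaling resampling here was chosen only for brevity. preprocessRGB is a hypothetical name.

```cpp
// Hypothetical sketch of CLIP-style preprocessing: resample a packed RGB uint8
// image (HxWx3) to size x size by index scaling, then normalize each channel with
// the OpenAI CLIP mean/std, producing planar CHW floats. Not the clip.cpp code.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<float> preprocessRGB(std::vector<std::uint8_t> const & rgb, std::size_t width,
                                 std::size_t height, std::size_t size = 224)
{
  assert(rgb.size() == width * height * 3 && width > 0 && height > 0);

  // Per-channel normalization constants published with OpenAI CLIP (assumed here).
  float const mean[3] = { 0.48145466f, 0.4578275f, 0.40821073f };
  float const stdev[3] = { 0.26862954f, 0.26130258f, 0.27577711f };

  std::vector<float> out(3 * size * size);
  for (std::size_t y = 0; y < size; ++y)
  {
    std::size_t const sy = y * height / size; // source row for output row y
    for (std::size_t x = 0; x < size; ++x)
    {
      std::size_t const sx = x * width / size; // source column for output column x
      std::uint8_t const * px = &rgb[(sy * width + sx) * 3];
      // Scale to [0,1], subtract the channel mean, divide by the channel std.
      for (std::size_t c = 0; c < 3; ++c)
        out[c * size * size + y * size + x] = (px[c] / 255.0f - mean[c]) / stdev[c];
    }
  }
  return out;
}
```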
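int jevois::dnn::CLIP::imageEmbeddingSize() const
Get the image embedding size (useful to know before computing an embedding); returns 0 if there is no image encoder.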
float jevois::dnn::CLIP::similarity(cv::Mat const & emb1, cv::Mat const & emb2) const
Compute the cosine similarity between two embeddings.
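Cosine similarity compares two embeddings by the angle between them: cos(a, b) = (a · b) / (|a| |b|), a value in [-1, 1]. The minimal C++ sketch below computes it on plain float vectors. cosineSimilarity is a hypothetical helper; this documentation does not say whether similarity() normalizes its inputs itself or assumes unit-norm embeddings, nor how it treats zero vectors (this sketch returns 0 for them).

```cpp
// Hypothetical sketch: cosine similarity cos(a, b) = (a . b) / (|a| |b|).
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

float cosineSimilarity(std::vector<float> const & a, std::vector<float> const & b)
{
  assert(a.size() == b.size() && "embeddings must have the same size");
  // Accumulate in double to limit rounding error over 512+ terms.
  double dot = 0.0, na = 0.0, nb = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    dot += double(a[i]) * b[i];
    na += double(a[i]) * a[i];
    nb += double(b[i]) * b[i];
  }
  // The angle is undefined for zero vectors; return 0 in that case.
  if (na == 0.0 || nb == 0.0) return 0.0f;
  return static_cast<float>(dot / (std::sqrt(na) * std::sqrt(nb)));
}
```

When both embeddings are already L2-normalized, the denominator is 1 and the score reduces to a plain dot product, a common shortcut when comparing one image embedding against many text embeddings.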
cv::Mat jevois::dnn::CLIP::textEmbedding(std::string const & txt)
Get the embedding for some text, typically as a 1x512 float matrix (the size depends on the CLIP model version).
Definition at line 46 of file CLIP.C.
References CLIP_THREADS and LFATAL.
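int jevois::dnn::CLIP::textEmbeddingSize() const
Get the text embedding size (useful to know before computing an embedding); returns 0 if there is no text encoder.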
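Because textEmbeddingSize() and imageEmbeddingSize() return 0 when the corresponding encoder is absent, a caller can check the size before allocating a 1xCxD buffer for C class embeddings. The C++17 sketch below shows that pattern against FakeTextEncoder, a hypothetical stand-in that only mimics the documented size semantics (it is not the JeVois class); allocateClassBuffer is likewise hypothetical. Since the helper is a template that only calls a const textEmbeddingSize(), a jevois::dnn::CLIP object could presumably be passed in its place.

```cpp
// Hypothetical sketch: query the embedding size first to preallocate a 1 x C x D
// buffer, treating a size of 0 as "no text encoder available".
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal stand-in exposing textEmbeddingSize(); not the JeVois CLIP class.
struct FakeTextEncoder
{
  int dim; // 0 mimics a model loaded without a text encoder
  int textEmbeddingSize() const { return dim; }
};

// Returns a zero-filled buffer for numClasses embeddings, or std::nullopt when
// the encoder reports no text embedding support.
template <typename Encoder>
std::optional<std::vector<float>> allocateClassBuffer(Encoder const & enc, std::size_t numClasses)
{
  assert(numClasses > 0 && "need at least one class");
  int const d = enc.textEmbeddingSize();
  if (d <= 0) return std::nullopt;
  return std::vector<float>(numClasses * static_cast<std::size_t>(d), 0.0f);
}
```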