#include <jevois/DNN/CLIP.H>

Interface to a CLIP model used to compute text and image embeddings.

The CLIP model runs on CPU using clip.cpp and ggml. It computes text or image embeddings for open-world object detection models such as YOLO-JeVois. Each embedding is returned as a float cv::Mat, typically of size 1x512. This makes it easy to concatenate several embeddings into the 1xCx512 tensor that YOLO-JeVois takes as input for C object detection classes.

Definition at line 35 of file CLIP.H.
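The row-major layout produced by concatenating C embeddings of size D (typically 512) into a 1xCxD tensor can be sketched in plain C++. This is a minimal illustration, not the JeVois code. The helper name stackEmbeddings is hypothetical, and each embedding is modeled as a std::vector<float> standing in for a 1xD float cv::Mat.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: stacks C embeddings, each a row of D floats (what a 1xD float cv::Mat holds),
// into one contiguous row-major buffer laid out as a 1xCxD tensor.
std::vector<float> stackEmbeddings(std::vector<std::vector<float>> const & embs)
{
  if (embs.empty()) return {};

  std::size_t const dim = embs.front().size();
  std::vector<float> tensor;
  tensor.reserve(embs.size() * dim);

  for (auto const & e : embs)
  {
    // All classes must share the same embedding size D.
    if (e.size() != dim) throw std::invalid_argument("all embeddings must have the same size");
    tensor.insert(tensor.end(), e.begin(), e.end());
  }
  return tensor; // element (0, c, d) is at index c * dim + d
}
```

With OpenCV itself, cv::vconcat() stacks 1xD cv::Mat rows into a CxD matrix with the same memory layout; how YOLO-JeVois then views that buffer as 1xCxD is not described on this page.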

Public Member Functions
	CLIP (std::string const &modelpath)
	Construct and load a model from disk.

virtual	~CLIP ()
	Virtual destructor for safe inheritance.

void	freeze (bool doit)
	Freeze/unfreeze parameters that users should not change while running.

cv::Mat	textEmbedding (std::string const &txt)
	Get embedding for some text, typically as a 1x512 float matrix; the size depends on the CLIP model version.

int	textEmbeddingSize () const
	Get the text embedding size, or 0 if there is no text encoder. Useful when the size is needed before computing any embedding.

cv::Mat	imageEmbedding (cv::Mat const &img)
	Get embedding for some RGB uint8 packed image, typically as a 1x512 float matrix.

int	imageEmbeddingSize () const
	Get the image embedding size, or 0 if there is no image encoder. Useful when the size is needed before computing any embedding.

float	similarity (cv::Mat const &emb1, cv::Mat const &emb2) const
	Compute cosine similarity between two embeddings.

Constructor & Destructor Documentation

◆ CLIP()

jevois::dnn::CLIP::CLIP ( std::string const & modelpath )

Construct and load a model from disk.

Definition at line 33 of file CLIP.C.

References LFATAL, and LINFO.

◆ ~CLIP()

virtual jevois::dnn::CLIP::~CLIP ( )

Virtual destructor for safe inheritance.

Definition at line 27 of file CLIP.C.

Member Function Documentation

◆ freeze()

void jevois::dnn::CLIP::freeze ( bool doit )

Freeze/unfreeze parameters that users should not change while running.

◆ imageEmbedding()

cv::Mat jevois::dnn::CLIP::imageEmbedding ( cv::Mat const & img )

Get embedding for some RGB uint8 packed image, typically as a 1x512 float matrix.

Any image size is accepted; the image is rescaled and normalized to match the input expected by the CLIP model.

Definition at line 74 of file CLIP.C.

References CLIP_THREADS, and LFATAL.

◆ imageEmbeddingSize()

int jevois::dnn::CLIP::imageEmbeddingSize ( ) const

Get the image embedding size, or 0 if there is no image encoder. Useful when the size is needed before computing any embedding.

Definition at line 101 of file CLIP.C.

References LFATAL.

◆ similarity()

float jevois::dnn::CLIP::similarity ( cv::Mat const & emb1, cv::Mat const & emb2 ) const

Compute cosine similarity between two embeddings.

Definition at line 109 of file CLIP.C.

References LFATAL.
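Cosine similarity is dot(a, b) / (||a|| ||b||). It ranges from -1 to 1 and ignores the magnitude of the embeddings. A standalone sketch on std::vector<float> follows; the name cosineSimilarity is hypothetical. Because the method references LFATAL, it presumably rejects invalid input such as mismatched sizes. This sketch throws std::invalid_argument instead, and exactly which conditions are checked here is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: cosine similarity between two embeddings of equal, non-zero length.
float cosineSimilarity(std::vector<float> const & a, std::vector<float> const & b)
{
  if (a.empty() || a.size() != b.size())
    throw std::invalid_argument("embeddings must be non-empty and of the same size");

  // Accumulate in double to limit rounding error over hundreds of terms.
  double dot = 0.0, na = 0.0, nb = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    dot += static_cast<double>(a[i]) * b[i];
    na += static_cast<double>(a[i]) * a[i];
    nb += static_cast<double>(b[i]) * b[i];
  }

  if (na == 0.0 || nb == 0.0) throw std::invalid_argument("zero-norm embedding");
  return static_cast<float>(dot / (std::sqrt(na) * std::sqrt(nb)));
}
```

On cv::Mat inputs the same quantity can be written as emb1.dot(emb2) / (cv::norm(emb1) * cv::norm(emb2)). If both embeddings are already L2-normalized, it reduces to the plain dot product.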

◆ textEmbedding()

cv::Mat jevois::dnn::CLIP::textEmbedding ( std::string const & txt )

Get embedding for some text, typically as a 1x512 float matrix; the size depends on the CLIP model version.

Definition at line 46 of file CLIP.C.

References CLIP_THREADS, and LFATAL.
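Because the text encoder runs on CPU, an application that repeatedly requests embeddings for the same class names may benefit from memoizing them. The sketch below is a hypothetical TextEmbeddingCache and is not part of JeVois. The encoder is any callable taking a string, standing in for a call such as clip.textEmbedding(txt) on some CLIP instance named clip, and embeddings are modeled as std::vector<float>.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical memoizing wrapper: runs the (CPU-bound) text encoder once per distinct string.
class TextEmbeddingCache
{
  public:
    using Encoder = std::function<std::vector<float>(std::string const &)>;

    explicit TextEmbeddingCache(Encoder enc) : itsEncoder(std::move(enc)) { }

    // Return the cached embedding for txt, computing and storing it on first request.
    std::vector<float> const & get(std::string const & txt)
    {
      auto it = itsCache.find(txt);
      if (it == itsCache.end()) it = itsCache.emplace(txt, itsEncoder(txt)).first;
      return it->second;
    }

    // Number of distinct strings encoded so far.
    std::size_t size() const { return itsCache.size(); }

  private:
    Encoder itsEncoder;
    std::unordered_map<std::string, std::vector<float>> itsCache;
};
```

References returned by get() stay valid as new entries are added, because std::unordered_map does not relocate its elements on rehash. The cache as written is not thread-safe.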

◆ textEmbeddingSize()

int jevois::dnn::CLIP::textEmbeddingSize ( ) const

Get the text embedding size, or 0 if there is no text encoder. Useful when the size is needed before computing any embedding.

Definition at line 66 of file CLIP.C.

References LFATAL.
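Because the embedding size depends on the CLIP model version (512 is typical, not guaranteed), code that preallocates a 1xCxD input for C classes can query the size first and treat 0 as "no encoder". The helper allocateClassTensor below is hypothetical and works on a flat std::vector<float>. The value passed as embSize would come from textEmbeddingSize() or imageEmbeddingSize(). It requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: preallocate a zero-filled 1 x C x D buffer for numClasses classes.
// An embedding size of 0 means the model has no such encoder, so no buffer is returned.
std::optional<std::vector<float>> allocateClassTensor(int numClasses, int embSize)
{
  if (numClasses <= 0 || embSize <= 0) return std::nullopt;
  return std::vector<float>(static_cast<std::size_t>(numClasses) * static_cast<std::size_t>(embSize), 0.0f);
}
```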

The documentation for this class was generated from the following files:

include/jevois/DNN/CLIP.H
src/jevois/DNN/CLIP.C
