PicArrange 3.0 now finds images with words
07/24/2024

Find your photos with words

Watch the video on YouTube

How does PicArrange’s visual and text search work?

Over the years, research has continually improved image descriptors, enabling efficient and compact representation of the content and appearance of images. A significant advancement came with OpenAI’s CLIP model, which embeds textual descriptions of images and their visual representations into a shared latent space.
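The idea behind such a shared latent space is that a text query and an image can be compared directly by embedding both and measuring their similarity. The following minimal sketch illustrates this with the open-source OpenCLIP library; the model name, checkpoint, and file paths are illustrative examples and not necessarily what PicArrange uses internally.

```python
import torch
from PIL import Image
import open_clip

# Load a pretrained CLIP model (example checkpoint, for illustration only)
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Embed a few images into the shared latent space (placeholder file names)
image_paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in image_paths])

with torch.no_grad():
    image_features = model.encode_image(images)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    # Embed the text query into the same space
    tokens = tokenizer(["dinner in a fast food restaurant"])
    text_features = model.encode_text(tokens)
    text_features /= text_features.norm(dim=-1, keepdim=True)

# Cosine similarity between the query and every image; higher means a better match
similarities = (text_features @ image_features.T).squeeze(0)
for idx in similarities.argsort(descending=True):
    print(f"{image_paths[idx]}: {similarities[idx]:.3f}")
```

Ranking a photo library by this similarity score is what makes text-to-image search possible without any manual tagging.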

However, the CLIP model has its limitations. For instance, the same text can describe vastly different-looking images. A phrase like “dinner in a fast food restaurant” might refer to an image of a typical fast food dish or one of people eating in a fast food restaurant. This ambiguity in visual concepts hampers the effectiveness of image searches using the CLIP model.

To address this issue, we developed a method to fine-tune CLIP models. This enhancement significantly improves retrieval quality while preserving the effectiveness of the joint text-image embedding for text-to-image searches. By leveraging this technique, PicArrange offers a more accurate and reliable visual search experience, ensuring that you can quickly find the exact images you need based on both visual content and textual descriptions.

Project page

Download PicArrange