Boosting the best images by hacking their cosine similarity
At Freepik, we know how important it is to find the perfect image you’re looking for. To develop the now-available brand-new search engine, two teams joined efforts: the Artificial Intelligence (AI) team and the Search team. While the AI team was testing different AI models and finding the most relevant answers to diverse searches, the Search team had a job: to boost the highest-quality assets. Even though “quality” is a fuzzy term, the intention behind it is clear: when you search for “car” on Freepik, you don’t want to see images of an average car; you want to see the pictures you’re hoping for, the ones in your imagination. Unfortunately, we don’t have a crystal ball —yet— but we can look at the resources other users chose and create a global boost for each of those high-quality images.
When searching for resources, relevance and quality are both factors that contribute to a successful result. To determine which results are relevant, we convert text and images into points —“embeddings in the latent space”, which is a fancy name for a specific list of 512 numbers— in a multi-dimensional space. By measuring the proximity of these points, we can determine how relevant an image is to a given text. The closer the points, the more relevant the image is to the input text.
But in a high-dimensional space, most points lie near the edge of the space. That’s why it’s common to measure the distance between two points simply by looking at the angle between them (or, even easier, the cosine of that angle). Essentially, it’s like searching for stars in the sky: two stars will appear close together if they are separated by a small angle, no matter their real distance.
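The star analogy can be checked numerically. In this minimal sketch (toy 3-D vectors standing in for real 512-number embeddings, using NumPy), two vectors point in exactly the same direction but sit at very different distances from the origin: their Euclidean distance is large, yet their cosine similarity is 1.

```python
import numpy as np

# Two embeddings pointing in the same direction but with very different
# magnitudes (toy 3-D vectors instead of real 512-D ones).
a = np.array([1.0, 2.0, 3.0])
b = 10.0 * a  # same direction, ten times farther from the origin

# Straight-line distance between the two points.
euclidean = np.linalg.norm(a - b)

# Cosine of the angle between them: dot product over the product of norms.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean)  # large: the points are far apart in the space
print(cosine)     # ~1.0: the angle between them is zero
```

Like the two stars, `a` and `b` look identical when judged by angle alone, even though they are nowhere near each other in the space.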
The same happens with text and image embeddings.
To calculate the cosine between these embeddings, we normalize them and take their inner product. After normalization, all our code assumes the inner product is equal to the cosine similarity. A cosine close to 0 means the two points are very different, while a cosine close to 1 means they are very similar. The higher the cosine, the more relevant the result. So how can we display high-quality images higher up in the search results?
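A minimal sketch of the ranking step, with hypothetical 4-D vectors standing in for real 512-D embeddings: once every embedding is normalized to unit length, a plain inner product with the query embedding is exactly the cosine similarity, so sorting by it ranks images by relevance.

```python
import numpy as np

def normalize(v):
    # Scale a vector to unit length so that inner products become cosines.
    return v / np.linalg.norm(v)

# Hypothetical stand-ins for real text/image embeddings.
query   = normalize(np.array([0.2, 0.9, 0.1, 0.4]))
image_a = normalize(np.array([0.1, 0.8, 0.2, 0.5]))   # similar to the query
image_b = normalize(np.array([-0.7, 0.1, 0.9, -0.2])) # very different

# After normalization, the inner product IS the cosine similarity.
sim_a = np.dot(query, image_a)
sim_b = np.dot(query, image_b)

# Higher cosine -> more relevant -> ranked first.
ranking = sorted([("image_a", sim_a), ("image_b", sim_b)],
                 key=lambda pair: pair[1], reverse=True)
```

Normalizing once at indexing time is the usual design choice here: it turns every query-time similarity into a single dot product, which vector databases can compute very cheaply.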
To accomplish this, one of our team members came up with a hack: we “denormalize” the embeddings of the high-quality images by multiplying them by (1 + ɑ). This amplifies their inner product with any query embedding, giving these images an artificially high cosine similarity and making them “closer” to the search query. Of course, the result is no longer a true cosine because it can exceed +1, but that has no adverse effects and works surprisingly well! When we A/B tested the search engine with the cosine hack, successful searches improved by +3%, an excellent outcome!
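The hack itself is a one-liner. In this sketch (NumPy, toy 3-D vectors; the actual value of ɑ used in production isn’t stated, so `ALPHA = 0.1` is just an illustrative choice), the boosted score of a high-quality image is exactly (1 + ɑ) times its true cosine similarity, so it outranks an equally relevant unboosted image.

```python
import numpy as np

ALPHA = 0.1  # hypothetical boost factor; the real value isn't disclosed

def normalize(v):
    return v / np.linalg.norm(v)

query = normalize(np.array([0.3, 0.8, 0.5]))
image = normalize(np.array([0.4, 0.7, 0.6]))  # a relevant image embedding

# Ordinary score: inner product of unit vectors = cosine similarity.
plain_score = np.dot(query, image)

# "Denormalized" high-quality image: stored with an amplified norm, so its
# inner product with ANY query is scaled by (1 + alpha).
boosted_score = np.dot(query, (1.0 + ALPHA) * image)
```

Note that the boost is baked into the stored embedding, not computed per query: the search engine keeps running plain inner products, and the high-quality images simply score higher everywhere.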
We are currently working on many improvements, ranging from how we judge the “quality” of the images to making these boosts more personalized. If working with a fast-paced team on AI, search engines, and massive amounts of data fascinates you, good news: we are hiring!