SaladCloud Blog

INSIDE SALAD

Comparing Price-Performance of 22 GPUs for AI Image Tagging (GTX vs RTX)

Shawn Rushefsky

Older Consumer GPUs: A Perfect Fit for AI Image Tagging

In the current AI boom, there’s palpable excitement around sophisticated image generation models like Stable Diffusion XL (SDXL) and the cutting-edge GPUs that power them. These models often require more powerful GPUs with more VRAM. However, amid all this buzz, we shouldn’t overlook the potential of older GPUs, especially for tasks like image tagging and search embedding generation. These processes, employed by image generation platforms like Civit.ai and Midjourney, play a crucial role in enhancing search capabilities and the overall user experience. We used Salad’s distributed GPU cloud to evaluate the cost-performance of these tasks across a wide range of hardware configurations.

Image tagging example. Tags: realistic: 0.9306589365005493, food: 0.7583510279655457, 1other: 0.5011857151985168, holding: 0.45994266867637634, food_focus: 0.4451887905597687, no_humans: 0.41321325302124023, ambiguous_gender: 0.38422226905822754, close-up: 0.3592078685760498 | Rating: general: 0.8978943824768066, sensitive: 0.11550512909889221, questionable: 0.003336608409881592, explicit: 0.0008636713027954102

What is AI Image Tagging?

AI image tagging is a technology that can automatically identify and label the content of images, such as objects, people, places, colors, and more. This helps users organize, search, and discover their images more easily and efficiently.

AI image tagging typically works in one of two ways:

  • By using deep learning models trained on large datasets of labeled images
  • By combining image models with Natural Language Processing (NLP), which can understand the meaning and context of words and phrases
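
The example captions in this post pair each tag with a confidence score and report rating scores separately. Below is a minimal sketch of that post-processing step. It assumes a multi-label model that emits one independent probability per label (so the scores need not sum to 1), and it treats `general`, `sensitive`, `questionable`, and `explicit` as rating labels. The `format_tags` helper and the 0.35 threshold are illustrative assumptions, not the exact settings used in this benchmark.

```python
RATING_LABELS = ("general", "sensitive", "questionable", "explicit")


def format_tags(
    probs: dict[str, float], threshold: float = 0.35
) -> tuple[list[tuple[str, float]], dict[str, float]]:
    """Split raw tagger output into (tags above threshold, rating scores).

    `probs` maps each label to an independent probability (e.g. a sigmoid
    output), so the values are not expected to sum to 1.
    """
    ratings = {label: probs[label] for label in RATING_LABELS if label in probs}
    tags = [
        (label, score)
        for label, score in probs.items()
        if label not in RATING_LABELS and score >= threshold
    ]
    # Highest-confidence tags first, like the captions in this post.
    tags.sort(key=lambda item: item[1], reverse=True)
    return tags, ratings


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    example = {"food": 0.76, "realistic": 0.93, "tree": 0.12,
               "general": 0.90, "sensitive": 0.12}
    tags, ratings = format_tags(example)
    print(tags)     # [('realistic', 0.93), ('food', 0.76)]
    print(ratings)  # {'general': 0.9, 'sensitive': 0.12}
```

Raising the threshold trades recall for precision. Low-confidence tags such as `tree` above are simply dropped rather than reported.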

AI image tagging can be used for various purposes and applications, such as:

  • Image search: Users can find specific images by using keywords or phrases that match the tags of the images. For example, Brandfolder is a digital asset management platform that uses AI auto-tagging to help users search for their images more quickly and easily.
  • Image organization: Users can sort and group their images by categories, themes, or locations based on the tags of the images. For example, Google Photos is a photo management app that uses AI image tagging to automatically create albums and collections of similar images.
  • Image discovery: Users can explore and learn more about their images by finding similarities and differences among them based on the tags of the images. For example, Imagga is an image recognition service that uses AI image tagging to provide visual insights and suggestions for users’ images.
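
Tag-based search and discovery both reduce to simple set operations. The sketch below assumes each image's tags have already been produced by a tagger. An inverted index answers keyword queries, and Jaccard similarity between tag sets ranks related images. `TagIndex` is a hypothetical, in-memory helper for illustration. It does not describe how Brandfolder, Google Photos, or Imagga implement these features.

```python
from collections import defaultdict


class TagIndex:
    """Minimal in-memory inverted index (tag -> image IDs).

    Assumes each image ID is added once.
    """

    def __init__(self) -> None:
        self._by_tag: dict[str, set[str]] = defaultdict(set)
        self._tags: dict[str, set[str]] = {}

    def add(self, image_id: str, tags: list[str]) -> None:
        self._tags[image_id] = set(tags)
        for tag in tags:
            self._by_tag[tag].add(image_id)

    def search(self, *query_tags: str) -> set[str]:
        """Image search: IDs of images carrying every query tag."""
        if not query_tags:
            return set()
        result = set(self._by_tag.get(query_tags[0], set()))
        for tag in query_tags[1:]:
            result &= self._by_tag.get(tag, set())
        return result

    def similar(self, image_id: str, k: int = 3) -> list[tuple[str, float]]:
        """Image discovery: rank other images by Jaccard similarity of tags."""
        base = self._tags[image_id]
        scores = []
        for other_id, other_tags in self._tags.items():
            if other_id == image_id:
                continue
            union = base | other_tags
            score = len(base & other_tags) / len(union) if union else 0.0
            scores.append((other_id, score))
        scores.sort(key=lambda item: item[1], reverse=True)
        return scores[:k]
```

A production system would more likely put this in a search engine or database, but the underlying idea (look up images by tag, compare tag sets) stays the same.
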

Image tagging example on SaladCloud: Tags: flower: 0.9464260935783386, simple_background: 0.938351035118103, black_background: 0.9165157079696655, no_humans: 0.9135617017745972, still_life: 0.715178906917572, white_flower: 0.6529031991958618, liquid: 0.458692729473114, glowing: 0.36541783809661865, chrysanthemum: 0.36194002628326416, animal: 0.3560096025466919 | Rating: general: 0.9667389392852783, sensitive: 0.03438687324523926, questionable: 0.0045448243618011475, explicit: 0.0006334185600280762

Benchmarking 22 Consumer-Grade GPUs for AI Image Tagging

In designing the benchmark, our primary objective was to ensure a comprehensive and unbiased evaluation. We selected a range of GPUs on SaladCloud, starting from the GTX 1050 and extending up to the RTX 4090, to capture a broad spectrum of performance capabilities. Each node in our setup was equipped with 16 vCPUs and 7 GB of RAM, ensuring a standardized environment for all tests. For the datasets, we chose two prominent collections from Kaggle: the AVA Aesthetic Visual Assessment and the COCO 2017 Dataset. These datasets offer a mix of aesthetic visuals and diverse object categories, providing a robust testbed for our image tagging and search embedding generation tasks.

We used ConvNextV2 Tagger V2 to generate tags and ratings for images, and CLIP to generate embedding vectors. The tagger model ran on ONNX Runtime, while CLIP used Transformers with PyTorch. ONNX Runtime’s GPU support is not a great fit for Salad because Nvidia driver versions vary across the network, so we chose the CPU runtime and allocated 16 vCPUs to each node. PyTorch with Transformers works well across a wide range of GPUs and driver versions with no additional configuration, so CLIP ran on the GPU.
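
Below is a sketch of that device split, assuming the `onnxruntime` and `torch` packages are installed. `onnxruntime.get_available_providers`, `SessionOptions.intra_op_num_threads`, and `InferenceSession(..., providers=...)` are real ONNX Runtime APIs. The helper names (`choose_onnx_providers`, `make_tagger_session`, `clip_device`), the model path you pass in, and the thread count are hypothetical. This is not the benchmark's actual code.

```python
"""Sketch: keep the ONNX tagger on CPU threads while CLIP (PyTorch) uses the GPU.

Helper names, model path, and thread count are illustrative assumptions.
"""


def choose_onnx_providers(available: list[str], allow_gpu: bool = False) -> list[str]:
    """Return ONNX Runtime execution providers in priority order.

    With allow_gpu=False only the CPU provider is kept, so the session does
    not depend on the CUDA/driver versions installed on a given node.
    """
    preferred = ["CUDAExecutionProvider"] if allow_gpu else []
    preferred.append("CPUExecutionProvider")
    return [p for p in preferred if p in available]


def make_tagger_session(model_path: str, num_threads: int = 16):
    """Create a CPU-only ONNX Runtime session for the tagger model."""
    import onnxruntime as ort  # imported lazily: only needed when a model is loaded

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = num_threads  # e.g. match the node's vCPU count
    providers = choose_onnx_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, sess_options=opts, providers=providers)


def clip_device():
    """Use the GPU for CLIP when PyTorch can see one, otherwise fall back to CPU."""
    import torch  # imported lazily for the same reason

    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

For example, you could call `make_tagger_session("tagger.onnx")` (a hypothetical path) and move the CLIP model with `model.to(clip_device())`. The lazy imports keep the provider logic testable on machines without either package.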

Image tagging of a close-up picture of an orange. Tags: orange_theme: 0.9728666543960571, fruit: 0.94382643699646, food: 0.9286314249038696, orange_(fruit): 0.8056939840316772, food_focus: 0.5921673774719238, no_humans: 0.5881966352462769, light_particles: 0.5497734546661377, glowing: 0.5307691097259521, shadow: 0.5253725051879883, orange_slice: 0.4853847026824951, black_background: 0.4318349063396454, silhouette: 0.3653726875782013 | Rating: general: 0.953162431716919, sensitive: 0.06405341625213623, questionable: 0.002493232488632202, explicit: 0.001814037561416626

Benchmark Results: GTX 1650 is the Surprising Winner

As expected, nodes with higher-end GPUs took less time per image, with the flagship RTX 4090 delivering the best performance. What is interesting, though, is that the median time per image is very similar for the GTX 1650 and the RTX 4090: about 1 second. The 4090’s best-case and worst-case times, however, are notably better.

Comparison of inference time across 22 GPUs for image tagging
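
The median, best-case, and worst-case numbers above are per-image statistics. Below is a minimal sketch of how such figures could be collected on a node. It assumes images are processed one at a time, and `process` stands in for whatever tagging and embedding call the worker makes. It is an illustration, not the harness used for this benchmark.

```python
import statistics
import time
from typing import Callable, Iterable


def time_per_item(process: Callable[[object], object], items: Iterable[object]) -> list[float]:
    """Run `process` on each item sequentially, recording wall-clock seconds per item."""
    durations = []
    for item in items:
        start = time.perf_counter()
        process(item)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Best-case (min), median, and worst-case (max) seconds per item."""
    if not durations:
        raise ValueError("no timings recorded")
    return {
        "best": min(durations),
        "median": statistics.median(durations),
        "worst": max(durations),
    }
```

For example, `summarize([1.2, 0.8, 1.0])` returns `{'best': 0.8, 'median': 1.0, 'worst': 1.2}`. Reporting the spread alongside the median is what reveals differences like the 4090's tighter best and worst cases.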

Weighting our findings by cost confirms our intuition: the GTX 1650, at $0.02/hr, is a much better value than the RTX 4090 at $0.30/hr.

Comparison of images processed per dollar from the image tagging benchmark across GPUs

While older GPUs like the GTX 1650 deliver lower absolute performance than the RTX 4090, the large price gap makes them the better value, as long as your use case can tolerate the additional latency. In fact, every GTX-series GPU outperformed every RTX GPU in images tagged per dollar.
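
Weighting by cost turns per-image latency into a value metric. Below is a minimal sketch of that arithmetic. It assumes each node processes one image at a time, so images per hour is 3,600 divided by seconds per image, and images per dollar is images per hour divided by the hourly price. The function names and any inputs you pass are illustrative.

```python
SECONDS_PER_HOUR = 3600


def images_per_dollar(seconds_per_image: float, price_per_hour: float) -> float:
    """Images processed per dollar on one node, assuming sequential processing."""
    if seconds_per_image <= 0 or price_per_hour <= 0:
        raise ValueError("latency and price must be positive")
    images_per_hour = SECONDS_PER_HOUR / seconds_per_image
    return images_per_hour / price_per_hour


def rank_by_value(nodes: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Rank GPUs by images per dollar; values are (seconds_per_image, $/hr)."""
    scored = [(gpu, images_per_dollar(sec, price)) for gpu, (sec, price) in nodes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Which per-image statistic you feed in (median, mean, or another percentile) changes the result, so it is worth stating alongside any images-per-dollar figure.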

GTX Series: The Cost-Effective Option for AI Image Tagging with 3x More Images Tagged per Dollar

In the fast-moving world of AI and GPU technology, the allure of the latest hardware often overshadows the nuanced decisions that drive optimal performance. Our analysis highlights the balance between raw performance and cost-effectiveness, and it echoes broader cloud best practices: just as it’s important not to overprovision compute resources in cloud environments, it’s equally important not to overcommit to high-end GPUs when more cost-effective options can meet the requirements. The GTX 1650’s value proposition, especially for tasks with flexible latency needs, is a testament to this principle, delivering 3x as many images tagged per dollar as the RTX 4090. As AI applications continue to expand, making judicious hardware choices based on comprehensive real-world benchmarks becomes paramount. The goal isn’t always to harness the most powerful tools, but rather the most appropriate ones for the task and budget at hand.

Run Your Image Tagging on Salad’s Distributed Cloud

If you are running AI image tagging or any AI inference at scale, Salad’s distributed cloud has 10,000+ GPUs at the lowest price in the market. Sign up for a demo with our team to discuss your specific use case.

