SaladCloud Blog

Analyzing the Stunning Realism of GTA6 with YOLOv8 and SaladCloud

Salad Technologies

The gaming community was recently electrified by the release of a new trailer for “Grand Theft Auto VI” (GTA6), the latest entry in a series known for its immersive gameplay and hyper-realistic graphics. To gauge the level of detail and realism in the game’s graphics, we ran the trailer through YOLOv8, a cutting-edge object detection model, hosted on SaladCloud. The results were fascinating, providing a glimpse into the intricate world that GTA6 promises to offer.

YOLO (You Only Look Once) models are renowned for their efficiency and accuracy in detecting objects in images and videos. We chose YOLOv8 because it incorporates recent advances in object detection and discerns objects with high precision. We picked the medium pretrained model provided by Ultralytics: “yolov8m.pt”.
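
As a rough illustration, running this checkpoint over a video with the Ultralytics Python package might look like the sketch below. The helper names and the idea of collecting class names per frame are assumptions for illustration; the post does not show the code that was actually used.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def iter_frame_labels(video_path: str, weights: str = "yolov8m.pt") -> Iterator[List[str]]:
    """Yield the class names detected in each frame of the video."""
    # Imported lazily so the helper below stays usable without the package.
    from ultralytics import YOLO  # assumes `pip install ultralytics`

    model = YOLO(weights)  # fetches the pretrained weights if not present locally
    # stream=True returns a generator, so frames are processed one at a time.
    for result in model.predict(source=video_path, stream=True, verbose=False):
        yield [result.names[int(c)] for c in result.boxes.cls.tolist()]


def label_frequencies(frames: Iterable[List[str]]) -> Counter:
    """Count how many detections of each class appear across all frames."""
    counts: Counter = Counter()
    for labels in frames:
        counts.update(labels)
    return counts
```

Calling `label_frequencies(iter_frame_labels("trailer.mp4"))` on a local copy of the trailer (a hypothetical file name) would tally raw per-frame detections, which is not the same as counting unique objects.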

To run this experiment, we used SaladCloud, the most affordable GPU compute platform available today. We created an API that, upon receiving the video URL and storage information, processed the video through the YOLOv8 model and saved all detections to our storage account. It also generated a summary detailing how long each object was present in the video.
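
One plausible way to build such a duration summary, assuming per-frame class names and a known frame rate, is to count the frames in which each class appears and divide by the frame rate. The function name and input shape are hypothetical, and the summary in the actual API may be computed differently.

```python
from collections import Counter
from typing import Dict, Iterable, List


def presence_seconds(frames: Iterable[List[str]], fps: float) -> Dict[str, float]:
    """Return, per class, how many seconds of video contain at least one instance."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames_seen: Counter = Counter()
    for labels in frames:
        # set() makes sure a class is counted once per frame,
        # no matter how many instances are visible in that frame.
        frames_seen.update(set(labels))
    return {label: n / fps for label, n in frames_seen.items()}
```

This measures presence per class rather than per individual object; a per-object duration would need tracker IDs in addition to class names.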

For those interested in the technical details or in replicating this experiment, we have prepared a comprehensive YOLOv8 tutorial available in Salad’s documentation: YOLOv8 Deployment Tutorial. Our computational setup for this experiment included 8 vCPUs, 8 GB of memory, and an RTX 3090 GPU with 24 GB of VRAM. Remarkably, this configuration is priced at only $0.29 per hour on SaladCloud.

Results from the object detection experiment

The entire process of running the video through the model and saving the results took approximately 90 seconds. This translates to a very cost-effective operation.

To calculate the exact cost:

Total time: 90 seconds (or 1.5 minutes)
Hourly rate: $0.29
Cost for 1.5 minutes: (1.5 × 0.29) / 60 = $0.00725

Running the video through the model on SaladCloud for 1.5 minutes therefore cost about $0.00725, or roughly 0.73 cents. This exceptionally low cost demonstrates the efficiency and affordability of using SaladCloud for high-end GPU compute tasks.
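
The arithmetic above can be reproduced with a small helper. The function name is made up for illustration, and it assumes billing is strictly proportional to runtime at the quoted hourly rate.

```python
def job_cost_usd(runtime_seconds: float, hourly_rate_usd: float) -> float:
    """Cost of a job billed pro rata at an hourly rate."""
    if runtime_seconds < 0 or hourly_rate_usd < 0:
        raise ValueError("runtime and rate must be non-negative")
    return runtime_seconds / 3600 * hourly_rate_usd


# 90 seconds on the $0.29/hour configuration described in the post.
cost = job_cost_usd(90, 0.29)
print(f"${cost:.5f} per run ({cost * 100:.3f} cents)")
```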

Let’s check our results now.

The model easily detected and tracked main characters, especially when they were the central figures in a scene.

More impressive was how well the model detected NPCs in the bustling scenes set on the beaches of Vice City, even amidst massive crowds. This level of accuracy is crucial for understanding the dynamics of densely populated game environments, a staple of the GTA series.

Another area where YOLOv8 excelled was in identifying various modes of transportation that are central to the GTA experience, such as motorcycles, cars, and boats. The accuracy in this domain is essential given the franchise’s emphasis on vehicular exploration and interaction.

However, the model wasn’t flawless. In some instances, it confused birds with kites, likely due to their similar appearance in motion. A gator in one of the scenes was mistaken for a dog, probably because “gator” is not among the labels the pretrained model was trained on.
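
The gator case follows from how a closed-set detector works: it can only answer with a label it was trained on. The sketch below assumes the stock “yolov8m.pt” checkpoint, which Ultralytics trains on the COCO dataset, and lists the ten animal classes in that label set. Kite is also a COCO class, which is why it is available as a wrong answer for birds. The helper name is hypothetical.

```python
# The ten animal classes in the COCO label set (assumption: the stock
# yolov8m.pt checkpoint is used unmodified, so these are the only animals
# the model can name).
COCO_ANIMALS = frozenset({
    "bird", "cat", "dog", "horse", "sheep",
    "cow", "elephant", "bear", "zebra", "giraffe",
})


def is_coco_animal(label: str) -> bool:
    """Return True if the label is one of the animal classes the model knows."""
    return label.strip().lower() in COCO_ANIMALS


# An alligator has no matching class, so the model falls back to whichever
# known class scores highest, which in the trailer was "dog".
for name in ("dog", "bird", "alligator"):
    print(name, is_coco_animal(name))
```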

The model’s performance in analyzing aerial shots or bird’s-eye views of the city was also noteworthy. Capturing details from such perspectives can be challenging due to changes in scale and perspective, yet YOLOv8 managed to do a commendable job.

Perhaps one of the most striking demonstrations of the model’s capabilities was its detection of small details, such as bottles on the shelves in a shop scene.

Here is a count of all the unique objects our solution detected in the trailer:

Object in GTA6 trailer    Count
Person                    133
Car                       65
Bird                      23
Bottle                    16
Kite                      15
Truck                     9
Motorcycle                9
Boat                      8
Chair                     7
Umbrella                  4
Bus                       3
Airplane                  2
Sports ball               1
Cat                       1
Dog                       1
Traffic light             1
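
Counting unique objects differs from counting raw detections, because the same person or car shows up in many consecutive frames. A minimal sketch of one way to get per-class unique counts, assuming each detection carries a tracker ID (the post does not confirm how its counts were produced, and the names below are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Detection = Tuple[str, int]  # (class name, tracker ID)


def unique_object_counts(frames: Iterable[List[Detection]]) -> Dict[str, int]:
    """Count distinct tracker IDs per class across all frames."""
    ids_by_class: Dict[str, Set[int]] = defaultdict(set)
    for detections in frames:
        for label, track_id in detections:
            # A set ignores repeat sightings of the same tracked object.
            ids_by_class[label].add(track_id)
    return {label: len(ids) for label, ids in ids_by_class.items()}
```

With this approach, an object that the tracker loses and later picks up again under a new ID would be counted twice, so such counts are best read as approximate.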

Overall, the results from running the GTA6 trailer through YOLOv8 on SaladCloud illustrate the remarkable advancements in both video game graphics and AI technology. As we move forward, such synergies between AI and gaming are likely to enhance our virtual experiences, blurring the lines between the digital and real world even further. GTA6, with its stunning graphics validated by computer vision, is poised to be more than just a game; it’s a glimpse into the future of immersive virtual experiences.

What also stands out is the cost-effectiveness of SaladCloud. Running this sophisticated AI analysis cost us merely 0.73 cents, a testament to the affordability of high-end GPU capabilities for object detection. SaladCloud’s role in enabling the analysis of GTA6’s stunning graphics with YOLOv8’s precision at such a low cost highlights the growing accessibility of advanced technology like computer vision in gaming and AI. This synergy is not just pushing the boundaries of virtual experiences but also making them more attainable, heralding a future where such advancements are within reach of a wider audience.
