
How to run Cog applications on SaladCloud

Salad Technologies

Introduction to Cog: Containers for Machine Learning

Cog is an open-source tool designed to streamline the creation of inference applications for various AI models. It offers CLI tools, Python modules for prediction and fine-tuning, and an HTTP prediction server powered by FastAPI, letting you package models in a standard, production-ready container.

When using the Cog HTTP prediction server, the main tasks involve defining two Python functions: one for loading models and initialization, and another for performing inference. The server manages all other aspects such as input/output, logging, health checks, and exception handling. It supports synchronous prediction, streaming output, and asynchronous prediction via webhooks. Its health-check feature is robust, offering various server statuses (STARTING, READY, BUSY, and FAILED) to ensure operational reliability.
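To illustrate the shape of those two functions, here is a minimal, hypothetical predictor; the trivial string "model" stands in for real weights so the sketch stays self-contained:

```python
# predict.py -- a minimal, hypothetical Cog predictor sketch
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        """Runs once at server startup: load models and initialize state."""
        # a real predictor would load model weights (often onto a GPU) here;
        # this trivial stand-in keeps the example runnable without a model
        self.prefix = "echo: "

    def predict(self, prompt: str = Input(description="Text to process")) -> str:
        """Runs per request: perform inference and return the result."""
        return self.prefix + prompt
```

The prediction server wraps this class in HTTP endpoints, handling request parsing, responses, and health reporting around your two functions.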

Some applications primarily use the Cog HTTP prediction server for easy implementation, while others also leverage its CLI tools to manage container images. By defining your environment in a cog.yaml file and using the Cog CLI tools, you can automatically generate a container image that follows best practices. This approach eliminates the need to write a Dockerfile from scratch, though it requires learning how to configure the cog.yaml file effectively.
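As a sketch, a minimal cog.yaml might look like the following; the Python and package versions are illustrative assumptions:

```yaml
# cog.yaml -- hypothetical minimal configuration
build:
  gpu: true                 # request CUDA support in the generated image
  python_version: "3.10"
  python_packages:
    - "torch==2.0.1"        # illustrative pin; use what your model needs
predict: "predict.py:Predictor"   # module and class implementing the predictor
```

Running something like `cog build -t my-model` then generates the container image from this configuration.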

Running Cog applications on SaladCloud

All Cog-based applications can be easily run on SaladCloud, enabling you to build a massive, elastic and cost-effective AI inference system across SaladCloud’s global, high-speed network in just minutes.

Here are the main scenarios and suggested approaches. You can find these described in detail on Salad’s documentation page.

Scenario 1: Deploy the Cog-based images directly on SaladCloud
Run the images without any modifications. If a load balancer is in place for inbound traffic, override the ENTRYPOINT and CMD settings of the images using the SaladCloud Portal or APIs, and configure the Cog HTTP prediction server to use IPv6 before starting the server.

Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image
Create a new Dockerfile without needing to modify the original Dockerfile. Introduce new features and incorporate IPv6 support if applications need to process inbound traffic through a load balancer.

Scenario 3: Build an image using Cog HTTP prediction server for SaladCloud
Use only the Cog HTTP prediction server without its CLI tools. Work directly with the Dockerfile for flexible and precise control over the construction of the image.

Scenario 4: Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud
Use Cog CLI tools and the cog.yaml file to manage the Dockerfile and image.

Scenario 1: Deploy the Cog-based images directly on SaladCloud 

All Cog-based images can directly run on SaladCloud without any modifications, and you can leverage a load balancer or a job queue along with Cloud Storage for input and output.

SaladCloud requires workloads to listen on an IPv6 port to receive inbound traffic through its load balancer, while the Cog HTTP prediction server currently listens only on IPv4 and cannot be switched to IPv6 via an environment variable. If your application needs to receive inbound traffic this way, you must configure the Cog server for IPv6 when running the images on SaladCloud.

The SaladCloud Portal and APIs let you override the ENTRYPOINT and CMD settings of an image at runtime, which makes it possible to configure the Cog server to use IPv6 with a designated command before starting the server.
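For example, an override along the following lines first patches the server's hard-coded IPv4 wildcard address to the IPv6 wildcard and then starts it; the exact site-packages path varies by image and is an assumption here:

```sh
# hypothetical CMD override; adjust the path to the Cog installation in your image
sh -c "sed -i 's/0\.0\.0\.0/::/g' /usr/local/lib/python3.10/site-packages/cog/server/http.py && python -m cog.server.http"
```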

For detailed steps, please refer to the guide [https://docs.salad.com/container-engine/guides/run-cog#scenario-1-deploy-the-cog-based-images-directly-on-saladcloud], where we use two images built by Replicate, BLIP and Whisper, as examples, and provide a walkthrough to run these Cog-based images directly on SaladCloud.

Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image

If you want to introduce new features, such as adding an I/O worker to the Cog HTTP prediction server, you can create a wrapper image based on an existing Cog-based image without needing to modify its original Dockerfile.

In the new Dockerfile, you can begin with the original image, introduce additional features, and then incorporate IPv6 support if a load balancer is required. There are multiple approaches when working directly with the Dockerfile: you can execute a command to configure the Cog server for IPv6 during the image build process, or you can include a relay tool like socat to route IPv6 traffic to IPv4, as sketched below. For detailed instructions, please consult the guide [https://docs.salad.com/container-engine/guides/run-cog#scenario-2-build-a-wrapper-image-for-saladcloud-based-on-an-existing-cog-based-image].
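A wrapper Dockerfile using the socat approach might look like this sketch; the base image tag and port numbers are illustrative assumptions:

```dockerfile
# hypothetical wrapper image; the base image tag is an assumption
FROM r8.im/salesforce/blip

# install socat to relay inbound IPv6 traffic to the IPv4-only Cog server
RUN apt-get update && apt-get install -y --no-install-recommends socat \
    && rm -rf /var/lib/apt/lists/*

# clear any inherited entrypoint so CMD runs as written, then start the Cog
# server on its default IPv4 port 5000 and relay the IPv6 port exposed to
# SaladCloud's load balancer (8888 here) to it
ENTRYPOINT []
CMD ["sh", "-c", "python -m cog.server.http & socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000"]
```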

Scenario 3: Build an image using Cog HTTP prediction server for SaladCloud 

Using only the Cog HTTP prediction server without its CLI tools is feasible if you are comfortable writing a Dockerfile directly. This method offers flexible and precise control over the construction of the image. 
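For instance, a from-scratch Dockerfile might follow this rough shape; the base image, packages, and file layout are assumptions:

```dockerfile
# hypothetical from-scratch image using only the Cog Python package
FROM python:3.10-slim

WORKDIR /src

# cog provides the HTTP prediction server; add your model's own dependencies here
RUN pip install --no-cache-dir cog

# the predictor (setup/predict) and the cog.yaml that points the server to it
COPY predict.py cog.yaml /src/

# start the Cog HTTP prediction server (port 5000 by default)
CMD ["python", "-m", "cog.server.http"]
```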

You can refer to this guide [https://docs.salad.com/container-engine/guides/deploy-blip-with-cog]: it provides the steps to leverage LAVIS, a Python library for Language-Vision Intelligence, and the Cog HTTP prediction server to create a BLIP image from scratch, and to build a publicly accessible, scalable inference endpoint on SaladCloud capable of handling various image-to-text tasks.

Scenario 4: Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud

If you prefer using Cog CLI tools to manage the Dockerfile and image, you can still build an image with socat support directly, in case a load balancer is needed for inbound traffic. Please refer to this guide [https://docs.salad.com/container-engine/gateway/enabling-ipv6#ipv6-with-cog-through-socat] for more information.
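As a sketch, socat can be installed through the run section of cog.yaml during the build; the versions and predictor reference below are assumptions:

```yaml
# hypothetical cog.yaml installing socat at build time
build:
  gpu: true
  python_version: "3.10"
  run:
    - apt-get update && apt-get install -y socat
predict: "predict.py:Predictor"
```

The relay itself can then be started alongside the server when the container launches.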

Alternatively, you can use the approaches described in Scenario 1 or Scenario 2 to add IPv6 support later.

SaladCloud: The most affordable GPU Cloud for massive AI inference 

The open-source Cog tool simplifies the creation of AI inference applications with minimal effort. When deploying these applications on SaladCloud, you can harness the power of a massive pool of consumer-grade GPUs and SaladCloud’s global, high-speed network to build a highly efficient, reliable, and cost-effective AI inference system.

If you are overpaying for APIs or need compute that’s affordable & scalable, this approach lets you switch workloads to Salad’s distributed cloud with ease.

Have questions about enterprise pricing for SaladCloud?

Book a 15 min call with our team.
