Introduction to Cog: Containers for Machine Learning
Cog is an open-source tool designed to streamline the creation of inference applications for various AI models. It offers CLI tools, Python modules for prediction and fine-tuning, and an HTTP prediction server powered by FastAPI, letting you package models in a standard, production-ready container.
When using the Cog HTTP prediction server, the main tasks involve defining two Python functions: one for loading models and initialization, and another for performing inference. The server manages all other aspects such as input/output, logging, health checks, and exception handling. It supports synchronous prediction, streaming output, and asynchronous prediction via webhooks. Its health-check feature is robust, offering various server statuses (STARTING, READY, BUSY, and FAILED) to ensure operational reliability.
Some applications primarily use the Cog HTTP prediction server for easy implementation, while others also leverage its CLI tools to manage container images. By defining your environment with a ‘cog.yaml’ file and using Cog CLI tools, you can automatically generate a container image following best practices. This approach eliminates the need to write a Dockerfile from scratch, though it requires learning how to configure the cog.yaml file effectively.
Running cog applications on SaladCloud
All Cog-based applications can be easily run on SaladCloud, enabling you to build a massive, elastic and cost-effective AI inference system across SaladCloud’s global, high-speed network in just minutes.
Here are the main scenarios and suggested approaches. You can find these described in detail on SaladCloud’s documentation page.
Scenario | Description | |
---|---|---|
1 | Deploy the Cog-based images directly on SaladCloud | Run the images without any modifications. If a load balancer is in place for inbound traffic, override the ENTRYPOINT and CMD settings of the images by using SaladCloud Portal or APIs, and configure the Cog HTTP prediction server to use IPv6 before starting the server. |
2 | Build a wrapper image for SaladCloud based on an existing Cog-based image | Create a new Dockerfile without needing to modify the original dockerfile. Introduce new features and incorporate IPv6 support if applications need to process inbound traffic through a load balancer. |
3 | Build an image using Cog HTTP prediction server for SaladCloud | Use only the Cog HTTP prediction server without its CLI tools. Directly work with the Dockerfile for flexible and precise control over the construction of the image. |
4 | Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud | Use Cog CLI tools and the cog.yaml file to manage the Dockerfile and image. |
Scenario 1: Deploy the Cog-based images directly on SaladCloud
All Cog-based images can directly run on SaladCloud without any modifications, and you can leverage a load balancer or a job queue along with Cloud Storage for input and output.
If applications need to process inbound traffic through a load balancer on SaladCloud, and since SaladCloud requires listening on an IPv6 port for inbound traffic while the Cog HTTP prediction server currently uses only IPv4 that cannot be configured via an environment variable, configuring the Cog server for IPv6 is necessary when running the images on SaladCloud.
SaladCloud Portal and APIs offer the capability to override the ENTRYPOINT and CMD settings of an image at runtime. This allows configuring the Cog server to use IPv6 with a designated command before starting the server.
For detailed steps, please refer to the guide [https://docs.salad.com/container-engine/guides/run-cog#scenario-1-deploy-the-cog-based-images-directly-on-saladcloud], where we use two images built by Replicate, BLIP and Whisper, as examples, and provide a walkthrough to run these Cog-based images directly on SaladCloud.
Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image
If you want to introduce new features, such as adding an I/O worker to the Cog HTTP prediction server, you can create a wrapper image based on an existing Cog-based image, without needing to modify its original Dockerfile.
In the new Dockerfile, you can begin with the original image, introduce additional features and then incorporate IPv6 support if a load balancer is required. There are multiple approaches when working directly with the Dockerfile: you can execute a command to configure the Cog server for IPv6 during the image build process, or you can include a relay tool like socat to facilitate IPv6 to IPv4 routing. For detailed instructions, please consult the guide [https://docs.salad.com/container-engine/guides/run-cog#scenario-2-build-a-wrapper-image-for-saladcloud-based-on-an-existing-cog-based-image].
Scenario 3: Build an image using Cog HTTP prediction server for SaladCloud
Using only the Cog HTTP prediction server without its CLI tools is feasible if you are comfortable writing a Dockerfile directly. This method offers flexible and precise control over the construction of the image.
You can refer to this guide [https://docs.salad.com/container-engine/guides/deploy-blip-with-cog]: it provides the steps to leverage LAVIS, a Python Library for Language-Vision Intelligence, and Cog HTTP prediction server to create a BLIP image from scratch, and build a publicly-accessible and scalable inference endpoint on SaladCloud, capable of handling various image-to-text tasks.
Scenario 4: Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud
If you prefer using Cog CLI tools to manage the Dockerfile and image, you can still directly build an image with socat support in case a load balancer is needed for inbound traffic. Please refer to this guide [https://docs.salad.com/container-engine/gateway/enabling-ipv6#ipv6-with-cog-through-socat] for more information.
Alternatively, you can use the approaches described in Scenario 1 or Scenario 2 to add IPv6 support later.
SaladCloud: The most affordable GPU Cloud for massive AI inference
The open-source Cog simplifies the creation of AI inference applications with minimal effort. When deploying these applications on SaladCloud, you can harness the power of massive consumer-grade GPUs and SaladCloud’s global, high-speed network to build a highly efficient, reliable, and cost-effective AI inference system.
If you are overpaying for APIs or need compute that’s affordable & scalable, this approach lets you switch workloads to Salad’s distributed cloud with ease.