Creating inference providers: regolo.ai - scalability and efficiency of GPUs, LLMOps with Kubernetes

Learn how to build a GPU-scalable inference provider for open-source models using LLMOps and Kubernetes. Get insights on automation and potential use cases.

Overview

We will share our experience and our work building an inference provider for open-source and freely accessible models. The provider we want to build is centered on open-source models. We will discuss GPU scalability, the automation that InferenceOps requires, and Kubernetes. We would welcome feedback on potential uses and suggestions for further development.

Links

https://regolo.ai
Kubernetes GPU platform serving optimized Llama, Qwen, and FLUX models via API.

Tech stack