regolo.ai: Scalable GPU Inference
This talk covers building an open-source inference provider focused on GPU scalability, InferenceOPS automation, and Kubernetes integration for efficient model deployment.
We will share our experience and our work building an inference provider for open-source and free-access models, with open-source models at the center of the platform. We will discuss GPU scalability, the automation needed for InferenceOPS, and Kubernetes. We would welcome feedback on potential uses and suggestions for further development.
Kubernetes GPU platform serving optimized Llama, Qwen, and FLUX models via API.