Efficient Large-Scale GPU Workload Management with Kubernetes and Slurm

Introduction to Slurm and Kubernetes

Slurm is an open-source cluster management and job scheduling system for Linux, used on a majority of the systems on the TOP500 list. Kubernetes is a container orchestration system that automates the deployment, scaling, and management of containerized applications. Combining the two yields a powerful platform for managing large-scale GPU workloads.

Slinky Slurm-Operator

Slinky, an open-source project from SchedMD (the maintainers of Slurm), integrates Slurm with Kubernetes to manage GPU infrastructure at scale. The slurm-operator runs each Slurm component (`slurmctld` for scheduling, `slurmdbd` for accounting, `slurmd` for compute workers, `slurmrestd` for REST API access) as containerized Kubernetes workloads, managed declaratively through Custom Resource Definitions (CRDs).
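Once the operator is installed, this mapping is visible as ordinary Kubernetes API objects. A quick way to confirm it is to list the registered CRDs; note that the `slinky.slurm.net` API group and resource names below are assumptions based on the project's conventions and should be checked against your cluster.

```bash
# List the CRDs registered by the slurm-operator
# (group name is an assumption; adjust to what your cluster reports)
kubectl get crds | grep slurm

# Inspect the schema of the Cluster resource that describes
# a whole Slurm deployment
kubectl explain clusters.slinky.slurm.net --recursive | head -n 20
```

These commands require a live cluster with the operator installed; they are shown here only to illustrate the CRD-based design.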

```bash
#!/bin/bash
#SBATCH -J mpi-ping-pong
#SBATCH -o /shared_storage/mpi-ping-pong-%j.out
#SBATCH -e /shared_storage/mpi-ping-pong-%j.err
#SBATCH -t 0-2:00:00
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1

# Launch one MPI rank per node; the binary path is illustrative
srun ./mpi_ping_pong
```

Example Slurm job script
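Submitting follows the usual Slurm workflow. The snippet below saves a minimal variant of the script and checks its shell syntax locally before submission; the `mpi_ping_pong` binary name is illustrative.

```shell
# Write a minimal copy of the job script (binary path is illustrative)
cat > mpi-ping-pong.sbatch <<'EOF'
#!/bin/bash
#SBATCH -J mpi-ping-pong
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
srun ./mpi_ping_pong
EOF

# Validate shell syntax locally; no Slurm daemon required
bash -n mpi-ping-pong.sbatch && echo "syntax OK"

# On a node with the Slurm client tools installed:
#   sbatch mpi-ping-pong.sbatch   # submit the job
#   squeue --me                   # watch it in the queue
```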

Deployment and Scaling

The slurm-operator is installed through Helm, and Slurm clusters are then defined as Custom Resources, so a cluster can be deployed, reconfigured, and scaled with standard Kubernetes tooling. NVIDIA runs the Slinky slurm-operator in production across multiple clusters, with some deployments scaling to over 8,000 GPUs.
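As a sketch, installation and scaling look like the following. The OCI chart location, namespace, and resource names are assumptions based on the Slinky project's published charts and should be verified against the current documentation.

```bash
# Install the slurm-operator from the Slinky Helm charts
# (chart URL and namespace are assumptions -- verify against the docs)
helm install slurm-operator \
  oci://ghcr.io/slinkyproject/charts/slurm-operator \
  --namespace slinky --create-namespace

# Define a Slurm cluster as a Custom Resource and apply it
kubectl apply -f slurm-cluster.yaml

# Scale the compute workers by patching the NodeSet replica count
# (resource and object names here are illustrative)
kubectl patch nodesets.slinky.slurm.net slurm-compute \
  --type merge -p '{"spec":{"replicas":16}}'
```

Because compute nodes are ordinary Kubernetes-managed replicas, growing or shrinking a Slurm cluster becomes a declarative change rather than a manual provisioning step.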




Conclusion and Future Work

Combining Kubernetes and Slurm provides a powerful platform for managing large-scale GPU workloads: Kubernetes handles container orchestration and infrastructure lifecycle, while Slurm supplies batch scheduling and fine-grained resource allocation. As demand for large-scale GPU workloads continues to grow, efficient workload management will only become more important.



🔑  Key Takeaway

The combination of Kubernetes and Slurm provides a scalable and flexible solution for resource allocation and job scheduling. By integrating Slurm with Kubernetes, we can efficiently manage large-scale GPU workloads. This platform is well-suited for a wide range of applications, from small-scale research projects to large-scale enterprise deployments.



By AI

To optimize for the 2026 AI frontier, all posts on this site are synthesized by AI models and peer-reviewed by the author for technical accuracy. Please cross-check all logic and code samples; synthetic outputs may require manual debugging.
