- Scale beyond single machines: Train models too large for one GPU, or accelerate training across multiple nodes.
- High-speed networking: 1600-3200 Gbps between nodes for efficient gradient synchronization and data movement.
- Zero configuration: Pre-configured static IPs, environment variables, and framework support.
- On-demand: Deploy in minutes, pay only for what you use.
Get started
Deploy a Slurm cluster
Managed Slurm for HPC workloads.
PyTorch distributed training
Multi-node PyTorch for deep learning.
Axolotl fine-tuning
Fine-tune LLMs across multiple GPUs.
How it works
Runpod provisions multiple GPU nodes in the same data center, connected with high-speed networking. One node is designated primary (NODE_RANK=0), and all nodes receive pre-configured environment variables for distributed communication.
The high-speed interfaces (ens1-ens8) handle inter-node communication, such as gradient synchronization and data movement between nodes. The eth0 interface on the primary node handles external traffic. See the configuration reference for environment variables and network details.
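As a minimal sketch of how a training script can use these pre-configured variables, the snippet below initializes PyTorch's distributed process group on each node. NODE_RANK comes from the cluster environment as described above; the MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK names are the standard PyTorch conventions and are assumed here, so check the configuration reference for the exact variables your cluster exposes.

```python
# Sketch: joining the distributed process group on an Instant Cluster node.
# Assumes the standard PyTorch env vars (MASTER_ADDR, MASTER_PORT, RANK,
# WORLD_SIZE, LOCAL_RANK) are populated, e.g. by torchrun or the cluster
# environment; only NODE_RANK is explicitly documented above.
import os

import torch
import torch.distributed as dist


def init_distributed() -> None:
    # With init_method="env://", PyTorch reads MASTER_ADDR, MASTER_PORT,
    # RANK, and WORLD_SIZE from the environment.
    dist.init_process_group(backend="nccl", init_method="env://")

    # Pin this process to its local GPU before creating models or tensors.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    print(
        f"node_rank={os.environ.get('NODE_RANK')} "
        f"rank={dist.get_rank()} world_size={dist.get_world_size()}"
    )


if __name__ == "__main__":
    init_distributed()
    # ... build model, wrap in DistributedDataParallel, train ...
    dist.destroy_process_group()
```

In a typical multi-node setup, one copy of this script runs per GPU on every node (for example, launched by torchrun with the primary node's address as the rendezvous endpoint), and NCCL uses the high-speed interfaces for the collective operations.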
Supported hardware
| GPU | Network speed | Nodes |
|---|---|---|
| B200 | 3200 Gbps | 2-8 nodes (16-64 GPUs) |
| H200 | 3200 Gbps | 2-8 nodes (16-64 GPUs) |
| H100 | 3200 Gbps | 2-8 nodes (16-64 GPUs) |
| A100 | 1600 Gbps | 2-8 nodes (16-64 GPUs) |
Pricing
Pricing is based on GPU type and the number of nodes. See Instant Clusters pricing for current rates. Custom pricing is available for enterprise workloads; contact our sales team for details. All accounts have a default spending limit. To deploy larger clusters, contact help@runpod.io.