Instant Clusters provide fully managed multi-node compute with high-performance networking for distributed workloads. Deploy distributed jobs or large-scale training without managing infrastructure, networking, or cluster configuration.
  • Scale beyond single machines: Train models too large for one GPU, or accelerate training across multiple nodes.
  • High-speed networking: 1600-3200 Gbps between nodes for efficient gradient synchronization and data movement.
  • Zero configuration: Pre-configured static IPs, environment variables, and framework support.
  • On-demand: Deploy in minutes, pay only for what you use.

Get started

Deploy a Slurm cluster

Managed Slurm for HPC workloads.

PyTorch distributed training

Multi-node PyTorch for deep learning.

Axolotl fine-tuning

Fine-tune LLMs across multiple GPUs.

How it works

Runpod provisions multiple GPU nodes in the same data center, connected with high-speed networking. One node is designated primary (NODE_RANK=0), and all nodes receive pre-configured environment variables for distributed communication.
The high-speed interfaces (ens1-ens8) handle inter-node communication, such as gradient synchronization and other collective operations. The eth0 interface on the primary node handles external traffic. See the configuration reference for environment variables and network details.
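As a minimal sketch of how the pre-configured environment variables are typically consumed, the snippet below derives a per-process global rank from the node rank. NODE_RANK comes from the description above; GPUS_PER_NODE and LOCAL_RANK are illustrative names (assumptions here, not guaranteed by the platform), matching the convention used by common launchers such as torchrun.

```python
import os

# NODE_RANK is set on every node; the primary node has NODE_RANK=0.
node_rank = int(os.environ.get("NODE_RANK", "0"))

# Assumed/illustrative variables: how many GPUs each node exposes,
# and which local GPU this process drives (a launcher usually sets these).
gpus_per_node = int(os.environ.get("GPUS_PER_NODE", "8"))
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# A process's global rank across the whole cluster is its node's offset
# plus its local index; rank 0 lives on the primary node.
global_rank = node_rank * gpus_per_node + local_rank
is_primary = node_rank == 0

print(f"global_rank={global_rank} is_primary={is_primary}")
```

A distributed framework would use this global rank (together with the cluster's world size and the primary node's address) to initialize its process group.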

Supported hardware

GPU     Network speed    Nodes
B200    3200 Gbps        2-8 nodes (16-64 GPUs)
H200    3200 Gbps        2-8 nodes (16-64 GPUs)
H100    3200 Gbps        2-8 nodes (16-64 GPUs)
A100    1600 Gbps        2-8 nodes (16-64 GPUs)
For clusters larger than 8 nodes (up to 512 GPUs), contact our sales team.

Pricing

Pricing is based on GPU type and number of nodes. See Instant Clusters pricing for current rates. Custom pricing is available for enterprise workloads. Contact our sales team for details.
All accounts have a default spending limit. To deploy larger clusters, contact help@runpod.io.