Flash is currently in beta. Join our Discord to provide feedback and get support.
This quickstart gets you running GPU workloads on Runpod in minutes. You’ll execute a function on a remote GPU and see the results immediately.

Step 1: Install Flash

Flash is currently available for macOS and Linux. Windows support is in development.
Create a virtual environment and install Flash using uv:
uv venv
source .venv/bin/activate
uv pip install runpod-flash

Optional: Install coding agent integration

If you’re using an AI coding agent like Claude Code, Cline, or Cursor, you can install the Flash skill package to give your agent detailed context about the Flash SDK:
npx skills add runpod/skills
This enables your coding agent to provide more accurate Flash code suggestions and troubleshooting help.

Step 2: Authenticate with Runpod

Log in to your Runpod account:
flash login
This opens your browser to authorize Flash. After you approve, your credentials are saved, allowing you to run Flash commands and scripts.
Alternatively, you can set the RUNPOD_API_KEY environment variable or add it to a .env file. See flash login for details.
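For example, a .env file in your project directory would contain a single line (your_key is a placeholder for your actual API key):

```shell
# .env
RUNPOD_API_KEY="your_key"
```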

Step 3: Copy this code

Create a file called gpu_demo.py and paste this code into it:
import asyncio
from runpod_flash import Endpoint, GpuGroup

@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    workers=3,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)

    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
Make sure you activate your virtual environment in the same directory where you created the gpu_demo.py file. If you open a new terminal, run source .venv/bin/activate before executing the script.

Step 4: Run it

Execute the script:
python gpu_demo.py
You’ll see Flash provision a GPU worker and execute your function:
Running matrix multiplication on Runpod GPU...
Creating endpoint: flash-quickstart
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received

✓ Matrix size: 1000x1000
✓ Result mean: 249.8286
✓ GPU used: NVIDIA RTX A5000
The first run takes 30-60 seconds while Runpod provisions the endpoint, installs dependencies, and starts a worker. Subsequent runs take 2-3 seconds because the worker is already running.
If you’re having authorization issues, you can set your API key directly in your terminal:
export RUNPOD_API_KEY="your_key"
Replace your_key with your actual API key from the Runpod console.
Try running the script again immediately and notice how much faster it is. Flash reuses the same endpoint and cached dependencies. You can even update the code and run it again to see the changes take effect instantly.

Step 5: Understand what you just did

Let’s break down the code you just ran:

Imports and setup

import asyncio
from runpod_flash import Endpoint, GpuGroup
  • asyncio: Enables asynchronous execution (endpoint functions run async).
  • Endpoint: The class that marks functions for remote execution.
  • GpuGroup: Enum for selecting GPU types or groups of GPUs.
Flash automatically loads your credentials from flash login or the RUNPOD_API_KEY environment variable.

The @Endpoint decorator

@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    workers=3,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }
The @Endpoint decorator configures everything in one place:
  • name: Identifies your endpoint in the Runpod console.
  • gpu: Which GPU to use (GpuGroup.ANY accepts any available GPU for faster provisioning).
  • workers: Maximum parallel workers (allows 3 concurrent executions).
  • dependencies: Python packages to install on the worker.
  • Function body: The matrix multiplication code runs on the remote GPU, not your local machine.
  • Return value: The result is returned to your local machine as a Python dictionary.
See GPU types for available GPUs or endpoint functions for all configuration options.
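As a quick local sanity check (a sketch assuming NumPy is installed locally; no Flash required), you can see why result_mean comes out near 250 for a 1000x1000 matrix: each entry of C sums size products of two uniform [0, 1) values, so its expected value is size * 0.25.

```python
import numpy as np

# Each entry of C = A @ B is a sum of `size` products of two independent
# uniform [0, 1) values, so its expected value is size * 0.25 (250 here).
size = 1000
rng = np.random.default_rng(0)
A = rng.random((size, size))
B = rng.random((size, size))
mean = float(np.mean(A @ B))
print(mean)  # close to 250
```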
You must import packages inside the function body, not at the top of your file, because these imports run on the remote worker, where your dependencies are installed.

Calling the function

async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)

    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
Here’s what happens when you call an @Endpoint decorated function:
  1. Flash checks if the endpoint specified in your decorator already exists.
    • If yes: It updates the endpoint if the configuration has changed.
    • If no: It creates a new endpoint, initializes a worker, and installs your dependencies.
  2. Flash sends your code to the GPU worker.
  3. The GPU worker executes the function with the provided inputs.
  4. The result is returned to your local machine as a Python dictionary, where it’s printed in your terminal.
Everything outside the @Endpoint function (all the print statements, etc.) runs locally on your machine. Only the decorated function runs remotely.

Step 6: Run multiple operations in parallel

Flash makes it easy to run multiple GPU operations concurrently. Replace your main() function with the code below:
async def main():
    print("Running 3 matrix operations in parallel...")

    # Run all three operations at once
    results = await asyncio.gather(
        gpu_matrix_multiply(500),
        gpu_matrix_multiply(1000),
        gpu_matrix_multiply(2000)
    )

    # Print results
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Size: {result['matrix_size']}x{result['matrix_size']}")
        print(f"   Mean: {result['result_mean']:.4f}")
        print(f"   GPU: {result['gpu']}")
Run the script again:
python gpu_demo.py
All three operations execute simultaneously:
Running 3 matrix operations in parallel...
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Initial job status: IN_QUEUE
Job completed, output received
Job completed, output received
Job completed, output received

1. Size: 500x500
   Mean: 125.3097
   GPU: NVIDIA RTX A5000

2. Size: 1000x1000
   Mean: 249.9442
   GPU: NVIDIA RTX A5000

3. Size: 2000x2000
   Mean: 500.1321
   GPU: NVIDIA RTX A5000
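The same asyncio.gather pattern works with any awaitables. A minimal local sketch (pure standard library, no Flash required) shows why the three calls finish in roughly the time of one:

```python
import asyncio
import time

async def work(n):
    # Stand-in for a remote GPU call that takes a fixed amount of time.
    await asyncio.sleep(0.2)
    return n * n

async def main():
    start = time.perf_counter()
    # All three coroutines wait concurrently, so total time is ~0.2s, not ~0.6s.
    results = await asyncio.gather(work(1), work(2), work(3))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```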

Clean up

When you’re done testing, clean up the endpoints:
# List all endpoints
flash undeploy list

# Remove the quickstart endpoint
flash undeploy flash-quickstart

# Or remove all endpoints
flash undeploy --all

Next steps

You’ve successfully run GPU code on Runpod! Now you’re ready to learn more about Flash:

Generate images with Flash

Use Stable Diffusion XL to generate images from text prompts.

Endpoint functions

Learn how to configure and optimize endpoint functions.

Build Flash apps

Deploy production APIs.

Explore Flash examples

Browse example Flash scripts and apps on GitHub.

Troubleshooting

Authentication error

Error: API key is not set
Solution: Run flash login to authenticate with your Runpod account:
flash login
Alternatively, set the RUNPOD_API_KEY environment variable:
export RUNPOD_API_KEY="your_key"

Template name conflict

Error: endpoint template names must be unique
Solution: Each endpoint needs a unique name. If you’ve previously deployed an endpoint with the same name, either:
  • Use a different name for your new endpoint
  • Undeploy the existing endpoint with flash undeploy <name> --force

Job stuck in queue

Initial job status: IN_QUEUE
[Stays in queue for >60 seconds]
Solution: No GPUs matching your request are currently available. Use GpuGroup.ANY to accept any available GPU:
@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    ...
Or add multiple specific GPU types for fallback:
@Endpoint(
    name="flash-quickstart",
    gpu=[
        GpuType.NVIDIA_GEFORCE_RTX_4090,
        GpuType.NVIDIA_RTX_A5000,
        GpuType.NVIDIA_RTX_A6000
    ],
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    ...
You can also check GPU availability in the console.

Import errors

ModuleNotFoundError: No module named 'numpy'
Solution: Move imports inside the @Endpoint function:
@Endpoint(name="compute", gpu=GpuGroup.ANY, dependencies=["numpy"])
def my_function():
    import numpy as np  # Import here, not at top of file
    # ...
See the execution model for more troubleshooting.