If you’re using an AI coding agent like Claude Code, Cline, or Cursor, you can install the Flash skill package to give your agent detailed context about the Flash SDK:
```bash
npx skills add runpod/skills
```
This enables your coding agent to provide more accurate Flash code suggestions and troubleshooting help.
Create a file called `gpu_demo.py` and paste this code into it:
```python
import asyncio

from runpod_flash import Endpoint, GpuGroup

@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,
    workers=3,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)
    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```
Make sure you activate your virtual environment in the same directory where you created `gpu_demo.py`. If you open a new terminal, run `source .venv/bin/activate` before executing the script.
You’ll see Flash provision a GPU worker and execute your function:
```
Running matrix multiplication on Runpod GPU...
Creating endpoint: flash-quickstart
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received

✓ Matrix size: 1000x1000
✓ Result mean: 249.8286
✓ GPU used: NVIDIA RTX A5000
```
The first run takes 30-60 seconds while Runpod provisions the endpoint, installs dependencies, and starts a worker. Subsequent runs take 2-3 seconds because the worker is already running.
If you run into authentication issues, you can set your API key directly in your terminal:
```bash
export RUNPOD_API_KEY="your_key"
```
Replace `your_key` with your actual API key from the Runpod console.
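To confirm the key is actually visible to your Python process, a quick pre-flight check like the one below can help. This is an optional diagnostic sketch, not part of the Flash SDK; it only reads the environment variable and never prints the key itself.

```python
import os

# Check whether RUNPOD_API_KEY is set in this shell's environment
api_key = os.environ.get("RUNPOD_API_KEY")

if api_key:
    # Report only the length, never the key itself
    print(f"RUNPOD_API_KEY is set ({len(api_key)} characters)")
else:
    print("RUNPOD_API_KEY is not set; run the export command above first")
```

Remember that `export` only affects the current terminal session, so the check must run in the same shell.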
Try running the script again immediately and notice how much faster it is. Flash reuses the same endpoint and cached dependencies. You can even update the code and run it again to see the changes take effect instantly.
Flash makes it easy to run multiple GPU operations concurrently. Replace your `main()` function with the code below:
```python
async def main():
    print("Running 3 matrix operations in parallel...")

    # Run all three operations at once
    results = await asyncio.gather(
        gpu_matrix_multiply(500),
        gpu_matrix_multiply(1000),
        gpu_matrix_multiply(2000)
    )

    # Print results
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Size: {result['matrix_size']}x{result['matrix_size']}")
        print(f"   Mean: {result['result_mean']:.4f}")
        print(f"   GPU: {result['gpu']}")
```
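One useful property of this pattern: `asyncio.gather` returns results in the order you passed the coroutines, even when they finish out of order. The sketch below demonstrates this with plain `asyncio` (no Flash dependency); `work` is a hypothetical stand-in whose delay is deliberately inverted relative to its size.

```python
import asyncio

async def work(size, delay):
    # Stand-in task: larger "sizes" finish sooner here on purpose
    await asyncio.sleep(delay)
    return size

async def main():
    # The 2000 task completes first, yet results keep call order
    results = await asyncio.gather(
        work(500, 0.3),
        work(1000, 0.2),
        work(2000, 0.1),
    )
    print(results)  # [500, 1000, 2000]
    return results

results = asyncio.run(main())
```

This means you can safely zip the gathered results back to the inputs that produced them.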