The flash run command starts a local development server that lets you test your Flash application before deploying to production. The development server runs locally and updates automatically as you edit files. When you call an @Endpoint function, Flash sends the latest function code to Serverless workers on Runpod, so your changes are reflected immediately.

Start the development server

From inside your project directory, run:
flash run
The server starts at http://localhost:8888 by default. Your endpoints are available immediately for testing, and @Endpoint functions provision Serverless endpoints on first call.

Using a custom host and port

# Change port
flash run --port 3000

# Make accessible on network
flash run --host 0.0.0.0
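
If you change the host or port, the URLs in your test scripts have to change with them. The following is a minimal sketch of building request URLs from those values. `flash_url` is a hypothetical helper written for this guide, not part of Flash, and the route in the example is taken from the curl examples in this guide.

```python
def flash_url(route: str, host: str = "localhost", port: int = 8888) -> str:
    """Build a URL for a route on the local flash run server (hypothetical helper)."""
    # 0.0.0.0 is a bind address, not a destination. When the server was started
    # with --host 0.0.0.0, reach it from the same machine through localhost.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}/{route.lstrip('/')}"


# Server started with: flash run --port 3000
print(flash_url("gpu_worker/runsync", port=3000))
# prints "http://localhost:3000/gpu_worker/runsync"
```

Keeping the host and port in one place means a port change only has to be made once in your test code.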

Test your endpoints

Using curl

# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello from Flash"}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"data": "test"}'

Using the API explorer

Open http://localhost:8888/docs in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.

Using Python

import requests

# Call queue-based endpoint
response = requests.post(
    "http://localhost:8888/gpu_worker/runsync",
    json={"message": "Hello from Flash"}
)
print(response.json())

# Call load-balanced endpoint
response = requests.post(
    "http://localhost:8888/lb_worker/process",
    json={"data": "test"}
)
print(response.json())
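
The first call to an endpoint can be slow while it is provisioned, so a test script may benefit from an explicit timeout. The sketch below uses only the Python standard library instead of requests. `build_post` and `post_json` are hypothetical helper names, and the 120-second default timeout is an assumption chosen to be longer than a typical first call, not a value Flash requires.

```python
import json
import urllib.request


def build_post(url: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request without sending it (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if the server cannot be reached.
    The timeout default is an assumption, not a Flash setting.
    """
    with urllib.request.urlopen(build_post(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the development server running, a call would look like `post_json("http://localhost:8888/gpu_worker/runsync", {"message": "Hello from Flash"})`.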

Reduce cold-start delays

The first call to an @Endpoint function provisions a Serverless endpoint, which takes 30-60 seconds. Use --auto-provision to provision all endpoints at startup:
flash run --auto-provision
This scans your project for @Endpoint functions and deploys them before the server starts accepting requests. Endpoints are cached in .runpod/resources.pkl and reused across server restarts.
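
To check from a script whether a project has already been through provisioning, one option is to look for the cache file. This is a rough sketch: `has_cached_endpoints` is a hypothetical helper, and it only tests that the file exists. It does not open the pickle, since the file's internal format is not described here, and it cannot tell whether the cached endpoints still exist on Runpod.

```python
from pathlib import Path


def has_cached_endpoints(project_dir: str = ".") -> bool:
    """Return True if .runpod/resources.pkl exists under project_dir.

    Existence only suggests that endpoints were provisioned before;
    the file contents are deliberately not read.
    """
    return (Path(project_dir) / ".runpod" / "resources.pkl").is_file()
```

A script could use this to decide whether to warn that the next start may take longer.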

How it works

With flash run, Flash starts a local development server alongside remote Serverless endpoints. What runs where:

Component                  Location
Development server         Your machine (localhost:8888)
@Endpoint function code    Runpod Serverless
Endpoint storage           Runpod Serverless

Your code updates automatically as you edit files. Endpoints created by flash run are prefixed with live- to distinguish them from production endpoints.
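
The live- prefix makes it straightforward to separate development endpoints from production ones in your own tooling. A minimal sketch, assuming you already have a list of endpoint names as strings; `split_endpoints` and the names in the example are hypothetical.

```python
LIVE_PREFIX = "live-"  # prefix applied to endpoints created by flash run


def split_endpoints(names):
    """Split endpoint names into (development, other) lists by the live- prefix."""
    live = [n for n in names if n.startswith(LIVE_PREFIX)]
    other = [n for n in names if not n.startswith(LIVE_PREFIX)]
    return live, other


# Hypothetical endpoint names, for illustration only.
live, other = split_endpoints(["live-gpu_worker", "gpu_worker", "live-lb_worker"])
print(live)   # prints ['live-gpu_worker', 'live-lb_worker']
print(other)  # prints ['gpu_worker']
```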

Clean up after testing

Endpoints created by flash run persist until you delete them. To clean up:
# List all endpoints
flash undeploy list

# Remove a specific endpoint
flash undeploy ENDPOINT_NAME

# Remove all endpoints
flash undeploy --all

Troubleshooting

Port already in use

Start the server on a different port:
flash run --port 3000

Slow first request

Use --auto-provision to provision endpoints at startup, so the first request does not wait for provisioning:
flash run --auto-provision

Authentication errors

Ensure RUNPOD_API_KEY is set in your .env file or environment:
export RUNPOD_API_KEY="your_api_key_here"
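
Two of these problems can be detected before you start the server. The following is a hedged sketch of a pre-flight check. `port_is_free` and `preflight` are hypothetical helpers, the check looks only at the process environment (it does not read a .env file), and the port test is a best-effort probe on the loopback address.

```python
import os
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def preflight(port: int = 8888, env=None) -> list[str]:
    """Collect problems that would likely break flash run (hypothetical helper)."""
    env = os.environ if env is None else env
    problems = []
    # Only the given environment mapping is checked, not a .env file.
    if not env.get("RUNPOD_API_KEY"):
        problems.append("RUNPOD_API_KEY is not set")
    if not port_is_free(port):
        problems.append(f"port {port} is already in use")
    return problems
```

An empty list from `preflight()` means neither problem was detected. A non-empty list names what to fix first.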
