> ## Documentation Index
> Fetch the complete documentation index at: https://runpod-b18f5ded-new-sls-quickstart.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Test Flash apps locally

> Use flash run to test your Flash application locally before deploying.

The `flash run` command starts a local development server that lets you test your Flash application before deploying to production. The development server runs locally and updates automatically as you edit files.

When you call a `@Endpoint` function, Flash sends the latest function code to Serverless workers on Runpod, so your changes are reflected immediately.

## Start the development server

From inside your [project directory](/flash/apps/initialize-project), run:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash run

# If using uv:
uv run flash run
```

The server starts at `http://localhost:8888` by default. Your endpoints are available immediately for testing, and `@Endpoint` functions provision Serverless endpoints on first call.

### Using a custom host and port

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Change port
flash run --port 3000

# Make accessible on network
flash run --host 0.0.0.0

# If using uv:
uv run flash run --port 3000
uv run flash run --host 0.0.0.0
```

## Test your endpoints

### Using curl

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# Call a queue-based endpoint (gpu_worker.py)
curl -X POST http://localhost:8888/gpu_worker/runsync \
  -H "Content-Type: application/json" \
  -d '{"input": {"input_data": {"message": "Hello from the GPU"}}}'

# Call a load-balanced endpoint (lb_worker.py)
curl -X POST http://localhost:8888/lb_worker/process \
  -H "Content-Type: application/json" \
  -d '{"input_data": {"message": "Hello from Flash"}}'
```

### Using the API explorer

Open [http://localhost:8888/docs](http://localhost:8888/docs) in your browser to access the interactive Swagger UI. You can test all endpoints directly from the browser.

### Using Python

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import requests

# Call queue-based endpoint
response = requests.post(
    "http://localhost:8888/gpu_worker/runsync",
    json={"input": {"input_data": {"message": "Hello from the GPU"}}}
)
print(response.json())

# Call load-balanced endpoint
response = requests.post(
    "http://localhost:8888/lb_worker/process",
    json={"input_data": {"message": "Hello from Flash"}}
)
print(response.json())
```

## Reduce cold-start delays

The first call to a `@Endpoint` function provisions a Serverless endpoint, which takes 30-60 seconds. Use `--auto-provision` to provision all endpoints at startup:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash run --auto-provision

# If using uv:
uv run flash run --auto-provision
```

This scans your project for `@Endpoint` functions and deploys them before the server starts accepting requests. Endpoints are cached in `.runpod/resources.pkl` and reused across server restarts.

## How it works

With `flash run`, Flash starts a local development server alongside remote Serverless endpoints:

```mermaid theme={"theme":{"light":"github-light","dark":"github-dark"}}
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#9289FE','primaryTextColor':'#fff','primaryBorderColor':'#9289FE','lineColor':'#5F4CFE','secondaryColor':'#AE6DFF','tertiaryColor':'#FCB1FF','edgeLabelBackground':'#5F4CFE', 'fontSize':'14px','fontFamily':'font-inter'}}}%%

flowchart TB
    Browser(["BROWSER/CURL"])

    subgraph Local ["YOUR MACHINE (localhost:8888)"]
        DevServer["Development Server<br/>• Auto-reload on changes<br/>• API explorer at /docs<br/>• Routes requests"]
    end

    subgraph Runpod ["RUNPOD SERVERLESS"]
        LB["live-lb_worker"]
        GPU["live-gpu_worker"]
        CPU["live-cpu_worker"]
    end

    Browser -->|"HTTP"| DevServer
    DevServer -->|"HTTPS"| LB
    DevServer -->|"HTTPS"| GPU
    DevServer -->|"HTTPS"| CPU

    style Local fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
    style Runpod fill:#1a1a2e,stroke:#5F4CFE,stroke-width:2px,color:#fff
    style Browser fill:#4D38F5,stroke:#4D38F5,color:#fff
    style DevServer fill:#5F4CFE,stroke:#5F4CFE,color:#fff
    style LB fill:#22C55E,stroke:#22C55E,color:#000
    style GPU fill:#22C55E,stroke:#22C55E,color:#000
    style CPU fill:#22C55E,stroke:#22C55E,color:#000
```

**What runs where:**

| Component                 | Location                      |
| ------------------------- | ----------------------------- |
| Development server        | Your machine (localhost:8888) |
| `@Endpoint` function code | Runpod Serverless             |
| Endpoint storage          | Runpod Serverless             |

Your code updates automatically as you edit files. Endpoints created by `flash run` are prefixed with `live-` to distinguish them from production endpoints.

## Clean up after testing

Endpoints created by `flash run` persist until you delete them. To clean up:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
# List all endpoints
flash undeploy list

# Remove a specific endpoint
flash undeploy ENDPOINT_NAME

# Remove all endpoints
flash undeploy --all

# If using uv:
uv run flash undeploy list
uv run flash undeploy ENDPOINT_NAME
uv run flash undeploy --all
```

## Troubleshooting

**Port already in use**

Flash automatically selects the next available port if your specified port is in use. You'll see a message like `Port 8888 is in use, using 8889 instead.` If you need a specific port, stop the process using it or specify a different starting port with `--port`.

**Slow first request**

Use `--auto-provision` to eliminate cold-start delays:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash run --auto-provision

# If using uv:
uv run flash run --auto-provision
```

**Authentication errors**

Run `flash login` to authenticate:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
flash login

# If using uv:
uv run flash login
```

Alternatively, set `RUNPOD_API_KEY` as an environment variable or in your `.env` file:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
export RUNPOD_API_KEY="your_api_key_here"
```

<Note>
  Values in your `.env` file are only available locally for CLI commands. They are not passed to deployed endpoints.
</Note>

## Next steps

* [Deploy to production](/flash/apps/deploy-apps) when your app is ready.
* [Clean up endpoints](/flash/cli/undeploy) after testing.
* [View the flash run reference](/flash/cli/run) for all options.