After deploying your Flash app with flash deploy, you can call your endpoints directly via HTTP. The request format depends on whether you’re using queue-based or load-balanced configurations.

Authentication

All deployed endpoints require authentication with your Runpod API key:
export RUNPOD_API_KEY="your_key_here"

curl -X POST https://YOUR_ENDPOINT_URL/path \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"param": "value"}'
Your endpoint URLs are displayed after running flash deploy. You can also view them with flash env get <environment-name>.

Queue-based endpoints

Queue-based endpoints (defined with the @Endpoint(name=..., gpu=...) decorator) provide two routes for job submission: /run (asynchronous) and /runsync (synchronous).

Asynchronous calls (/run)

Submit a job and receive a job ID for later status checking:
curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": {"prompt": "Hello world"}}'
Response:
{
    "id": "job-abc-123",
    "status": "IN_QUEUE"
}
Check job status and retrieve results:
curl https://api.runpod.ai/v2/abc123xyz/status/job-abc-123 \
    -H "Authorization: Bearer $RUNPOD_API_KEY"
When the job completes:
{
    "id": "job-abc-123",
    "status": "COMPLETED",
    "output": {
        "generated_text": "Hello world from GPU!"
    }
}
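The /run-then-/status flow above can be sketched as a small polling helper in Python. This is a sketch using only the standard library; the endpoint ID and job ID are placeholders, and poll_seconds/timeout are illustrative defaults, not SDK parameters:

```python
import json
import time
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def status_url(endpoint_id: str, job_id: str) -> str:
    # The status route lives under the endpoint's base path: /status/{job_id}.
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


def wait_for_job(endpoint_id: str, job_id: str, api_key: str,
                 poll_seconds: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll /status until the job reaches a terminal state or we time out."""
    deadline = time.monotonic() + timeout
    headers = {"Authorization": f"Bearer {api_key}"}
    while time.monotonic() < deadline:
        req = urllib.request.Request(status_url(endpoint_id, job_id),
                                     headers=headers)
        with urllib.request.urlopen(req) as resp:
            job = json.load(resp)
        # COMPLETED and FAILED are terminal; IN_QUEUE and IN_PROGRESS are not.
        if job["status"] in ("COMPLETED", "FAILED"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
```

In production you would likely add backoff and retry on transient HTTP errors; the loop here keeps the polling pattern minimal.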

Synchronous calls (/runsync)

Wait for job completion and receive results directly (with timeout):
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": {"prompt": "Hello world"}}'
Response (after job completes):
{
    "id": "job-abc-123",
    "status": "COMPLETED",
    "output": {
        "generated_text": "Hello world from GPU!"
    }
}
Use /run for long-running jobs that you’ll check later. Use /runsync for quick jobs where you want immediate results (with timeout protection).

Queue-based request format

Queue-based endpoints expect input wrapped in an {"input": {...}} object:
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "input": {
            "param1": "value1",
            "param2": "value2"
        }
    }'
The structure inside "input" depends on your @Endpoint function signature.
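A minimal Python submitter that applies the {"input": {...}} wrapper might look like the following. This is a sketch using only the standard library; the endpoint ID is a placeholder, and submit_url/submit_job are illustrative helper names, not part of any SDK:

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def submit_url(endpoint_id: str, sync: bool = False) -> str:
    # /runsync blocks until the job finishes; /run returns a job ID immediately.
    route = "runsync" if sync else "run"
    return f"{API_BASE}/{endpoint_id}/{route}"


def submit_job(endpoint_id: str, payload: dict, api_key: str,
               sync: bool = False) -> dict:
    """Wrap the payload in {"input": ...} and POST it to the queue endpoint."""
    body = json.dumps({"input": payload}).encode()
    req = urllib.request.Request(
        submit_url(endpoint_id, sync),
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that the wrapper is applied once, at the top level; everything inside "input" is passed through to your function unchanged.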

Job status states

| Status | Description |
| --- | --- |
| IN_QUEUE | Waiting for an available worker |
| IN_PROGRESS | Worker is executing your function |
| COMPLETED | Function finished successfully |
| FAILED | Execution encountered an error |

Load-balanced endpoints

Load-balanced endpoints (defined with the api = Endpoint(...); @api.post("/path") pattern) provide custom HTTP routes with direct request/response semantics.

Calling load-balanced routes

All routes share the same base URL. Append the route path to call specific functions:
# POST route
curl -X POST https://abc123xyz.api.runpod.ai/analyze \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"text": "Hello world from Flash"}'

# GET route
curl -X GET https://abc123xyz.api.runpod.ai/info \
    -H "Authorization: Bearer $RUNPOD_API_KEY"

# Another POST route (same endpoint URL)
curl -X POST https://abc123xyz.api.runpod.ai/validate \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "Alice", "email": "alice@example.com"}'

Load-balanced request format

Load-balanced endpoints accept direct JSON payloads (no {"input": {...}} wrapper):
curl -X POST https://abc123xyz.api.runpod.ai/process \
    -H "Authorization: Bearer $RUNPOD_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "param1": "value1",
        "param2": "value2"
    }'
The payload structure depends on your function signature. Each route can accept different parameters.
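The same call can be made from Python without any wrapper. This is a sketch using only the standard library; the base URL and route path are placeholders, and route_url/call_route are illustrative helper names:

```python
import json
import urllib.request


def route_url(base_url: str, path: str) -> str:
    # Join the shared base URL with a route path, tolerating stray slashes.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def call_route(base_url: str, path: str, payload: dict, api_key: str) -> dict:
    """POST a direct JSON payload to a load-balanced route."""
    # Unlike queue-based endpoints, the payload is sent as-is: no {"input": ...}.
    req = urllib.request.Request(
        route_url(base_url, path),
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because every route shares one base URL, the same helper serves /analyze, /validate, or any other POST route you define.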

Multiple routes, single endpoint

A single load-balanced endpoint can serve multiple routes:
from runpod_flash import Endpoint

api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))

# All these routes share one endpoint URL
@api.post("/generate")
async def generate_text(prompt: str): ...

@api.post("/translate")
async def translate_text(text: str): ...

@api.get("/health")
async def health_check(): ...
# All use the same base URL with different paths
curl -X POST https://abc123xyz.api.runpod.ai/generate -H "..." -d '{...}'
curl -X POST https://abc123xyz.api.runpod.ai/translate -H "..." -d '{...}'
curl -X GET https://abc123xyz.api.runpod.ai/health -H "..."

Quick reference

| Endpoint Type | Routes | Request Format | Response |
| --- | --- | --- | --- |
| Queue-based | /run, /runsync, /status/{id} | {"input": {...}} | Job ID (async) or result (sync) |
| Load-balanced | Custom paths (e.g., /process) | Direct JSON payload | Direct response |

Response status codes

| Code | Meaning |
| --- | --- |
| 200 | Success (load-balanced) or job accepted (queue-based) |
| 400 | Bad request (invalid input format) |
| 401 | Unauthorized (invalid or missing API key) |
| 404 | Route not found |
| 500 | Internal server error |

Error handling

Queue-based errors appear in the job output:
{
    "id": "job-abc-123",
    "status": "FAILED",
    "error": "Error message from your function"
}
Load-balanced errors return HTTP error codes with JSON body:
{
    "error": "Error message from your function",
    "detail": "Additional error context"
}
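For queue-based jobs, a small helper can fold the FAILED case into a normal Python exception. This is a sketch; the field names follow the example responses above, and unwrap_job is an illustrative name, not an SDK function:

```python
def unwrap_job(job: dict) -> dict:
    """Return a completed queue-based job's output, raising on failure."""
    status = job.get("status")
    if status == "FAILED":
        # Queue-based failures surface in the job's "error" field.
        raise RuntimeError(job.get("error", "unknown error"))
    if status != "COMPLETED":
        raise RuntimeError(f"job not finished yet: {status}")
    return job["output"]
```

This keeps error handling in one place: callers either get the output dict or an exception carrying the function's error message.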

Using SDKs

For programmatic access, use the Runpod Python SDK:
import runpod

# Set API key
runpod.api_key = "your_api_key"

# Connect to endpoint
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")

# Async call (returns job object immediately)
run_request = endpoint.run({"prompt": "Hello world"})
status = run_request.status()  # Check status
output = run_request.output()  # Get result once complete

# Sync call (blocks until complete)
result = endpoint.run_sync({"prompt": "Hello world"})
See the Runpod SDK documentation for complete SDK usage.

Next steps

Deploy apps

Deploy your Flash app to get endpoint URLs.

Configuration reference

View all endpoint configuration parameters.

Runpod SDK

Use the Python SDK for programmatic access.