After deploying your Flash app with `flash deploy`, you can call your endpoints directly over HTTP. The request format depends on whether you're using a queue-based or load-balanced configuration.
Authentication
All deployed endpoints require authentication with your Runpod API key:
```bash
export RUNPOD_API_KEY="your_key_here"

curl -X POST https://YOUR_ENDPOINT_URL/path \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"param": "value"}'
```
Your endpoint URLs are displayed after running `flash deploy`. You can also view them with `flash env get <environment-name>`.
Queue-based endpoints
Queue-based endpoints (using the `@Endpoint(name=..., gpu=...)` decorator) provide two routes for job submission: `/run` (asynchronous) and `/runsync` (synchronous).
Asynchronous calls (/run)
Submit a job and receive a job ID for later status checking:
```bash
curl -X POST https://api.runpod.ai/v2/abc123xyz/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello world"}}'
```
Response:
```json
{
  "id": "job-abc-123",
  "status": "IN_QUEUE"
}
```
Check job status and retrieve results:
```bash
curl https://api.runpod.ai/v2/abc123xyz/status/job-abc-123 \
  -H "Authorization: Bearer $RUNPOD_API_KEY"
```
When the job completes:
```json
{
  "id": "job-abc-123",
  "status": "COMPLETED",
  "output": {
    "generated_text": "Hello world from GPU!"
  }
}
```
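Polling the status route from a script follows the same pattern. A minimal sketch with the HTTP call injected as a callable, so the loop itself needs no network access; `poll_job` and its parameters are illustrative, not part of the Flash CLI or SDK:

```python
import time

def poll_job(get_status, job_id, interval=1.0, max_attempts=30):
    """Poll a queue-based job until it reaches a terminal state.

    get_status is any callable returning the parsed /status JSON for
    job_id (for example, a thin wrapper around an HTTP GET with the
    Authorization header shown above).
    """
    for _ in range(max_attempts):
        payload = get_status(job_id)
        if payload["status"] in ("COMPLETED", "FAILED"):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_attempts} polls")
```

Injecting the transport also makes the retry logic easy to unit-test with canned status payloads.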
Synchronous calls (/runsync)
Wait for job completion and receive results directly (with timeout):
```bash
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello world"}}'
```
Response (after job completes):
```json
{
  "id": "job-abc-123",
  "status": "COMPLETED",
  "output": {
    "generated_text": "Hello world from GPU!"
  }
}
```
Use `/run` for long-running jobs that you'll check later. Use `/runsync` for quick jobs where you want immediate results (with timeout protection).
Queue-based endpoints expect input wrapped in an `{"input": {...}}` object:
```bash
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "param1": "value1",
      "param2": "value2"
    }
  }'
```
The structure inside `"input"` depends on your `@Endpoint` function signature.
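When scripting submissions, the wrapper is easy to forget. A small helper that applies it (the function name and the base-URL constant are illustrative; the URL shape matches the examples above):

```python
import json

QUEUE_BASE = "https://api.runpod.ai/v2"  # queue-based base URL from the examples above

def build_run_request(endpoint_id, params, sync=False):
    """Return (url, body) for a queue-based job submission.

    Wraps the caller's params in the required {"input": {...}} object,
    targeting /runsync when sync=True and /run otherwise.
    """
    route = "runsync" if sync else "run"
    url = f"{QUEUE_BASE}/{endpoint_id}/{route}"
    body = json.dumps({"input": params})
    return url, body
```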
Job status states
| Status | Description |
| --- | --- |
| `IN_QUEUE` | Waiting for an available worker |
| `IN_PROGRESS` | Worker is executing your function |
| `COMPLETED` | Function finished successfully |
| `FAILED` | Execution encountered an error |
Load-balanced endpoints
Load-balanced endpoints (using the `api = Endpoint(...); @api.post("/path")` pattern) provide custom HTTP routes with direct request/response patterns.
Calling load-balanced routes
All routes share the same base URL. Append the route path to call specific functions:
```bash
# POST route
curl -X POST https://abc123xyz.api.runpod.ai/analyze \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world from Flash"}'

# GET route
curl -X GET https://abc123xyz.api.runpod.ai/info \
  -H "Authorization: Bearer $RUNPOD_API_KEY"

# Another POST route (same endpoint URL)
curl -X POST https://abc123xyz.api.runpod.ai/validate \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice", "email": "alice@example.com"}'
```
Load-balanced endpoints accept direct JSON payloads (no `{"input": {...}}` wrapper):
```bash
curl -X POST https://abc123xyz.api.runpod.ai/process \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "param1": "value1",
    "param2": "value2"
  }'
```
The payload structure depends on your function signature. Each route can accept different parameters.
Multiple routes, single endpoint
A single load-balanced endpoint can serve multiple routes:
```python
from runpod_flash import Endpoint

api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))

# All of these routes share one endpoint URL
@api.post("/generate")
async def generate_text(prompt: str): ...

@api.post("/translate")
async def translate_text(text: str): ...

@api.get("/health")
async def health_check(): ...
```
```bash
# All use the same base URL with different paths
curl -X POST https://abc123xyz.api.runpod.ai/generate -H "..." -d '{...}'
curl -X POST https://abc123xyz.api.runpod.ai/translate -H "..." -d '{...}'
curl -X GET https://abc123xyz.api.runpod.ai/health -H "..."
```
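Binding these routes into a small client keeps the paths in one place. A sketch assuming the `requests` library for the default transport (the class and method names are illustrative); the session is injectable, so the URL-building logic can be exercised without a live endpoint:

```python
class FlashClient:
    """Tiny client for a load-balanced Flash endpoint.

    One instance covers every route on the endpoint, mirroring the
    /generate and /health examples above.
    """

    def __init__(self, base_url, api_key, session=None):
        if session is None:
            import requests  # optional default transport
            session = requests.Session()
        self.base_url = base_url.rstrip("/")
        self.session = session
        self.session.headers.update({"Authorization": f"Bearer {api_key}"})

    def generate(self, prompt):
        resp = self.session.post(f"{self.base_url}/generate", json={"prompt": prompt})
        resp.raise_for_status()
        return resp.json()

    def health(self):
        resp = self.session.get(f"{self.base_url}/health")
        resp.raise_for_status()
        return resp.json()
```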
Quick reference
| Endpoint type | Routes | Request format | Response |
| --- | --- | --- | --- |
| Queue-based | `/run`, `/runsync`, `/status/{id}` | `{"input": {...}}` | Job ID (async) or result (sync) |
| Load-balanced | Custom paths (e.g., `/process`) | Direct JSON payload | Direct response |
Response status codes
| Code | Meaning |
| --- | --- |
| 200 | Success (load-balanced) or job accepted (queue-based) |
| 400 | Bad request (invalid input format) |
| 401 | Unauthorized (invalid or missing API key) |
| 404 | Route not found |
| 500 | Internal server error |
Error handling
Queue-based errors appear in the job output:
```json
{
  "id": "job-abc-123",
  "status": "FAILED",
  "error": "Error message from your function"
}
```
Load-balanced errors return HTTP error codes with JSON body:
```json
{
  "error": "Error message from your function",
  "detail": "Additional error context"
}
```
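On the client side, a queue-based result can be funneled through a single check before use. A sketch based on the payload shapes shown above (the exception class and helper name are illustrative):

```python
class FlashJobError(RuntimeError):
    """Raised when a queue-based job did not complete successfully."""

def unwrap_output(job):
    """Return the "output" of a finished job payload, raising on failure.

    Expects the status-route JSON shapes shown above: FAILED jobs carry
    an "error" string, COMPLETED jobs carry an "output" object.
    """
    status = job.get("status")
    if status == "FAILED":
        raise FlashJobError(job.get("error", "unknown error"))
    if status != "COMPLETED":
        raise FlashJobError(f"job not finished yet (status={status})")
    return job["output"]
```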
Using SDKs
For programmatic access, use the Runpod Python SDK:
```python
import runpod

# Set API key
runpod.api_key = "your_api_key"

# Connect to endpoint
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")

# Async call (returns a job object immediately)
run_request = endpoint.run({"prompt": "Hello world"})
status = run_request.status()  # Check status
output = run_request.output()  # Get result once complete

# Sync call (blocks until complete)
result = endpoint.run_sync({"prompt": "Hello world"})
```
See the Runpod SDK documentation for complete SDK usage.
Next steps
- **Deploy apps**: Deploy your Flash app to get endpoint URLs.
- **Configuration reference**: View all endpoint configuration parameters.
- **Runpod SDK**: Use the Python SDK for programmatic access.