This quickstart gets you running a Serverless endpoint on Runpod in minutes, using a ready-to-use template to deploy a language model and send a test request.
Once your endpoint shows Ready status, send a test request. If you haven’t already, export your API key in your terminal:
```bash
export RUNPOD_API_KEY="your_api_key_here"
```
You can test the endpoint with either cURL or Python.
Run this command in your terminal, replacing YOUR_ENDPOINT_ID with your actual endpoint ID:
```bash
curl --request POST \
  --url "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync" \
  --header "Authorization: Bearer $RUNPOD_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": {
      "prompt": "What is the capital of France?",
      "max_tokens": 100
    }
  }'
```
Create a file called test_endpoint.py and paste the following code:
test_endpoint.py
```python
import os

import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # Replace with your endpoint ID
API_KEY = os.environ.get("RUNPOD_API_KEY")

if not API_KEY:
    raise ValueError("RUNPOD_API_KEY environment variable not set")

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "What is the capital of France?",
            "max_tokens": 100
        }
    },
)

print(response.json())
```
Install dependencies and run the script:
```bash
pip install requests
python test_endpoint.py
```
You should receive a response like this:
```json
{
  "id": "sync-abc123-xyz",
  "status": "COMPLETED",
  "output": {
    "text": "The capital of France is Paris.",
    ...
  }
}
```
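To pull the generated text out of a completed response, you can use a small helper like the one below. This is a minimal sketch based on the sample response shape above; `extract_text` is an illustrative name, not part of any Runpod SDK.

```python
def extract_text(response_json):
    """Return the generated text from a /runsync response.

    Assumes the response shape shown above:
    {"status": "COMPLETED", "output": {"text": "..."}}.
    """
    status = response_json.get("status")
    if status != "COMPLETED":
        # A non-terminal or failed status means there is no output to read yet.
        raise RuntimeError(f"Request did not complete: {status}")
    return response_json["output"]["text"]

# Example with a response shaped like the sample above:
sample = {
    "id": "sync-abc123-xyz",
    "status": "COMPLETED",
    "output": {"text": "The capital of France is Paris."},
}
print(extract_text(sample))  # The capital of France is Paris.
```

Checking `status` before reading `output` matters because a response that arrives before the worker finishes will not contain the generated text.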
The first request may take 30-60 seconds as the worker loads the model into GPU memory. Subsequent requests will complete in just a few seconds until the worker scales down due to inactivity.