Endpoint class.
Parameter overview
| Parameter | Type | Description | Default |
|---|---|---|---|
| name | str | Endpoint name (required unless id= is used) | - |
| id | str | Connect to existing endpoint by ID | None |
| gpu | GpuGroup, GpuType, or list | GPU type(s) for the endpoint | GpuGroup.ANY |
| cpu | str or CpuInstanceType | CPU instance type (mutually exclusive with gpu) | None |
| workers | int or (min, max) | Worker scaling configuration | (0, 1) |
| idle_timeout | int | Seconds before scaling down idle workers | 60 |
| dependencies | list[str] | Python packages to install | None |
| system_dependencies | list[str] | System packages to install (apt) | None |
| accelerate_downloads | bool | Enable download acceleration | True |
| volume | NetworkVolume | Network volume for persistent storage | None |
| datacenter | DataCenter | Preferred datacenter | EU_RO_1 |
| env | dict[str, str] | Environment variables | None |
| gpu_count | int | GPUs per worker | 1 |
| execution_timeout_ms | int | Max execution time in milliseconds | 0 (no limit) |
| flashboot | bool | Enable Flashboot fast startup | True |
| image | str | Custom Docker image to deploy | None |
| scaler_type | ServerlessScalerType | Scaling strategy | auto |
| scaler_value | int | Scaling threshold | 4 |
| template | PodTemplate | Pod template overrides | None |
Parameter details
name
Type: str
Required: Yes (unless id= is specified)
The endpoint name visible in the Runpod console. Use descriptive names to easily identify endpoints.
id
Type: str
Default: None
Connect to an existing deployed endpoint by its ID. When id is specified, name is not required.
gpu
Type: GpuGroup, GpuType, or list[GpuGroup | GpuType]
Default: GpuGroup.ANY (if neither gpu nor cpu is specified)
Specifies GPU hardware for the endpoint. Accepts a single GPU type/group or a list for fallback strategies.
cpu
Type: str or CpuInstanceType
Default: None
Specifies a CPU instance type. Mutually exclusive with gpu.
workers
Type: int or tuple[int, int]
Default: (0, 1)
Controls worker scaling. Accepts either a single integer (max workers with min=0) or a tuple of (min, max).
- workers=N or workers=(0, N): Cost-optimized, allows scale to zero
- workers=(1, N): Avoid cold starts by keeping at least one worker warm
- workers=(N, N): Fixed worker count for consistent performance
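Based on the documented behavior (a single integer N means max workers with min=0), the two accepted forms could be normalized like this. This is an illustrative sketch, not Flash's actual implementation:

```python
def normalize_workers(workers):
    """Normalize the `workers` parameter into a (min, max) tuple.

    Per the docs: a single integer N is treated as max workers with
    min=0; a (min, max) tuple is passed through after validation.
    """
    if isinstance(workers, int):
        return (0, workers)
    min_w, max_w = workers
    if min_w < 0 or max_w < min_w:
        raise ValueError(f"invalid worker range: {workers!r}")
    return (min_w, max_w)
```

For example, `normalize_workers(5)` yields `(0, 5)` (scale to zero allowed), while `normalize_workers((1, 5))` keeps one worker warm.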
idle_timeout
Type: int
Default: 60
Seconds workers stay active with no traffic before scaling down (to minimum workers).
- 30-60 seconds: Cost-optimized, infrequent traffic
- 60-120 seconds: Balanced, variable traffic patterns
- 120-300 seconds: Latency-optimized, consistent traffic
dependencies
Type: list[str]
Default: None
Python packages to install on the remote worker before executing your function. Supports standard pip syntax.
system_dependencies
Type: list[str]
Default: None
System-level packages to install via apt before your function runs.
accelerate_downloads
Type: bool
Default: True
Enables faster downloads for dependencies, models, and large files. Disable if you encounter compatibility issues.
volume
Type: NetworkVolume
Default: None
Attaches a network volume for persistent storage. Volumes are mounted at /runpod-volume/. Flash uses the volume name to find an existing volume or create a new one.
- Share large models across workers
- Persist data between runs
- Share datasets across endpoints
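A common pattern for the first use case is caching a large model under the documented /runpod-volume/ mount so only the first worker pays the download cost. This is a sketch; the helper and its parameters are illustrative, with the mount path parameterized for testing:

```python
from pathlib import Path

def cached_model_path(name, download, volume_root="/runpod-volume"):
    """Return the path to `name` inside the shared volume, downloading
    it only if it isn't already cached there.

    `download` is a caller-supplied function that writes the file to
    the given target path; subsequent workers skip it entirely.
    """
    target = Path(volume_root) / "models" / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    return target
```

Because every worker mounts the same volume, the existence check doubles as a cross-worker cache hit.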
datacenter
Type: DataCenter
Default: DataCenter.EU_RO_1
Preferred datacenter for worker deployment.
Flash Serverless deployments are currently restricted to EU-RO-1.
env
Type: dict[str, str]
Default: None
Environment variables passed to all workers. Useful for API keys, configuration, and feature flags.
Environment variables are excluded from configuration hashing. Changing environment values won’t trigger endpoint recreation, making it easy to rotate API keys.
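This exclusion can be illustrated with a sketch of configuration hashing that skips env. The hashing scheme below is an assumption for illustration, not Flash's actual implementation:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Hash an endpoint configuration, excluding env.

    Because env is dropped before hashing, changing an API key
    produces the same hash and therefore no endpoint recreation.
    """
    hashed = {k: v for k, v in config.items() if k != "env"}
    blob = json.dumps(hashed, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode()).hexdigest()

same_a = config_hash({"name": "demo", "gpu_count": 1, "env": {"KEY": "v1"}})
same_b = config_hash({"name": "demo", "gpu_count": 1, "env": {"KEY": "v2"}})
diff_c = config_hash({"name": "demo", "gpu_count": 2, "env": {"KEY": "v1"}})
```

Here `same_a == same_b` (only env changed) while `diff_c` differs because gpu_count is part of the hashed configuration.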
gpu_count
Type: int
Default: 1
Number of GPUs per worker. Use for multi-GPU workloads.
execution_timeout_ms
Type: int
Default: 0 (no limit)
Maximum execution time for a single job in milliseconds. Jobs exceeding this timeout are terminated.
flashboot
Type: bool
Default: True
Enables Flashboot for faster cold starts by pre-loading container images.
Set flashboot=False for debugging or compatibility reasons.
image
Type: str
Default: None
Custom Docker image to deploy. When specified, the endpoint runs your Docker image instead of Flash’s managed workers.
scaler_type
Type: ServerlessScalerType
Default: Auto-selected based on endpoint type
Scaling algorithm strategy. Defaults are automatically set:
- Queue-based: QUEUE_DELAY (scales based on queue depth)
- Load-balanced: REQUEST_COUNT (scales based on active requests)
scaler_value
Type: int
Default: 4
Parameter value for the scaling algorithm. With QUEUE_DELAY, represents target jobs per worker before scaling up.
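One plausible reading of "target jobs per worker" is that the scaler grows the worker count until the queue depth divided by workers drops to scaler_value, clamped to the configured worker range. The formula below is an illustrative model, not the documented algorithm:

```python
import math

def desired_workers(queue_depth, scaler_value, min_workers, max_workers):
    """Model QUEUE_DELAY scaling: aim for `scaler_value` queued jobs
    per worker, clamped to the endpoint's (min, max) worker range."""
    needed = math.ceil(queue_depth / scaler_value) if queue_depth else 0
    return max(min_workers, min(needed, max_workers))
```

With the default scaler_value=4 and workers=(0, 5), a queue of 10 jobs would target 3 workers, and an empty queue scales back to the minimum.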
template
Type: PodTemplate
Default: None
Advanced pod configuration overrides.
PodTemplate
PodTemplate provides advanced pod configuration options:
| Parameter | Type | Description | Default |
|---|---|---|---|
| containerDiskInGb | int | Container disk size in GB | 64 |
| env | list[dict] | Environment variables as list of {"key": "...", "value": "..."} | None |
EndpointJob
When using Endpoint(id=...) or Endpoint(image=...), the .run() method returns an EndpointJob object for async operations.
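The typical async pattern is to poll the job until it reaches a terminal state, then fetch its output. The method names (.status(), .output()) and status strings below are assumptions used for illustration, shown against a stand-in class rather than the real SDK:

```python
import time

class FakeEndpointJob:
    """Stand-in for EndpointJob; the .status()/.output() method names
    and status values here are illustrative assumptions."""
    def __init__(self):
        self._polls = 0

    def status(self):
        self._polls += 1
        return "COMPLETED" if self._polls >= 3 else "IN_PROGRESS"

    def output(self):
        return {"result": "done"}

def wait_for(job, poll_interval=0.0):
    # Poll until the job reaches a terminal state, then fetch output.
    while job.status() not in ("COMPLETED", "FAILED"):
        time.sleep(poll_interval)
    return job.output()
```

In real code the poll interval would be non-zero to avoid hammering the endpoint's status API.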
Configuration change behavior
When you change configuration and redeploy, Flash automatically updates your endpoint.
Changes that recreate workers
These changes restart all workers:
- GPU configuration (gpu, gpu_count)
- CPU instance type (cpu)
- Docker image (image)
- Storage (volume)
- Datacenter (datacenter)
- Flashboot setting (flashboot)
Changes that update settings only
These changes apply immediately with no downtime:
- Worker scaling (workers)
- Timeouts (idle_timeout, execution_timeout_ms)
- Scaler settings (scaler_type, scaler_value)
- Environment variables (env)
- Endpoint name (name)
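The two lists above can be expressed as a small classifier: diff the old and new configuration, then bucket each changed key. The key sets come straight from the lists above; the helper itself is an illustrative sketch:

```python
# Parameters whose change restarts all workers (per the docs).
RECREATE_KEYS = {"gpu", "gpu_count", "cpu", "image",
                 "volume", "datacenter", "flashboot"}
# Parameters whose change applies in place with no downtime.
UPDATE_KEYS = {"workers", "idle_timeout", "execution_timeout_ms",
               "scaler_type", "scaler_value", "env", "name"}

def classify_changes(old: dict, new: dict) -> dict:
    """Bucket changed configuration keys by redeploy impact."""
    changed = {k for k in set(old) | set(new) if old.get(k) != new.get(k)}
    return {
        "recreate_workers": sorted(changed & RECREATE_KEYS),
        "update_in_place": sorted(changed & UPDATE_KEYS),
    }
```

For example, changing gpu while also widening the worker range reports one recreating change (gpu) and one in-place change (workers).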