Run `flash deploy` to build and deploy your Flash application:
- Build: Packages your code, dependencies, and manifest.
- Upload: Sends the artifact to Runpod’s storage.
- Provision: Creates or updates Serverless endpoints.
- Configure: Sets up environment variables and service discovery.
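A basic deployment runs all four steps with a single command:

```shell
flash deploy
```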
Deployment architecture
Flash deploys your application as multiple independent Serverless endpoints. Each endpoint configuration in your worker files becomes a separate endpoint. How Flash deployments work:

- One Endpoint class = one Serverless endpoint: Each unique endpoint configuration (defined by its `name` parameter) creates a separate Serverless endpoint with its own URL.
- Call any endpoint: After deployment, you can call whichever endpoint you need: `lb_worker` for API requests, `gpu_worker` for GPU tasks, `cpu_worker` for CPU tasks.
- Load balancing endpoints: Create HTTP APIs with custom routes using `.get()`, `.post()`, and other decorators.
- Queue-based endpoints: Run compute tasks using the `/runsync` or `/run` routes.
- Inter-endpoint communication: Endpoints can call each other’s functions when needed, using the Runpod GraphQL service for discovery.
Deploy to a specific environment
Flash organizes deployments using apps and environments. Deploy to a specific environment using the `--env` flag:
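For example (the environment name `staging` is illustrative):

```shell
flash deploy --env staging
```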
Post-deployment
After a successful deployment, Flash displays all deployed endpoints grouped by type.
Understanding endpoint architecture
The relationship between endpoint configurations and deployed endpoints differs between load-balanced and queue-based endpoints.
Queue-based endpoints (one function per endpoint)
For queue-based endpoints, each `@Endpoint` function must have its own unique name. For example, two functions named `run-model` and `preprocess` create two separate Serverless endpoints:

- `https://api.runpod.ai/v2/abc123xyz` (run-model)
- `https://api.runpod.ai/v2/def456xyz` (preprocess)
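As a sketch, a queue-based endpoint can then be invoked through its `/runsync` route using Runpod's standard `{"input": ...}` request shape; the payload and the `RUNPOD_API_KEY` variable name are illustrative:

```shell
curl -X POST https://api.runpod.ai/v2/abc123xyz/runsync \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "hello"}}'
```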
Load-balanced endpoints (multiple routes per endpoint)
For load-balanced endpoints, you can define multiple HTTP routes on a single endpoint. For example, an endpoint named “api” with three route decorators creates:

- One Serverless endpoint: `https://abc123xyz.api.runpod.ai` (named “api”)
- Three HTTP routes: `POST /generate`, `POST /translate`, `GET /health`
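A load-balanced route is then reachable directly at its path, for example the health route (whether an auth header is required may depend on your configuration):

```shell
curl https://abc123xyz.api.runpod.ai/health \
  -H "Authorization: Bearer $RUNPOD_API_KEY"
```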
Preview before deploying
You can use the `--preview` flag to test your deployment locally with Docker before pushing to production. Preview mode:
- Builds your project (creates the deployment artifact and manifest).
- Creates a Docker network for inter-container communication.
- Starts one container per endpoint configuration (`lb_worker`, `gpu_worker`, `cpu_worker`, etc.).
- Exposes all endpoints for local testing.
Press Ctrl+C to stop the preview environment.
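Assuming `--preview` is passed to `flash deploy`, starting a preview session looks like:

```shell
flash deploy --preview
```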
Managing deployment size
Runpod Serverless has a 500 MB deployment limit. Flash automatically excludes packages that are pre-installed in the base image: `torch`, `torchvision`, `torchaudio`, `numpy`, and `triton`.
Use the `--exclude` flag to skip additional packages:
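For example, to exclude a package that is already available in your image (`pandas` here is illustrative):

```shell
flash deploy --exclude pandas
```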
Base image packages
| Configuration type | Base image | Auto-excluded packages |
|---|---|---|
| GPU (`gpu=`) | PyTorch base | `torch`, `torchvision`, `torchaudio`, `numpy`, `triton` |
| CPU (`cpu=`) | Python slim | `torch`, `torchvision`, `torchaudio`, `numpy`, `triton` |
| Load-balanced | Same as GPU/CPU | Same as GPU/CPU |
Build process
When you run `flash deploy` (or `flash build`), Flash:
- Discovers all `@Endpoint`-decorated functions.
- Groups functions by their endpoint name.
- Generates handler files for each endpoint.
- Creates a `flash_manifest.json` file for service discovery.
- Installs dependencies with Linux x86_64 compatibility.
- Packages everything into `.flash/artifact.tar.gz`.
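The grouping step can be sketched as follows; the `discovered` list and its tuple format are illustrative stand-ins for what the `@Endpoint` decorators register, not Flash's actual internals:

```python
from collections import defaultdict

# Illustrative stand-in for functions discovered via @Endpoint decorators:
# (function_name, endpoint_name) pairs.
discovered = [
    ("generate", "api"),
    ("translate", "api"),
    ("run_model", "run-model"),
]

# Group functions by endpoint name; each group becomes one Serverless
# endpoint with its own generated handler file.
endpoints = defaultdict(list)
for func_name, endpoint_name in discovered:
    endpoints[endpoint_name].append(func_name)

print(dict(endpoints))
# {'api': ['generate', 'translate'], 'run-model': ['run_model']}
```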
Build artifacts
After building, these artifacts are created in the `.flash/` directory:
| Artifact | Description |
|---|---|
| `.flash/artifact.tar.gz` | Deployment package |
| `.flash/flash_manifest.json` | Service discovery configuration |
| `.flash/.build/` | Temporary build directory (removed by default) |
What gets deployed
When you deploy a Flash app, you’re deploying a build artifact (tarball) onto pre-built Flash Docker images. This architecture is similar to AWS Lambda layers: the base runtime is pre-built, and your code and dependencies are layered on top.

The build artifact
The `.flash/artifact.tar.gz` file (max 500 MB) contains:
```
artifact.tar.gz
├── lb_worker.py
├── gpu_worker.py
├── cpu_worker.py
├── flash_manifest.json
├── requirements.txt
└── [installed dependencies]
    ├── torch
    ├── transformers
    └── ...
```
The deployment manifest
The `flash_manifest.json` file is the brain of your deployment. It tells each endpoint:
- Which functions to execute.
- What Docker image to use.
- How to configure resources (GPUs, workers, scaling).
- How to route HTTP requests (for load balancer endpoints).
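An illustrative manifest entry covering those four pieces of information, expressed here as a Python dict; every field name is hypothetical, not Flash's actual schema:

```python
# Hypothetical manifest shape -- field names are illustrative only.
manifest = {
    "endpoints": {
        "gpu_worker": {
            "functions": ["run_model"],                  # which functions to execute
            "image": "flash-base-gpu",                   # what Docker image to use (name is made up)
            "resources": {"gpus": 1, "max_workers": 3},  # GPUs, workers, scaling
            "routes": [],                                # HTTP routes (load balancer endpoints only)
        }
    }
}

print(manifest["endpoints"]["gpu_worker"]["functions"])
# ['run_model']
```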
What gets created on Runpod
For each endpoint configuration in the manifest, Flash creates an independent Serverless endpoint, identified by its `name` parameter.
Cross-endpoint communication
When one endpoint needs to call a function on another endpoint:

- Manifest lookup: The calling endpoint checks `flash_manifest.json` for the function-to-resource mapping.
- Service discovery: It queries the state manager (Runpod GraphQL API) for the target endpoint URL.
- Direct call: It makes an HTTP request directly to the target endpoint.
- Response: The target endpoint executes the function and returns the result.
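The first two steps can be sketched as follows; the manifest shape and the discovery stub are hypothetical, standing in for `flash_manifest.json` and the Runpod GraphQL query:

```python
# Hypothetical function-to-resource mapping (the manifest lookup step).
manifest = {
    "functions": {
        "run_model": {"endpoint": "gpu_worker"},
    }
}

def discover_url(endpoint_name: str) -> str:
    # Stand-in for querying the state manager (Runpod GraphQL API).
    known = {"gpu_worker": "https://api.runpod.ai/v2/abc123xyz"}
    return known[endpoint_name]

def resolve(function_name: str) -> str:
    # Step 1: manifest lookup (function -> endpoint name).
    endpoint = manifest["functions"][function_name]["endpoint"]
    # Step 2: service discovery (endpoint name -> URL).
    return discover_url(endpoint)
    # Steps 3-4 would then be a direct HTTP request to this URL,
    # followed by the target endpoint's response.

print(resolve("run_model"))
# https://api.runpod.ai/v2/abc123xyz
```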
Troubleshooting
No @Endpoint functions found
If the build process can’t find your endpoint functions:

- Ensure functions are decorated with `@Endpoint(...)`.
- Check that Python files aren’t excluded by `.gitignore` or `.flashignore`.
- Verify decorator syntax is correct.
Deployment size limit exceeded
Base image packages are auto-excluded. If your deployment still exceeds 500 MB, use `--exclude` to skip additional packages:
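For example (the excluded package name is illustrative):

```shell
flash deploy --exclude pandas
```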
Authentication errors
Verify your API key is set correctly. Add it to a `.env` file or export it:
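For example, assuming Flash reads the key from a `RUNPOD_API_KEY` environment variable:

```shell
# In a .env file:
# RUNPOD_API_KEY=your-api-key

# Or export it in your shell:
export RUNPOD_API_KEY=your-api-key
```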
Import errors in endpoint functions
Import packages inside the endpoint function, not at the top of the file.

Next steps
- Learn about apps and environments for managing deployments.
- View the CLI reference for all available commands.
- Configure hardware resources for your endpoints.
- Monitor and troubleshoot your deployments.