Requirements:
- You’ve created a Runpod account.
- You’ve created a Runpod API key.
- You’ve installed Python 3.10-3.12 (3.13+ is not yet supported).
Step 1: Initialize a new project
Create a new directory and install Flash using uv. Then run the `flash init` command to generate a structured project template with a preconfigured application entry point.
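A sketch of the setup commands; the Flash package name below is an assumption, so check the Flash docs for the exact install command:

```shell
# Create and enter a project directory.
mkdir flash_app && cd flash_app

# Create a virtual environment and install Flash with uv.
uv venv
uv pip install flash  # assumption: substitute the actual Flash package name

# Generate the project template.
flash init
```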
Provide your API key by adding it to a `.env` file or by exporting the `RUNPOD_API_KEY` environment variable. Replace `YOUR_API_KEY` with your actual Runpod API key.
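For example, exporting it in your shell (with the placeholder standing in for your real key):

```shell
export RUNPOD_API_KEY=YOUR_API_KEY
```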
Step 2: Explore the project template
This is the structure of the project template created by `flash init`:
```
flash_app/
├── lb_worker.py
├── gpu_worker.py
└── cpu_worker.py
.env.example
.flashignore
.gitignore
pyproject.toml
requirements.txt
README.md
```
- Example worker files with `@Endpoint`-decorated functions for load-balanced and queue-based endpoints.
- Templates for `requirements.txt`, `.env.example`, `.gitignore`, etc.
- Pre-configured endpoint configurations for GPU and CPU workers, exposed at `/gpu/hello` and `/cpu/hello`, which call the endpoint functions described in their respective worker files.
Step 3: Install Python dependencies
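A sketch of the install step using pip; with uv, `uv pip install -r requirements.txt` is the equivalent:

```shell
pip install -r requirements.txt
```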
Install the required dependencies from `requirements.txt`.

Step 4: Configure your API key
Open the `.env` template file in a text editor and add your Runpod API key. Remove the `#` symbol from the beginning of the `RUNPOD_API_KEY` line and replace `your_api_key_here` with your actual Runpod API key:
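After the edit, the uncommented line in `.env` should look like this (with your real key in place of the placeholder):

```
RUNPOD_API_KEY=your_api_key_here
```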
Step 5: Start the local API server
Use `flash run` to start the API server:
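From the project root:

```shell
flash run
```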
In the terminal running `flash run`, you’ll see the details of each job’s progress.
Faster testing with auto-provisioning
For development with multiple endpoints, use `--auto-provision` to deploy all resources before testing:
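For example:

```shell
flash run --auto-provision
```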
Step 6: Open the API explorer
Besides starting the API server, `flash run` also starts an interactive API explorer. Point your web browser at http://localhost:8888/docs to explore the API.
To run endpoint functions in the explorer:
- Expand one of the functions under GPU Workers or CPU Workers.
- Click Try it out and then Execute.
Step 7: Customize your endpoints
To customize your endpoints:

- Edit the `@Endpoint` functions in your worker files (`lb_worker.py`, `gpu_worker.py`, `cpu_worker.py`).
- Add new worker files for new endpoints.
- Test individual workers by running them as scripts (e.g., `python gpu_worker.py`).
- Restart the development server to pick up changes.
Example: Adding a custom GPU endpoint
To add a new GPU endpoint for image generation, create a new worker file or modify an existing one. For deployed apps, each queue-based function needs its own unique endpoint configuration, and queue-based endpoints are invoked through `/run` or `/runsync` routes.
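A sketch of what such a worker file might look like; the import path and `Endpoint` decorator usage are assumptions modeled on the template's worker files, not a verified API, so compare against the generated `gpu_worker.py`:

```python
# image_worker.py -- hypothetical example worker file.
from flash import Endpoint  # assumption: the actual import path may differ


@Endpoint()  # assumption: each queue-based function gets its own endpoint configuration
def generate_image(prompt: str) -> dict:
    # Replace this placeholder with your image-generation model call.
    return {"prompt": prompt, "status": "queued"}
```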
Step 8: Deploy to Runpod
When you’re ready to deploy your app to Runpod, use `flash deploy`. This command:
- Builds your application into a deployment artifact.
- Uploads it to Runpod’s storage.
- Provisions independent Serverless endpoints for each endpoint configuration.
- Configures service discovery for inter-endpoint communication.
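All of the above is triggered by a single command from the project root:

```shell
flash deploy
```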
Next steps
- Deploy Flash applications for production use.
- Configure hardware resources for your endpoints.
- Monitor and troubleshoot your endpoints.