After running `flash init`, you have a working project template with example code. This guide shows you how to customize the template to build your application.
## Endpoint types
Flash supports two endpoint types, each suited for different use cases:

| Type | Best for | Functions per endpoint |
|---|---|---|
| Queue-based | Long-running GPU tasks | One |
| Load-balanced | Fast HTTP APIs | Multiple (via routes) |
Each `@Endpoint` function creates a separate Serverless endpoint, and you call it over HTTP via `/run` (asynchronous) or `/runsync` (synchronous).
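As a sketch, a synchronous call to such an endpoint can be made with the standard library. The endpoint ID, API key, and payload below are placeholders, and the `{"input": ...}` request envelope follows the Runpod Serverless API:

```python
# Sketch of calling a queue-based endpoint via /runsync (synchronous).
# "my_endpoint_id" and "MY_API_KEY" are placeholders; replace them with
# your endpoint ID and Runpod API key.
import json
import urllib.request

def build_runsync_request(endpoint_id: str, api_key: str, payload: dict):
    """Build the URL, headers, and JSON body for a /runsync call."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Runpod Serverless wraps the job payload in an "input" envelope.
    body = json.dumps({"input": payload})
    return url, headers, body

if __name__ == "__main__":
    url, headers, body = build_runsync_request(
        "my_endpoint_id", "MY_API_KEY", {"prompt": "hello"}
    )
    req = urllib.request.Request(
        url, data=body.encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```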
The synchronous URL has the form `https://api.runpod.ai/v2/{endpoint_id}/runsync`.

## Add load balancing routes
To add routes to an existing load balancing endpoint, use the route decorator pattern shown in `lb_worker.py`.
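Flash's exact decorator syntax may differ, so treat the following as a framework-free sketch of the route decorator pattern itself: one shared endpoint configuration object registers several routes, each with its own HTTP method and path. The names here (`EndpointConfig`, `route`) are illustrative stand-ins, not the Flash API:

```python
# Illustrative sketch of the route decorator pattern (stand-in names,
# not verified Flash syntax): multiple routes share one endpoint
# configuration and deploy as a single Serverless endpoint.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EndpointConfig:
    name: str
    # (method, path) -> handler function
    routes: dict = field(default_factory=dict)

    def route(self, path: str, method: str = "POST") -> Callable:
        def register(fn: Callable) -> Callable:
            self.routes[(method, path)] = fn
            return fn
        return register

# One endpoint configuration shared by every route below.
lb_worker = EndpointConfig(name="lb_worker")

@lb_worker.route("/health", method="GET")
def health() -> dict:
    return {"status": "ok"}

@lb_worker.route("/generate", method="POST")
def generate(payload: dict) -> dict:
    return {"echo": payload}
```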
All routes defined in `lb_worker.py` deploy to a single `lb_worker` Serverless endpoint. Each route is accessible at its defined path.
Key points:
- Multiple routes can share one endpoint configuration
- Each route has its own HTTP method and path
- All routes on the same endpoint deploy to one Serverless endpoint
## Add queue-based endpoints
To add a new queue-based endpoint, create a new endpoint with a unique name, as in `gpu_worker.py`.
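Again as an illustrative, framework-free sketch (the `endpoint` decorator below is a stand-in, not verified Flash syntax): each queue-based endpoint registers one handler under a unique name, and duplicate names are rejected:

```python
# Stand-in sketch of queue-based endpoint registration: one decorated
# function per endpoint, each under a unique name.
from typing import Callable

ENDPOINTS: dict[str, Callable] = {}

def endpoint(name: str) -> Callable:
    """Register a handler as its own queue-based endpoint (stand-in)."""
    if name in ENDPOINTS:
        raise ValueError(f"endpoint name {name!r} is already in use")
    def register(fn: Callable) -> Callable:
        ENDPOINTS[name] = fn
        return fn
    return register

@endpoint("gpu_worker")
def gpu_worker(job: dict) -> dict:
    # A long-running GPU task would run here.
    return {"output": job.get("input")}
```

Once deployed, a queue-based endpoint is invoked through `/run` or `/runsync` at `https://api.runpod.ai/v2/{endpoint_id}/...`, as described above.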
## Modify endpoint configurations
Customize the endpoint configuration for each worker function in your app. Each `@Endpoint` function can have its own GPU type, scaling parameters, and timeouts, optimized for its specific workload.
For details, see:
- Configuration parameters for all available options.
- GPU types for selecting hardware.
- Best practices for optimization guidance.
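As a sketch of per-endpoint tuning, the configuration for a long-running GPU worker and a fast HTTP API might diverge as shown below. Every parameter name here is an illustrative assumption, so check the configuration parameters reference for the real options:

```python
# Illustrative per-endpoint configuration (parameter names are
# assumptions, not verified Flash options).
GPU_WORKER_CONFIG = {
    "gpu": "A100",           # hardware for long-running GPU tasks
    "workers_min": 0,        # scale to zero when idle
    "workers_max": 3,
    "timeout_seconds": 600,  # generous timeout for heavy jobs
}

LB_WORKER_CONFIG = {
    "gpu": None,             # a fast HTTP API may not need a GPU
    "workers_min": 1,        # keep one worker warm for low latency
    "workers_max": 10,
    "timeout_seconds": 30,
}
```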
## Test your customizations
After customizing your app, test locally with `flash run`:

- Interactive API documentation at `/docs`
- Auto-reload on code changes
- Real remote execution on Runpod workers
Use local testing to verify that:

- All HTTP routes work as expected
- Endpoint functions execute correctly
- Dependencies install properly
- Error handling works
## Next steps
- **Test locally**: Use `flash run` for local development and testing.
- **Deploy to Runpod**: Deploy your application to production with `flash deploy`.
- **Configure hardware resources**: Complete reference for configuration options.
- **Create endpoint functions**: Learn more about writing and optimizing endpoint functions.