Scaling
When your workflows are in production, how do you want your GPUs to scale? This section will help you understand how the settings you choose affect your scaling, as well as the cost and performance trade-offs.
You can change these settings underneath “Auto Scaling” while editing your machine.
Settings
Max Parallel GPU
This determines how many GPUs can be spun up at the same time.
If you need more than 10 GPUs concurrently, contact us at founders@comfydeploy.com.
Workflow timeout
The maximum amount of time you want a workflow to run for. If a workflow run exceeds this time, the run will be cancelled.
We can increase this up to 24 hours; contact us at founders@comfydeploy.com.
Warm time
After your workflow has finished running on a GPU, you have the option to keep it warm for a certain amount of time, to reduce cold starts for your next request.
Warm time is still charged, so this is a trade-off between cost and performance.
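As a rough illustration of that trade-off, here is a back-of-the-envelope comparison. The cold-start time, run time, and request spacing are all made-up numbers for illustration, not real measurements:

```python
# Illustrative comparison of warm time vs. cold starts.
# All timings below are assumptions, not real measurements.
COLD_START_S = 20   # assumed time to spin up a cold GPU
RUN_S = 30          # assumed time the workflow itself takes
WARM_S = 60         # the "Warm time" setting (1 minute)

# Scenario: two requests, the second arriving 40 s after the first finishes.

# With warm time = 0, both requests pay the cold start.
latency_cold = COLD_START_S + RUN_S       # per-request latency: 50 s
billed_no_warm = 2 * RUN_S                # 60 s of GPU time billed

# With warm time = 60 s, the second request arrives inside the warm
# window, reuses the warm GPU, and skips the cold start entirely.
latency_warm_second = RUN_S               # 30 s for the second request
billed_with_warm = 2 * RUN_S + 40         # run time + 40 s of billed warm idle
# (The trailing warm minute after the second request would also bill;
#  it is omitted here for brevity.)

print(latency_cold, latency_warm_second)  # 50 30
print(billed_no_warm, billed_with_warm)   # 60 100
```

So under these assumed numbers, a 1-minute warm time cuts the second request's latency from 50 s to 30 s at the cost of 40 extra seconds of billed GPU time.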
Keep warm
For the highest-performance workloads, you can keep your GPUs warm to reduce cold starts to zero.
Example situations
These are some examples showing what happens with different request patterns under the same settings.
In this example:
- max parallel gpu set to 2
- warm time set to 1 minute
- always warm GPUs set to 0
Example 1: Basic
We have only 1 request.
- r1 comes in, a GPU spins up.
- r1 finishes.
- The GPU is kept warm for 1 minute before spinning down.
Example 2: Taking advantage of warm GPUs
This time we have 2 requests, where the 2nd request uses a warm GPU.
- r1 comes in, a GPU spins up.
- r1 finishes.
- r2 comes in before r1_f + warm time, so we reuse the same GPU.
- r2 is faster than r1 because the GPU was warm.
- The GPU is kept warm for 1 minute before spinning down.
Example 3: Scaling up and hitting max GPUs
We have 2 requests, and we’ll spin up 2 GPUs.
- r1 comes in, a GPU spins up.
- r2 comes in before r1 finishes, so a new GPU spins up.
- r1 finishes.
- r1's GPU spins down after staying warm for 1 minute.
- r2 finishes.
- r2's GPU spins down after staying warm for 1 minute.
If we had a 3rd request r3 while our 2 requests were running (between r2 and r1_f), it would have to wait for one of the GPUs to finish before it could start, since we've hit our max GPU limit. Once a GPU frees up, r3 starts.
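The three examples above can be sketched as a toy simulation. This is an illustrative model of the behavior described here, not ComfyDeploy's actual scheduler; the function name, run time, and cold-start time are all assumptions:

```python
# Toy model of the scaling behavior described above: warm-time reuse,
# spinning up new GPUs, and queuing at the max parallel GPU limit.
# Run time and cold-start time are assumed values for illustration.
def simulate(arrivals, run_s=30, cold_s=20, max_gpus=2, warm_s=60):
    """arrivals: request arrival times in seconds. Returns finish times."""
    gpus = []    # time at which each live GPU becomes free (then starts warming)
    finish = []
    for t in sorted(arrivals):
        # GPUs idle past their warm window have already spun down.
        gpus = [f for f in gpus if f + warm_s >= t]
        warm = [i for i, f in enumerate(gpus) if f <= t]  # free and still warm
        if warm:
            start = t                           # reuse a warm GPU: no cold start
            gpus[warm[0]] = start + run_s
        elif len(gpus) < max_gpus:
            start = t + cold_s                  # spin up a new GPU (cold start)
            gpus.append(start + run_s)
        else:
            # At the max GPU limit: wait for the earliest GPU to free up,
            # then reuse it while it is still warm.
            i = min(range(len(gpus)), key=lambda j: gpus[j])
            start = max(t, gpus[i])
            gpus[i] = start + run_s
        finish.append(start + run_s)
    return finish

# Example 1: a single request pays the cold start.
print(simulate([0]))          # [50]
# Example 2: the 2nd request lands inside the warm window and reuses the GPU.
print(simulate([0, 60]))      # [50, 90]
# Example 3: the 3rd request queues until a GPU frees up at t=50.
print(simulate([0, 10, 20]))  # [50, 60, 80]
```

Under these assumed timings, the three calls reproduce the three examples: a plain cold start, a warm reuse that skips the cold start, and a request that waits at the max GPU limit.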