
Promise.all Is Not a Queue

Yann Cabral · May 2026 · 5 min read · nodejs

The mistake is subtle because the code looks like modern JavaScript. It is short, readable, and apparently efficient. Then one day the input is large enough, the dependency is slow enough, or the worker is busy enough, and your optimization becomes a denial-of-service attack against yourself.

There is a line of Node.js code that looks harmless until it is not:

await Promise.all(users.map(sendEmail));

At small scale, it feels perfect.

You have a list of users. You need to send an email to each one. Mapping the list into promises and waiting for all of them is idiomatic JavaScript. It is compact. It is easy to review. It is probably faster than a for...of loop that awaits each email one at a time.
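For comparison, the sequential version that `Promise.all` usually replaces looks something like this sketch, where `sendEmail` stands in for whatever async work runs per user:

```javascript
// Sequential baseline: one operation at a time.
// Total time is roughly the sum of every individual latency.
async function sendAllSequentially(users, sendEmail) {
  for (const user of users) {
    await sendEmail(user); // nothing else starts until this one resolves
  }
}
```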

So it ships.

Then the worker gets a real batch.

Not twenty users. Five thousand. Maybe eight thousand. At the same time, the upload pipeline is busy, the database pool is already warm, Redis is doing more than usual, and the email provider is having one of those afternoons where every request is technically working but suspiciously slow.

That is when the symptoms start looking unrelated:

  • CPU pinned at 100%
  • memory climbing without coming back down
  • ECONNRESET
  • rate-limit responses
  • random timeouts
  • Redis congestion
  • database pool exhaustion
  • workers restarting because the heap ran out of room

The bug is not that Promise.all is broken.

The bug is that you accidentally asked Node to start everything at once.

Promise.all waits. It does not pace.

This distinction matters.

Promise.all is a coordination primitive. It gives you one promise that resolves when all input promises resolve, or rejects when one of them rejects. That is useful.

But it is not a scheduler.

It does not know that your provider allows only 100 requests per second. It does not know that your Postgres pool has 20 connections. It does not know that every call to sendEmail also fetches a template, writes an audit row, increments a Redis counter, and waits on a third-party API.

By the time Promise.all sees the promises, the work has already been started.

This is the part that gets missed in code review:

users.map(sendEmail)

That line is the fan-out. If there are 5,000 users, it creates 5,000 in-flight operations immediately. Promise.all is just the place where you wait for the blast radius to finish.
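You can see the eager fan-out directly. In this sketch, `fakeWork` is a stand-in for `sendEmail`; by the time `Promise.all` runs, every operation is already in flight:

```javascript
// The fan-out happens at .map time, not at Promise.all time.
// Each call below starts its work the moment it is invoked.
let inFlight = 0;
let peak = 0;

async function fakeWork() {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight -= 1;
}

const items = Array.from({ length: 100 }, (_, i) => i);
const promises = items.map(() => fakeWork()); // 100 operations already running
await Promise.all(promises);
console.log(peak); // 100 — peak concurrency equals the full list size
```

All 100 increments happen synchronously, before a single timer fires. `Promise.all` never had a say in it.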

Sometimes that is fine. If the work is CPU-light, local, small, and bounded, parallelism is exactly what you want.

But if the function touches a network, a database, a queue, Redis, the file system, or a third-party API, unbounded parallelism is not an optimization. It is a bet that every dependency has infinite capacity.

They do not.

The fix is boring: control concurrency

You usually do not need a full job system for this specific problem. You need a limit.

For example, with p-limit:

import pLimit from "p-limit";

const limit = pLimit(10);

await Promise.all(
  users.map((user) => limit(() => sendEmail(user))),
);

That still processes the full list. It still lets work happen concurrently. But only ten calls to sendEmail run at the same time.

The other 4,990 users are not forgotten. They are waiting behind the limiter instead of competing for sockets, memory, database connections, and provider quota all at once.

The number 10 is not magic. It is a capacity decision.

A good starting value depends on what the work does:

  • If it calls a slow external API, start low.
  • If it uses a database pool, stay comfortably below the pool size.
  • If it does mostly local CPU work, remember that Node is not going to make CPU-bound JavaScript magically parallel on one thread.
  • If the provider publishes rate limits, treat them as a budget, not a suggestion.

The point is not to find the perfect number on the first try. The point is to make the number exist.

Once concurrency is explicit, you can measure it, tune it, expose it as config, and change it during an incident without rewriting the shape of the worker.
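p-limit is tiny, but if you want zero dependencies, the same idea fits in a few lines. This is a sketch of the technique, not p-limit's actual source:

```javascript
// Minimal concurrency limiter: at most `max` tasks run at once;
// the rest wait in a FIFO queue until a slot frees up.
function createLimiter(max) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= max || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)           // also catches synchronous throws
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();             // a slot freed up: start the next waiter
      });
  };

  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}
```

Usage is the same shape as before: `const limit = createLimiter(10);` and then wrap each task with `limit(() => sendEmail(user))`.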

Batches are not the same thing

Another common fix is chunking:

for (const batch of chunks(users, 10)) {
  await Promise.all(batch.map(sendEmail));
}
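`chunks` is not a built-in; if you go this route, you need something like this minimal sketch:

```javascript
// Hypothetical helper: split an array into fixed-size groups.
// chunks([1, 2, 3, 4, 5], 2) → [[1, 2], [3, 4], [5]]
function chunks(array, size) {
  const out = [];
  for (let i = 0; i < array.length; i += size) {
    out.push(array.slice(i, i + size));
  }
  return out;
}
```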

This is much better than launching everything at once. It caps each wave at ten.

But it has different behavior from a limiter.

With batches, the next group waits for the slowest item in the current group. If nine emails finish quickly and one hangs for ten seconds, the worker sits there with unused capacity until the slow one completes.

With a limiter, capacity is reused as soon as a slot frees up. If one email finishes, the next user starts. That usually gives you smoother throughput without allowing the system to explode.

Batches are fine for simple scripts. For long-running workers, I prefer an actual concurrency limiter because it expresses the real rule: no more than N operations in flight.

This is not a queue replacement

A concurrency limit protects the work that is already inside one process.

It does not give you durable retries. It does not persist jobs across restarts. It does not coordinate multiple workers by itself. It does not give you a dead-letter queue, visibility timeout, or backpressure across a fleet.

If you need those things, use the right primitive. That might be a real queue. It might be a Postgres-backed job table. It might be something else entirely.

But do not confuse that decision with the simpler one here.

Even if you have a queue, a worker can still hurt itself by pulling a batch and calling Promise.all on thousands of handlers at once. A queue controls when work is available. A limiter controls how much work this process is doing right now.

You often need both.
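Putting the two together might look like this sketch, where `fetchBatch`, `handleJob`, and `limit` are hypothetical stand-ins for your queue client and your limiter:

```javascript
// The queue decides what work is available; the limiter decides how much
// of it this process runs at once.
async function drainBatch(fetchBatch, handleJob, limit) {
  const jobs = await fetchBatch();
  await Promise.all(jobs.map((job) => limit(() => handleJob(job))));
}
```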

The review smell

The smell is any code that maps an unbounded list into async work:

await Promise.all(items.map(async (item) => {
  await doSomethingExpensive(item);
}));

When you see that, ask three questions:

  1. How large can items get?
  2. What does doSomethingExpensive touch?
  3. What is the maximum number of those things we want in flight at once?

If nobody can answer the third question, the code is not optimized. It is uncontrolled.

That does not mean every Promise.all is bad. It is perfectly reasonable for a fixed set of independent operations:

const [profile, settings, permissions] = await Promise.all([
  getProfile(userId),
  getSettings(userId),
  getPermissions(userId),
]);

Three known requests are not the problem.

The problem is turning user input, database rows, files, or queued jobs into unbounded concurrent work because the code looked clean.

Make capacity visible

The safest version is usually boring and explicit:

const emailConcurrency = Number(process.env.EMAIL_CONCURRENCY ?? 10);
const limit = pLimit(emailConcurrency);

await Promise.all(
  users.map((user) =>
    limit(async () => {
      await sendEmail(user);
    }),
  ),
);

Now concurrency is a setting. You can start conservative, watch latency and error rates, and raise it intentionally.

That is the difference between engineering and hoping.

Promise.all is a good way to wait for work.

It is not a good way to decide how much work should exist at the same time.
