Every time someone tells me they need a queue, I ask them what they're actually trying to do. About nine times out of ten, the answer is: run a job later, retry it if it fails, and make sure two workers don't run the same one at the same time. Postgres can do that.
The shape of the problem
You have tasks. You want to run them asynchronously, distribute them across N workers, retry them if they fail, and never execute the same task twice concurrently. That's it. Ninety percent of background jobs look exactly like this.
You do not need Kafka. You probably don't need Redis. You almost certainly don't need SQS.
The whole solution
create table job (
  id         bigserial primary key,
  payload    jsonb not null,
  run_at     timestamptz not null default now(),
  started_at timestamptz,
  attempts   int not null default 0
);

(The `not null` on run_at matters: the worker query filters on `run_at <= now()`, and a null run_at would never match.)
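One addition worth making from day one: a partial index, so workers never scan rows that are already claimed or finished. A sketch against the schema above; the index name is my own.

```sql
-- Only unclaimed, due jobs ever match the worker query,
-- so index exactly that slice and keep it small.
create index job_ready_idx on job (run_at)
  where started_at is null;
```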
Then, from each worker:
begin;

select id, payload from job
where started_at is null and run_at <= now()
order by run_at
for update skip locked
limit 1;

-- claim it and bump the retry counter; $1 is the id the select returned.
-- the row stays locked until commit, so no other worker can grab it,
-- and a crash mid-work rolls this back and leaves the job eligible again.
update job set started_at = now(), attempts = attempts + 1
where id = $1;

-- do the work

commit;
FOR UPDATE SKIP LOCKED is the whole trick: it hands each worker a different row without making them step on each other. No polling loop explodes under load, no two workers pick the same job, no external dependencies.
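If you'd rather not hold a transaction open for the duration of the work, the claim can be collapsed into a single statement. A sketch, same schema as above:

```sql
-- claim and fetch in one round trip; the subquery's SKIP LOCKED
-- still guarantees each worker gets a different row.
update job
set started_at = now(), attempts = attempts + 1
where id = (
  select id from job
  where started_at is null and run_at <= now()
  order by run_at
  for update skip locked
  limit 1
)
returning id, payload;
```

The trade-off: since the claim commits immediately, a crashed worker no longer rolls back automatically, so you need an explicit failure path (or a stale-job sweep on started_at) to release abandoned rows.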
Retries and dead letters
Add a failed_at column (the schema above already has the attempts counter). If attempts > 5, flag the row instead of retrying. Show the flagged rows on a page somewhere. You now have a dead-letter queue.
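Concretely, the failure path can look like this. A sketch: the linear backoff and the threshold of 5 are arbitrary choices, not gospel.

```sql
-- on handler failure, below the retry limit: release with backoff
update job
set started_at = null,
    run_at = now() + interval '1 minute' * attempts
where id = $1 and attempts <= 5;

-- past the limit: park it instead
update job set failed_at = now()
where id = $1 and attempts > 5;

-- the "DLQ page" is just a query
select id, payload, attempts from job
where failed_at is not null;
```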
You'll know you need a real queue when Postgres tells you — not when a blog post tells you.
What about exactly-once?
If your handler is idempotent — and it should be — then "exactly-once delivery" stops mattering. Make the unit of work safe to re-run. Drop the anxiety. Go home.
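What "idempotent" looks like in practice: key the side effect on the job id, so a re-run is a no-op. A sketch, assuming a job that sends an email and a bookkeeping table of my own invention:

```sql
create table email_sent (
  job_id bigint primary key
);

-- in the handler, before performing the side effect:
insert into email_sent (job_id) values ($1)
on conflict do nothing;
-- if that inserted zero rows, this job already ran: skip the send.
```

The primary key does the deduplication; no matter how many times the job is delivered, the email goes out at most once.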