We were comfortably supporting millions of jobs per day as a Postgres queue (using select for update skip locked semantics) at a previous role.
Scaled much, much further than I would’ve guessed at the time when I called it a short-term solution :) — now I have much more confidence in Postgres ;)
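For anyone curious, the core of the pattern is just a claim-and-lock query. A minimal sketch, assuming a hypothetical jobs table (the schema here is illustrative, not the one we actually ran):

    -- Illustrative schema; not the actual table from the deployment above.
    CREATE TABLE jobs (
        id         bigserial PRIMARY KEY,
        payload    jsonb NOT NULL,
        status     text NOT NULL DEFAULT 'pending',
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker claims one pending job. SKIP LOCKED makes concurrent
    -- workers pass over rows another transaction has already locked,
    -- so they never block on each other.
    WITH next_job AS (
        SELECT id
        FROM jobs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    UPDATE jobs
    SET status = 'running'
    FROM next_job
    WHERE jobs.id = next_job.id
    RETURNING jobs.id, jobs.payload;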
> We were comfortably supporting millions of jobs per day as a Postgres queue (using select for update skip locked semantics) at a previous role.
That's very refreshing to hear. In a previous role I was in a situation similar to yours, but I pushed for RabbitMQ instead of Postgres due to scaling concerns, with hypothetical ceilings smaller than the ones you faced. My team had to make a call without hard numbers to support any decision and no time to put together a proof of concept. The trade-off was the simplicity of Postgres versus paying in complexity for the assurance of a purpose-built message broker. In the end I pushed for the most conservative approach and we went with RabbitMQ, because I didn't want to be the one having to explain why we had problems getting an RDBMS to act as a message broker when we could get a real message broker for free with a docker pull.
I was always left wondering if that was the right call, and apparently it wasn't, because RabbitMQ also put up a fight.
If there were articles out there showcasing case studies of real-world applications built on a message broker over an RDBMS, then people like me would have an easier time pushing for saner choices.
Those don't have money to fund studies about industry best practices. So you don't get many.
Almost everything you see on how to use a DBMS is either an amateur blog or one of those studies. The former is usually dismissed in any organization with more than one layer of management.
> Those don't have money to fund studies about industry best practices. So you don't get many.
Your comment reads like a strawman. I didn't need "studies". It would have been enough if there was a guy with a blog saying "I used Postgres as a message broker like this and I got these numbers", with a GitLab project page giving the public the setup and benchmark code.
Just out of curiosity (as someone who hasn't done a lot of this kind of operational stuff) how does this approach to queueing with Postgres degrade as scale increases? Is it just that your job throughput starts to hit a ceiling?
Throughput is less of an issue than queue size: Postgres can handle a truly incredible amount of throughput as long as the jobs table is small enough that it can safely remain in memory for every operation. We can handle 800k jobs/hr with Postgres, but if you have more than 5k or 10k jobs in the table at any given time, you're in dangerous territory. It's a different way of thinking about queue design than with some other systems, but it's definitely worth it if you're interested in the benefits Postgres brings (atomicity, reliability, etc.).
With Postgres, you also need to worry a lot about dead-tuple bloat (tombstoning) and your ability to keep up with the vacuums necessary to deal with highly mutable data. This can depend a lot on what else is going on in the database and whether you have more than one index on the table.
One strategy for mitigating vacuum costs is to adopt an append-only design and partition the table; then you can just drop whole partitions and avoid the vacuum costs, as in the sketch below.
It really depends on your needs, but this can unlock some very impressive and sustainable throughput.
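A minimal sketch of that idea, assuming a hypothetical time-partitioned jobs table (names and boundaries are illustrative):

    -- Illustrative append-only, time-partitioned layout.
    CREATE TABLE jobs_archive (
        id         bigint NOT NULL,
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE jobs_archive_2024_01 PARTITION OF jobs_archive
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- Once every job in an old partition has been processed, drop it.
    -- Dropping a partition is a cheap metadata operation; no dead tuples
    -- are left behind for vacuum to clean up.
    DROP TABLE jobs_archive_2024_01;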
That's the original problem, but then there are the secondary effects. Some of the people who made decisions on that basis write blog posts about what they did, then those blog posts end up on StackOverflow etc., and eventually it just becomes "this is how we do it by default" orthodoxy without much conscious reasoning involved; it's simply a safe bet to do what works for everybody else, even if it's not optimal.
My hobby project does ~1.5M jobs per day enqueued into Postgres, no sweat. I use https://github.com/bensheldon/good_job which uses PG's LISTEN/NOTIFY to lower worker poll latency.
Briefly, it spins up a background thread with a dedicated database connection and makes a blocking Postgres LISTEN query until results are returned, and then it forwards the result to other subscribing objects.
I can't speak for how they do it, but when your worker polls the table and finds no rows, it sleeps. While sleeping, it can also LISTEN on a channel (and if it gets a message, it aborts the sleep).
Then, whenever you write a new job to the queue, you also do a NOTIFY on the same channel.
This lets you keep latency low while still polling relatively infrequently.
NOTIFY is actually transactional, which makes this approach even better: the LISTENer won't be notified until the NOTIFY's transaction commits.
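A minimal sketch of the two halves, assuming a hypothetical job_queue channel and the illustrative jobs table from earlier in the thread:

    -- Worker session: subscribe once, then poll; between polls the worker
    -- sleeps and wakes early if a notification arrives on the channel.
    LISTEN job_queue;

    -- Producer: enqueue and notify in the same transaction. Because NOTIFY
    -- is transactional, listeners only hear about the job after COMMIT,
    -- so they never wake up for a row they can't see yet.
    BEGIN;
    INSERT INTO jobs (payload) VALUES ('{"task": "send_email"}');
    NOTIFY job_queue;
    COMMIT;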
A few million a day is only a few dozen per second (e.g. 3M/day ÷ 86,400 s ≈ 35 jobs/s); we currently have a service running this order of magnitude of jobs with a SELECT ... FOR UPDATE SKIP LOCKED pattern and no issues at all on a medium AWS box.
In other SQL databases an "in memory" table could be a candidate. It looks like Postgres only has session-specific temporary tables, but it does have an UNLOGGED table type (https://www.postgresql.org/docs/13/sql-createtable.html) which has desirable properties for temporary data that must be shared across sessions.
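A minimal sketch, assuming a hypothetical jobs table (illustrative only); the trade-off is spelled out in the comment:

    -- UNLOGGED skips write-ahead logging, so inserts/updates are cheaper,
    -- but the table is truncated after a crash and is not replicated;
    -- only suitable for jobs you can afford to lose or re-enqueue.
    CREATE UNLOGGED TABLE jobs_volatile (
        id      bigserial PRIMARY KEY,
        payload jsonb NOT NULL
    );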
A well-tuned bare-metal box in a master-replica config should easily handle (being conservative here) 10k jobs/s... I assume a purpose-built box could handle 100k/s without breaking a sweat.
I've used Postgres to handle 60M jobs per month (using FOR UPDATE SKIP LOCKED) in production, for two years, on a single dual-core 8GB GCE VM. Postgres goes far.