> it shouldn’t be the same as the production database

Why is that?

gregors · 2026-01-14T16:03:38

Here's an example from the CircleCI incident:

https://status.circleci.com/incidents/hr0mm9xmm3x6

and a good analysis by a Flickr engineer who ran into similar issues:

https://blog.mihasya.com/2015/07/19/thoughts-evoked-by-circl...

davidw · 2026-01-14T16:58:40

CircleCI and Flickr are both pretty big systems. There are tons of businesses that will never operate at that scale.

gregors · 2026-01-14T17:50:51

I don't disagree with that callout. However, we've been through these discussions many times over the years. The Solid Queue of yesteryear was delayed_job, which was originally created by Shopify's CEO.

https://github.com/tobi/delayed_job

Shopify, however, grew (as did many others), and we saw a host of blog posts and talks about moving away from DB queues to Redis, RabbitMQ, Kafka, etc. We saw posts about moving from Resque to Sidekiq, etc. All this to say, storing a task queue in the DB has always been the naive approach. Engineers absolutely shouldn't be shocked that the approach isn't viable at higher workloads.
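For anyone who hasn't seen the pattern, here is a minimal sketch of a DB-backed queue in the spirit of delayed_job, assuming SQLite and a hypothetical `jobs` table (the schema and function names are illustrative, not delayed_job's or Solid Queue's actual ones). Workers poll the table and claim a row by marking it inside a write transaction; Postgres- and MySQL-backed queues commonly use `SELECT ... FOR UPDATE SKIP LOCKED` for the claim instead. Every enqueue, claim, and completion is a write to the same table, which is where the contention shows up as workloads grow.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, payload):
    # Hypothetical producer: one INSERT per job.
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so two workers
    # cannot both select the same queued row before one marks it running.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row  # (id, payload), or None if the queue is empty


def finish(conn, job_id):
    # Completion is yet another write against the same table.
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

A worker loop would call `claim_next` repeatedly, run the payload, then call `finish`.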

zarzavat · 2026-01-14T14:21:36

If you need to restore the production database do you also want to restore the task database?

If your task is to send an email, do you want to send it again? Probably not.
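One common mitigation, not something proposed in this thread, is to make the side effect idempotent: key it by a stable task ID and skip it if that ID has already been handled. A minimal sketch, assuming a hypothetical `Mailer` whose `sent_log` of delivered task IDs is kept somewhere that is not rolled back along with the task database (otherwise a restore wipes it too):

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Mailer:
    # Hypothetical mailer. `sent_log` must live outside the database that
    # gets restored; otherwise a restore would wipe it along with the tasks.
    sent_log: Set[str] = field(default_factory=set)
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def send_once(self, task_id: str, to: str, body: str) -> bool:
        """Deliver the email unless this task ID was already handled."""
        if task_id in self.sent_log:
            return False  # replayed task: skip the duplicate send
        self.outbox.append((to, body))  # stand-in for the real delivery call
        # A crash between delivery and this line still causes one duplicate.
        self.sent_log.add(task_id)
        return True


mailer = Mailer()
mailer.send_once("task-1", "a@example.com", "Welcome!")  # returns True
# The same task replayed after a restore is a no-op:
mailer.send_once("task-1", "a@example.com", "Welcome!")  # returns False
```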

stavros · 2026-01-14T14:30:18

It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.

In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.
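The "losing or duplicating" choice above is the usual at-most-once versus at-least-once trade-off, and it comes down to whether a worker removes a job before or after running it. A minimal sketch, with a plain Python list standing in for the queue (the function names are hypothetical):

```python
from typing import Callable, List

Handler = Callable[[str], None]


def run_at_most_once(queue: List[str], handler: Handler) -> None:
    # Dequeue first, then run: if the handler crashes, the job is already
    # gone, so it is lost rather than retried.
    job = queue.pop(0)
    handler(job)


def run_at_least_once(queue: List[str], handler: Handler) -> None:
    # Run first, then dequeue: if the handler crashes, the job stays queued
    # and will run again, even if its side effect already happened.
    job = queue[0]
    handler(job)
    queue.pop(0)
```

Neither ordering avoids both failure modes; that is why duplicated work is usually handled by making the job itself safe to repeat.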

zarzavat · 2026-01-16T10:49:13

Right, I was referring to logical databases rather than the database server itself.

stavros · 2026-01-16T11:05:47

But even for the logical databases, if I want to revert to an earlier state of the database, why wouldn't I want the tasks as well? If I have a bunch of update tasks in flight at that point, wouldn't I want them to actually run? They are a part of the overall state of the system.