Ask HN: What's your go-to message queue in 2025?
The space is confusing, to say the least.
Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases.
I’m trying to make sense of the tradeoffs between:
- async fire-and-forget pub/sub vs. sync RPC-like point-to-point communication
- simple FIFO vs. priority queues and delay queues
- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g. Kafka’s client-driven model)
There's also a fair amount of ideology/emotional attachment - some folks root for underdogs written in their favorite programming language, others reflexively dismiss anything that's not "enterprise-grade". And of course, vendors are always in the mix trying to steer the conversation toward their own solution.
If you’ve built a production system in the last few years:
1. What queue did you choose?
2. What didn't work out?
3. Where did you regret adding complexity?
4. And if you stuck with a DB-based queue — did it scale?
I’d love to hear war stories, regrets, and opinions.
I played with most message queues and I go with RabbitMQ in production.
Mostly because it has been very reliable for years in production at a previous company, and doesn’t require babysitting. Its recent versions also have new features that make it a decent alternative to Kafka if you don’t need to scale to the moon.
And the logo is a rabbit.
Datadog too. I often wonder why more companies don't pick cute mascots. It gives you a logo, makes everyone have warm fuzzies immediately, and creates pun opportunities.
inb4 "oh but you won't be taken seriously"... well, Datadog.
Kafka is fairly different from the rest of these: it’s persistent and designed for high read throughput to multiple simultaneous clients, as some other commenters have pointed out.
We wanted replayability and multiple clients on the same topic, so we evaluated Kafka, but we determined it was too operationally complex for our needs. Persistence was also unnecessary, as the data stream already had a separate archiving system and existing clients only needed about 24 hours of context at most. AWS Kinesis ended up being simpler for our needs, and I have mostly good things to say about it. Streaming client support in Elixir was not as good as Kafka’s, but writing our own adapter wasn’t too hard.
NATS.io, because I'm using Go and can just embed it for a single server [0]: one binary to deploy with systemd, with the option to split it out when scaling past the MVP.
[0] https://www.inkmi.com/blog/how-i-made-inkmi-selfhealing
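For reference, embedding is roughly this. Just a minimal sketch using the nats-server and nats.go packages, with options and error handling trimmed:

    package main

    import (
        "time"

        "github.com/nats-io/nats-server/v2/server"
        "github.com/nats-io/nats.go"
    )

    func main() {
        // Run the NATS server inside the app process.
        ns, err := server.NewServer(&server.Options{Port: 4222})
        if err != nil {
            panic(err)
        }
        go ns.Start()
        if !ns.ReadyForConnections(4 * time.Second) {
            panic("NATS server did not start")
        }

        // Connect to it like any other server; swap the URL for an
        // external cluster when it's time to split it out.
        nc, err := nats.Connect(ns.ClientURL())
        if err != nil {
            panic(err)
        }
        defer nc.Close()

        nc.Subscribe("jobs.created", func(m *nats.Msg) { /* handle */ })
        nc.Publish("jobs.created", []byte("hello"))
    }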
Been on Kafka (MSK) for a couple of years. To my surprise, the programming model and getting everything set up correctly sit behind a steep learning curve. For example, at some point I had a timestamp header but only much later realised that it all ends up as number[] on the consumer side. So I lost data. My fault, but still. I came to the realisation that the programming model, especially in MSK, is rather unintuitive.
I also found it hard to shift mentally from MSK and its event triggers back to regular consumers spun up in containers etc., but that is more an MSK thing than a Kafka thing.
I am currently swapping out the whole pub/sub layer for MongoDB change streams, which I have found to work really well. For queuing, the consumer attempts to take a lock on read, so I can scale consumers with retry/backoff etc. Broadcast is simple and lock-free, with auto-delete in Mongo.
I will have to see how it really scales, and I'm sure I'm trading one problem for another, but it will definitely help to remove a moving part. Overall the app is rather low volume with the occasional spike. I would have stayed with Kafka if there were, say, >100 rpm on the core functions.
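For reference, the broadcast side with the official Go driver is roughly this. A sketch only: the database and collection names are made up, and change streams need a replica set:

    package main

    import (
        "context"
        "log"

        "go.mongodb.org/mongo-driver/bson"
        "go.mongodb.org/mongo-driver/mongo"
        "go.mongodb.org/mongo-driver/mongo/options"
    )

    func main() {
        ctx := context.Background()
        client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
        if err != nil {
            log.Fatal(err)
        }
        coll := client.Database("app").Collection("events")

        // Watch inserts only; each subscribing process gets its own
        // copy of every event (broadcast, no locking).
        pipeline := mongo.Pipeline{
            {{Key: "$match", Value: bson.D{{Key: "operationType", Value: "insert"}}}},
        }
        cs, err := coll.Watch(ctx, pipeline)
        if err != nil {
            log.Fatal(err)
        }
        defer cs.Close(ctx)

        for cs.Next(ctx) {
            var ev bson.M
            if err := cs.Decode(&ev); err != nil {
                log.Fatal(err)
            }
            log.Println("change:", ev["fullDocument"])
        }
    }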
SQS is great if you're already on AWS - it works and gets out of your way.
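The day-to-day surface area is tiny. A rough sketch of a consume loop with aws-sdk-go-v2 (the queue URL is a placeholder):

    package main

    import (
        "context"
        "fmt"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/sqs"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            panic(err)
        }
        client := sqs.NewFromConfig(cfg)
        queueURL := "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue" // placeholder

        out, err := client.ReceiveMessage(ctx, &sqs.ReceiveMessageInput{
            QueueUrl:            aws.String(queueURL),
            MaxNumberOfMessages: 10,
            WaitTimeSeconds:     20, // long polling keeps costs and latency down
        })
        if err != nil {
            panic(err)
        }
        for _, m := range out.Messages {
            fmt.Println(aws.ToString(m.Body))
            // Delete only after successful processing; otherwise the
            // message reappears after the visibility timeout.
            client.DeleteMessage(ctx, &sqs.DeleteMessageInput{
                QueueUrl:      aws.String(queueURL),
                ReceiptHandle: m.ReceiptHandle,
            })
        }
    }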
Kafka is a great tool with lots of very useful properties (not just queues, it can be your primary datastore), but it's not operationally simple. If you're going to use it you should fully commit to building your whole system on it and accept that you will need to invest in ops at least a little. It's not a good fit for a "side" feature on the edge of your system.
Postgres. Doing ~ 70k messages/second average. Nothing huge but don’t need anything dedicated yet.
Curious what kind of hardware you're using for that 70K/s?
It’s an r8g instance in AWS. Can’t remember the size, but I think it’s over-provisioned because it’s at like 20% utilisation and only spikes to 50%.
I'm curious on how people use Postgres as a message queue. Do you rely on libraries or do you run a custom implementation?
You can go an awfully long way with just SELECT … FOR UPDATE … SKIP LOCKED
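The core of it is a sketch like this, assuming a hypothetical "jobs" table:

    -- worker loop: claim one pending job, skipping rows other workers hold
    BEGIN;
    SELECT id, payload
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;
    -- ...do the work, then mark it done in the same transaction:
    UPDATE jobs SET status = 'done' WHERE id = $1;
    COMMIT;

Concurrent workers never grab the same row, and a crashed worker's transaction rolls back, so its job simply becomes visible again.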
We also use Postgres, but we don't have many jobs. It's usually 10-20 schedules that create hourly-to-monthly jobs, and they are mostly independent. Currently a custom-made solution, but we are going to update it to use SKIP LOCKED and NOTIFY/LISTEN + an interval to handle jobs. There is a really good talk about it on YouTube: "Queues in PostgreSQL", from Citus Con.
pgmq https://github.com/pgmq/pgmq
Just SELECT ... FOR UPDATE SKIP LOCKED. The table is partitioned to keep the unprocessed set small.
I got tired of the pricing and/or complexity of running message queues/event brokers, so decided to play around with implementing my own. It utilizes S3 as the source of truth, which makes it orders of magnitude easier to manage and cheaper to run. There's an ongoing blog series on the implementation: https://github.com/micvbang/simple-event-broker
I am using Beanstalkd, it is small and fast and you just apt-get it on Debian.
However, I have noticed that oftentimes devs are using queues where Workflow Engines would be a better fit.
If your message processing time is in tens of seconds – talk to your local Workflow Engine professional (:
A classic. Not something I personally use these days, but I think just as a piece of software it is an eternally good example of something simple, powerful, well-engineered, pleasant to use, and widely-compatible, all at the same time
In that case, any suggestions if the answer was looking for workflow engines? Ideally something that will work for no-person-in-the-middle workloads in the tens of seconds range as well as person-making-a-decision workflows that can live for anywhere between minutes and months?
I would highlight a distinction between Queues and Streams, as I think this is an important factor in making this choice.
In the case of a queue, you put an item in the queue, and then something removes it later. There is a single flow of items. They are put in. They are taken out.
In the case of a stream, you put an item in the stream, and it can then be read multiple times by any other process that cares to do so. This may be called 'fan-out'.
This is an important distinction and really affects how one designs software that uses these systems. Queues work just fine for, say, background jobs. A user signs up, and you put a task in the 'send_registration_email' queue.[1]
However, what if some _other_ system then cares about user sign ups? Well, you have to add another queue, and the user sign-up code needs to be aware of it. For example, a 'add_user_to_crm' queue.
The result here is that choosing a queue early on leads to a tight-coupling of services down the road.
The alternative is to choose streams. In this case, instead of saying what _should_ happen, you say what _did_ happen (past tense). Here you replace 'send_registration_email' and 'add_user_to_crm' with a single stream called 'user_registered'. Each service that cares about this fact is then free to subscribe to that stream and get its own copy of the events (it does so via a 'consumer group', or something of a similar name).
This results in a more loosely coupled system, where you potentially also have access to an event history should you need it (if you configure your broker to keep the events around).
--
This is where PostgreSQL and SQS tend to fall down. I've yet to hear of an implementation of streams in PostgreSQL[2]. And SQS is inherently a queue.
I therefore normally reach for Redis Streams, but mostly because it is what I am familiar with.
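For the curious, the consumer-group pattern with go-redis looks roughly like this. A sketch only; the stream and group names mirror the example above:

    package main

    import (
        "context"
        "fmt"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

        // Producer: record the fact that a user registered.
        rdb.XAdd(ctx, &redis.XAddArgs{
            Stream: "user_registered",
            Values: map[string]interface{}{"user_id": "42"},
        })

        // Each interested service gets its own group, and so its own copy.
        rdb.XGroupCreateMkStream(ctx, "user_registered", "crm", "0")
        res, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
            Group:    "crm",
            Consumer: "crm-worker-1",
            Streams:  []string{"user_registered", ">"}, // ">" = not yet delivered to this group
            Count:    10,
        }).Result()
        if err != nil {
            panic(err)
        }
        for _, stream := range res {
            for _, msg := range stream.Messages {
                fmt.Println("crm got:", msg.Values)
                rdb.XAck(ctx, "user_registered", "crm", msg.ID)
            }
        }
    }

An 'email' service would do exactly the same with its own group name, and never interfere with the CRM consumer.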
Note: This line of thinking leads into Domain Driven Design, CQRS, and Event Sourcing. Each of which is interesting and certainly has useful things to offer, although I would advise against simply consuming any of them wholesale.
[1] Although this is my go-to example, I'm actually unconvinced that email sending should be done via a queue. Email is just a sequence of queues anyway.
[2] If you know of one please tell me!
What makes Postgres (or any decent relational DB) fall down in this case?
Sidekiq, Sidekiq, Sidekiq (or just Postgres if Im dealing with something trivial)
I have so far gotten by plenty well writing my own queue systems to fit the needs of the consuming application. Normally the only place I need a queue system is in distributed systems with rapid-fire transmissions, to ensure messages hit the network in time-sequence order. The additional benefit is that traffic is buffered in order when the current network socket fails, so that nothing is lost but time.
The US Federal Reserve uses IBM MQ for the FedNow interbank settlement service that went live last year.
Architecture info: https://explore.fednow.org/resources/technical-overview-guid...
Likely implies z/OS is common on both sides. Given the stakes and availability needs, not a bad choice.
For large applications in a service-oriented architecture, I leverage Kafka 100% of the time. With Confluent Cloud and Amazon MSK, infra is relatively trivial to maintain. There's really no reason to use anything else for this.
For smaller projects of "job queues," I tend to use Amazon SQS or RabbitMQ.
But just for clarity, Kafka is not really a message queue -- it's a persistent structured log that can be used as a message queue. More specifically, you can replay messages by resetting the offset. In a queue, the idea is once you pop an item off the queue, it's no longer in the queue and therefore is gone once it's consumed, but with Kafka, you're leaving the message where it is and moving an offset instead. This means, for example, that you can have many many clients read from the same topic without issue.
SQS and other MQs don't have that persistence -- once you consume the message and ack, the message disappears and you can't "replay it" via the queue system. You have to re-submit the message to process it. This means you can really only have one client per topic, because once the message is consumed, it's no longer available to anyone else.
There are pros and cons to either mechanism, and there's significant overlap in the usage of the two systems, but they are designed to serve different purposes.
The analogy I tend to use is that Kafka is like reading a book. You read a page, you turn the page. But if you get confused, you can flip back and reread a previous page. An MQ like RabbitMQ or Sidekiq is more like the line at the grocery store: once the customer pays, they walk out and they're gone. You can't go back and re-process their cart.
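To make the replay point concrete: with a client like segmentio's kafka-go, flipping back to page one looks roughly like this (a sketch; the topic name is made up):

    package main

    import (
        "context"
        "fmt"

        "github.com/segmentio/kafka-go"
    )

    func main() {
        // Reading one partition directly, without a consumer group,
        // lets us choose the offset ourselves.
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers:   []string{"localhost:9092"},
            Topic:     "user-events",
            Partition: 0,
        })
        defer r.Close()

        // "Flip back to page one": replay the whole retained log.
        r.SetOffset(kafka.FirstOffset)

        for i := 0; i < 10; i++ {
            m, err := r.ReadMessage(context.Background())
            if err != nil {
                break
            }
            fmt.Printf("offset %d: %s\n", m.Offset, m.Value)
        }
    }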
Again, pros and cons to both approaches.
"What didn't work out?" -- I've learned in my career that, in general, I really like replayability, so Kafka is typically my first choice, unless I know that re-creating the messages is trivial, in which case I am more inclined to lean toward RabbitMQ or SQS. I've been bitten several times by MQs where I can't easily recreate the queue, and I lose critical messages.
"Where did you regret adding complexity?" -- Again, smaller systems that are just "job queues" (versus service-to-service async communication) don't need a whole lot of complexity. So I've learned that if it's a small system, go with an MQ first (any of them are fine), and go with Kafka only if you start scaling beyond a single simple system.
"And if you stuck with a DB-based queue -- did it scale?" -- I've done this in the past. It scales until it doesn't. Given my experience with MQs and Kafka, I feel it's a trivial amount of work to set up an MQ/Kafka, and I don't get anything extra by using a DB-based queue. I personally would avoid these, unless you have a compelling reason to use it (eg, your DB isn't huge, and you can save money).
> This means you can really only have one client per topic, because once the message is consumed, it's no longer available to anyone else.
It depends on your use case (or maybe what you mean by "client"). If I just have a bunch of messages that need to be processed by "some" client, then having the message disappear once a client has processed it is exactly what you want.
We build applications very differently. SQS queues with 1000s of clients have been a go to for me for over a decade. And the opposite as well — 1000s of queues (one per client device, they’re free). Zero maintenance, zero cost when unused. Absurd scalability.
No one ever seems to use it, but for AMQP I like Beanstalkd. It's fast and stable, and it has not failed me under high RPS.
This is my go-to solution as well. It is great, but it utilizes just one CPU core. But if that is the problem, then your business is already booming.
Another option to consider: Cloudflare Workers. They have a simple queue, but you'll need to pair it with a Worker anyway. This means you can programmatically manage the queue through the Worker, and it also makes it easy to send/receive HTTP requests.
For my extremely specialized case, I use a SQLite database as a message queue. It absolutely wouldn't scale, but it doesn't need to. It works extremely well for what I need it to do.
Have you written up about it? I'd love to read it if so. Thought of using SQLite several times like this but never mustered the courage to try.
I use SQLite as an offline buffer for telemetry data, basically one thread does INSERT of the payloads and another thread does just SELECT and then DELETE when it has successfully transmitted the payload.
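In SQL terms the whole pattern is roughly this (table and column names are illustrative):

    -- buffer table: one writer inserts, one reader drains
    CREATE TABLE IF NOT EXISTS outbox (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        payload BLOB NOT NULL
    );
    -- producer thread:
    INSERT INTO outbox (payload) VALUES (?);
    -- consumer thread: read a batch, transmit, delete only on success
    SELECT id, payload FROM outbox ORDER BY id LIMIT 100;
    DELETE FROM outbox WHERE id <= ?;  -- highest id successfully sent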
I prefer Pulsar. Elegant modular design and a fully open-source ecosystem.
Performance is at least as good as Kafka's.
For simpler workloads, beanstalkd could be a good fit too.
Kafka for communication between microservices, and MQTT (VerneMQ) for IoT devices.
What are your thoughts on Apache Pulsar vs Kafka?
I'm hesitating over EMQX. Have you tried it? Why did you choose VerneMQ?
SQS. For Ruby apps I use Shoryuken with SQS.
UUCP
People will call me crazy but why not SMTP for message queueing?
Maybe start by explaining what you want to use it for?
Using Zeebe/Camunda at work. The system gives you a way of designing and partitioning message-based workflows. It has a very thorough design.
We had a lot of reliability issues with Zeebe/Camunda (granted, we started using it at version 0.10), and now they have also rug-pulled the free version. So I would never go near that company again.
Reliability is much better now, as far as I can tell.
Solid Queue in Rails
Surprised nobody is mentioning ActiveMQ!
Does Google Cloud Tasks count?
SQS
What do people think of Supabase?
It's not a message queue?
I've used Qless for several years.
For those unfamiliar, it's a Lua library that gets executed in Redis using one of the various language bindings (which are essentially wrappers around calling the Lua methods).
With our multi-node Redis setup it seems to be quite reliable.
Pulsar
NATS
I use NATS too! It has worked very well for me; I'm using it to collect data from IoT devices. I don't really like all the other bits they tacked on, like JetStream and the object store; that seems beyond its scope. Subject authorization is also painful to implement. But runtime behaviour has been flawless for me.
Do you have any links explaining the subject authorization? I have recommended NATS for a project that got scrapped.
Docs: https://docs.nats.io/running-a-nats-service/configuration/se...
Example: https://natsbyexample.com/examples/auth/callout/java
A cron job did the work.