>I'm not even sure that it's possible to write a correct lock-free queue (of which this is a variant) in a non-garbage collected language.
I'm guessing because of the "magical free that doesn't lock" assumption in several papers?
There are some ideas in the Rust community (mostly centered around audio), most of them use garbage collection schemes that prevent either the producer or consumer thread from locking. It's tricky and easy to break.
Yeah, that's pretty much exactly it. Every "lock-free queue without garbage collection" I've seen is either broken, relying on hidden locks (most commonly in free), or implementing garbage collection while calling it something else.
I'm biased because I've worked on these things, but imo implementing garbage collection for a very limited set of data structures that are all deterministic or explicit in their deallocation is much easier to deal with than a general purpose GC for an entire program. It allows you to guarantee lock/wait free behavior on particular threads, which you don't get with a GC'd language as a whole (depending on the GC of course).
Well, then your mind should be blown as the lock free queue provided in this article works without either locks or garbage collection.
The trick here is that this queue has a fixed capacity so it can allocate everything upfront. You are right that infinite capacity queues can be a bit tricky to implement without violating your conditions. (In many cases you don't actually need an infinite capacity queue though ...)
No, the trick here is that they limit the queue to a single producer thread and single consumer thread. With only two threads operating on the queue, it's extremely unlikely that this is faster than a locked version of the same structure.
As long as it is using sequentially consistent writes is not going to be faster than a two lock queue, as, at least on x86, an uncontended sc write is exactly as expensive an uncontended spin lock acquire (and a spin lock release is very cheap).
On the other hand, a SPSC ring buffer can be implemented purely with acquire loads and release stores, and if one takes care of avoiding reading recently written cachelines when not necessary it can have higher throughput than a spin-locked queue.
I'm guessing because of the "magical free that doesn't lock" assumption in several papers?
There are some ideas in the Rust community (mostly centered around audio), most of them use garbage collection schemes that prevent either the producer or consumer thread from locking. It's tricky and easy to break.