For CPU-intensive operations the JVM is much better than Erlang or Python, and with the new breed of languages on it I feel more productive than I ever did with Python. Speaking of RTB platforms, the JVM is the other high-level server-side platform that can handle those kinds of loads.
Erlang sounds nice, but I don't like platforms that are too specialized for certain use-cases. I also never understood the "C libraries" argument: relying on C libraries is what made Python non-portable, and it's what will kill it.
I fell into doing RTB (buyer side) with the JVM (Java/Scala) many moons ago, and this was a mistake. The GC isn't cooperative enough to meet the expected latencies in the 95th percentile of requests and above. There were erratic pauses, even after extensive GC tuning. I wouldn't do it again.
After coming to Google and working on the other side of the RTB process for a while... well, I just wouldn't do it in anything other than something like C++ or Rust -- a systems-level language where I can fine-tune everything and not be interrupted by a GC. I just wouldn't mess around -- not only is it important to have low latencies, it's important to have _consistently_ low latencies.
I think that using a manual memory language is fine if that is the engineering decision you want to make.
I'd point out that people can and do use GC'd languages in environments that are dramatically more latency sensitive than RTB. For that matter, in latency sensitive applications allocation/deallocation after startup is a no-no, so GC tends to not be the major reason not to use the JVM (memory layout/unfettered access to system calls/etc tend to drive that decision).
From my experience, Java doesn't really offer a truly realtime-friendly, non-blocking collector that can deliver consistent latencies. There are a million flags you can set to tune the GC in Java, but nothing is going to stop GC pauses from happening entirely.
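For reference, the kind of tuning being described usually amounts to stacking flags like these (illustrative only, and `bidder.jar` is a made-up name; note that `-XX:MaxGCPauseMillis` is a goal the collector tries to meet, not a hard bound, which is exactly the problem):

```sh
# Illustrative G1 tuning flags; MaxGCPauseMillis is a soft pause-time goal,
# so pauses can still exceed it under heavy throughput.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=10 \
     -XX:+ParallelRefProcEnabled \
     -jar bidder.jar
```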
95% of the time the latency is acceptable. The problem is that under heavy throughput the latency spikes periodically. I wish I still had some of my old graphs to show.
I just wouldn't do it again; the time spent futzing around trying to tune the JVM to avoid this would have been better spent writing code in a systems-level language where allocation is predictable.
I was a big booster of the JVM as a mature platform for a lot of things. But for this purpose... nope.
My point is that most truly latency sensitive applications, in GC'd languages or not, don't do any allocation/deallocation on the hot path. Memory management is just too slow.
So GC twiddling is a red herring in those environments, though using manual memory techniques on the JVM may obviate the major reasons to use it in your particular cases. Lots of teams make the engineering decision the other way and use the JVM in usages that require more consistent latencies than RTB requires.
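The "no allocation on the hot path" technique mentioned above is usually done with objects pre-allocated at startup and recycled through a pool. A minimal sketch in Java (the `BidRequest`/`Pool` names are invented for illustration, not from any real RTB codebase):

```java
import java.util.ArrayDeque;

// A mutable, reusable message object; allocated once, reset between uses.
final class BidRequest {
    long auctionId;
    double bidPrice;
    void reset() { auctionId = 0; bidPrice = 0.0; }
}

final class Pool {
    private final ArrayDeque<BidRequest> free = new ArrayDeque<>();

    Pool(int size) {
        // All allocation happens here, at startup, before the hot path runs.
        for (int i = 0; i < size; i++) free.push(new BidRequest());
    }

    BidRequest acquire() {
        BidRequest r = free.poll();
        // Fallback allocation; ideally the pool is sized so this never fires.
        return r != null ? r : new BidRequest();
    }

    void release(BidRequest r) {
        r.reset();
        free.push(r);
    }
}

public class PoolDemo {
    public static void main(String[] args) {
        Pool pool = new Pool(1024);
        // Hot path: no `new`, so the GC has no garbage to collect here.
        BidRequest req = pool.acquire();
        req.auctionId = 42;
        req.bidPrice = 1.25;
        System.out.println(req.auctionId + " " + req.bidPrice);
        pool.release(req);
    }
}
```

The trade-off is that pooling reintroduces manual lifetime management (forgetting to `release` leaks pool capacity), which is part of why some teams conclude they may as well use a manual-memory language outright.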
Well, per the OP, Erlang is pretty nice there, as you don't have the GC pauses you see in most JVMs (Azul being the possible outlier). You get all the benefits of automatic GC without most of the negatives; GC is generally quick, and it is stop-the-process (your other processes keep running on other cores), not stop-the-world.
Yes, I'd definitely consider Erlang for this type of application. Though at the time I was implementing an RTB bidder (2010/2011 timeframe), Erlang still did not offer actual multicore support and was still considered quite exotic (and probably still is), which made pointy-haired types nervous. My negative experiences with RabbitMQ at the time (very unstable, and not actually that performant) also made me leery of using it in production code.
After working on the bidder side of RTB, I then had two jobs where I worked on the other side, _sending_ bid requests, one of which was here at Google. I learned a lot.
Erlang supported multicore well before then; SMP support was enabled by default in R12B and later, released in 2008. The other considerations are valid, though (well, maybe not the RabbitMQ one: when an esoteric language has so few popular apps, it's easy to judge the whole language by a single one).
The biggest weakness of Akka (an Erlang-style actor framework for the JVM) is the GC. If you aren't really careful about memory usage, it's easy to trigger big GC pauses that render your node unreachable to the rest of the cluster.
CPU-intensive operations pose their own problems. EC2 doesn't have amazing context switching, so sometimes Akka's internal dispatchers can't get CPU time quickly enough to keep the cluster heartbeats going, which also causes your node to be marked unreachable by the rest of the cluster. I imagine bare metal would be far easier to work with in this regard.
There are certainly ways to deal with these problems by tuning the GC and Akka's failure detectors, but it's a serious problem with Akka.
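The failure-detector tuning being described uses standard keys from akka-cluster's `reference.conf`; the values below are examples of loosening the detector to ride out GC pauses, not recommendations:

```hocon
akka.cluster.failure-detector {
  # Raise the phi threshold so brief GC pauses are less likely to
  # mark the node as unreachable.
  threshold = 12.0
  # Tolerate longer gaps between heartbeats (e.g. during a pause).
  acceptable-heartbeat-pause = 10s
}
```

Of course, loosening these settings also slows detection of genuinely dead nodes, so it treats the symptom rather than the GC pauses themselves.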