Maybe I'm not understanding what is going on here but it appears the author's client is communicating over a socket on the same machine as the server. The author is seeing insanely high numbers because she/he is bypassing the entire TCP/IP stack.
I believe this test was run locally for simplicity (and so people can reproduce it easily). As you've pointed out, that's a pretty artificial context and can obviously have a big absolute effect on the numbers.
I will say that in my limited experience these numbers are proportionally representative of what you can get in a real-world environment.
Even testing locally, getting concurrency numbers like this is tough or impossible with many (most?) servers.
So, yes, it's an artificial test - but I think the results are interesting when taken in the correct context.
While true, I suspect, given this setup, that some layer is just dropping to a unix socket when it notices localhost.
Most notably, 600k active real TCP connections in the kernel would use somewhere around 6GB of memory, assuming an average of ~10kB of memory per socket for the R/W buffers and other data. EDIT: That's just the server side; double it to include the client sockets.
Most attempts I've seen at this number of real TCP connections required a lot more tweaking of kernel TCP settings to achieve.
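To make the back-of-the-envelope estimate above easy to rerun with other assumptions, here is a rough sketch. The function name and the `both_ends` flag are made up for illustration, and the ~10kB per-socket figure is just the average assumed in the comment, not a measured value.

```python
def kernel_socket_memory_gb(connections, bytes_per_socket=10_000, both_ends=False):
    """Rough estimate of kernel memory used by TCP sockets, in GB (10^9 bytes).

    bytes_per_socket is an assumed average covering R/W buffers and other
    per-socket data; real usage depends on kernel settings and traffic.
    """
    total_bytes = connections * bytes_per_socket
    if both_ends:
        # Client and server on the same machine: each connection has two sockets.
        total_bytes *= 2
    return total_bytes / 1e9


server_only = kernel_socket_memory_gb(600_000)
with_clients = kernel_socket_memory_gb(600_000, both_ends=True)
print(f"server side only: {server_only:.1f} GB")
print(f"server + client:  {with_clients:.1f} GB")
```

With the numbers from the comment this gives 6 GB for the server side and 12 GB once the client sockets on the same box are counted.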
For a large number (1M+) of connections (doing websockets or long polling) to replicate the functionality of a COMET server, I've been playing with rolling my own TCP handling via libnetfilter_queue, as I simply don't need the ~10kB of r/w buffers on each socket.
Linux (as far as I'm aware) doesn't allow tuning of r/w buffer sizes on a per-interface basis. Otherwise I'd have one interface for the COMET server with drastically reduced r/w buffer sizes, and the remaining interfaces with 'normal' TCP r/w buffer sizes to ensure the other things running on that host run without problems.
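As a possible alternative to per-interface tuning, buffer sizes can be requested per socket with the standard `SO_RCVBUF` / `SO_SNDBUF` options. The sketch below only shows the mechanism; the helper name and the 4096-byte request are arbitrary choices for illustration, and the kernel is free to round the value up (Linux doubles the requested size and enforces a minimum), so whether this gets anywhere near small enough for a COMET-style workload is an open question.

```python
import socket


def make_small_buffer_socket(bufsize=4096):
    """Create a TCP socket and request reduced kernel R/W buffers for it.

    bufsize is only a request: the kernel may adjust it, so always read the
    effective value back with getsockopt.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    return sock


sock = make_small_buffer_socket()
# Effective sizes as reported by the kernel (platform dependent).
rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"effective SO_RCVBUF={rcv} bytes, SO_SNDBUF={snd} bytes")
sock.close()
```

Other services on the same host keep the system-wide defaults, since only sockets that set these options are affected.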
It bypasses checksumming, but no, not the whole stack. Although I'm not sure it bypasses checksumming in this case, as the interfaces were created as aliases on a physical interface, which may mean it still does checksumming.
Many OS kernels use their local domain socket codepath/interface for 127.0.0.1 <-> 127.0.0.1 communications, completely bypassing TCP/IP... This is transparent to the application.