Inside CERN's multi-megawatt data center (home.cern)
84 points by PaulHoule on Feb 10, 2017 | hide | past | favorite | 28 comments


It is interesting to see that they apparently built this as a traditional raised-floor environment without hot/cold aisle separation. Looks like something from 20 years ago. When building datacenters at the multi-megawatt scale (for example, 10-12 kW thermal per 44U cabinet, equivalent to two 208V 30A circuits per cabinet), it's far more efficient to put everything straight on the concrete slab, build hot and cold aisle separation, and run fiber and power trays entirely overhead.
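The cabinet power figure checks out with simple circuit arithmetic (the 80% derating factor is my assumption, the usual continuous-load rule):

```python
# Rough check of the per-cabinet figures above.
volts, amps = 208, 30
derating = 0.8  # assumed continuous-load derating (NEC-style 80% rule)
per_circuit_kw = volts * amps * derating / 1000   # ~5 kW usable per circuit
per_cab_kw = 2 * per_circuit_kw                   # two circuits -> ~10 kW per cab
print(per_circuit_kw, per_cab_kw)
```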

If I had to guess the PM on this was somebody who'd built traditional supercomputer datacenter environments 15-20 years ago for some entries in the top 50 of the top500 list, and scaled up what they knew how to build.

edit: Looking at the photos in more detail, it looks like they've retrofitted hot/cold separation into a traditional 20-years-ago-style raised-floor environment. The fact that there are zero overhead ladder racks carrying AC power (or -48VDC cabling) and zero overhead fiber trays means that everything is under the floor... Very costly and labor-intensive compared to modern methods of building a datacenter. It also means that it's not designed to be changed or modified very often, if ever, and it's a huge forklift upgrade if the power and fiber layer 1 topology ever needs to change.

https://home.cern/sites/home.web.cern.ch/files/image/about_s...


They built it in 2002, which was almost 15 years ago.


And probably designed it quite a few years before that.


Do you have any good links or reading material on datacenter design where you get your claims from?

Datacenter design is something that has fascinated me for a long time, ever since I pretty much solo-rescued a tier 2 that had been in disrepair (it had a Cray in the corner...), but I have often thought about trying to do one from the ground up with new techniques. I mostly just tried to get tours from L3 and other vendors I worked with to get an idea, and kept up with Google's published material, but I find a glaring lack of good citations for claims on aisle design, temp/humidity control standards, etc.


> hot/cold aisle separation

Since I didn't know what that meant and don't know much about datacenter physical design in general, this made for some interesting reading:

https://www.google.com/search?q=datacenter+hot+cold+aisle

The image search was also helpful:

https://www.google.com/search?q=datacenter+hot+cold+aisle&tb...


They long ago hit the limits of that building; they rent space in a DC across the lake, but salaries being too expensive, they built one in Eastern Europe. They probably regret that by now; moreover, the PM of the host country is openly hostile to international orgs. Btw, CERN is recruiting disposable temp technicians through astroturfing. Even looking for a full-stack dev! See other post. Their PR and TV studio(!) is bigger than some experiments.

Now that a grain of salt has been handed over, go ahead and downvote as usual.


Please don't throw around baseless accusations of astroturfing, and don't complain (preemptively or not) about downvotes. Neither of these things contribute towards the kind of discussion that we're trying to have here.


"Some 6000 changes in the database are performed every second."

Things have changed a lot in 15 years!

https://cloud.google.com/bigtable/docs/performance 10k QPS per node...

https://cloud.google.com/bigtable/pdf/FISConsolidatedAuditTr... "2.7 Million FIX messages processed and inserted per second"

Disclosure: I work at Google Cloud.


Yea they talk about using 30 petabytes of data per year. Backblaze can fit that in half a dozen racks these days. I can't even imagine how much Google can squeeze into a modern data center. Storage sure moves quickly!


I wonder when the fixed costs of upgrading will be less than the operating costs of inefficient 15-year-old hardware.


HW is regularly replaced, old kit being thrown at poor nations (obviously they’ll foot the energy bills). It’s a regular Intel shop, mostly with Dell stickers. Truly run-of-the-mill current COTS, what you would expect anywhere else.


I think 6000 was a lot back in 2002... Am I wrong?


Hey Miles!!! Good to see you here on HN! :)


> The Grid runs more than two million jobs per day. At peak rates, 10 gigabytes of data may be transferred from its servers every second.

So that would be 80 Gigabits per second, or the networking capacity of 4 hosts with 20Gbit connections (but let's say you had 3x4=12 hosts to be safe).

On AWS you can get m4.16xlarge, p2.16large, x1.32xlarge or r4.16xlarge hosts with 20Gbit networking.

Those hosts can be had for 10-50k per year. So that works out to 120-600k dollars per year, which doesn't seem like that much in the scheme of things. 120k is less than one developer in the Bay Area!

Of course there would be other costs like networking and storage, but overall it seems like they could save $$$ in the cloud.
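The sizing above, as arithmetic (the per-host prices are the commenter's rough figures, not actual AWS quotes):

```python
# Back-of-the-envelope sizing for the 10 GB/s peak quoted above.
peak_gbit = 10 * 8                 # 10 gigabytes/s -> 80 gigabits/s
hosts_min = peak_gbit // 20        # 20 Gbit NICs -> 4 hosts at minimum
hosts_safe = 3 * hosts_min         # 3x headroom -> 12 hosts
low, high = hosts_safe * 10_000, hosts_safe * 50_000
print(hosts_min, hosts_safe, low, high)  # 4 12 120000 600000
```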


That is just the interconnect. Grid jobs are usually very compute heavy: "[...] includes the majority of the 100,000 processing cores in the CE." That is the expensive part. We bought p2.16xlarge equivalent nodes for just about 1/20 of the "3 Yr All Upfront Reserved" cost of EC2.

Depending on the utilization (i.e. somewhat low utilization) you could save some money in the cloud if you compare it to the usual 5 years depreciation cycle.

But for scientific computing you usually still use your computing nodes longer than the 5 years of depreciation/support contract (often until they are close to fail).

Another reason why cloud is not very attractive is with how budgeting rules work in those old institutions. You propose a cluster and get the money, all calculated from the start. On-demand cloud pricing will be very hard to get approval for.

> [...] other costs like networking and storage [...]

I am currently using the equivalent of 20k dollars in yearly EBS costs on our scientific computing cluster. You can buy a lot of storage for that. And the internal IT department that exists anyway will manage it.

Cloud can be a very hard sell to scientists.

Edit (a point I forgot): Cloud only makes sense if you can use spot market capacity extensively, then it can become very competitive with local installations (again depending on required utilization).
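A toy break-even model for that utilization point (every number here is an illustrative placeholder, not a real CERN or cloud price):

```python
# Buy-vs-rent break-even utilization, with made-up placeholder prices.
buy_price = 20_000                # hypothetical node cost, amortized over...
years = 5                         # ...the 5-year depreciation cycle above
on_prem_per_hour = buy_price / (years * 365 * 24)   # ~0.46/hour, busy or idle
cloud_per_hour = 2.0              # hypothetical on-demand rate, similar node
# Cloud only bills while jobs run, so it wins below this utilization:
break_even = on_prem_per_hour / cloud_per_hour
print(f"cloud cheaper below {break_even:.1%} utilization")
```

The shape of the conclusion matches the comment: unless the cluster sits mostly idle, owning wins, and spot pricing is what moves the cloud line.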


I don't know the exact numbers, but it is regularly evaluated, and they found that it won't save money moving to the cloud (yet).

Apparently, grid jobs have different IO/CPU/Memory characteristics from typical cloud applications. My jobs tend to use a lot of CPU and bandwidth, but are mostly IO bound. A friend did a very, very CPU intensive analysis for his PhD, and they estimated that it would cost $30 million to run it on AWS. I'm not sure where they got that number from, so take it with a grain of salt, but even if they are an order of magnitude off, it is still prohibitive.

Another issue we are facing is RAM consumption. Many scientists are not trained programmers, so there are a lot of memory leaks. It didn't use to matter, an analysis program ran only a few hours and was single-threaded anyway. Now we are moving to using multi-threading, and we have been using multi-processing anyway... And as I understand it, in modern hardware the RAM/CPU ratio is getting lower and lower. If your job or thread needs 8 GB RAM, you can't run many of them on a 32-core CPU...
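The core-stranding effect described here is just division (the 128 GB node size is an assumption for illustration):

```python
# How many 8 GB single-threaded jobs fit on a 32-core node?
cores = 32
node_ram_gb = 128                 # assumed node size for illustration
ram_per_job_gb = 8
jobs = min(cores, node_ram_gb // ram_per_job_gb)
idle_cores = cores - jobs
print(jobs, idle_cores)  # 16 jobs running, 16 cores stranded by RAM
```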

So yeah, I think the main issue is weird resource usage patterns. Not that it would be impossible. Grid computing is basically just a weird parallelly evolved version of cloud computing, after all.


For the ALICE experiment most jobs are IO-limited, although throughput should be improved by Run 3 via backend changes (I hope so at least, currently doing my MSc. thesis on exactly that...).

Simulations are compute heavy, but analysis tasks vary, with the speed of a modern processor future analysis is still likely to be IO/bandwidth limited. The ratio of RAM to core on the test server I have access to is 128/20 so 6.4 GiB per core.

But for (future) analysis the plan is to use, or at least try to use, shared memory and to group users' tasks together based on the required data (they speak of an analysis train with wagons). So a large part of the RAM requirements, namely the backing data, should be shared among the cores. I think ideally there will be a top-level scheduler for each node which tries to minimize the required bandwidth (but that is outside the scope of my work, so who knows how it's implemented). With a few "best practices" it should be possible for most analyses to consume a 'reasonable' amount of memory at any given time in that case.


These pictures are taken on one floor, but just one floor below in this building there is another computer grid, similar to this one but a bit smaller. At that grid, if I remember correctly, more advanced CPU architectures (Intel Xeon, AVX-512, etc.) are operating. On the very same floor there are also robotic arms that access offline/long-term stored hard drives per user request, which always seemed pretty amazing to me. (Disclaimer: I was an intern at CERN.)


Those aren't hard drives; it's HPSS (more specifically, a robotic tape library).


What a curious TLD:

    .cern
I want one of those


Go work for CERN, become a member of the personnel, and then meet these strict requirements: http://nic.cern/registration-policy/ — shouldn't be too hard ;)

Shame they don't let users have a homepage USERNAME.cern, like how universities usually give you a homepage at uni.edu/~USER/index.html (or something).


From the people who invented the World Wide Web and gave it away free to the world.


Awesome. I was a summer student there in 1993, and had the opportunity to visit their data center at the time. What impressed me most was the huge tape robot in the basement.


It was still there :) I was also a summer student, in 2014. I wonder how far back summer studentships go...


Each collision event impinges on thousands of sensors, which record energy, charge, and geometry. A tiny fraction of collisions are deemed potentially interesting and recorded for future analysis. Later, a physicist can propose a particle model and search the records for candidates.


Lots of Sun machines, including in the database racks. Oracle I assume.


When I interned there my room had a pile of dead Sun workstations, maybe something like Sun-3, I don't remember the exact model.


Stunning amounts I can't wrap my head around!



