
> all public github code

In RAM.



Some napkin math: as of late 2013, GitHub's main Elasticsearch cluster had ~128 shards of ~120 GB each [1], or roughly 15 TB in total. They obviously have more data now, and it's not clear whether the shard count includes replicas, but at current EC2 on-demand prices [2], holding that much data would cost:

   RAM: 63  r3.8xlarge @ $2.80/hr = $176.40/hr
   SSD:  5  i2.4xlarge @ $3.41/hr = $ 17.05/hr
   HDD:  1 hs1.8xlarge @ $4.60/hr = $  4.60/hr

[1] http://www.elasticsearch.org/case-study/github/

[2] http://aws.amazon.com/ec2/pricing/
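
For anyone who wants to redo the estimate with newer numbers, here is a rough Python sketch of the same calculation. The shard figures and hourly prices are the ones quoted above; the per-instance capacities (244 GiB of RAM for r3.8xlarge, 4 x 800 GB of SSD for i2.4xlarge, 24 x 2 TB of HDD for hs1.8xlarge) are my assumptions from the published instance specs, and the sketch ignores the GB/GiB difference, replicas, and any OS or JVM overhead.

```python
import math

# Figures quoted in the comment: ~128 shards of ~120 GB each.
SHARDS = 128
SHARD_GB = 120

# (tier, instance type, ASSUMED usable capacity in GB, on-demand $/hr).
# Capacities are assumptions: 244 GiB RAM, 4 x 800 GB SSD, 24 x 2 TB HDD.
TIERS = [
    ("RAM", "r3.8xlarge", 244, 2.80),
    ("SSD", "i2.4xlarge", 4 * 800, 3.41),
    ("HDD", "hs1.8xlarge", 24 * 2000, 4.60),
]


def estimate(total_gb, tiers=TIERS):
    """Return (tier, instance count, instance type, $/hr each, $/hr total)."""
    rows = []
    for tier, instance, capacity_gb, price in tiers:
        # Round up: a partially filled instance still costs full price.
        count = math.ceil(total_gb / capacity_gb)
        rows.append((tier, count, instance, price, round(count * price, 2)))
    return rows


total_gb = SHARDS * SHARD_GB  # 15,360 GB, roughly 15 TB
for tier, count, instance, price, hourly in estimate(total_gb):
    print(f"{tier}: {count:2d} {instance:>11} @ ${price:.2f}/hr = ${hourly:6.2f}/hr")
```

Doubling `total_gb` to account for one replica per shard roughly doubles the RAM and SSD rows, while a single HDD instance would still be enough under these assumed capacities.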


We don't even need to keep ALL of it in memory; we would just need to keep everything for the language we are currently developing in, and maybe one or two others.

Say I am working on an Angular/Pyramid application: all I would need loaded is JavaScript and Python.


A hybrid multi-layer cache system would likely be more cost-effective and still perform nearly as well as keeping everything in dedicated RAM (depending on how the data is structured/sharded/queried).
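
To make the idea concrete, here is a toy Python sketch of a two-layer cache in front of a slow backing store. Everything in it is hypothetical: `LRUTier`, `TieredCache`, and `load_from_disk` are names made up for illustration, the "SSD" layer is just another in-memory map standing in for a slower tier, and cached values are assumed never to be `None`. It only shows the promote/demote flow, not how a real search cluster would do it.

```python
from collections import OrderedDict


class LRUTier:
    """Fixed-capacity LRU map standing in for one storage layer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert and return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # least recently used
        return None

    def remove(self, key):
        self.items.pop(key, None)


class TieredCache:
    """Small hot tier (RAM), larger warm tier (SSD), slow backing store."""

    def __init__(self, ram_capacity, ssd_capacity, load_from_disk):
        self.ram = LRUTier(ram_capacity)
        self.ssd = LRUTier(ssd_capacity)
        self.load_from_disk = load_from_disk  # hypothetical slow lookup
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def get(self, key):
        value = self.ram.get(key)
        if value is not None:
            self.hits["ram"] += 1
            return value
        value = self.ssd.get(key)
        if value is not None:
            self.hits["ssd"] += 1
            self.ssd.remove(key)  # promoted, so keep it in one tier only
        else:
            self.hits["disk"] += 1
            value = self.load_from_disk(key)
        demoted = self.ram.put(key, value)
        if demoted is not None:
            self.ssd.put(*demoted)  # whatever falls out of RAM drops to SSD
        return value


cache = TieredCache(2, 4, load_from_disk=lambda k: f"shard-{k}")
for key in [1, 2, 3, 1, 3, 3]:
    cache.get(key)
print(cache.hits)
```

How close this gets to all-RAM performance depends entirely on the hit rate of the hot tier, which is the "how the data is structured/sharded/queried" caveat.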



