I'm not counting Craigslist: its infrastructure is documented and open source, but you can't just download the engine and run it yourself without a lot of glue. The same goes for the Pirate Bay. There are probably more marginal cases like that among the top sites.
And there are open data projects like Stack Exchange and Stack Overflow that publish essentially all their content under open licenses (Creative Commons) and offer free data dumps for download. Both are top 100 sites.
Then there are the news sites available without a paywall. Those are basically blogs with photos, video, and text accessible to the public. Their content isn't open; you have to pay to republish it. But CNN, the Washington Post, HuffPo, and many, many other top sites run essentially no special software that any open source CMS couldn't match.
I'd say the solid majority of the top 100 sites don't rely on any special closed source software. The exceptions are the proprietary search engine projects (Google, Bing, Yahoo), their respective mail and document suites, and the like.