> I think the current ML world will have an overfitting reckoning.

It seems more likely that current ML (at least large language models built on transformers) will simply run out of publicly accessible data to train on: today's models have already scraped most of the useful text and image data on the public Internet, and it's not clear we'll ever gain access to orders of magnitude more. According to the Chinchilla paper, data, not parameter count, is the bottleneck on transformer performance: https://news.ycombinator.com/item?id=32321522
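To make that concrete, here's a minimal sketch of the parametric loss fit from the Chinchilla paper (Hoffmann et al. 2022), using the constants the paper reports. The 1.4T-token corpus size plugged in at the end is an illustrative assumption about how much useful public text exists, not a measurement:

    # Sketch of the Chinchilla scaling law L(N, D) = E + A/N^alpha + B/D^beta.
    # Constants are the fitted values reported in Hoffmann et al. (2022);
    # the fixed 1.4e12-token corpus below is an illustrative assumption.

    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted pretraining loss for N parameters and D training tokens."""
        E, A, B = 1.69, 406.4, 410.7
        alpha, beta = 0.34, 0.28
        return E + A / n_params**alpha + B / n_tokens**beta

    # Compute-optimal training wants roughly 20 tokens per parameter, so a
    # 70B-parameter model already needs ~1.4T tokens. Holding data fixed
    # there and growing the model gives diminishing returns:
    for n in (70e9, 280e9, 1120e9):
        print(f"{n / 1e9:>6.0f}B params: loss ~= {chinchilla_loss(n, 1.4e12):.3f}")

With D held fixed, the A/N^alpha term keeps shrinking as you add parameters, but the B/D^beta term doesn't move, which is exactly why data availability, not model size, becomes the binding constraint.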


