In finance we say "past performance does not guarantee future returns." Not because we don't believe that, statistically, returns will continue to grow at x rate, but because there is a chance that they won't. If anything, the bias is in favour of things getting better faster, but there is a chance they do not.
This is true because markets are generally efficient; it's very hard to find predictive signals. That is a completely different space from what we're talking about here. Performance is incredibly predictable through scaling laws that continue to hold even at the largest scales we've built.
I agree this is a new space and prediction volatility is much higher. We have evidence going back to at least 2019 that improvements have been exponential (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). The benchmarks are all over the place because improvements don't happen in a straight line. Even composites aren't that useful because the last 10% improvement can require more effort than the first 90%.
To be frank, from what I can see, even if all progress stopped right now, it would take 1-2 decades to fully operationalise the existing potential of LLMs. There would be massive economic and social change. But progress is not stopping, and by some measurements it continues to improve exponentially. I really think this is incredibly transformative. More so than anything humanity has ever experienced. In the last year, OpenAI and potentially Anthropic have been working on recursive self-improvement, meaning these models are designing better versions of themselves. This means we have effectively entered the singularity.
I agree with all of this -- the one nit I'll add is that scaling laws (e.g. Chinchilla -- the classic paper on this, which still holds) are based on next-token log loss on an evaluation set during pretraining. They follow empirically very consistent power-law relationships with compute and data: there is an ideal mixture of compute and data, and the thing you scale is compute at that ideal mixture. So that's all I mean by performance. We do also have, as you observe, benchmark performance trends (measured on the final model, after post-training, RL stages, etc). Those follow less predictable relationships, but it's the pretraining loss that dominates anyway.
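To make the "very consistent power-law" point concrete, here is a toy sketch of the Chinchilla-style parametric loss law, L(N, D) = E + A/N^alpha + B/D^beta, using the approximate constants fitted in the Chinchilla paper (the exact values and the specific model sizes below are illustrative, not a definitive fit):

```python
# Approximate constants from the Chinchilla parametric fit.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def pretraining_loss(n_params: float, n_tokens: float) -> float:
    """Predicted next-token log loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Loss falls smoothly and predictably as model and data scale together:
small = pretraining_loss(1e9, 20e9)     # ~1B params, 20B tokens
large = pretraining_loss(70e9, 1.4e12)  # ~70B params, 1.4T tokens
assert large < small
```

The smoothness of this curve is what makes pretraining performance so much more forecastable than downstream benchmark scores.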
There is an intractable problem in economics, which is how to assign value to a thing. It's probably more of a philosophical debate. One arm of the debate is something called marginal utility. That is, a thing is worth a different amount to different people. I value a particular guitar more than my neighbour does. I think it's worth $100. He thinks it's worth $50. Who is right? Authoritarian nations use top-down economic methods to dictate the value. In this case, they would make the guitar worth $1 (so "everyone can afford a guitar now"). Under capitalism, we allow individuals to negotiate, and the highest bidder determines the price.
Apologies for the long-winded context, but I think it's important to address your point. You're alluding to some kind of intrinsic, self-evident value, and I would like to challenge you on that. Prove to me that this value exists. Prove to me it can be measured. To pre-empt your reply, I don't think you can. You might value a person in a certain way, but you can't ensure everyone else does. In fact you even accept this in your last sentence. A small group of people known to that individual would value them more than the other 8 billion people on the planet. Which is more or less what the person you replied to was explaining.
>Prove to me that this value exists. Prove to me it can be measured. To pre-empt your reply, I don't think you can.
Why does any of this matter? Do you require that a person prove their utility to you before you hold the door open for them? When a child falls and scrapes their knee, do you ask about their grades in school, or their parents' net worth, before lending them a hand?
My point: human society is deeply interwoven with sentimental behaviors that make zero sense in economic theory. You can apply all the models you want to human compassion and they will get you nil.
But that doesn't mean we should optimize it out of societies. I think it's the most wonderful part of our societies, and if we were to remove it, we'd stop being human.
If you're going to make the claim that people hold intrinsic value, people are going to challenge you for proof. Holding a door open for someone and asking questions doesn't necessarily indicate value. It could indicate personal interest. Empathy. Projection. Self-interest. The concept of altruism doesn't necessitate the belief that other life holds value at all. Altruism by its definition is giving without the expectation of return.
I think you make a good point re culture and tradition. Humans like many "valueless" activities. Some of these are hardcoded into our psyche through evolution. Some are for sentimental reasons. Some are religious. Some are enforced. Some are situational. Etc. I am not suggesting we eliminate those. I am simply agreeing with the top comment which is that we cannot force people to place any value on them. Some people do not see value in those traditions (or in other people). There is no objective way to prove them wrong.
>If you're going to make the claim that people hold intrinsic value, people are going to challenge you for proof.
But this is assuming we share the same set of axioms?
It sounds like you don't accept humans having intrinsic value as a core axiom. However, I do, and it makes zero sense to me to try and "prove" such a notion.
> I can see this kind of survival-bias stories distorting the reality. To have millions of people asking for "specific tests" because AI told them seems problematic. One in a million will discover something, and that story will be enough to create the believe that is "worth doing the test that AI says" just in case. But...
This is a competition of public and private interests. A sick individual is going to lobby for tests until they discover the cause. From a public perspective, it might be cheaper to just let them die. AI is an advocate for the individual.
For the record, ChatGPT helped me diagnose a lifelong illness. I'm a new man now thanks to AI. Literally life changing. I had spent decades pleading for tests because no one could figure out the cause. I think a likely outcome here is not necessarily 10,000x more tests performed, but similar or even fewer tests, because the diagnosis success rate with AI is higher. It's not subject to bias. People tend to be more honest and reflective with their AI than they are with doctors. They get 5 minutes to give the entire case to the doctor. With an AI they can spend weeks debating and reflecting. This builds a case history far more detailed and accurate than anything we have in modern medicine today. Amplified by an order of magnitude because the AI can extract meaningful insights from the discussion.
In the very near future our AI will contact our GP for us. Soon after that, our GP will be our AI.
I’m not sure how you can come to the conclusion that AI is an advocate for the individual writ large. It seems that AI can just as easily be used to make algorithmic decisions on who receives care (based on symptoms etc). Whether or not that’s an equalizing influence or not depends on the algorithm, training data, etc.
The models could be designed that way, but we don't have evidence that they have been designed that way today. If that were to occur in future, I'm sure people would seek out impartial models.
> From a public perspective, it might be cheaper to just let them die.
You missed the point. More tests can be detrimental to the patient's health, as they increase the risk of unneeded medication or surgery. Also, many tests, like X-rays, carry their own risks. Doing them for the sake of it increases overall mortality.
So not over-testing is not just cheaper, it's better for people's health.
Yeah I see that there can be a false positive/negative issue too.
For instance, allergy tests have a false positive rate of ~10% and a false negative rate of ~48%. So you really need an MD (or AI) to help tease things out there.
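A quick Bayes' rule sketch shows why those error rates matter so much: with a ~10% false positive rate and a ~48% false negative rate, a positive result on its own is far from conclusive. The 5% prevalence below is a made-up prior purely for illustration:

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# Sensitivity = 1 - false negative rate; specificity = 1 - false positive rate.
p = posterior(prior=0.05, sensitivity=0.52, specificity=0.90)
# Roughly a 21% chance the allergy is real despite the positive result.
```

That gap between "tested positive" and "actually has it" is exactly the tease-it-out work an MD (or AI) has to do.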
But I'll push back here a bit. Taking random tests will of course put you at the mercy of statistics. I think this is where AI will actually really help. The tests it'll have you take are not random, any more than an MD's tests are (okay, maybe a tad more?). Instead, the AI's testing strategy will be broader than an MD's. Combine the experience and physical presence of the MD with the deep 'knowledge' of the AI and I think that centaur is a lot more potent.
100% this. Human psychology is always overlooked in these discussions; people focus on the "perfect technical solution" without considering how humans will actually end up using it. Linux permission schemes are a classic example, with many guides advising users to keep everything as locked down as possible and expand permissions as and when required. After the 100th time of fucking around with chmod, users often give up and just make everything 777. If there were a user-friendly (but imperfect) method (like Windows' UAC), people would actually use it, and be far safer in the long run.
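For concreteness, here are the two end states in question (a minimal illustration assuming GNU coreutils; the directory is just a throwaway example):

```shell
dir=$(mktemp -d)

chmod 700 "$dir"        # locked down, as the guides recommend: owner-only
stat -c '%a' "$dir"     # prints 700

chmod 777 "$dir"        # the "give up" state: world read/write/execute
stat -c '%a' "$dir"     # prints 777

rmdir "$dir"
```

The second command is one keystroke-sequence away and always "works", which is exactly why tired users converge on it.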
Your reply conflates “build more” with “build anything, anywhere, with no standards”, which is not what they wrote. China and Spain are not rebuttals to the basic supply point because both involved distorted credit, speculation, and overbuilding in the wrong places or segments, not healthy increases in broadly useful housing supply. The question is not whether supply is the only variable, but whether more homes, in places people actually want to live, puts downward pressure on prices and rents, and it does. That is just basic scarcity: when demand rises faster than housing stock, prices go up.
Keeping crime low matters too, because people pay a premium for safety, and high-crime areas often face weaker investment and worse long-term housing outcomes. And “cutting red tape” does not mean legalising asbestos or lead pipes, which is a straw man. It means reducing delays, exclusionary zoning, parking mandates, and other rules that limit safe housing production and raise costs for no good reason. Housing is absolutely a complex social problem, but complexity does not erase the role of supply. More safe housing plus safer neighbourhoods will not solve everything, but it is still one of the clearest ways to reduce pressure on rents and prices.
In the next few years it's going to be quicker to tell an AI to make something than it will take to hunt down software which fits all your uses perfectly. If you're honest, all software is imperfect for you. It's not customised exactly how you like it. Imagine if it could be exactly what you want with zero effort.
You need to think outside the box a little. They're not going to need to write a requirements doc from scratch. They'll tell it to copy a piece of software which is already established and make some customisations or improvements based on their needs. This is a few sentences.
I asked it to do some portfolio analysis for me and it created BEAUTIFUL, tabbed, interactive charts UNPROMPTED. This is kind of magical. The charts were not just beautiful, but actually super useful in understanding the data faster. I honestly could not have produced those in a week if you asked me to.
Likewise, it created a couple of visualizations for me that weren't requested but were very useful. That's a nice surprise. My takeaway was essentially the same, in that it would take me much longer to make something comparable. I'll take advantage of this quite a bit, I think. I spend a lot of time visualizing data.
> but hasn't ChatGPT had capability to create graphs and interact with data for a while?
It's pretty bad (for me). I have to use extremely prescriptive language to tell ChatGPT what to create. Even down to the colours in the chart, because otherwise it puts black font on black background (for example). Then I have to specifically tell it to put it in a canvas, and make it interactive, and make it executable in the canvas. Then if I'm lucky I have to hit a "preview" button in the top right and hope it works (it doesn't). I could write several paragraphs telling it to do something like what Claude just demo'd and it wouldn't come close. I'm trying Claude now for financial insights and it's effortless with beautiful UX.
For posterity, Gemini is pretty good with these interactive canvases. Not nearly as good as Claude, but FAR better than ChatGPT.