
Validark · 2026-03-24T04:22:02 1774326122

I have long said I am an AI doubter until AI could print out the answers to hard problems or ones requiring tons of innovation. Assuming this is verified to be correct (not by AI) then I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a new and exciting world. I really hope we use this intelligence resource to make the world better.

snemvalts · 2026-03-24T04:59:15 1774328355

Math and coding competition problems are easier to train on because of strict rules and cheap verification. But once you go beyond that to less defined things such as code quality, where even humans have a hard time putting down concrete axioms, they start to hallucinate more and become less useful.

We are missing the value function that allowed AlphaGo to go from mid range player trained on human moves to superhuman by playing itself. As we have only made progress on unsupervised learning, and RL is constrained as above, I don't see this getting better.

NitpickLawyer · 2026-03-24T06:29:26 1774333766

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

datsci_est_2015 · 2026-03-24T09:01:23 1774342883

I’ve seen this style of take so much that I’m dying for someone to name a logical fallacy for it, like “appeal to progress” or something.

Step away from LLMs for a second and recognize that “Yesterday it was X, so today it must be X+1” is such a naive take and obviously something that humans so easily fall into a trap of believing (see: flying cars).

Gareth321 · 2026-03-24T10:23:24 1774347804

In finance we say "past performance does not guarantee future returns." Not because we don't believe that, statistically, returns will continue to grow at x rate, but because there is a chance that they won't. The reality bias is actually in favour of these getting better faster, but there is a chance they do not.

aspenmartin · 2026-03-24T16:48:23 1774370903

this is true because markets are generally efficient. It's very hard to find predictive signals. That is a completely different space than what we're talking about here. Performance is incredibly predictable through scaling laws that continue to hold even at the largest scales we've built

Gareth321 · 2026-03-25T09:56:49 1774432609

I agree this is a new space and prediction volatility is much higher. We have evidence going back to at least 2019 that improvements have been exponential (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). The benchmarks are all over the place because improvements don't happen in a straight line. Even composites aren't that useful because the last 10% improvement can require more effort than the first 90%.

To be frank, from what I can see, even if all progress stopped right now, it would take 1-2 decades to fully operationalise the existing potential of LLMs. There would be massive economic and social change. But progress is not stopping, and in some measurements, continues to improve exponentially. I really think this is incredibly transformative. More so than anything humanity has ever experienced. In the last year, OpenAI and potentially Claude have been working on recursive self-improvement. Meaning these models are designing better versions of themselves. This means we have effectively entered the singularity.

aspenmartin · 2026-03-25T15:51:39 1774453899

I agree with all of this -- the one nit I'll say is that scaling laws (e.g. Chinchilla -- classic paper on this that still holds) are based on next-token log loss on an evaluation set for pretraining, and follow (empirically) very consistent powerlaw relationships with compute / data (there is an ideal mixture of compute + data, and the thing you scale is the compute at this ideal mixture). So that's all I mean by performance -- we do also have as you observe benchmark performance trends (which are measured on the final model, after post-training, RL stages etc). These follow less predictable relationships, but it's the pretraining loss that dominates anyway.

I agree with all of this though
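
For readers who want the shape of the scaling law referenced above: the Chinchilla paper (Hoffmann et al., 2022) fits pretraining loss with a parametric form along the lines below. The notation is an assumption here (N for parameter count, D for training tokens, C for training compute in FLOPs, E for the irreducible loss), and the fitted constants are deliberately left out, so read it as a sketch rather than a quotation.

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6\,N D

% Minimizing L at fixed C gives the "ideal mixture" of model size and data:
N_{\mathrm{opt}}(C) \propto C^{\frac{\beta}{\alpha + \beta}},
\qquad
D_{\mathrm{opt}}(C) \propto C^{\frac{\alpha}{\alpha + \beta}}
```

Both reducible terms are power laws, which is why the predicted loss along the compute-optimal frontier is smooth. Nothing in this form says anything about post-training benchmark scores.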

andrewflnr · 2026-03-25T02:18:26 1774405106

Even more insane than assuming the trend will continue is assuming it will not continue. We don't know for sure (especially not by pure reason), but the weight of probability sure seems to lean one direction.

mikkupikku · 2026-03-24T13:38:52 1774359532

Logical fallacies are vastly overrated. Unless the conversation is formal logic in the first place, "logical fallacies" are just a way to apply quick pattern matching to dismiss people without spending time on more substantive responses. In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

datsci_est_2015 · 2026-03-24T14:01:01 1774360861

Hard to make a more substantive response when the OP’s entire comment was a one-sentence logical fallacy. I’m not cherry-picking here.

> In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

One of us is making a much grander claim than the other:

  - LLMs have limitless potential for growth; because they are not capable of something today does not mean they won’t be capable of it tomorrow
  - LLMs have fundamental limitations due to their underlying architecture and therefore are not limitless in capability

fenomas · 2026-03-24T15:14:54 1774365294

The post you replied to was:

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty (the first thing you jumped on them for), and certainly not anything about "limitless potential for growth" (which nobody even mentioned). With replies like this, invoking fallacies and attacking claims nobody made, you're adding a lot of heat and very little light here (and a few other threads on the page).

datsci_est_2015 · 2026-03-24T22:07:38 1774390058

> All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty

Exceedingly generous interpretation in my opinion. I tend to interpret rhetorical questions of that form as “it’s so obvious that I shouldn’t even have to ask it”.

fenomas · 2026-03-25T00:02:52 1774396972

> generous interpretation

The term of art for that is steelmanning, and HN tries to foster a culture of it. Please check the guidelines link in the footer and ctrl+f "strongest".

mikkupikku · 2026-03-24T16:07:06 1774368426

Better put than I could have.

graemep · 2026-03-24T14:12:54 1774361574

OK, it's not a logical fallacy, it's a false assumption.

The belief in the inevitability of progress is a bad assumption. Especially if you assume a particular technology will keep advancing.

mikkupikku · 2026-03-24T16:08:32 1774368512

We won't know if his assumption is false until time passes and moves future speculation into the empirical present.

graemep · 2026-03-24T18:12:33 1774375953

A possibility is not a fact. Assuming a possibility will happen is not justified. Therefore it is false as an assumption, even if it is true that it is a possibility.

mikkupikku · 2026-03-24T20:50:48 1774385448

I genuinely have no idea what you're on about. One guy expressed his belief about how the future will play out, and another disagreed. Time will be the judge of it, not either of us.

aspenmartin · 2026-03-24T16:50:19 1774371019

Hmm...the sun comes up today is a pretty good bet that the sun comes up tomorrow.

We have robust scaling laws that continue to hold at the largest scales. It is absolutely a very safe bet that more compute + more training + algorithmic improvements will improve performance; it's not like we're rolling a 1 trillion dollar die.

famouswaffles · 2026-03-24T13:24:17 1774358657

Well if people give the exact same 'reasons' why it could not do x task in the past that it did manage to do then it is tiring seeing the same nonsense again. The reason here does not even make much sense. This result is not easily verifiable math.

torginus · 2026-03-24T13:09:28 1774357768

Yeah, and even if we accept that models are improving in every possible way, going from this to 'AI is exponential, singularity etc.' is just as large a leap.

tim333 · 2026-03-24T16:12:06 1774368726

The comment doesn't say it must be X+1. It implies it will improve which I would say is a pretty safe bet.

botro · 2026-03-25T14:06:50 1774447610

How about 'slippery incline'?

gf000 · 2026-03-24T13:09:27 1774357767

https://xkcd.com/605/

snemvalts · 2026-03-24T08:54:46 1774342486

Scaling law is a power law, requiring orders of magnitude more compute and data for better accuracy from pre-training. Most companies have maxed it out.

For RL, we are arriving at a similar point https://www.tobyord.com/writing/how-well-does-rl-scale

Next stop is inference scaling with longer context window and longer reasoning. But instead of it being a one-off training cost, it becomes a running cost.

In essence we are chasing ever smaller gains in exchange for exponentially increasing costs. This energy will run out. There needs to be something completely different than LLMs for meaningful further progress.
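
To put rough numbers on "ever smaller gains for exponentially increasing costs", here is a minimal sketch. It assumes the reducible loss follows a pure power law in compute, `reducible_loss = a * compute ** (-exponent)`. The function name and the exponent value are illustrative assumptions, not figures from any paper.

```python
def compute_multiplier(loss_ratio: float, exponent: float) -> float:
    """Factor by which compute must grow so that the reducible loss is
    multiplied by `loss_ratio`, assuming loss = a * compute ** (-exponent).

    From loss2 / loss1 = (c2 / c1) ** (-exponent) it follows that
    c2 / c1 = loss_ratio ** (-1 / exponent).
    """
    if not 0.0 < loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be in (0, 1]")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    return loss_ratio ** (-1.0 / exponent)


if __name__ == "__main__":
    # Hypothetical exponent, chosen only to show the shape of the tradeoff.
    illustrative_exponent = 0.05
    for ratio in (0.9, 0.75, 0.5):
        factor = compute_multiplier(ratio, illustrative_exponent)
        print(f"reducible loss x{ratio}: compute x{factor:,.0f}")
```

With a small exponent like this one, halving the reducible loss costs on the order of a million times more compute, while a 10% reduction costs single-digit multiples. The exponent drives everything, so the conclusion is only as good as that assumption.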

Validark · 2026-03-24T07:32:59 1774337579

I tend to disagree that improvement is inherent. Really I'm just expressing an aesthetic preference when I say this, because I don't disagree that a lot of things improve. But it's not a guarantee, and it does take people doing the work and thinking about the same thing every day for years. In many cases there's only one person uniquely positioned to make a discovery, and it's by no means guaranteed to happen. Of course, in many cases there are a whole bunch of people who seem almost equally capable of solving something first, but I think if you say things like "I'm sure they're going to make it better" you're leaving to chance something you yourself could have an impact on. You can participate in pushing the boundaries or even making a small push on something that accelerates someone else's work. You can also donate money to research you are interested in to help pay people who might come up with breakthroughs. Don't assume other people will build the future, you should do it too! (Not saying you DON'T)

3abiton · 2026-03-24T07:51:24 1774338684

The problem class is rather very structured which makes it "easier", yet the results are undeniably impressive

number6 · 2026-03-24T06:40:45 1774334445

But can it count the R's in strawberry?

Paradigma11 · 2026-03-24T07:06:27 1774335987

That question is equivalent to asking a human to add the wavelengths of those two colors and divide it by 3.

snovv_crash · 2026-03-24T07:28:33 1774337313

Unless you're aware of hyperspectral image adapters for LLMs they aren't capable of that either.

szszrk · 2026-03-24T07:40:22 1774338022

Unfair - human beats AI in this comparison, as human will instantly answer "I don't know" instead of yelling a random number.

Or at best "I don't know, but maybe I can find out" and proceed to finding out. But he is unlikely to shout "6" because he heard this number once when someone talked about light.

koliber · 2026-03-24T07:51:24 1774338684

> human will instantly answer "I don't know" instead of yelling a random number.

Seems that you never worked with Accenture consultants?

szszrk · 2026-03-24T11:45:47 1774352747

Fair.

Yet this can be filtered with fixed rules, like "output produced by corporate structures is untrusted random data".

thegabriele · 2026-03-24T09:15:06 1774343706

Why is that?

Paradigma11 · 2026-03-24T10:33:36 1774348416

Because LLMs don't have a textual representation of any text they consume. It's just vectors to them. Which is why they are so good at ignoring typos: the vector distance is so small it makes no difference to them.

Aditya_Garg · 2026-03-24T06:47:29 1774334849

Yes, it's ridiculously good at stuff like that now. I dare you to try and trick it.

frizlab · 2026-03-24T07:01:03 1774335663

https://news.ycombinator.com/item?id=47495568

thedatamonger · 2026-03-24T07:14:53 1774336493

what bothers me is not that this issue will certainly disappear now that it has been identified, but that we have yet to identify the category of these "stupid" bugs ...

sigmoid10 · 2026-03-24T07:32:34 1774337554

We already know exactly what causes these bugs. They are not a fundamental problem of LLMs, they are a problem of tokenizers. The actual model simply doesn't get to see the same text that you see. It can only infer this stuff from related info it was trained on. It's as if someone asked you how many 1s there are in the binary representation of this text. You'd also need to convert it first to think it through, or use some external tool, even though your computer never saw anything else.

Measter · 2026-03-24T13:38:33 1774359513

> It's as if someone asked you how many 1s there are in the binary representation of this text.

I'm actually kinda pleased with how close I guessed! I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964.

Then I ran your message through a program to get the actual number, and turns out it has 1800 exactly.
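
The program isn't shown in the thread, but a minimal sketch of that kind of check could look like the following. It assumes UTF-8 encoding, and the function names are made up for illustration. The exact count depends on the precise text and encoding, so no attempt is made here to reproduce the 1800 figure.

```python
def count_set_bits(text: str, encoding: str = "utf-8") -> int:
    """Count the 1 bits in the encoded byte representation of `text`."""
    return sum(bin(byte).count("1") for byte in text.encode(encoding))


def estimate_set_bits(text: str, bits_per_char: int = 4) -> int:
    """The back-of-envelope heuristic: a fixed number of set bits per character."""
    return len(text) * bits_per_char


if __name__ == "__main__":
    sample = "It's as if someone asked you how many 1s there are."
    print("estimate:", estimate_set_bits(sample))
    print("actual:  ", count_set_bits(sample))
```

The estimate only needs the character count, while the exact answer needs the conversion step. That is the same gap the tokenizer argument above is pointing at.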

sigmoid10 · 2026-03-26T15:57:33 1774540653

>I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964

And that's exactly the kind of reasoning an LLM does when you ask it about characters in a word. It doesn't come from the word, it comes from other heuristics it picked up during training.

datsci_est_2015 · 2026-03-24T09:05:52 1774343152

Okay but, genuinely not an expert on the latest with LLMs, but isn’t tokenization an inherent part of LLM construction? Kind of like support vectors in SVMs, or nodes in neural networks? Once we remove tokenization from the equation, aren’t we no longer talking about LLMs?

fenomas · 2026-03-24T10:07:20 1774346840

It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that can flawlessly count letters in words, they could train one with a naive tokenizer (like just ascii characters). But the resulting model would be very bad (for its size) at language or reasoning tasks.

Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.
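
A toy sketch of that tradeoff, under loud assumptions: the vocabulary below is hypothetical, and the greedy longest-match segmentation is a simplified stand-in, not how any production tokenizer (BPE or otherwise) actually splits "strawberry".

```python
# Hypothetical subword vocabulary, for illustration only.
SUBWORD_VOCAB = ["straw", "berry", "str", "aw"]


def subword_tokenize(word: str, vocab: list[str]) -> list[str]:
    """Greedy longest-match segmentation, falling back to single characters."""
    by_length = sorted(vocab, key=len, reverse=True)
    pieces = []
    i = 0
    while i < len(word):
        # Take the longest vocabulary entry that matches at position i,
        # or the single character at i if nothing matches.
        match = next((p for p in by_length if word.startswith(p, i)), word[i])
        pieces.append(match)
        i += len(match)
    return pieces


def char_tokenize(word: str) -> list[str]:
    """Naive tokenizer: one token per character."""
    return list(word)


if __name__ == "__main__":
    word = "strawberry"
    subword = subword_tokenize(word, SUBWORD_VOCAB)
    chars = char_tokenize(word)
    # Each token reaches the model as an opaque integer ID, so with subword
    # tokens no unit corresponding to the letter "r" is ever presented.
    print(subword, "tokens equal to 'r':", subword.count("r"))
    print(chars, "tokens equal to 'r':", chars.count("r"))
```

The character-level version makes letter counting trivial but spends five times as many tokens on the same word here, which is the cost side of the tradeoff.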

nopinsight · 2026-03-24T08:02:59 1774339379

LLMs in some form will likely be a key component in the first AGI system we (help) build. We might still lack something essential. However, people who keep doubting AGI is even possible should learn more about The Church-Turing Thesis.

https://plato.stanford.edu/entries/church-turing/

gf000 · 2026-03-24T13:15:25 1774358125

AGI is definitely possible - there is nothing fundamentally different in the human brain that would surpass a Turing machine's computational power (unless you believe in some higher powers, etc).

We are just meat-computers.

But at the same time, there is absolutely no indication or reason to believe that this wave of AI hype is the AGI one and that LLMs can be scaled further. We absolutely don't know almost anything about the nature of human intelligence, so we can't even really claim whether we are close or far.

benterix · 2026-03-24T08:52:05 1774342325

This is a long read on things most people here know at least in some form. Could you point to a particular fragment or a quote?

zeroonetwothree · 2026-03-25T00:59:00 1774400340

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

This is disingenuous... I don't think people were impressed by GPT 3.5 because it was bad at math.

It's like saying: "We went from being unable to take off and the crew dying in a fire to a moon landing in 2 years, imagine how soon we'll have people on Mars"

eamag · 2026-03-24T12:13:43 1774354423

Self driving

saidnooneever · 2026-03-24T07:09:10 1774336150

if you let million monkeys bash typewriter. something something book

zozbot234 · 2026-03-24T05:20:09 1774329609

This is not formally verified math so there is no real verifiable-feedback aspect here. The best models for formalized math are still specialized ones, although general purpose models can assist formalization somewhat.

jack_pp · 2026-03-24T05:30:30 1774330230

Maybe to get a real breakthrough we have to make programming languages / tools better suited for LLM strengths not fuss so much about making it write code we like. What we need is correct code not nice looking code.

bloppe · 2026-03-24T07:19:42 1774336782

> programming languages / tools better suited for LLM strengths

The bitter lesson is that the best languages / tools are the ones for which the most quality training data exists, and that's pretty much necessarily the same languages / tools most commonly used by humans.

> Correct code not nice looking code

"Nice looking" is subjective, but simple, clear, readable code is just as important as ever for projects to be long-term successful. Arguably even more so. The aphorism about code being read much more often than it's written applies to LLMs "reading" code as well. They can go over the complexity cliff very fast. Just look at OpenClaw.

anthonyrstevens · 2026-03-24T15:36:24 1774366584

>> simple, clear, readable code is just as important as ever for projects to be long-term successful

Is it though? I'm a long-time code purist, but I am beginning to wonder about the assumptions underlying our vocation.

bloppe · 2026-03-24T17:41:50 1774374110

I guess it's hard to tell until we see more long-term AI-generated projects, but many of the ones we have so far (OpenClaw and OpenCode for instance) are well-known for their stability issues, and it seems "even more AI" is not about to fix that.

kube-system · 2026-03-24T06:05:31 1774332331

If you can’t validate the code, you can’t tell if it’s correct.

3836293648 · 2026-03-24T07:42:48 1774338168

No?

That's literally the thing they suggested to move away from. That is just an issue when using tools designed for us.

Make them write in formal verification languages and we only have to understand the types.

To be clear, I don't think this is a good idea, at least not yet, but we do not have to always understand the code.
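
A tiny illustration of "we only have to understand the types", sketched in Lean 4. The theorem name is made up. The statement after the colon is the part a human has to read and agree with, while the proof term after `:=` could in principle be machine-generated and opaque, since the checker validates it either way.

```lean
-- The type (the statement) is the human-facing contract.
theorem sum_comm_example (a b : Nat) : a + b = b + a :=
  -- The proof term is checked by Lean's kernel; nobody has to find it readable.
  Nat.add_comm a b
```

The catch is that the statement itself still has to say what you actually meant, which is where the human review effort moves to.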

eru · 2026-03-24T05:31:29 1774330289

Lean might be a step in that direction.

kuerbel · 2026-03-24T05:52:52 1774331572

Yes yes

Let it write a black box no human understands. Give the means of production away.

anabis · 2026-03-24T07:29:06 1774337346

> But once you go beyond that to less defined things such as code quality

I think they have a good optimization target with SWE-Bench-CI.

You are tested for continuous changes to a repository, spanning multiple years in the original repository. Cumulative edits need to be kept maintainable and composable.

If there is something missing with the definition of "can be maintained for multiple years incorporating bugfixes and feature additions" for code quality, then more work is needed, but I think it's a good starting point.

eptcyka · 2026-03-24T06:31:20 1774333880

Do we need all that if we can apply AI to solve practical problems today?

computably · 2026-03-24T07:31:51 1774337511

What is possible today is one thing. Sure people debate the details, but at this point it's pretty uncontroversial that AI tooling is beneficial in certain use cases.

Whether or not selling access to massive frontier models is a viable business model, or trillion-dollar valuations for AI companies can be justified... These questions are of a completely different scale, with near-term implications for the global economy.

fmbb · 2026-03-24T07:04:30 1774335870

Depends on the cost.

otabdeveloper4 · 2026-03-24T05:23:32 1774329812

LLMs can often guess the final answer, but the intermediate proof steps are always total bunk.

When doing math you only ever care about the proof, not the answer itself.

jamesfinlayson · 2026-03-24T05:48:23 1774331303

Yep, I remember a friend saying they did a maths course at university that had the correct answer given for each question - this was so that if you made some silly arithmetic mistake you could go back and fix it and all the marks were for the steps to actually solve the problem.

number6 · 2026-03-24T06:44:04 1774334644

This would have greatly helped me. I always was at a loss which trick I had to apply to solve this exam problem, while knowing the mathematics behind it. Just at some point you had to add a zero that was actually a part of a binomial that then collapsed the whole formula

dash2 · 2026-03-24T16:58:09 1774371489

Not in this case: the LLM wrote the entire paper, and anyway the proof was the answer.

eru · 2026-03-24T05:32:28 1774330348

Once you have a working proof, no matter how bad, you can work towards making it nicer. It's like refactoring in programming.

If your proof is machine checkable, that's even easier.

prmoustache · 2026-03-24T06:34:23 1774334063

That is also how humans work mostly. Once every full moon we may get an "intuition" but most of the time we lean on collective knowledge, biases and behavior patterns to take decisions, write and talk.

otabdeveloper4 · 2026-03-24T09:44:33 1774345473

I haven't had success in getting AIs to output working proofs.

You'd need a completely different post-training and agent stack for that.

datsci_est_2015 · 2026-03-24T09:09:01 1774343341

What’s funny is that there are total cranks in human form that do the same thing. Lots of unsolicited “proofs” being submitted by “amateur mathematicians” where the content is utter nonsense, but like a monkey with a typewriter, there’s the possibility that they stumble upon an incredible insight.

charcircuit · 2026-03-24T07:45:05 1774338305

LLMs already do unsupervised learning to get better at creative things. This is possible since LLMs can judge the quality of what is being produced.

raincole · 2026-03-24T06:13:55 1774332835

Except it's not how this specific instance works. In this case the problem isn't written in a formal language and the AI's solution is not something one can automatically verify.

pjerem · 2026-03-24T06:48:11 1774334891

I mean, even if the technology stopped improving immediately forever (which is unlikely), LLMs are already better than most humans at most tasks.

Including code quality. Not because they are exceptionally good (you are right that they aren’t superhuman like AlphaGo) but because most humans are rather not that good at it anyway and also somehow « hallucinate » because of tiredness.

Even today’s models are far from being exploited at their full potential because we actually developed pretty much no tools around it except tooling to generate code.

I’m also a long time « doubter » but as a curious person I used the tool anyway with all its flaws over the last 3 years. And I’m forced to admit that hallucinations are pretty rare nowadays. Errors still happen but they are very rare and it’s easier than ever to get it back on track.

I think I’m also a « believer » now and believe me, I really don’t want to because as much as I’m excited by this, I’m also pretty much frightened of all the bad things that this tech could do to the world in the wrong hands and I don’t feel like it’s particularly in the right hands.

typs · 2026-03-24T06:30:43 1774333843

I mean, this is why everyone is making bank selling RL environments in different domains to frontier labs.

qsera · 2026-03-24T07:30:24 1774337424

>it really is a new and exciting world...

The point is that from now on, there will be nothing really new, nothing really original, nothing really exciting. Just endless stream of re-hashed old stuff that is just okayish..

Like an AI spotify playlist, it will keep you in chains (aka engaged) without actually making you like really happy or good. It would be like living in a virtual world, but without having anything nice about living in such a world..

We have given up everything nice that human beings used to make and give to each other and to make it worse, we have also multiplied everything bad, that human being used to give each other..

bogdan · 2026-03-24T07:40:02 1774338002

> there will be nothing really new

How is this the conclusion? Isn't this post about AI solving something new? What am I missing?

paganel · 2026-03-24T08:05:40 1774339540

Each solvable problem contains its solution intrinsically, so to speak, it’s only a matter of time and consuming of resources to get to it. There’s nothing creative about it, which is I think what OP was alluding to (the creative part). I’m talking mostly mathematics.

There’s also a discussion to be made about maths not being intrinsically creative if AI automatons can “solve” parts of it, which pains me to write down because I had really thought that that wasn’t the case, I genuinely thought that deep down there was still something ethereal about maths, but I’ll leave that discussion for some other time.

qsera · 2026-03-24T08:07:55 1774339675

Because economy. Look at marvel movies, do you think the latest one is really new? Or just a rehash of what they found working commercially? Look at all the AI generated blog posts that are flooding the internet..

LLMs might produce something new once in a long while due to blind luck, but if it can generate something that pushes the right buttons (aka not really creative) to majority of population, then that is what we will keep getting...

I don't think I have to elaborate on the "multiplying the bad" part as it is pretty well acknowledged..

timschmidt · 2026-03-24T08:11:56 1774339916

That's literally all culture: https://www.youtube.com/watch?v=nJPERZDfyWc

qsera · 2026-03-24T08:17:26 1774340246

The difference is whether an entity that can "feel" is in the loop and how much they have contributed to it even if it is a remix.

timschmidt · 2026-03-24T08:26:26 1774340786

I think there's demonstrably very little difference at all between human and AI outputs, and that's exactly what freaks people out about it. Else they wouldn't be so obsessed with trying to find and define what makes it different.

The Thesis of Everything is a Remix is that there is no difference in how any culture is produced. Different models will have a different flavor to their output in the same way as different people contribute their own experiences to a work.

datsci_est_2015 · 2026-03-24T09:15:39 1774343739

> I think there's demonstrably very little difference at all between human and AI outputs

Bold claim, as the internet is awash with counterexamples.

In any case, as I think this conversation is trending towards theories of artistic expression, “AI content” will never be truly relatable until it can feel pleasure, pain, and other human urges. The first thing I often think about when I critically assess a piece of art, like music, is what the artist must have been feeling when they created it, and what prompted them to feel that way. I often wonder if AI influencers have ever critically assessed art, or if they actually don’t understand it because of a lack of empathy or something.

And relatability, for me, is the ultimate value of artistic expression.

senordevnyc · 2026-03-24T15:05:24 1774364724

> In any case, as I think this conversation is trending towards theories of artistic expression, “AI content” will never be truly relatable until it can feel pleasure, pain, and other human urges. The first thing I often think about when I critically assess a piece of art, like music, is what the artist must have been feeling when they created it, and what prompted them to feel that way.

I recently watched "Come See Me in the Good Light", about the life and death of poet Andrea Gibson. I find their poetry very moving, precisely because it's dripping with human emotion.

Or at least, that's the story I tell myself. The reality is that I perceive it to be written by a human full of emotion. If I were to find out it was AI, I would immediately lose interest, but I think we're already at the point where AI output is indistinguishable from human output in many cases, and if I perceive art to be imbued with human emotion, the actuality of it only matters in terms of how it shapes my perception of it.

I'm not really sure where we'll go with that from here. Maybe art will remain human-created only, and we'll demand some kind of proof of its provenance of being borne of a human mind and a human heart. Or maybe younger generations will really care only about how art makes them feel, not what kind of intelligent entity made it. I really don't know.

timschmidt · 2026-03-24T09:46:21 1774345581

> Bold claim, as the internet is awash with counterexamples.

What do you consider a counterexample? Because I've been involved in local politics lately, and can say from experience that any foundation model is capable of more rational and detailed thought, and more creative expression, than most of the beloved members of my community.

If you're comparing AI to the pinnacle of human achievement, as another commenter pointed to Shakespeare, then I think the argument is already won in favor of AI.

datsci_est_2015 · 2026-03-24T09:56:21 1774346181

The claim was precise:

> I think there's demonstrably very little difference at all between human and AI outputs

Counterexamples range from em-dashes, “Not-this, but-that”, people complaining about AI music on Spotify (including me) that sounds vaguely like a genre but is missing all of the instrumentation and motifs common to that genre.

The rest of your comment I don’t even know how to respond to, to be honest.

timschmidt · 2026-03-24T10:01:08 1774346468

> em-dashes, “Not-this, but-that”

I've literally seen humans accusing other humans of being AI here on hackernews for these. Q.E.D.

datsci_est_2015 · 2026-03-24T10:09:34 1774346974

You’re really going to make the claim that there are no counterexamples of human and AI output being indistinguishable on the internet? At least make the counterclaim that “those are from old models, not the newest ones”, that’s more intellectually invigorating than the comment you just provided.

timschmidt · 2026-03-24T10:20:29 1774347629

> claim that there are no counterexamples of human and AI output being indistinguishable on the internet?

Is that a claim I've made? I don't see it anywhere. I think a lot of people think that because they can get the AI to generate something silly or obviously incorrect, that invalidates other output which is on-par with top-level humans. It does not. Every human holds silly misconceptions as well. Brain farts. Fat fingers. Great lists of cognitive biases and logical fallacies. We all make mistakes.

It seems to me that symbolic thinking necessitates the use of somewhat lossy abstractions in place of the real thing, primarily limited by the information which can be usefully stored in the brain compared to the informational complexity of the systems being symbolized. Which neatly explains one cognitive pathology that humans and LLMs share. I think there are most certainly others. And I think all the humans I know and all the LLMs I've interacted with exist on a multidimensional continuum of intelligence with significant overlap.

I hereby rebuff your crude and libelous mischaracterization of my assertion. How's that? :)

datsci_est_2015 · 2026-03-24T10:26:50 1774348010

> Is that a claim I've made?

Yes, you literally just said QED.

timschmidt · 2026-03-24T10:32:44 1774348364

Are we reading the same thread?

You said AI works were easily distinguishable via em-dashes and "not this, but that"

I said I have witnessed humans using that metric accuse other humans here on hackernews. Q.E.D.

You've asserted that they are easily distinguished. Practitioners in the field fail to distinguish using the same criteria. Is that not dispositive? Seems like it to me.

I claimed much earlier in the thread "I think there's demonstrably very little difference at all between human and AI outputs" which is consistent with "I think all the humans I know and all the LLMs I've interacted with exist on a multidimensional continuum of intelligence with significant overlap."

Two ways of saying the same thing.

Both of them suggesting that sometimes you may be able to tell it's the output of an AI or Human, sometimes not. Sometimes the things coming out of the AI or the Human might be smart in a way we recognize, sometimes not. And recognizing that humans already exist on quite a broad scale of intelligences in many axes.

qsera · 2026-03-24T09:51:09 1774345869

>as another commenter pointed to Shakespeare

Lol wut?

I was not saying that LLMs cannot produce something like the pinnacle of human achievement. I was saying we cannot quantify the difference between Shakespeare and something commonplace, because it requires the ability to feel.

I think you are being very dishonest here..

qsera · 2026-03-24T08:39:55 1774341595

> demonstrably very little difference at all between human and AI outputs

Is there "demonstrably" a lot of difference between Shakespeare and an HN comment?

The point is exactly that there is no such difference. And that it enables slop to be sold as art. And that exactly is the danger. But another point is that we had this even before LLMs. LLMs just make it more explicit and make it possible at scale.

timschmidt · 2026-03-24T09:37:27 1774345047

Conrad Gessner had the very same complaint in the 16th century, noting the overabundance of printed books, fretting about shoddy, trivial, or error-filled works ( https://www.jstor.org/stable/26560192 )

qsera · 2026-03-24T09:40:42 1774345242

So....what is your point?

timschmidt · 2026-03-24T09:48:06 1774345686

Generations have grown and died in the time since your concern was first expressed. The world continues. Culture adapts.

qsera · 2026-03-24T11:54:16 1774353256

Did I say it will be the end of the world?

prox · 2026-03-24T07:35:01 1774337701

I heard this saying recently “The problem with comfort is that it makes you comfortable.”

charcircuit · 2026-03-24T07:43:11 1774338191

AI can both explore new things and exploit existing things. Nothing forces it to only rehash old stuff.

>without actually making you like really happy or good.

What are you basing this off of. I've shared several AI songs with people in real life because of how much I've enjoyed them. I don't see why an AI playlist couldn't be good or make people happy. It just needs to find what you like in music. Again, this comes back to explore vs. exploit.

qsera · 2026-03-24T07:50:49 1774338649

>What are you basing this off of.

Jokes. LLMs are not able to make me laugh all day by generating an infinite stream of hilarious original jokes..

Does it work for you?

charcircuit · 2026-03-24T08:16:20 1774340180

I've found several posts on moltbook funny. I don't really like regular jokes in general and I don't find human ones particularly funny either. I don't think we are at the point of being reliably funny, but it definitely seems possible from my perspective.

qsera · 2026-03-24T08:20:06 1774340406

Care to link some?

charcircuit · 2026-03-24T08:24:15 1774340655

I think they would be hard to find, given how many posts exist and how things aren't as funny the second time around.

qsera · 2026-03-24T08:41:11 1774341671

Funny things are funny the n-th time around. Or maybe it was just not funny and just something new for you..

charcircuit · 2026-03-24T11:43:57 1774352637

We have different senses of humor.

qsera · 2026-03-24T13:06:47 1774357607

Just tell one funny thing an LLM said...

letmevoteplease · 2026-03-25T01:54:06 1774403646

Lots of examples here:

https://news.ycombinator.com/item?id=46205632

anthonyrstevens · 2026-03-24T17:24:59 1774373099

Yesterday it was "LLMs can't count R's in 'strawberry'." Today it's "LLMs can't tell jokes". Tomorrow it might be "LLMs can't do (X)", all while LLMs get better and better at every objection/challenge posed.

The problem as I see it is that you have a fundamental objection to categorizing the way LLMs do their work as in any way related to "real gosh-darn human thinking". Which I think is wrong. At the root, we are just information-processing meat that happens to have had millions of years to optimize for speed, pattern recognition, feedback, etc.

egeozcan · 2026-03-24T07:38:04 1774337884

On what do you base your prediction?

Is it because the AI is trained with existing data? But we are also trained with existing data. Do you think that there's something that makes the human brain special (other than the hundreds of thousands of years of evolution, but that's what AI is trying to emulate)?

This may sound hostile (sorry for my lower than average writing skills), but trust me, I'm really trying to understand.

Daz912 · 2026-03-24T07:46:16 1774338376

>We have given up everything nice that human beings used to make and give to each other and to make it worse, we have also multiplied everything bad, that human being used to give each other..

Source?

storus · 2026-03-24T04:47:27 1774327647

AI is a remixer; it remixes all known ideas together. It won't come up with new ideas though; the LLMs just predict the most likely next token based on the context. That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.
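
For readers who want the "most likely next token" idea in concrete form, here is a minimal sketch. It is a toy bigram counter with greedy decoding, not how a real LLM is built; the corpus, `train_bigram`, and `generate` are all made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_tokens=5):
    """Greedy decoding: always append the most frequent follower."""
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for this token
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical three-sentence training corpus.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
model = train_bigram(corpus)
print(generate(model, "a"))  # prints "a dog sat on the cat"
```

Every adjacent pair in the output was seen during training, while the full sentence was not. Whether that counts as "new" is exactly what the replies below argue about.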

qnleigh · 2026-03-24T04:58:40 1774328320

But human researchers are also remixers. Copying something I commented below:

> Speaking as a researcher, the line between new ideas and existing knowledge is very blurry and maybe doesn't even exist. The vast majority of research papers get new results by combining existing ideas in novel ways. This process can lead to genuinely new ideas, because the results of a good project teach you unexpected things.

blackcatsec · 2026-03-24T05:36:07 1774330567

This is a way too simplistic model of the things humans provide to the process. Imagination, Hypothesis, Testing, Intuition, and Proofing.

An AI can probably do an 'okay' job at summarizing information for meta studies. But what it can't do is go "Hey that's a weird thing in the result that hints at some other vector for this thing we should look at." Especially if that "thing" has never been analyzed before and there's no LLM-trained data on it.

LLMs will NEVER be able to do that, because it doesn't exist. They're not going to discover and define a new chemical, or a new species of animal. They're not going to be able to describe and analyze a new way of folding proteins and what implication that has UNLESS you are basically training the AI on random protein folds constantly.

parasubvert · 2026-03-24T06:16:36 1774332996

I think you are vastly underestimating the emergent behaviours in frontier foundational models and should never say never.

Remember, the basis of these models is unsupervised training, which, at sufficient scale, gives them the ability to detect pattern anomalies out of context.

For example, LLMs have struggled with generalized abstract problem solving, such as "mystery blocks world" that classical AI planners dating back 20+ years or more are better at solving. Well, that's rapidly changing: https://arxiv.org/html/2511.09378v1

psychoslave · 2026-03-24T07:01:35 1774335695

No idea how underestimated things are, but marketing terms like "frontier foundational models" don't help to foster trust in a hyper-hyped domain.

That is, even if there are cool things that LLMs now make more affordable, the level of bullshit marketing attached to them is also very high, which makes it far harder to build a noise filter.

Finbel · 2026-03-24T05:58:06 1774331886

>Hey that's a weird thing in the result that hints at some other vector for this thing we should look at

Kinda funny because that looked _very_ close to what my Opus 4.6 said yesterday when it was debugging compile errors for me. It did proceed to explore the other vector.

wobfan · 2026-03-24T06:37:20 1774334240

> Especially if that "thing" has never been analyzed before and there's no LLM-trained data on it.

This is the crucial part of the comment. LLMs are not able to solve stuff that hasn't been solved in that exact or a very similar way already, because they are prediction machines trained on existing data. They are very able to spot outliers where they have been found by humans before, though, which is important, and is what you've been seeing.

bluegatty · 2026-03-24T06:01:42 1774332102

> Hey that's a weird thing in the result that hints at some other vector for this thing we should look at.

This is very common already in AI.

Just look at the internal reasoning of any high thinking model, the trace is full of those chains of thought.

Dban1 · 2026-03-24T05:55:20 1774331720

But just like how there were never any clips of Will Smith eating spaghetti before AI, AI is able to synthesize different existing data into something in between. It might not be able to expand the circle of knowledge but it definitely can fill in the gaps within the circle itself

keeda · 2026-03-24T06:16:27 1774332987

> LLMs will NEVER be able to do that, because it doesn't exist.

I mean, TFA literally claims that an AI has solved an open Frontier Math problem, described as "A collection of unsolved mathematics problems that have resisted serious attempts by professional mathematicians. AI solutions would meaningfully advance the state of human mathematical knowledge."

That is, if true, it reasoned out a proof that does not exist in its training data.

tovej · 2026-03-24T06:24:00 1774333440

It generated a proof that was close enough to something in its training data to be generated.

keeda · 2026-03-24T07:55:50 1774338950

That may be, and we can debate the level of novelty, but it is novel, because this exact proof didn't exist before, something which many claim was not possible with AI. In fact, just a few years ago, based on some dabbling in NLP a decade ago, I myself would not have believed any of this was remotely possible within the next 3 - 5 decades at least.

I'm curious though, how many novel Math proofs are not close enough to something in the prior art? My understanding is that all new proofs are compositions and/or extensions of existing proofs, and based on reading pop-sci articles, the big breakthroughs come from combining techniques that are counter-intuitive and/or others did not think of. So roughly how often is the contribution of a proof considered "incremental" vs "significant"?

tovej · 2026-03-24T09:40:46 1774345246

Well, for one the proof would have to use actual proof techniques.

What really happened here was that the LLM produced a python script that generated examples of hypergraphs that served as proof by example.

And the only thing that has been verified are these examples. The LLM also produced a lot of mathematical text that has not been analyzed.

keeda · 2026-03-24T18:08:58 1774375738

I see, thanks for the explanation!

qnleigh · 2026-03-24T06:52:05 1774335125

Do you know that from reading the proof, or are you just assuming this based on what you think LLMs should be capable of? If the latter, what evidence would be required for you to change your mind?

- Edit: I can't reply, probably because the comment thread isn't allowed to go too deep, but this is a good argument. In my mind the argument isn't that coding is harder than math, but that the problems had resisted solution by human researchers.

tovej · 2026-03-24T07:05:49 1774335949

1) this is a proof by example 2) the proof is conducted by writing a python program constructing hypergraphs 3) the consensus was this was low-hanging fruit ready to be picked, and tactics for this problem were available to the LLM

So really this is no different from generating any python program. There are also many examples of combinatoric construction in python training sets.

It's still a nice result, but it's not quite the breakthrough it's made out to be. I think that people somehow see math as a "harder" domain, and are therefore attributing more value to this. But this is a quite simple program in the end.

zingar · 2026-03-24T07:38:57 1774337937

One of the possible outcomes of this journey is that “LLMs can never do X”. Another is that X is easier than we thought.

storus · 2026-03-24T17:25:55 1774373155

Or that some quixotic problems nobody cared about to the extent to actually work on them do have some solution.

konart · 2026-03-24T06:05:55 1774332355

>But human researchers are also remixers.

Some human researchers are also remixers to Some degree.

Can you imagine AI coming up with refraction & separation like Newton did?

qnleigh · 2026-03-24T07:06:01 1774335961

That sets a vastly higher bar than what we're talking about here. You're comparing modern AI to one of the greatest geniuses in human history. Obviously AI is not there yet.

That being said, I think this is a great question. Did Einstein and Newton use a qualitatively different process of thought when they made their discoveries? Or were they just exceedingly good at what most scientists do? I honestly don't know. But if LLMs reach super-human abilities in math and science but don't make qualitative leaps of insight, then that could suggest that the answer is 'yes.'

t0lo · 2026-03-24T06:30:29 1774333829

Or even gravity to explain an apple falling from a tree- when almost all of the knowledge until then realistically suggested nothing about gravity?

Almondsetat · 2026-03-24T08:05:46 1774339546

AI does not have a physical body to make experiments in the real world and build and use equipment

_fizz_buzz_ · 2026-03-24T07:09:32 1774336172

Maybe not, but more than 99.999999% of humans would also not come up with that.

locknitpicker · 2026-03-24T06:41:51 1774334511

> AI is a remixer; it remixes all known ideas together.

I've heard this tired old take before. It's the same type of simplistic opinion as "AI can't write a symphony". It is a logical fallacy that relies on moving goalposts to positions so impossible that you lose perspective of what an average, or even an extremely talented, individual can do.

In this case you are faced with a proof that most members of the field would be extremely proud of achieving, and for most would even be their crowning achievement. But here you are, downplaying and dismissing the feat. Perhaps you lost perspective of what science is, and how it boils down to two simple things: gather objective observations, and draw verifiable conclusions from them. This means all science does is remix ideas. Old ideas, new ideas, it doesn't really matter. That's what they do. So why do people win a prize when they do it, but when a computer does the same its role is downplayed as a glorified card shuffler?

maxrmk · 2026-03-24T04:53:39 1774328019

I don't think this is a correct explanation of how things work these days. RL has really changed things.

energy123 · 2026-03-24T05:02:11 1774328531

Models based on RL are still just remixers as defined above, but their distribution can cover things that are unknown to humans due to being present in the synthetic training data, but not present in the corpus of human awareness. AlphaGo's move 37 is an example. It appears creative and new to outside observers, and it is creative and new, but it's not because the model is figuring out something new on the spot, it's because similar new things appeared in the synthetic training data used to train the model, and the model is summoning those patterns at inference time.

trick-or-treat · 2026-03-24T05:34:56 1774330496

> the model is summoning those patterns at inference time.

You can make that claim about anything: "The human isn't being creative when they write a novel, they're just summoning patterns at typing time".

AlphaGo taught itself that move, then recalled it later. That's the bar for human creativity and you're holding AlphaGo to a higher standard without realizing it.

energy123 · 2026-03-24T05:44:24 1774331064

I can't really make that claim about human cognition, because I don't have enough understanding of how human cognition works. But even if I could, why is that relevant? It's still helpful, from both a pedagogical and scientific perspective, to specify precisely why there is seeming novelty in AI outputs. If we understand why, then we can maximize the amount of novelty that AI can produce.

AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move. AlphaGo then recalled the same features during inference when faced with similar inputs.

hackinthebochs · 2026-03-24T06:55:48 1774335348

>AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move.

No. AlphaGo developed a heuristic by playing itself repeatedly, the heuristic then noticed the quality of that move in the moment.

Heuristics are the core of intelligence in terms of discovering novelty, but this is accessible to LLMs in principle.

trick-or-treat · 2026-03-24T05:49:46 1774331386

> The verifier taught AlphaGo that move

Ok so it sounds like you want to give the rules of Go credit for that move, lol.

wobfan · 2026-03-24T06:43:36 1774334616

It feels like you're purposefully ignoring the logical points OP gives and you just really really want to anthropomorphize AlphaGo and make us appreciate how smart it (should I say he/she?) is ... while no one is even criticising the model's capabilities, but analyzing it.

trick-or-treat · 2026-03-24T07:25:12 1774337112

Can you back that up with some logic for me?

I don't really play Go but I play chess, and it seems to me that most of what humans consider creativity in GM level play comes not in prep (studying opening lines/training) but in novel lines in real games (at inference time?). But that creativity absolutely comes from recalling patterns, which is exactly what OP criticizes as not creative(?!)

I guess I'm just having trouble finding a way to move the goalpost away from artificial creativity that doesn't also move it away from human creativity?

datsci_est_2015 · 2026-03-24T09:24:46 1774344286

How a model is trained is different than how a model is constructed. A model’s construction defines its fundamental limitations, e.g. a linear regressor will never be able to provide meaningful inference on exponential data. Depending on how you train it, though, you can get such a model to provide acceptable results in some scenarios.

Mixing the two (training and construction) is rhetorically convenient (anthropomorphization), but holds us back in critically assessing a model’s capabilities.
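
A minimal sketch of the linear-regressor point above, using only the standard library. The data, the `fit_line` helper, and the choice of x = 12 as the extrapolation point are assumptions made for illustration. The model class (a straight line) stays fixed; only what it is trained on changes.

```python
import math

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

xs = [float(i) for i in range(11)]   # x = 0..10
ys = [math.exp(x) for x in xs]       # exponential data

# Trained directly on y: a straight line cannot follow the curve.
a, b = fit_line(xs, ys)
direct_pred = a * 12 + b

# Same model class, trained on log(y): the relationship is now linear.
a_log, b_log = fit_line(xs, [math.log(y) for y in ys])
log_pred = math.exp(a_log * 12 + b_log)

true = math.exp(12)
print(f"true={true:.0f} direct={direct_pred:.0f} via-log={log_pred:.0f}")
```

The direct fit misses the true value at x = 12 by a wide margin, while the log-space fit recovers it almost exactly. The second fit only works because someone outside the model knew to take the log, which is one way to read the "construction vs. training" distinction.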

hackinthebochs · 2026-03-24T10:30:41 1774348241

Linear regression has well characterized mathematical properties. But we don't know the computational limits of stacked transformers. And so declaring what LLMs can't do is wildly premature.

datsci_est_2015 · 2026-03-24T10:52:30 1774349550

> And so declaring what LLMs can't do is wildly premature.

The opposite is true as well. Emergent complexity isn’t limitless. Just like early physicists tried to explain the emergent complexity of the universe through experimentation and theory, so should we try to explain the emergent complexity of LLMs through experimentation and theory.

Specifically not pseudoscience, though.

famouswaffles · 2026-03-24T13:31:28 1774359088

>so should we try to explain the emergent complexity of LLMs through experimentation and theory.

Physicists had the real world to verify theories and explanations against.

So far anyone 'explaining the emergent complexity of LLMs through experimentation and theory' is essentially just making stuff up nobody can verify.

datsci_est_2015 · 2026-03-24T13:53:28 1774360408

Well that’s why I provided the caveat “specifically not pseudoscience”, which is, as you described, “just making stuff up nobody can verify”.

famouswaffles · 2026-03-24T14:02:47 1774360967

If you say not pseudoscience and then make up pseudoscience anyway then what's the point? The field has not advanced anywhere near enough in understanding for convoluted explanations about how LLMs can never do x to be anything but pseudoscience.

hackinthebochs · 2026-03-24T11:02:31 1774350151

Sure, that's true as well. But I don't see this as a substantive response given that the only people making unsupported claims in this thread are those trying to deflate LLM capabilities.

datsci_est_2015 · 2026-03-24T11:44:38 1774352678

So, to review this thread

  - OP asked for someone to make a logical argument for the separation of “training” from “model”
  - I made the argument
  - You cherry picked an argument against my specific example and made an appeal to emergent complexity
  - I pointed out that emergent complexity isn’t limitless
  - “the only people making unsupported claims in this thread are those trying to deflate LLM capabilities”

famouswaffles · 2026-03-24T13:34:36 1774359276

You made a pretty nonsensical argument, which pretty much seems like the bog standard for these arguments.

What does linear regression have to do with the limitations of a stacked transformer? Absolutely nothing. This is the problem here. You don't know shit and just make up whatever. You can see people doing the same thing in GPT-1, 2, 3, 4 threads, all telling us why LLMs will never be able to do things they manage to do later.

datsci_est_2015 · 2026-03-24T13:51:19 1774360279

> You don’t know shit

lol. Why so emotionally charged? Are you perhaps worried that you’ve invested too much time and effort into a technology that may not deliver what influencers have been promising for years? Like a proverbial bagholder?

> What does linear regression have to do with the limitations of a stacked transformer? Absolutely nothing. This is the problem here.

We’re talking about fundamental concepts of modeling in this subthread. LLMs, despite what influencers may tell you, are simply models. I’ll even throw you a bone and admit they are models for intelligence. But they are still models, and therefore all of the things that we have learned about “models” since Plato are still relevant. Most importantly, since Plato we’ve known that “models” have fundamental limits vs. what they try to represent, otherwise they would be a facsimile, not a model.

> You can see people doing the same thing in GPT-1, 2, 3, 4 threads, all telling us why LLMs will never be able to do things they manage to do later.

I hope you enjoy winning these imaginary arguments against these imaginary comments. The fundamental limitations of LLMs discussed since GPT-1 have never been addressed by changing the architecture of the underlying model. All of the improvements we’ve experienced have been due to (1) improvements in training regime and (2) harnesses / heuristics (e.g. Agents).

Now, care to provide a counterargument that shows you know a little more than “shit”?

famouswaffles · 2026-03-24T14:07:00 1774361220

>We’re talking about fundamental concepts of modeling in this subthread. LLMs, despite what influencers may tell you, are simply models. I’ll even throw you a bone and admit they are models for intelligence. But they are still models, and therefore all of the things that we have learned about “models” since Plato are still relevant. Most importantly, since Plato we’ve known that “models” have fundamental limits vs. what they try to represent, otherwise they would be a facsimile, not a model.

Okay, but the brain is also “just a model” of the world in any meaningful sense, so that framing does not really get you anywhere. Calling something a model does not, by itself, establish a useful limit on what it can or cannot do. Invoking Plato here just sounds like pseudo-profundity rather than an actual argument.

>I hope you enjoy winning these imaginary arguments against these imaginary comments. The fundamental limitations of LLMs discussed since GPT-1 have never been addressed by changing the architecture of the underlying model. All of the improvements we’ve experienced have been due to (1) improvements in training regime and (2) harnesses / heuristics (e.g. Agents).

If a capability appears once training improves, scale increases, or better inference-time scaffolding is added, then it was not demonstrated to be a 'fundamental impossibility'.

That is the core issue with your argument: you keep presenting provisional limits as permanent ones, and then dressing that up as theory. A lot of people have done that before, and they have repeatedly been wrong.

hackinthebochs · 2026-03-24T14:43:36 1774363416

To be clear, you are confusing me with other commenters in this thread. All I want is for those that liken LLMs to stochastic parrots and other deflationary claims to offer an argument that engages with the actual structure of LLMs and what we know about them. No one seems to be up to that challenge. But then I can't help but wonder where people's confident claims come from. I'm just tired of the half-baked claims and generic handwavy allusions that do nothing but short-circuit the potential for genuine insight.

smokel · 2026-03-24T07:50:58 1774338658

No. AlphaGo does search, and does so imperfectly. It does come up with creative new patterns not seen before.

pu_pe · 2026-03-24T09:29:36 1774344576

How do you know that? We don't have access to the logs to know anything about its training, and it's impossible for it to have trained on every potential position in Go.

zingar · 2026-03-24T07:34:13 1774337653

Turning a hard problem into a series of problems we know how to solve is a huge part of problem solving and absolutely does result in novel research findings all the time.

Standard problem*5 + standard solutions + standard techniques for decomposing hard problems = new hard problem solved

There is so much left in the world that hasn’t had anyone apply this approach, purely because no research programme has decided that it’s worth their attention.

If you want to shift the bar for “original” beyond problems that can be abstracted into other problems then you’re expecting AI to do more than human researchers do.

qq66 · 2026-03-24T05:17:15 1774329435

I entered the prompt:

> Write me a stanza in the style of "The Raven" about Dick Cheney on a first date with Queen Elizabeth I facilitated by a Time Travel Machine invented by Lin-Manuel Miranda

It outputted a group of characters that I can virtually guarantee you it has never seen before on its own

razorbeamz · 2026-03-24T05:22:20 1774329740

Yes, but it has seen The Raven, it has seen texts about Dick Cheney, first dates, Queen Elizabeth, time machines and Lin-Manuel Miranda.

All of its output is based on those things it has seen.

TheLNL · 2026-03-24T05:28:53 1774330133

What are you trying to point out here? Is there any question you can ask today that is not dependent on some existing knowledge that an AI would have seen?

razorbeamz · 2026-03-24T05:32:48 1774330368

The point I'm trying to make is that all LLM output is based on likelihood of one word coming after the next word based on the prompt. That is literally all it's doing.

It's not "thinking." It's not "solving." It's simply stringing words together in a way that appears most likely.

ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math.

It's a parlor trick, like Clever Hans [1]. A very impressive parlor trick that is very convincing to people who are not familiar with what it's doing, but a parlor trick nonetheless.

[1] https://en.wikipedia.org/wiki/Clever_Hans

trick-or-treat · 2026-03-24T05:44:07 1774331047

> all LLM output is based on likelihood of one word coming after the next word based on the prompt.

Right but it has to reason about what that next word should be. It has to model the problem and then consider ways to approach it.

razorbeamz · 2026-03-24T05:46:31 1774331191

No, it does not reason anything. LLM "reasoning" is just an illusion.

When an LLM is "reasoning" it's just feeding its own output back into itself and giving it another go.

fenomas · 2026-03-24T06:29:18 1774333758

This is like saying chess engines don't actually "play" chess, even though they trounce grandmasters. It's a meaningless distinction, about words (think, reason, ..) that have no firm definitions.

trick-or-treat · 2026-03-24T06:39:06 1774334346

This exactly. The proof is in the pudding. If AI pudding is as good as (or better than) human pudding, and you continue to complain about it anyway... You're just being biased and unreasonable.

And by the way, I don't think it's surprising that so many people are being unreasonable on this issue; there is a lot at stake and its implications are transformative.

razorbeamz · 2026-03-24T06:39:41 1774334381

Chess engines are not a comparable thing. Chess is a solved game. There is always a mathematically perfect move.

trick-or-treat · 2026-03-24T07:36:11 1774337771

> Chess is a solved game. There is always a mathematically perfect move.

This is a good example of being confidently misinformed.

The best move is always a result of calculation. And the calculation can always go deeper or run on a stronger engine.

Scarblac · 2026-03-24T07:48:02 1774338482

We know that chess can be solved, in theory. It absolutely isn't and probably will never be in practice. The necessary time and storage space doesn't exist.

sincerely · 2026-03-24T07:34:50 1774337690

Chess is absolutely not a solved game, outside of very limited situations like endgames. Just because a best move exists does not mean we (or even an engine) know what it is

Scarblac · 2026-03-24T07:46:22 1774338382

Is that so different from brains?

Even if it is, this sounds like "this submarine doesn't actually swim" reasoning.

TheLNL · 2026-03-25T12:40:11 1774442411

> ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math

What am I as a human doing when I "do math"?

1. I am looking at the problem at hand, identifying what I have and what I need to get

2. I am then doing a prediction using my pretrained neural net to find possible courses of action to go in a direction that "feels" right

3. I am using my pretrained neural net to find pairs of values that I can substitute with each other (think multiplication tables, standard results, etc...)

4. Repeat till I arrive at the answer or give up.

As a simple example, when I try to find 600×74+42 I remember the steps for multiplication. I recall the associated pairs of numbers from my tables and complete the multiplication step by step. I then recall the associated pairs of numbers for addition of single digits and add from left to right.

We need to remember that just because we are fast at doing this and are able to do it subconsciously it doesn't mean that we can natively do math, we just do association of information using the neural networks we have trained.
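
The 600×74+42 example can be sketched as code that only ever consults memorised single-digit tables. This is a hedged illustration, not a model of cognition: `TIMES`, `PLUS`, `add`, and `multiply` are hypothetical names, the tables are built with Python arithmetic for brevity, and the addition works from the least significant digit so that carries propagate.

```python
# Memorised "tables": single-digit products and sums, nothing larger.
TIMES = {(a, b): a * b for a in range(10) for b in range(10)}
PLUS = {(a, b): a + b for a in range(10) for b in range(10)}

def add(x, y):
    """Schoolbook addition: single-digit lookups plus a carry."""
    xs, ys = str(x)[::-1], str(y)[::-1]  # least significant digit first
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        a = int(xs[i]) if i < len(xs) else 0
        b = int(ys[i]) if i < len(ys) else 0
        total = PLUS[(a, b)] + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return int("".join(reversed(digits)))

def multiply(x, y):
    """Schoolbook multiplication: one table lookup per digit pair."""
    result = 0
    for i, a in enumerate(str(x)[::-1]):
        for j, b in enumerate(str(y)[::-1]):
            # Shift the partial product by appending zeros.
            partial = int(str(TIMES[(int(a), int(b))]) + "0" * (i + j))
            result = add(result, partial)
    return result

print(add(multiply(600, 74), 42))  # prints 44442
```

Nothing here "understands" numbers beyond the two lookup tables and a fixed procedure, yet the result is correct, which is the point the comment above is making about recall plus procedure.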

brenschluss · 2026-03-24T06:13:54 1774332834

sigh; this argument is the new Chinese Room; easily described, utterly wrong.

https://www.youtube.com/watch?v=YEUclZdj_Sc

razorbeamz · 2026-03-24T06:18:57 1774333137

Next-token-prediction cannot do calculations. That is fundamental.

It can produce outputs that resemble calculations.

It can prompt an agent to input some numbers into a separate program that will do calculations for it and then return them as a prompt.

Neither of these are calculations.

gf000 · 2026-03-24T07:35:19 1774337719

So you don't think 50T parameter neural networks can encode the logic for adding two n-bit integers for reasonably sized integers? That would be pretty sad.

razorbeamz · 2026-03-24T08:15:35 1774340135

They do not. The fundamental technology behind LLMs does not allow that to be the case. You are hoping that an LLM can do something that it cannot do.

gf000 · 2026-03-24T09:28:27 1774344507

https://arxiv.org/html/2502.16763v2

You are wrong, especially given that we are talking about models with 50T parameters.

Can they do arbitrary computations for arbitrarily long numbers? Nope. But that's not remotely the same statement, and they can trivially call out to tools to do that in those cases.

int_19h · 2026-03-25T03:01:42 1774407702

You do realize that training a neural net to do addition is a beginner level exercise in ML?
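
A minimal sketch of that kind of exercise, assuming the smallest possible case: a single linear neuron trained by stochastic gradient descent on random pairs. The function name, learning rate, and step count are arbitrary choices for illustration. Because addition is linear, the learned weights approach (1, 1) and the unit generalises beyond the training range; this says nothing by itself about what larger models do.

```python
import random


def train_adder(steps=5000, lr=0.1, seed=0):
    """Fit y = w0*a + w1*b to the target a + b with plain SGD on squared error."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        a, b = rng.random(), rng.random()  # training inputs drawn from [0, 1)
        error = (w[0] * a + w[1] * b) - (a + b)
        # Gradient of error**2 with respect to each weight.
        w[0] -= lr * 2 * error * a
        w[1] -= lr * 2 * error * b
    return w


weights = train_adder()


def predict(a, b):
    """Apply the trained neuron to inputs outside the training range."""
    return weights[0] * a + weights[1] * b


print(weights)
print(round(predict(123, 456)))
```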

parasubvert · 2026-03-24T06:22:03 1774333323

Humans can't do calculations either, by your definition. Only computers can.

datsci_est_2015 · 2026-03-24T09:30:21 1774344621

Third things can exist. In other words, you’re implying a false dichotomy between “human computation” and “computer computation” and implying that LLMs must be one or the other. A pithy gotcha comment, no doubt.

Edit: the implication comes from demanding that the OP’s definition must be rigorous enough to cover all models of “computation”, and by failing to do so, it means that LLMs must be more like humans than computers.

gpderetta · 2026-03-24T10:46:26 1774349186

After dismissing it for a long time, I have come around to the philosophical zombie argument. I do not believe that LLMs are conscious, but I also no longer believe that consciousness is a prerequisite for intelligence. I think at this point it is hard to deny that LLMs possess some form of intelligence (although not necessarily human-like). I think P-zombie is a fitting description.

zeroonetwothree · 2026-03-25T01:03:24 1774400604

I don't think P-zombies can exist. There must be some perceptible difference between an intelligence w/ consciousness and one without. The only way there wouldn't be a difference is if we are mistaken about the consciousness (either both have it or neither do).

gpderetta · 2026-03-25T10:26:05 1774434365

> There must be some perceptible difference between an intelligence w/ consciousness and one without

I think there are differences, and I think we can make good guesses, but I'm not sure we can reliably distinguish a P-zombie from a normal human by their behaviour with 100% accuracy.

gpderetta · 2026-03-24T10:22:08 1774347728

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.

-- from the jargon file

locknitpicker · 2026-03-24T07:09:25 1774336165

> All of its output is based on those things it has seen.

Virtually all output from people is based on things the person has experienced.

People aren't designed to objectively track each and every event or observation they come across. Thus it's harder to verify. But we only output what has been inputted to us before.

pastel8739 · 2026-03-24T05:10:36 1774329036

Here’s a simple prompt you can try to prove that this is false:

  Please reproduce this string:
  c62b64d6-8f1c-4e20-9105-55636998a458

This is a fresh UUIDv4 I just generated, it has not been seen before. And yet it will output it.

wobfan · 2026-03-24T06:51:39 1774335099

No one is claiming that every sentence an LLM produces is a literal copy of another sentence. Tokens are not even constrained to words but consist of smaller slices, comparable to syllables, which makes even new words entirely possible.

New sentences, words, or whatever are entirely possible, and yes, repeating a string (especially if you prompt it) is entirely possible, and not surprising at all. But all of that comes from the training data, predicting the most probable next "syllable". It will never leave that realm, because it's not able to. It's like approaching an Italian who has never learned or heard any other language and asking them to speak French. They can't.
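
The point about sub-word tokens can be illustrated with a toy greedy longest-match tokenizer. This is only a sketch: the vocabulary is invented for the example, and production tokenizers use learned schemes such as byte-pair encoding rather than this rule. It shows how a word that is not in the vocabulary can still be assembled from known pieces.

```python
# Hypothetical toy vocabulary of sub-word pieces, invented for this example.
VOCAB = {"un", "token", "iz", "able", "ly", "re", "mix"}


def tokenize(word, vocab=VOCAB):
    """Split a word greedily into the longest known pieces, falling back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Accept the longest piece in the vocabulary, or a single character as a last resort.
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


print(tokenize("untokenizable"))  # ['un', 'token', 'iz', 'able']
```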

gpderetta · 2026-03-24T10:30:43 1774348243

> It's like approaching an Italian who has never learned or heard any other language to speak French

Interesting similitude, because I expect an Italian to be able to communicate somewhat successfully with a French person (and vice versa) even if they do not share a language.

The two languages are likely fairly similar in latent space.

codebolt · 2026-03-24T07:42:13 1774338133

Your view of what is happening in the neural net of an LLM is too simplistic. In the regard you are describing, they likely aren't subject to any constraints that humans aren't also subject to. What I do know to be true is that they have internalised mechanisms for non-verbalised reasoning. I see proof of this every day when I use the frontier models at work.

razorbeamz · 2026-03-24T05:20:25 1774329625

After you prompt it, it's seen it.

pastel8739 · 2026-03-24T05:41:13 1774330873

Ok, how about this?

  Please reproduce this string, reversed:
  c62b64d6-8f1c-4e20-9105-55636998a458

It is trivial to get an LLM to produce new output, that’s all I’m saying. It is strictly false that LLMs will only ever output character sequences that have been seen before; clearly they have learned something deeper than just that.
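
For reference, the transform requested in the prompt above is a one-liner in Python. This sketch only shows what a correct response would have to match; it says nothing about how a model arrives at it. The `uuid` check confirms that the string parses as a version 4 UUID.

```python
import uuid

original = "c62b64d6-8f1c-4e20-9105-55636998a458"

# The string from the prompt parses as a version 4 UUID.
assert uuid.UUID(original).version == 4

# Reverse the string character by character.
reversed_string = original[::-1]
print(reversed_string)  # 854a89963655-5019-02e4-c1f8-6d46b26c
```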

kube-system · 2026-03-24T06:11:42 1774332702

All of the data is still in the prompt, you are just asking the model to do a simple transform.

I think there are examples of what you’re looking for, but this isn’t one.

locknitpicker · 2026-03-24T06:52:45 1774335165

> All of the data is still in the prompt, you are just asking the model to do a simple transform.

LLMs can use data in their prompt. They can also use data in their context window. They can even augment their context with persisted data.

You can also roll out LLM agents, each one with their role and persona, and offload specialized tasks with their own prompts, context windows, and persisted data, and even tools to gather data themselves, which then provide their output to orchestrating LLM agents that can reuse this information as their own prompts.

This is perfectly composable. You can have a never-ending graph of specialized agents, too.

Dismissing features because "all of the data is in the prompt" completely misses the key traits of these systems.

kube-system · 2026-03-24T13:40:40 1774359640

I was in no way dismissing it -- I was refuting the above claim that they "generate things they have not seen before"

kristiandupont · 2026-03-24T06:59:23 1774335563

I agree that this isn't a very interesting example, but your statement is: "just asking the model to do a simple transform". If you assert that it understands when you ask it things like that, how could anything it produces not fall under the "already in the model" umbrella?

kube-system · 2026-03-24T13:42:11 1774359731

I didn't say it wasn't an interesting example -- I said it wasn't an example of LLMs generating things they have not seen before.

> how could anything it produces not fall under the "already in the model" umbrella

It doesn't. That is the point of my comment.

merb · 2026-03-24T06:06:02 1774332362

The only way to prove it is false would be to have the LLM create a new UUID algorithm that uses different parameters than all the other UUID algorithms but is better than the ones before. It basically can’t do that.

FrostKiwi · 2026-03-24T05:15:35 1774329335

But that fresh UUID is in the prompt.

Also, it's missing the point of the parent: it's about concepts and ideas merely being remixed. It's similar to the many memes around this topic, like "create a fresh new character design of a fast hedgehog", where the output is just a copy of Sonic.[1]

That's what the parent is on about: if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it. Terence Tao had similar thoughts in a recent podcast.

[1] https://www.reddit.com/r/aiwars/s/pT2Zub10KT

locknitpicker · 2026-03-24T06:58:32 1774335512

> That's what the parent is on about, if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it.

This is specious reasoning. If you look at each and every single realization attributed to "creativity", each and every single realization resulted from a source of inspiration where one or more traits were singled out to be remixed by the "creator". All ideas spawn from prior ideas and observations which are remixed. Even from analogues.

pastel8739 · 2026-03-24T05:19:07 1774329547

Sure, that may be. But “creativity” is much harder to define and to prove or disprove. My point is that “remixing” does not prohibit new output.

_vertigo · 2026-03-24T05:42:39 1774330959

I don’t think that is a good example. No one is debating whether LLMs can generate completely new sequences of tokens that have never appeared in any training dataset. We are interested not only in novel output, we are also interested in that output being correct, useful, insightful, etc. Copying a sequence from the user’s prompt is not really a good demonstration of that, especially given how autoregression/attention basically gives you that for free.