Hacker News | ankit219's comments

this is good.

The problem is Google's security concerns. When people connect Gmail to OpenClaw, Google flags the activity as unusual and suspends the account. Many whose accounts got locked because of this assumed it was because connecting to Antigravity was against policy (which did happen in some cases). We will keep seeing Google account suspensions, and those will keep making news, even when they aren't actually because of Antigravity usage.


Not much to do with self-improvement as such. OpenAI has increased its pace; the others are pretty much consistent. Google last year shipped three versions of Gemini 2.5 Pro, each within a month of the previous one. Anthropic released Claude 3 in March '24, Sonnet 3.5 in June '24, 3.5 (new) in October '24, then 3.7 in February '25, moved to the 4 series in May '25, followed by Opus 4.1 in August, Sonnet 4.5 in October, Opus 4.5 in November, then 4.6 in February and Sonnet 4.6 in February as well. Yes, those last two came within weeks of each other, but originally they would only have been released together. The staggered releases are what create the impression of a fast cadence. It's as much a function of available compute as of training, and they have ramped up on that front.


> Batching multiple users up thus increases overall throughput at the cost of making users wait for the batch to be full.

The writer seems not to have heard of continuous batching. This is no longer an issue, and it's part of what makes Claude Code that affordable. https://huggingface.co/blog/continuous_batching
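A toy sketch of the difference (not vLLM's or anyone's actual scheduler; `static_batching` and `continuous_batching` are made-up names, and "time" is counted in decode steps). Static batching runs a fixed batch until the longest sequence finishes; continuous batching refills a slot the moment a sequence completes, so short requests never pay for long ones:

```python
import random

def static_batching(requests, batch_size):
    """Classic static batching: run each full batch until its
    *longest* generation finishes before starting the next batch."""
    steps = 0
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        steps += max(batch)  # everyone in the batch waits for the longest
    return steps

def continuous_batching(requests, batch_size):
    """Continuous (in-flight) batching: as soon as a sequence finishes,
    its slot is refilled from the queue on the next decode step."""
    queue = list(requests)
    slots = []  # remaining tokens for each active sequence
    steps = 0
    while queue or slots:
        while queue and len(slots) < batch_size:
            slots.append(queue.pop(0))
        steps += 1  # one decode step advances every active sequence
        slots = [r - 1 for r in slots if r > 1]
    return steps

random.seed(0)
reqs = [random.randint(1, 100) for _ in range(64)]  # token counts per request
print("static:", static_batching(reqs, 8), "continuous:", continuous_batching(reqs, 8))
```

With mixed-length requests the continuous scheduler finishes the same work in fewer total decode steps, which is the throughput win the blog post describes.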


People are misunderstanding Anthropic's fast mode because of the name they chose. The hints all point to a specific thing they did: the setup is costlier, yet it's also smarter and better on tougher problems, which is unheard of for a speed optimization. This paper[1] fits perfectly:

The setup is parallel distill-and-refine. You start with parallel trajectories instead of one, distill from them, then refine the result into an answer. Instead of taking every trajectory to completion, they distill quickly and refine, so it produces outputs fast and yet smarter.

- paper came out in nov 2025

- three months is a plausible research-to-production pipeline

- one of the authors is at anthropic

- this approach will definitely burn more tokens than a usual simple run.

- > Anthropic explicitly warns that time to first token might still be slow (or even slower)
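For intuition, here is a self-contained toy of the parallel distill-and-refine shape on a numeric stand-in task (my reading of the paper, not Anthropic's implementation; `trajectory`, `distill`, and `refine` are placeholder functions): many cheap parallel rollouts, one distillation step, one short sequential refinement pass.

```python
import random
import statistics

def trajectory(problem, seed):
    """Stand-in for one short, independent reasoning rollout.
    Here: a noisy estimate of sqrt(problem)."""
    rng = random.Random(seed)
    return problem ** 0.5 + rng.gauss(0, 0.3)

def distill(partials):
    """Collapse the parallel partial trajectories into one summary.
    Here: the median, which discards outlier rollouts."""
    return statistics.median(partials)

def refine(problem, summary):
    """One cheap sequential pass that polishes the distilled summary.
    Here: a single Newton step toward sqrt(problem)."""
    return 0.5 * (summary + problem / summary)

def parallel_distill_refine(problem, n_paths=8):
    # n_paths parallel rollouts burn more tokens than a single run,
    # but none of them is taken to completion.
    partials = [trajectory(problem, seed=s) for s in range(n_paths)]
    return refine(problem, distill(partials))

print(parallel_distill_refine(2.0))
```

The token math matches the bullets above: you pay for n_paths rollouts (costlier), but wall-clock latency is one short rollout plus one refine pass (faster), and aggregation beats any single noisy rollout (smarter).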

As for what others are suggesting: speculative decoding wouldn't make it any smarter, and batching could make it faster but would make it cheaper, not costlier.

Gemini Deepthink and GPT-5.2 Pro use the same underlying parallel test-time compute, but they take each trajectory to completion before distilling and refining for the user.

[1]: https://arxiv.org/abs/2510.01123


The official document from Anthropic:

> Fast mode is not a different model. It uses the same Opus 4.6 with a different API configuration that prioritizes speed over cost efficiency. You get identical quality and capabilities, just faster responses.


Agreed. Gemini 3 Pro has always felt to me like it has a pretraining alpha, if you will, and many data points support that. Even Flash, which was post-trained with different techniques than Pro, is as good or equivalent at tasks that depend on post-training, occasionally even beating Pro (e.g., on Apex bench from Mercor, which is basically a tool-calling test, simplifying a bit, Flash beats Pro). The score on ARC-AGI-2 is another data point in the same direction. Deepthink is sort of parallel test-time compute with some level of distilling and refining from certain trajectories (guessing, based on my usage and understanding), same as GPT-5.2 Pro, and can extract more because of the pretraining datasets.

(I'm sort of basing this on papers like Limits of RLVR, and on the pass@k vs. pass@1 differences in RL post-training of models; this score just shows how "skilled" the base model was, or how strong the priors were. Apologies if this isn't super clear; happy to expand on what I'm thinking.)


(author here) great paper to cite.

What I think you are referring to is hidden state as in internal representations. I use hidden state in the game-theoretic sense: private information that only one party has. I think we both agree AlphaZero has hidden state in the first sense.

Concepts like king safety are objectively useful for winning at chess, so AlphaZero developed them too; no surprise there. A great example of convergence. However, AlphaZero did not need to know what I am thinking or how I play to beat me. In poker, you must model a player's private cards and beliefs.


I see now, thanks. Yes, in poker you need more of a mental model of the other side.


(Author here)

I address that in part right there. Programming has parts that are chess-like (i.e., bounded), which is what people assume to be the actual work. But understanding future requirements and stakeholder incentives is also part of the work, and that is what LLMs don't do well.

> many domains are chess-like in their technical core but become poker-like in their operational context.

This applies to programming too.


My bad, re-read that part and it's definitely clear. I was probably skimming by the time I got to that section and didn't parse it.


>Programming has parts like chess (ie bounded)

The number of legal chess positions is somewhere around 10^44 by current estimates. That's with 32 pieces and their rules.

The number of possible states of an application, especially anything allowing Turing completeness, is far larger than the number of possible entropy states in the visible universe.


Bounded domains require scaling reasoning/compute. These are two separate scenarios: one where you have hidden information, the other where you have a huge number of combinations. Reasoning works in the second case because it narrows the search space. E.g., a doctor diagnosing a patient is just searching over a set of possibilities; if not today, then as we scale up, a model will be able to arrive at the right answer. Same with math: the variance or branching for any given problem is very high, but LLMs are good at it, and getting better. A negotiation is not a high-variance thing and has a low number of combinations, yet LLMs would be repeatedly bad at it.


> And they would have won the AI race not by building the best model, but by being the only company that could ship an AI you’d actually trust with root access to your computer.

And the very next line (because I want to emphasize it):

> That trust—built over decades—was their moat.

This just ignores the history of OS development at Apple. The entire trajectory has been toward permissions and sandboxing, even when it annoys users to no end. Giving an LLM (any LLM, not just a "trusted" one, per the author) root access when it's susceptible to hallucinations, jailbreaks, etc. goes against everything Apple has worked for.

And even then the reasoning is circular: "so you built up all this trust; now go destroy it on this thing that works, feels good to me, but could occasionally fuck up in a massive way."

Not defending Apple, but this article is so detached from reality that it's hard to overstate.


You are comparing post hoc narratives in the training data to real-time learning from causal dynamics. The objectives are different. They may look the same in scenarios that are heavily and accurately documented, but most narratives suffer from survivorship bias and post facto reasoning, eulogising outcomes already known.


I think this particular complaint is about Claude.ai, the website, and not Claude Code. I see your point though.

