Hacker News | woadwarrior01's comments

Reminds me of Solar 10.7B, which was a very good model for its size ~2 years ago, and the "Depth Up-Scaling" technique behind it. Although that involved continued training after repeating the layers.

https://arxiv.org/abs/2312.15166


I think the question really is: What's better? One giant bureaucracy or 27 smaller (and competing) bureaucracies?

Well, we've seen that the 27 smaller, competing bureaucracies create plenty of their own issues. It's not like the US, where 50 states (and yet more territories) actually compete on simplifying corporate law and offering strategic advantages.

Corporate law is inherently somewhat bureaucratic; better to simplify it, and unify it where that proves necessary.


Why aren’t European countries competing on better corporate law?

To some degree they are.

Before Brexit, the £1 Ltd. was a famous contender; it gained quite some traction and led to the creation of the German UG. Nowadays Estonia is advertising its quick digital registration process.

The problem is that you are still bound to the individual country's legal system, and many things there aren't unified. Having to appear before an Estonian court because your books don't comply with Estonian regulations (while at the same time, for tax purposes, your bookkeeping has to comply with your local legislation) isn't fun.

Also people don't know those names. What is a GmbH, an A.S., an OÜ? Is that a serious business or some shady shell company far off?


In America everything is a shady shell company so it doesn’t matter ;P

Well, they're all Delaware corporations ...

Bureaucracies are not in competition, though. They are intended as monopolies. For a prospective company they are investment options; larger countries usually have larger bureaucracies, but also larger labor markets.

The common subexpression elimination (CSE) pass in compilers takes care of that.

Compilers cannot do this optimization for floating point [1] unless you're compiling with -ffast-math. In general, don't rely on compilers to optimize floating point sub-expressions.

[1]: https://godbolt.org/z/8bEjE9Wxx
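The non-associativity is easy to demonstrate with a minimal sketch (Python here, but the same holds for C doubles, which is what the godbolt link shows):

```python
# Floating point addition is not associative, which is why a compiler
# can't safely rewrite (a + b) + c into a + (b + c) without -ffast-math:
# the two groupings round differently.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6
```

Since the two results differ in the last bit, a standards-conforming compiler must preserve the grouping the programmer wrote, and CSE across regroupings is off the table.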


Right, I totally forgot about floating point non associativity.

GPUs are a near monopoly. There are at least a handful of big players in the CPU space. Competition alone makes the latter space a lot cheaper.

Also, for inference (and not training) there are other ways to efficiently do matmuls besides the GPU. You might want to look up Apple's undocumented AMX CPU ISA, and also this thing that vendors call the "Neural Engine" in their marketing (capabilities and the term's specific meaning vary broadly from vendor to vendor).

For small 1-3B parameter transformers like TADA, both of these options are much more energy efficient than GPU inference.


> Apple M3 or later required. MetalRT uses Metal 3.1 GPU features available on M3, M3 Pro, M3 Max, M4, and later chips. M1/M2 support is coming soon. On M1/M2, RCLI automatically falls back to the open-source llama.cpp engine.

So, no support for M5 Neural Accelerators, eh? (Requires Metal 4) ¯\_(ツ)_/¯


Ha, not yet. Metal 4 is interesting and we're keeping an eye on it.

MetalRT currently targets Metal 3.1 GPU compute because that's where we get the most control over the decode pipeline. Neural Engine / ANE is powerful for fixed-shape inference (vision, classification) but autoregressive LLM decode, where you're generating one token at a time with dynamic KV cache, doesn't map as cleanly to ANE today.

That said, if Metal 4 opens up new capabilities that help with sequential token generation or gives better programmable access to the neural accelerator, we'll absolutely look at it. The M5 will be a fun chip to benchmark on.


> Neural Engine / ANE is powerful for fixed-shape inference (vision, classification) but autoregressive LLM decode, where you're generating one token at a time with dynamic KV cache, doesn't map as cleanly to ANE today.

What does the ANE have to do with this?

Neural Engine (ANE) and the M5 Neural Accelerator (NAX) are not the same thing. NAX can accelerate LLM prefill quite dramatically, although autoregressive decoding remains memory bandwidth bound.
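Back-of-the-envelope: every decode step has to stream all the weights, so memory bandwidth puts a hard ceiling on tokens/s no matter how fast the matmul units are. A rough sketch (the model size and bandwidth figures below are illustrative, not measured):

```python
# Rough upper bound on autoregressive decode speed: each generated token
# must read every weight once, so tokens/s <= bandwidth / model_bytes.
def decode_tokens_per_sec_ceiling(params_billions, bytes_per_param, bandwidth_gb_s):
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb

# Illustrative numbers: a 3B model at 4-bit (~0.5 bytes/param) on
# ~100 GB/s of memory bandwidth tops out around ~66 tokens/s, however
# fast the ANE, NAX, or GPU compute is. Prefill, by contrast, processes
# the whole prompt in parallel and is typically compute bound, which is
# where a matmul accelerator actually helps.
print(decode_tokens_per_sec_ceiling(3, 0.5, 100))
```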

I suspect the biggest blocker for Metal 4 adoption is the macOS Tahoe 26 requirement.


Good correction, thanks. You're right that NAX and ANE are distinct, I shouldn't have conflated them. NAX's ability to accelerate LLM prefill is exactly the kind of capability that could complement MetalRT's decode-focused pipeline. Appreciate the clarification on the Metal 4 / Tahoe requirement too.

Also, flexbuffers.

I'm a (non-practicing) Dvaitin Hindu. AFAICT, no mainstream school of Hindu philosophy (there are three) espouses that view, although Advaitins come very close to it with their four mahavakyas.

IMO, Integrated Information Theory of consciousness (IIT) is exactly that. Everything is conscious; the difference is only in the degree to which things are conscious.


Oh, thank you very much for enlightening me! All this time I misunderstood! I guess then IIT it is for me :-)

> Cool to see Claude doing decently though!

The scales do seem to be tipped in its favor (cf: my other comment in this thread).


Interesting benchmark.

I can't help but notice that they're benchmarking Opus 4.6 (Anthropic's latest and greatest model) against GPT-5.2 (which is three generations behind OpenAI's latest coding models: GPT-5.2-Codex, GPT-5.3-Codex and the latest GPT-5.4).


As far as I know, OpenAI did not release 5.3 Codex in their API. You can only use it with the Codex CLI or app.

It's there, you just need to use it with the responses API. Set the model field to 'gpt-5.3-codex'.
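For reference, the request body would look something like this (a sketch of the shape only; the model id is the one from this thread, and the input string is made up):

```python
import json

# Hypothetical request body for the responses API endpoint; the only
# thing that differs from ordinary usage is the "model" field.
payload = {
    "model": "gpt-5.3-codex",
    "input": "Write a hello-world program in Rust.",
}

print(json.dumps(payload, indent=2))
```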

5.2 and 5.2 Codex are arguably the same gen.

Sure, but one is fine-tuned for what they are testing and one is not.

The first half of this is already happening to a certain extent. I first noticed this in a submission[1] to Dimitris Papailiopoulos' AdderBoard[2], which is a code-golf competition for training the smallest transformer that can add two 10-digit numbers. Most submissions on it are fully AI generated.

The report in the linked repo is Claude Code generated.

[1]: https://github.com/rezabyt/digit-addition-491p

[2]: https://github.com/anadim/AdderBoard
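For a sense of the task: generating the training pairs is the trivial part, the tiny transformer is the hard part. A hypothetical data-generation sketch (the serialization format is my guess, not the AdderBoard spec):

```python
import random

# Generate (prompt, answer) pairs for the 10-digit addition task.
# The exact format used by AdderBoard submissions may differ; this
# is just illustrative.
def make_example(rng):
    a = rng.randrange(10**9, 10**10)  # 10-digit operands
    b = rng.randrange(10**9, 10**10)
    return f"{a}+{b}=", str(a + b)

rng = random.Random(0)
for _ in range(3):
    prompt, answer = make_example(rng)
    print(prompt + answer)
```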

