jasonjmcghee's comments | Hacker News


Maybe we can go back to a world where we decouple new under-the-hood features from complete UI fuckery.

You shouldn't have to cope with one to get the other.


I wonder if it’s time to try this again. Last time I tried it, it was intensely buggy, not to mention missing almost every feature I wanted.

The concept is great, and I would love to ditch OrbStack for it. (OrbStack is slick. But their everything-shares-one-kernel-and-they-don’t-give-privileged-access model falls apart as soon as you try to do anything that doesn’t fit in their not-amazing sandbox. Even user namespaces don’t appear to work.) But, other than the actual core mostly working, Apple Containers was a buggy mess, and it was the only thing that made me frequently reboot the whole machine.


I certainly haven’t done anything outside the general happy path, but I swapped out Docker for it and it “just worked” - and I got more battery life.

Didn't know this was a thing - thanks for posting.

Is there something similar that supports shaders? Like metal / wgsl / glsl or something?

Sounds like a fun project...


This isn't quite as good, unfortunately - you can't accept/deny permission prompts.

Maybe there should be a coordinating Claude Code instance that's connected and facilitates the others. Like sub-agents, but able to choose what to do on a permissions check.

Or some other means to listen for permissions checks.


I don't see this said anywhere - maybe I missed it. Why is it only one conversation at a time?

Couldn't you have multiple sessions using different plugins or whatever?


Well, it's one conversation per bot. I set it up, connected the channel. DM'd it (only way to converse with it - wish I could have a Discord channel per project, different CWDs etc...) and asked what happens when I start 2 claude sessions connected to the channel and it said it'll just work with one.

Suppose you could have multiple bots, but it looks like it only supports one bot token anyway.


This is an API designed to build connectors, there's absolutely nothing preventing you from building one that connects to Discord and listens to a different channel for each instance.

I think they gave two sample implementations to demonstrate it.

I'm guessing they're expecting the community to run with it.
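The per-channel idea is simple to sketch. Everything below is hypothetical - these names are not from any real SDK; it just shows the routing pattern a connector would use, with one session (and one working directory) per channel:

```python
# Minimal sketch of per-channel session routing (all names are made up -
# this is not a real SDK's API, just the idea): each channel ID maps to
# its own agent session with its own working directory.

class Session:
    """Stand-in for one agent conversation pinned to a working directory."""
    def __init__(self, cwd: str):
        self.cwd = cwd
        self.history: list[str] = []

    def handle(self, text: str) -> str:
        self.history.append(text)
        return f"[{self.cwd}] received: {text}"

SESSIONS: dict[int, Session] = {}                        # channel_id -> Session
CHANNEL_CWDS = {111: "/projects/a", 222: "/projects/b"}  # hypothetical config

def route(channel_id: int, text: str) -> str:
    # Lazily create one session per channel so conversations don't mix.
    if channel_id not in SESSIONS:
        SESSIONS[channel_id] = Session(CHANNEL_CWDS.get(channel_id, "."))
    return SESSIONS[channel_id].handle(text)

print(route(111, "run tests"))   # [/projects/a] received: run tests
print(route(222, "build"))       # [/projects/b] received: build
```

A real connector would wire `route` up to the chat platform's message events; the routing logic itself doesn't change.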


For the longest time the answer to this was that features would randomly not be supported for C#.

But it's gotten much better.


Curious if pass@2 was tested for haiku and sonnet?

Based on community definitions I've seen, this is considered "open weights". If you can't reproduce the model, it's not "open source".

Yes, “open weights” conveys the reality more clearly: merely having the parameters is very different from being able to run the process that creates them. Without openness of the full process, start to finish, much is hidden.*

Remember, language is what we make it. Dictionaries are useful catalogs of usage but we make the judgment calls.

* Even with the process, much is not well understood! / The ethics of releasing an open weights model at some capability level is a separate discussion.


Curious if anyone else had the same reaction as me

This model is specifically trained on this task and significantly[1] underperforms Opus.

Opus costs about 6x more.

Which seems... totally worth it based on the task at hand.

[1]: based on the total spread of tested models


Agreed. The idea is nice and honorable. At the same time, if AI has been proving one thing, it's that quality usually reigns over control and trust (except for some sensitive sectors and applications). Of course it's less capital-intensive, so it makes sense for a comparably little EU startup to focus on that niche. Likely won't move the top-line needle much, though, for the reasons stated.

> quality usually reigns over control and trust

Most Copilot customers use Copilot because Microsoft has been able to pinky promise some level of control for their sensitive data. That's why many don't get to use Claude or Codex or Mistral directly at work and instead are forced through their lobotomised Copilot flavours.

Remember, as of yet, companies haven't been able to actually measure the value of LLMs ... so it's all in the hands of Legal to choose which models you can use based on marketing and big words.


This too will be solved. You can already get the frontier models from AWS/Google/Azure without needing to send your data to anyone else.

The EU could help them very much by starting to enforce the laws, so that no US company can process European data, given that the Americans aren't willing to budge on the CLOUD Act.

That would also help reduce our dependency on American hyperscalers, which is much needed given how untrustworthy the US is right now. (And also hostile towards Europe, as their new security strategy lays out.)


This would unfortunately be a rather nuclear option, due to the continent’s insane reliance on technology that breaks its unenforced laws.

How about not making these unenforced laws in the first place so that European companies could actually have a chance at competing? We're going to suffer the externalities of AI either way, but at least there would be a chance that a European company could be relevant.

The AI Act absolutely befuddled me. How could you release relatively strict regulation for a technology that isn't really being used yet and is in the early stages of development? How did they not foresee this kneecapping AI investment and development in Europe? If I were a tinfoil hat wearer I'd probably say that this was intentional sabotage, because this was such an obvious consequence.

Mistral is great, but they haven't kept up with Qwen (at least with Mistral Small 4). Leanstral seems interesting, so we'll have to see how it does.


Because the AI act was mostly written to address issues with ML products and services. It was mostly done before ChatGPT happened, so all the foundation model stuff got shoehorned in.

Speaking as someone who's been doing stats and ML for a while now, the AI act is pretty good. The compliance burden falls mostly on the companies big enough to handle it.

The foundation model parts are stupid though.


>Because the AI act was mostly written to address issues with ML products and services. It was mostly done before ChatGPT happened, so all the foundation model stuff got shoehorned in.

It's not an excuse. Anybody with half a working brain should've been able to tell that this was going to happen. You can't regulate a field in its infancy and expect it to ever function.

>The compliance burden falls mostly on the companies big enough to handle it.

You mean it falls on anyone that tries to compete with a model. There's a random 10^25 FLOP compute rule in there. The B300 does 2500-3750 TFLOPS at fp16. 200 of these can hit that compute number in 6 months, which means that in a few years' time pretty much every model is going to hit it.

And if somebody figures out fp8 training then it would only take 10 of these GPUs to hit it in 6 months.
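The arithmetic above checks out; a quick back-of-envelope, using the midpoint of the quoted TFLOPS range (my choice of midpoint):

```python
# Back-of-envelope: can 200 B300-class GPUs cross the 10^25 FLOP
# threshold in 6 months? (3000 TFLOPS fp16 is the midpoint of the
# 2500-3750 range quoted above.)
gpus = 200
flops_per_gpu_per_s = 3000e12          # 3 PFLOPS per GPU
seconds = 6 * 30 * 24 * 3600           # ~6 months
total = gpus * flops_per_gpu_per_s * seconds
print(f"total: {total:.2e} FLOPs")     # total: 9.33e+24 FLOPs
```

That's right at the 1e25 line at fp16 utilization; any efficiency gain (fp8, more GPUs, longer runs) pushes it over.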

The copyright rule and having to disclose what was trained on also means that it will be impossible to have enough training data for an EU model. And this even applies to people that make the model free and open weights.

I don't see how it is possible for any European AI model to compete. Even if these restrictions were lifted it would still push away investors because of the increased risk of stupid regulation.


> It's not an excuse. Anybody with half a working brain should've been able to tell that this was going to happen. You can't regulate a field in its infancy and expect it to ever function.

As I said, the core of the AI act was written about supervised ML, not generative ML, as generative ML wasn't as big a deal pre-ChatGPT.

> You mean it falls on anyone that tries to compete with a model. There's a random 10^25 FLOPS compute rule in there. The B300 does 2500-3750 TFLOPS at fp16. 200 of these can hit that compute number in 6 months, which means that in a few years time pretty much every model is going to hit that.

As I also said, the foundation model stuff (including this flops thing) is incredibly stupid. I agree with you on this, but my point is that the core of the AI act was supposed to cover the ML systems built since approx 2010.

> The copyright rule and having to disclose what was trained on also means that it will be impossible to have enough training data for an EU model. And this even applies to people that make the model free and open weights.

Again, you're talking about generative stuff (makes sense given the absurdly misleading name now) whereas I'm talking about the original AI act, which I read well before ChatGPT happened.

The training data thing is a tradeoff, like copyright is far too invasive (IMO) and it's good to be able to use this information for other purposes. However, I personally would be super worried about an ML team that couldn't tell me what data went into their model. Like, the data is core to all ML/AI approaches so that lack of understanding would make me very sceptical of any performance claims.

Let's be real: the AI companies don't want to say what's in their models because of the rampant copyright infringement, not because of any technical incapability.


Ha, keep putting your prompts and workflows into cloud models. They are not okay with being a platform, they intend to cannibalize all businesses. Quality doesn't always reign over control and trust. Your data and original ideas are your edge and moat.

The same old speech that has been used throughout history. When cars were invented people complained to everyone that Ford intended to cannibalize all horse-drawn carriages. When manufacturing was invented it cannibalized the work of all the sewing and knitting companies that had women making one item at a time. When Google was invented it cannibalized libraries, encyclopedias, etc. Yet nobody wants a horse-drawn carriage, nor to knit their own sweaters, nor to go to the library to look things up in a physical encyclopedia.

Treating "quality" as something you can reliably measure in AI proof tools sounds nice until you try auditing model drift after the 14th update and realize the "trust" angle stops being a niche preference and starts looking like the whole product. Brand is not a proof. Plenty of orgs will trade peak output for auditability, even if the market is bigger for YOLO feature churn.

The alignment tax directly eats into model quality - double-digit percentages.

I'm never sure how much faith one can put into such benchmarks, but in any case the optics seem to shift once you have pass@2 and pass@3.

Still, the more interesting comparison would be against something such as Codex.


But you can run this model for free on a common battery-powered laptop sitting on your lap, without cooking your legs.

Sorry, but what are you talking about? This is a 120B-A6B model, which isn't runnable on any laptop except the most beefed-up MacBooks, and even then it will certainly drain the battery and cook your legs.
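Rough memory arithmetic backs this up (the quantization levels are my own assumption): with MoE, only ~6B parameters are active per token, but all 120B weights still have to be resident.

```python
# Approximate weight-memory footprint for a 120B-parameter MoE model.
# MoE routing activates ~6B params per token, but every expert's weights
# must still fit in RAM/VRAM.
params = 120e9
for bits in (16, 8, 4):
    gb = params * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
# 16-bit weights: ~240 GB
# 8-bit weights: ~120 GB
# 4-bit weights: ~60 GB  -> only high-unified-memory Macs come close
```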

You can easily run a quant of this on a DGX Spark though. Seems like a small investment if it meaningfully improves Lean productivity.

Is it though?

Most people I know who use agents for building software and have tried switching to local models end up switching back to Claude/Codex every single time.

It's just not worth it. The models are that much better and continue to get released / improve.

And it's much cheaper unless you're doing like 24/7 stuff.

Even on the $200/month plan, that's cheaper than buying a $3k DGX or a $5k M4 Max with enough RAM.

Not to mention you can no longer use your laptop as a laptop as the power draw drains it - you'd need to host separately and connect


A single DGX Spark can service a whole department of mathematicians (or programmers), and you can cluster up to 4 of them to fit very large models like GLM-5 and quants of Kimi K2.5. This is nearing frontier-level model size.

I understand the value proposition of the frontier cloud models, but we're not as far off from self-hosting as you think, and it's becoming more viable for domain-specific models.


That's great news- I wonder if that will help drive cloud costs down too

Yeah my bad, it requires an expensive MacBook.

I think it would still be fine for the legs and on battery for relatively short loads: https://www.notebookcheck.net/Apple-MacBook-Pro-M5-2025-revi...

But 40 degrees and 30W of heat is a bit more than comfortable if you run the agent continuously.


The model is open source - you can run it locally. You don't think that's significant?

Seems strange to immediately assume karpathy was the offender here

What's the structurally simplest architecture that has worked to a reasonably competitive degree?

Competitiveness doesn't really come from architecture, but from scale, data, and fine-tuning data. There has been little innovation in architecture over the last few years, and most innovations are for the purpose of making training or inference more efficient to run (fit in more data), not "fundamentally smarter".

If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
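For concreteness, a word-level Markov chain really is an evening's work - a minimal sketch on a toy corpus (obviously nowhere near transformer quality):

```python
# Word-level Markov chain text generator: map each `order`-word prefix to
# the words observed after it, then sample a random walk.
import random
from collections import defaultdict

def train(text: str, order: int = 2) -> dict:
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def generate(model: dict, length: int = 15, seed: int = 0) -> str:
    rng = random.Random(seed)
    key = rng.choice(sorted(model))    # random starting prefix
    out = list(key)
    for _ in range(length):
        choices = model.get(tuple(out[-len(key):]))
        if not choices:                # dead end: no observed continuation
            break
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the rat on the mat"
model = train(corpus)
print(generate(model))
```

Everything a transformer adds on top of this - learned embeddings, attention, backprop at scale - is the "prior art that has to be learned incrementally".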

Not that loose lol.

I’m thinking it’s still Llama / a dense decoder-only transformer.

