
The current fad for "agent swarms" or "model teams" seems misguided, although it definitely makes for great paper fodder (especially if you combine it with distributed systems!) and gets the VCs hot.

An LLM running one query at a time can already generate a huge amount of text in a few hours, and drain your bank account too.

A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that. Maybe some of them use a different model, but again, this is just a setting in OpenRouter or whatever.

Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.


An agent is a way of performing an action that produces useful output or a useful side effect without your main session having to carry the intermediate context.

People already do this serially by having a model write a plan, clearing the context, then having the same or a cheaper model execute the plan. Doing so discards the intermediate context.

Sub-agents just let you do this in parallel. This works best when you have a task that needs to be done multiple times that cannot be done deterministically. For example, applying the same helper class usage in multiple places across a codebase, finding something out about multiple parts of the codebase, or testing a hypothesis in multiple places across a codebase.
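A rough sketch of the fan-out pattern described above, with `run_agent` as a stand-in for whatever real model call you use (hypothetical, not any particular framework's API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str) -> str:
    """Stand-in for a real LLM call: a sub-agent gets a fresh context,
    does its work, and returns only a short summary. The intermediate
    tokens it generated along the way are simply discarded."""
    return f"done: {task}"

def fan_out(tasks: list[str]) -> list[str]:
    # Each sub-agent runs with its own isolated context; the parent
    # only ever sees the summaries, keeping its own context small.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(run_agent, tasks))

summaries = fan_out([
    "apply helper class in module_a",
    "apply helper class in module_b",
    "apply helper class in module_c",
])
```

The parent context ends up holding three one-line summaries instead of three full working transcripts.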


> Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.

I use parallel agents for speed or when my single agent process loses focus due to too much context. I determine context problems by looking at the traces for complaints like "this is too complicated so I'll just do the first part" or "there are too many problems, I'll display the top 5".

If you're trying a "model swarm" to improve reliability beyond 95% or so, you need to start hoisting logic into Python scripts.


Would it be possible to give that single agent process twice as much compute, instead? Or do production systems not scale that way?

It's hard to speed up running a single prompt through a model because decoding is a sequential, memory-bandwidth-limited process: you roughly need to stream all of the model's weights through the GPU to get the next token, then start again, and a GPU can do far more arithmetic per memory fetch than a single weight application requires. So with current hardware it's much more efficient to run multiple prompts in parallel against the same weights.
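Back-of-envelope version of that bandwidth argument (the model size and bandwidth numbers are illustrative assumptions, not measurements):

```python
# If single-sequence decode is memory-bandwidth bound, each new token
# requires streaming (roughly) all weights through the GPU once.
weights_gb = 140.0        # e.g. a ~70B-parameter model at fp16
bandwidth_gb_s = 3350.0   # e.g. one H100's quoted HBM bandwidth

tokens_per_s_single = bandwidth_gb_s / weights_gb  # ~24 tok/s, one sequence

# Batching B prompts reuses the same weight fetch for B tokens, so
# aggregate throughput scales roughly linearly until compute saturates:
batch = 32
aggregate_tokens_per_s = tokens_per_s_single * batch
```

This is why a provider would much rather serve 32 agents at once than make one agent 32x faster.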

Also, the limiting factor on a single instance of an agent is generally how much of its context window gets filled up, as opposed to how much time it has to 'think'. Generally model performance decreases as the context grows (more or less getting dumber the more it has to think about), so agent frameworks try to mitigate this by summarizing the work of one instance and passing it into another instance with a fresh context. This means if you have five tasks that are all going to fill up a model's addressable context, there's no real benefit to running them sequentially unless they naturally feed into each other.
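The summarize-and-hand-off loop sketched, with stub functions standing in for the real model calls (a toy illustration of the pattern, not any framework's actual API):

```python
def run_instance(task: str, carried: str) -> str:
    # Stand-in for one LLM session started with a fresh context window,
    # seeded only with the carried-forward summary.
    return f"[{carried}] worked on {task}"

def summarize(transcript: str) -> str:
    # Stand-in for a summarization call; a real framework would ask the
    # model itself to compress the transcript.
    return transcript[-40:]  # crude "summary": keep only a tail

def chain(tasks: list[str]) -> str:
    carried = ""
    for task in tasks:
        transcript = run_instance(task, carried)  # fresh context each time
        carried = summarize(transcript)           # compress before hand-off
    return carried
```

Each instance starts near-empty no matter how long the overall job is; only the compressed summary accumulates.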


You can reduce per-request latency by using smaller batch sizes, but that trades away throughput and scales non-linearly in practice. It probably isn't worth it unless your model provider makes it really easy.

Where we've had some success is with heterogeneous agents with some cheap quantised/local models performing certain tasks extremely cheaply that are then overseen or managed by a more expensive model.

I've played with this type of thing and I couldn't justify it vs just using a premium model, which seems more direct and less error-prone. In my experience, cheap models can really chew through tokens and rack up cost.

Steelmanning the other side of this question:

LLMs mostly do useful work by writing stories about AI assistants who issue various commands and reply to a user's prompts. These do work, but they are fundamentally like a screenplay that the LLM is continuing.

An "agent" is a great abstraction since the LLM is used to continuing stories about characters going through narrative arcs. The type of work that would be assigned to a particular agent can also keep its context clean and distraction-free.

So the parallel framing could be useful even if execution is completely sequential: separate characters and narrative arcs intersecting the way real people do, acting independently and simultaneously, is exactly the kind of thing LLMs are good at writing about.

Seems like the important thing would be to avoid getting caught up on actual "wall time" parallelism.


I also really appreciate the point about using LLM teams for fault tolerance protocols in the future (in addition to improving efficiency). Since agents tend to hallucinate and fail unpredictably, coordinating multiple of them to verify and come to a consensus could reduce those errors.

you have to own the inference layer

> A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that.

Yup, but context includes the prompt, which can strongly steer LLM behavior. Sometimes the harness restricts certain operations to help the LLM stay in its lane. And starting with a fresh context and a clear description of the one thing it should work on is great.

People get angry when their 200k or million-token context gets filled. I can't ever understand why. Keeping that much information in operational memory just can't work well, for any mind. Divide and conquer; don't pile up all the crap until it overflows.


I tend to agree. After seeing http://chatjimmy.ai, I think multi-agent systems are mostly just solving for LLMs being slow currently.

This is like saying “multi-core cpus are just solving cpus being slow”. Which yes, exactly.

Build features faster. Granted, this exposes the difference between people who like to finish projects and people who like to get paid a lot of money for typing on a keyboard.

Bullshit! Your project isn't finished as long as there are obvious major bugs that you can't fix because you don't understand the code.

Why does understanding computer science principles and software architecture and instructing a person or an ai on how to fix them require typing every line yourself?

Reading through this I feel like I'm on Substack.

No idea what these guys do exactly but their tagline says "Feldera's award-winning incremental compute engine runs SQL pipelines of any complexity"

So it sounds like helping customers with databases full of red flags is their bread and butter


> it sounds like helping customers with databases full of red flags is their bread and butter

Yes that captures it well. Feldera is an incremental query engine. Loosely speaking: it computes answers to any of your SQL queries by doing work proportional to the incoming changes for your data (rather than the entire state of your database tables).

If you have queries that take hours to compute in a traditional database like Spark/PostgreSQL/Snowflake (because of their complexity, or data size) and you want to always have the most up-to-date answer for your queries, Feldera will give you that answer 'instantly' whenever your data changes (after you've back-filled your existing dataset into it).
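The core idea in miniature, a toy illustration of incremental view maintenance (not Feldera's actual API): maintain a `SELECT count(*) ... GROUP BY key` result by applying only the incoming changes instead of rescanning the table on every query.

```python
from collections import defaultdict

# Materialized result of a grouped count, kept up to date incrementally.
counts: dict[str, int] = defaultdict(int)

def apply_delta(changes: list[tuple[str, int]]) -> None:
    # Each change is (key, weight): +1 for an insert, -1 for a delete.
    # Work done is proportional to the delta, not to the table size.
    for key, weight in changes:
        counts[key] += weight
        if counts[key] == 0:
            del counts[key]

apply_delta([("eu", +1), ("us", +1), ("eu", +1)])  # back-fill existing data
apply_delta([("eu", -1), ("us", +1)])              # live changes
```

After both deltas the view reads `{"eu": 1, "us": 2}` without ever recounting the full table.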

There is some more information about how it works under the hood here: https://docs.feldera.com/literature/papers


Seems pretty unimportant and inconsequential though because LLMs don't work anyway because they aren't logic-based symbolic AI, right?


I know you're trying to mock Marcus, but the reality is that all the big LLM providers have been integrating symbolic reasoning into their models for over a year now, since they noticed that scale alone is a dead end. Also, DeepMind's AlphaFold, which won the Nobel Prize, is neuro-symbolic AI - so I think both of those points very much justify Marcus's long-standing criticism of purely subsymbolic LLM "AI" as a path to real causal reasoning.


And you can't escape. Facebook is less of a concern because you can just not go to the website and you're good. The US Postal Service is the basis of an entire huge industry devoted to finding you at your physical location to try to scam you.


> You searched for people who do what you need to have done, found me, looked at what I've worked on and determined I'd be a good fit and you reached out? That's the number one way to get me to want to work for you.

No, their email templating tool finds an old throwaway repo you did 6 years ago, templates its name into a form email, and invites you to join a cattle call to be whiteboarded along with the rest of the schmucks.


I just got an incredible idea about how foundation model providers can reach profitability


I'm already seeing a degradation in Gemini's responses since they started stuffing YouTube recommendations at the end of each one. Anthropic is right not to add these subtle (or not) monetization incentives.


I mean, that’s almost just fair. They ripped the answer from a YouTube video, but at least link you back to the source now.


is it anything like the OpenAI ad model but for tool choice haha


Claude Free suggests Visual Studio.

Claude Plus suggests VSCode.

Claude Pro suggests emacs.


> ~~Claude Pro suggests emacs.~~

Claude Pro asks you about your preferences and needs instead of pushing an opinionated solution?


I'm not quite sure if you're making fun of emacs or actually praising it.


Stallman paying for advertising, now that is a good one :)


Copilot suggests leftpad


I'd thought about model providers taking payment to include a language or toolkit in the training set.


Buy more GPUs.


Hence the claw partnership.


This very forum was founded by a VC who had great success recruiting 22-year-olds with fancy diplomas to automate away the job of the guy who copied the numbers from the TPS report PDF attachment into Excel.

I didn't see people on here ranting and taking up the flag of revolution for the TPS report Excel-paster guy's job that they were automating away with their web2 SaaS startup.

But wait- that guy himself was automating away the job of the lady who used to physically Xerox the TPS report and put it in the filing cabinet down the hall, but that lady was automating the job of the secretary who used to re-type all those TPS reports.

It's automatic filing cabinets all the way down, and ranting because your little slice of the filing cabinet automation machine has been made redundant is a bit silly.


This looks like those rough cardboard inserts. Is it actually any better? Especially since they can use the lowest grade of recycled cardboard.

