And even then - I still read the code it generates, and if I see a better way of doing something I just step in, write a partial solution, and then sketch out how the complete solution should work.
i could be wrong, but i'm pretty sure that end-users get upset when a change takes a long time or it ends up breaking something for them.
just because people are finding that agents or whatever are speeding changes up now doesn't necessarily mean they won't encounter a slow-down later when the codebase becomes an unmaintainable mess. technical debt is always a thing, even with machines doing the work (the agent/machine still has to parse a codebase to make changes).
What makes you think that AI couldn’t make the same changes without breaking it whether you modify the code or not? And you do have automated unit tests, don’t you?
Right now I have a 5000-line monolithic vibe-coded internal website that is at most going to be used by 3 people. It mixes Python, inline CSS and JavaScript with the API. I haven’t looked at a line of code. The IAM role for the Lambda runtime has limited permissions (meaning the code can’t do anything the permissions don’t allow). I used AWS Cognito for authorization, validated the security of the endpoints, and validated the permissions of the database user.
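The IAM guardrail is the interesting part of this setup: even without reading the generated code, a sufficiently narrow execution role bounds the blast radius. A minimal sketch of what such a policy might look like (the table name and action list here are hypothetical illustrations, not the actual configuration):

```python
import json

# Hypothetical least-privilege execution-role policy: whatever the
# generated code tries to do, the Lambda runtime can only perform
# these three actions on this one table.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Query",
            ],
            "Resource": "arn:aws:dynamodb:*:*:table/internal-site-data",
        }
    ],
}
print(json.dumps(policy, indent=2))
```

Anything the code attempts outside that list (deleting tables, calling other services) fails at the permission layer rather than depending on the code being correct.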
Neither Claude nor Codex have any issues adding pages, features and API endpoints without breaking changes.
By definition, coding agents are the worst they will ever be right now.
i have a rule of thumb based on past experience: circa 10k lines per developer involved, reducing as the codebase size increases.
> 5000 line
so that's currently half a developer according to my rule of thumb.
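as a toy sketch of that rule of thumb (the logarithmic shrink factor below is my own guess to illustrate the shape, not anything measured):

```python
import math

def devs_needed(loc: int, base: int = 10_000) -> float:
    """rough headcount: ~10k lines per developer, with per-developer
    capacity shrinking as the codebase grows past 10k lines.
    the logarithmic shrink factor is an illustrative assumption."""
    shrink = 1 + math.log10(max(loc, base) / base)
    return loc / (base / shrink)

print(devs_needed(5_000))      # → 0.5 (half a developer)
print(devs_needed(1_000_000))  # headcount grows faster than linearly
```

plug in 5,000,000 lines and the answer stops being something a single human can carry, tool-assisted or not.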
what happens when that gets to 20,000 lines...? that's over the line in my experience even for the human who wrote it. it takes longer to make changes. changes increasingly go out in a more and more broken state. more and more tests have to be written for each change to try and stop it going out broken. more work is needed for a feature of equal complexity compared to when we started, because now the rest of the codebase is what adds complexity to making changes. etc. etc. and that gets worse the more we add.
these agent things have a tendency to add more code, rather than the most maintainable code. it's why people have to review and edit the majority of generated features beyond CRUD webapp functionality (or similar boilerplate). so, given time and more features, 5k --> 10k --> 20k --> ... too much for a single human being if the agent tools are no longer available.
so let's take it to a bit of a hyperbolic conclusion ... what about agents and a 5,000,000 line codebase...? do you think these agents will take the same amount of time to make a change in a codebase of that size versus 5,000 lines? how much more expensive do you think it could get to run the agents at that size? how about increases in error rate when making changes? how many extra tests need to be added for each feature to ensure zero breakage?
do you see my point?
(fyi: the 5 million LoC is a thought experiment to get you to critically think about the problem of technical debt related to agents as codebase size increases, i'm not saying your website's code will get that big)
(also, sorry i basically wrote most of this over the 20 minutes or so since i first posted... my adhd is killing me today)
20K lines of code is well within the context window of any modern LLM. But just like no person tries to understand everything and keep the entire context in their brain, neither do modern LLMs.
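As a back-of-the-envelope check on that claim (the tokens-per-line figure and the 1M-token window are rough assumptions, not any specific model's spec):

```python
def fits_in_context(loc: int, tokens_per_line: float = 10.0,
                    context_window: int = 1_000_000) -> bool:
    """Estimate whether a codebase fits in a single context window.
    Assumes ~10 tokens per line of code and a 1M-token window;
    both numbers vary by tokenizer and model."""
    return loc * tokens_per_line <= context_window

print(fits_in_context(20_000))     # → True: ~200k tokens
print(fits_in_context(5_000_000))  # → False: ~50M tokens, far past any window
```

Which is why, past a certain size, both humans and LLMs have to work from summaries and abstractions rather than the whole codebase at once.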
Also documentation in the form of MD files becomes important to explain the why and the methodology.
Generally speaking, I try to ensure that the LLM is using core abstractions throughout the codebase in a consistent manner. This makes it easier for me to review any changes it makes.
Sort of a devil's advocate question: if you write and review your tests, the functional and non-functional requirements are met, and the human tests for usability pass, why does the code matter?
Non-functional requirements: performance, security, reliability, logging, etc.?
This line of thought is honestly a bit silly - uv is just a package manager that actually does its job for resolving dependencies. You’re talking about a completely orthogonal problem.
> uv is just a package manager that actually does its job for resolving dependencies.
Pip resolves dependencies just fine. It just also lets you try to build the environment incrementally (which is actually useful, especially for people who aren't "developers" on a "project"), and is slow (for a lot of reasons).
7 thousand confirmed deaths, 9 thousand unconfirmed. Among those, 1,200 are confirmed deaths from the regime forces, and 400 are confirmed to be bystanders. The nurse burned to death by protesters is among those 400.
I see the value of the students, it just seems like an odd thing for a government to subsidize via NIH/NSF funding. We don’t really have anything analogous to that in Canada and it just seems awfully weird that it exists in the US without the “it’s older than the country” excuse that Oxford/Cambridge have.
Yes, current EVs are heavy. It's not at all clear that this will prevail as solid-state batteries evolve to become standard. It is highly possible that EVs will soon be lighter than comparable ICE vehicles [1].
No no no. Sure, there might be a future where solid-state batteries become the standard for electric vehicles, but you cannot link to Donut Lab's announcement from this month. There is no credible evidence they've achieved the holy grail of batteries, and there won't be until they actually deliver these motorcycles and people independently verify them.
Time will tell on their battery, especially if the bike they're putting it on delivers. I think the overall point could be that there's active R&D in trying to find geopolitically sustainable materials, and lowering the weight of materials used.
If you want proper answers, yes. If you want to rely on whatever reddit or tiktok says about the book, then I guess at that point you're fine with hallucinations and others doing the thinking for you anyway. Hence the issues brought up in the article.
I wouldn't trust an LLM for anything more than the most basic questions if it didn't actually have text to cite.
You don’t need any rights to execute the feature. The user owns the book. The app lets the user feed the book into an LLM, as is absolutely their right, and asks questions.
1. The user doesn't own the book; the user has a revocable license to the book. Amazon has no qualms about taking away books that people have bought.
2. I doubt the Kindle version of the LLM will run locally. Is Amazon repurposing the author-provided files, or will the users' device upload the text of the book?
You agree that we should own our digital content but it sounds like you don’t want this particular capability because… fuck Amazon.
I can totally understand that sentiment but I don’t think giving up end user capabilities to spite Amazon is logically aligned with wanting ownership of digital media.
> All these weird mental gymnastics to argue that users should have less rights
We probably agree more than not. But users getting more rights isn’t universally good. To finish an argument, one must consider the externalities involved.
I work on a much easier problem (physics-based character animation) after spending a few years in motion planning, and I haven’t really seen anything to suggest that the problem is going to be solved any time soon by collecting more data.
"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, e.g. by having more data of a certain type.
I am pushing the optimism a bit of course, but currently we can see many demos of robots doing basic tasks, and it seems like it is quite easy nowadays to do this with the data-driven approach.
The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.
Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.
Next to zero cognition was involved in the process. There's some kind of hierarchy of thought in the way my mind/brain/body processed the task. I did cognitively decide to get the beer, but I was focused on something at work and continued to think about that in great detail as the rest of me did all of the motion planning and articulation required to get up, walk through two doorways, open the door on the fridge, grab a beer, close the door, walk back and crack the beer as I was sitting down.
Basically zero thought in that entire sequence.
I think what's happening today with all of this stuff is ultimately like me trying to play Für Elise on piano. I don't have a piano. I don't know how to play one. I'm going to be all brain in that entire process and it's going to be awful.
We need to learn how to use the data we have to train these layers of abstraction that allow us to effectively compress tons of sophistication into 'get a beer'.
I think this is an interesting direction, but I think that step 2 of this would be to formulate some conjectures about the geometry of other LLMs, or testable hypotheses about how information flows wrt character counting. Even checking some intermediate training weights of Haiku would be interesting, so they’d still be working off of the same architecture.
The biology metaphor they make is interesting, because I think a biologist would be the first to tell you that you need more than one datapoint.