Hacker News | catgary's comments

And even then - I still read the code it generates, and if I see a better way of doing something I just step in, write a partial solution, and then sketch out how the complete solution should work.

Unless the solution is going to be more secure, faster, more stable etc, why does it matter?

Will the end user care? “Does it make the beer taste better”?


in a word, maintainability

> maintainability is inversely proportional to the amount of time it takes a developer to make a change and the risk that change will break something

https://softwareengineering.stackexchange.com/a/134863

i could be wrong, but i'm pretty sure that end-users get upset when a change takes a long time or it ends up breaking something for them.

just because people are finding that agents or whatever are speeding changes up now doesn't necessarily mean they won't encounter a slow-down later when the codebase becomes an un-maintainable mess. technical debt is always a thing, even with machines doing the work (the agent/machine still has to parse a codebase to make changes).


What makes you think that AI couldn’t make the same changes without breaking it, whether you modify the code or not? And you do have automated unit tests, don’t you?

Right now I have a 5000 line monolithic vibe coded internal website that is at most going to be used by 3 people. It mixes Python, inline CSS and JavaScript with the API. I haven’t looked at a line of code. The IAM role for the Lambda runtime has limited permissions (meaning the code can’t do anything that the permissions don’t allow). I used AWS Cognito for authorization, and I validated the security of the endpoints and the permissions of the database user.

Neither Claude nor Codex have any issues adding pages, features and API endpoints without breaking changes.
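To make the "the code can't exceed its permissions" idea concrete, here's a hypothetical least-privilege policy document of the kind described above (table names, account IDs, and actions are illustrative, not the commenter's actual setup):

```python
# Illustrative IAM policy for a Lambda runtime role: even if generated
# code misbehaves, it can only touch the resources listed here.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read/write access limited to one application table
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/internal-site",
        },
        {
            # Enough to write CloudWatch logs, nothing more
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:us-east-1:123456789012:*",
        },
    ],
}
```

The point is that the blast radius of unreviewed code is bounded by the role, not by the code's quality.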

By definition, coding agents are the worst they will ever be right now.


i have a rule of thumb based on past experience: circa 10k lines of code per developer involved, reducing as the codebase size increases.

> 5000 line

so that's currently half a developer according to my rule of thumb.

what happens when that gets to 20,000 lines...? that's over the line in my experience, even for the human who wrote it. it takes longer to make changes. changes that are made increasingly go out in a more and more broken state. more and more tests have to be written for each change to try and stop it going out broken. more work needs to be done for a feature of equal complexity compared to when we started, because now the rest of the codebase is what adds complexity to making changes. etc. etc. and that gets worse the more we add.

these agent things have a tendency to add more code, rather than the most maintainable code. it's why people have to review and edit the majority of generated code for features beyond CRUD webapp functionality (or similar boilerplate). so, given time and more features, 5k --> 10k --> 20k --> ... too much for a single human being if the agent tools are no longer available.

so let's take it to a bit of a hyperbolic conclusion ... what about agents and a 5,000,000 line codebase...? do you think these agents will take the same amount of time to make a change in a codebase of that size versus 5,000 lines? how much more expensive do you think it could get to run the agents at that size? how about increases in error rate when making changes? how many extra tests need to be added for each feature to ensure zero breakage?

do you see my point?

(fyi: the 5 million LoC is a thought experiment to get you to think critically about the problem of technical debt related to agents as codebase size increases; i'm not saying your website's code will get that big)

(also, sorry i basically wrote most of this over the 20 minutes or so since i first posted... my adhd is killing me today)


20K lines of code is well within the context window of any modern LLM. But just like no person tries to understand everything and keep the entire context in their brain, neither do modern LLMs.

Also documentation in the form of MD files becomes important to explain the why and the methodology.


Generally speaking, I try to ensure that the LLM is using core abstractions throughout the codebase in a consistent manner. This makes it easier for me to review any changes it makes.

Sort of a devil’s advocate question: if you write and review your tests, and the functional and non-functional requirements and the human tests for usability pass, why does the code matter?

Non functional requirements: performance, security, reliability, logging etc?


Because the code is the actual thing. Tests can only show that the code fails in certain cases; they don’t actually prove the code is correct.

If you are writing the correct tests that mirror the requirements, why wouldn’t passing tests mean the code is correct?
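To make the disagreement concrete, here's a minimal hypothetical sketch (not from either commenter): an implementation that passes tests written straight from the requirement, yet is wrong for every input the tests never exercise.

```python
# Requirement: add(a, b) returns the sum of two numbers.
def add(a, b):
    return a * b  # wrong implementation, but the tests below never notice

# Tests "mirroring the requirement" -- and they all pass
assert add(2, 2) == 4
assert add(0, 0) == 0

# An untested input exposes the bug: add(2, 3) returns 6, not 5.
```

Passing tests only show the code is correct on the tested inputs; they can't speak for the rest of the input space.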

This line of thought is honestly a bit silly - uv is just a package manager that actually does its job for resolving dependencies. You’re talking about a completely orthogonal problem.

> uv is just a package manager that actually does its job for resolving dependencies.

Pip resolves dependencies just fine. It just also lets you try to build the environment incrementally (which is actually useful, especially for people who aren't "developers" on a "project"), and is slow (for a lot of reasons).


uv is really only something you need if you already aren't managing dependencies responsibly, imo.

I think there are 5-7 thousand deaths confirmed by the UN, and medical reports in Iran estimated there could be 20,000+ casualties.

7 thousand confirmed deaths, 9 thousand unconfirmed. Among those, 1,200 are confirmed deaths from the regime’s forces, and 400 are yet to be confirmed as bystanders. The nurse burned to death by protesters is among those 400.

I don't know enough to dispute, but could you link such a report

I see the value of the students, it just seems like an odd thing for a government to subsidize via NIH/NSF funding. We don’t really have anything analogous to that in Canada and it just seems awfully weird that it exists in the US without the “it’s older than the country” excuse that Oxford/Cambridge have.


How is any of this subsidized by NIH/NSF funding? Those grants are only spent on the cost of research, either direct or indirect.

Also, a number of the schools we're discussing are older than the US itself; Harvard predates it by almost 150 years.


EVs should do much better on brake dust thanks to regenerative braking, no?


But heavier so worse on the tires.

It isn’t intuitive that they’d be better off, and they might be worse on this particular dimension.


Yes, current EVs are heavy. It's not at all clear that this will remain the case as solid-state batteries evolve to become standard. It is entirely possible that EVs will soon be lighter than comparable ICE vehicles [1]

[1] https://news.ycombinator.com/item?id=46505975


No no no. Sure, there might be a future where solid-state batteries become the standard for electric vehicles, but you cannot link to Donut Lab's announcement from this month as evidence. There is no credible evidence they've achieved the holy grail of batteries until they actually deliver these motorcycles and people independently verify them.


Time will tell on their battery, especially if the bike they're putting it on delivers. I think the overall point could be that there's active R&D in trying to find geopolitically sustainable materials, and lowering the weight of materials used.


Because text analysis is substantially easier than video analysis?


Amazon has the Fallout scripts, subtitles, internal show bibles, etc. all available to them.


Are you implying that an LLM needs to be trained on a specific piece of text to answer questions about it?


If you want proper answers, yes. If you want to rely on whatever reddit or tiktok says about the book, then I guess at that point you're fine with hallucinations and others doing the thinking for you anyway. Hence the issues brought up in the article.

I wouldn't trust an LLM for anything more than the most basic questions if it didn't actually have text to cite.


Luckily, the LLM has the text to cite, it can be passed in at inference time, which is legally distinct from training on the data.
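A sketch of what "passed in at inference time" means in practice (the function and prompt wording here are hypothetical, not any vendor's actual implementation):

```python
def build_prompt(book_text: str, question: str) -> str:
    # The purchased book's text is supplied in the prompt at inference
    # time; the model need not have been trained on it at all.
    return (
        "Answer the question using only the excerpt below.\n\n"
        f"Excerpt:\n{book_text}\n\n"
        f"Question: {question}"
    )

# The resulting string would then be sent to a hosted or local model.
prompt = build_prompt("Call me Ishmael.", "Who is the narrator?")
```

This is why "has access to the text" and "was trained on the text" are distinct both technically and (arguably) legally.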


Having access to the text and being trained on the text are two different things.


You don’t need any rights to execute the feature. The user owns the book. The app lets the user feed the book into an LLM, as is absolutely their right, and asks questions.


1. The user doesn't own the book, the user has a revocable license to the book. Amazon has no qualms about taking away books that people have bought

2. I doubt the Kindle version of the LLM will run locally. Is Amazon repurposing the author-provided files, or will the users' device upload the text of the book?


I am so confused by some of the comments in this thread. All these weird mental gymnastics to argue that users should have less rights.

“Oh, you think you should be able to use an LLM with a book you paid for? Well, you don’t own the book.”

Ok, and you like that? You want even less ownership? Less control?


I don't agree with the way you're interpreting the comment. If anything I think it's BAD that you don't really "own" digital content.

I guess my argument is that Amazon shouldn't be able to have their cake and eat it too


You agree that we should own our digital content but it sounds like you don’t want this particular capability because… fuck Amazon.

I can totally understand that sentiment but I don’t think giving up end user capabilities to spite Amazon is logically aligned with wanting ownership of digital media.


> All these weird mental gymnastics to argue that users should have less rights

We probably agree more than not. But users getting more rights isn’t universally good. To finish an argument, one must consider the externalities involved.


>The app lets the user feed the book into an LLM, as is absolutely their right,

I don't think that's cut and dried yet. Throwing media onto someone else's server may count as distribution.


How likely do you think it is that Amazon doesn’t have a pre-existing contract with these publishers to host these books on Amazon servers?


Sure, in the sense that any belief about the law isn’t cut and dried until a judge has explicitly dismissed it in a court of law.


I work on a much easier problem (physics-based character animation) after spending a few years in motion planning, and I haven’t really seen anything to suggest that the problem is going to be solved any time soon by collecting more data.


https://danijar.com/project/dreamer4/

"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."

In other words, it learns by watching, e.g. by having more data of a certain type.


Is Physics-based character animation an easier problem?

Almost any problem can be really hard depending on the number of nines you need.

Maybe there's more room for error in a lot of robotics applications than for your physics-based character animation?


I am pushing the optimism a bit of course, but currently we can see many demos of robots doing basic tasks, and it seems like it is quite easy nowadays to do this with the data driven approach.


Why? Physics of large discrete objects (such as a robot) isn't very complicated.

I thought it was fast, accurate OCR that's holding everything back.


The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.

Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.


I just grabbed a beer about ten minutes ago.

Next to zero cognition was involved in the process. There's some kind of hierarchy of thought in the way my mind/brain/body processed the task. I did cognitively decide to get the beer, but I was focused on something at work and continued to think about that in great detail as the rest of me did all of the motion planning and articulation required to get up, walk through two doorways, open the door on the fridge, grab a beer, close the door, walk back and crack the beer as I was sitting down.

Basically zero thought in that entire sequence.

I think what's happening today with all of this stuff is ultimately like me trying to play Fur Elise on piano. I don't have a piano. I don't know how to play one. I'm going to be all brain in that entire process and it's going to be awful.

We need to learn how to use the data we have to train these layers of abstraction that allow us to effectively compress tons of sophistication into 'get a beer'.


I think this is an interesting direction, but I think that step 2 of this would be to formulate some conjectures about the geometry of other LLMs, or testable hypotheses about how information flows wrt character counting. Even checking some intermediate training weights of Haiku would be interesting, so they’d still be working off of the same architecture.

The biology metaphor they make is interesting, because I think a biologist would be the first to tell you that you need more than one datapoint.

