Author here: it's not even clear that agents can reliably permute their training data (I'm not saying that it's impossible or never happens but that it's not something we can take for granted as a reliable feature of agentic coding).
As I mentioned in one of the footnotes in the post:
> People often tell me "you would get better results if you generated code in a more mainstream language rather than Haskell" to which I reply: if the agent has difficulty generating Haskell code then that suggests agents aren't capable of reliably generalizing beyond their training data.
If an agent can't consistently apply concepts learned in one language to generate code in another language, then that calls into question how good they are at reliably permuting the training dataset in the way you just suggested.
Your argument is far too dependent on observations made about the model's ability with Haskell, which is irrelevant. The concepts in Haskell are totally different from those in almost any other language: you can't easily "generalize" from the strict imperative languages that nearly everyone actually uses to a lazy, pure FP language that does IO through monads like Haskell. The underlying concepts themselves are different, and Haskell has never been mainstream enough for models to get good at it.
Pick a good model, let it choose its own tools and then re-evaluate.
Why is Haskell irrelevant to the argument that LLMs can't reliably permute programming knowledge from one language to another? In fact, the purity of the language and dearth of training data seems like the perfect test case to see whether concepts found in more mainstream languages are actually understood.
Because human programmers routinely fail to do that too. Haskell is an obscure language that came out of academic research. Several of its core semantics (like laziness by default) never got adopted anywhere else and are found only in Haskell.
> if the agent has difficulty generating Haskell code then that suggests agents aren't capable of reliably generalizing beyond their training data.
doesn't that apply to flesh-and-bone developers? Ask someone who's only ever worked in Python to implement their current project in Haskell, and I'm not so sure you'd get very satisfying results.
> doesn't that apply to flesh-and-bone developers?
No, it does not. If you have a developer that knows C++, Java, Haskell, etc. and you ask that developer to re-implement something from one language to another the result will be good. That is because a developer knows how to generalize from one language (e.g. C++) and then write something concrete in the other (e.g. Haskell).
One language in a category to another in the same category, yes. "Category" here being something roughly like "scripting, compiled imperative, functional". However, my experience is that if you want to translate to another category and the target developer has no experience in it, you can expect very bad results. C++ to Haskell is among the most pessimal such translations. You end up with the "writing X in Y" problem.
Your argument fails where it equates someone who only codes in one language to an LLM that is usually trained on many languages.
In my experience, a software engineer knows how to program and has experience in multiple languages. Someone with that level of experience tends to pick up new languages very quickly because they can apply the same abstract concepts and algorithms.
If an LLM that has a similar (or broader) data set of languages cannot generalise to an unknown language, then it stands to reason that it is indeed only capable of reproducing what’s already in its training data.
The hard bit of programming has never been knowing the symbols to tell the computer what to do. It is more difficult to use a completely unknown language, sure, but the paradigms and problem-solving approaches are identical, and that's the actual work, not writing the correct words.
Saying that the paradigms of Python and Haskell are the same makes it sound like you don’t know either or both of those languages. They are not just syntactically different. The paradigms literally are different. Python is a high level duck typed oo scripting language and Haskell is a non-oo strongly typed functional programming language. They’re extremely far apart.
They are different, but on some fundamental level, when you're writing code you're expressing an idea, and that stays the same. The same way a photograph and a drawing of a cat are obviously different and made in vastly different ways, yet both are still representations of a cat. It's all lambda calculus, Turing machines, RAM machines, combinator logic, Post's dominoes, etc., etc. in the end.
They’re both Turing complete, which makes them equivalent.
Code is a description of a solution, which can be executed by a computer. You have the inputs and the outputs (we usually split the former into arguments and environment, and we split the latter into side effects and return values). Python and Haskell are just different towers of abstraction built on the same computation land. The code may not be the same, but the relation between inputs and outputs does not change if they solve the same problem.
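To make that concrete, here's a toy sketch (names purely illustrative): two functions written at different levels of abstraction in the same language, defining the same relation between inputs and outputs.

```python
from functools import reduce

def total_imperative(xs):
    """Imperative style: mutate an accumulator step by step."""
    total = 0
    for x in xs:
        total += x * x
    return total

def total_functional(xs):
    """Functional style: the same relation expressed as a fold over a mapping."""
    return reduce(lambda acc, x: acc + x * x, xs, 0)

# Different towers of abstraction, same input/output relation:
assert total_imperative([1, 2, 3]) == total_functional([1, 2, 3]) == 14
```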
Other people have replied but to clarify my point, while two languages may operate with a focus on two separate paradigms, the actual paradigms do not vary. OOP is still OOP whatever language you use, same for functional et al. Sure some languages are geared towards being used in certain ways, some very much so, but if you know the actual ways the language is largely irrelevant.
> The paradigms literally are different. […] They’re extremely far apart.
And yet, you can write pure-functional thunked streams in Python (and have the static type-checker enforce strong type checking), and high-level duck-typed OO with runtime polymorphism in Haskell.
The hardest part is getting a proper sum type going in Python, but duck typing comes to the rescue. You can write `MyType = ConstructA | ConstructB | ConstructC` where each ConstructX type has a field like `discriminant: Literal[MyTypeDiscrim.A]`, but that's messy. (Technically, you can use the type itself as a discriminant, but that means you have to worry about subclasses; you can fix that by introducing an invariance constraint, or by forbidding subclasses….) It shouldn't be too hard to write a library to deal with this nicely, but I haven't found one. (https://pypi.org/project/pydantic-discriminated/ isn't quite the same thing, and its comparison table claims that everything else is worse.)
I am very sceptical mainstream languages will be better. I have seen plenty of bad Python from LLMs. Even with simple CRUD apps and when provided with detailed instructions.
But what's the point of re-building "standard software" if it is so standard that it already exists 100 times in the training data with slight variations?
I read this attitude very often on HN. "If someone else has already built it before, your effort is a waste of time." To me, it has this ring of "Someone else already makes money from it, go somewhere else where you don't have competition." Well, I get the drift... But... Not everyone is into getting rich. You know, some of us just have fun building things and learning while doing so. It really doesn't matter if the path has been walked before. Not everything has to be plain novelty to count.
If you don't know the answer to the question you just raised, having a discussion is pretty much useless, because it is so obvious. Just because I have fun building things doesn't mean I don't care about the tools I use to do so?
I'd say that's pretty much the definition of standard, yeah. And it's why you can't make a profit selling a simple ToDo app. If you expect people to pay for what you build, you have to build something that doesn't have a thousand free clones on the app store.
That isn’t saying much. Every piece of software is a permutation of zeros and ones. The novelty or ingenuity, or just quality and fitness for purpose, can lie in the permutation you come up with. And an LLM is limited by its training in the permutations it is likely to come up with, unless you give it heaps of specific guidance on what to do.
An email client is highly nontrivial, due to the complexities of the underlying standards, and how the real implementations you have to be compatible with don’t strictly follow them. Making an email client that doesn’t suck and is fully interoperable is quite an ambitious endeavor.
The point was to answer the question: "Can every piece of software be viewed as a permutation of software that has already been developed?"
In my opinion, an email client is a more favorable example than a 3D engine. In fields where it is necessary to differentiate, improve, or innovate at the algorithmic level, where research and development play a fundamental role, it is not simply a matter of permuting software or leveraging existing software components by simply assembling them more effectively.
Actually, in the specific case of a 3D program, it's the current generation of LLMs' complete lack of ability in spatial reasoning that prevents them from "understanding" what you want when you ask for something like "make a camera that flies in the direction you are looking at".
It necessarily has to derive it from examples of cameras that fly forward that it knows about, without understanding the exact mathematical underpinnings that allow you to rotate a 3D perspective camera and move along its local coordinate system, let alone knowing how to verify whether its implementation functions as desired, often resulting in dysfunctional garbage. Even with a human in the loop that provides it with feedback and grounds it (I tried), it can't figure this out, and that's just a tiny example.
Math is precise, and an LLM's fuzzy approach is therefore a bad fit for it. It will need an obscene amount of examples to reliably "parrot" mathematical constructs.
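For reference, the exact math being discussed fits in a few lines. A hedged sketch (the axis conventions below are my assumption; engines differ on handedness and which axis is "up"): derive the unit look direction from yaw and pitch, then move the camera along it.

```python
import math

def forward_vector(yaw: float, pitch: float):
    """Unit look direction for a camera with the given yaw/pitch (radians).

    Convention (an assumption, not universal): yaw rotates about the world
    Y axis, pitch tilts up/down, and -Z is 'forward' at rest (OpenGL-style).
    """
    cp = math.cos(pitch)
    return (
        -math.sin(yaw) * cp,   # x
        math.sin(pitch),       # y
        -math.cos(yaw) * cp,   # z
    )

def fly_forward(position, yaw, pitch, speed, dt):
    """Move the camera along its current look direction."""
    fx, fy, fz = forward_vector(yaw, pitch)
    x, y, z = position
    return (x + fx * speed * dt, y + fy * speed * dt, z + fz * speed * dt)
```

The point is that this is a precise derivation, not something you can pattern-match your way to: get a sign or an axis wrong and the camera strafes or flies backwards, which matches the "dysfunctional garbage" failure mode described above.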
> "make a camera that flies in the direction you are looking at"
That's not the task of a renderer though, but its client, so you're talking past your parent comment. And given that I've seen peers one-shot tiny Unity prototypes with agents, I don't really believe they're that bad at taking an educated guess at such a simple prompt, as much as I wish it were true.
You're right. My point was more that LLMs are bad at (3D) math and spatial reasoning, which applies to renderers. Since Unity neatly abstracts the complexity away of this through an API that corresponds well to spoken language, and is quite popular, that same example and similar prototypes should have a higher success rate.
I guess the less detailed a spec has to be thanks to the tooling, the more likely it is that the LLM will come up with something usable. But it's unclear to me whether that is because of more examples existing due to higher user adoption, or because of fewer decisions/predictions having to be made by the LLM. Maybe it is a bit of both.
What complexities specifically? Implementing SMTP (from the client’s perspective) that other SMTP servers can understand is not very hard. I have done it. Does it follow every nuance of the standard? I don’t know, but it works for me. I haven’t implemented IMAP but I don’t see why it should be much harder. Is there a particular example you have in mind?
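To back that up with a concrete sketch: the happy-path client side of SMTP really is a short line protocol (RFC 5321). The function below (illustrative, not production code) builds the sequence of lines a minimal client would send, including the dot-stuffing rule for the DATA section. A real client must also read and check each server reply code, and would use EHLO, STARTTLS, and authentication.

```python
def smtp_commands(sender, recipient, message_lines):
    """The client side of a minimal SMTP exchange (RFC 5321), returned as
    the sequence of lines a client would write to the socket."""
    cmds = [
        "HELO client.example.com",       # hostname is a placeholder
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    # Inside DATA, a line starting with '.' must have the dot doubled
    # ("dot stuffing"), and the body ends with a line holding a single '.'.
    for line in message_lines:
        cmds.append("." + line if line.startswith(".") else line)
    cmds.append(".")
    cmds.append("QUIT")
    return cmds
```

The complexity the thread mentions lives elsewhere: reply-code handling, MIME, encodings, and the quirks of real servers, not in this core exchange.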
I would be surprised if there are more working email clients out there than working 3D engines. The gaming market is huge, most people do not pay to use email, hobbyists love creating game engines.
Idk, a working basic email client is just not that hard to write though. SMTP and IMAP are simple protocols and the required graphical interface is a very straightforward combination of standard widgets.
I don't mean to be contrarian, but this is completely false.
IMAP _seems_ to be a straightforward (but nasty and stateful) protocol, until you find out that every major provider ignores RFCs and does things slightly differently.
There's this weird schizophrenia around this in the software world.
In the past I have had people here suggest they're just writing boilerplate CRUD software, and I've suggested that means they could just use low code tools instead. They then suggest it's too complex for that to work.
I think we tend to view ourselves as just hooking together basic operations, which might be technically true, but that becomes complex very quickly. A product can be built off of straightforward REST and database operations, and still take you months of learning to get up to speed on.
Most software written today (or 10 years ago, or 50 years ago) is not particularly unique. And even in software that is unusual, you usually find a lot of run-of-the-mill code for the more mundane aspects.
I don’t think this is true. I’ve been doing this since the 1980s, and while you might think code is fairly generic, most people aren’t shipping apps; they’re working on quiet little departmental systems, or trying to patch ancient banking systems, and getting a greenfield gig is pretty rare in my experience.
So for me the code is mundane but it’s always unique and rarely do you come across the same problems at different organisations.
If you ever got a spec good enough to be the code, I’m sure Claude or whatever could absolutely ace it, but the spec is never good enough. You never get the context of where your code will run, who will deploy it or what the rollback plan is if it fails.
The code isn’t the problem and never was. The problem is the environment where your code is going.
The proof is bit rot. Your code might have been right 5 years ago but isn’t any more because the world shifted around it.
I am using Claude pretty heavily but there are some problems it is awful at, e.g I had a crusty old classic ASP website to resuscitate this week and it would not start. Claude suggested all the things I half remembered from back in the day but the real reason was Microsoft disabled vbscript in windows 11 24H2 but that wasn’t even on its radar.
I have to remind myself that it’s a fancy xerox machine because it does a damn good job of pretending otherwise.
Most of the economically valuable software written is pretty unique, or at least is one of few competitors in a new and growing niche. This is because software that is not particularly unique is by definition a commodity, with few differentiators. Commodity software gets its margins competed away, because if you try to price high, everybody just uses a competitor.
So goes the AI paradox: it's really effective at writing lots and lots of software that is low value and probably never needed to get written anyway. But at least right now (this is changing rapidly), executives are very willing to hire lots of coders to write software that is low value and probably doesn't need to be written, and VCs are willing to fund lots of startups to automate the writing of lots of software that is low value and probably doesn't need to be written.
Could you give some examples? I can only imagine completely proprietary technology like trading or developing medicine. I have worked in software for many years and was always paid well for it. None of it was particularly unique in any way. Some of it better than others, but if you could show that there exists software people pay well for that AI cannot make I would be really impressed. With my limited view as software engineer it seems to me that the data in the product / its users is what makes it valuable. For example Google Maps, Twitter, AirBnB or HN.
Were you around when any of Google Maps, Twitter, AirBnB, or HN were first released? Aside from AirBnB (whose primary innovation was the business model, and hitting the market right during the global financial crisis when lots of families needed extra cash), they were each architecturally quite different from software that had come before.
Before Google Maps nobody had ever pushed a pure-Javascript AJAX app quite so far; it came out just as AJAX was coined, when user expectations were that any major update to the page required a full page refresh. Indeed, that's exactly what competitor MapQuest did: you had to click the buttons on the compass rose to move the map, it moved one step at a time, and it fully reloaded the page with each move. Google Maps's approach, where you could just drag the map and it loaded the new tiles in the background offscreen, then positioned and cropped everything with Javascript, was revolutionary. Then add that it gained full satellite imagery soon after launch, which people didn't know existed in a consumer app.
Twitter's big innovation was the integration of SMS and a webapp. It was the first microblog, where the idea was that you could post to your publicly-available timeline just by sending an SMS message. This was in the days before Twilio, when there was no easy API for sending these; you had to interface with each carrier directly. It also faced a lot of challenges around the massive fan-out of messages; indeed, the joke was that Twitter was down more than it was up because they were always hitting scaling limits.
HN has (had?) an idiosyncratic architecture where it stores everything in RAM and then checkpoints it out to disk for persistence. No database, no distribution, everything was in one process. It was also written in a custom dialect of Lisp (Arc) that was very macro-heavy. The advantage of this was that it could easily crank out and experiment with new features and new views on the data. The other interesting thing about it was its application of ML to content moderation, and particularly its willingness to kill threads and shadowban users based on purely algorithmic processes.
All it takes is a sufficiently big pile of custom features interacting. I work on a legal tech product that automates documents. Coincidentally, I'm just wrapping up a rewrite of the "engine" that evaluates how the documents will come out. The rewrite took many months, the code uses graph algorithms and contains a huge amount of both domain knowledge and specific product knowledge.
Claude Code is having the hardest time making sense of it and not breaking everything every step of the way. It always wants to simplify, handwave, "if we just" and "let's just skip if null", it has zero respect for the amount of knowledge and nuance in the product. (Yes, I do have extensive documentation and my prompts are detailed and rarely shorter than 3 paragraphs.)
Everything I’ve ever worked on has been entirely greenfield in domains that had very limited prior development. Industrial applications of software (and data science).
Google Maps, Twitter, and AirBnb occupy a tiny fragment of the possible domain applications of software.
You know how whenever you shuffle a deck of cards you almost certainly create an order that has never existed before in the universe?
Most software does something similar. Individual components are pretty simple and well understood, but as you scale your product beyond the simple use cases ("TODO apps"), the interactions between these components create novel challenges. This applies to both functional and non-functional aspects.
So if "cannot make with AI" means "the algorithms involved are so novel that AI literally couldn't write one line of them", then no - there isn't a lot of commercial software like that. But that doesn't mean most software systems aren't novel.
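The arithmetic behind the shuffle analogy checks out: the space of orderings dwarfs any plausible number of shuffles ever performed.

```python
import math

deck_orderings = math.factorial(52)                   # ~8.07e67 distinct orderings
seconds_since_big_bang = 13.8e9 * 365.25 * 24 * 3600  # ~4.35e17 seconds

# Even one shuffle per second since the Big Bang barely scratches the space:
assert deck_orderings > 10**67
assert deck_orderings / seconds_since_big_bang > 10**49
```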
What if someone doesn't declare that it has been reimplemented using an LLM? Isn't it enough to simply declare that you have reimplemented the software without using an LLM? Good luck proving that in court...
One thing is certain, however: copyleft licenses will disappear. If I can't control the redistribution of my code (through the GPL or a similar license), I'll choose to develop it closed source.
OpenVG is just an API by the Khronos Group that was never implemented by hardware vendors on desktop graphics cards (it was specifically created for mobile, like OpenGL ES).
Btw, there exist several implementations, some with pure CPU rendering (like AmanithVG SRE) and others with GPU backends.
This has the potential to kill open source, or at least the most restrictive licenses (GPL, AGPL, ...): if a license no longer protects software from unwanted use, the only possible strategy is to make the development closed source.
Yes, this is the reason I've completely stopped releasing any open-source projects. I'm discovering that newer models are somewhat capable of reverse-engineering even compiled WebAssembly, etc. too, so I can feel a sort of "dark forest theory" taking hold. Why publish anything - open or closed - to be ripped off at negligible marginal cost?
People are just not realizing this now because it's mostly hobby projects and companies doing it in private, but eventually everyone will realize that LLMs allow almost any software to be reverse engineered for cheap.
See e.g. https://banteg.xyz/posts/crimsonland/ , a single human with the help of LLMs reverse engineered a non-trivial game and rewrote it in another language + graphics lib in 2 weeks.
It’s a real problem. I threw it at an old MUD game just to see how hard it is [0] then used differential testing and LLMs to rewrite it [1]. Just seems to be time and money.
Wow, as a former MajorMUD addict (~30 years ago) that's extremely interesting to see. Especially since MajorMUD is rarely discussed on HN, even in MUD or BBS-related threads.
Did you find it worked reasonably well on any portion of the codebase you could throw at it? For example, if I recall correctly, all of MajorMUD's data file interactions used the embedded Btrieve library which was popular at the time. For that type of specialized low-level library, I'm curious how much effort it would take to get readable code.
I am getting closer and closer to a full verified rewrite in Rust. I have also moved to a much easier sqlite relational structure for the backend.
I actually sidestepped the annoying Btrieve problem by exporting the data using a Go binary [0] and writing it to a sqlite instance with raw byte arrays (blobs). Btrieve is weird because it has a DLL but also a service to interact with the files.
P.S. I have spent a lot of hours on this, mainly to learn actual LLM capabilities, which have improved a huge amount in the last year.
Why does it matter if it is 'ripped off' if you released it as open source anyway? I get that you might want to impose a particular licence, but is that the only reason?
Even the most permissive open source licenses such as MIT require attribution. Releasing as open source would therefore benefit the author through publicity. Being able to say that you're the author of library X, used by megacorp Y with great success, is a good selling point in a job interview.
This is pretty much exactly why copyright laws came about in the first place. Why bother creating a book, painting, or other work of art if anyone can trivially copy it and sell it without handing you a dime?
I think refusing to publish open source code right now is the safe bet. I know I won't be publishing anything new until this gets definitively resolved, and will only limit myself to contributing to a handful of existing open source projects.
Even writing clones of products seems pretty doable without any major effort (a friend e.g. let an AI code a Duolingo clone for his daughter). AI itself can be cloned based on its output in a similar way it clones its input (I had to chuckle about the term 'distillation attacks' used by Anthropic).
In a way, we're where physical things were before. Are software patents the solution? Patents are one of the reasons why open hardware is not a bigger thing. I feel that this part of AI will move things backwards.
Another alternative would be turning the evidence requirement around and asking everyone for full provenance on inputs; no clue how this could ever work. This one only became evident because the 'author' voluntarily provided the evidence.
I find the wording "protect from unwanted use" interesting.
It is my understanding that what a GPL license requires is releasing the source code of modifications.
So if we assume that a rewrite using AI retains the GPL license, it only means the rewrite needs to be open source under the GPL too.
It doesn't prevent any unwanted use, or at least that is my understanding. I guess unwanted use in this case could mean not releasing the modifications.
If the AI product is recognised as "derivative work" of a GPL-licensed project, then it must itself be licensed under the GPL. Otherwise, it can be licensed under any other license (including closed source/proprietary binary licenses). This last option is what threatens to kill open source: an author no longer has control over their project. This might work for permissive licenses, but for GPL/AGPL and similar licenses, it's precisely the main reason they exist: to prevent the code from being taken, modified, and treated as closed source (including possible use as part of commercial products or SaaS).
If you'd be willing to close source your "libre" open source project because somebody might do something you don't like with it, you never wanted a "libre" project.
This issue can be resolved on the European side by effectively making the transfer of EU->US data illegal and, if detected, nationalizing the entire EU subsidiary of the US company. Would this trigger a US-EU war? Certainly, but only the blind cannot see that relations are no longer those between two allies.
Not a lawyer but from what I understand the EU law makers are acting in response to US behavior. The US has laws intended to protect US citizens that do not apply to foreigners, a system where money buys access to anything and a lust for hoarding data. Meanwhile in the EU people use US tech for everything, probably for various not very good reasons. It's kinda sad really, it should have just been properly organized. US Tech companies should really have the customers and the EU the services.
I don't know how common it is in fonts, but for generic 2D vector graphics, problems arise from the management of self-intersections, i.e., the pixels where they fall. With an SDF rasterizer, how do you handle the pixel where two Bezier curves intersect in a fish-shaped path?
For this reason, more conventional rasterizers with multisampling are often used, or rasterizers that calculate pixel coverage analytically, also finding intersections (sweepline, Bentley-Ottmann).
Hmm I'm not sure I quite understand your question. The renderer uses predefined textures of signed distance fields. Essentially you're building a gradient out from your source vector. You can tweak how that texture is built and it doesn't need to be strictly the true distance.
In theory you could handle it however you want. You could make an SDF texture of a figure eight where the inside of the eight is purely zero and only the outside has a gradient. If you built a texture like this, then your stroke width would only ever grow out of the shape.
This is an example of how SDF is very powerful.
If instead you're asking about precision with regards to the infinitely small point of an intersection of two vectors, it's actually not an issue at all. The technique uses the usual texture sampling pipeline, so the render will be as sharp as your SDF texture and as alias-free as your mipmapping allows.
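To connect this back to the intersection question: when shapes are combined by taking the `min` of their SDFs (the union), the resulting field is continuous where the source outlines cross, so there is no special intersection pixel to handle. A toy sketch (not this renderer's actual code; also note `min` reproduces the correct zero set of the union but is not everywhere the exact distance):

```python
import math

def circle_sdf(cx, cy, r):
    """Signed distance to a circle: negative inside, zero on the boundary."""
    return lambda x, y: math.hypot(x - cx, y - cy) - r

def union(f, g):
    """SDF union: keep whichever surface is closer. The min of two
    continuous fields is continuous, even where the outlines intersect."""
    return lambda x, y: min(f(x, y), g(x, y))

# A 'figure eight' as the union of two circles touching at the origin:
eight = union(circle_sdf(-1.0, 0.0, 1.0), circle_sdf(1.0, 0.0, 1.0))
```

Sampling `eight` at any pixel, including right at the crossing point, just returns a plain number, which is why the SDF approach sidesteps the analytic intersection bookkeeping that sweepline rasterizers need.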
I'm always interested in new 2D vector rendering algorithms, so if you make a blog post explaining your approach, with enough detail, I'd be happy to read it!
At what order of magnitude in the number of elements to be sorted (I'm thinking of the overhead of the GPU setup cost) is the break-even point reached, compared to a pure CPU sort?
No idea unfortunately. For me it's mandatory to sort on the GPU because the data already resides on GPU, and copying it to CPU (and the results back to GPU) would be too costly.
Thanks. These libraries are so full of features that they literally bury the real drawing part. I had hoped for something more basic. No anti-aliasing, etc.
You have to look for "stroking": there are several ways to do it, in CPU you usually first perform a piecewise linear approximation of the curve, then offset each segment along the normals, add the caps, get a polygon and draw it.
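A hedged sketch of that pipeline in Python (illustrative only): flatten the curve into line segments, then offset each segment along its unit normal. As noted above, a real stroker would additionally join adjacent segments (miter/round/bevel) and add caps at the ends.

```python
import math

def flatten_quadratic(p0, p1, p2, n=16):
    """Piecewise-linear approximation of a quadratic Bezier curve:
    sample the curve at n+1 uniformly spaced parameter values."""
    pts = []
    for i in range(n + 1):
        t = i / n
        u = 1.0 - t
        pts.append((u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
                    u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]))
    return pts

def offset_segments(pts, half_width):
    """Offset each segment along its unit normal; returns the (left, right)
    polylines that bound the stroke. Joins and caps are omitted here."""
    left, right = [], []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length   # unit normal of the segment
        left += [(x0 + nx * half_width, y0 + ny * half_width),
                 (x1 + nx * half_width, y1 + ny * half_width)]
        right += [(x0 - nx * half_width, y0 - ny * half_width),
                  (x1 - nx * half_width, y1 - ny * half_width)]
    return left, right
```

Joining `left`, the caps, and `right` (reversed) gives the closed polygon that gets handed to the regular fill rasterizer.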