Hacker News | iepathos's comments

This is essentially 'License Laundering as a Service.' The 'Firewall' they describe is an illusion because the contamination happens at the training phase, not the inference phase. You can't claim independent creation when your 'independent developer' (the commercial LLM) already has the original implementation's patterns and edge cases baked into its weights.

In order to really do this, they would need to train LLMs from scratch that had no exposure whatsoever to open source code which they may be asked to reproduce. Those models in turn would be terrible at coding given how much of the training corpus is open source code.


>The 'Firewall' they describe is an illusion because [...]

it is an illusion because this is a satire site.


This service is provided "as is" without warranty. MalusCorp is not responsible for any legal consequences, moral implications, or late-night guilt spirals resulting from use of our services.

:)


"Our lawyers estimated $4M in compliance costs. MalusCorp's Total Liberation package was $50K. The board was thrilled. The open source maintainers were not, but who cares?"

The solution here seems to be to impose some constraint or requirement that makes literal copying impossible (remember, copyright governs copies; it doesn't govern ideas or algorithms, which would be patent territory, and essentially no open source software is patented), or to ensure that any 'copying' from vaguely remembered pretraining code happens at such an abstract, indirect level that it is 'transformative' and thus safe.

For example, the Anthropic Rust C compiler could hardly have copied GCC or any of the many C compilers it surely trained on, because then it wouldn't have spat out reasonably idiomatic and natural looking Rust in a differently organized codebase.

Good news for Rust and Lean, I guess, as it seems like everyone these days is looking for an excuse to rewrite everything into those for either speed or safety or both.


> copyright governs copies, it doesn't govern ideas or algorithms

The second part is true. The first is a little trickier. The copyright applies to some fixed media (text in this case) rather than the idea expressed, but the protections extend well beyond copies. For example, in fiction, the narrative arc and "arrangement" is also protected, as are adaptations and translations.

If you were to try and write The Catcher in the Rye in Italian completely from memory (however well you remember it) I believe that would be protected by copyright even if not a single sentence were copied verbatim.


Obviously satire, but it is clearly what will happen in the future (predicting here, not endorsing this practice). We can train a new LLM from scratch on code generated by "contaminated" LLMs, then audit all the training data used and demonstrate that the original source was never in it. Therefore the cleanroom implementation holds. Current LLM training relies less and less on human-generated code; just look at the open source models from China, which rely heavily on distilling from other models. One additional point: exposure to the original source isn't enough to show infringement. Linus looked at UNIX source before writing Linux.

I think this site is either satire, or serious but with a certain kind of humor in which both they and the reader know they're lying (but it's in everyone's interest to play along).

They do say this:

> Is this legal? / our clean room process is based on well-established legal precedent. The robots performing reconstruction have provably never accessed the original source code. We maintain detailed audit logs that definitely exist and are available upon request to courts in select jurisdictions.

Unless they're rejecting almost all of the open source packages their customers submit, on the grounds that those packages are in the training set of the foundation model they use, this is really the opposite of a cleanroom.


This is definitely a parody though, not a real service.

This site is an obvious parody, but like most comedy these days it betrays the severity of the issues happening today.

[flagged]


Ah, it really wouldn't be HN without baselessly accusing other posters you disagree with.

i mean... the site is very clearly satire and the comment is clearly responding as if it is a real service.

i do not necessarily agree with the phrasing of ActivePatterns' comment, but i also raised an eyebrow at iepathos's comment.


The pathos paradox: the more times a person introduces the word pathos in casual conversation the less likely they are to recognize humor/satire.

If the Ars Technica editorial process requires assuming reporters don't fabricate quotes, then their process is inadequate. That's like a software company letting junior engineers release directly to production with just a spellcheck and no real process to catch errors. Major publications like The New Yorker, The Atlantic, etc. have a dedicated fact-checking department that is part of the process and needs to give the ok before any article is published. Why is their process so deficient by comparison? Why wasn't there any fact checking?


> That's like a software company letting junior engineers release directly to production

This person wasn’t a junior.

Editorial processes don’t actually check every single line of everything that is written. Journalists are trusted to report accurately. This person demonstrated they could not be trusted.

> Why wasn't there any fact checking?

Why do programmers ever let any bugs get to production if they have code review? Journalistic outlets do not fact check literally every line that is ever written before it goes to publication.


I agree completely; the people acting like it's Ars' responsibility to assume every sentence from their journalists is a lie just aren't being realistic.

And even if Ars editors had caught the fabricated quote, what then? Obviously he should still be fired. Ars could probably benefit from better editors, but even so, this doesn't absolve the journalist of his own blame for introducing these fabrications in the first place.


But they generally (or at least they did when I was in the biz) fact check quotes. It only takes a few minutes to fire off an email.


The idea that China hasn't 'attacked anyone' in 40 years is factually incorrect. In 1988, they engaged in a deadly naval skirmish with Vietnam over the Johnson South Reef. More recently, the PLA engaged in fatal border clashes with India in the Galwan Valley (2020). On top of direct skirmishes, they have engaged in constant gray-zone aggression: violently ramming Philippine and Vietnamese vessels in the South China Sea, firing water cannons at supply ships, and surrounding Taiwan with live-fire military blockades. That doesn't even touch on the internal human rights abuses against the Uyghurs in Xinjiang. Multiple international bodies and governments have recognized what they are doing to Uyghurs since 2014 as genocide. Finally, it's hard to ignore their devastating handling of COVID-19. The active suppression of information, punishment of early whistleblowers, and refusal to cooperate with international investigations resulted in unprecedented worldwide damage, amounting to an act of gross global endangerment.


Refreshing response from Google especially given the incompetence with which Anthropic has handled bans.


The old path of 'military invents it, civilians eventually get it' (like the Space Race or early ARPANET) hasn't been true for decades. Today, almost all major technological leaps like the modern internet, search engines, smartphones, commercial drones, etc. start in the commercial consumer sector first. The global consumer market dwarfs the defense market, which means the private sector has vastly more capital for R&D. Government payscale caps out ~$190k-$200k/year for specialized roles without some congressional workaround. The top AI researchers at OpenAI, Anthropic, Google etc. make ~$1m-$5m+/year for total compensation. The government couldn't afford to hire the right talent and the right talent likely would refuse based on moral, ethical, and rational principles with the current government.


"1000 PRs/week" with no breakdown of complexity or value is a vanity metric. If these are mostly migrations, boilerplate, and bug fixes on previous Minion PRs that were bug ridden, then you've just created 1000 code reviews/week to waste human time rubber-stamping. That's not productivity, that's busywork with extra steps.

It's like measuring productivity by how many people you pull into meetings each week. The Simple Sabotage Field Manual (published by the OSS, the CIA's predecessor) literally recommends holding as many meetings as possible with as many people as possible. The CIA should add "open as many PRs with AI as possible" to the list. Bonus sabotage points if the PRs come from ambiguous "one-shot" attempts described in Slack with no follow-up clarification.


I'm sorry, but the "it's not X it's Y" is making me flinch.

But otherwise I agree: it's a denial-of-service toolkit. It's hard to imagine a more effective way to sabotage an engineering effort even if you tried.

One could even put on a tinfoil hat and argue that this is actually done deliberately. My bet is on human stupidity, tho.


The hole is closed with per-site pseudonyms. Your wallet generates a unique cryptographic key pair for each site, so same person + same site = same pseudonym, while same person + different sites = different, unlinkable pseudonyms.

"The actual correct way" is an overstatement that misses jfaganel99's point. There are always tradeoffs. EUDI is no exception. It sacrifices full anonymity to prevent credential sharing so the site can't learn your identity, but it can recognize you across visits and build a behavioral profile under your pseudonym.


Ok but we were talking about users on discord who have to verify their age. I was under the impression that

> it can recognize you across visits and build a behavioral profile under your pseudonym

is the default Discord experience for users with an account, long before age verification entered the chat.


If AI is good enough that juniors wielding it outproduce seniors, then the juniors are just... overhead. The company would cut them out and let AI report to a handful of senior architects who actually understand what's being built. You don't pay humans to be a slow proxy for a better tool.

If the tools get good enough to not need senior oversight, they're good enough to not need junior intermediaries either. The "juniors with jetpacks outpacing seniors" future is unrealistic and unstable—it either collapses into "AI + a few senior architects" or "AI isn't actually that reliable yet."


Or it collapses when the seniors have to retire anyway. Who instructs the LLM when there’s nobody who understands the business?

I’m sure the plan is to create a paperclip maximizing company which is fully AI. And the sea turned salty because nobody remembered how to turn it off.


Apparent hypocrisy and injustice in government policy is an ugly thing in the world that should be pointed out and eliminated through public awareness and scrutiny.


Facebook is also under investigation; it just hasn't concluded yet. https://news.ycombinator.com/item?id=46912263


Get a life that's more interesting than dish washing 4-8 hours a day.

