this post was submitted on 12 Apr 2026
574 points (94.8% liked)
Technology
Seems like a reasonable approach. Make people accountable for the code they submit, no matter the tools used.
If the accountability cannot be practically fulfilled, the reasonable policy becomes a ban.
What good is it to say "oh yeah you can submit LLM code, if you agree to be sued for it later instead of us"? I'm not a lawyer and this isn't legal advice, but sometimes I feel like that's what the Linux Foundation policy says.
But this was already the case. When someone submitted code to Linux, they always had to assume responsibility for the legality of the submitted code; that's one of the points of the mandatory Signed-off-by.
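For context, Signed-off-by is just a trailer line at the end of the commit message, asserting the Developer Certificate of Origin. A minimal sketch of what checking for it looks like (this is illustrative only, not the kernel's actual checkpatch tooling; the name `has_signoff` is made up):

```python
# Illustrative sketch: detect a Signed-off-by trailer in a commit message.
# The kernel requires this trailer as the submitter's legal assertion that
# they have the right to submit the code (the Developer Certificate of Origin).
import re

# A trailer looks like: Signed-off-by: Name <email@host>
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)

def has_signoff(commit_message: str) -> bool:
    """Return True if the message carries at least one Signed-off-by trailer."""
    return bool(SIGNOFF_RE.search(commit_message))

msg = (
    "mm: fix off-by-one in page accounting\n"
    "\n"
    "Signed-off-by: Jane Hacker <jane@example.com>\n"
)
print(has_signoff(msg))                 # True
print(has_signoff("mm: fix bug\n"))     # False
```

The point of the thread stands either way: the trailer is a legal assertion by a human, regardless of what tools produced the diff.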
But now, even the person submitting the license-breaching content may be unaware that they are doing so, so the problem is surely worse now that contributors can easily and unwittingly end up on the wrong side of the law.
That's their problem. If they are using an LLM and cannot verify the output, they shouldn't be using an LLM.
Nobody can verify that the output of an LLM isn’t from its training data except those with access to its training data.
The problem is that, broadly, most GenAI users don't take that risk seriously. So far no one can point to a court case where a rights holder has successfully sued someone over LLM infringement.
The closest is Getty and their case, with very blatantly obvious infringement. They lost in the UK, so that's not a good sign.
Most GenAI users do not submit code to the Linux kernel project.
So why invite them to?
Nobody is inviting them
It is their problem until the second they submit it; then it is the project's problem. You can lay the blame for the bad actions wherever you want, but the reality is that the work of verifying the legality and validity of these submissions is being abdicated, crippling projects under the increased workload of going through ever more submissions that amount to junk.
What is the solution for that? The fact that it is the fault of the lazy submitter doesn't clean up the mess they left.
Frankly, I expect the kernel dudes to be pretty good about this. Their style guides alone are quite strict, and any funny business in a PR that isn't marked correctly is, I think, likely a ban from making PRs at all. How it worked beforehand, as already stated by others, is the author says "I promise this follows the rules" and that's basically the end of it. Giving an official avenue for generated code is a great way to reduce the negatives of what will happen anyway. We know this from decades of real-life experience trying to ban things like alcohol or drugs: time after time, providing a legal avenue with some rules makes things safer. Why wouldn't we see a similar effect here?
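The kernel's process already runs on commit trailers, so "marked correctly" would plausibly mean another trailer alongside Signed-off-by. Something like the following (the tool-attribution trailer here is illustrative; I'm not claiming this is the official format the kernel settled on):

```
mm: fix off-by-one in page accounting

Describe the change here as usual, including how the
generated portion was reviewed and tested by the submitter.

Co-developed-by: <AI tool name and model version>
Signed-off-by: Jane Hacker <jane@example.com>
```

The human Signed-off-by still carries the legal weight; the extra trailer just makes the tool use visible to reviewers instead of hidden.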
I do think that some projects will fare better than others, particularly ones like you mentioned, where the team is robust and capable of handling the filtering of increased submissions from these new sources.
I believe we are going to end up having to see some new mechanism for project submissions to deal with the growing imbalance between submission volume and work hours available for review, as became necessary when viruses, malware, and spam first came into being. It has quickly become incredibly easy for anyone to make a PR, but no easier to review one, so something is going to have to give in the FOSS world.
No, it’s not a reasonable approach. Making people the authors of the code they submit is reasonable, because then it can be released under the GPL. AI-generated code is public domain.
I suppose there should be no code generators, assemblers, compilers, linkers, or LSPs then either? Just etching 1s and 0s?
The copyright office has made it explicitly clear that those tools do not interfere with the traditional elements of authorship, and that the use of LLMs does. So, if you don’t want to take my word for it, take the US Copyright Office’s word for it.
What this makes clear is that it certainly isn’t as black and white as you say. Nevertheless, automation converting an input to an output simply cannot be the only criterion used in determining authorship.
And that wouldn’t change my statement anyway, but rather supports it. The person submitting a patch must be accountable for its contents.
An outright ban would need to carefully define how an input gets converted to an output, and that may not be so clear. To be effective, one would potentially have to end the use of many tools that have been used for many years in the kernel, including snippet generation, spelling and grammar correction, and IDE autocompletion. So such a reductive view simply will not suffice.
Additionally, copyrightability and licensability are wholly different questions. And it does not violate the GPL to include public domain content, since the license applies to the aggregate work.
That seems very clear to me. Generative AI output is not human-authored, and therefore not copyrightable.
The policy I use also makes very clear the definition of AI generated material:
https://sciactive.com/human-contribution-policy/#Definitions
I’m not exactly sure how you can possibly think there is an equivalence between a tool like a spelling and grammar checker and a generative AI, but there’s a reason the copyright office will register works that have been authored using spelling and grammar checkers, but not works that have been authored using LLMs.
Just read the next two paragraphs. Don’t just stop because you got to something that you like. The equivalence I draw is clear. You don’t like it, and that’s okay. But one would have to clarify exactly what the ban entails, and that wouldn’t be as clear as you might think. LLMs only? Transformers specifically? What about graph generation, or other ML models? Is it just ML? If so, is that because a matrix lattice was used to get from input to output? Could other deterministic math functions trigger the same ban? What if a spell checker used RNG to select the best replacement from a list of correct options? What if a compiler introduced into the assembled output an optimization not of the author’s writing?
Do you see why they say “The answer will depend on the circumstances, particularly how the AI tool operates and how it was used to create the final work. This is necessarily a case-by-case inquiry”?
And that still affects copyrightability, not license compliance.
Do you want to explain to me what, in those two paragraphs, means that the use of spell checkers and LLMs is equivalent with regard to copyrightability? It seems like those paragraphs make it clear that the use of spell checkers is not the same as LLMs.
The policy I use bans “generative AI model” output. Generative AI is a pretty well defined term:
https://en.wikipedia.org/wiki/Generative_AI
https://www.merriam-webster.com/dictionary/generative%20AI
If you have trouble determining whether something is a generative AI model, you can usually just look up how it is described in the promotional materials or on Wikipedia.
- https://en.wikipedia.org/wiki/Claude_(language_model)
I never said it violates GPL to include public domain code. I’m not sure where you got that from. What I said is that public domain code can’t really be released under the GPL. You can try, but it’s not enforceable. As in, you can release it under that license, but I can still do whatever I want with it, license be damned, because it’s public domain.
I did that with this vibe coded project:
https://github.com/hperrin/gnata
I just took it and rereleased it as public domain, because that’s what it is anyway.
Disney created films based on old fairy tales. Disney has a copyright on those films even though they include elements from the public domain because the films also include the artists' original expression. The linux kernel (probably) contains public domain AI-generated code alongside original work from its many contributors. If you wanted to get the entire project into the public domain, you'd have to get permission from nearly all its contributors or wait for their copyright term to expire. The small snippets of code which were AI-generated are public domain. The bulk of the project isn't, and the project as a whole isn't.
As much as I dislike AI, I can't say I understand forbidding AI-generated contributions on the grounds that the submitted code is public domain. I suppose somebody can come along and "steal" the public domain snippets, but I suspect it's difficult to definitively tell apart the human-written code from AI-generated and strip out the human-written bits. If they do, what's the issue? It wasn't yours to begin with and you can still keep it in your project. Moreover, now that the magical plagiarism machines exist, who's going to be lifting code in this way, anyway?
I mean, yeah, you can make the argument that owning the copyrights to all of the code in your project isn’t important. I don’t agree, but that’s certainly a valid stance. Apparently the Linux maintainers are on your side. That makes me sad. Copyright ownership of the things I produce is very important to me.
Isn’t that the rule? The author has to be a human?
If the author is an LLM, then the author is not a human.