this post was submitted on 12 Apr 2026

685 points (94.8% liked)



Linux lays down the law on AI-generated code, says yes to Copilot, no to AI slop, and humans take the fall for mistakes — after months of fierce debate, Torvalds and maintainers come to an agreement (www.tomshardware.com)

submitted 1 month ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

292 comments


[–] Blue_Morpho@lemmy.world 293 points 1 month ago (6 children)

The title of the article is so extraordinarily wrong that it's clickbait.

There is no "yes to Copilot".

It is only a formalization of what the Linux project said before: all AI use is fine, but a human is ultimately responsible.

"AI agents cannot use the legally binding 'Signed-off-by' tag, requiring instead a new 'Assisted-by' tag for transparency."
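To make the Signed-off-by vs. Assisted-by split concrete, here is a rough Python sketch of how a script might read those two trailers from a commit message. It is not the kernel's own tooling: the function names, the "Name <email>" test for a human sign-off, and the `ExampleTool` value are all hypothetical, and the real rule rests on a person's legal attestation, not on string matching.

```python
import re

# Matches "Key: value" trailer lines such as "Signed-off-by: Jane Doe <jane@example.org>".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")
# Hypothetical heuristic: a human sign-off looks like "Name <email>".
PERSON_RE = re.compile(r"^[^<>]+ <[^<>@\s]+@[^<>\s]+>$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group(1), match.group(2).strip()))
    return trailers


def check_commit(message: str) -> tuple[bool, list[str]]:
    """Hypothetical policy check: a human must sign off; AI help is disclosed separately.

    Returns (ok, assisted_by_values).
    """
    trailers = parse_trailers(message)
    signoffs = [v for k, v in trailers if k == "Signed-off-by"]
    assisted = [v for k, v in trailers if k == "Assisted-by"]
    # At least one sign-off, and every sign-off must name a person with an email address.
    ok = bool(signoffs) and all(PERSON_RE.match(v) for v in signoffs)
    return ok, assisted


if __name__ == "__main__":
    msg = """subsys: fix off-by-one in buffer check

Longer explanation of the change.

Assisted-by: ExampleTool
Signed-off-by: Jane Doe <jane@example.org>
"""
    print(check_commit(msg))  # (True, ['ExampleTool'])
```

The point of keeping the two lists separate is the one the quote makes: the tool gets credited for transparency, but only the human line carries the attestation.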

The only mention of Copilot was this:

"developers using Copilot or ChatGPT can't genuinely guarantee the provenance of what they are submitting"

This remains a problem that the new guidelines don't resolve, because even using AI as a tool and having a human review it still means the code the LLM outputs could have come from non-GPL sources.

[–] marlowe221@lemmy.world 78 points 1 month ago* (last edited 1 month ago) (2 children)

Yeah, that’s also my question. Partially because I am a former-lawyer-turned-software-developer… but, yeah. How are the kernel maintainers supposed to evaluate whether a particular PR contains non-GPL code?

Granted, this was potentially an issue before LLMs too, but nowhere near the scale it will be now.

(In the interests of full disclosure, my legal career had nothing to do with IP law or software licensing; I did public interest law.)

[–] stsquad@lemmy.ml 45 points 1 month ago (1 children)

They don't, just as they don't with human-submitted code. The point of Signed-off-by is that the author attests they have the right to submit the code.

[–] ell1e@leminal.space 2 points 1 month ago* (last edited 1 month ago) (1 children)

Which I'm guessing they cannot attest to, if LLMs truly have the 2-10% plagiarism rate that multiple studies seem to claim. It's an absurd rule, if you ask me. (Not that I would know; I'm not a lawyer.)

[–] stsquad@lemmy.ml 3 points 1 month ago (2 children)

Where are you seeing the 2-10% figure?

In my experience, code generation is most affected by the local context (i.e. the codebase you are working on). On top of that, a lot of code is purely mechanical, and code generally has to have a degree of novelty to be protected by copyright.

[–] ell1e@leminal.space 1 points 1 month ago (1 children)

If you had a contributor who plagiarized at a 2-10% rate, would you really go "eh, it has to have a degree of novelty to be a problem" rather than just ban them? The different standards baffle me sometimes.

You can find various rates mentioned here: https://dl.acm.org/doi/10.1145/3543507.3583199 and here: https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/

[–] stsquad@lemmy.ml 1 points 1 month ago (1 children)

If the 2-10% is just boilerplate syscall number defines or trivial MIN/MAX macros, then it's just the common way to do things.

[–] ell1e@leminal.space 1 points 1 month ago* (last edited 1 month ago) (1 children)

So do you want to legally review every line written by an LLM to see if it meets the fair use criterion, since you have to assume it was probably stolen? And would you do this for a known plagiarizing human contributor too...?

[–] stsquad@lemmy.ml 1 points 1 month ago (1 children)

No, that's why the author asserts that with their Signed-off-by. It's what I do if I use any LLM content as the basis of my patches.

[–] ell1e@leminal.space 1 points 1 month ago* (last edited 1 month ago) (1 children)

So what does the Signed-off-by magically solve here that doesn't require either you or the contributor to legally review every line written by an LLM? If you're not a lawyer, is your contributor going to be one?

[–] stsquad@lemmy.ml 1 points 1 month ago (1 children)

They don't have to be. They know what they asked the LLM to do. They know how much they adapted the output. You usually have to work to get the models to spit out significant chunks of memorised text.

[–] ell1e@leminal.space 1 points 1 month ago

I don't have much more to say other than I doubt the data backs up what you're saying at all.

[–] Danquebec@sh.itjust.works 0 points 1 month ago (1 children)

Imagine how broken it would be otherwise. The first person to write a while loop in any given language would be the owner of it. Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.

[–] sloppy_diffuser@sh.itjust.works 1 points 1 month ago

Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.

Sounds like an origin story for recursion.

[–] wonderingwanderer@sopuli.xyz 13 points 1 month ago (1 children)

If it's flagged as "Assisted-by", then it's easy to identify where that code came from. If a commercial LLM is trained on proprietary code, that's on the AI company, not on the developer who used the LLM to write code, unless they can somehow prove that the developer had access to said proprietary code and was able to personally exploit it.

If AI companies are claiming "fair use," and it holds up in court, then there's no way in hell open-source developers should be held accountable when closed-source snippets magically appear in AI-assisted code.

Granted, I am not a lawyer, and this is not legal advice. I think it's better to avoid using AI-written code in general. At most use it to generate boilerplate, and maybe add a layer to security audits (not as a replacement for what's already being done).

But if an LLM regurgitates closed-source code from its training data, I just can't see how that would be the developer's fault...

[–] sem@piefed.blahaj.zone 6 points 1 month ago (1 children)

Pretty convenient.

This is how copyleft code gets laundered into closed source programs.

All part of the plan.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago (3 children)

How would they launder it? Just declare it their own property because a few lines of code look similar? When there's no established connection between the developers and anyone who has access to the closed-source code?

That makes no sense. Please tell me that wouldn't hold up in court.

[–] lagoon8622@sh.itjust.works 3 points 1 month ago (1 children)

Please tell me that wouldn't hold up in court.

First tell us how much money you have. Then we'll be able to predict whether the courts will find in your favor or not.

[–] wonderingwanderer@sopuli.xyz 3 points 1 month ago

Sad but true...

[–] sem@piefed.blahaj.zone 2 points 1 month ago (1 children)

First of all, who is going to discover the closed-source use of GPL code and file a lawsuit anyway?

Second, the LLM ingests the code and then spits it back out, maybe with a few changes. That is how it benefits from copyleft code while stripping the license.

Maybe a human could do the same thing, but it would take much longer.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago (1 children)

Wait, did you just move the goalposts? I thought the issue we were talking about was open-source developers who use LLM-generated code and unwittingly commit changes that contain allegedly closed-source snippets from the LLM's training data.

Now you want to talk about LLM training data that uses open-source code, and then closed-source developers commit changes that contain snippets of GPL code? That's fine. It's a change of topic, but we can talk about that too.

Just don't expect what I said before about the previous topic of discussion to apply to the new topic. If we're talking about something different now, I get to say different things. That's how it works.

[–] sem@piefed.blahaj.zone 1 points 1 month ago (1 children)

I was responding specifically to this part

But if an LLM regurgitates closed-source code from its training data, I just can't see how that would be the developer's fault...

showing what would happen when the LLM regurgitates open-source code into closed-source projects.

Sorry if you didn't like that.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago

But you flipped the situation, making it an entirely different discussion, and then you went on as if you thought my previous point was still supposed to apply to the new topic that you introduced.

It's not that I don't like it; we can talk about the issues with training commercial LLMs on GPL code. It was just an unannounced change of topic. Like you were trying to score points, so you brought up something irrelevant to pretend I'm arguing against, which I wasn't.

Corporations have been able to steal open-source code without the help of AI, and the same issues arise due to lack of transparency. It's a problem, sure, but it wasn't the problem we were discussing. And acting like I'm somehow arguing against it being a problem is a strawman, because that's not what my comment was referring to.

[–] ricecake@sh.itjust.works 1 points 1 month ago (1 children)

I believe what they're referring to is the training of models on open-source code, which are then used to generate closed-source code.
The break in connection you mention means it isn't legally infringement, but now code derived from open source is closed source.

Because of the untested nature of the situation, it's unclear how it would unfold, likely hinging on how the request was formed.

We have similar precedent with reverse engineering, but the non-sentient tool doing it makes it complicated.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago (1 children)

That makes sense. I see the problem with that, and I don't have a good solution for it. It is a divergence of topic though, as we were discussing open-source programmers using LLMs which are potentially trained on closed-source code.

LLMs trained on open-source code is worth its own discussion, but I don't see how it fits in this thread. The post isn't about closed-source programmers using LLMs.

Besides, closed-source code developers could've been stealing open-source code all along. They don't really need AI to do that.

Still, training LLMs on open-source code is a questionable practice for that reason, particularly when it comes to training commercial models on GPL code. But it's probably hard to prove what code was used in their datasets, since it's closed-source.

[–] ricecake@sh.itjust.works 1 points 1 month ago (1 children)

I don't really see it as a divergence from the topic, since it's the other side of a developer not being responsible for the code the LLM produces, like you were saying.
In any case, it's not like conversations can't drift to adjacent topics.

Besides, closed-source code developers could've been stealing open-source code all along. They don't really need AI to do that.

Yes, but that's the point of laundering something. Before, if you put FOSS code in your commercial product, a human could be deposed in the lawsuit and make it public, and then there were consequences. Now you can do so openly and point at the LLM.

People don't launder money so they can spend it, they launder money so they can spend it openly.

Regardless, it wasn't even my comment, I just understood what they were saying and I've already replied way out of proportion to how invested I am in the topic.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago (1 children)

Conversations can drift to adjacent topics, yeah, but it's not a "gotcha" when someone suddenly changes the topic to the inverse of what was being said, and then acts like they're arguing against you because the thing that you said about the original topic doesn't add up with the new topic.

If you change the topic, you need to at least give the other person an opportunity to respond to your new topic, not just assume that their same argument applies.

[–] ricecake@sh.itjust.works 1 points 1 month ago (1 children)

Alright. I didn't see any gotchas or argument, and didn't make the comment.

That being said, reading the context I assume you're referring to, it hardly reads like anything more than talking about the implication of the idea you shared.
Disagreeing because applying the argument consistently results in an undesirable outcome isn't objectionable.

[–] wonderingwanderer@sopuli.xyz 1 points 1 month ago (1 children)

Disagreeing because applying the argument consistently results in an undesirable outcome isn't objectionable.

I'm not objecting to disagreement, I'm objecting to the attempt to apply my argument to a different situation that it wasn't meant for, and then going on as if that's even remotely what I was saying.

That's not "applying the argument consistently", it's removing context, overgeneralizing the argument, and applying a strawman based on a twisted version of it.

Open-source developers using AI trained on closed-source code and closed-source developers using AI trained on open-source code are two different issues. My point was only intended to apply to the former, because that's what we were talking about. Trying to apply what I said to the latter is a distortion of my argument, and not the argument I was making.

And to try to conflate the two is to be allergic to nuance, which is honestly just typical and unsurprising, but if that's the case then I'm done wasting my time on this conversation.

[–] ricecake@sh.itjust.works 0 points 1 month ago (1 children)

I'm really not interested in the topic. I'm talking because I explained what someone else meant and you started responding as though that was an opinion or argument I was making.

That's not "applying the argument consistently", it's removing context, overgeneralizing the argument, and applying a strawman based on a twisted version of it.

It's really not.
It's not unreasonable for someone to think "developers who use copyrighted code from AI aren't liable for infringement" applies to closed-source devs as well as open-source ones, and to disagree because they don't like one of those outcomes.
It's perfectly valid for you to also disagree and say the statement shouldn't apply both ways, but that doesn't make the other statement somehow a non sequitur.

[–] wonderingwanderer@sopuli.xyz 0 points 1 month ago (1 children)

If you're not interested, then why are you still here saying the same thing over and over again?

It's perfectly fine if someone wants to make a claim that "we should apply the same argument across both situations," and then I would give my reasoning as to why different arguments apply. But that's not what happened.

What happened was, I gave an argument applied to the situation being discussed. Someone else tried to apply my argument to a different situation, in order to argue against a point that I didn't make. And ever since that point, this whole conversation has been going in circles in which you and that other commenter keep arguing as if I'm saying something that I never said, and I keep stating repeatedly that it's not what I said.

And if you read back through this chain, I never said it. I even said I can understand the other point of view, and would probably even agree with it, if that's the conversation we were having, and I said we could even have that conversation, but that the sudden change of topic as an attempt to "score points" against me is not a good faith argumentation style.

Is it a problem if commercial LLMs are trained on GPL code, and then used by closed-source developers to generate proprietary code which potentially contains open-source snippets? Yes, I've never denied that. But that's not what this conversation has been about.

From the start, it's been about open-source developers using LLMs to write open-source code, when those LLMs are potentially trained on closed-source code and may generate snippets closely resembling closed-source code.

Those are fundamentally different situations, and if you can't see that then I can break it down for you in minute detail. But the point I made about the one thing was never meant to apply to the other; and arguing against the point I made as if it was meant to apply to a different situation is a bad faith argument.

[–] ricecake@sh.itjust.works 1 points 1 month ago

Whoah, I never said I wasn't interested in the exchange, only that I wasn't interested in the topic.
As someone who's extremely insistent that it's grossly improper to make any form of inferences beyond what is literally stated, I'm shocked you would make such a leap!

I think you're persistently confusing me with someone else. I understand your point perfectly, and have never had any doubt about what you intended to say. I never even disagreed with you on the topic.
I clarified someone else's point to you, and you started explaining to me how they made unreasonable assumptions, which is what I disagreed with.

Intellectual property laws apply to open- and closed-source software and developers equally. When you make a statement about legal culpability for an action by one group, it makes sense to assume that statement applies to the other, because in the eyes of the law, and of most people in this context, there's no distinction between them.

No one is unclear that you were only referring to one group anymore. That's abundantly clear.

My point is that you're being overly defensive about someone else making a normal assumption about the logic behind your argument. And you're directing that defensiveness at someone who never even made that assumption.

[–] anarchiddy@lemmy.dbzer0.com 11 points 1 month ago

Yup.

I would also just point out that this doesn't change the Linux kernel's legal exposure to infringing submissions compared with before the advent of LLMs.

[–] lechekaflan@lemmy.world 9 points 1 month ago

The title of the article is so extraordinarily wrong that it's clickbait.

That's the pain in the ass with some of those fucking tech/video/showbiz news outlets, plus the rules in some fora that forbid "editorialized" post titles, even though it's so tempting to correct the awful titling.

[–] Fmstrat@lemmy.world 3 points 1 month ago

Because even using AI as a tool and having a human review it still means the code the LLM outputs could have come from non-GPL sources.

I get why they are passing over this, though, since you don't know the provenance of that Stack Overflow snippet either.

[–] scarabic@lemmy.world 1 points 1 month ago

That’s probably why they say “a human is responsible” not “a human must validate it.” I certainly agree that validation is not always possible. And this problem will get worse in time.

[–] TheOctonaut@piefed.zip -2 points 1 month ago (1 children)

the LLM output could have come from non-GPL sources

That's fundamentally not how LLMs work; an LLM is not a database of code snippets.

[–] BradleyUffner@lemmy.world 7 points 1 month ago

"Derivative works"