this post was submitted on 30 Jan 2025
90 points (94.1% liked)

Technology

So taking data without permission is bad, now?

I'm not here to say whether the R1 model is the product of distillation. What I can say is that it's a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

The company is currently involved in several high-profile copyright infringement lawsuits, including one filed by The New York Times alleging that OpenAI and its partner Microsoft infringed its copyrights and that the companies provide the Times' content to ChatGPT users "without The Times’s permission or authorization." Other authors and artists have suits working their way through the legal system as well.

Collectively, the contributions from copyrighted sources are significant enough that OpenAI has said it would be "impossible" to build its large language models without them. The implication is that copyrighted material had already been used to build these models long before these publisher deals were ever struck.

The filing argues, among other things, that AI model training isn't copyright infringement because it "is in service of a non-exploitive purpose: to extract information from the works and put that information to use, thereby 'expand[ing] [the works’] utility.'"

This kind of hypocrisy makes it difficult for me to muster much sympathy for an AI industry that has treated the swiping of other humans' work as a completely legal and necessary sacrifice, a victimless crime that provides benefits so significant and self-evident that it wasn't even worth having a conversation about it beforehand.

A last bit of irony in the Andreessen Horowitz comment: There's some handwringing about the impact of a copyright infringement ruling on competition. Having to license copyrighted works at scale "would inure to the benefit of the largest tech companies—those with the deepest pockets and the greatest incentive to keep AI models closed off to competition."

"A multi-billion-dollar company might be able to afford to license copyrighted training data, but smaller, more agile startups will be shut out of the development race entirely," the comment continues. "The result will be far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development."

Some of the industry's agita about DeepSeek is probably wrapped up in the last bit of that statement—that a Chinese company has apparently beaten an American company to the punch on something. Andreessen himself referred to DeepSeek's model as a "Sputnik moment" for the AI business, implying that US companies need to catch up or risk being left behind. But regardless of geography, it feels an awful lot like OpenAI wants to benefit from unlimited access to others' work while also restricting similar access to its own work.

[–] pebbles@sh.itjust.works 33 points 7 hours ago (2 children)

We aren't there, but imagine a world where ideas aren't owned. Where you aren't hurt by someone else using your work. Where we all benefit from innovation and reuse.

[–] themurphy@lemmy.ml 22 points 7 hours ago* (last edited 6 hours ago)

I remember that there was a science group that got millions in funding each year, unconditionally, except that everything they discovered would be open for anyone to use.

Because it was unconditional, they could research ANYTHING. And it was very successful, because they could invent things without being controlled by profits or shareholders.

It basically worked well.

EDIT: Found some of them. Look up The Invisible College or The Institute for Advanced Study. Also found 4 similar groups in Denmark funded by private firms (like Carlsberg, the beer maker), where they can study anything and make it public.

[–] maplebar@lemmy.world -4 points 6 hours ago (1 children)

We aren't talking about "ideas" being stolen here, we're talking about work being stolen and exploited for corporate profit.

Personally I don't think it's crazy to suggest that the person who writes a book should own it, the people who compose a song should own it, the artist who paints a painting should own it, etc.

As much as techbros love to pretend that AI is ushering us into a post-capitalist, post-copyright Star Trek future, it is in fact doing the exact opposite--it's empowering the biggest and richest tech companies to exploit human creativity in the largest industrial plagiarism scheme in history, all so some bullshit VC investors can game their way up the pyramid scheme known as the stock market.

[–] proceduralnightshade@lemmy.ml 1 points 1 hour ago

The problem with copyright/data ownership is that it's useless if you're unable to enforce it. Data is replicable; it doesn't matter if you call it "work" or "ideas". Do you think you own the text you just wrote? Let me show you something.

Spoiler
We aren't talking about "ideas" being stolen here, we're talking about work being stolen and exploited for corporate profit.

Personally I don't think it's crazy to suggest that the person who writes a book should own it, the people who compose a song should own it, the artist who paints a painting should own it, etc.

As much as techbros love to pretend that AI is ushering us into a post-capitalist, post-copyright Star Trek future, it is in fact doing the exact opposite--it's empowering the biggest and richest tech companies to exploit human creativity in the largest industrial plagiarism scheme in history, all so some bullshit VC investors can game their way up the pyramid scheme known as the stock market.

There. I just stole your text. I stole it. I own it now. It's mine now. What are you gonna do about it?

There is no stealing when it comes to information; there's only replication, only copying.

I agree with you, corpos shouldn't have this amount of power. But you won't get there by trying to protect the work of artists, writers, etc. with the exact same scheme corpos pulled to protect their power and interests. Like, it didn't work, did it? No copyright for me, thanks.