this post was submitted on 14 Jan 2024
837 points (99.2% liked)

Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below are allowed. To ask whether your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots


founded 1 year ago
drmoose@lemmy.world 10 points 10 months ago (last edited 10 months ago)

They're the same issue tho. Piracy and using books for corporate AI training should both be fine. The same people going after data freedom are pushing this AI drama too. There's too much money in copyright holding, and it's not being held by your favorite DeviantArt artists.

SnotFlickerman@lemmy.blahaj.zone 14 points 10 months ago (last edited 10 months ago)

So why are Meta and, say, Sci-Hub treated so differently? I don't necessarily disagree, but it's interesting that we legally attack people who share data altruistically (Sci-Hub gives research away for free so that more research can be done; scientific research should be free to the world because it benefits all of mankind), but when companies break the same laws just to make more money, that's somehow fine.

It's like trying to improve the world is punished, and being a selfish greedy fucking pig is celebrated and rewarded.

Sci-Hub is so vilified that it can be blocked at the ISP level (depending on where you live), and politicians are pushing for DNS-level blocking. The same can be said for LibGen or Anna's Archive. Is anything like that happening to Meta? No? Huh, interesting. I wonder why Meta gets different treatment for similar behavior.

I am willing to defend Meta's use of this kind of data once the world changes how it treats entities like Sci-Hub. Until that happens, all you are advocating for is letting corporations break the law while altruistic people are punished. I agree they're the same, but until the law treats them the same, you're just giving freebies to giant corporations while fucking yourself in the ass.

General_Effort@lemmy.world 2 points 10 months ago

> So why are Meta and, say, Sci-Hub treated so differently?

They are not. Meta is being sued, just like Sci-Hub was sued. So, one difference is that the suit involving Meta is still ongoing.

In any case, Meta did not create the dataset. IDK if they even shared it. The researcher who did create it is also being sued. The dataset has been taken down in response to a copyright complaint, and IDK if it is available anywhere anymore. So the dataset was treated just like Sci-Hub: the sharing of the copyrighted material was stopped.

Meta downloading these books for AI training seems fairly straightforward fair use to me. I don't see how what Meta did is anything like what Sci-Hub did.

antonim@lemmy.dbzer0.com 3 points 10 months ago

> Meta downloading these books for AI training seems fairly straightforward fair use to me.

They pirated the books. Is that not legally relevant?

General_Effort@lemmy.world 2 points 10 months ago

"Straightforward" may be too strong regarding these books. If they inadvertently picked up unauthorized copies while scraping the web, that would definitely not be a problem; that's what search engines do.

The question is whether it is a problem that the researchers knowingly downloaded these copyrighted texts. Owners don't seem to go after downloaders. IDK if there is case law establishing that the mere act of downloading copyrighted material is infringement. I don't think there's anything to suggest that knowing about the copyright status should make a difference in civil law.

In any case, researchers must be able to share copyrighted material, not just for AI training but also for any other purpose that needs it. If this is not fair use, then Common Crawl may not be fair use either. IDK if there is case law regarding the sharing of copyrighted materials as research material, rather than for their content. But I find it hard to see how it could not be fair use, as the alternative would be extremely destructive. So even if the download would normally be infringement, I doubt that it is in this case.

Ultimately, we are only talking about a single copy of each book. So even if researchers were forced to purchase these books, all of AI training would yield only a few extra sales per title. The benefit to the owners would be very small compared to the damage to the public.
