Technology


Meta admits using pirated books to train AI, but won't pay for it (www.techspot.com)

submitted 10 months ago by throws_lemy@lemmy.nz to c/technology@lemmy.world


[–] TWeaK@lemm.ee 43 points 10 months ago (31 children)

Fair use covers research, but creating a training database for your commercial product is distinctly different from research. They're not publishing scientific papers, along with their data, which others can verify; they are developing a commercial product for profit. Even compared to traditional R&D this is markedly different, as they aren't building a prototype - the test version will eventually become the finished product.

The way fair use works is that a judge first decides whether the use fits into one of the categories: news, education, research, criticism, or comment. This does not really fit into the category of "research", because it isn't research; it's the final product in an interim stage. However, even if it were considered research, the next step in fair use is the nature of the use, in particular whether it is commercial. AI is highly commercial.

AI should not even be classified in a fair use category, but even if it were, it should not be granted any exemption because of how commercial it is.

They use other people's work to profit. They should pay for it.

Facebook steals the data of individuals. They should pay for that, too. We don't exchange our data for access to their website (or for access to some 3rd party Facebook pays to put a pixel on). The website is provided free of charge, and they try to shoehorn another transaction into the fine print of the terms and conditions, where the user gives up their data free of charge. It is not proportionate, and the user's data is taken without proper consideration (i.e. payment, in terms of the core principles of contract law).

Frankly, it is unsurprising that an entity like Facebook, which so egregiously breaks the law and abuses the rights of every human being who uses the internet, would try to abuse content creators in such a fashion. Their abuse needs to be stopped, in all forms, and they should be made to pay for all of it.

[–] Syntha@sh.itjust.works 5 points 10 months ago (3 children)

"They're not publishing scientific papers, along with their data, which others can verify;"

Not that I think this is really relevant here, but I'm pretty sure Meta has published scientific papers on Llama, and the Llama 1 & 2 models are open and accessible to anyone.

[–] TWeaK@lemm.ee 1 points 10 months ago (2 children)

No, that is relevant. However, I would still argue that a paper that doesn't release enough to replicate the work (i.e. the code of their LLM) isn't really anything that should qualify as research. The whole point of academia is that someone else verifies your work - or rather, they try to prove you wrong.

[–] tinwhiskers@lemmy.world 2 points 10 months ago* (last edited 10 months ago) (1 children)

They have released it on GitHub. The code is only about 500 lines. But releasing the model is arguably more important, because that sort of compute is not affordable to any mortals.

[–] TWeaK@lemm.ee 1 points 10 months ago

Yeah, I mean, what they've released is essentially the design of the battery and starter system, without the design of the actual motor. You can't replicate their product and prove their work with what they've published.
