952

The Internet Archive just lost its appeal over ebook lending (www.theverge.com)

submitted 2 years ago* (last edited 2 years ago) by fossilesque@mander.xyz to c/technology@lemmy.world

142 comments

https://github.com/ArchiveTeam/warrior-dockerfile

[–] MigratingtoLemmy@lemmy.world 178 points 2 years ago (5 children)

If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let's make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral's engineers lead that charge, and put an AGPL license on the project so that companies can't fuck us over.

I refuse to believe that nobody has thought of this yet.

[–] bandwidthcrisis@lemmy.world 33 points 2 years ago (1 children)

An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

"In my day we said 'all your base' and laughed all day long, because it took all day to download the video."

[–] Ragnarok314159@sopuli.xyz 19 points 2 years ago

This stupid thing just keeps saying “I can Haz Cheeseburger”. What the hell does that even mean?

[–] General_Effort@lemmy.world 11 points 2 years ago

What do you think Mistral trains its models on? Public domain stuff?

[–] werefreeatlast@lemmy.world 5 points 2 years ago

Better yet! Train an AI to rewrite the books into brand new books, and let us read them, review the content, add notes, etc., so that the AI can refresh the books if we find errors.

Kick the private collections to the curb! Teeth in, like in American History X.

[–] Dkarma@lemmy.world 3 points 2 years ago

"AI write Hamlet" AI writes Idiocracy.

[–] capital@lemmy.world 2 points 2 years ago (2 children)

We get it, y’all hate LLMs and the companies who make them.

This comparison is disingenuous and I have to think you’re smart enough to know that, making this disinformation.

If/when an LLM like ChatGPT spits out a full copy of training text, that’s considered a bug and is remediated fairly quickly. It’s not a feature.

What IA was doing was sharing the full text as a feature.

As far as I know, there are some court cases pending to determine whether companies like OpenAI are liable for copyright infringement, but I haven't seen any rulings yet (happy to be corrected here).

All that said, I love IA and have a Warrior container scheduled to run nightly to help contribute.

[–] MigratingtoLemmy@lemmy.world 4 points 2 years ago (1 children)

Hmm, true. IA wouldn't be as supported if we couldn't get the full text of the source.

Can you tell me more about the "warrior container"?

[–] capital@lemmy.world 4 points 2 years ago

It’s mentioned in the OP but it’s this:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

Basically, distributed collection.

[–] dan@upvote.au 1 points 2 years ago* (last edited 2 years ago) (1 children)

have a Warrior container

This is an ArchiveTeam project, which is a totally separate effort to the Internet Archive. As far as I know, they're not related other than the fact that ArchiveTeam use The Internet Archive for storage.

[–] capital@lemmy.world 1 points 2 years ago

Ahh my mistake.

Might be time to financially contribute to IA.