Anna's Archive is looking for volunteers to run mirrors : piracy


Ko-fi	Liberapay

[–] maxprime@lemmy.ml 65 points 1 year ago (2 children)

For anyone wanting to contribute but on a smaller and more feasible scale, you can help distribute their database using torrents.

https://annas-archive.org/torrents

[–] empireOfLove2@lemmy.dbzer0.com 47 points 1 year ago* (last edited 1 year ago) (1 children)

I know the last time this came up there was a lot of user resistance to the torrent scheme. I'd be willing to seed 200-500gb but having minimum torrent archive sizes of like 1.5TB and larger really limits the number of people willing to give up that storage, as well as defeats a lot of the resiliency of torrents with how bloody long it takes to get a complete copy. I know that 1.5TB takes a massive chunk out of my already pretty full NAS, and I passed on seeding the first time for that reason.

It feels like they didn't really subdivide the database as much as they should have...

[–] maxprime@lemmy.ml 27 points 1 year ago (1 children)

There are plenty of small torrents. Use the torrent generator and tell the script how much space you have and it will give you the “best” (least seeded) torrents whose sum is the size you give it. It doesn’t have to be big, even a few GB is suitable for some smaller torrents.

[–] empireOfLove2@lemmy.dbzer0.com 22 points 1 year ago* (last edited 1 year ago)

Almost all the small torrents that I see pop up are already seeded relatively good (~10 seeders) though, which reinforces the fact that A. the torrents most desperately needing seeders are the older, largest ones and B. large torrents don't attract seeders because of unreasonable space requirements.

Admittedly, newer torrents seem to be split into 300gb or less pieces, which is good, but there's still a lot of monster torrents in that list.

[–] GravitySpoiled@lemmy.ml 7 points 1 year ago (1 children)

Thx.

Do you know how useful it is to host such a torrent? Who is accessing the content via that torrent?

[–] maxprime@lemmy.ml 7 points 1 year ago (1 children)

Anyone who wants to. I think a lot of LLM trainers access them.

[–] GravitySpoiled@lemmy.ml 1 points 1 year ago

Doesn't sound like I should host some of it. I'd be more down to host it for endusers

[+] umbrella@lemmy.ml 29 points 1 year ago* (last edited 3 months ago) (2 children)

[deleted]

[–] xrtxn@lemmy.sdf.org 50 points 1 year ago (2 children)

The selection is literally all books that can be found on the internet.

[–] tsonfeir@lemm.ee 13 points 1 year ago (3 children)

So how big is that?

[–] Index@feddit.nl 33 points 1 year ago (3 children)

According to their total dataset size excluding duplicates, over 900 TB

[–] rufus@discuss.tchncs.de 17 points 1 year ago

Sure, that's a bit more than $65.000 per year with Backblaze.

[–] tsonfeir@lemm.ee 12 points 1 year ago (5 children)

Shit, my synology has more than that… alas, it is full of movie “archives”

[–] state_electrician@discuss.tchncs.de 18 points 1 year ago (1 children)

You run a petabyte Synology at home?

[–] tsonfeir@lemm.ee 7 points 1 year ago

Well, it’s not just a single synology, it’s got a bunch of expansion units, and there are multiple host machines.

[–] dutchkimble@lemy.lol 6 points 1 year ago (2 children)

I'm guessing you're talking GBs?

[–] tsonfeir@lemm.ee 8 points 1 year ago (1 children)

Nope.

[–] dutchkimble@lemy.lol 2 points 1 year ago (1 children)

That's awesome - how many drives and of what sizes do you have? Also why synology instead of higher enterprise grade solution at this point?

[–] tsonfeir@lemm.ee 4 points 1 year ago

Right now most of them are 20T each. I started smaller at first, but they’ve dropped so much in price. I usually wait until a sale and grab a bunch. There are… math… 62 drives?

When I first started, I only had the 6 bay… I chose synology because I wanted something that was managed for me. I don’t want to have to focus on setting things up and possibly doing things wrong. It comes with amazing tools. Also, the server buy-in was a lot less than the other “professional” rack mounted solutions.

I had such a great experience that I just kept with them. It is a pretty expensive hobby though, but so is buying physical movies. And, some things never get a physical release, so having it digitally protects me from when Netflix, or whomever, decides to drop something.

[–] FigMcLargeHuge@sh.itjust.works 3 points 1 year ago* (last edited 1 year ago)

They put a link in with the total...

Total Excluding duplicates 133,708,037 files 913.1 TB

load more comments (3 replies)

[–] AmbiguousProps@lemmy.today 5 points 1 year ago

Correct me if I'm wrong, but they only index shadow libraries and do not host any files themselves (unless you count the torrents). So, you don't need 900+ TB of storage to create a mirror.

[–] Cuntessera@sh.itjust.works 4 points 1 year ago (1 children)

I imagine a couple of terabytes at the very least, though, I could be underestimating how many books have got deDRMed so far.

[–] tsonfeir@lemm.ee 4 points 1 year ago (1 children)

Apparently it’s 900TB

[–] Cuntessera@sh.itjust.works 9 points 1 year ago (2 children)

Girl, what? No wonder they’re having trouble hosting their archive. Does Anna’s Archive host copyrighted content as well or is all that copyleft?

[–] redcalcium@lemmy.institute 13 points 1 year ago (2 children)

They host academic papers and books, most of them are copyrighted contents. They recently got in trouble for scraping a book metadata service to generate a list of books that hasn't been archived yet: https://torrentfreak.com/lawsuit-accuses-annas-archive-of-hacking-worldcat-stealing-2-2-tb-data-240207/

[–] Cuntessera@sh.itjust.works 2 points 1 year ago (2 children)

Is hosting all that stuff even legal? I mean, they’re not making any money off of it, but they’re still a “piracy” hub. How have they survived this long?

[–] redcalcium@lemmy.institute 4 points 1 year ago

It's very illegal. iirc it was created by a group called "Pirate Library Mirror" after the guy that runs z-library got arrested, so I assume they're taking anonymity seriously to avoid arrest.

[–] catloaf@lemm.ee 3 points 1 year ago

No, it's not.

They've survived by making themselves hard to identify and shut down. And as we can see here, by creating redundancies.

[–] AmbiguousProps@lemmy.today 2 points 1 year ago* (last edited 1 year ago)

They index, not host, no? (Unless you count the torrents, which are distributed)

[–] smnwcj@fedia.io 4 points 1 year ago

The archive includes copyrighted works. Often multiple copies of each work, across different formats.

[–] FreudianCafe@lemmy.ml 4 points 1 year ago

I guess more than 5?

[–] spiderman@ani.social 2 points 1 year ago* (last edited 1 year ago)

bigger than zlib or project Gutenberg?

[–] redcalcium@lemmy.institute 15 points 1 year ago (1 children)

It is huge! They claimed to have preserved about 5% of the world’s books.

[–] HeartyOfGlass@lemm.ee 28 points 1 year ago (2 children)

Could anyone broad-stroke the security requirements for something like this? Looks like they'll pay for hosting up to a certain amount, and between that and a pipeline to keep the mirror updated I'd think it wouldn't be tough to get one up and running.

Just looking for theory - what are the logistics behind keeping a mirror like this secure?

[–] thanksforallthefish@literature.cafe 23 points 1 year ago* (last edited 1 year ago) (3 children)

Could be worth asking on selfhosted (how do I link a sub on lemmy ?) They probably have more relevant experience at this sort of thing.

Edit

Does this work ?

https://lemmy.world/c/selfhosted

[–] can@sh.itjust.works 20 points 1 year ago

!selfhosted@lemmy.world might work for more people.

[–] rufus@discuss.tchncs.de 12 points 1 year ago* (last edited 1 year ago) (1 children)

!datahoarder@lemmy.ml

Is probably more suitable. I'd be interested in the total size, though.

[–] catloaf@lemm.ee 3 points 1 year ago (2 children)

900 TB, according to other comments here.

load more comments (2 replies)

[–] pfaca@lemm.ee 5 points 1 year ago

It does. 😉

[–] obviouspornalt@lemmynsfw.com 15 points 1 year ago (1 children)

They outline it pretty well here:

https://annas-blog.org/how-to-run-a-shadow-library.html

[–] tsonfeir@lemm.ee 4 points 1 year ago

This is a fascinating read

[–] Vigilante@lemmy.today 16 points 1 year ago (1 children)

Also link any ways to donate if they're accepting that.

[–] Andromxda@lemmy.dbzer0.com 16 points 1 year ago

https://annas-archive.org/donate

[–] matcha_addict@lemy.lol 10 points 1 year ago (2 children)

I had no idea about this project. Is it like a better search engine for libgen etc?

[–] weirdo_from_space@sh.itjust.works 20 points 1 year ago

It searches through libgens, z-library and has it's own mirrors of the files they serve on top of that. I think it was created as a response to Z-Library's domain getting seized but I could be wrong.

[–] Andromxda@lemmy.dbzer0.com 14 points 1 year ago

It has way more content than Libgen

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ

⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.

Rules • Full Version

Loot, Pillage, & Plunder

🏴‍☠️ Other communities