Reddit is actually extremely good for AI. It's a vast trove of examples of people talking to each other.
When it comes to factual data there are better sources, sure, but factual data has never been the key deficiency of AI. We've long had search engines for that kind of thing. What AIs had trouble with was human interaction, which is what Reddit and Facebook are all about. These datasets train the AI to communicate.
If the Fediverse were larger we'd be a significant source of AI training material too. I'd be surprised if it's not being collected already.
The Internet Archive was distributing unlimited copies of ebooks whose rights were held by major publishers.
The major publishers sued them for distributing copies of ebooks whose rights were held by them.
Yeah, totally unrelated.