this post was submitted on 11 Jan 2026
Technology
I had a short tootstorm about this, because oh my god, this is some terribly ineffective, useless piece of nothing.
For one, Poison Fountain tells us to join the war effort and cache responses. Okay...
Yeaah... how am I supposed to cache this? Do I cache one response and then continue serving that for the 50+ million crawlers that visit my sites every day? And you think a single, repetitive thing will poison anything at all? Really?
Then the Poison Fountain explanation goes on to claim that the garbage served to crawlers will end up in the training data. I'm fairly sure the person who set this up never worked with model training, because this is not what happens. Not even the AI companies are that clueless: they do not train on anything and everything, they filter it down.
And what this fountain provides is trivial to filter.
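To illustrate the caching point, here is a minimal sketch, assuming the training pipeline runs an exact-deduplication pass (a common dataset-cleaning step; which filters any given company actually applies is not known here). The function name and the sample crawl are hypothetical. It shows that one cached response served a thousand times collapses to a single document.

```python
import hashlib


def exact_dedup(documents):
    """Keep the first copy of each document, comparing by SHA-256 of normalized text."""
    seen = set()
    unique = []
    for doc in documents:
        # Normalize whitespace and case so trivial variations hash the same.
        normalized = " ".join(doc.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Hypothetical crawl: one cached garbage response served 1000 times, plus two real pages.
cached_poison = "lorem ipsum garbage text"
crawl = [cached_poison] * 1000 + ["a real article", "another real article"]

deduped = exact_dedup(crawl)
print(len(crawl), "->", len(deduped))  # prints "1002 -> 3"
```

Real pipelines typically go further (near-duplicate detection, quality classifiers), so this is the weakest filter one could assume, and it already removes all but one copy.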
It's also mighty hard to set up! It's not just a reverse_proxy https://rnsaffn.com/posion2, because then you leak all the headers you got. No, you have to make a sanitized request that doesn't leak data. Good luck!
Meanwhile, there are a gazillion self-hostable garbage generators and tarpits that you can literally shove in a Docker container and reverse proxy tarpit URLs to, safely, locally. Much more efficient, far more effective. And, seeing as this is practically uncacheable, if I were to use it, I'd have to send all the shit that hits my servers their way. As far as I can tell, this is a single Linode server. It probably wouldn't crumble under my 50 million requests / day, but if ten more people joined the "war effort" without caching, my well-educated guess is that it would fall over and die.
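For readers wondering what a "sanitized request" might look like, here is a rough sketch using an allowlist: build a fresh upstream request and copy over only headers you have explicitly approved. Everything named here is an assumption for illustration (the `FORWARDABLE` set, the `tarpit-proxy` User-Agent, the `upstream.example` URL, and the sample incoming headers); it is not how Poison Fountain or any particular proxy does it.

```python
import urllib.request

# Hypothetical allowlist: the only incoming headers we are willing to pass on.
FORWARDABLE = {"accept"}


def sanitized_headers(incoming):
    """Return only allowlisted headers, dropping everything else."""
    return {
        name: value
        for name, value in incoming.items()
        if name.lower() in FORWARDABLE
    }


def build_upstream_request(upstream_url, incoming):
    """Build (but do not send) a fresh request carrying only sanitized headers."""
    headers = sanitized_headers(incoming)
    # Use a fixed User-Agent so the visitor's own one is not passed along.
    headers["User-Agent"] = "tarpit-proxy"
    return urllib.request.Request(upstream_url, headers=headers, method="GET")


# Hypothetical headers as received from a crawler hitting our site.
incoming = {
    "Accept": "text/html",
    "Cookie": "session=secret",
    "Authorization": "Bearer token",
    "X-Forwarded-For": "203.0.113.7",
    "User-Agent": "SomeCrawler/1.0",
}

clean = sanitized_headers(incoming)
request = build_upstream_request("https://upstream.example/", incoming)
print(clean)
```

An allowlist is used instead of a denylist on purpose: any header you did not think about is dropped by default instead of leaked by default.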
Besides, we have no idea whether poisoning works. We can't measure that. What we can measure is the load on our servers, and this helps fuck all in that regard. The bots will still come, they'll still hit everything, and I'd have additional load due to the network traffic between my server and theirs (remember: the returned response provides no sane indicators that'd allow caching while keeping the responses useful for poisoning purposes).
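To put the 50 million requests / day figure in perspective, here is a quick back-of-the-envelope calculation. It assumes traffic is spread evenly over the day and, for the "ten more people" case, that each of them forwards the same volume without caching; both are simplifying assumptions, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

requests_per_day = 50_000_000  # figure quoted in the comment
per_second = requests_per_day / SECONDS_PER_DAY

# Assumption: ten more operators, each forwarding the same volume uncached.
operators = 1 + 10
total_per_second = operators * per_second

print(f"one operator: about {per_second:.0f} requests/second")       # about 579
print(f"{operators} operators: about {total_per_second:.0f} requests/second")  # about 6366
```

Real crawler traffic is bursty, so peaks would sit well above these averages.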
Not only is this ineffective in poisoning, it's not usable at all in its current state. And they call for joining the war effort. C'mon.
Your comment is more technical than I can properly follow, but this all reminds me of the tarpits I read about a year ago. https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
Is that more usable? (Genuine curiosity from someone who would shed no tears if the plagiarism machines experienced resistance)
Yup. All of the things listed there are far better than this.
(I'm also in that article, look for "iocaine", although it evolved into something a whole lot more powerful, and a lot easier to deploy since the article was written).
Love the Princess Bride reference. Thank you for acting on behalf of those of us with less technical skills.