this post was submitted on 08 Sep 2025
Selfhosted
In what way? Anything on the public internet is likely being used for AI training. I guess that by using free GitHub, you can't object to training.
Then again, wherever you host, you sort of run into the same problem. You can use robots.txt, but crawlers don't have to obey it.
If you self-host, there are some ways to fight back; or, depending on your opinion of Cloudflare, it seems they're fairly effective at blocking AI crawlers.
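For reference, a robots.txt opt-out is just a set of per-crawler `Disallow` rules. The sketch below is illustrative only: the user-agent list (GPTBot, CCBot, Google-Extended, ClaudeBot) is my assumption of a few well-known AI crawlers, not an exhaustive or authoritative list, and `is_allowed` is a hypothetical helper. It uses Python's standard `urllib.robotparser` to show how a compliant crawler would read the file. Nothing forces a crawler to behave this way.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt opting out of a few AI crawlers.
# The user-agent list is an assumption and is not exhaustive.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(user_agent: str, url: str, text: str = ROBOTS_TXT) -> bool:
    """Report whether a *well-behaved* crawler with this agent token may fetch url."""
    return build_parser(text).can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/post/1"))
```

The listed crawlers get `False` and everything else falls through to the `*` rule. This only describes what a polite crawler *should* do, which is exactly why blocking and tarpits come up as the next step.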
Yep, on top of simply blocking, if you're self-hosting or using Cloudflare, you can enable AI tarpits.
How do I do this? I don't mind (and may even prefer) hosting somewhere other than home. My main concern with GH is that you become an AI snack whether you like it or not.
Which part? If you want to use Cloudflare Pages, it's relatively straightforward. You can follow this guide and get up and running pretty quickly: https://www.hongkiat.com/blog/host-static-website-cloudflare-pages/
If you're asking about the tarpits, there are (generally) two ways to accomplish that. Even if you don't use Cloudflare Pages to host your site directly (for example, if you run nginx on your own server), you can still enable AI tarpits for your entire domain, as long as you use Cloudflare as your DNS provider. If you use Pages, the setup is mostly the same: https://blog.cloudflare.com/ai-labyrinth/#how-to-use-ai-labyrinth-to-stop-ai-crawlers
If you want to do it all locally, you could instead set up iocaine or Nepenthes, which are both self-hosted and can integrate with various web server software. Obviously, Cloudflare's tarpits are stupid simple to set up compared to these, but these give you greater control over exactly how you're poisoning the well and trapping crawlers.
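To give a rough idea of what a tarpit does, here's a toy sketch using only Python's standard library. It is *not* how iocaine, Nepenthes, or Cloudflare's AI Labyrinth are actually implemented; those are far more elaborate. The crawler marker list, the filler word list, and the names `is_ai_crawler`, `maze_page`, `TarpitHandler`, and `serve` are all hypothetical. The idea: requests whose User-Agent looks like an AI crawler get a page of junk text plus links to more generated pages, so the crawler keeps wandering a maze that never ends.

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical User-Agent substrings to trap (an assumption, not any tool's real list).
AI_CRAWLER_MARKERS = ("gptbot", "ccbot", "claudebot", "bytespider")

# Filler vocabulary for the junk text (arbitrary).
WORDS = ("lorem", "ipsum", "river", "copper", "lantern", "orbit", "meadow", "signal")


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the marker list."""
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_CRAWLER_MARKERS)


def maze_page(path: str, links: int = 5) -> str:
    """Build a page of filler text with links to further generated pages.

    The RNG is seeded from the request path, so each URL returns the same
    page on every visit: it looks like static content, but the links
    always lead one level deeper.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(links)]
    anchors = "\n".join(f'<a href="{h}">{h}</a>' for h in hrefs)
    return f"<html><body><p>{text}</p>\n{anchors}\n</body></html>"


class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if is_ai_crawler(ua):
            body = maze_page(self.path).encode()
        else:
            body = b"<html><body>Real content goes here.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the toy tarpit locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), TarpitHandler).serve_forever()
```

Calling `serve()` and then requesting `http://127.0.0.1:8080/` with a User-Agent containing `GPTBot` returns a maze page, while a normal browser gets the placeholder content. User-Agent matching is trivially spoofed, which is one reason the dedicated tools go much further than this.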
GitHub, acquired by Microsoft, is now forcing AI on its user base.
That's one of my main reasons for staying away from GH.