At this point we need to treat AI web scrapers as DDoS attacks and prosecute the companies and people involved the same way we would those
I've been in a similar situation, and I'm also blocking large ranges of IP addresses in addition to running Anubis in front of my most scraped services (Git/forgejo and Lemmy)
I came up with a hacky Python script that watches my fail2ban logs, counts bans per IP range from /28 up to /8, and applies some heuristics I came up with (based on the range size n and how the offending IPs are split between the two /(n+1) subranges) to detect ranges that should be blocked. It then issues a log line that is picked up by fail2ban, which manages bans of increasing length via recidive.
It's quite contrived and I often fear it will be too aggressive and block something I rely on, but it has been working really well in my experience.
It will initially block a lot of small ranges, but over time the ranges will grow larger. Smaller ranges having a lower threshold helps it block only the narrowest ranges needed, which gives some time for larger ranges that contain them to drop out of fail2ban's watchlist.
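The heuristic described above can be sketched roughly like this. Everything here is an assumption on my part: the thresholds, the `min_offenders` scaling, and the function names are illustrative, not the author's actual script. The core idea is the same, though: a range only qualifies when it has enough banned IPs *and* the offenders are spread across both of its halves, so a single noisy /29 can't get a whole /8 blocked.

```python
import ipaddress
from collections import Counter

def min_offenders(prefix_len: int) -> int:
    # Hypothetical threshold curve: smaller ranges need fewer offenders,
    # broader ranges need progressively more before we'll block them.
    return 3 * 2 ** ((28 - prefix_len) // 4)

def ranges_to_block(banned_ips, prefixes=range(28, 7, -1)):
    """Given IPs seen in fail2ban bans, return CIDR ranges that look like
    coordinated scraper blocks: enough offenders, spread across BOTH
    /(n+1) halves of the range."""
    ips = [ipaddress.ip_address(ip) for ip in banned_ips]
    blocked = set()
    for plen in sorted(prefixes, reverse=True):  # narrowest ranges first
        counts = Counter(
            ipaddress.ip_network(f"{ip}/{plen}", strict=False) for ip in ips
        )
        for net, n in counts.items():
            if n < min_offenders(plen):
                continue
            # Split check: offenders must appear in both halves of the range.
            halves = Counter(
                ipaddress.ip_network(f"{ip}/{plen + 1}", strict=False)
                for ip in ips if ip in net
            )
            if len(halves) == 2:
                blocked.add(net)
    # Drop ranges already covered by a broader blocked range.
    return sorted(
        net for net in blocked
        if not any(net != other and net.subnet_of(other) for other in blocked)
    )
```

A script like this would emit the resulting ranges as log lines that a custom fail2ban filter matches, which is what lets fail2ban's own ban-length escalation take over from there.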
I should clean up this mess and make it a git repo, maybe even try to have it merged into fail2ban.
So we're at the point where AI is not only stealing intellectual property, but also driving up costs for people while doing it.