At this point we need to treat AI web scrapers as DDoS attacks and prosecute the companies and people involved the same way we would those
I've been in a similar situation, and I'm also blocking large ranges of IP addresses in addition to running Anubis in front of my most scraped services (Git/forgejo and Lemmy)
I came up with a hacky Python script that watches my fail2ban logs, counts bans per IP range from /28 up to /8, and applies some heuristics I came up with (based on the range size n and how the offending IPs are split between the two /(n+1) subranges) to detect ranges that should be blocked. It then issues a log line that is picked up by fail2ban, which manages bans of increasing length via recidive.
It's quite contrived and I often fear it will be too aggressive and block something I rely on, but it has been working really well in my experience.
It will initially block a lot of small ranges, but over time the ranges will grow larger. Smaller ranges having a lower threshold helps it block only the narrowest ranges needed, which gives some time for larger ranges that contain them to drop out of fail2ban's watchlist.
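The heuristic described above can be sketched roughly like this. Everything here is an assumption on my part: the thresholds, the `min_offenders` scaling, and the function names are illustrative, not the author's actual script. The core idea is the same, though: a range only qualifies when it has enough banned IPs *and* the offenders are spread across both of its halves, so a single noisy /29 can't get a whole /8 blocked.

```python
import ipaddress
from collections import Counter

def min_offenders(prefix_len: int) -> int:
    # Hypothetical threshold curve: smaller ranges need fewer offenders,
    # broader ranges need progressively more before we'll block them.
    return 3 * 2 ** ((28 - prefix_len) // 4)

def ranges_to_block(banned_ips, prefixes=range(28, 7, -1)):
    """Given IPs seen in fail2ban bans, return CIDR ranges that look like
    coordinated scraper blocks: enough offenders, spread across BOTH
    /(n+1) halves of the range."""
    ips = [ipaddress.ip_address(ip) for ip in banned_ips]
    blocked = set()
    for plen in sorted(prefixes, reverse=True):  # narrowest ranges first
        counts = Counter(
            ipaddress.ip_network(f"{ip}/{plen}", strict=False) for ip in ips
        )
        for net, n in counts.items():
            if n < min_offenders(plen):
                continue
            # Split check: offenders must appear in both halves of the range.
            halves = Counter(
                ipaddress.ip_network(f"{ip}/{plen + 1}", strict=False)
                for ip in ips if ip in net
            )
            if len(halves) == 2:
                blocked.add(net)
    # Drop ranges already covered by a broader blocked range.
    return sorted(
        net for net in blocked
        if not any(net != other and net.subnet_of(other) for other in blocked)
    )
```

A script like this would emit the resulting ranges as log lines that a custom fail2ban filter matches, which is what lets fail2ban's own ban-length escalation take over from there.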
I should clean up this mess and make it a git repo, maybe even try to have it merged into fail2ban.
So we're at the point where AI is not only stealing intellectual property, but also driving up costs for people while doing it.