r00ty

joined 2 years ago
r00ty@kbin.life 8 points 9 months ago

No, I think it's just me on my instance (which probably has the capacity for 1000+ active users), plus a steady influx of suspicious accounts that pass the email verification and captcha and then either post nothing or post adverts, get banned/deleted, and so it goes on.

Mind you, I don't really advertise the instance either, so that's likely why.

I suspect people coming from Reddit don't understand the fediverse (I know I didn't when I first got here). So they go to the instance hosting the community and sign up there, not realising they can join any instance and then subscribe to the community from it (if it isn't already on that instance).

r00ty@kbin.life 4 points 9 months ago

I think the top one might be the culprit. But maybe the guy's account was hacked?

Among his repos he has a fork of WSL called "free-palestine", and he tried to merge the branch "freedom". So that PR seems likely to be linked to this. Other than that, his activity seems normal for a terminal GitHubber with 444 repos...

r00ty@kbin.life 26 points 9 months ago

I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log the right/wrong answer (or, if the time expires, the lack of one) and must then move on.

I can imagine that metrics on whether your advertising is actually reaching people are valid. But making people answer, and especially making them watch more if they answer wrong, is about as dystopian as it gets.

If (and I say if, I really don't want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.

r00ty@kbin.life 19 points 9 months ago

Why? Because you can. But in terms of useful reasons?

Cellphones and the Internet need infrastructure to work, and that infrastructure can be disabled during a natural disaster or a war, or even by your own government in some cases.

But if I want to communicate, I just need a piece of wire, somewhere to hang it and a 12V battery, and I can communicate over thousands of miles.

Personally I just think that's cool.

r00ty@kbin.life 6 points 9 months ago

The "Interesting" is very Musk-esque. I also think that if it was DMs to someone else, even in the USA that's got to be some kind of legal privacy issue.

r00ty@kbin.life 3 points 9 months ago

I didn't have the link to hand, but a search turned this one up, and it looks to be the same list: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/ You can see the ones I've added to the end of that list.

r00ty@kbin.life 2 points 9 months ago

Hmm, I took an original list and added to it. Have you got a website I can check? If so, I'll happily remove it. I don't mind slow web crawlers at all.

r00ty@kbin.life 4 points 9 months ago

My mbin instance is behind Cloudflare, so I filter the AS numbers there. They don't even reach my server.

On the sites that aren't behind Cloudflare, yep, it's at the nginx level. I did consider doing it at the firewall level, maybe with a dedicated chain for it, but since I was already blocking at the nginx level I just did it there for now. It keeps them off the content, but yes, it does tell them there's a website there to leech if they change their tactics, for example.

You need to block the whole ASN too. The ones using Chrome/Firefox UAs switch to a random other IP from their huuuuuge pools every 5 minutes.
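To illustrate why a per-IP ban loses against rotating pools while a prefix-level block doesn't, here's a minimal Python sketch (an illustration, not necessarily how any particular setup does it) that checks a client IP against every prefix belonging to a blocked ASN. The prefixes are hypothetical stand-ins from the RFC 5737/3849 documentation ranges; real ones would come from an ASN lookup. The function name `is_in_blocked_asn` is also hypothetical.

```python
import ipaddress

# Hypothetical prefixes standing in for the ranges an ASN lookup returns.
# These are documentation ranges (RFC 5737 / RFC 3849), not real ASN data.
ASN_PREFIXES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]

BLOCKED_NETWORKS = [ipaddress.ip_network(p) for p in ASN_PREFIXES]


def is_in_blocked_asn(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked prefix."""
    addr = ipaddress.ip_address(client_ip)
    # "addr in net" is simply False (not an error) when IP versions differ.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # A bot hopping between addresses in the same pool is still caught,
    # while an address outside the blocked prefixes is let through.
    for ip in ["198.51.100.7", "198.51.100.200", "2001:db8::42", "203.0.113.5"]:
        print(ip, "blocked" if is_in_blocked_asn(ip) else "allowed")
```

Banning `198.51.100.7` alone would miss `198.51.100.200` five minutes later; matching on the prefix catches both.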

r00ty@kbin.life 5 points 9 months ago

Yeah, I should probably look to see if there are any good plugins that do this based on community submissions, because yes, it's a pain to keep up with whatever trick they're pulling next.

And unlike web crawlers, which generally check a URL here and there, AI bots absolutely rip through your sites like something rabid.

r00ty@kbin.life 18 points 9 months ago

If you're running nginx, I'm using the following:

```nginx
if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") {
    return 403;
}
```

That will block the bots that actually use recognisable user agents. I add any new ones I find as I go. It catches a lot!
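nginx's `~*` operator is a case-insensitive regular-expression match that succeeds if the pattern appears anywhere in the header, so an entry like `python` also catches `python-requests/2.31.0`. A quick way to try a user agent against the rule before deploying it is a small Python sketch using `re.search` with `re.IGNORECASE`. The pattern below is a hypothetical subset of the full list, the name `is_blocked` is made up for illustration, and Python's `re` only approximates nginx's PCRE engine (for plain alternations like these they behave the same).

```python
import re

# Hypothetical subset of the alternatives in the nginx rule; paste the full
# pattern string in to test the real thing.
BLOCKED_UA = re.compile(
    r"SemrushBot|AhrefsBot|MJ12bot|python|Java|ClaudeBot|Bytespider|"
    r"Amazonbot|facebookexternalhit|meta-externalagent|PetalBot|Applebot",
    re.IGNORECASE,  # mirrors the "*" in nginx's ~* (case-insensitive)
)


def is_blocked(user_agent: str) -> bool:
    """True if the pattern matches anywhere in the UA, as with nginx's ~*."""
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    for ua in [
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "python-requests/2.31.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ]:
        print(403 if is_blocked(ua) else 200, ua)
```

Because the match is case-insensitive, pairs like `BLEXbot|BLEXBot` in the full list are redundant. Short entries such as `Java` or `perl` match any UA that merely contains those letters, so it's worth running a few real browser UAs through a check like this.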

I also have a huuuuuge IP-based block list (generated by adding all the ranges returned from looking up the following AS numbers):

- AS45102 (Alibaba Cloud)
- AS136907 (Huawei SG)
- AS132203 (Tencent)
- AS32934 (Facebook)

These are on the list because they run, or have run, bots that impersonate real browser user agents.

There are various online tools that return the prefix/IP lists for an autonomous system number.

I put both into a single file and include it in my website config files.
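Turning an ASN's prefix list into something nginx can include might look like the following minimal Python sketch. It assumes the prefixes have already been fetched from one of those lookup tools (the ones below are hypothetical documentation ranges), merges adjacent or overlapping ranges with `ipaddress.collapse_addresses`, and emits one `deny` directive per network for nginx's access module. The function name `nginx_deny_lines` is hypothetical.

```python
import ipaddress
from typing import Iterable


def nginx_deny_lines(prefixes: Iterable[str]) -> list[str]:
    """Collapse CIDR prefixes and format them as nginx 'deny' directives."""
    networks = [
        ipaddress.ip_network(p.strip(), strict=False)
        for p in prefixes
        if p.strip()  # skip blank lines from the lookup output
    ]
    lines = []
    # collapse_addresses() refuses to mix IPv4 and IPv6, so do each separately.
    for version in (4, 6):
        same_version = [n for n in networks if n.version == version]
        for net in ipaddress.collapse_addresses(same_version):
            lines.append(f"deny {net};")
    return lines


if __name__ == "__main__":
    # Hypothetical prefixes standing in for an ASN lookup's output.
    prefixes = [
        "192.0.2.0/25",
        "192.0.2.128/25",  # adjacent to the line above; merges into a /24
        "198.51.100.0/24",
        "2001:db8::/33",
        "2001:db8:8000::/33",  # merges with the line above into a /32
    ]
    print("\n".join(nginx_deny_lines(prefixes)))
```

Redirecting that output into a file and pulling it into a `server { }` block with nginx's `include` directive (e.g. `include /etc/nginx/asn-blocklist.conf;`, a hypothetical path) keeps everything in one include file. Collapsing first trims a huge list down a little without changing which addresses are denied.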

EDIT: Just to add, keeping on top of this is a full-time job!

EDIT 2: Removed Mojeek bot, as it seems to be a normal web crawler.

r00ty@kbin.life 8 points 10 months ago

> What next? A toaster with butter spreader built-in?

Yes, but it burns the logo of each month's highest bidder onto your toast.

r00ty@kbin.life 13 points 10 months ago

Yeah, it's not outside the realm of possibility, but they're far more likely to be updates for the smart features.
