ctag

joined 1 year ago
ctag@lemmy.sdf.org 1 point 2 weeks ago

Will check this out. Thanks!

ctag@lemmy.sdf.org 7 points 2 weeks ago

That's pretty neat. Thanks!

ctag@lemmy.sdf.org 1 point 2 weeks ago

Will check this out. Thanks!

ctag@lemmy.sdf.org 5 points 2 weeks ago

Thank you for the detailed reply.

> keeping on top of this is a full-time job!

I guess that's why I'm interested in a tooling-based solution. My self-hosting is small-fry junk, but a lot of others like me are hosting entire fedi communities or larger websites.

ctag@lemmy.sdf.org 2 points 2 weeks ago

In that case, I'm interested in tools to automate that.

ctag@lemmy.sdf.org 4 points 2 weeks ago (last edited 2 weeks ago)

I hadn't heard of that before, thanks for the link.

I haven't read through the docs yet, but PoW (proof of work) makes me wonder what the work is and whether it's cryptocurrency-related.

Edit: Found it: https://altcha.org/docs/proof-of-work/
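To answer my own question about what the "work" is: in a hash-reversal scheme, roughly as the linked ALTCHA page describes it, the server publishes a random salt plus the SHA-256 hash of that salt concatenated with a secret number, and the client brute-forces numbers until it finds the match. It's ordinary hashing, not mining, and nothing is tied to a cryptocurrency. This is a minimal Python sketch assuming that scheme; the salt format, the `max_number` bound, and the function names are illustrative assumptions, not ALTCHA's actual API or defaults.

```python
import hashlib
import secrets

# Hypothetical upper bound on the secret number; NOT ALTCHA's real default.
MAX_NUMBER = 100_000


def _digest(salt: str, number: int) -> str:
    """SHA-256 of the salt followed by the decimal number, as hex."""
    return hashlib.sha256(f"{salt}{number}".encode()).hexdigest()


def create_challenge(max_number: int = MAX_NUMBER) -> tuple:
    """Server side: pick a random salt and secret number, publish only the hash."""
    salt = secrets.token_hex(12)  # illustrative salt format
    secret_number = secrets.randbelow(max_number + 1)
    return salt, _digest(salt, secret_number), secret_number


def solve_challenge(salt: str, challenge: str, max_number: int = MAX_NUMBER):
    """Client side: try numbers in order until the hash matches (this is the 'work')."""
    for n in range(max_number + 1):
        if _digest(salt, n) == challenge:
            return n
    return None  # no solution within the bound


def verify(salt: str, challenge: str, number: int) -> bool:
    """Server side: checking a submitted answer costs a single hash."""
    return _digest(salt, number) == challenge
```

The asymmetry is the point: verifying costs the server one hash, while solving costs the client about `max_number / 2` hashes on average, which is cheap for one visitor but adds up for a crawler hitting every page.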

ctag@lemmy.sdf.org 7 points 2 weeks ago

In the Hacker News comments for that geraspora link, people discussed websites shutting down due to hosting costs, which may be attributable in part to overly aggressive crawling. So maybe it's just a different form of DDoS than we're used to.

ctag@lemmy.sdf.org 7 points 2 weeks ago

A commenter on the Hacker News post has created this: https://marcusb.org/hacks/quixotic.html

I'm interested, but it seems like an easy way for bots to exhaust your own server resources before they give up crawling.

ctag@lemmy.sdf.org 1 point 2 weeks ago

Thank you for the detailed response. It's disheartening to consider the traffic is coming from 'real' browsers/IPs, but that actually makes a lot of sense.

I'm coming at this from the angle of AI bots ingesting a website over and over to obsessively look for new content.

My understanding is that there are two reasons to try blocking this: to protect bandwidth from aggressive crawling, or to protect page contents from AI ingestion. I think the former is doable and the latter is an unwinnable task. My personal reason is that I'm an AI curmudgeon: I'd rather spend CPU resources blocking bots than serve any content to them.
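Protecting bandwidth mostly comes down to rate limiting. Here's a minimal in-memory token-bucket sketch in Python, assuming requests are keyed by client IP (a hypothetical choice; in practice a reverse proxy's built-in rate limiting would normally handle this). The `RateLimiter` and `TokenBucket` names and the rate/capacity values are illustrative.

```python
from __future__ import annotations

import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per client key (hypothetically, the client IP address)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str, now: float | None = None) -> bool:
        if now is None:
            now = time.monotonic()
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

With `RateLimiter(rate=1.0, capacity=3)`, a client gets three requests immediately, is refused on the fourth, and earns one more per second after that. Since the scrapers reportedly churn residential IPs, per-IP limits only blunt the traffic: every new address starts with a fresh, full bucket. The sketch also never evicts idle buckets, which a real deployment would need to do.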

ctag@lemmy.sdf.org 2 points 2 weeks ago

Thank you for the reply, but at least one commenter claims the bots will impersonate Chrome user agents (UAs).
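That's the weakness of user-agent filtering: it only catches crawlers that identify themselves. A minimal Python sketch, assuming a small hypothetical blocklist of self-identifying crawler names (GPTBot, CCBot, ClaudeBot, Bytespider; any real list needs ongoing upkeep), shows an ordinary Chrome UA string passing straight through.

```python
import re

# Hypothetical blocklist of crawler names that announce themselves in the UA.
# A real list would need to be maintained as new crawlers appear.
BLOCKED_UA_PATTERNS = [
    re.compile(name, re.IGNORECASE)
    for name in (r"GPTBot", r"CCBot", r"ClaudeBot", r"Bytespider")
]

# An ordinary desktop Chrome UA; a scraper can send exactly this string.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the UA contains any blocklisted crawler name."""
    return any(pattern.search(user_agent) for pattern in BLOCKED_UA_PATTERNS)
```

`is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)")` is true, but `is_blocked(CHROME_UA)` is false, so a bot that impersonates Chrome is indistinguishable from a real visitor by UA alone.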


Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn't entail handing over your website to Cloudflare?


Hi,

I'm interested in setting up a small site with a static site generator. I looked at 11ty recently and feel pretty uncomfortable with the amount of JavaScript and "funny language" churn needed just to produce some HTML.

Do you know of any alternative that's simpler, easier, or has less complicated dependencies? Or do you have an approach to 11ty that you think I should try?

Thanks in advance for any input, it's appreciated!
