this post was submitted on 02 Feb 2024
675 points (99.1% liked)
Without getting into too much detail, a cached site saved my ass in a court case. Fuck you Google.
It sucks, because it's sometimes (though not very often) useful. But it's not like they're under any obligation to support it, or are making any money from it.
Isn't caching how anti-paywall sites like 12ft.io work?
At least some of these tools change their "user agent" to whatever Google's crawler uses.
When you browse in, say, Firefox, one of the headers that Firefox sends to the website is "I am using Firefox", which might affect how the website displays for you, or let the admin know they need Firefox compatibility (or be used to fingerprint you...).
You can just lie on that, though. Some privacy tools will change it to Chrome, since that's the most common.
Or you say "I am the Google web crawler", which they let past the paywall so the page can be indexed by Google.
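To make the header trick concrete, here's a rough Python sketch of overriding the User-Agent on a request. The `build_request` helper and the example.com URL are made up for illustration, and the UA string is the commonly cited Googlebot one (treat the exact value as an assumption). It only builds the request, it doesn't send anything.

```python
from urllib.request import Request

# Commonly cited Googlebot UA string; the exact value is an assumption here.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> Request:
    """Build (but do not send) a request with an overridden User-Agent.

    Hypothetical helper: the header is just a string the client chooses,
    so nothing stops a client from claiming to be any browser or crawler.
    """
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/article")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Whether a site actually lets this through depends entirely on how it checks the visitor, since the header is only a claim.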
If I'm not wrong, Google has a set range of IP addresses for its crawlers, so not all sites will let you through just because your UA claims to be Googlebot.
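For the server side, a site could do something like the following rough sketch: only trust a "Googlebot" UA when the request also comes from a known crawler range. The function names are hypothetical and the single CIDR block is just an illustrative placeholder, not an authoritative list. A real site would load the current ranges from whatever Google publishes.

```python
import ipaddress

# Placeholder range for illustration only (assumption, not an official list).
EXAMPLE_CRAWLER_RANGES = ["66.249.64.0/19"]


def claims_googlebot(user_agent: str) -> bool:
    """True if the UA string merely claims to be Googlebot."""
    return "googlebot" in user_agent.lower()


def ip_in_ranges(ip: str, ranges: list[str]) -> bool:
    """True if the IP falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


def should_bypass_paywall(
    user_agent: str, ip: str, ranges: list[str] = EXAMPLE_CRAWLER_RANGES
) -> bool:
    """Let the request past the paywall only if the UA claim and IP both match."""
    return claims_googlebot(user_agent) and ip_in_ranges(ip, ranges)
```

With a check like this, a spoofed UA coming from a home connection fails the IP test, which is why the header trick doesn't work everywhere.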