this post was submitted on 17 Jan 2024
955 points (98.6% liked)
Technology
At some point in the last two years I completely stopped using Google search in the browser and now just use Google Maps to find businesses or DDG for searches. Actual Google search just has too many sponsored or promotional links.
Pretty soon the internet will be almost completely ruined. Within a few years, AI bots will have spammed everything. Searches and web pages will be entirely faked BS. Reddit and Lemmy will have enough AI bots commenting and pushing agendas/products that no one will have a clue who's a real person. Information that's true will be almost impossible to verify online.
In short, if you think the web has gotten bad now, you ain't seen nothin yet.
I agree with the sentiment, but the lack of AI has not stopped SEO hacking in the past. Sure, it will help them go further, but there are already tons of garbage websites hacking the top 1-5 results of any search.
The top results pages, sure. I believe it's going to take over the top 500, along with flooding places like Lemmy and Reddit.
I remember that in the past SEO spam made using search engines less rewarding than using web directories, web rings, asking people on forums, etc. That was slower, but gave you results (and acquaintances), while using search meant looking through dozens of pages of search results, mainly SEO.
I am more optimistic on that one. AI provides a pretty clear way out of this, since it allows you to automatically detect the bullshit. Meaning either the bullshit has to rise so much in quality that it is indistinguishable from good content, in which case it would not be bullshit anymore, or it will get filtered. AI can also transform bad websites into good ones, like a super-powered ReaderMode, AdBlock and more all rolled into one, so a lot of the "let's plaster everything with ads" will lose effectiveness.
The problem over the last decade was that Google completely lost interest in being a search engine. They are just an ad company, and as long as search leads you to more ads, they are quite happy. So the user experience went down the toilet.
The real problem with AI is that it will remove the incentive for the authors. Content producers want to get paid, but with AI you can just extract the information from an article without ever viewing the article or the ads around it.
I think it's just a new world for spam.
At some point, probably soon, AI content will generate so much data it becomes untenable to store all the scraped data.
We'll also reach a point where it becomes much more costly to parse the data for AI spam+trustworthiness+topics. If you need LLMs just to filter spam, that is a large step up in costs and infrastructure vs current methods.
When that happens, what happens to search? Either the quality will have to degrade or the margins will drop off sharply.
They have already been trying to use AI to combat and identify AI in college and high school papers. So far it's been severely ineffective. AI has gotten pretty good at writing out a sentence or two that looks like it's real. If AI improves enough, I doubt there'll be much of a way to identify it all.
It's not about identifying AI or even spam, but about extracting useful information. Are the claims made in a source backed by other sources? Do they contradict information from trusted sources? That's all stuff an AI can reason about, and then either discard the source as junk or condense it down to the useful information in it.
Basically you completely skip browsing the Web yourself and just use the AI to find you what you want. Think of it like some IMDB or Wikipedia, but covering everything and written and curated by AI. When the AI doesn't already know some fact, it goes crawling the Web and finding it out for you, expanding its knowledge base in the process.
Or see the ship computer from Star Trek: you don't see the people there browsing the Web, you see them getting data in exactly the format they need, and they can reformat and filter it as needed.
At the moment there are still some technical hurdles; the AI systems we have are all still a little too stupid for this. But that seems to be the direction we are heading. Things like summarizer bots already do a pretty good job, and ChatGPT is reasonably good at answering basic questions and reformatting the answers the way you need them. Only a matter of time until it gets good enough that you couldn't do a better job yourself.
You're looking at it in a flawed manner. AI has already been making up sources and names to state things as facts. If there are a hundred websites claiming the earth is flat and you ask an AI if the earth is flat, it may tell you it is flat and cite those websites. It's already been happening. Then imagine more opinionated things than hard observable scientific facts. Imagine a government using AI to shape opinion and claim there was no form of insurrection on Jan 6th. Thousands of websites and comments could quickly be fabricated to confirm that it was all made up, burying the truth in obscurity.
You have plenty of literature that can act as ground truth. This is not a terribly hard problem to solve, it just requires actually focusing on it, which so far simply hasn't been done. ChatGPT is just the first "look, this can generate text". It was never meant to do anything useful by itself or stick to the truth. That all still has to be developed. ChatGPT simply demonstrates that LLMs can process natural language really well. It's the first step in this, not the last.
Sounds like you're arguing against yourself, now.