this post was submitted on 23 Jan 2024

138 points (93.7% liked)

Google search might be getting worse - and AI threatens to ruin it entirely (www.techradar.com)

submitted 2 years ago by throws_lemy@lemmy.nz to c/technology@lemmy.world


[–] Gradually_Adjusting@lemmy.world 14 points 2 years ago (6 children)

I'm no expert, but dreaming here - is there a FOSS search engine that can be run in a distributed way by a community? I would happily switch to that if there was.

[–] tonyn@lemmy.ml 7 points 2 years ago (1 children)

Do you mean like Yacy?

[–] Gradually_Adjusting@lemmy.world 1 points 2 years ago

YES

[–] Plopp@lemmy.world 7 points 2 years ago (1 children)

That'd be awesome. I'm just curious how you'd go about building such a thing so that it's resilient against the millions, potentially billions, of dollars that would be invested in trying to break it and make it serve results it otherwise wouldn't. Because those investments will happen if such a search engine gains traction. I really like the idea though.

[–] Gradually_Adjusting@lemmy.world 2 points 2 years ago

I have heard of search engines that are non-commercial, so it can't be impossible. I read this recently: https://lemmy.world/post/10979517

[–] elrik@lemmy.world 6 points 2 years ago* (last edited 2 years ago) (1 children)

It's a really interesting question and I imagine scaling a distributed solution like that with commodity hardware and relatively high latency network connections would be problematic in several ways.

There are several orders of magnitude between the population of people who would participate in providing the service and those who would consume the service.

Those populations aren't local to each other. In other words, your search is likely global across such a network, especially given the size of the indexed data.

To put some rough numbers together for perspective, for search nearing Google's scale:

A single copy of a 100PB index would require 10,000 network participants each contributing 10TB of reliable and fast storage.
100K searches / sec if evenly distributed and resolvable by a single node would be at least 10 req/sec/node. Realistically it's much higher than that, depending on how many copies of the index, how requests are routed, and how many nodes participate in a single query (probably on the order of hundreds). Of that 10TB of storage per node, substantial amounts of it would need to be kept in memory to sustain the likely hundreds of req/sec a node might see on average.
The index needs to be updated. Let's suppose the index is 1/10th the size of the crawled data and the oldest data is 30 days (which is pretty stale for popular sites). That's at least 33PB of data to crawl per day or roughly 3,000Gbps minimum sustained data ingestion. Split evenly across those 10,000 nodes that's about 0.3Gbps each as a sustained floor, so realistically each node would need a 1Gbps connection to index fresh data.
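For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the figures above. It assumes decimal units (1 PB = 1000 TB) and a perfectly even spread across nodes; the variable names are made up for illustration. The per-node results are bare minimums, so real links and real query load would need headroom on top.

```python
# Back-of-envelope check of the rough numbers in the comment above.
# Assumptions: decimal units, perfectly even distribution across nodes.
INDEX_PB = 100               # size of one copy of the index
NODE_STORAGE_TB = 10         # storage each participant contributes
SEARCHES_PER_SEC = 100_000   # global query rate
CRAWL_TO_INDEX_RATIO = 10    # index is 1/10th the size of the crawled data
MAX_AGE_DAYS = 30            # oldest data tolerated in the index
SECONDS_PER_DAY = 86_400

# Nodes needed to hold a single copy of the index.
nodes = INDEX_PB * 1000 // NODE_STORAGE_TB

# Queries per node if each query were answered by exactly one node.
req_per_node = SEARCHES_PER_SEC / nodes

# Data that must be re-crawled each day to keep everything under 30 days old.
crawl_pb_per_day = INDEX_PB * CRAWL_TO_INDEX_RATIO / MAX_AGE_DAYS

# Convert PB/day to sustained Gbps (bytes -> bits, per second, giga).
ingest_gbps = crawl_pb_per_day * 1e15 * 8 / SECONDS_PER_DAY / 1e9
ingest_gbps_per_node = ingest_gbps / nodes

print(f"nodes for one index copy: {nodes}")
print(f"req/sec/node (best case): {req_per_node:.0f}")
print(f"crawl volume:             {crawl_pb_per_day:.1f} PB/day")
print(f"total ingestion:          {ingest_gbps:.0f} Gbps")
print(f"ingestion per node:       {ingest_gbps_per_node:.2f} Gbps")
```

Changing `MAX_AGE_DAYS` or `CRAWL_TO_INDEX_RATIO` shows how quickly the bandwidth requirement moves with freshness.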

These are all rough numbers but this is not something the vast majority of people would have the hardware and connection to support.

You'd also need many copies of this setup around the world for redundancy and lower latency. You'd also want to protect the network against DDoS, abuse and malicious network participants. You'll need some form of organizational oversight to support removal of certain data.

Probably the best way to support such a distributed system in an open manner would be to have universities and other public organizations run the hardware and support the network (at a non-trivial expense).

[–] UNWILLING_PARTICIPANT@sh.itjust.works 4 points 2 years ago (2 children)

So this is starting to sound more like something that needs to explicitly be paid for in some way (as opposed to just crowd sourcing personal hardware), at least if we want to maintain the same level of service.

[–] Gradually_Adjusting@lemmy.world 2 points 2 years ago

It seems like there are others in the thread with good options

[–] elrik@lemmy.world 1 points 2 years ago

Yes, at least currently. There may be better options as multi-gigabit internet access becomes more commonplace and commodity hardware gets faster.

The other options mentioned in this thread are basically toys in comparison (either obtaining results from existing search engines or operating at a scale less than a few terabytes).

[–] ioslife@lemmy.ml 4 points 2 years ago (1 children)

SearXNG?

[–] Gradually_Adjusting@lemmy.world 1 points 2 years ago

Good one!

[–] yuki2501@lemmy.world 4 points 2 years ago (2 children)

No, but we can build it. It's called a Directory. This is how Yahoo! worked before it got enshittified and eventually replaced by Google search.

https://en.wikipedia.org/wiki/Yahoo!_Directory

[–] wikibot@lemmy.world 4 points 2 years ago (2 children)

Here's the summary for the wikipedia article you mentioned in your comment:

The Yahoo! Directory was a web directory which at one time rivaled DMOZ in size. The directory was Yahoo!'s first offering and started in 1994 under the name Jerry and David's Guide to the World Wide Web. When Yahoo!

to opt out, pm me 'optout'.

[–] Stovetop@lemmy.world 6 points 2 years ago

Bot thinks the exclamation point at the end of Yahoo! is the end of the sentence. Cute bot.

[–] jbloggs777@discuss.tchncs.de 1 points 2 years ago

Dmoz was great.

[–] ares35@kbin.social 1 points 2 years ago* (last edited 2 years ago)

even netscape/mozilla was in on this game, with the open directory project (dmoz). aol eventually shut it down, but it apparently lives on independently as curlie.org - but i have no idea how current it is or anything.

[–] magic_lobster_party@kbin.social 4 points 2 years ago (2 children)

FOSS won’t change this matter unless it somehow can filter out all the low quality AI-generated articles better than Google’s filters.

[–] ParetoOptimalDev@lemmy.today 1 points 2 years ago (1 children)

Don't allow sites in the index by default, use an allowlist.

[–] magic_lobster_party@kbin.social 3 points 2 years ago (1 children)

Is this allowlist supposed to work on an article-by-article basis? Because that’s what has to be done for publishing platforms like Medium.

[–] ParetoOptimalDev@lemmy.today 1 points 2 years ago

By default I was thinking site. But for sites with huge variances in page quality you'd need it by page/article as you say.
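A rough sketch of what that two-level allowlist could look like, deny by default, with whole sites and individual pages as separate sets. The hosts, paths, and the `is_indexable` name are all hypothetical and only there for illustration; who curates the lists is the hard part and isn't covered here.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist data; none of these entries come from the thread.
ALLOWED_SITES = {"example.org"}                                   # trusted as a whole
ALLOWED_PAGES = {"publishing.example.com/alice/good-article"}     # trusted per page


def is_indexable(url: str) -> bool:
    """Deny by default; admit a URL only if its site or exact page is allowlisted."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALLOWED_SITES:
        return True
    # Page-level entries are stored as host + path, without a trailing slash.
    page = host + parts.path.rstrip("/")
    return page in ALLOWED_PAGES


for candidate in (
    "https://example.org/any/page",
    "https://publishing.example.com/alice/good-article/",
    "https://publishing.example.com/bob/spam",
):
    print(candidate, is_indexable(candidate))
```

Keeping the two sets separate means a mixed-quality platform never has to be trusted wholesale just to admit one good article.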

[–] Gradually_Adjusting@lemmy.world 1 points 2 years ago

Comes down to editorial quality maybe; what sites do you trust?

Jimmy Wales has a social media project with "whom to trust" built into the algorithm. I'm not sure if it is an idea with legs, but I like where his head is at

[–] Halvdan@sopuli.xyz 12 points 2 years ago (1 children)

"might"...? I thought that was pretty much a proven fact. Not that any of the others are any better, with the possible exception of Kagi, but fuck using that anymore. I'm pretty disappointed by that since I had finally found a search engine that was pretty good and I have no problem paying for it, but I simply won't give that asshat any money. Why aren't there any nice capitalists? 🤔

[–] TheEntity@kbin.social 8 points 2 years ago (1 children)

Wait, what happened to Kagi?

[–] ioslife@lemmy.ml 12 points 2 years ago (2 children)

Kagi now has the ability to include Brave search results and people are upset about that because someone brought it up as a “partnership” with Brave, but they’re just using the search API

[–] TheEntity@kbin.social 7 points 2 years ago (2 children)

Sounds... at worst just as bad as using Microsoft's or Google's search results. I get that the Brave CEO is a POS but Google and Microsoft are Google and Microsoft.

So basically a drama over nothing?

[–] ioslife@lemmy.ml 3 points 2 years ago

Yes

[–] demonsword@lemmy.world 1 points 2 years ago* (last edited 2 years ago) (1 children)

“Should we not be buying VW, BMW, Siemens and Bayer technology and products today because they participated in holocaust and directly collaborated with Hitler?” – CEO of Kagi when given feedback re: Brave partnership

doesn't sound like it's "drama over nothing" to me

[–] phar@lemmy.world 7 points 2 years ago

lol "Does Hitler still work there?"

[–] UNWILLING_PARTICIPANT@sh.itjust.works 2 points 2 years ago (1 children)

So it's not actually funding anything reprehensible?

I ask because I'm really liking it...

[–] TheEntity@kbin.social 5 points 2 years ago (1 children)

This response from the Kagi founder seems to be quite solid. https://kagifeedback.org/d/2808-reconsider-your-partnership-with-brave

[–] UNWILLING_PARTICIPANT@sh.itjust.works 1 points 2 years ago

Good convo there, thanks

[–] antimidas@sopuli.xyz 10 points 2 years ago* (last edited 2 years ago) (1 children)

SEO is of course a problem, but it's been a problem for a long time, and there are ways around it for those who know how to seek information. Proper use of keywords, blacklisting sites with known spam information, searching specific sites, mandating specific words and phrases to be contained in the search etc. It's true, however, that information has become less discoverable over the last decade, or at least reliable information has.
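As a small illustration of those techniques, here is a sketch that assembles a query string from keywords, required phrases, a single site to search, and sites to exclude. It assumes the common operator syntax (quoted phrases, `site:`, and `-site:`); support and exact behavior vary by search engine, and `build_query` is a made-up helper name.

```python
def build_query(keywords, must_contain=(), only_site=None, blocked_sites=()):
    """Assemble a search query using widely supported operators.

    Assumes quoted phrases, site: and -site: behave as on the major
    engines; check the documentation of the engine you actually use.
    """
    parts = list(keywords)
    # Quoted phrases ask the engine for an exact match.
    parts += [f'"{phrase}"' for phrase in must_contain]
    # Restrict results to one site, if requested.
    if only_site:
        parts.append(f"site:{only_site}")
    # Exclude known spam sites (placeholder domains in the example below).
    parts += [f"-site:{site}" for site in blocked_sites]
    return " ".join(parts)


query = build_query(
    ["rust", "borrow", "checker"],
    must_contain=["cannot borrow as mutable"],
    blocked_sites=["spam.example.com"],
)
print(query)
```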

While AI-written spam articles and such have been a pain sometimes, gatekeeping content is in my opinion as big of a threat to the proper use of search engines for finding information. As more and more sites require you to log in to view the discussion (social media is the worst offender here), many of the search results are unusable. Nowadays the results lead to a paywall or a login wall almost more often than to a proper result, and that makes them almost completely useless. I understand this kind of thing for platforms which pay for creating the content, e.g. news sites, but user-generated content shouldn't be locked behind a login requirement.

I fear the day StackOverflow and Reddit decide the users' discussions should be visible only to logged-in users. Reddit has already taken the first steps by limiting "NSFW" content to logged-in users only (on new reddit). Medium articles going behind a paywall also caused some headaches a while back.

[–] kernelle@0d.gs 5 points 2 years ago* (last edited 2 years ago)

> SEO is of course a problem, but it's been a problem for a long time

Search engines have denied this for the longest time, claiming search is always getting better, which is why we're seeing a lot of articles pointing at the problem.

I think the prime of search engines as we know them has long passed, right before SEO became mainstream. Which is why we need new tools to search the web, which will be powered by AI.

[–] alienanimals@lemmy.world 6 points 2 years ago

Weak reporting and a clickbait title. Wow, Christian Guyton must have his master's in journalism!

[–] drmoose@lemmy.world 6 points 2 years ago

it's kinda ruined already and Google's incompetence/greed didn't need any AI assistance.

[–] wolfeh@lemmy.world 5 points 2 years ago

Might be?

[–] OutrageousUmpire@lemmy.world 1 points 2 years ago

Google has sucked for a while. I see little difference between it, Bing, and Brave. In fact, I often get better results with Bing and Brave.

[–] GilgameshCatBeard@lemmy.ca 0 points 2 years ago (1 children)

This would be the best thing AI could do for humanity.

[–] magic_lobster_party@kbin.social 4 points 2 years ago (1 children)

The study says it’s affecting all other search engines as well

[–] GilgameshCatBeard@lemmy.ca -2 points 2 years ago* (last edited 2 years ago)

Sometimes, casualties are an unfortunate result of doing what’s necessary. Think of it like antibiotics. Sure, they kill the bad bacteria that cause infection. But they also kill good bacteria.

Scorched earth is not always bad for the whole of a thing.