this post was submitted on 09 Apr 2024
151 points (94.2% liked)


A prototype is available, though it's Chrome-only and English-only at the moment. How this'll work is you select some text and then click on the extension, which will try to "return the relevant quote and inference for the user, along with links to article and quality signals".

Under the hood, it uses ChatGPT to generate a search query, calls Wikipedia's search API to find relevant article text, and then uses ChatGPT again to extract the relevant part.
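Roughly, that three-step flow could look like the sketch below. This is only a guess at the shape, not the extension's actual code: `fact_check`, `make_query`, `search`, and `extract` are hypothetical names, and the two LLM steps are passed in as plain functions so nothing here assumes a particular ChatGPT client. The URL builder uses the standard MediaWiki `action=query&list=search` endpoint.

```python
from typing import Callable, List
from urllib.parse import urlencode

# Public MediaWiki Action API endpoint for English Wikipedia.
WIKI_API = "https://en.wikipedia.org/w/api.php"


def build_search_url(query: str, limit: int = 3) -> str:
    """Build a full-text search URL for the MediaWiki Action API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKI_API}?{urlencode(params)}"


def fact_check(
    selected_text: str,
    make_query: Callable[[str], str],          # hypothetical: LLM call, text -> search query
    search: Callable[[str], List[str]],        # hypothetical: fetches article text for a query
    extract: Callable[[str, List[str]], str],  # hypothetical: LLM call, picks the relevant quote
) -> str:
    """Run the three steps in order: query generation, search, extraction."""
    query = make_query(selected_text)
    articles = search(query)
    return extract(selected_text, articles)
```

In a real extension, `search` would fetch `build_search_url(query)` and pull the article text out of the JSON response. Note that the two LLM calls are the expensive steps here, and the search itself is the cheap one.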

[–] swordsmanluke@programming.dev 1 points 7 months ago (2 children)

a quick web search uses much less power/resources compared to AI inference

Do you have a source for that? Not that I'm doubting you, just curious. I read once that the internet infrastructure required to support a cellphone uses about the same amount of electricity as an average US home.

Thinking about it, I know that LeGoog has yuge data centers to support its search engine. A simple web search is going to hit their massive distributed DB to return answers in subsecond time. Whereas running an LLM (NOT training one, which is admittedly cuckoo bananas energy intensive) would be executed on a single GPU, albeit a hefty one.

So on one hand you'll have a query hitting multiple (comparatively) lightweight machines to look up results - and all the networking gear between. On the other, a beefy single-GPU machine.

(All of this is from the perspective of handling a single request, of course. I'm not suggesting that Wikipedia would run this service on only one machine.)

[–] sheogorath@lemmy.world 8 points 7 months ago (1 children)

Based on this article, it seems that on average an LLM query costs about 10x when compared to a search engine query.

[–] swordsmanluke@programming.dev 1 points 7 months ago

Man - that's wild. Thank you for coming through with a citation - I appreciate it!

[–] barsoap@lemm.ee 3 points 7 months ago

A simple web search is going to hit their massive distributed DB to return answers in subsecond time.

It's going to hit an index, not the actual data, and it's going to return approximate rather than exact results. Tons of engineering has been done around basic search precisely to get more data locality.
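To make the "index, not the actual data" point concrete, here is a toy inverted index. It's a minimal sketch with made-up names (`build_index`, `lookup`) and is nowhere near what a real search engine does (no ranking, no sharding, no approximation), but it shows why a lookup never has to scan the documents themselves.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The expensive part (reading every document) happens once at index-build time. Each query afterwards is a handful of dictionary lookups and set intersections.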

I read a blog post at some point (please don't ask me where) about Bing vs. Google when Bing started to use ChatGPT, and it basically boiled down to "Google has the tech to do it, they don't roll it out because they don't want to eat the electricity bill; this is MS spending money to get market share". The cost difference between providing search and having ChatGPT answer a question was something like 10x. It might not stay that way forever, though, what with models being beaten down to work in ternary and such. That's not just massive quantisation but also much easier maths: convolutions don't need much maths when all you deal with is -1, 0, 1. IIRC you can throw out the multiplication unit and work with nothing but shifts and adds.
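A quick way to see the "no multiplication" part: with weights restricted to -1, 0, 1, a dot product turns into adding, subtracting, or skipping each input. The function below is just an illustration under that assumption (`ternary_dot` is a made-up name, and real kernels pack weights into bits and run on vector hardware rather than looping in Python).

```python
from typing import Sequence


def ternary_dot(weights: Sequence[int], activations: Sequence[int]) -> int:
    """Dot product for weights in {-1, 0, 1} using only adds and subtracts."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x   # +1 weight: add the activation
        elif w == -1:
            total -= x   # -1 weight: subtract the activation
        elif w != 0:
            raise ValueError("weights must be -1, 0, or 1")
        # 0 weight: skip the activation entirely
    return total
```

For example, `ternary_dot([1, 0, -1, 1], [5, 7, 2, 3])` gives the same result as the ordinary multiply-and-sum, without a single multiplication.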