a quick web search uses much less power/resources compared to AI inference
Do you have a source for that? Not that I'm doubting you, just curious. I read once that the internet infrastructure required to support a cellphone uses about the same amount of electricity as an average US home.
Thinking about it, I know that LeGoog has yuge data centers to support its search engine. A simple web search is going to hit their massive distributed DB to return answers in subsecond time. Whereas running an LLM (NOT training one, which is admittedly cuckoo bananas energy intensive) would happen on a single GPU, albeit a hefty one.
So on one hand you'll have a query hitting multiple (comparatively) lightweight machines to look up results - and all the networking gear in between. On the other, a beefy single-GPU machine.
(All of this is from the perspective of handling a single request, of course. I'm not suggesting that Wikipedia would run this service on only one machine.)
Man - that's wild. Thank you for coming through with a citation - I appreciate it!