More GPUs =/= better AI.
More data =/= better AI.
More tech bro “superstars” =/= better AI.
This is what people like Musk and Zuckerberg don’t seem to understand.
Training scales very poorly past a certain cluster size, especially if you go for new architectures to actually pursue improvements, hence the reports of GPUs being tasked with busywork just to meet utilization quotas. Increasing data size and training scale hits diminishing returns quickly, or even makes models regress, because the bulk data is shit and the model is too inefficient. A prime example: Llama 4. “Superstar” AI engineers are better at tweeting and sycophantic gaslighting than at coding something interesting.
In other words, I’d argue there’s a much smaller “sweet spot” for pure LLMs that these billionaires are way, way past. And no one is telling them no because they’re too rich to hear it. It’s all going to collapse on itself because scaling like that just does not work.