Supermodel

“Supermodel” – a cartoon that illustrates how DeepSeek’s announcement of more efficient AI model creation is good news for consumers.

In August 2024, Elon Musk announced he had secured $640 M to build the Colossus cluster, which xAI uses to train its AI models.

The cluster was initially planned to house 100,000 Nvidia GPUs, with plans announced by October to double that number.

By December, Elon, referencing Dr Evil from the Austin Powers films, said: “Nope, at least 1 biiillioon GPUs!”

Despite a cost of around $25,000 per GPU, Elon and many companies in the US and Europe showed little concern about raising the capital. For example, Sam Altman was pursuing investors, including the U.A.E., for a project possibly requiring up to $7 TN.

Meanwhile, on another planet there existed equally ambitious entrepreneurs who had to innovate in efficiency to get ahead because of U.S. restrictions on GPU hardware availability.

Necessity required a focus on reducing communication overhead, both between GPU nodes and within nodes, resulting in a model ‘training’ cost of under $6M.

DeepSeek’s hardware optimizations should mean all the capital raised by these AI model companies can focus on increasing the pace of applications to benefit consumers, rather than pouring money into GPU hardware and energy projects to manage sprawling data centers.

“In adversity lies innovation…”

Wil Koenig

Organizational Design, Business Agility and Digital Enterprise Strategist | Agile and Open Innovation Program Leader | Enterprise Architect | Technologist | Advisor
SILAC Insurance Company

“I feel pretty good about Nvidia and I get the sense whatever comes next will need as much compute as the world can handle and so we still need to work in energy efficiency.
I suspect in the next decade robotics will be the big item and Nvidia is at the center. Then perhaps Quantum Computing. Let’s see how well DeepSeek is adopted outside of China, particularly in a few years.”

John Knight

DevOps, Platform and SRE Leader | Author | Human Computer Interaction (AI, Games, Digital Learning)

“This isn’t a story about necessity but misinformation. From https://semianalysis.com/2025/01/31/deepseek-debates/
Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters.
DeepSeek’s price and efficiencies caused the frenzy this week, with the main headline being the “$6M” dollar figure training cost of DeepSeek V3. This is wrong. This is akin to pointing to a specific part of a bill of materials for a product and attributing it as the entire cost. The pre-training cost is a very narrow portion of the total cost.
We believe the pre-training number is nowhere near the actual amount spent on the model. We are confident their hardware spend is well higher than $500M over the company history.”