‘A virtual DPU within a GPU’: Could clever hardware hack be behind DeepSeek’s groundbreaking AI efficiency?
A new approach called DualPipe seems to be the key to DeepSeek’s success
One expert describes it as an on-GPU virtual DPU that maximizes bandwidth efficiency
While DeepSeek has used Nvidia GPUs only, one wonders how AMD’s Instinct would fare
China’s DeepSeek AI chatbot has stunned the tech industry, representing a credible alternative to OpenAI’s ChatGPT at a fraction of the cost.
A recent paper revealed DeepSeek V3 was trained on a cluster of 2,048 Nvidia H800 GPUs, crippled versions of the H100 (we can only imagine how much more powerful it would be running on AMD Instinct accelerators!). Training reportedly took 2.79 million GPU-hours across pretraining and fine-tuning, covered 14.8 trillion tokens, and cost, according to calculations made by The Next Platform, a mere $5.58 million.
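For readers who want to sanity-check those numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures quoted above; the function names are illustrative, and the wall-clock estimate assumes (as a simplification not stated in the article) that all 2,048 GPUs ran in parallel for the entire job.

```python
# Figures quoted in the article.
GPU_HOURS = 2.79e6        # total GPU-hours reported for training
TOTAL_COST_USD = 5.58e6   # cost estimate attributed to The Next Platform
NUM_GPUS = 2048           # size of the H800 cluster


def implied_rate_per_gpu_hour(gpu_hours: float, total_cost_usd: float) -> float:
    """Dollar cost per GPU-hour implied by the two headline figures."""
    return total_cost_usd / gpu_hours


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Elapsed days, ASSUMING every GPU was busy for the whole run."""
    return gpu_hours / num_gpus / 24


if __name__ == "__main__":
    rate = implied_rate_per_gpu_hour(GPU_HOURS, TOTAL_COST_USD)
    days = wall_clock_days(GPU_HOURS, NUM_GPUS)
    print(f"Implied rate: ${rate:.2f} per GPU-hour")
    print(f"Estimated wall-clock time: {days:.1f} days")
```

Under those assumptions, the quoted figures work out to about $2 per GPU-hour and a little under two months of continuous training on the full cluster.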