🇨🇳 DeepSeek delayed the release of its new model after failing to train it on 🇨🇳 Huawei’s chips, highlighting the limits of Beijing’s push to replace US technology.

DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than Nvidia’s systems after releasing its R1 model in January. But the Chinese start-up ran into persistent technical issues while training R2 on Ascend chips, prompting it to fall back on Nvidia chips for training and use Huawei’s for inference. These issues were the main reason the model’s launch slipped from May, causing it to lose ground to rivals.

Training involves the model learning from a large dataset, while inference refers to using a trained model to make predictions or generate a response, such as answering a chatbot query; a short code sketch at the end of this post illustrates the distinction.

DeepSeek’s difficulties show how Chinese chips still lag behind their US rivals for critical tasks, underscoring the challenges facing China’s drive to be technologically self-sufficient. Beijing has demanded that Chinese tech companies justify their orders of Nvidia’s H20, in a move to steer them towards alternatives made by Huawei and 🇨🇳 Cambricon. Chinese chips suffer from stability issues, slower inter-chip connectivity and inferior software compared with Nvidia’s products.

Huawei sent a team of engineers to DeepSeek’s office to help the company use its AI chip to develop the R2 model. Yet even with the team on site, DeepSeek could not complete a successful training run on the Ascend chip. DeepSeek is still working with Huawei to make the model compatible with Ascend for inference.

Founder Liang Wenfeng has said internally that he is dissatisfied with R2’s progress and has been pushing to spend more time building an advanced model that can sustain the company’s lead in the AI field. The R2 launch was also delayed by longer-than-expected data labelling for the updated model. Chinese media reports suggest the model may be released as soon as the coming weeks.

“Models are commodities that can be easily swapped out. A lot of developers are using Alibaba’s Qwen3, which is powerful and flexible.” Qwen3 adopted DeepSeek’s core concepts, such as the training algorithm that makes the model capable of reasoning, but made them more efficient to use.

ft.com/content/eb9846
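To make the training/inference split concrete, here is a minimal, hypothetical PyTorch sketch. None of this code comes from DeepSeek or Huawei; the tiny linear model merely stands in for an LLM. It shows why training is the harder workload: it needs a backward pass and weight updates on every batch, while inference is a single frozen forward pass.

```python
import torch
import torch.nn as nn

# Toy stand-in for a large model; sizes are illustrative only.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# --- Training: the model learns from labelled data by updating its weights.
x = torch.randn(32, 16)           # a batch of inputs
y = torch.randint(0, 4, (32,))    # target labels
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)       # forward pass + loss
loss.backward()                   # backward pass: compute gradients
optimizer.step()                  # weight update: the chip-intensive step

# --- Inference: weights are frozen; only the forward pass runs.
model.eval()
with torch.no_grad():             # no gradients, no weight updates
    prediction = model(torch.randn(1, 16)).argmax(dim=-1)
print(prediction)
```

Repeating the training loop reliably across thousands of accelerators is what, per the article, DeepSeek could not achieve on Ascend, whereas the lighter inference path is where Huawei’s chips are still being fitted in.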