
Alibaba Cloud team wins top AI award for breakthrough in model efficiency

The research could help reduce training and inference costs for Alibaba’s next generation Qwen models without sacrificing accuracy

In this photo illustration, the Qwen logo is displayed on a smartphone with the Alibaba logo in the background. Photo: Shutterstock Images

A research team led by Alibaba Cloud was the only group from China to receive a top award at this year’s NeurIPS, or the Conference on Neural Information Processing Systems, billed as the artificial intelligence industry’s most prestigious annual event.

The research could lead to drastic improvements in the efficiency of large language models (LLMs), significantly reducing both training and inference costs for Alibaba Group Holding’s next generation of Qwen models without sacrificing accuracy.

The technique was rigorously validated through more than 30 experiments across models of varying sizes and architectures, demonstrating its robustness and generalisability.

The Alibaba Cloud team was one of four to be awarded best paper on Thursday – selected from 21,575 submissions – ahead of the conference opening on Sunday, with judges praising the tech giant for publishing its findings at a time when leading US players were increasingly keeping their AI research behind closed doors.

Alibaba Cloud is the AI and cloud computing unit of Alibaba, owner of the South China Morning Post.

The Qwen logo is displayed on a phone screen behind an AI image with integrated circuits. Photo: Shutterstock Images

Alibaba’s paper – which was co-authored by researchers from the University of Edinburgh, Stanford University, the Massachusetts Institute of Technology and Tsinghua University – introduced a new technique for improving AI foundational models’ “attention” mechanism, according to co-author and Alibaba Cloud chief technology officer Zhou Jingren.

“Large language models use an internal attention mechanism to focus on the most relevant words in a sentence, helping them understand and generate better responses,” Zhou told the Post on Friday.

Existing attention mechanisms experience diminishing returns on efficiency due to the way they determine what information is most relevant, which necessitates trade-offs between accuracy and computational costs.

These costs grow steeply as inputs become longer (for standard attention, roughly with the square of the input length), making it expensive to train sophisticated AI agents, whose practical usefulness lies in their ability to autonomously execute tasks over extended periods.
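
As a rough illustration of that growth, the sketch below counts the query-key comparisons made by standard full causal attention, in which each token is compared with itself and every earlier token. The function name is hypothetical and the count is a textbook property of full attention, not a figure from the paper.

```python
def causal_attention_comparisons(num_tokens: int) -> int:
    """Count query-key comparisons when each token attends to itself
    and to every earlier token (standard full causal attention)."""
    return num_tokens * (num_tokens + 1) // 2


# Doubling the input length roughly quadruples the number of comparisons.
for n in (1_000, 2_000, 4_000):
    print(n, causal_attention_comparisons(n))
```

Under this assumption, going from 1,000 to 2,000 tokens raises the count from 500,500 to 2,001,000, about four times as many comparisons for twice the input.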

Instead of having the models compare each new piece of data with all previous ones, the paper proposed using a “gate” to help the models decide what information to discard, thereby improving their training stability and ability to handle long inputs, according to Zhou.
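
The article does not spell out where the gate sits or how it is computed, so the following is only a minimal sketch of one common form of gating: a sigmoid gate applied element by element to the output of ordinary single-head causal attention. The function names and the `gate_logits` argument are hypothetical; in a real model the gate values would come from a learned projection of each token, whereas here they are passed in directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_causal_attention(queries, keys, values, gate_logits):
    """Single-head causal attention followed by an elementwise sigmoid gate.

    gate_logits[t][k] is an assumed, externally supplied gate value for
    output dimension k of token t (hypothetical stand-in for a learned gate).
    """
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Token t only looks at itself and earlier tokens.
        scores = [dot(query, keys[j]) / math.sqrt(dim) for j in range(t + 1)]
        weights = softmax(scores)
        attended = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        # A gate near 0 discards the attended information; near 1 keeps it.
        outputs.append([sigmoid(g) * a for g, a in zip(gate_logits[t], attended)])
    return outputs
```

With strongly negative gate values the output is suppressed almost entirely, and with strongly positive values it matches ungated attention, which is the sense in which a gate lets the model decide what to discard.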

Crucially, the paper validated the innovation with “extensive evidence” through more than 30 experiments across models of different sizes and architectures, meaning that the technique would likely be widely adopted, according to the 14 leading AI experts who made up the conference award committee.

“This paper represents a substantial amount of work that is possible only with access to industrial scale computing resources,” the judges wrote.

“The authors’ sharing of the results of their work … is highly commendable, especially in an environment where there has been a move away from open sharing of scientific results around LLMs.”

According to Zhou, this innovation along with other new techniques such as hybrid attention, ultra-sparse Mixture of Experts, and multi-token prediction would “significantly lower” both training and inference costs of Alibaba Cloud’s next generation of models.

“We will continue open-sourcing Qwen models to foster research, drive industry adoption, and make cutting-edge AI accessible to all,” he said.

In July, DeepSeek researchers secured the best paper award at the Association for Computational Linguistics conference in Vienna. Photo: DigiTimes

Other Chinese AI players, such as DeepSeek and Moonshot AI, have also targeted the attention mechanism for improvement in a bid to nullify the impact of restrictions on access to computing resources.

In July, DeepSeek researchers secured the best paper award at the Association for Computational Linguistics conference in Vienna, Austria, another premier global AI conference, for their research into a related technique called “native sparse attention”.

This was followed by a groundbreaking peer-reviewed article published in the leading journal Nature that outlined the training process for its flagship R1 model.

Alibaba Cloud’s achievement marked the second year in a row that a Chinese team had secured NeurIPS’s top award, following last year’s win by researchers from TikTok-owner ByteDance and Peking University.

Three of the four best papers at this year’s NeurIPS had researchers from China listed as their lead authors, demonstrating the country’s leading role in the global AI research landscape.

“The most impressive thing about Chinese AI researchers is their engineering ability, which is really solid and amazing,” said Yao Shunyu, a senior staff research scientist at Google DeepMind. “The industry there is definitely going upwards.”

This year’s best papers will be presented during the main conference this coming week, which will be held in two locations for the first time – San Diego and Mexico City – due to difficulties in securing US visas for international researchers.

Vincent Chow
Vincent Chow is a technology reporter covering AI, with a focus on how society navigates the emergence of increasingly powerful AI systems. He previously covered Chinese society and was awarded a SOPA award for his culture reporting. He is currently supported by the Tarbell Center for AI Journalism.

Alibaba’s Qwen chatbot exceeds 10 million downloads, faster than ChatGPT and DeepSeek

Powered by Alibaba Cloud’s open-source model family of the same name, Qwen is the firm’s most significant foray into the consumer AI market

The Qwen chatbot was developed by Alibaba Cloud, the AI and cloud services unit of Alibaba Group Holding. Photo: Shutterstock
Wency Chen in Shanghai

Alibaba Group Holding’s new multipurpose artificial intelligence app, Qwen, recorded more than 10 million downloads in the first week of its public beta launch, faster than OpenAI’s ChatGPT or DeepSeek, the company said on Monday.

The strong debut of Qwen, launched last week, augured well for the Chinese tech conglomerate’s efforts to establish a major AI assistant that would rival the performance of Google’s Gemini chatbot and ChatGPT, neither of which is available in mainland China.

The Hong Kong-listed shares of Alibaba, owner of the South China Morning Post, gained 4.67 per cent to close at HK$154.50 on Monday. Alibaba’s latest quarterly financial results are set to be released on Tuesday.

Qwen marked “Alibaba’s most significant step yet into the consumer AI market, aiming to transform its cutting-edge foundational AI model capabilities into real-life applications and tools”, the company said in a statement on Monday.

Powered by Alibaba Cloud’s open-source AI model family of the same name, Qwen was designed to meet both the professional and personal needs of users, with capabilities that include deep research, image generation and slide generation.
Alibaba Cloud is the AI and cloud services unit of Hangzhou-based Alibaba.