Alibaba Cloud team wins top AI award for breakthrough in model efficiency
The research could help reduce training and inference costs for Alibaba’s next-generation Qwen models without sacrificing accuracy
A research team led by Alibaba Cloud was the only group from China to receive a top award at this year’s NeurIPS, or the Conference on Neural Information Processing Systems, billed as the artificial intelligence industry’s most prestigious annual event.
The research could lead to drastic improvements in the efficiency of large language models (LLMs), significantly reducing both training and inference costs for Alibaba Group Holding’s next generation of Qwen models without sacrificing accuracy.
The Alibaba Cloud team was one of four to be awarded best paper on Thursday – selected from 21,575 submissions – ahead of the conference opening on Sunday. Judges praised the tech giant for publishing its findings at a time when leading US players were increasingly keeping their AI research behind closed doors.
Alibaba Cloud is the AI and cloud computing unit of Alibaba, owner of the South China Morning Post.
Alibaba’s paper – which was co-authored by researchers from the University of Edinburgh, Stanford University, the Massachusetts Institute of Technology and Tsinghua University – introduced a new technique for improving AI foundation models’ “attention” mechanism, according to co-author and Alibaba Cloud chief technology officer Zhou Jingren.
“Large language models use an internal attention mechanism to focus on the most relevant words in a sentence, helping them understand and generate better responses,” Zhou told the Post on Friday.
Existing attention mechanisms see diminishing returns on efficiency because of how they determine which information is most relevant: each new token is compared against every previous one, forcing trade-offs between accuracy and computational cost.
These costs grow quadratically as inputs become longer, making it expensive to train sophisticated AI agents, whose practical usefulness lies in their ability to autonomously execute tasks over extended periods.
Instead of having the models compare each new piece of data with all previous ones, the paper proposed using a “gate” to help the models decide what information to discard, thereby improving their training stability and ability to handle long inputs, according to Zhou.
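The article does not spell out the paper’s exact formulation, but the gating idea Zhou describes can be sketched as a learned sigmoid gate applied to the output of standard scaled dot-product attention – values the gate drives towards zero are effectively discarded. The parameters `w_gate` and `b_gate` below are hypothetical stand-ins for whatever learned weights the actual method uses; this is an illustrative sketch, not the paper’s implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(q, k, v, w_gate, b_gate):
    """Standard scaled dot-product attention, followed by an
    elementwise sigmoid gate computed from the query.

    A gate value near 0 suppresses (discards) that part of the
    attention output; a value near 1 passes it through unchanged.
    w_gate and b_gate are hypothetical learned parameters used
    purely for illustration.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # pairwise similarity scores
    attn_out = softmax(scores, axis=-1) @ v  # ordinary attention output
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate + b_gate)))  # sigmoid in (0, 1)
    return gate * attn_out                   # gated (possibly suppressed) output
```

Because the gate multiplies the attention output elementwise, a sufficiently negative gate pre-activation zeroes the corresponding output entirely – the “deciding what to discard” behaviour described above.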
Crucially, the paper validated the innovation with “extensive evidence” through more than 30 experiments across models of different sizes and architectures, meaning that the technique would likely be widely adopted, according to the 14 leading AI experts who made up the conference award committee.
“This paper represents a substantial amount of work that is possible only with access to industrial scale computing resources,” the judges wrote.
“The authors’ sharing of the results of their work … is highly commendable, especially in an environment where there has been a move away from open sharing of scientific results around LLMs.”
“We will continue open-sourcing Qwen models to foster research, drive industry adoption, and make cutting-edge AI accessible to all,” Zhou said.
Alibaba Cloud’s achievement marked the second year in a row that a Chinese team had secured NeurIPS’s top award, following last year’s win by researchers from TikTok owner ByteDance and Peking University.
Three of the four best papers at this year’s NeurIPS had researchers from China listed as their lead authors, demonstrating the country’s leading role in the global AI research landscape.
“The most impressive thing about Chinese AI researchers is their engineering ability, which is really solid and amazing,” said Yao Shunyu, a senior staff research scientist at Google DeepMind. “The industry there is definitely going upwards.”
This year’s best papers will be presented during the main conference this coming week, which will be held in two locations for the first time – San Diego and Mexico City – due to difficulties in securing US visas for international researchers.