DOI: 10.1145/3691620.3695260
research-article
Open access

Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat

Authors: Sidong Feng (Monash University, Melbourne, Australia), Haochuan Lu, Jianqin Jiang, Ting Xiong, Likun Huang, Yinglin Liang, Xiaoqin Li, Yuetang Deng, Aldeida Aleti (Monash University, Melbourne, Australia)
Published: 27 October 2024

Abstract

UI automation tests play a crucial role in ensuring the quality of mobile applications. Although machine learning techniques for generating these tests are increasingly popular, they still face several challenges, such as mismatched UI elements. Recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitations. To address this, we introduce CAT, which creates cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given a task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness: it achieves 90% UI automation at a cost of $0.34, outperforming the state of the art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.
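
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (Jaccard overlap on word tokens), the corpus records, the prompt layout, and all function names below are assumptions chosen for illustration. It shows only the general idea of selecting past usage examples that resemble the task description and placing them in front of the new task as few-shot context.

```python
import re


def tokenize(text):
    """Lowercase a description and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(task, corpus, k=2):
    """Return the k past usage examples whose descriptions best match the task."""
    query = tokenize(task)
    ranked = sorted(
        corpus,
        key=lambda ex: jaccard(query, tokenize(ex["task"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task, examples):
    """Assemble a few-shot prompt: retrieved examples first, then the new task."""
    parts = []
    for ex in examples:
        parts.append(f"Task: {ex['task']}\nActions: {' -> '.join(ex['actions'])}")
    parts.append(f"Task: {task}\nActions:")
    return "\n\n".join(parts)


# Hypothetical usage records for illustration; not taken from the WeChat dataset.
CORPUS = [
    {"task": "send a text message to a contact",
     "actions": ["tap Chats", "tap contact", "type message", "tap Send"]},
    {"task": "post a photo to moments",
     "actions": ["tap Discover", "tap Moments", "tap camera", "tap Post"]},
    {"task": "change the profile photo",
     "actions": ["tap Me", "tap avatar", "tap Choose photo"]},
]

new_task = "send a voice message to a contact"
examples = retrieve_examples(new_task, CORPUS, k=1)
prompt = build_prompt(new_task, examples)
print(prompt)
```

A production system would more plausibly use embedding-based similarity over a larger example store; token overlap is used here only to keep the sketch dependency-free.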
