Christianos, F; Papoudakis, G; Coste, T; Hao, J; Wang, J; Shao, K; (2025) Lightweight Neural App Control. In: 13th International Conference on Learning Representations ICLR 2025. ICLR: Singapore.
PDF: 9741_Lightweight_Neural_App_Co.pdf (Accepted Version, 2MB)
Abstract
This paper introduces a novel mobile phone control architecture, Lightweight Multi-modal App Control (LiMAC), for efficient interaction with and control of various Android apps. LiMAC takes as input a textual goal and a sequence of past mobile observations, such as screenshots and the corresponding UI trees, and generates precise actions. To address the computational constraints inherent to smartphones, we introduce a small Action Transformer (AcT) integrated with a fine-tuned vision-language model (VLM) for real-time decision-making and task execution. We evaluate LiMAC on two open-source mobile control datasets, demonstrating that our small-form-factor approach outperforms fine-tuned versions of open-source VLMs, such as Florence2 and Qwen2-VL. It also significantly outperforms prompt-engineering baselines that use closed-source foundation models such as GPT-4o. More specifically, LiMAC increases overall action accuracy by up to 19% compared to fine-tuned VLMs, and by up to 42% compared to prompt-engineering baselines.
| Field | Value |
|---|---|
| Type | Proceedings paper |
| Title | Lightweight Neural App Control |
| Event | ICLR 2025 |
| Open access status | An open access version is available from UCL Discovery |
| Publisher version | https://openreview.net/forum?id=BL4WBIfyrz |
| Language | English |
| Additional information | This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions. |
| Keywords | vision-language model, multi-modal, android control, app agent |
| UCL classification | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
| URI | https://discovery.ucl.ac.uk/id/eprint/10212511 |

