Zhi, Zhuo;
Feng, Chen;
Daneshmend, Adam;
Orlu, Mine;
Demosthenous, Andreas;
Lu, Yin;
Da, Li;
... Rodrigues, Miguel;
(2025)
TFAR: a training-free framework for autonomous reliable reasoning in visual question answering.
Transactions on Machine Learning Research
Abstract
Recent approaches introduce chain-of-thought (CoT) reasoning to mitigate challenges in multimodal large language models (MLLMs), such as hallucination and reasoning deficits, and to enhance performance. However, existing CoT-based methods often rely on extensive data annotation and training. To overcome these limitations, we propose a training-free framework for autonomous and reliable reasoning (TFAR), which uses only common lightweight vision tools to improve the reasoning ability of MLLMs. TFAR enables an MLLM to autonomously and accurately identify relevant regions of interest (RoIs) and support CoT reasoning, without requiring additional training or annotations, and with low computational overhead during inference. However, external tools introduce noise and uncertainty. To mitigate this uncertainty and select the optimal pathway, we propose a conformal prediction-based uncertainty quantification method that calibrates the outputs of external tools and dynamically selects the most appropriate tool based on the MLLM’s output uncertainty. Experiments across five datasets demonstrate that TFAR improves performance over the base MLLM by an average of 4.6, in some cases even outperforming fine-tuned baselines, while maintaining low inference cost. These results offer new insights into training-free CoT guidance for MLLMs and underscore the value of reliable visual tools.
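The abstract does not spell out the calibration procedure, so the following is only a minimal sketch of generic split conformal prediction, not the authors' implementation. It assumes each tool-assisted pathway yields a probability per candidate answer, uses `1 - p` as the nonconformity score, and treats "smallest non-empty prediction set" as a stand-in for low uncertainty. The function names (`conformal_quantile`, `prediction_set`, `select_tool`), the tool names, and all numbers are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """Answers whose nonconformity score (1 - p) does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def select_tool(tool_outputs, qhat):
    """Assumed rule: prefer the tool with the smallest non-empty prediction set."""
    best_name, best_size = None, None
    for name, probs in tool_outputs.items():
        size = len(prediction_set(probs, qhat))
        if size == 0:
            continue  # no answer passes the threshold for this tool
        if best_size is None or size < best_size:
            best_name, best_size = name, size
    return best_name


# Hypothetical calibration scores: 1 - p(true answer) on held-out examples.
calibration_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.7, 0.9]
qhat = conformal_quantile(calibration_scores, alpha=0.25)

# Hypothetical answer distributions obtained with two different vision tools.
tool_outputs = {
    "detector": {"red": 0.45, "blue": 0.40, "green": 0.15},
    "ocr": {"red": 0.80, "blue": 0.15, "green": 0.05},
}
chosen = select_tool(tool_outputs, qhat)
```

Under these assumptions, the threshold admits two answers for the `detector` pathway and one for the `ocr` pathway, so the sketch picks `ocr`. The selection criterion TFAR actually uses may differ.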
Type: | Article |
---|---|
Title: | TFAR: a training-free framework for autonomous reliable reasoning in visual question answering |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://openreview.net/forum?id=cBAKeZN3jy |
Language: | English |
Additional information: | This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10213298 |