Lam, Percy;
Chen, Weiwei;
de Silva, Lavindra;
Brilakis, Ioannis;
(2026)
Toppled Realities: Challenges in Generation and Validation of Synthetic Data.
In: Tonkin, EL and Tourte, GJL and Yordanova, K, (eds.)
Annotation of Real-World Data for Artificial Intelligence Systems (ARDUOUS 2025).
(pp. 36-50).
Springer Nature: Cham, Switzerland.
(In press).
Accepted Version: ECAI_Arduous rev2.pdf (19MB). Access restricted to UCL open access staff until 25 October 2026.
Abstract
In advancing automation in infrastructure maintenance, collecting comprehensive datasets is arduous. While synthetic data offers a promising way to address shortages of real data, problems remain in creating and validating the generated data. Drawing on our exploratory trials, this position paper aims to push the boundary in generating synthetic data without prior training samples and in validating the synthetic generations with vision language models (VLMs). Our trials attempted to generate new toppled road lights in road scene images with several inpainting and image editing tools, and ultimately resorted to a more deterministic approach of "create, prepare, stylise and inpaint". When validating the synthetic toppled road lights, we explored the possibility of automating prompt engineering and made four main observations. Whilst both exploration and exploitation could be seen, responses were sensitive to the text prompts. The model struggled with a dilemma between adhering to the instruction without achieving good results and self-hallucinating to achieve good results through goal misspecification. From the exploratory trials, we posit that finding the right starting point is important for generating synthetic data that appears real. VLMs could be adopted more widely for detection and validation with more meticulous automated prompt engineering.
| Field | Value |
|---|---|
| Type: | Proceedings paper |
| Title: | Toppled Realities: Challenges in Generation and Validation of Synthetic Data |
| Event: | Annotation of Real-World Data for Artificial Intelligence Systems (ARDUOUS 2025) |
| ISBN-13: | 9783032091161 |
| DOI: | 10.1007/978-3-032-09117-8_3 |
| Publisher version: | https://doi.org/10.1007/978-3-032-09117-8_3 |
| Language: | English |
| Additional information: | This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
| Keywords: | Synthetic data, Defect detection, Vision Language Models, Prompt engineering |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10218782 |

