eprintid: 10194620
rev_number: 6
eprint_status: archive
userid: 699
dir: disk0/10/19/46/20
datestamp: 2024-07-16 14:58:54
lastmod: 2024-07-16 14:58:54
status_changed: 2024-07-16 14:58:54
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Feng, Chen
creators_name: Tzimiropoulos, Georgios
creators_name: Patras, Ioannis
title: NoiseBox: Towards More Efficient and Effective Learning with Noisy Labels
ispublished: inpress
divisions: UCL
divisions: B04
divisions: F46
keywords: Noise, Noise measurement, Training, Computational modeling, Predictive models, Supervised learning, Entropy
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Despite great progress in supervised learning with neural networks, obtaining high-quality, large-scale and accurately labelled datasets remains a significant challenge. In such contexts, learning in the presence of noisy labels has received increasing attention. Addressing this relatively intricate problem to attain competitive results predominantly involves designing mechanisms that select samples expected to have reliable annotations. However, these methods typically combine multiple off-the-shelf techniques, resulting in intricate structures. Furthermore, they frequently make implicit or explicit assumptions about the noise modes/ratios within the dataset. Such assumptions can compromise model robustness and limit performance under varying noise conditions. Unlike these methods, in this work we propose an efficient and effective framework with minimal hyperparameters that achieves SOTA results on various benchmarks. Specifically, we design an efficient and concise training framework, called NoiseBox, consisting of a subset expansion module responsible for exploring non-selected samples and a model training module that further reduces the impact of noise. Moreover, diverging from common sample selection methods based on the "small loss" mechanism, we introduce a novel sample selection method based on neighbouring relationships and label consistency in the feature space. Without bells and whistles, such as model co-training, self-supervised pre-training and semi-supervised learning, and with robustness to the settings of its few hyperparameters, our method significantly surpasses previous methods on both CIFAR10/CIFAR100 with synthetic noise and real-world noisy datasets such as Red Mini-ImageNet, WebVision, Clothing1M and ANIMAL-10N.
date: 2024-07-11
date_type: published
publisher: Institute of Electrical and Electronics Engineers (IEEE)
official_url: http://dx.doi.org/10.1109/tcsvt.2024.3426994
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2296874
doi: 10.1109/tcsvt.2024.3426994
lyricists_name: Feng, Chen
lyricists_id: CFENA90
actors_name: Bracey, Alan
actors_id: ABBRA90
actors_role: owner
full_text_status: public
publication: IEEE Transactions on Circuits and Systems for Video Technology
issn: 1051-8215
citation: Feng, Chen; Tzimiropoulos, Georgios; Patras, Ioannis; (2024) NoiseBox: Towards More Efficient and Effective Learning with Noisy Labels. IEEE Transactions on Circuits and Systems for Video Technology 10.1109/tcsvt.2024.3426994 <https://doi.org/10.1109/tcsvt.2024.3426994>. (In press). Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10194620/1/NoiseBox_Towards_More_Efficient_and_Effective_Learning_with_Noisy_Labels.pdf