eprintid: 10074121
rev_number: 18
eprint_status: archive
userid: 608
dir: disk0/10/07/41/21
datestamp: 2019-05-16 16:26:52
lastmod: 2021-11-15 01:55:19
status_changed: 2019-05-16 16:26:52
type: proceedings_section
metadata_visibility: show
creators_name: Firman, M
creators_name: Campbell, NDF
creators_name: Agapito, L
creators_name: Brostow, GJ
title: DiverseNet: When One Right Answer is not Enough
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
divisions: F48
keywords: Training, Task analysis, Aerospace electronics, Three-dimensional displays, Supervised learning, Two dimensional displays, Training data
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a single test-time prediction for each query, failing to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy. We introduce a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning faces mode collapse, where one or more ensemble members fail to receive any training signal. Our best performing solution can be deployed for various tasks, and just involves small modifications to the existing single-mode architecture, loss function, and training regime. We demonstrate that our method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.
date: 2018-12-17
date_type: published
publisher: IEEE
official_url: https://doi.org/10.1109/CVPR.2018.00587
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1633661
doi: 10.1109/CVPR.2018.00587
isbn_13: 978-1-5386-6420-9
lyricists_name: Brostow, Gabriel
lyricists_name: De Agapito Vicente, Lourdes
lyricists_id: GBROS38
lyricists_id: LDEAG40
actors_name: Brostow, Gabriel
actors_id: GBROS38
actors_role: owner
full_text_status: public
series: IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication: 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
volume: 2018
place_of_pub: Salt Lake City, UT, USA
pagerange: 5598-5607
pages: 10
event_title: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
event_location: Salt Lake City, UT
event_dates: 18 June 2018 - 23 June 2018
institution: 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
issn: 1063-6919
book_title: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
citation: Firman, M; Campbell, NDF; Agapito, L; Brostow, GJ; (2018) DiverseNet: When One Right Answer is not Enough. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. (pp. 5598-5607). IEEE: Salt Lake City, UT, USA. Green open access

document_url: https://discovery.ucl.ac.uk/id/eprint/10074121/1/cvpr18_diversenet.pdf