eprintid: 10200131
rev_number: 6
eprint_status: archive
userid: 699
dir: disk0/10/20/01/31
datestamp: 2024-11-15 11:36:53
lastmod: 2024-11-15 11:36:53
status_changed: 2024-11-15 11:36:53
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Lyu, Zhaoyan
creators_name: Rodrigues, Miguel R. D.
title: Exploring the Impact of Additive Shortcuts in Neural Networks via Information Bottleneck-like Dynamics: From ResNet to Transformer
ispublished: pub
divisions: UCL
divisions: B04
divisions: F46
keywords: deep learning, neural networks, transformer, shortcut connections, information bottleneck theory
abstract: Deep learning has made significant strides, driving advances in areas like computer vision, natural language processing, and autonomous systems. In this paper, we further investigate the role of additive shortcut connections, which are essential for efficient information flow and for mitigating optimization challenges such as vanishing gradients, focusing on models such as ResNet, Vision Transformers (ViTs), and MLP-Mixers. In particular, building on our recent information bottleneck approach, we analyze how additive shortcuts influence the fitting and compression phases of training, which are crucial for generalization. We leverage Z-X and Z-Y measures as practical alternatives to mutual information for observing these dynamics in high-dimensional spaces. Our empirical results demonstrate that models with identity shortcuts (ISs) often skip the initial fitting phase and move directly into the compression phase, while non-identity shortcut (NIS) models follow the conventional two-phase process. Furthermore, we explore how IS models are still able to compress effectively, maintaining their generalization capacity despite bypassing the early fitting stage. These findings offer new insights into the dynamics of shortcut connections in neural networks, contributing to the optimization of modern deep learning architectures.
date: 2024-11-14
date_type: published
publisher: MDPI AG
official_url: https://doi.org/10.3390/e26110974
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2335427
doi: 10.3390/e26110974
lyricists_name: Lyu, Zhaoyan
lyricists_id: ZLYUX31
actors_name: Lyu, Zhaoyan
actors_id: ZLYUX31
actors_role: owner
full_text_status: public
publication: Entropy
volume: 26
number: 11
article_number: 974
issn: 1099-4300
citation: Lyu, Zhaoyan; Rodrigues, Miguel R. D.; (2024) Exploring the Impact of Additive Shortcuts in Neural Networks via Information Bottleneck-like Dynamics: From ResNet to Transformer. Entropy, 26 (11), Article 974. 10.3390/e26110974 <https://doi.org/10.3390/e26110974>. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10200131/1/entropy-26-00974.pdf