eprintid: 10205427
rev_number: 8
eprint_status: archive
userid: 699
dir: disk0/10/20/54/27
datestamp: 2025-02-28 08:49:05
lastmod: 2025-02-28 08:52:28
status_changed: 2025-02-28 08:49:05
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Nagano, Yuta
creators_name: Pyo, Andrew GT
creators_name: Milighetti, Martina
creators_name: Henderson, James
creators_name: Shawe-Taylor, John
creators_name: Chain, Benny
creators_name: Tiffeau-Mayer, Andreas
title: Contrastive learning of T cell receptor representations
ispublished: pub
divisions: UCL
divisions: B02
divisions: B04
divisions: C10
divisions: D15
divisions: F48
keywords: Protein language models; contrastive learning; TCR repertoire; T cell specificity; TCR; T cell receptor; representation learning
note: Copyright © 2024 The Author(s). Published by Elsevier Inc.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
abstract: Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper’s transparent peer review process is included in the supplemental information.
date: 2025-01-15
date_type: published
publisher: Elsevier BV
official_url: https://doi.org/10.1016/j.cels.2024.12.006
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2352604
doi: 10.1016/j.cels.2024.12.006
medium: Print-Electronic
pii: S2405-4712(24)00369-7
lyricists_name: Mayer, Andreas
lyricists_name: Chain, Benjamin
lyricists_name: Shawe-Taylor, John
lyricists_name: Henderson, James
lyricists_id: AMAYE10
lyricists_id: BMCHA43
lyricists_id: JSHAW87
lyricists_id: JFDHE23
actors_name: Mayer, Andreas
actors_id: AMAYE10
actors_role: owner
full_text_status: public
publication: Cell Systems
volume: 16
number: 1
article_number: 101165
event_location: United States
issn: 2405-4712
citation: Nagano, Yuta; Pyo, Andrew GT; Milighetti, Martina; Henderson, James; Shawe-Taylor, John; Chain, Benny; Tiffeau-Mayer, Andreas; (2025) Contrastive learning of T cell receptor representations. Cell Systems, 16 (1), Article 101165. 10.1016/j.cels.2024.12.006 <https://doi.org/10.1016/j.cels.2024.12.006>. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10205427/1/Nagano2025.pdf