TY  - JOUR
TI  - Contrastive learning of T cell receptor representations
AV  - public
Y1  - 2025/01/15/
VL  - 16
N1  - Copyright © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
IS  - 1
ID  - discovery10205427
N2  - Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information.
SN  - 2405-4712
UR  - https://doi.org/10.1016/j.cels.2024.12.006
PB  - Elsevier BV
JF  - Cell Systems
KW  - Protein language models; contrastive learning; TCR repertoire; T cell specificity; TCR; T cell receptor; representation learning
A1  - Nagano, Yuta
A1  - Pyo, Andrew GT
A1  - Milighetti, Martina
A1  - Henderson, James
A1  - Shawe-Taylor, John
A1  - Chain, Benny
A1  - Tiffeau-Mayer, Andreas
ER  -