eprintid: 10205427
rev_number: 8
eprint_status: archive
userid: 699
dir: disk0/10/20/54/27
datestamp: 2025-02-28 08:49:05
lastmod: 2025-02-28 08:52:28
status_changed: 2025-02-28 08:49:05
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Nagano, Yuta
creators_name: Pyo, Andrew GT
creators_name: Milighetti, Martina
creators_name: Henderson, James
creators_name: Shawe-Taylor, John
creators_name: Chain, Benny
creators_name: Tiffeau-Mayer, Andreas
title: Contrastive learning of T cell receptor representations
ispublished: pub
divisions: UCL
divisions: B02
divisions: B04
divisions: C10
divisions: D15
divisions: F48
keywords: Protein language models; contrastive learning; TCR repertoire; T cell specificity; TCR; T cell receptor; representation learning
note: Copyright © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
abstract: Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper’s transparent peer review process is included in the supplemental information.
date: 2025-01-15
date_type: published
publisher: Elsevier BV
official_url: https://doi.org/10.1016/j.cels.2024.12.006
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2352604
doi: 10.1016/j.cels.2024.12.006
medium: Print-Electronic
pii: S2405-4712(24)00369-7
lyricists_name: Mayer, Andreas
lyricists_name: Chain, Benjamin
lyricists_name: Shawe-Taylor, John
lyricists_name: Henderson, James
lyricists_id: AMAYE10
lyricists_id: BMCHA43
lyricists_id: JSHAW87
lyricists_id: JFDHE23
actors_name: Mayer, Andreas
actors_id: AMAYE10
actors_role: owner
full_text_status: public
publication: Cell Systems
volume: 16
number: 1
article_number: 101165
event_location: United States
issn: 2405-4712
citation: Nagano, Yuta; Pyo, Andrew GT; Milighetti, Martina; Henderson, James; Shawe-Taylor, John; Chain, Benny; Tiffeau-Mayer, Andreas; (2025) Contrastive learning of T cell receptor representations. Cell Systems, 16 (1), Article 101165. 10.1016/j.cels.2024.12.006 <https://doi.org/10.1016/j.cels.2024.12.006>. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10205427/1/Nagano2025.pdf