UCL Discovery

Harnessing Protein Language Models and Machine Learning for Variant Interpretation, Enzyme Optimisation and Functional Annotation

Lin, Weining; (2025) Harnessing Protein Language Models and Machine Learning for Variant Interpretation, Enzyme Optimisation and Functional Annotation. Doctoral thesis (Ph.D), UCL (University College London).

Text: Lin_10214054_Thesis.pdf (23MB)
Access restricted to UCL open access staff until 1 October 2026.

Abstract

Proteins are fundamental to biological processes and are of significant importance in healthcare. This thesis introduces computational methods based on protein language models (pLMs) to predict protein fitness and function and to optimise protein stability and functionality, thereby accelerating discovery and innovation in protein science. It demonstrates the effectiveness of pLMs when combined with advanced machine learning techniques, leading to significant progress in protein manipulation. The findings underscore the transformative potential of computational methods in addressing complex biological challenges and contribute to the development of powerful tools for protein annotation and engineering. We first developed VariPred, a model that leverages sequence-derived embeddings from pLMs to identify pathogenic mutations in proteins. This model outperforms traditional approaches in accuracy. Building on this, we created PETGood, a model designed to predict the stability of the plastic-degrading enzyme PETase. Using embeddings extracted from pLMs, PETGood acts as an accurate and powerful filter for selecting enhanced designs, which significantly reduces the experimental workload in enzyme engineering. Furthermore, we applied deep learning techniques, including diffusion models and graph neural networks, to redesign PETase and optimise its function and structure. We evaluated the designed PETases with PETGood and conducted structural analysis focusing on key aspects such as binding pockets to ensure stability and function.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Harnessing Protein Language Models and Machine Learning for Variant Interpretation, Enzyme Optimisation and Functional Annotation
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences
URI: https://discovery.ucl.ac.uk/id/eprint/10214054