UCL Discovery

De Novo Protein Design using Generative Machine Learning

Moffat, Lewis Iain; (2024) De Novo Protein Design using Generative Machine Learning. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Moffat, Lewis_LMoffat_Thesis_Corrected_Final.pdf - Other

Abstract

This thesis develops methods for computationally designing novel protein sequences and structures using deep generative machine learning algorithms. It is divided into six chapters. Chapter 1 introduces contemporary protein design, generative machine learning, and the nascent field emerging at their intersection, focusing in particular on the challenge of de novo protein design and the demonstrated uses of generative deep learning methods. Chapter 2 describes the first study performed for this thesis. It explores the use of variational autoencoders for protein sequence design through two separate in silico design tasks, one functional and one structural. As one of the earliest works on this topic, it provided evidence that generative approaches merited further investigation for design. Chapter 3 analyses the performance of state-of-the-art deep protein structure prediction algorithms, primarily AlphaFold, on previously designed de novo protein sequences. As expected, it finds that AlphaFold predicts the structures of these proteins accurately and confidently, supporting its use as a tool in the development of future design methods. Chapter 4 explores the development of protein sequence language models trained on synthetic sequences to avoid the drawbacks of training on natural sequences. It also describes the evaluation of generated sequences with the state-of-the-art structure predictor AlphaFold. Chapter 5 describes a technique for fixed-backbone protein design that uses greedy sequence optimization of AlphaFold structure predictions and leverages the models developed in the previous chapter. Initial in vitro validation of a small number of designed sequences gives encouraging early signs of success.
The final chapter highlights the key contributions of this thesis to the field of computational protein design and concludes with implications for future design method development.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: De Novo Protein Design using Generative Machine Learning
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10185608