Venanzi, Niccolò Alberto Elia;
(2024)
Machine Learning for Protein Engineering Using Molecular Dynamics Simulation Data.
Doctoral thesis (Ph.D), UCL (University College London).
Text: NAEV_Machine_Learning_for_Protein_Engineering_Using_Molecular_Dynamics_Simulation_Data.pdf (Accepted Version, 29 MB)
Abstract
Protein therapies and enzymes have revolutionised the pharmaceutical and biotechnology industries. However, evolutionary methods for protein engineering scale poorly and are labour-intensive, which continues to impede progress. Molecular Dynamics (MD) simulations are invaluable for studying protein properties, but MD data require careful and often subjective interpretation. Meanwhile, Machine Learning (ML) algorithms have successfully uncovered cause-and-effect relationships in data, but their performance is limited by the quality and volume of the data available. This thesis develops a pipeline that exploits the complementary strengths of the two approaches by using MD simulations as a data source for ML algorithms. To this end, variant structures were generated, validated, and used to produce more than 1600 trajectories of 312 enterokinase variants, which served as data for ML algorithms. MD simulations were shown to be sensitive to mutations and to provide comparable information across different simulation lengths. After optimal simulation parameters had been selected and validated, datasets were constructed from MD simulation-derived data, sequence information, and structural features. These datasets were then used to test and refine more than 40 supervised ML algorithms. An iterative process showed that incorporating MD simulation data improved the predictive performance of supervised ML. Interpretability techniques identified the most important features, paving the way for more targeted experimental rounds in protein engineering and setting a new standard in protein development research. As a final step, the MD data were used to build graph neural networks, and the performance of these deep learning models was compared with that of the previously constructed ML models. Overall, the pipeline presented here combines the strengths of MD simulations and ML techniques to predict protein function.

These findings could reduce the cost and time of protein engineering campaigns, and they illustrate the considerable potential of ML combined with information-rich data such as those derived from MD simulations.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Machine Learning for Protein Engineering Using Molecular Dynamics Simulation Data |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
Keywords: | Machine Learning, Protein Engineering, Molecular Dynamics |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Biochemical Engineering |
URI: | https://discovery.ucl.ac.uk/id/eprint/10187764 |