UCL Discovery

Elucidating protein aggregation and key protein-protein interaction sites with molecular dynamics simulations and predictive model building

Wang, Yuhan; (2025) Elucidating protein aggregation and key protein-protein interaction sites with molecular dynamics simulations and predictive model building. Doctoral thesis (Eng.D), UCL (University College London).

Wang_10217518_thesis.pdf (73MB)
Access restricted to UCL open access staff until 1 December 2026.

Abstract

Computational methods, including machine learning and molecular dynamics (MD) simulations, have strong potential to characterise, understand and ultimately predict the properties of proteins relevant to their stability and function as therapeutics. Such methods would streamline the development pathway by minimising the experimental testing currently required for many protein variants and formulations. The molecular understanding of thermostability and aggregation propensity has advanced significantly, along with predictive algorithms based on sequence-level or structure-level information about a protein. However, these approaches focus largely on comparing protein sequence variants, to correlate the properties of proteins with their stability, solubility and aggregation propensity. For therapeutic protein development, it is equally important to take into account the impact of formulation conditions in order to elucidate and predict the stability of antibody drugs. At the macroscopic level, changes in temperature, pH and ionic strength, and the addition of excipients, can significantly alter the kinetics of protein aggregation. The mechanisms controlling aggregation kinetics have been traced back to a combination of molecular features, including conformational stability, partial unfolding to aggregation-prone states, and colloidal stability governed by surface charge and hydrophobicity. However, very little has been done to evaluate these features in the context of protein dynamics in different formulations. In this work, I have combined a range of molecular features calculated from the sequences and MD simulations of several proteins, from the antibody fragment Fab A33 to two full-sized antibodies, IgG4A33 and CSL IgG4.
Using advanced statistical tools, it has been possible to gain greater insight into the mechanisms behind protein stability, validating previous findings, and also to develop models that can predict aggregation kinetics across a range of solution conditions in the presence of excipients. The first project, the Fab A33 project, established a workflow integrating MD simulations with machine learning regression models to predict aggregation kinetics. This workflow holds promise as a highly impactful route for industry to identify, at an early stage, formulation conditions with high developability from a pool of candidates, and to accelerate the formulation of biopharmaceuticals such as novel antibodies. The best-performing model for predicting aggregation kinetics was a partial least squares (PLS) regression model, with an $R^2$ of 0.84. The two IgG4 projects successfully simulated full-sized IgG4s with glycans in different formulation conditions, and offered a molecular-level explanation of how excipients stabilise the protein. The formulations were most stable at the highest pH (6.5), followed by pH 5.2, with the worst stability observed at pH 4.5 at 40 °C. Proline is more effective at reducing aggregate formation than the other two excipients. The final project investigated Fab-Fab interactions and provided a molecular-level understanding of them, emphasising the role of specific hotspots in driving stable contacts and influencing aggregation. The insights gained from these projects can guide protein engineering strategies aimed at modulating specific sites to enhance therapeutic efficacy and formulation stability. Future studies could apply the workflows established here to other protein systems, fine-tuning and customising the procedure to account for the variation between proteins.
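
A note on the $R^2$ figure quoted above: assuming it is the usual coefficient of determination (the abstract does not state how it was computed or on which data split), it compares a model's squared prediction errors with the spread of the observed values. A minimal Python sketch of that calculation follows. The function name `r_squared` and the numbers are illustrative assumptions, not material from the thesis.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the thesis).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
```

Under this definition, a value of 0.84 would mean that the model's residual error is 16% of the total variance in the measured aggregation kinetics, on whatever data the score was evaluated.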

Type: Thesis (Doctoral)
Qualification: Eng.D
Title: Elucidating protein aggregation and key protein-protein interaction sites with molecular dynamics simulations and predictive model building
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
Keywords: antibody aggregation, protein stability, monoclonal antibody, formulation, feature engineering, regression analysis, molecular dynamics
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Biochemical Engineering
URI: https://discovery.ucl.ac.uk/id/eprint/10217518