Development and validation of a risk score (Delay-7) to predict the occurrence of a treatment delay following cycle 1 chemotherapy

Background
The risk of toxicity-related dose delays with cancer treatment should be included as part of pretreatment education and be considered by clinicians upon prescribing chemotherapy. An objective measure of individual risk could influence clinical decisions, such as escalation of standard supportive care and stratification of some patients to receive proactive toxicity monitoring.

Patients and methods
We developed a logistic regression prediction model (Delay-7) to assess the overall risk of a chemotherapy dose delay of 7 days for patients receiving first-line treatments for breast cancer, colorectal cancer and diffuse large B-cell lymphoma. Delay-7 included hospital treated, age at the start of chemotherapy, gender, ethnicity, body mass index, cancer diagnosis, chemotherapy regimen, colony stimulating factor use, first cycle dose modifications and baseline blood values. Baseline blood values included neutrophils, platelets, haemoglobin, creatinine and bilirubin. Shrinkage was used to adjust for overoptimism of predictor effects. For internal validation (of the full models in the development data) we computed the ability of the models to discriminate between those with and without poor outcomes (c-statistic), and the agreement between predicted and observed risk (calibration slope). Net benefit was used to understand the risk thresholds where the model would perform better than the ‘treat all’ or ‘treat none’ strategies.

Results
A total of 4604 patients were included in our study, of whom 628 (13.6%) incurred a 7-day delay to the second cycle of chemotherapy. Following internal validation, Delay-7 showed good discrimination and calibration, with a c-statistic of 0.68 (95% confidence interval 0.66-0.70) and calibration-in-the-large of −0.006.

Conclusions
Delay-7 predicts a patient’s individualised risk of a treatment-related delay at cycle two of treatment. The score can be used to stratify interventions to reduce the occurrence of treatment-related toxicity.


INTRODUCTION
Systemic anticancer treatments (SACT) can cause haematological and non-haematological toxicity; the latter can occur in 70% of patients treated. 1 Severe toxicity is undesirable as it will result in delays to subsequent treatments, thereby worsening patient experience and increasing health care costs. 2 The occurrence of these delays, in the curative setting, may also result in suboptimal therapy, with emerging evidence, for some cancers, of the importance of accurate treatment timing. 3,4 The risk of dose delays with cancer treatment should be included as part of pretreatment education and be considered by clinicians upon prescribing, as this information can influence decisions such as escalation of standard supportive care or improved adherence by patients. Personalised approaches to toxicity management would therefore be supported.
Early detection and management of toxicities is a strategy that has been researched and has demonstrated success in reducing the incidence of dose delays. 2,3,5 Various methods exist, including nurse-led monitoring or utilisation of electronic patient-reported outcome measures (PROMs). These strategies require resources to implement and may not benefit all patients, but if the patients most at risk were accurately identified, these interventions could reduce acute care use, morbidity and costs.
Although several risk scores are available to predict specific toxic effects such as febrile neutropenia, none have, to date, investigated dose delays as a whole. 6-10 Additionally, most studies have been conducted only at single institutions, 6,9 and one population study was found on acute admissions in the palliative setting; to date, however, there is no score available for patients receiving treatment of potentially curable cancers, where an early intervention could influence their overall time on treatment. 8 In this study, we developed and validated a prediction score for chemotherapy dose delays (Delay-7) to estimate the probability of delay occurrence.

Creation of cohort
Our data were derived from the electronic prescribing (EP) systems of four academic hospitals in England. In total, these hospitals treat around 18 000 cancer chemotherapy patients per year. Data were extracted for outcome measures and candidate predictors, identified from a systematic review and through consultation with expert clinicians and patients.

Eligibility criteria
Data were included for patients aged ≥18 years, identified through the chemotherapy EP system at each hospital for the period of 1 January 2013 to 1 January 2018. The first chemotherapy treatment date from the EP system was used as the index date for entry to the cohort during the study period. The study data were restricted to the following three tumour groups: breast, colorectal and diffuse large B-cell lymphoma, identified using the International Classification of Diseases 10th Revision (ICD-10) coding of C50, C83, C18, C19, C20 and C21. These groups were chosen because they represent a large population receiving relatively standard treatments, enabling us to develop a risk model. In the case of breast cancer, we only included those with early breast cancer (stages 1-3), and in the case of colorectal cancers, we included all patients receiving their first treatments for any stage disease. Although the colorectal population included some metastatic patients, disease control and response rate are believed to be optimised in this group through achievement of a dose intensity of >80%. 11 For all tumour groups, only patients receiving first-line treatments were included. We restricted our inclusion to the following treatments: epirubicin and cyclophosphamide (EC) plus or minus fluorouracil (FEC); docetaxel plus or minus cyclophosphamide (TC, T-FEC); irinotecan modified de Gramont (IrMdG); oxaliplatin modified de Gramont (FOLFOX) and combinations including irinotecan; oxaliplatin and capecitabine (OXCAP); rituximab, cyclophosphamide, vincristine and prednisolone (R-CHOP). Data were excluded for patients where only one treatment cycle was administered. Additionally, we excluded patients where the second cycle had been administered over 60 days from the index date, as this type of delay was outside the scope of this research.

Ethics and data use
Health Research Authority (HRA) approvals were required and granted on 24 November 2017 (IRAS 226078). The study was registered with the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP), study number EUPAS35413.

Outcome
The study outcome of interest was an administration delay of 7 days for treatment cycle 2. A delay of 7 days was considered a suitable period, as it is the period used by clinicians for toxicity-related delays. 12 Expert clinicians identified that delays under 7 days could be an effect of poor scheduling. The outcome was generated by comparing the number of days between the first and second cycle with the intended cycle length of the treatment prescribed.
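The outcome derivation described above can be illustrated with a minimal sketch. The function names, the example dates and the reading of "a delay of 7 days" as "7 days or more beyond the intended cycle length" are assumptions made for illustration; they are not taken from the study code.

```python
from datetime import date

# Assumption: an event is a second cycle given 7 or more days later than planned.
DELAY_THRESHOLD_DAYS = 7


def delay_days(cycle1: date, cycle2: date, intended_cycle_length: int) -> int:
    """Days by which cycle 2 was given later than intended (negative if early)."""
    observed_interval = (cycle2 - cycle1).days
    return observed_interval - intended_cycle_length


def is_delay_event(cycle1: date, cycle2: date, intended_cycle_length: int,
                   threshold: int = DELAY_THRESHOLD_DAYS) -> bool:
    """Binary outcome: True when the delay reaches the threshold."""
    return delay_days(cycle1, cycle2, intended_cycle_length) >= threshold


# Hypothetical patient on a 21-day regimen whose second cycle was given 29 days
# after the first, i.e. 8 days later than intended.
example_event = is_delay_event(date(2016, 3, 1), date(2016, 3, 30), 21)
```

In this sketch a patient whose second cycle is 3 days late would not count as an event, in line with the view that shorter delays may reflect scheduling rather than toxicity.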

Predictive variables
The predictors of toxicity-related delays 13 were hospital treated, age at the start of chemotherapy, gender, ethnicity, body mass index (BMI), presence of cardiovascular comorbidity or diabetes, performance status, cancer group, chemotherapy regimen and associated cycle length, colony stimulating factor (CSF) use, first cycle dose modifications and baseline blood values. Baseline blood values were for neutrophils, platelets, haemoglobin, creatinine and bilirubin.
All laboratory values, BMI and age were in a continuous format. Where possible, continuous predictors were not categorised. Where this was the case linearity was tested to meet the assumptions of logistic regression and plots enabled the identification of outliers. Assessments of plausibility were made where outliers were present; an outlier was defined as any value that was 1.5 times more than the third quartile or lower than the first quartile. Any erroneous outliers were considered as missing.
To balance the statistical and clinical robustness of the model we decided to categorise continuous variables for laboratory values. Categorisation is generally not recommended 14 as it results in loss of information on predictor effects, particularly when two categories are used (dichotomisation). We nevertheless justified categorisation on three grounds. Firstly, the categories used are well recognised and firmly established in clinical practice. 15 Secondly, in routine practice there are more than two grading categories used, meaning less loss of information than with two categories. Lastly, strong evidence exists that low neutrophils and haemoglobin, or high bilirubin or creatinine, are associated with dose delays, meaning truncation of outliers was inappropriate.

Sample size
Sample size of a prognostic model development study is informed by three factors: anticipated prevalence of the outcome (treatment delays), desired sensitivity of the model to the outcome and the precision of the 95% confidence interval around the sensitivity of the model. 16 To maximise statistical power, we used all patient data from the four hospitals that met our inclusion criteria in model development. To reduce overfitting of the model, the number of variables included in model development was restricted so that there were at least 10 events per variable. Dose delays in the included tumour groups were understood to occur in 10%-15% of the cancer population. In total, 28 predictors were included in our model and therefore a dataset including 280 events was required, equating to a minimum sample size of 2800 patients (assuming a 10% rate). We included a total of 4604 patients in the analysis.
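The arithmetic behind the events-per-variable rule can be reproduced with a short sketch. The function name is hypothetical; the inputs (28 predictors, 10 events per variable, a 10% event rate, 628 observed events) are the figures reported in this study.

```python
def minimum_sample_size(n_predictors: int, events_per_variable: int,
                        event_rate_percent: int):
    """Return (required events, required patients) under an events-per-variable rule.

    The event rate is passed as a whole-number percentage so that the
    calculation uses integer arithmetic only.
    """
    required_events = n_predictors * events_per_variable
    # Ceiling division: smallest cohort expected to yield the required events.
    required_patients = (
        required_events * 100 + event_rate_percent - 1
    ) // event_rate_percent
    return required_events, required_patients


required_events, required_patients = minimum_sample_size(28, 10, 10)

# Events per variable actually available with the 628 observed delays.
observed_events_per_variable = 628 / 28
```

With these inputs the rule gives 280 required events and 2800 patients, and the 628 observed events correspond to roughly 22 events per variable.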

Missing data
Our cohort had missing information on vascular comorbidity, BMI, neutrophils, platelets, bilirubin and creatinine. Some 50% of the data for vascular comorbidity were missing and on assessment it was found that these originated from one hospital, meaning it was inappropriate to impute these data. The hospital was a large academic centre treating only patients with cancer, and missingness was associated with data imputation policies, where the recording of comorbidities was not mandated. Missing data for other variables were believed to be missing at random and equated to <10%. We used multiple imputation to replace missing values by using a chained equation approach based on all candidate predictors excluding vascular comorbidity. We created 10 imputed datasets for missing variables, and estimates were then combined across all datasets by using Rubin's rules to obtain final model estimates. 17
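As an illustration of the pooling step, the sketch below applies Rubin's rules to a single coefficient estimated in each imputed dataset. The function name and all numerical values are hypothetical; the study itself carried out imputation and pooling in Stata.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the m estimates.
    between = variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total_variance)


# Hypothetical log odds ratios and standard errors from 10 imputed datasets.
log_odds_ratios = [0.41, 0.39, 0.44, 0.40, 0.42, 0.38, 0.43, 0.41, 0.40, 0.42]
standard_errors = [0.12, 0.12, 0.13, 0.12, 0.12, 0.12, 0.13, 0.12, 0.12, 0.12]
pooled_beta, pooled_se = pool_rubin(log_odds_ratios, standard_errors)
```

The pooled standard error is never smaller than the average within-dataset standard error, because the between-imputation term carries the extra uncertainty due to the missing values.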

Statistical analysis for model development and validation
We treated occurrence of a dose delay as a binary outcome measure. For each of the candidate predictors, we used a univariable logistic regression model to calculate the unadjusted odds ratio. To derive our risk prediction model, we included all candidate predictors in a multivariable logistic regression model.
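For a single binary predictor, the unadjusted odds ratio from a univariable logistic regression equals the cross-product ratio of the corresponding 2×2 table, and its Wald confidence interval can be computed from the cell counts. The sketch below shows this calculation; the function name and the counts are hypothetical and are not taken from Table 1.

```python
import math


def unadjusted_odds_ratio(exposed_events, exposed_non_events,
                          unexposed_events, unexposed_non_events, z=1.96):
    """Odds ratio and Wald 95% confidence interval from 2x2 cell counts."""
    odds_ratio = (exposed_events * unexposed_non_events) / (
        exposed_non_events * unexposed_events
    )
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(
        1 / exposed_events + 1 / exposed_non_events
        + 1 / unexposed_events + 1 / unexposed_non_events
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 delays among 100 patients with the risk factor,
# 10 delays among 100 patients without it.
or_estimate, ci_lower, ci_upper = unadjusted_odds_ratio(20, 80, 10, 90)
```

Continuous predictors and predictors with more than two categories need a fitted regression model rather than this closed-form shortcut.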
We assessed the performance of the model in terms of the c-statistic and calibration slope (where 1.00 is ideal). The c-statistic represents the probability that for any randomly selected pair of patients with and without a dose delay, the patient who had a dose delay had a higher predicted risk. A value of 0.50 represents no discrimination and 1.00 represents perfect discrimination. This process was repeated in our imputed and complete case datasets and a sensitivity analysis was undertaken comparing the coefficients obtained.
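The pairwise definition of the c-statistic given above translates directly into code. The following is a minimal sketch with hypothetical predicted risks; counting tied pairs as half-concordant is a common convention and is an assumption here, not a detail reported in the study.

```python
def c_statistic(predicted_risks, outcomes):
    """Proportion of (event, non-event) pairs in which the event has the higher risk.

    Ties in predicted risk count as half-concordant (assumed convention).
    """
    event_risks = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_event_risks = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not event_risks or not non_event_risks:
        raise ValueError("Need at least one event and one non-event.")
    concordant = 0.0
    for event_risk in event_risks:
        for non_event_risk in non_event_risks:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(event_risks) * len(non_event_risks))


# Hypothetical predicted risks for four patients; the last two had a delay.
example_c = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

The double loop is quadratic in cohort size, which is fine for illustration; statistical packages compute the same quantity from ranks.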
To validate our developed model and correct measures of predictive performance for optimism (overfitting) we used bootstrapping, using 200 samples of the derivation data. We then repeated the model development process in each bootstrap sample. To account for overfitting during the development process, we multiplied the original β coefficients by the uniform shrinkage factor in the final model. At this point we re-estimated the intercept based on the shrunken β coefficients to ensure that overall calibration was maintained, producing a final model.
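The shrinkage and intercept re-estimation step can be sketched as follows. The coefficients, shrinkage factor and patient rows are hypothetical. With the shrunken coefficients held fixed, the maximum likelihood intercept is the value at which the mean predicted risk equals the observed event rate; the bisection search below exploits that property and is a simplification, not the routine used in the study.

```python
import math


def expit(x):
    """Inverse logit: converts a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(betas, row):
    return sum(b * x for b, x in zip(betas, row))


def shrink(betas, shrinkage_factor):
    """Apply a uniform shrinkage factor to every coefficient."""
    return [shrinkage_factor * b for b in betas]


def reestimate_intercept(betas, rows, outcomes, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection search for the intercept that restores overall calibration."""
    target = sum(outcomes) / len(outcomes)
    if not 0.0 < target < 1.0:
        raise ValueError("Outcomes must contain both events and non-events.")
    offsets = [linear_predictor(betas, row) for row in rows]

    def mean_risk(intercept):
        return sum(expit(intercept + o) for o in offsets) / len(offsets)

    # mean_risk increases with the intercept, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical model with two binary predictors and five patients.
original_betas = [0.8, -0.5]
shrinkage_factor = 0.9  # hypothetical; in practice estimated from the bootstrap
rows = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
outcomes = [1, 0, 0, 0, 1]

shrunken_betas = shrink(original_betas, shrinkage_factor)
new_intercept = reestimate_intercept(shrunken_betas, rows, outcomes)
```

After this step the average predicted risk in the development data matches the observed event rate, while the spread of predictions is narrower than with the unshrunken coefficients.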
Lastly, we carried out a net benefit analysis (which was not prespecified), to evaluate the potential clinical value of using Delay-7 to inform decision making. This analysis assumes that the threshold probability of the occurrence of dose delay at which a patient or clinician would opt for intervention is informative in terms of false positives and false negatives. This is then used to calculate the net benefit of the model across a wide range of threshold probabilities. The most basic interpretation of a decision curve is that the model with the highest net benefit at a particular threshold has the highest clinical value. In our study, the decision curve analysis assessed the potential clinical benefits of using the Delay-7 model to select patients for alternative models of care. In this analysis three scenarios are compared: selecting all patients for the intervention (treat all), selecting no patients (treat none) and selecting patients using the predictive model. The x-axis depicts the threshold probability, which is chosen by the decision maker. The y-axis depicts the net benefit of each strategy, which is expressed in terms of the value of true positives. 18
We used Stata version 15 (Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC) for all statistical analyses. This study was conducted and reported in line with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines. 14
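To make the decision curve comparison concrete, the sketch below computes net benefit at a single threshold probability using the standard decision curve formula, in which false positives are weighted by the odds of the threshold. The function names and the example data are hypothetical, and classifying a patient as selected when the predicted risk is at or above the threshold is an assumed convention.

```python
def net_benefit_model(predicted_risks, outcomes, threshold):
    """Net benefit of selecting patients whose predicted risk is >= threshold."""
    n = len(outcomes)
    true_positives = sum(
        1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 1
    )
    false_positives = sum(
        1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 0
    )
    weight = threshold / (1.0 - threshold)  # harm of a false positive
    return true_positives / n - (false_positives / n) * weight


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit when every patient is selected for the intervention."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def net_benefit_treat_none():
    """Selecting nobody yields no true or false positives."""
    return 0.0


# Hypothetical predicted risks and outcomes for four patients.
risks = [0.10, 0.20, 0.30, 0.60]
delays = [0, 0, 1, 1]
nb_model = net_benefit_model(risks, delays, threshold=0.25)
nb_all = net_benefit_treat_all(delays, threshold=0.25)
```

Repeating the calculation over a grid of thresholds and plotting the three strategies gives the decision curve. The treat all strategy reaches zero net benefit when the threshold equals the event rate, so a useful model needs to sit above both reference lines around the thresholds of clinical interest.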

Study population
In our cohort from hospitals located in England, we analysed information on 4604 patients after excluding 447 patients for whom a second cycle of treatment was not recorded as administered (see Supplementary Table S1, available at https://doi.org/10.1016/j.esmoop.2022.100743). Of the 4604 patients, 628 (13.6%) experienced a 7-day delay. Table 1 summarises the characteristics of the study population. Women represented 69% of the cohort, due to the inclusion of breast cancer.
Univariable associations between delays to treatment and potential predictors are also displayed in Table 1. Of the 44 candidate predictors (from 16 risk factors), 16 were statistically significantly associated with delays. FOLFOX and IrMdG, used widely in colorectal cancer (including advanced disease), showed significant associations with delays but with wide confidence intervals that could be attributed to the mixed population in this disease group. Table 2 shows the apparent and internal validation performance statistics of our risk prediction model developed using multivariable methods. After adjustment for optimism, our final risk prediction model was able to discriminate patients who were likely to encounter a delay with a c-statistic of 0.68 (95% confidence interval 0.66-0.70). The agreement between the observed and predicted proportion of events showed excellent apparent calibration following bootstrapping (Figure 1). Table 3 presents the final model coefficients and Table 2 shows coefficients for each variable included in the final model for both the complete case and imputed datasets. The coefficients indicate the weighting that each factor has on the outcome. Notably, use of CSF showed little additive effect in this model despite demonstrating a significant univariable association. Decision curve analysis for the Delay-7 score in the cohort is displayed in Figure 2. This decision curve demonstrates that selecting patients for an intervention using Delay-7 had an appreciable net benefit compared with the treat all and treat none strategies for threshold probabilities that are <25%.

DISCUSSION
In this study, we developed and internally validated a score (Delay-7) to predict 7-day dose delays for patients receiving cancer chemotherapy. This is the first risk score that has been developed in the non-palliative setting and could support patients to receive timely treatments in future. The score was developed using a nationally representative dataset of 4604 patients that was collected for this purpose; 13.6% of these patients experienced a dose delay. The predictors included in the score would be readily available to clinicians within any EP system and the score could therefore be calculated upon initiating treatment. We believe that Delay-7 can be used both to improve the informed consent process and to help stratify patients to interventions, such as proactive monitoring and early intervention, that have demonstrated success at reducing the rates and severity of adverse events. Proactive monitoring, either through nursing support or electronically, has demonstrated success in a number of studies, but these interventions require resource. 5 Using a stratification approach to identify patients makes these interventions more feasible. We believe that Delay-7, once validated, could support the implementation of evidence-based interventions to improve the safety of patients receiving treatments.
Our work is important to future policy as the numbers of cancer patients increase year upon year. 19 Using a model to direct interventions to those likely to experience a delay due to toxicity would be resource-efficient and would improve safety and patient experience. A reduction in the occurrence of treatment delays for some patients may also improve their response to treatment, 4 negating the need for future treatments. The decision to develop a generic model was based on an understanding of processes in the UK, where toxicity advice and monitoring are led by nursing and pharmacy teams during treatment. 20 Currently, the model is only applicable to the treatments researched in specific cancers, and research around validation and implementation is planned. We acknowledge that, without the inclusion of tumour-specific attributes, there may be clinical hesitancy to implement the model in practice. Furthermore, the approach of developing a more cancer-specific model will make validation studies more achievable through utilisation of national datasets. 8 Our model requires a validation cohort to test that our score continues to perform in heterogeneous populations. Future work will therefore explore international collaborations, as other researchers have achieved, 21 to validate our work.
The discrimination of our score is similar to those published to predict febrile neutropenia 10 and hospitalisations within 30 days of palliative treatment. 9 Discriminatory ability is improved through the inclusion of predictive variables, and as the decision to delay treatment because of toxicity can be subjective, the inclusion of behavioural elements may have improved discriminatory ability.
The strengths of Delay-7 compared with existing scores were its size and methodological rigour, adhering to prespecified published protocols and reporting guidelines. We used age as a continuous variable whereas other models 10,22 in other settings have dichotomised age (>65 years), limiting their transferability. We had a large number of events (n = 628) and this is reflected in the narrow confidence intervals retrieved from our performance statistics. Our study has some limitations. As we extracted data from individual hospitals, we could not ascertain any reasons for discontinuation and therefore could not include this as an endpoint in addition to delayed dose. Reasons that patients might not receive more than one cycle of treatment could be movement to a different hospital or choice to cease treatment rather than discontinuation due to toxicity. The high volume of missing data for comorbidity meant we did not include this in our final score. As mentioned above, this could be addressed in a validation cohort through working with more comprehensive national datasets, examining the performance of a model that included and excluded comorbidity, and utilising alternative measures of comorbidity, based on availability. 23 Most patients in our cohort were of a white ethnicity, meaning that confidence intervals around risk estimates for patients of other individual ethnic groups were large. We therefore collapsed ethnicity groupings into white and not white in our model. In future work, we would like to further refine risk estimates for patients of different ethnicities. Finally, through our investigations of variables we uncovered differences in hospitals and their rates of delays, meaning that simple alignment of both operational and clinical procedures, such as appropriate threshold setting for haematological toxicity, could result in reduced delays. 24 We believe that through a combination of alignment of policies and tailoring toxicity support through use of our model, we will improve the relative dose intensity of treatments received.

Conclusions
Delay-7 predicts the risk of delays of 7 days that can be used as a proxy measure for toxicity-related delays after the initiation of systemic therapy for cancer. The score quantifies an important risk of systemic therapy, which can improve the informed consent process. Validation of Delay-7 is required; however, once achieved the score could support accurate stratification of patients to preventative interventions.

DISCLOSURE
MDF reports grants and honoraria from AstraZeneca, Bristol Myers Squibb, Celgene, Eli Lilly, Merck, Merck Sharp & Dohme, Nanobiotix, Novartis, Pfizer, Roche and Takeda, outside the submitted work. PC reports research grants from Janssen, Pfizer, Tessaro and Bristol Myers Squibb, outside the submitted work. ICKW reports research grants from Janssen and Bristol Myers Squibb, outside the submitted work. All other authors have declared no conflicts of interest.

ETHICS AND DATA USE
The study was based on retrospective datasets; Health Research Authority (HRA) approvals were required and granted on 24 November 2017 (IRAS 226078). Information governance approvals were granted at each recruited site in accordance with hospital policies.

DATA SHARING
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.