Utilisation of primary care electronic patient records for identification and targeted invitation of individuals to a lung cancer screening programme

Highlights
• Primary care records can identify individuals to invite for lung cancer screening risk assessment.
• Invitation eligibility should use current and historic smoking status recording.
• Direct eligibility assessment from primary care data is impracticable with some demographic bias.
• LCS programmes require provision for individual-level eligibility assessment.
• Work is needed to identify those without smoking status in primary care records.


Introduction
Lung Cancer Screening (LCS) using Low-Dose Computed Tomography (LDCT) reduces lung cancer-specific mortality in high-risk individuals [1,2]. Unlike other cancer screening programmes for which eligibility is largely based on age and sex (e.g., breast and cervical screening), eligibility for LCS is based on the presence of lung cancer risk factors, the two main ones being increasing age and history of tobacco smoking. In the US, eligibility for LCS is therefore based on age and smoking history alone. However, analysis of data from the National Lung Screening Trial (NLST) has demonstrated that LCS is more efficient and cost-effective when using multi-factor individual lung cancer risk calculations which include smoking history [3]. In the UK, no comprehensive system currently exists for assessing smoking history to guide LCS invitation at a population level. However, primary care electronic patient records provide a potential data source for this. While several UK studies have utilised primary care records to target LCS invitation [4][5][6][7], none have reported the accuracy of the data used. Previous reports found smoking status recording in primary care to be incomplete and subject to inaccuracies [8], with limited improvements despite incentivisation [9]. A recent evaluation using routinely collected primary care registry data to calculate validated lung cancer risk scores demonstrated a negative impact on model accuracy, with limitations in quality and completeness of data cited as potential contributory factors [10].
This manuscript assesses the completeness and validity of tobacco smoking exposure data extracted from primary care records, to examine whether this could be recommended as a comprehensive method for identifying individuals to invite for LCS.

Methods
The SUMMIT Study is a prospective observational cohort study aiming to assess the implementation of LDCT for LCS in a high-risk population and to validate a multi-cancer early detection blood test. Between March 2019 and December 2019, standardised electronic database searches at participating primary care practices across north central and east London identified individuals for invitation. Criteria for invitation included being aged 55-77 years with a documented status of "current smoker" within the prior 20 years. Individuals who were on a dementia or palliative care register, had metastatic cancer, were housebound or had a documented refusal to participate in research were excluded (Fig. 1).
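The invitation criteria above can be illustrated with a minimal sketch. The record structure and all field names below are hypothetical and do not reflect the study's actual search code or any real primary care system; "within the prior 20 years" is approximated using elapsed days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PrimaryCareRecord:
    # All field names are hypothetical, chosen only for illustration.
    age: int
    last_current_smoker_date: Optional[date]  # most recent "current smoker" entry, if any
    on_dementia_register: bool = False
    on_palliative_care_register: bool = False
    metastatic_cancer: bool = False
    housebound: bool = False
    refused_research: bool = False


def is_invitable(rec: PrimaryCareRecord, search_date: date) -> bool:
    """Apply the invitation criteria described in the Methods (sketch only)."""
    # Age criterion: 55-77 years inclusive.
    if not 55 <= rec.age <= 77:
        return False
    # Smoking criterion: a "current smoker" entry within the prior 20 years.
    if rec.last_current_smoker_date is None:
        return False
    years_since = (search_date - rec.last_current_smoker_date).days / 365.25
    if years_since < 0 or years_since > 20:
        return False
    # Exclusion criteria listed in the Methods.
    excluded = (
        rec.on_dementia_register
        or rec.on_palliative_care_register
        or rec.metastatic_cancer
        or rec.housebound
        or rec.refused_research
    )
    return not excluded
```

Note that this rule looks at any "current smoker" entry in the 20-year window, not only the most recent smoking status, which is why individuals whose latest status is "never smoker" can still be invited.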
Individuals identified as potentially eligible were invited by letter and, if interested, were advised to contact the team by telephone to arrange a Lung Health Check (LHC) appointment. During this telephone call their lung cancer risk was estimated to determine their eligibility for an LHC appointment [11]. At the in-person LHC appointment, individuals meeting either the United States Preventive Services Task Force (USPSTF) 2014 criteria [12] or a Prostate, Lung, Colorectal and Ovarian (PLCOm2012) 6-year lung cancer risk [13] of ≥ 1.3 % were offered LCS. Selection criteria were chosen to closely align with the USPSTF guidelines at the time of study set-up. Some criteria were broadened to maximise the inclusion of those potentially eligible.
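The "either/or" eligibility rule can be sketched as follows. This is an illustration under stated assumptions: the USPSTF 2014 thresholds used here are the published ones (age 55-80 years, at least 30 pack-years, current smoker or quit within 15 years), whereas the study notes that some criteria were broadened, so its exact thresholds may differ. The PLCOm2012 risk is taken as a precomputed input and is not calculated here; all function and parameter names are hypothetical.

```python
from typing import Optional


def meets_uspstf_2014(
    age: int,
    pack_years: float,
    current_smoker: bool,
    years_since_quit: Optional[float],
) -> bool:
    """Published USPSTF 2014 criteria (assumption: not the study's broadened version)."""
    if not 55 <= age <= 80:
        return False
    if pack_years < 30:
        return False
    if current_smoker:
        return True
    # Former smokers qualify only if they quit within the past 15 years.
    return years_since_quit is not None and years_since_quit <= 15


def offered_lcs(
    age: int,
    pack_years: float,
    current_smoker: bool,
    years_since_quit: Optional[float],
    plco_m2012_risk: float,  # 6-year risk as a proportion, e.g. 0.013 for 1.3 %
) -> bool:
    """Offer LCS if either the USPSTF rule or the risk threshold is met."""
    return (
        meets_uspstf_2014(age, pack_years, current_smoker, years_since_quit)
        or plco_m2012_risk >= 0.013
    )
```

Combining the two rules with a logical "or" means an individual below the pack-year threshold can still qualify through the risk model, and vice versa.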
We analysed the quality of primary care smoking history data, including the proportion of records with missing or inconsistent data, the time since the record was last updated and, for individuals who completed an LCS eligibility assessment, rates of concordance against self-reported data. Associations with sociodemographic factors were examined using logistic regression.
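As a simplified illustration of how an odds ratio and its confidence interval arise, the sketch below computes an unadjusted odds ratio with a Wald 95 % confidence interval from a 2x2 table. This is a swapped-in, simpler technique: it matches a logistic regression with a single binary predictor, whereas the adjusted odds ratios (aOR) reported here come from models including several covariates. The function name and the counts in the usage lines are invented for illustration.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group of interest, outcome present   b: group of interest, outcome absent
    c: reference group, outcome present     d: reference group, outcome absent
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only, not study data.
or_, lo, hi = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} (95 % CI {lo:.2f}-{hi:.2f})")
```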

Completeness and recency of smoking status and tobacco consumption records
Between 20th March 2019 and 12th December 2019, 95,297 individuals from 251 practices were identified as potentially eligible and sent invitation letters (Fig. 1).
Of those invited, 83.8 % (n = 79,826) had their smoking status recorded within the past three years, but a small minority (0.2 %, n = 153) last had this updated > 15 years prior. Amongst current smokers (n = 48,518), tobacco consumption units (i.e., "cigarettes per day" for pre-rolled cigarettes or "grams of tobacco per week" for hand-rolled tobacco) and quantified measures of consumption (i.e., the average number of cigarettes smoked per day) were recorded in their most recent smoking record in 59.7 % (n = 28,942) and 60.1 % (n = 29,143) respectively. Odds of missing data were higher amongst individuals from less deprived Index of Multiple Deprivation (IMD) quintiles (vs the most deprived quintile) and lower amongst those aged > 70 vs < 55 years (aOR: 0.89; 95 % CI: 0.81-0.99). The absolute proportion with missing data varied by ethnic group (range: 18.8-47.4 %), with a statistically significantly lower likelihood of missing consumption data among individuals of Bangladeshi ethnicity (aOR: 0.34; 0.30-0.38) and a higher likelihood among those of mixed white and black Caribbean ethnicity (aOR: 1.30; 1.06-1.59), when compared with those of white British ethnicity (Table 1).

Consistency of 'never smoking' status records
10.3 % (n = 9,826) of those invited had inconsistent smoking status data (both a most recent status of "never smoker" and a previous status of current or former smoker) in their primary care record. The proportion of records with smoking status inconsistencies varied widely between individual practices (range: 0.7 %-50.0 %). The frequency of inconsistent data was lower among males than females (aOR: 0.45; 0.43-0.47), higher among individuals from less deprived IMD quintiles (e.g., least vs most deprived quintile: aOR: 1.53; 1.36-1.72), and higher across nearly all the ethnicity groups, especially those of Bangladeshi ethnicity (aOR: 9.79; 9.10-10.57), when compared to white British groups (Table 1).
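The definition of inconsistent smoking status used above (a most recent status of "never smoker" together with an earlier record of current or former smoking) can be sketched as a simple check over a dated smoking history. The data layout and the status labels "current", "former" and "never" are assumptions made for illustration, not the coding used in primary care systems.

```python
from datetime import date
from typing import List, Tuple

# (date recorded, status), where status is "current", "former" or "never".
SmokingEntry = Tuple[date, str]

SMOKING_STATUSES = {"current", "former"}


def has_inconsistent_never_status(entries: List[SmokingEntry]) -> bool:
    """True if the latest entry is "never" but an earlier entry documents smoking."""
    if not entries:
        return False
    ordered = sorted(entries, key=lambda entry: entry[0])
    latest_status = ordered[-1][1]
    earlier_statuses = {status for _, status in ordered[:-1]}
    return latest_status == "never" and bool(earlier_statuses & SMOKING_STATUSES)


def is_ever_smoker(entries: List[SmokingEntry]) -> bool:
    """True if any entry, current or historic, documents smoking."""
    return any(status in SMOKING_STATUSES for _, status in entries)


history = [
    (date(2005, 6, 1), "current"),
    (date(2018, 2, 14), "never"),
]
print(has_inconsistent_never_status(history), is_ever_smoker(history))
```

Using the whole history rather than the latest entry alone keeps such individuals in the pool for invitation.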

Concordance of primary care and self-reported data
For individuals who completed a telephone-based eligibility questionnaire (n = 29,698), self-reported smoking status (current, former or never) was concordant with individuals' most recent primary care record in 75.3% of cases (Table 2). Higher odds of non-concordance were seen in those from the two least deprived IMD quintiles (vs most deprived), and lower odds among those last recorded as former smokers (aOR: 0.80; 0.75-0.86) compared with current smokers. Increased time since smoking status was last updated was also associated with higher odds of non-concordance (vs those with last documented smoking status < 12 months previously), as was black Caribbean and "other" white ethnicity (relative to white British ethnicity).
Reported daily tobacco consumption varied significantly between primary care records and self-reported data, with a mean difference of 6.8 (95% CI: 6.18-7.18) fewer cigarettes per day in primary care records than in self-reported telephone responses. Of those with both previous documentation of smoking and a most recent status of "never smoker", 50.9% (n = 1,861) reported having smoked 100 cigarettes or more in their lifetime, and 11.8% (n = 433) were ultimately deemed eligible for LCS.
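The two summary measures in this section, the concordance rate for smoking status and the mean paired difference in daily consumption, can be sketched as below. This is a minimal illustration with hypothetical function names and made-up example values; it does not reproduce the study's analysis or its confidence intervals.

```python
from statistics import mean
from typing import Iterable, Tuple


def concordance_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Proportion of (primary care status, self-reported status) pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    agreeing = sum(1 for primary_care, self_reported in pairs if primary_care == self_reported)
    return agreeing / len(pairs)


def mean_consumption_difference(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of (self-reported minus primary care) cigarettes per day.

    A positive value means fewer cigarettes per day were recorded in primary care.
    """
    return mean(self_reported - primary_care for self_reported, primary_care in pairs)


# Illustrative values only, not study data.
status_pairs = [
    ("current", "current"),
    ("former", "current"),
    ("never", "never"),
    ("former", "former"),
]
print(concordance_rate(status_pairs))
print(mean_consumption_difference([(20, 10), (15, 10), (10, 10)]))
```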

Discussion
We examined the completeness and validity of smoking history data from 251 primary care practices to identify individuals to invite for LCS eligibility assessment. Use of smoking status in addition to age reduced the number of individuals invited by over 70%, when compared to inviting by age criteria alone. The smoking status last recorded by primary care showed good concordance with self-reported telephone responses when this record was either current or former smoker, and in most cases had been updated within the past three years. However, half of those last recorded by primary care as "never smoker" but with previous documentation of smoking self-reported a history of smoking during telephone risk-based eligibility assessment, and a significant minority proved eligible for LCS. Across all measures of data quality, disparities by sociodemographic factors were identified, most notably ethnicity and deprivation.

Conclusions
Our findings suggest sufficient accuracy to support the use of "ever smoker" status in primary care records as a means of identifying individuals to invite for further lung cancer risk assessment and potential LDCT LCS. However, we would caution against relying solely on the most recently recorded instance of smoking status, particularly if this record is "never smoker", as our findings demonstrate inconsistencies within the data which could wrongly preclude individuals from invitation. Our findings also suggest that primary care risk stratification for LCS beyond age and smoking status would be limited by data completeness and recency for more detailed parameters of smoking history, necessitating provision within LCS programmes for detailed eligibility assessment at an individual level. Further work is needed to identify those with no smoking data in primary care records and to understand factors influencing the described disparities in data accuracy across sociodemographic groups, to ensure equity in LCS invitation.

Contributions
The described protocol utilising primary care records to target LHC invitations was developed by the study management team for the SUMMIT Study, led by SMJ. JLD and HH prepared the manuscript for review and completed the data analysis. All authors contributed to the development of the manuscript and approved the final version.

Acknowledgements
includes all staff at the participating academic, primary care and secondary care sites. Specifically, we thank the primary care practices and the SUMMIT study data management team (Sofia Nnorom, Hina Pervez, Moksud Miah) who oversaw the screening and data extraction for invited individuals. We are also hugely grateful to the NOCLOR Research Group (Andrew Perugia, Dr James Rusius, Dr Claire Chalmers-Watson, Lee Berney, Dr Jyotsna Hira, David Cole and David Jones) for their support and advice in designing and implementing primary care record searches. We would also like to thank all those at GRAIL Inc who have supported the SUMMIT Study, and particularly those who worked on programming the primary care data search and extraction, the Lung Health Check invitation mailings and the telephone screening questions (Thomas Rooney, Henry Armburg-Jennings, Eduardo Sosa, Jack Galilee, Marcus Foster).

Fig. 1. Identification of individuals to invite for an LHC as part of the SUMMIT Study.

Table 1
Frequency and independent predictors of missing (b) or inconsistent (c) smoking data in primary care records.
Column groups: Missing tobacco consumption units (b) in primary care data (all invited current smokers, n = 48,518); Inconsistent smoking status values (c) in primary care data (all invited, n = 95,297).
b Missing tobacco consumption data defined as absence of recorded tobacco consumption units (e.g. cigarettes per day, grams of tobacco per week) in those with most recent smoking status recorded as "current smoker".
c Inconsistent smoking status data defined as most recent smoking status recorded as "never smoker" plus previous documentation as either current or former smoker in primary care record.
J.L. Dickson et al.

Table 2
Frequency and independent predictors of discrepant smoking status between primary care records and self-reported responses (all LHC invitation responders, n = 29,698)