Computer-aided interpretation of chest radiography to detect TB in rural South Africa

Background Computer-aided digital chest radiograph (CXR) interpretation can facilitate high-throughput screening for tuberculosis (TB), but its use in population-based screening has been limited. We applied an automated image interpretation algorithm, CAD4TBv5, prospectively in an HIV-endemic area. Methods Participants underwent CXR and, for those with symptoms or lung field abnormality, microbiological assessment of sputum collected at a mobile camp in rural South Africa. CAD4TBv5 scored each CXR on a 0-100 scale in the field. An expert radiologist, blinded to the CAD4TBv5 score and other data, assessed CXRs for 1) lung field abnormality and 2) findings diagnostic of active TB (R+). We estimated the performance of CAD4TBv5 for triaging (identifying lung field abnormality as a criterion for sputum examination) and diagnosis (detection of active TB as defined by microbiologic (M+) or radiologic (R+) gold standards). Findings For triaging, a CAD4TBv5 threshold of 25 identified abnormal lung fields with a sensitivity of 90.3% and specificity of 48.2%. For diagnosis, CAD4TBv5 had less agreement with the microbiological reference standard (M+) used to define definite TB (AUC 0.78) than with the radiological reference standard (R+) used to define probable TB (AUC 0.96). HIV serostatus did not affect CAD4TB's performance. Interpretation A low CAD4TBv5 threshold was required to achieve acceptable triaging sensitivity. Low specificity at this threshold led to high rates of sputum collection despite normal lung fields. CAD4TBv5 had difficulty identifying microbiologically-confirmed TB cases with subtle radiological features but had excellent agreement with the radiologist in identifying radiologically-defined TB cases. We conclude that computer-aided CXR interpretation can be useful in population-based screening in HIV-endemic settings, but threshold selection should be guided by setting-specific piloting and priorities.
CXR interpretation algorithms require refinement for the identification of radiologically-subtle early TB.

Funding Funded by the Africa Health Research Institute, Wellcome Trust, Bill and Melinda Gates Foundation and NIAID/NIH.

Tuberculosis (TB) continues to cause over 1 million deaths annually, challenging the WHO strategy to eliminate the disease by 2030. 1 In resource-limited settings with a high TB burden, community-based screening programmes have been established to increase case-finding. 2-4 However, these programmes are challenged by the high costs of medical staff and diagnostics, including molecular tests such as Xpert MTB/RIF® and microbiological culture. 5,6,7 To reduce the cost and staffing requirements associated with diagnostic testing, digital chest radiography (CXR) has become an important tool to identify individuals with lung field abnormalities who require sputum testing according to WHO guidelines for TB screening. 7,8 However, this approach requires a workforce of experienced clinicians or radiologists, which keeps the cost of outreach programmes high. Another challenge is that CXRs of HIV-positive TB patients may show atypical radiological signs. 9,10

Computer-aided detection (CAD) systems that support clinicians in detecting TB-related abnormalities in digital CXRs have the potential to make health screening programmes more efficient. 11 […] performance in population-based screening is limited. Another study applied CAD4TB retrospectively in a non-clinical population-based setting and suggested that a lower CAD4TB score threshold would be required for triaging. 21 An independent, real-world prospective analysis of CAD4TB scores for triaging study participants is important to establish how computer-aided CXR interpretation performs in population-based screening.

Here we report the prospective application of computer-aided CXR reading with CAD4TB version 5 (CAD4TBv5) during the first year of a community-based TB screening programme in rural South Africa. We also report the performance of CAD4TB version 6 (v6), which was assessed retrospectively. We evaluated CAD4TB as a tool to 1) triage participants for sputum testing based on lung abnormalities, and 2) diagnose active TB based on microbiological and radiological gold standards.

Study design
The community-screening programme 'Vukuzazi' used mobile vans to provide free health assessments in the rural uMkhanyakude district of KwaZulu-Natal in South Africa. […] Ab, Bio-Rad, Marnes-la-Coquette, France) on venous blood.

medRxiv preprint doi: https://doi.org/10.1101/2020.09.04.20188045; this version posted September 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Posteroanterior digital CXRs were obtained using a mobile unit (Canon CXDI-NE) and saved in DICOM format in cloud storage. A score indicating lung abnormalities and the likelihood of active pulmonary tuberculosis 23 was calculated on-site using CAD4TBv5. CAD4TB's methodology is based on initial lung field segmentation followed by analysis of lung shape, symmetry, and the costophrenic angles, resulting in an abnormality score between 0 and 100 (higher scores indicating greater abnormality). 24 Within seven days of enrolment, an expert radiologist with more than 35 years of local experience reviewed all CXRs centrally, blinded to the CAD4TBv5 score and all other participant information. The radiologist categorized each CXR as 1) having normal or abnormal lung fields and 2) having radiological signs typical of active TB (R+) or not (R-). CAD4TBv6, an updated version of the image interpretation software that uses deep neural networks, 25 became available after data collection. CAD4TBv6 scores were calculated and analysed retrospectively.

The aim of our study was to maximize TB case-finding and to capture the full spectrum of active TB.
Following WHO guidelines for TB prevalence surveys 8 , participants were referred for sputum examination if they endorsed any cardinal TB symptom (fever, weight loss, cough, or night sweats) or if they had an abnormal CXR (indicated by a CAD4TBv5 score above the triaging threshold in the camp). If the expert radiologist indicated abnormal lung fields despite a CAD4TBv5 score below the threshold, a follow-up team contacted participants for sputum collection at home. Sputum specimens were analysed for Mycobacterium tuberculosis using Xpert Ultra MTB/RIF® (XpertUltra) (Cepheid, Sunnyvale, CA, USA) and liquid MGIT culture (MGIT) (Becton Dickinson, UK; held for 42 days).

During a pre-designated pilot phase, based on a literature review 20 and consultation with experts in the field, a CAD4TBv5 threshold of 60 was selected to triage participants for sputum examination. For the main phase of the study, the CAD4TBv5 triaging threshold was adjusted, based on analysis of the pilot data, to obtain 90% sensitivity for detection of lung field abnormalities.
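The referral rule and the pilot-based threshold search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, CAD4TBv5 scores are treated as numbers on the 0-100 scale, a score exactly equal to the threshold is assumed to count as "above" it, and the search simply returns the highest integer threshold whose sensitivity against the radiologist's abnormality reading reaches the 90% target.

```python
from typing import Sequence


def refer_in_camp(symptomatic: bool, cad_score: float, threshold: float) -> bool:
    """Camp referral: any cardinal TB symptom, or a CAD score at/above the threshold.

    Counting score == threshold as a referral is an assumption.
    """
    return symptomatic or cad_score >= threshold


def needs_home_followup(cad_score: float, threshold: float, radiologist_abnormal: bool) -> bool:
    """Home follow-up: radiologist reads abnormal lung fields but the CAD score is below threshold."""
    return radiologist_abnormal and cad_score < threshold


def sensitivity_at(threshold: float, scores: Sequence[float], abnormal: Sequence[bool]) -> float:
    """Share of radiologist-abnormal CXRs whose CAD score is at/above the threshold."""
    flagged = [s >= threshold for s, a in zip(scores, abnormal) if a]
    if not flagged:
        raise ValueError("reference set contains no abnormal CXRs")
    return sum(flagged) / len(flagged)


def highest_threshold_for_sensitivity(scores: Sequence[float], abnormal: Sequence[bool],
                                      target: float = 0.90) -> int:
    """Highest integer threshold in 100..0 whose sensitivity meets the target."""
    for t in range(100, -1, -1):
        if sensitivity_at(t, scores, abnormal) >= target:
            return t
    raise ValueError("target not reachable (scores below 0?)")


if __name__ == "__main__":
    # Hypothetical pilot data: 10 radiologist-abnormal CXRs followed by 4 normal ones.
    pilot_scores = [10, 30, 40, 50, 60, 70, 80, 90, 95, 99, 5, 12, 20, 65]
    pilot_abnormal = [True] * 10 + [False] * 4
    print(highest_threshold_for_sensitivity(pilot_scores, pilot_abnormal))  # prints 30
```

Scanning from the top of the scale downward returns the most specific threshold that still meets the sensitivity target; any lower threshold would refer more participants without being required by the target.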

Definitions of TB
Definite TB was defined in participants whose sputum was microbiologically positive, regardless of radiological status. Sputum was defined as microbiologically positive (M+) if M. tuberculosis was detected by either XpertUltra or MGIT. Because of emerging questions about the significance of XpertUltra "trace"-positive results 26 , we performed a sensitivity analysis that excluded participants whose M+ evidence was solely due to a "trace" result. Probable TB was defined as a radiological diagnosis of TB (R+) with sputum that was microbiologically negative (M-) or not obtained (M0).
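The case definitions above can be expressed as a small classification function. This is a hypothetical sketch: the encoding of XpertUltra results as 'positive', 'trace', 'negative' or None (no result), and of MGIT culture as True, False or None, is assumed for illustration, and returning None marks a participant excluded from the trace-only sensitivity analysis.

```python
from typing import Optional


def micro_status(xpert: Optional[str], mgit_positive: Optional[bool]) -> str:
    """Return 'M+', 'M-' or 'M0' from the two sputum tests.

    Assumed (hypothetical) encoding: xpert is 'positive', 'trace', 'negative'
    or None; mgit_positive is True, False or None.
    """
    if xpert is None and mgit_positive is None:
        return "M0"  # sputum not obtained, or no result from either test
    if xpert in ("positive", "trace") or mgit_positive:
        return "M+"  # M. tuberculosis detected by XpertUltra or MGIT
    return "M-"


def tb_category(xpert: Optional[str], mgit_positive: Optional[bool],
                radiologist_r_pos: bool, exclude_trace_only: bool = False) -> Optional[str]:
    """Classify a participant as 'definite', 'probable' or 'no evidence'.

    Returns None when exclude_trace_only is set and the only microbiological
    evidence is an XpertUltra trace result (the sensitivity analysis).
    """
    status = micro_status(xpert, mgit_positive)
    if status == "M+":
        trace_only = xpert == "trace" and not mgit_positive
        if exclude_trace_only and trace_only:
            return None
        return "definite"  # M+, regardless of radiological status
    if radiologist_r_pos:
        return "probable"  # R+ with M- or M0
    return "no evidence"  # M-/0 R-
```

A trace-only participant counts as definite TB in the main analysis and drops out only when exclude_trace_only is set, which mirrors how the sensitivity analysis is described.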

Data analysis
To assess the performance of CAD4TBv5 in triaging participants for sputum collection, sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) were calculated against a gold standard based on the radiologist's assessment of 'abnormal' or 'normal' lung fields. Additionally, we calculated the number of definite TB cases that would have been detected or missed at selected thresholds.

To assess CAD4TBv5's performance for diagnosing definite or probable TB, the area under the receiver operating characteristic curve (AUC) was calculated with 95% confidence intervals (CI), and AUCs were compared using the DeLong method. CAD4TB scores were compared between groups of participants with specified TB and HIV statuses using Mann-Whitney-Wilcoxon tests.

As a secondary analysis, for both the triaging and diagnostic analyses we compared CAD4TBv5 scores to the retrospectively calculated CAD4TBv6 scores.
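The triage metrics and the AUC described above can be sketched in plain Python. This is an illustration under assumptions, not the study's code: a CXR is assumed to be CAD-positive when its score is at or above the threshold, the function names are hypothetical, and the AUC is computed through its equivalence with the Mann-Whitney U statistic (the probability that a case scores higher than a non-case, with ties counted as one half). Confidence intervals and the DeLong comparison are not implemented here; the Mann-Whitney-Wilcoxon test itself is available as scipy.stats.mannwhitneyu.

```python
import math
from typing import Dict, Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    """Proportion, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def triage_metrics(scores: Sequence[float], abnormal: Sequence[bool],
                   threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV of CAD >= threshold against the radiologist."""
    tp = fp = tn = fn = 0
    for score, is_abnormal in zip(scores, abnormal):
        flagged = score >= threshold
        if is_abnormal and flagged:
            tp += 1
        elif is_abnormal:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def detected_and_missed(definite_tb_scores: Sequence[float], threshold: float) -> Tuple[int, int]:
    """Count definite TB cases flagged (score >= threshold) and missed at a threshold."""
    detected = sum(1 for s in definite_tb_scores if s >= threshold)
    return detected, len(definite_tb_scores) - detected


def auc_mann_whitney(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC = P(case score > control score) + 0.5 * P(tie), by exhaustive pairing."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The exhaustive pairing keeps the definition of the AUC transparent; a rank-based formula gives the same value more efficiently for large samples.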

Demographics and TB categorization
We report results from Vukuzazi's first year, during which 10,320 participants were enrolled (Table 1). The first 1,132 participants were enrolled in a pilot phase. Among all participants, 406 were excluded from chest radiography because of pregnancy or physical inability to climb into the mobile CXR van.

TB assessment
Cough currently: 696 (6·7)

[…] visit (144, of which 109 were successfully collected), a significant impediment to study operations. Table 2 compares the radiologist's assessment of lung field abnormality with CAD4TBv5 at a threshold of 60. At this threshold, CAD4TBv5 had a sensitivity of 27·3% (CI: 21·2-34·0) and a specificity of 99·1% (CI: 98·2-99·6). To achieve the targeted triage sensitivity of 90%, a threshold of 25 was selected for the main phase.

2) CAD4TB as a tool to identify active TB in CXRs
Based on microbiological and radiological findings, participants were defined as having definite TB (microbiologically proven, regardless of radiological diagnosis; M+ R-/+), probable TB (M-/0 R+) or no evidence of TB (M-/0 R-). The distribution of CAD4TBv5 scores for each group is depicted in Figure 3. Scores of the probable TB group were significantly higher than those of the definite TB group (p-value<0·001). The sensitivity analysis in which XpertUltra "trace"-only cases were excluded from the definite TB group did not meaningfully shift the distribution of CAD4TBv5 scores. Scores were lowest in the group with no evidence of TB. We compared CAD4TBv5 score distributions of each group between HIV- and HIV+ individuals and found no significant difference (Figure 3B). CAD4TBv6 scores showed similar trends to CAD4TBv5 (Figure S2).

Figure 3: CAD4TBv5 scores across diagnostic groups defined by microbiological and/or radiological evidence of active TB. Participants were grouped into: definite TB with microbiological evidence (M+); definite TB excluding samples that only had an XpertUltra trace result; probable TB with no microbiological evidence but radiological signs of TB (M-/0 R+); and no evidence of TB (M-/0 R-). Horizontal lines mark the median and the interquartile range (25th-75th percentiles). A) shows all participants; B) shows all participants stratified by HIV status.

We assessed the diagnostic performance of CAD4TBv5 to identify definite and probable TB (Figure 4, Table S2). The AUC for definite TB of 0·78 (0·73-0·83) was significantly lower than that for probable TB (0·96 (0·95-0·98), p-value<0·001). The AUC for definite TB excluding XpertUltra trace-only results did not differ significantly from that for definite TB (0·82 (0·77-0·87), p-value=0·28). The diagnostic performance of CAD4TBv6 was similar (Figure S3 […]).

In a setting of community-based TB screening where HIV prevalence was high, we prospectively applied CAD4TBv5 and found that using previously recommended CAD4TB thresholds of 60-85 would have missed a high proportion of microbiologically-confirmed TB cases. To achieve 90% sensitivity compared with an expert radiologist's assessment of lung field abnormality, we needed to apply a CAD4TBv5 threshold of 25. CAD4TBv5 had similar performance to an expert radiologist in identifying radiologically-typical TB (AUC 0·96 (CI: 0·95-0·98)) but performed much less well against a microbiological gold standard (AUC 0·78 (CI: 0·73-0·83)).

Applying CAD4TB to triage participants for sputum examination requires defining a triage threshold. To date, CAD4TB has mostly been used in healthcare centres to triage symptomatic patients. 13,14,16,18,19,21,25 Previous studies have suggested thresholds between 60 and 85. 13,20,21 Several reports have suggested that CAD4TB's threshold needs to be adjusted according to study aims, the number of available microbiological tests, and the underlying TB prevalence. 16,18,19,21,22 No triage threshold has been recommended for community-based screening programmes. The goal of our study was to maximize TB case-finding and to capture the full spectrum of active TB. Therefore, we conducted a pilot phase to identify a threshold that detected abnormal lung fields with 90% sensitivity, and found that only a low threshold of 25 achieved this aim. The AUC of CAD4TBv5 for identifying abnormal lung fields was 0·84 (CI: 0·83-0·85) and improved only slightly with CAD4TBv6 (0·88 (CI: 0·87-0·89)). These results are similar to those of a TB prevalence survey in Zambia using CAD4TBv5, which found an AUC of 0·87. 22 The high HIV prevalence in our study population (30·1%) did not appear to affect the triaging performance of CAD4TB.
Using a CAD4TBv5 triage threshold of 25, we identified a 1·0% prevalence of previously undiagnosed active TB. Because specificity for abnormality was low at this threshold (48·2% (CI: 47·1-49·3)), 3,904 participants were triaged for sputum collection despite CXRs that the radiologist assessed as normal […]

We assessed the ability of CAD4TB to identify definite and probable TB. Like others, we found that CAD4TB performed very well in identifying probable TB based on a radiologic diagnosis of TB (AUC v5 0·96 (CI: 0·95-0·98); v6 0·96 (CI: 0·95-0·98)). This is comparable to AUCs reported from a retrospective clinic-based cohort in Pakistan 25 using CAD4TBv5 and v6 against a radiologic diagnosis (v5: 0·95 (0·93-0·96), v6: 0·99 (CI: 0·98-0·99)). We found that CAD4TB had significantly lower performance for identifying definite, microbiologically-confirmed TB, as did the study from Pakistan (CAD4TBv5 0·87 (CI: 0·85-0·88), CAD4TBv6 0·89 (CI: 0·87-0·89)), although the AUCs from our study were lower still (CAD4TBv5 0·78 (CI: 0·73-0·83), CAD4TBv6 0·79 (0·73-0·84)). We found no significant effect of HIV serostatus on CAD4TB's performance.

Identifying active TB from CXRs is a non-trivial task because its radiological characteristics overlap with those of prior TB and other pathologies. 27 Radiologists are trained in clinical settings where patients present with symptoms and generally advanced disease. In our community-based study, only 20·2% of definite cases endorsed symptoms, and only 30·3% of these had CXR features that the radiologist interpreted as indicative of active TB. A recent review emphasized that the whole clinical spectrum of TB disease has not been fully characterized and that symptoms may appear intermittently over the course of TB but are consistently present only during the latest stages of disease. 28 We hypothesize that population-based screening identifies cases of subclinical TB, which can be asymptomatic and radiologically subtle, and that this may explain the differences between our study and those previously reported.
Limitations of our study are that only one radiologist performed independent CXR reading, that a single spot sputum was the basis of microbiological data, and that we may have misclassified definite TB cases based on false-positive XpertUltra trace results. XpertUltra is reported to have higher sensitivity than its predecessor, Xpert MTB/RIF 29 , even among HIV co-infected participants. 26 We attempted to address the uncertainty about trace results in a sensitivity analysis that excluded these cases and did not find significantly different results. More research is necessary to investigate whether XpertUltra trace results capture subclinical TB. Another limitation is that microbiological testing was not performed for asymptomatic participants whose CAD4TBv5 score was below 25.

In a prospective population-based study, we assessed the performance of computer-aided CXR interpretation with CAD4TBv5 and found that triaging abnormal chest X-rays for sputum collection with 90% sensitivity required a threshold of 25. Using previously recommended CAD4TB thresholds of 60 and higher would have missed a high proportion of microbiologically-confirmed TB cases. Diagnostically, CAD4TBv5 had similar performance to an expert radiologist in identifying radiologically-typical TB but performed much less well against a microbiological gold standard. Piloting was necessary to obtain a CAD4TB triaging threshold that identified abnormal lung fields with 90% sensitivity. This threshold was much lower than expected; it prompted a large number of sputum examinations for individuals with normal lung fields but also identified additional asymptomatic and radiologically-subtle cases of TB. In identifying active TB on CXRs, CAD4TB agreed closely with radiologically-diagnosed probable TB but poorly recognized microbiologically-positive definite TB. We hypothesize that computer-aided CXR interpretation for TB, which was developed for clinical settings, is currently not trained to detect the subclinical stages that may comprise the majority of cases found during population-based screening programmes. To fulfil its full potential, CXR-interpretation software requires fine-tuning to detect subclinical TB. Additional research is needed to clarify the radiological and microbiological manifestations of subclinical TB, in order to reduce transmission and morbidity and to achieve the WHO goal of eliminating TB by 2030.

JF performed data analysis, prepared figures and tables, and wrote the report. SO and KB contributed to data analysis. DG contributed to data management. RG, AS, DP, and MJS contributed to study design. TS and SM contributed to the laboratory study setup and provision of test results. ADG contributed to study design and analysis conceptualization. SK and CL contributed to data analysis and supervision. EW, […] study design, analysis conceptualization, revision and supervision. All authors contributed to the revision of the manuscript.

We declare no competing interests. CAD4TBv5 scores were purchased from Delft, but CAD4TBv6 scores were provided free of charge. Delft did not contribute any funding and was not involved in the analysis or writing of the report.

'Vukuzazi' was the collective effort of a large team (see Supplemental Appendix). We thank the community of the uMkhanyakude district in KwaZulu-Natal for participating in this study. We thank Delft for providing CAD4TBv6 scores free of charge. We dedicate this paper to the memory of Anand […]