Multi‐echo quantitative susceptibility mapping: how to combine echoes for accuracy and precision at 3 Tesla

Purpose To compare different multi‐echo combination methods for MRI QSM. Given the current lack of consensus, we aimed to elucidate how to optimally combine multi‐echo gradient‐recalled echo signal phase information, either before or after applying Laplacian‐base methods (LBMs) for phase unwrapping or background field removal. Methods Multi‐echo gradient‐recalled echo data were simulated in a numerical head phantom, and multi‐echo gradient‐recalled echo images were acquired at 3 Tesla in 10 healthy volunteers. To enable image‐based estimation of gradient‐recalled echo signal noise, 5 volunteers were scanned twice in the same session without repositioning. Five QSM processing pipelines were designed: 1 applied nonlinear phase fitting over TEs before LBMs; 2 applied LBMs to the TE‐dependent phase and then combined multiple TEs via either TE‐weighted or SNR‐weighted averaging; and 2 calculated TE‐dependent susceptibility maps via either multi‐step or single‐step QSM and then combined multiple TEs via magnitude‐weighted averaging. Results from different pipelines were compared using visual inspection; summary statistics of susceptibility in deep gray matter, white matter, and venous regions; phase noise maps (error propagation theory); and, in the healthy volunteers, regional fixed bias analysis (Bland–Altman) and regional differences between the means (nonparametric tests). Results Nonlinearly fitting the multi‐echo phase over TEs before applying LBMs provided the highest regional accuracy of χ and the lowest phase noise propagation compared to averaging the LBM‐processed TE‐dependent phase. This result was especially pertinent in high‐susceptibility venous regions. Conclusion For multi‐echo QSM, we recommend combining the signal phase by nonlinear fitting before applying LBMs.


INTRODUCTION
MRI QSM aims to determine the underlying spatial distribution of tissue magnetic susceptibility ( ) from gradient-recalled echo (GRE) phase data ( ): (r, TE) = ΔB Tot (r, TE)TE + 0 (r), where r is a vector of image space coordinates, the proton gyromagnetic ratio, TE the echo time, ΔB Tot the -induced total field perturbation along the scanner's z axis, and 0 the TE-independent phase offset at a nominal TE = 0 ms.
(2) ΔB Bg are induced by the global geometry, air-tissue interfaces, and any field inhomogeneities. ΔB Loc reflect the tissue inside the region of interest (ROI), for example, the brain. For QSM, ΔB Bg must be removed from ΔB Tot . The resulting ΔB Loc map is in the following relationship with : where d is the magnetic dipole and ⋆ denotes a spatially dependent convolution. Based on Equation 3, the local distribution of tissue , that is, the QSM map, is calculated by solving an ill-posed ΔB Loc -to-problem. Recently, the QSM community critically reviewed how to best perform phase unwrapping 1 and ΔB Bg removal. 2 Moreover, it promoted 2 challenges to compare algorithms for ΔB Loc -to-inversion, but a consensus has yet to be reached. 3,4 A further open question toward QSM standardization is how and at which stage of the processing pipeline multi-echo data from different echoes should be combined. This question is relevant because phase unwrapping, ΔB Bg removal, or both, are often performed using Laplacian-based methods (LBMs). 1,2 Laplacian phase unwrapping aims to calculate the 2 -aliasing-free phase as 5 : with ∇ 2 and ∇ −2 the forward and inverse Laplace operators, and w the aliased phase. Laplacian ΔB Bg removal relies on the harmonicity of ΔB Bg inside the ROI (i.e., ∇ 2 ΔB Bg (r) = 0, r ∈ ROI) and aims to solve this Equation 2 : The inverse discrete Laplace operator is not well defined and requires regularization, which is equivalent to spatially high-pass filtering the phase or local field. 2 However, the known varying frequency content at different TEs 6,7 could lead to different degrees of LBM-induced high-pass filtering at different echoes, alter the linearity of Equation 1, and thus introduce inaccuracies in the estimated ΔB Loc and maps. To investigate the issue, this study aimed to compare existing strategies for combining the multi-echo signal phase (see the Theory section for further details) when using LBMs for phase unwrapping or ΔB Bg removal.
Five processing pipelines for QSM were designed incorporating LBMs for both phase unwrapping and ΔB Bg removal and combining the signal from different TEs by fitting or averaging before or after applying LBMs. These pipelines were applied to both numerically simulated data and images acquired in vivo. Results from each pipeline were compared qualitatively by visual inspection and quantitatively via analysis of the regional bias and precision, as well as noise propagation.

Fitting
Multi-echo combination via nonlinear fitting (NLFit) has formulated the temporal evolution of the complex signal as a nonlinear least squares problem 12 : where S denotes the acquired complex signal, M the signal magnitude, the signal phase (Equation 1), and TE i the i-th echo time. This approach aims to mitigate noise in ΔB Tot by correctly modeling as normally distributed the noise in the real and imaginary parts of the complex signal. Unlike weighted-averaging-based approaches, NLFit enables estimating 0 . Notably, the input phase to nonlinear fitting is minimally processed because it only requires to be temporally unwrapped, thus avoiding the application of LBMs before multi-echo combination.

Weighted averaging
Multi-echo combination (with n echoes) via weighted averaging has been performed using either TE-based weighting factors 8,9 or phase SNR-based weighting factors. 10 TE-based weighted averaging (TE-wAvg) 8,9 accounts for the phase at shorter TEs being affected by larger noise levels than at longer TEs. TE-wAvg calculates a combined ΔB Tot as with weights equal to: TE-wAvg requires temporal unwrapping of the input multi-echo phase, resulting in a combined ΔB Tot , which still contains ΔB Bg contributions.
SNR-based weighted averaging (SNR-wAvg) 10 accounts for different tissue types reaching optimal SNR at different TEs and calculates a combined local field map (ΔB Loc ) as: with weights equal to: ) .
In Equation 10, R * 2 denotes the map of voxel-wise transverse relaxation rates, which is related to the signal magnitude M by: with M 0 the initial transverse magnetization. SNR-wAvg requires performing temporal and spatial unwrapping as well as background field removal on the input multi-echo phase, resulting in a background field-free field map. Alternatively, a distinct map has been calculated at each TE, and multi-echo combination of over time has been performed via weighted averaging using magnitude-based weighting factors (Susc-wAvg). 11 Based on the inverse proportionality of the phase noise and magnitude SNR, 17 this method aims to improve the SNR of the combined map as: with weights equal to:

Noise propagation
Previous studies using fitting 13 or SNR-wAvg 10 have calculated expressions for the noise in the total field map ( (ΔB Tot )). For TE-wAvg or SNR-wAvg, expressions for (ΔB Tot ) have not been calculated and were therefore derived here. For each multi-echo combination method, expressions for noise propagation from ΔB Tot to the corresponding ΔB Loc and images were also derived here.
Based on error propagation, the noise in ΔB Tot calculated using a linear least squares fitting approach 13 is (see Supporting Information, Section 1): Equation 14 also corresponds to the a priori noise estimate found in nonlinear least squares fitting, 12 thus . Based on Equations 7-10, ΔB Tot calculated using TE-wAvg and ΔB Loc calculated using SNR-wAvg, respectively, have variances equal to: . (16) Assuming that noise in the single-echo phase is temporally uncorrelated and based on error propagation, the noise in ΔB Tot / ΔB Loc calculated using TE-wAvg/SNR-wAvg is, respectively, equal to: Equations 17 and 18 omit the r dependency because they combine the multi-echo phase voxel by voxel.

Noise in the local field map
Based on error propagation and the orthogonality 18 of ΔB Loc and the ΔB Bg in the ROI, for example, the brain: where ⇔ denotes an "if and only if" relationship, and (ΔB Loc ) = (ΔB Tot ) is the worst-case scenario.
Where FT and FT −1 , respectively, denote the direct and inverse Fourier transforms, and k denotes k-space coordinates. Deriving an analytical expression forD −1 is possible, for example, when considering thresholded k-space division or the Tikhonov-regularized minimal norm solution. 13,19,20 For weighted averaging of TE-dependent (Equations 12 and 13), based on phase error propagation over time and Equation 20, Based on Equations 12, 13, 20, and 21 (see Supporting Information, Section 3), is the square root of:

In vivo data acquisition
Multi-echo 3D GRE imaging of 10 healthy volunteers (average age/age range: 26/22-30 years, 5 females) was performed in 2 centers (University College London Hospital and Queen Square Multiple Sclerosis Centre, University College London) equipped with the same 3 Tesla MRI system (Philips Achieva, Philips Healthcare, Best, NL; 32-channel head coil). Five subjects were acquired in each center. All volunteers provided written informed consent, and the local research ethics committees approved the experimental sessions. Images were acquired using a transverse orientation, FOV = 240×240×144 mm 3 , voxel size = 1-mm isotropic, flip angle = 20 0 , TR = 29 ms, 5 evenly spaced echoes (TE 1 /TE spacing = 3/5.4 ms), bandwidth = 270 Hz/pixel, SENSE 23 factors = 2/1.5, flyback gradients = on, no flow compensating gradients, total scan duration = 04:37 min:s. 24 Five subjects were scanned twice within the same session without repositioning to enable image-based calculation of magnitude and phase SNRs.

Data simulation from a numerical head phantom
To ensure the availability of ground-truth values against which to test the accuracy of QSM pipelines, a Zubal numerical head phantom was used 25 with the following ROIs: the caudate nucleus (CN), globus pallidus (GP), putamen (PU), thalamus (TH), superior sagittal sinus, gray and white matter (GM and WM), and CSF. To match the acquisitions in vivo, the original 1.5-mm isotropic phantom was resampled to a 1-mm isotropic resolution with matrix size = 384×384×192 voxels. Compared to our previous study, 26 the numerical phantom was updated to achieve realistic regional means ± SDs for , the proton density (M 0 ), and the transverse relaxation rate (T * 2 = 1∕R * 2 ) (see Supporting Information, Section 4). Simulated multi-echo complex data were generated based on these ground-truth spatially variable , M 0 , and T * 2 distributions, as: with and M, respectively, described by Equations 1 and 11, and TEs matched to the in vivo acquisitions. Random zero-mean Gaussian noise with a SD = 0.07 was added to the real and imaginary parts of the noise-free signal independently. 17,26 The random noise matrix was regenerated at each TE.

Data preprocessing
A brain mask was calculated for each subject by applying FSL brain extraction tool 27,28 with robust brain center estimation (threshold = 0.3) to the magnitude image at the longest TE. This choice of TE accounted for the greater amount of signal dropout near regions of high-gradients compared to shorter TEs. A whole-brain mask for the Zubal phantom was calculated by applying FSL brain extraction tool with robust brain center estimation (threshold = 0.5) to the T * 2 map of the numerical phantom. 26

Processing pipelines for QSM
Five distinct processing pipelines ( Figure 1) were applied to both the numerically simulated and the healthy volunteer data, and the time required to run each pipeline was measured using MatLab's stopwatch timer (Math-Works). Three of these pipelines (NLFit, TE-wAvg, and SNR-wAvg) combined the phase across TEs at different stages before performing the ΔB Loc -to-inversion. Two other pipelines (Susc-wAvg and Susc-total generalized variation (TGV)-wAvg) first calculated a distinct map at each TE and then combined the maps. The following paragraphs describe each processing pipeline in detail. The NLFit pipeline 26 first combined the complex GRE signal by nonlinear fitting over TEs 12 using the Cornell QSM software package's Fit_ppm_complex function. 29 It then applied simultaneous spatial phase unwrapping and ΔB Bg removal using sophisticated harmonic artifact reduction for phase data (SHARP), 30 a direct solver of Equation 5. 30 SHARP was chosen because it has been widely used in the literature on QSM and is both robust and numerically efficient. 2 Moreover, in our recent study comparing multi-echo and TE-dependent QSM, a multi-echo pipeline incorporating SHARP gave highly accurate multi-echo QSM values. 26 SHARP was applied using the minimum-size 3-voxel isotropic 3D Laplacian kernel, 30 a threshold for truncated singular value decomposition equal to 0.05, and a brain mask eroded by 5 voxels.
The TE-wAvg processing pipeline first applied Laplacian unwrapping to the phase at each TE using a threshold for truncated singular value decomposition equal to 10 −10 (i.e., the default value in Ref. 29). Second, it calculated
The SNR-wAvg processing pipeline first applied simultaneous phase unwrapping and ΔB Bg removal to the phase at each TE using SHARP with the same parameters as in NLFit. An R * 2 map was calculated by voxel-wise fitting Equation 11 using MatLab's nlinfit function, where initial values for R * 2 and M 0 were calculated by linearly fitting the log-linearized version of Equation 11. The SNR-wAvg pipeline then calculated ΔB Loc by averaging the unwrapped and background field-free phase according to Equations 9 and 10. 10 In the NLFit, TE-wAvg, and SNR-wAvg pipelines, ΔB Loc -to-inversion was performed using Tikhonov regularization with correction for susceptibility underestimation and using the L-curve method to determine the optimal value for the regularization parameter. 13,30,31 This inversion method was chosen because it is computationally efficient and substantially reduces streaking artifacts relative to thresholded k-space division. 32 The Susc-wAvg processing pipeline calculated a distinct map at each TE by applying simultaneous phase unwrapping and ΔB Bg removal using SHARP as in NLFit and performing the ΔB Loc -to-inversion using Tikhonov regularization as in NLFit, TE-wAvg, and SNR-wAvg. This pipeline then calculated a combined map according to Equations 12 and 13. 11 The Susc-TGV-wAvg processing pipeline applied 1-step TGV 33 to the phase at each TE and then calculated a combined map as in Susc-wAvg. The TGV method was tested because it avoids stair-casing artifacts in the resulting map while correctly preserving structural borders. 33 Moreover, in our recent study comparing multi-echo and TE-dependent QSM, TGV provided highly accurate TE-dependent QSM images. 26 TGV (v1.0.0_20210629) was run in Neurodesk (v20220302, https://neurodesk.github.io/) with the default parameter values ( 1 , 0 ) = (0.0015,0.005), which are optimal for medical imaging applications. 33

ROI segmentation in the healthy volunteer images
Regional values were compared within the simulated and in vivo data, similarly to our previous study. 26 ROIs similar to those in the numerical phantom were segmented in vivo: the CN, GP, PU, TH, posterior corona radiata (PCR) as a WM ROI, and the straight sinus (StrS) as a venous ROI. Briefly, for each subject, the CN, GP, PU, TH and PCR were segmented based on the Eve atlas, 34 for which the GRE magnitude image was aligned to each subject's fifth-echo magnitude image using NiftyReg (NiftK v14.11.0) 35,36 (TE Eve /TE 5 = 24/24.6 ms). The quality of ROI alignment was assessed by visual inspection. The ITK-SNAP (active contour segmentation tool 37 was used to segment the StrS over several slices based on the fifth-echo magnitude image, which presented the best contrast between the StrS and the surrounding brain tissue.

Quantitative evaluation of the measured
In the numerical phantom simulations, each QSM pipeline's performance relative to the ground truth was visually assessed by calculating a difference image between the corresponding ΔB Loc ∕ map and ΔB True Loc ∕ True . Here, ΔB True Loc referred to the ground-truth local field calculated using the reference scan method, 18,38 and True to the ground-truth magnetic susceptibility distribution with realistic regional means ± SDs of (Supporting Information Figure S1B). Means and SDs of were calculated for each pipeline in each ROI with True ≠ 0, that is, the CN, GP, PU, TH, superior sagittal sinus, and WM. The RMSEs of both ΔB Loc and relative to True were also calculated throughout the brain volume. 26 Notably, RMSEs for ΔB Loc could not be calculated for the 1-step Susc-TGV-wAvg pipeline.
In the volunteers, due to to the lack of a ground truth, representative susceptibility difference images were calculated relative to NLFit because the NLFit pipeline performed multi-echo combination at the earliest possible stage; and relative to TE−wAvg , because the TE-wAvg pipeline had the lowest local field RMSE in the numerical phantom simulations (see the Results). Regional means and SDs of were calculated for each processing pipeline and compared against values in subjects of a similar age from the QSM literature. RMSEs could not be calculated because of the lack of a ground truth. For visualization purposes, the pooled averages and SDs were calculated 26 after verifying that all intrasubject SDs of were larger than the intersubject SD of .

Noise propagation maps
Only the healthy volunteers scanned twice were considered for this analysis. To enable image SNR calculation, in 1 healthy volunteer, five 20×20-voxel ROIs were drawn on a sagittal slice of the first-echo magnitude image, 39 including both the GM and WM and excluding regions with artifacts induced by SENSE, motion, or flow. All 5 ROIs were applied across the other 4 volunteers by using rigid alignment transforms (NiftyReg 36 ) between the first-echo magnitude images.
In each subject, ROI-based magnitude (M ROI ), magnitude noise ( ROI (M)), and phase noise values ( ROI ( )) were calculated based on the SNR difference method 40 : Here, M 1 ∕ 1 and M 2 ∕ 2 , respectively, denote the magnitude/phase images from the first and second scan, and R = 3 is the 2D SENSE factor calculated by multiplying the SENSE factors applied along the 2 phase encoding directions. 41 The values calculated based on Equations 24-26 were averaged across the 5 ROIs to calculate summary values of magnitude, magnitude noise, and phase noise at each TE. For both the numerical phantom simulations and the healthy volunteers, a phase noise ( ) map was calculated at each TE as 17,42 : where c(TE) was a constant equal to 1 (by definition) in the numerical phantom simulations and equal to: in the healthy volunteers. Notably, calculating phase noise analytically as in Equation 27 enabled the direct comparison of noise propagation between numerical simulations and data acquired in vivo. For the NLFit, TE-wAvg, and SNR-wAvg pipelines, the (ΔB Loc ) map was calculated based on the multi-echo ( ) maps according to Equations 14 and 17-19. Then, the ( ) map was calculated according to Equation 20 with B 0 = 3 Tesla and the Tikhonov-regularized inverse magnetic dipole kernel with regularization parameter 13 : and correction for underestimation. 30 For the Susc-wAvg pipeline, the ( ) map was calculated based on Equation 22. ( ) maps were not calculated for the Susc-TGV-wAvg pipeline because TGV estimates iteratively 33 and an analytical expression for ( ) could not be derived.
To compare ( ) maps across pipelines, a line profile was traced in the same location of all ( ) images, and the ( ) value of each voxel along this profile was plotted. To compare the noise intensity and its variability between processing pipelines, the mean and SD of this representative line profile were also calculated.

Statistical analysis
Statistical analyses were performed based on the healthy subject data. For each ROI and each pair of pipelines, Bland-Altman analysis of the average was used to assess if pairs of pipelines systematically produced different results. For each ROI, statistically significant differences between pipelines were tested by considering the corresponding distributions of average values across subjects. To assess whether to apply parametric paired t tests or nonparametric sign tests, the normal distribution of the differences between paired values was assessed using the Shapiro-Wilk test. All statistical tests were 2-tailed, and an uncorrected P value < 0.05 was considered significant.

Pooling of measurements
For each ROI and each processing pipeline, all intrasubject SDs of were larger than the intersubject SD. Thus, pooled means and SDs were calculated. 26

Performance of pipelines for multi-echo QSM
The Susc-wAvg and Susc-TGV-wAvg processing pipelines were the longest to run (Supporting Information Table S1) because they calculated a QSM map at each TE. The longer processing times required for the numerical phantom data were linked to the larger matrix size (384×384×192) compared to acquisitions in vivo (240×240×144).
In the numerical phantom and for each processing pipeline, Figure 3A shows the regional means and SDs of . The error between the calculated and ground-truth appeared similar for all processing pipelines, although slightly larger SDs were always observed for the NLFit pipeline ( Figures 2M-V and 3A). The superior sagittal sinus, which was the structure with the largest True , showed the largest susceptibility errors for all processing pipelines (arrowheads in Figures 2G-L, R-V, and 3A).
For one representative volunteer, Figures 4A-J show the susceptibility images calculated using each processing pipeline. Additionally, differences images are shown relative to the NLFit (K-R) and the TE−wAvg maps (S-Z). Here, susceptibility differences between processing pipelines were most prominent in the StrS (arrowheads in Figures 4F-J, O-R, W-Z). In the healthy volunteers, Figure 3B shows the pooled regional means and SDs of calculated by each processing pipeline. The average measured in the deep-GM ROIs and in the PCR had values within the ranges reported by previous studies 34,[43][44][45][46][47] : 0.01-0.13 ppm for the CN, 0.06-0.29 ppm for the GP, 0.02-0.14 ppm for the PU, −0.02-0.08 ppm for the TH, and −0.06-0.03 ppm for the PCR. In the StrS, only NLFit had an average value close to the previously reported range for venous blood, namely, 0.17-0.58 ppm 32,46,48,49 ( Figure 3B).
In the CN, GP, and venous ROIs, the regional χ measured in vivo had a relative accuracy across pipelines similar to the numerical phantom simulations: the NLFit and Susc-TGV-wAvg pipelines, respectively, provided the highest and lowest means of , whereas the TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines provided intermediate values (Figure 3). Slightly different trends were observed in the PU, TH, and WM ROIs, where, in vivo, the NLFit and Susc-TGV-wAvg pipelines provided the lowest (absolute) means of ; and the TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines provided higher values ( Figure 3B). In the numerical phantom simulations, the NLFit pipeline always had the largest SD of ( Figure 3A). In contrast, in vivo, the NLFit pipeline had the smallest SD of in the CN, PU, TH, and PCR, and SDs of comparable to other pipelines in the GP and StrS ( Figure 3B).

4.3
Phase noise propagation into the maps Figures 5 and 6 show the estimated ( ) maps and profiles in the numerical phantom simulations and a representative healthy subject, respectively. All ( ) images contained some degree of streaking artifacts, especially in vivo, because these are a manifestation of error propagation from ΔB Loc to caused by the dipole kernel null space. 20 In the phantom, the NLFit and Susc-wAvg pipelines had the ( ) line profiles with the lowest means and SDs (Figures 5A,D,E,H,I). The NLFit and SNR-wAvg pipelines always resulted in the lowest streaking artifacts burden ( Figures 5A,E,C,G,I and 6A,E,C,G,I). Streaking artifacts were more severe in the TE-wAvg and Susc-wAvg pipelines, especially near high-venous structures.

Statistical analysis
In the healthy volunteers, for all processing pipelines and ROIs, the Shapiro-Wilk test always rejected the hypothesis of normally distributed paired differences of . Therefore, pairwise comparisons between pipelines were always evaluated using the nonparametric sign test.

F I G U R E 3
Means and SDs of in the phantom and healthy volunteer ROIs. The means and SDs (error bars) of are shown in each ROI of the numerically simulated (A) and pooled healthy volunteer data (B) for each processing pipeline. In the numerical phantom, the ground-truth is also shown. In the healthy volunteers, significant differences between pairs of pipelines are denoted using the symbols * ( P value <0.05) and ** ( P value <0.01) . CN, caudate nucleus; GP, globus pallidus; PCR, posterior corona radiata; PU, putamen; StrS, straight sinus; TH, thalamus; WM, white matter; SSS, superior sagittal sinus Significant differences between pipelines are shown in Figure 3B, whereas the between-pipeline biases are shown in Figure 7. A lower threshold equal to |0.01| ppm (|⋅| denotes the absolute value) was chosen because, below this level, interpipeline differences cannot be disentangled from the intrapipeline variability (i.e., the regional SD).
A bias greater than |0.01| ppm was observed for the NLFit pipeline relative to all other pipelines in the CN and StrS (Figure 7). A bias greater than |0.01| ppm was observed for the Susc-TGV-wAvg pipeline relative to all other pipelines in the GP, and relative to the TE-wAvg and Susc-wAvg pipelines in the StrS ( Figure 7). These results suggests that, for accurate quantification, some of the significant differences detected by the sign test, for example, between the TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines, may be negligible ( Figures 3B and 7).

DISCUSSION
Aiming to elucidate the optimal strategy for multi-echo combination for QSM, this study compared multi-echo combination methods applied at different stages of the F I G U R E 4 maps calculated using distinct multi-echo combination methods in a representative healthy volunteer. The same transverse and sagittal slices are shown for the susceptibility maps calculated using NLFit (A, F), TE-wAvg (B, G), SNR-wAvg (C, H), Susc-wAvg (D, I), and Susc-TGV-wAvg (E, J). The figure also shows the differences between the TE-wAvg, SNR-wAvg, Susc-wAvg, and Susc-TGV-wAvg maps and the NLFit map (K-R); and the differences between the NLFit, SNR-wAvg, Susc-wAvg, and Susc-TGV-wAvg maps and the TE-wAvg map (S-Z). In all the sagittal images (F-J, O-R, W-Z), the yellow arrowheads point at the same location in the StrS QSM processing pipeline before or after LBMs for phase unwrapping or background field removal. Each pipeline was applied to numerically simulated data and images from healthy volunteers.
The higher relative accuracy of the NLFit pipeline in the numerical phantom simulations and in vivo suggests that, for QSM, combining the temporally unwrapped multi-echo phase before applying LBMs for spatial phase

F I G U R E 5
Susceptibility noise profiles in the numerical phantom simulations. The same sagittal slice is shown for susceptibility noise ( ( )) images calculated using the NLFit, TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines (A-D). The susceptibility noise is plotted for a line profile traced on the ( ) images (E-H), including the high-superior sagittal sinus. All line profiles are also shown combined in the same plot (I). The mean and the SD of ( ) are shown for each line profile

F I G U R E 6
Susceptibility noise profiles in a representative healthy volunteer. The same sagittal slice is shown for susceptibility noise ( ( )) images calculated using the NLFit, TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines (A-D). The susceptibility noise is plotted for a line profile traced on the ( ) images (E-H) including the high-StrS. All line profiles are also shown combined in the same plot (I). The mean and SD of ( ) are shown for each line profile unwrapping or ΔB Bg removal is preferable to averaging the TE-dependent LBM-processed phase or . This suggestion appears to conflict with the higher RMSEs associated with the NLFit pipeline compared to other pipelines in the phantom simulations (Figures 2 and Supporting Information Figure S2). However, the RMSE jointly reflects systematic and random errors because it measures the bias between the estimated and true value and also reflects the variability of the estimated relative to its average value. 3 Thus, the RMSE must always be interpreted in combination with complementary measurements of bias and precision. Furthermore, RMSEs of ΔB Loc are difficult to interpret. In contrast with RMSEs of , they allow comparison of different pipelines without the effect of ΔB Loc -to-inversion. However, they are voxel-based measures based on a signal that is intrinsically nonlocal because ΔB Loc variations extend beyond the anatomical region of shift that generated them. 19 Thus, the best set of metrics for comparing images generated by a processing pipeline for QSM is still an active area of research. 3,4 There are several potential explanations as to why LBMs applied before multi-echo combination reduce the overall accuracy of QSM. Firstly, in contrast with path-based or region-growing-based phase unwrapping, Laplacian phase unwrapping usually removes some ΔB Bg components from the input phase image. 30 Thus, the consistently lower accuracy of calculated using the TE-wAvg processing pipeline was probably driven by the incorrect

F I G U R E 7
Bias between multi-echo pipelines for QSM in each ROI. The mean and SDs (error bars) of the bias are shown in each healthy volunteer ROI for all pairs of multi-echo processing pipelines. The gray band denotes the [−0.01-0.01] ppm interval. If the mean of the bias was within this interval, the difference between the corresponding pair of QSM pipelines was considered negligible. ROI, region of interest assumption that the TE-dependent LBM-unwrapped phase corresponded to the true unwrapped phase. LBMs for ΔB Bg removal also applied truncated singular value decomposition (with a larger truncation threshold), but here the high-pass filtering effect was expected because background fields are slowly varying. Finally, the similar accuracy and values of calculated using the TE-wAvg, SNR-wAvg, and Susc-wAvg pipelines suggests a negligible difference between averaging the Laplacian unwrapped phase over TEs before (TE-wAvg pipeline) or after TE-dependent Laplacian ΔB Bg removal (SNR-wAvg and Susc-wAvg pipelines).
In the numerical phantom simulations, all processing pipelines resulted in higher SDs of compared to the ground truth, suggesting that noise in the multi-echo signal phase was amplified by all pipelines. This result is in line with the known noise amplification of ill-posed inverse problems. However, the estimated SDs of varied across pipelines. In the simulations, the Susc-TGV-wAvg pipeline had the smallest SDs of ( Figure 3A). However, in vivo, the NLFit pipeline generally had the smallest SDs of ( Figure 3B). Both the TGV reconstruction pipeline and shortcomings of the numerically simulated data could explain these discrepancies in the performance of the Susc-TGV-wAvg pipeline. The numerically simulated data were generated based on a digital phantom which, despite varying regional values in a realistic fashion (see Supporting Information, Section 4 and Figure  S1), ultimately still appeared as a smooth piece-wise constant model (Figure 2A,G). As previously observed, 3 piece-wise constant geometries allow good recovery of the underlying distribution using TGV-based algorithms because the piece-wise constant constraints exactly match the underlying distribution. However, in regions with flow, anisotropic distributions, or microstructure, these numerical models are likely to depart from a realistic representation of the tissue . To overcome the limitations of this assumption, future studies could exploit a newly developed realistic head phantom for QSM, which does not have a piecewise constant distribution and incorporates microstructural effects. 50 In both numerical simulations and healthy volunteers, based on the line profiles traced on the noise maps and streaking artifact reduction, the NLFit pipeline had better noise mitigation (Figures 5 and 6). This result suggests that combining the temporally unwrapped multi-echo phase by nonlinear complex fitting, designed to account for noise in the complex signal, 12 results in better noise management than combining the multi-echo phase by averaging. As previously shown, 12 errors in the combined field map mainly result from both noise in the signal phase and phase unwrapping errors near high-regions (e.g., the veins). Both sources of error were successfully managed by nonlinear complex fitting, as shown by the dramatic reduction of streaking artifacts in Figure 6A. In line with previous observations, 51 this result also suggests that the regularization strategy employed by the ΔB Loc -to step can mitigate artifactual streaking errors only when major sources of error in the field map have been tightly constrained. At visual inspection in vivo, the SNR-wAvg was the second-best pipeline for the mitigation of streaking artifacts ( Figure 6). Thus, if applying nonlinear complex fitting is not possible, averaging using the SNR-weighting-based method could offer the best alternative for noise reduction.
All these indications do not necessarily apply to estimation in WM tissue. Indeed, due to WM's ordered microstructure, a comprehensive estimation of in WM requires acquiring gradient-recalled echo images at multiple head orientations and modeling as a tensor. 52,53 Thus, further studies are needed to evaluate the applicability of these results to WM tissue.
In the present study, all experiments were limited to one field strength (i.e., 3 Tesla). Because tissue relaxation times (e.g., T * 2 ) shorten with increasing field strength, but the signal phase at a given TE increases, further work is needed to assess the relevance of these results at ultrahigh fields. Finally, it must be noted that in the numerical phantom simulations, all processing pipelines underestimated True (Figure 3). However, in QSM some degree of underestimation is always expected due to the ill-posed nature of the ΔB Loc -to-inverse problem. 20

CONCLUSION
The higher accuracy of regional values and better noise management of the NLFit pipeline suggest that, for QSM, combining the multi-echo phase by nonlinearly fitting over TEs before applying LBMs is preferable to combining the TE-dependent LBM-processed phase or by averaging.

DATA AVAILABILITY
The MatLab code (MathWorks) used to run the analyses in this study is available at https://github.com/ emmabiondetti/multi-echo-qsm. The repository's main page lists all dependencies on code developed by other groups.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.
Section 1. Noise in total field map calculated by fitting Section 2. Noise propagation from the local field to the susceptibility map Section 3. Noise propagation in the Susc-wAvg pipeline Section 4. Numerical phantom simulations Figure S1. Properties of the numerical phantom. M 0 in arbitrary units (a. u.), T * 2 in ms and in parts per million (ppm) assigned to various ROIs in the numerical phantom are shown in (A). The location of these ROIs is shown in the (B) and T * 2 maps (C) of the numerical phantom Table S1 . Total image processing time. The table shows the time required to run each pipeline for the numerical phantom simulations and data acquired in vivo. The time reported for the SNR-wAvg pipeline does not include the time required for R * 2 mapping, as optimizing this step was outside the scope of the present study. For the Susc-TGV-wAvg pipeline, approximate timings are reported because image reconstruction was performed using first Neurodesk (QSM calculation at each TE) and then MatLab (multi-echo combination) Figure S2 . ΔB Loc maps calculated using distinct multi-echo combination methods in the numerical phantom simulations. The same transverse and sagittal slices are shown for the ground-truth local field map (A, G), and for the local field maps calculated using NLFit (B, H), TE-wAvg (C, I), SNR-wAvg (D, J), and Susc-wAvg at each TE (E, O). The figure also shows the difference between each local field map and the ground truth (P-E2). The bottom row shows the RMSEs of ΔB Loc for each pipeline How to cite this article: Biondetti