QSM Reconstruction Challenge 2.0–Part 1: A Realistic in silico Head Phantom for MRI data simulation and evaluation of susceptibility mapping procedures

Purpose: To create a realistic in-silico head phantom for the second QSM Reconstruction Challenge and for future evaluations of processing algorithms for Quantitative Susceptibility Mapping (QSM). Methods: We created a whole-head tissue property model by segmenting and post-processing high-resolution, multi-parametric MRI data acquired from a healthy volunteer. We simulated the steady-state magnetization using a Bloch simulator and mimicked a Cartesian sampling scheme through Fourier-based post-processing. We demonstrated some of the phantom’s properties, including the possibility of generating phase data that do not evolve linearly with echo time due to partial volume effects or complex distributions of frequency shifts within the voxel. Computer code for generating the phantom and performing the MR simulation was designed to facilitate flexible modifications of the model, such as the inclusion of pathologies, as well as the simulation of a wide range of acquisition protocols. Results: The brain-part of the phantom features realistic morphology combined with realistic spatial variations in relaxation and susceptibility values. Simulation code allows adjusting the following parameters and effects: repetition time and echo time, voxel size, background fields, and RF phase biases. Additionally, diffusion weighted imaging data of the phantom is provided allowing future investigations of tissue microstructure effects in phase and QSM algorithms. Conclusion: The presented phantom and computer programs are publicly available and may serve as a ground truth in future assessments of the faithfulness of quantitative MRI reconstruction algorithms.


Introduction
Quantitative Susceptibility Mapping (QSM) has proven to be a valuable tool for assessing iron concentrations in the deep gray matter (DGM) (1), estimating vessel oxygenation (2), differentiating blood and calcium products (3), and studying demyelinating lesions in the white matter (WM).However, several recent methodical studies have suggested that study outcomes may be affected by the particular processing algorithms chosen for QSM (4)(5)(6).QSM typically involves the following steps in the order of application: coil combination (4); phase unwrapping (6); multi-echo combination (4); background field removal (6); and finally the estimation of susceptibility maps (5,(7)(8)(9).Processing artifacts and inaccuracies at any of these 5 processing stages propagate into the computed susceptibility maps.
The first QSM Reconstruction Challenge (RC1) in 2016 (10) aimed to provide insights on the accuracy of the last processing step: the estimation of susceptibility maps from backgroundcorrected frequency maps.One of the key conclusions of the challenge was that the choice of the algorithm and the used parameter settings can have a significant effect on the appearance and accuracy of computed susceptibility maps.However, the first challenge also raised questions about the gold-standard (reference) susceptibility map used for evaluating the challenge submissions.The reference susceptibility maps for RC1 were generated from multiple acquisitions in which the subject had rotated the head towards 12 different orientations.Specifically, two reference maps were used, one calculated with susceptibility tensor imaging (STI) (11) technique and one with the Calculation Of Susceptibility through Multiple Orientation Sampling (COSMOS) (12) technique.Meanwhile, only one field-map obtained from those twelve orientation acquisitions was provided to challenge participants.After completion of the challenge, it was observed (13) that a non-negligible discrepancy existed between the provided frequency map and the frequency map obtained when forward-simulating the field perturbations using the provided reference susceptibility maps.Part of the discrepancies could be explained by the presence of unaccounted microstructure effects on in vivo brain phase images (14).Since current single-orientation QSM algorithms assume that frequency contrast is caused entirely by variations in bulk magnetic susceptibility, their solution did not match the multiple orientation methods, which resulted in the best performing challenge submissions being over-regularized and having a non-natural appearance.
During the planning phase for the second reconstruction challenge (RC2) in 2019, we concluded that synthesized phantoms and realistic forward simulations are needed to evaluate the accuracy and robustness of QSM methods.Using in silico models as a basis for benchmarking instead of real data allows comparing the reconstructed maps to the ground truth.The goal of the RC2 was to provide insights on the current state of the art in QSM algorithms and identify their strengths and limitations in different scenarios.The challenge used the numerical phantoms presented in this paper and consisted of two stages: Stage 1 mimicked the clinical setting in which the ground-truth was unknown; in Stage 2 the ground-truth was available and thus allowed for a systematic parameter optimisations for the best possible quality metrics that can be obtained with a specific reconstruction algorithm.The challenge results are reported in a separate manuscript This in silico model was developed to address several specific demands from the community developing QSM methods.Most QSM algorithms were either demonstrated based on their visual appearance, based on the RMSE using simple digital piece-wise constant phantoms (either simple shapes such cylinders or head models), by comparison to phantom data using varying quantities of contrast agents (15), or comparing the obtained in vivo reconstructions to that obtained by a ground-truth COSMOS reconstruction (16,17).These approaches have severe limitations, for example: digital phantoms and liquid or gel phantoms share property of having piece-wise constant susceptibility distributions which tend to favor methods based on total variation.When using a COSMOS solution as ground-truth, despite their reduced level of streaking artifacts, one is assuming that the dipole model with the sphere of Lorentz approximation used in the QSM formulation is valid through-out the brain (14).Moreover, digital models allow a controlled investigation of the effect of deviations from the underlying QSM model on the reconstruction performance, such as the inclusion of microstructure-related frequency effects.Similar limitations are present when developing background field removal methods.Most methods do not consider the degree of accuracy of the field measurements close to the edges that are affected by various factors such as signal drop-out and the non-linearity of the phase evolution due to the non-negligible higher order spatial terms inside the pixel (18).All the above aspects can be addressed using the a realistic digital phantom.This paper presents a framework for generating realistic digital head phantoms used for RC2.The phantom features a realistic brain morphology and naturally varying susceptibility distribution within anatomical regions.As a first step toward systematically evaluating all experimental aspects that influence QSM reconstruction quality, the presented in-silico phantom enables enforcing consistency of the provided frequency map with the physical model underlying current QSM algorithms.Code and data are made freely available and the authors encourage researchers to use the phantom to evaluate their processing algorithms in the future.The modular design of the code facilitates adding new features to the phantom, such as calcifications and hemorrhages provided that relaxometry maps exist.Furthermore the software package may be used to optimize acquisition protocols, prepare and test QSM reconstruction pipelines, and to study the impact of effects that deviate from the QSM standard formulation (such as tissue microstructure and signal voids); or to simulate the effect of the extended readouts.

Data acquisition
We acquired MRI data from a human volunteer (F, 38 years) who gave informed consent, and the experiment was approved by the local Medical Ethical Committees (Amsterdam University Medical Centre and Radboud University Medical Centre).We used a 7T scanner to obtain relaxation rate maps and a 3T scanner to obtain DTI data and bone-air tissue interfaces.For generating the brain model, we acquired inherently co-registered quantitative maps of R1 (19), R2 * , χ and M0 maps using the MP2RAGEME (20) sequence on a 7T (Philips Achieva) scanner.
To enable additional microstructure effects to the model, we acquired DTI data using two simultaneous multi-slice EPI-based datasets with opposed phase encoding directions.The main sequence parameters were TR/TE=3520/74 ms, SMS factor 3, in-plane acceleration 2, matrix size = 140x140x93 and FOV=210x210x139.5 mm, resulting in 1.5mm isotropic image resolution.The diffusion-weighted parameters were b=0/1250/2500 s/mm² and 12/90/90 directions, respectively, resulting in a total acquisition time of 12 min 10 secs.

Tissue segmentation
Figure 1 shows a pictorial representation of the pipeline used for the segmentation.The 7T T1 maps, derived from the MP2RAGE dataset, were segmented into 28 tissue classes, including left and right splitting, using the cbstools atlas based pipeline (https://www.cbs.mpg.de/institute/software/cbs-tools(22)).Classes were then re-clustered into 16 tissue clusters: CSF (initially split into 4 classes); grey matter (initially split into 8 classes, left and right cortical, cerebellar, amygdala and hippocampus); caudate; putamen; thalamus; white matter (encephalus, cerebellum and brain stem); and large blood vessels.Deep grey matter structures not clearly distinguishable on T1 maps (red nucleus, substantia nigra, globus pallidus and dentate nucleus) were manually segmented using the active contours function implemented in ITKSnap (version 3.6) (23) on R2* and  maps.Subject's interhemispheric calcification was identified and segmented using the M0 map.An initial vein-and-artery mask was computed based on Frangi filtered R2* maps as described in (24).The region outside the brain was segmented into bone, air and tissue using a model-based segmentation approach with deformable surface meshes (25) using the PETRA sequence as input.The resulting segmentations of nasal cavities and auditory canals were refined manually using ITKSnap for the computation of realistic background fields (see below).
After the combination of the individual tissue brain masks into a piece-wise constant wholehead model, we used R2* and R1 maps to correct the label boundaries in various brain regions using customized thresholds (see supplementary Table 1).Note that the previous tissue segmentations were based on one single quantitative parameter (either R1 or R2*), and because smooth surfaces were enforced in some regions this resulted, segmentation overestimations that benefited from this second iteration.

Susceptibility map
The susceptibility map was simulated by assigning tissue-typical susceptibility values taken from literature (26,27),   (see Table 1).To avoid that susceptibility values were constant throughout anatomical regions (piece-wise constant), which would be unrealistic and could be advantageous to algorithms with gradient-based regularization terms, we modulated the susceptibility values in each region using the image intensities on R1 and R2* maps according to the following equation: where  � 2 * and  � 1 are the mean apparent transverse and longitudinal magnetization rate of that given segmented class.To obtain realistic proportionality parameters   and   Eq.1 was inverted for each brain tissue class using in that case a susceptibility map () computed from the original data using HEIDI.The coefficients ̅  ,   and   used for the two models in the QSM challenge 2.0 can be seen in Table 1 and Figure 2. Because the measured relaxation rates of blood (both in arteries and veins) are prone to errors (due to the inflow for the R1 and flow for R2*), the proportionality values were relatively small for the blood pool, rendering these compartments piece-wise constant.Similarly, bone, calcification and other non-brain tissue compartments were made piece-wise smooth due to the lack of a susceptibility map outside the brain that could guide the derivations on the terms   and   .
Furthermore, R2* maps are typically overestimated close to air-tissue boundaries due to signal dropout (as can be seen in the R2* maps in Fig. 1 over the ear canals) (28).A low-pass filtered version of the gradient of the acquired field maps was used to differentiate regions where the R2* values could be trusted from those where they had to be considered unreliable.In the latter regions, we forced   = 0 (did not account for the R2* contribution when generating the susceptibility model.To avoid discontinuities in the transition between high field gradient and low field gradient, a smooth transition was created by mixing the two models (when Eq.1 was used and when   was forced to 0) in regions with field gradients between 0.08ppm/mm and 0.3ppm/mm (highlighted by blue arrows in Fig. 3 3).Please refer to provided code for more details of implementation.
To avoid unrealistically sharp edges at the interfaces between tissue regions and introduce some degree of partial volume in those regions, the probability of a voxel being a given tissue, Ptissue, was computed by smoothing each binary brain tissue mask using a 3D Gaussian kernel with a full-width at half maximum (FWHM) of 1.2 voxels.This smoothing was not applied to veins to ensure that veins remain sharp, but that their susceptibility will vary depending on its size.The susceptibility model was then given by: 16 =1 [Eq2] The importance of moving from a piece-wise constant (where a   and   are set to zero), to a contrast modulated (where   is simply a binary mask) or the probabilistic formalism of the susceptibility distribution can be appreciated in Figure 3. Yellow arrows highlighting the transition between grey and white matter that becomes smoother, and green arrows smoothing out small segmentation errors within the thalamus.On the other hand, Figure 3 shows that most vessel structures have remained sharp with only some minor reduction in susceptibility value.

Data Simulation
Spoiled gradient recalled echo data can be simulated using the steady state equation: Here,  0 is an initial phase distribution originated from the transceiver phase, and   is the frequency shift directly attributed to magnetic susceptibility.To simulate S for given repetition times (TR), echo times (TE), and flip angles (), we used the M0, R1 and R2* maps derived from the MP2RAGEME (20,29) sequence,  0 () is the TE=0 phase (a second order polynomial ensuring 2 phase variation inside the brain region was used), and the frequency shift, Δ(r), was calculated according to where D(r) is the magnetic field dipole along the z-direction with Lorentzian correction.The convolution was performed in k-space using the formulation proposed in (30,31).To avoid aliasing artifacts associated with the Discrete Fourier Transform (circular convolution), which would appear as unrealistic background fields, we padded the model with zeros along each dimension (factor of 2) before evaluating Eq. 4.
Having a digital phantom allowed the creation of signal models with and without background fields (fields generated by tissues outside the brain).Using the whole-head susceptibility model allowed the creation of realistic background fields.To create a model without background fields, named here after "local field", all voxels outside the brain were set to zero and the susceptibility distribution within the brain was demeaned: Also relevant, but not pursued for the QSM challenge purposes, field shimming is simulated by fitting the frequency map with second or third-order Legendre polynomials.

Examples of acquisition protocol optimization
For the demonstration of the acquisition protocol simulation with the code in appendix 1, we chose two example protocols optimized for different applications: (P1) A protocol optimal for the quantification of deep grey matter susceptibility.In this case, the longest TE should be at least that of the T2 * of the region with the highest iron concentration which is the globus pallidum (14ms).
(P2) A protocol optimal for the observation of cortical grey/white matter contrast in both the magnitude and phase data.In this case, the longest TE was chosen to be close to that of the T2* of cortical GM (33ms) (32).
For the sake of simplicity, both protocols had the same echo spacing of 8ms as the original volunteer dataset.The TR of the acquisition was chosen as short as possible assuming a readout acquisition window of 8ms (P1: TR = 16ms; P2: TR = 40ms).We neglected dead times associated with phase encoding preparation, flow compensation, rewinding, rf excitation, saturation and crushers.The flip angle was chosen at the Ernst angle for the GP (T1=1100ms; α=8) and such that T1-weighted contrast on magnitude images was maximized between WM (T1=1100ms) and cortical GM (T1=1900ms; α=23) in protocols P1 and P2, respectively.This resulted in the following protocols: Protocol 1: TE1/TE2= 4/12ms ; Protocol 2: TE1/TE5= 4/36ms ; We mimicked k-space sampling by cropping the Fourier spectrum of the original 0.65mm resolution data to an effective spatial resolution of 1mm isotropic.We applied the same approach to down-sample the ground-truth susceptibility map.In the case of the susceptibility maps, the sharp edges between structures as well as the orders of magnitude larger susceptibility differences between air/bone and tissue resulted in severe Gibbs ringing artefacts, which were removed using sub-voxel shifts (33).This step was repeated in all three spatial directions.
Multi-echo combination was performed using optimum weights (16) assuming that the phase at TE=0 was unknown and spatial unwrapping was performed across successive echoes using SEGUE (34).QSM reconstruction using TKD (35), closed form L2 (36), FANSI (37) or iLSQR (38) with each of the QSM reconstructions being run 20 times with varying regularization parameters: -TKD: the k-space threshold was varied from 0.02 to 0.4 in linear steps; -Closed form L2: the regularization parameter was varied in 20 logarithmic steps 0.01 to 100; -FANSI TV: the regularization parameter was varied from 10 -6 to 0.1 in logarithmic steps, remaining parameters were µ1=0.005,µ2=1 and 50 iterations; -iLSQR: the regularization term was varied from 10 -3 to 1 in logarithmic steps, had a tolerance of 0.01 and was allowed 50 iterations.
The reconstructions were evaluated in respect to the ground-truth using various metrics tested for the purpose of the QSM challenge: -nRMSE 1 : Whole-brain root-mean-squared error (after demeaning) relative to the root-meansquared of the ground-truth times 100; -rmse_detrend_Tissue: nRMSE relative to ground-truth (after detrending) in grey and white matter mask.By detrending it is meant the slope was between the QSM reconstruction and the ground truth QSM in the ROI was taken and the original reconstruction was then divided by this factor to ensure proportionality errors (measured elsewhere) would not be affecting the obtained results; -rmse_detrend_blood: RMSE relative to ground-truth (after detrending) using a one pixel dilated vein mask; -rmse_detrend_DGM: RMSE relative to ground-truth (after detrending) in a deep gray matter mask (substantia nigra & sub-thalamic nucleus, red nucleus, dentate nucleus, putamen, globus pallidus and caudate); -DeviationFromLinearSlope: Absolute difference between the slope of the average value of the 6 deep gray matter regions vs the prescribed mean value and 1.0.
-CalcStreak: Estimation of the impact of the streaking artifact in a region of interest surrounding the calcification by means of the standard deviation of the difference map between reconstruction and the ground-truth.The region of interest where this was measured was a hollow rectangular prism, with its inner boundary being 2 voxels away from the edge of the calcification and the outer boundary 6 voxels away from the inner boundary.
-CalcMoment: volumetric susceptibility moment of the reconstructed calcification, compared to the ground-truth (computed in the high resolution model to be -49.8ppm).
The process was repeated for the "challenge protocol" with different simulated datasets: -1mm and 0.64mm isotropic resolution -With infinite SNR (no noise added) or peak SNR of 100

Adding microstructural effects to the obtained contrast
Microstructural effects are known to affect the observed phase.One of the driving factors of the microstructural effects is white matter fiber orientation.Diffusion data was processed using fsl software (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/), eddy_correct and top up were used to undistort the DWI data.Subsequently the data was coregistered to the 7T anatomical space and FDT was used to extract tensor information (FA, fractional anisotropy, main eigenvector orientation).
The provided data and code include a simple first-order approximation of these microstructure effects which is echo time independent.Wharton et al (14) demonstrated that the typical impact at 7T for a protocol with echo times of 7 and 13 ms was given by: where FAnorm is the fractional anisotropy (FA) divided by 0.59 (the average anisotropy observed in a human optic nerve) and θ is the angle between the white matter fiber and the static magnetic field.Both of these quantities can be derived from the acquired diffusion data.Such a correction to the frequency shift is only applied within the segmented white matter mask.From the data shown in Figure 4 it can be expected that P1 will benefit more from a morphological informed reconstruction than P2, and that the use of non-linear fit (43) for the calculation of the field map will also be particularly relevant as noise will dominate the later echoes in P1.

Acquisition Protocol Optimisation
In Figure 5 we compared the field map obtained with the P1 simulation from the whole head model to a field map obtained directly from the susceptibility brain model via Eq. 5 (ground truth field map).When carrying out the signal simulation with the whole-head model, both the base resolution and the 1mm resolution field maps are dominated by the background components arising from the air/bone/tissue interfaces, as can be seen in the transverse slice above the sphenoid sinus (first column, as pointed by the black arrow).The differences to the ground-truth field map (second column) are to a large extent explained by the quadratic field used to represent the procedure of shimming.Once LBV was applied to the total field to obtain the tissue-specific field contributions (third column), the original resolution field map (top row) showed localized, smoothly varying differences relative to the ground truth, which have been described previously (44).Both the base resolution and 1mm resolution look at first sight to have very similar properties, yet the nRMSE from the down-sampled dataset (bottom row) demonstrated additional deviations from the ground truth (nRMSE was 40% higher than that of the high-resolution dataset).This discrepancies are both due to incomplete background field removal (see grey arrows) and due to errors around veins.In such regions, the field map measured from the GRE data naturally deviates from the mean field in that pixel because of the reduced signal in veins (once partial volume is introduced by the reduced resolution, the field estimation is biased towards the tissue compartments).
To disentangle the effects of background field correction and MRI signal simulation on the deviations observed relative to the ground truth, we repeated the signal simulation with the local fields.In that case, the slowly varying smooth deviations disappeared and the high-resolution model did not demonstrate substantial deviations relative to the ground truth.The differences in the field computed at base resolution without background fields (top right panel) is at the numerical precision level, yet the nRMSE is still not negligible (nRMSE = 27) because of the errors present in the calcification region without signal exists and its immediate surrounding where spatial unwrapping fails.
To further investigate the sources of errors discussed in the previous paragraph, Figure 6 evaluates the phase evolution in three voxels, two in the surrounding of the calcification and one in the white matter.Figure 6c shows in the case of an homogeneous tissue region, there is a perfect match between low resolution and high resolution phase evolutions as well as the fitted frequency (based on the 5 echo simulation) and the ground truth frequency (computed from the susceptibility map).Figure 6b shows that in a region closer to the calcification that is also the case for the high-resolution data (light grey lines), while in the case of the low-resolution data (dark grey) there is a larger error both in respected to fitted frequency (dashed line) and ground truth frequency evolution.Also, it is clear that the phase evolution in Fig. 6b in the low resolution data is no longer linear, as is predicted from theory due to partial volume effects and the varying intravoxel frequency gradients (18).In a pixel immediately close to the vicinity of the calcification the errors are further enhanced now also for the high resolution data where unwrapping errors can introduce errors on the fitted frequency (the data is still fitted accordingly, but it does not correspond to the ground truth field).Figure 6c shows the mean squared difference map associated with the frequency fit on the low resolution data, it is clearly visible that these errors are predominantly found in regions of rapidly changing magnetic fields -around the calcification and close to tissue/air/bone interfaces, and surrounding vessels.

Evaluation of QSM reconstructions
Figure 7a shows the nRMSE values associated with various reconstruction methodologies as a function of the regularization value for the high-resolution model in the absence of noise.Figure 7b shows the more realistic case where the images have been down-sampled to 1mm isotropic and noise was added to the data with a peak SNR of 100.It can be seen from a shift of the individual curves to the right that the more realistic model required stronger regularization than the perfect high-resolution model, which can be attributed to the phase noise and inconsistencies associated with the down-sampling process.Surprisingly, the FANSI method, as opposed to the other tested methods achieved a lower nRMSE in the down-sampled model than on the original model (36 vs 43). Figure 7c shows the reconstructions with minimum RMSE of the plots in Fig. 7b for the different methods tested.It can be seen that the direct methods (TKD and closed-form L2) still have some broad striking artifacts in regions surrounding both the calcification and deep gray matter regions while in the iterative methods these were reduced.It is interesting to note that the total variation (TV) regularized nature of FANSI clearly contributed to a better reconstruction of the superior cerebellar vein when compared to iLSQR.
Figures 7d and 7e show the performance of iLSQR and FANSI respectively for the different models and incremental addition of phase inconsistencies.The addition of a small background field term does not have an impact on the optimum regularization value of any of the methods, but it is interesting to observe that iLSQR was better able to cope with this term than FANSI while the latter was less penalized by the presence of noise on original data (smaller increase in nRMSE).
Finally, more specific metrics were evaluated.These included RMSE in different tissues (Fig. 8a), calcification moment calculation and linearity susceptibility estimations in white matter.It is interesting to observe that, for each method, the optimum regularization values for the different regions remains mainly unchanged, with the minimum of each curve corresponding to either the same regularization factor or minimal deviations when compared to the broadness of the curves.
Interestingly, when comparing the dependence of the calculation of the calcification moment (measured as -49.8 ppm on the high resolution model) as a function of the regularization parameter, it can be observed that the two evaluated methods have different optimum regularizations when compared to their optimum RMSE settings (iLSQR and FANSI needed a lower and a higher regularization factors, respectively).Figure 8c, shows the expected behavior where the greatest precision of the accuracy of deep gray matter susceptibility estimation (as measured by the slope between reconstructed average QSM value per deep gray matter structure and ground-truth) was obtained using the smallest amount of regularization (45,46).

Comparison of simulations including microstructure to real data
Figure 9 shows an example of the originally acquired and the simulated magnitude and phase data, respectively.Here it can be clearly seen that the grey-white matter contrast on both the phase data (second column) and tissue frequency (third column) is better approximated when the microstructural correction term is added to the data as expressed in Equation 7.

Discussion
In this paper and the accompanying shared dataset and code (described in greater detail in the Appendix section), we have presented and disseminated a realistic human brain phantom that can be used by both the QSM and the MRI communities to simulate multi-echo GRE data and evaluate QSM pipelines in a controlled manner.
The data and code provided allow users to: -create new susceptibility phantoms, with different levels of spatial modulation of each compartment -this can be simply performed by changing the values presented on Table 1 which control both the mean value and spatial modulation present within each tissue; -create realistic gradient echo multi-echo data at 7T and, to some extent, also at other fields (it should be noted unlike susceptibility, relaxation times are field dependent and their field dependence is tissue dependent).
-assess the impact of protocol changes (as well as nuisance factors such as RF phase, B0 shimming and noise) on the quality of the obtained QSM maps, -assess the impact of changing some of the background field removal and QSM algorithm specific options while having a ground-truth to test it against.
Such a digital phantom will also be important for QSM users and researchers deciding on protocols and collecting patient data.Protocol considerations such as impact of the number of echo times and its range, as well as the degree of T1 weighting and resolution on the ability to accurately measure QSM in a given brain region can be quickly tested in this framework.This would allow for example to extend the analysis done by Karsa et al (47) on the effects of field of view and anisotropic voxels sizes, or the analysis by Biondetti et al on the effects of Laplacian based single echo vs multiple echoes techniques (48).
While the simulation framework is useful for optimizing protocols, the simulation in its current form is a static one, and consequently flow artifacts (which tend to build up particularly when large number of echoes are used), respiration related B0 fluctuations (49) and spatial distortions associated with the readout bandwidth are not considered.The latter two would be relatively straightforward to implement.For example, respiration artifacts would require a library of respiration fields over time as acquired with field cameras, that would be resampled with the sequence TR the field would be added to the simulated down-sampled complex data, Fourier Transformed back to k-space in order to obtain the corrupted k-space line at a given time point.
Flow artifacts would be more complex to simulate because the current vessel segmentation does not distinguish arteries and veins and it would be difficult to have a local flow velocity and pulsatility estimation.
The dipole model used in QSM assumes a sphere of Lorentz approximation (30), which does not hold true, particularly in white matter.The phantom released for the RC2 purposes, explicitly circumvented this limitation by ensuring perfect consistency with the QSM model in Eq. 4, i.e.
no microstructural effects were present.With the released phantom dataset, we include diffusion data (both raw data on the 1.5mm space and its derivatives co-registered to the phantom space after distortion correction).These data can be used to either compute the frequency perturbation as shown in Figure 9, or the Hollow Cylinder Model can be used to explicitly introduce the echotime varying perturbation similarly to what has been recently done in the context of myelin water imaging (42).A critical challenge for more advanced modelling is the resolution of the diffusion acquisition, despite using state-of-the-art hardware and MR sequences.At 3T resolutions below 1.5mm isotropic are not commonly obtained in vivo (as a reference, a one hour acquisition is used in the human connectome project to a achieve a 1.25mm isotropic resolution(50)).One avenue to obtain higher resolution DWI data are novel 3D multi-slab acquisition with gradient or rf encoding (51,52) that have significant SNR improvements in respect to conventional 2D-SMS as used in this work.Here we have simply interpolated our 1.5mmDWI to the anatomical space and expect this to be sufficient to develop and validate QSM methods that account for microstructural effects in white matter (53).
QSM is gaining interest in the context of neurological disorders such as multiple sclerosis, PD and AD and other clinical applications such as hemorrhages or tumor imaging with iron oxide nanoparticles (54).For the latter applications, relaxation and susceptibility values in the form of, e.g., lesions in strategic locations can be easily added to the current phantom.Simulated data might be also relevant in the case of group studies of diseases, where differences in the deep grey matter nuclei were found (55,56) and the optimum QSM reconstruction parameters might improve the limits to detect those changes.Such question can be readily addressed with the digital phantom by changing the parameters of Eq. 1 for a given set of structures and test the QSM pipeline that can better quantify those changes.

Conclusion:
The presented realistic and modular phantom aims to enable researchers to optimize reconstruction as well as acquisition parameters.As such, the phantom served as a ground-truth for the QSM RC2.Its modular design allows adding microstructure effects a posteriori (14) as well as the inclusion of new nuisances such as hemorrhages or fine vessels with realistic relaxation and susceptibility properties.We foresee that this brain model will be an important tool for the evaluation of various processes associated with QSM processing and interpretation.

Appendix
On the data sharing collection: https://data.donders.ru.nl/login/reviewer-113366422/QRFk431i299BX-8bcY6ta6nQPP-MqSzG0DTIhkJrqBs the code used to create the phantom described in this paper, the various simulations and figures can be found.
Here we outline some of the main functionalities provided: -creation of a realistic magnetic susceptibility phantom; -simulation of a GRE data with a specific protocol; -evaluation of a QSM reconstruction using provided phantom; -adding microstructural information;

Creation of a realistic magnetic susceptibility phantom
Example code can be found on MacroCreateSusceptibilityPhantom.m.

Creation of simulated GRE data
Example code can be found in MacroCreateSimulationdata.m, with the main function being CreateSimulationData that takes as an input three Matlab structures.These Structures contain the information regarding the model parameters, the sequence parameters of the gradient echo sequence and the simulation parameters respectively.
The model parameters fields are: -Chimap_file -location of magnetic susceptibility map in ppm units and nifti format;; Additionally, MacroCreateSimulationdata allows the reproduction of figure 4.

QSM pipeline evaluation
The script MacroCreateQSMpipelineAndCompareHighResLowRes.m allows the creation of figures 5 and 6 of the current manuscript.Furthermore it shows how to create a data structure in order to perform a QSM reconstruction using SEPIA (sepiadocumentation.readthedocs.io/)and evaluate the output using the various metrics used in RC2 and introduced in the Methods section of this manuscript.By running the function EvaluateRecon_ChallengeFinalMetricsRelease that takes as input reconstruction file name and a structure with 4 fields: -Segment -Segmentation file at a matched resolution (output of CreateSimulationData); -chi_crop -Ground truth susceptibility map file obtained by k-space cropping (output of CreateSimulationData); -maskEroded -brain mask where evaluation is to be performed; -CalcMoment -calculated moment of the calcification performed on the high resolution susceptibility map; The output of the function contains the metrics described in methods section;

Comparison of simulations including real data to real data
The script MacroAddingMicrostructure contains the implementation of the code needed to generate Figure 9 using the already pre-processed DWI.The addition of the microstructure data is as described in Eq. 7.  27)) and doing the fitting described in Eq. 1. Model 2 mean susceptibility values were adhoc modification of those found in literature, while the R1 and R2* modulation were changed by plus and minus 10% respectively.

Figures
Figure 1 Diagram representing the process used to obtain the head segmentation: R1 map (obtained from the MP2RAGEME) was used to create an atlas-based segmentation using CBS tools; R2* were processed with a Frangi filter for vein segmentation; A semi-manual approach using ITK snap was used for segmentation of the deep gray matter nuclei based on the R2* and a susceptibility map computed using HEIDI, and the M0 map were used to segment the calcification.Finally, PETRA data was used to obtain air, bone, and tissue masks using a CT-based deep learning algorithm followed by manual ITK-snap.Then, the various tissue segmentations were fine-tuned using denoised R1 and R2* maps using manually defined thresholds.The various masks were combined to generate a whole-head segmentation with 16 different tissue types.
Figure 2 View of three slices in sagittal, coronal and axial directions of the two digital phantoms created for the QSM challenge 2.0.Top and Bottom maps were obtained using the parameters described on Table 1 for Model 1 and 2 respectively.presented in Table 1 and (c) finally adding the probabilistic modulation described in Eq. 2 and masking regions of error bound R2*. Green and yellow arrows highlight transitions between tissue types that get improved via the probabilistic approach applied to the tissue compartments.
Note that the probabilistic smoothing was only applied to the brain tissues and CSF, as a result, veins remain shape in the susceptibility map only with a reduced magnetic susceptibility.Blue arrow highlights regions where the large filed gradient masking approach was able to avoid abnormally large susceptibility values.

Figure 4 shows
Figure4shows Model 1 with the two different sequence parameters created by the proposed simulation toolbox (See Appendix 1 for code).The top row shows an example slice of the simulated data from the P2 protocol aimed at computing QSM in cortical grey and white matter.In this case, the longer echo time matched the T2* of those grey matter resulting in both significant signal decay in deep gray matter and a large number of phase wraps close to tissue air boundaries.The flip angle used (23 degrees) was set to increase T1 contrast in grey vs white The main function responsible by the creation of a new susceptibility phantom is, CreateOwnRealisticPhantom, which takes as an input a matlab structure with the various fields:-R1map_file -location of R1 map in s-1 in nifti format; -R2starmap_file -location of R1 map in s-1 in nifti format; -M0map_file -location of M0 map in arbitrary units and in nifti format; -Segmentation_file -Location of segmentation file in nifti format (each segmented tissue is associated with an integer); -ChiModulation_file -matlab file containing the various parameters relating to Equation 1 and shown on

-
R1map_file -location of R1 map in s-1 in nifti format; -R2starmap_file -location of R2* map in s-1 in nifti format;; -M0map_file -location of M0 map in arbitrary units in nifti format; -Segmentation_file -location of segmentation file in nifti format; -BrainMask_file = 'data/masks/BrainMask.nii.gz';Thesequence parameters fields describe the GRE contrast and are TR (repetition time, secs), TE (echo time), FlipAngle (flip angle, degrees).Finaly, the simulation parameters are :-Res -resolution of output in mm ; -B0 and B0_dir are the main static magnetic field strength (in Tesla) and its orientation in respect to the head model (if along z [0 0 1]); -PhaseOffset -multiplier term of a quadratic phase over the brain aimed at simulating the RF transmit phase (0 ensures no phase offset, 1 ensures a π phase difference inside the brain mask ;-Shimm -boolean variable to account B0 shimming with second order spherical harmonics (0 if no B0 shimms is applied, 1 -2nd order shimms are computed inside the brain mask and applied only when doing the simulations with the whole head model) -Output_dir -Directory where the simulation results will be written Running the function CreateSimulationData creates two directories inside Output_dir, SimulatedHR (where the native resolution simulation data is stored) and Simulated_1p0mm (where the downsampled data simulated data).For each protocol two simulations are performed: one with the full head susceptibility model, and for the brain masked susceptibility model [Eq.5].Simulation results are stored as 3D or 4D (depending on the number of echo times) Data[Extension]_magn.nii.gz and Data[Extension]_phase.nii.gzwhere the Extension field is either empty or is 'BrainExtracted'.Additionally the downsampled directory contains also the new brain mask (Brain.nii.gz), the downsampled Segmented nifti (FinalSegment.nii.gz) and the downsampled susceptibility maps (Chi_crop.nii.gz).

Figure 3
Figure 3 Intermediate stages of the in silico susceptibility head phantom creation: (a) traditional piece-wise constant approach, (b) modulated model as described in Eq. 1 with the values

Figure 4
Figure 4 Transverse slices of the simulated data of Model 1 using Protocol 1 (top row) and Protocol 2 (bottoms row).First two columns show magnitude and phase images, respectively, at the first echo time (where the different T1-weighting is clearly visible) and the last two columns show the magnitude and phase images associated with the last echo time of the respective protocol.

Figure 5
Figure 5 Transverse slices through derived field-maps associated with Protocol 1 data computed at the base resolution (top row) and after down-sampling to 1mm (bottom row), black highlight large background fields induced by air-tissue interfaces.The first and fifth columnsshows the field extracted from the complex signal when the whole-head and the brain-only models were used, respectively, to compute the frequency shift.The third column shows the local field computed after background field removal.The second, fourth and sixth columns show the differences relative to the corresponding ground-truth field distributions.Grey arrows highlight the incomplete background field removal that is exacerbated upon downsampling.The ground-truth field maps were computed using the forward dipole formulation (Eq.4) on the whole head (second column) and brain tissues alone (fourth and sixth columns) susceptibility models.It is clear that the nRMSE, once the down-sampling is performed, is dominated by partial volume effects in and around veins (noise pattern on both bottom 4 th and 6 th column) and imperfect background field removal.

Figure 6
Figure 6 (a) Coronal view of the susceptibility phantom with three locations highlighted (square: region in the middle of white matter; cross and plus: two regions close to the calcification).(b,c,d) Plots of the unwrapped phase at the three locations (b -cross, c -square and d -plus) as a function of echo time.Unwrapped phase on the high-resolution data (light grey) and lowresolution data (dark grey) are shown using the respective markers, while dashed lines corresponds to the fitted frequency for each point at each resolution and the continuous line shows the ground frequency evolutions.(e) Mean squared difference map between fitted phase (dashed line in panel b-d) and measured phase (after unwrapping) on the coronal slice, highlighting tissue bone interfaces as well as regions surrounding the calcification.

Figure 7
Figure 7 Shows various plots of the normalized RMSE as a functions of the regularization value (a,b,d,e).Plots (a) and (b) show the numerical performance of 4 different QSM algorithms when the input data was the high resolution simulated data with no additional noise or the downsampled 1mm data with added noise resulting in an image peak SNR of 100.Plots (d) and (e) show the performance of iLSQR and FANSI respectively for different types of input data: high resolution and 1mm data shown in solid and dashed lines respectively; without and with added noise shown in green and red.Panel (c) shows examples of the optimum nRMSE reconstructions of the 1mm dataset with peak SNR 100 using the 4 reconstruction algorithms tested in the manuscript.

Figure 8
Figure 8 (a-c) Plots of the various metrics as a function of the used regularization value for the iLSQR (continuous line) and FANSI (dashed line) algorithms.The different plots show: (a)

Figure 9
Figure 9 Shows the impact of adding a microstructure correction term to the simulated field map.First and second column show the simulated magnitude and phase data using whole brain susceptibility model at TE=20ms, while the third column shows the computed tissue frequency map (after brain masking and background field removal).First row shows the simulated data using Protocol 1 without the addition of the microstructure effects.The second row shows the impact of adding the microstructural effects as described in equation 7 (first column) both on the third echo time and the computed tissue frequency map.The third row shows, for visual comparison purposes, the experimental data acquired at 7T.

Table 1
Parameters used to create the two magnetic susceptibility head models released in the QSM challenge.The values in the three columns correspond to the parameters described in equation 1, the assumed mean magnetic susceptibility of the tissue, and the R2* (a tissue ) and R1 (a tissue ) modulation terms, respectively.Model 1 values were chosen from literature ( + (26), * (14) †