Hippocampal grey matter tissue microstructure does not explain individual differences in hippocampal-dependent task performance

Individual differences in scene imagination, autobiographical memory recall, future thinking and spatial navigation have long been linked with hippocampal structure in healthy people, although evidence for such relationships is, in fact, mixed. Extant studies have predominantly concentrated on hippocampal volume. However, it is now possible to use quantitative neuroimaging techniques to model different properties of tissue microstructure in vivo such as myelination and iron. Here we investigated whether performance on scene imagination, autobiographical memory, future thinking and spatial navigation tasks was associated with hippocampal grey matter tissue microstructure. MRI data were collected using a multi-parameter mapping protocol from a large sample of 217 young, healthy adult participants with widely-varying task performance. We found little evidence that hippocampal grey matter tissue microstructure was related to task performance. This was the case using different analysis methods (voxel-based quantification, partial correlations), when whole brain, hippocampal regions of interest, and posterior:anterior hippocampal ratios were examined, and across different participant sub-groups (divided by gender, task performance). Variations in hippocampal grey matter tissue microstructure may not, therefore, explain individual differences in performance on hippocampal-dependent tasks in young, healthy individuals.


Introduction
Variations in hippocampal structure within the healthy population have long been posited to influence performance on tasks known to be hippocampal-dependent, such as scene imagination, autobiographical memory recall, future thinking and spatial navigation. Extant studies have predominantly examined this relationship in terms of hippocampal grey matter volume. However, in reviewing the literature, Clark et al. (2020) found mixed evidence for an association between hippocampal grey matter volume and performance on tasks assessing these cognitive functions in healthy individuals. They then proceeded to examine this issue in-depth by collecting data from a large sample of 217 young, healthy, adult participants, but found little evidence that hippocampal grey matter volume was related to task performance.
It could be argued, however, that hippocampal volume is too blunt an instrument to consistently detect structure-function relationships in healthy people. By contrast, it is now possible to use quantitative neuroimaging techniques to model different properties of tissue microstructure (Weiskopf et al. 2015), using a multi-parameter mapping (MPM) quantitative neuroimaging protocol (Weiskopf et al. 2013; Callaghan et al. 2015; Callaghan et al. 2019).
Processing of these images (using the hMRI toolbox; Tabelow et al. 2019) results in four maps that are differentially (but not solely) sensitive to specific aspects of tissue microstructure: magnetisation transfer saturation (MT saturation), sensitive to myelination; proton density (PD), sensitive to tissue water content; the longitudinal relaxation rate (R1), sensitive to myelination, iron and water content (but primarily myelination); and the effective transverse relaxation rate (R2*), sensitive to tissue iron content. Extant studies have found relationships between myelination, iron and ageing (Draganski et al. 2011; Callaghan et al. 2014), verbal memory performance in older adults (Steiger et al. 2016), and meta-cognitive ability in young adults (Allen et al. 2017). However, as far as we are aware, no studies have investigated the relationship between hippocampal grey matter tissue microstructure and scene imagination, autobiographical memory recall, future thinking or navigation ability in healthy young people.
Consequently, this is what we sought to examine in the current study.
We used the large dataset (n = 217) from the Clark et al. (2020) study which comprised an MPM quantitative imaging protocol (Weiskopf et al. 2013; Callaghan et al. 2015; Callaghan et al. 2019), and cognitive task performance with wide variability. While aspects of these data (hippocampal volume, cognitive task performance) have been reported before (Clark et al. 2019, 2020; Clark and Maguire 2020), the tissue microstructure MRI data have not been published previously. The mixed literature relating to hippocampal volume and the dearth of hippocampal tissue microstructure studies made the formulation of clear hypotheses difficult.
As such, we focussed on conducting deep and wide-ranging data analyses to characterise any links between the microstructure measures and task performance in the same manner as Clark et al. (2020).

Participants
Two hundred and seventeen people (mean age 29.0 years ± 5.60) were recruited from the general population, 109 females and 108 males. The age range was restricted to 20-41 years old to limit the possible effects of ageing. Participants had English as their first language and reported no psychological, psychiatric or neurological health conditions. People with hobbies or vocations known to be associated with the hippocampus (e.g. licensed London taxi drivers) were excluded. All participants gave written informed consent and the study was approved by the University College London Research Ethics Committee (project ID: 6743/001).

Procedure
Participants completed the study over three visits: structural MRI scans were acquired during the first visit, and cognitive testing was conducted during visits two and three.

Cognitive tasks and statistical analyses
All tasks are published and were performed and scored as per their published use. Full descriptions are also provided in Clark et al. (2019, 2020) and Clark and Maguire (2020).

Details of the double scoring for the current study are provided in the Supplementary Methods Tables S1-S4. In brief, there were four tasks: (1) Scene imagination was tested using the scene construction task (Hassabis et al. 2007) which measures the ability to mentally construct visual scenes. The main outcome measure is the "experiential index", while the sub-measures of interest are content scores and a rating of the spatial coherence of scenes. (2) Autobiographical memory recall was tested using the autobiographical interview (AI; Levine et al. 2002), which measures the ability to recall past experiences over four time periods from early childhood to within the last year. The two main outcome measures are the number of "internal" and "external" details. Internal details are of interest here because they describe the event in question (i.e. episodic details) and are thought to be hippocampal-dependent. Sub-measures are the separate content categories that comprise the internal details outcome measure, and also AI vividness ratings. (3) The future thinking task follows the same procedure as the scene construction task, but requires participants to imagine three plausible future scenes involving themselves (an event at the weekend; next Christmas; the next time they will meet a friend).
(4) Navigation ability was assessed using the paradigm described by Woollett and Maguire (2010). A participant watches movie clips of two overlapping routes through an unfamiliar real town four times. The main outcome measure is the combined scores from the five sub-measures used to assess navigational ability which are: movie clip recognition, recognition memory for scenes, landmark proximity judgements, route knowledge where participants place scene photographs from the routes in the correct order as if travelling through the town, and the drawing of a sketch map. Data were summarised using means and standard deviations, calculated in SPSS v22.

MRI data acquisition and preprocessing
Three Siemens Magnetom TIM Trio MRI systems with 32 channel head coils were used to collect the structural neuroimaging data. All scanners were located at the same imaging centre, running the same software.
Whole brain images at an isotropic resolution of 800 μm were obtained using an MPM quantitative imaging protocol (Weiskopf et al. 2013; Callaghan et al. 2015; Callaghan et al. 2019). This consisted of three multi-echo gradient echo acquisitions with either PD, T1 or MT weighting. Each acquisition had a repetition time, TR, of 25 ms. PD weighting was achieved with an excitation flip angle of 6°, which was increased to 21° to achieve T1 weighting. MT weighting was achieved through the application of a Gaussian RF pulse 2 kHz off resonance with 4 ms duration and a nominal flip angle of 220°. This acquisition had an excitation flip angle of 6°. The field of view was 256 mm head-foot, 224 mm anterior-posterior (AP), and 179 mm right-left (RL). The multiple gradient echoes per contrast were acquired with alternating readout gradient polarity at eight equidistant echo times ranging from 2.34 to 18.44 ms in steps of 2.30 ms using a readout bandwidth of 488 Hz/pixel. Only six echoes were acquired for the MT weighted volume to facilitate the off-resonance pre-saturation pulse within the TR. To accelerate the data acquisition, partially parallel imaging using the GRAPPA algorithm was employed in each phase-encoded direction (AP and RL) with forty integrated reference lines and a speed up factor of two. Calibration data were also acquired at the outset of each session to correct for inhomogeneities in the RF transmit field (Lutti et al. 2010; Lutti et al. 2012).
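The echo time scheme described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name is hypothetical, and it assumes that the six echoes of the MT-weighted volume are the first six of the same equidistant series, which the protocol description does not state explicitly.

```python
def echo_times_ms(n_echoes, first_te=2.34, spacing=2.30):
    """Return n_echoes equidistant echo times in ms, rounded to 2 decimals."""
    return [round(first_te + i * spacing, 2) for i in range(n_echoes)]


# PD- and T1-weighted volumes: eight echoes, 2.34 ms to 18.44 ms.
pd_t1_echoes = echo_times_ms(8)

# MT-weighted volume: six echoes only (assumed to be the first six of the
# same series; this is an assumption, not stated in the protocol text).
mt_echoes = echo_times_ms(6)

print(pd_t1_echoes)
print(mt_echoes)
```

Under that assumption the last MT-weighted echo falls well before the end of the 25 ms TR, which is consistent with the stated need to leave room for the pre-saturation pulse.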
Data were processed using the hMRI toolbox (Tabelow et al. 2019) within SPM12 (www.fil.ion.ucl.ac.uk/spm). The default toolbox configuration settings were used, with the exception that correction for imperfect spoiling was additionally enabled (see also Callaghan et al. 2019). This processing results in the MT saturation, PD, R1 and R2* maps which differentially reflect tissue microstructure (Fig. 1).
Each participant's MT saturation map was then segmented into grey and white matter probability maps using the unified segmentation approach (Ashburner and Friston 2005), but using the tissue probability maps developed by Lorio et al. (2016) and no bias field correction (since the MT saturation map shows negligible bias field modulation). The output grey and white matter probability maps were used to perform inter-subject registration using DARTEL, a nonlinear diffeomorphic algorithm (Ashburner 2007). The resulting DARTEL template and deformations were used to normalize the MT saturation, PD, R1 and R2* maps to the stereotactic space defined by the Montreal Neurological Institute (MNI) template (at 1 x 1 x 1 mm resolution), but without modulating by the Jacobian determinants of the deformation field in order to allow for the preservation of the quantitative values. Finally, a tissue weighted smoothing kernel of 4 mm full width at half maximum (FWHM) was applied using the voxel-based quantification approach (VBQ; Draganski et al. 2011), which again aims to preserve the quantitative values.

Primary VBQ analyses
Our analyses followed the exact same procedures as detailed in Clark et al. (2020) except that here we assessed each of the tissue microstructure maps using VBQ (Draganski et al. 2011).
VBQ is a similar methodology to the voxel-based morphometry technique used to study grey matter volume (Ashburner and Friston 2000) but one that retains the quantitative values carrying information about the tissue microstructure.
First, we examined the relationship between hippocampal grey matter in each of the tissue microstructure maps and the main outcome measure for each of the cognitive tasks assessing scene imagination, autobiographical memory, future thinking and navigation. We then examined the associations between each of the sub-measures from these tasks and hippocampal grey matter in each of the four tissue microstructure maps. Statistical analyses were carried out using multiple linear regression models with cognitive task performance as the measure of interest, while including covariates for age, gender, total intracranial volume, and the different scanners. The dependent variable was the smoothed and normalised grey matter value from each tissue microstructure map. Whole brain VBQ analyses were carried out for each tissue microstructure map. Two-tailed t-tests were used, with statistical thresholds applied at p < 0.05 family-wise error (FWE) corrected for the whole brain, and a minimum cluster size of 5 voxels.
We also performed region of interest (ROI) analyses on bilateral anterior, posterior and whole hippocampal masks using two-tailed t-tests. Voxels were regarded as significant when falling below an initial whole brain uncorrected voxel threshold of p < 0.001, and then a small volume correction threshold of p < 0.05 FWE corrected for each mask, with a minimum cluster size of 5 voxels.

Auxiliary analyses using extracted hippocampal microstructure measurements
These auxiliary analyses were performed using the hippocampal grey matter tissue microstructure measurements that were extracted for each participant from each tissue microstructure map using 'spm_summarise'. Whole, anterior and posterior bilateral anatomical hippocampal masks were applied to each participant's smoothed and normalised grey matter MT saturation, PD, R1, and R2* maps, and the average value within each mask extracted. We also calculated each participant's posterior:anterior hippocampal ratio for each tissue microstructure measurement (Poppenk and Moscovitch 2011).
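The extraction step can be illustrated with a minimal sketch. This is not the 'spm_summarise' routine itself: the function names and toy values below are hypothetical, the map is represented as a flattened list of voxel values, and the ratio is assumed to be the posterior mask mean divided by the anterior mask mean.

```python
def masked_mean(values, mask):
    """Average the map values at voxels where the binary mask is non-zero."""
    selected = [v for v, m in zip(values, mask) if m]
    if not selected:
        raise ValueError("mask selects no voxels")
    return sum(selected) / len(selected)


def posterior_anterior_ratio(values, posterior_mask, anterior_mask):
    """Ratio of mean posterior to mean anterior values (assumed definition)."""
    return masked_mean(values, posterior_mask) / masked_mean(values, anterior_mask)


# Hypothetical flattened map with six voxels and two non-overlapping masks.
toy_map = [0.8, 1.0, 1.2, 0.9, 1.1, 0.5]
anterior = [1, 1, 0, 0, 0, 0]
posterior = [0, 0, 1, 1, 0, 0]

ratio = posterior_anterior_ratio(toy_map, posterior, anterior)
print(ratio)
```

A ratio above 1 would indicate a higher average value in the posterior than in the anterior mask for that map.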
We first performed partial correlations between the extracted tissue microstructure metrics and the cognitive task performance measures. Then, we investigated the effects of gender, used median split direct comparisons and partial correlations, and compared the best and worst performers (the top and bottom 10%). As in Clark et al. (2020), statistical correction was made using false discovery rate (FDR; Benjamini and Hochberg 1995), with an FDR of p < 0.05 allowing for 5% false positive results across the tests performed, and calculated using the resources provided by McDonald (2014). Age, gender, total intracranial volume and MRI scanner were included as covariates.
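For readers unfamiliar with these two steps, the following sketch shows one way a partial correlation and the Benjamini-Hochberg procedure can be computed. It is an illustration under stated assumptions rather than the analysis code used here: the function names and toy data are hypothetical, the partial correlation uses the standard recursive formula, and a categorical covariate such as scanner would first need to be dummy coded.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y controlling for a list of covariates,
    using the recursive partial correlation formula."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests survive FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_rank = rank  # keep the largest rank meeting the criterion
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        survives[idx] = rank <= largest_rank
    return survives


# Hypothetical toy data: x and y are related only through the covariate z.
z = [-1, -1, 1, 1]
x = [0, -2, 2, 0]
y = [0, -2, 0, 2]
print(pearson(x, y), partial_corr(x, y, [z]))

# Hypothetical p-values from a family of eight tests.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```

In the toy data the raw correlation between x and y disappears once z is controlled for, which is the situation the covariates are intended to guard against.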

Validation across the tissue microstructure maps
The maps are not completely independent since they are estimated from the same three multi-echo gradient echo acquisitions. As such, relationships exist between the tissue microstructure maps, and a finding in one map can be used to validate a finding in another. For example, if a positive association is observed between task performance and the hippocampus in the MT saturation map, then a corresponding positive association would also be expected in the hippocampus when using the R1 map (since increased macromolecular content will also increase R1), and a corresponding negative association would be expected in the PD map (due to a reduction in free water content as the macromolecular content increases; Mezer et al. 2013). Consequently, following the finding of a relationship in one map, we also examined whether corresponding relationships existed in the other maps, even at a more liberal statistical threshold (p < 0.001 uncorrected). Observing related associations across multiple maps was deemed supportive of a true result, while finding a correlation in only one of the tissue microstructure maps was regarded as unreliable.
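The validation logic can be expressed as a simple sign-consistency check. The sketch below is hypothetical in its names and structure, encodes only the relationships given in the example above (MT saturation and R1 in the same direction, PD in the opposite direction to both), and assumes that one corresponding association in another map is enough to count as support.

```python
# Expected relationship between maps: +1 = same direction, -1 = opposite.
# Only the pairs described in the example above are encoded; any other
# pair is treated as undefined.
EXPECTED_RELATION = {
    frozenset(("MT", "R1")): +1,
    frozenset(("MT", "PD")): -1,
    frozenset(("R1", "PD")): -1,
}


def expected_sign(found_map, found_sign, other_map):
    """Sign expected in other_map given a finding in found_map, or None."""
    relation = EXPECTED_RELATION.get(frozenset((found_map, other_map)))
    return None if relation is None else found_sign * relation


def is_validated(found_map, found_sign, other_findings):
    """other_findings maps a map name to the sign (+1 or -1) observed at the
    liberal threshold, or 0 if nothing was observed. Assumption: a single
    corresponding association counts as support."""
    for other_map, observed in other_findings.items():
        expected = expected_sign(found_map, found_sign, other_map)
        if expected is not None and observed == expected:
            return True
    return False


# A positive MT saturation finding with a matching positive R1 finding.
print(is_validated("MT", +1, {"R1": +1, "PD": 0}))
# A finding present in one map only.
print(is_validated("MT", -1, {"R1": 0, "PD": 0}))
```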

Cognitive task performance
A summary of the outcome measures for the cognitive tasks is shown in Table 1. A wide range of scores was obtained for all variables with the exception of navigation movie clip recognition, where performance was close to ceiling.
Insert Table 1 about here

Primary VBQ analyses
As our main focus was on the relationship between cognitive task performance and hippocampal grey matter tissue microstructure, here we report findings pertaining to only the hippocampus; any regions identified outside the hippocampus are reported in the Supplementary Results. No significant relationships between cognitive task performance and hippocampal grey matter tissue microstructure were identified for any of the main outcome measures of the tasks assessing scene imagination, autobiographical memory, future thinking or navigation. This was also the case for the sub-measures of these tasks.

Hippocampal ROI VBQ
No relationships between cognitive task performance and hippocampal grey matter tissue microstructure were identified using any of the hippocampal masks for the main outcome measures from the tasks examining scene imagination, autobiographical memory, future thinking or navigation. Considering the task sub-measures, it was either the case that they were not associated with any measure of hippocampal grey matter tissue microstructure, or the results were not validated across the tissue microstructure maps; correlations associated with only one map are reported in the Supplementary Results.

Auxiliary analyses using extracted hippocampal microstructure measurements
No relationships were identified between any of the extracted hippocampal grey matter tissue microstructure metrics and performance for any of the main or sub-measures of the tasks (see Supplementary Results Tables S5-S8 for details). These partial correlation findings therefore support those of the primary VBQ analyses.
Similarly, there were no significant effects of gender (Supplementary Results Tables S9-S16), and no significant results using median split direct comparisons (Supplementary Results Tables S17-S21) or partial correlations (Supplementary Results Tables S22-S29), or when the best and worst performers were compared (Supplementary Results Tables S30-S34).

Discussion
In this study we moved beyond hippocampal grey matter volume to examine hippocampal grey matter tissue microstructure, including quantitative neuroimaging biomarkers of myelination and iron, and whether they were linked with performance on tasks known to be hippocampal-dependent. We found little evidence for any associations between these measures and scene imagination, autobiographical memory recall, future thinking and spatial navigation. This is despite having a large sample with wide-ranging performance on the cognitive tasks, using different analysis methods (voxel-based quantification, partial correlations), examining whole brain and hippocampal regions of interest, and different participant sub-groups (divided by gender, task performance). Variations in hippocampal grey matter tissue microstructure seem not, therefore, to be significantly related to hippocampal-dependent task performance in young, healthy individuals.
Quantitative MRI and the examination of tissue microstructure is a relatively new area of study and therefore comparison of our results with other studies in that domain is difficult.
However, changes in tissue microstructure with ageing have been documented (e.g. Draganski et al. 2011; Callaghan et al. 2014; Carey et al. 2018), and hippocampal grey matter tissue microstructure has also been associated with metacognitive ability (Allen et al. 2017). Grey matter tissue microstructure, including that of the hippocampus, has, therefore, been previously correlated with individual differences, even if none were identified here. While null results can, of course, be difficult to interpret and an absence of evidence is not necessarily evidence of absence, we believe the depth and breadth of our analyses permit confidence in these results, aligning as they do with other null findings relating to hippocampal volume (e.g. Maguire et al. 2003; Van Petten 2004; Weisberg et al. 2019; Clark et al. 2020). This is not to say that relationships between hippocampal structure and the cognitive

Compliance with ethical standards
Conflicts of interest: The authors declare they have no conflict of interest.
For the MT saturation and PD maps this is as percent units (p.u.) and for the R1 and R2* maps this is per second (s⁻¹).

Supplementary Methods
Double scoring was performed on 20% of the cognitive data. We took the most stringent approach to identifying across-experimenter agreement. Inter-class correlation coefficients with a two-way random effects model looked for absolute agreement among the experimenters. For reference, a score of 0.8 or above is considered excellent agreement beyond chance.
Note. Inter-class correlation coefficients from a two-way random effects model looking for absolute agreement for each content score and for the quality ratings. Four experimenters scored the whole data set (n = 217 participants, 1519 individual scenes) with double scoring performed on 20% of the data (n = 44 participants, 308 scenes) proportionally for each original experimenter.
Note. Inter-class correlation coefficients from a two-way random effects model looking for absolute agreement for each score on the autobiographical interview. Three experimenters scored the whole data set (n = 217 participants, 1085 individual memories) and double scoring was performed on 20% of the data (n = 43 participants, 215 individual memories) proportionally for each original experimenter.
Note. Inter-class correlation coefficients from a two-way random effects model looking for absolute agreement for each content score and for the quality ratings. Four experimenters scored the whole data set (n = 217 participants, 651 individual future scenes) with double scoring performed on 20% of the data (n = 44 participants, 132 future scenes) proportionally for each original experimenter.
Note. Inter-class correlation coefficients from a two-way random effects model looking for absolute agreement for each score on the navigation sketch maps. Three experimenters scored the whole data set (n = 217) and double scoring was performed on 20% of the data (n = 42 participants) proportionally for each original experimenter.
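As an illustration of the agreement statistic, the sketch below computes a correlation coefficient of this kind from a two-way layout of items by raters. It is not the code used for the double scoring: the function name and toy scores are hypothetical, and it assumes the single-measure form of the coefficient, since the text does not say whether single or average measures were used.

```python
def icc_absolute_agreement(ratings):
    """Single-measure coefficient from a two-way random effects model with
    absolute agreement. `ratings` is a list of rows, one per scored item,
    each row holding the scores given by the k raters."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores for five items from an original and a second scorer.
toy_scores = [[4, 5], [7, 7], [2, 3], [9, 8], [6, 6]]
print(icc_absolute_agreement(toy_scores))
```

Because absolute agreement is required, a systematic offset between scorers lowers the coefficient even when their scores are perfectly correlated, which is why this is the more stringent choice.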

Primary analyses: VBQ results outside of the hippocampus
A small number of relationships were identified between cognitive task performance and grey matter microstructure outside of the hippocampus (when using a statistical threshold of p < 0.05 FWE whole brain corrected).
Within the scene construction outcome measures, one potential relationship was observed: a positive association between scene construction sensory details and 15 voxels in the left middle occipital cortex in the PD map (peak coordinates = -42 -87 17, peak t = 5.30, p FWE whole brain corrected = 0.018). However, no corresponding negative associations with MT or R1, nor any corresponding relationships with R2*, were identified, even when reducing the statistical threshold to p < 0.001 uncorrected.
For future thinking, two potential associations were found. First, a positive relationship was observed in the PD map between 36 voxels in the right superior occipital gyrus and the future thinking experiential index (peak coordinates = 25 -90 33, peak t = 5.51, p FWE whole brain corrected = 0.007). Reducing the statistical threshold to p < 0.001 uncorrected identified a corresponding negative association between the right superior occipital gyrus and the future thinking experiential index in the R2* map (cluster size = 808, peak coordinates = 25 -89 35, peak t = 4.53, p uncorrected < 0.001). Overall, higher water content in the right superior occipital gyrus seems to be associated with greater future thinking experiential index scores.
Second, a positive correlation was found in the R1 map between 5 voxels in the left middle cingulate cortex and the future thinking experiential index (peak coordinates = 0 -20 45, peak t = 5.19, p FWE whole brain corrected = 0.031). Reducing the statistical threshold to p < 0.001 uncorrected identified a corresponding negative association between the left middle cingulate cortex and the future thinking experiential index in the PD map (cluster size = 28, peak coordinates = -1 -20 45, peak t = 3.77, p uncorrected < 0.001). An increase in macromolecular content and corresponding reduction in free water content in the middle cingulate cortex may, therefore, be associated with greater future thinking experiential index scores.
Considering navigation, for the navigation movie clip recognition task, two clusters were identified on the edge of the right occipital pole in both the PD and the R2* maps. In the PD map, these relationships were positive (Cluster 1: size = 7 voxels, peak coordinates = 32 -95 15, peak t = 5.23, p = 0.024; Cluster 2: size = 10 voxels, peak coordinates = 26 -97 20, peak t = 5.21, p = 0.026), while in the R2* map the corresponding negative relationships were observed (Cluster 1: size = 40 voxels, peak coordinates = 32 -95 15, peak t = 5.55, p = 0.006; Cluster 2: size = 18 voxels, peak coordinates = 25 -98 19, peak t = 5.48, p = 0.009). A decrease in iron and corresponding increase in free water content in the right occipital pole may, therefore, be related to higher navigation movie clip recognition performance.
For the navigation scene recognition task, there was a positive relationship between performance and 36 voxels in the right cuneus in the R2* map (peak coordinates = 17 -60 10, peak t = 5.83, p FWE whole brain corrected = 0.002). However, no corresponding associations in the MT saturation, PD or R1 maps were identified in the right cuneus, even when reducing the statistical threshold to p < 0.001 uncorrected.

Hippocampal ROI VBQ: significant results that were not validated across the tissue microstructure maps
Within the scene construction sub-measures one potential relationship was identified. A negative association was found between the scene construction spatial coherence index and 77 voxels in the left hippocampus in the PD map when using the bilateral posterior hippocampal mask (peak coordinates = -19 -41 4, peak z = 3.79, p FWE posterior hippocampus ROI corrected = 0.033).
However, this relationship was not significant when correcting for the bilateral whole hippocampal mask (p FWE whole hippocampus ROI corrected = 0.054). In addition, no corresponding associations were identified between the scene construction spatial coherence index scores and the hippocampus in the MT saturation, R1 or R2* maps, even when reducing the statistical threshold to p < 0.001 uncorrected.
Within the autobiographical memory sub-measures, three potential associations were observed. First, a positive association was identified between AI emotion and a cluster of 45 voxels in the left hippocampus in the PD map when using the bilateral anterior hippocampal mask (peak coordinates = -26 -9 -27, peak z = 3.64, p FWE anterior hippocampus ROI corrected = 0.033).
However, this relationship was not significant when correcting for the whole hippocampus mask (p FWE whole hippocampus ROI corrected = 0.073). In addition, no corresponding associations were observed in the hippocampus when using the MT saturation, R1 or R2* maps, even when reducing the statistical threshold to p < 0.001 uncorrected.
Second, a negative association was observed between AI vividness ratings and a cluster of 37 voxels in the right hippocampus in the MT saturation map when using the bilateral whole hippocampal mask (peak coordinates = 34 -20 -12, peak z = 4.11, p FWE whole hippocampus ROI corrected = 0.018), split approximately equally between the anterior and posterior hippocampal masks (anterior cluster: cluster size = 21 voxels, peak coordinates = 34 -20 -12, peak z = 4.11, p FWE anterior hippocampus ROI corrected = 0.008; posterior cluster: cluster size = 16 voxels, peak coordinates = 34 -21 -12, peak z = 3.87, p FWE posterior hippocampus ROI corrected = 0.027). However, no corresponding associations with the hippocampus were found in the R1, PD or R2* maps, even when reducing the statistical threshold to p < 0.001 uncorrected.
AI vividness was also negatively associated with a cluster of 56 voxels in the left hippocampus in the PD map following correction for the bilateral whole hippocampal mask (peak coordinates = -28 -27 -9, peak z = 4.12, p FWE whole hippocampus ROI corrected = 0.014), localised to the posterior hippocampus (p FWE posterior hippocampus ROI corrected = 0.008). However, no corresponding associations were found between AI vividness and the hippocampus in the MT saturation, R1 or R2* maps, even when reducing the statistical threshold to p < 0.001 uncorrected.
Within the future thinking sub-measures, two potential relationships were identified. A positive association was observed between future thinking spatial references and 33 voxels in the left hippocampus in the R1 map when using the bilateral posterior hippocampal mask (peak coordinates = -18 -22 -21, peak z = 3.75, p FWE posterior hippocampus ROI corrected = 0.032). However, this relationship was not significant when correcting for the bilateral whole hippocampus mask (p FWE whole hippocampus ROI corrected = 0.052). Furthermore, no corresponding associations were observed in the hippocampus in any of the other tissue microstructure maps, even when reducing the statistical threshold to p < 0.001 uncorrected.
Second, a negative association was found between the future thinking spatial coherence index and a cluster of 54 voxels in the right posterior hippocampus in the MT saturation map when using the bilateral posterior hippocampus mask (peak coordinates = 35, -29, -13, peak z = 3.76, p FWE posterior hippocampus ROI corrected = 0.039). However, this relationship was not significant when correcting for the bilateral whole hippocampus mask (p FWE whole hippocampus ROI corrected = 0.064) and the corresponding associations were not observed in the hippocampus when using any of the other tissue microstructure maps, even when reducing the statistical threshold to p < 0.001 uncorrected.