Recall bias during adolescence: gender differences and associations with depressive symptoms

Background: There is a sharp increase in depression in females in mid-adolescence, but we do not understand why. Cognitive theories suggest that people with depression have negative biases in recalling self-referential information. We tested whether recall biases were more negative in girls in early and mid-adolescence and were associated with depressive symptoms. Methods: 315 young and 263 mid-adolescents (11-12 and 13-15 years) completed a surprise test, assessing recall of social evaluation about the self (self-referential) or another person (other-referential). The short Mood and Feelings Questionnaire measured depressive symptoms. We tested the effects of condition (self-referential/other-referential), valence (positive/negative), gender, and age group on correct recall (hits) and associations with depressive symptoms. Results: There was no evidence for gender or age differences in positive or negative self-referential recall. Self-referential positive hits were negatively associated with depressive symptoms (adjusted coefficient=-0.38, 95% CI=-0.69 to -0.08, p=0.01). Self-referential negative hits were positively associated with depressive symptoms (adjusted coefficient=0.45, 95% CI=0.15 to 0.75, p=0.003), and this association was stronger in females (adjusted interaction p=0.04). Limitations: The reliability and validity of the recall task are unknown. We cannot provide evidence of a causal effect of recall on depressive symptoms in this cross-sectional study. Conclusions: Adolescents who recalled more self-referential negative and fewer self-referential positive words had more severe depressive symptoms. Females did not demonstrate more recall biases, but the association between self-referential negative hits and depressive symptoms was stronger in females. Negative self-referential recall may be a risk factor for depressive symptoms and is a good candidate for longitudinal studies.


Introduction
Throughout adulthood, women are twice as likely to experience depression as men (Salk et al., 2017). This gender difference emerges due to a sharp increase in the incidence of depression in girls relative to boys during mid-adolescence (Hankin et al., 1998; Kwong et al., 2019). We do not understand why this increase in depression occurs but, in order to prevent it, we must identify modifiable risk factors.
According to classic cognitive models of depression, depressed individuals have negative thoughts and beliefs about themselves and the world, which result from early experiences and influence information processing (how you interpret, learn from, and remember your environment; Beck, 1979, 2008; Roiser et al., 2012). Consistent with this, there is evidence that people with depression have reduced positive, or increased negative, information processing, measured using cognitive tasks (Beck, 2008; Roiser and Sahakian, 2017).
Socialisation and gender inequalities during childhood may cause girls to have more negative thoughts and beliefs about themselves (self-schema), and more negative information processing, compared to boys (Bone et al., 2020). Negative self-schema and information processing biases may be more prevalent in girls from early adolescence and contribute to the increase in the incidence of depression (Bone et al., 2020). Alternatively, negative information processing biases may become more prevalent in girls from mid-adolescence, alongside increases in depression.
Memory is an important aspect of information processing. Self-referential memory is usually tested by asking individuals to rate whether positive and negative personality characteristics describe themselves, followed by a surprise recall test in which participants are asked to remember as many characteristics as possible. Recall biases may be consistent with self-schema, as information about the self is preferentially remembered compared to information about others (the self-reference effect; Rogers et al., 1977; Symons and Johnson, 1997). Healthy adolescents generally recall more positive than negative self-referential information, which may reduce their risk of depression (Cole et al., 2014; Connolly et al., 2016; Dainer-Best et al., 2018; Fattahi Asl et al., 2015; Hammen and Zupan, 1984; Kuiper and MacDonald, 1982; Prieto et al., 1992; Taylor and Ingram, 1999; Timbremont and Braet, 2004).
Self-referential recall may be particularly important in adolescence because the social self-concept develops during this period. Adolescents become more aware of, and concerned with, other people's opinions of them (Parker et al., 2006; Sebastian et al., 2008). Self-evaluations become more negative and self-esteem declines sharply, particularly in girls (Robins and Trzesniewski, 2005; van der Aar et al., 2018). Negative self-referential recall biases may lead to increased depressive symptoms in adolescence, and this risk factor may be more prevalent in girls (Bone et al., 2020). It is unclear whether this risk factor would be present from early adolescence or emerge during adolescence.
However, a review did not find consistent evidence of recall biases in adolescent depression (Platt et al., 2017). Depressive symptoms have been associated with poorer recall of positive information, greater recall of negative information, or a combination of both biases (Alloy et al., 2012; Asarnow et al., 2014; Fattahi Asl et al., 2015; Gençöz et al., 2001; Orchard and Reynolds, 2018; Speed et al., 2016; Woolgar and Tranah, 2010). Others have found no evidence for an association between recall biases and depressive symptoms (Dainer-Best et al., 2018; Holt et al., 2016; Reid et al., 2006). The inconsistent evidence may be due to methodological limitations. Many studies use small samples and divide participants into groups according to presence of symptoms or risk of depression, limiting statistical power. It is generally accepted that depression is a continuum (Hankin et al., 2005). Using depressive symptoms continuously in analyses should increase the sensitivity to detect any associations with recall.
Very few studies have tested whether negative recall biases are more prevalent in girls during adolescence. A longitudinal cohort study found evidence that girls had more positive recall than boys around age 13, but there were no gender differences in negative recall (McArthur et al., 2019). Changes in recall bias (from 13 to 19 years) did not differ according to gender. This study did not measure depressive symptoms so could not test whether they were associated with recall bias (McArthur et al., 2019).
In this study, we addressed these issues using a novel recall task in a large cross-sectional study (n=578). Adolescents were recruited from two age groups (young adolescents aged 11-12 years, mid-adolescents aged 13-15 years) to study recall biases before and after the gender difference in depression begins to emerge (Kwong et al., 2019). Depressive symptoms ranged from mild to severe. As previous findings with the traditional recall task are inconsistent, we developed a novel test of recall of social evaluation. Social evaluation was positive and negative personality traits, seen in a task where participants learned whether they or another person were liked or disliked. We measured recall of positive and negative words which were seen describing the self (self-referential) or another person (other-referential).
We aimed to test whether negative biases in recalling self-referential social evaluation were more prevalent in girls and were associated with depressive symptoms. To do this, we tested hypotheses relating to overall recall biases, gender differences, and associations with depressive symptoms. We hypothesised that, overall, adolescents would have a positive self-referential bias, recalling more self-referential than other-referential words (hypothesis 1), and more self-referential positive than self-referential negative words (hypothesis 2). We expected girls to demonstrate more negative self-referential recall biases than boys, recalling fewer self-referential positive and more negative words (hypothesis 3). We hypothesised that this gender difference would be present from early adolescence, so would not differ across age groups (hypothesis 4). We also expected positive self-referential recall biases to be negatively associated with depressive symptoms (hypothesis 5). Finally, we hypothesised that this association with depressive symptoms would be consistent across genders and age groups (hypothesis 6).

Participants
Participants were recruited from two age groups, Year 7 (11-12 years old) and Years 9-10 (13-15 years old), from eight diverse mixed-gender secondary schools across London. We sampled from two separate groups, maximising power to test gender differences before and after the age at which rates of depression start increasing (Kwong et al., 2019). To show a minimal difference of 0.4 standard deviations in recall between males and females (α=0.05, power=80%), a sample of 320 adolescents was needed. To test gender differences within both age groups, we aimed to recruit 640 participants in total. There were no restrictions on whether adolescents had any mental or physical health problems or were receiving psychotropic medication or psychological therapy.

Ethical approval
Ethical approval was obtained from University College London (project 3453/001). Informed consent was provided by all parents/carers of participants and informed assent was provided by all participants. Participants' parents provided informed opt-in or opt-out consent, depending on the school their child was attending. Only seven parents chose to opt out (2% of those contacted). All procedures complied with the ethical standards of the relevant committees on human experimentation, the Helsinki Declaration (2008 revision), and the General Data Protection Regulation.
Participants could opt-in to a prize draw for a £50 Amazon voucher after completing questionnaires at home.

Surprise recall test
Incidental memory was assessed using a surprise recall test. This differed from previous tests, which ask participants whether personality characteristics describe themselves, then measure recall of words classified as self-referential. In this novel task, we tested recall of personality descriptors previously seen in a social evaluation learning task. This method differentiated recall of self-referential and other-referential information from social interactions. This allowed us to test whether recall of all social evaluation was associated with gender and depressive symptoms, or whether associations were specific to self-referential information.
The social evaluation learning task was a two-alternative forced choice task based on probabilistic stimulus-reward learning tasks (Button et al., 2015). Participants learnt whether a person was liked or disliked by a computer character. Learning occurred in two conditions: about the participant themselves (self-referential) or about another person (other-referential). On each trial, a positive and negative word pair was presented (e.g. funny/grumpy). Participants were asked to choose the word which best corresponded to what the character thought about them (self-referential) or the other person (other-referential). Participants received probabilistic feedback about whether this choice was correct and used trial and error to learn whether the character liked or disliked them (or the other person) over 20 trials (feedback contingency 80%; Fig. 1). For each character, one of two social rules was learnt: person is liked or disliked by the character. There were thus four blocks in this task: self-like, self-dislike, other-like, and other-dislike. Twenty word pairs were seen for the self, and 20 for the other person, with each word pair seen twice (once each in the like and dislike blocks). Performance on this task will be modelled and published separately.
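The probabilistic feedback described above can be illustrated with a minimal Python sketch. It assumes that feedback is truthful on 80% of trials and misleading on the remainder; the function names, the seeded random generator, and the always-consistent participant are hypothetical illustrations and are not taken from the task software.

```python
import random


def feedback(choice_matches_rule: bool, rng: random.Random,
             contingency: float = 0.8) -> bool:
    """Return True for 'correct' feedback and False for 'incorrect' feedback.

    With probability `contingency` the feedback is truthful; otherwise it is
    misleading (the opposite of what the social rule implies).
    """
    truthful = rng.random() < contingency
    return choice_matches_rule if truthful else not choice_matches_rule


def simulate_block(n_trials: int = 20, seed: int = 0) -> int:
    """Count 'correct' feedback for a hypothetical participant who always
    chooses the word consistent with the social rule (like or dislike)."""
    rng = random.Random(seed)
    return sum(feedback(True, rng) for _ in range(n_trials))
```

Under these assumptions, a participant who always chooses the rule-consistent word would see 'correct' feedback on about 16 of 20 trials on average, which is why the rule has to be learnt by trial and error rather than from a single trial.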
Personality descriptors were emotive adjectives describing trait characteristics (e.g. cool/boring, funny/grumpy, generous/greedy). Positive and negative words were selected from databases according to their age of acquisition (Brysbaert and New, 2009; Grühn, 2016; Kučera and Francis, 1967; Leech et al., 2014; Warriner et al., 2013). The oldest mean age of acquisition of any included word was 8.78 years (SD=1.99). Positive and negative words were paired, matched firstly on age of acquisition. We also aimed to pair semantically linked words, minimise differences in psycholinguistic parameters (number of syllables, usage frequency, meaningfulness, familiarity, arousal), and maximise differences in likeableness, valence, and desirability ratings.
After a delay of approximately 4 min, participants were asked to remember as many personality descriptors as possible in the surprise recall task. They were given 2 min to type responses. A countdown timer appeared for the final 30 s. See Supplementary Figure 1 for further details. Misspelled words that resembled correct responses were categorised as correct to ensure that spelling errors did not bias accuracy. The number of self-referential and other-referential positive and negative words accurately recalled (hits), and the number of positive and negative incorrect responses (false alarms), were calculated.
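As an illustration of this scoring step, the following Python sketch counts hits and false alarms while tolerating spelling errors. How misspellings were actually judged in this study is not described here, so the fuzzy matching with the standard library's `difflib`, the `cutoff` of 0.8, and the function name `score_recall` are all assumptions made for the example. For brevity, the sketch does not split hits by condition and valence.

```python
from difflib import get_close_matches


def score_recall(responses, seen_words, cutoff=0.8):
    """Split typed responses into hits and false alarms.

    A response counts as a hit if it closely resembles a word that was seen
    in the learning task (assumed similarity threshold: `cutoff`). Each seen
    word is counted at most once, however often it is typed.
    """
    hits = set()
    false_alarms = []
    for raw in responses:
        word = raw.strip().lower()
        match = get_close_matches(word, seen_words, n=1, cutoff=cutoff)
        if match:
            hits.add(match[0])
        else:
            false_alarms.append(word)
    return hits, false_alarms


# Hypothetical example: one misspelling, one exact match, one unseen word
seen = ["funny", "grumpy", "generous", "greedy"]
hits, false_alarms = score_recall(["funy", "Grumpy", "clever"], seen)
```

In this toy example "funy" is credited as a hit for "funny", whereas "clever" was never presented and is treated as a false alarm.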

Depressive symptoms
Participants completed the Mood and Feelings Questionnaire (short version; SMFQ), a 13-item self-report measure of depressive symptoms over the last two weeks (Angold et al., 1995). Items were rated on a scale of 0-2 (total 0-26), with higher scores indicating greater severity. Although the SMFQ is not a diagnostic measure, scores of 12 or higher may indicate the presence of depression. Missing responses were imputed for participants who responded to 10 or more questions using each individual's mean SMFQ score (n=111, 19%).
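The imputation rule can be sketched in Python as follows. The sketch assumes that "each individual's mean SMFQ score" refers to the mean of the items that the participant answered; the function name `smfq_total` and the use of `None` for a missing response are illustrative choices.

```python
def smfq_total(responses, n_items=13, min_answered=10):
    """Total SMFQ score with person-mean imputation of missing items.

    `responses` holds one entry per item: 0, 1 or 2, or None if missing.
    Returns None when fewer than `min_answered` items were completed.
    """
    if len(responses) != n_items:
        raise ValueError("expected one response per SMFQ item")
    answered = [r for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    person_mean = sum(answered) / len(answered)
    n_missing = n_items - len(answered)
    # Each missing item is replaced by the mean of the answered items
    return sum(answered) + person_mean * n_missing
```

For example, a participant who answered 1 on ten items and skipped three would receive a total of 13 under this rule, while a participant with only nine answered items would have no total score.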

Confounders
Participants completed an abbreviated nine-item version of the Raven Standard Progressive Matrices Test (non-verbal IQ score; Bilker et al., 2012). Additional confounders were collected through a parental questionnaire. All parents were asked to complete this questionnaire, but response rates were low (n=340, 59%). Analyses were repeated controlling for additional confounders (ethnicity, English as a first language, dyslexia, autism spectrum disorders, parental education, maternal depression, paternal depression) in the Supplement.
We also intended to adjust for pubertal stage. Following classroom data collection, participants were asked to complete the Pubertal Development Scale (PDS; Petersen et al., 1988) at home. Only 117 (20%) participants completed the PDS. Analyses including pubertal stage are in the Supplement.

Procedure
Data collection was computerised and completed online using Gorilla (www.gorilla.sc). It took place with groups of 2-31 adolescents in classrooms using computers, laptops, or tablets. After providing informed assent, participants completed a battery of measures, intended for use in several studies. Participants first completed the social evaluation learning task, followed by the Raven Standard Progressive Matrices Test, and the surprise recall task. The SMFQ was then completed, followed by other questionnaires (Affective Reactivity Index, Revised Children's Anxiety and Depression Scale, Dysfunctional Attitude Scale, Health and Social Risks Questionnaire, Adolescent Social Reward Questionnaire, Children's Rejection Sensitivity Questionnaire). After classroom data collection, participants were emailed a link to complete questionnaires with more sensitive content at home (PDS, Strengths and Difficulties Questionnaire, Olweus Bully/Victim Questionnaire). We decided a priori which measures would be analysed in this study.

Fig. 1. Social Evaluation Learning task. An example of two trials from a self-referential block, in which the computer character is called Sam and the participant is learning what Sam thinks of them. After viewing a fixation cross, the participant was presented with a positive and negative word pair and instructed to choose the word which best corresponded with what Sam thought about them. They then received feedback about whether their choice was correct (green tick) or incorrect (red cross). Participants used trial and error to learn whether the character liked or disliked them over 20 trials. In the first trial shown here, the participant selected the positive word, which was correct. In the second trial, the participant chose the negative word, which was incorrect. Both of these trials show true (as opposed to misleading) feedback. To prevent ceiling effects, feedback contingency was set at 80%, so that 'correct' responses received an 8:2 ratio of positive to negative feedback and 'incorrect' responses received an 8:2 ratio of negative to positive feedback.

Statistical analyses
Analyses were performed using Stata 16 (StataCorp, 2019). As we aimed to compare the influence of gender in each age group, all descriptive statistics were presented separately for each subgroup. At this stage, we found no evidence that false alarms differed according to age group or gender, and false alarms were not associated with depressive symptoms, so they were not analysed further (Table 1).

Negative binomial mixed models
There were four types of hits: self-referential positive, self-referential negative, other-referential positive, and other-referential negative. In order to analyse such data, analysis of variance (ordinary least squares, under Gaussian assumptions) would often be used, testing whether recall differed according to various factors. However, given that hits were count variables which were positively skewed and over-dispersed, negative binomial mixed models were considered more appropriate. The four types of recall were clustered within each individual, with total number of hits as the dependent variable, and a random intercept for participant to account for clustering. Task conditions (self-/other-referential, positive/negative), gender, age group, and confounders (continuous age within each age group, school, testing group size, non-verbal IQ score, and positive and negative false alarms) were estimated as fixed effects.
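Over-dispersion means that the variance of a count variable exceeds its mean, whereas a Poisson model assumes the two are equal. A simple descriptive check can be sketched in Python as below. The counts are invented for illustration and are not study data, and this variance-to-mean ratio is only one informal diagnostic, not necessarily the check used in this study.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean of a count variable.

    Values well above 1 suggest over-dispersion relative to a Poisson
    distribution, for which the variance equals the mean.
    """
    return variance(counts) / mean(counts)


# Hypothetical hit counts for nine participants (not study data)
toy_hits = [0, 0, 1, 1, 2, 2, 3, 5, 8]
ratio = dispersion_ratio(toy_hits)
```

For these toy counts the ratio is well above 1, the pattern that motivates a negative binomial rather than a Poisson model.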
We used negative binomial mixed models to calculate a hits ratio as the effect estimate, which represents the number of hits in one category relative to another (e.g. the ratio of negative to positive hits). A hits ratio larger than one meant that hits were lower in the reference category (e.g. more negative than positive hits). All models are presented before and after adjustment for confounders.
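To make the hits ratio concrete, the Python sketch below uses the simplest possible case: a log-link count model with a single binary predictor, no covariates, and no random effects. In that special case the exponentiated coefficient equals the ratio of mean counts in the two categories. The counts are invented for illustration, and the models reported here additionally include a random intercept and confounders, so their estimates would not reduce to a raw ratio of means.

```python
import math
from statistics import mean


def hits_ratio(hits_category, hits_reference):
    """Ratio of mean hits in one category to mean hits in the reference."""
    return mean(hits_category) / mean(hits_reference)


# Hypothetical counts (not study data); positive words are the reference
negative_hits = [2, 3, 1, 4]
positive_hits = [1, 2, 1, 4]

ratio = hits_ratio(negative_hits, positive_hits)
# On the log scale used by the model, the coefficient is log(ratio)
log_coefficient = math.log(ratio)
```

Here the ratio is larger than one, so more negative than positive words were recalled; a ratio below one would indicate the opposite.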

Recall biases
Our first question was whether hits differed according to word valence (positive/negative) and the condition in which words were learned (self-referential/other-referential; hypothesis 1). We included condition and valence as independent variables with hits as the dependent variable. Next, we added an interaction between condition and valence, to test whether the association between valence and recall differed for self-referential versus other-referential words (hypothesis 2).

Gender differences
Next, we examined gender differences in recall (hypothesis 3). We tested a three-way interaction between gender, condition and valence with hits as the dependent variable, and report the two-way interactions between these variables. To assess whether gender differences were consistent across age groups, we tested a four-way interaction between age group, gender, condition and valence with hits as the dependent variable (hypothesis 4). As our aim was to compare the influence of gender in each age group, we only report the lower level (two-way and three-way) interactions which include gender. Where there was evidence of an interaction, we examined associations with hits separately for each subgroup. Additionally, to check that gender differences in hits were not explained by depressive symptoms, we added depressive symptoms to the negative binomial mixed models (Supplement).

Associations with depressive symptoms
Finally, we examined whether recall was associated with depressive symptoms (hypothesis 5). Linear regression tested whether self-referential positive, self-referential negative, other-referential positive, and other-referential negative hits were associated with depressive symptoms (SMFQ score; continuous dependent variable). For this analysis, all task parameters were included in a single model to adjust for overall performance. This model was adjusted for age group and gender in addition to other confounders.
For each type of hit associated with depressive symptoms, we tested whether the association differed according to age group and gender (hypothesis 6). We added a three-way interaction between hits, age group, and gender to the linear regression model with depressive symptoms as the dependent variable. We also included two-way interactions between hits and age group, and hits and gender.
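The meaning of a two-way interaction between hits and gender can be illustrated with a small Python sketch. In a linear model containing gender, hits, and their product, and no other covariates, the interaction coefficient equals the difference between the gender-specific slopes. The data below are invented and noise-free, male is assumed to be the reference category, and the helper `ols_slope` is a hypothetical name; the models in this study also included age group, a three-way interaction, and confounders.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data (not study data): SMFQ score by number of negative hits
hits = [0, 1, 2, 3, 4]
smfq_males = [3.0, 3.5, 4.0, 4.5, 5.0]    # slope of 0.5 points per hit
smfq_females = [4.0, 5.0, 6.0, 7.0, 8.0]  # slope of 1.0 point per hit

slope_males = ols_slope(hits, smfq_males)
slope_females = ols_slope(hits, smfq_females)
# Interaction coefficient with male as the (assumed) reference category
interaction = slope_females - slope_males
```

A positive interaction in this sketch means that each additional hit is associated with a larger increase in symptoms for females than for males.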
There was evidence that depressive symptoms were higher in mid- than young adolescents (coef=1.13, 95% CI=0.20 to 2.05, p=0.02), and depressive symptoms were higher in females than males (coef=2.22, 95% CI=1.30 to 3.15, p<0.001). There was no evidence of an interaction between age group and gender on depressive symptoms (interaction p=0.10). Although there was no statistical evidence for this interaction, we conducted the planned linear contrasts because of our a priori hypotheses. As predicted, depressive symptoms were higher in females in both age groups, and the gender difference was larger in the older group (young adolescents coef=1.47, 95% CI=0.23 to 2.70, p=0.02; mid-adolescents coef=3.03, 95% CI=1.65 to 4.41, p<0.001).

Note. Young adolescents were recruited from Year 7 (11-12 years old) and mid-adolescents were recruited from Years 9-10 (13-15 years old). Age in years was missing for n=2 young adolescents. Gender was missing for n=10 young adolescents and n=1 mid-adolescent.

Hypothesis 2.
There was no evidence for an interaction between valence and condition (adjusted p=0.25). Participants made more negative than positive self-referential and other-referential hits (Table 1).

Hypothesis 3.
There was evidence for a two-way interaction between gender and condition (adjusted p=0.04). Males made more self-referential than other-referential hits, whereas females did not show this self-reference effect (Table 2). There was no evidence for a two-way interaction between gender and valence (adjusted p=0.99), as both males and females made more negative than positive hits (Table 2). There was also no evidence for a three-way interaction between gender, condition, and valence on hits (adjusted p=0.87; Table 2).
Hypothesis 4. Next, we tested whether the gender differences in recall differed across age groups. There was no evidence for a two-way interaction between age group and gender on total hits (adjusted p=0.10). In both age groups, females made more hits than males (Table 2). However, there was weak evidence for a three-way interaction between age group, gender and condition (adjusted p=0.05). Young adolescent females made slightly fewer self-referential than other-referential hits but, in all other groups, more self-referential than other-referential hits were made (Table 2; Fig. 2). There was no evidence that the number of positive versus negative hits differed according to age group and gender (adjusted p=0.43). All groups made more negative than positive hits (Table 2; Fig. 2). Finally, there was no evidence of a four-way interaction between age group, gender, condition and valence (adjusted p=0.42; Fig. 2).
Adjusting for depressive symptoms did not substantially alter these findings (Supplement).

Associations with depressive symptoms
Hypothesis 5. There was evidence for an association between positive and negative self-referential hits and depressive symptoms (Table 3). For each additional self-referential negative hit, SMFQ score increased by 0.45 points (95% CI=0.15 to 0.75, adjusted p=0.003). In contrast, self-referential positive hits were negatively associated with depressive symptoms. For each additional self-referential positive hit, SMFQ score decreased by 0.38 points (95% CI=-0.69 to -0.08, adjusted p=0.01).

Hypothesis 6.
Associations between self-referential hits and depressive symptoms did not differ across age groups (adjusted interactions: positive p=0.57; negative p=0.41). The association between self-referential positive hits and depressive symptoms also did not differ according to gender (adjusted interaction p=0.47). However, the association between self-referential negative hits and depressive symptoms was larger in females (adjusted coef=0.85, 95% CI=0.36 to 1.34) than in males (adjusted coef=0.27, 95% CI=-0.13 to 0.67). There was weak evidence for this interaction (adjusted p=0.04). This gender difference in the association between self-referential negative hits and depressive symptoms was present across age groups. There was no evidence for three-way interactions between age group, gender and self-referential hits on depressive symptoms (adjusted: positive p=0.52; negative p=0.30).

Table 2. Unadjusted and adjusted negative binomial mixed models testing the effect of gender, age group, condition (whether words were learnt in relation to the self or another person) and valence (whether words were positive or negative) on the total number of hits.

Discussion
Following a social evaluation learning task, adolescents were asked to recall this self-referential and other-referential social evaluation. Consistent with our first hypothesis, most adolescents better recalled self-referential than other-referential words, demonstrating a self-referential bias. However, young adolescent girls (11-12 years) recalled fewer self-referential than other-referential words, which was unexpected. We hypothesised that adolescents' self-referential bias would be positive, with better recall of self-referential positive than self-referential negative words (hypothesis 2). However, adolescents recalled more negative than positive words in both self-referential and other-referential conditions. Although we expected girls to demonstrate more negative self-referential recall biases than boys (hypothesis 3), we found no other evidence for gender differences in recall in either age group (hypothesis 4).
As predicted in hypothesis 5, more severe depressive symptoms were associated with a decrease in self-referential positive recall and an increase in self-referential negative recall. These associations were similar across early and mid-adolescence, as outlined in hypothesis 6. However, contrary to hypothesis 6, the association between self-referential negative recall and depressive symptoms was more pronounced in girls than boys.
We found evidence of enhanced memory for self-referential information, as previously shown in children (Cunningham et al., 2014) and adults (Symons and Johnson, 1997). It is unclear why this self-reference effect was not present in young adolescent girls. It is possible that this group found social evaluation about others more salient or paid more attention to other-referential evaluation, and thus better remembered other-referential words, compared to other adolescents. However, we do not have any evidence to support this explanation.
In contrast to previous studies with healthy adults (Denny and Hunt, 1992; Sanz, 1996; Sedikides and Green, 2000) and adolescents (Cole et al., 2014; Connolly et al., 2016; Dainer-Best et al., 2018; Fattahi Asl et al., 2015; Hammen and Zupan, 1984; Kuiper and MacDonald, 1982; Prieto et al., 1992; Taylor and Ingram, 1999; Timbremont and Braet, 2004), we did not find evidence for positively biased self-referential recall. Adolescents recalled more negative than positive words in all conditions. This may be because self-evaluations become more negative and self-esteem declines during adolescence (Robins and Trzesniewski, 2005; van der Aar et al., 2018). However, this would account for biases only in self-referential recall. The generalisation of this negative bias to other-referential recall could be due to our encoding task. Words were viewed as social evaluation, which may make negative words more salient and boost memory, regardless of whether words refer to the self or others. Consistent with this explanation, another study using a social evaluative encoding task (participants imagined overhearing others describing them) also found that adolescents remembered more negative than positive words (Holt et al., 2016).
Contrary to our hypotheses, we did not find a gender difference in negative recall biases. We have previously proposed that gender inequality may cause girls to have more negative self-schema, which could lead to more negative recall biases (Bone et al., 2020). It is possible that girls have more negative self-schema in adolescence, but this was not captured by performance on our surprise recall task. However, the only previous study of gender differences in recall biases during adolescence found evidence that girls had more positive recall than boys, and there were no gender differences in negative recall (McArthur et al., 2019). This is opposite to the gender difference that we proposed. Therefore, despite the evidence for an association between negative recall biases and depressive symptoms, and more severe depressive symptoms in girls than boys, girls may not have more negative recall biases during adolescence. We did find some evidence that the association between self-referential negative recall and depressive symptoms was stronger in girls than boys across age groups. This was unexpected as we anticipated that recall would be similarly associated with depressive symptoms across genders. If self-referential negative recall is a risk factor for depressive symptoms, it may be more important for girls.
Both increased negative and reduced positive self-referential recall were associated with depressive symptoms, as found in some previous studies (Fattahi Asl et al., 2015; Speed et al., 2016). This finding differs from a recent review, which did not find consistent evidence for memory biases in adolescent depression (Platt et al., 2017). This could be because previous studies have generally assessed the proportion of words previously endorsed as self-referential that are recalled. Testing recall of social evaluation may provide a more nuanced measure of memory biases.
In this study, effect estimates and confidence intervals for the associations between self-referential positive hits and depressive symptoms were clearly different from the corresponding association with other-referential positive hits, potentially suggesting a specific role of poorer self-referential positive recall in vulnerability to depressive symptoms. It is less clear whether there is a specific role of self-referential negative recall. In unadjusted analyses, self-referential and other-referential negative hits were associated with depressive symptoms. After adjusting for confounders, evidence for the association between other-referential negative hits and depressive symptoms was attenuated, but the coefficient and confidence interval were not clearly different from those for the corresponding association with self-referential negative hits. We cannot rule out that the association between self-referential negative hits and depressive symptoms reflects a general negative bias. However, self-referential negative recall was most strongly associated with depressive symptom severity, as previously found (Dainer-Best et al., 2018).

Note. All models adjusted for condition and valence. Fully adjusted models also adjusted for continuous age within each age group, school, testing group size, non-verbal IQ score, and positive and negative false alarms. For gender, male was the reference group. For condition, other-referential was the reference group. For valence, positive was the reference group.

Strengths and limitations
We aimed to test whether gender differences in recall bias were associated with depressive symptoms in adolescence. Our sample was population-based and included the full range of depressive symptoms (from none to severe), which we analysed continuously. This should have increased our statistical power to detect any associations between recall bias and depressive symptoms (Button et al., 2013), although this study may have been underpowered for testing three- and four-way interactions. The sample was recruited from eight diverse schools, making it more representative than many previous studies. We used a novel recall task, allowing us to differentiate self-referential and other-referential recall bias. However, this recall task had some limitations. Its reliability and validity are unknown, although tasks assessing memory and emotional biases are generally reliable (Bland et al., 2016). The nature of the encoding task may have influenced recall. Traditional tasks measure recall of words describing how participants see themselves, rather than how another individual sees them. Whilst information consistent with the self-concept was probably preferentially recalled, words incongruent with the self-concept may have been more memorable. Adolescents with more depressive symptoms could have been differentially affected by the idea of someone liking or disliking them, altering reactions to the words, and influencing recall.

Fig. 2. A) Mean hits showing the three-way interaction between age group, gender, and word condition (self-referential or other-referential). B) Mean hits showing the three-way interaction between age group, gender, and word valence (positive or negative). C) Mean hits showing the four-way interaction between age group, gender, condition, and valence. All plotted using raw data.

Table 3
Change in depressive symptoms (SMFQ score) for each additional self-referential positive, self-referential negative, other-referential positive, and other-referential negative hit. Note. Both models included all four types of hits as independent variables. Model 2 was adjusted for age group, gender, continuous age within each age group, school, testing group size, non-verbal IQ score, and positive and negative false alarms.

Poor parental consent rates in several schools were a limitation. Selection bias may have occurred, as participants had higher non-verbal IQ and better recall in schools with low parental consent. However, 76% of the sample were from schools with high consent. We do not think that the factors influencing selection bias would alter associations between recall bias and depressive symptoms. Opt-out consent was used to recruit nearly half of our sample, which should also have reduced selection bias.
Although we adjusted for several potential confounders, residual confounding is also possible. For subsamples with information available on additional confounders (59% of participants) and pubertal stage (20% of participants), adjusting for these potential confounders did not alter the evidence for any associations (Supplement). However, in these subsamples, there was no evidence for associations between self-referential positive hits and depressive symptoms. Nevertheless, the effect estimates were similar to the coefficients, and within the confidence intervals, from the primary analyses with the whole sample. The lack of evidence could be due to the reduced sample size or selection bias in participants with data on these potential confounders.
As this was a cross-sectional study, we cannot provide evidence of a causal effect of recall bias on depressive symptoms, as proposed by cognitive models of depression (Roiser and Sahakian, 2017). Our findings are consistent with such models. However, it is equally possible that changes in depressive symptoms cause changes in recall biases (reverse causality), or that the association is bidirectional. Longitudinal data are required to test the hypothesis that negatively biased recall leads to increased depressive symptoms.
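The point about reverse causality can be illustrated with a small simulation. This is a hedged sketch using synthetic, standardised data, not an analysis from this study: the function `simulate`, the direction labels, and the correlation of 0.3 are all assumptions chosen for illustration. It generates data in which recall drives symptoms, and data in which symptoms drive recall, and shows that a single cross-sectional measurement yields essentially the same correlation and regression slope in both cases.

```python
import numpy as np


def simulate(direction, r, n, rng):
    """Draw n (recall, symptoms) pairs in which one standardised variable
    causes the other, with population correlation r."""
    cause = rng.normal(size=n)
    effect = r * cause + np.sqrt(1 - r**2) * rng.normal(size=n)
    if direction == "recall_to_symptoms":
        return cause, effect
    if direction == "symptoms_to_recall":
        return effect, cause
    raise ValueError(f"unknown direction: {direction}")


rng = np.random.default_rng(1)
r_true = 0.3      # arbitrary illustrative correlation, not a study estimate
n = 100_000

results = {}
for direction in ("recall_to_symptoms", "symptoms_to_recall"):
    recall, symptoms = simulate(direction, r_true, n, rng)
    r_obs = np.corrcoef(recall, symptoms)[0, 1]
    # Slope from regressing symptoms on recall, as in a cross-sectional model.
    slope = np.polyfit(recall, symptoms, 1)[0]
    results[direction] = (r_obs, slope)
    print(f"{direction}: correlation={r_obs:.3f}, slope={slope:.3f}")
```

Because both data-generating processes produce the same joint distribution of the two variables, no cross-sectional analysis of these data could distinguish them; repeated measurements over time would be needed.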

Conclusions
Consistent with contemporary cognitive models of depression (Beck, 2008; Roiser and Sahakian, 2017), adolescents who had more self-referential negative and less self-referential positive recall had more severe depressive symptoms. There was no evidence for gender or age differences in these recall biases, although there was some evidence that the association with self-referential negative recall was stronger in girls. The association between recall biases and depressive symptoms was similar across early and mid-adolescence, despite the increase in depressive symptoms in older adolescents. Negatively biased self-referential recall may lead to more negative memories of social interactions and more negative self-concepts, encouraging social withdrawal and increasing depressive symptoms. Negative self-referential bias may be a risk factor for the emergence of depressive symptoms during adolescence and is a good candidate for future longitudinal studies.

Author contributions
JKB was responsible for the original study proposal and securing funding, with input from GeL, JR, SJB, and GlL. JKB had overall responsibility for the study management, data collection, analyses, and drafting of the manuscript. GeL, JR, SJB, and GlL assisted with study planning, management, planning and interpreting analyses, and writing of this manuscript. All authors contributed to, and approved, the final manuscript.

Funding
This work was supported by an Economic and Social Research Council PhD studentship awarded to JKB. The funder had no role in study design, the collection, analysis and interpretation of data, writing of the report, or the decision to submit the article for publication.

Declaration of Competing Interest
JKB, GeL, SJB and GlL report no financial interests or potential conflicts of interest. JPR has acted as a paid consultant for Cambridge Cognition Ltd, Takeda Ltd and GE Healthcare within the last three years, none of which was related to this research.