Systematic review of avian hatching failure and implications for conservation

Avian hatching failure is a widespread phenomenon, affecting around 10% of all eggs that are laid and not lost to predation, damage, or desertion. Our understanding of hatching failure is limited in terms of both its underpinning mechanisms and its occurrence across different populations. It is widely acknowledged that rates of hatching failure are higher in threatened species and in populations maintained in captivity compared to wild, non‐threatened species, but these differences have rarely been quantified and any broader patterns remain unexplored. To examine the associations between threat status, management interventions, and hatching failure across populations we conducted a phylogenetically controlled multilevel meta‐analysis across 231 studies and 241 species of birds. Our data set included both threatened (Critically Endangered, Endangered, and Vulnerable) and non‐threatened (Near Threatened and Least Concern) species across wild and captive populations, as well as ‘wild managed’ (‘free‐living’) populations. We found the mean overall rate of hatching failure across all populations to be 16.79%, with the hatching failure rate of wild, non‐threatened species being 12.40%. We found that populations of threatened species experienced significantly higher mean hatching failure than populations of non‐threatened species. Different levels of management were also associated with different rates of hatching failure, with wild populations experiencing the lowest rate of hatching failure, followed by wild managed populations, and populations in captivity experiencing the highest rate. Similarly, populations that were subject to the specific management interventions of artificial incubation, supplementary feeding, and artificial nest provision displayed significantly higher rates of hatching failure than populations without these interventions. 
The driver of this correlation between hatching failure and management remains unclear, but could be an indirect result of threatened species being more likely to have lower hatching success and also being more likely to be subject to management, indicating that conservation efforts are fittingly being focused towards the species potentially most at risk from extinction. This is the most comprehensive comparative analysis of avian hatching failure that has been conducted to date, and the first to quantify explicitly how threat status and management are associated with the rate of hatching failure in a population. We discuss the implications of our results, focusing on their potential applications to conservation. Although we identified several factors clearly associated with variation in hatching failure, a significant amount of heterogeneity was not explained by our meta‐analytical model, indicating that other factors influencing hatching failure were not included here. We discuss what these factors might be and suggest avenues for further research. Finally, we discuss the inconsistency in how hatching failure is defined and reported within the literature, and propose a standardised definition to be used in future studies which will enable better comparison across populations and ensure that the most accurate information is used to support management decisions.


I. INTRODUCTION
Evolutionary theory suggests that selection should act strongly against traits that lead to reproductive failure. Despite this, hatching failure (i.e. the failure of eggs to hatch due to fertilisation failure or embryo mortality) is ubiquitous across birds. While mean hatching failure rates of around 8-12% (Koenig, 1982; Morrow, Arnqvist & Pitcher, 2002; Spottiswoode & Møller, 2004; Møller, Erritzøe & Rózsa, 2010) have been reported across species, there can be large intra- and interspecific variation (Rothstein, 1973). Importantly, much higher rates have been described in some threatened species (e.g. Heber & Briskie, 2010). Currently, 13-14% of bird species are threatened with extinction (IUCN, 2021), with more species expected to become threatened in their natural habitats in coming years due to climate change, habitat loss, and invasive species (Birdlife International, 2018). Birds are amongst the most well-studied taxa, but the limited amount of research into the causes of hatching failure and infertility in non-model, non-domestic species relative to poultry has resulted in substantial gaps in our understanding of why eggs fail (Assersohn et al., 2021b; Assersohn, Brekke & Hemmings, 2021a). In particular, there is a lack of understanding of how conservation management impacts hatching rates across bird populations, despite evidence that hatching failure can be higher in captive populations relative to wild counterparts (e.g. Burnham, 1983; Saint Jalme et al., 1996). Given the growing importance of managed populations to the conservation of threatened bird species, a systematic review of the influence of management on such a key reproductive measure as hatching failure seems timely.
Several comparative reviews of hatching failure across bird species have been performed previously, with findings linking hatching rates to (i) genetic effects such as past population bottlenecks and high genetic similarity (Briskie & Mackintosh, 2004; Spottiswoode & Møller, 2004; Heber & Briskie, 2010), and (ii) environmental effects, including latitude, nest type, diet, and breeding/social system (Koenig, 1982; Spottiswoode & Møller, 2004) (see online Supporting Information, Tables S1 and S2 in Appendix S1). However, previous reviews have included minimal comparison of threatened versus non-threatened species and have generally excluded managed populations. While hatching failure occurs across all birds and there are likely to be common drivers, the very high rates reported for some threatened species indicate that they are either more strongly affected by these shared drivers, or that they are subject to additional drivers compared to non-threatened species. Similarly, captivity has been shown to depress species' reproductive success relative to wild counterparts (Farquharson, Hogg & Grueber, 2018), and there is some evidence that hatching failure in captive populations is primarily caused by fertilisation failure, while embryo mortality is the more common cause of failure in the wild (Hemmings, West & Birkhead, 2012). This could indicate that captive and wild populations may be differentially affected by certain drivers. Failing to account for variation in threat status and management in comparisons of hatching failure could therefore lead to key drivers of hatching failure being missed or underestimated in the species and populations that experience hatching failure at the highest rates, and in which reproductive failure could have a larger impact on extinction risk.
Biological Reviews 98 (2023)
The causes of elevated hatching failure in threatened species remain under-explored, potentially due to the widespread assumption that because threatened species are often present in small, isolated populations, hatching failure results predominantly from inbreeding depression (e.g. Heber & Briskie, 2010). However, the range of hatching failure rates exhibited by threatened species, and the variation among populations of the same species, suggests that inbreeding depression is not solely responsible (Hemmings et al., 2012; Assersohn et al., 2021b). Threatened populations may possess characteristics that could partially explain their higher failure rates. For example, their inherently small population sizes can disrupt social dynamics and breeding systems in cooperative breeders, while demographic stochasticity in the sex ratio can impair reproduction in monogamous species (Lacy, 2000; Lee, Saether & Engen, 2011). Also, the factors that led a population to become threatened in the first place may continue to impact reproduction negatively. For example, high predation levels can indirectly affect hatching failure in non-predated nests due to perceived predation risk disturbing parents during incubation (e.g. Zanette et al., 2011). Habitat loss or deterioration could also increase hatching failure by creating reliance on sub-optimal nesting sites (e.g. Perlut et al., 2016) and/or prompting competition for resources leading to reduced nest attendance (e.g. Koski et al., 2020) and parental condition (e.g. Ardia & Clotfelter, 2007). Finally, some traits that increase a species' vulnerability to extinction, such as high levels of endemism, living in extreme environments, and complex life histories (McKinney, 1997; Owens & Bennett, 2000; Purvis et al., 2000) could also make them more sensitive to environmental change, negatively impacting their hatching success.
Since it is difficult to recreate optimal breeding environments in captivity, we might expect populations under management to have lower hatching success rates compared to wild populations of the same species. However, this has not been thoroughly investigated or quantified. Understanding how management interventions influence hatching outcomes is essential when developing management guidelines and assessing the effectiveness of different interventions. Conservation managers are often operating under a number of constraints, and may need to weigh up the effectiveness of interventions against any trade-offs in terms of financial costs, time commitment, and stakeholder motivations (e.g. Pritchard et al., 2022). Hatching failure is a relatively straightforward parameter to utilise for this (e.g. Martins et al., 2021; Edwards et al., 2022). For example, a population in the wild might have a low rate of intrinsic hatching failure but lose a large number of eggs to predation. Moving this population into captivity and hence eliminating the effect of predation will likely increase the number of eggs surviving to hatching, but may also increase the rate of hatching failure in non-predated eggs above natural levels. If this increase in hatching failure cancels out the reduction in egg loss due to predation, it may be more effective to leave the population in the wild and instead employ predator exclusion or removal strategies. Models such as population viability analyses are increasingly being used to estimate the likelihood that a population will go extinct under different scenarios (e.g. Bustamante, 1996; Dolman et al., 2015; Heinrichs et al., 2019). Incorporating 'baseline' measures of hatching failure in a wild, unmanaged population, along with the expected effect of different management interventions, into such models could help increase their accuracy and support decision-making. This may be particularly useful for species that become candidates for management for the first time due to degradation of their natural habitats (Birdlife International, 2018; IUCN, 2021).
To address some of the knowledge gaps surrounding hatching failure and management and to establish average baseline measures of hatching failure across different threat classifications and management levels, we conducted a phylogenetically corrected meta-analysis to investigate: (i) how hatching failure rate varies with a species' threat status; (ii) how hatching failure rate varies across different levels of management; (iii) how hatching failure rate varies under specific management interventions (artificial incubation, artificial nest site provision, and supplementary feeding); and (iv) how threat status and management level or specific management interventions interact with respect to their association with hatching failure. This analysis represents the most up-to-date and comprehensive systematic review of avian hatching failure conducted so far, and the first to consider directly the potential influence of management interventions and threat status, with a focus on the implications for conservation. Performing this analysis highlighted the absence of consistent hatching failure terminology in the literature and the consequent constraints upon the scale of comparison among studies, hence we also suggest a framework for defining and reporting hatching failure which we hope will be adopted more broadly. Finally, we identify a number of remaining questions and knowledge gaps and suggest how they could be addressed in future research.
Koenig (1982) defined 'hatchability' as the percentage of eggs surviving to the time of hatching that produce a chick, thus excluding eggs lost to predation, abandonment, accidental breakage, or any other unknown factor. However, 'hatchability' is commonly used within poultry studies to describe only the proportion of fertile eggs that hatch (King'ori, 2011), and other studies that follow Koenig's definition have instead used 'hatching success' (e.g. Spottiswoode & Møller, 2004) or 'hatching failure' (e.g. Briskie & Mackintosh, 2004). The literature contains examples of 'hatchability' and 'hatching success' being used both interchangeably (e.g. Cichoń, Sendecka & Gustafsson, 2005) and distinctly (e.g. Schwarzbach, Albertson & Thomas, 2006), as well as usage of alternative terms (e.g. 'embryo survival'; Aldredge, 2017). Often, studies lack an explanation of how a term is defined, with further complexity arising when studies do not state whether eggs lost to external factors (e.g. predation) are included within reported hatching failure estimates.
In this review, hatching failure is defined as the proportion of eggs present at the end of the incubation period that fail to hatch relative to all eggs present at the end of the incubation period, thus excluding eggs lost due to predation, desertion, accident, extreme weather, or that disappeared during the incubation period due to unknown factors. This definition is used under the assumption that intrinsic and extrinsic factors are independent. For example, it is assumed that unfertilised eggs are not more likely to be abandoned by parents or predated than fertilised eggs, since there is currently no evidence for this in the literature.
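The definition above can be illustrated with a short calculation. The following is a minimal sketch only: the function name, argument names, and egg counts are hypothetical and do not come from any study in the data set. It assumes that eggs lost during incubation have already been counted separately.

```python
def hatching_failure(eggs_laid, eggs_lost_during_incubation, eggs_hatched):
    """Proportion of eggs present at the end of incubation that failed to hatch.

    Eggs lost to predation, desertion, accident, extreme weather, or unknown
    causes are removed from the denominator, following the definition above.
    All argument names are illustrative assumptions.
    """
    eggs_at_end = eggs_laid - eggs_lost_during_incubation
    if eggs_at_end <= 0:
        raise ValueError("no eggs were present at the end of incubation")
    if not 0 <= eggs_hatched <= eggs_at_end:
        raise ValueError("hatched eggs must lie between 0 and eggs present at end")
    unhatched = eggs_at_end - eggs_hatched
    return unhatched / eggs_at_end


# Hypothetical population: 100 eggs laid, 25 lost before the end of
# incubation, 60 hatched, so 15 of the 75 remaining eggs failed.
rate = hatching_failure(eggs_laid=100, eggs_lost_during_incubation=25, eggs_hatched=60)
print(f"hatching failure = {rate:.1%}")
```

Note that the 25 lost eggs affect neither the numerator nor the denominator, which is what distinguishes this measure from overall nest success.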
(1) Data compilation
Studies included in the meta-analyses were compiled from three main sources:
(1) The electronic database Web of Science was chosen as the primary search system due to its functionality, use in similar comparative reviews, and suitability as a principal search system for a systematic review (Gusenbauer & Haddaway, 2020). We included the most commonly used terms for hatching failure identified in an initial survey of the literature in our search, i.e. 'hatching failure', 'hatching success', 'hatchability', and 'hatching rate'. Multiple search sets with different terms and restrictions were trialled on 23/01/2020, with the final search set chosen to maximise the likelihood of locating relevant studies while keeping the total number of studies to a manageable level for closer manual inspection (Foo et al., 2021) (see Table S3 in Appendix S2). The final database search using the chosen search set (see Appendix S2) was conducted on 16/02/2020. A Web of Science alert for the final search set was in place until 01/09/2020 and a small number (N = 6) of eligible records were identified and included in the data set during this period.
(2) The data sets of previous comparative analyses that either investigated hatching failure directly or included it as a key variable were searched and the original sources of the data were examined where possible to verify data eligibility and extract additional information. The comparative analyses considered included Briskie & Mackintosh (2004) (see Table S1 in Appendix S1). In a number of cases the source of the data was cited as 'personal communication' and hence could not be independently verified and additional information about the population could not be obtained.
(3) Papers encountered during reading of the general literature on hatching failure and reproductive success in birds that contained information relevant to this analysis but were not identified by the Web of Science search were considered for inclusion, as were some partial data sets of hatching failure literature previously compiled by the authors and their colleagues. A small number of articles found in February 2020 via the search engine Google Dataset Search using the search terms 'hatching failure' and 'egg hatching success' which were not identified in other searches were also considered for inclusion. A final source of additional studies was Google Scholar alerts for the terms 'hatching success', 'hatching failure', 'hatchability', and 'captive breeding' + 'bird'. The alerts were originally set up in January 2019 and relevant papers from alerts were included up until 01/09/2020.
(2) Inclusion criteria
The criteria for inclusion of a study in the final data set were as follows:
(1) The study was published in a peer-reviewed journal or as a master's or PhD thesis/dissertation published online. Articles published in 'predatory' journals or from 'predatory' publishers as listed on Beall's List (https://beallslist.net/) were excluded.
(2) The study was accessible through reasonable effort on the publisher's website using institutional access or available in institutional libraries.
(3) The study was conducted on wild or captive birds, but not domesticated species kept for commercial purposes such as poultry or gamebirds, or birds bred intensively for the pet trade.
(4) The study did not include in-ovo injection of compounds or physical manipulation of egg components or eggshell structure such as 'windowing' of eggs. The control data set from experimental studies was included if appropriate, for example if control eggs were handled, but not if they were injected with a non-active compound.
(5) The study's definition of hatching failure matched that of this review. In some cases, checking the original sources of data included in previous comparative analyses revealed that the hatching success rate cited by the comparative analysis did not appear to fit our definition of hatching success. However, as authors of previous analyses reported validating rates through personal communication with researchers, cited rates were accepted as reliable.
(6) The study contained sufficient information in the text or supplementary material to calculate hatching failure/success even if the study itself did not report a hatching failure/success percentage, or the definition used did not match our definition.
(7) The study contained information on the sample size (total number of eggs present at the end of the incubation) and/or the total number of eggs hatched/unhatched along with hatching failure/success proportion (as in criterion 6). A minimum sample size of 10 was required for inclusion.
The processes of literature searching, data eligibility assessment, and ultimately data inclusion are summarised in a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram (Fig. 1) and a PRISMA-EcoEvo checklist provided in Appendix S3 (Moher et al., 2009; O'Dea et al., 2021). Further details are included in Appendices S2 and S4, and the final data set is also available (see Appendix S5). After the initial pooling of all articles, we removed all studies prior to 2004 that were not included in the previous comparative analyses as it was assumed that the majority of eligible studies from this period would have been captured in these previous analyses.
The final data set consisted of 483 records (contained within 233 articles; see Appendix S6 for the full bibliography of the final data set), of which 422 records had been identified in the Web of Science search, 13 in Google Scholar alerts, 4 from Google Dataset Search, 5 from partial data sets previously compiled by the authors and former colleagues, 28 from backward searches within these previously mentioned articles, and 11 from general reading of the literature. The main reasons that eligible records were not identified in the main Web of Science search were their absence from the Web of Science Core Collection, or the lack of any of the key terms 'hatching failure', 'hatching success', 'hatchability', or 'hatching rate' in their title, abstract, or key words, primarily because hatching failure was not a main focus of the paper and hence was reported only within the main body of the text. While including records found during general reading of the literature could potentially introduce bias due to the influence of the authors' research interests, we found that the hatching failure data from this small number of studies reflected the trends seen in the overall data set, and we are confident that their inclusion did not influence the results of the meta-analysis.

(3) Effect size extraction and calculation
The chosen effect size was the proportion of hatching failure across the population, i.e. the proportion of eggs present at the end of the incubation period that fail to hatch relative to all eggs present at the end of the incubation period, excluding eggs lost to predation, desertion, accident, extreme weather, or unknown factors. For 286 (59.2%) of the 483 records the absolute sample size (total number of eggs present at the end of the incubation) was taken or calculated directly from the information given in the study or supplementary material. Where this information was not available an estimation for the sample size was performed. In 125 records (25.9%) the sample size was estimated as the product of the total number of successfully hatched clutches/nests (i.e. excluding any whole clutches lost to predation, desertion etc.) and the mean clutch size reported in the study. In the absence of a mean clutch size value within the study, for 60 records (12.4%) the mean clutch size was taken from Cooney et al. (2020), for 10 records (2.1%) the mean clutch size was taken from New Zealand Birds Online (www.nzbirdsonline.org.nz), and for two records (0.4%) the mean clutch size was taken from Birds of the World (www.birdsoftheworld.org). For subspecies without a reported mean clutch size the available clutch size for the parent species was used. The appropriateness of the two main proxies used was assessed by performing a series of statistical tests on 125 records for which a sample size could be directly obtained from the study and estimated using the study's mean clutch size and 101 records for which a sample size could be directly obtained from the study and estimated using the mean clutch size from Cooney et al. (2020). 
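The sample-size proxy described above, and one plausible check of it against directly reported values, can be sketched as follows. This is an illustration under stated assumptions: the function names and the toy numbers are hypothetical, the Pearson correlation is written out by hand rather than taken from any particular statistics package, and no rounding of the estimate is applied because the text does not specify one.

```python
import math


def estimate_sample_size(n_hatched_clutches, mean_clutch_size):
    """Estimated number of eggs present at the end of incubation.

    Product of the number of clutches that reached hatching and the mean
    clutch size, as described in the text. Names are assumptions.
    """
    if n_hatched_clutches < 0 or mean_clutch_size <= 0:
        raise ValueError("clutch count must be >= 0 and clutch size > 0")
    return n_hatched_clutches * mean_clutch_size


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (clutches reaching hatching, mean clutch size, eggs counted directly)
records = [(12, 4.5, 55), (30, 2.0, 58), (8, 6.0, 47), (20, 3.2, 66)]
estimated = [estimate_sample_size(c, m) for c, m, _ in records]
direct = [d for _, _, d in records]
print(estimated)
print(round(pearson_r(direct, estimated), 4))
```

A high correlation alone does not rule out a systematic offset between the two sets of values, which is why a paired comparison of the means is a useful complement.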
A test for association using Pearson's correlation coefficient showed that in both cases there was a strong and significant relationship between the directly obtained and estimated sample sizes (P < 0.001), with correlations of 0.9998 and 0.9973 for the estimations from the study's mean clutch size and the mean clutch size from Cooney et al. (2020) respectively. A Bartlett's test for homogeneity of variances indicated that in both cases the variances were homogeneous, with no significant difference in variances between direct and estimated values (P > 0.05). The results of a paired sample t-test indicated that overall the directly obtained values and those estimated from the study's mean clutch size were not significantly different (P = 0.273); the estimation could therefore be considered an appropriate proxy. For the directly obtained values and those estimated using the mean clutch size from Cooney et al. (2020), the results of a paired sample t-test indicated that overall the estimated values were significantly smaller than the direct values (P = 0.012). This indicates that records for which the sample size was estimated using the mean clutch size from Cooney et al.
For each effect size we also coded a number of key variables for use in analyses, including: Order, Species name, Common name, IUCN Red List classification [Least Concern (LC), Near Threatened (NT), Vulnerable (VU), Endangered (EN), or Critically Endangered (CR)], Threat status (threatened or non-threatened), Management level (wild, wild managed, or captive), Incubation type (artificial or natural), Supplementary feeding (fed or not fed), and Artificial nest provision (provided or not provided). For the IUCN Red List classification we used the current classification (IUCN, 2021) at the time of our meta-analysis to avoid having to exclude records which were not classified at the time of the original study, and we used global assessments where both global and local assessments were available.
For populations of subspecies the classification of the parent species was used unless a classification for the subspecies was available from a regional or national Red List (www.nationalredlist.org), Federal Endangered Species List (www.fws.gov/endangered), or European Red List (https://eunis.eea.europa.eu/species.jsp). Management-level classifications were based on the management applied to both the breeding population and the eggs. Populations were classified as 'wild' if the adults lived in the wild with no management applied, and as 'captive' if the adults were maintained in captivity and the eggs were laid, hatched, and reared in captivity. Populations were classified as 'wild managed' if the adults lived in the wild with management interventions applied, including nestbox provision, supplementary feeding, and/or egg manipulations such as removal for artificial incubation. See Appendix S3 for additional details of the classification of variables.
Fig. 1 (PRISMA diagram), caption notes: … (2019) [174 records]. †Other sources: articles passively encountered during reading; articles taken from partially completed databases; studies captured in Google Scholar alerts; studies captured in Google Dataset Search searches. ‡When a study contained multiple population records these were assessed for inclusion individually. §Some articles contained a mixture of excluded, unverified, and included population records and are counted in multiple record totals. ¶'Unverified' articles were potentially eligible for inclusion but their data could not be confirmed without individual author verification. Superscript numbers indicate the inclusion criteria used to exclude articles/records (see Section II.2).
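The management-level rules given in the text (wild, wild managed, captive) amount to a small decision procedure, sketched below. This is an illustrative reading rather than the coding scheme actually used: the function name, argument names, and intervention labels are hypothetical, and the combination of captive adults with eggs that were not kept in captivity is left unclassified because the text does not describe it.

```python
def classify_management(adults_in_wild, interventions=(), eggs_kept_in_captivity=False):
    """Assign a management level following the rules stated in the text.

    adults_in_wild          -- True if the breeding adults are free-living
    interventions           -- labels of any interventions applied, e.g.
                               'nestbox provision', 'supplementary feeding',
                               'artificial incubation' (labels are assumptions)
    eggs_kept_in_captivity  -- True if eggs were laid, hatched, and reared in captivity
    """
    if adults_in_wild:
        # Any intervention on free-living adults or their eggs makes the
        # population 'wild managed'; none at all makes it 'wild'.
        return "wild managed" if len(tuple(interventions)) > 0 else "wild"
    if eggs_kept_in_captivity:
        return "captive"
    # Not covered by the rules as stated; flag it rather than guess.
    raise ValueError("combination not covered by the stated classification rules")


print(classify_management(True))
print(classify_management(True, ["supplementary feeding"]))
print(classify_management(False, eggs_kept_in_captivity=True))
```

Raising an error for the uncovered case keeps ambiguous records visible for manual review instead of silently assigning them to a category.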

(4) Data analyses
Analyses were completed in RStudio version 1.4.1090 (RStudio Team, 2020) running R version 4.0.3 (R Core Team, 2020) and were conducted using the final data set of included studies unless stated otherwise. The R code is available as part of the online Supporting Information (see Appendix S7).

(a) Hatching success/failure terminology
To investigate the usage of different definitions and terminology for hatching failure within each study we noted whether 'hatching success', 'hatching failure', 'hatchability', or an alternative term was used within the text, tables, figures, or supplementary material and extracted any definition exactly as written. If a term was used but not defined, the definition was recorded as 'not stated'. To account for syntax differences that did not reflect actual differences in meaning, we homogenised definitions to use consistent phrasing, expanded abbreviations, and expressed calculations in words. All definitions of hatching failure were inverted to hatching success as the latter was used substantially more frequently, and where alternative terms were defined but were being used interchangeably with hatching success the definitions were reassigned to define hatching success. See Tables S4 and S5 in Appendix S8 for further details.

(b) Meta-analysis
We conducted a meta-analysis with the goal of obtaining a more precise estimate of the overall hatching failure rate and identifying and characterising the factors that have an impact on between-study heterogeneity and the overall effect estimate. The studies included were generally observational and non-comparative, with each contributing a number of failures (eggs present at the end of the incubation period that failed to hatch) and a sample size (total number of eggs present at the end of the incubation period).
(i) Model choice and transformation of proportions. Due to a hierarchical structure resulting from several studies contributing multiple effect size estimates, and the need to account for phylogenetic dependence across the multiple included species, a multilevel meta-analytic model structure was required (Konstantopoulos, 2011; Nakagawa & Schielzeth, 2013; Cinar, Nakagawa & Viechtbauer, 2022).
The meta-analytic models were run using the rma.mv function in the package metafor (Viechtbauer, 2010). The function escalc from the metafor package was used to estimate the individual effect sizes and their sample variances with a Freeman-Tukey double arcsine transformation (Freeman & Tukey, 1950; Miller, 1978) and the meta-analysis model was fitted using restricted maximum-likelihood (REML) estimation (Viechtbauer, 2005). We also applied the Knapp-Hartung adjustment (Knapp & Hartung, 2003) to all meta-analytical models as often recommended within the literature (Assink & Wibbelink, 2016; Harrer et al., 2021). Appendix S9 provides additional information on model selection.
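For readers unfamiliar with the Freeman-Tukey double arcsine transformation, the sketch below computes a transformed effect size and its sampling variance from a count of failed eggs and a sample size. It is written in plain Python rather than R, and it assumes the halved form of the transformation with sampling variance 1/(4n + 2), which to our understanding is the convention used by metafor for this measure; the exact options passed to escalc are not stated in the text, so this should be read as an assumption. The function name is hypothetical.

```python
import math


def freeman_tukey(failures, n):
    """Freeman-Tukey double arcsine transform of a proportion failures / n.

    Returns (yi, vi): the transformed effect size and its sampling variance.
    Assumes the halved form, yi = 0.5 * (asin(sqrt(x / (n + 1)))
    + asin(sqrt((x + 1) / (n + 1)))), with vi = 1 / (4n + 2).
    """
    if n <= 0 or not 0 <= failures <= n:
        raise ValueError("require n > 0 and 0 <= failures <= n")
    yi = 0.5 * (
        math.asin(math.sqrt(failures / (n + 1)))
        + math.asin(math.sqrt((failures + 1) / (n + 1)))
    )
    vi = 1.0 / (4 * n + 2)
    return yi, vi


# Hypothetical record: 15 unhatched eggs out of 75 present at the end of incubation.
yi, vi = freeman_tukey(15, 75)
print(round(yi, 4), round(vi, 6))
```

The transformation stabilises the variance of proportions close to 0 or 1, so the sampling variance depends only on the sample size and remains defined when no eggs, or all eggs, fail to hatch.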
(ii) Phylogeny. A phylogenetic framework was constructed using the tool available on www.birdtree.org, which is based on the taxonomy and phylogenies of Jetz et al. (2012). We downloaded 1000 trees based on the full 'Hackett' backbone [10,000 trees with 9993 operational taxonomic units (OTUs) each] with a subset of species matching our data set. This number has been shown to result in very precise model parameters (Rubolini et al., 2015). Three species from our data set (five records in total) were not available on BirdTree and were excluded from the phylogenetic tree and all subsequent analysis. This reduced the final data set to 478 records across 231 studies (Fig. 1). Where our data set contained subspecies the parent species was used for construction of the phylogenetic tree and we resolved any taxonomical name differences between our data set and the taxonomy used by BirdTree as described in Table S6 in Appendix S10. The R packages ape (Paradis & Schliep, 2019) and phangorn (Schliep, 2011; Schliep et al., 2017) were used to remove any excluded species, compute the maximum clade credibility tree from the downloaded trees, compute the branch lengths, and compute the correlation matrix. The final ultrametric tree used to compute the correlation matrix contained 241 unique species (Fig. S1, Appendix S10).
(iii) Controlling for non-independence of data. Random effects were incorporated into the meta-analytic model to control for various sources of non-independence. To account for between-study variability and the inclusion of multiple effect sizes per study, a random effect was included at the study level, and to capture variability in the true effects within studies a random effect was included at the effect size level (Nakagawa & Santos, 2012; Assink & Wibbelink, 2016; Harrer et al., 2021). As the final data set included multiple populations of the same species, which can lead to overestimation of the actual degrees of freedom, this was controlled for by including the species name as a random effect intercept (Benítez-López et al., 2021; Cinar et al., 2022). We used the phylosig function from phytools (Revell, 2012) to compute the phylogenetic signal in the effect size and conduct hypothesis tests for its significance. We found that there was a significant phylogenetic signal (P < 0.05) in the effect size from our data set using both the K and λ methods, and that the phylogenetic relationships were relatively strong based on the mean correlation of the phylogenetic correlation matrix (see Cinar et al., 2022). Hence, a phylogenetic relatedness correlation matrix was also included as a random effect (Hadfield & Nakagawa, 2010; Chamberlain et al., 2012; Cinar et al., 2022).
To ensure that all four random effects (effect size ID, study ID, species name, and phylogeny) were identifiable in the multilevel meta-analytical model, we plotted profile likelihood plots for each using the profile function within metafor (Fig. S2, Appendix S11).
(iv) Outliers and influential studies. The functions rstudent and cooks.distance from metafor were used to identify potential outliers and influential cases through calculation of the externally studentised residuals and the Cook's distance (Cook, 1977). While there is some discussion over interpretation of both of these measures, according to Viechtbauer & Cheung (2010) finding more than k/10 studentised residuals larger than ±1.96 in a set of k studies would be considered unusual. Twelve studies in our data set of 478 records were found to have absolute externally studentised residuals larger than ±1.96, which is well within this threshold. The criteria commonly used to interpret Cook's distance are to investigate records with a Cook's distance value above 0.5, more than three times the mean of all distances, or equal to more than 4/N where N is the total number of records (Glen, 2016). No records had Cook's distances above 0.5, 11 records had values more than three times the mean, and three of these also had values more than 4/N (Table S7, Appendix S11). Investigating all potential outliers showed no reasons to suspect errors in the data or other justification for removal of these studies from the meta-analysis. Many of the potential outliers displayed very high hatching failure proportions, with 12 out of 20 having rates of >50%, but these records also generally represented captive species, threatened species, or populations undergoing artificial incubation. As these are suspected to have moderating effects on hatching failure which will be accounted for once these variables are included in the meta-analytical model (see Table S8 in Appendix S11), the outliers were not removed for the main analyses. However, sensitivity analyses were conducted with the outliers removed to validate findings (see Tables S9 and S10 in Appendix S11).
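The screening rules described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' R workflow: the function name `flag_influential` and its inputs (a list of Cook's distances and a list of externally studentised residuals, one value per record) are hypothetical, and the diagnostics themselves are assumed to have been computed elsewhere (for example with metafor).

```python
from statistics import mean


def flag_influential(cooks, resid, z_crit=1.96):
    """Apply the three Cook's distance criteria and the k/10 residual rule.

    cooks: Cook's distance per record (hypothetical input).
    resid: externally studentised residual per record (hypothetical input).
    Returns the indices flagged by each criterion.
    """
    n = len(cooks)
    avg = mean(cooks)
    flags = {
        # Criterion 1: Cook's distance above 0.5
        "above_0.5": [i for i, d in enumerate(cooks) if d > 0.5],
        # Criterion 2: more than three times the mean distance
        "above_3x_mean": [i for i, d in enumerate(cooks) if d > 3 * avg],
        # Criterion 3: more than 4/N, with N the number of records
        "above_4_over_n": [i for i, d in enumerate(cooks) if d > 4 / n],
    }
    # Residual rule: more than k/10 values beyond +/-1.96 would be unusual
    large = [i for i, r in enumerate(resid) if abs(r) > z_crit]
    flags["large_residuals"] = large
    flags["residual_count_unusual"] = len(large) > n / 10
    return flags
```

Applied to the counts reported here, 12 large residuals among 478 records is far below the k/10 threshold of 47.8, so the residual rule would not flag the data set as unusual.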
(v) Identifying and quantifying heterogeneity. We calculated the overall mean hatching failure by running the multilevel meta-analytical model with only the previously described random effects (effect size ID, study ID, species name, and phylogeny). We calculated the level of heterogeneity across all effect sizes using the I² statistic (Higgins & Thompson, 2002; Higgins et al., 2003) and partitioned heterogeneity with respect to the random factors following Nakagawa & Santos (2012) (Table S9 in Appendix S11). The I² statistic is defined as the percentage of variability in the effect sizes that is not caused by sampling error, and is a common method used to quantify the between-study heterogeneity. A high degree of heterogeneity (I² = 99.23%) (Table S9 in Appendix S11) was found which could not be attributed to outliers or to sampling variance alone, hence moderator analyses were conducted to identify the sources of heterogeneity, extending the model to become a mixed-effects multilevel meta-analytic model.
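To make the partitioning of I² concrete, the following Python sketch shows one common way of computing it for a multilevel model: each variance component is divided by the sum of all variance components plus a "typical" sampling variance (the Higgins & Thompson, 2002 formula). The function names and the example inputs are hypothetical; in practice the variance components would come from the fitted model, and this sketch is not the authors' code.

```python
def typical_sampling_variance(variances):
    """'Typical' sampling variance from per-record sampling variances."""
    k = len(variances)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w2)


def partition_i2(components, sampling_variances):
    """Return I2 (in %) per random effect and in total.

    components: dict mapping a random-effect name to its estimated
    variance (hypothetical values, e.g. taken from a fitted model).
    """
    s2 = typical_sampling_variance(sampling_variances)
    total_var = sum(components.values()) + s2
    parts = {name: 100.0 * v / total_var for name, v in components.items()}
    parts["total"] = 100.0 * sum(components.values()) / total_var
    return parts


# Illustrative values only (not the estimates from this study)
example = partition_i2(
    {"effect_size": 0.02, "study": 0.01, "species": 0.02, "phylogeny": 0.04},
    [0.001, 0.002, 0.0015, 0.001],
)
```

By construction the component-level values sum to the total, which is why the four component percentages reported in the Results add up to the overall I² (up to rounding).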
(vi) Explaining heterogeneity with moderator analyses. The potential moderators considered for this analysis were threat status (threatened or non-threatened), management level (wild, wild managed, or captive), incubation type (natural or artificial), supplementary feeding (fed or not fed), and artificial nest provision (provided or not provided). All moderators were binary variables with the exception of management level which was coded as a categorical variable. While information on some other management interventions was available for several populations (for example, the use of artificial insemination, predator control, and fostering of eggs), in general there were not enough observations per category within the data set to estimate the moderator effects accurately. An additional moderator of IUCN Red List classification (LC, NT, VU, EN, CR) was also tested in certain models as an alternative to threat status to assess finer level hatching failure differences between threat categories. As with management level, this moderator was coded as a categorical variable. Finally, to examine whether there was a trend in mean hatching failure over time, we ran a metaregression with publication year as a fixed effect.
Variables considered for inclusion as moderators in a meta-regression are often correlated, which can lead to multicollinearity (Hox, Moerbeek & van de Schoot, 2010; Assink & Wibbelink, 2016; Harrer et al., 2021). Correlations between categorical variables were assessed using a series of pairwise chi-squared tests of independence using the chisq.test function from the package stats (R Core Team, 2020) and by computing Cramér's V (Cramér, 1946) for each relationship (Table S11 in Appendix S11). This revealed weak significant correlations between threat status and management level and between threat status and artificial nest provision, strong significant correlations between management level and incubation type, supplementary feeding, and artificial nest provision, a moderate significant correlation between incubation type and supplementary feeding, and a weak significant correlation between supplementary feeding and artificial nest provision. No significant correlations were found between threat status and incubation type or supplementary feeding, nor between incubation type and artificial nest provision. The variance inflation factor (VIF) was also calculated for each two-level moderator using the vif function in the metafor package, with a high VIF value indicating high collinearity with other moderators in the model (Table S12, Appendix S11). All moderators had VIF values between 1 and 4, indicating low to moderate multicollinearity (although the appropriate VIF threshold is subject to debate; Kock & Lynn, 2012). For the three-level moderator management level the generalised variance inflation factor (GVIF) was calculated (Fox & Monette, 1992) and the transformation $(\mathrm{GVIF}^{1/(2 \times \mathrm{df})})^{2}$ applied to enable comparison with VIF values, also indicating low to moderate multicollinearity (Table S12, Appendix S11).
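As an illustration of the two collinearity measures described above, the sketch below computes the Pearson chi-squared statistic and Cramér's V from a contingency table of counts, and applies the GVIF transformation. It is a minimal Python sketch under stated assumptions: the function names are hypothetical, no continuity correction is applied (R's chisq.test applies one to 2 × 2 tables by default, so values may differ slightly), and all row and column totals are assumed to be non-zero.

```python
from math import sqrt


def cramers_v(table):
    """Pearson chi-squared statistic and Cramer's V for a table of counts.

    table: list of rows, each a list of observed counts.
    No continuity correction is applied (an assumption of this sketch).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return chi2, sqrt(chi2 / (n * k))


def gvif_to_vif_scale(gvif, df):
    """Apply (GVIF^(1/(2*df)))^2 so a multi-level moderator is
    comparable with VIF values of single-df moderators."""
    return (gvif ** (1.0 / (2 * df))) ** 2
```

For a moderator with one degree of freedom the transformation returns the GVIF unchanged, which is why it is only needed for the three-level management moderator.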
To avoid potentially obscuring significant effects due to collinearity if moderators were combined into a single model, all proposed moderators were tested individually in separate univariate models to check for significance. However, we acknowledge that this approach could increase the chances of false-positive results due to multiple testing (Davies, Lewis & Dougherty, 2020). We repeated the univariate models for incubation type, supplementary feeding, and artificial nest provision excluding captive populations to attempt to account for the likelihood that captive populations were subjected to multiple concurrent management interventions that could mask the effect of the individual intervention included in the model. Thus, we also repeated the univariate models for the specific management interventions where each named intervention was the only one applied to the population. Given that meta-regression is able to handle low levels of collinearity (Harrer et al., 2021), multivariate mixed-effects multilevel meta-analytical models were used to test for evidence of interactions between both threat status and IUCN Red List classification with management level, artificial incubation, supplementary feeding, and artificial nest provision. We also ran these models based on an additive model to obtain estimates of mean hatching failure for each combination of factors for these moderators (Tables S13-S20, Appendix S11). Significance tests and likelihood ratio tests were used to assess whether a significant interaction was present between any of the pairs of moderators.
As we applied the Knapp-Hartung adjustment to all meta-analytical models, the F distribution was used to determine whether the mean effect size (hatching failure proportion) significantly differed across moderator categories, and the mean effect size and 95% confidence intervals were estimated for each moderator category.

(c) Publication bias
Publication bias is a major threat to the validity of meta-analysis. However, as this meta-analysis includes studies that are generally observational and non-comparative, and hence do not calculate significance levels, our results are unlikely to be strongly coloured by outcome reporting bias or time-lag bias. As the data were primarily obtained from a search of the title, abstract, and key words of articles, it is possible that our choice of search terms ('hatching success', 'hatching failure', 'hatchability', and 'hatching rate') could have resulted in a search bias due to authors being more likely to include these terms in a prominent part of the article if they found an unusually high or low level of hatching failure. While this is difficult to detect with any certainty, comparing the mean hatching failure rate across studies with the search terms in more prominent (title and key words) versus less prominent (abstract) parts of the article did not indicate any consistent bias. It is possible that as we searched for both 'hatching success' and 'hatching failure' this may have helped to ensure a balance in the literature between strong findings in either direction.
We considered a number of other sources of research biases (Gurevitch & Hedges, 1999; O'Dea et al., 2021) and attempted to mitigate these where possible. First, studies were generally limited to those written in the English language. Efforts were made to access translations of non-English language studies, but these were often limited to the title and/or abstract only, which did not allow for thorough examination of the text to verify the study's eligibility, ultimately leading to their exclusion. Excluding non-English language studies could under-represent study populations from certain geographic regions, which can be further exacerbated by unequal output of published research globally, particularly in English language journals. To assess the extent of this, each record's study location was plotted using ggplot2 (Wickham, 2016) to visualise their distribution (Fig. 2). This revealed an apparent over-representation of records from English-speaking countries and an under-representation of records from much of the African continent, but in general the final data set included a wide geographic distribution of populations. Visualising the geographic distribution also highlighted the frequent occurrence of island-dwelling threatened species in our data set, but this is perhaps expected given the overall trend of island species being more threatened than mainland species (Tershy et al., 2015) and does not necessarily represent a bias in the data set itself. Another consideration in this meta-analysis is the potential for taxonomic bias due to some species, families, or orders being subject to substantially more research effort than others. To check if our final data set was representative of the global distribution of birds, we used a chi-squared goodness of fit test to compare our data set with the total number of species per order according to the IUCN Red List (www.iucnredlist.org/statistics) (Table S21, Appendix S11).
The results showed a significant P value (P < 0.001), indicating that our final data set was not taxonomically representative of the global distribution of birds. Certain orders such as Accipitriformes, Caprimulgiformes, and Columbiformes were underrepresented in our data set relative to the overall global distribution, while other orders including Charadriiformes, Falconiformes, and Sphenisciformes were overrepresented relative to the global distribution. Finally, the threat status of a species could cause bias in the data set either due to non-threatened species being studied more due to being more common or widespread, or threatened species being studied more due to being of greater research interest. We used a chi-squared goodness of fit test to compare our final data set with the threat statuses of the global distribution of birds (www.iucnredlist.org/statistics) (Table S22, Appendix S11), and again found a significant P value (P < 0.001), indicating that our final data set is not representative of the overall global distribution of birds. This appears to be due to a slight bias towards threatened species in our data set relative to the overall global distribution.

III. RESULTS
(1) Description of data set
The PRISMA diagram depicting our literature search and screening process is shown in Fig. 1. A total of 231 articles comprising 478 population records were included in the final data set. An additional 638 articles reporting 1039 population records were initially considered for inclusion pending verification of data based on contacting authors directly, but as this number became insurmountable, they were ultimately excluded from the quantitative analyses. Information on the taxonomy, threat status, and management level distribution in the final data set can be found in Table 1.
(2) Hatching success/failure terminology
After homogenising the language used in definitions by revising the wording to use consistent phrasing and terminology, inverting definitions of 'hatching failure' to 'hatching success', and reassigning definitions of alternative terms to define 'hatching success' where appropriate (e.g. a study used 'hatching success' and the alternative term synonymously) we were left with 51 different definitions of hatching success and 12 definitions of hatchability across our final data set (Tables S4 and S5, Appendix S8). In terms of usage, 166 records used both terms, with 11 of these records defining hatching success and hatchability to describe different results, while 155 either defined only one of the terms or neither term, hence it was assumed that they were using them interchangeably. In total, 149/483 (30.8%) records in the final data set did not include any definition of hatching success, hatchability, or an alternative term.

(3) Meta-analysis
We found that there was significant within- and between-study variance in hatching failure amongst the studies included in the meta-analysis. Compared to a two-level (i.e. traditional random effects model) and hierarchical (random effects at effect size level and study level) meta-analytical model, the multilevel meta-analytical model (random effects for effect size, study, species, and phylogeny) was found to be the best fit for the data based on the Akaike (AIC) and Bayesian information criterion (BIC) values, and the significance of the likelihood ratio test. The profile likelihood plots of the random effects peaked at the respective parameter estimates and the restricted log-likelihood values decreased as the component values moved away from the REML estimates, allowing us to have confidence that all four random effects were identifiable (Fig. S2, Appendix S11). The results of the final multilevel meta-analytical model showed an overall hatching failure percentage of 16.79% (95% CI: 8.28-27.40%; N = 478) across all bird populations (Fig. 3). The total amount of heterogeneity (I²) across effect sizes was 99.23% (Table S9, Appendix S11), which is considered high (Higgins & Thompson, 2002; Higgins et al., 2003). The highest amount of variance was attributed to phylogenetic history, with an I² of 39.70%. The within-study, or observation-level, variance was found to be 24.02%, between-study variance was 15.04%, and between-species variance was 20.46% (Table S9, Appendix S11). According to the 'rule of thumb' that I² values of 25%, 50%, and 75% correspond to low, moderate, and high heterogeneity, these random effects can all be seen to have low to moderate heterogeneity.
The results of the univariate models for each moderator are presented in Table S8 (Appendix S11) and Fig. 3. Hatching failure was significantly (P = 0.0028) influenced by threat status, with threatened species experiencing a mean hatching failure percentage of 21.02% (95% CI: 11.62-32.22%; N = 96) and non-threatened species experiencing a mean hatching failure percentage of 15.24% (95% CI: 7.42-25.07%; N = 382). Examining the influence of the IUCN Red List classification levels also showed a significant (P = 0.0022) overall effect on mean hatching failure, but the differences between threat categories were not all significant. Populations classified as Critically Endangered had significantly higher mean hatching failure than those classified as Vulnerable, Near Threatened, or Least Concern, and populations classified as Endangered had significantly higher mean hatching failure than those classified as Least Concern. Populations classified as Vulnerable experienced a lower mean hatching failure than those classified as Near Threatened but this difference was not significant, and the general trend of populations at higher risk of extinction having lower hatching success is still clearly apparent (Fig. 3). There was no significant relationship (P = 0.9897) between hatching failure and publication year (Fig. S3, Appendix S11). However, when we used the publication year of the original source (i.e. for data included in previous comparative analyses) we did find a weakly positive, significant relationship (P = 0.0037) (Fig. S4, Appendix S11).

Fig. 3. Forest plot of the mean effect size (hatching failure percentage) estimates and 95% confidence intervals for each factor level for six moderators: threat status, IUCN Red List classification, management level, incubation type, supplementary feeding, and artificial nest provision. Random effects for each model were effect size ID, study ID, species name, and phylogenetic history. Proportions were transformed using a Freeman-Tukey double arcsine transformation, and all models were fitted using a restricted maximum likelihood (REML) estimation with a Knapp-Hartung adjustment. A significant F statistic indicates that at least one of the regression coefficients of the moderator categories significantly deviates from zero, indicating that that moderator influences hatching failure. Significant differences between levels are also shown. The mean hatching failure percentage for the whole final data set is shown for comparison (red square). Estimates were back-transformed with the inverse of the Freeman-Tukey transformation using the harmonic mean of the sample sizes and are presented here in the natural scale. k = number of effect sizes. Levels of significance across categories and levels: ***, P < 0.001; **, P < 0.01; *, P < 0.05.
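The Freeman-Tukey double arcsine transformation and the harmonic-mean back-transformation named in the Fig. 3 caption can be sketched as follows. This Python sketch assumes the averaged form of the transformation and the Miller (1978) inverse, which to our understanding is the convention followed by metafor; the function names are hypothetical and the code is illustrative rather than the authors' implementation.

```python
from math import asin, cos, sin, sqrt


def ft_transform(x, n):
    """Freeman-Tukey double arcsine transform of x failures out of n eggs
    (averaged form: half the sum of the two arcsine terms)."""
    return 0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))


def harmonic_mean(sample_sizes):
    """Harmonic mean of the sample sizes, used for back-transformation."""
    return len(sample_sizes) / sum(1.0 / n for n in sample_sizes)


def ft_inverse(t, n):
    """Back-transform t to a proportion, assuming the Miller (1978)
    inverse; n is typically the harmonic mean of the sample sizes."""
    s = sin(2 * t)
    if s <= 0:
        return 0.0  # guard: transformed values at or below zero map to 0
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    inner = max(inner, 0.0)  # guard against small negative rounding error
    sign = 1.0 if cos(2 * t) >= 0 else -1.0
    return 0.5 * (1 - sign * sqrt(inner))


# Round trip for 20 failed eggs out of 100: recovers a proportion near 0.20
p = ft_inverse(ft_transform(20, 100), 100)
```

Because a pooled estimate has no single sample size, the back-transformation needs a representative n, which is the role of the harmonic mean here.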
The management level of a population was significantly associated with its hatching failure percentage (P < 0.0001), with wild populations experiencing the lowest mean hatching failure of 13.78% (95% CI: 7.25-21.87%; N = 265), wild managed populations experiencing a higher mean hatching failure of 20.05% (95% CI: 12.02-29.46%; N = 172) and captive populations experiencing the highest mean hatching failure of 38.07% (95% CI: 26.60-50.23%; N = 41) ( Table S8, Appendix S11; Fig. 3). The three specific management interventions investigated were also significantly associated with mean hatching failure, with populations undergoing artificial incubation displaying higher mean hatching failure than those practicing natural incubation, supplementary fed populations experiencing higher mean hatching failure than non-supplementary fed populations, and populations provided with artificial nests experiencing higher mean hatching failure than populations without nests (Table S8, Appendix S11; Fig. 3). Repeating the univariate models for the specific management interventions excluding captive populations showed consistent results for the association between hatching failure and both incubation type and supplementary feeding, however, the relationship between artificial nest provision and hatching failure was no longer significant (Table S23, Appendix S11). Similarly, repeating the univariate models for each specific management intervention where the named intervention was the only one applied to the population showed higher hatching failure rates in populations undergoing artificial incubation, supplementary fed populations, and populations provided with artificial nests compared to unmanaged populations, but only the relationships for artificial incubation and supplementary feeding were significant (Table S24, Appendix S11).
The results of both the significance tests and likelihood ratio tests showed that there were no significant interactions (P > 0.05) between either threat status or IUCN Red List classification with management level, or with any of the three included management interventions, indicating that the relationship between management and hatching failure is similar regardless of a species' risk of extinction (Tables S13-S20, Appendix S11). As there was no evidence of interactions between the moderators, we also ran the models based on an additive model to obtain mean hatching failure percentage estimates and 95% confidence intervals for each combination of factors of these moderators, with the results for IUCN Red List classification presented in Table 2.
We ran three sensitivity analyses to assess the robustness of the results. The first removed all previously identified potential outliers, the second removed all records for which the sample size was estimated based on the mean clutch size from Cooney et al. (2020), while the last removed all records from Briskie & Mackintosh (2004) due to ambiguity about whether or not records should be classified as wild or wild managed based on information in the paper (Appendix S4). We found consistent results across all sensitivity analyses and the analysis with the final data set (Tables S9, S10, S25 and S26, Appendix S11). One result of note was a substantial decrease in the proportion of heterogeneity attributed to phylogenetic history (dropping from 39.70% to 18.81%) in the analysis without outliers compared to the full analysis (Table S9, Appendix S11).

IV. DISCUSSION
Here we performed a multilevel meta-analysis of 478 measures of hatching failure in bird populations from 231 studies across 241 species, finding a mean overall hatching failure percentage of 16.79% (95% CI: 8.28-27.40%). Hatching failure was significantly lower for non-threatened species than for threatened species, and wild populations had significantly lower hatching failure than both wild managed and captive populations, with wild managed populations also having significantly lower hatching failure than captive populations. In addition, populations undergoing the specific management interventions of artificial incubation, supplementary feeding, and artificial nest provision all displayed significantly higher hatching failure than populations without these interventions. We did not find any significant interactions between threat status and overall management level of a population, nor with any specific management interventions. We also examined the terminology and definitions of hatching success and failure currently used throughout the literature and found a lack of consistency among studies.
As far as we are aware, this meta-analysis represents the most comprehensive analysis of avian hatching failure conducted to date. Our mean overall rate of hatching failure (16.79%) exceeds the widely accepted value of 9.4% (Koenig, 1982), as well as the mean hatching failure rates found by a number of other comparative analyses [12.35% (Morrow et al., 2002); 10.9% (Spottiswoode & Møller, 2004); and 8.27% (Møller et al., 2010)]. We found a weak but significant relationship between original publication year and hatching failure, which indicates a slight increase in hatching failure over time (Fig. S4, Appendix S11). Given that the current global trend is generally for species decline, even for many populations currently classified as Least Concern, it may be expected that an assessment including the most recent data on hatching failure would report slightly higher rates of hatching failure overall. However, we believe our higher value likely mainly results from our explicit inclusion of threatened and managed populations (and their general exclusion from other studies). The mean rate of hatching failure for wild, non-threatened species in our study was 12.40%, which is more consistent with the rates found by previous comparative analyses which primarily focused on natural or free-living populations and included mainly (>90%) non-threatened species (Table 2; Table S1, Appendix S1). By contrast, the mean rate of hatching failure for captive, threatened species was 42.84% (Table 2), showing the degree to which variation in failure rate depends on a population's threat status and management level and explaining our higher overall hatching failure rate relative to other analyses.
Our finding that threatened species had significantly higher rates of hatching failure than non-threatened species is consistent with findings from other studies (e.g. Briskie & Mackintosh, 2004). The link between hatching failure rate and a population's risk of extinction suggests that hatching failure rate could be used as an indicator of a population's risk of extinction, and similarly that threat status could be used to identify populations that may be experiencing high rates of hatching failure. This further highlights the need for an improved understanding of the drivers of hatching failure, which will enable the development and implementation of appropriate management interventions to hopefully mitigate reproductive losses.

Table 2. The mean hatching failure percentage estimates with 95% confidence intervals for each level of IUCN Red List classification with all other significant moderators. Random effects for each model were effect size ID, study ID, species name, and phylogenetic history. Proportions were transformed using a Freeman-Tukey double arcsine transformation, and all models were fitted using a restricted maximum likelihood (REML) estimation with a Knapp-Hartung adjustment. Estimates are based on an additive model as there was no evidence for significant interactions between the moderators (Tables S13-S20, Appendix S11). Estimates were back-transformed with the inverse of the Freeman-Tukey transformation using the harmonic mean of the sample sizes and are presented here in the natural scale. LC, Least Concern; NT, Near Threatened; VU, Vulnerable; EN, Endangered; CR, Critically Endangered.

The significantly higher rates of hatching failure in captive populations compared to wild populations are perhaps unsurprising given existing evidence of poor reproductive success in populations in captivity (Asa et al., 2011; Farquharson et al., 2018). As previously mentioned, there are particular features of captivity which could explain relatively higher rates of hatching failure such as limited mate choice and inclusion of older individuals in breeding events, but the finding that wild managed populations also have significantly higher hatching failure than wild populations may indicate that breeding conditions in captivity are not solely responsible for lower hatching success in managed populations. It is possible that populations are more likely to be managed if they are already experiencing a high rate of hatching failure, with the exception of artificial nest provision which is often applied to common species with low risk of extinction. This is partly supported by our finding of a weak but significant correlation between threat status and management level (Table S11, Appendix S11), with threatened species slightly more likely to be managed compared to non-threatened species. As we have also shown that threatened species exhibit significantly higher hatching failure than non-threatened species, this could therefore partially explain the higher rates of hatching failure in managed populations.
Overall, this finding shows that it is important to consider both threat status and the extent of management a population is experiencing when measuring hatching failure and to compare different populations of the same species, and also indicates that conservation efforts are being focused on the species with the lowest rates of hatching success and hence potentially most vulnerable to extinction.
The lack of an interaction between threat status and management level indicates that threatened species do not experience a greater increase in hatching failure under management compared to non-threatened species. However, based on an additive model the mean hatching failure of threatened species in captivity (42.84%) is much higher than the mean hatching failure of threatened species in the wild (17.39%) (Table 2). Again, it is possible that populations that are already experiencing high rates of hatching failure are more likely to be taken into captivity or otherwise managed. However, a similar increase in hatching failure is also seen in a comparison of non-threatened species in captivity (36.06%) versus the wild (12.40%) (Table 2). While the reasons behind higher rates of hatching failure in managed populations urgently need further investigation, the quantification of mean rates of hatching failure across different management levels established here can be used by conservation managers and other decision-makers to estimate the expected hatching failure rate of populations under different management scenarios, given their threat classification. This can be used alongside other information such as predicted survivability of hatched chicks, which is typically higher in carefully managed captive populations compared to wild populations, when determining the best conservation strategy for a species. To illustrate, if a wild threatened population lays 100 eggs in a breeding season but loses 40 eggs to predators and 10 to hatching failure (a hatching failure rate of 16.67%), only 50 chicks would have the chance to reach fledging. That same population laying 100 eggs in captivity without the threat of predation could lose 42 eggs to hatching failure, i.e. more than twice the rate of hatching failure in the wild, and still result in more chicks having the chance to reach fledging.
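The arithmetic behind this illustration can be written out explicitly. The short Python sketch below uses a hypothetical helper, `hatching_outcome`, and assumes (as the 16.67% figure implies) that the hatching failure rate is calculated over the eggs that were not lost to predation.

```python
def hatching_outcome(eggs_laid, eggs_predated, eggs_failed):
    """Return (hatching failure rate, number of chicks hatched).

    The failure rate is taken over eggs that survived predation,
    which reproduces the 10 / 60 = 16.67% figure in the text.
    """
    eggs_remaining = eggs_laid - eggs_predated
    failure_rate = eggs_failed / eggs_remaining
    chicks = eggs_remaining - eggs_failed
    return failure_rate, chicks


# Wild scenario: 100 eggs, 40 predated, 10 fail to hatch
wild_rate, wild_chicks = hatching_outcome(100, 40, 10)

# Captive scenario: 100 eggs, no predation, 42 fail to hatch
captive_rate, captive_chicks = hatching_outcome(100, 0, 42)
```

Under these assumptions the wild scenario yields 50 chicks at a failure rate of 16.67%, while the captive scenario yields 58 chicks at a failure rate of 42%, so the captive failure rate is roughly 2.5 times the wild rate even though more chicks hatch.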
We found that populations in which eggs are artificially incubated have significantly higher hatching failure compared to those where eggs are naturally incubated, supporting evidence from other studies (Page, Quinn & Warriner, 1989;Hamilton et al., 1999;Sancha et al., 2004;Amar, Arroyo & Bretagnolle, 2008). This difference is generally considered to be a consequence of the difficulty in simulating natural incubation conditions (Deeming, 2002;Klimstra et al., 2009), as well as the possibly increased risk of trans-shell infections in artificially incubated eggs compared to parentally incubated eggs Rideout, 2012;Assersohn et al., 2021b). In addition, the eggs of different species may require particular artificial incubation conditions (Kuehler & Good, 1990;Klimstra et al., 2009) and information on these optimal conditions, if known, may not be easily or openly available, resulting in the use of unsuitable incubation parameters and subsequent higher rates of hatching failure. There is some evidence that captive-laid eggs have lower hatching success than wild-laid eggs under artificial incubation (Burnham, 1983;van Heezik et al., 2005) which could be contributing to our results, but a repeated analysis excluding captive populations still showed a significantly higher rate of hatching failure for eggs undergoing artificial incubation (Table S23, Appendix S11). The removal of eggs from the wild for artificial incubation has been used successfully in a number of well-known species' conservation programs, for example the California condor (Gymnogyps californianus) (Kuehler & Witman, 1988), the North Island brown kiwi (Apteryx mantelli) and rowi (Okarito kiwi; Apteryx rowi) (Colbourne et al., 2005), and the Mauritius kestrel (Falco punctatus) (Cade & Jones, 1993;Jones et al., 1994). 
Consequently, this technique may be considered a relatively safe option by conservation managers hoping to improve the number of chicks hatching in a population, particularly if a population is experiencing external risks to eggs such as predation or extreme weather conditions (Williams et al., 2013). However, our finding that populations of threatened species undergoing artificial incubation have a mean hatching failure rate of 41.46% compared to 17.71% for those practicing natural incubation demonstrates that this potential reduction in hatching success should be taken into consideration when deciding whether to remove eggs (Table 2). Overall, this finding further indicates that focusing efforts on improving artificial incubation in a wide range of species could be critically important for improving hatching success in managed populations.
Supplementary feeding is often considered to be a passive, non-invasive management intervention, so the finding that supplementary fed populations experience significantly higher rates of hatching failure than non-supplementary fed populations may appear somewhat unexpected. Several previous studies have found no effect of supplementary feeding on hatching success (Newton & Marquiss, 1981; Sanz, 1996; Peach, Sheehan & Kirby, 2014; Ruffino et al., 2014; Vafidis et al., 2016), while evidence of both positive (Nilsson & Smith, 1988; Korpimäki, 1989) and negative (Harrison et al., 2010) effects has also been found. One explanation for higher hatching failure in supplementary fed populations may be that supplementary feeding is often applied to populations suspected to be living in depleted environments, meaning that they may also be experiencing other problems such as habitat degradation, lack of nest sites, and disturbance, all of which may increase hatching failure. Supplementary feeding is also often used to support reintroduced and translocated populations while they become established in an unfamiliar habitat (Ewen et al., 2015), with such populations likely to be of small population size, which could negatively impact their hatching success (Heber & Briskie, 2010). Supplementary feeding has been linked to the spread of disease (Wilson & Macdonald, 1967), which could potentially impact parental condition or increase the risk of trans-shell infection, increasing hatching failure. The presence of food may also attract competitors or predators (Arcese & Smith, 1988; Carrete, Donázar & Margalida, 2006), which could lead to higher hatching failure due to limitation of other resources or non-consumptive predator effects, respectively. Finally, it may be difficult to provide supplementary food which adequately meets the nutritional needs of a species, with a risk that if a population is highly dependent on the supplementary food due to unavailability of natural food sources (e.g. Edmunds et al., 2008) this could have consequences for hatching success. The majority of the supplementary fed populations in our data set were also captive, as by default populations in captivity are not dependent upon natural food sources, and since captive populations were found to experience significantly higher hatching failure than wild managed populations, this could be a driving factor of the higher rate seen in supplementary fed populations. However, a repeated analysis that excluded captive populations still showed significantly higher rates of hatching failure for supplementary fed populations, indicating that this is not the case (Table S23, Appendix S11). The relatively low financial cost, effort, and perceived risk of disturbance of supplementary feeding contribute to it often being one of the first interventions applied to a struggling population, but as with other interventions any potential negative effects on hatching success should also be taken into consideration during decision-making.
Artificial nest provision, particularly in the form of nestboxes, is another widely practiced intervention generally considered to be relatively passive, and is often used when suitable natural nest sites are limited and as a rudimentary form of protection from predators (Sutherland et al., 2021). Nestboxes are also often used to facilitate population monitoring for various conservation-related and non-conservation-related research purposes. Our finding that populations provided with artificial nests have significantly higher hatching failure than those without artificial nests may therefore seem surprising. However, while appropriate artificial nest sites have been shown to have positive effects on the number of eggs hatching, this is often driven by a reduction in egg losses due to predation, brood parasitism, or adverse weather conditions (Semel, Sherman & Byers, 1988; Piper et al., 2002; Shealer, Buzzell & Heiar, 2006) rather than an increase in hatching success. Very few studies have compared hatching failure rates between natural and artificial nests; instead, clutch size, nesting success, and fledging success appear to be much more frequently used as measures of reproductive performance (Sutherland et al., 2021). There is some evidence that the orientation of artificial nestboxes can impact hatching failure due to creating a non-optimal microclimate (Butler, Whitman & Dufty Jr., 2009), and as nestboxes can provide a favourable environment for microorganisms (Baggott & Graeme-Cook, 2002; Goodenough & Hart, 2011; González-Braojos et al., 2012; Devaynes et al., 2018) this may increase the risk of egg infection and subsequently hatching failure. To our knowledge there has been almost no investigation to date into the microbial loads of artificial versus natural nests, but several studies have shown artificial nests to contain significantly higher loads of ectoparasites (Wesolowski & Stanska, 2001; Hebda & Wesołowski, 2012; Espinaze et al., 2020).
As with supplementary feeding, a large proportion of our sample of populations provided with artificial nests were also captive, potentially contributing to the higher rate of failure. When the analysis was repeated excluding captive populations the hatching failure of populations provided with artificial nests was still higher, but the relationship was no longer significant, adding support to this possibility (Table S23, Appendix S11). Similarly, a repeated analysis limited to populations where artificial nest provision was the only intervention applied also showed a higher hatching failure rate for populations provided with artificial nests, but again the relationship was no longer significant (Table S24, Appendix S11). Given the long history and widespread use of artificial nest provision, there is remarkably little published information on the interaction with hatching failure. The results of this analysis indicate that the potential for negative effects on hatching success should not be overlooked when deciding whether to provide artificial nests as part of a conservation strategy.
While we found several moderators that influenced the rate of hatching failure across our data set, a significant amount of heterogeneity remains unexplained, indicating that there are other factors not included in our analysis that influence hatching failure. We only had sufficient data for three management interventions to be included in our meta-analysis, but there are several other interventions that could have consequences for hatching failure. Firstly, artificial insemination has been applied in a number of different species to improve fertility and hatching success, with evidence of positive effects (Wiemeyer, 1981; Saint Jalme, Gaucher & Paillat, 1994; Saint Jalme et al., 1996; Gee et al., 2004; Blanco et al., 2009), and hence would be a useful addition to future analyses. Secondly, while artificial incubation was included as a moderator in our analysis, an often-associated management intervention is forced re-clutching, where eggs are removed from a population and artificially incubated (or fostered) to encourage the breeding pair to lay a replacement clutch, increasing the overall number of eggs laid in the population (Wood & Collopy, 1993; Seddon et al., 1995; Ellis, Gee & Mirande, 1996; Jones, 2004). Egg fertility, hatching success, and egg quality have been shown to decline in replacement clutches (Cade & Jones, 1993; Jones et al., 1994) and forced re-clutching can also impact other aspects of current and future reproductive success (Wood & Collopy, 1993; Parmley et al., 2015). A comparison of hatching failure across multiple clutches would therefore be a useful inclusion in future analyses.
Another aspect of breeding management which can influence hatching success is 'forced' pairing of individuals and maintenance in captivity as a single pair due to genetic or logistical reasons, eliminating the opportunity for mate choice and extra-pair copulations, both of which are gaining recognition in their potential influence on hatching success (Wetton & Parkin, 1991; Pizzari & Birkhead, 2000; Hemmings & Birkhead, 2015, 2017). While it can be difficult to facilitate mate choice in captivity, practices such as 'flock-mating' may help (Asa et al., 2011; Ihle, Kempenaers & Forstmeier, 2015). Other management interventions not included here which may account for some of the unexplained heterogeneity include predator exclusion, egg cleaning, egg fostering, and egg warming or cooling. Other aspects of management potentially affecting breeding or incubation, such as levels of researcher disturbance or the use of radio transmitters on birds, could also be worth investigating in future analyses. However, these are currently limited in how frequently they are used or in how often or how well they are reported in studies.
Grouping mean hatching failure rates by taxonomic order appeared to show some variation across orders, along with differing levels of intra-order variation (Fig. S5, Appendix S11). However, these apparent differences may have been due to some orders being represented by only a small number of effect sizes. For example, Accipitriformes, Otidiformes, and Pterocliformes appeared to have much higher rates of hatching failure relative to the other orders, but all contained fewer than five effect sizes taken primarily from captive and/or threatened populations, likely explaining the high hatching failure for the order overall. Greater consistency in hatching failure rates was seen across orders represented by at least 10 effect sizes, although some still appeared to show higher intra-order variation than others, potentially due to the inclusion of populations from across a wide range of threat and management levels (Figs S5 and S6, Appendix S11). Another possible explanation for high variation within taxonomic orders, which was not explored in this analysis, is the influence of species-specific life-history variables. Life-history variables including diet, nest type, breeding system, incubation pattern, and latitude have been found to influence hatching failure significantly in other comparative analyses (Koenig, 1982; Spottiswoode & Møller, 2004), although not consistently (Table S2, Appendix S1). While our meta-analysis was phylogenetically controlled, we found that phylogenetic history accounted for a large proportion of overall heterogeneity, and that the removal of potential outliers dramatically reduced this proportion (Table S9, Appendix S11). As more closely related species tend to share similar life-history traits, this could indicate that certain traits influence hatching failure in different ways, and that these could also interact with the applicability or efficacy of management interventions.
For example, Koenig (1982) found that hole-nesting species exhibited higher hatching failure than open-nesters, speculating that this was an indirect result of correlations between breeding inexperience, predation, and hatching success. We support this possibility, and also suggest a link between microclimate and microbial conditions of cavities and related temperature fluctuations or trans-shell infection impacting hatching failure. Hole-nesting species may be more likely to be managed in the wild due to their often-widespread acceptance of nestboxes. Further research is needed to assess if certain life-history variables significantly affect hatching failure, in particular those that may influence or interact with any management interventions which are applied to a population.
An additional dimension which would have been interesting to examine in this study is the variability in hatching failure between nests within studies, i.e. whether hatching failure rate was similar across all nests in a population or concentrated within a few nests. It seems likely that a population where the mean hatching failure rate is due to high, or total, failure in a small number of nests versus a population with the same mean hatching failure rate but resulting from lower failure across a higher proportion of the nests may be experiencing different drivers of hatching failure. Unfortunately, the overwhelming majority of studies included only hatching failure averaged across all nests in the population, so we were unable to include information on the level of within-population variance in hatching failure in our analysis. We suggest this would be a useful inclusion in future studies.
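The contrast described above can be illustrated with a minimal sketch using entirely hypothetical nest-level counts. The data, the function name, and the choice of between-nest variance and share of affected nests as summary measures are illustrative assumptions, not quantities taken from the studies in our data set.

```python
from statistics import pvariance

# Hypothetical nests: (eggs present at the end of incubation, eggs unhatched).
concentrated = [(4, 4), (4, 4), (4, 0), (4, 0), (4, 0), (4, 0), (4, 0), (4, 0)]
diffuse = [(4, 1)] * 8


def summarise(nests):
    """Population-level failure rate plus two simple measures of its spread."""
    total_present = sum(present for present, _ in nests)
    total_failed = sum(failed for _, failed in nests)
    per_nest_rates = [failed / present for present, failed in nests]
    nests_with_failure = sum(1 for _, failed in nests if failed > 0)
    return {
        "population_rate": total_failed / total_present,
        "between_nest_variance": pvariance(per_nest_rates),
        "share_of_nests_with_failure": nests_with_failure / len(nests),
    }


for label, nests in (("concentrated", concentrated), ("diffuse", diffuse)):
    print(label, summarise(nests))
```

Both hypothetical populations have the same population-level hatching failure rate of 25%, but in the first all failure occurs in a quarter of the nests, whereas in the second every nest loses one egg. Reporting even one nest-level measure of this kind alongside the mean would allow such cases to be distinguished.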
It is clear that a very wide variety of definitions have been used throughout the existing body of literature on hatching success and failure, and it is reasonable to assume that they are still being used by researchers currently assessing and reporting levels of hatching failure in different bird species. The lack of consistency in terminology and definitions, and in particular the frequent absence of any definition of the terms used or description of data collection, resulted in the exclusion from this meta-analysis of a large proportion of studies (which would have likely otherwise been eligible for inclusion) due to uncertainty around the reported value of hatching failure. We propose that the definition used here (i.e. hatching failure is the proportion of eggs present at the end of the incubation period that fail to hatch relative to all eggs present at the end of the incubation period, excluding eggs lost due to predation, desertion, accident, extreme weather, or unknown factors), which is consistent with that used in several previous comparative analyses of hatching failure, should be considered as the standardised definition of hatching failure for future studies. A standardised definition will not only enable better comparison of hatching failure across multiple studies, allowing for more robust comparative analyses in future, but will also help to ensure that studies are reporting the most accurate information from the population which can be used to draw more meaningful conclusions and apply more appropriate management interventions. While this definition was developed for birds it can also be applied to oviparous species in other taxa such as amphibians, reptiles, and fish.
Additionally, the lack of consistent terminology and methodology for reporting reproductive failure in populations is likely to be a problem in viviparous taxa, and it may be possible to adapt this definition of hatching failure to be applicable for measuring reproductive success in viviparous species.
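To make the proposed definition of hatching failure concrete, the following minimal sketch computes it from per-nest egg counts. The record structure, field names, and example counts are hypothetical and serve only to show which eggs enter the numerator and the denominator.

```python
from dataclasses import dataclass


@dataclass
class ClutchRecord:
    """Egg counts for one nest; all field names are hypothetical."""
    eggs_laid: int
    eggs_lost: int  # predation, desertion, accident, extreme weather, unknown
    eggs_hatched: int


def hatching_failure(records):
    """Proportion of eggs present at the end of incubation that failed to hatch."""
    present = sum(r.eggs_laid - r.eggs_lost for r in records)
    if present == 0:
        raise ValueError("no eggs were present at the end of incubation")
    failed = present - sum(r.eggs_hatched for r in records)
    return failed / present


nests = [
    ClutchRecord(eggs_laid=4, eggs_lost=0, eggs_hatched=4),
    ClutchRecord(eggs_laid=5, eggs_lost=5, eggs_hatched=0),  # whole clutch lost
    ClutchRecord(eggs_laid=4, eggs_lost=1, eggs_hatched=2),
]
rate = hatching_failure(nests)
# For contrast: the share of all eggs laid that did not hatch, losses included.
naive = 1 - sum(r.eggs_hatched for r in nests) / sum(r.eggs_laid for r in nests)
print(f"hatching failure: {rate:.3f}; unhatched share of all eggs laid: {naive:.3f}")
```

In this hypothetical example, seven eggs are present at the end of incubation and one fails to hatch, giving a hatching failure rate of 1/7, whereas counting lost eggs as failures would give 7/13. The gap between the two values shows how much a reported rate can depend on the definition used.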
Overall, this review demonstrates the importance of gaining a better understanding of the occurrence of hatching failure, its drivers, and the effects of management interventions, in order to optimise the outcomes of conservation efforts. Conservation programmes worldwide are focused on protecting and recovering threatened species (Ebenhard, 1995; Mallinson, 1995; Bolam et al., 2021), with interventions ranging in intensity from in-situ monitoring and 'non-invasive' management (e.g. supplementary feeding), through to practices such as captive-breeding. Faced with a struggling wild population, conservation managers may decide when and how to intervene using evidence-based conservation (Sutherland et al., 2004, 2021). When multiple alternative interventions are available it is especially important to be able to compare their potential impacts accurately and systematically (e.g. Ruiz et al., 2021). Reproductive parameters such as clutch size, fledging success, and recruitment success are already commonly used for such assessments. Incorporating baseline measures of hatching failure in the wild and in captivity, alongside an understanding of how interventions can impact hatching failure, will add another dimension to facilitating decisions on the best strategy to reduce early reproductive stage losses. Furthermore, it has been found that threatened species are less likely to be held in zoos than non-threatened species, with one possible explanation being that some species are difficult to breed in captivity (Conde et al., 2013; Martin et al., 2014; Biega et al., 2019). By improving our understanding of hatching failure under management it may become possible to apply captive breeding practices to taxa where this has not previously been feasible, hence providing more options to protect some of the most vulnerable species from extinction.
While this review focuses exclusively on birds it is likely that similar knowledge gaps and assessment issues occur across other taxa (Grueber et al., 2015; Kaumanns, Begum & Hofer, 2020), and hence some of the lessons learned from this review may be applicable beyond birds alone.

V. CONCLUSIONS
(1) The results of a multilevel meta-analytical model showed the mean overall rate of hatching failure to be 16.79% (95% CI: 8.28-27.40%).
(2) Populations of species classified as threatened (CR, EN, or VU) have significantly higher rates of hatching failure than species classified as non-threatened (LC or NT).
(3) Populations in captivity have significantly higher rates of hatching failure than wild managed and wild populations, and wild managed populations have significantly higher hatching failure than wild populations.
(4) Univariate meta-regression models show that populations undergoing artificial incubation have significantly higher hatching failure than those with natural incubation, supplementary fed populations have significantly higher hatching failure than populations without supplementary feeding, and populations provided with artificial nests have significantly higher hatching failure than populations using only natural nests.
(5) The results of multivariate mixed-effects multilevel meta-analytical models showed no evidence of significant interactions between threat status and management level, or between threat status and each of the management interventions artificial incubation, supplementary feeding, and artificial nest provision, suggesting that threatened and non-threatened species experience a similar level of effect from management on their rates of hatching failure.
(6) The mean hatching failure rates for populations under different management levels and experiencing different levels of extinction risk established here could be used alongside baseline measures of hatching failure of wild populations to assess the potential effectiveness of different management scenarios and aid in conservation decision-making.
(7) The absence of a standardised definition of hatching success or failure restricts comparative analyses across the literature, limiting our understanding of the overall picture.
(8) Further research into hatching failure should incorporate additional management interventions and species-specific life-history traits such as nest type and breeding system.

VI. ACKNOWLEDGEMENTS
A. F. M. thanks Catherine Finlayson, Catharine Horswill and Christopher Cooney for advice on meta-analysis and phylogenetic control methods, and Fay Morland, Katherine Assersohn, Victoria Franks and David Murrell for helpful discussion of the study during its formulation. The authors also thank Gary Ward and two anonymous reviewers for their comments on the manuscript. Preliminary results of this work were presented at the BOU Annual Conference 2021 and The Joint DPT Conference 2020; we are grateful to the audience at each of these conferences for their feedback and questions. A. F. M. was supported by a ZSL Collaborative Award in Science and Engineering (CASE) Biological Reviews 98 (2023)