Identification of novel coloboma candidate genes through conserved gene expression analyses across four vertebrate species

Ocular coloboma (OC) is a failure of complete optic fissure closure during embryonic development and presents as a tissue defect along the proximal distal axis of the ventral eye. It is classed as part of the clinical spectrum of structural eye malformations with microphthalmia and anophthalmia, collectively abbreviated to MAC. Despite deliberate attempts to identify causative variants in MAC, many patients remain without a genetic diagnosis. To reveal potential candidate genes, we utilised transcriptomes experimentally generated from embryonic eye tissues derived from human, mouse, zebrafish, and chicken at stages coincident with optic fissure closure. Our in-silico analyses found 10 genes with optic fissure specific enriched expression: ALDH1A3, BMPR1B, EMX2, EPHB3, NID1, NTN1, PAX2, SMOC1, TENM3, and VAX1. In situ hybridization revealed that all 10 genes were broadly expressed ventrally in the developing eye, but that only PAX2 and NTN1 were expressed in cells at the edges of the optic fissure margin. Of these conserved optic fissure genes, EMX2, NID1, and EPHB3 have not previously been associated with human MAC cases. Targeted genetic manipulation in zebrafish embryos using CRISPR/Cas9 caused the developmental MAC phenotype for emx2 and ephb3. We scrutined available whole genome sequencing datasets from MAC patients and identified a range of variants with plausible causality. In combination our data suggest that expression of genes involved in ventral eye development are conserved across a range of vertebrate species, and that EMX2, NID1, and EPHB3 are candidate loci that should be adopted into clinical diagnostic screens for patients with structural eye malformations.


37
Congenital structural eye malformations can cause severe visual impairment and reduced quality of life, with a 38 prevalence ranging up to 19/10,000 newborns [1][2][3]. However, the genetic and environmental causes remain elusive, 39 leading to lack of clinical management or treatment options. Within these disorders, ocular coloboma (OC) is the most 40 common and considered part of the clinical phenotypic spectrum with anophthalmia (completely missing eye) and 41 microphthalmia (small, underdeveloped eye), together referred to as MAC. The developmental insults that cause MAC 42 are mostly genetic but may also be influenced by environmental factors [4][5][6]. They occur early in gestation and typi-43 cally affect the morphogenesis and structural development of the optic vesicle (anophthalmia) or eye cup (microph-44 thalmia and coloboma) [7]. 45 During normal eye development, the anterior neural plate develops a transcriptionally distinct eye field which 46 evaginates bilaterally from the surrounding neuroepithelium as the neural plate folds [8]. These outgrowing neural 47 pouches are referred to as the optic vesicles and each one eventually reaches the overlying surface ectoderm, trigger- 48 ing a mutual invagination of the two tissue to create a bilayered optic cup and lens vesicle [9,10]. The ventral retina 49 remains undifferentiated during optic cup stages, largely through a unique transcriptional signature which enables this 50 region of the eye to undergo morphogenic changes that distinguish it from the dorsal retina [11][12][13][14][15]. The morphogen-51 esis of the ventral eye creates a gap along the proximal-distal axes of each optic cup-the optic fissure. The edges of 52 the optic fissure come closer together as the optic cups continue to grow, displacing neural-crest derived periocular 53 mesenchyme (POM) and vascular endothelia, until they eventually come into direct contact and fuse together to com-54 plete the circumferential continuity of the eye. The genetic and physical processes that control the fusion of these 55 opposing epithelial sheets is poorly understood but requires the decoupling and reorientation of cells at the leading 56 edges of the epithelia, similar to a partial epithelial to mesenchymal transition, and coincident with remodeling of 57 overlying basement membrane structures [16][17][18]. 58 Clinically, OC manifests as an inferonasal gap that could affect one or more tissues including the iris, ciliary body, 59 neural retina, retinal pigmented epithelium (RPE), choroid with/without involvement of the optic disc and macula. It 60 accounts for 10% of childhood blindness and has a variable prevalence, but may affect up to 11.9 per 100,000 births 61 [2,19]. It can manifest with wider systemic features as part of a syndrome (syndromic OC), be associated with other 62 eye defects (complex OC) or occur in isolation (isolated OC, and can affect only one (unilateral) or both (bilateral) eyes. 63 The use of next generation sequencing including targeted gene panels and whole genome sequencing in MAC patients 64 and their families, has led to an increase in genetic diagnosis rates up to 33% [20], with MAC panels now including up 65 to 86 different genes [21]. However, very few OC loci show recurrence among unrelated families, and recent studies 66 suggest over 80% of cases do not have a genetic diagnosis [7,22,23]. 67 Animal models of optic fissure closure (OFC) have recently been powerful in revealing the transcriptional land-68 scapes associated with ventral eye development and fissure fusion, successfully identifying novel MAC candidate 69 genes [17,24,25]. Using these datasets, we previously generated a pipeline to enable the identification of new colo-70 boma candidate genes based on evolutionary conserved gene expression in the developing optic fissure [26]. This 71 method leveraged microarray data from mouse [24] and RNAseq data from zebrafish [25] to reveal 4 novel coloboma 72 candidate genes and supported the identification of human pathogenic variants therein [26]. Here, we applied a similar 73 strategy, this time exclusively using RNAseq data but also increasing the analysis to four vertebrate species, represent-74 ing mammals (human, mouse), fish (zebrafish) and avians (chicken). Our analyses revealed three novel candidate genes 75 for roles in eye development with supporting evidence from in vivo work that these are essential for OFC. Based on 76 this, we propose NID1, EMX2 and EPHB3 are now included in targeted gene panels for MAC-patients and families. RNAseq was performed on dissected tissue collected from dorsal and fissure regions of the chicken eye at both fusing 82 (HH St30) and fully fused (HH St34) stages of optic fissure closure (as described in Hardy et al., 2019 [17] and staged 83 according to Hamburger and Hamilton criteria [27]). Three replicates for each sample stage and region were prepared 84 and each sample replicate contained pooled tissue from a minimum of 10 different embryos. Pooled samples were 85 processed for RNA extraction on the day of dissection. Total RNA was extracted using Trizol reagent (Ambion) and 86 treated with Turbo DNaseI (Thermo #AM2238). Libraries were prepared from 500ng of the total-RNA sample using the 87  96 Sequencing quality was evaluated using FASTQC (version 0.11.4). Sequencing adaptors were trimmed using Cu-97 tadapt [28] (version 1.16) and Trimmgalore (version 0.4.1). Trimmed sequences were aligned to the chicken genome 98 (galgal6) using STAR aligner to generate bam files, which were then coordinate sorted and indexed using Samtools [29] 99 (version 1.6). A count matrix of transcripts annotated in Ensembl (version 103) was subsequently generated from bam 100 files using featureCounts part of the Subread package [30,31] (version 2.0.1). Differential gene expression and principal 101 component analysis was performed using DESeq2 [32] in R (version 4.1.0). Multiple test comparisons were corrected 102 using Benjamini and Hochberg correction to obtain the adjusted p -value [33]. Genes with a log2 fold change ≥ 1 and 103 an adjusted p-value ≤0.05 were considered as upregulated or fissure enriched differentially expressed genes (DEGs), 104 while those with a log2 fold change of ≤ -1 and an adjusted P-value of ≤ 0.05 were considered downregulated or 105 dorsally enriched DEGs in the analysis (i.e. Fold change ≥ 2 or ≤ 0.5, respectively). Volcano plots of DEGs were drawn 106 using ggplot2 package in R. Enrichment analysis and functional annotation was performed in DAVID (https://da-107 vid.ncifcrf.gov/home.jsp) using Gallus_gallus as a background list and ENSEMBL gene IDs as inputs. Lists of GO terms 108 were ranked and plotted (-log10(p val)) with cut off values p < 0.001. Sequencing data from Hardy,et.al.,Patel et al.,and Richardson,et.al., were downloaded using the 112 SRA NCBI Toolkit. All reads were processed as follows: Reads were trimmed using Trimmomatic. Pairwise comparison 113 between optic fissure and dorsal or whole eye (in chick pre-fusion samples) tissue were carried out using the Wald 114 Test in DESeq2 using raw count data generated by featureCounts. Comparisons over time were performed using the 115 LRT test. Ensembl gene ID identifiers were converted to their mouse orthologs using the biological DataBase network 116 tool (https://biodbnet-abcc.ncifcrf.gov/db/dbOrtho.php). All up-regulated genes in the fissure at any stage (absolute 117 LFC ≥ 1.0, adjusted p value ≤ 0.05) were selected for further analysis. Using mouse, chicken and zebrafish up-regulated 118

Transcript quantification and differential gene expression analysis
DEGs we identified commonalities across species using Ensembl gene database (version 102) with custom scripts based 119 upon the Bioconductor biomaRt package [34,35] (version 4.2). Briefly, orthologs were identified using bioDBnet bio-120 logical DataBase network (https://biodbnetbcc.ncifcrf.gov/db/dbOrtho.php) using mouse Ensembl gene IDs as a ref-121 erence Where orthologues were not automatically identified, manual curation of the list was carried out.  124 Chick embryos were staged according to Hamburger and Hamilton [27] and fixed overnight at 4 o C in 4 % paraformal-125 dehyde (PFA) in 1.0 M phosphate buffered saline solution (PBS). Embryos were then rinsed in PBS twice and immersed 126 in a 10 % sucrose-PBS solution overnight. The following day eyes were removed and mounted in Neg-50 (Richard Allan 127 Scientific) and snap frozen in iso-pentane (Thermo). Sections were cut at 20 µm on a Cryostat (Leica) onto Superfrost-128 plus slides (Manufacturer), air dried for 2 hours, and then stored at -80oC. Slides were then thawed for 2h at room 129 temperature and rinsed in PBS, before performing RNAscope following the Manufacturer's protocol (ACD Bio-techne, 130

In situ hybridisation gene expression analyses
RNAscope Multiplex Fluorescent Reagent Kit v2 User Manual) and using the target-specific probes (ACD Bio-techne) 131 detailed below in Table 1   to verify that mutations were induced at the CRISPR target site. The PCR primer sequences and CRISPR guide sequences 166 used are listed in Table 2.

169
For assessing eye size, 3 and 5 dpf injected zebrafish embryos and un-injected wild-type siblings were anaesthetized 170 using 0.2 mg/ml tricaine and imaged using a Zeiss Stereo Discovery V20 microscope. ImageJ (FIJI) was used to measure 171 eye diameter from the images. For each group, the mean ± standard deviation was calculated. Data were compared 172 using either unpaired t-tests or Mann-Whitney tests. For immunofluorescence, whole zebrafish larvae at 3 dpf were 173 fixed in 4% PFA/PBS overnight at 4°C before washing in PBS and incubation in 30% sucrose/PBS overnight at 4°C. The 174 samples were mounted and frozen in TissueTek O.C.T (VWR) using dry ice. 12 μm sagittal sections were cut and col-175 lected onto Superfrost PLUS slides (Thermo Fisher Scientific). After air-drying, sections were washed in PBS-0.5% 176 Triton-X before being blocked for 1 hour with 20% normal goat serum (Sigma-Aldrich) in PBS-0.5% Triton-X. and incu-177 bating with anti-laminin (Sigma-Aldrich L9393) diluted 1:50 in antibody solution (2% normal goat serum in PBS-0.5% 178 Triton-X) at 4°C overnight. After washing with PBS-0.5% Triton-X, the sections were incubated with Alexa Fluor 568 nm 179 secondary antibody (Thermo Fisher) diluted 1:500 in antibody solution for 2 hours at room temperature. Finally, the 180 sections were washed and mounted in Prolong Diamond Antifade mountant + DAPI (Thermo Fisher Scientific). The 181 slides were imaged using a Zeiss Invert 880 microscope.   190 We previously used dissected optic fissure tissue from chick retinas to define the transcriptonal landscape during 191

Transcriptomic analyses during chicken optic fissure closure
fusion [17]. However, this analysis did not include tissue from fused stages of OFC, and whole eye tissue was used as 192 the non-fusing comparison rather than dorsal retina used in other species OFC transcriptomic studies [25,39]. To ena-193 ble pan-specific comparisons, we generated transcriptomic data using RNAseq from optic fissure and dorsal retina 194 samples collected from chicken eyes at fusing (HH St30) and fused (HH St34) OFC stages (Figure 1a). In both cases, 195 micro-dissected tissue samples were pooled depending on condition and stage (tissue from >10 embryos per condition 196 or stage) prior to total RNA extraction and Illumina paired end sequencing with enrichment for mRNA transcripts. 197 Variance across the samples were assessed by principal component analysis (PCA) and these indicated that the sample 198 replicates clustered well and that groups of samples segregated according to stage (PC1) and tissue region (PC2) (Sup-199 plemental Figure S1). We then performed differential gene expression (DEG) analyses within the stages to determine 200 fissure-specific gene expression (Figure 1b). We found 158 unique upregulated DEGs that only showed increased ex-201 pression in the fusing (HH St30) fissure, 178 that were increased specifically in fused (HH St34) fissure, and 84 genes 202 that were enriched at both stages. Similarly, for downregulated DEGs (dorsal > fissure) there were 175 genes whose 203 expression was reduced in the fusing fissure compared to dorsal, 83 that were reduced in the fused fissure, and 49 204 that were reduced at both stages. All gene lists and ontology enrichment outputs can be found in Supplemental Table 205 1. We then assessed those genes with enriched expression in the fissure at each stage. In accordance with previous 206 studies [17,25,39], at fusing stages (HH.St30) we found NTN1, ALDH1A3, VAX1, SMOC1, and PAX2 were the highest 207 fissure-specific (Fissure > Dorsal) DEGs, whereas ALDH1A1, LYPD6, GDF6, TBX3 and TBX5 were the most dorsal-specific 208 DEGs (Dorsal > Fissure) (Figure 1c). In silico ontology analysis of the upregulated gene set (Log2FC >1.0, p<0.05; n = 209 246 genes) in the fusing stage fissure revealed enrichment for GO terms of various developmental processes as well 210 as those implicated in OFC, such as "ECM organisation", "signal transduction", "negative regulation of wound healing" 211 and the "BMP signalling pathway", in addition to multiple terms related to cell-cell adhesion or the regulation of cell-212 junctional complexes (Figure 1d). In the fused tissue (HH.St34), NTN1, ALDH1A3, VAX1, SMOC1, and PAX2 were also 213 identified as fissure specific DEGs, in addition to BMPR1B, CHRDL1, and CD109 (Figure 1e). Dorsal-specific DEGs ob-214 served at both stages were TBX5, TBX3, GDF6 and LYPD6 (Figure 1c & e), consistent with previous transcriptomic 215 analyses of OFC [17,39]. Ontology enrichment analysis for all upregulated DEGs (Log2FC >1.0, p<0.05; n = 262 genes) 216 in the fused fissure ( Figure 1e) revealed similar GO terms to those enriched for the fusing fissure related to cell-cell 217 adhesion and cell-junction assembly, for example: "cell-cell adhesion via plasma-membrane adhesion molecules" and 218 "calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules". Both stages also included, "in-219 ner ear morphogenesis" and "sensory perception of sound", likely to reflect overlap in epithelial fusion processes 220 involved in inner ear morphogenesis and OFC. The GO term "Axon guidance" was enriched among DEGs in both 221 stages of OFC, and includes genes involved in the ephrin, netrin and semaphorin regulatory pathways. Interestingly, 222 HH.St30 had fewer genes in this GO term classification (9 compared to 27) and, in contrast to HH.St34, did not include 223 members of the UNC5 gene family, as previously shown [17,39]. In combination, these validated the use of these 224 chicken transcriptome datasets for inclusion in subsequent analyses.

Cross-species in silico analysis of optic fissure transcriptomes 237
To identify evolutionarily conserved gene expression profiles in OFC, whole transcriptome RNAseq datasets from 238 chick (this study and reference [17]), zebrafish [25], mouse and human [39] were compared. Analysis of variance using 239 PCA analyses between species at each of the stages (pre-fusion, fusion, and fused) indicated that the samples clustered 240 broadly according to tissue type (dorsal or fissure) and species (Figure 2a-c), although the variance for PC1 ranged 241 from 20-38%. After alignment, differential expression analysis was performed using DESeq2 for successive pairwise 242 condition testing (v1.18.1). Up-regulated genes were defined as those showing a log fold change (Log2FC) of >1.0 and 243 p-adjusted value <0.05. Analysis of upregulated genes at all stages throughout fusion among chicken, mouse, and 244 zebrafish revealed 10 genes that were common to all three species (Figure 2d). Among these were the known human 245 MAC genes TENM3, SMOC1, PAX2, ALDH1A3, and BMPR1B [7,26] , in addition to NID1, VAX1 and NTN1, which have 246 been previously identified as MAC or coloboma candidates based on animal studies [13,17,40]. Novel OFC genes not 247 previously associated with MAC in either humans or animal models were EPHB3 and EMX2. We then added human 248 transcriptome data to our analyses and found that those genes upregulated during fusion across all four species were 249 TEMN3, ALDH1A3, NTN1, SMOC1 and VAX1 (Figure 2e). Within this group, only NTN1 and VAX1 have not previously 250 been associated with MAC in humans. Thus, we identified EMX2 and EPHB3 as potential new OFC candidates, a core 251 group of 10 genes whose expression was enriched in the fissure during OFC across diverse avian, mammalian, and fish 252 species, and 5 genes whose expression was also enriched in the human fissure during OFC.   308 To determine the requirement of the novel genes identified in this study for eye development and OFC, we 309 generated F0 knock-out zebrafish embryos using CRISPR/Cas9 gene editing (GE) targeting exon DNA immediately 310 downstream of the translation initiation site (ATG) and analysed their eye developmental progress at two stages (Fig-311   ure 4). The zebrafish genome has two duplicated ephb3 genes, ephb3ba and ephb3bb, and therefore we designed 312 guide RNAs to both loci to ensure we generated complete removal of these gene products whose activities could be 313 functionally redundant. Both emx2 and ephb3 loss-of-function gene editing events were confirmed by PCR from ge-314 nomic DNA (Figure 4a). In zebrafish, OFC is complete by 56 hours post fertilization [41], and compared to wild type 315

Coloboma candidate gene targeting in vivo
(not injected) and Cas9-only injected embryos, the GE embryos had developed normally at 3 days post fertilisation 316 (dpf) and 5 dpf (Figure 4b) although eye sizes were apparently reduced. Measurements of eye diameters revealed 317 smaller eye size in both GE groups compared to controls at 3 dpf and 5 dpf (Figure 4b,c). Sections cut from control and 318 GE eyes and stained for laminin showed persistence of the basement membrane surrounding the edges of the optic 319 fissure, consistent with fusion failure and coloboma in both ephb3ab and emx2 targeted embryos (Figure 4d). Thus, 320 of the 10 genes identified in this study whose expression in the ventral retina during OFC is conserved across different 321 vertebrates, all can be associated with coloboma or microphthalmia causation in animals or humans when their func-322 tion is perturbed, and can be considered as MAC candidates for diagnostic screens.  340 We then sought to identify any novel disease-causing variants associated with MAC in EMX2, EPHB3, and NID1. 341 for which in silico predictions for consequence to protein range from uncertain to pathogenic. These alleles were not 366 found in gnomAD. No copy number or structural variants were identified in any of the candidate genes screened.

381
The main purpose of this study was to identify novel MAC candidates, and in this endeavor, we found two new 382 genes, EMX2 and EPHB3, that should now be associated with eye development and optic fissure closure defects. Our 383 cross-species meta-analysis also provided evidence that genes previously associated with coloboma causation in 384 model organisms, such as VAX1, NTN1, and NID1, should be more widely recognized as human MAC candidates.

386
The identification of novel MAC genes is challenging, largely due to the poor recurrence rates among non-related 387 affected families, genetic heterogeneity, limits in sample sizes, and human phenotype ontology inconsistencies [21,23]. 388 Single variants, even if they are predicted to be damaging or are shown in animal models as deleterious and phenocopy 389 the human phenotype, still do not meet the strict criteria to be considered as causative [43]. Here, we present novel 390 variants and report new genes associated with MAC but were unable to unambiguously confirm their causality. Iden-391 tifying coloboma candidate genes is even more challenging, largely due to the difficulties in defining the transcriptional 392 networks that govern tissue fusion itself [16]. This is a highly complex process, involving small populations of cells from 393 both the edges of the retinal epithelium, and the peri-ocular mesenchyme (POM) that directly, or indirectly, mediate 394 fusion in the eye, respectively [17,18,44]. We hypothesized that performing in silico data mining of vertebrate OFC 395 transcriptomes may yield novel genes that are expressed in the pioneer cells or in the adjacent POM. This suffered the 396 inherent limitations that OFC in these diverse vertebrates are not identical, for example the timing and duration of the 397 OFC process and the overall size of the eye [16]. In addition, there are inter-species differences to the presence or 398 absence of apoptotic foci in pioneer cells [17,39], and the intercalation of optic nerve cells and astrocytes at the prox-399 imal but not medial region of the fissure [45]. We aimed to overcome this second difference by carefully dissecting 400 the medial and not proximal OF from our chicken retinas. Despite these differences, and similar to a previous study 401 [26], we successfully showed conservation of gene expression across these species during OFC, and found that the 402 majority of these genes had already been associated with MAC phenotypes, further validating our approach.

404
We identified POM specific expression of EMX2 during OFC and showed that its targeted disruption led to MAC 405 phenotypes in zebrafish, however we were unable to identify any novel pioneer cell markers. We propose that the 406 identity of pioneer cells is highly transient and due to the rapid changes in cell behavior required for fusion, their 407 unique molecular signatures may only be revealed through single-cell transcriptomic analyses rather than through 408 bulk RNAseq approaches. However, the generation of a pioneer cell reporter line in any of these species could enable 409 the selective isolation of these cells, and their subsequent characterization throughout the OFC process, at multiple 410 'omics levels. Our study shows that the gene expression conservation we observed in OFC among different vertebrates 411 would permit either of these model species and approaches to be used for such experiments, and we propose these 412 will be the logical next steps for OFC research and OC candidate gene identification.

414
EMX2 is a homeobox transcription factor required for central nervous system and urogenital development 415 [46,47]. It is a homologue of the Drosophila empty spiracles (ems) gene that is essential for anterior head development 416 and brain segmentation. Mice lacking functional Emx2 die at birth and have no kidneys or reproductive organs [46]. noted, when OFC should be complete [48]. As this was unreported by the authors, without further data it is unclear 423 what the penetrance or underpinning mechanism of this phenotype is, however this appears to support our data 424 indicating the necessity for EMX2 in eye development. Further investigation is required to understand its precise role 425 in OFC. become apposed prior to fusion. A previous study [40] found that the zebrafish orthologue nid1 is one of several 429 paralogous nidogens expressed in the fissure margin, but whose expression is downregulated specifically at the onset 430 of fusion, presumably helping to mediate the remodeling of the basement membrane (BM) to enable fusion through 431 either reducing the integrity of the BM or by exposing proteolytic sites in the remaining components of the OFM BM. 432 We found NID1 among the 3-species list (chicken, zebrafish, and mouse) of OFM enriched genes and found that its 433 expression was reduced in fused versus fusing transcriptomes. We also showed its expression was localized to the 434 apical-most cells of the chicken ventral neural retina. Morpholino-based knock-down of nid1 in the previously men-435 tioned study caused coloboma [40], and therefore we did not generate mutants for this gene. However, given the 436 zebrafish phenotype, and the conservation of OFM NID1 expression across the range of species shown here, we sug-437 gest this gene should be considered in the genetic analysis of MAC patients.

439
Ephrin type-B receptor 3 is a transmembrane tyrosine kinase protein with affinity for the ephrin-B family of lig-440 ands and is encoded by the EPHB3 gene. Ephrin receptor-ligand interactions at cell surfaces mediate numerous dy-441 namic developmental processes, such as migration, proliferation, and cell fate determination, mediated through 442 changes to cytoskeletal dynamics [49], and can also act as tumor suppressor genes [50]. Acknowledgments: We would like to thank the staff at the Greenwood Building (Roslin Institute) for chicken hus-504 bandry, Aara Patel and Jane Sowden (UCL) for sharing RNAseq data, and James Prendergast (Roslin Institute) for help 505 with Bioinformatics. This research was made possible through access to the data and findings generated by the 506 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned 507 company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Insti-508 tute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council 509 have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected 510 by the National Health Service as part of their care and support.