MORE THAN FORTY YEARS OF NUCLEIC ACID STRUCTURAL SCIENCE

As scientists who have worked with Stephen Neidle over many years and stages of his career, we present our perspective on his contributions to nucleic acid structural science. We trace some of the highlights of his research on nucleic acid-drug interactions and his unique insights into the importance of hydration.


Introduction
The authors of this review are fortunate to have collaborated with Stephen Neidle at various stages of his career. Helen Berman met Stephen Neidle in the 1970s at a meeting when they were both young faculty members. She was at the Institute for Cancer Research in Philadelphia, and he was in the Biophysics Department at King's College London, the place where the data for the double helix were collected by Rosalind Franklin and Maurice Wilkins 1 and where Struther Arnott determined very high-quality structures of nucleic acid fibers. 2 At that time in her career, Berman had a strong interest in nucleic acids, having worked on a small fragment of RNA with colleagues at MIT and Columbia. 3 She was delighted when Stephen proposed to visit her laboratory to work on a nucleic acid structure. This visit marked the beginning of an exciting collaboration on nucleic acid-drug complexes that lasted for many years.
Gary Parkinson became aware of Stephen Neidle's work while he was training in Berman's laboratory at Rutgers University. When he interviewed for a post in Neidle's group, he was excited to meet a well-established research team, fully resourced with the latest x-ray crystallography facilities, which reflected the importance that Neidle placed on the use of structural data to support the development of small-molecule entities within drug discovery. Thus began a long and fruitful collaboration on quadruplex DNA/RNA structures.
To understand the fundamental processes in which nucleic acids are involved, including replication, transcription, and translation, it is necessary to know their structures. In the 1970s it became possible to synthesize defined sequences of DNA and RNA and thus to determine their structures at near-atomic resolution. When Neidle entered the field in the 1970s, he had a particularly strong interest in how drugs interact with nucleic acids and how they disrupt the processes in which nucleic acids are involved. From the very beginning Stephen focused his research in this area. Over the forty years of his research his group determined 119 structures, most of which are duplex and quadruplex DNAs, with more than 60% of these structures bound to drugs. By every standard Stephen Neidle is a leader in the field and is highly cited, with two papers on quadruplexes having more than 2000 citations each.
In this review, we try to capture the essence of Stephen Neidle's many contributions to nucleic acid structural science, as summarized in Fig. 1. We focus on exemplars (Table S1) of his broad-ranging approach to understanding nucleic acid-drug interactions as a method to develop new therapeutic agents.

Double helical DNA-drug complexes
In the 1970s Berman worked with Neidle on small fragments of nucleic acids that were complexed with proflavine (Fig. 2). The structure of a complex between CpG and proflavine (CG_PF) presented large technical challenges. 4 Because standard structure determination methods did not work, they used high-resolution data to determine the positions of the phosphorus atoms, as had been done in the determination of the structure of UpA. 3 To their great surprise, the C3' endo-C2' endo pattern of sugar pucker that had been seen in other similar complexes was not observed. Instead, both sugars exhibited the C3' endo conformation. In the structure of the complex between dCpG and proflavine (dCG_PF) one chain had mixed sugar pucker and the other had all C3' endo pucker. Based on these structures, they demonstrated that upon intercalation, the conformation angles around the C5'-O5' bond and the glycosidic bond at the 3' end of DNA increase. These structures also demonstrated the versatility of proflavine interactions. In addition, the dCG_PF structure showed an intriguing pattern of water structure that had never been seen before 5,6 (Fig. 3). It contained interlocking pentagons with a semi-clathrate arrangement. This structure became the basis for Monte Carlo simulations to further understand what was at the time an unusual hydration pattern. 7 The importance of hydration in drug-nucleic acid complexes has subsequently been verified in other structures.

Fig. 2. A) The crystal structure of CpG proflavine. 4 One proflavine is intercalated and two are stacked above and below the base pairs. There are also two sulphate groups bound to the intercalated proflavine. B) The crystal structure of dCpG proflavine. One proflavine is intercalated and one is stacked.
Fig. 3. The extended water network in the crystal structure of dCpG proflavine. 5 The unique waters in the crystal are shown in red. This is the earliest example of pentagon water networks in a nucleic acid structure.
The Neidle laboratory has produced an extensive body of research on longer DNA sequences alone and complexed with drugs (Fig. 4). For example, in studies of how drugs bind to the minor groove, they examined how three drugs bind to two different sequences of DNA, dCGCGAATTCGCG (A2T2) and dCGCAAATTTCGC (A3T3): the anti-trypanosomal agent berenil; propamidine, an analog of pentamidine, which has anti-HIV activity; and Hoechst 33258, whose analogs have been investigated as possible anticancer agents. The structures of these sequences bound to berenil, 8,9 propamidine 10 and Hoechst 33258 11 were determined. The fine details of the DNA structures were examined, as were the interactions of the drugs. In each case, the drugs bind differently to the different sequences. For example, berenil binds asymmetrically to A2T2 with water mediating interactions (Fig. 5a), whereas berenil binds symmetrically to A3T3 (Fig. 5b). In the case of the propamidine complexes the binding to A3T3 is asymmetric (Fig. 6a) and that to A2T2 is symmetric (Fig. 6b). Similarly, the binding of Hoechst 33258 to A2T2 is symmetric and to A3T3 asymmetric. 12 These studies support the concept that the DNA is not altered by the drug.
To understand how heterocyclic diamines interact with DNA, a series of related drugs was investigated. One, DB 884 (2GYX), 13 showed particularly strong binding to DNA. Extensive biophysical studies including calorimetry, SPR biosensor data, footprinting analysis, and CD spectrometry were combined with x-ray crystallographic studies to explain the very strong binding. It was shown that the drug covers six base pairs, with the central pyrrole of the drug forming an unprecedented central hydrogen bond that explains the very favorable binding enthalpy. In another remarkable set of studies of three minor groove binders bound to the A2T2 sequence (3U08, 3U0U, 3U05), 14 an extensive water network was observed at the boundary of the AATT sequence and the G nucleotide (Fig. 7). It is posited that water adds extra stabilization for the drug interaction. This structure comes full circle with the results first seen in the 1980s in the dCG_PF complex, which also demonstrated extensive networks of water.
In more recent work, designed to understand GG mismatches, complexes were made of TTGGCGAA and two drugs, chromomycin and actinomycin. 15 The chromomycin-containing complex shows the diversity of GG mismatches; in the asymmetric unit three helices form a pseudo-continuous double helix with each helix incorporating a different GG mismatch (Fig. 8). The actinomycin-bound DNA complex forms a zigzag right-handed helix with G's flipping out. These studies highlight the versatility of guanine interactions, as had been so well exemplified by the groundbreaking quadruplex research.
Fig. 8. The versatility of GG mispairs in a complex between chromomycin and DNA (6J0I). 15 The asymmetric unit of the crystals contains 3 DNA duplexes forming a pseudo-continuous double helix.

Quadruplexes (G4Q) and their drug complexes
By the mid-1990s the Neidle group had synthesized an extensive library of ligands that targeted a variety of DNA motifs. This was to be expanded when a family of amidoanthraquinone derivatives were shown to be the first non-nucleoside telomerase inhibitors targeting G4 quadruplexes (G4Q), where the anthraquinones stack on and stabilize the G4Q. 16 The integrated approach taken by an international multidisciplinary team that Stephen Neidle had helped to assemble included molecular modelling, x-ray-derived data analysis on the relevant telomeric sequence d[TG4T], 1-D NMR data on the human telomeric sequence d[T2AG3T], biophysical thermal denaturation experiments and in vitro assays. This approach put the Neidle group at the forefront of G4Q-targeted structure-based drug discovery (SBDD) efforts for over 20 years (Fig. 4).
The late 1990s was an opportune time to advance G4Q research. G4Q was validated as a new molecular target within telomeres and provided a novel therapeutic pathway to not only inhibit telomerase function but also interfere with chromosomal maintenance. While NMR methods had provided important structural data on telomeric G4Qs for structure-based drug design, including the core human telomere sequence, they did not deliver the full picture of the diversity of secondary structures available to telomeric DNA and RNA. The structure determination of a parallel-stranded G4Q (139D) 17 folded as an intermolecular quadruplex confirmed the importance of the G-tetrad core in providing the necessary conformational stability while retaining the characteristic grooves and possible hydration structures seen in duplex DNA. Additionally, it highlighted the important role of K+ in stabilizing the central tetrad core. The human telomeric d(AG3(T2AG3)3) sequence folded in the presence of Na+ revealed the TTA connecting loops in the context of an intramolecular G4Q (143D) 18 and gave additional hints about the key molecular pharmacophores needed for the rational design of small molecular entities targeting these sequences. The antiparallel arrangement of the DNA backbone results in a mix of wide and narrow grooves that allow discrete hydration structures and opportunities for the design of selective binders. However, these structures did not provide any clue towards possible alternative folded topologies that would ultimately be needed to fully explain the quantitative structure-activity relationship (QSAR) ligand data. There was a gap between the molecular modeling data based on the available NMR structures and the data derived from the ligand libraries that specifically targeted human telomeric sequences. It was not until new crystallographic models became available in 2002 that this would be understood. Thus, Neidle and his collaborators relied on modeling using NMR-derived models and data, along with hit identification, assay development, high-throughput screening, and QSAR. In the meantime, the goal of obtaining structural data of the molecular target with ligands bound continued in the Neidle group with the characterization of the structure of the ligand 1,4-bispiperidino amidoanthraquinone, a molecule designed to target non-duplex nucleic acids, here in complex with a putative G4Q-forming telomeric sequence. The ligand was observed bound to a G4Q in which four independent strands folded as an intermolecular parallel-stranded quadruplex d[(TG4T)]4. 19 However, these crystals did not provide the diffraction data quality required for a full understanding of the molecular interactions at atomic resolution. In addition, this sequence lacked the all-important connecting loops, crucial within the context of the human telomeric sequence d(TTAGGG).
When the Neidle group completed its move to the Chester Beatty Laboratories at the new ICR site, Fulham Road, in 2000, they expanded their crystallographic investigations in support of drug discovery efforts. Newly developed ligand scaffolds were explored, and large libraries were generated that provided a wealth of new molecular entities to target telomeres; these included 2,6-disubstituted amidoanthracene-9,10-dione 20 and 3,6-disubstituted acridine chromophores. Perhaps the most significant advance was the generation of a family of 3,6,9-trisubstituted acridine ligands where the substitution at the 9 position provided the necessary selectivity over duplex DNA. 21 The structure-activity relationships of these and other molecules are summarized in several important papers. 22-24
Fig. 9. Crystal structures of G4Qs determined in the Neidle Lab. Schematic DNA and RNA with ligands drawn as stick representations (ViewerLite). 25 A) An intramolecular G4Q formed from one strand containing four human telomeric repeats d(TTAGGG) and folded into three stacked tetrads with connecting chain-reversal loops. B) An intermolecular G4Q bound to a tetrasubstituted naphthalene diimide quadruplex-binding ligand. Ligands are able to stack on top of the external 5' or 3' end tetrads without modifying the overall topology, as the connecting chain-reversal loops sit external to the G4Q. C) A bimolecular G4Q formed from two RNA strands of a human sequence containing two r(UUAGGG) (TERRA) repeats, resulting in only two UUA loops. D) An intermolecular G4Q formed from Oxytricha telomeric DNA containing two d(TTTTGGGG) repeats folded into four stacked tetrads with diagonal loops, bound to a disubstituted aminoalkylamido acridine compound. The diagonal loops are opened up in the presence of the disubstituted ligands, allowing the insertion of the ligands and the stacking of the acridine core over the G-tetrad, similar to the intercalation observed with dsDNA. E) The Oxytricha telomeric DNA bound to three daunomycins. The absence of connecting backbone loops and their steric constraints provides additional opportunities for ligand binding through π stacking over the G-tetrad core.
In 2002, the ground-breaking publication of two quadruplex crystal structures revealed an unexpected and completely novel G4Q topology (1K8P, 1KF1). 26 The structures, annealed in the presence of K+, were completely different from the NMR-derived models annealed in the presence of Na+ that had been used as the templates for modeling ligand interactions, and they provided the first glimpse of a G4Q of human telomeric sequence in a crystalline lattice. The topology of these structures was unexpected in that they folded in an all-parallel arrangement that necessitates connecting chain-reversal loops and with three equal groove widths. The three parallel stacked G-tetrads and the novel chain-reversal loop arrangements (first described as propeller loops) answered many of the questions raised by the previously determined structure-activity relationships (Fig. 9). Although the structure determinations provided the basis for SBDD, a model that included ligands was the true goal. A renewed effort was undertaken to generate crystallography-derived structural models of the interactions of ligands with the DNA G4Qs. The Neidle group reported two Oxytricha telomeric DNA quadruplex structures (1JRN, 1JPQ) 27 that differed from the structures derived by x-ray methods using similar sequences. The crystallographic structural arrangements and topologies of the Oxytricha telomeric DNA were further validated by the first crystal structure of a quadruplex DNA-drug complex (1L1H). 28 In this structure, the diagonal loops are opened up and the ligands are accommodated in an end-stacked conformation not easily predicted by molecular modeling. It became apparent that the plasticity of the connecting loops is an important element in ligand interactions and in controlling groove dimensions. The next ligand-G4Q complex confirmed the terminal G-tetrads as key binding sites for the polyaromatic core of the ligands, but the sequence lacked the important connecting loops. The ligand, daunomycin, is observed stacked above the unconstrained terminal G-tetrads formed from four parallel strands (1O0K). 29 Within the Neidle group the focus remained on understanding the connecting loops, and a structural analysis was undertaken of the consequences of changing loop lengths between stacked G-tetrads. It was observed that the sequences (G4T3G4) folded as bimolecular G4Qs with antiparallel strands, in this case linked with lateral loops rather than diagonal loops (2AVJ, 2AVH). 30
The next breakthrough was the determination of the structure of a ligand, TMPyP4, bound to a G4Q containing the human telomeric sequence (2HRI). 31 The observation of the sequence folded in an all-parallel arrangement confirmed this crystallographically determined topology as an important molecular target. This intermolecular all-parallel arrangement of the telomeric DNA sequence is stabilized in the presence of K+, in contrast to the NMR-determined basket-type intramolecular G-quadruplex topology, first shown to form in the presence of Na+ ions for the same sequence. 18 At this time the topological diversity of NMR-derived G4Q structures containing human telomeric sequences was becoming apparent with the publications of the hybrid-1 topology, folded here in the presence of K+, 32 and the hybrid-2 topology.
33 In these NMR experiments the topological arrangements of the G4Qs could be selected for based on their sequence length, on the terminal sequences, and on the metal cations used to stabilize the G4Qs. This is different from the single all-parallel G4Q topology determined by crystallographic methods, observed alone or in the presence of ligands. The wealth of crystallographic data being generated by the Neidle group at this time affirmed the group's position in the field and established this alternative folded topology for the human telomeric sequence as a viable model for the design of highly selective ligands. The importance of this all-parallel topology was further supported by the work on two G4Q structures containing human telomeric DNA bound to a ligand (3CE5, 34 3CDM 35). In the crystal lattice both these sequences fold as bimolecular parallel-stranded human telomeric quadruplexes, with 3CE5 shown in complex with an important 3,6,9-trisubstituted acridine molecule, BRACO19. In 2004, BRACO19 and AS1410 were important lead molecules that targeted telomeres. They were taken forward for development by the company Antisoma Plc as potential first-in-class agents. The crystal structures confirmed that the selectivity and specificity of these G4Q-targeting agents are consistent with a model in which DNA hybridization to the RNA templating region of the holoenzyme of telomeres is prevented. New molecular entities were also being developed in the Neidle laboratory, such as the naphthalene diimides. The successful cocrystallization of these molecules with human telomeric sequences (3CCO, 3CDM) 35 expanded the understanding of G4Qs as flexible molecular targets. The adaptability of the chain-reversal loops of the G4Qs provides the necessary conformational space required for selective binding, while still retaining the core stacked tetrads and the associated rigidity, thus redefining the quadruplex-drug recognition interfaces. In all cases this molecular target retains the context of an all-parallel G4Q folded motif, either as a bimolecular or intramolecular folded arrangement, each containing four repeats of the human telomeric hexamer sequence. Importantly, the chain-reversal loop topology keeps the external tetrad open for other G4-tetrad aromatic-aromatic interactions, as seen in the packed crystalline lattice, or for ligands to bind and interact through π stacking. In solution the formation of alternative topologies such as hybrid-1 and -2 results in the nucleotides in the linking loops stacking and interacting together over these external G4-tetrads, sterically restricting ligand binding to the external G4-tetrad and reducing the opportunities for π stacking.
In the early 2000s, concurrent with the research on targeting telomeres, both the Neidle group and the Balasubramanian group independently undertook comprehensive bioinformatics studies of the prevalence of quadruplexes in the human genome. 36,37 They showed that putative G4Q-forming sequences are not randomly distributed; they are concentrated within promoters of genes involved in replication and of cancer-related genes. This, combined with the understanding that many helicases involved in unwinding G4Q topologies are compromised, led to structural investigations into promoter regions and the targeting of DNA sequences as therapeutic targets to modulate cell cycle progression and induce programmed cell death. The first structure determination, in collaboration with Anh Tuân Phan and Dinshaw Patel, was carried out with a monomeric putative G4Q-folding sequence identified within the c-kit promoter region and published in 2007 (2O3M), 38 providing the first solution-based representation of this promoter target. These structures showed structural features similar to those seen in the human telomeric sequences with chain-reversal loops. Indeed, this relationship had already been observed in the NMR-determined c-myc promoter structure in 2005 (1XAV). 39 That structure showed an uncanny resemblance to the human telomeric crystallography-determined structure that folded as an all-parallel G4Q with three chain reversals. 26 Subsequent collaborations with Shankar Balasubramanian focused on a G-rich sequence within the c-kit oncogene promoter that forms a parallel G-quadruplex having asymmetric G-tetrad dynamics, determined by NMR methods (2KQG and 2KQH). 40 Subsequently, the equivalent crystal structure of a brominated c-kit-1 proto-oncogene promoter quadruplex DNA (3QXR) 41 was determined by the Neidle group. The topologies of the promoter region determined in solution by NMR agree with the crystallography-derived structures; they both display double-chain-reversal loops, lateral loops, and a stem-loop. By 2014 the determination of a B-raf dimer DNA quadruplex was reported (4H29) 42 using a sequence taken from the promoter region of the BRAF gene.
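As a rough illustration of how a genome-wide search for putative G4Q-forming sequences can be set up, the sketch below scans a sequence with a simple pattern. The folding rule used here (four runs of at least three guanines separated by loops of one to seven nucleotides) is a commonly used heuristic and is an assumption of this sketch; this review does not state the criteria used in the cited studies. The names G4_PATTERN and find_putative_g4 are hypothetical.

```python
import re

# Assumed heuristic, not taken from this review: four G-runs of three or
# more guanines, separated by three loops of 1 to 7 nucleotides each.
# U is allowed in loops so that RNA sequences can be scanned as well.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_putative_g4(sequence):
    """Return (start, end, motif) for each non-overlapping match.

    Only the given strand is scanned; positions are 0-based and the end
    index is exclusive. Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Human telomeric sequence d(AG3(T2AG3)3) mentioned in the text
telomeric = "A" + "GGG" + "TTAGGG" * 3
for start, end, motif in find_putative_g4(telomeric):
    print(start, end, motif)
```

A match only flags a sequence as a candidate. Whether it folds, and into which topology, depends on factors such as the stabilizing cation and the flanking sequence, as the structures discussed in this review show. Scanning the complementary strand would additionally require the equivalent C-rich pattern.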
Work continued on targeting telomeres, and in 2009 the structural understanding of ligand interactions with G4Qs was advanced using Oxytricha telomeric sequences, resulting in the publication of a suite of ligand-DNA structures (3EM2, 3EQW, 3ERU, 3ES0, 3ET8, 3EUM, 3EUI). 43 The structures based on Oxytricha nova telomeric sequences fold as bimolecular antiparallel-stranded quadruplexes and are in complex with a family of 3,6-disubstituted acridines: BSU-6038, BSU-6042, BSU-6045, BSU-6048, BSU-6054, and BSU-6066. This work progressed with the publication of a bimolecular antiparallel-stranded Oxytricha nova telomeric quadruplex in complex with a 3,6-disubstituted acridine ligand containing bis-3-fluoropyrrolidine end side chains (3NYP, 3NZ7). 44 In 2012, the Neidle group reported the first crystal structure of a human telomeric G-quadruplex DNA bound to a metal-containing ligand (a copper complex) (3QSC). 45 By 2013 several crystal structures of human telomeric DNA G-quadruplexes bound by naphthalene diimides were determined. Sequences included a bimolecular G4Q-forming sequence and intramolecular G4Qs, bound to MM41 and BMSG-SH-3 (3UYH, 4DAQ, 4DA3).
46 A new thread of investigation was developing in the Neidle group in the late 2000s, when it became known that in mammalian cells, non-coding DNA telomeric repeats are also transcribed into guanine-rich RNA sequences r(GGGUUA), telomeric repeat-containing RNA (TERRA). This prompted structural investigations into transcribed RNA based on its potential to fold as G4Qs. In 2010, the Neidle group published the structure determination of a parallel-stranded G4Q formed from the human RNA telomeric sequence, thus revealing the importance of the propeller-like folds, here with connecting UUA loops (3IBK). 47 This ground-breaking structure provided the first insight into a G4Q RNA containing a human sequence (TERRA). The RNA structure displays the same overall topology and structural arrangement as its DNA analogue, with the expected C3′-endo sugar puckers along with a modified hydration structure. Shortly thereafter, the first crystal structure of a telomeric RNA G-quadruplex complexed with a disubstituted acridine-based ligand was determined (3MIJ), 48 along with the same disubstituted acridine ligand bound to human telomeric DNA (3QCR). These structures revealed the adaptability of the UUA loops within the chain-reversal loops, where the O2' hydroxyl groups of the ribonucleotide sugars play a central role in defining an RNA-specific binding environment for the acridine molecule complexed to the RNA quadruplex, along with the additional stability associated with all-anti glycosidic conformations within the G-tetrads.
Coming full circle from his original studies of dCG_PF, which showed extensive water networks in a drug-nucleic acid complex, 5 Neidle has recently analyzed water networks in G-quadruplex crystal structures and structured waters that mediate small-molecule binding to G-quadruplexes. 49,50 These analyses of high-resolution crystal structures with varying topologies that have different groove widths and connecting loop arrangements show extended spines of hydration that are distinct from those in A/T-rich regions of duplex DNAs. It is anticipated that this research will be important in identifying specific networks of hydration that small-molecule groove binders can exploit to mediate key interactions between ligand and DNA (Fig. 10).
The many years of research on quadruplexes and their interactions with drugs have recently led to a licensing agreement for a new and first-in-class treatment for pancreatic cancer: QN-302, one of the family of naphthalene diimides targeting G4Qs that was initially developed at UCL SoP. Neidle's goal of confirming G4Qs as viable therapeutic targets may soon be reached.

Fig. 1. Pie chart showing the diversity of nucleic acid-containing structures determined by the Neidle group. The chart represents 118 PDB entries.

Fig. 4. A) Publications reporting research undertaken involving the Neidle group related to G4Qs since 1997.B) Individual structures deposited at the PDB involving the Neidle group since 1988.The black bars represent all structures submitted and the orange bars represent G4 quadruplex containing structures.

Fig. 5. A) Berenil bound to the CGCGAATTCGCG (A2T2) sequence (2DBE). 8 A water molecule (shown in magenta) mediates the binding. B) Berenil bound to CGCAAATTTCGCG (A3T3) (1D63). 9 In this case the berenil is bound symmetrically in the minor groove and there is no water mediating the binding.

Fig. 7. Minor groove drug binding of DB 1963 (cyan) to DNA (3U08) 14 showing an extensive water network that helps stabilize the complex. The conserved water molecules are shown in magenta. The dotted lines represent hydrogen bonds.

Fig. 10. Water cluster networks in the 7KLP crystal structure of an intramolecular G4Q formed from one strand containing four human telomeric repeats and folded into three stacked tetrads with connecting chain-reversal T-T-A loops. Waters are shown as dark blue spheres, (A) in the grooves