I am currently reading “The Gene” by Siddhartha Mukherjee, a wonderfully written book on the history of our genes. Mukherjee captures the key events in how we have come to understand (and manipulate) genetics today, focusing not only on the famous names in science but also on the lesser-known characters who played a large role in how history unfolded. It’s an eye-opening and interesting read for scientists and non-scientists alike, and it increases your respect for a simple little molecule that has been, and continues to be, the subject of intense study and debate.
DNA, or deoxyribonucleic acid, was recognized as the carrier of genetic information only after much effort had been spent showing that other molecules were not doing the job. Compared with proteins, it was seen as a “stupid” molecule because of its simple composition of four bases: adenine (A), thymine (T), guanine (G) and cytosine (C). Only after its structure was revealed in Rosalind Franklin’s X-ray crystallography photos and deciphered by Watson and Crick was the simplicity and elegance of the molecule fully appreciated. The specific pairing of A with T and G with C via hydrogen bonds (termed Watson-Crick pairing), with the paired bases sitting opposite each other in a double-stranded helix, explained how genetic information could be stored and copied. Hydrogen bonds can be broken with much less energy than the covalent bonds that hold each strand together, so the two strands can separate easily and each can serve as a template for synthesizing a new partner strand. This allows the genetic code to be passed on to daughter cells, or transcribed into RNA (ribonucleic acid) and then translated into protein, the so-called workhorses that carry out the encoded function of a gene.
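To make the template idea concrete, here is a minimal Python sketch (not from the book) of how Watson-Crick pairing lets one strand specify its partner. It assumes sequences are written 5′ to 3′ using only A, T, G and C; `complement_strand` and `transcribe` are illustrative names. Because the two strands run antiparallel, the partner strand is the complement read in reverse.

```python
# Minimal sketch: Watson-Crick pairing lets one strand act as a template.
# Assumption: sequences are written 5'->3' and contain only A, T, G and C.

WC_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Return the strand that pairs with `template`, also written 5'->3'.

    The strands are antiparallel, so the partner is the base-by-base
    complement read in reverse order.
    """
    try:
        return "".join(WC_PAIRS[base] for base in reversed(template.upper()))
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a coding strand: identical except U replaces T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    strand = "ATGCGT"
    partner = complement_strand(strand)
    print(partner)                     # ACGCAT
    print(complement_strand(partner))  # ATGCGT (copying twice restores the original)
    print(transcribe(strand))          # AUGCGU
```

Applying `complement_strand` twice returns the original sequence, which is the property that lets each daughter cell inherit the same information.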
More to DNA than meets the eye
More secrets about DNA were revealed following Watson and Crick’s discovery. The pair based their model on a particular form of DNA called B-DNA, which is the predominant form DNA takes in water. Other forms were later described: supercoiled DNA, A-DNA (the helix also adopted by double-stranded RNA and RNA/DNA hybrids), Z-DNA (a left-handed helix) and H-DNA (a triple helix). Apart from Watson-Crick base pairing, bases can also pair through an alternative pattern of hydrogen bonds known as Hoogsteen base pairing. The figure below shows the difference:
“Shown are WC and HG A•T and G•C bps with highlighted key geometrical differences. Heavy atoms involved in HG hydrogen bonds (in red), syn χ angle (in orange) and constricted C1′–C1′ distances (in green). Average C1′–C1′ distances from the survey are shown for each base-pair type.” Comment and image from: Zhou, H., Hintze, B. J., Kimsey, I. J., Sathyamoorthy, B., Yang, S., Richardson, J. S., & Al-Hashimi, H. M. (2015). New insights into Hoogsteen base pairs in DNA duplexes from a structure-based survey. Nucleic Acids Research, 43(7), 3420–3433. http://doi.org/10.1093/nar/gkv241
Hoogsteen base pairs form when one of the bases (the purine) rotates 180 degrees about its glycosidic bond, from the usual anti conformation into the so-called syn conformation, and pairs with a base that remains anti. This brings the two bases closer together (a shorter C1′–C1′ distance) and alters the glycosidic torsion angle (χ), which changes the local structure of the DNA duplex. First described by Karst Hoogsteen in 1959, Hoogsteen pairs are mostly observed in protein-DNA complexes, in damaged or modified DNA and in stretches of AT repeats, at an overall incidence of about 0.3% of all base pairs.
G quadruplexes – what are they?
Hoogsteen base pairing allows DNA to adopt more complicated structures such as triple helices or even quadruplexes, increasing its functional diversity. G quadruplexes form when two or more G quartets (also called G tetrads) stack on top of each other. A G quartet consists of four guanines Hoogsteen base-paired with each other in a square, planar arrangement around a stabilizing positively charged ion such as Na+ or K+ (but not Li+). The guanines can come from the same strand (intramolecular) or from different strands (intermolecular), brought together by folding of the DNA or RNA. Once formed, G quadruplexes are thermodynamically stable (even more so than B-DNA in vitro), and their conservation, especially among mammalian species, suggests they may perform important biological functions.
“G-quadruplex structures are polymorphic and can be sub-grouped into different families, as for example parallel or antiparallel according to the orientation of the strands and can be inter- or intramolecular folded. The type of structure depends on the number of G-tracts in a strand.” Image and comment from: Rhodes, D., & Lipps, H. J. (2015). G-quadruplexes and their regulatory roles in biology. Nucleic Acids Research, 43(18), 8627–8637. http://doi.org/10.1093/nar/gkv862
G quadruplexes have been studied by gel electrophoresis, nuclear magnetic resonance, chromatography and mass spectrometry, and they can be specifically bound by antibodies and by small-molecule ligands such as pyridostatin (PDS). They are enriched at the 3′ terminal regions of telomeres, where they inhibit the activity of telomerase and so prevent telomere extension. They are also found in the 5′ and 3′ untranslated regions (UTRs) of genes, and more than 50% of genes contain G quadruplex-forming sequences in their promoters. Interestingly, these are more common in proto-oncogenes than in housekeeping or tumour suppressor genes. This suggests a role in gene regulation, a view supported by the gene expression changes induced by G quadruplex-stabilizing ligands and by the association of G quadruplexes with proteins involved in transcription and replication. The presence of G quadruplexes is largely associated with gene suppression, and together with their effects on telomeres this makes them a prime target for cancer treatment.
A recent paper from Shankar Balasubramanian’s group characterized G quadruplexes in purified polyadenylated (polyA) RNA isolated from human HeLa cells. The authors used reverse transcriptase stalling in the presence of Li+, K+ or K+ plus PDS, followed by deep sequencing:
“The BASP1 (chr5:17,276,185-17,276,254) example here shows a drop in coverage (from 3’ to 5’ direction) in K+ and K++PDS conditions due to rG4 formation, whereas coverage is generally uniform in Li+” Image and comment from: Kwok, C. K., Marsico, G., Sahakyan, A. B., Chambers, V. S., & Balasubramanian, S. (2016). rG4-seq reveals widespread formation of G-quadruplex structures in the human transcriptome. Nature Methods, 13(10), 841–844. http://doi.org/10.1038/nmeth.3965
The method exploits two facts:
- Reverse transcriptase is stalled by G quadruplexes.
- G quadruplexes are stabilized by K+ but not by Li+, and pyridostatin stabilizes them even further.
The authors attached 3′ adaptors to the ends of the RNA transcripts, allowed the RNA to fold under the different ionic conditions shown above and then carried out reverse transcription. Because reverse transcriptase stalls at folded rG4s, this produced cDNAs of different lengths, which were amplified by PCR and characterized by next-generation sequencing. Comparing where the cDNAs stop in K+ (or K+ plus PDS) versus Li+ pinpoints the rG4 sites.
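The logic of detecting a stall can be illustrated with a toy Python sketch. This is not the published rG4-seq pipeline: it assumes per-nucleotide coverage tracks (indexed 5′ to 3′) for one transcript window in K+ and Li+, scales each library by its coverage at the 3′ end, and reports the single position where K+ coverage falls most sharply relative to Li+. The function names, the pseudocount and the `min_drop` threshold are all hypothetical choices.

```python
# Toy sketch of the rG4-seq idea (not the published pipeline).
# Assumptions: coverage is given per nucleotide for one transcript window,
# indexed 5'->3'; each library is scaled by its coverage at the 3' end
# (where reverse transcription starts); at most one stall is reported.

from typing import List, Optional


def relative_coverage(k_cov: List[int], li_cov: List[int], pseudo: float = 1.0) -> List[float]:
    """Per-position K+/Li+ coverage ratio after scaling each track by its 3'-most value."""
    if not k_cov or len(k_cov) != len(li_cov):
        raise ValueError("coverage tracks must be non-empty and of equal length")
    k_scale = k_cov[-1] + pseudo
    li_scale = li_cov[-1] + pseudo
    return [((k + pseudo) / k_scale) / ((li + pseudo) / li_scale)
            for k, li in zip(k_cov, li_cov)]


def find_stall(k_cov: List[int], li_cov: List[int], min_drop: float = 0.5) -> Optional[int]:
    """Return the first low-coverage index (reading 3'->5') at the largest
    K+-specific coverage drop, or None if no drop reaches `min_drop`.

    Reverse transcriptase copies RNA from 3' to 5', so a folded rG4 stops it
    and positions on the 5' side of the structure lose coverage in K+.
    """
    ratio = relative_coverage(k_cov, li_cov)
    best_idx: Optional[int] = None
    best_drop = 0.0
    for i in range(len(ratio) - 1):
        # Loss in relative coverage when stepping from position i+1 to its 5' neighbour i.
        drop = ratio[i + 1] - ratio[i]
        if drop > best_drop:
            best_idx, best_drop = i, drop
    return best_idx if best_drop >= min_drop else None


if __name__ == "__main__":
    li_plus = [100] * 8             # hypothetical: uniform coverage, rG4 unfolded
    k_plus = [20] * 4 + [100] * 4   # hypothetical: reads lost 5' of a folded rG4
    print(find_stall(k_plus, li_plus))   # 3
    print(find_stall(li_plus, li_plus))  # None
```

With the hypothetical tracks above, the function returns 3, the first position (reading 3′ to 5′) where K+ coverage has dropped. A real analysis would also need replicates, statistics across many transcripts and a way to handle more than one stall per window.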
- RNA G quadruplex (rG4) structures varied and fell into four main types: G3L1-7G3 (the so-called canonical structure, with three G tetrads connected by loops (L) of 1 to 7 nucleotides), long loops (as the canonical type but with longer loops), bulges (as the canonical type but with a bulge interrupting one of the G-tracts) and 2 quartets (two G tetrads connected by loops).
- Some rG4s were found in transcripts linked to cancers and neurological diseases, such as PIM1 and APP
- For my non-coding RNA friends, rG4s were detected in several long noncoding RNAs (lncRNAs), including MALAT1 and NEAT1.
- 3′ UTRs contained the majority of rG4s (61.7%), compared with coding sequences (CDS) or 5′ UTRs, with greater enrichment in UTRs than in CDS.
- Enrichment of G quadruplexes near miRNA target binding sites
- Enrichment of G quadruplexes near polyadenylation sites
- G quadruplexes altered RNA folding conformations
- G quadruplexes were found in genes involved in transcription, chromatin organisation and biosynthetic processes.
- There was strong cross-species conservation of G quadruplex regions in genes related to transcription and RNA processing
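rG4-seq detected these structures experimentally, but the canonical G3L1-7G3 class corresponds to a widely used sequence heuristic: four runs of at least three Gs separated by loops of one to seven nucleotides. The Python sketch below scans a sequence for that pattern; `g4_motifs`, `min_tract` and `max_loop` are illustrative names, and setting `min_tract=2` roughly mimics the two-quartet class. It is a simplification: it reports only non-overlapping matches, does not model bulges, and a matching sequence is only a potential quadruplex, not proof that one forms.

```python
# Sketch of a common G-quadruplex sequence heuristic (not the rG4-seq method):
# four G-tracts of at least `min_tract` Gs separated by loops of 1..`max_loop` nucleotides.

import re
from typing import List, Tuple


def g4_motifs(seq: str, min_tract: int = 3, max_loop: int = 7) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for non-overlapping putative G4 motifs in `seq`.

    Defaults give the canonical G3L1-7 pattern; min_tract=2 approximates the
    two-quartet class. Bulged G-tracts are not modelled.
    """
    if min_tract < 1 or max_loop < 1:
        raise ValueError("min_tract and max_loop must be positive")
    tract = f"G{{{min_tract},}}"            # e.g. G{3,}
    loop = f"[ACGTUN]{{1,{max_loop}}}"      # e.g. [ACGTUN]{1,7}; U allows RNA input
    pattern = re.compile(f"(?:{tract}{loop}){{3}}{tract}")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(g4_motifs("AAGGGTTAGGGTTAGGGTTAGGGAA"))
    # [(2, 23, 'GGGTTAGGGTTAGGGTTAGGG')]
    print(g4_motifs("GGAGGAGGAGG"))               # []
    print(g4_motifs("GGAGGAGGAGG", min_tract=2))  # [(0, 11, 'GGAGGAGGAGG')]
```

For example, the telomere-like sequence `AAGGGTTAGGGTTAGGGTTAGGGAA` yields a single match spanning positions 2 to 23, while `GGAGGAGGAGG` matches only when `min_tract=2`.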
The main drawback of this paper is probably that the RNA was folded in vitro after being extracted from HeLa cells, a cancer cell line, which may poorly represent what happens physiologically. However, it highlights some novel means by which G quadruplexes may regulate gene expression, i.e. by affecting miRNA binding or Argonaute association and by influencing polyadenylation.
Some problems remain with targeting G quadruplexes for disease treatment, notably how to achieve gene specificity given how widely distributed they are. So far only one G quadruplex ligand (quarfloxin) has reached Phase II clinical trials, for neuroendocrine/carcinoid tumours, but it was discontinued due to bioavailability issues. More structural investigations may be needed to develop ligands that bind G quadruplexes with greater affinity and selectivity. Still, we have come a long way in showing how a “stupid” molecule like DNA can become devastatingly complex. With the increasing ease of producing, sequencing and manipulating nucleic acids, and their ability to adopt complex structures that can be targeted by drugs, they have become far more interesting than proteins ever were.
Murat, P., & Balasubramanian, S. (2014). Existence and consequences of G-quadruplex structures in DNA. Current Opinion in Genetics & Development, 25, 22–29. http://doi.org/10.1016/j.gde.2013.10.012
Vultaggio, J. G-Quadruplexes: A Web of Functional Complexities.
Balasubramanian, S., Hurley, L. H., & Neidle, S. (2011). Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nature Reviews. Drug Discovery, 10(4), 261–275. http://doi.org/10.1038/nrd3428
Banner image “Fairy DNA by kyz”