Having just returned from an EMBO-organised conference on non-coding RNA (ncRNA), I am bursting with facts about ncRNAs which may be of interest (or not) to some of you. ncRNAs are RNAs that are transcribed from the genome but do not get translated into protein. There is a burgeoning interest in them, mainly because they account for a far larger share of the genome than coding RNA does. Specifically, ncRNAs make up ~80% of the genome and ~95% of the transcriptome (source: this paper). Here’s what you need to know about ncRNAs:
1. How many types of ncRNAs are there? So far, 8, maybe 9. These include transfer RNAs (1) and ribosomal RNAs (2), which were well characterized alongside messenger/coding RNA in the 1970s and take part in protein synthesis. Small ncRNAs, which include microRNAs (3), siRNAs (4) and germline-specific piwi-RNAs (5), range from 21 to 35 nucleotides. Their discovery revealed a novel system termed RNA interference, used by organisms to regulate gene expression. In essence, the small ncRNAs bind to longer protein-coding RNAs, leading to their degradation or to inhibition of their translation. Other small nuclear-based RNAs termed snoRNAs or U-RNAs (6) are also classified as ncRNAs and are involved in RNA editing or splicing. The more recent forms of ncRNAs include long ncRNAs (7), which are 200 nucleotides or longer, and enhancer RNAs or promoter-associated RNAs (8), which range from 100 to 9000 nucleotides and are transcribed from enhancer regions on DNA. There is also a new kid on the block called circular RNAs (9), but it’s a little uncertain whether they are non-coding, as some do get translated into protein.
Graphical representation of transcription in mammals, from Mattick and Makunin, Hum Mol Genet, 2006. Excerpt: “The area of the box represents the genome. The area of large green circle is equivalent to the documented extent of transcription, with the darker green area corresponding to that on both strands. It should be noted that these estimates may and probably will increase as more information comes to hand. The function of most of these transcripts is unknown. CDSs are protein-coding sequences, and UTRs are 5′- and 3′-untranslated sequences in mRNAs. The dots indicate (and in fact overstate) the proportion of the genome occupied by known snoRNAs and miRNAs.”
2. How did we not notice them before? This question mainly concerns long ncRNAs (lncRNAs), whose presence was only confirmed in 2005. They were formerly thought to be junk or “transcriptional noise” because of their low abundance and low sequence conservation. Furthermore, many of them were missed in sequencing analyses because these relied on poly(A)-tail-based isolation techniques, and a good proportion of lncRNAs (~40%) are non-polyadenylated.
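To make the poly(A) point concrete, here is a minimal Python sketch of what a poly(A)-based isolation step does to a pool of transcripts. The `Transcript` class, the transcript names and the toy pool are all made up for illustration; only the ~40% non-polyadenylated share comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str             # hypothetical identifier, for illustration only
    polyadenylated: bool  # True if the transcript carries a poly(A) tail


def polya_select(transcripts):
    """Mimic poly(A)-based isolation: keep only transcripts with a poly(A) tail."""
    return [t for t in transcripts if t.polyadenylated]


def fraction_missed(transcripts):
    """Fraction of the input transcripts that poly(A) selection discards."""
    if not transcripts:
        return 0.0
    kept = polya_select(transcripts)
    return 1 - len(kept) / len(transcripts)


# Toy pool of 10 made-up lncRNAs, 4 of them without a poly(A) tail
# (matching the ~40% figure quoted above; the pool itself is an assumption).
toy_lncrnas = [Transcript(f"lnc{i}", polyadenylated=(i >= 4)) for i in range(10)]
missed = fraction_missed(toy_lncrnas)
print(f"{missed:.0%} of the toy lncRNAs never reach the sequencer")
```

Anything the selection step drops is invisible to every downstream analysis, which is why these transcripts went unnoticed for so long.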
3. Are they really that important? Well, yes. The smaller ncRNAs such as miRNAs, piRNAs and snoRNAs already have established functions in gene expression regulation and RNA editing. What is less clear is the role of lncRNAs. Some well-studied lncRNAs such as XIST, H19 and HOTAIR induce epigenetic silencing of genomic loci, mainly by recruiting chromatin modifiers that restrict access by the transcriptional machinery. However, the vast majority of lncRNAs have undefined functions, and some groups argue that as much as 90% of transcription by RNA polymerase II may be random. Furthermore, there is some evidence that transcription events tend to “ripple” into neighbouring regions, producing “leaky transcription” that may give rise to ncRNAs.
However, based on numerous reports and on what I gathered at the conference, the highly regulated expression of lncRNAs, both spatially and temporally, argues that at least some of them are important for specific functions. This is supported by observations that their expression closely correlates with, and in some cases is required for, certain phenotypes including embryonic development, pluripotency, cell cycle progression, cell proliferation and death, and even motor function.
From an evolutionary perspective, despite the low sequence conservation of lncRNAs across organisms, the sequence conservation of lncRNA promoters is comparable to, if not higher than, that of the promoters of protein-coding genes. ncRNAs are thought to be under less stringent sequence constraints than coding RNAs, as they do not have to maintain correct reading frames and rely more on their secondary structures (which depend on short stretches of sequence) for normal function. Indeed, high sequence conservation has been seen for short, defined regions of lncRNAs. Xist, for example, has a well-defined function across species and possesses short domains with high sequence conservation, yet its overall sequence conservation is low. lncRNAs may also be evolving rapidly: because of the less stringent sequence constraints mentioned above, they can accumulate mutations much faster. It’s interesting that organism complexity shows little correlation with genome size but a strong correlation with levels of lncRNAs.
4. How do people study them? RNA sequencing is commonly used to identify lncRNAs, but it is hampered by their low abundance, which often leads to erroneous transcript reconstruction. A recent method, RNA CaptureSeq, developed by the Mattick lab in Sydney, uses DNA probes to enrich for RNA derived from genomic regions of interest (i.e. regions containing sites from which ncRNAs are transcribed), followed by RNA sequencing, which provides greater depth and coverage. You can read more about it here.
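As a rough illustration of why enrichment buys depth, here is a back-of-the-envelope Python sketch. The function `mean_depth` and every number in it (read count, read length, target size, on-target fractions) are hypothetical values picked to keep the arithmetic simple; none of them come from the CaptureSeq work itself.

```python
def mean_depth(total_reads, on_target_fraction, read_length, target_size):
    """Average read depth over the target: on-target bases / target size."""
    on_target_bases = total_reads * on_target_fraction * read_length
    return on_target_bases / target_size


# All values below are hypothetical and chosen only to make the arithmetic easy.
TOTAL_READS = 50_000_000   # reads in one sequencing run
READ_LENGTH = 100          # nucleotides per read
TARGET_SIZE = 1_000_000    # nucleotides of genome covered by the probes

# Assumed share of reads landing on the target without and with probe capture.
depth_plain = mean_depth(TOTAL_READS, 0.001, READ_LENGTH, TARGET_SIZE)
depth_capture = mean_depth(TOTAL_READS, 0.5, READ_LENGTH, TARGET_SIZE)

print(f"without capture: ~{depth_plain:.0f}x, with capture: ~{depth_capture:.0f}x")
```

For the same sequencing effort, the depth over the regions of interest scales directly with the fraction of reads that land there, which is what makes rare transcripts easier to reconstruct.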
That’s the lowdown for now. There is substantial evidence that ncRNAs play important roles in disease, but the field is still pretty new, so we will have to wait a while before ncRNAs start to provide new avenues for therapeutic intervention.