Only about 15 percent of the human genome consists of protein-coding genes, but in recent years scientists have found that a surprising amount of the so-called junk, or intergenic, DNA does get copied into RNA, the molecule that carries DNA’s messages to the rest of the cell.
Scientists have been trying to figure out just what this RNA might be doing, if anything. In 2008, MIT researchers led by Institute Professor Phillip Sharp discovered that much of this RNA is generated through a process called divergent expression, through which cells read their DNA in both directions moving away from a given starting point.
In a new paper appearing in Nature on June 23, Sharp and colleagues describe how cells initiate but then halt the copying of RNA in the upstream, or non-protein-coding, direction, while allowing it to continue in the direction in which genes are correctly read. The finding helps to explain the existence of many recently discovered types of short RNA strands whose function is unknown.
“This is part of an RNA revolution where we’re seeing different RNAs and new RNAs that we hadn’t suspected were present in cells, and trying to understand what role they have in the health of the cell or the viability of the cell,” says Sharp, who is a member of MIT’s Koch Institute for Integrative Cancer Research. “It gives us a whole new appreciation of the balance of the fundamental processes that allow cells to function.”
Graduate students Albert Almada and Xuebing Wu are the lead authors of the paper. Christopher Burge, a professor of biology and biological engineering, and undergraduate Andrea Kriz are also authors.
DNA, which is housed within the nucleus of cells, controls cellular activity by coding for the production of RNAs and proteins. To exert this control, the genetic information encoded by DNA must first be copied, or transcribed, into messenger RNA (mRNA).
When the DNA double helix unwinds to reveal its genetic messages, RNA transcription can proceed in either direction. To initiate this copying, an enzyme called RNA polymerase latches on to the DNA at a spot known as the promoter. The RNA polymerase then moves along the strand, building the mRNA chain as it goes.
When the RNA polymerase reaches a stop signal at the end of a gene, it halts transcription and adds to the mRNA a sequence of bases known as a poly-A tail, which consists of a long string of the genetic base adenine. This process, known as polyadenylation, helps to prepare the mRNA molecule to be exported from the cell’s nucleus.
By sequencing the mRNA transcripts of mouse embryonic stem cells, the researchers discovered that polyadenylation also plays a major role in halting the transcription of upstream, noncoding DNA sequences. They found that these regions have a high density of signal sequences for polyadenylation, which prompt enzymes to chop up the RNA before it gets very long. Stretches of DNA that code for genes, by contrast, have a low density of these signal sequences.
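To make the idea of signal density concrete, here is a minimal Python sketch that counts how often a polyadenylation signal appears per 1,000 bases of a sequence. It assumes the canonical signal AATAAA (the DNA form of AAUAAA), which the article does not name. The function name and both toy sequences are invented for illustration and are not data from the study.

```python
def motif_density_per_kb(seq: str, motif: str = "AATAAA") -> float:
    """Return occurrences of `motif` per 1,000 bases of `seq`.

    Overlapping occurrences are counted. AATAAA is assumed here as the
    polyadenylation signal; the article does not specify a motif.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    k = len(motif)
    count = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return 1000.0 * count / len(seq)


# Made-up toy sequences (hypothetical, not real genomic data).
upstream_like = "GCAATAAAGCTTAATAAACGGCATAATAAAGC"  # rich in AATAAA
genic_like = "GCAGGTAAGTCCGATCGGCTAGCTAGGATCCA"     # contains no AATAAA

print(motif_density_per_kb(upstream_like))
print(motif_density_per_kb(genic_like))
```

In this toy comparison the upstream-like sequence scores far higher than the gene-like one, which is the kind of asymmetry the researchers report between noncoding upstream regions and genes.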
The researchers also found another factor that influences whether transcription is allowed to continue. It was recently shown that when a cellular factor known as U1 snRNP binds to RNA, polyadenylation is suppressed. The new MIT study found that genes have a higher concentration of binding sites for U1 snRNP than noncoding sequences do, allowing gene transcription to continue uninterrupted.
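The interplay between the two signals can be sketched as a toy model in Python: a polyadenylation signal ends the transcript unless a U1 snRNP binding site lies shortly upstream of it. This is a deliberate simplification, not the analysis used in the paper. The motif GGTAAG as a stand-in for a U1 site, the 20-base protection window, the function name and the sequences are all assumptions chosen for illustration.

```python
from typing import Optional

PAS = "AATAAA"      # assumed polyadenylation signal (DNA form of AAUAAA)
U1_SITE = "GGTAAG"  # hypothetical stand-in for a U1 snRNP binding site


def first_unprotected_pas(seq: str, window: int = 20) -> Optional[int]:
    """Return the index of the first PAS that has no U1 site lying
    entirely within the `window` bases upstream of it, or None if
    every PAS is protected (or there is no PAS at all).
    """
    seq = seq.upper()
    pos = seq.find(PAS)
    while pos != -1:
        upstream = seq[max(0, pos - window):pos]
        if U1_SITE not in upstream:
            return pos  # toy model: transcript is cleaved here
        pos = seq.find(PAS, pos + 1)
    return None


# Made-up toy sequences (hypothetical, not real genomic data).
gene_like = "ATGGGTAAGCCTGAATAAACGT"     # U1 site sits upstream of the PAS
upstream_like = "ATGCCGTACCTGAATAAACGT"  # PAS with no U1 site nearby

print(first_unprotected_pas(gene_like))
print(first_unprotected_pas(upstream_like))
```

In this sketch the gene-like sequence is read through because its only signal is protected, while the upstream-like sequence is cut at its first signal. Shrinking `window` shows how the outcome depends on how close the U1 site must be, a parameter this toy model simply assumes.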
The work demonstrates the important role of U1 snRNP in protecting mRNA as it is transcribed from genes and in preventing the cell from unnecessary copying of non-protein-coding DNA, says Gideon Dreyfuss, a professor of biochemistry and biophysics at the University of Pennsylvania School of Medicine.
“They’ve identified a very likely mechanism for early termination of these upstream RNAs by depriving them of U1 snRNP suppression of polyadenylation and cleavage,” says Dreyfuss, who was not part of the research team.
A widespread phenomenon
The function of all of this upstream noncoding RNA is still a subject of much investigation. “That transcriptional process could produce an RNA that has some function, or it could be a product of the nature of the biochemical reaction. This will be debated for a long time,” Sharp says.
His lab is now exploring the relationship between this transcription process and the observation of large numbers of so-called long noncoding RNAs (lncRNAs). He plans to investigate the mechanisms that control the synthesis of such RNAs and try to determine their functions.
“Once you see some data like this, it raises many more questions to be investigated, which I’m hoping will lead us to deeper insights into how our cells carry out their normal functions and how they change in malignancy,” Sharp says.
The research was funded by the National Institutes of Health, the National Cancer Institute and the National Institute of General Medical Sciences.