How the Genetic Code Guards Against Frameshifts

In a previous article, I discussed how the genetic code used in life appears to be highly optimized to minimize the errors that point mutations might introduce. Here, I will discuss another astonishing aspect of the genetic code’s design: it appears to have been set up in such a way as to dampen the impact of frameshift mutations. A frameshift mutation results from an indel (an insertion or deletion) of a number of nucleotides that is not divisible by three. Such an event shifts the reading frame, so that every codon downstream of the indel is misread, resulting in the production and accumulation of misfolded proteins. The earlier in the sequence the indel occurs, the greater the alteration of the protein’s amino-acid sequence.
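To make the idea of a shifted reading frame concrete, here is a minimal Python sketch. The mRNA fragment and the helper name `codons` are hypothetical, chosen only for illustration; the sketch simply splits a sequence into triplets before and after a single-base deletion.

```python
def codons(seq):
    """Split an RNA sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical mRNA fragment (Met-Ala-Asp-Lys-Gly-Ser); not taken from a real gene.
original = "AUGGCUGAUAAAGGCUCU"

# Delete a single base (index 4). One is not divisible by three, so the frame shifts.
deleted = original[:4] + original[5:]

print(codons(original))  # ['AUG', 'GCU', 'GAU', 'AAA', 'GGC', 'UCU']
print(codons(deleted))   # ['AUG', 'GUG', 'AUA', 'AAG', 'GCU']
```

Only the first codon survives the deletion; every triplet after the indel is read differently, which is why an early indel alters more of the protein than a late one.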

Termination of Translation

The genetic code can be thought of as comprising groups of four codons in which the first two positions are the same for all four (whereas the third can be occupied by any base, denoted N). When all four codons in a group code for the same amino acid, the group is referred to as a “codon family.” Half of the genetic code is composed of such codon families. In the codon groups designated AAN and AGN (which cover the Asn/Lys and Ser/Arg triplets respectively), the triplets are only a single frameshift away from forming the stop codons UAA and UAG, which signal termination of translation: when the preceding codon ends in U, reading from that U yields UAA or UAG. These hidden stop signals help to prevent the accumulation of misfolded proteins.

Two other frequently used amino acids are aspartic acid (specified by GAU and GAC) and glutamic acid (specified by GAA and GAG). These GAN codons are only one frameshift away from the stop codon UGA. Because these amino acids are used with high frequency, there is a high probability that a frameshift will soon bring a stop codon into frame and halt translation.
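The relationship between these codon groups and the three stop codons can be checked by brute force. The following Python sketch is an illustration only, and its variable names are my own. It pairs every sense codon with every other sense codon and records which pairs produce a stop codon when reading starts on the last base of the first codon.

```python
from itertools import product

BASES = "UCAG"
STOPS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

# The 61 sense codons: every triplet that is not a stop codon.
sense = ["".join(t) for t in product(BASES, repeat=3) if "".join(t) not in STOPS]

# For each adjacent pair of sense codons, read the triplet that spans the
# boundary: the last base of the first codon plus the first two of the second.
hidden = {}
for first, second in product(sense, repeat=2):
    triplet = first[2] + second[:2]
    if triplet in STOPS:
        hidden.setdefault(triplet, set()).add(second[:2] + "N")

for stop in sorted(hidden):
    print(stop, "is hidden in NNU|" + ", ".join(sorted(hidden[stop])))
```

Each stop codon turns out to be hidden behind exactly one group: UAA behind AAN, UAG behind AGN, and UGA behind GAN, in every case following a codon that ends in U.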

A Remarkable Feature

As Bollenbach et al. (2007) explain,

…stop codons can easily be concealed within a sequence. For example, the UGA stop codon is only one frameshift away from NNU|GAN; the GAN codons encode Asp and Glu, which are very common in protein sequences. Similarly, UAA and UAG can be frameshifted to give NNU|AAN and NNU|AGN (the AAN codons encode Asn or Lys and AGN gives Ser or Arg). Glu, Lys, Asp, Ser, and Arg are relatively common amino acids in the genome, so the probability of a stop codon arising from a misread of a codon from one of these three amino acids is very high. The fact that a stop codon can be “hidden” in this way using a frameshift means that even a signal sequence that happens to include a stop codon (a problem that is bound to arise sooner or later) can be encoded within the protein sequence by using one of the two reading frames in which the stop codon encodes for a frequently used amino acid.¹

Remarkably, the 64-to-20 mapping appears to be set up so as to minimize the number of amino acids translated from a frameshifted transcript before one of the stop codons appears. Frequently used codons (e.g., those coding for aspartic or glutamic acid) readily form stop codons in the event of a frameshift. As a result, in the conventional genetic code, translation of a frameshift error is halted faster on average than in 99.3 percent of alternative codes.²
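The quantity being minimized here, the number of codons read in a shifted frame before a stop appears, is easy to compute for any given sequence. The Python sketch below uses a hypothetical fragment and a helper name of my own (`codons_until_stop`). It does not reproduce the comparison against alternative codes behind the 99.3 percent figure; it only shows how the run length in each frame can be measured.

```python
STOPS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code

def codons_until_stop(seq, offset):
    """Count the sense triplets read from `offset` before the first stop codon.

    Returns None if the sequence ends before any stop codon is met.
    """
    count = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return count
        count += 1
    return None

# Hypothetical fragment (Met-Ala-Asp-Lys-Gly-Ser); the Asp codon GAU follows
# a codon ending in U, so a UGA is concealed across that boundary.
mrna = "AUGGCUGAUAAAGGCUCU"

for offset in range(3):
    print(offset, codons_until_stop(mrna, offset))
# 0 None  (the intended frame contains no stop)
# 1 None
# 2 1     (GGC is read, then the hidden UGA halts translation)
```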

Thus far, we have considered two parameters of the genetic code that appear to be highly optimized. In my next article, we will consider two further aspects of the genetic code’s design.

Notes

1. Bollenbach T, Vetsigian K, Kishony R. Evolution and multilevel optimization of the genetic code. Genome Res. 2007 Apr;17(4):401-4. doi: 10.1101/gr.6144007. Epub 2007 Mar 9. PMID: 17351130.
2. Itzkovitz S, Alon U. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 2007 Apr;17(4):405-12. doi: 10.1101/gr.5987307. Epub 2007 Feb 9. PMID: 17293451; PMCID: PMC1832087.