In two previous articles, we considered two ways in which the genetic code is highly optimized to minimize errors from point mutations and frameshifts. In this third article, I will consider two additional levels of fine-tuning.
Facilitation of Overlapping Coding Sequences
In a 2007 paper published in Genome Research, Itzkovitz and Alon examined how well the universal genetic code can accommodate additional information within protein-coding sequences, compared with other possible genetic codes.1 They report,
We find that the universal genetic code can allow arbitrary sequences of nucleotides within coding regions much better than the vast majority of other possible genetic codes. We further find that the ability to support parallel codes is strongly correlated with an additional property — minimization of the effects of frameshift translation errors.
The genetic code is thus highly optimized for encoding additional information beyond the amino acid sequence in protein-coding sequences. Examples include RNA splicing signals and information about where nucleosomes should be positioned on the DNA, as well as sequences for RNA secondary structure.
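One simple way to get a feel for what it means for a code to "allow" an arbitrary nucleotide sequence is to ask in which reading frames a short motif can sit inside a coding region without introducing a stop codon. The Python sketch below is only a toy illustration under that one assumption; it is not the method used by Itzkovitz and Alon, and the function names `allowed_frames` and `fraction_embeddable` are hypothetical names chosen for this example.

```python
from itertools import product

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def allowed_frames(motif):
    """Return the frame offsets (0, 1, 2) in which `motif` has no complete
    in-frame stop codon.

    Offset k means that a codon starts at motif[k]. Partial codons at the
    edges of the motif are ignored, because the flanking bases can always
    be chosen so that no stop codon results.
    """
    motif = motif.upper()
    frames = []
    for k in range(3):
        codons = (motif[i:i + 3] for i in range(k, len(motif) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(k)
    return frames


def fraction_embeddable(n):
    """Fraction of all length-n motifs that are stop-free in all three frames."""
    motifs = ("".join(p) for p in product("ACGT", repeat=n))
    hits = sum(1 for m in motifs if len(allowed_frames(m)) == 3)
    return hits / 4 ** n


print(allowed_frames("ATGGCC"))   # a motif with no stop codon in any frame
print(allowed_frames("TGATAA"))   # TGA blocks offset 0 only
print(fraction_embeddable(6))     # share of hexamers usable in every frame
```

A fuller comparison would repeat this count for alternative codes with different stop-codon assignments, which is beyond the scope of this sketch.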
More recent work has also shown that, for a great many protein sequences, multiple key physicochemical properties (such as hydrophobicity profiles) appear to be retained upon a +1 or -1 frameshift, and it has therefore been suggested that “frameshift stability is embedded in the structure of the universal genetic code.”2 Another paper sought to measure the extent to which the conventional genetic code is optimized relative to thousands of alternative code sets, including purely random codes as well as codes that partially preserved the structure of the conventional code.3
In particular, they examined the impact of conservative mutations on alternative reading frames: that is, whether amino acid similarity is maintained in alternative frames when a mutation occurs that results in the same or a similar amino acid in the other frame. They determined that, of the codes examined, the standard code was the most highly optimized in the -1 frame, though it performed well in other frames as well. This indicates an additional level of fine-tuning of the standard genetic code, in order to enable coding-sequence overlaps. The paper claimed that “not a single code better than the standard code was found in 10^10 [i.e., ten billion] codes.” This reveals a level of fine-tuning that exceeds even previous estimates.4
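To make the idea of frameshift stability mentioned above more concrete, here is a minimal Python sketch that translates a sequence in its native frame and in the +1 frame, then correlates the two hydrophobicity profiles. It assumes the Kyte-Doolittle hydropathy scale and a plain per-residue Pearson correlation; the published analyses are considerably more elaborate. The example sequence and the function names are arbitrary choices made for illustration only.

```python
from itertools import product
from math import sqrt

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def translate(seq, frame=0):
    """Translate `seq` starting at offset `frame`; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def frameshift_correlation(seq, frame=1):
    """Correlate per-residue hydropathy of the native and shifted frames.

    Positions where either frame has a stop codon are skipped.
    """
    native, shifted = translate(seq, 0), translate(seq, frame)
    pairs = [(KD[a], KD[b]) for a, b in zip(native, shifted)
             if a != "*" and b != "*"]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


# Arbitrary example sequence, not taken from any real gene.
EXAMPLE_SEQ = "ATGGCTCTGAAAGTTCTGATCGGTGCTTTCGACCGTAAACTGGAAGTT"
r = frameshift_correlation(EXAMPLE_SEQ)
print(translate(EXAMPLE_SEQ), translate(EXAMPLE_SEQ, 1), round(r, 3))
```

A single short sequence says little on its own; the claim in the literature concerns the behavior averaged over very many protein sequences.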
The Failed Rejection Problem
Another paper models the specificity of correct codon-anticodon duplex formation during translation.5 According to the model, for an incorrect duplex to be rejected by the ribosome, it must have at least one uncompensated hydrogen bond. This criterion presents difficulties when a duplex has a pair of pyrimidines (U or C) at the codon’s third position, the wobble position. Pyrimidine bases are somewhat smaller than purine bases (G and A), and a pyrimidine pair in the wobble position can allow certain mismatches in the second position to form non-Watson-Crick pairs that compensate for the missing hydrogen bonds. The result is a mistranslation event, because the mismatch in the second position is not properly rejected.
This problem can be circumvented by preventing an anticodon’s pyrimidine in the wobble position from forming a pyrimidine pair. Such a modification entails that a single anticodon that could have recognized four codons is now able to recognize only two. So there will now need to be one tRNA for the pyrimidines of the wobble position and another tRNA for the purines of the wobble position.
This explains why 32 codons in the standard genetic code are in “family boxes” (where all four codons sharing the same first two bases specify the same amino acid) and the other 32 are in “split boxes” (where the four codons are divided between different amino acids or stop signals, typically with the codons ending in U and C assigned separately from those ending in A and G). Indeed, the very same stereochemical constraints that make particular second-position mismatches difficult for the ribosome to detect also determine which codon boxes have to be split in order to avoid these errors.
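The division into family boxes and split boxes can be checked directly against the standard codon table. The Python sketch below groups the 64 codons by their first two bases and counts how many boxes fall into each class; it also checks whether the two pyrimidine-ending codons of each box always share a meaning. The helper names `classify_boxes` and `pyrimidine_pairs_synonymous` are hypothetical, chosen for this illustration.

```python
from itertools import product

# Standard genetic code (RNA alphabet), codons ordered U, C, A, G
# at each position; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_boxes(table):
    """Split the 16 codon boxes (first two bases) into family and split boxes."""
    family, split = [], []
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        meanings = {table[prefix + third] for third in BASES}
        (family if len(meanings) == 1 else split).append(prefix)
    return family, split


def pyrimidine_pairs_synonymous(table):
    """True if, in every box, the codons ending in U and C share a meaning."""
    return all(table[a + b + "U"] == table[a + b + "C"]
               for a, b in product(BASES, repeat=2))


family_boxes, split_boxes = classify_boxes(CODON_TABLE)
print(len(family_boxes), "family boxes:", family_boxes)
print(len(split_boxes), "split boxes:", split_boxes)
print("U/C endings always synonymous:", pyrimidine_pairs_synonymous(CODON_TABLE))
```

Each box holds four codons, so the two counts printed above translate directly into the number of codons in family boxes and in split boxes.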
The Optimization of the Genetic Code Is Multi-Layered
In this and previous articles, we have surveyed multiple levels at which the genetic code, far from being a “frozen accident,” appears to be highly optimized across multiple independent constraints. This raises a natural question: What best explains these phenomena? In the final article of this series, I will assess the relative merits of evolution and design as an explanation of the features of the genetic code.
Notes
1. Itzkovitz S, Alon U. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 2007 Apr;17(4):405-12. doi: 10.1101/gr.5987307. Epub 2007 Feb 9. PMID: 17293451; PMCID: PMC1832087.
2. Bartonek L, Braun D, Zagrovic B. Frameshifting preserves key physicochemical properties of proteins. Proc Natl Acad Sci U S A. 2020 Mar 17;117(11):5907-5912.
3. Wichmann S, Ardern Z. Optimality in the standard genetic code is robust with respect to comparison code sets. Biosystems. 2019 Nov;185:104023. doi: 10.1016/j.biosystems.2019.104023. Epub 2019 Sep 11. PMID: 31520875.
4. Wichmann S, Ardern Z. Optimality in the standard genetic code is robust with respect to comparison code sets. Biosystems. 2019 Nov;185:104023. doi: 10.1016/j.biosystems.2019.104023. Epub 2019 Sep 11. PMID: 31520875.
5. Lim VI, Curran JF. Analysis of codon:anticodon interactions within the ribosome provides new insights into codon reading and the genetic code structure. RNA. 2001 Jul;7(7):942-57. doi: 10.1017/s135583820100214x. PMID: 11453067; PMCID: PMC1370147.