In a previous article, I discussed the implications of the highly optimized set of amino acids commonly used in biological proteins. As we saw, this in itself provides a strong argument for design. Another incredible feature of the genetic code is that the assignments between codons and amino acids are finely tuned to minimize the errors that might arise from mutations. In this and three subsequent articles, I will consider multiple facets of the fine-tuning of the conventional genetic code for minimizing the impact of errors. In the final article, we will consider carefully the extent to which the genetic code points to design.
Genetic Code Redundancy
The optimization of the genetic code for error minimization is made possible by the redundancy of the code. What is meant by redundancy? There are 64 possible RNA triplets, or codons. Of those, 61 specify amino acids, and the remaining three (UAG, UAA, and UGA) serve as stop codons, which halt protein synthesis. Because there are only twenty different amino acids, some of the codons are redundant: multiple codons can code for the same amino acid. The cellular pathways and mechanisms that make this 64-to-20 mapping possible are a marvel of molecular logic.
In every other realm of experience we habitually associate language conventions or coding systems with conscious deliberate agents rather than unguided processes. But the evidence for design, as we shall see, extends well beyond the mere fact of the genetic coding system.
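The codon counts described in this section can be checked with a short Python sketch. It builds the standard codon table from the conventional U, C, A, G ordering and counts sense codons, stop codons, and the number of codons per amino acid. The variable names are illustrative choices, not part of any library.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the standard genetic code, listed in
# UUU, UUC, UUA, UUG, UCU, ... order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Split the table into sense codons and stop codons.
sense = {codon: aa for codon, aa in STANDARD_CODE.items() if aa != "*"}
stops = sorted(codon for codon, aa in STANDARD_CODE.items() if aa == "*")

# Degeneracy: how many codons specify each amino acid.
degeneracy = Counter(sense.values())

print("total codons:", len(STANDARD_CODE))
print("sense codons:", len(sense))
print("stop codons:", stops)
print("amino acids:", len(degeneracy))
for aa, n in sorted(degeneracy.items(), key=lambda item: (-item[1], item[0])):
    print(aa, n)
```

The per-amino-acid counts range from six codons (leucine, serine, arginine) down to a single codon (methionine, tryptophan).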
Minimizing the Impact of Point Mutations
Most of the genetic code’s degeneracy lies in the third codon position, which pairs with the nucleotide at the 5′ end of the anticodon (the so-called “wobble” position). The wobble hypothesis states that a nucleotide in this position can form base pairs that are not permitted at the other two positions, although some pairings remain disallowed even here.
This arrangement is far from arbitrary. Indeed, the genetic code found in nature is exquisitely tuned to protect the cell from the detrimental effects of substitution mutations. The system is so brilliantly set up that codons differing by only a single base specify either the same amino acid or an amino acid from a related chemical group. The same structure also mitigates the effects of errors made during translation, which can occur when a codon is read by an almost-complementary anticodon.
For example, the amino acid leucine is specified by six codons. One of them is CUU. Substitution mutations at the 3′ position that change the U to a C, A, or G yield codons that also specify leucine: CUC, CUA, and CUG respectively. On the other hand, if the C at the 5′ position is replaced by a U, the codon UUU results. This codon specifies phenylalanine, an amino acid with physical and chemical properties similar to those of leucine. The fact in need of explaining is thus that codon assignments are ordered in such a way as to minimize damage to open reading frames (ORFs). In addition, most codons specify amino acids that possess simple side chains. This decreases the propensity of mutations to produce codons encoding amino acids that are chemically disruptive.
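The leucine example can be reproduced with a minimal Python sketch that lists every codon one substitution away from CUU, together with the amino acid each encodes in the standard table. The helper name `single_base_neighbors` is a hypothetical choice for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_neighbors(codon):
    """Return (position, mutant codon) pairs, one per single-base substitution."""
    neighbors = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                neighbors.append((pos, codon[:pos] + base + codon[pos + 1:]))
    return neighbors


for pos, mutant in single_base_neighbors("CUU"):
    print(f"position {pos + 1}: CUU -> {mutant} ({STANDARD_CODE[mutant]})")
```

All three third-position neighbors still encode leucine (L). The first-position neighbors encode phenylalanine (F), isoleucine (I), and valine (V), while the second-position neighbors encode proline (P), histidine (H), and arginine (R).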
A paper published in 2000 by Freeland et al. found that the genetic code is highly optimized when two parameters are taken into account: first, the relative likelihood of transitions and transversions; and second, the relative impact of mutation.[1] The authors observe,
When the error value of the standard code is compared with the lowest error value of any code found in an extensive search of parameter space, results are somewhat more variable. Estimates based on PAM data for the restricted set of codes indicate that the canonical code achieves between 96% and 100% optimization relative to the best possible code configuration (fig. 2c). If our definition of biosynthetic restrictions are a good approximation of the possible variation from which the canonical code emerged, then it appears at or very close to a global optimum for error minimization: the best of all possible codes.
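To make the idea of an "error value" concrete, here is a simplified Python sketch. It is not the method of the paper quoted above, which used PAM data and weighted transitions against transversions. As a stand-in, it scores a code by the mean squared change in Kyte-Doolittle hydropathy over all single-base substitutions between sense codons, and it generates alternative codes by shuffling the twenty amino acids among the existing synonymous codon blocks. The function names, the seed, and the number of trials are all assumptions made for illustration.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy index (an illustrative substitute cost measure).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_cost(code):
    """Mean squared hydropathy change over all sense-to-sense point mutations."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # nonsense mutations are ignored in this sketch
                total += (HYDROPATHY[aa] - HYDROPATHY[mutant_aa]) ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Permute the 20 amino acids among the synonymous blocks; stops stay fixed."""
    amino_acids = sorted(HYDROPATHY)
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    mapping = dict(zip(amino_acids, permuted))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 1000           # far fewer than the published studies sampled
natural = error_cost(STANDARD_CODE)
better = sum(error_cost(shuffled_code(rng)) < natural for _ in range(trials))
print(f"standard code cost: {natural:.3f}")
print(f"shuffled codes with lower cost: {better} of {trials}")
```

Because the cost measure and the sample size both differ from those in the literature, the fraction this sketch prints should not be expected to match the published figures.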
A subsequent paper, by Gilis et al., argued that when the varying frequencies of amino acids are accounted for, the genetic code appears to be even more finely optimized than previous studies had suggested.[2] Earlier studies had assumed that all amino acids are equally likely to occur in proteins. This assumption does not reflect the real world, since amino acids vary significantly in frequency; for example, leucine occurs much more frequently than tryptophan. The authors therefore weighted errors by the frequency with which each amino acid appears in proteins. They found that frequent amino acids are better protected against mutational errors than less frequent ones. The authors state,
We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code’s structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal.
This level of optimization is far more extreme than the earlier estimate that the genetic code is “one in a million.”[3]
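The effect of weighting by amino acid frequency can be sketched in Python as follows. This is an illustration of the general idea only, not the cost function of Gilis et al., which was based on computed changes in folding free energy. Here the substitution cost is again the squared change in Kyte-Doolittle hydropathy, each amino acid's frequency is assumed to be split evenly across its codons, and the `toy_protein` sequence is a made-up string used only to supply some non-uniform frequencies.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy index (an illustrative substitute cost measure).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def weighted_error_cost(code, freq):
    """Frequency-weighted mean squared hydropathy change over point mutations."""
    codons_per_aa = Counter(aa for aa in code.values() if aa != "*")
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        # Assumption: an amino acid's frequency is shared equally by its codons.
        codon_weight = freq.get(aa, 0.0) / codons_per_aa[aa]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # nonsense mutations are ignored in this sketch
                total += codon_weight * (HYDROPATHY[aa] - HYDROPATHY[mutant_aa]) ** 2
                weight_sum += codon_weight
    return total / weight_sum


# Hypothetical sequence, used only to derive example frequencies.
toy_protein = "MLLKALLEGSLWVLAGLK"
toy_freq = {aa: n / len(toy_protein) for aa, n in Counter(toy_protein).items()}
uniform_freq = {aa: 1 / 20 for aa in HYDROPATHY}

print("uniform weights:", round(weighted_error_cost(STANDARD_CODE, uniform_freq), 3))
print("toy weights:    ", round(weighted_error_cost(STANDARD_CODE, toy_freq), 3))
```

To apply the weighting to real data, `toy_freq` would be replaced by frequencies measured from an actual set of protein sequences.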
A subsequent paper, published in BioSystems in 2004, argued that the level of optimization of the conventional code is even more extreme.[4] In particular, the authors observe a correlation between the frequency with which an amino acid is used and the number of codons encoding it. Moreover, codons that differ from a stop codon by only one base tend to encode rare amino acids such as cysteine, tryptophan, and tyrosine. These amino acids are used with low frequency and are also encoded by few codons. This limits the potential for nonsense mutations, which are the most damaging of mutations, since they swap a regular codon for a stop codon and thereby truncate the protein. The authors argue that, when this optimization parameter is accounted for, the code appears to be even more statistically exceptional. They set a lower bound on the statistical rarity of the conventional genetic code of 1 in 2 billion.
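The codons exposed to nonsense mutations can be enumerated directly. The following Python sketch, with illustrative variable names, lists every sense codon in the standard table that lies one substitution away from a stop codon and tallies the amino acids involved.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, UUA, UUG, UCU, ... order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stops = [codon for codon, aa in STANDARD_CODE.items() if aa == "*"]

# Collect each sense codon that a single substitution can turn into a stop.
near_stop = {}
for stop in stops:
    for pos in range(3):
        for base in BASES:
            if base == stop[pos]:
                continue
            neighbor = stop[:pos] + base + stop[pos + 1:]
            if STANDARD_CODE[neighbor] != "*":
                near_stop[neighbor] = STANDARD_CODE[neighbor]

# Compare, per amino acid, how many of its codons sit next to a stop codon.
codons_per_aa = Counter(aa for aa in STANDARD_CODE.values() if aa != "*")
exposed_per_aa = Counter(near_stop.values())

print("sense codons adjacent to a stop:", len(near_stop))
for aa, exposed in sorted(exposed_per_aa.items()):
    print(f"{aa}: {exposed} of {codons_per_aa[aa]} codons")
```

In the standard table this set contains every codon for tyrosine, cysteine, and tryptophan, along with some of the codons for several other amino acids.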
Just the Beginning
If this were all there were to the optimization of the genetic code, it would be remarkable by itself. There are, however, multiple additional levels of fine-tuning that go beyond this. The genetic code is therefore simultaneously optimized for multiple constraints. In the next article, we will see how the genetic code is optimized to dampen the harmful effects of frameshift mutations, and to accommodate overlapping coding sequences.
Notes
1. Freeland SJ, Knight RD, Landweber LF, Hurst LD. Early fixation of an optimal genetic code. Mol Biol Evol. 2000 Apr;17(4):511-8. doi: 10.1093/oxfordjournals.molbev.a026331. PMID: 10742043.
2. Gilis D, Massar S, Cerf NJ, Rooman M. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol. 2001;2(11):RESEARCH0049. doi: 10.1186/gb-2001-2-11-research0049. Epub 2001 Oct 24. PMID: 11737948; PMCID: PMC60310.
3. Freeland SJ, Hurst LD. The genetic code is one in a million. J Mol Evol. 1998 Sep;47(3):238-48. doi: 10.1007/pl00006381. PMID: 9732450.
4. Goodarzi H, Nejad HA, Torabi N. On the optimality of the genetic code, with the consideration of termination codons. Biosystems. 2004 Nov;77(1-3):163-73. doi: 10.1016/j.biosystems.2004.05.031. PMID: 15527955.