
Physico-chemical constraints connected with the origin of the genetic code

Jean Lehmann, Ph.D.

This research is centered on the origin of the genetic code. The project builds on a theoretical model developed earlier (1, 2); its goal is to test some of the model's predictions experimentally and to develop the theory further.

Several assumptions help in establishing the primitive form of translation. For instance, the number of different codons (or anticodons) cannot have been as high as 64 from the start. The main reason concerns the accuracy of translation: only an evolved ribosome would be able to discriminate between similar anticodon-codon pairs (with similar binding energies) and to maintain the reading frame during translation. A model of primitive translation must therefore address at least these two problems.

The model mentioned above predicts that particular small RNA strands that fold into a single stem-loop structure in their most stable conformation (primordial tRNAs) would be able to selectively self-aminoacylate their 3' ends with activated amino acids (accordingly, synthetases were not necessarily present in the beginning). This process, postulated on the basis of an analysis of the organization of the genetic code, leads to the following predictions:

  1. Given that such RNAs display a loop of seven nucleotides, with three bases constituting the "anticodon", the first base of the anticodon has to be cytosine (C) in order to ensure self-aminoacylation. Consequently, only RNA templates displaying a GNN pattern (N meaning any of the four nucleotides) would be able to undergo translation.
  2. During self-aminoacylation, the second base of the anticodon operates a selection on the amino acids based on hydrophobicity: hydrophobic amino acids would be coded by A and G, while hydrophilic ones would be coded by C and U.
  3. According to an evolutionary model connecting translation and replication, the third base of the anticodon would be G after some period of evolution.
  4. From 1, 2 and 3, it follows that only four different anticodons (and codons) remain after some time in early molecular evolution, all of the form 5'GNC3'.
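
The counting behind the list above can be sketched in a few lines of Python. This is only an illustration of the stated constraints, not part of the published model: the function names are hypothetical, anticodons are assumed to be written 3'->5' (so that the "first base" is the leftmost character), and the hydrophobicity rule is copied directly from item 2.

```python
from itertools import product

BASES = "ACGU"

def filter_anticodons():
    """Apply items 1 and 3 of the list to all possible anticodons (written 3'->5')."""
    all_triplets = ["".join(t) for t in product(BASES, repeat=3)]
    # Item 1: the first base of the anticodon has to be C.
    after_item_1 = [t for t in all_triplets if t[0] == "C"]
    # Item 3: the third base of the anticodon would be G.
    after_item_3 = [t for t in after_item_1 if t[2] == "G"]
    return all_triplets, after_item_1, after_item_3

def hydrophobicity_class(anticodon):
    """Item 2 as stated in the text: middle base A or G -> hydrophobic, C or U -> hydrophilic."""
    return "hydrophobic" if anticodon[1] in "AG" else "hydrophilic"

all_triplets, after_item_1, after_item_3 = filter_anticodons()
print(len(all_triplets), len(after_item_1), len(after_item_3))
for anticodon in after_item_3:
    # Reversing the string gives the same anticodon written 5'->3'.
    print("3'" + anticodon + "5'", "= 5'" + anticodon[::-1] + "3'",
          hydrophobicity_class(anticodon))
```

The two filters reduce the 64 possible triplets to 16 and then to 4, and each of the four survivors reads 5'GNC3' when written in the opposite direction.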

Because the primitive pool of RNAs is undifferentiated, the GNC pattern just mentioned characterizes both the adaptor (primordial tRNA) and the template (first "genes"). Particular constraints connected with the folding of these RNAs into adaptors then explain why all anticodons are of the type 3'CNG5'.
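
That the same GNC pattern can describe both template and adaptor follows from base pairing alone, as the short Python check below illustrates. It assumes standard Watson-Crick pairing only (no wobble pairs), and the helper names are hypothetical.

```python
# Watson-Crick pairing for RNA (assumption: wobble pairs are ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the strand that pairs with `rna`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def matches_gnc(triplet):
    """True if a triplet written 5'->3' has the form GNC."""
    return len(triplet) == 3 and triplet[0] == "G" and triplet[2] == "C"

for n in "ACGU":
    codon = "G" + n + "C"                 # codon written 5'->3'
    anticodon = reverse_complement(codon)  # pairing anticodon, also written 5'->3'
    print(codon, "pairs with", anticodon, "| GNC pattern:", matches_gnc(anticodon))
```

Every GNC codon pairs with an anticodon that is itself of the form GNC when written 5'->3', which is the same triplet as 3'CNG5' read in the other direction.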

The model thus addresses the two problems mentioned earlier: only four different codons (and anticodons) are present in the system, and there is only one possible reading frame in translation. In addition, the middle position of the discriminatory base (N) ensures a certain level of discrimination among all possible anticodon-codon pairs. In support of the model, the four amino acids coded by the GNC codons in the canonical genetic code are valine, alanine, glycine and aspartic acid, which are the most abundant amino acids found in the so-called prebiotic synthesis experiments. Moreover, according to some analyses, it would be possible to build functional proteins (with an active site and the major folding structures) from only these four amino acids, and thus to generate a basic form of protein-based metabolism.
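
Both points can be illustrated with a minimal Python sketch. The four codon assignments are taken from the canonical genetic code; the function names and the example template are hypothetical and serve only as an illustration.

```python
# The four GNC codons and their amino acids in the canonical genetic code.
GNC_CODE = {"GUC": "Val", "GCC": "Ala", "GGC": "Gly", "GAC": "Asp"}

def triplets(rna, frame):
    """Split `rna` into consecutive complete triplets, starting at offset `frame`."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]

def readable_frames(rna):
    """Frames (0, 1 or 2) in which every complete triplet is a GNC codon."""
    frames = []
    for frame in range(3):
        parts = triplets(rna, frame)
        if parts and all(t in GNC_CODE for t in parts):
            frames.append(frame)
    return frames

def translate(rna, frame=0):
    """Translate a template made only of GNC codons."""
    return [GNC_CODE[t] for t in triplets(rna, frame)]

template = "GUCGCCGGCGAC"  # hypothetical template built from the four GNC codons
print(readable_frames(template))
print(translate(template))
```

In the two shifted frames the triplets have the form NCG or CGN, which can never match GNC, so a template built from GNC codons can only be read in one frame.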

In the present research, we are experimentally studying the self-aminoacylation of small RNAs, with the aim of testing some basic predictions of the model just described.


(1) Physico-chemical Constraints Connected with the Coding Properties of the Genetic System, by Jean Lehmann. J. Theor. Biol. 202, 129-144 (2000).

(2) Amplification of the Sequences Displaying the Pattern RNY in the RNA World: The Translation -> Translation/Replication Hypothesis, by Jean Lehmann. J. Theor. Biol. 219, 521-537 (2002).