Two Levels of Information in DNA (1971)

Dual Information in DNA and the Evolution of the Genetic Code


Laboratory of Genetics, The Hebrew University, Jerusalem, Israel

J. theor. Biol. (1971) 32, 293-298 

With the permission of the author. Academic Press copyright

Some of the molecules acting on DNA do so at specific points along the DNA macromolecule and consequently require recognition sites. It seems likely that at least some of the recognition sites are situated intragenically. It is suggested that the information of the recognition sites is carried by a code using the four nucleotides to form words which differ in length and structure from the conventional codons. The same recognition site may therefore correspond to different amino acid sequences. On the other hand, the redundancy of the genetic code permits the correspondence between the same amino acid sequence and different recognition sites. The presence or absence of a recognition site in a particular nucleotide sequence may have an adaptive value. The establishment of homologous and neutral mutations, usually attributed to random processes, can be accordingly understood as the result of natural selection acting at the level of the DNA rather than that of the polypeptide.

The concept of DNA as the carrier and perpetuator of genetic information is so deeply rooted that it tends to obliterate the role of DNA as the passive participant in interactions with other molecules. Some of these molecules act at specific points along the DNA macromolecule and, as such, require some kind of signal to direct them towards their site of action [see Szybalski et al. (1969) for a review of the functions requiring recognition sites]. The aim of this paper is to put forward the theory that at least some of these signals are carried by the same physical entity that also carries the information for protein synthesis, but by a different code.

     The molecules interacting with DNA fall into two classes:

  (1) enzymes acting on DNA, notably those involved in recombination, replication and transcription;

  (2) "regulators" that, by binding to DNA, prepare it for, or protect it from, the action of these enzymes; repressors and inducers fall into this class.

The specificity of binding sites of regulators, e.g. the operator, is obvious. Evidence for the non-randomness of some sites of action of enzymes acting on DNA can be gathered from data concerning all three groups of enzymes.

   The initiation of DNA replication in bacterial chromosomes and episomes requires the presence of a specific element of recognition - the "replicator" (Jacob, Brenner & Cuzin, 1963). The enzyme initiating DNA replication - the "initiator" - is specifically related to the replicator, a particular replicator being recognised and acted upon only by the corresponding initiator.

   Similarly, transcription is not initiated randomly. In this case more information is available about the recognition sites. The "promoter", or site of initiation of transcription, was shown to consist of a sequence of pyrimidines in T4 (Maitra & Hurwitz, 1965) and of a run of cytosines in Bacillus subtilis (Kubinski, Opara-Kubinska & Szybalski, 1966).

   Evidence for site-specificity can also be gathered from recombination studies: in several cases proof was obtained for the specificity of certain recombination genes towards particular regions of the genome. In Neurospora crassa, three mapped recombination genes were shown to affect three different chromosomal segments (Catcheside, 1968). In Schizophyllum commune, recombination in two chromosomal segments was shown to be independently controlled (Stamberg, 1968; Simchen & Stamberg, 1969). In bacteriophage λ, a site-specific recombination enzyme was demonstrated (Weil & Signer, 1968; Echols, Gingery & Moore, 1968).

   The nature and function of recombination genes is a matter for speculation. Several hypotheses could be offered concerning their gene products, the most obvious ones being "regulators" or enzymes involved in recombination. Whatever their precise mode of action, they seem to recognise and act upon particular regions of the genome. This recognition is most convincingly explained by the presence of recognition sites, situated at or near the recombination point and corresponding to each of the specific recombination gene-products.

   A priori, two hypotheses can be offered concerning the nature of recognition sites and their position in relation to the genetic material: they can be either non-DNA or DNA. In the former case, they must be situated in some intergenic linkers. In the latter case, they must be localized either inside transcribed DNA, i.e. intragenically, or in "discontinuities" in the genetic material linking fragments of transcribed DNA, i.e. intergenically.

   Recognition sites must consist of DNA to account for the specificities of replication, transcription and recombination in bacteria and phage, whose chromosomes contain only DNA. In higher organisms, recognition sites of replication and transcription were not studied. However, recombination studies in fungi led Holliday (1968) to the conclusion that

"there would appear to be no reason why the recombinator should not be a sequence which codes in the normal way for amino acids, and that it would therefore be within the genes rather than between them".

    The problem of intra- vs. intergenic position of recognition sites can be resolved by the effects of mutation on recombination, transcription and replication. A prediction of the model placing recognition sites inside transcribed DNA is the causal relationship between particular mutations and events on the DNA level. These mutations, occurring inside sequences that may serve as recognition sites, will modify the recognition site in the DNA, in addition to causing an amino acid substitution in the polypeptide. Such a mutation may inactivate an existing recognition site or modify its affinity towards its recognising molecule. In a similar way, it may create a new recognition site.

    In the case of recombination, these mutations will cause changed recombination frequencies in their vicinity. Transformation studies in Pneumococcus demonstrated the "marker effect", i.e. the effect of the particular confrontation of mutated sites on recombination frequency (Ravin & Iyer, 1962). In Escherichia coli, different mutations in the same codon of the tryptophan synthetase A gene show different recombination frequencies in crosses with the same markers (Drapeau, Brammar & Yanofsky, 1968). Further evidence that recombination frequency is indeed dependent on the mutation used as a marker was found in Ascobolus immersus (Paszewski & Prazmo, 1969) and Neurospora crassa (Jha, 1969).

    In the case of transcription, these mutations can modify gene expression by altering recognition sites. Thus, mutations that altered a site essential for transcription or translation reduced the efficiency of reading in E. coli (Ippen, Miller, Scaife & Beckwith, 1968). Likewise, mutations inside the tryptophan locus (Morse & Yanofsky, 1969; Margolin & Bauerle, 1966) and the histidine locus (St. Pierre, 1968) created new promoters in S. typhimurium. The C17 mutation in λ creates a new promoter inside an operon (Pereira da Silva & Jacob, 1968).

    If indeed recognition sites are located intragenically, the DNA must contain two kinds of information: the information transcribed into mRNA and translated into polypeptides, henceforth named "active", and the information serving to distinguish particular regions of the DNA molecule, henceforth called "passive".

   The two types of information could be carried by the same, i.e. the transcribed, DNA strand. Alternatively, the passive information could be carried by the complementary DNA strand. The two alternatives are equivalent in the forthcoming reasoning about the differences and relations between the two kinds of information. However, the assignment of the passive information to the transcribed or the complementary DNA strand may be crucial in the context of the molecular mechanisms underlying transcription and recombination.

   If both types of information were carried by the same code, a strict correlation would be found between specific amino acids and certain events on the DNA level. If, for example, the recognition site of the product of a specific recombination-gene were AGC, recombination controlled by this recombination-gene would always happen at or near the serine site. No such correlation has thus far been demonstrated. Furthermore, it is difficult to reconcile the great specificity and uniqueness of some recognition sites with the frequency and distribution of amino acids.

 It seems therefore logical to assume that the two kinds of information are carried by different codes using the same letters: the four nucleotides, which are read in triplets in the active information code, form "words" of a different length and/or different structure in the passive information code. A word in the passive code may, for example, consist of a nucleotide sequence in which only every second or third nucleotide is essential to the message.
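The idea of a passive-code word in which only some positions are essential can be sketched as a simple pattern match. This is a minimal illustration, not a claim about any real recognition site: the pattern `GNANGNAN`, the test sequence and the function names are all hypothetical, and `N` is used here to mark a non-essential position.

```python
def matches_passive_word(window: str, pattern: str) -> bool:
    """True if `window` fits `pattern`; 'N' marks a non-essential position."""
    return len(window) == len(pattern) and all(
        p == "N" or p == w for p, w in zip(pattern, window)
    )


def find_sites(dna: str, pattern: str) -> list[int]:
    """Start offsets of every window in `dna` that fits `pattern`."""
    k = len(pattern)
    return [
        i for i in range(len(dna) - k + 1)
        if matches_passive_word(dna[i:i + k], pattern)
    ]


# Hypothetical word in which only every second nucleotide is essential.
PATTERN = "GNANGNAN"
# Hypothetical stretch of DNA, chosen only for illustration.
hits = find_sites("TTGCATGTACCGAAAGAAATT", PATTERN)
print(hits)
```

Two different windows (GCATGTAC and GAAAGAAA) fit the same word, because the positions marked `N` are free to vary with whatever the active code requires at that point.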

    The active and passive codes differ by another criterion besides word length and structure: whereas words of the active code - the codons - form very long sentences, words of the passive code probably form no sentences at all, or very short ones, like the two-word sentence promoter-operator. The long informative sequences of the active code are separated by short punctuations (chain termination codons), while the relatively short informative sequences of the passive code are probably separated by long meaningless sequences.

   Another difference between the active and passive codes lies in the effects of mutations on the two types of information. A single nucleotide pair substitution in an active codon can transform it into a homologous or a non-homologous codon. In the latter case, the mutation will result either in an amino acid substitution or in chain termination. The relative frequencies of these possible consequences of mutation depend on the mutated codon, but on the average chain termination should be the rarest and amino acid substitution the most frequent.
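The ranking of these three outcomes can be checked by enumerating every single-nucleotide substitution in the 61 sense codons of the standard genetic code. The sketch below assumes that all codons and all substitutions are equally likely, which is a simplification introduced here and not a statement from the text; the names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base varying slowest, third base fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions() -> dict[str, int]:
    """Count the outcomes of all single substitutions in sense codons."""
    counts = {"homologous": 0, "amino acid substitution": 0, "chain termination": 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":  # start only from sense codons
            continue
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            new = CODON_TABLE[mutant]
            if new == amino_acid:
                counts["homologous"] += 1
            elif new == "*":
                counts["chain termination"] += 1
            else:
                counts["amino acid substitution"] += 1
    return counts


counts = classify_substitutions()
print(counts)
```

Under these assumptions, of the 549 possible substitutions, 392 change the amino acid, 134 give a homologous codon and 23 give a chain termination codon, in agreement with the ordering stated above.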

   Analogously, a single nucleotide pair substitution in a recognition site may leave it unchanged or only quantitatively modified (e.g. by changing a non-essential nucleotide pair); it may transform it into a different recognition site, i.e. a recognition site corresponding to another recognising molecule; it also may transform the recognition site into a meaningless sequence. The relative frequencies of these possible consequences of one nucleotide pair substitution cannot be calculated before more precise information about the recognition sites is available. However, the great specificity of recognition sites makes the second consequence (recognition site "a" transformed into recognition site "b") seem highly improbable, in contrast to the high probability of the analogous consequence of a mutation in an active codon.
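The same enumeration can be sketched for a recognition site, although only with invented sites, since the real structure of recognition sites is unknown. In the sketch below the two words `a` and `b`, their essential positions and the example sequence are all hypothetical; `N` marks a non-essential position, and the "quantitatively modified" outcome is not modelled separately from "unchanged".

```python
BASES = "ACGT"

# Hypothetical recognition words; 'N' marks a non-essential position.
SITES = {"a": "GNANGNAN", "b": "GNANCNAN"}


def fits(seq: str, pattern: str) -> bool:
    """True if `seq` agrees with `pattern` at every essential position."""
    return len(seq) == len(pattern) and all(
        p in ("N", s) for p, s in zip(pattern, seq)
    )


def classify_site_mutations(seq: str, own: str) -> dict[str, int]:
    """Classify every single substitution in a sequence carrying site `own`."""
    counts = {"unchanged": 0, "different site": 0, "meaningless": 0}
    for pos in range(len(seq)):
        for base in BASES:
            if base == seq[pos]:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            if fits(mutant, SITES[own]):
                counts["unchanged"] += 1
            elif any(fits(mutant, p) for name, p in SITES.items() if name != own):
                counts["different site"] += 1
            else:
                counts["meaningless"] += 1
    return counts


counts = classify_site_mutations("GAAAGAAA", "a")
print(counts)
```

With these invented sites, 12 of the 24 substitutions leave the site intact, 11 destroy it and only one converts site "a" into site "b". The exact numbers depend entirely on the assumed patterns; the sketch only illustrates why conversion between highly specific sites is expected to be rare.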

FIG. 1

Normal recognition site

    AAA CUU UCU UUA AA    lys-leu-ser-leu
    AAC UUU CUU UAA A     asn-phe-leu-ochre
    ACU UUC UUU AAA       thr-phe-phe-lys

Mutated recognition site

    AAA CUU CCU UUA AA    lys-leu-pro-leu
    AAC UUC CUU UAA A     asn-phe-leu-ochre
    ACU UCC UUU AAA       thr-ser-phe-lys

   The relations between the active and passive information are illustrated by the wholly imaginary recognition site GAAAGAAA (Fig. 1). By shifting the reading frame, the same recognition site can be made to correspond to three different amino acid sequences. On the other hand, a mutation to a homologous codon inactivates the recognition site without affecting the corresponding amino acid sequence. The second amino acid sequence in Fig. 1 is identical for the normal and the mutated recognition site.
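The translations in Fig. 1 can be reproduced with a short script. It assumes, as a reading of the figure rather than something the text states, that the sequences shown are mRNA and that the site GAAAGAAA lies on the base-by-base complementary DNA. Amino acids are printed in one-letter code, with `*` standing for the chain termination codon (ochre, UAA); the function names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base varying slowest, third base fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = {"A": "T", "U": "A", "C": "G", "G": "C"}

SITE = "GAAAGAAA"            # imaginary recognition site from the text
NORMAL = "AAACUUUCUUUAAA"    # mRNA read off Fig. 1, normal site
MUTATED = "AAACUUCCUUUAAA"   # mRNA read off Fig. 1, mutated site


def translate_frames(mrna: str) -> list[str]:
    """Translate the complete codons of each of the three reading frames."""
    frames = []
    for offset in range(3):
        codons = [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
        frames.append("".join(CODON_TABLE[c] for c in codons))
    return frames


def carries_site(mrna: str) -> bool:
    """True if the complementary DNA (assumed pairing) contains SITE."""
    return SITE in "".join(COMPLEMENT[b] for b in mrna)


normal_frames = translate_frames(NORMAL)
mutated_frames = translate_frames(MUTATED)
print(normal_frames, carries_site(NORMAL))
print(mutated_frames, carries_site(MUTATED))
```

The second frame reads asn-phe-leu-ochre (`NFL*`) for both sequences, while only the normal sequence carries the site under the assumed pairing.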

   In conclusion, identical passive messages may correspond to different active messages and vice versa: equivalent active messages may correspond to different or even contrasting passive messages.
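The converse relation, one active message compatible with several passive messages, can be sketched by listing every synonymous mRNA for a short peptide and asking which of them carry a given site. The peptide lys-leu-ser-leu is taken from Fig. 1; the mRNA motif CUUUCUUU is assumed here to be the base-by-base complement of the imaginary site GAAAGAAA, and the function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base varying slowest, third base fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def encodings(peptide: str) -> list[str]:
    """All mRNA sequences that translate to `peptide` (one-letter code)."""
    choices = [
        [codon for codon, aa in CODON_TABLE.items() if aa == residue]
        for residue in peptide
    ]
    return ["".join(codons) for codons in product(*choices)]


MOTIF = "CUUUCUUU"              # assumed mRNA image of the site GAAAGAAA
all_rna = encodings("KLSL")     # lys-leu-ser-leu
with_site = [rna for rna in all_rna if MOTIF in rna]
print(len(all_rna), len(with_site))
```

Of the 432 synonymous messages for this peptide, only 4 carry the motif, so the presence or absence of the site is decided by the choice among homologous codons and not by the amino acid sequence.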

    This may have a bearing on the evolution of the genetic code. Were homologous codons fully equivalent, the establishment of any particular one would have been most easily understood as the result of a random process, such as drift. However, non-equivalence of homologous codons may stem from their participation in sequences forming recognition sites. Consequently, the choice of a particular codon rather than its homologs may be a function of the selective value of recombination, or initiation of replication or transcription in its vicinity. Thus, the establishment of specific codons at particular sites may well be the result of natural selection rather than a random process.

 The dual information carried by DNA can also explain the so-called neutral mutations (King & Jukes, 1969). The difference between the selective values of an active and an inactive recognition site may be greater than the one between two polypeptides differing in one amino acid. In such cases, the passive rather than the active information will be the criterion for selection, causing the establishment of mutations with no apparent selective advantage.

 I am grateful to Dr Giora Simchen for critically reading the manuscript and providing helpful suggestions. Thanks are also due to Miss Ruth Voss and Mr Joseph Hillel for valuable discussions.


CATCHESIDE, D. G. (1968). In Replication and Recombination of Genetic Material. (W. J. Peacock & R. D. Brock, eds.) p. 216. Canberra: Australian Academy of Science.

DRAPEAU, G. R., BRAMMAR, W. J. & YANOFSKY, C. (1968). J. molec. Biol. 35, 357.

ECHOLS, H., GINGERY, R. & MOORE, L. (1968). J. molec. Biol. 34, 251.

HOLLIDAY, R. (1968). In Replication and Recombination of Genetic Material. (W. J. Peacock & R. D. Brock, eds.) p. 157. Canberra: Australian Academy of Science.

IPPEN, K., MILLER, J. H., SCAIFE, J. & BECKWITH, J. (1968). Nature, Lond. 217, 825.

JACOB, F., BRENNER, S. & CUZIN, F. (1963). Cold Spring Harbor Symp. quant. Biol. 28, 329.

JHA, K. K. (1969). Molec. gen. Genet. 105, 30.

KING, J. L. & JUKES, T. H. (1969). Science, N. Y. 164, 788.

KUBINSKI, H., OPARA-KUBINSKA, Z. & SZYBALSKI, W. (1966). J. molec. Biol. 20, 313.

MAITRA, U. & HURWITZ, J. (1965). Proc. natn. Acad. Sci. U.S.A. 54, 815.

MARGOLIN, P. & BAUERLE, R. H. (1966). Cold Spring Harbor Symp. quant. Biol. 31, 311.

MORSE, D. E. & YANOFSKY, C. (1969). J. molec. Biol. 41, 317.

PASZEWSKI, A. & PRAZMO, W. (1969). Genet. Res. 14, 33.

PEREIRA DA SILVA, L. H. & JACOB, F. (1968). Anns Inst. Pasteur, Paris 115, 145.

RAVIN, A. W. & IYER, V. N. (1962). Genetics 47, 1369.

SIMCHEN, G. & STAMBERG, J. (1969). Heredity 24, 369.

STAMBERG, J. (1968). Molec. gen. Genet. 102, 221.

ST. PIERRE, M. L. (1968). J. molec. Biol. 38, 71.

SZYBALSKI, W., BOVRE, K., FIANDT, M., GUHA, A., HRADECNA, Z., KUMAR, S., LOZERON, H. A., SR, MAHER, V. M., NIJKAMP, H. J. J., SUMMERS, W. C. & TAYLOR, K. (1969). J. Cell Physiol. 74, Suppl. 1, 33.

WEIL, J. & SIGNER, E. R. (1968). J. molec. Biol. 34, 273.

Last edited 07 Nov 2020 by Donald Forsdyke