Molecular sex

Molecular sex: The importance of base composition rather than homology when nucleic acids hybridize

Journal of Theoretical Biology (2007) 249, 325-330 http://dx.doi.org/10.1016/j.jtbi.2007.07.023

Abstract

On learning that nucleic acid hybridization had been achieved in a test tube, Huxley hailed the discovery of "molecular sex." The description was apt, since sex involves recombination, which requires hybridization that, in turn, depends on a successful homology search. Conversely, when the homology search fails, recombination fails. In yeast this failure has been attributed to "simple sequence divergence." But sequence divergence does not impair nucleic acid hybridization simply. Most natural single-stranded nucleic acids are predisposed to adopt higher order structures containing stem-loops. Tomizawa showed that the rate-limiting step in the hybridization of single-stranded sequences is an initial "kissing" exploration between complementary loops, which must first be appropriately extruded and aligned. Successful duplex formation requires successful synchronization of matching higher order structures, which depends not so much on the degree of similarity between their base sequences as on the closeness of their base compositions (GC%). In these terms we can understand how the anti-recombinational effect of GC% differences supports the duplication both of genes within a genome and of genomes within a genus (speciation).

Hypotheses of the origin of biological species are broadly categorized as genic and chromosomal (Coyne and Orr, 2004). While in individual cases either category may have applied, at issue is which is most likely to have applied in the general case (Kliman et al., 2001; Forsdyke, 2004a). Genetic analyses in yeast have shown that genic incompatibilities are unlikely to have initiated divergence into new species. Furthermore, it is inferred that incompatibilities which impair the meiotic pairing of chromosomes are not due to segmental DNA rearrangements. By exclusion, only "simple sequence divergence" - namely, differences in individual DNA bases - is left (Liti et al., 2006; Greig, 2007). A similar conclusion is suggested by studies in fruit fly by Naveira and Maside (1998) who invoke "foreign DNA amount" irrespective of its protein-encoding potential. Base differences would impair the homology search that precedes meiotic recombination between chromosomes, so impeding gametogenesis and rendering hybrids sterile. Thus the parents of a hybrid would be reproductively isolated from each other, a condition that could facilitate divergence into new species.

The yeast results were found "surprising" since, although complicated, genic hypotheses were considered to be "widely accepted" (Greig, 2007). Yet, the alternative - a chromosomal hypothesis of sequence divergence due to base differences - was viewed as "simple" (Liti et al., 2006; Greig, 2007). This may be reflective of a dichotomy between genetical and biochemical evolutionists. At the extremes, the former tend to think in terms of phenotypes and mathematical models, whereas the latter tend to think in terms of genotypes and DNA chemistry. Both groups recognize that base differences usually suffice to prevent the hybridization that can lead to recombination. But biochemical evolutionists have long dwelled on the fact that species differ in base composition (Sueoka, 1961; Wada et al., 1991; Bernardi, 2005; Forsdyke, 2006). Either directly (Bellgard et al., 2001) or indirectly (Forsdyke, 2007), such differences can sometimes be detected early in the speciation process - consistent with a cause and effect relationship.

Of the three fundamental parameters involving the sum of two of four bases - GC%, AG% and GT% (which reciprocate, respectively, with AT%, CT% and AC%) - values for GC% (i.e. G + C expressed as a percentage of the four bases) vary most widely both between and within species (Schultes et al., 1997), and have proved to be critical. I here review evidence on the decisive role played by differences in GC%, rather than in base sequence per se, in the preservation of sequences by protecting them from recombination with sequences from which they have begun to diverge. A genome which has diverged from others in its species may no longer be a reliable template for error-correction. As such it must be excluded from recombination with other members of the species, but when so excluded the deviant genome then becomes a candidate for an incipient speciation event (Forsdyke, 2001).
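The three compositional parameters can be computed directly from a sequence. The following is a minimal Python sketch, not taken from any of the studies cited; the function name base_composition is hypothetical, and U is treated as equivalent to T so that RNA and DNA sequences can be handled alike.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return GC%, AG% and GT% for a nucleic acid sequence.

    Assumption for illustration: U is counted as T, and any character
    other than A, C, G, T/U is ignored.
    """
    s = seq.upper().replace("U", "T")
    counts = Counter(b for b in s if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T/U bases")

    def pct(bases: str) -> float:
        # Sum of the named bases as a percentage of all four bases.
        return 100.0 * sum(counts[b] for b in bases) / total

    return {"GC%": pct("GC"), "AG%": pct("AG"), "GT%": pct("GT")}


# Example: a sequence with two of each base has all three values at 50%.
print(base_composition("GGCCAATT"))
```

The reciprocal values (AT%, CT% and AC%) follow as 100 minus each result, so they need not be computed separately.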

In the 1950s it became possible to synthesize artificial single-stranded RNA sequences such as UUUUUUUUUUUU - poly(rU), and AAAAAAAAAAAA - poly(rA) (Grunberg-Manago et al., 1955). The single strands when mixed together (i.e. poly(rU) + poly(rA)) formed a double-stranded hybrid (Rich and Davies, 1956; Warner, 1957), which had a helical structure similar to that of double-stranded DNA (Watson and Crick, 1953). Omitting the helix, this can be represented as:

UUUUUUUUUUUU
AAAAAAAAAAAA          [1]

At the time it appeared amazing that this could occur in a simple salt solution at room temperature in the absence of enzymes. The biologist Julian Huxley congratulated Rich for having discovered "molecular sex" (Rich, 2006). Whether said in jest, or from profound insight, the description fits perfectly (see below).

What was going on in the privacy of the test-tube when millions of flexible, snake-like, poly(rU) molecules were mixed with millions of flexible, snake-like, poly(rA) molecules? Following the Watson-Crick base pairing rules, molecules of poly(rU) react only weakly with each other (since U pairs weakly with U). Furthermore, there is little inclination for the molecules to fold back on themselves, permitting internal pairing of U with U. The same applies for poly(rA). So there was nothing left but for As to pair with Us. Since the molecules had little internal secondary structure (no folding back on themselves), it was easy for a writhing chain of Us to find a writhing chain of As. Millions of relatively rigid, duplex molecules [1] resulted. Their formation could be monitored either spectrophotometrically or by observing an increase in viscosity.

Things got more complicated when more complex sequences were tried. Take, for example, the twelve-base sequence UUUUUUUUAAAA, which should mix with the twelve-base sequence AAAAAAAAUUUU, to give:

UUUUUUUUAAAA
AAAAAAAAUUUU          [2]

In this case, before they can be mixed, each molecule will have rapidly and spontaneously folded back on itself:

      U       A
UUUU    U   A    AAAA
AAAA    U   A    UUUU          [3]
      U       A

Each of the molecules in [3] has a stem (bases 1-4 and 9-12) where there is base-pairing, and a loop (bases 5-8) where there is no base pairing. This situation better corresponds to that of cellular RNAs (Meyer and Miklos, 2005; Shabalina et al., 2006; Forsdyke, 2006). By virtue of their complex structures, "sense" natural RNAs would need a little coaxing - perhaps heating a little - to get them to form duplexes as in [2] with the corresponding "antisense" RNAs. The details of this process were elucidated by Tomizawa (1984). In [3] the loops are facing each other, so that the bases in the left loop can reversibly pair with the bases in the right loop. Tomizawa referred to this - the critical rate-limiting step in hybridization - as "kissing" (Eguchi et al., 1991).
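The stem and loop assignments for the two twelve-base molecules can be checked with a short Python sketch. It is a deliberately crude illustration: only Watson-Crick pairs are allowed, the stem is assumed to close from the two ends of the molecule inwards, and the function name end_stem_length and its min_loop parameter are hypothetical. Real folding programs use nearest-neighbour energies rather than simple pair counting.

```python
# Watson-Crick pairs for RNA (assumption: no G-U wobble pairs are considered).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def end_stem_length(rna: str, min_loop: int = 3) -> int:
    """Count consecutive base pairs formed when a strand folds back on itself.

    Base 1 is tested against the last base, base 2 against the next-to-last,
    and so on, stopping at the first mismatch or when fewer than `min_loop`
    unpaired bases would remain between the two arms.
    """
    i, j = 0, len(rna) - 1
    pairs = 0
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in WATSON_CRICK:
        pairs += 1
        i += 1
        j -= 1
    return pairs


for name, seq in [("poly(rU)", "U" * 12),
                  ("left molecule in [3]", "UUUUUUUUAAAA"),
                  ("right molecule in [3]", "AAAAAAAAUUUU")]:
    stem = end_stem_length(seq)
    print(name, "stem pairs:", stem, "loop bases:", len(seq) - 2 * stem)
```

Under these assumptions each molecule in [3] gives four stem pairs (bases 1-4 with bases 12-9) and a four-base loop, whereas poly(rU) gives no stem at all.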

In this case the kissing can rapidly escalate since Us pair with As. But if the left loop had one of its Us substituted by the base C (in one of the positions 5-8) escalation would be less likely, and would become increasingly unlikely as more of the four positions were substituted with C. Under the Watson-Crick rules, As do not pair with Cs. Tomizawa used the word "kissing" to imply an exploratory interaction. The chemical energetics are such that, if the kissing can be sustained, the stems of the two parental molecules will disrupt to allow formation of a complete duplex as in [2]. A more elaborate example is shown in Figure 1. Here the pairing first occurs between As and Us and between Gs and Cs on the loops at the left. The loops attempt to form a mini-double helix (not shown). This is reversible, so that if adequate complementary pairing bases are not found, the kissing loops (middle) separate. The long duplex at the right (actually a double helix) is more chemically stable than the structures at the left, so if the conditions are right the reaction will primarily be in the direction of the arrows (Eguchi et al., 1991).
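The effect of substituting C for U in the left loop can be illustrated by counting how many loop positions remain able to pair. This Python sketch is only a tally of Watson-Crick matches between two antiparallel loops, not a model of the energetics; the function name kissing_pairs is hypothetical.

```python
# Watson-Crick pairs for RNA (assumption: wobble pairs are ignored).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def kissing_pairs(loop_a: str, loop_b: str) -> int:
    """Count complementary positions between two loops of equal length.

    The loops are aligned antiparallel, so the first base of loop_a is
    compared with the last base of loop_b.
    """
    if len(loop_a) != len(loop_b):
        raise ValueError("this sketch only compares loops of equal length")
    return sum((x, y) in WATSON_CRICK
               for x, y in zip(loop_a, reversed(loop_b)))


right_loop = "AAAA"  # bases 5-8 of AAAAAAAAUUUU
for left_loop in ["UUUU", "UCUU", "CCUU", "CCCU", "CCCC"]:
    print(left_loop, "pairs with", right_loop, "at",
          kissing_pairs(left_loop, right_loop), "positions")
```

Each additional C removes one potential pair, which is the sense in which escalation from kissing to a full duplex becomes progressively less likely.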

Both reacting molecules at the left in Figure 1 are in the same tube in the same salt solution and at the same temperature. For consummation of their initial pairing a little warming might be needed, but the heat would be delivered to the tube in a uniform manner, so that both reacting molecules would be affected equally. We can think of them as being synchronized so that the structures match - one does not remain in entirely single-stranded mode while the other adopts a stem-loop configuration. Both form stem-loops to an equal extent and at the same time, apart from minor idiosyncratic fluctuations.

Given their common environment, what small difference in the partners at the left would be most effective in preventing their union? One might imagine that a change in one of the base-pairs in the loops - perhaps a C opposite one of the As - would impede the kissing. Alternatively, there might be a mismatch in the duplex at the right. In other words, one tends to think in terms of the sequence similarity between the two partners being less than perfect. However, imperfect similarity per se does not necessarily prevent hybridization. Far more critical are the base pairs that give the secondary structure of the RNA its stability. A difference in pairing, such as the substitution of an AU pair for a GC pair almost anywhere in the stem of one partner, would tend to desynchronize their configurations so that the bases in the loops, even if precisely matched, would not meet each other. Indeed, numerous studies reveal the exquisite sensitivity of the structures formed by single-stranded nucleic acids to changes in only one base pair (Orita et al., 1989; Shen et al., 1999; Dong et al., 2001; Woodside et al., 2006). Chen et al. (1990) showed for RNA that it is not so much similarity (i.e. equivalence in base order) as base composition - and specifically GC% - that critically determines secondary structure. Base composition makes a much greater contribution to structural energetics than base order. Of various base compositional parameters, the product of the frequencies of G and C is the best predictor of structure.
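The distinction between base order and base composition can be made concrete with a small Python sketch. It does not calculate folding energies; it merely shows that scrambling a sequence changes its base order while leaving the product of the G and C frequencies unchanged. The function names gc_product and shuffled, the example sequence, and the fixed random seed are all assumptions made for illustration.

```python
import random


def gc_product(seq: str) -> float:
    """Product of the fractional frequencies of G and C in a sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    return (s.count("G") / n) * (s.count("C") / n)


def shuffled(seq: str, seed: int = 0) -> str:
    """Return the same bases in a randomized order (seeded for repeatability)."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


original = "GGGCCCAUAUGGGCCC"   # hypothetical 16-base example
scrambled = shuffled(original)

print("original :", original, gc_product(original))
print("scrambled:", scrambled, gc_product(scrambled))
```

Both lines report the same product because shuffling cannot alter base counts; any difference in predicted structure between the two would therefore be attributable to base order alone.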

This takes us back to Huxley's remark (Rich, 2006). It is easy to think that a human baby itself (or the adult that baby develops into) is the final product of a parental copulation nine months earlier. In fact, that act of copulation began something that (as far as an ending can be discerned in a process that is essentially cyclical) ends in the gonads of the adult their baby grows to become. The chances are that, right now, within your gonads that ending is being enacted by DNA copies derived from parental DNA molecules that have been cooperating from the moment of your conception, both within your main body ("soma") and within your gonads ("germ-line"). Most of the time your two parental DNAs (genomes) work together but separately. They multiply as the cells containing them multiply (by mitosis). However, in the gonad, when new gametes are made there is a different type of cell division (meiosis). This meiotic division is characterized by the union (or "conjugation" as the early cytologists called it) of your parental chromosomes. In essence, it completes the act of conjugation your parents initiated decades earlier, and involves the formation of hybrid duplexes with one DNA strand being of paternal origin and the other of maternal origin.

Subjects of much debate are how and why this meiotic union occurs. Crick saw the problem as one of determining how parental duplex DNA molecules, both double-stranded in accordance with the Watson-Crick model, would be able to recognize each other. For simplicity, he thought the recognition should follow the base-pairing rules that he and Watson had discerned. But in helical duplex DNA the bases were inward-looking. How could inward-looking bases in one duplex look outwards to recognize homologous bases in another DNA duplex? Thus, came the "unpairing hypothesis" (Crick, 1971). In both duplexes the two strands would locally unpair so as to present outward-looking, single-stranded, sequences of bases (Krueger et al., 2006). In this way, a segment of bases in a paternal duplex would be able to pair with a similar segment of bases in a maternal duplex. If the pairing sequences were absolutely identical (i.e. there was homology) the duplex would be considered a "homoduplex" (like the original parental duplexes). If the pairing sequences differed by as little as one base-pair, then the duplex would be considered a "heteroduplex" (Holliday, 1990; Allers and Lichten, 2001).

Crick saw the unpaired strands as remaining single-stranded prior to forming a homoduplex or heteroduplex. But energetic considerations dictate that, as they "unzip" from each other, the unpaired single strands should quickly fold back on themselves to form stem-loop structures (Murchie et al., 1992; Woodside et al., 2006). This would be supported by the crowded intracellular environment where entropic contributions to the critical base-stacking interactions would be increased (Yakovchuk et al., 2006). Thus, the initiation of pairing between the parental DNA strands should involve the same "kissing" process as envisaged by Tomizawa for RNA molecules (Sobell, 1972; Wagner and Radman, 1975; Doyle, 1978; Kleckner and Weiner, 1993; Hawley and Arbel, 1993). Of key importance for this would be that the unpairing and folding displayed a sufficient degree of synchrony so complementary DNA loops would have the opportunity to meet. As in the case of RNA folding (Chen et al., 1990), it was found for DNA that base composition, rather than actual sequence similarity (base order), was critical in determining the degree of secondary structure (Forsdyke, 1998). These results, derived from thermodynamic calculations of nearest-neighbour energies (Mathews, 2006), were supported by optical force clamp studies of isolated molecules (Woodside et al., 2006). Thus, a very small difference in the base composition between the paternal and maternal DNAs should suffice to prevent the initiation of hybridization. This is summarized in Figure 2. Here, on the left, paternal (P) and maternal (M) DNAs have the same base composition ("X"). As conditions change, the two strands unpair in synchrony and loops are positioned so that kissing can occur. Formation of a paranemic joint (no immediate strand breakage) can lead to recombination (Wong et al., 1998). On the right, the base compositions differ slightly (X and X+1) and the unpairing is unsynchronized.

Fig. 2. The exquisite sensitivity of stem-loop extrusion from duplex DNA to differences in base composition can prevent the initiation of pairing between homologous DNA sequences. At the left, paternal (P) and maternal (M) duplexes have the same GC% value. As negative supercoiling progressively increases, the strands of each duplex synchronously open to allow formation of equivalent stem-loop secondary structures so that "kissing" interactions between loops can progress to pairing. At the right, paternal and maternal duplexes differ slightly in GC%. The maternal duplex of higher GC% opens less readily as negative supercoiling increases, so strand opening is not synchronous, "kissing" interactions fail, and there is no progress to pairing. In this model chromosome pairing occurs before the strand breakage that accompanies recombination (not shown). Even if strand breakage were to occur first (as required by some models), unless inhibited by single-stranded DNA-binding proteins the free single strands so exposed would, in the crowded intracellular environment, rapidly adopt stem-loop configurations. So the homology search could still involve kissing interactions between the tips of loops.

Why does meiotic pairing occur? Why should it matter that the pairing partners be synchronized so that a hybrid duplex is formed between paternal and maternal chromosomes? Furthermore, what are the consequences of failure to form such a duplex? The case has been made that, apart from assisting the equal partitioning of chromosomes among gametes, a hybrid duplex is an essential intermediate in the recombination of segments of paternal and maternal genomes that results, not only in increased genetic diversity among gametes (by intra-chromosomal exchange of segments) but, more importantly, decreased genetic diversity due to correction of mutations (gene conversion; Bernstein and Bernstein, 1991). This appears as a compelling reason for sexual, as contrasted with asexual, reproduction. In the course of this correction, differences between the parental genomes would decrease, so that information for any adaptations that might have depended on those differences would be less likely to be forwarded to their grandchildren. In other words, while heterozygosity would not be entirely eliminated, there would be some blending - an ironing out of differences (Forsdyke, 2006).

The opposite would be expected when duplex formation fails. Sequences of DNA then become recombinationally isolated, a condition favourable to the emergence both of new genes from the duplication of pre-existing genes within a genome, and of new species from the duplication of pre-existing species within a genus. It is here that the importance of base composition differences (rather than non-homology per se) in preventing the initiation of recombination, is apparent. This would seem to provide a rationale for the well known differences in base composition between genes within a genome, and between species within a genus (Wada et al., 1991; Bernardi, 2005). Thus, the recombinational isolation brought about by differences in GC% - the "accent" of DNA - can be seen as a major protector of novel DNA sequences against recombinational repair processes (Forsdyke, 2004b). When in a protein-encoding region, the differences would primarily affect third (synonymous) codon positions, so there could be reproductive isolation without necessarily affecting protein function (i.e. without in the first instance affecting phenotype; Forsdyke, 2007).
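The point about third codon positions can be illustrated by computing GC% separately for each codon position. In this Python sketch the function name gc_by_codon_position and the two nine-base sequences are hypothetical; both sequences encode Ala-Gly-Leu under the standard genetic code and differ only at synonymous third positions.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return GC% at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumption for illustration: the sequence starts in frame and its
    length is a multiple of three; U is treated as T.
    """
    s = cds.upper().replace("U", "T")
    if len(s) == 0 or len(s) % 3 != 0:
        raise ValueError("length must be a non-zero multiple of three")
    result = []
    for pos in range(3):
        bases = s[pos::3]  # every third base, starting at this codon position
        result.append(100.0 * sum(b in "GC" for b in bases) / len(bases))
    return tuple(result)


low_gc3 = "GCTGGTCTT"    # Ala-Gly-Leu with T at each third position
high_gc3 = "GCCGGCCTC"   # Ala-Gly-Leu with C at each third position

print("low :", gc_by_codon_position(low_gc3))
print("high:", gc_by_codon_position(high_gc3))
```

The first and second positions give identical values for the two sequences, while the third position moves from 0% to 100% GC, so overall GC% differs without any change in the encoded amino acids.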

One recurring criticism of this viewpoint (raised by a reviewer of this manuscript) has been that closely related species may have very similar GC% values. However, barriers to reproduction (the first being a postulated difference in GC%) tend to replace each other consecutively. Having diverged, GC% values can converge when a second barrier appears (Schultes et al., 1997). The following metaphor may help. Your dog may be tethered by a leash. But if you build a high fence, the leash is no longer necessary. If the initial barrier (the leash) is damaged or lost, it may not be noticed. The second barrier (fence) should suffice. On the other hand, the leash then becomes available for some other function. Thus, following establishment of a second barrier, a first barrier may degenerate or change in a random way, or may find other employment. If a Sherlock Holmes then tried to discern whether there had been an earlier barrier than the fence, and what form it had taken, there might be a problem (Forsdyke, 2001).

Allers, T., Lichten, M., 2001. Intermediates of yeast meiotic recombination contain heteroduplex DNA. Mol. Cell 8, 225-231.

Bellgard, M., Schibeci, D., Trifonov, E., Gojobori, T. J., 2001. Early detection of G + C differences in bacterial species inferred from the comparative analysis of the two completely sequenced Helicobacter pylori strains. J. Mol. Evol. 53, 465-468.

Bernardi, G., 2005. Natural Selection and Genome Evolution. Elsevier Science, Amsterdam.

Bernstein, C., Bernstein, H., 1991. Aging, Sex and DNA Repair. Academic Press, San Diego.

Chen, J-H., Le, S-Y., Shapiro, B., Currey, K. M., Maizel, J. V., 1990. A computational procedure for assessing the significance of RNA secondary structure. CABIOS 6, 7-18.

Coyne, J. A., Orr, H. A., 2004. Speciation. Sinauer, Sunderland, Massachusetts.

Crick, F., 1971. General model for the chromosomes of higher organisms. Nature 234, 25-27.

Dong, F., Allawi, H. T., Anderson, T., Neri, B. P., Lyamichev, V. I., 2001. Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA. Nucleic Acids Res. 29, 3248-3257.

Doyle, G. G., 1978. A general theory of chromosome pairing based on the palindromic DNA model of Sobell with modifications and amplification. J. Theor. Biol. 70, 171-184.

Eguchi, Y., Itoh, T., Tomizawa, J., 1991. Antisense RNA. Ann. Rev. Biochem. 60, 631-652.

Forsdyke, D. R., 1998. An alternative way of thinking about stem-loops in DNA. A case study of the G0S2 gene. J. Theor. Biol. 192, 489-504.

Forsdyke, D. R., 2001. The Origin of Species, Revisited. McGill-Queen's University Press, Montreal.

Forsdyke, D. R., 2004a. Chromosomal speciation: a reply. J. Theor. Biol. 230, 189-196.

Forsdyke, D. R., 2004b. Regions of relative GC% uniformity are recombinational isolators. J. Biol. Sys. 12, 261-271.

Forsdyke, D. R., 2007. Positive Darwinian selection. Does the comparative method rule? J. Biol. Sys. 15, 95-108.

Greig, D., 2007. A screen for recessive speciation genes expressed in the gametes of F1 hybrid yeast. PLOS Genetics 3, 281-286.

Grunberg-Manago, M., Ortiz, P. J., Ochoa, S., 1955. Enzymatic synthesis of nucleic acid-like polynucleotides. Science 122, 907-910.

Hawley, R. S., Arbel, T., 1993. Yeast genetics and the fall of the classical view of meiosis. Cell 72, 301-303.

Holliday, R., 1990. The history of heteroduplex DNA. BioEssays 12, 133-141.

Kleckner, N., Weiner, B. M., 1993. Potential advantages of unstable interactions for pairing of chromosomes in meiotic, somatic and premeiotic cells. Cold Spring Harb. Symp. Quant. Biol. 58, 553-565.

Kliman, R. M., Rogers, B. T., Noor, M. A. F., 2001. Differences in (G+C) content between species: a commentary on Forsdyke's "chromosomal viewpoint" of speciation. J. Theor. Biol. 209, 131-140.

Krueger, A., Protozanova, E., Frank-Kamenetskii, M. D., 2006. Sequence-dependent base-pair opening in DNA double helix. Biophys. J. 90, 3091-3099.

Liti, G., Barton, D. B. H., Louis, E. J., 2006. Sequence diversity, reproductive isolation and species concepts in Saccharomyces. Genetics 174, 839-850.

Mathews, D. H., 2006. Revolutions in RNA secondary structure prediction. J. Mol. Biol. 359, 526-532.

Meyer, I. M., Miklos, I., 2005. Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res. 33, 6338-6348.

Murchie, A. I. H., Bowater, R., Aboul-Ela, F., Lilley, D. M. J., 1992. Helix opening transitions in supercoiled DNA. Biochim. Biophys. Acta 1131, 1-15.

Naveira, H. F., Maside, X. R., 1998. The genetics of hybrid male sterility in Drosophila. In: Endless Forms: Species and Speciation. (Howard, D. J., Berlocher, S. H., eds.), pp. 330-338. Oxford University Press.

Orita, M., Iwahana, H., Kanazawa, H., Hayashi, K., Sekiya, T., 1989. Detection of polymorphisms of human DNA by gel electrophoresis as single strand conformation polymorphisms. Proc. Natl. Acad. Sci. USA 86, 2766-2770.

Rich, A., 2006. Discovery of the hybrid helix and the first DNA-RNA hybridization. J. Biol. Chem. 281, 7693-7696.

Rich, A., Davies, D. R., 1956. A new two-stranded helical structure: polyadenylic acid and polyuridylic acid. J. Amer. Chem. Soc. 78, 3548-3549.

Schultes, E., Hraber, P. T., La Bean, T. H., 1997. Global similarities in nucleotide base composition among disparate functional classes of single-stranded RNA imply adaptive evolutionary convergence. RNA 3, 792-806.

Shabalina, S. A., Ogurtsov, A. Y., Spiridonov, N. A., 2006. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 34, 2428-2437.

Shen, L. X., Basilion, J. P., Stanton, V. P., 1999. Single nucleotide polymorphisms can cause different structural folds of mRNA. Proc. Natl. Acad. Sci. USA 96, 7871-7876.

Sobell, H. M., 1972. Molecular mechanism for genetic recombination. Proc. Natl. Acad. Sci. USA 69, 2483-2487.

Sueoka, N., 1961. Compositional correlation between deoxyribonucleic acid and protein. Cold Spring Harb. Symp. Quant. Biol. 26, 35-43.

Tomizawa, J., 1984. Control of ColE1 plasmid replication: the process of binding of RNA I to the primer transcript. Cell 38, 861-870.

Wada, A., Suyama, A., Hanai, R., 1991. Phenomenological theory of GC/AT pressure on DNA base composition. J. Mol. Evol. 32, 374-378.

Wagner, R. E., Radman, M., 1975. A mechanism for initiation of genetic recombination. Proc. Natl. Acad. Sci. USA 72, 3619-3622.

Warner, R. C., 1957. Interaction of polyadenylic and polyuridylic acids. Fed. Proc. 16, 266-267.

Watson, J. D., Crick, F. H. C., 1953. A structure for deoxyribose nucleic acid. Nature 171, 738-740.

Wong, B. C., Chiu, S-K., Chow, S. A., 1998. The role of negative superhelicity and length of homology in the formation of paranemic joints promoted by RecA protein. J. Biol. Chem. 273, 12120-12127.

Woodside, M. T., Behnke-Parks, W. M., Larizadeh, K., Travers, K., Herschlag, D., Block, S. M., 2006. Nanomechanical measurements of the sequence-dependent folding landscapes of single nucleic acid hairpins. Proc. Natl. Acad. Sci. USA 103, 6190-6195.

Yakovchuk, P., Protozanova, E., Frank-Kamenetskii, M. D., 2006. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 34, 564-574.