CHROMOSOMES AS INTERDEPENDENT ACCOUNTING UNITS

THE ASSIGNED ORIENTATION OF C. ELEGANS CHROMOSOMES MINIMIZES THE TOTAL W-BASE CHARGAFF DIFFERENCE

DONALD R. FORSDYKE, CHIYU ZHANG & JI-FU WEI

Journal of Biological Systems (2010) 18, 1-16

Published by World Scientific Publishing Company, Singapore

1. Introduction  
2. Materials and methods
2.1 Chromosome orientation
2.2 Holocentric nematode chromosomes 
3. Results
3.1 The human-assigned "top" strands of C. elegans chromosomes 
3.2 Other orientations of the top strands
3.3 Accounting in D. melanogaster
4. Discussion
4.1 Are C. elegans chromosomes orientated randomly?
4.2 Why account?
5. Conclusion

DNAs of individual chromosomes violate, albeit perhaps by only one in a thousand bases, Chargaff's second parity rule, which is that Chargaff's first parity rule for duplex DNA (A = T, G = C) applies, to a close approximation, to single stranded DNA. If the "top" strand of one chromosome has A > T and the "top" strand of another has T > A, can they complement to approach even parity (A = T)?  

    Assignment of orientation to the six chromosomes of Caenorhabditis elegans is said to have been arbitrary and, of 2^6 (= 64) possible combinations of top (T) and bottom (B) strands, the GenBank orientation (designated "TTTTTT") is but one. Yet, for the W bases (A and T) the chromosomes in the GenBank orientation complement to reduce the Chargaff difference (A-T) to only 200 bases (i.e. only one in 323658 W bases does not have a potential Watson-Crick pairing partner). This suggests that the assignment was not arbitrary. However, the GenBank orientation for the S bases (G and C) allows an approach to even parity less well than many other orientations, the best of which is BBBBTT (indicating a disparity between the GenBank orientations of the first 4 autosomes and those of chromosomes V and X).

    Although only the euchromatic regions of Drosophila melanogaster chromosomes have been sequenced, there are orientations that allow an approach to even parity. We conclude that, with respect to their Chargaff differences, the chromosomes of C. elegans have the potential to engage in interdependent base accounting. Since this might also apply to D. melanogaster, even when heterochromatin-associated DNA rich in tandem repeats (microsatellite DNA) is excluded, then heterochromatic DNA might not normally participate in the hypothetical accounting process.

Keywords:  Base Composition; Chromosome Orientation; Fruit Fly; Microsatellites; Nematode; Parity Rules; Recombination; Top DNA Strand

1. Introduction

In the early decades of the twentieth century Wilhelm Johannsen introduced the word "gene" and suggested that genes might be thought of as accounting units (Rechnungseinheiten).1 However, he did not envisage a higher level of accounting since, due to recombination, intergenerational changes in haplotype appeared at odds with a higher level of chromosomal organization. Since Johannsen proposed that there was also "a great central something," which played a fundamental role in evolution at a higher level than genes, he was obliged to assign this "something" to a non-chromosomal location.2

That chromosomes can indeed "account" became evident when Chargaff's first parity rule (PR1; A = T and G = C) was seen to apply generally to duplex DNA, be it genic or non-genic.3 However, although the rule was seldom violated, the "accounting unit" was very narrow, being manifest as the pairing of single complementary bases that were opposite each other in the "top" (T) and "bottom" (B) strands of a duplex (i.e. pairing in-parallel). But, to a close approximation, the same parity was found to apply to bases in single strands of DNA (Chargaff's second parity rule; PR2). Here the putative accounting unit was wider, involving bases that were distributed along the length of a strand (with the potential to pair in-series), and parity was sometimes violated in a systematic manner. These violations were recorded as "Chargaff differences" or "skews" - the extents to which there were, in a segment of DNA, more As than Ts (A > T), or more Gs than Cs (G > C). When compared with the corresponding shuffled sequences, these differences tended to approach a maximum when accounting was in gene-sized segments (1-2 kb) and decreased as segment length increased, sometimes approaching a minimum only at the length of an entire chromosome.4-7 Thus, judging by its low Chargaff difference, each chromosome could be considered as an independent accounting unit.

The precision of PR2, first postulated from direct chemical analyses,8 became increasingly evident as the number of complete chromosome sequences increased. Viruses of importance to humans were sequenced at an early stage. For example, the single chromosome of Vaccinia virus has 63,921 As, 63,776 Ts, 32,030 Gs, and 32,010 Cs for a total of 191,737 bases. The value A - T is only 145 bases (i.e. 0.11% of the total W bases). The value G - C is only 20 bases (i.e. 0.03% of the S bases). Only one in 881 of the W bases does not have a potential pairing partner in the same strand, and only one in 3202 of the S bases does not have a potential pairing partner in the same strand.4
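The Vaccinia figures can be rechecked with a few lines of arithmetic. The following is a minimal sketch; the helper name chargaff_summary is ours, introduced only for illustration, and is not taken from the cited studies.

```python
# Base counts for the single Vaccinia virus chromosome, as quoted in the text.
counts = {"A": 63921, "T": 63776, "G": 32030, "C": 32010}


def chargaff_summary(x, y):
    """Return (difference, difference as % of x + y, bases per unpaired base).

    Hypothetical helper for illustration; assumes x != y.
    """
    diff = x - y
    combined = x + y
    return diff, 100.0 * diff / combined, combined / abs(diff)


total = sum(counts.values())
w_diff, w_pct, w_one_in = chargaff_summary(counts["A"], counts["T"])
s_diff, s_pct, s_one_in = chargaff_summary(counts["G"], counts["C"])

print(f"total bases: {total}")
print(f"A - T = {w_diff} ({w_pct:.2f}% of W bases, one in {w_one_in:.0f})")
print(f"G - C = {s_diff} ({s_pct:.2f}% of S bases, one in {s_one_in:.0f})")
```

The percentage is taken relative to the two bases being compared (A + T, or G + C), not to the whole chromosome.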

Violations of parity are particularly evident in genes and generally can be forecast by Szybalski's transcription direction rule - purine (R) excess in the mRNA-synonymous strand and a corresponding pyrimidine (Y) excess in the RNA polymerase template strand. Thus, of the 92 rightward-transcribed open reading frames (ORFs) in Vaccinia virus, 83 have A > T and 70 have G > C. Similarly, of the 105 leftward-transcribed ORFs 94 have T > A and 75 have C > G. Since there are approximately equal numbers of rightward and leftward ORFs, these local violations tend to cancel out.5 However, in some viruses all ORFs are transcribed in the same direction. Thus, HIV-1 grossly violates the second parity rule because all ORFs are transcribed to the right.9

A proposed adaptive explanation for PR2 was that the potentially complementary bases in single stranded nucleic acids pair to form stem-loop secondary (and higher-ordered) structures that facilitate recombination. For this, the complementary strands of a DNA duplex would have to "unpair."10-11 Indeed, the potential for the local extrusion of such structures is pervasive, both in genic and non-genic DNA (introns and intergenic regions). However, the demands of protein-encoding appear to constrain the potential and PR2 is followed more closely in non-coding regions where greater potential for secondary structure is demonstrable.12-16 A proposed non-adaptive "neutral" explanation for PR2 in terms of "mutational biases,"17 now seems unlikely. It is acknowledged18 that genomes contain palindromic sequences that "may be under selective pressure to preserve their palindromic character and therefore follow PR2 (as pure palindromic sequences are effectively base paired)."

 Thus, PR2 appears at least partly to reflect the potential to extrude stem-loop structures from palindromic sequences.6-7 Yet only the stems in such structures provide a clear basis for the rule. It is postulated that in higher ordered structures Watson-Crick-type "kissing" interactions between bases in loops have the potential to contribute to the accounting process, be it local or long-range.5 Thus, PR2 might apply to long genomic segments because of the summation of underlying primary accounting processes involving both stems (short-range accounting) and loops (short and long-range accounting).

It was speculated that the rule might result from evolutionary pressures on nucleic acid sequences for the development of genome-wide stem-loop potential as part of short and long range accounting processes which work to sustain the integrity of various levels of information in DNA. If genome-wide, could there be inter-chromosomal accounting? We here explore whether, and to what extent, there is the potential to further minimize Chargaff differences by some form of inter-chromosomal accounting. Can the entire genome be viewed as an accounting unit? To simplify, we here ignore potential weak base pairing between G and T.

 

2. Materials and methods

2.1 Chromosome orientation

Fundamental to the problem of inter-chromosomal accounting is chromosome orientation. In simple terms, a chromosome can be considered freely rotatable through 180 degrees, and which DNA end is "left", and which DNA end is "right" is decided by the original human mappers or sequencers of the DNA, often in an arbitrary manner. However, orientation affects the sign of Chargaff differences. It follows from PR1 (duplex DNA) that if A > T in the top strand, then T > A in the bottom strand (i.e. the strands will have equal violations of PR2, but with opposite signs). While the human sequencers of chromosomes may have designated one strand of each chromosome as the "top" strand and entered the corresponding sequences in GenBank, "Nature" may have chosen a different orientation. Assuming that one strand of a chromosome provides a relatively uniform accounting template, which strand of another chromosome would "Nature" have picked to co-account with that strand?
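That reversing the orientation changes only the sign of a Chargaff difference can be illustrated with a toy sequence. This is a sketch under stated assumptions: the 13-base sequence is invented, and the function names are ours.

```python
# Translation table for Watson-Crick complements (A<->T, G<->C).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the bottom strand, read 5' to 3', for a given top strand."""
    return seq.translate(COMPLEMENT)[::-1]


def chargaff_differences(seq):
    """Return (A - T, G - C) for a single strand."""
    a, t, g, c = (seq.count(base) for base in "ATGC")
    return a - t, g - c


top = "AAGAGGATCAAGA"  # invented toy sequence, purine-rich
bottom = reverse_complement(top)

print("top   ", top, chargaff_differences(top))
print("bottom", bottom, chargaff_differences(bottom))
```

Because each A in one strand faces a T in the other (PR1), the two strands necessarily report equal and opposite differences, whatever the sequence.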

A simple example for a hypothetical three chromosome organism should make this clearer. We will consider just the W bases (A and T). If "Nature" would like PR2 to cover the entire genome, then in the first chromosome the top strand difference (A - T) might be +100 (A > T). In the second chromosome the top strand difference might be -25 (i.e. A < T). In the third chromosome the top strand difference might be -75 (i.e. A < T). The total difference for the three chromosomes would be zero (100 - 25 - 75 = 0). "Accounting" would be perfect. But if human sequencers assigned the third chromosome the opposite orientation, the total difference for the three chromosomes would be 100 - 25 + 75 (= 150). Thus, for any set of chromosomes, unless there are compelling reasons for the orientation displayed in GenBank, we have to explore alternative orientations.
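The three-chromosome example can be written out as a short sketch. The values are the hypothetical ones used above; the function name and the T/B pattern strings are our own notation for "as given" and "inverted".

```python
# Hypothetical top-strand A - T differences in the orientation "Nature" uses.
natural_differences = [100, -25, -75]


def total_difference(differences, pattern):
    """Sum the differences, negating each chromosome marked 'B' (inverted)."""
    sign = {"T": 1, "B": -1}
    return sum(sign[o] * d for o, d in zip(pattern, differences))


print(total_difference(natural_differences, "TTT"))  # all as "Nature" intended
print(total_difference(natural_differences, "TTB"))  # third chromosome inverted
```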

The first hypothetical chromosome can also be viewed as inverted (i.e. T > A) giving a top strand difference (A - T) of -100. This would require the other two chromosomes to be viewed as inverted (i.e. A > T) for values of 25 and 75 respectively. Again, the total difference for the three chromosomes would be zero (-100 + 25 + 75 = 0). For three chromosomes there are 2^3 (= 8) alternative orientations. If we designate the human-assigned orientation as TTT, then there are 8/2 alternatives (TTT, TBT, TTB, and TBB). But there is also a complementary set of 8/2 alternatives for the bottom strand (BBB, BTB, BBT, and BTT). From the viewpoint of the cell these two sets would seem to be identical.
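The eight orientation patterns, and the way they fall into two mirror-image sets of four, can be enumerated directly. The sketch below reuses the hypothetical differences of the three-chromosome example; the function and variable names are ours.

```python
from itertools import product

# Hypothetical top-strand A - T differences for three chromosomes.
differences = [100, -25, -75]
FLIP = str.maketrans("TB", "BT")


def total_difference(pattern):
    """Sum the differences, negating each chromosome marked 'B' (inverted)."""
    return sum(d if o == "T" else -d for d, o in zip(differences, pattern))


# All 2^3 = 8 patterns of top (T) and bottom (B) strands.
totals = {"".join(p): total_difference(p) for p in product("TB", repeat=3)}

for pattern, value in totals.items():
    complement = pattern.translate(FLIP)
    print(pattern, value, "| complement", complement, totals[complement])
```

Each pattern and its complement give totals of equal magnitude and opposite sign, so only four of the eight patterns need to be examined.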

 

2.2 Holocentric nematode chromosomes

A problem with this approach is that we are seeking small differences between two very large sets (the set of A bases and the set of T bases in a chromosome). For many so-called "complete" genome sequences, there are still incomplete or uncertain regions. Reflecting a fascination with genes, sequencers have focused on gene-rich euchromatic DNA rather than gene-poor heterochromatic DNA. The sequencers of Drosophila melanogaster lamented: "Because of the unclonable repetitive DNA surrounding the centromeres, it is highly unlikely that the genomic sequence of chromosomes from eukaryotes such as Drosophila or human will ever be "complete"."19 However, the chromosomes of the nematode worm C. elegans have no fixed centromeric region (i.e. they are holocentric), and their dispersed repetitive elements have not evaded sequencing.20-22 With a per base sequencing error rate estimated at <10^-5, the six chromosomes of this organism (2^6 = 64 human-assigned alternative orientations) appear amenable to exact study.

 

3. Results

3.1 The human-assigned "top" strands of C. elegans chromosomes

This worm has 5 autosomes (I, II, III, IV, V) and one sex chromosome (X) with a cumulative total in GenBank of 100,267,450 bases (excluding the mitochondrial genome). For some chromosomes the top strand Chargaff differences are positive (e.g. A > T); others are negative (e.g. A < T). Thus, there is an opportunity for positives and negatives to cancel out to reduce the overall Chargaff difference of the genome. Remarkably, while the percentage Chargaff differences for the W bases in individual chromosomes constitute between 0.07% and 0.24% of the W bases in each chromosome (average absolute value 0.136% with standard deviation 0.061%), the overall percentage W base Chargaff difference is 0.0003%. This is many standard deviations away from the average. The total As and total Ts differ by only 200 bases (Table 1). Only one in 323658 W bases violates the second parity rule. However, the corresponding S bases show little evidence of canceling out. The overall S base percentage Chargaff difference is -0.067%, which is within the range displayed by individual chromosomes. Indeed, the cumulative S base Chargaff difference (-23840 bases) exceeds, in absolute value, that of each individual chromosome.
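The totals and percentages quoted in this paragraph can be recomputed from the per-chromosome base counts of Table 1. The sketch below is illustrative only; the variable names are ours, and the use of the sample standard deviation is an assumption that happens to reproduce the quoted value.

```python
from statistics import mean, stdev

# Top-strand base counts (A, T, G, C) transcribed from Table 1.
COUNTS = {
    "I":   (4835939, 4848452, 2692150, 2695878),
    "II":  (4878195, 4869712, 2762195, 2769214),
    "III": (4444653, 4423570, 2466319, 2449139),
    "IV":  (5711040, 5730969, 3017008, 3034767),
    "V":   (6748863, 6758902, 3700477, 3711156),
    "X":   (5747199, 5734084, 3117867, 3119702),
}


def percent_difference(x, y):
    """Difference between two base counts as a % of their combined sum."""
    return 100.0 * (x - y) / (x + y)


# Per-chromosome W base percentages (absolute values).
abs_w_pcts = [abs(percent_difference(a, t)) for a, t, _, _ in COUNTS.values()]

# Genome-wide totals for each base.
total_a, total_t, total_g, total_c = (sum(col) for col in zip(*COUNTS.values()))
w_diff, s_diff = total_a - total_t, total_g - total_c
w_pct = percent_difference(total_a, total_t)
s_pct = percent_difference(total_g, total_c)

print("genome size:", total_a + total_t + total_g + total_c)
print(f"mean |W %| = {mean(abs_w_pcts):.3f}, sd = {stdev(abs_w_pcts):.3f}")
print(f"A - T = {w_diff} ({w_pct:.4f}%), G - C = {s_diff} ({s_pct:.4f}%)")
print(f"one W base in {(total_a + total_t) / abs(w_diff):.0f} is unpaired")
```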

    On the assumption (improbable) that a W base Chargaff difference unit has the same weight as an S base Chargaff difference unit, the two Chargaff differences can be united as the R - Y Chargaff difference. However, the weakly positive W base Chargaff difference (+200 bases) cannot complement the extremely negative S base Chargaff difference (Table 1). It is probable that a W base Chargaff difference unit would have to be multiplied by some factor to make it equal to an S base Chargaff difference unit but, judging by the relative strengths of AT and GC base pairing, such a factor is likely to be no more than 2-3 fold. Hence, even this correction would not ameliorate the extreme R - Y Chargaff difference (-23640 bases). Thus, it is possible that, in this AT-rich organism (36% G + C), inter-chromosomal accounting is a function of the W bases, with the S bases playing some other role. On the other hand, there might be a more favorable orientation for reducing the S base Chargaff difference.


Table 1  Base compositions and Chargaff differences for C. elegans chromosomes when in GenBank orientation^a

            ------------- W bases -------------  ------------- S bases -------------  -------- W+S bases -------
Chromosome         A         T     A-T      %^b         G         C     G-C        %     R-Y        %    A+T+G+C
I            4835939   4848452  -12513  -0.1292   2692150   2695878   -3728  -0.0692  -16241  -0.1078   15072419
II           4878195   4869712    8483   0.0870   2762195   2769214   -7019  -0.1269    1464   0.0096   15279316
III          4444653   4423570   21083   0.2377   2466319   2449139   17180   0.3495   38263   0.2776   13783681
IV           5711040   5730969  -19929  -0.1742   3017008   3034767  -17759  -0.2935  -37688  -0.2154   17493784
V            6748863   6758902  -10039  -0.0743   3700477   3711156  -10679  -0.1441  -20718  -0.0990   20919398
X            5747199   5734084   13115   0.1142   3117867   3119702   -1835  -0.0294   11280   0.0637   17718852
Totals^c    32365889  32365689     200   0.0003  17756016  17779856  -23840  -0.0671  -23640  -0.0236  100267450

a May 2008 GenBank designations: I, NC_003279.5; II, NC_003280.6; III, NC_003281.7; IV, NC_003282.4; V, NC_003283.7; X, NC_003284.6.

b Difference between the sums of two bases (in the same row) as a % of their combined sum.

c Totals are for base and Chargaff difference (base "skew") columns, not for % value columns (i.e. 200 is 0.0003% of the sum of 32365889 and 32365689).


3.2 Other orientations of the top strands

With the GenBank orientation of the six C. elegans chromosomes designated TTTTTT, the total Chargaff differences for other orientations (TTTTTB, TTTTBT, TTTBTT, etc.) were examined. Thus, for the pattern TTTTTB, the orientation of the last of the six chromosomes, the X chromosome, was reversed. Then A - T for that chromosome was -13115 (instead of 13115 as in Table 1) and G - C was 1835 (instead of -1835 as in Table 1). Total Chargaff differences (A - T, G - C, R - Y) were calculated for this and other orientation patterns.
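The effect of reversing a single chromosome on the genome totals can be sketched as follows. The per-chromosome differences are those of Table 1; the function name is ours.

```python
# GenBank-orientation (top strand) differences from Table 1,
# in the order I, II, III, IV, V, X.
A_MINUS_T = [-12513, 8483, 21083, -19929, -10039, 13115]
G_MINUS_C = [-3728, -7019, 17180, -17759, -10679, -1835]


def total_for_pattern(differences, pattern):
    """Sum the differences, negating each chromosome marked 'B' (inverted)."""
    return sum(d if o == "T" else -d for d, o in zip(differences, pattern))


for pattern in ("TTTTTT", "TTTTTB"):
    at = total_for_pattern(A_MINUS_T, pattern)
    gc = total_for_pattern(G_MINUS_C, pattern)
    # R - Y is the sum of the W and S base differences (equal weighting).
    print(pattern, "A-T:", at, "G-C:", gc, "R-Y:", at + gc)
```

Reversing the X chromosome alone moves the W base total far from even parity, while slightly improving the S base total.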

The 64 orientations with corresponding values for A - T, G - C and R - Y, separately rank-ordered, are shown in Fig. 1. Values for the GenBank orientation (Table 1), expressed in kilobases (0.2 kb, -23.84 kb and -23.64 kb, respectively), are indicated by vertical arrows. The 32 positive orientation patterns were symmetrical with the 32 negative orientation patterns and, to simplify, the three complementary GenBank orientations at -0.2 kb, 23.84 kb and 23.64 kb (corresponding to inversion of all six chromosomes) are not shown. For a given rank, Chargaff differences for the W bases (A - T) were generally further from zero than Chargaff differences for the S bases (G - C). Values for the addition of the W and S base Chargaff differences corresponding to a particular orientation pattern (R - Y) were further from zero than the corresponding ranks of W and S base Chargaff differences alone. At certain points there were discontinuities and Chargaff differences seemed to ascend in stepwise fashion.

Fig. 1.  Rank ordering of summations of C. elegans chromosomal Chargaff differences for each of the 64 possible chromosomal orientation patterns. Rank orderings were independently performed for: A - T (open circles); G - C (grey squares); R - Y (black triangles). Each R - Y value is the summation of the A - T and G - C values for a particular orientation pattern. The horizontal dashed line indicates even parity (A = T, G = C, R = Y). Thus, above the line A > T, G > C and R > Y; below the line A < T, G < C and R < Y. Unique values, corresponding to the orientation of the chromosomes as deposited in GenBank, are indicated by vertical arrows.

 

    The GenBank orientation at position 33 (0.2 kb) began the 32 positive orientations for A - T (i.e. orientations 33-64), showing that this was the "best" orientation (i.e. closest to even parity) for the W bases. Somehow the original orientation assigners had hit on this - an improbable chance event (P = 0.031). However, the GenBank G - C orientation ranked at position 13 showing that there were 19 "better" (i.e. less negative) orientations for the S bases. When the Chargaff differences were combined (R - Y) the GenBank orientation still ranked low (position 24 with 8 less negative orientations).
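The rank positions quoted above can be reproduced by enumerating all 64 patterns. This is a sketch under stated assumptions: the rank is taken as one plus the number of patterns with a strictly smaller total, and P is taken as the chance of picking either the best pattern or its complement (2 of 64). The function names are ours.

```python
from itertools import product

# GenBank-orientation differences from Table 1 (order I, II, III, IV, V, X).
A_MINUS_T = [-12513, 8483, 21083, -19929, -10039, 13115]
G_MINUS_C = [-3728, -7019, 17180, -17759, -10679, -1835]
R_MINUS_Y = [w + s for w, s in zip(A_MINUS_T, G_MINUS_C)]


def totals_by_pattern(differences):
    """Map each of the 2^n T/B patterns to its summed Chargaff difference."""
    n = len(differences)
    return {
        "".join(p): sum(d if o == "T" else -d for d, o in zip(differences, p))
        for p in product("TB", repeat=n)
    }


def rank_of(pattern, totals):
    """Ascending rank (1 = most negative total) of a pattern."""
    return 1 + sum(v < totals[pattern] for v in totals.values())


ranks = {
    name: rank_of("TTTTTT", totals_by_pattern(diffs))
    for name, diffs in (("A-T", A_MINUS_T), ("G-C", G_MINUS_C), ("R-Y", R_MINUS_Y))
}
p_value = 2 / 2 ** len(A_MINUS_T)

print(ranks)
print(f"P = {p_value:.3f}")
```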

Fig. 2.  Summation of C. elegans chromosomal Chargaff differences for each of the 32 chromosomal orientation patterns that have positive A - T differences, in rank order. (a) A - T; (b) G - C; (c) R - Y. The latter, (b) and (c), are keyed to the rank order of (a). Thus, unlike Fig. 1, the three values for a particular orientation pattern are related vertically to each other. While the A-T values in (a) must be positive, corresponding values in (b) and (c) can be either positive or negative. Vertical arrows refer to unique values that correspond either with the GenBank orientation (designated TTTTTT for the top strands of six consecutive chromosomes), or the "best" orientation. This is designated BBBBTT in (b) and (c) to indicate that, due to chromosome switching from the GenBank orientation (T), the "left" ends of the first four chromosomes are now at the right, so that the bottom strand (B) is on top. The diagonal arrow in (a) indicates the unique value for A-T that corresponds to the "best" orientation (rank order position 6) in (b) and (c).

 

The 32 positive A - T Chargaff differences were again plotted in rank order. The vertical arrows in Fig. 2a point to the GenBank orientation (pattern TTTTTT), which is also the "best" orientation as far as the W bases are concerned (Chargaff difference only 0.2 kb). Using the rank order of Fig. 2a as a key, the corresponding values for the S base Chargaff differences for each orientation pattern were plotted in Fig. 2b. The GenBank S base orientation (vertical arrow) was at position 1 (pattern TTTTTT) since the latter orientation had been the "best" for the W bases. However, this value was -23.84 kb (see Table 1). The best orientation for the S bases alone was at position 6 (-1.2 kb). This corresponded to the pattern BBBBTT. Thus, in order to minimize the total S base Chargaff difference, the sequences of chromosomes I, II, III and IV would have to be inverted. Alternatively, complementing this (i.e. TTTTBB), the sequences of chromosome V and of the sex chromosome (X) would have to be inverted. The value of the W base total Chargaff difference for this pattern (position 6) was still quite low (6.0 kilobases; Fig. 2a), so the overall "best" orientation pattern when W and S bases were combined (R - Y) was still BBBBTT or TTTTBB (Fig. 2c).
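The keying of the S base and R - Y values to the W base rank order can be sketched as below. The data are from Table 1; the row layout and names are ours, and W and S units are given equal weight when forming R - Y.

```python
from itertools import product

# GenBank-orientation differences from Table 1 (order I, II, III, IV, V, X).
A_MINUS_T = [-12513, 8483, 21083, -19929, -10039, 13115]
G_MINUS_C = [-3728, -7019, 17180, -17759, -10679, -1835]


def total(differences, pattern):
    """Sum the differences, negating each chromosome marked 'B' (inverted)."""
    return sum(d if o == "T" else -d for d, o in zip(differences, pattern))


patterns = ["".join(p) for p in product("TB", repeat=6)]

# Keep the 32 patterns with positive A - T totals, in ascending rank order.
keyed = sorted(
    (p for p in patterns if total(A_MINUS_T, p) > 0),
    key=lambda p: total(A_MINUS_T, p),
)

# Rows of (position, pattern, A - T, G - C), all keyed to the A - T ranking.
rows = [
    (pos, p, total(A_MINUS_T, p), total(G_MINUS_C, p))
    for pos, p in enumerate(keyed, start=1)
]

best_s = min(rows, key=lambda r: abs(r[3]))           # closest to G = C
best_ry = min(rows, key=lambda r: abs(r[2] + r[3]))   # closest to R = Y

print("position 1:", rows[0])
print("best for S bases:", best_s)
print("best for R - Y:", best_ry, "R-Y =", best_ry[2] + best_ry[3])
```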

Although there are equal numbers of patterns producing negative and positive S base Chargaff differences (32 negative and 32 positive; Fig. 1), in C. elegans the set of patterns corresponding to positive W base Chargaff differences (Fig. 2a) associates mainly with positive S base Chargaff differences (Fig. 2b). Thus, the set of patterns corresponding to negative W base Chargaff differences would associate mainly with negative S base Chargaff differences. The combinations of Chargaff differences for each particular orientation (R - Y; Fig. 2c) on average exceed the corresponding individual Chargaff differences for the W bases (Fig. 2a) and S bases (Fig. 2b), so are generally positive (see also Fig. 1).

3.3 Accounting in D. melanogaster

Given that mainly the euchromatic portion of this fruit fly genome has been sequenced,19 it seemed unlikely that the above approach would be informative. Nevertheless, the analysis was repeated for fruit fly DNA, and the results were of sufficient interest to warrant their inclusion here.

Table 2 shows Chargaff differences for the GenBank orientations of, successively, chromosomes X, II (left of the centromere), II (right of the centromere), III (left of the centromere), III (right of the centromere), and IV. Values (%) for the smallest chromosome (IV) were markedly different from the rest. Nevertheless, although heterochromatic regions and the entire Y chromosome (highly heterochromatic) were excluded, the total W base Chargaff difference was 0.01% which, as in C. elegans, is considerably less than the W base Chargaff differences of the original chromosomes that contributed to that total  (average absolute value 0.212% with a large standard deviation 0.299%, primarily due to chromosome IV). As in C. elegans, the total S base Chargaff difference (-0.03%) was of the same order as those of the individual chromosomes, with the exception of chromosome IV. Exclusion of the latter from the calculation increased the total W base and total S base Chargaff differences (%); so, despite its aberrant values, chromosome IV appeared to contribute to overall accounting. However, together the W and S bases gave a total Chargaff difference (R - Y) of only -0.007%, a value which could be lowered by omitting chromosome IV (-0.002%).  In this case the average absolute value was 0.086% with a standard deviation 0.032%. Despite the difficulties with chromosome IV, the overall similarity with C. elegans raises the possibility that for the fruit fly "Nature" may have excluded the DNA in extensive heterochromatic regions from her accounting. 

 

 

Table 2  Base compositions and Chargaff differences for D. melanogaster chromosomes^a in GenBank orientation^b

            ------------- W bases -------------  ------------- S bases -------------  -------- W+S bases -------
Chromosome         A         T     A-T      %^c         G         C     G-C      %^c     R-Y        %    A+T+G+C
X            6409325   6432035  -22710  -0.1013   4748415   4742952    5463   0.0244  -17247  -0.0769   22332727
IIL          6699731   6684734   14997   0.0652   4815192   4811687    3505   0.0152   18502   0.0804   23011344
IIR          6007371   5988450   18921   0.0895   4574750   4576037   -1287  -0.0061   17634   0.0834   21146608
IIIL         7113242   7135141  -21899  -0.0892   5141498   5153576  -12078  -0.0492  -33977  -0.1384   24543457
IIIR         7979156   7950459   28697   0.1028   5980227   5995211  -14984  -0.0537   13713   0.0491   27905053
IV            430227    441336  -11109  -0.8218    242039    238155    3884   0.2873   -7225  -0.5345    1351757
Totals^d    34639052  34632155    6897   0.0100  25502121  25517618  -15497  -0.0304   -8600  -0.0071  120290946

a Release 5.2 dated May 2006. The left and right arms of chromosomes II and III were sequenced independently.

b GenBank designations: X, NC_004354.3; IIL, NT_033779.4; IIR, NT_033778.3; IIIL, NT_037436.3; IIIR, NT_033777.2; IV, NC_004353.3. For these chromosomes there are, respectively, 90100, 200, 100, 100, 0, and 100 bases of unassigned sequence, designated "other." Heterochromatic regions, including the Y chromosome, were not sequenced.

c Difference between the sums of two bases as a % of their combined sum.

d Totals are for base and Chargaff difference columns, not for % value columns.

    Fig. 3 shows the symmetrically balanced grouping of 32 negative and 32 positive Chargaff differences as in Fig. 1. Again, S base Chargaff differences diverged less from zero than W base Chargaff differences, but the summations of the differences corresponding to a particular orientation pattern (R - Y) were seldom able to exceed (either negatively or positively) those of the W base Chargaff differences that ranked at the same position. For the GenBank orientation, the W base Chargaff difference (6.9 kb) appeared to be countermanded by the S base Chargaff difference (-15.5 kb), for an overall difference in that chromosomal orientation (R - Y) of -8.6 kb. In all three cases (A - T, G - C, R - Y) there were orientations where the Chargaff differences diverged less from zero than the corresponding GenBank orientation.

Fig. 3.  Rank ordering of summations of D. melanogaster chromosomal Chargaff differences for each of the 64 possible chromosomal orientation patterns. Details are as in Fig. 1.

 

    Fig. 4 shows Chargaff differences keyed to the 32 W base positive ranking orientation patterns. There is a less marked stepwise progression than in the case of C. elegans, but the "best" orientation (879 bases) is nearly 6-fold less than the next best orientation (5077 bases; Fig. 4a). For the W bases the "best" orientation corresponded to BTBBBT (or the complement TBTTTB, which is not shown). For this, X, IIR, IIIL and IIIR would need to be in the non-GenBank orientation. However, for the S bases (Fig. 4b) the "best" orientation corresponded to BTBBTT (or the complement TBTTBB, which is not shown). For this, X, IIR and IIIL would need to be in non-GenBank orientation. The orientation with the best combination of the two Chargaff differences (R - Y; Fig. 4c) was TBBBTB. For this, IIL, IIR, IIIL and IV would need to be in the non-GenBank orientation.
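The D. melanogaster "best" orientations can be recovered by the same exhaustive search. The sketch below uses the per-segment differences of Table 2 and, as in Fig. 4, considers only the 32 patterns with a positive W base total; the function names are ours and W and S units are weighted equally.

```python
from itertools import product

# GenBank-orientation differences from Table 2,
# in the order X, IIL, IIR, IIIL, IIIR, IV.
A_MINUS_T = [-22710, 14997, 18921, -21899, 28697, -11109]
G_MINUS_C = [5463, 3505, -1287, -12078, -14984, 3884]
R_MINUS_Y = [w + s for w, s in zip(A_MINUS_T, G_MINUS_C)]


def total(differences, pattern):
    """Sum the differences, negating each segment marked 'B' (inverted)."""
    return sum(d if o == "T" else -d for d, o in zip(differences, pattern))


# The 32 patterns whose W base total is positive (the set plotted in Fig. 4).
keyed = [
    "".join(p) for p in product("TB", repeat=6) if total(A_MINUS_T, p) > 0
]


def closest_to_parity(differences):
    """Return (|total|, pattern) pairs, nearest to even parity first."""
    return sorted((abs(total(differences, p)), p) for p in keyed)


w_ranked = closest_to_parity(A_MINUS_T)
s_ranked = closest_to_parity(G_MINUS_C)
ry_ranked = closest_to_parity(R_MINUS_Y)

print("W bases, best and next best:", w_ranked[:2])
print("S bases, best:", s_ranked[0])
print("R - Y, best:", ry_ranked[0])
```

An exhaustive search of this kind doubles in cost with each added chromosome, so it suits genomes with few chromosomes.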

Fig. 4.  Summation of D. melanogaster chromosomal Chargaff differences for each of the 32 chromosomal orientation patterns that have positive A - T differences, in rank order. Details are as in Fig. 2.

    Although the DNA sequences of chromosome segments IIL and IIR appear to have been orientated with respect to each other (also IIIL and IIIR), the segments sometimes separate in the above orientations. It can also be noted that in D. melanogaster the set of patterns corresponding to positive W base Chargaff differences (Fig. 4a) associate equally with both positive and negative S base Chargaff differences (Fig. 4b). However, the magnitudes of the W base Chargaff differences are such that the combinations of Chargaff differences for each particular orientation (R - Y; Fig. 4c) are generally positive (when W base and S base Chargaff difference values are assigned equal weighting).

 

4. Discussion

A purely stochastic ("neutral") explanation for PR2 at the level of individual chromosomes17 is now considered unlikely.18 Whether or not our proposed adaptive explanation (see Section 1) will prove valid, it was of interest to seek evidence for PR2 at a level higher than that of the individual chromosome. The absence of a defined centromere (see Section 2.2) suggested that the chromosomes of C. elegans would be ideal for this purpose. Mutants induced by ethyl methanesulfonate (EMS) were originally mapped into six complementation groups by Brenner.23-24 EMS usually alkylates guanine residues that have neighboring purine residues,25 and so would be expected preferentially to locate to the purine-rich mRNA synonymous strands of coding regions.7

4.1 Are C. elegans chromosomes orientated randomly?

Given a set of numbers that can be assigned either positive or negative values, the sum of the numbers should be minimizable by appropriate choice of positive or negative assignments. Thus, the fact that, given a set of chromosomes that are apparently randomly orientated, one can find an orientation pattern that diminishes their collective Chargaff differences should not be surprising. However, it is remarkable that six AT Chargaff differences, with absolute values varying between 8.4 kb and 21.1 kb, should be minimizable to only 0.2 kb (Table 1). The case that this reflects the operation of some form of whole-genome accounting would be strengthened if the orientations assigned from Chargaff difference values could be related, in a consistent fashion, to some other polar aspect of chromosomes (e.g. acentric centromeres, and defined p and q chromosomal arms). That the assignment of orientations to the holocentric C. elegans chromosomes22-24 was really arbitrary is brought into question by our observations that the GenBank orientation turns out to be the best orientation as far as maintaining even parity for the W base pair (P = 0.031), although reassignment was required for chromosomes V and X for the S base pair (Table 1; Figs. 1, 2). In D. melanogaster the best orientation for the W base pair requires at least two reorientations (Fig. 4a), and for the S base pair three reorientations (Fig. 4b). Yet, even with sequencing uncertainties, the W base GenBank orientation is not far removed from the best orientation.

4.2 Why account?  

Our study raises several questions, some of which have been touched on in Section 1 and previously.4-7 If genomes truly account, what is the mechanism of accounting and what adaptive purposes might it serve? Is multi-chromosomal Chargaff difference analysis a valid way of exploring this? To what extent can accounting involve W bases and S bases independently, and do their units score differently? Is this influenced by the relative proportions of W and S bases (i.e. GC%) in a genome? Do repetitive elements (as in heterochromatin) participate in the putative accounting process, or should they be masked prior to analysis? Should the left and right arms of a monocentromeric chromosome be considered as separate chromosome units for accounting purposes?

We have observed that, despite the omission of large tracts of heterochromatic DNA rich in simple sequence tandem repeats (microsatellites), the chromosomes and half-chromosomes of D. melanogaster can give results not too dissimilar from the chromosomes of C. elegans. This suggests that much heterochromatin-associated DNA may not participate in accounting as measured through Chargaff differences. In this respect some, perhaps related, special properties of tandem repeats can be noted: (i) They accumulate in regions with suppressed recombination.20 (ii) They can expand in number to allow individuals within a species to vary in total genome size yet still remain within the species.26 (iii) They are unlikely to provoke, through failure to correctly pair at meiosis, divergence into species.21

The ability to diversify DNA sequence while remaining a functioning member of a species probably constitutes a major defense against intracellular pathogens.27 Yet, there must be limits to such diversification and hence mechanisms to sense when such limits are approached or crossed. We suspect that accounting processes operate at many levels to sense impending or actual boundary violations and invoke necessary corrections. All this appears far from Johannsen's original Rechnungseinheiten. Yet, in the early decades of the twentieth century Johannsen with his "great central something," William Bateson with his "residue," and Michael Guyer with his "general substratum," were all seeking something beyond genes to explain the process by which certain individuals, through genome diversification, might "escape" from their species to form a new one.2 This would have required either a subversion of putative accounting processes or the suspension of any sequence corrections that such processes might call for.

5. Conclusion

We have explored Johannsen's idea that there might be a high form of genetic accounting, and have set out some conceptual and methodological principles. Our results with C. elegans (Table 1; Figs. 1, 2) suggest that the assigned chromosome orientations were not arbitrary. It is possible that, consciously or unconsciously, those who originally mapped and sequenced the genome employed some orientating principle, perhaps related to EMS mutagenesis.25 In future we hope to scale up multi-chromosomal Chargaff difference analysis to species with more than six chromosomes. Such studies should be more productive when the "complete" genomic sequences of these species become truly complete. However, our initial study with D. melanogaster (Table 2; Figs. 3, 4) raises the possibility that, in some species, heterochromatic DNA may not participate in accounting.
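The scaling question can be made concrete with a short sketch, given here under stated assumptions rather than as the procedure actually used. It takes one Chargaff difference per chromosome for the assigned top strand, enumerates all 2^n combinations of top (T) and bottom (B) strands, and ranks them by the absolute total difference. Choosing the bottom strand simply reverses the sign of that chromosome's difference. The function name and the six toy values are hypothetical and are not the values of Table 1.

```python
from itertools import product


def rank_orientations(top_diffs):
    """Rank every T/B orientation by absolute total Chargaff difference.

    top_diffs holds one signed difference (e.g. A - T) per chromosome,
    measured on the assigned top strand. Returns a list of
    (absolute_total, label) pairs, smallest total first.
    """
    ranked = []
    for signs in product((1, -1), repeat=len(top_diffs)):
        # +1 keeps the top strand; -1 substitutes the bottom strand.
        total = sum(s * d for s, d in zip(signs, top_diffs))
        label = "".join("T" if s == 1 else "B" for s in signs)
        ranked.append((abs(total), label))
    ranked.sort()
    return ranked


# Toy per-chromosome differences, invented for illustration only.
toy_diffs = [120, -45, 60, -130, 15, -10]

ranked = rank_orientations(toy_diffs)
print(len(ranked))   # number of orientations examined
print(ranked[:4])    # orientations closest to even parity
```

Exhaustive enumeration doubles in cost with each added chromosome, so for species with many chromosomes some sampling or search strategy would presumably be needed in place of a full listing.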

Acknowledgement   

Queen's University hosts Forsdyke's web-pages, which contain copies of some of the cited references.

References

  1. Roll-Hansen N, The genotype theory of Wilhelm Johannsen and its relation to plant breeding and the study of evolution, Centaurus 22:201-235, 1979.

  2. Cock AG, Forsdyke DR, Treasure Your Exceptions. The Science and Life of William Bateson, Springer, New York, pp. 497-498, 504-507, 2008.

  3. Watson JD, Crick FH, Genetical implications of the structure of deoxyribonucleic acid, Nature 171:964-967, 1953.

  4. Bell SJ, Forsdyke DR, Accounting Units in DNA, J Theor Biol 197:51-61, 1999.

  5. Bell SJ, Forsdyke DR, Deviations from Chargaff's second parity rule correlate with direction of transcription, J Theor Biol 197:63-76, 1999.

  6. Forsdyke DR, Evolutionary Bioinformatics, Springer, New York, 2006.

  7. Forsdyke DR, Bell SJ, Purine-loading, stem-loops, and Chargaff's second parity rule: a discussion of the application of elementary principles to early chemical observations, Appl Bioinf 3:3-8, 2004.

  8. Chargaff E, Essays on Nucleic Acids, Elsevier, Amsterdam, 1963.

  9. Cristillo AD, Mortimer JR, Barrette IH, Lillicrap TP, Forsdyke DR, Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load, J Theor Biol 208:475-491, 2001.

  10. Crick F, General model for the chromosomes of higher organisms, Nature 234:25-27, 1971.

  11. Wilson JH, Nick-free formation of reciprocal heteroduplexes: a simple solution to the topological problem, Proc Natl Acad Sci USA 76:3641-3645, 1979.

  12. Forsdyke DR, Relative roles of primary sequence and (G+C)% in determining the hierarchy of frequencies of complementary trinucleotide pairs in DNAs of different species, J Mol Evol 41:573-581, 1995.

  13. Forsdyke DR, A stem-loop "kissing" model for the initiation of recombination and the origin of introns, Mol Biol Evol 12:949-958, 1995.

  14. Forsdyke DR, Conservation of stem-loop potential in introns of snake venom phospholipase A2 genes: an application of FORS-D analysis, Mol Biol Evol 12:1157-1165, 1995.

  15. Bultrini E, Pizzi E, Giudice PD, Frontali C, Pentamer vocabularies characterizing introns and intron-like intergenic tracts from Caenorhabditis elegans and Drosophila melanogaster, Gene 304:183-192, 2003.

  16. Spano M, Lillo F, Micciche S, Mantegna RN, Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes, Eur Phys J B 65:323-331, 2008.

  17. Sueoka N, Intrastrand parity rules of DNA base composition and usage biases of synonymous codons, J Mol Evol 149:125-131, 1995.

  18. Lobry JR, Sueoka N, Asymmetric directional mutation pressures in bacteria, Genome Biol 3(10):research0058, 2002.

  19. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF et al., The genome sequence of Drosophila melanogaster, Science 287:2185-2195, 2000.

  20. Gvozdev VA, Kogan GL, Usakin LA, The Y chromosome as a target for acquired and amplified genetic material in evolution, BioEssays 27:1256-1262, 2005.

  21. Zhang C, Xu S, Wei J-F, Forsdyke DR, Microsatellites that violate Chargaff's second parity rule have base order-dependent asymmetries in the folding energies of complementary DNA strands and may not drive speciation, J Theor Biol 254:168-177, 2008.

  22. Cutter AD, Dey A, Murray RL, Evolution of the Caenorhabditis elegans genome, Mol Biol Evol 26:1199-1234, 2009.

  23. Brenner S, The genetics of Caenorhabditis elegans, Genetics 77:71-94, 1974.

  24. Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH, Genomics in C. elegans: so many genes, such a little worm, Genome Res 15:1651-1660, 2005.

  25. Greene EA, Codomo CA, Taylor NE, Henikoff JG, Till BJ, Reynolds SH, et al., Spectrum of chemically induced mutations from a large-scale reverse-genetic screen in Arabidopsis, Genetics 164:731-740, 2003.

  26. Biemont C, Within-species variation in genome size, Heredity 101:297-298, 2008.

  27. Forsdyke DR, Adaptive value of polymorphism in intracellular self/not-self discrimination, J Theor Biol 210:425-434, 2001.


This page was established in October 2009 and was last edited on 26 Oct 2020 by Donald Forsdyke