DOES THE COMPARATIVE METHOD
Journal of Biological Systems (2007) 15, 95-108
Accepted 18th August 2006. Published March 2007
World Scientific Pub. Corp.
Keywords: Positive selection;
Single sequence; Primary information; Secondary information; Conventional
phenotype; Genome phenotype; Speciation
Positive Darwinian selection is usually evaluated by comparing two
nucleotide sequences that are presumed to have diverged from a common
ancestor. Recently, Plotkin et al. suggested that the codon volatility
displayed in a single genic sequence provides evidence for positive
selection.1 However, this approach was criticized.2-9
Irrespective of the validity of the criticisms,10 there is
general agreement that, while currently "the comparative method
rules,"2 a method to detect positive selection in a single sequence would "confer
greater power of analysis with less information,"5
and would be "revolutionary
-- because it challenges the essentiality of the
Furthermore, "because a closely
related sequence may be unavailable for comparison -- a method to detect
a signature of positive selection in a single genome has considerable
appeal."3 Thus, "the idea of using just one DNA sequence to
detect natural selection -- is novel and attractive, and it would be
interesting to develop other measures that may accomplish this goal."7
When a novel method for detecting positive Darwinian selection from a
single sequence was presented a decade ago,11 there was little
interest, perhaps because the method involved new concepts and a new
technology - the detection of base order-dependent stem-loop potential.12-14
The conceptual and technological aspects have since become better
established.15-21 For example, it is recognized that "the
protein encoding region -- can -- comprise one or more overlapping
layers of information."22 These
layers can be considered as a "genome dialect."23 Furthermore, there is increasing
recognition of the possibility of non-neutral evolution in non-coding DNA
and at synonymous sites in coding DNA.24-26
In view of increased questioning of the comparative method,26-30 I here review the conceptual bases of the stem-loop potential and comparative methods. Technical details may be found elsewhere.11 I conclude that, while sometimes a misleading indicator of positive selection, the comparative method casts new light on the fundamental question Darwin posed - namely, that of the origin of species. The initiation of divergence between lineages can involve secondary information (the genome phenotype), not primary information (the conventional phenotype).
If reproductive success is impeded by a mutation, then selection of organisms
with the mutation is negative. If reproductive success is promoted then
the selection is positive. These are two, usually mutually exclusive,
consequences of a mutation in a nucleic acid sequence. The extreme
imperative of negative selection is: if you mutate, you die. Thus, the
broad population of non-mutators remains and the few mutators die. The
extreme imperative of positive selection is: if you do not mutate,
you die. Thus, the broad population of non-mutators dies, and the few
mutators flourish (i.e. there is a population "bottleneck" from which
only the mutators emerge). Occupying the middle-ground are "neutral"
mutations, and mutations that may lead to either weak positive or weak
negative selection. In the latter cases there will, by definition, be
effects on the number of descendants, but only in the long-term.
Whether a genic base mutation will lead to negative or positive selection usually
depends on the part of a gene-product that it affects. A mutation
affecting the active site of an enzyme will usually disturb enzyme
function and this may impair the function of an organism so its fitness to
reproduce is impeded. In the extreme, this is ensured by the death of the
organism. On the other hand, a
mutation affecting an antigen at the surface of a pathogen may allow it to
evade the immune defences of its host, so its fitness to reproduce is
enhanced. In the extreme this is ensured by the death of pathogens that do
not have the mutation.
Nucleic acid bases that are evolving very slowly (i.e. they are conserved among related organisms) are likely to affect functions subject to negative selection (i.e. organisms with mutations in the bases are functionally impaired). Nucleic acid bases that are evolving very rapidly (i.e. they are not conserved among related organisms) are likely to affect functions subject to some degree of positive selection (i.e. organisms with mutations in the bases are functionally improved). Thus, a determination of evolutionary rate has the potential to assist the distinction between bases under positive selection, and bases under negative selection. For this, base differences between sequences can be calibrated against some temporal scale (e.g. the period from the present to the time of divergence of sequences from a common ancestral sequence). Accumulation of a large number of differences in a short time would indicate positive selection. However, accurate temporal calibration is difficult. Accordingly, an alternative comparative approach, involving ratios of non-synonymous and synonymous base substitution mutations, has been widely adopted.31
Note that here we are concerned with genes under some degree of selection, not with rare genes whose evolution has been predominantly influenced by random drift. Such genes are assumed to be randomly distributed among those predominantly influenced by selection, and so should have little statistical impact on the data under discussion. We are also not concerned with comparative methods that use the extent of decrease in recombination rate as a measure of positive selection. These assume, for a nucleic acid segment containing a gene that is evolving rapidly (i.e. positive selection), that there may not be time for separation of the gene from neighboring segments by recombination. Thus neighboring genes will tend to remain linked. They will "hitchhike" through the generations with a positively selected gene. Variant forms of neighboring genes will be lost from the population in the course of this selective sweep. Consequently, polymorphism among members of a population in the region of a positively selected gene is decreased.18,31
The best recognized form of genomic information is genic.
The proteins and RNAs encoded by genes reflect their "primary
information" and appear to have most influence on the conventional
phenotype - an individual's somatic form and function. However, other
types of information ("secondary information") exert pressures in
genome space. These pressures have the potential to affect the
conventional phenotype, often affect base composition, and are either
local or general. Local
pressures acting on genes include purine-loading pressure (AG-pressure),
and RNY-pressure (the pressure for first and third codon bases to be
purine and pyrimidine, respectively). General pressures affect entire
genomes and include GC-pressure, fold pressure ("stem-loop potential")
and pressures for genome compactness.18,32-36 In other words,
there is a genome phenotype, meaning that aspects of genome organization
have the potential to influence reproductive success in the same sense
that aspects of the conventional phenotype have the potential to influence
reproductive success. Sometimes pressures appear to conflict. For example,
seeming to accommodate AG-pressure, protein lengths can increase by
inclusion of low complexity, inter-domain, segments, that do not appear
important for protein function, yet contain "placeholder"
amino acids encoded by AG-rich codons. Here compactness cedes to purine-loading.
Thus, genomes can be seen as channels carrying multiple forms of information through the generations from the distant past to the present. As with information channels in general, carrying capacity is finite. When genomes cannot satisfy all informational demands a balance is established, with trade-offs between competing demands. This contrasts with the long-held view that there is an excess of carrying capacity in genome space, so that "neutral" mutations can endure, and sequences without obvious function can be considered as "junk."41
Compared with genes under negative selection, there is a greater onus on genes under positive selection to adapt to local genic pressure - the pressure to transmit a gene's primary information. To accommodate this increased pressure, the trade-offs in secondary information by non-genic pressures, be they local or general, must be greater. General pressures are of most investigative utility, since their diminution in protein-encoding regions that are under positive selection pressure can be evaluated relative to their levels either in local non-protein-encoding regions (i.e. in intronic and intergenic sequences that are assumed not to be under positive selection), or in local protein-encoding regions (likely to be under negative selection pressure).
Among general pressures, the pressure to order a sequence of bases to promote the potential for nucleic acid structure (base order-dependent stem-loop potential) has emerged as a sensitive index.11 The principle of the method can be briefly summarized. Higher ordered structures of single-stranded nucleic acids may be calculated from the base-pairing energies of overlapping dinucleotides, which are fundamental units of nucleic acid structure. Sequences are reiteratively folded until energetically most favorable structures are arrived at.19 Contributions to the energetics of each structure decompose into base composition-dependent and base order-dependent components. The latter is determined by subtracting the base composition-dependent component from the total folding energy. The base composition-dependent component is itself determined by shuffling and refolding a sequence several times - thus destroying the base order-dependent component - and then taking the average folding energy of the resulting structures.12-14
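The subtraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: `fold_energy` is a hypothetical stand-in for a real folding program (any function that returns a folding energy for a sequence), `toy_energy` is an invented placeholder, and the names `order_dependent_component`, `n_shuffles` and `seed` are introduced here for illustration only.

```python
import random
from statistics import mean
from typing import Callable


def order_dependent_component(
    seq: str,
    fold_energy: Callable[[str], float],
    n_shuffles: int = 10,
    seed: int = 0,
) -> float:
    """Total folding energy minus the base composition-dependent component."""
    rng = random.Random(seed)
    total = fold_energy(seq)
    shuffled_energies = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)  # keeps base composition, destroys base order
        shuffled_energies.append(fold_energy("".join(bases)))
    # Average over shuffled versions estimates the composition-dependent part.
    composition_dependent = mean(shuffled_energies)
    return total - composition_dependent


def toy_energy(seq: str) -> float:
    """Hypothetical placeholder energy: -1 per G or C base (composition only)."""
    return -float(sum(base in "GC" for base in seq))


print(order_dependent_component("GGGAAACCCTTT", toy_energy))
```

Because `toy_energy` depends only on base composition, shuffling cannot change it and the base order-dependent component is zero by construction. A real analysis would pass in an energy function from a folding program instead.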
Speech provides an analogy for the conflict between primary and secondary
information. A public speaker conveys both a message (primary information)
and an accent (secondary information). Normally these are not in conflict.
Imagine requiring each member of a group of competing speakers, one at a
time, to read a given text to a large audience. The speakers are informed
that they will be timed to determine the slowest, and that the audience
will be polled to determine the most incomprehensible. The slowest and
most incomprehensible speakers will then be eliminated. Those who survive
will repeat the performance after which the slowest and most
incomprehensible speakers will again be eliminated. Eventually a winner emerges.
Initially, each speaker relays both the text and an individual accent. However, under pressures both to speak rapidly and to be understood, speakers with more deviant accents are soon eliminated. Speakers are under strong pressure to eliminate personal idiosyncrasies of accent (i.e. to mutate their secondary information). The pressures for fast and coherent speech will progressively decrease the diversity of the secondary information among surviving group members. The final sound of the text will probably be the same for any large group of competing speakers exposed to the same large audience. Thus, the divergent accents of the initial multiplicity of speakers converge on a single accent to which the hearing of the average member of the audience is best attuned. In a competition where there is no pressure for speed, idiosyncrasies of accent are less likely to interfere with comprehension (i.e. the diversity of secondary information is tolerated).
Viewed from this perspective we see that a nucleic acid segment that is evolving rapidly with respect to its primary information (e.g. the sequence of a protein) may not be able to accommodate some of the other forms of information that it might otherwise carry. These other forms, assumed to be evolving leisurely under negative selection, include the ordering of bases to support stem-loop potential, and purine-loading.16-18 Thus, sequences under positive selection are also likely to be sequences where one or more forms of secondary information are impaired. On this basis the type of selection can be evaluated in a DNA sequence without temporal calibration and with, at most, a need to compare only with neighboring sequences.
The stem-loop potential method has confirmed as under positive selection numerous genes so
designated by the non-synonymous/synonymous ratio method. These genes
include those encoding major histocompatibility complex proteins,11
snake venom proteins,13 and AIDS virus proteins.14
Their rapid evolution is predicted from the underlying biology. The need
for confirmation of results of the ratio method is now pressing since the
validity of the method itself, and the underlying concept of neutrality,
are under increasing scrutiny.26-30
Synonymous mutations seemed to offer an internal frame of reference for evaluating
mutation rates and for determining whether a mutation would impede or
promote reproductive success. Mutations in third positions of codons often
do not change the nature of an encoded amino acid, and hence do not change
the corresponding protein and any characters that depend on that protein.
It was tempting to consider such synonymous mutations as neutral.31
An obvious advantage of the use of one particular codon, rather than a synonymous one, is that some codons can be translated more rapidly, or more accurately. This is indeed of evolutionary significance for certain unicellular organisms where the speed of protein synthesis is critical.42-43 Hence, synonymous mutations may not be neutral. But in many organisms the rate of protein synthesis is not critical. Thus, to provide a relatively time-independent, internal, frame of reference for determining the form of selection (negative or positive), it was found convenient to compare the ratio of amino acid-changing (non-synonymous) base substitution mutations to non-amino acid-changing (synonymous) base substitution mutations in orthologous genes. This assumed that non-amino acid-changing base substitution mutations were adaptively neutral, and hence reflected a "background" rate of accepted mutation. The ratio within a nucleic acid segment seemed capable of providing an index of the rate at which that segment was evolving. A high ratio suggested the segment was under positive selection. A low ratio suggested the segment was under negative selection.31
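As a rough illustration of the ratio method, the sketch below classifies codon differences between two aligned, in-frame sequences as synonymous or non-synonymous using the standard genetic code. It is a deliberate simplification: it counts raw codon differences and does not normalise by the numbers of synonymous and non-synonymous sites, or correct for multiple hits, as published estimators do. The function name `count_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (non_synonymous, synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    non_syn = syn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1  # same amino acid: synonymous difference
        else:
            non_syn += 1  # different amino acid: non-synonymous difference
    return non_syn, syn


# Invented example: ACA->AAA changes the amino acid, GGT->GGC does not.
print(count_codon_differences("ACAGGTAGC", "AAAGGCAGC"))
```

A ratio can be formed from such counts only when the synonymous count is non-zero, so genes with few or no synonymous differences are problematic for this approach.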
In many cases values for rates of synonymous base substitution mutations are
significantly above zero, and determinations of ratios agree with
biological expectations. This suggests that third codon position mutations
can indeed be neutral. Yet it is not unusual to find values for synonymous
base substitution mutations at or close to zero. This is particularly
apparent with certain genes of the malaria parasite, Plasmodium
falciparum. Some interpret this as revealing a recent population
bottleneck - namely a shrinking of the population, the surviving members
of which become founders for a subsequent population expansion, but there
is a loss of population diversity.44 Thus, at the extreme,
existing species members are derived from one "Eve" of recent origin.
However, others propose that zero values for synonymous substitutions could result from high conservation of bases that, while not determining the nature of an encoded amino acid, do determine something else (unspecified secondary information). This violates the 'neutral' assumption, so that both the recent origin argument, and calculations based on ratios, could be invalid. Favouring this view, it has been shown that, more than most genomes, that of P. falciparum is sensitive to some of several non-classical selective factors, which affect third codon positions and collectively constitute the "genome phenotype."27,39
Further evidence for conservation of bases at synonymous sites derives from the uniformity of codon-bias among orthologous genes in vertebrates ("coincident codons"). This is attributed to selection acting coincidentally not only on proteins but also on RNA structure.26,33,36,45 Other evidence derives from vesicular stomatitis virus, a rapidly evolving RNA virus in which synonymous site conservation can be evaluated under defined conditions.28 Various virus strains evolve in parallel as they adapt to new conditions, and this is presumed to be primarily through non-synonymous base substitution mutations. But concurrent synonymous substitutions do not occur randomly as classical neutral theory would predict. The same synonymous substitutions are accepted independently in different evolving strains. Thus, synonymous substitutions may independently contribute to strain adaptation (perhaps by affecting RNA structure), and/or they may be secondary to primary adaptations at non-synonymous sites (or vice-versa; i.e. non-synonymous and synonymous substitution mutations are correlated). Indirect evidence suggests correlation. Here "the comparative method" can be helpful.
For each individual gene, rates of non-synonymous and synonymous base substitution mutations can be highly correlated, so that the rate of synonymous substitution does not constitute a gene-independent frame of reference.46 This is shown for orthologous genes of mouse and rat in Figure 1, which displays data of Wolfe and Sharp47 that have been replotted to emphasize the divergence of the lineages from a common ancestor. Each point corresponds to a gene and, since divergence increases with time, the X-axis (percentage DNA divergence) can be conceived as a time axis. Plotted against the DNA divergence for the open reading frame of each gene are the corresponding protein divergences (Fig. 1a), and the corresponding non-synonymous (dn) and synonymous (ds) divergences (Fig. 1b). Genes differ dramatically in divergence. While some proteins (bottom left of Fig. 1a) have not diverged at all, the corresponding DNA sequences have diverged a little. Some proteins (top right of Fig. 1a) have diverged more than 20% and the corresponding DNA sequences are also highly diverged. The two divergences are linearly related with an intercept on the X-axis that is significantly different from zero.
So it appears that, since the time of divergence of mice and rats from a common ancestor, some proteins (bottom left of Fig. 1a) have remained unchanged. Presumably organisms with mutations in these proteins have been negatively selected, so no mutations are found in modern organisms. On the other hand, organisms with synonymous mutations would not have been counter-selected so severely. Thus, for a gene whose protein is unchanged there is a DNA divergence of about 5.4% (intercept on X-axis), implying that some synonymous mutations have been accepted. This is shown in Figure 1b where, as expected, the plot for amino acid-changing mutations (dn) resembles that in Figure 1a, and the plot for synonymous mutations (ds) extrapolates back close to zero.
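The X-axis intercept of a fitted line is minus the Y-intercept divided by the slope. The sketch below shows the arithmetic with an ordinary least-squares fit. The points are invented for illustration, not the Wolfe and Sharp data, and are chosen to lie exactly on a line that crosses the X-axis at 5.4%; `x_intercept` is a hypothetical helper.

```python
def x_intercept(xs: list[float], ys: list[float]) -> float:
    """Fit y = slope * x + intercept by least squares; return x where y = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Invented points: protein divergence is zero at 5.4% DNA divergence.
dna_divergence = [5.4, 8.0, 12.0, 16.0, 20.0]
protein_divergence = [1.5 * (x - 5.4) for x in dna_divergence]
print(round(x_intercept(dna_divergence, protein_divergence), 2))
```

A non-zero X-intercept of this kind is what indicates that some DNA divergence accumulates before any protein divergence is seen.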
The linear relationships show that non-synonymous (amino acid-changing) mutations, and synonymous (non-amino acid-changing) mutations are correlated. If a gene has zero or a few non-synonymous mutations (i.e. it is likely to have been under negative selection), then it will have few synonymous mutations, and will display a low overall DNA divergence. If a gene has many non-synonymous mutations (i.e. it is likely to have been under some degree of positive selection), then it will also have many synonymous mutations, and will display a high overall DNA divergence. This is a general observation and is not confined to the mouse-rat divergence.30,46,48-50
Figure 2 shows replots of data from a recent study of the mouse-rat divergence by
Bazykin et al.,51 which involves a much larger number of
orthologues and a different method of calculating mutation rates (see also
the results of Makalowski and Boguski, and of Friedman and Hughes).52-53
While there is a greater scatter of points (partly due to the utilization
of rat sequences that were incompletely curated at the time of the
analysis), the results support those of Wolfe and Sharp.47 It
is likely that the well-documented tendency of ds values to
curve to the right when DNA divergences are high (e.g. Figs. 1b, 2b)
reflects synonymous site saturation for forward mutations.54 It
follows from this that the dn/ds ratio is positively
correlated with ds. Remarkably, this relationship was recently
described as "highly unexpected."30
Why is the synonymous DNA divergence low in genes with low protein divergences
and, despite probable site saturation, high in genes with high protein
divergences? It seems likely that, within the group of codons that encode
a particular protein, the demands of the conventional and genome
phenotypes interrelate.55 An accepted non-synonymous mutation
that primarily changes an amino acid often cannot help change one or more
aspects of the genome phenotype. This invokes (makes more acceptable)
secondary compensatory mutations, mainly synonymous, to correct this
change. By the same token, an accepted mutation that primarily changes the
genome phenotype may happen to be non-synonymous and so may also change
the conventional phenotype. This invokes further compensatory mutations,
mainly non-synonymous, to correct this change.56
For example, a primary mutation from the codon ACA to AAA, not only causes a lysine to be substituted for threonine, but also has the potential to marginally affect nucleic acid conformation (stem-loop potential), purine-loading, and GC%. A primary mutation from the codon AGC to AGG, while not affecting GC%, causes an arginine to be substituted for serine and has the potential to marginally affect purine-loading, RNY-pressure and nucleic acid conformation. On the other hand, a primary pressure to purine-load might change ACA to AAA, or AGC to AGG, thus secondarily causing proteins to encode lysine or arginine (i.e. purine-loading "calls the tune").35,38
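The two codon changes in this example can be checked mechanically. The sketch below tabulates, for a single codon, the encoded amino acid, the number of purines (a crude proxy for purine-loading), the number of G or C bases, and whether the codon fits the RNY pattern. The helper name `codon_profile` and its output keys are hypothetical, and stem-loop potential is not modelled, since it depends on sequence context rather than on one codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = set("AG")
PYRIMIDINES = set("CT")


def codon_profile(codon: str) -> dict:
    """Summarise primary and secondary information carried by one codon."""
    return {
        "amino_acid": CODON_TABLE[codon],
        "purines": sum(base in PURINES for base in codon),
        "gc": sum(base in "GC" for base in codon),
        # RNY: purine at position 1, any base at 2, pyrimidine at 3.
        "rny": codon[0] in PURINES and codon[2] in PYRIMIDINES,
    }


for before, after in [("ACA", "AAA"), ("AGC", "AGG")]:
    print(before, codon_profile(before), "->", after, codon_profile(after))
```

For ACA to AAA this reports threonine (T) to lysine (K), one more purine and one fewer G or C. For AGC to AGG it reports serine (S) to arginine (R), unchanged GC content, one more purine, and loss of the RNY pattern, in line with the text.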
A secondary increase in purine-loading following primary codon mutations from ACA to AAA, and from AGC to AGG, would favor (make propitious) the acceptance of local synonymous exchanges of pyrimidines for purines, thus restoring the original degree of purine-loading. Similarly, an increased content of basic amino acids secondary to mutations from ACA to AAA, and AGC to AGG, which had been primarily driven (made more acceptable) by a pressure to purine-load, would favour the acceptance of local non-synonymous compensatory mutations in the amino acid sequence.51
As overall DNA divergence increases (Figs. 1b, 2b), the plot for protein divergence (dn) increases rectilinearly, or curves upwards, whereas the plot for synonymous divergence (ds) curves to the right as points become more scattered. At high degrees of divergence (i.e. where genes, by definition, have evolved rapidly), in some genes the dn/ds ratio increases, but in others it does not. Furthermore, the ratio increase is due more to a decline in ds than to an increase in dn. At high degrees of divergence, synonymous mutations tend to be constrained in some genes but not in others. Whatever the cause of this, the designation by the ratio method of some genes as under positive selection, would here seem to depend, not on a rapid change in the conventional phenotype, but on a decreased change in the genome phenotype.
Conversely, from the extrapolation back towards zero divergence, it would seem that synonymous mutations played a greater role in the early stages of the divergence, in contrast to their decreased role in many genes in later stages of the divergence. This is consistent with the initiation of the speciation process predominantly involving changes in the genome phenotype rather than in the conventional phenotype. Such changes precede species establishment, which is likely to predominantly involve changes in the conventional phenotype rather than the genome phenotype.16-18,32,34,57 These considerations suggest that, while often correlating with dn, ds can have a life of its own. Its reliability as a frame-of-reference for dn remains problematic.
It may not be immediately apparent that the X-axes in Figures 1b and 2b (DNA
divergence) can be viewed as conceptual time axes. The following metaphor
may help. At cruising speed the efficiency of a vehicle's fuel
usage (kilometres/litre) is constant. However, during the initial acceleration
to cruising speed, efficiency is lower (i.e. fuel usage is greater).
Thus, a plot of distance travelled (kilometres) against fuel usage
(litres) would resemble the plots of dn against DNA divergence in Figures 1b and 2b.
Around the time of the divergence from an ancestral species (i.e. "acceleration"
of a new species "to cruising speed"), synonymous mutations
would have been differentially
accepted ("high fuel usage relative to distance travelled").
After the divergence ("attainment of cruising speed"),
synonymous mutations ("fuel usage")
and amino acid-changing mutations ("distance
travelled") would have been accepted proportionately (Fig. 3).
Synonymous mutations, being non-amino acid changing, are a useful indicator of
mutations occurring concomitantly both extragenically, and within introns.
These, in concert with intragenic synonymous mutations, would have
initiated the divergence process.17,18
Indeed, the base compositions of synonymous sites often being closely
correlated with those of neighbouring introns and extragenic DNA,58
it is likely that they are under similar evolutionary constraints.
Numerous authors cited in the Introduction have proclaimed what would be
the high virtues of a single-sequence method relative to the comparative
method. To this extent, they have attacked the comparative method. This
paper has drawn attention to the fact that a single-sequence
method, which does not appear to have the defects of the method proposed
by Plotkin et al.,1 has been available for a decade.
Thus, at least with respect to positive Darwinian selection, that "the
comparative method rules"2 is questionable. This should be
further investigated by comparing methods that require more than one
sequence, with methods that depend on conflict between different levels of
information within a single sequence. Claims of sovereignty will be
resolved by a balance-sheet of the advantages and disadvantages of the
different methods. However, with respect to Darwin's great
question, the comparative method provides evidence that the initial
divergence between rat and mouse lineages was driven by synonymous base
substitutions (i.e. differences in secondary genomic information). This is
a key prediction of a non-genic "chromosomal" model for the
origin of species,17-18,23,57,59 and should be further
investigated in other lineages.
University hosts my web-pages where full text versions of some of the
cited references may be found.
End Note 2007 (not in the published paper)
In April 2004 Nature, presumably after an exhaustive review process, published the paper of Plotkin and coworkers claiming a paradigm shift!
Shortly thereafter papers flooded in, both to Nature and elsewhere, criticizing the Plotkin approach. Given the great interest in a single sequence method one might have thought that my respectful reminder (the above paper) that a single sequence method had been "out there" for at least a decade, would have been favourably received. Not so (see Table below). Had the above paper been published expeditiously, you, dear reader, would have been able to access it in the early summer of 2005. As it is, various members of the biomedical research establishment, acting as reviewers, had privileged access to it for two years ahead of you! In this period they have probably declined many papers that you may not need to read - thus, through their sifting, they have done you a good turn. But, as the peer-review section of these web-pages shows - there is a flip-side to this. Some potentially important babies can be lost with the bathwater!
The reviewers' comments were barely cogent, and I will not burden you with them. Most annoying was the retrospective discovery that Trends in Genetics, while toying with my proposal (2 months) and then requesting that the paper be forced into a 2500 word straitjacket, was simultaneously reviewing and accepting a paper of Wyckoff, Malcom, Vallender and Lahn (University of Chicago) on positive Darwinian selection. These authors replotted the well-known tendency of ds values to curve to the right when DNA divergences are high (e.g. Figs. 1b, 2b) in such a way that the dn/ds ratio was shown to be positively correlated with ds. This was described as "a surprising finding" and "highly unexpected," for "neither classical theories nor previous studies predicted a strong positive correlation between -- [the Ka/Ks ratio] and Ks that is evident in our data." Of course, due to the decrease in relative numbers of synonymous mutations as divergence increases, it is obvious that the dn/ds ratio [the Ka/Ks ratio] increases and so is positively correlated with ds [Ks]. The authors also confused the correlation coefficient (r) with the coefficient of determination (r²). Through 2005 and early 2006, as more and more commentaries on the Plotkin paper appeared in the literature, my paper expanded in length, but the essential message remained unchanged.
Usually new results first appear in scientific papers and, only much later, in books. Due to the two year delay, the results presented in the present paper were available in my text Evolutionary Bioinformatics (released by Springer in Nov 2006) some months before the formal publication of the paper in the Journal of Biological Systems! Readers who are inclined to be cynical (of course I don't mean you) should note that I have long been an "advisor" to this journal, but I have no reason to believe that its reviewing procedures are not adequate.
I have long passed being concerned with "publish or perish." So, given all this trouble, why bother? The first reason is that if one thinks one is saying something important one should try (indeed one has a responsibility) to get it heard above the babble. Publication in a high-profile journal achieves this. Second, the reviews (albeit usually unhelpful) allow one to "touch base." If there were some frightful error there is a chance a reviewer would spot it (although the Wyckoff case appears to demonstrate the opposite). Third, it provides you, dear reader, with some assurance that it has passed an author-independent quality check.
However, the stone-walling experienced with the above paper is more the rule than the exception. I suspect I, like many others, will be increasingly driven to choose to place my works in an institutional depository (e.g. these web-pages) and then move on. Life is just too short! So readers might anticipate that, failing drastic system reforms, the body of original work in institutional depositories is likely to grow. Fortunately, key-word searches, etc., now allow you to recover from such sources what is relevant to your needs. Those who omit such searches may find themselves reinventing wheels.
Papers that appear supportive of the above thesis are becoming more evident in the mainstream literature. Thus, a paper from the Ellegren laboratory notes that genes that are candidates for being implicated in positive selection often "have an unexpectedly low number of synonymous substitutions compared with the genome background." So now it should be recognized "that inconsistencies in the behavior of dN/dS are to be expected" since "this behavior may be inherent to taking the ratio of two randomly distributed variables that are nonlinearly correlated."
Wolf, J. B. W. et al. (2009) Nonlinear dynamics of nonsynonymous (dN) and synonymous (dS) substitution rates affects inference of selection. Genome Biology & Evolution 1, 308-319.
Also, there is a convergence of the above bioinformatic analysis (see Section 9) with independent phylogenetic analysis. Venditti and Pagel (2008, 2009) postulate "accelerated rates of evolution following speciation" and hence deduce that an evolutionary line with more speciation events (nodes in the phylogenetic tree) might display an overall rate of evolution greater than an evolutionary line with fewer speciation events. Using data similar to the above, but plotting in terms of path-lengths and node number, they obtain evidence supporting this deduction. For this they assume a regular molecular "clock," so path lengths increase when the number of mutations separating two diverging lines increases. Thus: "Accelerated rates of evolution at the time of speciation are expected to leave a distinctive signature on a phylogenetic tree." They appear to be in agreement with the main theses of these web-pages:
(i) the primary nature of reproductive isolation in the speciation process,
However, they state that "previous studies have not made any serious attempt to determine whether the accelerated rates of change occur predominantly in neutral or coding sites or some combination of the two."
Venditti, C. & Pagel, M. (2008) Evolution by fits and starts. The Biologist 55, 140-146.
Venditti, C. & Pagel, M. (2009) Speciation as an active force in promoting genetic evolution. Trends in Ecology & Evolution 25, 14-20.
It should be noted that the data of Wolfe and Sharp (1993) showing that, early in the speciation process, there were changes in bases not critical for protein-encoding (here Figure 1), were previously plotted more schematically (Figure 4 of Forsdyke 1996). The present form of data presentation is perhaps more intuitive [indeed, some later referred to these as "Forsdyke plots." DRF 2020]. See also the end notes to my other papers on speciation.
The above noted observation of Wyckoff and his coworkers (2004) continued to be regarded as "highly unexpected" (Vallender & Lahn 2007) and "unexpected" (Stoletzki & Eyre-Walker 2011).
Vallender EJ, Lahn BT (2007) Uncovering the mutation-fixation correlation in short lineages. BMC Evolutionary Biology 7, 168.
Stoletzki N, Eyre-Walker A (2011) The positive correlation between dN/dS and dS in mammals is due to runs of adjacent substitutions. Molecular Biology & Evolution 28, 1371-1380.
Much of the work on the relationship between rate of evolution and nucleic acid structure, as reported in these webpages, has been confirmed by Park et al. (2013). For example, a major conclusion is that "amino acid substitution rate is negatively correlated with mRNA folding strength" or "amino acid substitutions are slower as the mRNA folding strength increases." The latter would be equivalent to "amino acid substitutions are faster as the mRNA folding strength decreases," which is basically the point I made in the 1990s (refs 11-14 above). However, while their data look good, their interpretations differ from mine. The following abstract on "Significance" will give the flavour of their study:
The possibility that mRNA has structure by default, because the encoding DNA needs to have structure, is not considered. Nevertheless it is a bold attempt at bringing to order numerous disparate observations. The subject also touches our work on X-chromosome dosage compensation, which is dealt with elsewhere on these pages.
Park C, Chen X, Yang J-R, Zhang J (2013) Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proceedings of the National Academy of Sciences USA In Press. doi/10.1073/pnas.1218066110
There was a follow-up paper from the Zhang laboratory (Yang et al. 2014). As it appeared in PLOS Biology, I was able to comment directly:
DNA structure trumps RNA structure
To an email request for clarification of the above comment, the following reply was sent:
This page was released in February 2007 and was last edited 13 Nov 2020 by Donald Forsdyke