Beginning in the late 1960s, Motoo Kimura overturned more than a century of “pan-selectionist” thinking in evolutionary biology by proposing what has come to be called The Neutral Theory of Molecular Evolution. The Neutral Theory in its basic form states that the dynamics of the majority of changes observed at the molecular level are governed by the force of Genetic Drift, rather than Darwinian (i.e. Positive) Natural Selection. As with all paradigm shifts in Science, there was much controversy over the Neutral Theory in its early years, but it has nevertheless firmly established itself as the null hypothesis for studies of evolution at the molecular level since the mid-1980s.
Despite its widespread adoption, over the last ten years or so there has been a worrying increase in abuse of terminology concerning the Neutral Theory, which I will collectively term here the “Neutral Sequence Fallacy” (inspired by T. Ryan Gregory’s Platypus Fallacy). The Neutral Sequence Fallacy arises when the distinct concepts of functional constraint and selective neutrality are conflated, leading to the mistaken description of functionally unconstrained sequences as being “Neutral”. The Fallacy, in short, is to assign the term Neutral to a particular biomolecular sequence.
The Neutral Sequence Fallacy now routinely causes problems in the fields of evolutionary and genome biology, both by generating conceptual muddles and by shifting the goalposts needed to reject the null model of sequence evolution. I have intended to write about this problem for years in order to put a halt to this growing abuse of Neutral terminology, but never found the time. However, this issue has reared its head more strongly in the last few days, with new forms of the Neutral Sequence Fallacy arising in discussions about the ENCODE project, motivating a rough version of this critique to finally see the light of day. Here I will try to sketch out the origins of the Neutral Sequence Fallacy, both in its original pre-genomic form that Kimura debunked during his lifetime and in its modern post-genomic form that has proliferated unchecked since the early comparative genomic era.
The Neutral Sequence Fallacy draws on several misconceptions about the Neutral Theory, and begins with the abbreviation of the theory’s name from its full form (The Neutral Mutation – Random Drift Hypothesis) to its colloquial form (The Neutral Theory). This abbreviation de-emphasizes that the concept of selective neutrality applies to mutations (i.e. variants, alleles), not biomolecular sequences (i.e. regions of the genome, proteins). Simply put, only variants of a sequence can be neutral or non-neutral, not sequences themselves.
The key misconception that permits the Neutral Sequence Fallacy to flourish is the incorrect notion that a neutrally evolving sequence must lack functional constraint, and vice versa. Other ways to state this misconception are: “a sequence is Neutral if it is under no selective constraint” or, conversely, “selective constraint rejects Neutrality”. This misconception arose originally in the 1970s, shortly after the proposal of The Neutral Theory, when many researchers were first coming to terms with what the theory meant. It became prevalent enough that it was the first misconception to be addressed head-on by Kimura (1983), nearly 30 years ago, in section 3.6 of his book The Neutral Theory of Molecular Evolution, entitled “On some misunderstandings and criticisms” (emphasis is mine):
Since a number of criticisms and comments have been made regarding my neutral theory, often based on misunderstandings, I would like to take this opportunity to discuss some of them. The neutral theory by no means claims that the genes involved are functionless as mistakenly suggested by Zuckerkandl (1978). They may or may not be, but what the neutral theory assumes is that the mutant forms of each gene participating in molecular evolution are selectively nearly equivalent, that is, they can do the job equally well in terms of survival and reproduction of the individual. (p. 50)
As pointed out by Kimura and Ohta (1977), functional constraints are consistent with neutral substitutions within a class of mutants. For example, if a group of amino acids are constrained to be hydrophilic, there can be random changes within the codons producing such amino acids…There is, of course, negative selection against hydrophobic mutants in this region, but, as mentioned before, negative selection does not contradict the neutral theory. (p. 53)
It is understandable how this misconception arises, because in the limit of zero functional constraint (e.g. in a non-functional pseudogene), all alleles become effectively equivalent to one another and are therefore selectively neutral. However, this does not mean that an unconstrained sequence is Neutral (unless we redefine the meaning of Neutrality, see below): a sequence itself cannot be Neutral; only variants of a sequence can be Neutral with respect to each other.
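Kimura’s own bookkeeping makes the same point. The compact restatement below is given for illustration, not as a transcription of any particular equation in Kimura (1983). It uses k for the substitution rate, v_T for the total mutation rate (both per site per unit time) and f_0 for the fraction of new mutations that are selectively neutral; the remaining 1 - f_0 are assumed to be deleterious and removed by negative selection, with advantageous mutations treated as negligibly rare:

```latex
\begin{aligned}
k &= v_T \, f_0, && 0 \le f_0 \le 1 \\
f_0 = 1 \;&\Rightarrow\; k = v_T && \text{(no functional constraint, e.g. a non-functional pseudogene)} \\
0 < f_0 < 1 \;&\Rightarrow\; k < v_T && \text{(constraint: fewer substitutions, but each one is neutral)}
\end{aligned}
```

Functional constraint only shrinks f_0, and so lowers k; under these assumptions it does not change the fact that every substitution that does occur was fixed by drift.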
It is crucial in this context to understand that the Neutral Theory accommodates all levels of selective constraint, and that sequences under selective constraint can evolve Neutrally (see the formal statement of this in Equation 5.1 of Kimura 1983). This point is often lost on people. Until you get this, you don’t understand the Neutral Theory. A simple example shows how this is true. Consider a single codon in a protein-coding region that codes for an amino acid with several synonymous codons. Deletion of the third codon position would create a frameshift, and thus a third-position “silent” site is indeed functional. However, alternative codons for this amino acid are functionally equivalent and evolve (close to) neutrally. The fact that these alternative alleles evolve neutrally has to do with their equivalence of function, not the degree of their functional constraint.
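The codon example can be made concrete with a few lines of Python. This is a minimal sketch that uses the standard genetic code (NCBI translation table 1) and a short coding sequence invented purely for illustration; the helper names translate, substitute and delete are likewise hypothetical. It checks that all four alleles at the third position of a glycine codon encode the same protein, while deleting that very position garbles everything downstream.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# first position varying slowest, third position fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with a point substitution at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, pos: int) -> str:
    """Return seq with a single-base deletion at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


# Hypothetical toy coding sequence: ATG GGT AAA TGG CAT -> M G K W H.
# Codon 2 (GGT) encodes glycine, which is fourfold degenerate at the third position.
REFERENCE = "ATGGGTAAATGGCAT"
SITE = 5  # 0-based index of the third position of codon 2

reference_protein = translate(REFERENCE)

# Every allele at the "silent" site yields the same protein: the alleles are
# functionally equivalent, so changes among them are expected to be (close to) neutral.
alleles = {base: translate(substitute(REFERENCE, SITE, base)) for base in "ACGT"}

# Deleting the same site shifts the reading frame downstream: the site is
# functionally constrained even though its point-mutation alleles are equivalent.
frameshift_protein = translate(delete(REFERENCE, SITE))

print(reference_protein)   # MGKWH
print(alleles)             # {'A': 'MGKWH', 'C': 'MGKWH', 'G': 'MGKWH', 'T': 'MGKWH'}
print(frameshift_protein)  # MGNG
```

Treating the four alleles as exactly equivalent ignores effects such as codon usage bias, which is why synonymous alleles are said to evolve close to, rather than exactly, neutrally.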
To demonstrate the Neutral Sequence Fallacy, I’d like to point out a few clear examples of this misconception in action. The majority of transgressions in this area come from the genomics community, where people may not have been formally trained in evolution, but I am sad to say that an increasing number of evolutionary biologists are also falling victim to the Neutral Sequence Fallacy these days. My reckoning is that the Neutral Sequence Fallacy gained traction again in the post-genomic era around the time of the mouse genome paper by Waterston et al. (2002). In this widely read paper, putatively unconstrained ancestral repeats were referred to (incorrectly) as “neutrally evolving DNA” and used to estimate the fraction of the human genome under selective constraint. This analysis culminated with the following question: “How can we cleanly separate neutral and selected sequences?”. Under the Neutral Theory, this question makes no sense. First, sequences cannot be neutral; and second, the framework used to detect functional constraints by comparative genomics assumes Neutral evolution of both classes of sites (unconstrained and constrained), i.e. most changes between species are driven by Genetic Drift, not Positive Selection. The proper formulation of this question should have been: “How can we cleanly separate unconstrained and constrained sequences?”.
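The logic of that comparative framework can be sketched numerically. What follows is a deliberately simplified sketch, not the method of Waterston et al. (2002): it assumes that a test region and a putatively unconstrained reference region (such as ancestral repeats) share the same mutation rate and divergence time, it uses raw substitution counts with no correction for multiple hits, and the counts are hypothetical. Under the neutral model, each region accumulates substitutions at its mutation rate multiplied by the fraction of its mutations that are neutral, so the ratio of the two rates estimates that fraction for the test region. The function name constraint_estimate is hypothetical.

```python
def constraint_estimate(test_subs: int, test_sites: int,
                        ref_subs: int, ref_sites: int) -> float:
    """Estimate the fraction of mutations removed by purifying selection in a
    test region, relative to a putatively unconstrained reference region.

    Assumes equal mutation rates and divergence times in both regions, and
    that the substitutions which do occur in BOTH regions are neutral (fixed
    by drift). Constraint is estimated; neutrality is assumed, not tested.
    """
    if min(test_sites, ref_sites, ref_subs) <= 0:
        raise ValueError("site counts and reference substitutions must be positive")
    k_test = test_subs / test_sites  # substitutions per site in the test region
    k_ref = ref_subs / ref_sites     # substitutions per site in the reference
    f0 = k_test / k_ref              # estimated fraction of mutations that are neutral
    return 1.0 - f0                  # estimated fraction removed by purifying selection


# Hypothetical counts for illustration only (not data from any study).
print(f"{constraint_estimate(900, 10_000, 3_000, 10_000):.2f}")  # 0.70
```

A value above zero is evidence of constraint, not evidence against neutrality, since the calculation itself assumes both regions evolve neutrally. A negative value (test region evolving faster than the reference) would suggest that the assumptions, such as equal mutation rates or the absence of positive selection, do not hold.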
Here is another clear example of the Neutral Sequence Fallacy in action from Lunter et al. (2006):
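Here are a couple more examples of the Neutral Sequence Fallacy in action, right in the titles of fairly high-profile comparative genomics papers:
Elnitski et al. (2003): “Distinguishing regulatory DNA from neutral sites”
Chin et al. (2005): “Genome-wide regulatory complexity in yeast promoters: separation of functionally conserved and neutral sequence”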
I don’t mean to single these papers out; they just happen to represent very clear examples of the Neutral Sequence Fallacy in action. In fact, the Lunter et al. (2006) paper is one of my all-time favorites, but it bugs the hell out of me when I have to unpick students’ misconceptions after they read it. Frustratingly, the list of papers repeating the Neutral Sequence Fallacy is long and growing. I have recently started to collect them in a CiteULike library to provide examples that help students understand how not to make this common mistake. (If anyone else would like to contribute to this effort, please let me know; there is much work to be done to reverse this trend.)
So what’s the big deal here? Some would argue that these authors actually know what they are talking about, but just happen to be using the wrong terminology. I wish that this were the case, but very often it is not. In many of the papers I read or review that perpetrate the Neutral Sequence Fallacy, I find further examples of seriously flawed evolutionary reasoning, suggesting that the authors do not have a deep understanding of the issues at hand. In fact, the Neutral Sequence Fallacy is usually a clear sign that the authors of a paper are practicing population genetics or molecular evolution without a license. This is a Neutral Sequence Fallacy of the 1st Kind, in which authors do not understand the difference between the concepts of functional constraint and selective neutrality. The problems for the Neutral Theory caused by violations of the 1st Kind are deep and clear. Because the Neutral Theory is not fully understood, it is possible to construct a straw-man version of the null hypothesis of Neutrality that can easily be “rejected” simply by finding evidence of selective constraint. Furthermore, because selectively unconstrained sequences are asserted (incorrectly) to be “Neutral” without their mode of evolution ever being evaluated, this conceptual error undermines the entire value of the Neutral Theory as a null hypothesis testing framework.
But some authors really do know the difference between these ideas, and just happen to be using the term “Neutral” as shorthand for the term “Unconstrained.” Increasingly, I see respected peers who are card-carrying molecular evolutionists, and who do know their stuff, making this mistake in print. In these cases what is happening is a Neutral Sequence Fallacy of the 2nd Kind: understanding the difference between functional constraint and selective neutrality, but using lazy terminology that confuses these ideas in print. This is most often found in studies of noncoding DNA where, in the absence of the genetic code to conveniently constrain terminology, people use terms like “neutral standard”, “neutral region”, “neutral sites” or “neutral proxy” in place of “putatively unconstrained”. While experts in molecular evolution can (I hope) parse violations of the 2nd Kind correctly, this sloppy language causes substantial confusion about the Neutral Theory among students and non-evolutionary biologists who are new to the field, and leads to whole swathes of subsequent violations of the 1st Kind. Moreover, defining sequences as Neutral serves those with an Adaptationist agenda: since a control region is defined as being Neutral, all mutations that occur in that region must therefore be neutral as well, and thus any potential complications arising from non-neutral mutations in one’s control region are conveniently swept under the carpet. Violations of the 2nd Kind are often quite insidious, since they are generally perpetrated by people with some authority in evolutionary biology, who are often unaware of their misuse of terminology and who will vigorously deny that they are using terms that perpetuate a classical misconception laid to rest by Kimura 30 years ago.
Which brings us to the most recent incarnation of the Neutral Sequence Fallacy in the context of the ENCODE project. In a companion post explaining the main findings of the ENCODE Project, Ewan Birney describes how the ENCODE Project reinforced recent findings that many highly reproducible biochemical events occur on the genome but have no known function. In describing these events, Birney states:
I really hate the phrase “biological noise” in this context. I would argue that “biologically neutral” is the better term, expressing that there are totally reproducible, cell-type-specific biochemical events that natural selection does not care about. This is similar to the neutral theory of amino acid evolution, which suggests that most amino acid changes are not selected either for or against…Whichever term you use, we can agree that some of these events are “neutral” and are not relevant for evolution.
Under the standard view of the Neutral Theory, Birney misuses the term “Neutral” here to mean lack of functional constraint, repeating the classical form of the Neutral Sequence Fallacy. Because of this, I argue that Birney’s proposed terminology should be rejected, since it will perpetuate a classic misconception in Biology. Instead, I propose the term “biologically inert”.
But wait a minute, you say, this is actually a transgression of the 2nd Kind. Really what is going on here is a matter of semantics. Birney knows the difference between functional constraint and selective neutrality. He is just formalizing the creeping misuse of the term Neutral to mean “Nonfunctional” that has been happening over the last decade. If so, then I argue he is proposing to assign to the term Neutral the primary misconception of the Neutral Theory previously debunked by Kimura. This is a very dangerous proposal, since it will lead to further confusion in genomics arising from the “overloading” of the term Neutral (Kimura’s meaning: selectively equivalent; Birney’s meaning: no functional constraint). This muddle will subsequently prevent most scientists from properly understanding the Neutral Theory, and lead to many further examples of the Neutral Sequence Fallacy of both Kinds.
In my view, semantic switches like this are dangerous in Science, since they massively hinder communication and, therefore, progress. Semantic switches also distort understanding of key concepts in science. A famous case in point is Watson’s semantic switch of Crick’s term “Central Dogma”, which corrupted Crick’s beautifully crafted original concept into the watered-down textbook misinterpretation that is most often repeated: “DNA makes RNA makes protein” (see Larry Moran’s blog for more on this). Some may say this is the great thing about language: the same word can mean different things to different people. This view is best characterized in the immortal words of Humpty Dumpty in Lewis Carroll’s Through the Looking-Glass:
Others, including myself, disagree and prefer to have fixed definitions for scientific terms.
In a second recent case of the Neutral Sequence Fallacy creeping into discussions in the context of ENCODE, Michael Eisen proposes that we develop “A neutral theory of molecular function” to interpret the meaning of these reproducible biochemical events that have no known function. Inspired by the new null hypothesis in evolutionary biology ushered in by the Neutral Theory, Eisen calls for a new “neutral null hypothesis” that requires molecular functions to be proven, not assumed. I laud any attempt to promote the use of null models for hypothesis testing in molecular biology, and wholeheartedly agree with Eisen’s main message about the need for a null model for molecular function.
But I disagree with Eisen’s proposal for a “neutral null hypothesis”, which, from my reading of his piece, directly couples the null hypothesis for function with the null hypothesis for sequence evolution. By synonymizing the H0 of the functional model with the H0 of the evolutionary model, regions of the genome that fail to reject the null functional model (i.e. have no functional constraint) will then be conflated with “being Neutral” (incorrect) or evolving neutrally (potentially correct), whereas those regions that reject the null functional model will be immediately considered as evolving non-neutrally (which may not always be the case, since functional regions can evolve neutrally). While I assume this is not what Eisen intends, this is almost inevitably the outcome of suggesting a “neutral null hypothesis” in the context of biomolecular sequences. A “neutral null hypothesis for molecular function” makes it all too easy to merge the concepts of functional constraint and selective neutrality, which will inevitably lead many to the Neutral Sequence Fallacy. As Kimura does, Eisen should formally decouple the concept of functional constraint on a sequence from the mode of evolution by which that sequence evolves. Eisen should instead be promoting “A null model of molecular function” that cleanly separates the concepts of function and evolution (an example of such a null model is embodied in Sean Eddy’s Random Genome Project). If not, I fear this conflation of concepts, like Birney’s semantic switch, will lead to more examples of the Neutral Sequence Fallacy of both Kinds.
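The idea of a null model for molecular function that says nothing about mode of evolution can be illustrated with a toy calculation in the spirit of the Random Genome Project (this is not a description of Eddy’s actual proposal): how many matches to a short, binding-site-like motif should we expect by chance in random DNA that, by construction, has no function? The sketch assumes independent bases at equal frequencies; the 6-bp motif, sequence length, random seed and function names are all hypothetical choices made for illustration.

```python
import random


def count_matches(seq: str, motif: str) -> int:
    """Count exact occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_matches(length: int, motif: str) -> float:
    """Expected matches in random DNA with independent, equiprobable bases."""
    k = len(motif)
    return (length - k + 1) * 0.25 ** k


MOTIF = "GATAAC"   # hypothetical 6-bp motif, chosen only for illustration
LENGTH = 1_000_000  # one megabase of function-free sequence

rng = random.Random(42)  # fixed seed so the sketch is reproducible
random_dna = "".join(rng.choices("ACGT", k=LENGTH))

observed = count_matches(random_dna, MOTIF)
expected = expected_matches(LENGTH, MOTIF)
print(f"observed {observed}, expected {expected:.1f}")
```

Under these assumptions roughly 244 chance matches per megabase are expected, so a reproducible biochemical event at such a site is not by itself evidence of function, and says nothing at all about whether variants at that site are selectively neutral with respect to one another.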
The Neutral Sequence Fallacy shares many sociological similarities with the chronic misuse and misconceptions of the concept of Homology. As discussed by Marabotti and Facchiano in their article “When it comes to homology, bad habits die hard”, there was a peak of misuse of the term Homology in the mid-1980s, which led to a backlash of many publications demanding more rigorous use of the term. Despite this backlash and the best efforts of many scientists to stem the tide of misuse, ~43% of abstracts surveyed in 2007 used Homology incorrectly, down from 51% in 1986 before the assault on its misuse began. As anyone teaching the concept knows, unpicking misconceptions about Homology vs. Similarity is crucial for getting students to understand evolutionary theory. I argue that the same is true for the distinction between Functional Constraint and Selective Neutrality. When it comes to Functional Constraints on biomolecular sequences, our choice of terminology should be anything but Neutral.
Chin CS, Chuang JH, & Li H (2005). Genome-wide regulatory complexity in yeast promoters: separation of functionally conserved and neutral sequence. Genome Research, 15 (2), 205-13 PMID: 15653830
Elnitski L, Hardison RC, Li J, Yang S, Kolbe D, Eswara P, O’Connor MJ, Schwartz S, Miller W, & Chiaromonte F (2003). Distinguishing regulatory DNA from neutral sites. Genome Research, 13 (1), 64-72 PMID: 12529307
Lunter G, Ponting CP, & Hein J (2006). Genome-wide identification of human functional DNA using a neutral indel model. PLoS Computational Biology, 2 (1) PMID: 16410828
Marabotti A, & Facchiano A (2009). When it comes to homology, bad habits die hard. Trends in Biochemical Sciences, 34 (3), 98-9 PMID: 19181528
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O’Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, & Lander ES (2002). Initial sequencing and comparative analysis of the mouse genome. Nature, 420 (6915), 520-62 PMID: 12466850
Thanks to Chip Aquadro for originally pointing out to me when I perpetrated the Neutral Sequence Fallacy (of the 1st Kind!) during a journal club as an undergraduate in his lab. I can distinctly recall the hot embarrassment of the moment while being schooled in this important issue by a master. Thanks also to Alan Moses, who was the first of many people I converted to the light on this issue, and who has since encouraged me to write this up for a wider audience. Thanks also to Douda Bensasson for putting up with me ranting about this issue for years, and for helpful comments on this post.