On the 30th Anniversary of DNA Sequencing in Population Genetics

30 years ago today, the “struggle to measure genetic variation” in natural populations was finally won. In a paper entitled “Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster” (published on 4 Aug 1983), Martin Kreitman reported the first effort to use DNA sequencing to study genetic variation at the ultimate level of resolution possible. Kreitman (1983) was instantly recognized as a major advance and became a textbook example in population genetics by the end of the 1980s. John Gillespie refers to this paper as “a milestone in evolutionary genetics”. Jeff Powell, in his brief history of molecular population genetics, goes so far as to say “It would be difficult to overestimate the importance of this paper”.

Arguably, the importance of Kreitman (1983) is greater now than ever, in that it provides both the technical and conceptual foundations for the modern gold rush in population genomics, including important global initiatives such as the 1000 Genomes Project. However, I suspect this paper is less well known to the increasing number of researchers who have come to studying molecular variation by routes other than a training in population genetics. For those not familiar with this landmark paper, it is worth taking the time to read it or Nathan Pearson’s excellent summary over on Genomena.

As with other landmark scientific efforts, I am intrigued by how such projects and papers come together. Powell’s “brief history” describes how Kreitman arrived at using DNA to study variation in Adh, including some direct quotes from Kreitman (p. 145). However, this account leaves out an interesting story about the publication of this paper that I had heard bits and pieces of over time. Hard as it may be to imagine in today’s post-genomic sequence-everything world, using DNA sequencing to study genetic variation in natural populations was not immediately recognized as being of fundamental importance, at least by the editors of Nature where it was ultimately published.

To better understand the events of the publication of this work, I recently asked Richard Lewontin, Kreitman’s PhD supervisor, to provide his recollections on this project and the paper. Here is what he had to say by email (12 July 2013):

Dear Casey Bergman

I am delighted that you are commemorating Marty’s 1983 paper that changed the whole face of experimental population genetics. The story of the paper is as follows.

It was always the policy in our lab group that graduate students invented their own theses. My view was (and still is) that someone who cannot come up with an idea for a research program and a plan for carrying it out should not be a graduate student. Marty is a wonderful example of what a graduate student can do without being told what to do by his or her professor. Marty came to us from a zoology background and one day not very long after he became a member of the group he came to me and asked how I would feel about his investigating the genetic variation in Drosophila populations by looking at DNA sequence variation rather than the usual molecular method of looking at proteins which then occupied our lab. My sole contribution to Marty’s proposal was to say “It sounds like a great idea.”  I had never thought of the idea before but it became immediately obvious to me that it was a marvelous idea.  So Marty went over on his own initiative, to Wally Gilbert’s lab and learned all the methodology from George Church who was then in the Gilbert lab.

After Marty’s work was finished and he was to get his degree, he wrote a paper based on his thesis and, with my encouragement, sent the paper to Nature. He offered to make me a co-author, but I refused on long-standing principle. Since the idea and the work were entirely his, he was the sole author, a policy that was general in our group. I had no doubt that it was the most important work done in experimental population genetics  in many years and Nature was an obvious choice for this pathbreaking work.

The paper was soon returned by the Editor saying that they were not interested because they already had so many papers that gave the DNA sequence of various genes that they really did not want yet another one! Obviously they missed the point. My immediate reaction was to have Marty send the paper to a leading influential British Drosophila geneticist who would obviously understand its importance, asking him to retransmit the paper to Nature with his recommendation. He did so and the Editor of Nature then accepted it for publication. The rest is history.

Our own lab very quickly converted from protein electrophoresis to DNA sequencing, and I spent a lot of time using and updating the computer interface with the gel reading process, starting from  Marty’s original programs for reading gels and outputting sequences. We never went back to protein electrophoresis. While protein gel electrophoresis certainly revolutionized population genetics, Marty’s introduction of DNA sequencing as the method for evolutionary genetic investigation of population genetic issues was a much more powerful one and made possible the testing of a  variety of evolutionary questions for which protein gel electrophoresis was inadequate. Marty deserves to be considered as one of the major developers of evolutionary and population genetics studies.

Yours ,

Dick Lewontin

Some may argue that Kreitman (1983) did not reveal all forms of genetic variation at the molecular level (e.g. large-scale structural variants) and therefore does not truly represent the “end” of the struggle to measure variation. What is clear, however, is that Kreitman (1983) does indeed represent the beginning of the “struggle to interpret genetic variation” at the fundamental genetic level, a struggle that may ultimately take longer than measuring variation itself. According to Maynard Olson, interpreting (human) genomic variation will be a multi-generational effort “like building the European cathedrals”. 30 years in, Olson’s assessment is proving to be remarkably accurate. Here’s to Kreitman (1983) for laying the first stone!

Calvin Bridges, Automotive Pioneer

Calvin Bridges in 1935 (Photo Credit: Smithsonian Institution Collections SIA Acc. 90-105 [SIA2008-0022])

Calvin Bridges (1889-1938) is perhaps best known as one of the original Drosophila geneticists in the world. As an original member of Thomas Hunt Morgan’s Fly Room at Columbia University, Bridges made fundamental contributions to classical genetics, notably contributing the first paper ever published in the journal Genetics. The historical record on Bridges is scant, since Morgan and Alfred Sturtevant destroyed Bridges’ papers after his death to preserve the name of their dear friend whose politics and attitudes to free love were radical in many ways. Morgan’s biographical memoir of Bridges presented to the National Academy of Sciences in 1940 contains very little detail on Bridges’ life, and this historical black hole has piqued my curiosity for some time.

Recently, I stumbled across a listing in the New York Times for an exhibit in Brooklyn recreating the original Columbia Fly Room, which will be used as a set in an upcoming film of the same name directed by Alexis Gambis. Gambis’ film approaches the Fly Room from the perspective of a visit to the lab by one of Bridges’ children, Betsy Bridges. I recommend that other Drosophila enthusiasts check out The Fly Room website and follow @theflyroom and @alexisgambis on Twitter for updates about the project.

In digging around more about this project, I found a link to the Kickstarter page that was used to raise funds for the film. This page includes an amazing story about Bridges that I had never heard about previously. Apparently, after Morgan and his group moved to Caltech in 1928, Bridges built from scratch a futuristic car of his own design called “The Lightning Bug”. This initially came as a big surprise to me, but on reflection it is in keeping with Bridges’ role as the main technical innovator for the original Drosophila group. For example, Bridges introduced the binocular dissecting scope, the etherizer, the controlled temperature incubator, and agar-based fly food into the Drosophilist’s toolkit.

Here is a clipping from Modern Mechanix from Aug 1936 describing the Lightning Bug:

Coverage of Calvin Bridges’ Lightning Bug in Modern Mechanix (Aug 1936).

Bridges’ Lightning Bug was notable enough to be written up in Time Magazine in May 1936, which described his car as follows:

It is almost perfectly streamlined, even the license plates and tail-lamp being recessed into the body and covered with Pyralin windows flush with the streamlining. There are no door handles; the doors must be opened with special keys. Dr. Bridges pronounced the Lightning Bug crash-proof and carbon-monoxide-proof. “My whole aim,” said he, “was to show what could be done to attain safety, economy and roadability in a small car.”

Newshawks discovered that for months, when he got tired of looking at fruit flies, the geneticist had retired to a garage, put on a greasy jumper and worked on his car far into the night, hammering, welding, machining parts on a lathe. Now & then, the foreman reported, Dr. Bridges hit his thumb with a hammer. Once he had to visit a hospital to have removed some tiny bits of steel which flew into his eyes. It was Calvin Bridges’ splendid eyesight which first attracted Dr. Morgan’s interest in him when Bridges was a shaggy, enthusiastic student at Columbia.

Calvin Bridges next to the Lightning Bug (Time Magazine, 4 May 1936).

Gambis has also posted a video, taken by Pathé News, of the Lightning Bug being driven by Bridges. Gambis estimates this clip was from around 1938, but it is probably from 1936/7: Bridges died in Dec 1938 and was already terminally ill by the time Ed Novitski started graduate school at Caltech in the autumn of 1938, yet he appears fit in this clip. The clip clearly shows that the design of Bridges’ Lightning Bug was years ahead of its time in comparison to the other cars in the background. I would also wager this is the only video footage of Calvin Bridges in existence.

The only other information I could find on the web about the Lightning Bug was a small news clipping that was making the rounds in local news in April/May 1936:


Interestingly, the only mention I can find of this story in historical accounts of the Drosophila group is one parenthetical note by Shine and Wrobel in their 1976 biography of Morgan that had previously escaped my notice. On page 120, they discuss how Morgan handled the receipt of his 1933 Nobel Prize in Physiology or Medicine (emphasis mine):

…Morgan was very modest about the honor. He frequently pointed out that it was a tribute to experimental biology than to any one man….As Morgan acknowledged the joint nature of the work, he divided the tax-free $40,000 award equally among his own children and those of Bridges and Sturtevant (but not of Muller’s). He gave no reason; in the letter to Sturtevant for example, he said merely I’m enclosing some money for your children. (Bridges, however, is said to have used his to build a new car.)

So there you have it: Calvin Bridges, Drosophila geneticist, was also an unsung automotive pioneer whose foray into designing futuristic cars was likely funded in part by the proceeds of the 1933 Nobel Prize!

Twitter Tips for Scientific Journals

The growing influence of social media in the lives of Scientists has come to the forefront again recently with a couple of new papers that provide An Introduction to Social Media for Scientists and a more focussed discussion of The Role of Twitter in the Life Cycle of a Scientific Publication. Bringing these discussions into traditional journal article format is important for spreading the word about social media in Science outside the echo chamber of social media itself. But perhaps more important, in my view, is that these motivating papers reflect a desire among Scientists to participate, and to urge others to participate, in shaping a new space for scientific exchange in the 21st century.

Just as Scientists themselves are adopting social media, many scientific journals/magazines are as well. However, most discussions about the role of social media in scientific exchange overlook the issue of how we Scientists believe traditional media outlets, like scientific journals, should engage in this new forum. For example, in the Darling et al. paper on The Role of Twitter in the Life Cycle of a Scientific Publication, little is said about the role of journal Twitter accounts in the life cycle of publications beyond noting:

…to encourage fruitful post-publication critique and interactions, scientific journals could appoint dedicated online tweet editors who can storify and post tweets related to their papers.

This oversight is particularly noteworthy for several reasons. First, it is a fact that many journals, and journal staff, play active roles in engaging with the scientific debate on social media and are not simply passive players in the new scientific landscape. Second, Scientists need to be aware that journals extensively monitor our discussions and activity on social media in ways that were not previously possible, and we need to consider how this affects the future of scientific publishing. Third, Scientists should see that social media represents an opportunity to establish new working relationships with journals that break down the old models that increasingly seem to harm both Science and Scientists.

In the same way that we Scientists are offering tips/advice to each other for how to participate in the new media, I feel that this conversation should also be extended to what we feel are best practices for journals to engage in the scientific process through social media. To kick this off, I’d like to list some do’s and don’ts for how I think journals should handle their presence on Twitter, based on my experiences following, watching and interacting with journals on Twitter over the last couple of years.

  • Do engage with (and have a presence on) social media. Twitter is rapidly being taken up by scientists, and is the perfect forum to quickly transmit/receive information to/from your author pool/readership. In fact, I find it a little strange if a journal doesn’t have a Twitter account these days.
  • Do establish a social media policy for your official Twitter account. Better yet, make it public, so Scientists know the scope of what we should expect from your account.
  • Don’t use information from Twitter to influence editorial or production processes, such as the acceptance/rejection of papers or choice of reviewers.  This should be an explicit part of your social media policy. Information on social media could be incorrect and by using unverified information from Twitter you could allow competitors/allies to block/promote each other’s work.
  • Don’t use a journal Twitter account as a table of contents for your journal. Email TOCs or RSS feeds exist for this purpose already.
  • Do tweet highlights from your journal or other journals. This is actually what I am looking for in a journal Twitter account, just as I am from the accounts of other Scientists.
  • Do use journal accounts to retweet unmodified comments from Scientists or other media outlets about papers in your journal. This is a good way for Scientists to find other researchers interested in a topic and know what is being said about work in your journal. But leave the original tweet intact, so we can trace it to the originator and so it doesn’t look like you have edited the sentiment to suit your interests.
  • Don’t use journal accounts to express personal opinions. I find it totally inappropriate that individuals at some journals hide behind the journal name and avatar to use journal Twitter accounts as a soapbox to express their personal opinions. This is a really dangerous thing for a journal to do since it reinforces stereotypes about the fickleness of editors that love to wield the power that their journal provides them. It’s also a bad idea since the opinions of one or a few people may unintentionally affect a journal or publisher.
  • Do encourage your staff to create personal accounts and be active on social media. Editors and other journal staff should be encouraged to express personal opinions about science, tweet their own highlights, etc. This is a great way for Scientists to get to know your staff (for better or worse) and build an opinion about who is handling our work at your journal. But it should go without saying that personal opinions should be made through personal accounts, so we can follow/unfollow these people like any other member of the community and so their opinions do not leverage the imprimatur of your journal.
  • Do use journal Twitter accounts to respond to feedback/complaints/queries. Directly replying to comments from the community on Twitter is a great way to build trust in your journal.  If you can’t or don’t want to reply to a query in the open, just reply by asking the person to email your helpdesk. Either way shows good faith that you are listening to our concerns and want to engage. Ignoring comments from Scientists is bad PR and can allow issues to amplify beyond your control, with possible negative impacts on your journal (image) in the long run.
  • Don’t use journal Twitter accounts to tweet from meetings. To me this is a form of expressing personal opinion that looks like you are endorsing certain Scientists/fields/meetings or, worse yet, that you are looking to solicit them to submit their work to your journal, which smacks of desperation and favoritism. Use personal accounts instead to tweet from meetings, since after all what is reported is a personal assessment.

These are just my first thoughts on this issue (anonymised to protect the guilty), which I hope will act as a springboard for others to comment below on how they think journals should manage their presence on Twitter for the benefit of the Scientific community.

Launch of the PLOS Text Mining Collection

Just a quick post to announce that the PLOS Text Mining Collection is now live!

This PLOS Collection arose out of a Twitter conversation with Theo Bloom last year, and has come together through the hard work of the authors of the papers in the Collection, the PLOS Collections team (in particular Sam Moore and Jennifer Horsely), and my co-organizers Larry Hunter and Andrey Rzhetsky. Many thanks to all for seeing this effort to completion.

Because of the large body of work in the area of text mining published in PLOS, we struggled with how best to present all these papers in the collection without diluting the experience for the reader. In the end, we decided only to highlight new work from the last two years and major reviews/tutorials at the time of launch. However, as this is a living collection, new articles will be included in the future, and the aim is to include previously published work as well. We hope to see many more papers in the area of text mining published in the PLOS family of journals in the future.

An overview of the PLOS Text Mining Collection is below (cross-posted at the PLOS EveryONE blog) and a commentary on the Collection is available at the Official PLOS Blog entitled “A mine of information – the PLOS Text Mining Collection”.

Background to the PLOS Text Mining Collection

Text Mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text. Over the last few decades, there has been increasing interest in text mining research because of the potential commercial and academic benefits this technology might enable. However, as with the promises of many new technologies, the benefits of text mining are still not clear to most academic researchers.

This situation is now poised to change for several reasons. First, the rate of growth of the scientific literature has now outstripped the ability of individuals to keep pace with new publications, even in a restricted field of study. Second, text-mining tools have steadily increased in accuracy and sophistication to the point where they are now suitable for widespread application. Finally, the rapid increase in availability of digital text in an Open Access format now permits text-mining tools to be applied more freely than ever before.

To acknowledge these changes and the growing body of work in the area of text mining research, today PLOS launches the Text Mining Collection, a compendium of major reviews and recent highlights published in the PLOS family of journals on the topic of text mining. As PLOS is one of the major publishers of the Open Access scientific literature, it is perhaps no coincidence that research in text mining in PLOS journals is flourishing. As noted above, the widespread application and societal benefits of text mining are most easily achieved under an Open Access model of publishing, where the barriers to obtaining published articles are minimized and the ability to remix and redistribute data extracted from text is explicitly permitted. Furthermore, PLOS is one of the few publishers actively promoting text mining research by providing an open Application Programming Interface to mine their journal content.

Text Mining in PLOS

Since virtually the beginning of its history [1], PLOS has actively promoted the field of text mining by publishing reviews, opinions, tutorials and dozens of primary research articles in this area in PLOS Biology, PLOS Computational Biology and, increasingly, PLOS ONE. Because of the large number of text mining papers in PLOS journals, we are only able to highlight a subset of these works in the first instance of the PLOS Text Mining Collection. These include major reviews and tutorials published over the last decade [1][2][3][4][5][6], plus a selection of research papers from the last two years [7][8][9][10][11][12][13][14][15][16][17][18][19] and three new papers arising from the call for papers for this collection [20][21][22].

The research papers included in the collection at launch provide important overviews of the field and reflect many exciting contemporary areas of research in text mining, such as:

  • methods to extract textual information from figures [7];
  • methods to cluster [8] and navigate [15] the burgeoning biomedical literature;
  • integration of text-mining tools into bioinformatics workflow systems [9];
  • use of text-mined data in the construction of biological networks [10];
  • application of text-mining tools to non-traditional textual sources such as electronic patient records [11] and social media [12];
  • generating links between the biomedical literature and genomic databases [13];
  • application of text-mining approaches in new areas such as the Environmental Sciences [14] and Humanities [16][17];
  • named entity recognition [18];
  • assisting the development of ontologies [19];
  • extraction of biomolecular interactions and events [20][21]; and
  • assisting database curation [22].

Looking Forward

As this is a living collection, it is worth discussing two issues we hope to see addressed in articles that are added to the PLOS text mining collection in the future: scaling up and opening up. While application of text mining tools to abstracts of all biomedical papers in the MEDLINE database is increasingly common, there have been remarkably few efforts that have applied text mining to the entirety of the full text articles in a given domain, even in the biomedical sciences [4][23]. Therefore, we hope to see more text mining applications scaled up to use the full text of all Open Access articles. Scaling up will not only maximize the utility of text-mining technologies and their uptake by end users, but also demonstrate that demand for access to full text articles exists in the text mining and wider academic communities.

Likewise, we hope to see more text-mining software systems made freely or openly available in the future. As an example of the state of affairs in the field, only 25% of the research articles highlighted in the PLOS text mining collection at launch provide source code or executable software of any kind [13][16][19][21]. The lack of availability of software or source code accompanying published research articles is, of course, not unique to the field of text mining. It is a general problem limiting progress and reproducibility in many fields of science, which authors, reviewers and editors have a duty to address. Making release of open source software the rule, rather than the exception, should further catalyze advances in text mining, as it has in other fields of computational research that have made extremely rapid progress in the last decades (such as genome bioinformatics).
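The 25% figure above can be checked with a quick bit of arithmetic. The sketch below assumes that the research articles highlighted at launch are references [7] to [22] (the thirteen recent papers plus the three new ones) and that [13], [16], [19] and [21] are the ones providing code or software; the variable and function names are purely illustrative.

```python
# Reference numbers as cited in this post. Treating [7]-[22] as the full set
# of research articles at launch is an assumption based on the citation lists.
research_articles = list(range(7, 23))   # [7]..[19] plus the new [20]..[22]
with_software = [13, 16, 19, 21]         # articles cited as providing code


def percent_with_software(articles, with_code):
    """Return the percentage of articles that appear in with_code."""
    provided = [ref for ref in articles if ref in with_code]
    return 100 * len(provided) / len(articles)


share = percent_with_software(research_articles, with_software)
print(f"{len(with_software)} of {len(research_articles)} articles: {share:.0f}%")
```

Under these assumptions, 4 of 16 research articles provide software, which matches the figure quoted above.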

By opening up the code base in text mining research, and deploying text-mining tools at scale on the rapidly growing corpus of full-text Open Access articles, we are confident this powerful technology will make good on its promise to catalyze scholarly endeavors in the digital age.


1. Dickman S (2003) Tough mining: the challenges of searching the scientific literature. PLoS Biol 1: e48. doi:10.1371/journal.pbio.0000048.
2. Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from Text—Is Text Mining Ready to Deliver? PLoS Biol 3: e65. doi:10.1371/journal.pbio.0030065.
3. Cohen B, Hunter L (2008) Getting started in text mining. PLoS Comput Biol 4: e20. doi:10.1371/journal.pcbi.0040020.
4. Bourne PE, Fink JL, Gerstein M (2008) Open access: taking full advantage of the content. PLoS Comput Biol 4: e1000037. doi:10.1371/journal.pcbi.1000037.
5. Rzhetsky A, Seringhaus M, Gerstein M (2009) Getting Started in Text Mining: Part Two. PLoS Comput Biol 5: e1000411. doi:10.1371/journal.pcbi.1000411.
6. Rodriguez-Esteban R (2009) Biomedical Text Mining and Its Applications. PLoS Comput Biol 5: e1000597. doi:10.1371/journal.pcbi.1000597.
7. Kim D, Yu H (2011) Figure text extraction in biomedical literature. PLoS ONE 6: e15338. doi:10.1371/journal.pone.0015338.
8. Boyack K, Newman D, Duhon R, Klavans R, Patek M, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6: e18029. doi:10.1371/journal.pone.0018029.
9. Kolluru B, Hawizy L, Murray-Rust P, Tsujii J, Ananiadou S (2011) Using workflows to explore and optimise named entity recognition for chemistry. PLoS ONE 6: e20181. doi:10.1371/journal.pone.0020181.
10. Hayasaka S, Hugenschmidt C, Laurienti P (2011) A network of genes, genetic disorders, and brain areas. PLoS ONE 6: e20907. doi:10.1371/journal.pone.0020907.
11. Roque F, Jensen P, Schmock H, Dalgaard M, Andreatta M, et al. (2011) Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol 7: e1002141. doi:10.1371/journal.pcbi.1002141.
12. Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7: e1002199. doi:10.1371/journal.pcbi.1002199.
13. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman C (2011) pubmed2ensembl: a resource for mining the biological literature on genes. PLoS ONE 6: e24716. doi:10.1371/journal.pone.0024716.
14. Fisher R, Knowlton N, Brainard R, Caley J (2011) Differences among major taxa in the extent of ecological knowledge across four major ecosystems. PLoS ONE 6: e26556. doi:10.1371/journal.pone.0026556.
15. Hossain S, Gresock J, Edmonds Y, Helm R, Potts M, et al. (2012) Connecting the dots between PubMed abstracts. PLoS ONE 7: e29509. doi:10.1371/journal.pone.0029509.
16. Ebrahimpour M, Putniņš TJ, Berryman MJ, Allison A, Ng BW-H, et al. (2013) Automated authorship attribution using advanced signal classification techniques. PLoS ONE 8: e54998. doi:10.1371/journal.pone.0054998.
17. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8: e59030. doi:10.1371/journal.pone.0059030.
18. Groza T, Hunter J, Zankl A (2013) Mining Skeletal Phenotype Descriptions from Scientific Literature. PLoS ONE 8: e55656. doi:10.1371/journal.pone.0055656.
19. Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8: e55674. doi:10.1371/journal.pone.0055674.
20. Van Landeghem S, Bjorne J, Wei C-H, Hakala K, Pyysalo S, et al. (2013) Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization. PLoS ONE 8: e55814. doi:10.1371/journal.pone.0055814.
21. Liu H, Hunter L, Keselj V, Verspoor K (2013) Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLoS ONE 8: e60954. doi:10.1371/journal.pone.0060954.
22. Davis A, Wiegers T, Johnson R, Lay J, Lennon-Hopkins K, et al. (2013) Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLoS ONE 8: e58201. doi:10.1371/journal.pone.0058201.
23. Bergman CM (2012) Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central? https://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/.

Directed Genome Sequencing: the Key to Deciphering the Fabric of Life in 1993

Seeing the #AAASmtg hashtag flowing on my Twitter stream over the last few days reminded me that my former post-doc advisor Sue Celniker must be enjoying her well-deserved election to the American Association for the Advancement of Science (AAAS). Sue has made a number of major contributions to Drosophila genomics, and I personally owe her for the chance to spend my journeyman years with her and so many other talented people in the Berkeley Drosophila Genome Project. I would even go so far as to say that it was Sue’s 1995 paper with Ed Lewis on the “Complete sequence of the bithorax complex of Drosophila” that first got me interested in “genomics.” I remember being completely in awe of the GenBank accession from this paper, which was over 300,000 bp long! Man, this had to be the future. (In fact the accession number for the BX-C region, U31961, is etched in my brain like some telephone numbers from my childhood.) By the time I arrived at BDGP in 2001, the sequencing of the BX-C was already ancient history, as was the directed sequencing strategy used for this project. These rapid changes made all the more poignant my discovery, in Reed George’s office, of a set of discarded propaganda posters collecting dust. The posters were made at the time (circa 1993) and extolled the virtues of “Directed Genome Sequencing” as the key to “Deciphering the Fabric of Life”. I dug up a photo I took of one of these posters today to commemorate the recognition of this pioneering effort (below). Here’s to a bygone era, and hats off to pioneers like Sue who paved the road for the rest of us in (Drosophila) genomics!


A Case for Junior/Senior Partnership Grants

Much has been made in recent years over funding crises in the US and Europe, which are the inevitable result of the Great Recession superimposed on top of the end of exponential growth in Science. Governments hamstrung by austerity measures or lack of political will have been forced to abandon increases in scientific funding, even going so far as to freeze funds for awarded grants in Spain (see translation here). The consequences of this stagnant period of inputs to scientific progress will be felt for many years to come, materially in terms of basic and applied discoveries, but also socially in terms of the impacts on an entire generation of scientists who are just beginning their independent careers.

Why are early stage researchers hit hardest by stagnation or decreases in funding? Simply because access to funding is not a level playing field for all scientists, and is in fact highly dependent on career stage and experience. Therefore, increased competition for resources is expected to hit younger scientists disproportionately harder relative to established researchers because of many factors, including:

  • less experience in the art of writing grants,
  • less experience in reviewing grants,
  • less experience serving on grant panels,
  • shorter scientific and management track record,
  • and a less highly developed social network.

The specific negative effect that a general increase in resource competition has on young researchers is (in my view) the best explanation for the extremely worrying downward trend in the proportion of young PIs receiving NIH grants, and the upward trend in the age at receipt of first R01 in the USA, shown in the following diagrams from the NIH Rock Talk Blog:

Thankfully, this issue is being discussed seriously by NIH’s Deputy Director for Extramural Research, Dr. Sally Rockey, as publication of these data attests. [I would very much welcome it if other funding agencies published similar demographic breakdowns of their funding to address whether this is a global effect.] However, not all see these trends as worrying, and some interpret them on socially-neutral demographic grounds.

To help combat the inherent age-based inequities in access to research funding, funding agencies typically ring-fence funding for early-stage researchers under a “New Investigator” type umbrella. In fact, Sally Rockey provides a link to an impressive history of initiatives the NIH has undertaken to tackle the New Investigator issue. But what is striking to me is that despite putting a series of different New Investigator mechanisms in place, the negative impacts on early-stage researchers have only worsened over the last three decades. Thus New Investigator programmes are clearly not enough to redress this issue, and new solutions must be sought out. Furthermore, ring-fencing funding for junior researchers necessarily creates an us-vs-them mentality, which can have counterproductive repercussions among different scientific cohorts. And while New Investigator programmes are widely supported in principle, trade-offs in resource allocation can lead to unstable changes in policy, as witnessed in the case of the now-defunct NERC New Investigator programme.

So, what of it? Is this post just another bemoaning the sorry state of affairs in funding for early-stage researchers? No, or at least, not only. Actually, my motivation is to constructively propose a relatively simple (naive?) mechanism to fund research projects that can address the inequities in funding across career stages, but which also has the additional benefit of engendering mentorship and transfer of skills across the generations: the Junior/Senior Partnership Grant. [As with all (good) ideas, such a model has been proposed before by the Women’s Cancer Network, but does not appear to have been adopted by major federal funding agencies.]

The idea behind a Junior/Senior Partnership funding “scheme” is simple. Based on some criteria (years since PhD or first tenure-track position, number of successful PI awards, number of wrinkles, etc.) researchers would be classified as Junior or Senior. To be eligible for an award under such a programme, at least one Junior and one Senior PI would need to be co-applicants on a grant and have distinct contributions to the grant and project management. This simple mechanism would ensure that young PIs get a piece of the funding pie and allow them to establish a track record, just as New Investigator schemes do. But it would also obviate the need for reform to rely on the altruistic stepping aside by Senior scientists to make way for their Junior colleagues, as there would be positive (financial) incentives for them to lend a hand down the generations. And by reconfiguring resource allocation from “us-vs-them” to “we’re-all-in-this-together,” Junior/Senior Partnership Grants would further provide a natural mechanism for Senior PIs to transfer expertise in grant writing and project management to their Junior colleagues in a meaningful way, rather than in the lip-service manner that is the norm in most institutions. Finally, and most importantly, the knowledge transfer through such a scheme would strengthen the future expertise base in Science, which all indicators suggest is currently at risk.

From Electron to Retrotransposon: “-on” the Origin of a Common Suffix in Molecular Biology

Over the last year or so, I have become increasingly interested in understanding the origin of major concepts in genetics and molecular biology. This is driven by several motivating factors, primarily to cure my ignorance/satisfy my curiosity but also to be able to answer student queries more authoritatively and unearth unsolved questions in biology. One of the more interesting stories I have stumbled across relates to why so many terms in molecular biology (e.g. codon, replicon, exon, intron, transposon, etc.) end with the suffix “-on”. While nowhere near as pervasive as the “-ome” suffix that has contaminated biological parlance of late, the suffix “-on” has clearly left its mark on some of the most frequently used terms in the lexicon of molecular biology.

According to Brenner (1995) [1], the common use of the suffix “-on” in molecular biological terms can be traced to Seymour Benzer’s dissection of the fine structure of the rII locus in bacteriophage T4, which overturned the classical idea that a gene is an indivisible unit:

To mark this new view of the gene, Seymour invented new terms for the now different units of mutation, recombination and function. As he was a physicist, he modelled his terms on those of physics and just as electrons, protons and neutrons replaced the once indivisible atom, so genes came to be composed of mutons, recons and cistrons. The unit of function, the cistron was based on the cis–trans complementation test, of which only the trans part is usually done…Of these terms, only cistron came to be widely used. It is conjectured that the other two, the muton and the recon, disappeared because Seymour failed to follow the first rule for inventing new words, which is to check what they may mean in other languages…Seymour’s pioneering invention of units was followed by a spate of other new names not all of which will survive. One that seems to have taken root is codon, which I invented in 1957; and the terms intron and exon, coined by Walter Gilbert, are certain to survive as well. Operon is moot; it is still frequently used in prokaryotic genetics but as the weight of research shifts to eukaryotes, which do not have such units of regulation, it may be lost. Replicon, invented by Francis Jacob and myself in 1962, seems also to have survived, despite the fact that we paid insufficient attention to how it sounded in other languages.

Thus, the fact that many molecular biological terms end in “-on” (initiated by Benzer) owes its origin to patterns of nomenclature in chemistry/nuclear physics (which itself began with Stoney’s proposal of the term electron in 1894) and the desire to identify “fundamental units” of biological structure and function.

While Brenner’s commentary provides a crucial first-hand account of the origin of these terms, it does not provide any primary references concerning their coining. So I’ve spent some time digging out the original usage for a number of the more common molecular biology “-ons”, which I thought might be of use or interest to others.

The terms recon, muton and cistron were defined by Benzer (1957) [2] as follows:

  • Recon: “The unit of recombination will be defined as the smallest element in the one-dimensional array that is interchangeable (but not divisible) by genetic recombination. One such element will be referred to as a “recon.””
  • Muton: “The unit of mutation, the “muton” will be defined as the smallest element that, when altered, can give rise to a mutant form of the organism.”
  • Cistron: “A unit of function can be defined genetically, independent of biochemical information, by means of the elegant cis-trans comparison devised by [Ed] Lewis…Such a map segment, corresponding to a function which is unitary as defined by the cis-trans test applied to the heterocaryon, will be defined as a cistron.”

I have not been able to find a definitive first reference that defines the term codon as the fundamental unit of the genetic code. According to Brenner (1995) [1] and the US National Library of Medicine’s Profiles in Science webpage on Marshall Nirenberg [3], the term codon was introduced by Brenner in 1957 “to describe the fundamental units engaged in protein synthesis, even though the units had yet to be fully determined. Francis Crick popularized the term in 1959. After 1962, Nirenberg began to use “codon” to characterize the three-letter RNA code words”.

The term operon was introduced by Jacob et al. (1960) [4] and defined as follows (italics theirs):

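  • Operon: “Celle-ci comprendrait des unités d’expression coordonnée (opérons) constituées par un opérateur et le groupe de gènes de structure coordonnés par lui.” [English translation: “This would comprise units of coordinated expression (operons), each made up of an operator and the group of structural genes coordinated by it.”]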

The term replicon was introduced by Jacob and Brenner (1963) [5] and defined as follows (italics theirs):

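  • Replicon: “Il est donc clair qu’un chromosome (de bactérie ou de phage) ou un épisome constitue une unité de réplication indépendante ou réplicon, dont la reproduction est régie par la présence et l’activité de certains déterminants qu’il porte. Les caractères des réplicons exigent qu’ils déterminent des systèmes spécifiques gouvernant leur propre réplication.” [English translation: “It is therefore clear that a chromosome (of a bacterium or a phage) or an episome constitutes an independent unit of replication, or replicon, whose reproduction is governed by the presence and activity of certain determinants that it carries. The characteristics of replicons require that they determine specific systems governing their own replication.”]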

Near and dear to my heart is the term transposon, which was first introduced by Hedges and Jacob (1974) [7] (italics theirs):

  • Transposon: “We designate DNA sequences with transposition potential as transposons (units of transposition)”

The very commonly used terms intron and exon were defined by Gilbert (1978) [6] as follows:

  • Intron & Exon: “The notion of the cistron, the genetic unit of function that one thought to correspond to a polypeptide chain, must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger – which I suggest we call introns (for intragenic regions) – alternating with regions which will be expressed – exons.”

And finally, Boeke et al. (1985) [8] defined the term retrotransposon in the following passage (italics theirs):

  • Retrotransposon: “These observations, together with the finding that introns are spliced out of the Ty upon transposition, suggest that reverse transcription is a step in the transposition of Ty elements…We therefore propose the term retrotransposon  for Ty and related elements.”

So there you have it, from electron to retrotransposon in just a few steps. I’ve left out some lesser used terms with this suffix for the moment (e.g. regulon, stimulon, modulon), so as not to let this post go -on and -on. If anyone has any major terms to add here or corrections to my reading of the tea leaves, please let me know in the comments below.

[1] Brenner, S. (1995) “Loose end: Molecular biology by numbers… one.” Current Biology 5(8): 964.
[2] Benzer, S. (1957) “The Elementary Units of Heredity.” in Symposium on the Chemical Basis of Heredity p. 70–93.  Johns Hopkins University Press
[3] US National Library of Medicine. Profiles in Science webpage on Marshall Nirenberg.
[4] Jacob, F., et al. (1960) “L’opéron: groupe de gènes à expression coordonnée par un opérateur.” C.R. Acad. Sci. Paris 250: 1727-1729.
[5] Jacob, F., and S. Brenner. (1963) “Sur la régulation de la synthèse du DNA chez les bactéries: l’hypothèse du réplicon.” C.R. Acad. Sci. Paris 256: 298-300.
[6] Gilbert, W. (1978) “Why genes in pieces?.” Nature 271(5645): 501.
[7] Hedges, R. W., and A. E. Jacob. (1974) “Transposition of ampicillin resistance from RP4 to other replicons.” Molecular and General Genetics MGG 132(1): 31-40.
[8] Boeke, J.D., et al. (1985) “Ty elements transpose through an RNA intermediate.” Cell 40(3): 491.
Jim Shapiro (University of Chicago) gave very helpful pointers to possible places where the term “transposon” might originally have been introduced.