The Cost to Science of the ENCODE Publication Embargo

The big buzz in the genomics twittersphere today is the release of over 30 publications on the human ENCODE project. This is a heroic achievement, both in terms of science and publishing, with many groundbreaking discoveries in biology and pioneering developments in publishing to be found in this set of papers. It is a triumph that all of these papers are freely available to read, and much is being said elsewhere in the blogosphere about the virtues of this project and the lessons learned from the publication of these data. I’d like to pick up here on an important point made by Daniel MacArthur in his post about the delays in the publication of these landmark papers that have arisen from the common practice of embargoing papers in genomics. To be clear, I am not talking about embargoing the use of data (which is also problematic), but embargoing the release of manuscripts that have been accepted for publication after peer review.

MacArthur writes:

Many of us in the genomics community were aware of the progress the [ENCODE] project had been making via conference presentations and hallway conversations with participants. However, many other researchers who might have benefited from early access to the ENCODE data simply weren’t aware of its existence until today’s dramatic announcement – and as a result, these people are 6-12 months behind in their analyses.

It is important to emphasize that these publication delays are by design, and are driven primarily by the journals that set the publication schedules for major genomics papers. I saw first-hand how Nature sets the agenda for major genomics papers and their associated companion papers as part of the Drosophila 12 Genomes Project. This insider’s view left a distinctly bad taste in my mouth about how much control a single journal has over some of the most important community resource papers that are published in biology. To give more people insight into this process, I am posting the agenda set by Nature for publication (in reverse chronological order) of the main Drosophila 12 Genomes paper, which went something like this:

7 Nov 2007: papers are published, embargo lifted on main/companion papers
28 Sept 2007: papers must be in production
21 Sept 2007: revised versions of papers received
17 Aug 2007: reviews are returned to authors
27 Jul 2007: papers are submitted

Not only was acceptance of the manuscript essentially assumed by the Nature editorial staff, the entire timeline was spelled out in advance, with an embargo built into the process from the outset. Seeing this process unfold first-hand was shocking to me, and has made me very skeptical of the power that the major journals have to dictate terms about how we, and other journals, publish our work.

Personally, I cannot see how this embargo system serves anyone in science other than the major journals. There is no valid scientific reason that major genome papers and their companions cannot be made available as online accepted preprints, as is now standard practice in the publishing industry. As scientists, we have a duty to ensure that the science we produce is released to the general public and community of scientists as rapidly and openly as possible. We do not have a duty to serve the agenda of a journal to increase their cachet or revenue stream. I am aware that we need to accept delays due to quality control via the peer review and publication process. But the delays due to the normal peer review process are bad enough, as ably discussed recently by Leslie Vosshall. Why on earth would we accept that journals build further unnecessary delays into the publication process?

This of course leads to the pertinent question: how harmful is this system of embargoes? Well, we can put an upper estimate on this* pretty easily from the submission/acceptance dates of the main and companion ENCODE papers (see table below). In general, most ENCODE papers were embargoed for a minimum of 2 months but some were embargoed for up to nearly 7 months. Ignoring (unfairly) the direct impact that these delays may have on the careers of PhD students and post-docs involved, something on the order of 112 months of access to these important papers has been lost to all scientists by this single embargo. Put another way, up to nearly 10 years* of access time to these papers has been collectively lost to science because of the ENCODE embargo. To the extent that these papers are crucial for understanding the human genome, and the consequences this knowledge has for human health, this decade lost to humanity is clearly unacceptable. Let us hope that the ENCODE project puts an end to the era of journal-mandated embargoes in genomics.

DOI               Date received  Date accepted  Date published  Months in review  Months in embargo
nature11247       24-Nov-11      29-May-12      05-Sep-12       6.0               3.2
nature11233       10-Dec-11      15-May-12      05-Sep-12       5.1               3.6
nature11232       15-Dec-11      15-May-12      05-Sep-12       4.9               3.6
nature11212       11-Dec-11      10-May-12      05-Sep-12       4.9               3.8
nature11245       09-Dec-11      22-May-12      05-Sep-12       5.3               3.4
nature11279       09-Dec-11      01-Jun-12      05-Sep-12       5.6               3.1
gr.134445.111     06-Nov-11      07-Feb-12      05-Sep-12       3.0               6.8
gr.134957.111     16-Nov-11      01-May-12      05-Sep-12       5.4               4.1
gr.133553.111     17-Oct-11      05-Jun-12      05-Sep-12       7.5               3.0
gr.134767.111     11-Nov-11      03-May-12      05-Sep-12       5.6               4.0
gr.136838.111     21-Dec-11      30-Apr-12      05-Sep-12       4.2               4.1
gr.127761.111     16-Jun-11      27-Mar-12      05-Sep-12       9.2               5.2
gr.136101.111     09-Dec-11      30-Apr-12      05-Sep-12       4.6               4.1
gr.134890.111     23-Nov-11      10-May-12      05-Sep-12       5.5               3.8
gr.134478.111     07-Nov-11      01-May-12      05-Sep-12       5.7               4.1
gr.135129.111     21-Nov-11      08-Jun-12      05-Sep-12       6.5               2.9
gr.127712.111     15-Jun-11      27-Mar-12      05-Sep-12       9.2               5.2
gr.136366.111     13-Dec-11      04-May-12      05-Sep-12       4.6               4.0
gr.136127.111     16-Dec-11      24-May-12      05-Sep-12       5.2               3.4
gr.135350.111     25-Nov-11      22-May-12      05-Sep-12       5.8               3.4
gr.132159.111     17-Sep-11      07-Mar-12      05-Sep-12       5.5               5.9
gr.137323.112     05-Jan-12      02-May-12      05-Sep-12       3.8               4.1
gr.139105.112     25-Mar-12      07-Jun-12      05-Sep-12       2.4               2.9
gr.136184.111     10-Dec-11      10-May-12      05-Sep-12       4.9               3.8
gb-2012-13-9-r48  21-Dec-11      08-Jun-12      05-Sep-12       5.5               2.9
gb-2012-13-9-r49  28-Mar-12      08-Jun-12      05-Sep-12       2.3               2.9
gb-2012-13-9-r50  04-Dec-11      18-Jun-12      05-Sep-12       6.4               2.5
gb-2012-13-9-r51  23-Mar-12      25-Jun-12      05-Sep-12       3.0               2.3
gb-2012-13-9-r52  09-Mar-12      25-May-12      05-Sep-12       2.5               3.3
gb-2012-13-9-r53  29-Mar-12      19-Jun-12      05-Sep-12       2.6               2.5
Min                                                             2.3               2.3
Max                                                             9.2               6.8
Avg                                                             5.1               3.7
Sum                                                             152.7             112.1
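
For anyone who wants to check or extend these numbers, here is a minimal Python sketch of how the two "months" columns can be derived from the dates in the table above. It is an illustration, not the original calculation: the 31-day month divisor is an assumption inferred from the fact that it reproduces the rounded values for the rows used here, and the names ROWS, parse_date, months_between and summarize are made up for this example. Only three rows are included to keep it short.

```python
from datetime import date

# Assumption: the post does not say how days were converted to months.
# Dividing by 31 reproduces the rounded table values for the rows below.
DAYS_PER_MONTH = 31

# Month abbreviations are mapped by hand to avoid locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# (DOI, received, accepted, published), copied from three rows of the table.
ROWS = [
    ("nature11247", "24-Nov-11", "29-May-12", "05-Sep-12"),
    ("gr.134445.111", "06-Nov-11", "07-Feb-12", "05-Sep-12"),
    ("gb-2012-13-9-r49", "28-Mar-12", "08-Jun-12", "05-Sep-12"),
]


def parse_date(text):
    """Turn a table date such as '24-Nov-11' into a datetime.date."""
    day, month, year = text.split("-")
    return date(2000 + int(year), MONTHS[month], int(day))


def months_between(start, end):
    """Elapsed time between two table dates, in (assumed) 31-day months."""
    return (parse_date(end) - parse_date(start)).days / DAYS_PER_MONTH


def summarize(rows):
    """Return (DOI, months in review, months in embargo) for each row."""
    return [
        (doi,
         round(months_between(received, accepted), 1),
         round(months_between(accepted, published), 1))
        for doi, received, accepted, published in rows
    ]


if __name__ == "__main__":
    for doi, review, embargo in summarize(ROWS):
        print(f"{doi}: {review} months in review, {embargo} months in embargo")
```

Summing the embargo column over all 30 rows gives the 112.1 months in the table, and 112.1 / 12 is roughly 9.3 years, which is where the "up to nearly 10 years" figure comes from.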


* Based on a conversation on Twitter with Chris Cole, I’ve revised this estimate to reflect an upper bound, rather than a point estimate, of the time lost to science.

19 thoughts on “The Cost to Science of the ENCODE Publication Embargo”

  1. Thank you very much for your post.
    Do you think that there are advantages of this embargo?

    Personally, I feel a bit overwhelmed by these 30 articles published at once. I don’t think I am going to read all of them. But if these articles were published on different dates, it would have been even more difficult to follow them all. At least, we are going to have a lot of posts in the blogosphere about these articles in the following days.

    • “Do you think that there are advantages of this embargo?”

      I can see no direct advantages to science for embargoes. I can, however, see advantages to the journal Nature, in that they get to control the publication process. Steve Russell (University of Cambridge) noted on twitter that “Funders need press big splashes to leverage more cash from their masters”, which is an argument in favor of an indirect advantage to science via funding. I am less inclined to think that politicians (at least in the UK) are that easily swayed by a media circus, and more inclined to think that they make key societal decisions based on rational debate.

      “But if these articles were published on different dates, it would have been even more difficult to follow them all”

      I’m not sure the all-at-once vs at-acceptance would make much of a difference. Couldn’t the same umbrella have been constructed after all the papers have been published? In fact, I’d argue that having too many papers to read at once will prevent most people from reading them all in a timely manner.

  5. I’m surprised you didn’t mention the role pre-publication servers (e.g., arXiv) can play in bypassing embargoes. Nature (but not Genome Research, if I recall correctly) allows submission to pre-pub servers, so, presumably, this issue could be avoided if the ENCODE consortium had submitted their manuscripts to arXiv.

  6. Interesting to think how the arXiv would work for consortiums. You’d probably need reasonable buy-in from members, otherwise there’d be some serious disagreement. Likely to cause problems, but I feel like this kind of disruption would be good in the long term. Too much time is wasted waiting for big publicly-funded resource projects to publish.

    • I’m pretty sure if one group broke ranks and submitted to arXiv, there is nothing that the consortium masters could do.

      However, as long as Genome Research doesn’t permit depositing preprints and remains the main journal for genome companion papers, their regressive policy will effectively keep groups from exercising their right to release their papers as quickly as possible.

  9. “However, as long as Genome Research doesn’t permit depositing preprints and remains the main journal for genome companion papers, their regressive policy will effectively keep groups from exercising their right to release their papers as quickly as possible.”

    i) PNAS is a good alternative.

    ii) If over 30 visible bioinformatics blogs post a common commentary on the same day asking Genome Research to change its preprint policy, that will work. What do you think? Want to turn the table on them? :)

  12. When it comes to journals, I have come to accept that the way they do things is not always thought through very well.

    I think that this embargo actually *lowers* citation rates. If other people are like me, they see this big ball of hundreds of pages of papers, skim the abstracts and just don’t find the time to read a lot of it. Usually, scientific readers get a new journal, find one or two interesting papers, put them onto their desk and eventually read them on the train/plane/etc. I doubt that most researchers can easily accommodate their reading habits to this very unusual pattern that goes “here are some hundred pages of results that take 2 years to read, all published on one single day, read it now”. I don’t; I have other things to do this week. And next week.

    It probably sounded good for the management of Nature, like the presentation of a car or the presentation of a new operating system. But we are not car drivers and Microsoft has learnt that by presenting Windows 8 in many small steps, previews and betas, they can have a much smoother launch and get more people interested.

    But that’s just my opinion, not hard data. Someone with Scopus access could test the hypothesis within 10 minutes using the first ENCODE special issue from a few years ago. Was the citation rate of the analysis papers (not the main one in Nature) lower than the average citation rate of a Genome Research paper? How about the distribution of citation rates within these papers?

