Why You Should Reject the “Rejection Improves Impact” Meme


Over the last two weeks, a meme has been making the rounds in the scientific twittersphere that goes something like “Rejection of a scientific manuscript improves its eventual impact”.  This idea is based on a recent analysis of patterns of manuscript submission reported in Science by Calcagno et al., which has been actively touted in the scientific press and seems to have touched a nerve with many scientists.

Nature News reported on this article on the first day of its publication (11 Oct 2012), with the statement that “papers published after having first been rejected elsewhere receive significantly more citations on average than ones accepted on first submission” (emphasis mine). The Scientist led its piece on the same day, entitled “The Benefits of Rejection”, with the claim that “Chances are, if a researcher resubmits her work to another journal, it will be cited more often”. Science Insider led the next day with the claim that “Rejection before publication is rare, and for those who are forced to revise and resubmit, the process will boost your citation record”. Influential science media figure Ed Yong tweeted “What doesn’t kill you makes you stronger – papers get more citations if they were initially rejected”. The message from the scientific media is clear: submitting your papers to selective journals and having them rejected is ultimately worth it, since you’ll get more citations when they are published somewhere lower down the scientific publishing food chain.

I will take on faith that the primary result of Calcagno et al. that underlies this meme is sound, since it has been vetted by the highest standard of editorial and peer review at Science magazine. However, I do note that it is not possible to independently verify this result, since the raw data for this analysis was not made available at the time of publication (contravening Science’s “Making Data Maximally Available Policy”), and has not been made available even after being queried. What I want to explore here is why this meme is being so uncritically propagated in the scientific press and twittersphere.

As succinctly noted by Joe Pickrell, anyone who takes even a cursory look at the basis for this claim would see that it is at best a weak effect*, and is clearly being overblown by the media and scientists alike.

https://twitter.com/joe_pickrell/status/256756126140477442

Taken at face value, the way I read this graph is that papers that are rejected then published elsewhere have a median value of ~0.95 citations, whereas papers that are accepted at the first journal they are submitted to have a median value of ~0.90 citations. Although not explicitly stated in the figure legend or in the main text, I assume these results are on a natural log scale since, based on the font and layout, this plot was most likely made in R and the natural scale is the default in R (also, the authors refer to the natural scale in a different figure earlier in the text). Thus, the median number of citations per article that rejection may provide an author is on the order of ~0.1.  Even if this result is on the log10 scale, this difference translates to a boost of less than one citation.  While statistically significant, this can hardly be described as a “significant increase” in citation. Still excited?
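
For readers who want to check this arithmetic, here is a minimal sketch in Python (my own back-of-the-envelope calculation, not the authors’ code) that back-transforms the approximate median values I read off Figure 4A under the two possible scale assumptions; the ~0.95 and ~0.90 inputs are my eyeball estimates from the plot, not published numbers.

```python
# Back-of-the-envelope check of the citation "boost" implied by Figure 4A.
# The two median values are eyeball estimates read off the plot (assumptions),
# not numbers reported by the authors.
import math

median_rejected = 0.95   # resubmitted-after-rejection papers (approx.)
median_accepted = 0.90   # accepted-on-first-submission papers (approx.)

# If the y-axis is natural-log citations (the default in R plots):
boost_ln = math.exp(median_rejected) - math.exp(median_accepted)

# If the y-axis is log10 citations instead:
boost_log10 = 10 ** median_rejected - 10 ** median_accepted

print(f"Implied boost if natural log scale: ~{boost_ln:.2f} citations")    # ~0.13
print(f"Implied boost if log10 scale:       ~{boost_log10:.2f} citations")  # ~0.97
```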

More importantly, the analysis of the effects of rejection on citation is univariate and ignores most other possible confounding explanatory variables.  It is easy to imagine a large number of other confounding effects that could lead to this weak difference (number of reviews obtained, choice of original and final journals, number of authors, rejection rate/citation differences among disciplines or subdisciplines, etc., etc.). In fact, in panel B of the same figure 4, the authors show a stronger effect of changing discipline on the number of citations in resubmitted manuscripts. Why a deeper multivariate analysis was not performed to back up the headline claim that “rejection improves impact” is hard to understand from a critical perspective. [UPDATE 26/10/2012: Bala Iyengar pointed out to me a page on the author’s website that discusses the effects of controlling for year and publishing journal on the citation effect, which led me to re-read the paper and supplemental materials more closely and see that these two factors are in fact controlled for in the main analysis of the paper. No other possible confounding factors are controlled for, however.]
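
To make concrete the kind of analysis I have in mind, here is a hedged sketch in Python of a multivariate model one could fit if the per-manuscript data were released; the file name and every column name below are hypothetical placeholders, not the authors’ actual variables.

```python
# Sketch of the multivariate analysis argued for above, assuming access to a
# per-manuscript table that has not in fact been released. The file name and
# all column names are hypothetical placeholders, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("calcagno_manuscripts.csv")  # hypothetical dataset

# Model log-transformed citation counts as a function of resubmission status,
# controlling for publication year, final journal, discipline, and author count.
model = smf.ols(
    "np.log1p(citations) ~ was_resubmission + C(publication_year)"
    " + C(final_journal) + C(discipline) + n_authors",
    data=df,
)
result = model.fit()

# The question of interest: does the was_resubmission coefficient survive the
# controls, and how large is it once back-transformed to citation counts?
print(result.summary())
```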

So what is going on here? Why did Science allow such a weak effect with a relatively superficial analysis to be published in one of the supposedly most selective journals? Why are major science media outlets pushing this incredibly small boost in citations that is (possibly) associated with rejection? Likewise, why are scientists so uncritically posting links to the Nature and Scientist news pieces and repeating the “Rejection Improves Impact” meme?

I believe the answer to the first two questions is clear: Nature and Science have a vested interest in making the case that it is in the best interest of scientists to submit their most important work to (their) highly selective journals and risk having it be rejected.  This gives Nature and Science first crack at selecting the best science and serves to maintain their hegemony in the scientific publishing marketplace. If this interpretation is true, it is an incredibly self-serving stance for Nature and Science to take, and one that may backfire since, on the whole, scientists are not stupid people who blindly accept nonsense. More importantly though, using the pages of Science and Nature as a marketing campaign to convince scientists to submit their work to these journals risks their credibility as arbiters of “truth”. If Science and Nature go so far as to publish and hype weak, self-serving scientometric effects to get us to submit our work there, what’s to say they would not do the same for actual scientific results?

But why are scientists taking the bait on this one?  This is more difficult to understand, but most likely has to do with the possibility that most people repeating this meme have not read the paper. Topsy records over 700 and 150 tweets linking to the Nature News and Scientist pieces, respectively, but only ~10 posts linking to the original article in Science. Taken at face value, roughly 80-fold more scientists are reading the news about this article than are reading the article itself. To be fair, this is due in part to the fact that the article is not open access and is behind a paywall, whereas the news pieces are freely available**. But this is only the proximal cause. The ultimate cause is likely that many scientists are happy to receive (uncritically, it seems) any justification, however tenuous, for continuing to play the high-impact-factor journal sweepstakes. Now we have a scientifically valid reason to take the risk of being rejected by top-tier journals, even if it doesn’t pay off. Right? Right?

The real shame in the “Rejection Improves Impact” spin is that an important take-home message of Calcagno et al. is that the vast majority of papers (>75%) are published in the first journal to which they are submitted.  As a scientific community we should continue to maintain and improve this trend, selecting the appropriate home for our work on initial submission. Justifying pipe-dreams that waste precious time based on self-serving spin that benefits the closed-access publishing industry should be firmly: Rejected.

Don’t worry, it’s probably in the best interest of Science and Nature that you believe this meme.

* To be fair, Science Insider does acknowledge that the effect is weak: “previously rejected papers had a slight bump in the number of times they were cited by other papers” (emphasis mine).

** Following a link available on the author’s website, you can access this article for free here.

References
Calcagno, V., Demoinet, E., Gollner, K., Guidi, L., Ruths, D., & de Mazancourt, C. (2012). Flows of research manuscripts among scientific journals reveal hidden submission patterns. Science. DOI: 10.1126/science.1227833

23 thoughts on “Why You Should Reject the “Rejection Improves Impact” Meme”

  1. Haven’t read the paper in question but at the time I thought the simpler problem with this conclusion is that high impact journals have higher rejection rates, and papers in high impact journals tend to get more citations. Given that most papers end up in their first submitted journal, an increase in citations seems a foregone conclusion.

    Have I missed something? Did they control for this somehow?

  2. Why You Should Reject the “Rejection Improves Impact” Meme | csid

  3. A nice analysis of the subject. Thanks for putting it in writing.

    However, is it true that “>75% papers are published in the first journal to which they are submitted”? This is certainly not my experience, and I also doubt it is the case for those researchers with whom I have had any contact. But I don’t have any data to back this up; it is just a feeling that may be biased by my circumstances, as Ortega y Gasset put it, “yo soy yo y mi circunstancia” (“I am I and my circumstance”). Also, selecting the appropriate journal does not guarantee acceptance. Many times the view you have of your work is not matched by what the reviewers see in it. And this could be a point in favour of pre-prints.

  4. Thanks for this analysis. I think you’re conflating the paper itself, which includes this (weak) result as a small part of a much broader picture of manuscript flow, with the part the media picked up; the authors can’t be blamed for the latter. Everyone and their dog has a strongly held opinion about science publishing (peer review is broken! Science and Nature are Evil! OA journals will destroy the world!), but there’s a massive dearth of sensible studies of what’s actually going on. The Calcagno paper is scientometrics’ equivalent of the human genome, in that its scale vastly surpasses many previous studies. I just wish that they’d release the data so we could get on with evaluating all the other claims and counter-claims that people are making.

    • I agree that you can’t blame authors for how the media may pick up your story, but I’m not so sure that the authors have no role in how this message got out. In this regard I note that the authors themselves use the misleading term “significantly more citations” in their abstract (“Resubmissions from other journals received significantly more citations than first-intent submissions…”). This may be just due to terseness of wording in an abstract, but it is undeniably ambiguous. For a politically contentious result like this, it would have been wise to use more measured language such as “slight but significant increase in citation number”. Would this way of describing the result have gotten it into Science?

      On the issue of data release, I agree there is no question that the data should have been made available at the time of publication.

  5. The REAL moral to take from this affair: If you’re a young scholar in need of a career boost, try to write an article whose existence would be beneficial to the old gatekeepers.

    Joking aside, I would naturally assume a much more significant effect than they caught: merely by submitting the paper, you make more people aware of it, namely the reviewers themselves. In obscure fields where the total number of people who will read the paper at all is in the single digits, submitting to an extra journal could have a massive impact on the number of total readers!

  6. Hi all, thanks for taking time to comment on our results! I’ll just make a few simple points:
    -1- I posted the raw data for more than 75% of our article on my website one week ago, and several people have downloaded it, including Casey. As I have explained on the Science website, I cannot release the last bit (citation counts) because of an ethical issue: this would allow identifying the respondents. McGill’s Ethical Board (who had validated the initial study) told us we could not release this data as it would violate what we promised to respondents. So there is no Science Open Data policy issue here, it is merely ethics I’m afraid. How many social science studies publish the identity of participants, by the way?

    -2- Figure 4 presents the raw data and is not well-suited to estimating effect size (we do not do so in the paper). I wish reviewers had been as careful as you guys; we would have improved the Figure! I’ve been lazy on that one, I admit. To compensate, I have posted many more figures and explanations, with effect sizes, on my research site:

    The benefits of rejection, continued


    Go see it for yourself. The effect is maybe not huge (but who’d seriously think it would be, a priori?), but I would not say it is tiny or so small as to mean nothing. All this is subjective, but an effect is here and it is consistent. Remember that this is 3 to 5 years post-publication: most articles were cited zero to 50 times. And the effect is quite clear in this range. It is the first time one can look at this; we barely scratched the surface in this Science report.

    -3- I do not think there is a conspiracy such that Science published this because it puts them in a good position…. Maybe the 200,000 emails sent and 80,000 data points used to build the first resubmission network are a more parsimonious explanation. Of course I’m just a junior scientist so I don’t know, maybe I’m too naive… Anyway, they took 8 months to have this accepted; I guess they could have been faster if they were so eager to publish propaganda.

    -4- We did not control the buzz, or only very little. I’m the first one to be annoyed with the over-focus on Figure 4A and speculations around it… So please people, just read the research, and try not to get intoxicated by the buzz.

    V

    • Hi Vincent, thanks so much for taking the time to post your comments here.

      Let me first apologize for the initial version of my post containing an inaccuracy about your work – this was an oversight due to my initial misreading. I tried to correct it as soon as possible after @balapagos alerted me to the follow-up post on your blog (I probably missed it because the posts were out of chronological order), which made me re-read the supplement more carefully. This was entirely unintentional, and I hope it didn’t lead to any further misconceptions about this issue.

      1 – As far as making data available, I commend your making the network data available so quickly after publication. My point about data not being available specifically concerned the citation data behind figure 4, which remain inaccessible, not the entire study. I understand there are ethical issues about making citation data available, and I respect the decision to maintain participant confidentiality. In fact, I doubt ISI would let you release this data either.

      Nevertheless, there is the relatively serious issue that in the absence of making these data available, it is impossible to replicate and build on your results. Moreover, not making data available explicitly goes against Science’s own policies. It would also be good to know if the Science editorial team ever requested you to archive the data with them, and what discussions you might have had with the editorial team about making data available during review.

      2 – Thanks for taking the time to post this additional analysis, which I see does further support the effect (showing the shift in the distribution of citation counts more clearly). I must say, however, I am not able to estimate the actual magnitude of the effect (i.e. differences in median citation number) from these new plots very well. My interpretation from 4A is that the magnitude of the median difference overall is very small (<1 citation). Is this your interpretation as well? Clearly more analysis would have been useful in the paper, given that this result has become such a hot issue. And again, it is a shame that others can't work on this dataset to help get to the bottom of what is going on here.

      3 – To be clear, I did not say there is a conspiracy here, nor that this was the reason the paper was accepted; that was not my intention. I fully acknowledge the heroic effort you've undertaken to assemble this dataset and have no issue with the importance of the network results. What I said is that the work is clearly in alignment with the interests of highly selective journals like Nature and Science. When most humans receive a bit of information that is consistent with their preconceptions, they are more likely to accept this information as being true. It is this kind of *hidden bias* that I fear would have been operating in the editorial staff and that I am raising as an issue, since a result like this justifies the worldview of organizations like these. I think it is fair to raise the issue of the motivations and biases of the editorial staff of selective journals, simply because biases like these can play a subconscious but important role in the subjective judgement about whether to review or accept a paper, and further in how to handle its post-acceptance press coverage. As far as I know, Science does not make all of its papers available online first through Science Express, so why did they choose to do so for this paper?

      Additionally, if the rejection-citation result had been no difference, or a difference in the opposite direction, I doubt this would have contributed to the story much and it would probably have been left on the cutting room floor or buried in the supplement. No difference is a negative result, and Science is not in the business of publishing negative results. A first-submission-leads-to-higher-citation result is expected, since people tend to submit to higher impact factor journals first, and if they were accepted on first submission this would naturally lead to this relationship. Thus, there is only one outcome that is *novel* and that the Science editorial staff would see as having any “wow” factor. This novelty bias is consistent with their organizational bias, and again I think it would lead to more favorable treatment of the paper relative to one that had only the network story, with no citation analysis or a null/opposite citation result.

      I actually take bigger issue with Nature's news piece, which I see as a disingenuous or just plain bad piece of reporting. The "Don't worry, it's probably for the best" caption says it all. Why do they lead with Figure 4A? And why are they reporting on a relatively small result in their "arch-rival" journal? Which leads us to…

      4 – I'm glad to see we are on the same page here. My intent with this post was honestly only to provide counter-spin to the massive buzz that is being uncritically propagated by the press and many of my colleagues. Yes, this happens all the time in science; it is not unusual in this sense. What is special about this instance is that this meme touches on an important issue that affects the lives and livelihoods of many young scientists, and in the view of many people is seriously damaging the flow of scientific information. The ideas that "you should always submit to the highest impact factor journal first and risk rejection" or that "if your papers aren't getting rejected, you're not submitting high enough" are sadly very prevalent and are forced on many junior scientists by senior scientists and technocrats. This misguided viewpoint floods the publishing system with many excess submissions, places a heavy burden on the reviewer pool, and leads to scientists over-hyping their work to get into the highest impact journals. The latter effect is very serious, as it may in fact be leading to a distortion of the science that gets published, and may be behind why higher impact journals have a higher retraction rate. Thus, I view the "rejection-improves-impact" meme as a very dangerous idea, which has political and economic importance outside the academic confines of scientometrics. I therefore felt that the one-sided media campaign needed a voice on the other side to force people to actually look at the data and consider why the buzz is being generated. I'm fully sympathetic to how you must feel about your work getting caught in the crossfire, and I'm grateful that you've not taken this issue personally. Again, this was not my intent.

  7. Thanks very much for this thoughtful post as well as the followup comments. I’d like to add a couple thoughts to Vincent’s above.

    In my reading of the discussion to this point, I see two big concerns. First that the results of the paper have been overstated, particularly in the media; second, that the raw data underlying the study has not been made available.

    On the first point, I have two comments.

    1. I completely agree that the media has reduced the paper to a rather misleading tagline. Beyond overstating the results, I’m also very concerned about the causal relationship that is suggested: we most certainly aren’t suggesting that getting your paper rejected makes it better. It’s the process of receiving critical feedback and revising the paper with this in mind. It’s not the rejection part of the process that makes a difference. I suspect that most scientists realize this, but it still bothers me that it’s being stated this way. More to the broader point of media coverage, this is incredibly hard to control. Furthermore, media outlets are both trying to drum up readers and make the results of the paper accessible to a broad audience, so some simplification is to be expected. Note that I’m not saying that the meme-level status this idea has reached is an appropriate outcome, just that overstating things seems to be something that the media in its current form is good at. Countering this is very hard.

    2. In terms of the statistical significance and the size of the change, I agree that the effect isn’t earth-shatteringly large. Certainly, I don’t think anyone would expect the change to be massive. However, keep in mind that it was impossible to control for every confounding variable. Thus, the observed effect is attenuated by the presence of other confounding factors that we couldn’t remove (this is simply due to the size of the dataset and the variables that were available to us). So the way I interpret the published result is that (a) there *is* an effect but (b) we really don’t know how big it is. I suspect, for instance, that the size of the effect could vary by field, journal, and publisher. These would all be really interesting factors to tease out in a followup study.

    As for the raw data, we would *LOVE* to share it – possibly me most of all. Last year I was the data chair at a large data mining conference (ICWSM) and created an initiative to encourage data sharing for all papers published at the conference. So I deeply believe that data sharing is important. However, so is upholding the agreements we make with the participants in research. In the case of this work, each participant was guaranteed the anonymity of their response. Consider what this means: if we release data that allows the identity of even *one* author to be reverse-engineered, we have broken the contract that made this research possible in the first place. So it’s true that the McGill Board of Ethics ruled against sharing the data. But, realistically, even before we received their ruling, it was clear that sharing the raw data was out of the question. We’ve discussed different ways of anonymizing it and simply couldn’t come up with any strategy that would fully protect the identities and responses of the authors while delivering some near-complete version of the data that would be useful for validating our results or conducting further research. I’m very sympathetic to anyone who is frustrated with this outcome, but in this case we really don’t have a choice.

    • Hi Derek – Thanks also for posting your views on what is apparently a hotter topic than you might have imagined.

      1 – It is great to hear that you are also concerned about the way this result is being spun, especially the issue of causality. As I note in my reply to Vincent, I trust you understand my post was meant to help counter the spin making the rounds in the media, and more importantly, in coffee rooms, PI offices and tenure committee meetings all around the world. Possibly this might better have come from the authors themselves, and I apologize if this post has come across as being too harsh about your work.

      2 – Clearly there is still much to do on this topic, and I would be particularly interested to see how the number of reviews (which you probably don’t have data on), rejection rates and discipline affect these results. Though I’m not sure whether controlling for other factors would necessarily be expected to lead to an increase in the effect size, as it may in fact be a correlated confounding variable that is leading to the “rejection” effect (a toy simulation of this possibility is sketched below). This raises the general problem with data mining: one never knows when to quit incorporating additional factors, and moreover, in the end it is very difficult to prove some unknown factor isn’t still lurking in the shadows.
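
      To make that possibility concrete, here is a purely illustrative toy simulation in Python (my own sketch, with invented numbers that have nothing to do with the actual dataset): rejection has no true effect on citations, yet a univariate comparison shows an apparent “rejection boost” because a hypothetical confounder raises both the chance of resubmission and the citation count; the boost vanishes once the confounder is controlled for.

      ```python
      # Toy simulation only: invented numbers, no relation to the Calcagno data.
      # A hidden confounder ("field_rate") raises both the chance that a paper is
      # rejected-and-resubmitted and its eventual citation count, while the true
      # effect of rejection itself is set to zero.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 20_000

      field_rate = rng.normal(0.0, 1.0, n)                     # hidden confounder
      rejected = (rng.normal(0.0, 1.0, n) + field_rate > 0.0)  # correlated with it
      rejected = rejected.astype(float)
      log_citations = 0.0 * rejected + 0.6 * field_rate + rng.normal(0.0, 1.0, n)

      def rejection_coef(*columns):
          """OLS coefficient on 'rejected' given the supplied regressors."""
          X = sm.add_constant(np.column_stack(columns))
          return sm.OLS(log_citations, X).fit().params[1]

      print(f"Univariate 'rejection effect':    {rejection_coef(rejected):.3f}")
      print(f"After controlling the confounder: {rejection_coef(rejected, field_rate):.3f}")
      ```

      This is only one possible story, of course; the point is just that a univariate difference of this size cannot, on its own, distinguish between the two.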

      Which somewhat begs the entire question of the value of citation analyses like these – what ultimately can they *prove*, and therefore what is their value scientifically? I’ve dabbled in this topic myself (who hasn’t?) and come out the other side of the rabbit hole with the impression that scientometrics is ultimately quantitative navel gazing. The problem is that technocrats take the numbers seriously, and thus scientometrics quickly becomes a tool of oppression, both by scientists and the publishing industry. Maybe I’m too cynical about this, but after having grand schemes of how to predict (and write grants on) future high-impact research based on citation analysis, I’ve decided to get on with actually doing the science I know and love.

  8. Energy, the Russell group, Research Advisory Panel and the Research Environment | Professor Douglas Kell's blog

  9. Links 11/3/12 | Mike the Mad Biologist
