Simplifying Access to Paywalled Literature with Mobile Vouchers

Increasingly I read new scientific papers on a mobile device, often at home in the evening when I’m not on my university’s network. Most of the articles I read come from scientists on Twitter, Twitterbots or RSS feeds, which I try to read directly from my Twitter or RSS clients (Tweetbot and Feedly for iOS, respectively). Virtually every day I hit paywalls trying to read non-open access papers from these sources, which aggravates me, wastes my time, and requires a variety of workarounds to (legally) access papers, workarounds that differ depending on the publisher/journal.

For publishers that expose an obvious “Institutional login” option, I will typically try to log in using the UK Federation Shibboleth authentication system, which uses my university credentials. But Tweetbot and Feedly don’t store my Shibboleth username/password, so for each article I either have to enter them manually or open the page in Safari, where my Shibboleth credentials are stored. This app switch breaks my flow and leads to tab proliferation, neither of which is optimal. Some journals that use an institutional login store my details temporarily for around a week so I don’t have to do this every time I read a paper, but I still find myself entering my details for the same journals over and over.

For journals that don’t have an institutional login option, or that hide this option from plain view, I tend to switch from Twitter/RSS to my iPad Settings in order to log in to my university VPN. The VPN login on my iPad similarly does not store my password, requiring me to type in my university password over and over. This wouldn’t be such a big deal, but my university’s requirement of including one uppercase Egyptian hieroglyph and one lowercase Celtic rune makes entering my password with the iOS keyboard a hassle.

In going through this frustrating routine yet again today trying to access an article in Genetics, I stumbled on a nice feature that I hadn’t seen before called “Mobile Vouchers” that allows me to avoid this rigmarole in the future. As explained on the Genetics Mobile Voucher FAQ:

A voucher is a code that will tie your mobile device to your institution’s subscriptions. This voucher will grant you access to protected content while not on your institution’s network. Each mobile device must be vouched for individually and vouchers are only valid for the publisher for which it is issued.

Obtaining a voucher is super easy. If you are not on your university network, you first need to be logged into your VPN to obtain a voucher. Once on your university network, just visit http://www.genetics.org/voucher/get, enter your name/email address and then submit. This will issue a voucher that you can use immediately to authenticate your device (it will also email you with this information). Voilà, no paywalls for Genetics on your iPad for the next six months or so. In addition to decreasing frustration and increasing flow for scientists, I can see this technology being really useful for PhD students, postdocs and visiting scientists to retain access to the literature for a few months after the end of their positions.

I was surprised I hadn’t seen this before, since it eliminates one of my chronic annoyances as a consumer of the digital scientific literature. Maybe others would disagree, but I would say that publishers haven’t done a very good job of advertising this very useful feature. Googling around, I didn’t find much on mobile vouchers other than a SlideShare presentation from HighWire Press from 2011, which suggests the technology has been around for some time:

[Embedded SlideShare presentation from HighWire Press (2011)]

I also couldn’t find much information on which journals offer this service, but a few Google searches led me to the following list of publishers/journals that offer mobile vouchers. It appears that most of these journals use HighWire Press to serve their content, and that vouchers can operate at the publisher (e.g. Oxford University Press) or journal (e.g. Genetics, PNAS) scale. The OUP voucher is particularly useful since it covers Molecular Biology and Evolution and Bioinformatics, which (together with Genetics) are the journals I hit paywalls for most frequently. Since these vouchers do expire eventually, I thought it would be good to bookmark these links for future use and to highlight this very useful tech tip. Links to other publishers and any other information on mobile vouchers would be most welcome in the comments.

Oxford University Press
http://services.oxfordjournals.org/site/subscriptions/mobile-voucher-faq.xhtml

Royal Society
http://admincenter.royalsocietypublishing.org/cgi/voucher-use

Rockefeller Press
http://www.rupress.org/site/subscriptions/mobile-voucher-faq.xhtml

Lyell
http://www.lyellcollection.org/site/subscriptions/mobile-voucher-faq.xhtml

Sage
http://online.sagepub.com/site/subscriptions/mobile-voucher-faq.xhtml

BMJ
http://journals.bmj.com/site/subscriptions/mobile-voucher-faq.xhtml

AACR
http://www.aacrjournals.org/site/Access/mobile_vouchers.xhtml

Genetics
http://www.genetics.org/site/subscriptions/mobile-voucher-faq.xhtml

PNAS
http://www.pnas.org/site/subscriptions/mobile-voucher-faq.xhtml

JBC
http://www.jbc.org/site/subscriptions/mobile-voucher-faq.xhtml

Endocrine
http://www.eje-online.org/site/subscriptions/mobile-voucher-faq.xhtml

J. Neuroscience
http://www.jneurosci.org/site/subscriptions/mobile-voucher-faq.xhtml

GeoScienceWorld
http://www.geoscienceworld.org/site/subscriptions/mobile-voucher-faq.xhtml

Economic Geology
http://www.segweb.org/SEG/Publications/SEG/_Publications/Mobile_Vouchers.aspx

Keeping Up with the Scientific Literature using Twitterbots: The FlyPapers Experiment

A year ago I created a simple “twitterbot” to stay on top of the Drosophila literature called FlyPapers, which tweets links to new abstracts in PubMed and preprints in arXiv from a dedicated Twitter account (@fly_papers). While most ’bots on Twitter post spam or creative nonsense, an increasing number of people are exploring the use of twitterbots for more productive academic purposes. For example, Rod Page set up the @evoldir twitterbot way back in 2009 as an alternative to receiving email posts to the Evoldir mailing list, and likewise Gordon McNickle developed the @EcoLog_L twitterbot for the Ecolog-L mailing list. Similar to FlyPapers, others have established twitterbots for domain-specific literature feeds, such as @BioPapers for Quantitative Biology preprints on arXiv, @EcoEvoJournals for publications in the areas of Ecology & Evolution and @PlantEcologyBot for papers on Plant Ecology. More recently, Alberto Acerbi developed the @CultEvoBot to post links to blogs and new articles on the topic of cultural evolution. (I recommend reading posts by Rod, Gordon and Alberto for further insight into how and why they established these twitterbots.) One year in, I thought I’d summarize my thoughts on the FlyPapers experiment and make good on a promise I made to describe my set-up in case others are interested.

First, a few words on my motivation for creating FlyPapers. I have been receiving a daily update of all papers in the area of Drosophila in one form or another for nearly 10 years. My philosophy is that it is relatively easy to keep up on a daily basis with what is being published, but it’s virtually impossible to catch up when you let the river of information flow for too long. I first started receiving daily email updates from NCBI, which cluttered up my inbox and often got buried. Then I migrated to using RSS on Google Reader, which led to a similar problem of many unread posts accumulating that needed to be marked as “read”. Ultimately, I realized that what I want from a personalized publication feed — a flow of links to articles that can be quickly scanned and clicked, but which requires no other action and can be ignored when I’m busy — was better suited to a Twitter client than an RSS reader. Moreover, in the spirit of “maximizing the value of your keystrokes”, it seemed that a feed that was useful for me might also be useful for others, and that Twitter was the natural medium for sharing this feed, since many scientists are already using Twitter to post links to papers. Thus FlyPapers was born.

Setting up FlyPapers was straightforward and required no specialist know-how. I first created a dedicated Twitter account with a “catchy” name. Next, I created an account with dlvr.it, which takes an RSS/Twitter/email feed as input and routes the output to the FlyPapers Twitter account. I then set up an RSS feed from NCBI based on a search for the term “Drosophila” and added this as a source to the dlvr.it route. Shortly thereafter, I added an RSS feed for preprints in arXiv using the search term “Drosophila” and added this to the same dlvr.it route. (Unfortunately, neither PeerJ Preprints nor bioRxiv currently has the ability to set up custom RSS feeds, and thus they are not included in the FlyPapers stream.) NCBI and arXiv only push new articles once a day, and each article is posted automatically as a distinct tweet for ease of viewing, bookmarking and sharing. The only gotcha I experienced in setting up the system was making sure to set the “number of items displayed” high enough (=100) when creating the PubMed RSS feed. If the number of articles posted in one RSS update exceeds the limit you set when you create the PubMed RSS feed, PubMed will post a URL to a PubMed query for the entire set of papers as one RSS item, rather than post links to each individual paper. (For Gordon’s take on how he set up his twitterbots, see this thread.) [UPDATE 25/2/14: Rob Lanfear has posted detailed instructions for setting up a twitterbot using the strategy I describe above at https://github.com/roblanf/phypapers. See his comment below for more information.]
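
For readers who would rather see the moving parts than rely on dlvr.it, below is a minimal do-it-yourself sketch of the same pipeline (feed in, one tweet per new article out) in Python. To be clear, this is not how FlyPapers is actually run, since dlvr.it handles the fetching, de-duplication and posting; the feed URL, credentials and file name below are placeholders rather than real values.

```python
# DIY sketch of a FlyPapers-style pipeline: read a PubMed RSS feed and tweet
# each new item once. Assumes a PubMed RSS feed created with "number of items
# displayed" set to 100, and Twitter API credentials for the bot account.
# All values below are placeholders, not the actual FlyPapers configuration.

import feedparser  # pip install feedparser
import tweepy      # pip install tweepy

FEED_URL = "https://pubmed.ncbi.nlm.nih.gov/rss/search/YOUR_FEED_ID/"  # placeholder
SEEN_FILE = "seen_ids.txt"  # local record of items already tweeted

def load_seen():
    try:
        with open(SEEN_FILE) as fh:
            return {line.strip() for line in fh}
    except FileNotFoundError:
        return set()

def save_seen(seen):
    with open(SEEN_FILE, "w") as fh:
        fh.write("\n".join(sorted(seen)))

def main():
    # Authenticate as the bot account (placeholder credentials).
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    seen = load_seen()
    for entry in feedparser.parse(FEED_URL).entries:
        if entry.id in seen:
            continue  # skip anything tweeted in a previous run
        text = f"{entry.title[:110]} {entry.link}"  # crude truncation to fit a tweet
        api.update_status(text)  # one tweet per article, as dlvr.it does
        seen.add(entry.id)
    save_seen(seen)

if __name__ == "__main__":
    main()
```

Run once a day from a scheduler such as cron, a script along these lines would mirror the once-daily push from NCBI and arXiv described above.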

So, has the experiment worked? Personally, I am finding FlyPapers a much more convenient way to stay on top of the Drosophila literature than any previous method I have used. Apparently others are finding this feed useful as well.

One year in, FlyPapers now has 333 followers in 16 countries, which is a far bigger and wider following than I would have ever imagined. Some of the followers are researchers I am familiar with in the Drosophila world, but most are students or post-docs I don’t know, which suggests the feed is finding relevant target audiences via natural processes on Twitter. The account has now posted 3,877 tweets, or ~10-11 tweets per day on average, which gives a rough scale for the amount of research being published annually on Drosophila. Around 10% of tweeted papers are getting retweeted (n=386) or favorited (n=444) by at least one person, and the breadth of topics being favorited/retweeted spans virtually all of Drosophila biology. These facts suggest that developing a twitterbot for domain-specific literature can indeed attract substantial numbers of like-minded individuals, and that automatically tweeting links to articles enables a significant proportion of papers in a field to easily be seen, bookmarked and shared.

Overall, I’m very pleased with the way FlyPapers is developing. I had hoped that one of the outcomes of this experiment would be to help promote Drosophila research, and this appears to be working. I had not expected it would act as a general hub for attracting Drosophila researchers who are active on Twitter, which is a nice surprise. One issue I hadn’t considered a year ago was the potential for ’bots like FlyPapers to “game” Altmetrics scores. Frankly, any metric that would be so easily gamed by a primitive bot like FlyPapers probably has no real intrinsic value. However, it is true that this bot does add +1 to the Twitter count for all Drosophila papers. My thoughts on this are that any attempt to correct for the potential influence of ’bots on Altmetrics scores should not unduly penalize the real human engagement bots can facilitate, so I’d say it is fair to -1 the original FlyPapers tweets in an Altmetrics calculation, but retain the retweets created by humans.

One final consequence of putting all new Drosophila literature onto Twitter that I would not have anticipated is that some tweets have been picked up by other social media outlets, including disease-advocacy accounts that quickly pushed basic research findings out to their target audience.

This final point suggests that there may be wider impacts from having more research articles automatically injected into the Twitter ecosystem. Maybe those pesky twitterbots aren’t always so bad after all.

UPDATE:

For those interested in setting up their own scientific twitterbot, see Rob Lanfear’s excellent and easy-to-follow instructions here. Peter Carlton has also outlined another method for setting up a twitterbot here, as has Sho Iwamoto here.


Battling Administrivia Using an Intramural Question & Answer Forum

The life of a modern academic involves juggling many disparate tasks, and like a computer that needs more memory than it physically has, swapping between various tasks leads to inefficiency and poor performance in our jobs. Personally, the time fragmentation and friction induced by transitioning from task to task seem to be among the main sources of stress in my work life. The main reason for this is that many daily tasks on my to-do list are essential but fiddly and time-consuming administrivia (placing orders, filling in forms, entering marks into a database) that prevent me from getting to the things that I enjoy about being an academic: doing research, interacting with students, reading papers, etc.

I would go so far as to say that the mismatch between the desires of most academics and the reality of their jobs is the main source of academic “burnout” and low morale in what otherwise should be an awesome profession. I would also venture that administrivia is one of the major sources of the long hours we endure, since after wading through the “chaff”, we will (dammit!) put in the time on nights and weekends for the things we are most passionate about to sustain our souls. And based on the frequency of sentiments relating to this topic flowing through my Twitter feed, I’d say the negative impact of administrivia is a pervasive problem in modern academic life, not restricted to any one institution.

While it is tempting to propose ameliorating the administrivia problem by simply eliminating bureaucracy, the growth of the administrative sector in higher education makes this solution a virtual impossibility. I have ultimately become resigned to the fact that the fundamentally inefficient nature of university bureaucracy cannot be structurally reformed, and have begun to seek other solutions to make my work life better. In doing so, I believe I’ve hit on a simple solution to the administrivia problem that I’m hoping might help others as well. In fact, I’m now convinced this solution is simple and powerful enough to actually be effective.

Accepting that it cannot be fully eliminated, my view is that the key to reducing the time and morale burden of administrivia is to realize that most routine tasks in University life are just protocols that require some amount of tacit knowledge about policies or procedures. Thus, all that is needed to reduce the negative impact of administrivia to its lowest possible level is to develop a system whereby accurate and relevant protocols can be placed at one’s fingertips so that they can be completed as fast as possible. The problem is that such protocols either don’t exist, don’t exist in a written form, or exist as scattered documents across various filesystems and offices that you have to expend substantial time finding. So how do we develop such protocols without generating more bureaucracy and exacerbating the problem we are attempting to solve?

My source of inspiration for ameliorating administrivia with minimal overhead comes from the positive experiences I have had using online Question and Answer (Q & A) forums based on the Stack Exchange model (principally the BioStars site for answering questions about bioinformatics). For those not familiar with such systems, the Q & A model popularized by the Stack Exchange platform (and its clones) is a system that allows questions to be asked and answers to be voted on, moderated, edited and commented on in a very intuitive and user-friendly manner. For some reason I am not able to fully explain, the engineering behind the Q & A model naturally facilitates both knowledge exchange and community building in a way that is on the whole extremely positive, and seems to prevent the worst aspects of human nature commonly found on older internet forums and commenting systems.

So here is my proposal for battling the impact of academic administrivia: implement an intramural, University-specific Q & A forum for academic and administrative staff to pose and answer each other’s practical questions, converting tacit knowledge stored in people’s heads, inboxes and intranets into a single knowledge-bank that can be efficiently used and re-used by others who have the same queries. The need for an “intramural” solution, and the reason this strategy cannot be applied globally as it has been for Linux administration, Poker or Biblical Hermeneutics, is that Universities (for better or worse) have their own local policies and procedures that can’t easily be shared or benefit from general worldwide input.

We have been piloting the use of the Open Source Question Answer (OSQA) platform (a clone of Stack Exchange) among a subset of our faculty for about a year, with good uptake and virtually unanimous endorsement from everyone who has used it. We currently enforce a real-name policy for users, have limited the system to questions of procedure only, and have encouraged users to answer their own questions after solving burdensome tasks. To make things easy to administer technically, we are using an out-of-the-box virtual machine of OSQA provided by Bitnami. The anonymized screenshot below gives a flavor of the banal yet time-consuming queries that arise repeatedly in our institution, and which such a system makes much easier to deal with. I trust colleagues at other institutions will find similar tasks frustratingly familiar.

[Anonymized screenshot of example questions from our OSQA pilot]

The main reason I am posting this idea now is that I am scheduled to give a demo and presentation to my Dean and management team this week to propose rolling this system out to a wider audience. In preparation for this pitch, I’ve been trying to assemble a list of pros and cons that I am sure is incomplete and would benefit from the input of other people familiar with how Universities and Q & A platforms work.

The pros of an intramural Q & A platform for battling administrivia I’ve come up with so far include:

  • Increasing efficiency, leading to higher productivity for both academic and administrative staff;
  • Reducing the sense of frustration about bureaucratic tasks, leading to higher morale;
  • Improving sense of empowerment and community among academic and administrative staff;
  • Providing better documentation of procedures and policies;
  • Serving as an “aide memoire”;
  • Aiding the success of junior academic staff;
  • Ameliorating the effects of administrative turnover;
  • Providing a platform for people who may not speak up in staff meetings to contribute;
  • Allowing “best practice” to emerge through crowd-sourcing;
  • Identifying common problems that should be prioritized for improvement;
  • Identifying like-minded problem solvers in a big institution;
  • Integrating easily around existing IT platforms;
  • Allowing deployment at any scale (lab group, department, faculty, school, etc.);
  • Allowing information to be accessible 24/7 when administrative offices are closed (H/T @jdenavascues).

I confess to struggling to find true cons, but these might include (rejoinders in parentheses):

  • Security risks (can be solved with proper administration and authentication);
  • Inappropriate content (real name policy should minimize, can be solved with proper moderation);
  • Answers might be “impolitic” (real name policy should minimize, can be solved with proper moderation; H/T @DrLabRatOry);
  • Time wasting (unlikely since whole point is to enhance productivity);
  • Lack of uptake (even if the 90-9-1 rule applies, it is an improvement on the status quo);
  • Perceived as threat to administrative staff (far from it, this approach benefits administrative staff as much as academic staff);
  • Information could become stale (can be solved with proper moderation and periodic updating).

I’d be very interested to get feedback from others about this general strategy (especially by Tues PM 17 Sep 2013), thoughts on related efforts, or how intramural Q & A platforms could be used in other ways in an academic setting beyond battling administrivia in the comments below.


Twitter Tips for Scientific Journals

The growing influence of social media in the lives of Scientists has come to the forefront again recently with a couple of new papers that provide An Introduction to Social Media for Scientists and a more focussed discussion of The Role of Twitter in the Life Cycle of a Scientific Publication. Bringing these discussions into traditional journal article format is important for spreading the word about social media in Science outside the echo chamber of social media itself. But perhaps more importantly, in my view, is that these motivating papers reflect a desire for Scientists to participate, and urge others to participate, in shaping a new space for scientific exchange in the 21st century.

Just as Scientists themselves are adopting social media, many scientific journals/magazines are as well. However, most discussions about the role of social media in scientific exchange overlook the issue of how we Scientists believe traditional media outlets, like scientific journals, should engage in this new forum. For example, in the Darling et al. paper on The Role of Twitter in the Life Cycle of a Scientific Publication, little is said about the role of journal Twitter accounts in the life cycle of publications beyond noting:

…to encourage fruitful post-publication critique and interactions, scientific journals could appoint dedicated online tweet editors who can storify and post tweets related to their papers.

This oversight is particularly noteworthy for several reasons. First, it is a fact that many journals, and journal staff, play active roles in engaging with the scientific debate on social media and are not simply passive players in the new scientific landscape. Second, Scientists need to be aware that journals extensively monitor our discussions and activity on social media in ways that were not previously possible, and we need to consider how this affects the future of scientific publishing. Third, Scientists should see that social media represents an opportunity to establish new working relationships with journals that break down the old models that increasingly seem to harm both Science and Scientists.

In the same way that we Scientists are offering tips/advice to each other for how to participate in the new media, I feel that this conversation should also be extended to what we feel are best practices for journals to engage in the scientific process through social media. To kick this off, I’d like to list some do’s and don’ts for how I think journals should handle their presence on Twitter, based on my experiences following, watching and interacting with journals on Twitter over the last couple of years.

  • Do engage with (and have a presence on) social media. Twitter uptake among scientists is growing rapidly, and it is the perfect forum to quickly transmit/receive information to/from your author pool/readership. I find it a little strange, in fact, if a journal doesn’t have a Twitter account these days.
  • Do establish a social media policy for your official Twitter account. Better yet, make it public, so Scientists know the scope of what we should expect from your account.
  • Don’t use information from Twitter to influence editorial or production processes, such as the acceptance/rejection of papers or choice of reviewers.  This should be an explicit part of your social media policy. Information on social media could be incorrect and by using unverified information from Twitter you could allow competitors/allies to block/promote each other’s work.
  • Don’t use a journal Twitter account as a table of contents for your journal. Email TOCs or RSS feeds exist for this purpose already.
  • Do tweet highlights from your journal or other journals. This is actually what I am looking for in a journal Twitter account, just as I am from the accounts of other Scientists.
  • Do use journal accounts to retweet unmodified comments from Scientists or other media outlets about papers in your journal. This is a good way for Scientists to find other researchers interested in a topic and know what is being said about work in your journal. But leave the original tweet intact, so we can trace it to the originator and so it doesn’t look like you have edited the sentiment to suit your interests.
  • Don’t use journal accounts to express personal opinions. I find it totally inappropriate that individuals at some journals hide behind the journal name and avatar to use journal Twitter accounts as a soapbox for their personal opinions. This is a really dangerous thing for a journal to do, since it reinforces stereotypes about the fickleness of editors who love to wield the power that their journal provides them. It’s also a bad idea since the opinions of one or a few people may unintentionally be taken as the position of the journal or publisher.
  • Do encourage your staff to create personal accounts and be active on social media. Editors and other journal staff should be encouraged to express personal opinions about science, tweet their own highlights, etc. This is a great way for Scientists to get to know your staff (for better or worse) and build an opinion about who is handling our work at your journal. But it should go without saying that personal opinions should be made through personal accounts, so we can follow/unfollow these people like any other member of the community and so their opinions do not leverage the imprimatur of your journal.
  • Do use journal Twitter accounts to respond to feedback/complaints/queries. Directly replying to comments from the community on Twitter is a great way to build trust in your journal.  If you can’t or don’t want to reply to a query in the open, just reply by asking the person to email your helpdesk. Either way shows good faith that you are listening to our concerns and want to engage. Ignoring comments from Scientists is bad PR and can allow issues to amplify beyond your control, with possible negative impacts on your journal (image) in the long run.
  • Don’t use journal Twitter accounts to tweet from meetings. To me this is a form of expressing personal opinion that looks like you are endorsing certain Scientists/fields/meetings or, worse yet, that you are looking to solicit them to submit their work to your journal, which smacks of desperation and favoritism. Use personal accounts instead to tweet from meetings, since after all what is reported is a personal assessment.

These are just my first thoughts on this issue (anonymised to protect the guilty), which I hope will act as a springboard for others to comment below on how they think journals should manage their presence on Twitter for the benefit of the Scientific community.

Launch of the PLOS Text Mining Collection

Just a quick post to announce that the PLOS Text Mining Collection is now live!

This PLOS Collection arose out of a twitter conversation with Theo Bloom last year, and has come together through the hard work of the authors of the papers in the Collection, the PLOS Collections team (in particular Sam Moore and Jennifer Horsely), and my co-organizers Larry Hunter and Andrey Rzhetsky. Many thanks to all for seeing this effort to completion.

Because of the large body of work in the area of text mining published in PLOS, we struggled with how best to present all these papers in the collection without diluting the experience for the reader. In the end, we decided only to highlight new work from the last two years and major reviews/tutorials at the time of launch. However, as this is a living collection, new articles will be included in the future, and the aim is to include previously published work as well. We hope to see many more papers in the area of text mining published in the PLOS family of journals in the future.

An overview of the PLOS Text Mining Collection is below (cross-posted at the PLOS EveryONE blog) and a commentary on the Collection is available at the Official PLOS Blog entitled “A mine of information – the PLOS Text Mining Collection”.

Background to the PLOS Text Mining Collection

Text Mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text. Over the last few decades, there has been increasing interest in text mining research because of the potential commercial and academic benefits this technology might enable. However, as with the promises of many new technologies, the benefits of text mining are still not clear to most academic researchers.

This situation is now poised to change for several reasons. First, the rate of growth of the scientific literature has now outstripped the ability of individuals to keep pace with new publications, even in a restricted field of study. Second, text-mining tools have steadily increased in accuracy and sophistication to the point where they are now suitable for widespread application. Finally, the rapid increase in availability of digital text in an Open Access format now permits text-mining tools to be applied more freely than ever before.

To acknowledge these changes and the growing body of work in the area of text mining research, today PLOS launches the Text Mining Collection, a compendium of major reviews and recent highlights published in the PLOS family of journals on the topic of text mining. Since PLOS is one of the major publishers of the Open Access scientific literature, it is perhaps no coincidence that text mining research is flourishing in PLOS journals. As noted above, the widespread application and societal benefits of text mining are most easily achieved under an Open Access model of publishing, where the barriers to obtaining published articles are minimized and the ability to remix and redistribute data extracted from text is explicitly permitted. Furthermore, PLOS is one of the few publishers actively promoting text mining research by providing an open Application Programming Interface to mine their journal content.
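
To give a flavor of what such an API enables, here is a minimal sketch of querying the PLOS search endpoint for articles mentioning “text mining”. This is an illustrative example only: the endpoint is the publicly documented Solr-style interface at api.plos.org/search, the API key is a placeholder you would obtain by registering with PLOS, and the response field names (e.g. title_display) are my reading of the public schema rather than anything specified in this post.

```python
# Minimal sketch: query the PLOS Search API (Solr-style) for "text mining"
# articles and print their DOIs and titles. The API key is a placeholder and
# field names such as "title_display" reflect my understanding of the schema.

import requests  # pip install requests

params = {
    "q": '"text mining"',       # query string
    "wt": "json",               # return JSON rather than XML
    "rows": 10,                 # number of results to fetch
    "api_key": "YOUR_API_KEY",  # placeholder: obtained by registering with PLOS
}
resp = requests.get("http://api.plos.org/search", params=params)
resp.raise_for_status()

for doc in resp.json()["response"]["docs"]:
    # "id" holds the article DOI; "title_display" the human-readable title.
    print(doc.get("id"), "-", doc.get("title_display"))
```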

Text Mining in PLOS

Since virtually the beginning of its history [1], PLOS has actively promoted the field of text mining by publishing reviews, opinions, tutorials and dozens of primary research articles in this area in PLOS Biology, PLOS Computational Biology and, increasingly, PLOS ONE. Because of the large number of text mining papers in PLOS journals, we are only able to highlight a subset of these works in the first instance of the PLOS Text Mining Collection. These include major reviews and tutorials published over the last decade [1][2][3][4][5][6], plus a selection of research papers from the last two years [7][8][9][10][11][12][13][14][15][16][17][18][19] and three new papers arising from the call for papers for this collection [20][21][22].
The research papers included in the collection at launch provide important overviews of the field and reflect many exciting contemporary areas of research in text mining, such as:

  • methods to extract textual information from figures [7];
  • methods to cluster [8] and navigate [15] the burgeoning biomedical literature;
  • integration of text-mining tools into bioinformatics workflow systems [9];
  • use of text-mined data in the construction of biological networks [10];
  • application of text-mining tools to non-traditional textual sources such as electronic patient records [11] and social media [12];
  • generating links between the biomedical literature and genomic databases [13];
  • application of text-mining approaches in new areas such as the Environmental Sciences [14] and Humanities [16][17];
  • named entity recognition [18];
  • assisting the development of ontologies [19];
  • extraction of biomolecular interactions and events [20][21]; and
  • assisting database curation [22].

Looking Forward

As this is a living collection, it is worth discussing two issues we hope to see addressed in articles that are added to the PLOS Text Mining Collection in the future: scaling up and opening up. While application of text mining tools to the abstracts of all biomedical papers in the MEDLINE database is increasingly common, there have been remarkably few efforts to apply text mining to the entirety of the full-text articles in a given domain, even in the biomedical sciences [4][23]. Therefore, we hope to see more text mining applications scaled up to use the full text of all Open Access articles. Scaling up will not only maximize the utility of text-mining technologies and their uptake by end users, but also demonstrate that demand for access to full-text articles exists in the text mining and wider academic communities.

Likewise, we hope to see more text-mining software systems made freely or openly available in the future. As an example of the state of affairs in the field, only 25% of the research articles highlighted in the PLOS text mining collection at launch provide source code or executable software of any kind [13][16][19][21]. The lack of availability of software or source code accompanying published research articles is, of course, not unique to the field of text mining. It is a general problem limiting progress and reproducibility in many fields of science, which authors, reviewers and editors have a duty to address. Making release of open source software the rule, rather than the exception, should further catalyze advances in text mining, as it has in other fields of computational research that have made extremely rapid progress in the last decades (such as genome bioinformatics).

By opening up the code base in text mining research, and deploying text-mining tools at scale on the rapidly growing corpus of full-text Open Access articles, we are confident this powerful technology will make good on its promise to catalyze scholarly endeavors in the digital age.

References

1. Dickman S (2003) Tough mining: the challenges of searching the scientific literature. PLoS Biol 1: e48. doi:10.1371/journal.pbio.0000048.
2. Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from Text—Is Text Mining Ready to Deliver? PLoS Biol 3: e65. doi:10.1371/journal.pbio.0030065.
3. Cohen B, Hunter L (2008) Getting started in text mining. PLoS Comput Biol 4: e20. doi:10.1371/journal.pcbi.0040020.
4. Bourne PE, Fink JL, Gerstein M (2008) Open access: taking full advantage of the content. PLoS Comput Biol 4: e1000037. doi:10.1371/journal.pcbi.1000037.
5. Rzhetsky A, Seringhaus M, Gerstein M (2009) Getting Started in Text Mining: Part Two. PLoS Comput Biol 5: e1000411. doi:10.1371/journal.pcbi.1000411.
6. Rodriguez-Esteban R (2009) Biomedical Text Mining and Its Applications. PLoS Comput Biol 5: e1000597. doi:10.1371/journal.pcbi.1000597.
7. Kim D, Yu H (2011) Figure text extraction in biomedical literature. PLoS ONE 6: e15338. doi:10.1371/journal.pone.0015338.
8. Boyack K, Newman D, Duhon R, Klavans R, Patek M, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6: e18029. doi:10.1371/journal.pone.0018029.
9. Kolluru B, Hawizy L, Murray-Rust P, Tsujii J, Ananiadou S (2011) Using workflows to explore and optimise named entity recognition for chemistry. PLoS ONE 6: e20181. doi:10.1371/journal.pone.0020181.
10. Hayasaka S, Hugenschmidt C, Laurienti P (2011) A network of genes, genetic disorders, and brain areas. PLoS ONE 6: e20907. doi:10.1371/journal.pone.0020907.
11. Roque F, Jensen P, Schmock H, Dalgaard M, Andreatta M, et al. (2011) Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol 7: e1002141. doi:10.1371/journal.pcbi.1002141.
12. Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7: e1002199. doi:10.1371/journal.pcbi.1002199.
13. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman C (2011) pubmed2ensembl: a resource for mining the biological literature on genes. PLoS ONE 6: e24716. doi:10.1371/journal.pone.0024716.
14. Fisher R, Knowlton N, Brainard R, Caley J (2011) Differences among major taxa in the extent of ecological knowledge across four major ecosystems. PLoS ONE 6: e26556. doi:10.1371/journal.pone.0026556.
15. Hossain S, Gresock J, Edmonds Y, Helm R, Potts M, et al. (2012) Connecting the dots between PubMed abstracts. PLoS ONE 7: e29509. doi:10.1371/journal.pone.0029509.
16. Ebrahimpour M, Putniņš TJ, Berryman MJ, Allison A, Ng BW-H, et al. (2013) Automated authorship attribution using advanced signal classification techniques. PLoS ONE 8: e54998. doi:10.1371/journal.pone.0054998.
17. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8: e59030. doi:10.1371/journal.pone.0059030.
18. Groza T, Hunter J, Zankl A (2013) Mining Skeletal Phenotype Descriptions from Scientific Literature. PLoS ONE 8: e55656. doi:10.1371/journal.pone.0055656.
19. Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8: e55674. doi:10.1371/journal.pone.0055674.
20. Van Landeghem S, Bjorne J, Wei C-H, Hakala K, Pyysal S, et al. (2013) Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization. PLoS ONE 8: e55814. doi:10.1371/journal.pone.0055814.
21. Liu H, Hunter L, Keselj V, Verspoor K (2013) Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLoS ONE 8: e60954. doi:10.1371/journal.pone.0060954.
22. Davis A, Weigers T, Johnson R, Lay J, Lennon-Hopkins K, et al. (2013) Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLoS ONE 8: e58201. doi:10.1371/journal.pone.0058201.
23. Bergman CM (2012) Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central? https://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/.

Suggesting Reviewers in the Era of arXiv and Twitter

Along with many others in the evolutionary genetics community, I’ve recently converted to using arXiv as a preprint server for new papers from my lab. In so doing, I’ve confronted an unexpected ethical question concerning pre-printing and the use of social media, which I was hoping to generate some discussion about as this practice becomes more common in the scientific community. The question concerns the suggestion of reviewers for a journal submission of a paper that has previously been submitted to arXiv and then subsequently discussed on social media platforms like Twitter. Specifically put, the question is: is it ethical to suggest reviewers for a journal submission based on tweets about your arXiv preprint?

To see how this ethical issue arises, I’ll first describe my current workflow for submitting to arXiv and publicizing it on Twitter. Then, I’ll propose an alternative that might be considered to be “gaming” the system, and discuss precedents in the pre-social media world that might inform the resolution of this issue.

My current workflow for submission to arXiv and announcement on twitter is as follows:

  1. submit manuscript to a journal with suggested reviewers based on personal judgement;
  2. deposit the same version of the manuscript that was submitted to journal in arXiv;
  3. wait until arXiv submission is live and then tweet links to the arXiv preprint.

From doing this a few times (as well as benefiting from additional Twitter exposure via Haldane’s Sieve), I’ve realized that there can often be fairly substantive feedback about an arXiv submission via Twitter, in the form of who (re)tweets links to it and what people are saying about the manuscript. It doesn’t take much thought to realize that this information could potentially be used to influence a journal submission, in terms of which reviewers to suggest or oppose, using an alternative workflow:

  1. submit manuscript to arXiv;
  2. wait until arXiv submission is live and then tweet about it;
  3. monitor and assimilate feedback from Twitter;
  4. submit manuscript to journal with suggested and opposed reviewers based on Twitter activity.

This second workflow incidentally also arises under the first workflow if your initial journal submission is rejected, since there would naturally be a time lag in which it would be difficult to fully ignore activity on Twitter about an arXiv submission.

Now, I want to be clear that I haven’t used, and don’t intend to use, the second workflow (yet), since I have not fully decided if this is an ethical approach to suggesting reviewers. Nevertheless, I lean towards the view that it is no more or less ethical than the current mechanisms of selecting suggested reviewers based on: (1) perceived allies/rivals with relevant expertise or (2) informal feedback on the work in question presented at meetings.

In the former case of using who you perceive to be for or against your work, you are relying on personal experience and subjective opinions about researchers in your field, both good and bad, to inform your choice of suggested or opposed reviewers. This is in some sense not qualitatively different from using information on Twitter prior to journal submission, but it is based on a closed network using past information, rather than an open network using information specific to the piece of work in question. The latter case of suggesting reviewers based on feedback from meeting presentations is perhaps more similar to the matter at hand, and I suspect would be considered by most scientists to be a perfectly valid mechanism to suggest or oppose reviewers for a journal submission.

Now, of course I recognize that suggested reviewers are just that, and editors can use or ignore these suggestions as they wish, so this issue may in fact be moot. However, based on my experience, suggested reviewers are indeed frequently used by editors (if not, why would they be there?). Thus resolving whether smoking out opinions on Twitter is considered “fair play” is probably something the scientific community should consider more thoroughly in the near future, and I’d be happy to hear what other folks think about this in the comments below.

Goodbye F1000, Hello Faculty of a Million

Dr. Seuss' The Sneetches

In the children’s story The Sneetches, Dr. Seuss presents a world where certain members of society are marked by an arbitrary badge of distinction, and a canny opportunist uses this false basis of prestige for his financial gain*. What does this morality tale have to do with the scientific article recommendation service Faculty of 1000? Read on…

Currently ~3000 papers are published each day in the biosciences**. Navigating this sea of information to find articles relevant to your work is no small matter. Researchers can either sink or swim with the aid of (i) machine-based technologies based on search or text-mining tools or (ii) human-based technologies like blogs or social networking services that highlight relevant work through expert recommendation.

One of the first expert recommendation services was Faculty of 1000, a service launched in 2002 with the aim of “identifying and evaluating the most significant articles from biomedical research publications” through a peer-nominated “Faculty” of experts in various subject domains. Since the launch of F1000, several other mechanisms for expert literature recommendation have also come to the foreground, including academic social bookmarking tools like citeulike or Mendeley, the rise of Research Blogging, and new F1000-like services such as annotatr, The Third Reviewer, PaperCritic and TiNYARM.

Shortly after I started my group at the University of Manchester in 2005, I was invited to join the F1000 Faculty, which I gratefully accepted. At the time, I felt that it was a mark of distinction to be invited into this select club, and I felt that it would be a good platform to voice my opinions on what work I thought was notable. I was under no illusion that my induction was based only on merit, since this invitation came from my former post-doc mentor Michael Ashburner. I overlooked this issue at the time, because when you are invited to join the “in-club” as a junior faculty member it is very tempting to think that things like this will play a positive role in your career progression. [Whether being in F1000 has helped my career I can’t say, but certainly it can’t have hurt, and I (sheepishly) admit to using it on grant and promotion applications in the past.]

Since then, I’ve tried to contribute to F1000 when I can [PAYWALL], but since it is not a core part of my job, I’ve only contributed ~15 reviews in 5 years. My philosophy has been only to contribute reviews on articles I think are of particular note and might be missed otherwise, not to review major papers in Nature/Science that everyone is already aware of. As time has progressed and it has become harder to commit time to non-essential tasks, I’ve contributed less and less, and the F1000 staff has pestered me frequently with reminders and phone calls to submit reviews. At times the pestering has been so severe that I have considered resigning just to get them off my back. And I’ve noticed that some colleagues I have a lot of respect for have also resigned from F1000, which made me wonder if they were likewise fed up with F1000’s nagging.

This summer, while reading a post on Jonathan Eisen’s Tree of Life blog, I came across a parenthetical remark he made about quitting F1000, which made me more aware of why their nagging was really getting to me:

I even posted a “dissent” regarding one of [Paul Hebert’s] earlier papers on Faculty of 1000 (which I used to contribute to before they become non open access).

This comment made me realize that the F1000 recommendation service is just another closed-access venture for publishers to make money off a product generated for free by the goodwill and labor of academics. Like closed access journals, my University pays twice to get F1000 content — once for my labor and once for the subscription to the service. But unlike a normal closed-access journal, in the case of F1000 there is not even a primary scientific publication to justify the arrangement. So by contributing to F1000, essentially I take time away from my core research and teaching activities to allow a company to commercialize my IP and pay someone to nag me! What’s even more strange about this situation is that there is no rational open-access equivalent of literature review services like F1000. By analogy with the OA publishing of the primary literature, for “secondary” services I would pay a company to post one of my reviews on someone else’s article. (Does Research Blogging for free sound like a better option to anyone?)

Thus I’ve come to realize that it is unjustified to contribute secondary commentary to F1000 on Open Access grounds, in the same way it is unjustified to submit primary papers to closed-access journals. If I really support Open Access publishing, then to contribute to F1000 I must either be a hypocrite or make an artificial distinction between the primary and secondary literature. But this gets to the crux of the matter: to the extent that recommendation services like F1000 are crucial for researchers to make sense of the onslaught of published data, then surely these critical reviews should be Open for all, just as the primary literature should be. On the other hand, if such services are not crucial, why am I giving away my IP for free to a company to capitalize on?

Well, this question has been on my mind for a while and I have looked into whether there might be evidence that F1000 evaluations have a real scientific worth in terms of highlighting good publications that might provide a reason to keep contributing to the system. On this point the evidence is scant and mixed. An analysis by the Wellcome Trust finds a very weak correlation between F1000 evaluations and the evaluations of an internal panel of experts (driven almost entirely by a few clearly outstanding papers), with the majority of highly cited papers being missed by F1000 reviewers. An analysis by the MRC shows a ~2-fold increase in the median number of citations (from 2 to 4) for F1000 reviewed articles relative to other MRC-funded research. Likewise, an analysis of the Ecology literature shows similar trends, with marginally higher citation rates for F1000 reviewed work, but with many high impact papers being missed. [Added 28 April 2012: Moreover, multifactorial analysis by Priem et al on a range of altmetric measures of impact for 24,331 PLoS articles clearly shows that the “F1000 indicator did not have shared variability with any of the derived factors” and that “Mendeley bookmark counts correlate more closely to Web of Science citations counts than expert ratings of F1000”.] Therefore the available evidence indicates that F1000 reviews do not capture the majority of good work being published, and the work that is reviewed is only of marginally higher importance (in terms of citation) than unreviewed work.

So if (i) it goes against my OA principles, (ii) there is no evidence (on average) that my opinion matters quantitatively much more than anyone else’s, and (iii) there are equivalent open access systems to use, why should I continue contributing to F1000? The only answer I can come up with is that by being an F1000 reviewer, I gain a certain prestige for being in the “in club,” as well as some prestige-by-association for aligning myself with publications or scientists I perceive to be important. When stripped down like this, being a member of F1000 seems pretty close to being a Sneetch with a star, and the F1000 business model seems not too different from that used by Sylvester McMonkey McBean. Realizing this has made me feel more than a bit ashamed for letting the allure of being in the old-boys club and my scientific ego trick me into something I cannot rationally justify.

So, needless to say I have recently decided to resign from F1000. I will instead continue to contribute my tagged articles to citeulike (as I have for several years) and contribute more substantial reviews to this blog via the Research Blogging portal and push the use of other Open literature recommendation systems like PaperCritic, who have recently made their user-supplied content available under a Creative Commons license. (Thanks for listening PaperCritic!).

By supporting these Open services rather than the closed F1000 system (and perhaps convincing others to do the same) I feel more at home among the ranks of the true crowd-sourced “Faculty of 1,000,000” that we need to help filter the onslaught of publications. And just as Sylvester McMonkey McBean’s Star-On machine provided a disruptive technology for overturning perceptions of prestige by giving everyone a star in The Sneetches, I’m hopeful that these open-access web 2.0 systems will also do some good towards democratizing personal recommendation of the scientific literature.

* Note: This post should in no way be taken as an ad hominem attack on F1000 or its founder Vitek Tracz, whom I respect very much as a pioneer of Open Access biomedical publishing.

** This number is an estimate based on the real figure of ~2.5K papers/day deposited in MEDLINE, extrapolated to account for the large number of non-biomedical journals that are not indexed by MEDLINE. If anyone has better data on this, please comment below.