Nominations for the Benjamin Franklin Award for Open Access in the Life Sciences

Earlier this week I received an email with the annual call for nominations for the Benjamin Franklin Award for Open Access in the Life Sciences. While I am in general not that fussed about the importance of academic accolades, I think this is a great award since it recognizes contributions in a sub-discipline of biology (computational biology, or bioinformatics) that are specifically done in the spirit of open innovation. By placing the emphasis on recognizing openness as an achievement, the Franklin Award goes beyond other related honors (such as those awarded by the International Society for Computational Biology) and, in my view, captures the essence of what scientists should be striving for in their work.

In looking over the past recipients, few would argue that the award has not been given out to major contributors to the open source/open access movements in biology. In thinking about who might be appropriate to add to this list, two people sprang to mind who I’ve had the good fortune to work with in the past, both of whom have made a major impression on my (and many others’) thinking and working practices in computational biology. So without further ado, here are my nominations for the 2012 Benjamin Franklin Award for Open Access in the Life Sciences (in chronological order of my interaction with them)…

Suzanna Lewis

Suzanna Lewis (Lawrence Berkeley National Laboratory) is one of the pioneers of developing open standards and software for genome annotation and ontologies. She led the team responsible for the systematic annotation of the Drosophila melanogaster genome, which included development of the Gadfly annotation pipeline and database framework, and the annotation curation/visualization tool Apollo. Lewis’ work in genome annotation also includes playing instrumental roles in the GASP community assessment exercises to evaluate the state of the art in genome annotation, development of the GBrowse genome browser, and the data coordination center for the modENCODE project. In addition to her work in genome annotation, Lewis has been a leader in the development of open biological ontologies (OBO, NCBO), contributing to the Gene Ontology, Sequence Ontology, and Uberon anatomy ontologies, and developing open software for editing and navigating ontologies (AmiGO, OBO-Edit, and Phenote).

Carole Goble

Carole Goble (University of Manchester) is widely recognized as a visionary in the development of software to support automated workflows in biology. She has been a leader of the myGrid and Open Middleware Infrastructure Institute consortia, which have generated a large number of highly innovative open resources for e-research in the life sciences, including the Taverna Workbench for developing and deploying workflows, the BioCatalogue registry of bioinformatics web services, and the social-networking-inspired myExperiment workflow repository. Goble has also played an instrumental role in the development of semantic-web tools for constructing and analyzing life science ontologies, of ontologies for describing bioinformatics resources, and of ontology-based tools such as RightField for managing life science data.

I hope others join me in acknowledging the outputs of these two open innovators as being more than worthy of the Franklin Award, support their nomination, and cast votes in their favor this year and/or in years to come!

Why the Research Works Act Doesn’t Affect Text-mining Research

As the central digital repository for life science publications, PubMed Central (PMC) is one of the most significant resources for making the Open Access movement a tangible reality for researchers and citizens around the world. Articles in PMC are deposited through two routes: either automatically by journals that participate in the PMC system, or directly by authors for journals that do not. Author deposition of peer-reviewed manuscripts in PMC is mandated by funders in order to make the results of publicly- or charity-funded research maximally accessible, and has led to over 180,000 articles being made free (gratis) to download that would otherwise be locked behind closed-access paywalls. Justifiably, there has been outrage over recent legislation (the Research Works Act) that would repeal the NIH mandate in the USA and thereby prevent important research from being freely available.

However, from a text-miner’s perspective, author-deposited manuscripts in PMC are closed access: while they can be downloaded and read individually, virtually none (<200) are available from PMC’s Open Access subset, which includes all articles that are free (libre) to download in bulk and text/data mine. Those left out of the subset include ~99% of the author-deposited manuscripts from the journal Nature, despite a clear statement from 2009 entitled “Nature Publishing Group allows data- and text-mining on self-archived manuscripts”. In short, funder mandates only make manuscripts public but not open, and thus whether the RWA is passed or not is actually moot from a text/data-mining perspective.

Why is this important? The simple reason is that there are currently only ~400,000 articles in the PMC Open Access subset, so the more than 180,000 author-deposited manuscripts are only about two-fold less abundant than all articles currently available for text/data-mining. Thus what could be a rich source of data for large-scale information extraction remains locked away from programmatic analysis. This is especially tragic given that, at the point of manuscript acceptance, publishers have invested little-to-nothing in the publication process and their claim to copyright is at its most tenuous.

So instead of discussing whether we should support the status quo of weak funder mandates by working to block the RWA or expand NIH-like mandates (e.g. as under the Federal Research Public Access Act, FRPAA), the real discussion that needs to be had is how to make funder mandates stronger to insist (at a minimum) that author-deposited manuscripts be available for text/data-mining research. More of the same, no matter how much, only takes us half the distance towards the ultimate goals of the Open Access movement, and doesn’t permit the crucial text/data mining research that is needed to make sense of the deluge of information in the scientific literature.

Credits: Max Haussler for making me aware of the lack of author manuscripts in PMC a few years back, and Heather Piwowar for recently jump-starting the key conversation on how to push for improved text/data mining rights in future funder mandates.

