Archive for January, 2012

An Open Archive of My F1000 Reviews

Following on from a recent conversation with David Stephens on Twitter about my decision to resign from Faculty of 1000, F1000 has clarified their terms for the submission of evaluations and confirmed that it is permissible to “reproduce personal evaluations on institutional & personal blogs if you clearly reference F1000”.

As such, I am delighted to be able to repost here an Open Archive of my F1000 contributions. Additionally, this post acts in a second capacity as my first contribution to the Research Blogging Network. Hopefully these commentaries will be of interest to some, and should add support to the Altmetrics profiles for these papers through systems like Total Impact.

Nelson CE, Hersh BM, & Carroll SB (2004). The regulatory content of intergenic DNA shapes genome architecture. Genome biology, 5 (4) PMID: 15059258

My review: This article reports that genes with complex expression have longer intergenic regions in both D. melanogaster and C. elegans, and introduces several innovative and complementary approaches to quantify the complexity of gene expression in these organisms. Additionally, the structure of intergenic DNA in genes with high complexity (e.g. receptors, specific transcription factors) is shown to be longer and more evenly distributed over 5′ and 3′ regions in D. melanogaster than in C. elegans, whereas genes with low complexity (e.g. metabolic genes, general transcription factors) are shown to have similar intergenic lengths in both species and exhibit no strong differences in length between 5′ and 3′ regions. This work suggests that the organization of noncoding DNA may reflect constraints on transcriptional regulation and that gene structure may yield insight into the functional complexity of uncharacterized genes in compact animal genomes. (@F1000:

Li R, Ye J, Li S, Wang J, Han Y, Ye C, Wang J, Yang H, Yu J, Wong GK, & Wang J (2005). ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun. PLoS computational biology, 1 (4) PMID: 16184192

My review: This paper presents a novel method for automating the laborious task of constructing libraries of transposable element (TE) consensus sequences. Since repetitive TE sequences confound whole-genome shotgun (WGS) assembly algorithms, sequence reads from TEs are initially screened from WGS assemblies based on overrepresented k-mer frequencies. Here, the authors invert the same principle, directly identifying TE consensus sequences from those same reads containing high frequency k-mers. The method was shown to identify all high copy number TEs and increase the effectiveness of repeat masking in the rice genome. By circumventing the inherent difficulties of TE consensus reconstruction from erroneously assembled genome sequences, and by providing a method to identify TEs prior to WGS assembly, this method provides a new strategy to increase the accuracy of WGS assemblies as well as our understanding of the TEs in genome sequences. (@F1000:
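As a rough illustration of the k-mer principle described in this review (and not the ReAS implementation itself), the sketch below counts k-mers across a set of reads and keeps the reads that contain at least one overrepresented k-mer. The function names, the toy reads and the `k` and `min_count` values are all assumptions chosen for the example; a real pipeline would also need to handle reverse complements and sequencing errors.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def high_frequency_reads(reads, k, min_count):
    """Return the reads containing at least one k-mer that occurs
    min_count or more times across the whole read set."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    frequent = {km for km, n in counts.items() if n >= min_count}
    return [r for r in reads if any(km in frequent for km in kmers(r, k))]


# Toy reads (hypothetical): three share the 6-mer ACGTAC, one does not.
reads = [
    "ACGTACGTTT",
    "GGACGTACGA",
    "TTTTGGGGCC",
    "CCACGTACAA",
]
repeat_like = high_frequency_reads(reads, k=6, min_count=3)
```

Reads flagged in this way are the ones an assembler would normally screen out, and they are the starting material from which the paper builds TE consensus sequences.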

Rifkin SA, Houle D, Kim J, & White KP (2005). A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature, 438 (7065), 220-3 PMID: 16281035

My review: This paper reports empirical estimates of the mutational input to gene expression variation in Drosophila, knowledge of which is critical for understanding the mechanisms governing regulatory evolution. These direct estimates of mutational variance are compared to gene expression differences across species, revealing that the majority of genes have lower expression divergence than is expected if evolving solely by mutation and genetic drift. Mutational variances on a gene-by-gene basis range over several orders of magnitude and are shown to vary with gene function and developmental context. Similar results in C. elegans [1] provide strong support for stabilizing selection as the dominant mode of gene expression evolution. (@F1000:

References: 1. Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, & Thomas WK (2005). The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature genetics, 37 (5), 544-8 PMID: 15852004

Caspi A, & Pachter L (2006). Identification of transposable elements using multiple alignments of related genomes. Genome research, 16 (2), 260-70 PMID: 16354754

My review: This paper reports an innovative strategy for the de novo detection of transposable elements (TEs) in genome sequences based on comparative genomic data. By capitalizing on the fact that bursts of TE transposition create large insertions in multiple genomic locations, the authors show that detection of repeat insertion regions (RIRs) in alignments of multiple Drosophila genomes has high sensitivity to identify both individual instances and families of known TEs. This approach opens a new direction in the field of repeat detection and provides added value to TE annotations by placing insertion events in a phylogenetic context. (@F1000

Simons C, Pheasant M, Makunin IV, & Mattick JS (2006). Transposon-free regions in mammalian genomes. Genome research, 16 (2), 164-72 PMID: 16365385

My review: This paper presents an intriguing analysis of transposon-free regions (TFRs) in the human and mouse genomes, under the hypothesis that TFRs indicate genomic regions where transposon insertion is deleterious and removed by purifying selection. The authors test and reject a model of random transposon distribution and investigate the properties of TFRs, which appear to be conserved in location across species and enriched for genes (especially transcription factors and micro-RNAs). An alternative mutational hypothesis not considered by the authors is the possibility for clustered transposon integration (i.e. preferential insertion into regions of the genome already containing transposons), which may provide a non-selective explanation for the apparent excess of TFRs in the human and mouse genomes. (@F1000:

Wheelan SJ, Scheifele LZ, Martínez-Murillo F, Irizarry RA, & Boeke JD (2006). Transposon insertion site profiling chip (TIP-chip). Proceedings of the National Academy of Sciences of the United States of America, 103 (47), 17632-7 PMID: 17101968

My review: This paper demonstrates the utility of whole-genome microarrays for the high-throughput mapping of eukaryotic transposable element (TE) insertions on a genome-wide basis. With an experimental design guided by first computationally digesting the genome into suitable fragments, followed by linker-PCR to amplify TE flanking regions and subsequent hybridization to tiling arrays, this method was shown to recover all detectable TE insertions with essentially no false positives in yeast. Although limited to species with available genome sequences, this approach circumvents inefficiencies and biases associated with the alternative of whole-genome shotgun resequencing to detect polymorphic TEs on a genome-wide scale. Application of this or related technologies (e.g. [1]) to more complex genomes should fill gaps in our understanding of the contribution of TE insertions to natural genetic variation. (@F1000:

References: 1. Gabriel A, Dapprich J, Kunkel M, Gresham D, Pratt SC, & Dunham MJ (2006). Global mapping of transposon location. PLoS genetics, 2 (12) PMID: 17173485

Haag-Liautard C, Dorris M, Maside X, Macaskill S, Halligan DL, Houle D, Charlesworth B, & Keightley PD (2007). Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature, 445 (7123), 82-5 PMID: 17203060

My review: This paper presents the first direct estimates of nucleotide mutation rates across the Drosophila genome derived from mutation accumulation experiments. By using DHPLC to scan over 20 megabases of genomic DNA, the authors obtain several fundamental results concerning mutation at the molecular level in Drosophila: SNPs are more frequent than indels; deletions are more frequent than insertions; mutation rates are similar across coding, intronic and intergenic regions; and mutation rates may vary across genetic backgrounds. Results in D. melanogaster contrast with those obtained from mutation accumulation experiments in C. elegans (see [1], where indels are more frequent than SNPs, and insertions are more frequent than deletions), indicating that basic mutation processes may vary across metazoan taxa. (@F1000:

References: 1. Denver DR, Morris K, Lynch M, & Thomas WK (2004). High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature, 430 (7000), 679-82 PMID: 15295601

Katzourakis A, Pereira V, & Tristem M (2007). Effects of recombination rate on human endogenous retrovirus fixation and persistence. Journal of virology, 81 (19), 10712-7 PMID: 17634225

My review: This study shows that the persistence, but not the integration, of long-terminal repeat (LTR) containing human endogenous retroviruses (HERVs) is associated with local recombination rate, and suggests a link between intra-strand homologous recombination and meiotic exchange. This inference about the mechanisms controlling the transposable element (TE) abundance is obtained by demonstrating that total HERV density (full-length elements plus solo LTRs) is not correlated with recombination rate, whereas the ratio of full-length HERVs relative to solo LTRs is. This work relies critically on advanced computational methods to join TE fragments, demonstrating the need for such algorithms to make accurate inferences about the evolution of mobile DNA and to reveal new insights into genome biology. (@F1000:

Giordano J, Ge Y, Gelfand Y, Abrusán G, Benson G, & Warburton PE (2007). Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLoS computational biology, 3 (7) PMID: 17630829

My review: This article reports the first comprehensive stratigraphic record of transposable element (TE) activity in mammalian genomes based on several innovative computational methods that use information encoded in patterns of TE nesting. The authors first develop an efficient algorithm for detecting nests of TEs by intelligently joining TE fragments identified by RepeatMasker, which (in addition to providing an improved genome annotation) outputs a global “interruption matrix” that can be used by a second novel algorithm which generates a chronological ordering of TE activity by minimizing the nesting of young TEs into old TEs. Interruption matrix analysis yields results that support previous phylogenetic analyses of TE activity in humans but are not dependent on the assumption of a molecular clock. Comparison of the chronological orders of TE activity in six mammalian genomes provides unique insights into the ancestral and lineage-specific record of global TE activity in mammals. (@F1000:

Schuemie MJ, & Kors JA (2008). Jane: suggesting journals, finding experts. Bioinformatics (Oxford, England), 24 (5), 727-8 PMID: 18227119

My review: This paper introduces a fast method for finding related articles and relevant journals/experts based on user input text and should help improve the referencing, review and publication of biomedical manuscripts. The JANE (Journal/Author Name Estimator) method uses a standard word frequency approach to find similar documents, then adds the scores in the top 50 records to produce a ranked list of journals or authors. Using either the abstract or full-text, JANE suggested quite sensible journals and authors in seconds for a manuscript we have in press, while the related eTBLAST method [1] failed to complete while I wrote this review. JANE should prove to be a very useful text mining tool for authors and editors alike. (@F1000:

References: 1. Errami M, Wren JD, Hicks JM, & Garner HR (2007). eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications. Nucleic acids research, 35 (Web Server issue) PMID: 17452348
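The scoring step described in the JANE review can be sketched in a few lines. This is only an illustration under my own assumptions: cosine similarity over raw word counts stands in for the "standard word frequency approach", and the `rank_journals` function and the toy records are hypothetical rather than taken from the JANE software.

```python
import math
import re
from collections import Counter, defaultdict


def word_vector(text):
    """Count lower-cased alphabetic words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_journals(query, records, top_n=50):
    """Score every record against the query, keep the top_n records,
    and add up their scores per journal to give a ranked list."""
    q = word_vector(query)
    scored = sorted(
        ((cosine(q, word_vector(text)), journal) for journal, text in records),
        reverse=True,
    )
    totals = defaultdict(float)
    for score, journal in scored[:top_n]:
        totals[journal] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy (journal, abstract) records, invented for the example.
records = [
    ("Genome Research", "transposable element insertions in Drosophila genomes"),
    ("Genome Research", "detection of transposable element families"),
    ("Bioinformatics", "text mining of biomedical abstracts"),
]
ranking = rank_journals("annotation of transposable element insertions", records)
```

Summing scores over the top records, rather than taking only the single best match, rewards journals that have published several similar articles.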

Pask AJ, Behringer RR, & Renfree MB (2008). Resurrection of DNA function in vivo from an extinct genome. PloS one, 3 (5) PMID: 18493600

My review: This paper reports the first transgenic analysis of a cis-regulatory element cloned from an extinct species. Although no differences were seen in the expression pattern of the collagen (Col2A1) enhancer from the extinct Tasmanian tiger and extant mouse, this work is an important proof of principle for using ancient DNA in the evolutionary analysis of gene regulation. (@F1000:

Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, & Brilliant L (2009). Detecting influenza epidemics using search engine query data. Nature, 457 (7232), 1012-4 PMID: 19020500

My review: A landmark paper in health bioinformatics demonstrating that Google searches can predict influenza trends in the United States. Predicting infectious disease outbreaks currently relies on patient reports gathered through clinical settings and submitted to government agencies such as the CDC. The possible use of patient “self-reporting” through internet search queries offers unprecedented real-time access to temporal and regional trends in infectious diseases. Here, the authors use a linear modeling strategy to learn which Google search terms best correlate with regional trends in influenza-related illness. This model explains flu trends over a 5 year period with startling accuracy, and was able to predict flu trends during 2007-2008 with a 1-2 week lead time ahead of CDC reports. The phenomenal use of crowd-based predictive health informatics revolutionizes the role of the internet in biomedical research and will likely set an important precedent in many areas of natural sciences. (@F1000:

Taher L, & Ovcharenko I (2009). Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements. Bioinformatics (Oxford, England), 25 (5), 578-84 PMID: 19168912

My review: This paper raises the important observation that differences in the length of genes can bias their functional classification using the Gene Ontology, and provides a simple method to correct for this inherent feature of genome architecture. A basic observation of genome biology is that genes differ widely in their size and structure within and between species. Understanding the causes and consequences of this variation in gene structure is an open challenge in genome biology. Previously, Nelson and colleagues [1] have shown, in flies and worms, that the length of intergenic regions is correlated with the regulatory complexity of genes and that genes from different Gene Ontology (GO) categories have drastically different lengths. Here, Taher and Ovcharenko confirm this observation of functionally non-random gene length in the human genome, and discuss the implications of this feature of genome organization on analyses that employ the GO for functional inference. Specifically, these authors show that random selection of noncoding DNA sequences from the human genome leads to the false inference of over- and under-representation of specific GO categories that preferentially contain longer or shorter genes, respectively. This finding has important implications for the large number of studies that employ a combination of gene expression microarrays and GO enrichment analysis, since gene expression is largely controlled by noncoding DNA. The authors provide a simple method to correct for this bias in GO analyses, and show that previous reports of the enrichment of “ultraconserved” noncoding DNA sequences in vertebrate developmental genes [2] may be a statistical artifact. (@F1000:

References: 1. Nelson CE, Hersh BM, & Carroll SB (2004). The regulatory content of intergenic DNA shapes genome architecture. Genome biology, 5 (4) PMID: 15059258

2. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, & Haussler D (2004). Ultraconserved elements in the human genome. Science (New York, N.Y.), 304 (5675), 1321-5 PMID: 15131266

Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, & Manolio TA (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences of the United States of America, 106 (23), 9362-7 PMID: 19474294

My review: This article introduces results from human genome-wide association studies (GWAS) into the realm of large-scale functional genomic data mining. These authors compile the first curated database of trait-associated single-nucleotide polymorphisms (SNPs) from GWAS studies ( that can be mined for general features of SNPs underlying phenotypes in humans. By analyzing 531 SNPs from 151 GWAS studies, the authors discover that trait-associated SNPs are predominantly in non-coding regions (43% intergenic, 45% intronic), but that non-synonymous and promoter trait-associated SNPs are enriched relative to expectations. The database is actively maintained and growing, and currently contains 3943 trait-associated SNPs from 796 publications. This important resource will facilitate data mining and integration with high-throughput functional genomics data (e.g. ChIP-seq), as well as meta-analyses, to address important questions in human genetics, such as the discovery of loci that affect multiple traits. While the interface to the GWAS catalog is rather limited, a related project ( [1] provides a much more powerful interface for searching and browsing data from the GWAS catalog. (@F1000:

References: 1. Thorisson GA, Lancaster O, Free RC, Hastings RK, Sarmah P, Dash D, Brahmachari SK, & Brookes AJ (2009). HGVbaseG2P: a central genetic association database. Nucleic acids research, 37 (Database issue) PMID: 18948288

Tamames J, & de Lorenzo V (2010). EnvMine: a text-mining system for the automatic extraction of contextual information. BMC bioinformatics, 11 PMID: 20515448

My review: This paper describes EnvMine, an innovative text-mining tool to obtain physico-chemical and geographical information about environmental genomics samples. This work represents a pioneering effort to apply text-mining technologies in the domain of ecology, providing novel methods to extract the units and variables of physico-chemical entities, as well as link the location of samples to worldwide geographic coordinates via Google Maps. Application of EnvMine to full-text articles in the environmental genomics database envDB [1] revealed very high system performance, suggesting that information extracted by EnvMine will be of use to researchers seeking meta-data about environmental samples across different domains of biology. (@F1000:

References: 1. Tamames J, Abellán JJ, Pignatelli M, Camacho A, & Moya A (2010). Environmental distribution of prokaryotic taxa. BMC microbiology, 10 PMID: 20307274

On Refusing to Review for Chromosome Research

As I have been doing for the last few years, I recently declined a review invitation from the Springer-owned closed access journal Chromosome Research using The Ashburner Response. In the past, I have had mainly positive responses from editors concerning my decision not to review for them.

However, the most recent reply I received, from Dr. Herbert Macgregor, troubled me (below), since I think it reveals many of the misconceptions and challenges (see points 2 and 9 especially) that we are up against in terms of getting our peers and colleagues to understand the principles of the Open Access movement. Nevertheless, it is clear that my refusal to review in this case has made some small progress, with Springer’s Open Access policy being more clearly displayed on the Chromosome Research home page, and internal dialogue about the Open Access issue being raised by the journal staff.

I will continue to post similar refusal replies, so that the dialogue on this issue is made more transparent for those involved in the Open Access movement.

And remember to keep the closed access reviewer boycott alive after the SOPA/PIPA/RWA madness is behind us!


From: Herbert Macgregor <>
Date: 19 January 2012 18:34:18 GMT
To: <>
Cc: <>, Walther Traut <>, “Butler, Peter, Springer SBM NL” <>
Subject: Fw: CR

Dear Dr. Bergman,

Professor Walther Traut, one of our Associate Editors has passed your comments on reviewing and Open Access journals to me for comment.

I have the impression that you have joined the current movement in the USA to persuade major science publishers that science should be free for all rather than for sale: a principle that I and my colleagues on Chromosome Research strongly uphold, of course.  I would, however, like to make the following points about your comments to Professor Traut.

(1) How do we manage to get up to 700 downloads of individual papers in just one month if nobody is supposed to be reading them because they’re not OA? look at our website and frequently downloaded papers.

(2)  Where do authors get the money to pay $3000 to pay for OA?  Answer – the taxpayer or the charity!  Publication in CR is FREE.

(3)  CR does offer Open Choice at the same cost as it would charge for Open Access (too much, I agree, but we’re working on that) – so why do only about 5% of our authors opt for this?

(4) All our authors are recommended to place their papers in their Institutional Repositories and most do (many are required to do so).  They are then freely available world-wide through any of the major search engines – and nobody has to pay anything at all!

(5)  The publication programme for CR is currently incompatible with OA, mainly on account of our Special Issues – which happen to be freely downloadable, courtesy of our publisher.

(6)  The matter of OA for CR is currently under discussion between ourselves (editors) and Springer sbm, with the interests of our authors, readers and chromosome science very much at heart. We are very well aware of all the pros and cons and moral principles and I am confident that our future policy will be very much in keeping with all that is best in modern science publishing.

(7) We are constantly watching the OA debate and we are able to direct Chromosome Research within the framework of modern trends in science publishing.

(8) Although  some of the major science publishers may seem to be profiting from their publication of taxpayer funded science, they nonetheless perform an extremely  valuable and important role and are currently working hard with editors and scientists to adjust to the opportunities and demands of the internet age.

(9) I would suggest that to use the crusade for OA as an excuse for refusing to review is in denial of the responsibilities of a publicly or charitably funded academic and scientist.  OA or not, peer review is at the very epicentre of scientific progress and we should uphold it, come what may.

(10) We, the editors of CR, would welcome further debate with you, although we would naturally prefer that you devote your valuable time and expertise to helping us with the assessment of manuscripts submitted to our journal.

(11) You may not know that BMC is now owned by Springer, so you can be confident that the incentive to “modernise” is strong!

We are grateful to you for alerting us to the fact that we nowhere declare publicly our journal’s policy with regard to OA.  This we should do without delay.  The essence of it is that we offer authors Open Choice and we encourage them to place their  papers on their institution’s repository. The only restriction on the latter is that they may not use the publisher’s final pdf, which I am sure you will agree is fair enough.  All our Special Issue papers are freely downloadable.

Lastly and from an entirely personal standpoint, you should understand that for/against OA is not a simple black/white issue. I have been publishing in science journals for 53 years and editing them for 40 years and I would ask you to give persons like myself credit for understanding and acting in the best interests of our profession.

With best wishes

Herbert Macgregor
Chromosome Research

cc Conly Rieder

cc Prof. Walther Traut
Associate Editor

Why Doesn’t the Ecological Society of America Allow Their Open Access Content to be Text Mined?

A recent tweet from Todd Vision and a blog post by Jonathan Eisen have alerted me to the shameful defense of the status quo in scientific publishing advanced by the Ecological Society of America concerning the Office of Science and Technology Policy’s recent request for information on Open Access. This particular thread caught my eye because I still have fresh bruises from being denied access to Open Access ESA journal content for text-mining research. Denied access to Open content – how is this possible, you say?

Over the last two years I have been targeting scientific societies whose journals are not in the tiny fraction of the scientific literature in the PubMed Central Open Access subset, hoping to encourage them to release their content for text-mining research projects in my group (e.g. My attitude has been that societies are the ones to go after, since they often hold the copyrights and are typically run by colleagues to whom I can appeal directly. After productive (yet rather protracted) communication with The Genetics Society of America, the UK Genetics Society and the Society for Molecular Biology and Evolution, we’ve been able to obtain back-content for Genetics, Heredity and Molecular Biology and Evolution for our projects.* Heredity has gone so far as to announce on their home page that their content is now open for text-mining research (victory!).

In stark contrast, a similar line of inquiry with the Ecological Society of America has led to a very sour and unproductive experience, which I will summarize here to demonstrate that the ESA’s recent response letter to the OSTP is consistent with a general attitude of protecting their journal content. This narrative echoes Peter Murray-Rust’s painful story of his years negotiating with Elsevier for access to content, which likewise has no positive conclusion.

While Katherine McCarter is correct in saying that the ESA publishes a subset of their content under OA licenses, it is not true that this content is in any meaningful way “open” in a 21st-century, linked-data, remix-and-reuse context. Why? Because like virtually all of the ecological, agricultural and environmental literature, ESA OA content is not deposited in a public archive like PubMed Central, and can only be accessed via the ESA journal website. However, this content is not accessible to text-mining since the ESA journal permissions clearly state:

Altering, recompiling, systematic or programmatic copying, or reselling of text or other information from ESA Journals in any form or medium is prohibited. Systematic or programmatic downloading, service bureau redistribution services, printing for fee-for-service purposes and/or the systematic making of print or electronic copies for transmission to non-subscribing institutions are prohibited.

Since I have been burned in the past by aggressive closed-access publishers shutting down my office IP for naively downloading content that my university has a site license for, I dutifully went down the proper channel of requesting permission to automatically download ESA content from the Permissions Editor, Dr. Cliff Duke. For the record, I can say that Dr. Duke has been faultlessly professional throughout the process and was positive about my initial request, an excerpt of which follows:

From: Cliff Duke <>
Date: 28 June 2011 14:19:18 GMT+01:00
To: Casey Bergman <>
Subject: RE: request for permission to use ESA content in text-mining research


In answer to your question — not that I recall, but I also don’t recall any previous similar requests, and I’ve been the permissions editor for about seven years. However, I doubt your request will be the last such, given the increasing interest in this kind of research.


—–Original Message—–
From: Casey Bergman []
Sent: Tuesday, June 28, 2011 9:14 AM
To: Cliff Duke
Subject: Re: request for permission to use ESA content in text-mining research

Dear Dr. Duke -

Many thanks for the very quick reply.  I appreciate your efforts in bringing this to the attention of the ESA leadership and I fully understand that this may take some time to sort out (it took several months with GSA).

One quick question at this stage: is there any precedent at ESA for permitting bulk access for text mining research?

Best regards,

—–Original Message—–
On 28 Jun 2011, at 13:48, Cliff Duke wrote:

Dr. Bergman,

I will discuss your request with our executive director and editors and get back to you as soon as I can. Our director is on travel this week, and I am on vacation next week, so it may be a couple of weeks before you hear back from us. Let me know if you have any questions meanwhile.

Cliff Duke

Clifford S. Duke, Ph.D.
Permissions Editor

Ecological Society of America
1990 M Street NW, Suite 700
Washington, DC 20036
Phone: (202) 833-8773
Fax: (202) 833-8775
E-mail: csduke [at]

—–Original Message—–
From: Casey Bergman []
Sent: Tuesday, June 28, 2011 8:34 AM
To: Cliff Duke
Subject: request for permission to use ESA content in text-mining research

Dear Dr. Duke -

Greetings, I am a researcher at the University of Manchester, with an interest in application of text and data mining to biological problems at the interface of computational and evolutionary biology. I am writing to request permission to use ESA journal content in text-mining research project that I am developing to submit as proposal to the UK Natural Environment Research Council. Specifically, I would like to request permission to automate download of the entire/the open-access subset of all ESA titles, which I understand is not permitted under the standard ESA policy (


Dr Duke then put me in touch with the Managing Editor of the ESA journals, David Baldwin, who promptly ignored my request for several months, despite repeated emails and phone calls (e.g. on 6 September 2011, 27 September 2011 and 13 October 2011). I finally received one email from David Baldwin on 20 October 2011, in which he promised (but failed) to get back to me a few days later:

From: Casey Bergman <>
Date: 20 October 2011 12:42:01 GMT+01:00
To: J David Baldwin <>
Cc: Cliff Duke <>
Bcc: Casey Bergman <casey.bergman@xxx.xx>
Subject: Re: request for permission to use ESA content in text-mining research

Dear David -

Many thanks for replying. I fully understand that nonstandard requests can take some time. My previous interactions with Genetics and Heredity have also taken many months to lead to positive decisions on releasing content for text mining research.

All that is required in the short term is explicit permission to execute automated downloads on the site that abide by the limits of your systems (<50 sessions in 10 minutes).  No other technical issues need to be addressed on your side.

Also, I am happy in the first instance to restrict automated downloads to the Open Access subset of ESA publications, if a decision to permit access to the entirety of ESA content is more difficult.

Best regards,

On 20 Oct 2011, at 12:03, J David Baldwin wrote:

Dear Dr. Bergman–

I know you are keen to discuss this, but I’m afraid you have picked a particularly bad week for contacting me. Today (Thursday, 20 October) won’t be any better than the past two days, and tomorrow (Friday) I’ll be out of town. I’ll look over your request before Monday, 24 October, and will e-mail you again by then. I’m afraid I get handed off the nonstandard requests (“the buck stops here”), and yet I have my own priorities (i.e., working to keep the journal issues on schedule).


—–Original Message—–
From: Casey Bergman []
Sent: Thursday, October 13, 2011 9:01 AM
To: J David Baldwin
Cc: Cliff Duke
Subject: Re: request for permission to use ESA content in text-mining research

Dear David -

Would there be a convenient time for you sometime in the next week or so to discuss how we might be able to access ESA content programmatically?  I am +5 hrs to you, so late afternoon for me = morning for you is typically the best timeline to arrange a call.

Best regards,

Well, I can report that as of 6 January 2012, the buck certainly has stopped with David Baldwin on this issue, since he still refuses to respond to the most minimal request for permission to automatically download only the Open Access subset of the ESA content. Since no effort is required on his part other than to say “yes”, I take his inaction to speak for the ESA: they have no interest in supporting text and data mining research on their content, even their OA content, which is fully consistent with their specious arguments for protectionism of society subsidies through closed access publishing put forward in their response letter to the OSTP. Given the undeniable importance of data in the ecological literature for science and society, the ESA should be ashamed for locking away this precious resource from the world and being adamant in their position that this is in any way morally or ethically justified.

I look forward to David Baldwin’s response on this request, and hope that the ESA will be more progressive in their outlook toward open access publishing and text/data mining in the coming years…

* Credits to Tracey DePellegrin Connelly, Scott Hawley and Lauren McIntyre for helping to free Genetics content; Roger Butlin for helping free Heredity content; and Ken Wolfe, Soojin Yi and the SMBE council for helping to free MBE content.
