<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central?</title>
	<atom:link href="http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/feed/" rel="self" type="application/rss+xml" />
	<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/</link>
	<description></description>
	<lastBuildDate>Sat, 15 Jun 2013 21:49:40 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Launch of the PLOS Text Mining Collection &#124; I wish you&#039;d made me angry earlier</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-2946</link>
		<dc:creator><![CDATA[Launch of the PLOS Text Mining Collection &#124; I wish you&#039;d made me angry earlier]]></dc:creator>
		<pubDate>Wed, 17 Apr 2013 21:10:57 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-2946</guid>
		<description><![CDATA[[...] 1. Dickman S (2003) Tough mining: the challenges of searching the scientific literature. PLoS biology 1: e48. doi:10.1371/journal.pbio.0000048. 2. Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from Text—Is Text Mining Ready to Deliver? PLoS Biol 3: e65. doi:10.1371/journal.pbio.0030065. 3. Cohen B, Hunter L (2008) Getting started in text mining. PLoS computational biology 4: e20. doi:10.1371/journal.pcbi.0040020. 4. Bourne PE, Fink JL, Gerstein M (2008) Open access: taking full advantage of the content. PLoS computational biology 4: e1000037+. doi:10.1371/journal.pcbi.1000037. 5. Rzhetsky A, Seringhaus M, Gerstein M (2009) Getting Started in Text Mining: Part Two. PLoS Comput Biol 5: e1000411. doi:10.1371/journal.pcbi.1000411. 6. Rodriguez-Esteban R (2009) Biomedical Text Mining and Its Applications. PLoS Comput Biol 5: e1000597. doi:10.1371/journal.pcbi.1000597. 7. Kim D, Yu H (2011) Figure text extraction in biomedical literature. PloS one 6: e15338. doi:10.1371/journal.pone.0015338. 8. Boyack K, Newman D, Duhon R, Klavans R, Patek M, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6: e18029. doi:10.1371/journal.pone.0018029. 9. Kolluru B, Hawizy L, Murray-Rust P, Tsujii J, Ananiadou S (2011) Using workflows to explore and optimise named entity recognition for chemistry. PloS one 6: e20181. doi:10.1371/journal.pone.0020181. 10. Hayasaka S, Hugenschmidt C, Laurienti P (2011) A network of genes, genetic disorders, and brain areas. PloS one 6: e20907. doi:10.1371/journal.pone.0020907. 11. Roque F, Jensen P, Schmock H, Dalgaard M, Andreatta M, et al. (2011) Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS computational biology 7: e1002141. doi:10.1371/journal.pcbi.1002141. 12. 
Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7: e1002199. doi:10.1371/journal.pcbi.1002199. 13. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman C (2011) pubmed2ensembl: a resource for mining the biological literature on genes. PloS one 6: e24716. doi:10.1371/journal.pone.0024716. 14. Fisher R, Knowlton N, Brainard R, Caley J (2011) Differences among major taxa in the extent of ecological knowledge across four major ecosystems. PloS one 6: e26556. doi:10.1371/journal.pone.0026556. 15. Hossain S, Gresock J, Edmonds Y, Helm R, Potts M, et al. (2012) Connecting the dots between PubMed abstracts. PloS one 7: e29509. doi:10.1371/journal.pone.0029509. 16. Ebrahimpour M, Putniņš TJ, Berryman MJ, Allison A, Ng BW-H, et al. (2013) Automated authorship attribution using advanced signal classification techniques. PLoS ONE 8: e54998. doi:10.1371/journal.pone.0054998. 17. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8: e59030. doi:10.1371/journal.pone.0059030. 18. Groza T, Hunter J, Zankl A (2013) Mining Skeletal Phenotype Descriptions from Scientific Literature. PLoS ONE 8: e55656. doi:10.1371/journal.pone.0055656. 19. Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8: e55674. doi:10.1371/journal.pone.0055674. 20. Van Landeghem S, Bjorne J, Wei C-H, Hakala K, Pyysal S, et al. (2013) Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization. PLOS ONE 8: e55814. doi:10.1371/journal.pone.0055814 21. Liu H, Hunter L, Keselj V, Verspoor K (2013) Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLoS ONE 8(4): e60954. doi:10.1371/journal.pone.0060954 22. 
Davis A, Weigers T, Johnson R, Lay J, Lennon-Hopkins K, et al. (2013) Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLOS ONE 8: e58201. doi:10.1371/journal.pone.0058201 23. Bergman CM (2012) Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central? http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-acce.... [...]]]></description>
		<content:encoded><![CDATA[<p>[...] 1. Dickman S (2003) Tough mining: the challenges of searching the scientific literature. PLoS biology 1: e48. doi:10.1371/journal.pbio.0000048. 2. Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from Text—Is Text Mining Ready to Deliver? PLoS Biol 3: e65. doi:10.1371/journal.pbio.0030065. 3. Cohen B, Hunter L (2008) Getting started in text mining. PLoS computational biology 4: e20. doi:10.1371/journal.pcbi.0040020. 4. Bourne PE, Fink JL, Gerstein M (2008) Open access: taking full advantage of the content. PLoS computational biology 4: e1000037+. doi:10.1371/journal.pcbi.1000037. 5. Rzhetsky A, Seringhaus M, Gerstein M (2009) Getting Started in Text Mining: Part Two. PLoS Comput Biol 5: e1000411. doi:10.1371/journal.pcbi.1000411. 6. Rodriguez-Esteban R (2009) Biomedical Text Mining and Its Applications. PLoS Comput Biol 5: e1000597. doi:10.1371/journal.pcbi.1000597. 7. Kim D, Yu H (2011) Figure text extraction in biomedical literature. PloS one 6: e15338. doi:10.1371/journal.pone.0015338. 8. Boyack K, Newman D, Duhon R, Klavans R, Patek M, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6: e18029. doi:10.1371/journal.pone.0018029. 9. Kolluru B, Hawizy L, Murray-Rust P, Tsujii J, Ananiadou S (2011) Using workflows to explore and optimise named entity recognition for chemistry. PloS one 6: e20181. doi:10.1371/journal.pone.0020181. 10. Hayasaka S, Hugenschmidt C, Laurienti P (2011) A network of genes, genetic disorders, and brain areas. PloS one 6: e20907. doi:10.1371/journal.pone.0020907. 11. Roque F, Jensen P, Schmock H, Dalgaard M, Andreatta M, et al. (2011) Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS computational biology 7: e1002141. doi:10.1371/journal.pcbi.1002141. 12. 
Salathé M, Khandelwal S (2011) Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control. PLoS Comput Biol 7: e1002199. doi:10.1371/journal.pcbi.1002199. 13. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman C (2011) pubmed2ensembl: a resource for mining the biological literature on genes. PloS one 6: e24716. doi:10.1371/journal.pone.0024716. 14. Fisher R, Knowlton N, Brainard R, Caley J (2011) Differences among major taxa in the extent of ecological knowledge across four major ecosystems. PloS one 6: e26556. doi:10.1371/journal.pone.0026556. 15. Hossain S, Gresock J, Edmonds Y, Helm R, Potts M, et al. (2012) Connecting the dots between PubMed abstracts. PloS one 7: e29509. doi:10.1371/journal.pone.0029509. 16. Ebrahimpour M, Putniņš TJ, Berryman MJ, Allison A, Ng BW-H, et al. (2013) Automated authorship attribution using advanced signal classification techniques. PLoS ONE 8: e54998. doi:10.1371/journal.pone.0054998. 17. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8: e59030. doi:10.1371/journal.pone.0059030. 18. Groza T, Hunter J, Zankl A (2013) Mining Skeletal Phenotype Descriptions from Scientific Literature. PLoS ONE 8: e55656. doi:10.1371/journal.pone.0055656. 19. Seltmann KC, Pénzes Z, Yoder MJ, Bertone MA, Deans AR (2013) Utilizing Descriptive Statements from the Biodiversity Heritage Library to Expand the Hymenoptera Anatomy Ontology. PLoS ONE 8: e55674. doi:10.1371/journal.pone.0055674. 20. Van Landeghem S, Bjorne J, Wei C-H, Hakala K, Pyysal S, et al. (2013) Large-Scale Event Extraction from Literature with Multi-Level Gene Normalization. PLOS ONE 8: e55814. doi:10.1371/journal.pone.0055814 21. Liu H, Hunter L, Keselj V, Verspoor K (2013) Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLoS ONE 8(4): e60954. doi:10.1371/journal.pone.0060954 22. 
Davis A, Weigers T, Johnson R, Lay J, Lennon-Hopkins K, et al. (2013) Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database. PLOS ONE 8: e58201. doi:10.1371/journal.pone.0058201 23. Bergman CM (2012) Why Are There So Few Efforts to Text Mine the Open Access Subset of PubMed Central? <a href="http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-acce" rel="nofollow">http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-acce</a>&#8230;. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ronald Kostoff</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-315</link>
		<dc:creator><![CDATA[Ronald Kostoff]]></dc:creator>
		<pubDate>Wed, 13 Jun 2012 16:12:59 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-315</guid>
		<description><![CDATA[Casey,

I&#039;ve read the latest postings above, and they all seem to point in the same direction.  Very few full text studies are being done, despite the potential benefits.  That is my observation as well.  Usually, in paradoxical situations like this, one has to examine incentives for deeper insights.

For an example from a completely different area of study, most people believe that interdisciplinary research has myriad benefits, but publications reflecting real interdisciplinary research are relatively sparse.  I examined this paradox in a Bioscience paper in 2002, and showed that, despite the flowery words about the benefits of interdisciplinary research, the reality was mainly disincentives to pursue this type of research.  There is far more &#039;bang for the buck&#039; in publishing traditional focused research in a discipline, where every slight change in a parameter could result in another publication.  While interdisciplinary research could result in great benefits to science not available through more narrowly focused research, the time and effort required to understand the different disciplines and their inter-relationships does not, in most cases, pay off in terms of the metrics used to evaluate research productivity.

As another example, climate change seems to be bearing down upon us, yet essentially nothing is being done to counter it.  I&#039;ve examined the motivations of all the major stakeholders related to climate change, and all are comfortable with the status quo, albeit for different reasons.

From the few studies I&#039;ve done in full text mining, far more and richer information is possible than mining titles or Abstracts.  However, mining full text is intrinsically more difficult than mining Abstracts, and as in the examples above, I&#039;m not sure it provides more &#039;bang for the buck&#039; of interest to most researchers.  In other words, the incentives for going to full text may not be there.

However, I don&#039;t buy the arguments of limited coverage as a valid reason for lack of studies.  Many research studies could be categorized as proof-of-principle demonstration, and for that only limited coverage databases are required.  Some of the original studies with Textpresso demonstrate that.

The best way to move full text mining forward is to use the limited full text databases available, compare full text results with Abstract-only results, and show the benefits (and additional costs as well).  If these studies could show that a benefit-cost advantage exists for full text, full text mining would be well on its way to gaining acceptance.  Our limited full text studies in 2009, only part of which were published, convinced me the benefits far outweighed the costs, but that was only one data point.  Far more is required to convince a wider public.

As a postscript, there is another advantage of full text that I haven&#039;t seen mentioned elsewhere.  Most publication information retrieval is based on text.  Queries tend to be words and word combinations.  One can then use these retrievals to explore citation networks and find additional relevant articles outside the text query terms used.  But, full text especially contains much more than words/phrases.  There are symbols and graphics of all types, like equations and curves.  In theory, at least, these symbols could be used as search terms.  One might have an interesting curve, and want to identify such curves in other literatures and how they were interpreted.  Since curves typically are only presented in full text, this unexplored area has the potential of great payoff for full text mining.]]></description>
		<content:encoded><![CDATA[<p>Casey,</p>
<p>I&#8217;ve read the latest postings above, and they all seem to point in the same direction.  Very few full text studies are being done, despite the potential benefits.  That is my observation as well.  Usually, in paradoxical situations like this, one has to examine incentives for deeper insights.</p>
<p>For an example from a completely different area of study, most people believe that interdisciplinary research has myriad benefits, but publications reflecting real interdisciplinary research are relatively sparse.  I examined this paradox in a Bioscience paper in 2002, and showed that, despite the flowery words about the benefits of interdisciplinary research, the reality was mainly disincentives to pursue this type of research.  There is far more &#8216;bang for the buck&#8217; in publishing traditional focused research in a discipline, where every slight change in a parameter could result in another publication.  While interdisciplinary research could result in great benefits to science not available through more narrowly focused research, the time and effort required to understand the different disciplines and their inter-relationships does not, in most cases, pay off in terms of the metrics used to evaluate research productivity.</p>
<p>As another example, climate change seems to be bearing down upon us, yet essentially nothing is being done to counter it.  I&#8217;ve examined the motivations of all the major stakeholders related to climate change, and all are comfortable with the status quo, albeit for different reasons.</p>
<p>From the few studies I&#8217;ve done in full text mining, far more and richer information is possible than mining titles or Abstracts.  However, mining full text is intrinsically more difficult than mining Abstracts, and as in the examples above, I&#8217;m not sure it provides more &#8216;bang for the buck&#8217; of interest to most researchers.  In other words, the incentives for going to full text may not be there.</p>
<p>However, I don&#8217;t buy the arguments of limited coverage as a valid reason for lack of studies.  Many research studies could be categorized as proof-of-principle demonstration, and for that only limited coverage databases are required.  Some of the original studies with Textpresso demonstrate that.</p>
<p>The best way to move full text mining forward is to use the limited full text databases available, compare full text results with Abstract-only results, and show the benefits (and additional costs as well).  If these studies could show that a benefit-cost advantage exists for full text, full text mining would be well on its way to gaining acceptance.  Our limited full text studies in 2009, only part of which were published, convinced me the benefits far outweighed the costs, but that was only one data point.  Far more is required to convince a wider public.</p>
<p>As a postscript, there is another advantage of full text that I haven&#8217;t seen mentioned elsewhere.  Most publication information retrieval is based on text.  Queries tend to be words and word combinations.  One can then use these retrievals to explore citation networks and find additional relevant articles outside the text query terms used.  But, full text especially contains much more than words/phrases.  There are symbols and graphics of all types, like equations and curves.  In theory, at least, these symbols could be used as search terms.  One might have an interesting curve, and want to identify such curves in other literatures and how they were interpreted.  Since curves typically are only presented in full text, this unexplored area has the potential of great payoff for full text mining.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: caseybergman</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-306</link>
		<dc:creator><![CDATA[caseybergman]]></dc:creator>
		<pubDate>Sat, 26 May 2012 07:35:18 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-306</guid>
		<description><![CDATA[David Springate provides his thoughts on this issue here: http://www.datajujitsu.co.uk/2012/03/stop-whining-and-start-mining.html]]></description>
		<content:encoded><![CDATA[<p>David Springate provides his thoughts on this issue here: <a href="http://www.datajujitsu.co.uk/2012/03/stop-whining-and-start-mining.html" rel="nofollow">http://www.datajujitsu.co.uk/2012/03/stop-whining-and-start-mining.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: caseybergman</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-293</link>
		<dc:creator><![CDATA[caseybergman]]></dc:creator>
		<pubDate>Thu, 19 Apr 2012 20:44:11 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-293</guid>
		<description><![CDATA[Joanna Ptolomey comments on this post here: http://web.vivavip.com/go/livewire/68588]]></description>
		<content:encoded><![CDATA[<p>Joanna Ptolomey comments on this post here: <a href="http://web.vivavip.com/go/livewire/68588" rel="nofollow">http://web.vivavip.com/go/livewire/68588</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hong Yu</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-280</link>
		<dc:creator><![CDATA[Hong Yu]]></dc:creator>
		<pubDate>Tue, 03 Apr 2012 15:16:47 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-280</guid>
		<description><![CDATA[Dear Casey,  you are correct. There is an additional tool: 

Shashank Agarwal, Hong Yu &quot;Figure summarizer browser extensions for PubMed Central&quot; Bioinformatics. 2011; 27(12):1723-1724.]]></description>
		<content:encoded><![CDATA[<p>Dear Casey,  you are correct. There is an additional tool: </p>
<p>Shashank Agarwal, Hong Yu &#8220;Figure summarizer browser extensions for PubMed Central&#8221; Bioinformatics. 2011; 27(12):1723-1724.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: caseybergman</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-278</link>
		<dc:creator><![CDATA[caseybergman]]></dc:creator>
		<pubDate>Tue, 03 Apr 2012 06:31:48 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-278</guid>
		<description><![CDATA[Hi Hong -

I already had the FigureSearch paper &quot;Figure text extraction in biomedical literature&quot; in the list, but if there are others, please let me know.

Best,
Casey]]></description>
		<content:encoded><![CDATA[<p>Hi Hong -</p>
<p>I already had the FigureSearch paper &#8220;Figure text extraction in biomedical literature&#8221; in the list, but if there are others, please let me know.</p>
<p>Best,<br />
Casey</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hong Yu</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-277</link>
		<dc:creator><![CDATA[Hong Yu]]></dc:creator>
		<pubDate>Mon, 02 Apr 2012 21:24:55 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-277</guid>
		<description><![CDATA[The Biomedical FigureSearch engine (http://figuresearch.askhermes.org) is also a product of the entire PubMed Central data, as well as the Elsevier corpus. 

-Hong Yu]]></description>
		<content:encoded><![CDATA[<p>The Biomedical FigureSearch engine (<a href="http://figuresearch.askhermes.org" rel="nofollow">http://figuresearch.askhermes.org</a>) is also a product of the entire PubMed Central data, as well as the Elsevier corpus. </p>
<p>-Hong Yu</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rothamsted, Council, ELC and the bioeconomy &#124; Professor Douglas Kell&#039;s blog</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-249</link>
		<dc:creator><![CDATA[Rothamsted, Council, ELC and the bioeconomy &#124; Professor Douglas Kell&#039;s blog]]></dc:creator>
		<pubDate>Mon, 12 Mar 2012 08:11:51 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-249</guid>
		<description><![CDATA[[...] Readers of this blog may well have come via the BBSRC website. The site was recently refreshed to ensure we are providing the best possible ‘user experience’ while clearly presenting examples of the impact of the research we fund. We are now seeking your comments on how you use the site. Continuing last week’s Open Access (OA) theme, I noted a useful and interesting paper rehearsing some of the issues. Other benefits of OA include the ability to assess the interest in and utility of articles by a variety of novel metrics, though despite the accessibility of biomedical abstracts it remains early days. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Readers of this blog may well have come via the BBSRC website. The site was recently refreshed to ensure we are providing the best possible ‘user experience’ while clearly presenting examples of the impact of the research we fund. We are now seeking your comments on how you use the site. Continuing last week’s Open Access (OA) theme, I noted a useful and interesting paper rehearsing some of the issues. Other benefits of OA include the ability to assess the interest in and utility of articles by a variety of novel metrics, though despite the accessibility of biomedical abstracts it remains early days. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ronald Kostoff</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-245</link>
		<dc:creator><![CDATA[Ronald Kostoff]]></dc:creator>
		<pubDate>Thu, 08 Mar 2012 14:29:25 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-245</guid>
		<description><![CDATA[Casey,

I don&#039;t see a way to respond to the Nature Editorial online, but I&#039;ll combine your blog response and the Editorial response here.

The Editorial includes the statement &quot;The promise is yet to be backed up with concrete examples of scientific success&quot;.  That&#039;s only true in part.  Our research group has been generating potential discovery over the past five years.  I addressed the issue in detail in an ARIST chapter [1], and more recently in a comprehensive update of our discovery approach [2].

However, the critics quoted in the Editorial may have a case.  Useful text mining results have both a quantity and quality component.  Typically, the computer and its built-in algorithmic rules do a reasonable job on the quantity component, but not on quality.  For the latter, human judgment is essential.  Much of the focus has been concentrated solely on the algorithms.
For full text, I have seen very little published in the literature-related discovery area.  I did some full text mining when I worked at MITRE.  I addressed two aspects: information retrieval and information extraction.  I published the information retrieval results in Journal of Information Science in 2010 [3].  If queries are generated properly, full text information retrieval can provide orders of magnitude increase in relevant articles retrieved, depending on the category of interest.  In fact, we have found that the types of queries required to retrieve relevant documents from searching the full text are the same types of queries required to pinpoint concepts/papers with high discovery potential.  For intel work, many of the categories of interest (e.g., suppliers, hardware, software specifics, etc.) can only be found in the full text.

For the information extraction, we examined a few approaches.  The difficulty was extracting some of the important rare events.  Operationally, important rare events (low frequency phenomena) had to be related to high frequency phenomena.  Standard text mining procedures, such as in scientometrics, are based on statistical analysis of high frequency phenomena and are not really applicable to the low frequency extraction problem.  I found a way to relate the important low frequency events to the high frequency and extract these rare events, but cannot comment on the downstream etiology of the results.  The bottom line is that much useful information can be obtained from full text mining, if the right algorithms are used, and some human judgment is applied as well.

RNK

REFERENCES
[1].  Kostoff, R.N., Block, J.A., Solka, J.A., Briggs, M.B., Rushenberg, R.L., Stump, J.A., Johnson, D., Wyatt, J.R.   “Literature-Related Discovery”.  ARIST.  43.  243-285.  2008.
[2].  Kostoff RN.  Literature-related discovery and innovation — update.  Technological Forecasting and Social Change (2012).  doi:10.1016/j.techfore.2012.02.002.  Also, see appended below.
[3].  Kostoff RN.  “Expanded information retrieval using full text searching”.  Journal of Information Science.  36:1.  104-113.  2010.

APPENDIX - ANNOUNCEMENT LETTER FOR LRDI UPDATE PUBLICATION
When reference [2] went online, I distributed an announcement letter to a few colleagues.  I reproduce the letter below; it summarizes the contents and provides access.

A recent publication updates the Literature-Related Discovery and Innovation (LRDI) technique, which identifies prevention and remediation measures for chronic and infectious diseases [1]. The information technology-based LRDI technique may be of interest to researchers in text mining, bioinformatics, and literature-based discovery, and the potential medical applications may be of special interest to researchers/clinicians focused on preventing, reducing, halting, or reversing progression of chronic and infectious diseases.  To illustrate the potential power of LRDI, the article emphasizes the relationship between the results of our 2007 LRDI multiple sclerosis (MS) study and a recent demonstration of MS reversal.

The findings in the update [1] include:
* the role of comprehensive and precise information retrieval in discovery and innovation
* the value of interdisciplinary research in discovery and innovation
* the critical role of hormesis and synergy in preventative measures and accelerated healing
* the critical need for cause removal in reversal of chronic disease
* the severe under-reporting of critical variables in the clinical trials literature
* the severe under-utilization of the broad biomedical literature for reversing chronic disease
* concerns about the credibility and integrity of the medical literature in areas that concern commercial and government/political sensitivities

Dr. Ronald N. Kostoff

 

References
[1].  Kostoff RN.  Literature-Related Discovery and Innovation - Update.  Technological Forecasting and Social Change (2012).  doi:10.1016/j.techfore.2012.02.002.
*Pre-print full text version can be accessed at (http://stip.gatech.edu/wp-content/uploads/2012/02/LRD-UPDATE_TFSC_7_REV.pdf).  
*Journal posting access (http://dx.doi.org/10.1016/j.techfore.2012.02.002).]]></description>
		<content:encoded><![CDATA[<p>Casey,</p>
<p>I don&#8217;t see a way to respond to the Nature Editorial online, but I&#8217;ll combine your blog response and the Editorial response here.</p>
<p>The Editorial includes the statement &#8220;The promise is yet to be backed up with concrete examples of scientific success&#8221;.  That&#8217;s only true in part.  Our research group has been generating potential discovery over the past five years.  I addressed the issue in detail in an ARIST chapter [1], and more recently in a comprehensive update of our discovery approach [2].</p>
<p>However, the critics quoted in the Editorial may have a case.  Useful text mining results have both a quantity and quality component.  Typically, the computer and its built-in algorithmic rules do a reasonable job on the quantity component, but not on quality.  For the latter, human judgment is essential.  Much of the focus has been concentrated solely on the algorithms.<br />
For full text, I have seen very little published in the literature-related discovery area.  I did some full text mining when I worked at MITRE.  I addressed two aspects: information retrieval and information extraction.  I published the information retrieval results in Journal of Information Science in 2010 [3].  If queries are generated properly, full text information retrieval can provide orders of magnitude increase in relevant articles retrieved, depending on the category of interest.  In fact, we have found that the types of queries required to retrieve relevant documents from searching the full text are the same types of queries required to pinpoint concepts/papers with high discovery potential.  For intel work, many of the categories of interest (e.g., suppliers, hardware, software specifics, etc.) can only be found in the full text.</p>
<p>For the information extraction, we examined a few approaches.  The difficulty was extracting some of the important rare events.  Operationally, important rare events (low frequency phenomena) had to be related to high frequency phenomena.  Standard text mining procedures, such as in scientometrics, are based on statistical analysis of high frequency phenomena and are not really applicable to the low frequency extraction problem.  I found a way to relate the important low frequency events to the high frequency and extract these rare events, but cannot comment on the downstream etiology of the results.  The bottom line is that much useful information can be obtained from full text mining, if the right algorithms are used, and some human judgment is applied as well.</p>
<p>RNK</p>
<p>REFERENCES<br />
[1].  Kostoff RN, Block JA, Solka JA, Briggs MB, Rushenberg RL, Stump JA, Johnson D, Wyatt JR.  Literature-Related Discovery.  ARIST 43: 243-285 (2008).<br />
[2].  Kostoff RN.  Literature-Related Discovery and Innovation &#8211; Update.  Technological Forecasting and Social Change (2012).  doi:10.1016/j.techfore.2012.02.002.  See also the appendix below.<br />
[3].  Kostoff RN.  Expanded information retrieval using full text searching.  Journal of Information Science 36(1): 104-113 (2010).</p>
<p>APPENDIX &#8211; ANNOUNCEMENT LETTER FOR LRDI UPDATE PUBLICATION<br />
When reference [2] went online, I distributed an announcement letter to a few colleagues.  I reproduce the letter below; it summarizes the contents and provides access.</p>
<p>A recent publication updates the Literature-Related Discovery and Innovation (LRDI) technique, which identifies prevention and remediation measures for chronic and infectious diseases [1]. The information technology-based LRDI technique may be of interest to researchers in text mining, bioinformatics, and literature-based discovery, and the potential medical applications may be of special interest to researchers/clinicians focused on preventing, reducing, halting, or reversing progression of chronic and infectious diseases.  To illustrate the potential power of LRDI, the article emphasizes the relationship between the results of our 2007 LRDI multiple sclerosis (MS) study and a recent demonstration of MS reversal.</p>
<p>The findings in the update [1] include:<br />
* the role of comprehensive and precise information retrieval in discovery and innovation<br />
* the value of interdisciplinary research in discovery and innovation<br />
* the critical role of hormesis and synergy in preventative measures and accelerated healing<br />
* the critical need for cause removal in reversal of chronic disease<br />
* the severe under-reporting of critical variables in the clinical trials literature<br />
* the severe under-utilization of the broad biomedical literature for reversing chronic disease<br />
* concerns about the credibility and integrity of the medical literature in areas that concern commercial and government/political sensitivities</p>
<p>Dr. Ronald N. Kostoff</p>
<p>References<br />
[1].  Kostoff RN.  Literature-Related Discovery and Innovation &#8211; Update.  Technological Forecasting and Social Change (2012).  doi:10.1016/j.techfore.2012.02.002.<br />
*Pre-print full text version can be accessed at (<a href="http://stip.gatech.edu/wp-content/uploads/2012/02/LRD-UPDATE_TFSC_7_REV.pdf" rel="nofollow">http://stip.gatech.edu/wp-content/uploads/2012/02/LRD-UPDATE_TFSC_7_REV.pdf</a>).<br />
*Journal posting access (<a href="http://dx.doi.org/10.1016/j.techfore.2012.02.002" rel="nofollow">http://dx.doi.org/10.1016/j.techfore.2012.02.002</a>).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Heather Piwowar</title>
		<link>http://caseybergman.wordpress.com/2012/03/02/why-are-there-so-few-efforts-to-text-mine-the-open-access-subset-of-pubmed-central/#comment-244</link>
		<dc:creator><![CDATA[Heather Piwowar]]></dc:creator>
		<pubDate>Wed, 07 Mar 2012 22:12:45 +0000</pubDate>
		<guid isPermaLink="false">http://caseybergman.wordpress.com/?p=110#comment-244</guid>
		<description><![CDATA[Yes, I think it would be valuable for sure.  Different estimate, and seeing all the different uses would be interesting, inspiring, and show off more gaps.  

Need all these uses before concluding how useful PMC OA has been for text mining thus far.

&gt; corpus-wide text-mining efforts are rare, and what this implies for the state of the art

ok, I&#039;m with you on this.  Good point.]]></description>
		<content:encoded><![CDATA[<p>Yes, I think it would be valuable for sure.  Different estimate, and seeing all the different uses would be interesting, inspiring, and show off more gaps.  </p>
<p>Need all these uses before concluding how useful PMC OA has been for text mining thus far.</p>
<p>&gt; corpus-wide text-mining efforts are rare, and what this implies for the state of the art</p>
<p>ok, I&#8217;m with you on this.  Good point.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
