Text Mining

Posted by Marcus Zillman

Text Mining from Science and Technology Resources on the Internet by Kristen Cooper. Taken from the overview: “As defined by Bernard Reilly (2012), president of the Center for Research Libraries, text mining is ‘the automated processing of large amounts of digital data or textual content for the purpose of information retrieval, extraction, interpretation, and analysis.’ The first step is to find or build a corpus, or the collection of text that a researcher wishes to work with. Most often researchers will need to download this corpus to either their computers or an alternative storage platform. Once this has been done, different tools can be used to find patterns, biases, and other trends that are present in the text (Reilly 2012). Within higher education, text mining is most often found among the digital humanities and linguistics studies. However, it is growing in popularity in the science and technology fields. It is possible to find many examples of how text mining is beginning to be utilized in the sciences. It allows users to search across a large set of documents to find connections that would be prohibitively expensive in terms of time to attempt to read individually. An example of this can be seen in the biomedical sciences where Frijters et al. (2010) used text mining to search in MedLine for drugs that could interfere with cell proliferation. Another example can be found with the works of the EXFOR library, which contains experimental nuclear reaction data. Hirdt and Brown (2016) used text mining to build a graph of the relationships between the reactions in the library. They were then able to use this information to identify reactions that are important to researchers but have been understudied. Text mining also has an ability to discover themes and relationships within a corpus through a technique called topic modeling.
In a 2016 research study, the authors use topic modeling to determine the proportion of the analyzed text discussing a specific phenomenon, in this case forest fragmentation, and to determine the concepts that are most strongly associated with this phenomenon (Nunez-Mir et al. 2016). In an example from environmental science, Grubert and Siders (2016) use topic modeling to find empirical support for the theory that climate change has become an important topic in environmental lifecycle assessment over time, and revealed a secondary finding of this increase coming at the expense of attention to human health. Finally, the sheer amount of information available to researchers, educators, and scholars makes it increasingly difficult to stay current on a particular topic or field. Anne Okerson (2013) points out that text mining can be a useful and time-saving factor in doing a systematic review. Text mining therefore presents librarians with the opportunity to develop skills in a new area that has the potential to be of great use to patrons.”

This will be added to Data Mining Resources Subject Tracer™, the tools section of Research Resources Subject Tracer™, and Entrepreneurial Resources Subject Tracer™.
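
For readers who want a feel for the workflow the overview describes (assemble a corpus, then look for patterns across documents), here is a minimal Python sketch. It is not topic modeling and not the method of any study cited above; it only counts which terms appear in how many documents and which pairs of terms share a document. The three-sentence corpus, the stopword list, and the function names are all made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical three-document corpus; a real corpus would be downloaded
# or assembled first, as the overview describes.
CORPUS = {
    "doc1": "Forest fragmentation reduces habitat and increases edge effects.",
    "doc2": "Climate change and forest fragmentation both alter habitat quality.",
    "doc3": "Lifecycle assessment increasingly addresses climate change.",
}

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"and", "both", "the", "of", "a", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def document_frequency(corpus):
    """Count how many documents each term appears in."""
    df = Counter()
    for text in corpus.values():
        df.update(set(tokenize(text)))
    return df


def cooccurrence(corpus):
    """Count how many documents each unordered pair of terms shares."""
    pairs = Counter()
    for text in corpus.values():
        # Sort so each pair has one canonical (alphabetical) ordering.
        terms = sorted(set(tokenize(text)))
        pairs.update(combinations(terms, 2))
    return pairs


df = document_frequency(CORPUS)
pairs = cooccurrence(CORPUS)

print("Documents mentioning 'forest':", df["forest"])
print("Documents sharing 'forest' and 'fragmentation':",
      pairs[("forest", "fragmentation")])
```

Pairs that recur across many documents are the kind of "connection" that would be slow to spot by reading each text individually. On a real corpus, a dedicated library and a proper stopword list would replace these hand-rolled pieces.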
