Posts by Category: Data Mining Resources

OpenMinted – Open Service Oriented e-Infrastructure for Scientific and Scholarly Text and Data Mining

November 02, 2017

http://openminted.eu/

OpenMinted sets out to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. Researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based, scientific-related sources in a seamless way. This will be added to Data Mining Resources Subject Tracer™. This will be added to the tools section of Research Resources Subject Tracer™.

Awareness Watch Newsletter V15N11 November 2017

October 28, 2017

http://AwarenessWatch.VirtualPrivateLibrary.net/V15N11.pdf
Awareness Watch™ Newsletter Blog and Archives
http://www.AwarenessWatch.com/

The November 2017 V15N11 Awareness Watch Newsletter is a freely available 56-page .pdf document (416KB) from the above URL. This month’s featured report covers my Data Mining Resources 2018 and is a comprehensive listing of data mining search engines, directories, subject guides and index resources and sites on the Internet. The list of sources below is taken from my Subject Tracer™ Information Blog titled Data Mining Resources and is constantly updated with Subject Tracer™ bots at the following URL: http://www.DataMiningResources.info/. These resources and sources will help you to discover the many pathways available through the Internet to find the latest data mining resources and sites. As this site is constantly updated, it would be to your benefit to bookmark and return to the above URL frequently. The Awareness Watch Spotters cover many excellent and newly released annotated current awareness research sources and tools as well as the latest identified Internet happenings and resources, including a number of really neat and must-have tools! The Awareness Watch Article Review covers Why Blogs Endure: A Study of Recent College Graduates and Motivations for Blog Readership by Alison J. Head, Michele Van Hoeck, and Kirsten Hostetler.

GROBID

October 09, 2017

https://github.com/kermitt2/grobid

GROBID (or Grobid) means GeneRation Of BIbliographic Data. GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents, with a particular focus on technical and scientific publications. First developments started in 2008 as a hobby. In 2011 the tool was made available as open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue until at least 2020.
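
To give a feel for what "structured TEI-encoded documents" can look like to a downstream script, here is a minimal Python sketch that reads a title and author names from a small TEI header. The fragment is hand-written for illustration and is not actual GROBID output; its element layout, the sample names, and the helper read_header are all assumptions. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hand-written TEI fragment for illustration only (NOT real GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Paper Title</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Jane</forename><surname>Doe</surname></persName></author>
            <author><persName><forename>John</forename><surname>Smith</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""

# TEI elements live in this XML namespace, so every lookup needs the prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def read_header(tei_xml):
    """Return (title, [author names]) from a TEI document string."""
    root = ET.fromstring(tei_xml)
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else ""
    authors = []
    for pers in root.findall(".//tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(f"{forename} {surname}".strip())
    return title, authors


title, authors = read_header(SAMPLE_TEI)
print(title)
print(authors)
```

Real output from the tool may nest these elements differently, so the XPath expressions above would need checking against an actual file.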

Data Mining Resources 2018 Whitepaper Dataset Link Compilation

October 05, 2017

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2018 Subject Tracer™ Whitepaper Dataset Link Compilation. It is now a 33-page (286KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on October 5, 2017] Additional white papers and resources by Marcus P. Zillman are also available.

ELKI: Environment for Developing KDD-Applications Supported by Index-Structures

July 29, 2017

https://elki-project.github.io/

ELKI is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. This will be added to Data Mining Resources Subject Tracer™.
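
As a rough illustration of the kind of unsupervised outlier detection described above, the following Python sketch scores each point by the distance to its k-th nearest neighbor. This is a generic textbook technique written with the standard library only; it is not ELKI's API, and the function name knn_outlier_scores and the sample points are made up for the example. The brute-force inner scan is exactly the step that an index structure such as an R*-tree is meant to speed up.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores suggest more isolated points. This is a brute-force
    O(n^2) scan; a spatial index would replace the inner loop.
    """
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one far-away point (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
print(points[most_outlying])
```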

Mallet – MAchine Learning for LanguagE Toolkit

July 29, 2017

http://mallet.cs.umass.edu/

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers. This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.
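
The two ideas mentioned above, converting text to "features" and classifying with Naïve Bayes, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not MALLET's own code or API; the function names, the whitespace tokenizer, and the four training sentences are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def featurize(text):
    """Convert text to bag-of-words features (lowercased token counts)."""
    return Counter(text.lower().split())


def train_naive_bayes(labeled_docs):
    """Collect per-label word counts, per-label document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        feats = featurize(text)
        word_counts[label].update(feats)
        doc_counts[label] += 1
        vocab.update(feats)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, count in featurize(text).items():
            if word in vocab:  # ignore words never seen in training
                score += count * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for this example.
training = [
    ("clustering finds groups in data", "mining"),
    ("outlier detection finds unusual data points", "mining"),
    ("the weka is a flightless bird", "birds"),
    ("the bird lives in new zealand", "birds"),
]
model = train_naive_bayes(training)
print(classify("a flightless bird", model))
print(classify("detection of data groups", model))
```

A real toolkit adds proper tokenization, feature selection, and evaluation metrics on held-out data, which this sketch leaves out.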

Deep Learning for Java – Open Source, Distributed, Deep Learning Library for the JVM

July 28, 2017

https://deeplearning4j.org/

Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors. DL4J can import neural net models from most major frameworks via Keras, including TensorFlow, Caffe, Torch and Theano, bridging the gap between the Python ecosystem and the JVM with a cross-team toolkit for data scientists, data engineers and DevOps. This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.

Weka 3: Data Mining Software in Java

July 28, 2017

http://www.cs.waikato.ac.nz/ml/weka/index.html

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature. Weka is open source software issued under the GNU General Public License. The Weka team has put together several free online courses that teach machine learning and data mining using Weka. Check out the website for the courses for details on when and how to enroll. The videos for the courses are available on YouTube. Yes, it is possible to apply Weka to big data! This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.

Updated: Data Mining Resources 2017 Whitepaper Dataset Link Compilation

July 26, 2017

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2017 Subject Tracer™ Whitepaper Dataset Link Compilation. It is now a 33-page (286KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on July 8, 2017] Additional white papers and resources by Marcus P. Zillman are also available.

Text Mining

June 27, 2017

http://www.istl.org/17-spring/internet.html

Text Mining from Science and Technology Resources on the Internet by Kristen Cooper. Taken from the overview: “As defined by Bernard Reilly (2012), president of the Center for Research Libraries, text mining is ‘the automated processing of large amounts of digital data or textual content for the purpose of information retrieval, extraction, interpretation, and analysis.’ The first step is to find or build a corpus, or the collection of text that a researcher wishes to work with. Most often researchers will need to download this corpus to either their computers or an alternative storage platform. Once this has been done, different tools can be used to find patterns, biases, and other trends that are present in the text (Reilly 2012). Within higher education, text mining is most often found among the digital humanities and linguistics studies. However it is growing in popularity in the science and technology fields. It is possible to find many examples of how text mining is beginning to be utilized in the sciences. It allows users to search across a large set of documents to find connections that would be prohibitively expensive in terms of time to attempt to read individually. An example of this can be seen in the biomedical sciences where Frijters et al. (2010) used text mining to search in MedLine for drugs that could interfere with cell proliferation. Another example can be found with the works of the EXFOR library, which contains experimental nuclear reaction data. Hirdt and Brown (2016) used text mining to build a graph of the relationships between the reactions in the library. They were then able to use this information to identify reactions that are important to researchers but have been understudied. Text mining also has an ability to discover themes and relationships within a corpus through a technique called topic modeling.
In a 2016 research study, the authors use topic modeling to determine the proportion of the analyzed text discussing a specific phenomenon, in this case forest fragmentation, and to determine the concepts that are most strongly associated with this phenomenon (Nunez-Mir et al. 2016). In an example from environmental science, Grubert and Siders (2016) use topic modeling to find empirical support for the theory that climate change has become an important topic in environmental lifecycle assessment over time, and revealed a secondary finding of this increase coming at the expense of attention to human health. Finally, the sheer amount of information available to researchers, educators, and scholars makes it increasingly difficult to stay current on a particular topic or field. Anne Okerson (2013) points out that text mining can be a useful and time saving factor in doing a systematic review. Text mining therefore presents librarians with the opportunity to develop skills in a new area that has the potential to be of great use to patrons.” This will be added to Data Mining Resources Subject Tracer™. This will be added to the tools section of Research Resources Subject Tracer™. This will be added to Entrepreneurial Resources Subject Tracer™.
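
The workflow described in the overview (build a corpus, then look for patterns and associated concepts) can be sketched in Python. Note that this is not topic modeling: it uses plain document-level co-occurrence counting as a much simpler stand-in, with only the standard library. The four-sentence corpus, the stopword list, and the function names are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real projects use a much fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurring_terms(corpus, target):
    """For each term, count the documents in which it appears alongside `target`."""
    counts = Counter()
    for doc in corpus:
        terms = set(tokenize(doc))
        if target in terms:
            counts.update(terms - {target})
    return counts


# Toy corpus, invented for this example.
corpus = [
    "Forest fragmentation reduces habitat connectivity.",
    "Habitat loss and forest fragmentation threaten biodiversity.",
    "Climate change is a growing topic in lifecycle assessment.",
    "Fragmentation of habitat affects biodiversity in the forest.",
]
related = cooccurring_terms(corpus, "fragmentation")
print(related.most_common(3))
```

On a real corpus of thousands of documents, the same counting idea surfaces which concepts travel together, which is the kind of connection that would be too slow to find by reading each document individually.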