Posts by Category: Data Mining Resources

ELKI: Environment for Developing KDD-Applications Supported by Index-Structures

July 29, 2017

https://elki-project.github.io/

ELKI is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. To achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims to provide a large collection of highly parameterizable algorithms, to allow easy and fair evaluation and benchmarking of algorithms. This will be added to Data Mining Resources Subject Tracer™.

Mallet – MAchine Learning for LanguagE Toolkit

July 29, 2017

http://mallet.cs.umass.edu/

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers. This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.
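
To make the classification workflow described above more concrete, here is a minimal sketch in plain Python of turning text into word-count "features" and training a Naïve Bayes classifier on them. It is not MALLET's API; the function names (tokenize, train_naive_bayes, classify) and the tiny TRAIN dataset are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Turn raw text into lowercase word features (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_docs):
    """Count documents per label and word occurrences per label.

    labeled_docs is a list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this sketch.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("limited offer buy cheap", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
MODEL = train_naive_bayes(TRAIN)
```

With this toy data, classify("buy cheap pills", MODEL) picks "spam" and classify("monday project meeting", MODEL) picks "ham". A toolkit such as MALLET adds the pieces this sketch leaves out, such as richer feature pipelines and evaluation metrics.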

Deep Learning for Java – Open Source, Distributed, Deep Learning Library for the JVM

July 28, 2017

https://deeplearning4j.org/

Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors. DL4J can import neural net models from most major frameworks via Keras, including TensorFlow, Caffe, Torch and Theano, bridging the gap between the Python ecosystem and the JVM with a cross-team toolkit for data scientists, data engineers and DevOps. This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.

Weka 3: Data Mining Software in Java

July 28, 2017

http://www.cs.waikato.ac.nz/ml/weka/index.html

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Found only on the islands of New Zealand, the weka is a flightless bird with an inquisitive nature. Weka is open source software issued under the GNU General Public License. The Weka team has put together several free online courses that teach machine learning and data mining using Weka. Check the website for details on when and how to enroll. The videos for the courses are available on YouTube. Yes, it is possible to apply Weka to big data! This will be added to Data Mining Resources Subject Tracer™. This will be added to Artificial Intelligence Resources Subject Tracer™.

Updated> Data Mining Resources 2017 Whitepaper Dataset Link Compilation

July 26, 2017

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2017 Subject Tracer™ Whitepaper Dataset Link Compilation; it is now a 33-page (286KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on July 8, 2017] Additional white papers and resources by Marcus P. Zillman are also available.

Text Mining

June 27, 2017

http://www.istl.org/17-spring/internet.html

Text Mining from Science and Technology Resources on the Internet by Kristen Cooper. Taken from the overview: “As defined by Bernard Reilly (2012), president of the Center for Research Libraries, text mining is ‘the automated processing of large amounts of digital data or textual content for the purpose of information retrieval, extraction, interpretation, and analysis.’ The first step is to find or build a corpus, or the collection of text that a researcher wishes to work with. Most often researchers will need to download this corpus to either their computers or an alternative storage platform. Once this has been done, different tools can be used to find patterns, biases, and other trends that are present in the text (Reilly 2012). Within higher education, text mining is most often found among the digital humanities and linguistics studies. However it is growing in popularity in the science and technology fields. It is possible to find many examples of how text mining is beginning to be utilized in the sciences. It allows users to search across a large set of documents to find connections that would be prohibitively expensive in terms of time to attempt to read individually. An example of this can be seen in the biomedical sciences where Frijters et al. (2010) used text mining to search in MedLine for drugs that could interfere with cell proliferation. Another example can be found with the works of the EXFOR library, which contains experimental nuclear reaction data. Hirdt and Brown (2016) used text mining to build a graph of the relationships between the reactions in the library. They were then able to use this information to identify reactions that are important to researchers but have been understudied. Text mining also has an ability to discover themes and relationships within a corpus through a technique called topic modeling.
In a 2016 research study, the authors use topic modeling to determine the proportion of the analyzed text discussing a specific phenomenon, in this case forest fragmentation, and to determine the concepts that are most strongly associated with this phenomenon (Nunez-Mir et al. 2016). In an example from environmental science, Grubert and Siders (2016) use topic modeling to find empirical support for the theory that climate change has become an important topic in environmental lifecycle assessment over time, and revealed a secondary finding of this increase coming at the expense of attention to human health. Finally, the sheer amount of information available to researchers, educators, and scholars makes it increasingly difficult to stay current on a particular topic or field. Anne Okerson (2013) points out that text mining can be a useful and time saving factor in doing a systematic review. Text mining therefore presents librarians with the opportunity to develop skills in a new area that has the potential to be of great use to patrons.” This will be added to Data Mining Resources Subject Tracer™. This will be added to the tools section of Research Resources Subject Tracer™. This will be added to Entrepreneurial Resources Subject Tracer™.
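
As a rough illustration of the workflow quoted above (build a corpus, then look for patterns in it), the following sketch in plain Python measures what share of a corpus mentions a term and which words appear alongside it. This is simple keyword counting, a much cruder stand-in for topic modeling, and it does not reproduce the method of any study cited in the overview. The CORPUS sentences, the STOPWORDS list, and the function names are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "are", "on", "for"}


def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def share_mentioning(corpus, term):
    """Fraction of documents in the corpus that mention the term."""
    hits = sum(1 for doc in corpus if term in tokens(doc))
    return hits / len(corpus)


def cooccurring_terms(corpus, term, top_n=3):
    """Words that most often share a document with the term.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for doc in corpus:
        doc_terms = set(tokens(doc))
        if term in doc_terms:
            counts.update(doc_terms - {term})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical four-document corpus, invented for this sketch.
CORPUS = [
    "Forest fragmentation reduces habitat for birds",
    "Fragmentation of habitat affects forest birds",
    "Climate change alters rainfall patterns",
    "Rainfall patterns shift in a warming climate",
]
```

Here share_mentioning(CORPUS, "fragmentation") gives 0.5, and cooccurring_terms(CORPUS, "fragmentation") ranks "birds", "forest" and "habitat" first. Real topic modeling infers themes statistically rather than relying on a keyword chosen in advance.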

Awareness Watch Talk Show for Wednesday May 3, 2017 at 2:00pm EDST – Data Mining Resources 2017

May 03, 2017

http://www.BlogTalkRadio.com/AwarenessWatch/

This program will feature my just-updated Data Mining Resources 2017. We will be highlighting the latest and greatest resources and sources for data mining, covering search engines, subject directories, articles, guides and tracers: literally everything on the Internet for DATA MINING!! We will also be discussing my latest freely available Awareness Watch Newsletter V15N5 May 2017 featuring Searching the Internet 2017 – The Primer as well as my freely available May 2017 Zillman Column highlighting Privacy Resources 2017. You may call in to ask your questions at (718)508-9839. The show is live and thirty minutes in length starting at 2:00pm EDST on Wednesday, May 3, 2017 and then archived for easy review and access. Listen, Call and Enjoy!!

Updated> Data Mining Resources 2017 Whitepaper Dataset Link Compilation

April 29, 2017

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2017 Subject Tracer™ Whitepaper Dataset Link Compilation; it is now a 33-page (287KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on April 29, 2017] Additional white papers and resources by Marcus P. Zillman are also available.

Updated> Data Mining Resources 2017 Whitepaper Dataset Link Compilation

February 13, 2017

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2017 Subject Tracer™ Whitepaper Dataset Link Compilation; it is now a 33-page (287KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on February 13, 2017] Additional white papers and resources by Marcus P. Zillman are also available.

Updated> Data Mining Resources 2017 Whitepaper Dataset Link Compilation

December 22, 2016

http://www.DataMiningResources.info/

I have just updated my Data Mining Resources 2017 Subject Tracer™ Whitepaper Dataset Link Compilation; it is now a 33-page (287KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Completely updated with all links validated and new URLs added on December 22, 2016] Additional white papers and resources by Marcus P. Zillman are also available.