Posts by Category: Data Mining Resources

DeepDive – Analyze Data On a Deeper Level Than Ever Before

February 17, 2015

DeepDive – Analyze Data On a Deeper Level Than Ever Before
http://deepdive.stanford.edu/

DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage domain-specific knowledge and incorporates user feedback to improve the quality of its analysis. DeepDive differs from traditional systems in several ways:
a) DeepDive is aware that data is often noisy and imprecise: names are misspelled, natural language is ambiguous, and humans make mistakes. Taking such imprecision into account, DeepDive computes calibrated probabilities for every assertion it makes. For example, if DeepDive produces a fact with probability 0.9, the fact is 90% likely to be true.
b) DeepDive is able to use large amounts of data from a variety of sources. Applications built using DeepDive have extracted data from millions of documents, web pages, PDFs, tables, and figures.
c) DeepDive allows developers to use their knowledge of a given domain to improve the quality of the results by writing simple rules that inform the inference (learning) process. DeepDive can also take into account user feedback on the correctness of its predictions in order to improve them.
d) DeepDive is able to use the data to learn “distantly”. In contrast, most machine learning systems require tedious training for each prediction. In fact, many DeepDive applications, especially at early stages, need no traditional training data at all.
e) DeepDive’s secret is a scalable, high-performance inference and learning engine. For the past few years, we have been working to make the underlying algorithms run as fast as possible. The techniques pioneered in this project are part of commercial and open source tools including MADlib, Impala, a product from Oracle, and low-level techniques such as Hogwild!. They have also been included in Microsoft’s Adam.
Examples of DeepDive applications include: 1) PaleoDeepDive – A knowledge base for Paleobiologists; 2) GeoDeepDive – Extracting dark data from geology journal articles; and 3) Wisci – Enriching Wikipedia with structured data. DeepDive is a project led by Christopher Ré at Stanford University. This will be added to the tools section of Research Resources Subject Tracer™ Information Blog. This will be added to Deep Web Research and Discovery Resources 2015. This will be added to Knowledge Discovery Resources Subject Tracer™. This will be added to Data Mining Resources Subject Tracer™.
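
To make the "calibrated probabilities" point concrete, here is a minimal Python sketch of how one might check calibration: group predictions by reported probability and compare each group's average probability with the fraction of facts that turned out to be true. This is not DeepDive code; the function name calibration_table and the sample data are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def calibration_table(predictions, num_buckets=10):
    """Summarize calibration for (probability, is_true) pairs.

    Returns {bucket_index: (mean_probability, fraction_true, count)}.
    Hypothetical helper for illustration; not part of DeepDive.
    """
    buckets = defaultdict(list)
    for prob, is_true in predictions:
        # Probabilities in [0, 1] map to buckets 0..num_buckets-1;
        # a probability of exactly 1.0 falls into the last bucket.
        idx = min(int(prob * num_buckets), num_buckets - 1)
        buckets[idx].append((prob, bool(is_true)))

    table = {}
    for idx, items in sorted(buckets.items()):
        mean_prob = sum(p for p, _ in items) / len(items)
        frac_true = sum(t for _, t in items) / len(items)
        table[idx] = (mean_prob, frac_true, len(items))
    return table


# Made-up data: ten facts reported with probability 0.9, nine of them true.
sample = [(0.9, True)] * 9 + [(0.9, False)]
table = calibration_table(sample)
for idx, (mean_prob, frac_true, count) in table.items():
    print(f"bucket {idx}: mean p={mean_prob:.2f}, true={frac_true:.2f}, n={count}")
```

For a well-calibrated system, the mean probability and the fraction true should be close in every bucket, as they are for the made-up sample here.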

801 views

Updated> Data Mining Resources Dataset Link Compilation

January 24, 2015

Updated> Data Mining Resources Dataset Link Compilation
http://www.DataMiningResources.info/

I have updated my Data Mining Resources Subject Tracer™ Dataset; it is now a 26-page (251KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Updated January 24, 2015] Additional white papers and resources by Marcus P. Zillman are available by clicking here.

498 views

Updated> Data Mining Resources Dataset Link Compilation

November 11, 2014

Updated> Data Mining Resources Dataset Link Compilation
http://www.DataMiningResources.info/

I have updated my Data Mining Resources Subject Tracer™ Dataset; it is now a 26-page (251KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Updated November 11, 2014] Additional white papers and resources by Marcus P. Zillman are available by clicking here.

762 views

Updated> Data Mining Resources

July 31, 2014

Updated> Data Mining Resources
http://www.DataMiningResources.info/

I have updated my Data Mining Resources Subject Tracer™; it is now a 26-page (252KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Updated July 31, 2014] Additional white papers and resources by Marcus P. Zillman are available by clicking here.

792 views

Updated> Data Mining Resources

May 07, 2014

Updated> Data Mining Resources
http://www.DataMiningResources.info/

I have updated my Data Mining Resources Subject Tracer™; it is now a 26-page (251KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Updated May 7, 2014] Additional white papers and resources by Marcus P. Zillman are available by clicking here.

809 views

Updated> Data Mining Resources

February 19, 2014

Updated> Data Mining Resources
http://www.DataMiningResources.info/

I have updated my Data Mining Resources Subject Tracer™; it is now a 26-page (250KB) .pdf white paper available from the above URL. It lists alphabetically the latest resources and sources for data mining available from the Internet. [Updated February 18, 2014] Additional white papers and resources by Marcus P. Zillman are available by clicking here.

914 views

Awareness Watch Talk Show for Saturday February 1, 2014 at 2:00pm EST

February 01, 2014

Awareness Watch Talk Show for Saturday February 1, 2014 at 2:00pm EST
http://www.BlogTalkRadio.com/AwarenessWatch/

This program will feature my Subject Tracer Data Mining Resources, available directly from the Subject Tracer Gateway at the Virtual Private Library. Data mining has become a top topic in the last several months, with many folks wanting to find out more about the subject. This show will highlight many of the latest resources and sites available from the Internet. We will also be reviewing the latest happenings from my blog during the last week, and discussing my latest freely available Awareness Watch Newsletter V12N2 February 2013 featuring Privacy Resources 2014 and my freely available February 2014 Zillman Column titled Education and Academic Resources 2014. You may call in to ask your questions at (718)508-9839. The show is live and thirty minutes in length, starting at 2:00pm EST on Saturday, February 1, 2014, and then archived for easy review and access. Listen, Call and Enjoy!!

1043 views

Rattle – Data Mining Toolkit in R

January 14, 2014

Rattle – Data Mining Toolkit in R
https://code.google.com/p/rattle/

Rattle (the R Analytical Tool To Learn Easily) provides a simple and logical interface for data mining. It is a new data mining application based on the open source and free statistical language R using the Gnome graphical interface. The application runs under GNU/Linux and MS/Windows. The aim is to provide an intuitive interface that takes you through the basic steps of data mining, as well as illustrating the R code that is used to achieve this. Whilst the tool itself may be sufficient for all of a user’s needs, it also provides a stepping stone to more sophisticated processing and modelling in R itself, for sophisticated and unconstrained data mining. This will be added to Data Mining Resources Subject Tracer™.

940 views

Orange – Open Source Data Visualization and Analysis for Novice and Experts

December 23, 2013

Orange – Open Source Data Visualization and Analysis for Novice and Experts
http://orange.biolab.si/

Open source data visualization and analysis for novice and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics. This will be added to Statistics Resources and Big Data Subject Tracer™. This will be added to Data Mining Resources Subject Tracer™.

910 views

SCaVis – Scientific Computation and Visualization Environment

December 23, 2013

SCaVis – Scientific Computation and Visualization Environment
http://jwork.org/scavis/

SCaVis is an environment for scientific computation, data analysis and data visualization designed for scientists, engineers and students. The program incorporates many open-source software packages into a coherent interface using the concept of dynamic scripting. SCaVis can be used wherever analysis of large numerical data volumes, data mining, statistical analysis and mathematics are essential (natural sciences, engineering, modeling and analysis of financial markets). SCaVis is fully multiplatform and runs on any platform where Java is installed. As a Java application, SCaVis takes full advantage of multicore processors. This will be added to Statistics Resources and Big Data Subject Tracer™. This will be added to Data Mining Resources Subject Tracer™.

994 views