DeepDive – Analyze Data On a Deeper Level Than Ever Before
DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage domain-specific knowledge and incorporates user feedback to improve the quality of its analysis. DeepDive differs from traditional systems in several ways:
a) DeepDive is aware that data is often noisy and imprecise: names are misspelled, natural language is ambiguous, and humans make mistakes. Taking such imprecision into account, DeepDive computes calibrated probabilities for every assertion it makes. For example, if DeepDive produces a fact with probability 0.9, the fact is 90% likely to be true;
b) DeepDive is able to use large amounts of data from a variety of sources. Applications built using DeepDive have extracted data from millions of documents, web pages, PDFs, tables, and figures;
c) DeepDive allows developers to use their knowledge of a given domain to improve the quality of the results by writing simple rules that inform the inference (learning) process. DeepDive can also take into account user feedback on the correctness of its predictions in order to improve them;
d) DeepDive is able to use the data to learn “distantly”. In contrast, most machine learning systems require tedious training for each prediction. In fact, many DeepDive applications, especially at early stages, need no traditional training data at all; and
e) DeepDive’s secret is a scalable, high-performance inference and learning engine. For the past few years, its developers have been working to make the underlying algorithms run as fast as possible. Techniques pioneered in this project, including low-level ones such as Hogwild!, are now part of commercial and open source tools including MADlib, Impala, and a product from Oracle. They have also been included in Microsoft’s Adam.
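To make the idea of calibrated probabilities in item a) concrete, here is a minimal sketch in Python of how one might check calibration on a set of scored facts. It is not DeepDive code and does not use any DeepDive API: the function name calibration_table, the number of bins, and the sample data are all assumptions made up for illustration. The idea is simply that among facts scored near 0.9, roughly 90% should turn out to be true.

```python
from collections import defaultdict


def calibration_table(predictions, num_bins=10):
    """Compare predicted probabilities with observed accuracy.

    `predictions` is an iterable of (probability, is_true) pairs.
    Returns a list of (bin_index, count, mean_probability, observed_accuracy)
    tuples, one per non-empty equal-width probability bin.
    """
    bins = defaultdict(list)
    for prob, is_true in predictions:
        # Clamp the index so that a probability of exactly 1.0 lands in the last bin.
        index = min(int(prob * num_bins), num_bins - 1)
        bins[index].append((prob, is_true))

    table = []
    for index in sorted(bins):
        items = bins[index]
        mean_prob = sum(p for p, _ in items) / len(items)
        accuracy = sum(1 for _, t in items if t) / len(items)
        table.append((index, len(items), mean_prob, accuracy))
    return table


if __name__ == "__main__":
    # Hypothetical data: ten facts scored 0.9 (nine of them true)
    # and ten facts scored 0.2 (two of them true).
    sample = ([(0.9, True)] * 9 + [(0.9, False)]
              + [(0.2, True)] * 2 + [(0.2, False)] * 8)
    for index, count, mean_prob, accuracy in calibration_table(sample):
        print(f"bin {index}: n={count} "
              f"mean_prob={mean_prob:.2f} observed={accuracy:.2f}")
```

For a well-calibrated system, the mean predicted probability and the observed accuracy in each bin should be close to each other; a large gap in any bin suggests the probabilities are over- or under-confident in that range.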
Examples of DeepDive applications include: 1) PaleoDeepDive, a knowledge base for paleobiologists; 2) GeoDeepDive, which extracts dark data from geology journal articles; and 3) Wisci, which enriches Wikipedia with structured data. DeepDive is a project led by Christopher Ré at Stanford University. This will be added to the tools section of Research Resources Subject Tracer™ Information Blog. This will be added to Deep Web Research and Discovery Resources 2015. This will be added to Knowledge Discovery Resources Subject Tracer™. This will be added to Data Mining Resources Subject Tracer™.