LatentView
LatentView is a tool for automatically analyzing unstructured data, such as text documents. It can be used to extract […]
Ligature
Ligature is a term used to refer to the combination of two or more adjacent characters into a single glyph. This […]
Consistency
Consistency is a measure of how well annotations of the same data agree with one another. For instance, if two distinct annotators identify the […]
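One simple way to quantify this kind of agreement is the fraction of items on which two annotators assign the same label. The sketch below is only an illustration: the function name `percent_agreement` and the sample labels are hypothetical, and raw agreement does not correct for agreement expected by chance (measures such as Cohen's kappa do).

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    # Count the positions where both annotators chose the same label.
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels from two annotators for the same four documents.
annotator_1 = ["pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
agreement = percent_agreement(annotator_1, annotator_2)
```

A value near 1.0 suggests highly consistent annotations; lower values suggest that the labeling guidelines may need to be tightened.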
Perplexity
Perplexity is a measure of how well a probability model predicts a sample. It is often used in the text analytics industry […]
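As a rough illustration, perplexity can be computed as the exponential of the average negative log-probability that a model assigns to each token in a sample. The function name `perplexity` and the input probabilities below are assumptions made for this sketch; in practice the probabilities would come from a language model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sample, given the model probability of each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower perplexity means the model found the sample less surprising.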
Stemming
Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form. […]
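The idea can be sketched with a deliberately naive suffix stripper. This is not the Porter algorithm or any library's stemmer; the suffix list and the `naive_stem` helper are hypothetical and only show the general mechanism of trimming inflectional endings.

```python
# Illustrative suffixes only; real stemmers use much richer rule sets.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["Jumping", "jumped", "cats", "is"]]
```

A rule set this small will mis-stem many words, which is why production systems rely on established stemmers or lemmatizers.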
Ingestion
Ingestion is the process of acquiring unstructured text data from a variety of sources in order to perform further […]
Trigram
Trigram is a term used to refer to a group of three successive words. In this context, it is often used as […]
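A minimal sketch of extracting word trigrams from a sentence, assuming simple whitespace tokenization (the `word_trigrams` helper is hypothetical, and real pipelines usually tokenize more carefully):

```python
def word_trigrams(text):
    """Return every run of three successive whitespace-separated words."""
    tokens = text.split()
    # Slide a window of width three across the token list.
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


trigrams = word_trigrams("the quick brown fox jumps")
```

A text with fewer than three words yields an empty list.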
Natural Language Processing Tools
Natural Language Processing Tools is a term used to describe software that can automatically process and analyze large amounts of natural language […]
Univariate Analysis
Univariate analysis is used to understand each piece of data within a dataset on its own. This means that each variable is […]
Culling
Culling is the process of removing unhelpful or uninformative content from a document or set of documents before further processing. This may […]
Geometric Distribution
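The geometric distribution gives the probability that the first success occurs on a given trial in a sequence of independent trials. It can be used to calculate the probability that a given word first appears after a certain number of trials, such as words read or documents examined. For example, if we […]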
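For reference, the geometric distribution assigns probability (1 - p)^(k - 1) * p to the first success occurring on trial k, where p is the success probability of each independent trial. The sketch below treats "the word occurs" as the success; the helper name `geometric_pmf` and the value p = 0.5 are assumptions chosen for illustration.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1 or not 0 < p <= 1:
        raise ValueError("require k >= 1 and 0 < p <= 1")
    # k - 1 failures followed by a single success.
    return (1 - p) ** (k - 1) * p


# Hypothetical case: the word occurs with probability 0.5 per trial,
# and we ask how likely it is to be seen first on the third trial.
first_seen_on_third = geometric_pmf(3, 0.5)
```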
Grapheme
Grapheme is a term used to refer to the smallest functional unit of a writing system. This may be a letter, a […]