Recent Blogs

We write regularly about the terminology and jargon you’ll hear in the text analytics industry. Follow our blog as we make the complex simple.

LatentView

LatentView is a tool that is used to automatically analyze unstructured data, such as text documents. It can be used to extract…

Ligature

Ligature is a term used to refer to a single glyph formed by combining two or more adjacent characters, such as "f" and "i" joined into "ﬁ". This…
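In text analytics, ligatures mostly matter because text extracted from PDFs can contain single-code-point ligatures such as U+FB01 (ﬁ), which stop a word like "final" from matching. A minimal sketch using Python's standard unicodedata module; the function name expand_ligatures is our own, and note that NFKC also folds other compatibility characters, not just ligatures:

```python
import unicodedata

def expand_ligatures(text: str) -> str:
    """Replace compatibility ligatures (e.g. U+FB01 'ﬁ') with their component letters.

    NFKC normalization also folds other compatibility characters
    (full-width forms, superscripts, etc.), so apply it deliberately.
    """
    return unicodedata.normalize("NFKC", text)

raw = "The \ufb01nal o\ufb03ce report"  # contains the 'ﬁ' and 'ﬃ' ligatures
clean = expand_ligatures(raw)
print(clean)                # The final office report
print(len(raw), len(clean)) # 20 23
```

After normalization, "final" and "office" can be matched against a vocabulary as ordinary words.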

Consistency

Consistency is a metric of how closely annotations agree with one another. For instance, if two distinct annotators identify the…
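One common way to quantify consistency between two annotators is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. A minimal sketch, assuming each annotator assigns exactly one categorical label per item; the labels below are made-up illustration data:

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both annotators labelled independently,
    # each following their own observed label frequencies.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(percent_agreement(annotator_1, annotator_2), 3))  # 0.833
print(round(cohens_kappa(annotator_1, annotator_2), 3))       # 0.667
```

Raw agreement looks high here, but half of it would be expected by chance alone, which is why kappa comes out lower.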

Perplexity

Perplexity is a measure of how well a probability model predicts a sample. It is often used in the text analytics industry…
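For a sample of N tokens, perplexity is the exponential of the average negative log-probability the model assigns to each token: PP = exp(-(1/N) Σ log p(w_i)). Lower values mean the model is less "surprised" by the sample. A minimal sketch using a toy add-one-smoothed unigram model; the helper names and the tiny training text are illustrative assumptions, not a production language model:

```python
import math
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def perplexity(model, sample):
    """exp of the average negative log-probability per token.

    Assumes every sample token is in the model's vocabulary.
    """
    log_prob = sum(math.log(model[w]) for w in sample)
    return math.exp(-log_prob / len(sample))

train = "the cat sat on the mat".split()
vocab = set(train) | {"dog"}
model = train_unigram(train, vocab)
print(round(perplexity(model, "the cat sat".split()), 2))  # 5.24
print(round(perplexity(model, "dog dog dog".split()), 2))  # 12.0
```

The unseen word "dog" only receives smoothed probability mass, so a sample made of it scores a much higher perplexity.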

Stemming

Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form.…
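To make the idea concrete, here is a deliberately naive suffix-stripping stemmer. It is a sketch only: the suffix list and minimum stem length are our own assumptions, and real stemmers such as the Porter algorithm apply ordered rule sets with conditions on the remaining stem:

```python
# Hypothetical, tiny suffix list, ordered longest-first so "ingly" wins over "ly".
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")

def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["jumping", "jumped", "jumps", "quickly", "is"]:
    print(w, "->", naive_stem(w))
# jumping -> jump, jumped -> jump, jumps -> jump, quickly -> quick, is -> is
```

The minimum stem length keeps very short words like "is" intact; note that stems produced this way need not be dictionary words.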

Ingestion

Ingestion is the process of acquiring unstructured text data from a variety of sources in order to be able to perform further…
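A common ingestion pattern is to normalize every source into the same record shape before any analysis. The sketch below assumes two hypothetical sources, a folder of UTF-8 .txt files and a list of already-loaded strings; the function names and the "source"/"text" fields are our own:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator

def ingest_text_files(folder: str, pattern: str = "*.txt") -> Iterator[dict]:
    """Yield one record per matching file: where it came from and its raw text."""
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        yield {"source": str(path), "text": text}

def ingest_strings(items: Iterable[str], source: str) -> Iterator[dict]:
    """Wrap already-loaded strings (e.g. from an API or database) in the same record shape."""
    for i, text in enumerate(items):
        yield {"source": f"{source}#{i}", "text": text}

# Usage: a temporary folder stands in for a real document store.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "review1.txt").write_text("Great service!", encoding="utf-8")
    corpus = list(ingest_text_files(tmp)) + list(ingest_strings(["Slow delivery."], source="survey"))

print([r["text"] for r in corpus])  # ['Great service!', 'Slow delivery.']
```

Keeping the source alongside the text makes it possible to trace any later result back to where the document came from.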

Trigram

Trigram is a term used to refer to a group of three successive words. In this context, it is often used as…
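Word trigrams can be generated by sliding a window of three over a token list. A minimal sketch, assuming simple whitespace tokenization (real pipelines usually tokenize more carefully):

```python
def word_trigrams(text: str) -> list[tuple[str, str, str]]:
    """Return every run of three consecutive words, in order."""
    words = text.split()
    # zip stops at the shortest input, so texts under three words yield [].
    return list(zip(words, words[1:], words[2:]))

print(word_trigrams("the quick brown fox jumps"))
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps')]
```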

Natural Language Processing Tools

Natural Language Processing Tools is a term used to describe software that can automatically process and analyze large amounts of natural language…

Univariate Analysis

Univariate analysis examines each variable in a dataset on its own, without considering its relationship to other variables. This means that each variable is…
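A typical univariate summary describes one variable's center, spread, and range. A minimal sketch using Python's standard statistics module on a single hypothetical variable, document length in words (the numbers are made up):

```python
import statistics

def describe(values):
    """Summarize a single numeric variable in isolation."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

doc_lengths = [120, 95, 300, 110, 87]  # hypothetical word counts per document
summary = describe(doc_lengths)
print(summary["mean"], summary["median"])  # 142.4 110
```

The gap between the mean and the median already hints at one unusually long document, the kind of observation univariate analysis is meant to surface.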

Culling

Culling is the process of removing unhelpful or uninformative content from a document or set of documents before further processing. This may…
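A simple culling pass drops lines that are empty, very short, or match known boilerplate. A minimal sketch; the patterns and the three-word threshold are hypothetical choices that would need tuning for a real corpus:

```python
import re

# Hypothetical patterns for lines that carry no analytical value in this corpus.
BOILERPLATE_PATTERNS = [
    re.compile(r"^\s*page \d+( of \d+)?\s*$", re.IGNORECASE),
    re.compile(r"^\s*(unsubscribe|click here|all rights reserved)\b", re.IGNORECASE),
]

def cull(document: str, min_words: int = 3) -> str:
    """Drop empty, very short, or boilerplate lines; keep the rest in order."""
    kept = []
    for line in document.splitlines():
        if len(line.split()) < min_words:
            continue
        if any(p.search(line) for p in BOILERPLATE_PATTERNS):
            continue
        kept.append(line.strip())
    return "\n".join(kept)

doc = """Page 2 of 5
The delivery arrived two days late.

Click here to unsubscribe from our newsletter.
Support resolved the issue quickly."""
print(cull(doc))
# The delivery arrived two days late.
# Support resolved the issue quickly.
```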

Geometric Distribution

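The geometric distribution gives the probability that the first success occurs on the k-th of a series of independent trials. In text analytics, it can be used to estimate the probability that a given word first appears at a particular position in a document. For example, if we…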
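The geometric distribution has probability mass function P(X = k) = (1 - p)^(k-1) p for k = 1, 2, …, and the probability of at least one success within the first n trials is 1 - (1 - p)^n. A minimal sketch, under the simplifying assumption that each token is an independent trial in which the word appears with a fixed probability p (the value 0.01 is hypothetical):

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first occurrence is on trial k) = (1 - p)**(k - 1) * p, for k = 1, 2, ..."""
    if k < 1 or not 0 < p <= 1:
        raise ValueError("need k >= 1 and 0 < p <= 1")
    return (1 - p) ** (k - 1) * p

def prob_within(n: int, p: float) -> float:
    """P(word appears at least once in the first n tokens) = 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

p = 0.01  # hypothetical per-token probability of the word
print(geometric_pmf(1, p))             # 0.01
print(round(prob_within(100, p), 3))   # 0.634
```

Summing the PMF over k = 1..n gives the same value as prob_within(n, p), since both describe the first occurrence landing somewhere in the first n tokens.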

Grapheme

Grapheme is a term used to refer to the smallest functional unit of a writing system. This may be a letter, a…
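In Unicode text, one user-perceived character can span several code points, so Python's len() counts code points rather than graphemes. The sketch below uses a simplified rule of our own, attaching combining marks to the preceding base character; it is not the full Unicode grapheme-cluster algorithm (UAX #29), which also handles emoji sequences, Hangul syllables, and other cases:

```python
import unicodedata

def simple_graphemes(text: str) -> list[str]:
    """Group each base character with any combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the accent to the previous character
        else:
            clusters.append(ch)
    return clusters

word = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (U+0301)
print(len(word))                    # 5
print(len(simple_graphemes(word)))  # 4
```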