Lemmatize

To understand lemmatization, it helps to start with its root word, “lemma”. A lemma is a unit of lexical meaning: a word considered as the basic element of vocabulary (the term is often used in combination with related terms, as in “headword”, “lemma-lexeme”, or “lexicalized”).

For example, the lemma of the word “working” is “work”.

In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form of a set of words. In English dictionaries, for instance, the lemma is the headword. The stem of “working” is “work”, to which the suffix “-ing” is added to form the inflected form “working”.

Lemmatizing words, then, means reducing them to their dictionary form, or lemma.

Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item. For example, “was” and “were” would both be grouped under the lemma “be”, and “better” and “best” under “good”. This is often done with the help of a lemmatiser, which takes a word and returns its lemma. Lemmatisation is used in many applications, such as spell-checking, information retrieval, and machine translation.
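The grouping described above can be sketched as a lookup table that maps each inflected form to its lemma. This is a minimal Python sketch, assuming a tiny hand-written table; `LEMMA_TABLE` and the helpers `lemmatize` and `group_by_lemma` are hypothetical names. A real lemmatiser draws on a full lexicon and usually needs the word's part of speech, since “better” maps to “good” as an adjective but to “well” as an adverb.

```python
# Toy dictionary-based lemmatiser. LEMMA_TABLE is a hypothetical, hand-written
# table for illustration; "better"/"best" are treated as adjectives here.
LEMMA_TABLE = {
    "was": "be",
    "were": "be",
    "better": "good",
    "best": "good",
    "working": "work",
    "worked": "work",
}


def lemmatize(word: str) -> str:
    """Return the lemma of `word`, or the lower-cased word if it is not in the table."""
    w = word.lower()
    return LEMMA_TABLE.get(w, w)


def group_by_lemma(words: list[str]) -> dict[str, list[str]]:
    """Group inflected forms under their shared lemma, preserving input order."""
    groups: dict[str, list[str]] = {}
    for word in words:
        groups.setdefault(lemmatize(word), []).append(word)
    return groups


if __name__ == "__main__":
    print(group_by_lemma(["was", "were", "better", "best", "working", "work"]))
    # {'be': ['was', 'were'], 'good': ['better', 'best'], 'work': ['working', 'work']}
```

Words missing from the table fall back to themselves, so “work” ends up in the same group as “working”.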

Lemmatization vs Stemming

Lemmatisation is closely related to stemming; the difference lies in how each determines the “base” form of a word.

With stemming, the base form is produced purely by rules (typically by stripping suffixes), without any dictionary lookup. This can produce stems that are not real words: the Porter stemmer, for example, reduces “studies” to “studi”.

Lemmatisation, on the other hand, uses dictionary lookups (and often the word's part of speech) to find the base form of a word. In the example above, the lemma of “studies” is “study”, not “studi”.
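The contrast between the two approaches can be illustrated with a toy suffix-stripping stemmer and a toy dictionary lookup. This sketch assumes a handful of simplified rules (`SUFFIX_RULES`) and a small hypothetical `LEXICON`; it is not the Porter stemmer or any library's implementation, only enough to show rules producing a non-word while the lookup returns a dictionary form.

```python
# Simplified, hypothetical suffix rules: (suffix, replacement), tried in order.
SUFFIX_RULES = [
    ("ies", "i"),  # "studies" -> "studi"
    ("ing", ""),   # "working" -> "work"
    ("ed", ""),    # "worked"  -> "work"
    ("s", ""),     # "cats"    -> "cat"
]

# Hypothetical mini dictionary mapping word forms to lemmas.
LEXICON = {
    "studies": "study",
    "studied": "study",
    "working": "work",
    "worked": "work",
    "cats": "cat",
    "was": "be",
}


def stem(word: str) -> str:
    """Strip the first matching suffix using rules only; no dictionary is consulted."""
    for suffix, replacement in SUFFIX_RULES:
        # Require a few remaining characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEXICON.get(word, word)


if __name__ == "__main__":
    for w in ["studies", "working", "was"]:
        print(f"{w}: stem={stem(w)}, lemma={lemmatize(w)}")
    # studies: stem=studi, lemma=study
    # working: stem=work, lemma=work
    # was: stem=was, lemma=be
```

The stemmer also leaves the irregular form “was” untouched, because no suffix rule applies, whereas the dictionary maps it to “be”.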

When to Lemmatize

Lemmatisation is commonly used as a text preprocessing step for tasks such as:

  • information retrieval
  • text classification
  • document clustering
  • machine translation
  • natural language processing tasks such as part-of-speech tagging and named entity recognition
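As an illustration of the information retrieval case, lemmatising both the query and the documents lets a search for “mouse run” match text that only contains “mice” and “ran”. This is a minimal sketch under simplifying assumptions: the `LEMMAS` table and the helpers `lemmatize_text` and `matches` are hypothetical, tokenisation is a plain regular expression, and a document matches only if it contains every query lemma.

```python
import re

# Hypothetical mini lemma table, for illustration only.
LEMMAS = {
    "ran": "run", "running": "run", "runs": "run",
    "mice": "mouse", "was": "be", "were": "be",
}


def lemmatize_text(text: str) -> set[str]:
    """Lower-case the text, split it into alphabetic tokens, and map each to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {LEMMAS.get(t, t) for t in tokens}


def matches(query: str, document: str) -> bool:
    """True if every query lemma also appears among the document's lemmas."""
    return lemmatize_text(query) <= lemmatize_text(document)


if __name__ == "__main__":
    docs = [
        "The mice ran across the kitchen.",
        "A mouse was running in the garden.",
        "The cat slept all day.",
    ]
    print([d for d in docs if matches("mouse run", d)])
    # ['The mice ran across the kitchen.', 'A mouse was running in the garden.']
```

Without the lemma table, the first document would not match, because neither “mouse” nor “run” appears in it literally.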

© 2023 VeritasNLP, All Rights Reserved. Website designed by Mohit Ranpura.