Stop words are words that are filtered out before or after the processing of natural language data (text). Although the term usually refers to the most common words in a language, there is no single definitive stop word list. Stop words are frequent but carry little meaning on their own, so excluding them from analysis often improves results.

Examples of Stop Words

The following are examples of stop words in the English language:

“a”, “about”, “above”, “after”, “again”, “against”, “all”, “am”, “an”, “and”, …, “between”, “both”, “but”, “by”, “can’t”, “cannot”, “could”, “couldn’t”, “did”, “didn’t”.
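A minimal sketch of stop word removal in Python. It assumes simple lowercase whitespace tokenization (punctuation is not handled) and uses only the example words above as the stop list. Real lists are much longer.

```python
# Stop word set built from the example words above (illustrative only;
# real stop word lists are much longer).
STOP_WORDS = frozenset({
    "a", "about", "above", "after", "again", "against", "all", "am", "an",
    "and", "between", "both", "but", "by", "can't", "cannot", "could",
    "couldn't", "did", "didn't",
})


def remove_stop_words(text: str, stop_words: frozenset = STOP_WORDS) -> list:
    """Lowercase the text, split on whitespace, and drop stop word tokens."""
    return [token for token in text.lower().split() if token not in stop_words]


print(remove_stop_words("Both cats ran after a ball and a mouse"))
# ['cats', 'ran', 'ball', 'mouse']
```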

Stop Words Removal Tools

Different tools use different lists of stop words. Some common stop word removal tools are:

  • NLTK (Natural Language Toolkit): NLTK is a Python library whose stopwords corpus provides predefined stop word lists for many languages. The English list has fewer than 200 words in recent versions.
  • Stop Word Filter: This is a Java-based tool that uses a list of stop words.
  • Snowball: Snowball is a small string processing language designed for creating stemming algorithms for use in information retrieval. It also provides stop word lists for multiple languages.
  • R: The R package tm (text mining) includes stop word lists for multiple languages, available through its stopwords() function.
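Because each tool ships its own list, the same text can produce different output depending on the tool. The sketch below uses two hypothetical lists that stand in for different tools. They are not the actual NLTK, Snowball, or tm lists. With NLTK, for example, the English list would instead come from `stopwords.words("english")` after downloading the stopwords corpus.

```python
# Two hypothetical stop word lists standing in for different tools
# (NOT the real NLTK, Snowball, or tm lists).
SHORT_LIST = frozenset({"a", "an", "the", "and", "of"})
LONG_LIST = SHORT_LIST | {"is", "was", "not", "very", "this"}


def filter_tokens(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The history of the city is long".split()
print(filter_tokens(tokens, SHORT_LIST))  # ['history', 'city', 'is', 'long']
print(filter_tokens(tokens, LONG_LIST))   # ['history', 'city', 'long']
```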

Advantages of Removing Stop Words

Removing stop words has a few advantages:

  • It can improve the results of text analytics by removing frequent words that carry little meaning on their own.
  • It can make text analytics more efficient by reducing the amount of data that needs to be processed.
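To illustrate the efficiency point, this sketch counts tokens before and after filtering a short sample text. The stop word set is a small illustrative assumption, not any tool's actual list. Substitute your tool's list in practice.

```python
from collections import Counter

# Small illustrative stop word set (an assumption, not a tool's real list).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"})

text = ("the cat sat on the mat and the dog slept in the sun "
        "it is a quiet day in the garden")
tokens = text.split()
kept = [t for t in tokens if t not in STOP_WORDS]

# Fewer than half of the tokens survive filtering.
print(len(tokens), len(kept))  # 21 9
# Before filtering, the most frequent token is a stop word.
print(Counter(tokens).most_common(1))  # [('the', 5)]
```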

Disadvantages of Removing Stop Words

Removing stop words also has a few disadvantages:

  • It can remove important context from your data. For example, “not” appears on many stop word lists (including NLTK’s English list), so in sentiment analysis, stop word removal reduces the phrase “not good” to “good”, reversing its meaning.
  • It can cause problems with homonyms. For example, “can” appears on many stop word lists because of its common use as a verb, but removing it also discards the noun, so a phrase like “a can of soup” loses its key word.
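One common way to soften the negation problem is to subtract negation words from the stop list before filtering for sentiment tasks. A minimal sketch follows. It assumes whitespace tokenization, and both sets are illustrative assumptions to tune for your own data.

```python
# Illustrative stop word set that, like many real lists, includes negations.
STOP_WORDS = frozenset({"the", "a", "is", "was", "this", "not", "no", "nor", "very"})

# Hypothetical set of negation words to keep for sentiment analysis.
NEGATIONS = frozenset({"not", "no", "nor"})


def remove_stop_words(tokens, stop_words):
    """Drop tokens whose lowercase form is in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The food was not good".split()
print(remove_stop_words(tokens, STOP_WORDS))              # ['food', 'good']
print(remove_stop_words(tokens, STOP_WORDS - NEGATIONS))  # ['food', 'not', 'good']
```

Set difference keeps the base list intact, so the same list can still be used unchanged for tasks where negation does not matter.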

