Unstructured Text is any text that hasn’t been ordered, categorized, labeled, or otherwise grouped together in a deliberate, consistent way. Let’s look at some examples of unstructured text.
Examples of Unstructured Text
Here we see some unstructured text from a random Wikipedia page. The actual content isn’t important in this example; what matters is that the paragraphs, sentences, and words don’t have any classification applied in a structured way:
There are some footnotes, section headers, and other elements that arguably make the text more structured than it would be without them. Yet most would agree this text is, by and large, unstructured.
Here is another example of unstructured text taken from an author biography:
As you can see, unstructured text looks more at home in an MS Word document than in an Excel spreadsheet. However, it is entirely possible to have unstructured text in a spreadsheet. Here is an example of structured data (columns A and B) together with unstructured text (column C). Strictly speaking, the text is somewhat structured because each portion of text is aligned with an author name and date. But within each cell, the text is not organized or labeled according to the ideas, sentiments, emotions, or other factors it contains. There is a lot more organizational structure that could be added to the unstructured text in column C:
Example of Structured Text
Structured text, on the other hand, will always have some order to it. It is often found in spreadsheets or databases. Here are three examples of structured text data in a spreadsheet: Customer Name (column A), State (column B), and City (column C):
Structured text data will always contain only a certain kind of information, such as Name, and/or be limited to a fixed set of options, like State. Usually structured text is much shorter than unstructured text.
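To make the contrast concrete, here is a minimal Python sketch that summarizes a column by how many distinct values it has and how many words each value contains on average. The function name `summarize_column` and the sample values are made up for illustration, and these two numbers are only a rough signal, not a formal test of whether text is structured.

```python
from statistics import mean


def summarize_column(values):
    """Return (number of distinct values, average words per value)."""
    distinct = len(set(values))
    avg_words = mean(len(value.split()) for value in values)
    return distinct, avg_words


# Hypothetical sample data, invented for this sketch.
states = ["TX", "CA", "TX", "NY"]
comments = [
    "The delivery arrived late but the support team was helpful.",
    "Great product, would buy again!",
]

print(summarize_column(states))    # few options, one word each
print(summarize_column(comments))  # free-form, many words per value
```

A column like State repeats a small set of one-word values, while a free-text column tends to have long values that rarely repeat.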
Converting Unstructured Text to Structured Text
The process of organizing your unstructured text into structured text goes by many names. Some call it “classifying” or “annotating” or even just “organizing” the text data. Whatever you call it, the end result is the same: structured text data.
It can be very difficult (or even impossible) for a human to manually convert unstructured text content into an ordered form that allows for easier processing and analysis. This is why the entire field of Natural Language Processing (NLP) exists.
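As a deliberately simple illustration of what "classifying" text means, the Python sketch below attaches labels to a piece of free text using keyword matching. The labels, the keyword lists, and the `classify` function are all hypothetical, and real NLP systems generally rely on far more capable models than keyword rules. The point is only to show the shape of the result: unstructured text in, structured labels out.

```python
import re

# Hypothetical label-to-keyword mapping, for illustration only.
KEYWORDS = {
    "shipping": {"delivery", "shipping", "arrived", "late"},
    "praise": {"great", "love", "excellent"},
}


def classify(text):
    """Return the sorted list of labels whose keywords appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(label for label, keywords in KEYWORDS.items() if words & keywords)


# A made-up row: structured fields plus one unstructured comment.
row = {
    "author": "A. Reader",
    "comment": "Great product, but the delivery arrived late.",
}
row["labels"] = classify(row["comment"])  # new structured field
print(row["labels"])
```

Once labels like these sit in their own column, the comments can be filtered, counted, and analyzed like any other structured data.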
At Veritas NLP, this is our specialty. After all, it’s part of our name! You can see some of our industry models for more information about the kinds of unstructured text we classify for our clients.