In text analytics, a library is a collection of texts (usually documents) that have been pre-processed and annotated for use in specific tasks, such as training machine learning models. The annotations typically support tasks such as part-of-speech tagging, lemmatization, and named entity recognition.
Libraries are often created with specific tasks or domains in mind – for example, there may be a library of medical texts for training a machine learning model to predict disease risk, or a library of customer service transcripts for training a model to identify customer satisfaction levels.
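To make the idea of an annotated, domain-specific library more concrete, here is a minimal sketch of how one document in such a library might be represented. The class names (Token, AnnotatedDocument), the field names, the DRUG entity label, and the example sentence are all assumptions made up for illustration; real libraries use their own schemas and tag sets.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    lemma: str          # dictionary form of the word
    pos: str            # part-of-speech tag
    entity: str = "O"   # named-entity label; "O" means "not an entity"


@dataclass
class AnnotatedDocument:
    doc_id: str
    domain: str
    tokens: list[Token] = field(default_factory=list)

    def entities(self) -> list[tuple[str, str]]:
        """Return (text, label) pairs for tokens marked as entities."""
        return [(t.text, t.entity) for t in self.tokens if t.entity != "O"]


# A hand-annotated, made-up sentence: "Patients took aspirin daily"
doc = AnnotatedDocument(
    doc_id="example-001",
    domain="medical",
    tokens=[
        Token("Patients", "patient", "NOUN"),
        Token("took", "take", "VERB"),
        Token("aspirin", "aspirin", "NOUN", entity="DRUG"),
        Token("daily", "daily", "ADV"),
    ],
)

# In this sketch, a library is simply a list of annotated documents.
library = [doc]
print(doc.entities())  # [('aspirin', 'DRUG')]
```

Keeping the lemma, part-of-speech tag, and entity label on each token means a single record can serve any of the three annotation tasks mentioned above.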
How is the term library used outside of text analytics?
The term library can also refer to:
- A physical space where books and other resources are stored (e.g. a library in a school or university)
- A collection of software components that can be reused in different projects (e.g. a programming library)
- A digital repository of information (e.g. an online library of research papers)
How does library relate to similar terms?
The term corpus is often used interchangeably with library, although strictly speaking a corpus is just a collection of texts and does not need to be annotated or pre-processed for specific tasks. The term data set is also used interchangeably with library, although again strictly speaking a data set just refers to a collection of data (which may or may not be textual in nature).
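The distinction can be sketched in code: a corpus is the raw texts, and a library is what results once those texts are pre-processed and given slots for annotation. The sample sentences, the function names (preprocess, build_library), and the toy regex tokenizer below are assumptions for illustration only, not a standard pipeline.

```python
import re

# A corpus in the strict sense: raw texts with no annotation.
corpus = [
    "The agent resolved my issue quickly.",
    "I waited two hours for a reply!",
]


def preprocess(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_library(texts: list[str]) -> list[dict]:
    """Attach tokens to each raw text; 'label' is left empty for annotators."""
    return [
        {"text": text, "tokens": preprocess(text), "label": None}
        for text in texts
    ]


library = build_library(corpus)
print(library[0]["tokens"])
# ['the', 'agent', 'resolved', 'my', 'issue', 'quickly']
```

Either object could loosely be called a data set; only the second carries the pre-processing that the stricter definition of library requires.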
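Examples of libraries
The following are libraries in the software sense: reusable toolkits for processing text, rather than annotated text collections.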
- spaCy: an open-source library for Natural Language Processing in Python
- NLTK: a leading platform for building Python programs to work with human language data
- Gensim: an open-source library for unsupervised topic modeling and natural language processing, implemented in Python
- Stanford CoreNLP: an integrated suite of software tools for human language processing
Benefits of libraries
Libraries can be used to train machine learning models to perform specific tasks such as part-of-speech tagging, lemmatization, and Named Entity Recognition.
They can be created with specific tasks or domains in mind, which makes them more relevant and useful for training machine learning models.
Moreover, they can be reused across projects, which saves the time and cost of collecting and annotating the same texts again.
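As a rough illustration of the reuse point, the sketch below derives training examples for two different tasks from the same annotated sentence. The sentence, the tag names, and the helper functions (pos_examples, ner_examples) are hypothetical and chosen only to show the idea.

```python
# Hypothetical annotated records: (token, part-of-speech tag, entity label).
annotated_sentence = [
    ("Alice", "PROPN", "PERSON"),
    ("visited", "VERB", "O"),
    ("Paris", "PROPN", "LOCATION"),
]


def pos_examples(sentence):
    """Build (token, tag) training pairs for a part-of-speech tagger."""
    return [(token, pos) for token, pos, _ in sentence]


def ner_examples(sentence):
    """Build (token, label) training pairs for a named-entity recognizer."""
    return [(token, entity) for token, _, entity in sentence]


print(pos_examples(annotated_sentence))
# [('Alice', 'PROPN'), ('visited', 'VERB'), ('Paris', 'PROPN')]
print(ner_examples(annotated_sentence))
# [('Alice', 'PERSON'), ('visited', 'O'), ('Paris', 'LOCATION')]
```

The annotation work is done once, and each project selects only the layer it needs.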