MIME (Multipurpose Internet Mail Extensions) is an internet standard, and a MIME type is a label defined by that standard to identify the format of data transmitted over the internet. MIME was originally created to make it possible to send non-textual data, such as images and video, by email. The same type labels are now used across the web, for example in HTTP.
In the text analytics industry, MIME type generally refers to the format of a document file. The purpose of specifying the MIME type of a document file is so that text analytics software can properly interpret the file. For example, a .docx file has a different MIME type than a .pdf file.
MIME types are sometimes confused with related terms, such as file extension and Content-Type. However, these terms are not interchangeable. A file extension is the suffix at the end of a file name, such as .docx or .pdf; it is a naming convention that hints at the format but does not guarantee it. Content-Type is the name of the header, used in email and HTTP, whose value carries the MIME type of the content, for example Content-Type: text/html.
Text analytics software relies on the MIME type to choose how to read a file. If the MIME type is missing or wrong, the software may misinterpret the file or fail to extract its contents for analysis.
There are many different MIME types used for document files, but some of the most common MIME types used in text analytics include:
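- application/msword – MS Word Document (.doc)
- application/vnd.openxmlformats-officedocument.wordprocessingml.document – MS Word Document (.docx)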
- application/pdf – PDF Document
- text/plain – Plain Text File
- text/html – HTML File
- image/jpeg – JPEG Image
- image/png – PNG Image
- video/mpeg – MPEG Video File
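To illustrate how software might use these types, here is a minimal Python sketch that picks a text extractor based on the MIME type. The names EXTRACTORS, extract_text, extract_plain_text, and extract_html_text are hypothetical and chosen for this example; a real text analytics tool would register handlers for many more types, such as PDF and Word documents.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_plain_text(data: bytes) -> str:
    # Assumption for this sketch: plain text files are UTF-8 encoded.
    return data.decode("utf-8", errors="replace")


def extract_html_text(data: bytes) -> str:
    collector = _TextCollector()
    collector.feed(data.decode("utf-8", errors="replace"))
    collector.close()
    # Collapse runs of whitespace left behind by the markup.
    return " ".join("".join(collector.parts).split())


# Hypothetical registry mapping a MIME type to the function that reads it.
EXTRACTORS = {
    "text/plain": extract_plain_text,
    "text/html": extract_html_text,
}


def extract_text(data: bytes, mime_type: str) -> str:
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    if base_type not in EXTRACTORS:
        raise ValueError(f"No extractor registered for MIME type {base_type!r}")
    return EXTRACTORS[base_type](data)
```

With a registry like this, an unsupported type such as image/png fails with a clear error instead of being read as if it were text.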
How is the MIME type determined?
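The MIME type of a file is most often inferred from the file’s extension, although software can also inspect the file’s contents. For example, a file with a .docx extension has a MIME type of application/vnd.openxmlformats-officedocument.wordprocessingml.document, while application/msword belongs to the older .doc format. To find out the MIME type of a file, you can either check the file’s properties or look up the file’s extension in a MIME type reference chart.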
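As a rough sketch of the extension lookup described above, Python's standard-library mimetypes module can map a file name to a MIME type. The helper names guess_mime_type and sniff_mime_type are hypothetical. The second helper is an addition for illustration: because an extension can be wrong or missing, it checks the first bytes of a file against three well-known signatures (PDF, PNG, JPEG) and is far from a complete detector.

```python
import mimetypes
from typing import Optional


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file name's extension alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for the type when the extension is unknown.
    return mime_type or default


def sniff_mime_type(header: bytes) -> Optional[str]:
    """Match the leading bytes of a file against a few known signatures."""
    if header.startswith(b"%PDF-"):
        return "application/pdf"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return None  # Unknown: the caller should fall back to the extension.


print(guess_mime_type("report.pdf"))
print(sniff_mime_type(b"%PDF-1.7\n"))
```

The result of the extension lookup can depend on the type tables available on the system, so less common extensions may fall back to the default value.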
How do I specify the MIME type of a file?
To specify the MIME type of a file that is sent over the web, you set the Content-Type header that accompanies the file. The Content-Type header tells the receiving software, such as a text analytics tool, what kind of file it is dealing with.
If your website runs on the Apache web server, one way to set the Content-Type header for a file is to edit the .htaccess file for your website. The .htaccess file is a per-directory configuration file for the Apache web server.
In the .htaccess file, you will need to add a line that looks like this:
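AddType application/vnd.openxmlformats-officedocument.wordprocessingml.document .docx
This line tells the Apache web server to send any file with a .docx extension with the MIME type for MS Word (.docx) documents. For older .doc files, the corresponding line is AddType application/msword .doc.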
You can learn more about setting the Content-Type header for a file from the Apache documentation.
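The Content-Type header can also be set by a client that sends a document to another program. The following Python sketch builds, but does not send, an HTTP POST request with the header set. The function name build_upload_request and the URL https://example.com/analyze are placeholders for this example and do not refer to any real text analytics service.

```python
import urllib.request


def build_upload_request(url: str, payload: bytes, mime_type: str) -> urllib.request.Request:
    """Build a POST request whose Content-Type header states the payload's MIME type."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": mime_type},
        method="POST",
    )


# Placeholder endpoint; replace it with the address of the receiving service.
request = build_upload_request(
    "https://example.com/analyze",
    b"Plain text to analyze.",
    "text/plain",
)

# urllib stores header names with only the first letter capitalised.
print(request.get_header("Content-type"))
```

Passing the request to urllib.request.urlopen would send it. The sketch stops short of that so it can run without a network connection.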