You can use the python-docx2txt library to read text from Microsoft Word documents. It improves on the python-docx library in that it can also extract text from links, headers, and footers. It can even extract images.
You can install it by running: pip install docx2txt
Let's download and read the first Microsoft document on here:
import docx2txt
my_text = docx2txt.process("test.docx")
print(my_text)
Here is a screenshot of the Terminal output of the above code:
EDIT:
This does NOT work for .doc files. The only reason I am keeping this answer is that people seem to find it useful for .docx files.
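Since .doc input is the common stumbling block, here is a minimal sketch of a wrapper that rejects anything that is not a .docx before handing it to docx2txt, and optionally saves the images as well. The function name read_docx and its image_dir parameter are my own, not part of the library. I am also assuming the image directory has to exist before extraction, so the sketch creates it first.

```python
import os
import zipfile


def read_docx(path, image_dir=None):
    """Return the text of a .docx file, refusing legacy .doc input early.

    read_docx and image_dir are hypothetical names used for illustration.
    """
    # A .docx file is a ZIP archive; a legacy .doc file is not.
    if not path.lower().endswith(".docx") or not zipfile.is_zipfile(path):
        raise ValueError(f"{path!r} does not look like a .docx file")

    # Imported here so the check above runs even if docx2txt is not installed.
    import docx2txt

    if image_dir is not None:
        # Assumption: the target directory should exist before extraction.
        os.makedirs(image_dir, exist_ok=True)
        # The second argument tells docx2txt where to write embedded images.
        return docx2txt.process(path, image_dir)
    return docx2txt.process(path)
```

Calling read_docx("test.docx", "images") should return the text and write any embedded images into the images folder, while read_docx("old.doc") raises a ValueError instead of failing inside the library.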