There's a very simple light-weight python package stop-words
just for this sake.
Fist install the package using:
pip install stop-words
Then you can remove your words in one line using list comprehension:
from stop_words import get_stop_words
filtered_words = [word for word in dataset if word not in get_stop_words('english')]
This package is very light-weight to download (unlike nltk), works for both Python 2
and Python 3
,and it has stop words for many other languages like:
Arabic
Bulgarian
Catalan
Czech
Danish
Dutch
English
Finnish
French
German
Hungarian
Indonesian
Italian
Norwegian
Polish
Portuguese
Romanian
Russian
Spanish
Swedish
Turkish
Ukrainian