[python] How to get rid of punctuation using NLTK tokenizer?

Below code will remove all punctuation marks as well as non alphabetic characters. Copied from their book.

http://www.nltk.org/book/ch01.html

import nltk

s = "I can't do this now, because I'm so tired.  Please give me some time. @ sd  4 232"

words = nltk.word_tokenize(s)

words=[word.lower() for word in words if word.isalpha()]

print(words)

output

['i', 'ca', 'do', 'this', 'now', 'because', 'i', 'so', 'tired', 'please', 'give', 'me', 'some', 'time', 'sd']

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to nlp

Replace specific text with a redacted version using Python How to return history of validation loss in Keras How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) Stopword removal with NLTK How to get rid of punctuation using NLTK tokenizer? Calculate cosine similarity given 2 sentence strings How do I tokenize a string sentence in NLTK? How to compute the similarity between two text documents? How do I do word Stemming or Lemmatization?

Examples related to tokenize

How to get rid of punctuation using NLTK tokenizer? How do I tokenize a string sentence in NLTK? Splitting string into multiple rows in Oracle Parse (split) a string in C++ using string delimiter (standard C++) How to use stringstream to separate comma separated strings Split string with PowerShell and do something with each token Splitting comma separated string in a PL/SQL stored proc Convert comma separated string to array in PL/SQL Is there a function to split a string in PL/SQL? How to split a string in shell and get the last field

Examples related to nltk

re.sub erroring with "Expected string or bytes-like object" how to check which version of nltk, scikit learn installed? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) NLTK and Stopwords Fail #lookuperror Resource u'tokenizers/punkt/english.pickle' not found How do I download NLTK data? Stopword removal with NLTK n-grams in python, four, five, six grams? pip issue installing almost any library How to get rid of punctuation using NLTK tokenizer?