[python] How do I tokenize a string sentence in NLTK?

As @PavelAnossov answered, the canonical answer, use the word_tokenize function in nltk:

from nltk import word_tokenize
sent = "This is my text, this is a nice way to input text."
word_tokenize(sent)

If your sentence is truly simple enough:

Using the string.punctuation set, remove punctuation then split using the whitespace delimiter:

import string
x = "This is my text, this is a nice way to input text."
y = "".join([i for i in x if not in string.punctuation]).split(" ")
print y

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to nlp

Replace specific text with a redacted version using Python How to return history of validation loss in Keras How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) Stopword removal with NLTK How to get rid of punctuation using NLTK tokenizer? Calculate cosine similarity given 2 sentence strings How do I tokenize a string sentence in NLTK? How to compute the similarity between two text documents? How do I do word Stemming or Lemmatization?

Examples related to tokenize

How to get rid of punctuation using NLTK tokenizer? How do I tokenize a string sentence in NLTK? Splitting string into multiple rows in Oracle Parse (split) a string in C++ using string delimiter (standard C++) How to use stringstream to separate comma separated strings Split string with PowerShell and do something with each token Splitting comma separated string in a PL/SQL stored proc Convert comma separated string to array in PL/SQL Is there a function to split a string in PL/SQL? How to split a string in shell and get the last field

Examples related to nltk

re.sub erroring with "Expected string or bytes-like object" how to check which version of nltk, scikit learn installed? Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) NLTK and Stopwords Fail #lookuperror Resource u'tokenizers/punkt/english.pickle' not found How do I download NLTK data? Stopword removal with NLTK n-grams in python, four, five, six grams? pip issue installing almost any library How to get rid of punctuation using NLTK tokenizer?