Best way to strip punctuation from a string

Question

It seems like there should be a simpler way than:

    import string
    s = "string. With. Punctuation?" # Sample string
    out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

User · Answer

In Python 2:

    myString.translate(None, string.punctuation)

User · Answer

Remove stop words from the text file using Python:

    print("    THIS IS HOW TO REMOVE STOP WORDS    ")

    with open("one.txt", "r") as myFile:
        str1 = myFile.read()
        stop_words = ("not", "is", "it", "By", "between", "This", "By", "A", "when",
                      "And", "up", "Then", "was", "by", "It", "If", "can", "an", "he",
                      "This", "or", "And", "a", "i", "it", "am", "at", "on", "in", "of",
                      "to", "is", "so", "too", "my", "the", "and", "but", "are", "very",
                      "here", "even", "from", "them", "then", "than", "this", "that",
                      "though", "be", "But", "these")
        myList = []
        myList.extend(str1.split(" "))
        for i in myList:
            if i not in stop_words:
                print("____________")
                print(i, end="\n")

User · Answer

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some) punctuation then, use:

    import string

    remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
    s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

    import unicodedata
    import sys

    remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                     if unicodedata.category(chr(i)).startswith('P'))
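Building the full-Unicode table scans every code point, so it is usually worth doing only once per process. A minimal sketch, assuming `functools.lru_cache` is an acceptable way to memoize the table; `unicode_punct_map` and `strip_unicode_punct` are hypothetical helper names:

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def unicode_punct_map():
    """Build (once) a translate table mapping every code point whose
    Unicode general category starts with "P" to None."""
    return dict.fromkeys(
        i for i in range(sys.maxunicode + 1)
        if unicodedata.category(chr(i)).startswith("P")
    )


def strip_unicode_punct(text):
    """Delete all Unicode punctuation from text using the cached table."""
    return text.translate(unicode_punct_map())


print(strip_unicode_punct("«Hello», world!"))  # prints: Hello world
```

The first call pays for the scan; later calls reuse the same dictionary.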

User · Answer

Here is a function I wrote. It's not very efficient, but it is simple and you can add or remove any punctuation that you desire:

    def stripPunc(wordList):
        """Strips punctuation from list of words"""
        puncList = [".", ";", ":", "!", "?", "/", "\\", ",", "#", "@", "$", "&", ")", "(", "\""]
        for punc in puncList:
            wordList = [word.replace(punc, '') for word in wordList]
        return wordList
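The same customizable-list idea can be sketched with a single translation table, which avoids one `replace` pass per punctuation mark. This assumes you still want to pass your own set of characters; `strip_punc_words` is a hypothetical name:

```python
def strip_punc_words(word_list, punc_chars=".;:!?/\\,#@$&)(\""):
    """Return a new list with every character in punc_chars removed
    from each word."""
    # str.maketrans(x, y, z): characters in the third argument are deleted.
    table = str.maketrans("", "", punc_chars)
    return [word.translate(table) for word in word_list]


print(strip_punc_words(["Hello,", "(world)", "it's", "done!"]))
# ['Hello', 'world', "it's", 'done']
```

The apostrophe survives because it is not in the default `punc_chars`; add it to the string if you want it gone.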

User · Answer

string punctuation misses loads of punctuation marks that are commonly used in the real world  How about a solution that works for non-ASCII punctuation   import regex s   u string  With  Some  Really Weird Non ASCII    Punctuation     remove   regex compile ur   p C   p M   p P   p S   p Z      regex UNICODE  remove sub u     s  strip     Personally  I believe this is the best way to remove punctuation from a string in Python because    It removes all Unicode punctuation It s easily modifiable  e g  you can remove the   S  if you want to remove punctuation  but keep symbols like    You can get really specific about what you want to keep and what you want to remove  for example   Pd  will only remove dashes  This regex also normalizes whitespace  It maps tabs  carriage returns  and other oddities to nice  single spaces    This uses Unicode character properties  which you can read more about on Wikipedia

User · Answer

From an efficiency perspective, you're not going to beat

    s.translate(None, string.punctuation)

For Python 3, use the following code:

    s.translate(str.maketrans('', '', string.punctuation))

It's performing raw string operations in C with a lookup table; there's not much that will beat that but writing your own C code.

If speed isn't a worry, another option though is:

    exclude = set(string.punctuation)
    s = ''.join(ch for ch in s if ch not in exclude)

This is faster than s.replace with each char, but won't perform as well as non-pure-Python approaches such as regexes or string.translate, as you can see from the timings below. For this type of problem, doing it at as low a level as possible pays off.

Timing code (Python 2):

    import re, string, timeit

    s = "string. With. Punctuation"
    exclude = set(string.punctuation)
    table = string.maketrans("", "")
    regex = re.compile('[%s]' % re.escape(string.punctuation))

    def test_set(s):
        return ''.join(ch for ch in s if ch not in exclude)

    def test_re(s):  # From Vinko's solution, with fix.
        return regex.sub('', s)

    def test_trans(s):
        return s.translate(table, string.punctuation)

    def test_repl(s):  # From S.Lott's solution
        for c in string.punctuation:
            s = s.replace(c, "")
        return s

    print "sets      :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
    print "regex     :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
    print "translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
    print "replace   :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

    sets      : 19.8566138744
    regex     : 6.86155414581
    translate : 2.12455511093
    replace   : 28.4436721802
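Before comparing timings, it can help to confirm that the approaches actually produce the same output. A minimal Python 3 sketch using the same sample string; the variable names are illustrative only:

```python
import re
import string

s = "string. With. Punctuation?"

# 1. str.translate with a deletion table built once (runs in C).
table = str.maketrans("", "", string.punctuation)
via_translate = s.translate(table)

# 2. Pure-Python generator filtering against a set.
exclude = set(string.punctuation)
via_set = "".join(ch for ch in s if ch not in exclude)

# 3. A compiled character class built from the escaped punctuation.
regex = re.compile("[%s]" % re.escape(string.punctuation))
via_regex = regex.sub("", s)

print(via_translate, via_set, via_regex, sep="\n")
```

All three lines print `string With Punctuation`.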

User · Answer

    import re

    s = "string. With. Punctuation?" # Sample string
    out = re.sub(r'[^a-zA-Z0-9\s]', '', s)
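One thing to keep in mind with the ASCII-only class `[^a-zA-Z0-9\s]`: it also deletes accented and other non-ASCII letters. A small comparison, assuming Python 3, where `\w` on a `str` pattern matches Unicode word characters:

```python
import re

sample = "café, naïve!"

# ASCII-only class: anything that is not an ASCII letter, digit or
# whitespace is removed, including accented letters.
ascii_only = re.sub(r"[^a-zA-Z0-9\s]", "", sample)

# \w is Unicode-aware for str patterns, so accented letters survive.
unicode_aware = re.sub(r"[^\w\s]", "", sample)

print(ascii_only)     # caf nave
print(unicode_aware)  # café naïve
```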

User · Answer

I usually use something like this:

    >>> s = "string. With. Punctuation?" # Sample string
    >>> import string
    >>> for c in string.punctuation:
    ...     s = s.replace(c, "")
    ...
    >>> s
    'string With Punctuation'

User · Answer

Regular expressions are simple enough, if you know them.

    import re

    s = "string. With. Punctuation?"
    s = re.sub(r'[^\w\s]', '', s)
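Note that `\w` includes the underscore, so `[^\w\s]` leaves `_` in place even though `_` is part of `string.punctuation`. A small sketch, assuming you want underscores removed as well:

```python
import re
import string

s = "snake_case, done!"

# \w covers letters, digits and "_", so the underscore survives here.
keeps_underscore = re.sub(r"[^\w\s]", "", s)

# An extra alternative for "_" removes it too.
drops_underscore = re.sub(r"[^\w\s]|_", "", s)

print(keeps_underscore)           # snake_case done
print(drops_underscore)           # snakecase done
print("_" in string.punctuation)  # True
```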

User · Answer

Try that one:

    import regex  # third-party module: pip install regex

    regex.sub(r'\p{P}+', "", s)

User · Answer

Not necessarily simpler, but a different way, if you are more familiar with the re family.

    import re, string

    s = "string. With. Punctuation?" # Sample string
    out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
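Because the character class is built from a string, it is easy to carve out exceptions. A sketch under one hypothetical choice, keeping apostrophes and hyphens so contractions and hyphenated words survive; `KEEP` and `strip_except` are illustrative names:

```python
import re
import string

# Hypothetical choice: keep apostrophes and hyphens, remove every other
# ASCII punctuation mark.
KEEP = "'-"
to_remove = "".join(c for c in string.punctuation if c not in KEEP)
# re.escape protects characters such as "]", "\\" and "^" inside [...].
pattern = re.compile("[%s]" % re.escape(to_remove))


def strip_except(text):
    """Remove ASCII punctuation except the characters listed in KEEP."""
    return pattern.sub("", text)


print(strip_except("Don't stop: it's well-known!"))  # Don't stop it's well-known
```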

User · Answer

This might not be the best solution, however this is how I did it.

    import string

    f = lambda x: ''.join([i for i in x if i not in string.punctuation])

User · Answer

Here's a one-liner for Python 3.5:

    import string

    "l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a: None for a in string.punctuation}))
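The dictionary passed to `str.maketrans` here, the three-argument `str.maketrans('', '', ...)` form, and a hand-built `dict.fromkeys(map(ord, ...))` table all produce the same mapping. A quick check:

```python
import string

# Dict form: single-character string keys are converted to ordinals.
table_from_dict = str.maketrans({a: None for a in string.punctuation})

# Three-argument form: characters in the third string map to None.
table_from_args = str.maketrans("", "", string.punctuation)

# Hand-built form with ordinals as keys and None as every value.
table_from_fromkeys = dict.fromkeys(map(ord, string.punctuation))

print(table_from_dict == table_from_args == table_from_fromkeys)  # True
```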

User · Answer

    # FIRST METHOD
    # Storing all punctuations in a variable
    punctuation = '!?,.:;"\')(_-'
    newstring = ''  # Creating empty string
    word = input("Enter string: ")
    for i in word:
        if i not in punctuation:
            newstring += i
    print("The string without punctuation is:", newstring)

    # SECOND METHOD
    word = input("Enter string: ")
    punctuation = '!?,.:;"\')(_-'
    newstring = word.translate(str.maketrans('', '', punctuation))
    print("The string without punctuation is:", newstring)

Output for both methods:

    Enter string: hello! welcome -to_python(programming.language)??,
    The string without punctuation is: hello welcome topythonprogramminglanguage

User · Answer

    >>> import re
    >>> s = "string. With. Punctuation?"
    >>> s = re.sub(r'[^\w\s]', '', s)
    >>> re.split(r'\s+', s)
    ['string', 'With', 'Punctuation']

User · Answer

I haven't seen this answer yet. Just use a regex; it removes every character that is not a word character (\w), a digit (\d) or whitespace (\s):

    import re

    s = "string. With. Punctuation?" # Sample string
    out = re.sub(r'[^\w\d\s]+', '', s)

User · Answer

Why doesn't anyone use this?

    ''.join(filter(str.isalnum, s))

Too slow?
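One behavioural difference worth knowing: `str.isalnum` is False for spaces, so this filter also glues the words together. A small comparison with a variant that keeps whitespace:

```python
s = "string. With. Punctuation?"

# Spaces are not alphanumeric, so they are dropped along with punctuation.
alnum_only = "".join(filter(str.isalnum, s))

# Keeping whitespace as well preserves the word boundaries.
alnum_or_space = "".join(c for c in s if c.isalnum() or c.isspace())

print(alnum_only)      # stringWithPunctuation
print(alnum_or_space)  # string With Punctuation
```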

User · Answer

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

    from unicodedata import category

    s = 'String – with -  «punctuation »...'
    s = ''.join(ch for ch in s if category(ch)[0] != 'P')
    print('stripped', s)

You can generalize and strip other types of characters as well:

    ''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$, which may or may not be "punctuation" depending on one's point of view.
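The generalization above can be wrapped in a small helper that takes the major categories to drop as a parameter. A minimal sketch; `strip_categories` is a hypothetical name, and "major category" here means the first letter returned by `unicodedata.category`:

```python
from unicodedata import category


def strip_categories(text, major="P"):
    """Drop every character whose major Unicode category (first letter
    of unicodedata.category) appears in `major`, e.g. "P" or "SP"."""
    return "".join(ch for ch in text if category(ch)[0] not in major)


print(strip_categories("a+b, c!"))        # a+b c
print(strip_categories("a+b, c!", "SP"))  # ab c
```

With only `"P"`, the math symbol `+` (category `Sm`) is kept; adding `"S"` removes it too.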

User · Answer

Here's one other easy way to do it using RegEx:

    import re

    punct = re.compile(r'(\w+)')

    sentence = 'This ! is : a # sample $ sentence.'  # Text with punctuation
    tokenized = [m.group() for m in punct.finditer(sentence)]
    sentence = ' '.join(tokenized)
    print(sentence)
    # This is a sample sentence
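`re.findall` returns the matched substrings directly, so the `finditer`/`m.group()` step can be folded into one call. A minimal equivalent:

```python
import re

sentence = "This ! is : a # sample $ sentence."

# findall with a pattern that has no groups returns the whole matches.
print(" ".join(re.findall(r"\w+", sentence)))  # This is a sample sentence
```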

User · Answer

Just as an update, I rewrote the @Brian example in Python 3 and made changes to it to move the regex compile step inside of the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing and can't have the regex object shared between your workers and need to have the re.compile step at each worker. Also, I was curious to time two different implementations of maketrans for Python 3:

    table = str.maketrans({key: None for key in string.punctuation})

vs

    table = str.maketrans('', '', string.punctuation)

Plus I added another method to use set, where I take advantage of the intersection function to reduce the number of iterations.

This is the complete code:

    import re, string, timeit

    s = "string. With. Punctuation"


    def test_set(s):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in s if ch not in exclude)


    def test_set2(s):
        _punctuation = set(string.punctuation)
        for punct in set(s).intersection(_punctuation):
            s = s.replace(punct, ' ')
        return ' '.join(s.split())


    def test_re(s):  # From Vinko's solution, with fix.
        regex = re.compile('[%s]' % re.escape(string.punctuation))
        return regex.sub('', s)


    def test_trans(s):
        table = str.maketrans({key: None for key in string.punctuation})
        return s.translate(table)


    def test_trans2(s):
        table = str.maketrans('', '', string.punctuation)
        return s.translate(table)


    def test_repl(s):  # From S.Lott's solution
        for c in string.punctuation:
            s = s.replace(c, "")
        return s


    print("sets      :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
    print("sets2     :", timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
    print("regex     :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
    print("translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
    print("translate2:", timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
    print("replace   :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

These are my results:

    sets      : 3.1830138750374317
    sets2     : 2.189873124472797
    regex     : 7.142953420989215
    translate : 4.243278483860195
    translate2: 2.427158243022859
    replace   : 4.579746678471565
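To see how much of the translate cost above comes from rebuilding the table on every call, the table-building step can be timed separately. A minimal sketch, assuming `timeit.repeat` with the minimum of several runs as the noise-reduction strategy; the function names are hypothetical and no particular result is implied:

```python
import string
import timeit

s = "string. With. Punctuation"
PREBUILT = str.maketrans("", "", string.punctuation)


def trans_prebuilt(text):
    # Table built once at module level and reused on every call.
    return text.translate(PREBUILT)


def trans_per_call(text):
    # Table rebuilt on every call, as in the measurements above.
    return text.translate(str.maketrans("", "", string.punctuation))


for name, func in [("prebuilt", trans_prebuilt), ("per-call", trans_per_call)]:
    # The minimum of several repeats is less affected by other processes.
    best = min(timeit.repeat(lambda: func(s), number=20_000, repeat=5))
    print(f"{name:9}: {best:.4f} s")
```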

User · Answer

Considering Unicode. Code checked in Python 3.

    from unicodedata import category

    text = 'hi, how are you?'
    text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

User · Answer

I like to use a function like this (it only removes punctuation from the start and end of the string):

    import string

    def scrub(abc):
        while abc and abc[-1] in string.punctuation:
            abc = abc[:-1]
        while abc and abc[0] in string.punctuation:
            abc = abc[1:]
        return abc
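Since this only trims the ends, the built-in `str.strip` can do the same job: it accepts a string of characters to remove from both ends. A minimal equivalent sketch:

```python
import string


def scrub(word):
    """Remove punctuation from both ends of word only; inner
    punctuation is left alone."""
    return word.strip(string.punctuation)


print(scrub("...hello, world!!"))  # hello, world
print(repr(scrub("!!!")))          # ''
```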

User · Answer

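    with open("one.txt", "r") as myFile:
        str1 = myFile.read()
        print(str1)
        punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
        for i in punctuation:
            str1 = str1.replace(i, " ")
        myList = []
        myList.extend(str1.split(" "))
    print(str1)
    for i in myList:
        print(i, end="\n")
        print("____________")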
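The same replace-with-space idea fits in one translation table, and `split()` with no argument then skips the empty strings that `split(" ")` produces for repeated spaces. A sketch in which `io.StringIO` stands in for `open("one.txt")` so the example is self-contained; `words_without_punctuation` is a hypothetical name:

```python
import io
import string


def words_without_punctuation(file_obj):
    """Read all text from file_obj, turn ASCII punctuation into spaces,
    and return the remaining words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return file_obj.read().translate(table).split()


# io.StringIO plays the role of an opened text file here.
fake_file = io.StringIO("Hello, world! (This is a test.)")
for word in words_without_punctuation(fake_file):
    print(word)
```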

User · Answer

For convenience, I sum up the notes on stripping punctuation from a string in both Python 2 and Python 3. Please refer to other answers for the detailed description.

Python 2

    import string

    s = "string. With. Punctuation?"
    table = string.maketrans("", "")
    new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

    import string

    s = "string. With. Punctuation?"
    table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
    new_s = s.translate(table)                                # Output: string without punctuation

User · Answer

Here's a solution without regex (Python 3):

    import string

    input_text = "!where??and!!or$$then:)"
    punctuation_replacer = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    print(' '.join(input_text.translate(punctuation_replacer).split()).strip())

    Output>> where and or then

- Replaces the punctuation with spaces.
- Replaces multiple spaces between words with a single space.
- Removes any leading or trailing spaces with strip().

User · Answer

A one-liner might be helpful in not very strict cases:

    ''.join([c for c in s if c.isalnum() or c.isspace()])
