Remove all special characters, punctuation and spaces from string

Question

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.

User · Answer

Unlike the other regex answers, I would exclude every character that is not what I want, instead of explicitly enumerating what I don't want.

For example, if I want only characters from 'a to z' (upper and lower case) and numbers, I would exclude everything else:

import re
s = re.sub(r"[^a-zA-Z0-9]","",s)

This means "substitute every character that is not a number, or a character in the range 'a to z' or 'A to Z' with an empty string".

In fact, if you put the special character ^ first inside a character class ([...]), you get the negation of that class.

Extra tip: if you also need to lowercase the result, lowercase the string first. The regex then becomes shorter and simpler, because no uppercase letters are left to match.

import re
s = re.sub(r"[^a-z0-9]","",s.lower())
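A minimal sketch that wraps the two snippets above in one reusable function. The helper name `keep_ascii_alnum` and its `lowercase` parameter are made up for illustration; only the pattern comes from the answer.

```python
import re

# Pattern from the answer above, compiled once for reuse.
_NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def keep_ascii_alnum(text, lowercase=False):
    """Drop every character that is not an ASCII letter or digit.

    Hypothetical helper: the name and the `lowercase` flag are assumptions.
    """
    cleaned = _NON_ALNUM.sub("", text)
    return cleaned.lower() if lowercase else cleaned


print(keep_ascii_alnum("Special $#! characters   spaces 888323"))
print(keep_ascii_alnum("Hello, World! 42", lowercase=True))
```

Note that this treats accented letters such as "é" as special characters and removes them, which may or may not be what you want.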

User · Answer

string.punctuation contains the following characters:

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

You can use the translate and maketrans functions to map punctuation to empty values (replace):

import string
'This, is. A test!'.translate(str.maketrans('', '', string.punctuation))

Output:

'This is A test'

User · Answer

Assuming you want to use a regex and you want/need Unicode-cognisant 2.x code that is 2to3-ready:

>>> import re
>>> rx = re.compile(u'[\W_]+', re.UNICODE)
>>> data = u''.join(unichr(i) for i in range(256))
>>> rx.sub(u'', data)
u'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb2 [snip] \xfe\xff'
>>>

User · Answer

This will remove all non-alphanumeric characters except spaces.

string = "Special $#! characters   spaces 888323"
''.join(e for e in string if (e.isalnum() or e.isspace()))

Special  characters   spaces 888323

User · Answer

s = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,]", "", s)

User · Answer

#!/usr/bin/python
import re

strs = "how much for the maple syrup? $20.99? That's ricidulous!!!"
print(strs)
nstr = re.sub(r'[?|$|.|!]', r'', strs)
print(nstr)
nestr = re.sub(r'[^a-zA-Z0-9 ]', r'', nstr)
print(nestr)

You can add more special characters to the class, and they will be replaced by '' (nothing), i.e. they will be removed.

User · Answer

Here is a regex to match a string of characters that are not letters or numbers:

[^A-Za-z0-9]+

Here is the Python command to do a regex substitution:

re.sub('[^A-Za-z0-9]+', '', mystring)

User · Answer

After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with timeit against two of the example strings:

string1 = 'Special $#! characters   spaces 888323'
string2 = 'how much for the maple syrup? $20.99? That s ricidulous!!!'

Example 1

''.join(e for e in string if e.isalnum())

string1 - Result: 10.7061979771
string2 - Result: 7.78372597694

Example 2

import re
re.sub('[^A-Za-z0-9]+', '', string)

string1 - Result: 7.10785102844
string2 - Result: 4.12814903259

Example 3

import re
re.sub('\W+', '', string)

string1 - Result: 3.11899876595
string2 - Result: 2.78014397621

The above results are the lowest returned result from repeat(3, 2000000).

Example 3 can be 3x faster than Example 1.
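For anyone who wants to rerun the comparison, here is a rough sketch of a timeit harness. The function names and the `best_time` helper are made up for illustration, and the iteration count is much smaller than the 2000000 used above, so absolute numbers will not match the ones quoted.

```python
import re
import timeit

string1 = "Special $#! characters   spaces 888323"


def with_isalnum(s):
    # Example 1: generator expression plus str.isalnum
    return "".join(e for e in s if e.isalnum())


def with_char_class(s):
    # Example 2: negated character class
    return re.sub("[^A-Za-z0-9]+", "", s)


def with_non_word(s):
    # Example 3: \W (note: \W leaves underscores in place)
    return re.sub(r"\W+", "", s)


def best_time(func, s, number=20_000, repeat=3):
    """Lowest total time over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(lambda: func(s), number=number, repeat=repeat))


for func in (with_isalnum, with_char_class, with_non_word):
    print(func.__name__, best_time(func, string1))
```

Timings depend heavily on the Python version and the input, so it is worth measuring with strings that look like your real data.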

User · Answer

import re

my_string = """Strings are amongst the most popular data types in Python. We can create the strings by enclosing characters in quotes. Python treats single quotes the same as double quotes."""

# Count the word "python" whether or not it ends with "," or "."
text = my_string.lower().split()  # lowercase words to scan
count = 0
for i in text:
    if i.endswith((",", ".")):
        # strip the trailing punctuation mark from the word
        text[count] = re.sub(r"^([a-z]+)[.,]?$", r"\1", i)
    count += 1
print("The count of Python:", text.count("python"))

User · Answer

abc = "askhnl#$%askdjalsdk"
ddd = abc.replace("#$%", "")
print(ddd)

and you will see the result:

askhnlaskdjalsdk

User · Answer

Shorter way:

import re
cleanString = re.sub('\W+', '', string)

If you want spaces between words and numbers, substitute '' with ' '.
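One thing to watch with `\W`: it does not match the underscore, because `_` counts as a word character. A small sketch of the difference (the sample string and variable names are made up):

```python
import re

sample = "snake_case name, v2.0!"

# \W+ removes spaces and punctuation but keeps the underscore.
with_w = re.sub(r"\W+", "", sample)

# Adding "_" to the class removes the underscore as well.
with_w_and_underscore = re.sub(r"[\W_]+", "", sample)

print(with_w)                 # snake_casenamev20
print(with_w_and_underscore)  # snakecasenamev20
```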

User · Answer

Removing punctuation, numbers, and special characters

Code:

combi['tidy_tweet'] = combi['tidy_tweet'].str.replace("[^a-zA-Z#]", " ", regex=True)

Thanks.

User · Answer

This can be done without regex:

>>> string = "Special $#! characters   spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

You can use str.isalnum:

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.

If you insist on using regex, other solutions will do fine. However, note that if it can be done without a regular expression, that's the best way to go about it.
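In Python 3, `str.isalnum()` is Unicode-aware, so it also keeps accented letters and characters like "½" or "²". If only ASCII letters and digits should survive, one option is to combine it with `str.isascii()` (available from Python 3.7). A small sketch with a made-up sample string:

```python
text = "Köln 2½ m²"

# Keeps every character Unicode considers alphanumeric.
unicode_alnum = "".join(ch for ch in text if ch.isalnum())

# Keeps only ASCII letters and digits (str.isascii needs Python 3.7+).
ascii_alnum = "".join(ch for ch in text if ch.isascii() and ch.isalnum())

print(unicode_alnum)
print(ascii_alnum)
```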

User · Answer

Use translate:

import string

def clean(instr):
    return instr.translate(None, string.punctuation + ' ')

Caveat: Only works on ASCII strings (Python 2 str; this two-argument translate call raises TypeError on Python 3).
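On Python 3 the same idea can be written with `str.maketrans`, whose third argument lists the characters to delete. A minimal sketch (the `_DELETE` table name is made up):

```python
import string

# Translation table that deletes ASCII punctuation and the space character.
_DELETE = str.maketrans("", "", string.punctuation + " ")


def clean(instr):
    return instr.translate(_DELETE)


print(clean("This, is. A test!"))  # ThisisAtest
```

This only removes ASCII punctuation; characters such as curly quotes are left alone unless you add them to the table.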

User · Answer

Python 2.*

I think just filter(str.isalnum, string) works:

In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'

Python 3.*

In Python 3, the filter() function returns an iterator instead of a string, unlike above. You have to join the items back together to get a string:

''.join(filter(str.isalnum, string))

or pass a list to join (not sure, but it may be a bit faster):

''.join([*filter(str.isalnum, string)])

Note: unpacking in [*args] is valid from Python >= 3.5.

User · Answer

The most generic approach is using the 'categories' of the unicodedata table, which classifies every single character. E.g. the following code filters only printable characters based on their category:

import unicodedata

# strip of crap characters (based on the Unicode database
# categorization:
# http://www.sql-und-xml.de/unicode-database/#kategorien

PRINTABLE = set(('Lu', 'Ll', 'Nd', 'Zs'))

def filter_non_printable(s):
    result = []
    for c in s:
        # replace anything outside the allowed categories with a marker
        c = c if unicodedata.category(c) in PRINTABLE else u'#'
        result.append(c)
    return u''.join(result).replace(u'#', u' ')

Look at the given URL above for all related categories. You can of course also filter by the punctuation categories.
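Applied to the original question, a variation on this idea is to keep only characters whose major category is a letter ("L") or a number ("N"). A rough sketch; the function name and sample string are made up:

```python
import unicodedata


def keep_letters_and_numbers(s):
    # The first letter of the category is the major class:
    # "L" = letters, "N" = numbers. Everything else is dropped.
    return "".join(ch for ch in s if unicodedata.category(ch)[0] in "LN")


print(keep_letters_and_numbers("Grüße, Köln! 100 %"))
```

Unlike an explicit `[^a-zA-Z0-9]` class, this keeps letters from any script without listing them.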

User · Answer

For other languages like German, Spanish, Danish, French etc. that contain special characters (like the German "Umlaute" ü, ä, ö), simply add these to the regex search string.

Example for German:

re.sub('[^A-ZÄÖÜa-zäöüß0-9]+', '', mystring)
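On Python 3 you may not need to list the extra letters at all: for str patterns, `\W` is Unicode-aware by default, so `[\W_]+` leaves letters like "ö" and "ß" alone. Passing `re.ASCII` switches back to ASCII-only matching. A small sketch with a made-up sample string:

```python
import re

mystring = "Größe: 42 Stück, über_all!"

# Default in Python 3: \W follows Unicode, so umlauts count as letters.
unicode_aware = re.sub(r"[\W_]+", "", mystring)

# With re.ASCII, anything outside [a-zA-Z0-9_] is treated as non-word.
ascii_only = re.sub(r"[\W_]+", "", mystring, flags=re.ASCII)

print(unicode_aware)
print(ascii_only)
```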
