[python] replace special characters in a string python

I am using urllib to get a string of html from a website and need to put each word in the html document into a list.

Here is the code I have so far. I keep getting an error. I have also copied the error below.

import urllib.request

url = input("Please enter a URL: ")

z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

words = removeSpecialChars.split()

print ("Words list: ", words[0:20])

Here is the error.

Please enter a URL: http://simleyfootball.com
Traceback (most recent call last):
  File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)

This question is related to python string list replace urllib

The answer is


You need to call replace on z rather than on str, because the characters you want to replace are in the string variable z:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

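However, this will not work as intended, because replace looks for its whole first argument as a single substring. You will most likely need the regular expression module re and its sub function:

import re
removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)

Don't forget the enclosing [], which marks a set of characters to be replaced. Inside the set, a literal ], \ or - must be escaped with a backslash; otherwise the ] closes the set early and the - is read as a range.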
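
Putting these pieces together, a minimal sketch might look like the following. It assumes the page is UTF-8 encoded; html_to_words and the sample bytes are hypothetical names used only for illustration. Note that z.read() returns bytes, and calling str() on bytes yields the b'...' repr rather than the text, so the sketch decodes instead.

```python
import re

# Characters from the question, written as a regex character set.
# ], \ and - are escaped so they are treated as literal set members.
SPECIAL_CHARS = re.compile(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]")


def html_to_words(raw_bytes, encoding="utf-8"):
    """Decode raw page bytes, blank out special characters, split into words."""
    text = raw_bytes.decode(encoding, errors="replace")
    return SPECIAL_CHARS.sub(" ", text).split()


# Hypothetical sample standing in for urllib.request.urlopen(url).read()
sample = b"<html><body><h1>Go, team!</h1><p>Home_game: 7-0</p></body></html>"
words = html_to_words(sample)
print("Words list: ", words[0:20])
```

Tag names such as html and body still end up in the list, because this approach only strips punctuation and does not parse the markup.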


replace operates on a specific string, so you need to call it like this:

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this is probably not what you need, since it will look for a single substring containing all of those characters in the same order. You can do it with a regexp, as Danny Michaud pointed out.
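
The difference can be seen in a small sketch. The sample string is made up for illustration, and the loop is only one possible workaround, not necessarily the best one:

```python
chars = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"
z = "Go, team! (7-0)"

# replace() searches for the whole argument as one substring,
# which does not occur in z, so nothing changes.
unchanged = z.replace(chars, " ")

# Replacing one character at a time does work,
# at the cost of one pass over the string per character.
cleaned = z
for ch in chars:
    cleaned = cleaned.replace(ch, " ")

print(unchanged == z)
print(cleaned.split())
```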

As a side note, you might want to look at BeautifulSoup, which is a library for parsing messy HTML like what you usually get from scraping websites.
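
BeautifulSoup is a third-party package, so as a rough stand-in the sketch below uses the standard library's html.parser to show the general idea of extracting text instead of stripping punctuation. This is not BeautifulSoup's API; TextCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect only the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags.
        self.chunks.append(data)


sample_html = "<html><body><h1>Go team</h1><p>Next game: Friday</p></body></html>"
collector = TextCollector()
collector.feed(sample_html)
collector.close()
page_words = " ".join(collector.chunks).split()
print(page_words)
```

Punctuation inside the text is left untouched here, so it can still be removed afterwards with one of the approaches above. The contents of script and style elements would also be collected by this simple version.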


You can replace the special characters with the desired characters as follows. In Python 3, maketrans is a static method of str rather than a function in the string module:

specialCharacterText = "H#y #@w @re &*)?"
inCharSet = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
# one replacement character for each character in inCharSet
outCharSet = " " * len(inCharSet)
splCharReplaceList = str.maketrans(inCharSet, outCharSet)
splCharFreeString = specialCharacterText.translate(splCharReplaceList)
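
If the characters should be dropped rather than replaced by spaces, the three-argument form of str.maketrans in Python 3 can be used. The following is a small sketch that assumes string.punctuation is an acceptable set of characters to delete:

```python
import string

text = "H#y #@w @re &*)?"

# The third argument lists characters that translate() should delete.
delete_table = str.maketrans("", "", string.punctuation)
stripped = text.translate(delete_table)
print(repr(stripped))
```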

One way is to use re.sub, which is my preferred way.

import re
my_str = "hey th~!ere"
# keep letters, digits, spaces, newlines and dots; drop everything else
my_new_string = re.sub(r'[^a-zA-Z0-9 \n\.]', '', my_str)
print(my_new_string)

Output:

hey there

Another way is to use re.escape:

import string
import re

my_str = "hey th~!ere"

chars = re.escape(string.punctuation)
print(re.sub(r'[' + chars + ']', '', my_str))

Output:

hey there

A small style tip: according to PEP 8, variable names in Python should use snake_case, so remove_special_chars is preferred over removeSpecialChars.

Also, if you want to remove the spaces as well, change [^a-zA-Z0-9 \n\.] to [^a-zA-Z0-9\n\.]. The pattern is a negated set, so only the characters listed inside it are kept.
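
The effect of the space inside the character class can be checked directly. This is a quick sketch that reuses the sample string from above:

```python
import re

my_str = "hey th~!ere"

# Space is listed inside the negated set, so spaces survive.
keeps_spaces = re.sub(r"[^a-zA-Z0-9 \n\.]", "", my_str)

# Space is not listed inside the negated set, so spaces are removed too.
drops_spaces = re.sub(r"[^a-zA-Z0-9\n\.]", "", my_str)

print(keeps_spaces)
print(drops_spaces)
```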

