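I ran into this problem today as well. I'm on Windows with Chinese as the default system language. When no encoding is given, open() falls back to the locale's encoding (typically cp936/GBK there) rather than UTF-8, so reading a UTF-8 file can fail with a UnicodeDecodeError. Anyone on a non-English Windows locale may hit the same error. The fix is simply to pass encoding='utf-8' to open():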

    with open("test.html", "r", encoding="utf-8") as f:
        text = f.read()
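If you want to see why this happens on your own machine, here is a small sketch. It prints the encoding open() would use by default, writes a throwaway UTF-8 file (the file name and its content are just hypothetical examples), and reads it back with an explicit encoding. It then decodes the same bytes as cp936 to imitate what a Chinese-locale default does. The cp936 part is only a simulation, so it behaves the same on any OS.

```python
import locale
import tempfile
from pathlib import Path

# The encoding open() uses when no encoding= is passed; on a
# Chinese-locale Windows this is typically "cp936" (GBK).
print("default encoding:", locale.getpreferredencoding(False))

# Create a small UTF-8 file (hypothetical content) in a temporary directory.
path = Path(tempfile.mkdtemp()) / "test.html"
path.write_text("<p>你好, café</p>", encoding="utf-8")

# Explicit encoding: reads correctly whatever the system locale is.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(text)

# pathlib equivalent of the same read.
same = path.read_text(encoding="utf-8")

# Simulate a cp936 default: the same bytes decode to the wrong characters.
# errors="replace" keeps this from raising; without it, some inputs raise
# UnicodeDecodeError instead, which is the error from the question.
as_cp936 = path.read_bytes().decode("cp936", errors="replace")
print("decoded as cp936:", as_cp936)
```

If you can't edit every open() call, Python 3.7+ also has a UTF-8 Mode (`python -X utf8` or the environment variable `PYTHONUTF8=1`) that makes UTF-8 the default. Passing encoding explicitly is still the more portable choice.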