[python] Easy way to convert a unicode list to a list containing python strings?

Template of the list is:

EmployeeList =  [u'<EmpId>', u'<Name>', u'<Doj>', u'<Salary>']

I would like to convert from this

EmployeeList =  [u'1001', u'Karick', u'14-12-2020', u'1$']

to this:

EmployeeList =  ['1001', 'Karick', '14-12-2020', '1$']

After conversion, I am actually checking if "1001" exists in EmployeeList.values().

This question is related to python python-2.7 unicode encoding

The answer is


Just json.dumps will fix the problem

json.dumps function actually converts all the unicode literals to string literals and it will be easy for us to load the data either in json file or csv file.

sample code:

import json
EmployeeList =  [u'1001', u'Karick', u'14-12-2020', u'1$']
result_list = json.dumps(EmployeeList)
print result_list

output: ["1001", "Karick", "14-12-2020", "1$"]


You can do this by using json and ast modules as follows

>>> import json, ast
>>>
>>> EmployeeList =  [u'1001', u'Karick', u'14-12-2020', u'1$']
>>>
>>> result_list = ast.literal_eval(json.dumps(EmployeeList))
>>> result_list
['1001', 'Karick', '14-12-2020', '1$']

[str(x) for x in EmployeeList] would do a conversion, but it would fail if the unicode string characters do not lie in the ascii range.

>>> EmployeeList = [u'1001', u'Karick', u'14-12-2020', u'1$']
>>> [str(x) for x in EmployeeList]
['1001', 'Karick', '14-12-2020', '1$']


>>> EmployeeList = [u'1001', u'????', u'14-12-2020', u'1$']
>>> [str(x) for x in EmployeeList]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Just use

unicode_to_list = list(EmployeeList)

We can use map function

print map(str, EmployeeList)

There are several ways to do this. I converted like this

def clean(s):
    s = s.replace("u'","")
    return re.sub("[\[\]\'\s]", '', s)

EmployeeList = [clean(i) for i in str(EmployeeList).split(',')]

After that you can check

if '1001' in EmployeeList:
    #do something

Hope it will help you.


how about:

def fix_unicode(data):
    if isinstance(data, unicode):
        return data.encode('utf-8')
    elif isinstance(data, dict):
        data = dict((fix_unicode(k), fix_unicode(data[k])) for k in data)
    elif isinstance(data, list):
        for i in xrange(0, len(data)):
            data[i] = fix_unicode(data[i])
    return data

Encode each value in the list to a string:

[x.encode('UTF8') for x in EmployeeList]

You need to pick a valid encoding; don't use str() as that'll use the system default (for Python 2 that's ASCII) which will not encode all possible codepoints in a Unicode value.

UTF-8 is capable of encoding all of the Unicode standard, but any codepoint outside the ASCII range will lead to multiple bytes per character.

However, if all you want to do is test for a specific string, test for a unicode string and Python won't have to auto-encode all values when testing for that:

u'1001' in EmployeeList.values()

Just simply use this code

EmployeeList = eval(EmployeeList)
EmployeeList = [str(x) for x in EmployeeList]

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space