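You could try setting up a filter using the unicodedata.category() function, which returns a character's two-letter Unicode general category (for example 'Lu' for an uppercase letter):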
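
    import unicodedata

    # Keep only these general categories: Lu (uppercase) and Ll (lowercase letters).
    # Add e.g. 'Nd' (decimal digits) or 'Zs' (space separators) as needed.
    printable = {'Lu', 'Ll'}
    def filter_non_printable(text):
        # Drop every character whose category is not in `printable`.
        return ''.join(c for c in text if unicodedata.category(c) in printable)
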
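To see which category codes the `printable` set is compared against, you can print `unicodedata.category()` for a few sample characters. The sample characters are arbitrary choices for illustration:

```python
import unicodedata

# Print the two-letter general category of a few sample characters.
for ch in ['A', 'z', '7', ' ', '\n', '\u00e9']:
    print(repr(ch), unicodedata.category(ch))
```

Digits ('Nd'), spaces ('Zs') and control characters such as '\n' ('Cc') all fall outside {'Lu', 'Ll'}, so the filter above removes them along with everything else that is not a letter.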
See Table 4-9 on page 175 of the Unicode character database's character properties for the full list of available categories.
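If you want to keep digits, punctuation and spaces as well, one possible variant is to drop only the "Other" major class, i.e. every category whose code starts with 'C'. This treats "non-printable" as "control, format, surrogate, private-use or unassigned", which is an assumption rather than the only definition; the function name drop_other_categories is hypothetical:

```python
import unicodedata

def drop_other_categories(text: str) -> str:
    """Remove characters whose general category starts with 'C'.

    That major class covers Cc (control), Cf (format), Cs (surrogate),
    Co (private use) and Cn (unassigned); letters, digits, punctuation,
    symbols and separators are kept.
    """
    return ''.join(c for c in text if not unicodedata.category(c).startswith('C'))


# '\t' and '\x07' are Cc, '\u200b' (ZERO WIDTH SPACE) is Cf.
print(drop_other_categories('Tab\there, zero\u200bwidth, bell\x07!'))
# → Tabhere, zerowidth, bell!
```

Python 3's built-in str.isprintable() uses a similar rule, but it additionally treats separators other than the ASCII space (for example line and paragraph separators) as non-printable.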