I have a phone number(string), e.g. "+123-456-7890", that I want to turn into a list that looks like: [+, 1, 2, 3, -, ...., 0].
Why? So I can go iterate through the list and remove all the symbols, so I'm left with a list of only digits, which I can then convert back to a string.
What's the best way to solve this problem? None of the solutions I've come across are applicable, because I don't have any special characters in-between the digits (so I can't split the string there.)
Any ideas? I really appreciate it!
Edit - this is what I've tried:
x = row.translate(None, string.digits)
list = x.split()
Also:
filter(lambda x: x isdigit())
Instead of converting to a list, you could just iterate over the first string and create a second string by adding each of the digit characters you find to that new string.
''.join(filter(str.isdigit, "+123-456-7890"))
Did you try list(x)??
y = '+123-456-7890'
c =list(y)
c
['+', '1', '2', '3', '-', '4', '5', '6', '-', '7', '8', '9', '0']
You can use the re module:
import re
re.sub(r'\D', '', '+123-456-7890')
This will replace all non-digits with ''.
Make a list(your_string).
>>> s = "mep"
>>> list(s)
['m', 'e', 'p']
I know this question has been answered, but just to point out what timeit
has to say about the solutions efficiency. Using these parameters:
size = 30
s = [str(random.randint(0, 9)) for i in range(size)] + (size/3) * ['-']
random.shuffle(s)
s = ''.join(['+'] + s)
timec = 1000
That is the "phone number" has 30 digits, 1 plus sing and 10 '-'. I've tested these approaches:
def justdigits(s):
justdigitsres = ""
for char in s:
if char.isdigit():
justdigitsres += str(char)
return justdigitsres
re_compiled = re.compile(r'\D')
print('Filter: %ss' % timeit.Timer(lambda : ''.join(filter(str.isdigit, s))).timeit(timec))
print('GE: %ss' % timeit.Timer(lambda : ''.join(n for n in s if n.isdigit())).timeit(timec))
print('LC: %ss' % timeit.Timer(lambda : ''.join([n for n in s if n.isdigit()])).timeit(timec))
print('For loop: %ss' % timeit.Timer(lambda : justdigits(s)).timeit(timec))
print('RE: %ss' % timeit.Timer(lambda : re.sub(r'\D', '', s)).timeit(timec))
print('REC: %ss' % timeit.Timer(lambda : re_compiled.sub('', s)).timeit(timec))
print('Translate: %ss' % timeit.Timer(lambda : s.translate(None, '+-')).timeit(timec))
And came out with these results:
Filter: 0.0145790576935s
GE: 0.0185861587524s
LC: 0.0151798725128s
For loop: 0.0242128372192s
RE: 0.0120108127594s
REC: 0.00868797302246s
Translate: 0.00118899345398s
Apparently GEs and LCs are still slower than a regex or a compiled regex. And apparently my CPython 2.6.6 didn't optimize the string addition that much. translate
appears to be the fastest (which is expected as the problem is stated as "ignore these two symbols", rather than "get these numbers" and I believe is quite low-level).
And for size = 100
:
Filter: 0.0357120037079s
GE: 0.0465779304504s
LC: 0.0428011417389s
For loop: 0.0733139514923s
RE: 0.0213229656219s
REC: 0.0103371143341s
Translate: 0.000978946685791s
And for size = 1000
:
Filter: 0.212141036987s
GE: 0.198996067047s
LC: 0.196880102158s
For loop: 0.365696907043s
RE: 0.0880808830261s
REC: 0.086804151535s
Translate: 0.00587010383606s
A python string is a list of characters. You can iterate over it right now!
justdigits = ""
for char in string:
if char.isdigit():
justdigits += str(char)
You can use str.translate
, you just have to give it the right arguments:
>>> dels=''.join(chr(x) for x in range(256) if not chr(x).isdigit())
>>> '+1-617-555-1212'.translate(None, dels)
'16175551212'
N.b.: This won't work with unicode strings in Python2, or at all in Python3. For those environments, you can create a custom class to pass to unicode.translate
:
>>> class C:
... def __getitem__(self, i):
... if unichr(i).isdigit():
... return i
...
>>> u'+1-617.555/1212'.translate(C())
u'16175551212'
This works with non-ASCII digits, too:
>>> print u'+\u00b9-\uff1617.555/1212'.translate(C()).encode('utf-8')
ยน6175551212
Source: Stackoverflow.com