[python] String formatting: % vs. .format vs. string literal

Python 2.6 introduced the str.format() method with a slightly different syntax from the existing % operator. Which is better and for what situations?

Python 3.6 has now introduced another string formatting format of string literals (aka "f" strings) via the syntax f"my string". Is this formatting option better than the others?

  1. The following uses each method and has the same outcome, so what is the difference?

     #!/usr/bin/python
     sub1 = "python string!"
     sub2 = "an arg"
    
     sub_a = "i am a %s" % sub1
     sub_b = "i am a {0}".format(sub1)
     sub_c = f"i am a {sub1}"
    
     arg_a = "with %(kwarg)s!" % {'kwarg':sub2}
     arg_b = "with {kwarg}!".format(kwarg=sub2)
     arg_c = f"with {sub2}!"
    
     print(sub_a)    # "i am a python string!"
     print(sub_b)    # "i am a python string!"
     print(sub_c)    # "i am a python string!"
    
     print(arg_a)    # "with an arg!"
     print(arg_b)    # "with an arg!"
     print(arg_c)    # "with an arg!"
    
  2. Furthermore when does string formatting occur in Python? For example, if my logging level is set to HIGH will I still take a hit for performing the following % operation? And if so, is there a way to avoid this?

     log.debug("some debug info: %s" % some_info)
    

This question is related to python performance logging string-formatting f-string

The answer is


But please be careful, just now I've discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.

Just look at this Python interactive session log:

Python 2.7.2 (default, Aug 27 2012, 19:52:55) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='?'
; u=u'?'
; s
'\xd0\xb9'
; u
u'\u0439'

s is just a string (called 'byte array' in Python3) and u is a Unicode string (called 'string' in Python3):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

When you give a Unicode object as a parameter to % operator it will produce a Unicode string even if the original string wasn't Unicode:

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

but the .format function will raise "UnicodeEncodeError":

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

and it will work with a Unicode argument fine only if the original string was Unicode.

; '{}'.format(u'i')
'i'

or if argument string can be converted to a string (so called 'byte array')


As of Python 3.6 (2016) you can use f-strings to substitute variables:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.

See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings


But one thing is that also if you have nested curly-braces, won't work for format but % will work.

Example:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>> 

% gives better performance than format from my test.

Test code:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

Result:

> format: 0.470329046249
> %: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

Result

> format: 0.5864730989560485
> %: 0.013593495357781649

It looks in Python2, the difference is small whereas in Python3, % is much faster than format.

Thanks @Chris Cogdon for the sample code.

Edit 1:

Tested again in Python 3.7.2 in July 2019.

Result:

> format: 0.86600608
> %: 0.630180146

There is not much difference. I guess Python is improving gradually.

Edit 2:

After someone mentioned python 3's f-string in comment, I did a test for the following code under python 3.7.2 :

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))

Result:

format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943

It seems f-string is still slower than % but better than format.


For python version >= 3.6 (see PEP 498)

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'

One situation where % may help is when you are formatting regex expressions. For example,

'{type_names} [a-z]{2}'.format(type_names='triangle|square')

raises IndexError. In this situation, you can use:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}

This avoids writing the regex as '{type_names} [a-z]{{2}}'. This can be useful when you have two regexes, where one is used alone without format, but the concatenation of both is formatted.


Python 3.6.7 comparative:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),) 
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()

Output:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----

Assuming you're using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself:

log.debug("some debug info: %s", some_info)

which avoids doing the formatting unless the logger actually logs something.


If your python >= 3.6, F-string formatted literal is your new friend.

It's more simple, clean, and better performance.

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

PEP 3101 proposes the replacement of the % operator with the new, advanced string formatting in Python 3, where it would be the default.


Yet another advantage of .format (which I don't see in the answers): it can take object properties.

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:         

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

Or, as a keyword argument:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

This is not possible with % as far as I can tell.


I would add that since version 3.6, we can use fstrings like the following

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

Which give

My name is john smith

Everything is converted to strings

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

Result:

mylist = ['foo', 'bar']

you can pass function, like in others formats method

print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

Giving for example

Hello, here is the date : 16/04/2018


As I discovered today, the old way of formatting strings via % doesn't support Decimal, Python's module for decimal fixed point and floating point arithmetic, out of the box.

Example (using Python 3.3.5):

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

Output:

0.00000000000000000000000312375239000000009907464850 0.00000000000000000000000312375239000000000000000000

There surely might be work-arounds but you still might consider using the format() method right away.


Something that the modulo operator ( % ) can't do, afaik:

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

result

12 22222 45 22222 103 22222 6 22222

Very useful.

Another point: format(), being a function, can be used as an argument in other functions:

li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)   

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8,  minutes=20)

gen =(once_upon_a_time +x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

Results in:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00

As a side note, you don't have to take a performance hit to use new style formatting with logging. You can pass any object to logging.debug, logging.info, etc. that implements the __str__ magic method. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this:

import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).

One of the advantages of using this technique, other than the fact that it's formatting-style agnostic, is that it allows for lazy values e.g. the function expensive_func above. This provides a more elegant alternative to the advice being given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to logging

How to redirect docker container logs to a single file? Console logging for react? Hide strange unwanted Xcode logs Where are logs located? Retrieve last 100 lines logs Spring Boot - How to log all requests and responses with exceptions in single place? How do I get logs from all pods of a Kubernetes replication controller? Where is the Docker daemon log? How to log SQL statements in Spring Boot? How to do logging in React Native?

Examples related to string-formatting

JavaScript Chart.js - Custom data formatting to display on tooltip Format in kotlin string templates Converting Float to Dollars and Cents Chart.js - Formatting Y axis String.Format not work in TypeScript Use StringFormat to add a string to a WPF XAML binding How to format number of decimal places in wpf using style/template? How to left align a fixed width string? Convert Java Date to UTC String Format a Go string without printing?

Examples related to f-string

Fixed digits after decimal with f-strings String formatting: % vs. .format vs. string literal