What is the most efficient string concatenation method in Python?

Question

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found the following methods here:

Simple concatenation using +
Using string list and join method
Using MutableString from the UserString module
Using character array and the array module
Using StringIO from the cStringIO module

But what do you experts use or suggest, and why?

[A related question here]

User · Accepted Answer

You may be interested in this: An optimization anecdote by Guido. Although it is worth remembering also that this is an old article and it predates the existence of things like ''.join (although I guess string.joinfields is more-or-less the same).

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic and thus easier for other Python programmers to understand.

Finally, the golden rule of optimization: don't optimize unless you know you need to, and measure rather than guessing.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.
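As a minimal sketch of the "measure rather than guess" advice, the snippet below times three ways of building one string from many pieces with timeit. It assumes Python 3. The function names (build_plus, build_join, build_stringio, benchmark), the piece count, and the repeat settings are illustrative choices, not part of the original answer. io.StringIO stands in here as the Python 3 counterpart of cStringIO.

```python
import io
import timeit

N_PIECES = 10_000  # hypothetical workload size; adjust to match your own data


def build_plus(n=N_PIECES):
    """Grow a string with += inside a loop."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_join(n=N_PIECES):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)


def build_stringio(n=N_PIECES):
    """Write the pieces into an in-memory text buffer."""
    buf = io.StringIO()
    for i in range(n):
        buf.write(str(i))
    return buf.getvalue()


def benchmark(funcs, number=20, repeat=5):
    """Return {function name: best total seconds for `number` calls}."""
    return {
        f.__name__: min(timeit.repeat(f, number=number, repeat=repeat))
        for f in funcs
    }


if __name__ == "__main__":
    results = benchmark([build_plus, build_join, build_stringio])
    # Print the fastest candidate first.
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:15s} {seconds:.4f} s")
```

The ranking depends on the interpreter version, the number of pieces, and their sizes, so it is worth rerunning with values that resemble the real workload.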

User · Answer

''.join(sequenceofstrings) is what usually works best: simplest and fastest.
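For readers unfamiliar with the idiom, here is a small illustration, assuming Python 3. The variable names and sample values are made up for the example.

```python
words = ["spam", "eggs", "ham"]

# Join with an empty separator to concatenate the pieces directly.
combined = "".join(words)

# join() only accepts strings, so other items must be converted first,
# for example with a generator expression.
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)

print(combined)   # spameggsham
print(csv_line)   # 1,2,3
```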

User · Answer

For a small set of short strings (i.e. 2 or 3 strings of no more than a few characters), plus is still way faster. Using mkoistinen's wonderful script in Python 2 and 3:

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

So when your code is doing a huge number of separate small concatenations, plus is the preferred way if speed is crucial.

User · Answer

It depends on what you're doing.

After Python 2.5, string concatenation with the + operator is pretty fast. If you're just concatenating a couple of values, using the + operator works best:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

However, if you're putting together a string in a loop, you're better off using the list joining method:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...     joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...     str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

...but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.

User · Answer

I ran into a situation where I needed to have an appendable string of unknown size. These are the benchmark results (Python 2.7.3):

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

This seems to show that '+=' is the fastest. The results from the skymind link are a bit out of date.

(I realize that the second example is not complete: the final list would still need to be joined. It does show, however, that simply preparing the list takes longer than the string concatenation.)

User · Answer

Inspired by @JasonBaker's benchmarks, here's a simple one comparing 10 "abcdefghijklmnopqrstuvxyz" strings, showing that .join() is faster, even with this tiny increase in variables:

Catenation

>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

Join

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

User · Answer

One year later, let's test mkoistinen's answer with Python 3.4.3:

plus 0.963564149000 (95.83% as fast)
join 0.923408469000 (100.00% as fast)
form 1.501130934000 (61.51% as fast)
intp 1.019677452000 (90.56% as fast)

Nothing changed. Join is still the fastest method. With intp being arguably the best choice in terms of readability, you might want to use intp nevertheless.

User · Answer

Python 3.6 changed the game for string concatenation of known components with Literal String Interpolation.

Given the test case from mkoistinen's answer, having strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders are

f'http://{domain}/{lang}/{path}' - 0.151 µs
'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs
'http://' + domain + '/' + lang + '/' + path - 0.356 µs
''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus currently the shortest and the most beautiful code possible is also fastest.

In alpha versions of Python 3.6 the implementation of f'' strings was the slowest possible. The generated byte code was pretty much equivalent to the ''.join() case, with unnecessary calls to str.__format__ which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.

The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.
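To rerun this comparison on your own interpreter, a sketch along the following lines may help. It assumes Python 3.6 or later. The wrapper function names and the iteration counts are illustrative. Because each expression is wrapped in a function, the figures include call overhead and will not match the numbers quoted above; only the relative order is meaningful.

```python
import timeit

# Example values mirroring the URL-building test case.
domain = "some_really_long_example.com"
lang = "en"
path = "some/really/long/path/"


def with_fstring():
    return f"http://{domain}/{lang}/{path}"


def with_percent():
    return "http://%s/%s/%s" % (domain, lang, path)


def with_plus():
    return "http://" + domain + "/" + lang + "/" + path


def with_join():
    return "".join(("http://", domain, "/", lang, "/", path))


CONTENDERS = [with_fstring, with_percent, with_plus, with_join]

if __name__ == "__main__":
    # All four must build the same URL, otherwise the timings are not comparable.
    assert len({f() for f in CONTENDERS}) == 1
    number = 200_000
    for f in CONTENDERS:
        best = min(timeit.repeat(f, number=number, repeat=3))
        # Convert total seconds into microseconds per call.
        print(f"{f.__name__:13s} {best / number * 1e6:.3f} µs per call")
```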

User · Answer

As per John Fouhy's answer, don't optimize unless you have to, but if you're here and asking this question, it may be precisely because you have to. In my case, I needed to assemble some URLs from string variables... fast. I noticed no one (so far) seems to be considering the string format method, so I thought I'd try that and, mostly for mild interest, I thought I'd toss the string interpolation operator in there for good measure. To be honest, I didn't think either of these would stack up to a direct '+' operation or a ''.join(). But guess what? On my Python 2.7.5 system, the string interpolation operator rules them all and string.format() is the worst performer:

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    """Using + operator"""
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    """Using ''.join()"""
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    """Using string.format"""
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    """Using string interpolation"""
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

The results:

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

If I use a shorter domain and shorter path, interpolation still wins out. The difference is more pronounced, though, with longer strings.

Now that I had a nice test script, I also tested under Python 2.6, 3.3, 3.4 and 3.5; here are the results. In Python 2.6, the plus operator is the fastest! On Python 3, join wins out. Note: these tests are very repeatable on my system. So, 'plus' is always faster on 2.6, 'intp' is always faster on 2.7 and 'join' is always faster on Python 3.x.

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

Lesson learned:

Sometimes, my assumptions are dead wrong.
Test against the system env. you'll be running in production.
String interpolation isn't dead yet!

tl;dr:

If you're using 2.6, use the + operator.
If you're using 2.7, use the '%' operator.
If you're using 3.x, use ''.join().

User · Answer

It pretty much depends on the relative sizes of the new string after every new concatenation. With the + operator, a new string is made for every concatenation. If the intermediary strings are relatively long, + becomes increasingly slower, because each step has to build and store a new, longer intermediary string.

Consider this case:

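# Python 2 script (print statement, xrange)
from time import time

stri = ''
a = 'aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l = []

# Case 1: repeated + with a large piece added on every iteration
t = time()
for i in range(1000):
    stri = stri + a + repr(i)
print time() - t

# Case 2: append large pieces to a list, then join once
t = time()
for i in xrange(1000):
    l.append(a + repr(i))
z = ''.join(l)
print time() - t

# Case 3: repeated + with a small piece (stri still holds the result of case 1)
t = time()
for i in range(1000):
    stri = stri + repr(i)
print time() - t

# Case 4: append small pieces to a list (l still holds the items from case 2), then join once
t = time()
for i in xrange(1000):
    l.append(repr(i))
z = ''.join(l)
print time() - t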

Results

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

In cases 1 and 2, we add a large string, and join() performs about 10 times faster. In cases 3 and 4, we add a small string, and '+' performs slightly faster.
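One caveat about the script in this answer: stri and l are never reset between cases, so case 3 starts from the long string left by case 1, and case 4 joins a list that already holds the items from case 2. The following is a hedged Python 3 sketch of the same comparison that starts each run from fresh state and uses timeit instead of a single time() difference. The function names and the number/repeat settings are illustrative assumptions, not the author's code.

```python
import timeit

LARGE = "aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg"


def concat_plus(prefix, n=1000):
    """Build the result with repeated + starting from an empty string."""
    result = ""
    for i in range(n):
        result = result + prefix + repr(i)
    return result


def concat_join(prefix, n=1000):
    """Build the result by appending to a fresh list, then joining once."""
    parts = []
    for i in range(n):
        parts.append(prefix + repr(i))
    return "".join(parts)


def best_time(func, prefix, number=20, repeat=3):
    """Best-of-`repeat` total seconds for `number` calls of func(prefix)."""
    return min(timeit.repeat(lambda: func(prefix), number=number, repeat=repeat))


if __name__ == "__main__":
    for label, prefix in (("large pieces", LARGE), ("small pieces", "")):
        # Both strategies must agree before their timings are compared.
        assert concat_plus(prefix) == concat_join(prefix)
        print(label)
        print(f"  +    : {best_time(concat_plus, prefix):.4f} s")
        print(f"  join : {best_time(concat_join, prefix):.4f} s")
```

Whether the ratios reported above still hold depends on the interpreter and the piece sizes, so treat the output as a measurement for your own setup rather than a confirmation of the original numbers.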

User · Answer

For Python 3.8.6/3.9, I had to do some dirty hacks, because perfplot was giving out some errors. Here assume that x[0] is a and x[1] is b.

The plot is nearly the same for large data. The small-data plot was taken with perfplot, and this is the code (large data == range(8), small data == range(4)):

import perfplot

from random import choice
from string import ascii_lowercase as letters

def generate_random(x):
    data = ''.join(choice(letters) for i in range(x))
    sata = ''.join(choice(letters) for i in range(x))
    return [data, sata]

def fstring_func(x):
    return [ord(i) for i in f'{x[0]}{x[1]}']

def format_func(x):
    return [ord(i) for i in "{}{}".format(x[0], x[1])]

def replace_func(x):
    # The two placeholder characters were lost in the source; '|' and '~' are assumed here.
    return [ord(i) for i in "|~".replace('|', x[0]).replace('~', x[1])]

def join_func(x):
    return [ord(i) for i in "".join([x[0], x[1]])]

perfplot.show(
    setup=lambda n: generate_random(n),
    kernels=[
        fstring_func,
        format_func,
        replace_func,
        join_func,
    ],
    n_range=[int(k ** 2.5) for k in range(4)],
)

When medium data is there, and there are 4 strings x[0], x[1], x[2], x[3] instead of 2 strings:

def generate_random(x):
    a = ''.join(choice(letters) for i in range(x))
    b = ''.join(choice(letters) for i in range(x))
    c = ''.join(choice(letters) for i in range(x))
    d = ''.join(choice(letters) for i in range(x))
    return [a, b, c, d]

Better to stick with f-strings.

User · Answer

Probably "new f-strings in Python 3.6" is the most efficient way of concatenating strings.

Using %s

>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

Using .format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

Using f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

Source: https://realpython.com/python-f-strings/
