Any reason not to use '+' to concatenate two strings?

Question

A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time. (Recent versions of CPython can apparently optimize this in some cases, but other implementations can't, so programmers are discouraged from relying on this.) ''.join is the right way to do this.

However, I've heard it said (including here on Stack Overflow) that you should never, ever use + for string concatenation, but instead always use ''.join or a format string. I don't understand why this is the case if you're only concatenating two strings. If my understanding is correct, it shouldn't take quadratic time, and I think a + b is cleaner and more readable than either ''.join((a, b)) or '%s%s' % (a, b).

Is it good practice to use + to concatenate two strings? Or is there a problem I'm not aware of?

User · Answer

According to the Python docs, using str.join() gives you consistent performance across the various implementations of Python. Although CPython optimizes away the quadratic behavior of s = s + t, other Python implementations may not.

"CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations."

Source: Sequence Types in the Python docs (see footnote [6]).
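As a minimal sketch of the two approaches the docs contrast (the function names concat_with_plus and concat_with_join are hypothetical, chosen only for illustration), both build the same string, but only the second is documented as linear on every implementation:

```python
def concat_with_plus(parts):
    """Build a string by repeated +=; may be quadratic outside CPython."""
    result = ""
    for part in parts:
        # Without the in-place optimization, each += copies the
        # whole accumulated string into a new object.
        result += part
    return result


def concat_with_join(parts):
    """Build the same string in a single pass with str.join."""
    return "".join(parts)


words = ["spam", "eggs", "ham"]
print(concat_with_plus(words))
print(concat_with_join(words))
```

For a handful of short strings the difference is unlikely to matter; the join form mainly pays off when the number of pieces grows.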

User · Answer

The assumption that one should never, ever use + for string concatenation, but instead always use ''.join, may be a myth. It is true that using + creates unnecessary temporary copies of immutable string objects, but the other, less often quoted fact is that calling join in a loop generally adds the overhead of a function call. Let's take your example.

Create two lists, one from the linked SO question and another, bigger, fabricated one:

    >>> myl1 = ['A', 'B', 'C', 'D', 'E', 'F']
    >>> myl2 = [chr(random.randint(65, 90)) for i in range(0, 10000)]

Let's create two functions, UsePlus and UseJoin, to use the respective + and join functionality:

    >>> def UsePlus():
            return [myl[i] + myl[i + 1] for i in range(0, len(myl), 2)]

    >>> def UseJoin():
            [''.join((myl[i], myl[i + 1])) for i in range(0, len(myl), 2)]

Let's run timeit with the first list:

    >>> myl = myl1
    >>> t1 = timeit.Timer("UsePlus()", "from __main__ import UsePlus")
    >>> t2 = timeit.Timer("UseJoin()", "from __main__ import UseJoin")
    >>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000) / 100000)
    2.48 usec/pass
    >>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000) / 100000)
    2.61 usec/pass
    >>>

They have almost the same runtime.

Let's use cProfile with the second list:

    >>> myl = myl2
    >>> cProfile.run("UsePlus()")
             5 function calls in 0.001 CPU seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
            1    0.000    0.000    0.001    0.001 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 {len}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
            1    0.000    0.000    0.000    0.000 {range}

    >>> cProfile.run("UseJoin()")
             5005 function calls in 0.029 CPU seconds

       Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
            1    0.000    0.000    0.029    0.029 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 {len}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
         5000    0.014    0.000    0.014    0.000 {method 'join' of 'str' objects}
            1    0.000    0.000    0.000    0.000 {range}

So it looks like using join results in extra function calls, which can add to the overhead.

Now, coming back to the question: should one discourage the use of + over join in all cases? I believe not; the following should be taken into consideration:

1. Length of the strings in question
2. Number of concatenation operations

And of course, in development, premature optimization is evil.
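The session above uses Python 2 syntax (the print statement). A rough Python 3 sketch of the same two-string comparison, assuming the sample values "hello" and "world" and an arbitrary repeat count, could look like this; the absolute numbers will differ by machine and interpreter, so no timings are claimed here:

```python
import timeit

a, b = "hello", "world"
names = {"a": a, "b": b}

# Time 100,000 executions of each expression; timeit returns total seconds.
plus_time = timeit.timeit("a + b", globals=names, number=100_000)
join_time = timeit.timeit("''.join((a, b))", globals=names, number=100_000)

print(f"+    : {plus_time:.4f} s")
print(f"join : {join_time:.4f} s")
```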

User · Answer

When working with multiple people, it's sometimes difficult to know exactly what's happening. Using a format string instead of concatenation can avoid one particular annoyance that has happened to us a whole ton of times.

Say a function requires an argument, and you write it expecting to get a string:

    In [1]: def foo(zeta):
       ...:     print 'bar: ' + zeta

    In [2]: foo('bang')
    bar: bang

This function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up to speed on the internals, and may not know that the function expects a string. And so they may end up with this:

    In [3]: foo(23)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)

    /home/izkata/<ipython console> in <module>()

    /home/izkata/<ipython console> in foo(zeta)

    TypeError: cannot concatenate 'str' and 'int' objects

There would be no problem if you just used a format string:

    In [1]: def foo(zeta):
       ...:     print 'bar: %s' % zeta

    In [2]: foo('bang')
    bar: bang

    In [3]: foo(23)
    bar: 23

The same is true for all types of objects that define __str__, which may be passed in as well:

    In [1]: from datetime import date

    In [2]: zeta = date(2012, 4, 15)

    In [3]: print 'bar: ' + zeta
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)

    /home/izkata/<ipython console> in <module>()

    TypeError: cannot concatenate 'str' and 'datetime.date' objects

    In [4]: print 'bar: %s' % zeta
    bar: 2012-04-15

So yes: if you can use a format string, do it, and take advantage of what Python has to offer.
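The same behaviour can be reproduced in Python 3. This is only a sketch: label_with_plus and label_with_format are hypothetical stand-ins for foo above, and they return the string instead of printing it so the result is easy to inspect.

```python
from datetime import date


def label_with_plus(zeta):
    # + only works when both operands are str.
    return "bar: " + zeta


def label_with_format(zeta):
    # %s calls str() on the argument; the 1-tuple guards against
    # zeta itself being a tuple.
    return "bar: %s" % (zeta,)


for value in ("bang", 23, date(2012, 4, 15)):
    print(label_with_format(value))
    try:
        print(label_with_plus(value))
    except TypeError as exc:
        print("TypeError:", exc)
```

Whether silently stringifying an unexpected argument is desirable depends on the function; in some code the TypeError is the more useful signal.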

User · Answer

I have done a quick test:

    import sys

    str = e = "a xxxxxxxxxx very xxxxxxxxxx long xxxxxxxxxx string xxxxxxxxxx\n"

    for i in range(int(sys.argv[1])):
        str = str + e

and timed it:

    mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  8000000
    8000000 times

    real    0m2.165s
    user    0m1.620s
    sys     0m0.540s
    mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  16000000
    16000000 times

    real    0m4.360s
    user    0m3.480s
    sys     0m0.870s

There is apparently an optimisation for the a = a + b case. It does not exhibit O(n^2) time as one might suspect: doubling the iteration count roughly doubles the runtime.

So at least in terms of performance, using + is fine.

User · Answer

There is nothing wrong with concatenating two strings with +. Indeed, it's easier to read than ''.join([a, b]).

You are right, though, that concatenating more than 2 strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However, this has nothing to do with using a loop. Even a + b + c + ... is O(n^2), the reason being that each concatenation produces a new string.

CPython 2.4 and above try to mitigate that, but it's still advisable to use join when concatenating more than 2 strings.
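To make the O(n^2) versus O(n) claim concrete, here is a small sketch of a simplified cost model: it counts how many characters get written when each + builds a brand-new string, and deliberately ignores CPython's in-place optimization. The function names are hypothetical.

```python
def chars_copied_by_plus(parts):
    """Characters written when evaluating parts[0] + parts[1] + ... left to right."""
    if not parts:
        return 0
    total = 0
    length = len(parts[0])
    for part in parts[1:]:
        length += len(part)  # size of the new intermediate string
        total += length      # every character of it is written afresh
    return total


def chars_copied_by_join(parts):
    """Characters written by a single join: each one is copied once."""
    return sum(len(part) for part in parts)


parts = ["ab"] * 4
print(chars_copied_by_plus(parts))  # 4 + 6 + 8 = 18
print(chars_copied_by_join(parts))  # 8
```

Under this model, n pieces of equal length cost on the order of n^2 character writes with chained +, but only n with join; for exactly two strings both approaches write each character once.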

User · Answer

The plus operator is a perfectly fine solution for concatenating two Python strings. But if you keep adding more than two strings (n > 25), you might want to think of something else.

The ''.join([a, b, c]) trick is a performance optimization.

User · Answer

''.join([a, b]) is a better solution than +, because code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

The in-place optimization for the form a += b or a = a + b is fragile even in CPython and isn't present at all in implementations that don't use refcounting. (Reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource.)

https://www.python.org/dev/peps/pep-0008/#programming-recommendations

User · Answer

I use the following with Python 3.8:

    string4 = f'{string1}{string2}{string3}'
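A self-contained version of that line, with made-up sample values for the three variables (f-strings are available from Python 3.6 onward):

```python
string1, string2, string3 = "foo", "bar", "baz"

# Each {name} is replaced by the formatted value of that variable.
string4 = f"{string1}{string2}{string3}"
print(string4)  # foobarbaz
```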
