String concatenation of two pandas columns

Question

I have the following DataFrame:

    from pandas import *
    df = DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

It looks like this:

        bar foo
    0    1   a
    1    2   b
    2    3   c

Now I want to have something like:

         bar
    0    1 is a
    1    2 is b
    2    3 is c

How can I achieve this? I tried the following:

    df['foo'] = '%s is %s' % (df['bar'], df['foo'])

but it gives me a wrong result:

    >>> print df.ix[0]

    bar                                                    a
    foo    0    a
    1    b
    2    c
    Name: bar is 0    1
    1    2
    2
    Name: 0

Sorry for a dumb question, but this one, "pandas: combine two columns in a DataFrame", wasn't helpful for me.

User · Answer

This question has already been answered, but I believe it would be good to throw some useful methods not previously discussed into the mix, and to compare all methods proposed thus far in terms of performance.

Here are some useful solutions to this problem, in increasing order of performance.

DataFrame.agg

This is a simple `str.format`-based approach.

    df['baz'] = df.agg('{0[bar]} is {0[foo]}'.format, axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

You can also use f-string formatting here:

    df['baz'] = df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

char.array-based Concatenation

Convert the columns to concatenate as chararrays, then add them together.

    a = np.char.array(df['bar'].values)
    b = np.char.array(df['foo'].values)

    df['baz'] = (a + b' is ' + b).astype(str)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

List Comprehension with zip

I cannot overstate how underrated list comprehensions are in pandas.

    df['baz'] = [str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])]

Alternatively, using `str.join` to concatenate (this will also scale better):

    df['baz'] = [
        ' '.join([str(x), 'is', y]) for x, y in zip(df['bar'], df['foo'])]

    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

List comprehensions excel in string manipulation, because string operations are inherently hard to vectorize, and most pandas "vectorised" functions are basically wrappers around loops. I have written extensively about this topic in "For loops with pandas - When should I care?". In general, if you don't have to worry about index alignment, use a list comprehension when dealing with string and regex operations.

The list comprehension above does not handle NaNs by default. However, you could always write a function wrapping a try-except if you needed to handle them.

    def try_concat(x, y):
        try:
            return str(x) + ' is ' + y
        except (ValueError, TypeError):
            return np.nan

    df['baz'] = [try_concat(x, y) for x, y in zip(df['bar'], df['foo'])]

perfplot Performance Measurements

The graph was generated using perfplot. Here's the complete code listing.

Functions

    import numpy as np

    def brenbarn(df):
        return df.assign(baz=df.bar.map(str) + " is " + df.foo)

    def danielvelkov(df):
        return df.assign(baz=df.apply(
            lambda x: '%s is %s' % (x['bar'], x['foo']), axis=1))

    def chrimuelle(df):
        return df.assign(
            baz=df['bar'].astype(str).str.cat(df['foo'].values, sep=' is '))

    def vladimiryashin(df):
        return df.assign(baz=df.astype(str).apply(lambda x: ' is '.join(x), axis=1))

    def erickfis(df):
        return df.assign(
            baz=df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs1_format(df):
        return df.assign(baz=df.agg('{0[bar]} is {0[foo]}'.format, axis=1))

    def cs1_fstrings(df):
        return df.assign(baz=df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs2(df):
        a = np.char.array(df['bar'].values)
        b = np.char.array(df['foo'].values)

        return df.assign(baz=(a + b' is ' + b).astype(str))

    def cs3(df):
        return df.assign(
            baz=[str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])])
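One caveat about the NaN handling in the list-comprehension approach: a try-except around `str(x) + ' is ' + y` only catches a missing value on the right-hand side, because `str(nan)` succeeds and yields the text "nan". The following is a minimal sketch of an explicit check instead. The helper name `concat_or_nan`, the sample frame, and the use of the nullable `Int64` dtype (to keep the integers from being upcast to floats) are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in each column.
# The nullable Int64 dtype keeps 1 and 2 as integers despite the gap.
df = pd.DataFrame({
    "foo": ["a", np.nan, "c"],
    "bar": pd.array([1, 2, None], dtype="Int64"),
})


def concat_or_nan(x, y, sep=" is "):
    """Return 'x<sep>y', or NaN when either value is missing."""
    if pd.isna(x) or pd.isna(y):
        return np.nan
    return f"{x}{sep}{y}"


df["baz"] = [concat_or_nan(x, y) for x, y in zip(df["bar"], df["foo"])]
print(df)
```

Only the first row has both values present, so only that row gets a concatenated string; the other two stay missing.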

User · Answer

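You could also use:

    df['bar'] = df['bar'].astype(str).str.cat(df['foo'].values.astype(str), sep=' is ')

Note that `bar` holds integers in the example, so it has to be cast with `astype(str)` before the `.str` accessor is available.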
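Passing `.values` (a plain array) rather than the Series itself matters when the two indexes differ: `str.cat` aligns a Series argument on its index, while an array is combined by position. A small sketch of the difference, using two hypothetical Series with deliberately mismatched indexes:

```python
import pandas as pd

# Hypothetical Series whose indexes are in opposite order.
bar = pd.Series([1, 2, 3], index=[0, 1, 2])
foo = pd.Series(["a", "b", "c"], index=[2, 1, 0])

# A Series argument is aligned on the index labels first.
aligned = bar.astype(str).str.cat(foo, sep=" is ", join="left")

# A NumPy array carries no index, so values are paired by position.
positional = bar.astype(str).str.cat(foo.to_numpy(), sep=" is ")

print(aligned.tolist())     # ['1 is c', '2 is b', '3 is a']
print(positional.tolist())  # ['1 is a', '2 is b', '3 is c']
```

When both columns come from the same DataFrame, as in the question, the indexes already match and both forms give the same result.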

User · Answer

I have encountered a specific case on my side with 10^11 rows in my dataframe, and in this case none of the proposed solutions is appropriate. I have used categories, and this should work fine in all cases where the number of unique strings is not too large. This is easily done in R with XxY on factors, but I could not find any other way to do it in Python (I'm new to Python). If anyone knows a place where this is implemented, I'd be glad to know.

    import itertools

    import pandas as pd

    def Create_Interaction_var(df, Varnames):
        '''
        :df data frame
        :Varnames list of 2 column names, say "X" and "Y".
        The two columns should be strings or categories.
        Converts string columns to categories and adds a column with the
        "interaction of X and Y" (X x Y), named "Interaction-X-Y".
        '''
        # Plain column assignment so that the dtype really becomes category.
        df[Varnames[0]] = df[Varnames[0]].astype("category")
        df[Varnames[1]] = df[Varnames[1]].astype("category")
        CatVar = "Interaction-" + "-".join(Varnames)
        # Lookup tables: integer code -> category name, one per column.
        Var0Levels = pd.DataFrame(list(enumerate(df[Varnames[0]].cat.categories))).rename(columns={0: "code0", 1: "name0"})
        Var1Levels = pd.DataFrame(list(enumerate(df[Varnames[1]].cat.categories))).rename(columns={0: "code1", 1: "name1"})
        NbLevels = len(Var0Levels)

        # Every (code0, code1) pair with its concatenated label.
        names = pd.DataFrame(list(itertools.product(dict(enumerate(df[Varnames[0]].cat.categories)),
                                                    dict(enumerate(df[Varnames[1]].cat.categories)))),
                             columns=['code0', 'code1']).merge(Var0Levels, on="code0").merge(Var1Levels, on="code1")
        names = names.assign(Interaction=[str(x) + '_' + y for x, y in zip(names["name0"], names["name1"])])
        names["code01"] = names["code0"] + NbLevels * names["code1"]
        # Combine the two codes into one integer per row; cast to int64 first
        # because cat.codes may be int8 and would overflow.
        df[CatVar] = df[Varnames[0]].cat.codes.astype("int64") + NbLevels * df[Varnames[1]].cat.codes.astype("int64")
        # Translate the combined code back into its label.
        df[CatVar] = df[CatVar].map(names.set_index("code01")["Interaction"].to_dict())
        df[CatVar] = df[CatVar].astype("category")
        return df
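The idea behind the category-based answer is that the string concatenation only needs to happen once per unique pair of levels, while each row is handled with cheap integer arithmetic. Below is a more compact sketch of the same idea built on `pd.Categorical.from_codes`. The column names `X` and `Y`, the `_` separator, and the sample data are assumptions for illustration, and the sketch assumes neither column contains missing values (a missing value has code -1, which would corrupt the combined code).

```python
import pandas as pd

# Hypothetical sample data; real use would involve far more rows.
df = pd.DataFrame({"X": ["a", "b", "a", "b"], "Y": ["u", "u", "v", "u"]})

x = df["X"].astype("category")
y = df["Y"].astype("category")
n_x = len(x.cat.categories)

# One integer per row: code_x + n_x * code_y (int64 avoids int8 overflow).
codes = (x.cat.codes.to_numpy(dtype="int64")
         + n_x * y.cat.codes.to_numpy(dtype="int64"))

# One label per possible pair, ordered so that label index == combined code.
labels = [f"{a}_{b}" for b in y.cat.categories for a in x.cat.categories]

df["Interaction-X-Y"] = pd.Categorical.from_codes(codes, categories=labels)
print(df)
```

The number of strings created is the product of the two level counts, independent of the number of rows, which is why this only pays off when the number of unique values is small relative to the frame.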

User · Answer

    df['bar'] = df.bar.map(str) + " is " + df.foo

User · Answer

DanielVelkov's answer is the proper one, but using f-strings (formatted string literals) is faster:

    # Daniel's
    %timeit df.apply(lambda x: '%s is %s' % (x['bar'], x['foo']), axis=1)
    ## 963 µs ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    # String literals - python 3
    %timeit df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    ## 849 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
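The `%timeit` lines above only run inside IPython or Jupyter. For a plain Python script, a rough equivalent can be put together with the standard-library `timeit` module. This is a sketch under stated assumptions: the `candidates` dictionary and its keys are names invented here, a list-comprehension variant is included purely for comparison, and the figures it prints will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})

# Hypothetical registry of the variants to compare.
candidates = {
    "percent": lambda: df.apply(
        lambda x: "%s is %s" % (x["bar"], x["foo"]), axis=1),
    "fstring": lambda: df.apply(
        lambda x: f"{x['bar']} is {x['foo']}", axis=1),
    "listcomp": lambda: [
        f"{x} is {y}" for x, y in zip(df["bar"], df["foo"])],
}

number = 100  # calls per measurement
timings = {
    name: min(timeit.repeat(fn, number=number, repeat=3)) / number
    for name, fn in candidates.items()
}

# Print the fastest variant first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>10}: {seconds * 1e6:10.1f} µs per call")
```

With only three rows the measurement is dominated by fixed overhead, so it is worth repeating on a frame of realistic size before drawing conclusions.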

User · Answer

`series.str.cat` is the most flexible way to approach this problem.

For `df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})`:

    df.foo.str.cat(df.bar.astype(str), sep=' is ')

    >>>  0    a is 1
         1    b is 2
         2    c is 3
         Name: foo, dtype: object

OR

    df.bar.astype(str).str.cat(df.foo, sep=' is ')

    >>>  0    1 is a
         1    2 is b
         2    3 is c
         Name: bar, dtype: object

Unlike `.join()` (which is for joining lists contained in a single Series), this method is for joining two Series together. It also allows you to ignore or replace NaN values as desired.
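The NaN handling mentioned above is controlled by the `na_rep` parameter of `str.cat`. A short sketch, with a missing value added to the example data and `"?"` chosen arbitrarily as the placeholder:

```python
import numpy as np
import pandas as pd

# The example frame, with a missing value added for illustration.
df = pd.DataFrame({"foo": ["a", np.nan, "c"], "bar": [1, 2, 3]})
bar = df["bar"].astype(str)

# Default: a missing value on either side makes the whole result missing.
propagated = bar.str.cat(df["foo"], sep=" is ")

# With na_rep, missing values are replaced before concatenation.
filled = bar.str.cat(df["foo"], sep=" is ", na_rep="?")

print(propagated.tolist())
print(filled.tolist())  # ['1 is a', '2 is ?', '3 is c']
```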

User · Answer

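    df[['bar', 'foo']].astype(str).apply(lambda x: ' is '.join(x), axis=1)

    0    1 is a
    1    2 is b
    2    3 is c
    dtype: object

The values are joined in column order, so select the columns explicitly. On current pandas, a DataFrame built from a dict keeps the dict's insertion order (`foo` before `bar` here), and `df.astype(str)` alone would give "a is 1".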

User · Answer

The problem in your code is that you want to apply the operation on every row. The way you've written it, though, takes the whole 'bar' and 'foo' columns, converts them to strings, and gives you back one big string. You can write it like:

    df.apply(lambda x: '%s is %s' % (x['bar'], x['foo']), axis=1)

It's longer than the other answer but is more generic (it can be used with values that are not strings).
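To make the diagnosis concrete, the sketch below runs both forms side by side. The variable names `whole` and `per_row` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})

# The % operator formats each Series as a whole: the result is one Python
# string containing the printed form of both columns.
whole = "%s is %s" % (df["bar"], df["foo"])
print(type(whole))  # <class 'str'>

# Applying the format row by row yields one string per row.
per_row = df.apply(lambda row: "%s is %s" % (row["bar"], row["foo"]), axis=1)
print(per_row.tolist())  # ['1 is a', '2 is b', '3 is c']
```

Assigning `whole` to a column broadcasts that single long string to every row, which is what the output in the question shows.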
