Adding new column to existing DataFrame in Python pandas

Question

I have the following indexed DataFrame with named columns and rows with non-continuous numbers:

              a         b         c         d
    2  0.671399  0.101208 -0.181532  0.241273
    3  0.446172 -0.243316  0.051767  1.577318
    5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

    0   -0.335485
    1   -1.166658
    2   -0.385571
    dtype: float64

How can I add column e to the above example?

User · Answer

For the sake of completeness - yet another solution using DataFrame.eval() method:

Data:

In [44]: e
Out[44]:
0    1.225506
1   -1.033944
2   -0.498953
3   -0.373332
4    0.615030
5   -0.622436
dtype: float64

In [45]: df1
Out[45]:
          a         b         c         d
0 -0.634222 -0.103264  0.745069  0.801288
4  0.782387 -0.090279  0.757662 -0.602408
5 -0.117456  2.124496  1.057301  0.765466
7  0.767532  0.104304 -0.586850  1.051297
8 -0.103272  0.958334  1.163092  1.182315
9 -0.616254  0.296678 -0.112027  0.679112

Solution:

In [46]: df1.eval("e = @e.values", inplace=True)

In [47]: df1
Out[47]:
          a         b         c         d         e
0 -0.634222 -0.103264  0.745069  0.801288  1.225506
4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
5 -0.117456  2.124496  1.057301  0.765466 -0.498953
7  0.767532  0.104304 -0.586850  1.051297 -0.373332
8 -0.103272  0.958334  1.163092  1.182315  0.615030
9 -0.616254  0.296678 -0.112027  0.679112 -0.622436

User · Answer

If the column you are trying to add is a Series variable, then just:

    df["new_columns_name"] = series_variable_name  # this will do it for you

This works well even if you are replacing an existing column: just use the name of the column you want to replace as new_columns_name. It will overwrite the existing column data with the new Series data.
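
A minimal sketch of the above, assuming a small made-up DataFrame (the values and the name `new_values` are hypothetical, not from the question). It shows that the same syntax both appends a new column and overwrites an existing one:

```python
import pandas as pd

# Hypothetical data, not taken from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[2, 3, 5])
new_values = pd.Series([10.0, 20.0, 30.0], index=df.index)

df["e"] = new_values      # "e" does not exist yet, so it is appended
df["a"] = new_values * 2  # "a" exists, so its data is replaced

print(df)
```

The overwritten column keeps its position, so the column order here stays a, b, e.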

User · Answer

Use the original df1 indexes to create the series:

    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)

Edit 2015

Some reported getting the SettingWithCopyWarning with this code. However, the code still runs perfectly with the current pandas version 0.16.1.

    >>> sLength = len(df1['a'])
    >>> df1
              a         b         c         d
    6 -0.269221 -0.026476  0.997517  1.294385
    8  0.917438  0.847941  0.034235 -0.448948

    >>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167
    8  0.917438  0.847941  0.034235 -0.448948  2.228131

    >>> pd.version.short_version
    '0.16.1'

The SettingWithCopyWarning aims to inform of a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily say you did it wrong (it can trigger false positives), but from 0.13.0 it lets you know there are more adequate methods for the same purpose. Then, if you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead".

    >>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
    >>> df1
              a         b         c         d         e         f
    6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
    8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109

In fact, this is currently the more efficient method as described in the pandas docs.

Edit 2017

As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame could be using assign:

    df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
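
The three variants from this answer can be put side by side in a short sketch. It assumes a two-column cut of the question's DataFrame and uses the question's values for the new column; the names `values`, `f`, `g` and `df2` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"a": [0.671399, 0.446172, 0.614758], "b": [0.101208, -0.243316, 0.075793]},
    index=[2, 3, 5],
)
values = [-0.335485, -1.166658, -0.385571]

# 1. Plain assignment, with the Series re-using df1's index.
df1["e"] = pd.Series(values, index=df1.index)

# 2. The .loc form suggested by the warning text.
df1.loc[:, "f"] = pd.Series(values, index=df1.index)

# 3. assign() returns a new DataFrame and leaves df1 untouched.
df2 = df1.assign(g=pd.Series(values).values)

print(df2)
```

Passing `index=df1.index` (or `.values`) matters here because the row labels 2, 3, 5 do not match the default 0, 1, 2 labels of a freshly built Series.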

User · Answer

    x = pd.DataFrame([1, 2, 3, 4, 5])
    y = pd.DataFrame([5, 4, 3, 2, 1])
    z = pd.concat([x, y], axis=1)

User · Answer

Easiest ways:

    data['new_col'] = list_of_values

    data.loc[:, 'new_col'] = list_of_values

This way you avoid what is called chained indexing when setting new values in a pandas object.

User · Answer

It seems that in recent pandas versions the way to go is to use df.assign:

    df1 = df1.assign(e=np.random.randn(sLength))

It doesn't produce SettingWithCopyWarning.

User · Answer

If you want to set the whole new column to an initial base value (e.g. None), you can do this:

    df1['e'] = None

This gives the column the "object" dtype, so later you're free to put complex data types, like a list, into individual cells.
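
A small sketch of that idea, assuming a made-up one-column DataFrame. It uses `.at` to set a single cell, which is one way (not the only one) to store a list in an object column:

```python
import pandas as pd

# Hypothetical data with non-contiguous row labels.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[2, 3, 5])

df1["e"] = None             # every cell holds None; the column dtype is object
df1.at[3, "e"] = [1, 2, 3]  # .at addresses one cell by row label and column name

print(df1)
print(df1["e"].dtype)
```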

User · Answer

I was looking for a general way of adding a column of numpy.nan values to a dataframe without getting the dumb SettingWithCopyWarning.

From the following:

- the answers here
- this question about passing a variable as a keyword argument
- this method for generating a numpy array of NaNs in-line

I came up with this:

    col = 'column_name'
    df = df.assign(**{col: numpy.full(len(df), numpy.nan)})
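
For reference, a self-contained version of that one-liner, assuming a made-up DataFrame. The `**{col: ...}` unpacking is what lets the column name come from a variable, since `assign` only takes keyword arguments:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # hypothetical data

col = "column_name"  # the name is only known at run time
df = df.assign(**{col: np.full(len(df), np.nan)})

print(df)
```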

User · Answer

Foolproof:

    df.loc[:, 'NewCol'] = 'New_Val'

Example:

    df = pd.DataFrame(data=np.random.randn(20, 4), columns=['A', 'B', 'C', 'D'])

    df

               A         B         C         D
    0  -0.761269  0.477348  1.170614  0.752714
    1   1.217250 -0.930860 -0.769324 -0.408642
    2  -0.619679 -1.227659 -0.259135  1.700294
    3  -0.147354  0.778707  0.479145  2.284143
    4  -0.529529  0.000571  0.913779  1.395894
    5   2.592400  0.637253  1.441096 -0.631468
    6   0.757178  0.240012 -0.553820  1.177202
    7  -0.986128 -1.313843  0.788589 -0.707836
    8   0.606985 -2.232903 -1.358107 -2.855494
    9  -0.692013  0.671866  1.179466 -1.180351
    10 -1.093707 -0.530600  0.182926 -1.296494
    11 -0.143273 -0.503199 -1.328728  0.610552
    12 -0.923110 -1.365890 -1.366202 -1.185999
    13 -2.026832  0.273593 -0.440426 -0.627423
    14 -0.054503 -0.788866 -0.228088 -0.404783
    15  0.955298 -1.430019  1.434071 -0.088215
    16 -0.227946  0.047462  0.373573 -0.111675
    17  1.627912  0.043611  1.743403 -0.012714
    18  0.693458  0.144327  0.329500 -0.655045
    19  0.104425  0.037412  0.450598 -0.923387

    df.drop([3, 5, 8, 10, 18], inplace=True)

    df

               A         B         C         D
    0  -0.761269  0.477348  1.170614  0.752714
    1   1.217250 -0.930860 -0.769324 -0.408642
    2  -0.619679 -1.227659 -0.259135  1.700294
    4  -0.529529  0.000571  0.913779  1.395894
    6   0.757178  0.240012 -0.553820  1.177202
    7  -0.986128 -1.313843  0.788589 -0.707836
    9  -0.692013  0.671866  1.179466 -1.180351
    11 -0.143273 -0.503199 -1.328728  0.610552
    12 -0.923110 -1.365890 -1.366202 -1.185999
    13 -2.026832  0.273593 -0.440426 -0.627423
    14 -0.054503 -0.788866 -0.228088 -0.404783
    15  0.955298 -1.430019  1.434071 -0.088215
    16 -0.227946  0.047462  0.373573 -0.111675
    17  1.627912  0.043611  1.743403 -0.012714
    19  0.104425  0.037412  0.450598 -0.923387

    df.loc[:, 'NewCol'] = 0

    df

               A         B         C         D  NewCol
    0  -0.761269  0.477348  1.170614  0.752714       0
    1   1.217250 -0.930860 -0.769324 -0.408642       0
    2  -0.619679 -1.227659 -0.259135  1.700294       0
    4  -0.529529  0.000571  0.913779  1.395894       0
    6   0.757178  0.240012 -0.553820  1.177202       0
    7  -0.986128 -1.313843  0.788589 -0.707836       0
    9  -0.692013  0.671866  1.179466 -1.180351       0
    11 -0.143273 -0.503199 -1.328728  0.610552       0
    12 -0.923110 -1.365890 -1.366202 -1.185999       0
    13 -2.026832  0.273593 -0.440426 -0.627423       0
    14 -0.054503 -0.788866 -0.228088 -0.404783       0
    15  0.955298 -1.430019  1.434071 -0.088215       0
    16 -0.227946  0.047462  0.373573 -0.111675       0
    17  1.627912  0.043611  1.743403 -0.012714       0
    19  0.104425  0.037412  0.450598 -0.923387       0

User · Answer

If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.

    df = df.copy()
    df['col_name'] = values
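
A sketch of where this typically comes up, assuming the DataFrame being modified was produced by filtering another one (the data and the name `subset` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})  # hypothetical data

# Filtering returns a new object derived from df; copying it explicitly
# makes it clear that the new column belongs to subset only.
subset = df[df["a"] > 2].copy()
subset["e"] = [10, 20]

print(subset)
print(df.columns.tolist())  # df itself has no "e" column
```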

User · Answer

Quoting the question: "I would like to add a new column, 'e', to the existing data frame and do not change anything in the data frame. (The series always got the same length as a dataframe.)"

I assume that the index values in e match those in df1.

The easiest way to initiate a new column named e, and assign it the values from your series e:

    df['e'] = e.values

assign (Pandas 0.16.0+)

As of Pandas 0.16.0, you can also use assign, which assigns new columns to a DataFrame and returns a new object (a copy) with all the original columns in addition to the new ones.

    df1 = df1.assign(e=e.values)

You can also include more than one column:

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    >>> df.assign(mean_a=df.a.mean(), mean_b=df.b.mean())
       a  b  mean_a  mean_b
    0  1  3     1.5     3.5
    1  2  4     1.5     3.5

In context with your example:

    np.random.seed(0)
    df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
    mask = df1.applymap(lambda x: x < -0.7)
    df1 = df1[~mask.any(axis=1)]  # keep rows with no value below -0.7
    sLength = len(df1['a'])
    e = pd.Series(np.random.randn(sLength))

    >>> df1
              a         b         c         d
    0  1.764052  0.400157  0.978738  2.240893
    2 -0.103219  0.410599  0.144044  1.454274
    3  0.761038  0.121675  0.443863  0.333674
    7  1.532779  1.469359  0.154947  0.378163
    9  1.230291  1.202380 -0.387327 -0.302303

    >>> e
    0   -1.048553
    1   -1.420018
    2   -1.706270
    3    1.950775
    4   -0.509652
    dtype: float64

    df1 = df1.assign(e=e.values)

    >>> df1
              a         b         c         d         e
    0  1.764052  0.400157  0.978738  2.240893 -1.048553
    2 -0.103219  0.410599  0.144044  1.454274 -1.420018
    3  0.761038  0.121675  0.443863  0.333674 -1.706270
    7  1.532779  1.469359  0.154947  0.378163  1.950775
    9  1.230291  1.202380 -0.387327 -0.302303 -0.509652

The description of this feature when it was first introduced can be found in the pandas 0.16.0 release notes.

User · Answer

To insert a new column at a given location (0 <= loc <= number of columns) in a data frame, just use DataFrame.insert:

    DataFrame.insert(loc, column, value)

Therefore, if you want to add the column e at the end of a data frame called df, you can use:

    e = [-0.335485, -1.166658, -0.385571]
    df.insert(loc=len(df.columns), column='e', value=e)

value can be a Series, an integer (in which case all cells get filled with this one value), or an array-like structure.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html
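
A runnable sketch of `insert`, assuming a two-column cut of the question's DataFrame; the column name `first` and the flag `duplicate_rejected` are only illustrative. It also shows that `insert` modifies the DataFrame in place and, by default, refuses a column name that already exists:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0.671399, 0.446172, 0.614758], "b": [0.101208, -0.243316, 0.075793]},
    index=[2, 3, 5],
)
e = [-0.335485, -1.166658, -0.385571]

df.insert(loc=len(df.columns), column="e", value=e)  # append at the end
df.insert(loc=0, column="first", value=0)            # scalar fills every row

# With the default allow_duplicates setting, an existing name raises ValueError.
try:
    df.insert(loc=0, column="e", value=e)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df)
```

Because the value is a plain list, it is assigned by position, so the non-contiguous row labels are not a problem.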

User · Answer

First create a Python list, list_of_e, that has the relevant data. Then use this:

    df['e'] = list_of_e

User · Answer

The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.

    df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=list('AB'))

    newCol = [3, 5, 7]
    newName = 'C'

    values = np.insert(df.values, df.shape[1], newCol, axis=1)
    header = df.columns.values.tolist()
    header.append(newName)

    df = pd.DataFrame(values, columns=header)

User · Answer

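To add a new column, 'e', to the existing data frame:

    df1.loc[:, 'e'] = pd.Series(np.random.randn(sLength), index=df1.index)

Passing index=df1.index keeps the values lined up with the rows; without it the Series gets labels 0 to sLength - 1 and any row label outside that range ends up as NaN.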

User · Answer

Let me just add that, just like for hum3, .loc didn't solve the SettingWithCopyWarning and I had to resort to df.insert(). In my case the false positive was generated by "fake" chained indexing dict['a']['e'], where 'e' is the new column and dict['a'] is a DataFrame coming from a dictionary.

Also note that if you know what you are doing, you can switch off the warning using pd.options.mode.chained_assignment = None and then use one of the other solutions given here.

User · Answer

To create an empty column:

    df['i'] = None

User · Answer

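If we want to assign a scalar value, e.g. 10, to all rows of a new column in a df:

    df = df.assign(new_col=lambda x: 10)  # x is the whole DataFrame passed to the lambda, not a single row

df will now have a new column 'new_col' with value 10 in all rows. For a constant, the lambda is optional: df.assign(new_col=10) does the same.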
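
A short sketch of how `assign` treats scalars and callables, assuming a made-up DataFrame; the column names `doubled` and `arg_type` are hypothetical. The last call records the type of the object handed to the lambda:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # hypothetical data

constant = df.assign(new_col=10)                            # scalar is broadcast
derived = df.assign(doubled=lambda frame: frame["a"] * 2)   # computed from a column
probe = df.assign(arg_type=lambda frame: type(frame).__name__)

print(constant)
print(derived)
print(probe)
```

The `arg_type` column contains the string "DataFrame" in every row, which confirms that the callable is invoked once with the whole DataFrame.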

User · Answer

This is a special case of adding a new column to a pandas dataframe. Here, I am adding a new feature column based on the data in an existing column of the dataframe.

So, say our dataframe has the columns 'feature_1', 'feature_2' and 'probability_score', and we have to add a new column 'predicted_class' based on the data in column 'probability_score'.

I will use the Series.map() method and also define a function of my own which implements the logic for giving a particular class label to every row in my dataframe.

    data = pd.read_csv('data.csv')

    def myFunction(x):
        # implement your logic here; the 0.5 cut-off is only a placeholder
        if x >= 0.5:
            return 1
        return 0

    variable_1 = data['probability_score']
    predicted_class = variable_1.map(myFunction)

    data['predicted_class'] = predicted_class

    # check the dataframe: the new column is derived from the existing column, row by row
    data.head()
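
A self-contained variant that does not need a CSV file, assuming made-up scores; the function name `label_from_score`, the 0.5 threshold and the class labels are all hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for data.csv.
data = pd.DataFrame({
    "feature_1": [0.2, 1.5, 3.1],
    "probability_score": [0.91, 0.35, 0.50],
})

def label_from_score(score, threshold=0.5):
    """Return a class label for one probability score (illustrative rule)."""
    return "positive" if score >= threshold else "negative"

# Series.map calls the function once per value and returns a new Series
# with the same index, so it can be assigned straight back as a column.
data["predicted_class"] = data["probability_score"].map(label_from_score)

print(data)
```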

User · Answer

If the data frame and Series object have the same index, pandas.concat also works here:

    import pandas as pd
    df
    #          a            b           c           d
    # 0  0.671399     0.101208   -0.181532    0.241273
    # 1  0.446172    -0.243316    0.051767    1.577318
    # 2  0.614758     0.075793   -0.451460   -0.012493

    e = pd.Series([-0.335485, -1.166658, -0.385571])
    e
    # 0   -0.335485
    # 1   -1.166658
    # 2   -0.385571
    # dtype: float64

    # here we need to give the series object a name, which becomes the new
    # column name in the result
    df = pd.concat([df, e.rename("e")], axis=1)
    df

    #          a            b           c           d           e
    # 0  0.671399     0.101208   -0.181532    0.241273   -0.335485
    # 1  0.446172    -0.243316    0.051767    1.577318   -1.166658
    # 2  0.614758     0.075793   -0.451460   -0.012493   -0.385571

In case they don't have the same index:

    e.index = df.index
    df = pd.concat([df, e.rename("e")], axis=1)
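
To see why the index has to match first, here is a sketch that assumes the question's row labels 2, 3, 5 and a one-column DataFrame; the names `mismatched`, `e_aligned` and `aligned` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.671399, 0.446172, 0.614758]}, index=[2, 3, 5])
e = pd.Series([-0.335485, -1.166658, -0.385571])  # default labels 0, 1, 2

# Different indexes: concat keeps the union of the labels and fills with NaN.
mismatched = pd.concat([df, e.rename("e")], axis=1)

# Same index: the values line up row by row.
e_aligned = pd.Series(e.values, index=df.index, name="e")
aligned = pd.concat([df, e_aligned], axis=1)

print(mismatched)
print(aligned)
```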

User · Answer

One thing to note, though, is that if you do

    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)

this will effectively be a left join on df1.index. So if you want to have an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then use the code above. For example:

    data = pd.DataFrame(index=all_possible_values)
    df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
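
The left-join behaviour and one possible outer-join workaround can be sketched as follows. The data is made up, and using `reindex` on the union of both indexes is my own reading of "index values covering the universe of your data", not necessarily what the author had in mind:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[2, 3, 5])  # hypothetical data
s = pd.Series([10.0, 20.0, 30.0], index=[3, 5, 7])

# Left join: only df1's labels survive; label 7 is dropped, label 2 gets NaN.
left = df1.copy()
left["e"] = s

# Outer-join effect: widen the frame to the union of labels first.
all_labels = df1.index.union(s.index)
wide = df1.reindex(all_labels)
wide["e"] = s

print(left)
print(wide)
```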

User · Answer

Doing this directly via NumPy will be the most efficient:

    df1['e'] = np.random.randn(sLength)

Note my original (very old) suggestion was to use map (which is much slower):

    df1['e'] = df1['a'].map(lambda x: np.random.random())

User · Answer

Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to:

    data.set_index(['index_column'], inplace=True)
    # if the index is unsorted, assignment of a new column will fail
    data.sort_index(inplace=True)
    data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])

User · Answer

Super simple column assignment

A pandas dataframe is implemented as an ordered dict of columns.

This means that __getitem__ ([]) can not only be used to get a certain column, but __setitem__ ([] =) can be used to assign a new column.

For example, this dataframe can have a column added to it by simply using the [] accessor:

        size      name color
    0    big      rose   red
    1  small    violet  blue
    2  small     tulip   red
    3  small  harebell  blue

    df['protected'] = ['no', 'no', 'no', 'yes']

        size      name color protected
    0    big      rose   red        no
    1  small    violet  blue        no
    2  small     tulip   red        no
    3  small  harebell  blue       yes

Note that this works even if the index of the dataframe is off.

    df.index = [3, 2, 1, 0]
    df['protected'] = ['no', 'no', 'no', 'yes']

        size      name color protected
    3    big      rose   red        no
    2  small    violet  blue        no
    1  small     tulip   red        no
    0  small  harebell  blue       yes

[] = is the way to go, but watch out!

However, if you have a pd.Series and try to assign it to a dataframe where the indexes are off, you will run into trouble. See example:

    df['protected'] = pd.Series(['no', 'no', 'no', 'yes'])

        size      name color protected
    3    big      rose   red       yes
    2  small    violet  blue        no
    1  small     tulip   red        no
    0  small  harebell  blue        no

This is because a pd.Series by default has an index enumerated from 0 to n - 1, and the pandas [] = method tries to be "smart".

What actually is going on

When you use the [] = method, pandas quietly aligns the right-hand series to the index of the left-hand dataframe, in effect a left join on the dataframe's index: values are matched by label, not by position.

    df['column'] = series

Side note

This quickly causes cognitive dissonance, since the [] = method is trying to do a lot of different things depending on the input, and the outcome cannot be predicted unless you just know how pandas works. I would therefore advise against [] = in code bases, but when exploring data in a notebook, it is fine.

Going around the problem

If you have a pd.Series and want it assigned from top to bottom, or if you are writing production code and you are not sure of the index order, it is worth it to safeguard against this kind of issue.

You could downcast the pd.Series to a np.ndarray or a list; this will do the trick.

    df['protected'] = pd.Series(['no', 'no', 'no', 'yes']).values

or

    df['protected'] = list(pd.Series(['no', 'no', 'no', 'yes']))

But this is not very explicit.

Some coder may come along and say "Hey, this looks redundant, I'll just optimize this away".

Explicit way

Setting the index of the pd.Series to be the index of the df is explicit.

    df['protected'] = pd.Series(['no', 'no', 'no', 'yes'], index=df.index)

Or more realistically, you probably have a pd.Series already available.

    protected_series = pd.Series(['no', 'no', 'no', 'yes'])
    protected_series.index = df.index

    3     no
    2     no
    1     no
    0    yes

Can now be assigned:

    df['protected'] = protected_series

        size      name color protected
    3    big      rose   red        no
    2  small    violet  blue        no
    1  small     tulip   red        no
    0  small  harebell  blue       yes

Alternative way with df.reset_index()

Since the index dissonance is the problem, if you feel that the index of the dataframe should not dictate things, you can simply drop the index. This should be faster, but it is not very clean, since your function now probably does two things. Note that reset_index returns a new object, so the result has to be assigned back.

    df = df.reset_index(drop=True)
    protected_series = protected_series.reset_index(drop=True)
    df['protected'] = protected_series

        size      name color protected
    0    big      rose   red        no
    1  small    violet  blue        no
    2  small     tulip   red        no
    3  small  harebell  blue       yes

Note on df.assign

While df.assign makes it more explicit what you are doing, it actually has all the same problems as the above [] =.

    df.assign(protected=pd.Series(['no', 'no', 'no', 'yes']))

        size      name color protected
    3    big      rose   red       yes
    2  small    violet  blue        no
    1  small     tulip   red        no
    0  small  harebell  blue        no

Just watch out with df.assign that your column is not called self. It will cause errors. This makes df.assign smelly, since there are these kinds of artifacts in the function.

    df.assign(self=pd.Series(['no', 'no', 'no', 'yes']))
    TypeError: assign() got multiple values for keyword argument 'self'

You may say, "Well, I'll just not use self then". But who knows how this function changes in the future to support new arguments. Maybe your column name will be an argument in a new update of pandas, causing problems with upgrading.
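
The pitfall and its workarounds from this answer can be reproduced in a few lines. This sketch rebuilds the flower DataFrame with the reversed index; the three column names `by_label`, `by_position` and `explicit` are hypothetical and only there to compare the results side by side:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "size": ["big", "small", "small", "small"],
        "name": ["rose", "violet", "tulip", "harebell"],
        "color": ["red", "blue", "red", "blue"],
    },
    index=[3, 2, 1, 0],
)
protected = pd.Series(["no", "no", "no", "yes"])  # default labels 0, 1, 2, 3

df["by_label"] = protected                # aligned on labels: order is flipped
df["by_position"] = protected.to_numpy()  # plain array: top-to-bottom order
df["explicit"] = pd.Series(protected.to_numpy(), index=df.index)

print(df[["name", "by_label", "by_position", "explicit"]])
```

Only the `by_label` column marks the rose as protected; the other two mark the harebell, which is what a top-to-bottom reading of the list would suggest.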

User · Answer

I got the dreaded SettingWithCopyWarning, and it wasn't fixed by using the iloc syntax. My DataFrame was created by read_sql from an ODBC source. Using a suggestion by lowtech above, the following worked for me:

    df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength), index=df.index))

This worked fine to insert the column at the end. I don't know if it is the most efficient, but I don't like warning messages. I think there is a better solution, but I can't find it, and I think it depends on some aspect of the index.

Note: this only works once and will give an error message if trying to overwrite an existing column.

Note: as above, from 0.16.0 assign is the best solution. See the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign. It works well for data-flow style code where you don't overwrite your intermediate values.

User · Answer

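If you just need to create a new empty column, then the shortest solution is:

    df.loc[:, 'e'] = pd.Series(dtype='float64')

Giving the empty Series an explicit dtype avoids depending on the default dtype of pd.Series(), which has changed between pandas versions.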
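
Two other ways to get an all-NaN column, sketched on a made-up DataFrame; the column names `e` and `f` are only illustrative. Assigning `np.nan` is my own addition here, not something this answer proposes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # hypothetical data

df["e"] = np.nan                      # scalar NaN broadcast to every row
df["f"] = pd.Series(dtype="float64")  # empty Series: no label matches, so all NaN

print(df)
print(df.dtypes)
```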

User · Answer

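This is the simple way of adding a new column:

    df['e'] = e

If e is a Series, its values are matched to df by index label, so this fills only the rows whose labels also appear in e.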
