Create pandas DataFrame by appending one row at a time

Question

I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one. What is the best way to do this?

I successfully created an empty DataFrame with:

    res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field with:

    res = res.set_value(len(res), 'qty1', 10.0)

It works but seems very odd :-/ (it fails for adding a string value).

How can I add a new row to my DataFrame (with different column types)?

User · Answer

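If you have a data frame df and want to add a list new_list as a new row to df, you can simply do:

    df.loc[len(df)] = new_list

If you want to add a new data frame new_df under data frame df, then you can use:

    df.append(new_df)

Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions use pd.concat([df, new_df]) instead.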
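A minimal sketch of the two operations above, assuming a frame with the default 0..n-1 integer index so that len(df) is a label not yet in use. The column names and row values are made up for illustration, and pd.concat stands in for df.append:

```python
import pandas as pd

# Hypothetical starting frame with the default RangeIndex (0, 1).
df = pd.DataFrame({"lib": ["a", "b"], "qty1": [1, 2], "qty2": [3, 4]})

# Add one row: len(df) is 2, a label that does not exist yet,
# so .loc enlarges the frame instead of overwriting a row.
new_list = ["c", 5, 6]
df.loc[len(df)] = new_list

# Stack a second frame underneath; ignore_index=True renumbers the rows.
new_df = pd.DataFrame({"lib": ["d"], "qty1": [7], "qty2": [8]})
combined = pd.concat([df, new_df], ignore_index=True)
```

If the index is not a plain 0..n-1 range, len(df) may collide with an existing label, in which case .loc overwrites that row rather than adding one.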

User · Answer

For efficient appending see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc/ix on non-existing key index data, e.g.:

    In [1]: se = pd.Series([1,2,3])

    In [2]: se
    Out[2]:
    0    1
    1    2
    2    3
    dtype: int64

    In [3]: se[5] = 5.

    In [4]: se
    Out[4]:
    0    1.0
    1    2.0
    2    3.0
    5    5.0
    dtype: float64

Or:

    In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
       ...:                    columns=['A','B'])

    In [2]: dfi
    Out[2]:
       A  B
    0  0  1
    1  2  3
    2  4  5

    In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

    In [4]: dfi
    Out[4]:
       A  B  C
    0  0  1  0
    1  2  3  2
    2  4  5  4

    In [5]: dfi.loc[3] = 5

    In [6]: dfi
    Out[6]:
       A  B  C
    0  0  1  0
    1  2  3  2
    2  4  5  4
    3  5  5  5

User · Answer

Before adding a row, we convert the DataFrame to a dictionary. The keys of that dictionary are the DataFrame's columns, and each column's values are stored in a nested dictionary whose keys are the index labels of the DataFrame. That idea led me to write the code below:

    df2 = df.to_dict()
    # this is the complete row that we are going to add
    values = ["s_101", "hyderabad", 10, 20, 16, 13, 15, 12, 12, 13, 25, 26, 25, 27, "good", "bad"]
    i = 0
    for x in df.columns:           # df.columns gives us the keys of the outer dictionary
        df2[x][101] = values[i]    # 101 is our index label; it is also the key in the nested dictionary
        i += 1

User · Answer

If you want to add a row at the end, append it as a list:

    valuestoappend = [val1, val2, val3]
    res = res.append(pd.Series(valuestoappend, index=['lib', 'qty1', 'qty2']), ignore_index=True)

User · Answer

Instead of a list of dictionaries as in ShikharDua's answer, we can also represent our table as a dictionary of lists, where each list stores one column in row order, given we know our columns beforehand. At the end we construct our DataFrame once.

For c columns and n rows, this uses 1 dictionary and c lists, versus 1 list and n dictionaries. The list-of-dictionaries method has each dictionary storing all keys and requires creating a new dictionary for every row. Here we only append to lists, which is constant time and theoretically very fast.

    # current data
    data = {"Animal": ["cow", "horse"], "Color": ["blue", "red"]}

    # adding a new row (be careful to ensure every column gets another value)
    data["Animal"].append("mouse")
    data["Color"].append("black")

    # at the end, construct our DataFrame
    df = pd.DataFrame(data)
    #   Animal  Color
    # 0    cow   blue
    # 1  horse    red
    # 2  mouse  black

User · Answer

    initial_data = {'lib': np.array([1,2,3,4]), 'qty1': [1,2,3,4], 'qty2': [1,2,3,4]}

    df = pd.DataFrame(initial_data)

    df

       lib  qty1  qty2
    0    1     1     1
    1    2     2     2
    2    3     3     3
    3    4     4     4

    val_1 = [10]
    val_2 = [14]
    val_3 = [20]

    df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

       lib  qty1  qty2
    0    1     1     1
    1    2     2     2
    2    3     3     3
    3    4     4     4
    0   10    14    20

You can use a for loop to iterate through values, or you can add arrays of values:

    val_1 = [10, 11, 12, 13]
    val_2 = [14, 15, 16, 17]
    val_3 = [20, 21, 22, 43]

    df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

       lib  qty1  qty2
    0    1     1     1
    1    2     2     2
    2    3     3     3
    3    4     4     4
    0   10    14    20
    1   11    15    21
    2   12    16    22
    3   13    17    43

User · Answer

Create a new record (a data frame) and add it to old_data_frame. Pass a list of values and the corresponding column names to create new_record:

    new_record = pd.DataFrame([[0, 'abcd', 0, 1, 123]], columns=['a', 'b', 'c', 'd', 'e'])

    old_data_frame = pd.concat([old_data_frame, new_record])

User · Answer

You can append a single row as a dictionary using the ignore_index option.

    >>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
    >>> f
      Animal Color
    0    cow  blue
    1  horse   red
    >>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
      Animal  Color
    0    cow   blue
    1  horse    red
    2  mouse  black
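On pandas 2.0 and later, where DataFrame.append no longer exists, roughly the same result can be obtained by wrapping the dictionary in a one-row frame and concatenating. A sketch under that assumption, reusing the data from this answer:

```python
import pandas as pd

f = pd.DataFrame({"Animal": ["cow", "horse"], "Color": ["blue", "red"]})

# One dictionary = one row; wrapping it in a list makes pandas build a one-row frame.
new_row = pd.DataFrame([{"Animal": "mouse", "Color": "black"}])

# ignore_index=True renumbers the result 0..n-1, as it did with append.
f = pd.concat([f, new_row], ignore_index=True)
```

This still copies the whole frame on every call, so it suits an occasional extra row rather than a long loop.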

User · Answer

You can also build up a list of lists and convert it to a dataframe:

    import pandas as pd

    columns = ['i', 'double', 'square']
    rows = []

    for i in range(6):
        row = [i, i*2, i*i]
        rows.append(row)

    df = pd.DataFrame(rows, columns=columns)

giving

       i  double  square
    0  0       0       0
    1  1       2       1
    2  2       4       4
    3  3       6       9
    4  4       8      16
    5  5      10      25

User · Answer

Figured out a simple and nice way:

    >>> df
         A  B  C
    one  1  2  3
    >>> df.loc["two"] = [4,5,6]
    >>> df
         A  B  C
    one  1  2  3
    two  4  5  6

Note the caveat with performance as noted in the comments.

User · Answer

Another way to do it (probably not very performant):

    # add a row
    def add_row(df, row):
        colnames = list(df.columns)
        ncol = len(colnames)
        assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
        return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

    import pandas as pd
    def add_row(self, row):
        self.loc[len(self.index)] = row
    pd.DataFrame.add_row = add_row

User · Answer

This is not an answer to the OP's question but a toy example to illustrate the answer of @ShikharDua above, which I found very useful.

While this fragment is trivial, in the actual data I had thousands of rows and many columns, and I wished to be able to group by different columns and then perform the stats below for more than one target column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you, @ShikharDua!

    import pandas as pd

    BaseData = pd.DataFrame({'Customer': ['Acme', 'Mega', 'Acme', 'Acme', 'Mega', 'Acme'],
                             'Territory': ['West', 'East', 'South', 'West', 'East', 'South'],
                             'Product': ['Econ', 'Luxe', 'Econ', 'Std', 'Std', 'Econ']})
    BaseData

    columns = ['Customer', 'Num Unique Products', 'List Unique Products']

    rows_list = []
    for name, group in BaseData.groupby('Customer'):
        RecordtoAdd = {}  # initialise an empty dict
        RecordtoAdd.update({'Customer': name})
        RecordtoAdd.update({'Num Unique Products': len(pd.unique(group['Product']))})
        RecordtoAdd.update({'List Unique Products': pd.unique(group['Product'])})

        rows_list.append(RecordtoAdd)

    AnalysedData = pd.DataFrame(rows_list)

    print('Base Data : \n', BaseData, '\n\n Analysed Data : \n', AnalysedData)

User · Answer

Make it simple. Take a list as input and append it as a row in the data frame:

    import pandas as pd

    res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))

    for i in range(5):
        res_list = list(map(int, input().split()))
        res = res.append(pd.Series(res_list, index=['lib', 'qty1', 'qty2']), ignore_index=True)

User · Answer

NEVER grow a DataFrame!

Yes, people have already explained that you should NEVER grow a DataFrame, and that you should append your data to a list and convert it to a DataFrame once at the end. But do you understand why?

Here are the most important reasons, taken from my post here:

1. It is always cheaper/faster to append to a list and create a DataFrame in one go.
2. Lists take up less memory and are a much lighter data structure to work with, append to, and remove from.
3. dtypes are automatically inferred for your data. On the flip side, creating an empty frame of NaNs will automatically make them object, which is bad.
4. An index is automatically created for you, instead of you having to take care to assign the correct index to the row you are appending.

This is The Right Way to accumulate your data:

    data = []
    for a, b, c in some_function_that_yields_data():
        data.append([a, b, c])

    df = pd.DataFrame(data, columns=['A', 'B', 'C'])

These options are horrible:

1. append or concat inside a loop

append and concat aren't inherently bad in isolation. The problem starts when you iteratively call them inside a loop: every call copies all the rows accumulated so far, so the total work grows quadratically with the number of rows.

    # Creates empty DataFrame and appends
    df = pd.DataFrame(columns=['A', 'B', 'C'])
    for a, b, c in some_function_that_yields_data():
        df = df.append({'A': a, 'B': b, 'C': c}, ignore_index=True)
        # This is equally bad:
        # df = pd.concat(
        #     [df, pd.Series({'A': a, 'B': b, 'C': c})],
        #     ignore_index=True)

2. Empty DataFrame of NaNs

Never create a DataFrame of NaNs, as the columns are initialized with object (a slow, un-vectorizable dtype).

    # Creates DataFrame of NaNs and overwrites values.
    df = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5))
    for a, b, c in some_function_that_yields_data():
        df.loc[len(df)] = [a, b, c]

The Proof is in the Pudding

Timing these methods is the fastest way to see just how much they differ in terms of their memory and utility.

Benchmarking code for reference.

It's posts like this that remind me why I'm a part of this community. People understand the importance of teaching folks to get the right answer with the right code, not the right answer with the wrong code. Now you might argue that it is not an issue to use loc or append if you're only adding a single row to your DataFrame. However, people often look to this question to add more than just one row; often the requirement is to iteratively add a row inside a loop using data that comes from a function (see related question). In that case it is important to understand that iteratively growing a DataFrame is not a good idea.
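A small sketch of the dtype point made in this answer. The generator below is a hypothetical stand-in for some_function_that_yields_data, and the column names follow the answer. It builds one frame from an accumulated list and, for comparison, creates the kind of NaN-filled frame the answer warns against:

```python
import pandas as pd

def some_function_that_yields_data():
    # Hypothetical stand-in for the data source named in the answer.
    for i in range(5):
        yield i, i * 0.5, f"row{i}"

# Accumulate plain lists, then build the frame once at the end.
data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])
df = pd.DataFrame(data, columns=["A", "B", "C"])

# A frame created with columns and an index but no data is filled with NaN.
nan_df = pd.DataFrame(columns=["A", "B", "C"], index=range(5))

print(df.dtypes)      # A is inferred as an integer dtype, B as a float dtype
print(nan_df.dtypes)  # every column is object
```

The list-built frame gets numeric dtypes for its numeric columns without any extra work, while the preallocated frame starts out as object throughout.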

User · Answer

    mycolumns = ['A', 'B']
    df = pd.DataFrame(columns=mycolumns)
    rows = [[1,2],[3,4],[5,6]]
    for row in rows:
        df.loc[len(df)] = row

User · Answer

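This will take care of adding an item to an empty DataFrame. The issue is that df.index.max() == nan for the first index:

    import math
    import pandas as pd

    df = pd.DataFrame(columns=['timeMS', 'accelX', 'accelY', 'accelZ', 'gyroX', 'gyroY', 'gyroZ'])

    # use label 0 for the first row, otherwise one past the current maximum label
    df.loc[0 if math.isnan(df.index.max()) else df.index.max() + 1] = [x for x in range(7)]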

User · Answer

If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):

    import pandas as pd
    import numpy as np
    # we know we're gonna have 5 rows of data
    numberOfRows = 5
    # create dataframe
    df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2'))

    # now fill it up row by row
    for x in np.arange(0, numberOfRows):
        # loc or iloc both work here since the index is natural numbers
        df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]

    In[23]: df
    Out[23]:
       lib  qty1  qty2
    0   -1    -1    -1
    1    0     0     0
    2   -1     0    -1
    3    0    -1     0
    4   -1     0     0

Speed comparison

    In[30]: %timeit tryThis()   # function wrapper for this answer
    In[31]: %timeit tryOther()  # function wrapper without index (see, for example, @fred)
    1000 loops, best of 3: 1.23 ms per loop
    100 loops, best of 3: 2.31 ms per loop

And, as noted in the comments, with a size of 6000 the speed difference becomes even larger:

    Increasing the size of the array (12) and the number of rows (500) makes
    the speed difference more striking: 313ms vs 2.29s

User · Answer

Here is a way to add/append a row in a pandas DataFrame:

    def add_row(df, row):
        df.loc[-1] = row
        df.index = df.index + 1
        return df.sort_index()

    add_row(df, [1,2,3])

It can be used to insert/append a row in an empty or populated pandas DataFrame.

User · Answer

If all data in your DataFrame has the same dtype, you might use a NumPy array. You can write rows directly into the predefined array and convert it to a DataFrame at the end. This seems to be even faster than converting a list of dicts.

    import time
    import pandas as pd
    import numpy as np
    from string import ascii_uppercase

    startTime = time.perf_counter()
    numcols, numrows = 5, 10000
    # preallocate the full array, then overwrite it row by row
    npdf = np.ones((numrows, numcols))
    for row in range(numrows):
        npdf[row, 0:] = np.random.randint(0, 100, (1, numcols))
    df5 = pd.DataFrame(npdf, columns=list(ascii_uppercase[:numcols]))
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numrows))
    print(df5.shape)

User · Answer

You can use a generator object to create the DataFrame, which will be more memory efficient than a list.

    import pandas as pd

    num = 10

    # Generator function to generate generator object
    def numgen_func(num):
        for i in range(num):
            yield ('name_{}'.format(i), (i*i), (i*i*i))

    # Generator expression to generate generator object
    # (it can be consumed only once and cannot be re-used)
    numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num))

    df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

To add a row to an existing DataFrame you can use the append method.

    df = df.append([{'lib': "name_20", 'qty1': 20, 'qty2': 400}])

User · Answer

You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
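A brief sketch of the pandas.concat() route, assuming the rows arrive as small frames one at a time (the column names and values are illustrative only). The pieces are collected in a list and concatenated once, rather than concatenating inside the loop:

```python
import pandas as pd

# Hypothetical chunks of rows arriving one at a time.
pieces = []
for i in range(3):
    pieces.append(pd.DataFrame({"lib": [f"name{i}"], "qty1": [i], "qty2": [i * i]}))

# A single concat call copies the data once, instead of once per iteration.
df = pd.concat(pieces, ignore_index=True)
```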

User · Answer

In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:

1. Create a list of dictionaries in which each dictionary corresponds to an input data row.
2. Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.

    rows_list = []
    for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..)

        rows_list.append(dict1)

    df = pd.DataFrame(rows_list)

User · Answer

It's been a long time, but I faced the same problem too, and found a lot of interesting answers here. So I was confused about which method to use.

When adding a lot of rows to a dataframe, I am interested in speed performance. So I tried the 4 most popular methods and checked their speed.

UPDATED IN 2019 using new versions of packages. Also updated after @FooBar's comment.

SPEED PERFORMANCE

1. Using .append (NPE's answer)
2. Using .loc (fred's answer)
3. Using .loc with preallocating (FooBar's answer)
4. Using dict and creating the DataFrame at the end (ShikharDua's answer)

Results (in secs):

    |--------------------|-------------|-------------|-------------|
    |      Approach      |  1000 rows  |  5000 rows  | 10 000 rows |
    |--------------------|-------------|-------------|-------------|
    | .append            |    0.69     |    3.39     |    6.78     |
    |--------------------|-------------|-------------|-------------|
    | .loc w/o prealloc  |    0.74     |    3.90     |    8.35     |
    |--------------------|-------------|-------------|-------------|
    | .loc with prealloc |    0.24     |    2.58     |    8.70     |
    |--------------------|-------------|-------------|-------------|
    | dict               |    0.012    |    0.046    |    0.084    |
    |--------------------|-------------|-------------|-------------|

Also thanks to @krassowski for a useful comment; I updated the code.

So I use addition through the dictionary for myself.

Code:

    import pandas as pd
    import numpy as np
    import time

    # del df1, df2, df3, df4   # only needed when re-running in the same session
    numOfRows = 1000

    # append
    startTime = time.perf_counter()
    df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
    for i in range(1, numOfRows-4):
        df1 = df1.append(dict((a, np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
    print(df1.shape)

    # .loc w/o prealloc
    startTime = time.perf_counter()
    df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
    for i in range(1, numOfRows):
        df2.loc[i] = np.random.randint(100, size=(1,5))[0]
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
    print(df2.shape)

    # .loc with prealloc
    df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'])
    startTime = time.perf_counter()
    for i in range(1, numOfRows):
        df3.loc[i] = np.random.randint(100, size=(1,5))[0]
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
    print(df3.shape)

    # dict
    startTime = time.perf_counter()
    row_list = []
    for i in range(0, 5):
        row_list.append(dict((a, np.random.randint(100)) for a in ['A','B','C','D','E']))
    for i in range(1, numOfRows-4):
        dict1 = dict((a, np.random.randint(100)) for a in ['A','B','C','D','E'])
        row_list.append(dict1)

    df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
    print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
    print(df4.shape)

P.S. I believe my implementation isn't perfect, and maybe there is some room for optimization.

User · Answer

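pandas.DataFrame.append

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) -> 'DataFrame'

    df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
    df.append(df2)

With ignore_index set to True:

    df.append(df2, ignore_index=True)

This method was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls above only run on older versions.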

User · Answer

For the sake of a Pythonic way, here is my answer:

    res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
    res = res.append([{'qty1': 10.0}], ignore_index=True)
    print(res.head())

       lib  qty1  qty2
    0  NaN  10.0   NaN

User · Answer

All you need is loc[df.shape[0]] or loc[len(df)]:

    # Assuming your df has 4 columns (str, int, str, bool)
    df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False]

or

    df.loc[len(df)] = ['col1Value', 100, 'col3Value', False]

User · Answer

You can concatenate two DataFrames for this. I came across this problem when adding a new row to an existing DataFrame with a character index (not numeric). So, I put the data for the new row in a dict and the index in a list:

    new_dict = {put input for new row here}
    new_list = [put your index here]

    new_df = pd.DataFrame(data=new_dict, index=new_list)

    df = pd.concat([existing_df, new_df])

User · Answer

You can use df.loc[i], where the row with index i will be what you specify it to be in the dataframe.

    >>> import pandas as pd
    >>> from numpy.random import randint

    >>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
    >>> for i in range(5):
    >>>     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

    >>> df
         lib qty1 qty2
    0  name0    3    3
    1  name1    2    4
    2  name2    2    8
    3  name3    2    1
    4  name4    9    6

User · Answer

We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and creating the DataFrame at the end. He found the latter to be the fastest by far. But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly, in that this method performs similarly to the one using dict. So we should more often take the use of df.values[subscript] = … into consideration. However, note that .values takes a zero-based subscript, which may be different from the DataFrame.index.
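Whether a write through df.values actually reaches the frame can depend on the pandas version and on all columns sharing one dtype, so the sketch below swaps in a closely related technique rather than reproducing this answer's exact code: it fills a preallocated NumPy array by zero-based position and wraps it in a DataFrame once at the end. The labels and values are hypothetical, chosen to show how position and index label can differ:

```python
import numpy as np
import pandas as pd

n_rows = 4
labels = [10, 20, 30, 40]   # hypothetical index labels that are not zero-based
columns = ["A", "B", "C"]

# Preallocate the storage, then write each row by zero-based position.
buf = np.empty((n_rows, len(columns)), dtype=np.int64)
for pos in range(n_rows):
    buf[pos] = [pos, pos * 2, pos * pos]

# Wrap the filled array once; the labels are attached only at this point.
df = pd.DataFrame(buf, index=labels, columns=columns)

# Position 2 in the buffer is the row labelled 30 in the frame.
print(df.loc[30].tolist())
```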
