[python] Append an empty row in dataframe using pandas

I am trying to append an empty row at the end of dataframe but unable to do so, even trying to understand how pandas work with append function and still not getting it.

Here's the code:

import pandas as pd

excel_names = ["ARMANI+EMPORIO+AR0143-book.xlsx"]
excels = [pd.ExcelFile(name) for name in excel_names]
frames = [x.parse(x.sheet_names[0], header=None,index_col=None).dropna(how='all') for x in excels]
for f in frames:
    f.append(0, float('NaN'))
    f.append(2, float('NaN'))

There are two columns and random number of row.

with "print f" in for loop i Get this:

                             0                 1
0                   Brand Name    Emporio Armani
2                 Model number            AR0143
4                  Part Number            AR0143
6                   Item Shape       Rectangular
8   Dial Window Material Type           Mineral
10               Display Type          Analogue
12                 Clasp Type            Buckle
14               Case Material   Stainless steel
16              Case Diameter    31 millimetres
18               Band Material           Leather
20                 Band Length  Women's Standard
22                 Band Colour             Black
24                 Dial Colour             Black
26            Special Features       second-hand
28                    Movement            Quartz

This question is related to python python-2.7 pandas

The answer is


Append "empty" row to data frame and fill selected cells:

Generate empty data frame (no rows just columns a and b):

import pandas as pd    
col_names =  ["a","b"]
df  = pd.DataFrame(columns = col_names)

Append empty row at the end of the data frame:

df = df.append(pd.Series(), ignore_index = True)

Now fill the empty cell at the end (len(df)-1) of the data frame in column a:

df.loc[[len(df)-1],'a'] = 123

Result:

     a    b
0  123  NaN

And of course one can iterate over the rows and fill cells:

col_names =  ["a","b"]
df  = pd.DataFrame(columns = col_names)
for x in range(0,5):
    df = df.append(pd.Series(), ignore_index = True)
    df.loc[[len(df)-1],'a'] = 123

Result:

     a    b
0  123  NaN
1  123  NaN
2  123  NaN
3  123  NaN
4  123  NaN

You can also use:

your_dataframe.insert(loc=0, value=np.nan, column="")

where loc is your empty row index.


Assuming df is your dataframe,

df_prime = pd.concat([df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)], ignore_index=True)

where df_prime equals df with an additional last row of NaN's.

Note that pd.concat is slow so if you need this functionality in a loop, it's best to avoid using it. In that case, assuming your index is incremental, you can use

df.loc[df.iloc[-1].name + 1,:] = np.nan

You can add a new series, and name it at the same time. The name will be the index of the new row, and all the values will automatically be NaN.

df.append(pd.Series(name='Afterthought'))

Assuming your df.index is sorted you can use:

df.loc[df.index.max() + 1] = None

It handles well different indexes and column types.

[EDIT] it works with pd.DatetimeIndex if there is a constant frequency, otherwise we must specify the new index exactly e.g:

df.loc[df.index.max() + pd.Timedelta(milliseconds=1)] = None

long example:

df = pd.DataFrame([[pd.Timestamp(12432423), 23, 'text_field']], 
                    columns=["timestamp", "speed", "text"],
                    index=pd.DatetimeIndex(start='2111-11-11',freq='ms', periods=1))
df.info()

<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 1 entries, 2111-11-11 to 2111-11-11 Freq: L Data columns (total 3 columns): timestamp 1 non-null datetime64[ns] speed 1 non-null int64 text 1 non-null object dtypes: datetime64[ns](1), int64(1), object(1) memory usage: 32.0+ bytes

df.loc[df.index.max() + 1] = None
df.info()

<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 2 entries, 2111-11-11 00:00:00 to 2111-11-11 00:00:00.001000 Data columns (total 3 columns): timestamp 1 non-null datetime64[ns] speed 1 non-null float64 text 1 non-null object dtypes: datetime64[ns](1), float64(1), object(1) memory usage: 64.0+ bytes

df.head()

                            timestamp                   speed      text
2111-11-11 00:00:00.000 1970-01-01 00:00:00.012432423   23.0    text_field
2111-11-11 00:00:00.001 NaT NaN NaN

You can add it by appending a Series to the dataframe as follows. I am assuming by blank you mean you want to add a row containing only "Nan". You can first create a Series object with Nan. Make sure you specify the columns while defining 'Series' object in the -Index parameter. The you can append it to the DF. Hope it helps!

from numpy import nan as Nan
import pandas as pd

>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                     index=[0, 1, 2, 3])

>>> s2 = pd.Series([Nan,Nan,Nan,Nan], index=['A', 'B', 'C', 'D'])
>>> result = df1.append(s2)
>>> result
     A    B    C    D
0   A0   B0   C0   D0
1   A1   B1   C1   D1
2   A2   B2   C2   D2
3   A3   B3   C3   D3
4  NaN  NaN  NaN  NaN

The code below worked for me.

df.append(pd.Series([np.nan]), ignore_index = True)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float