I have the following for loop:
for i in links:
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
Each dataframe created this way shares most of its columns with the others, but not all of them, and each has just one row. What I need to do is build a single dataframe that contains all the distinct columns and the row from each dataframe produced by the for loop.
I tried pandas concat and similar functions, but nothing seemed to work. Any ideas? Thanks.
There are two reasons you might append rows in a loop: (1) to add to an existing df, or (2) to create a new df.
To create a new df, I think it's well documented that you should either build your data as a list and then create the data frame:
cols = ['c1', 'c2', 'c3']
lst = []
for a in range(2):
    lst.append([1, 2, 3])
df1 = pd.DataFrame(lst, columns=cols)
df1
Out[3]:
   c1  c2  c3
0   1   2   3
1   1   2   3
Or create the dataframe with an index up front and then fill it in:
cols = ['c1', 'c2', 'c3']
df2 = pd.DataFrame(columns=cols, index=range(2))
for a in range(2):
    # Use a single .loc[row, column] call; chained df2.loc[a].c1 = ... may assign to a copy
    df2.loc[a, 'c1'] = 4
    df2.loc[a, 'c2'] = 5
    df2.loc[a, 'c3'] = 6
df2
Out[4]:
  c1 c2 c3
0  4  5  6
1  4  5  6
If you want to add to an existing dataframe, you can use either method above and then append the dataframes together (with or without keeping the index):
# DataFrame.append requires pandas < 2.0; it was removed in pandas 2.0
df3 = df2.append(df1, ignore_index=True)
df3
Out[6]:
  c1 c2 c3
0  4  5  6
1  4  5  6
2  1  2  3
3  1  2  3
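DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same result comes from pd.concat. A minimal sketch, rebuilding df1 and df2 with the values from the examples above (constructing df2 directly from a list is a shortcut assumed here for brevity):

```python
import pandas as pd

cols = ['c1', 'c2', 'c3']
df1 = pd.DataFrame([[1, 2, 3], [1, 2, 3]], columns=cols)
df2 = pd.DataFrame([[4, 5, 6], [4, 5, 6]], columns=cols)

# pd.concat stacks the frames row-wise; ignore_index=True renumbers the rows 0..n-1
df3 = pd.concat([df2, df1], ignore_index=True)
print(df3)
```

Leaving out ignore_index=True keeps each frame's original index, which gives the repeated 0, 1, 0, 1 labels.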
Or you can create a list of dictionary entries and append those, as in the other answers:
lst_dict = []
for a in range(2):
    lst_dict.append({'c1': 2, 'c2': 2, 'c3': 3})
# DataFrame.append requires pandas < 2.0
df4 = df1.append(lst_dict)
df4
Out[7]:
   c1  c2  c3
0   1   2   3
1   1   2   3
0   2   2   3
1   2   2   3
You can also build each dictionary with dict(zip(cols, vals)):
lst_dict = []
for a in range(2):
    vals = [7, 8, 9]
    lst_dict.append(dict(zip(cols, vals)))
df5 = df1.append(lst_dict)
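The list-of-dictionaries approach also covers the case in the question, where each record has mostly, but not entirely, the same keys: pd.DataFrame uses the union of the keys as columns and fills the gaps with NaN. A minimal sketch, with made-up records standing in for the parsed JSON responses (the keys and values are hypothetical):

```python
import pandas as pd

# Hypothetical records standing in for the json.loads(...) result of each link
records = [
    {'id': 1, 'name': 'a', 'price': 10.0},
    {'id': 2, 'name': 'b', 'color': 'red'},
    {'id': 3, 'price': 7.5, 'color': 'blue'},
]

rows = []
for rec in records:
    rows.append(rec)  # collect plain dicts; no DataFrame is built per iteration

# Build the frame once: columns are the union of all keys, missing values become NaN
df = pd.DataFrame(rows)
print(df)
```

Each dictionary becomes one row, so no transpose or column renaming is needed.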
First, create an empty DataFrame with the column names. Then, inside the for loop, define a dictionary (one row) with the data to append:
df = pd.DataFrame(columns=['A'])
for i in range(5):
    # DataFrame.append requires pandas < 2.0
    df = df.append({'A': i}, ignore_index=True)
df
   A
0  0
1  1
2  2
3  3
4  4
If you want to add a row with more columns, the code will look like this:
df = pd.DataFrame(columns=['A', 'B', 'C'])
for i in range(5):
    df = df.append({'A': i,
                    'B': i * 2,
                    'C': i * 3},
                   ignore_index=True)
df
   A  B   C
0  0  0   0
1  1  2   3
2  2  4   6
3  3  6   9
4  4  8  12
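Each df.append call returns a new DataFrame and copies all the existing rows, so the loop above does more work as the frame grows, and the method is unavailable from pandas 2.0 onward. A minimal sketch of the same A/B/C table that collects the row dictionaries first and constructs the frame once:

```python
import pandas as pd

rows = []
for i in range(5):
    # One dictionary per row; appending to a Python list is cheap
    rows.append({'A': i, 'B': i * 2, 'C': i * 3})

# Build the DataFrame a single time; columns= fixes the column order
df = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(df)
```

Built this way, the three columns get an integer dtype.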
A more compact and perhaps more efficient way would be:
cols = ['frame', 'count']
N = 4
dat = pd.DataFrame(columns=cols)
for i in range(N):
    dat = dat.append({'frame': str(i), 'count': i}, ignore_index=True)
The output would be:
>>> dat
  frame count
0     0     0
1     1     1
2     2     2
3     3     3
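When the values for every row are known up front, the loop can be skipped entirely by passing whole columns to the constructor. A minimal sketch that produces the same frame/count table:

```python
import pandas as pd

N = 4
dat = pd.DataFrame({
    'frame': [str(i) for i in range(N)],  # strings, as in the loop version
    'count': list(range(N)),              # integers
})
print(dat)
```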
I created a data frame in a for loop with the help of a temporary empty data frame, because on every iteration of the loop a new data frame is created, overwriting the contents of the previous iteration.
Hence I need to move the contents of each new data frame into the empty data frame that was created beforehand. It's as simple as that. We just need to use the .append function as shown below:
temp_df = pd.DataFrame()  # temporary empty dataframe
for sent in Sentences:
    # New dataframe holding the tokenized words of the current sentence
    New_df = pd.DataFrame({'words': sent.words})
    # Move the new dataframe's rows into the temporary dataframe (DataFrame.append requires pandas < 2.0)
    temp_df = temp_df.append(New_df, ignore_index=True)
Outside the for loop, you can copy the contents of the temporary data frame into the master data frame and then delete the temporary data frame if you no longer need it.
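On pandas 2.0 and later, where DataFrame.append no longer exists, the same accumulation can be written by collecting the per-iteration frames in a list and concatenating once after the loop. A minimal sketch; the Sentence class and the sample data are hypothetical stand-ins for the tokenized sentence objects used above:

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class Sentence:
    """Hypothetical stand-in for an object exposing tokenized words."""
    words: List[str]


sentences = [Sentence(['hello', 'world']), Sentence(['pandas', 'is', 'handy'])]

frames = []
for sent in sentences:
    # One small dataframe per sentence, kept in a list instead of appended
    frames.append(pd.DataFrame({'words': sent.words}))

# Concatenate once, after the loop; ignore_index=True renumbers the rows
master_df = pd.concat(frames, ignore_index=True)
print(master_df)
```

This avoids the temporary accumulator dataframe and copies the rows only once.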
Source: Stackoverflow.com