In R, how do you add a new row to a data frame once the data frame has already been initialized?
So far I have this:
df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")
#I am trying to add "hola" and "ciao" as a new row
de <- data.frame("hola", "ciao")
merge(df, de) # Adds to the same row as new columns
# Unfortunately, I couldn't find an rbind() solution that wouldn't give me an error
Any help would be appreciated
To formalize what someone else used setNames for:
add_row <- function(original_data, new_vals_list){
# appends row to dataset while assuming new vals are ordered and classed appropriately.
# new_vals must be a list not a single vector.
rbind(
original_data,
setNames(data.frame(new_vals_list), colnames(original_data))
)
}
It preserves each column's class where the new value can be coerced to it, and otherwise passes along R's own warnings or errors.
m <- mtcars[ ,1:3]
m$cyl <- as.factor(m$cyl)
str(m)
#'data.frame': 32 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
The factor is preserved when adding 4, even though it was passed as a numeric:
str(add_row(m, list(20,4,160)))
#'data.frame': 33 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
Passing a value that is not one of the existing levels (4, 6, 8) does not stop with an error: R warns that the factor level is invalid and stores NA in that cell.
str(add_row(m, list(20,3,160)))
# 'data.frame': 33 obs. of 3 variables:
# $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num 160 160 108 258 360 ...
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 3) :
invalid factor level, NA generated
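Applied to the data from the question, the helper above could be used as sketched below. The name append_row is hypothetical (chosen here to avoid clashing with tibble's add_row()), and stringsAsFactors = FALSE is set explicitly as an assumption so the result is the same on R versions before 4.0.0.

```r
# Hypothetical helper: same idea as the function above, under a different name.
append_row <- function(original_data, new_vals_list) {
  new_row <- data.frame(new_vals_list, stringsAsFactors = FALSE)
  # Reuse the original column names so rbind() can match the columns.
  rbind(original_data, setNames(new_row, colnames(original_data)))
}

df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)
df <- append_row(df, list("hola", "ciao"))
df
```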
There's now add_row() from the tibble package, which is also attached when you load the tidyverse.
library(tidyverse)
df %>% add_row(hello = "hola", goodbye = "ciao")
Unspecified columns get an NA.
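A slightly fuller sketch of tibble's add_row(), assuming the tibble package is installed. The volume column and the sample values are made up for illustration; .before is an add_row() argument that controls where the new row is placed.

```r
library(tibble)

df <- tibble(hello = "hi", goodbye = "bye", volume = 1)

# volume is not specified for the new row, so it is filled with NA.
out <- add_row(df, hello = "hola", goodbye = "ciao")

# .before = 1 inserts the new row at the top instead of appending it.
out2 <- add_row(out, hello = "hallo", goodbye = "auf wiedersehen",
                volume = 20.2, .before = 1)
out2
```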
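I needed to add stringsAsFactors=FALSE when creating the dataframe. (Before R 4.0.0, data.frame() turned strings into factors by default; from 4.0.0 on the default is FALSE.) Without it: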
> df <- data.frame("hello"= character(0), "goodbye"=character(0))
> df
[1] hello goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "hi") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "bye") :
invalid factor level, NA generated
> df
hello goodbye
1 <NA> <NA>
>
With stringsAsFactors=FALSE:
> df <- data.frame("hello"= character(0), "goodbye"=character(0), stringsAsFactors=FALSE)
> df
[1] hello goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
> df[nrow(df) + 1,] = list("hola","ciao")
> df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen")
> df
hello goodbye
1 hi bye
2 hola ciao
3 hallo auf wiedersehen
>
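If the data frame already exists with factor columns (for example, it comes from code you do not control), one possible workaround is to convert those columns to character before appending. This is only a sketch; the factor columns are created deliberately here with stringsAsFactors = TRUE to reproduce the old default.

```r
# Simulate a data frame whose string columns were created as factors.
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = TRUE)

# Convert every factor column to character so new values are accepted.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

# Appending a row no longer triggers "invalid factor level" warnings.
df[nrow(df) + 1, ] <- list("hola", "ciao")
df
```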
Let's make it simple:
df[nrow(df) + 1,] = c("v1","v2")
Not terribly elegant, but:
data.frame(rbind(as.matrix(df), as.matrix(de)))
From the documentation of the rbind function:
For rbind column names are taken from the first argument with appropriate names: colnames for a matrix...
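For data frames, by contrast, rbind() matches columns by name, which is likely why the attempt in the question failed: de was created without names, so it got auto-generated ones. A minimal sketch, with stringsAsFactors = FALSE set explicitly as an assumption so it behaves the same on older R versions:

```r
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)
de <- data.frame("hola", "ciao", stringsAsFactors = FALSE)  # auto-generated names

# With mismatched column names, rbind() on data frames stops with an error.
failed <- inherits(try(rbind(df, de), silent = TRUE), "try-error")

# Once the names agree, the rows bind as expected.
names(de) <- names(df)
combined <- rbind(df, de)
combined
```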
I like list instead of c because it handles mixed data types better. Adding an additional column to the original poster's question:
#Create an empty data frame
df <- data.frame(hello=character(), goodbye=character(), volume=double())
de <- list(hello="hi", goodbye="bye", volume=3.0)
df = rbind(df,de, stringsAsFactors=FALSE)
de <- list(hello="hola", goodbye="ciao", volume=13.1)
df = rbind(df,de, stringsAsFactors=FALSE)
Note that some additional control is required if the string/factor conversion is important.
Or using the original variables with the solution from MatheusAraujo/Ytsen de Boer:
df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen", volume=20.2)
Note that this solution doesn't work well with the strings unless there is existing data in the dataframe.
Make certain to specify stringsAsFactors=FALSE when creating the dataframe:
> rm(list=ls())
> trigonometry <- data.frame(character(0), numeric(0), stringsAsFactors=FALSE)
> colnames(trigonometry) <- c("theta", "sin.theta")
> trigonometry
[1] theta sin.theta
<0 rows> (or 0-length row.names)
> trigonometry[nrow(trigonometry) + 1, ] <- list("0", sin(0))
> trigonometry[nrow(trigonometry) + 1, ] <- list("pi/2", sin(pi/2))
> trigonometry
theta sin.theta
1 0 0
2 pi/2 1
> typeof(trigonometry)
[1] "list"
> class(trigonometry)
[1] "data.frame"
Failing to use stringsAsFactors=FALSE when creating the dataframe will result in the following warning (and an NA in place of the value) when attempting to add the new row:
> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "0") :
invalid factor level, NA generated
Or, as inspired by @MatheusAraujo:
df[nrow(df) + 1,] = list("v1","v2")
This would allow for mixed data types.
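The difference between c() and list() can be seen in a small sketch. The column names and values are made up for illustration: c() coerces all of its elements to one common type, so a numeric column silently becomes character, while list() keeps each value's own type.

```r
df <- data.frame(name = "a", value = 1.5, stringsAsFactors = FALSE)

# c("b", 2.5) is the character vector c("b", "2.5").
with_c <- df
with_c[nrow(with_c) + 1, ] <- c("b", 2.5)

# list("b", 2.5) keeps the string and the number separate.
with_list <- df
with_list[nrow(with_list) + 1, ] <- list("b", 2.5)

class(with_c$value)     # the column has been coerced to character
class(with_list$value)  # the column is still numeric
```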
There is a simpler way to append a record from one dataframe to another IF you know that the two dataframes share the same columns and types. To append one row from xx to yy, just do the following, where i is the index of the row in xx:
yy[nrow(yy)+1,] <- xx[i,]
Simple as that. No messy binds. If you need to append all of xx to yy, then either call a loop or take advantage of R's sequence abilities and do this:
yy[(nrow(yy)+1):(nrow(yy)+nrow(xx)),] <- xx[1:nrow(xx),]
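A small worked sketch of both cases, using made-up data frames xx and yy with identical columns and types. It also compares the index-assignment approach with a plain rbind() to show that the values come out the same.

```r
yy <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
xx <- data.frame(id = 3:4, label = c("c", "d"), stringsAsFactors = FALSE)

# Append only the second row of xx.
one <- yy
one[nrow(one) + 1, ] <- xx[2, ]

# Append every row of xx by index assignment ...
all_idx <- yy
all_idx[nrow(all_idx) + seq_len(nrow(xx)), ] <- xx

# ... or with rbind(), which yields the same values here.
all_bind <- rbind(yy, xx)
```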
If you want to make an empty data frame and add contents in a loop, the following may help:
# Number of students in class
student.count <- 36
# Gather data about the students
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)
# Create empty data frame
student.data <- data.frame()
# Populate the data frame using a for loop
for (i in 1 : student.count) {
# Get the row data
age <- student.age[i]
gender <- student.gender[i]
marks <- student.marks[i]
# Populate the row
new.row <- data.frame(age = age, gender = gender, marks = marks)
# Add the row
student.data <- rbind(student.data, new.row)
}
# Print the data frame
student.data
Hope it helps :)
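Calling rbind() inside a loop copies the growing data frame on every iteration, which can get slow for many rows. One common alternative, sketched here with the same variable names as above, is to collect the rows in a pre-allocated list and bind them once at the end. set.seed() and stringsAsFactors = FALSE are additions for reproducibility and consistency across R versions.

```r
set.seed(1)
student.count <- 36
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c("male", "female"), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# Pre-allocate a list with one slot per row.
rows <- vector("list", student.count)
for (i in seq_len(student.count)) {
  rows[[i]] <- data.frame(age = student.age[i],
                          gender = student.gender[i],
                          marks = student.marks[i],
                          stringsAsFactors = FALSE)
}

# Bind all rows in a single call.
student.data <- do.call(rbind, rows)
```

When all values are already available as vectors, as they are here, data.frame(age = student.age, gender = student.gender, marks = student.marks) builds the same table without any loop.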
Source: Stackoverflow.com