[r] Add new row to dataframe, at specific row-index, not appended?

The following code combines a vector with a dataframe:

newrow = c(1:4)
existingDF = rbind(existingDF,newrow)

However this code always inserts the new row at the end of the dataframe.

How can I insert the row at a specified point within the dataframe? For example, lets say the dataframe has 20 rows, how can I insert the new row between rows 10 and 11?

This question is related to r dataframe insert

The answer is


Here's a solution that avoids the (often slow) rbind call:

existingDF <- as.data.frame(matrix(seq(20),nrow=5,ncol=4))
r <- 3
newrow <- seq(4)
insertRow <- function(existingDF, newrow, r) {
  existingDF[seq(r+1,nrow(existingDF)+1),] <- existingDF[seq(r,nrow(existingDF)),]
  existingDF[r,] <- newrow
  existingDF
}

> insertRow(existingDF, newrow, r)
  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20

If speed is less important than clarity, then @Simon's solution works well:

existingDF <- rbind(existingDF[1:r,],newrow,existingDF[-(1:r),])
> existingDF
   V1 V2 V3 V4
1   1  6 11 16
2   2  7 12 17
3   3  8 13 18
4   1  2  3  4
41  4  9 14 19
5   5 10 15 20

(Note we index r differently).

And finally, benchmarks:

library(microbenchmark)
microbenchmark(
  rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
  insertRow(existingDF,newrow,r)
)

Unit: microseconds
                                                    expr     min       lq   median       uq       max
1                       insertRow(existingDF, newrow, r) 660.131 678.3675 695.5515 725.2775   928.299
2 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 801.161 831.7730 854.6320 881.6560 10641.417

Benchmarks

As @MatthewDowle always points out to me, benchmarks need to be examined for the scaling as the size of the problem increases. Here we go then:

benchmarkInsertionSolutions <- function(nrow=5,ncol=4) {
  existingDF <- as.data.frame(matrix(seq(nrow*ncol),nrow=nrow,ncol=ncol))
  r <- 3 # Row to insert into
  newrow <- seq(ncol)
  m <- microbenchmark(
   rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
   insertRow(existingDF,newrow,r),
   insertRow2(existingDF,newrow,r)
  )
  # Now return the median times
  mediansBy <- by(m$time,m$expr, FUN=median)
  res <- as.numeric(mediansBy)
  names(res) <- names(mediansBy)
  res
}
nrows <- 5*10^(0:5)
benchmarks <- sapply(nrows,benchmarkInsertionSolutions)
colnames(benchmarks) <- as.character(nrows)
ggplot( melt(benchmarks), aes(x=Var2,y=value,colour=Var1) ) + geom_line() + scale_x_log10() + scale_y_log10()

@Roland's solution scales quite well, even with the call to rbind:

                                                              5       50     500    5000    50000     5e+05
insertRow2(existingDF, newrow, r)                      549861.5 579579.0  789452 2512926 46994560 414790214
insertRow(existingDF, newrow, r)                       895401.0 905318.5 1168201 2603926 39765358 392904851
rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 787218.0 814979.0 1263886 5591880 63351247 829650894

Plotted on a linear scale:

linear

And a log-log scale:

log-log


You should try dplyr package

library(dplyr)
a <- data.frame(A = c(1, 2, 3, 4),
               B = c(11, 12, 13, 14))


system.time({
for (i in 50:1000) {
    b <- data.frame(A = i, B = i * i)
    a <- bind_rows(a, b)
}

})

Output

   user  system elapsed 
   0.25    0.00    0.25

In contrast with using rbind function

a <- data.frame(A = c(1, 2, 3, 4),
                B = c(11, 12, 13, 14))


system.time({
    for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- rbind(a, b)
    }

})

Output

   user  system elapsed 
   0.49    0.00    0.49 

There is some performance gain.


for example you want to add rows of variable 2 to variable 1 of a data named "edges" just do it like this

allEdges <- data.frame(c(edges$V1,edges$V2))

The .before argument in dplyr::add_row can be used to specify the row.

dplyr::add_row(
  cars,
  speed = 0,
  dist = 0,
  .before = 3
)
#>    speed dist
#> 1      4    2
#> 2      4   10
#> 3      0    0
#> 4      7    4
#> 5      7   22
#> 6      8   16
#> ...

insertRow2 <- function(existingDF, newrow, r) {
  existingDF <- rbind(existingDF,newrow)
  existingDF <- existingDF[order(c(1:(nrow(existingDF)-1),r-0.5)),]
  row.names(existingDF) <- 1:nrow(existingDF)
  return(existingDF)  
}

insertRow2(existingDF,newrow,r)

  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20

microbenchmark(
+   rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
+   insertRow(existingDF,newrow,r),
+   insertRow2(existingDF,newrow,r)
+ )
Unit: microseconds
                                                    expr     min       lq   median       uq      max
1                       insertRow(existingDF, newrow, r) 513.157 525.6730 531.8715 544.4575 1409.553
2                      insertRow2(existingDF, newrow, r) 430.664 443.9010 450.0570 461.3415  499.988
3 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 606.822 625.2485 633.3710 653.1500 1489.216

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to insert

How to insert current datetime in postgresql insert query How to add element in Python to the end of list using list.insert? Python pandas insert list into a cell Field 'id' doesn't have a default value? Insert a row to pandas dataframe Insert at first position of a list in Python How can INSERT INTO a table 300 times within a loop in SQL? How to refresh or show immediately in datagridview after inserting? Insert node at a certain position in a linked list C++ select from one table, insert into another table oracle sql query