[r] Repeat rows of a data.frame

I want to repeat the rows of a data.frame, each N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) keeping the data types of the columns.

Example for N = 2:

                        A B   C
  A B   C             1 j i 100
1 j i 100     -->     2 j i 100
2 K P 101             3 K P 101
                      4 K P 101

So, each row is repeated 2 times and characters remain characters, factors remain factors, numerics remain numerics, ...

My first attempt used apply: apply(old.df, 2, function(co) rep(co, each = N)), but this one transforms my values to characters and I get:

     A   B   C    
[1,] "j" "i" "100"
[2,] "j" "i" "100"
[3,] "K" "P" "101"
[4,] "K" "P" "101"

This question is related to r dataframe rows repeat

The answer is


try using for example

N=2
rep(1:4, each = N) 

as an index


A clean dplyr solution, taken from here

library(dplyr)
df <- tibble(x = 1:2, y = c("a", "b"))
df %>% slice(rep(1:n(), each = 2))

Another way to do this would to first get row indices, append extra copies of the df, and then order by the indices:

df$index = 1:nrow(df)
df = rbind(df,df)
df = df[order(df$index),][,-ncol(df)]

Although the other solutions may be shorter, this method may be more advantageous in certain situations.


For reference and adding to answers citing mefa, it might worth to take a look on the implementation of mefa::rep.data.frame() in case you don't want to include the whole package:

> data <- data.frame(a=letters[1:3], b=letters[4:6])
> data
  a b
1 a d
2 b e
3 c f
> as.data.frame(lapply(data, rep, 2))
  a b
1 a d
2 b e
3 c f
4 a d
5 b e
6 c f

Adding to what @dardisco mentioned about mefa::rep.data.frame(), it's very flexible.

You can either repeat each row N times:

rep(df, each=N)

or repeat the entire dataframe N times (think: like when you recycle a vectorized argument)

rep(df, times=N)

Two thumbs up for mefa! I had never heard of it until now and I had to write manual code to do this.


If you can repeat the whole thing, or subset it first then repeat that, then this similar question may be helpful. Once again:

library(mefa)
rep(mtcars,10) 

or simply

mefa:::rep.data.frame(mtcars)

The rep.row function seems to sometimes make lists for columns, which leads to bad memory hijinks. I have written the following which seems to work well:

library(plyr)
rep.row <- function(r, n){
  colwise(function(x) rep(x, n))(r)
}

My solution similar as mefa:::rep.data.frame, but a little faster and cares about row names:

rep.data.frame <- function(x, times) {
    rnames <- attr(x, "row.names")
    x <- lapply(x, rep.int, times = times)
    class(x) <- "data.frame"
    if (!is.numeric(rnames))
        attr(x, "row.names") <- make.unique(rep.int(rnames, times))
    else
        attr(x, "row.names") <- .set_row_names(length(rnames) * times)
    x
}

Compare solutions:

library(Lahman)
library(microbenchmark)
microbenchmark(
    mefa:::rep.data.frame(Batting, 10),
    rep.data.frame(Batting, 10),
    Batting[rep.int(seq_len(nrow(Batting)), 10), ],
    times = 10
)
#> Unit: milliseconds
#>                                            expr       min       lq     mean   median        uq       max neval cld
#>              mefa:::rep.data.frame(Batting, 10) 127.77786 135.3480 198.0240 148.1749  278.1066  356.3210    10  a 
#>                     rep.data.frame(Batting, 10)  79.70335  82.8165 134.0974  87.2587  191.1713  307.4567    10  a 
#>  Batting[rep.int(seq_len(nrow(Batting)), 10), ] 895.73750 922.7059 981.8891 956.3463 1018.2411 1127.3927    10   b

There is a lovely vectorized solution that repeats only certain rows n-times each, possible for example by adding an ntimes column to your data frame:

  A B   C ntimes
1 j i 100      2
2 K P 101      4
3 Z Z 102      1

Method:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2,4,1))
df <- as.data.frame(lapply(df, rep, df$ntimes))

Result:

  A B   C ntimes
1 Z Z 102      1
2 j i 100      2
3 j i 100      2
4 K P 101      4
5 K P 101      4
6 K P 101      4
7 K P 101      4

This is very similar to Josh O'Brien and Mark Miller's method:

df[rep(seq_len(nrow(df)), df$ntimes),]

However, that method appears quite a bit slower:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2000,3000,4000))

microbenchmark::microbenchmark(
  df[rep(seq_len(nrow(df)), df$ntimes),],
  as.data.frame(lapply(df, rep, df$ntimes)),
  times = 10
)

Result:

Unit: microseconds
                                      expr      min       lq      mean   median       uq      max neval
   df[rep(seq_len(nrow(df)), df$ntimes), ] 3563.113 3586.873 3683.7790 3613.702 3657.063 4326.757    10
 as.data.frame(lapply(df, rep, df$ntimes))  625.552  654.638  676.4067  668.094  681.929  799.893    10

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to rows

SQL count rows in a table Converting rows into columns and columns into rows using R Delete rows containing specific strings in R How to append rows to an R data frame Excel Create Collapsible Indented Row Hierarchies Excel CSV. file with more than 1,048,576 rows of data Find which rows have different values for a given column in Teradata SQL More than 1 row in <Input type="textarea" /> Update multiple rows with different values in a single SQL query Repeat rows of a data.frame

Examples related to repeat

Create an array with same element repeated multiple times Fastest way to count number of occurrences in a Python list Repeat rows of a data.frame make image( not background img) in div repeat? Create sequence of repeated values, in sequence? Finding repeated words on a string and counting the repetitions do-while loop in R Repeat string to certain length Repeat a string in JavaScript a number of times