[r] How to convert a huge list-of-vector to a matrix more efficiently?

I have a list of length 130,000 where each element is a character vector of length 110. I would like to convert this list to a matrix with dimension 1,430,000*10. How can I do it more efficiently?\ My code is :

output=NULL
for(i in 1:length(z)) {
 output=rbind(output,
              matrix(z[[i]],ncol=10,byrow=TRUE))
}

This question is related to r list matrix performance

The answer is


you can use as.matrix as below:

output <- as.matrix(z)

I think you want

output <- do.call(rbind,lapply(z,matrix,ncol=10,byrow=TRUE))

i.e. combining @BlueMagister's use of do.call(rbind,...) with an lapply statement to convert the individual list elements into 11*10 matrices ...

Benchmarks (showing @flodel's unlist solution is 5x faster than mine, and 230x faster than the original approach ...)

n <- 1000
z <- replicate(n,matrix(1:110,ncol=10,byrow=TRUE),simplify=FALSE)
library(rbenchmark)
origfn <- function(z) {
    output <- NULL 
    for(i in 1:length(z))
        output<- rbind(output,matrix(z[[i]],ncol=10,byrow=TRUE))
}
rbindfn <- function(z) do.call(rbind,lapply(z,matrix,ncol=10,byrow=TRUE))
unlistfn <- function(z) matrix(unlist(z), ncol = 10, byrow = TRUE)

##          test replications elapsed relative user.self sys.self 
## 1   origfn(z)          100  36.467  230.804    34.834    1.540  
## 2  rbindfn(z)          100   0.713    4.513     0.708    0.012 
## 3 unlistfn(z)          100   0.158    1.000     0.144    0.008 

If this scales appropriately (i.e. you don't run into memory problems), the full problem would take about 130*0.2 seconds = 26 seconds on a comparable machine (I did this on a 2-year-old MacBook Pro).


You can also use,

output <- as.matrix(as.data.frame(z))

The memory usage is very similar to

output <- matrix(unlist(z), ncol = 10, byrow = TRUE)

Which can be verified, with mem_changed() from library(pryr).


It would help to have sample information about your output. Recursively using rbind on bigger and bigger things is not recommended. My first guess at something that would help you:

z <- list(1:3,4:6,7:9)
do.call(rbind,z)

See a related question for more efficiency, if needed.


Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to list

Convert List to Pandas Dataframe Column Python find elements in one list that are not in the other Sorting a list with stream.sorted() in Java Python Loop: List Index Out of Range How to combine two lists in R How do I multiply each element in a list by a number? Save a list to a .txt file The most efficient way to remove first N elements in a list? TypeError: list indices must be integers or slices, not str Parse JSON String into List<string>

Examples related to matrix

How to get element-wise matrix multiplication (Hadamard product) in numpy? How can I plot a confusion matrix? Error: stray '\240' in program What does the error "arguments imply differing number of rows: x, y" mean? How to input matrix (2D list) in Python? Difference between numpy.array shape (R, 1) and (R,) Counting the number of non-NaN elements in a numpy ndarray in Python Inverse of a matrix using numpy How to create an empty matrix in R? numpy matrix vector multiplication

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder