Read all files in a folder and apply a function to each data frame

Question

I am doing a relatively simple piece of analysis, which I have put into a function, on all the files in a particular folder. I was wondering whether anyone had any tips to help me automate the process across a number of different folders.

Firstly, I was wondering whether there is a way of reading all the files in a particular folder straight into R. I believe the following command will list all the files:

    files <- Sys.glob("*.csv")

...which I found from "Using R to list all files with a specified extension".

And then the following code reads all those files into R:

    listOfFiles <- lapply(files, function(x) read.table(x, header = FALSE))

...from "Manipulating multiple files in R".

But the files seem to be read in as one continuous list and not as individual files. How can I change the script to open all the csv files in a particular folder as individual dataframes?

Secondly, assuming that I can read all the files in separately, how do I run a function on all these dataframes in one go? For example, I have created four small dataframes to illustrate what I want:

    Df.1 <- data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1))
    Df.2 <- data.frame(A = 1:6, B = c(2, 3, 4, 5, 1, 1))
    Df.3 <- data.frame(A = c(4, 6, 8, 0, 1, 11), B = c(7, 6, 5, 9, 1, 15))
    Df.4 <- data.frame(A = c(4, 2, 6, 8, 1, 0), B = c(3, 1, 9, 11, 2, 16))

I have also made up an example function:

    Summary <- function(dfile) {
      # per-column statistics for A
      sumA    <- sum(dfile$A)
      MinA    <- min(dfile$A)
      MeanA   <- mean(dfile$A)
      MedianA <- median(dfile$A)
      MaxA    <- max(dfile$A)

      # per-column statistics for B
      sumB    <- sum(dfile$B)
      MinB    <- min(dfile$B)
      MeanB   <- mean(dfile$B)
      MedianB <- median(dfile$B)
      MaxB    <- max(dfile$B)

      Sum    <- c(sumA, sumB)
      Min    <- c(MinA, MinB)
      Mean   <- c(MeanA, MeanB)
      Median <- c(MedianA, MedianB)
      Max    <- c(MaxA, MaxB)
      rm(sumA, sumB, MinA, MinB, MeanA, MeanB, MedianA, MedianB, MaxA, MaxB)

      Label <- c("A", "B")
      dfile_summary <- data.frame(Label, Sum, Min, Mean, Median, Max)
      return(dfile_summary)
    }

I would ordinarily use the following command to apply the function to each individual dataframe:

    Df1.summary <- Summary(dfile)

Is there a way to apply the function to all the dataframes at once instead, and to use the names of the dataframes in the summary tables (i.e. Df1.summary)?

Many thanks,

Katie

User · Answer

Here is a tidyverse option that might not be the most elegant, but offers some flexibility in terms of what is included in the summary:

    library(tidyverse)

    dir_path <- '~/path/to/data/directory/'
    file_pattern <- 'Df\\.[0-9]\\.csv'  # regex pattern to match the file name format

    read_dir <- function(dir_path, file_name) {
      read_csv(paste0(dir_path, file_name)) %>%
        mutate(file_name = file_name) %>%   # add the file name as a column
        gather(variable, value, A:B) %>%    # convert the data from wide to long
        group_by(file_name, variable) %>%
        summarize(sum    = sum(value, na.rm = TRUE),
                  min    = min(value, na.rm = TRUE),
                  mean   = mean(value, na.rm = TRUE),
                  median = median(value, na.rm = TRUE),
                  max    = max(value, na.rm = TRUE))
    }

    df_summary <-
      list.files(dir_path, pattern = file_pattern) %>%
      map_df(~ read_dir(dir_path, .))

    df_summary
    # A tibble: 8 x 7
    # Groups:   file_name [?]
      file_name variable   sum   min  mean median   max
      <chr>     <chr>    <int> <dbl> <dbl>  <dbl> <dbl>
    1 Df.1.csv  A           34     4  5.67    5.5     8
    2 Df.1.csv  B           22     1  3.67    3       9
    3 Df.2.csv  A           21     1  3.5     3.5     6
    4 Df.2.csv  B           16     1  2.67    2.5     5
    5 Df.3.csv  A           30     0  5       5      11
    6 Df.3.csv  B           43     1  7.17    6.5    15
    7 Df.4.csv  A           21     0  3.5     3       8
    8 Df.4.csv  B           42     1  7       6      16
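As a side note, gather() has since been superseded by pivot_longer() in tidyr. Below is a rough sketch of the same summary written that way. It assumes dplyr (1.0 or later) and tidyr (1.0 or later) are installed, and it uses a made-up in-memory list named `dfs` in place of files read from disk, so the element names only stand in for file names.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data frames read from CSV files;
# the list names play the role of the file names.
dfs <- list(
  "Df.1.csv" = data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1)),
  "Df.2.csv" = data.frame(A = 1:6, B = c(2, 3, 4, 5, 1, 1))
)

df_summary <- bind_rows(dfs, .id = "file_name") %>%   # stack, keeping the list names
  pivot_longer(c(A, B), names_to = "variable", values_to = "value") %>%
  group_by(file_name, variable) %>%
  summarise(sum    = sum(value),
            min    = min(value),
            mean   = mean(value),
            median = median(value),
            max    = max(value),
            .groups = "drop")

df_summary
```

The result has one row per file and variable, so adding another statistic only requires one more line inside summarise().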

User · Answer

On the contrary, I do think working with a list makes it easy to automate such things.

Here is one solution (I stored your four dataframes in the folder temp/):

    # list.files() expects a regular expression, so match names ending in .csv
    filenames <- list.files("temp", pattern = "\\.csv$", full.names = TRUE)
    ldf <- lapply(filenames, read.csv)
    res <- lapply(ldf, summary)
    names(res) <- substr(filenames, 6, 30)

It is important to store the full path for your files (as I did with full.names); otherwise you have to paste the directory name onto each file name yourself, e.g.

    filenames <- list.files("temp", pattern = "\\.csv$")
    paste("temp", filenames, sep = "/")

will work too. Note that I used substr to extract the file names while discarding the full path.

You can access your summary tables as follows:

    > res$`df4.csv`
           A              B
     Min.   :0.00   Min.   : 1.00
     1st Qu.:1.25   1st Qu.: 2.25
     Median :3.00   Median : 6.00
     Mean   :3.50   Mean   : 7.00
     3rd Qu.:5.50   3rd Qu.:10.50
     Max.   :8.00   Max.   :16.00

If you really want to get individual summary tables, you can extract them afterwards. E.g.,

    for (i in 1:length(res))
      assign(paste(paste("df", i, sep = ""), "summary", sep = "."), res[[i]])
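The same list-based idea can carry the file names along and apply a custom function rather than the built-in summary(). The following is only a sketch under assumptions: the temporary folder, the file names, and the helper `summarise_ab` (a compact stand-in for the question's Summary function) are all made up for illustration, and the files are assumed to contain numeric columns A and B.

```r
# Hypothetical setup: write two of the question's data frames to a temporary folder
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1)),
          file.path(data_dir, "Df.1.csv"), row.names = FALSE)
write.csv(data.frame(A = 1:6, B = c(2, 3, 4, 5, 1, 1)),
          file.path(data_dir, "Df.2.csv"), row.names = FALSE)

# Illustrative stand-in for the question's Summary() function
summarise_ab <- function(df) {
  cols <- df[c("A", "B")]
  data.frame(Label  = names(cols),
             Sum    = sapply(cols, sum),
             Min    = sapply(cols, min),
             Mean   = sapply(cols, mean),
             Median = sapply(cols, median),
             Max    = sapply(cols, max),
             row.names = NULL)
}

filenames <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
ldf <- lapply(filenames, read.csv)

# Name each element after its file, dropping the directory and the extension
names(ldf) <- tools::file_path_sans_ext(basename(filenames))

# One summary table per file, addressable by name
res <- lapply(ldf, summarise_ab)
res[["Df.1"]]
```

Keeping the results in a named list means each table can be looked up by file name (res[["Df.1"]], res[["Df.2"]]) without creating separate variables in the workspace.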

User · Answer

Usually I don't use for loops in R, but here is my solution using a for loop and two packages: plyr and dostats. plyr is on CRAN, and you can download dostats from https://github.com/halpo/dostats (perhaps using install_github from Hadley's devtools package).

Assuming that I have your first two data frames (Df.1 and Df.2) in csv files, you can do something like this:

    require(plyr)
    require(dostats)

    files <- list.files(pattern = ".csv")

    for (i in seq_along(files)) {
        # read file i into a variable named Df.<i>
        assign(paste("Df", i, sep = "."), read.csv(files[i]))
        # store the per-column statistics in a variable named Df<i>.summary
        assign(paste(paste("Df", i, sep = ""), "summary", sep = "."),
               ldply(get(paste("Df", i, sep = ".")), dostats, sum, min, mean, median, max))
    }

Here is the output:

    R> Df1.summary
      .id sum min   mean median max
    1   A  34   4 5.6667    5.5   8
    2   B  22   1 3.6667    3.0   9
    R> Df2.summary
      .id sum min   mean median max
    1   A  21   1 3.5000    3.5   6
    2   B  16   1 2.6667    2.5   5
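For comparison, a table of the same shape can be built with base R alone, keeping the results in a named list instead of creating variables with assign(). This is a sketch, not the answer's method: the list `dfs` and the helper `col_stats` are hypothetical names, and the data frames are created in memory rather than read from csv files.

```r
# Made-up in-memory data standing in for data frames read from CSV files
dfs <- list(
  Df1 = data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1)),
  Df2 = data.frame(A = 1:6, B = c(2, 3, 4, 5, 1, 1))
)

# One row per column of the data frame, one column per statistic
col_stats <- function(df) {
  stats <- t(sapply(df, function(x) {
    c(sum = sum(x), min = min(x), mean = mean(x),
      median = median(x), max = max(x))
  }))
  data.frame(.id = rownames(stats), stats, row.names = NULL)
}

summaries <- lapply(dfs, col_stats)
summaries$Df1
```

Each element of `summaries` keeps the name of its data frame, so summaries$Df1 and summaries$Df2 play the role of Df1.summary and Df2.summary.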
