Call apply-like function on each row of dataframe with multiple arguments from each row

Question

I have a dataframe with multiple columns. For each row in the dataframe, I want to call a function on the row, and the function's input uses multiple columns from that row. For example, let's say I have this data and this testFunc which accepts two args:

```r
> df <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
> df
  x y z
1 1 3 5
2 2 4 6
> testFunc <- function(a, b) a + b
```

Let's say I want to apply this testFunc to columns x and z. So, for row 1 I want 1+5, and for row 2 I want 2+6. Is there a way to do this without writing a for loop, maybe with the apply function family?

I tried this:

```r
> df[,c('x','z')]
  x z
1 1 5
2 2 6
> lapply(df[,c('x','z')], testFunc)
Error in a + b : 'b' is missing
```

But I got an error. Any ideas?

EDIT: the actual function I want to call is not a simple sum, but power.t.test. I used a+b just for example purposes. The end goal is to be able to do something like this (written in pseudocode):

```
df = data.frame(
    delta=c(delta_values),
    power=c(power_values),
    sig.level=c(sig.level_values)
)

lapply(df, power.t.test(delta_from_each_row_of_df,
                        power_from_each_row_of_df,
                        sig.level_from_each_row_of_df
))
```

where the result is a vector of outputs for power.t.test for each row of df.

User · Answer

If data.frame columns are different types, apply() has a problem. A subtlety about row iteration is how apply(a.data.frame, 1, ...) does implicit type conversion to character types when columns are different types, e.g. a factor and a numeric column. Here's an example, using a factor in one column to modify a numeric column:

```r
mean.height = list(BOY=69.5, GIRL=64.0)

subjects = data.frame(gender = factor(c("BOY", "GIRL", "GIRL", "BOY")),
                      height = c(71.0, 59.3, 62.1, 62.1))

apply(subjects, 1, function(x) x[2] - mean.height[[x[1]]])
```

The subtraction fails because the columns are converted to character types.

One fix is to back-convert the second column to a number:

```r
apply(subjects, 1, function(x) as.numeric(x[2]) - mean.height[[x[1]]])
```

But the conversions can be avoided by keeping the columns separate and using mapply():

```r
mapply(function(x,y) y - mean.height[[x]], subjects$gender, subjects$height)
```

mapply() is needed because [[ ]] does not accept a vector argument. So the column iteration could be done before the subtraction by passing a vector to [], with slightly uglier code:

```r
subjects$height - unlist(mean.height[subjects$gender])
```
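Two checks worth running on this answer's data, as a hedged sketch. First, apply() calls as.matrix() on the data frame, and with a factor plus a numeric column that yields a character matrix. Second, indexing a list with a factor uses the factor's integer codes, not its labels, so the mapply() and unlist() versions in this answer return the right numbers only because the levels (BOY, GIRL, sorted alphabetically) happen to line up with the order of mean.height. Converting with as.character() makes the lookup go by label. The names reordered, gender_chr, diff_mapply and diff_vec are illustrative.

```r
mean.height <- list(BOY = 69.5, GIRL = 64.0)
subjects <- data.frame(gender = factor(c("BOY", "GIRL", "GIRL", "BOY")),
                       height = c(71.0, 59.3, 62.1, 62.1))

# apply() works on as.matrix(subjects), which is a character matrix here
matrix_type <- typeof(as.matrix(subjects))

# Indexing a list by a factor uses the integer codes (BOY = 1, GIRL = 2).
# With the lookup list in a different order, the factor index picks the wrong entries.
reordered <- list(GIRL = 64.0, BOY = 69.5)
by_code  <- unlist(reordered[subjects$gender], use.names = FALSE)
by_label <- unlist(reordered[as.character(subjects$gender)], use.names = FALSE)

# Label-safe versions of the mapply() and vectorised lookups
gender_chr  <- as.character(subjects$gender)
diff_mapply <- mapply(function(g, h) h - mean.height[[g]],
                      gender_chr, subjects$height, USE.NAMES = FALSE)
diff_vec    <- subjects$height - unlist(mean.height[gender_chr], use.names = FALSE)
```

USE.NAMES = FALSE stops mapply() from naming the result after the character vector it iterates over.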

User · Answer

data.table has a really intuitive way of doing this as well:

```r
library(data.table)

sample_fxn = function(x,y,z){
    return((x+y)*z)
}

df = data.table(A = 1:5, B = seq(2,10,2), C = 6:10)
> df
   A  B  C
1: 1  2  6
2: 2  4  7
3: 3  6  8
4: 4  8  9
5: 5 10 10
```

The := operator can be called within brackets to add a new column using a function:

```r
df[,new_column := sample_fxn(A,B,C)]
> df
   A  B  C new_column
1: 1  2  6         18
2: 2  4  7         42
3: 3  6  8         72
4: 4  8  9        108
5: 5 10 10        150
```

It's also easy to accept constants as arguments using this method:

```r
df[,new_column2 := sample_fxn(A,B,2)]

> df
   A  B  C new_column new_column2
1: 1  2  6         18           6
2: 2  4  7         42          12
3: 3  6  8         72          18
4: 4  8  9        108          24
5: 5 10 10        150          30
```
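The point that makes this work is that sample_fxn is vectorised, so it is called once with whole columns rather than once per row. A base R sketch of the same idea, swapping in transform() for data.table's := so it runs without data.table installed; the call counter (calls, counting_fxn) is purely illustrative.

```r
sample_fxn <- function(x, y, z) (x + y) * z
df <- data.frame(A = 1:5, B = seq(2, 10, 2), C = 6:10)

# Count how many times the function is invoked (illustration only)
calls <- 0
counting_fxn <- function(x, y, z) {
  calls <<- calls + 1
  sample_fxn(x, y, z)
}

# transform() evaluates each expression with the columns as whole vectors
df <- transform(df,
                new_column  = counting_fxn(A, B, C),  # one call covers all 5 rows
                new_column2 = sample_fxn(A, B, 2))    # the constant 2 is recycled
```

After this runs, calls is 1: a single invocation produced all five values. A function that is not vectorised would need an explicit per-row mechanism instead.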

User · Answer

Others have correctly pointed out that mapply is made for this purpose, but (for the sake of completeness) a conceptually simpler method is just to use a for loop.

```r
for (row in 1:nrow(df)) {
    df$newvar[row] <- testFunc(df$x[row], df$z[row])
}
```
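A slightly more defensive version of the same loop, as a sketch: preallocating the column makes its type and length explicit up front, and seq_len(nrow(df)) yields an empty sequence for a zero-row data frame, whereas 1:nrow(df) would give c(1, 0) and iterate over rows 1 and 0.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# Preallocate the result column (NA_real_ is recycled to every row)
df$newvar <- NA_real_

# seq_len() gives integer(0) when nrow(df) is 0, so the body is simply skipped
for (row in seq_len(nrow(df))) {
  df$newvar[row] <- testFunc(df$x[row], df$z[row])
}

# The same loop header on an empty data frame runs zero iterations
empty <- df[0, ]
iterations <- 0
for (row in seq_len(nrow(empty))) iterations <- iterations + 1
```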

User · Answer

I came here looking for the tidyverse function name, which I knew existed. Adding this for (my) future reference and for tidyverse enthusiasts: purrrlyr::invoke_rows (purrr::invoke_rows in older versions).

With connection to standard stats methods as in the original question, the broom package would probably help.

User · Answer

Use mapply:

```r
> df <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
> df
  x y z
1 1 3 5
2 2 4 6
> mapply(function(x,y) x+y, df$x, df$z)
[1] 6 8

> cbind(df,f = mapply(function(x,y) x+y, df$x, df$z) )
  x y z f
1 1 3 5 6
2 2 4 6 8
```
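The question's real target, power.t.test(), can be driven the same way. A minimal sketch, assuming a data frame with delta and power columns (example values only): naming the arguments avoids relying on power.t.test()'s positional order (n comes first), SIMPLIFY = FALSE keeps each result as a whole "power.htest" object instead of letting mapply() simplify them into a matrix, and vapply() then extracts the solved sample size.

```r
df <- data.frame(delta = c(1, 1, 2, 2), power = c(0.90, 0.85, 0.75, 0.45))

# One power.t.test() call per row, with arguments passed by name
tests <- mapply(function(d, p) power.t.test(delta = d, power = p),
                df$delta, df$power, SIMPLIFY = FALSE)

# Each element is a "power.htest" list; pull out the per-group sample size
df$n <- vapply(tests, function(res) res$n, numeric(1))
```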

User · Answer

Here is an alternate approach. It is more intuitive.

One key aspect I feel some of the answers did not take into account, which I point out for posterity, is that apply() lets you do row calculations easily, but only for matrix (all numeric) data, because a data frame passed to apply() is first converted with as.matrix().

Operations on columns are still possible for data frames:

```r
as.data.frame(lapply(df, myFunctionForColumn))
```

To operate on rows, we make the transpose first.

```r
tdf <- as.data.frame(t(df))
as.data.frame(lapply(tdf, myFunctionForRow))
```

The downside is that I believe R will make a copy of your data table, which could be a memory issue. (This is truly sad, because it is programmatically simple for tdf to just be an iterator to the original df, thus saving memory, but R does not allow pointer or iterator referencing.)

Also, a related question is how to operate on each individual cell in a dataframe:

```r
newdf <- as.data.frame(lapply(df, function(x) {sapply(x, myFunctionForEachCell)}))
```
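A runnable version of the transpose approach, as a sketch with hypothetical stand-ins for myFunctionForRow and myFunctionForEachCell. One detail worth knowing: the columns of the transposed data frame carry no variable names (x, y, z end up as the row names of tdf), so this row function addresses values by position. Since t() goes through a matrix, a data frame with mixed column types would also be converted to character first, the same issue apply() has.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Hypothetical row function: receives one row of df as an unnamed vector,
# so it uses positions (1 = x, 2 = y, 3 = z)
myFunctionForRow <- function(r) r[1] + r[3]

tdf <- as.data.frame(t(df))                               # rows of df become columns V1, V2
row_results <- vapply(tdf, myFunctionForRow, numeric(1))  # one value per original row

# Hypothetical cell-wise function, applied to every cell column by column
myFunctionForEachCell <- function(v) v * 10
newdf <- as.data.frame(lapply(df, function(col) sapply(col, myFunctionForEachCell)))
```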

User · Answer

A really nice function for this is adply from plyr, especially if you want to append the result to the original dataframe. This function and its cousin ddply have saved me a lot of headaches and lines of code.

```r
library(plyr)
df_appended <- adply(df, 1, mutate, sum=x+z)
```

Alternatively, you can call the function you desire.

```r
df_appended <- adply(df, 1, mutate, sum=testFunc(x,z))
```

User · Answer

Many functions are vectorized already, and so there is no need for any iteration (neither for loops nor *apply functions). Your testFunc is one such example. You can simply call:

```r
testFunc(df[, "x"], df[, "z"])
```

In general, I would recommend trying such vectorization approaches first and seeing whether they give you your intended results.

Alternatively, if you need to pass multiple arguments to a function which is not vectorized, mapply might be what you are looking for:

```r
mapply(power.t.test, df[, "x"], df[, "z"])
```
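To make the vectorized-versus-not distinction concrete, a sketch: testFunc gives the same result called once on whole columns or once per row through mapply(). For a function that is genuinely not vectorized (clipSum here is a made-up example that uses if), base R's Vectorize() builds a wrapper that does the per-element looping via mapply().

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# Vectorized: one call with whole columns matches one call per row
whole_columns <- testFunc(df[, "x"], df[, "z"])
per_row       <- mapply(testFunc, df[, "x"], df[, "z"])

# Hypothetical non-vectorized function: `if` needs a single TRUE/FALSE, so
# calling it on whole columns is an error since R 4.2.0 (a warning before that)
clipSum <- function(a, b) if (a + b > 7) 7 else a + b

# Vectorize() returns a wrapper that calls clipSum() element by element
clipSumV <- Vectorize(clipSum)
clipped  <- clipSumV(df[, "x"], df[, "z"])
```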

User · Answer

user20877984's answer is excellent. Since they summed it up far better than my previous answer, here is my (possibly still shoddy) attempt at an application of the concept:

Using do.call in a basic fashion:

```r
powvalues <- list(power=0.9, delta=2)
do.call(power.t.test, powvalues)
```

Working on a full data set:

```r
# get the example data
df <- data.frame(delta=c(1,1,2,2), power=c(.90,.85,.75,.45))

#> df
#  delta power
#1     1  0.90
#2     1  0.85
#3     2  0.75
#4     2  0.45
```

lapply the power.t.test function to each of the rows of specified values:

```r
result <- lapply(
  split(df, 1:nrow(df)),
  function(x) do.call(power.t.test, x)
)

> str(result)

List of 4
 $ 1:List of 8
  ..$ n          : num 22
  ..$ delta      : num 1
  ..$ sd         : num 1
  ..$ sig.level  : num 0.05
  ..$ power      : num 0.9
  ..$ alternative: chr "two.sided"
  ..$ note       : chr "n is number in *each* group"
  ..$ method     : chr "Two-sample t test power calculation"
  ..- attr(*, "class")= chr "power.htest"
 $ 2:List of 8
  ..$ n          : num 19
  ..$ delta      : num 1
  ..$ sd         : num 1
  ..$ sig.level  : num 0.05
  ..$ power      : num 0.85
```
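Extending the same split() plus do.call() idea to the question's full goal, including a sig.level column: because the column names match power.t.test()'s argument names, each one-row slice can be passed straight through, and the pieces of each result can then be collated into one data frame. A sketch with made-up example values; summary_df is an illustrative name.

```r
df <- data.frame(delta     = c(1, 1, 2, 2),
                 power     = c(0.90, 0.85, 0.75, 0.45),
                 sig.level = c(0.05, 0.05, 0.01, 0.01))

# Each one-row slice becomes a named list (delta, power, sig.level) for do.call()
result <- lapply(split(df, seq_len(nrow(df))),
                 function(row) do.call(power.t.test, as.list(row)))

# Collate the relevant fields of each "power.htest" object, one row per input row
summary_df <- do.call(rbind, lapply(result, function(res)
  data.frame(delta = res$delta, power = res$power,
             sig.level = res$sig.level, n = res$n)))
```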

User · Answer

You can apply apply to a subset of the original data.

```r
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )
```

or if your function is just sum use the vectorized version:

```r
rowSums(dat[,c('x','z')])
[1] 6 8
```

If you want to use testFunc:

```r
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))
```

EDIT: To access columns by name and not index you can do something like this:

```r
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'], y['x']))
```

User · Answer

A data.frame is a list, so ...

For vectorized functions do.call is usually a good bet. But the names of arguments come into play. Here your testFunc is called with args x and z in place of a and b. The ... allows irrelevant args (here y) to be passed without causing an error:

```r
do.call( function(x,z,...) testFunc(x,z), df )
```

For non-vectorized functions, mapply will work, but you need to match the ordering of the args or explicitly name them:

```r
mapply(testFunc, df$x, df$z)
```

Sometimes apply will work, as when all args are of the same type so coercing the data.frame to a matrix does not cause problems by changing data types. Your example was of this sort.

If your function is to be called within another function into which the arguments are all passed, there is a much slicker method than these. Study the first lines of the body of lm() if you want to go that route.
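A small sketch checking both claims in this answer on the question's data: do.call() passes every column as a named argument, so the wrapper needs ... to absorb y, and mapply() accepts explicitly named arguments, so the order in which columns are supplied stops mattering. The names via_do_call, err and via_mapply are just for illustration.

```r
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# do.call() calls the wrapper as f(x = df$x, y = df$y, z = df$z); ... swallows y
via_do_call <- do.call(function(x, z, ...) testFunc(x, z), df)

# Without ..., the unused column y triggers an "unused argument" error
err <- tryCatch(do.call(function(x, z) testFunc(x, z), df),
                error = function(e) "error")

# mapply() with explicitly named arguments: supplied out of order, still correct
via_mapply <- mapply(testFunc, b = df$z, a = df$x)
```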

User · Answer

New answer with dplyr package

If the function that you want to apply is vectorized, then you could use the mutate function from the dplyr package:

```r
> library(dplyr)
> myf <- function(tens, ones) { 10 * tens + ones }
> x <- data.frame(hundreds = 7:9, tens = 1:3, ones = 4:6)
> mutate(x, value = myf(tens, ones))
  hundreds tens ones value
1        7    1    4    14
2        8    2    5    25
3        9    3    6    36
```

Old answer with plyr package

In my humble opinion, the tool best suited to the task is mdply from the plyr package.

Example:

```r
> library(plyr)
> x <- data.frame(tens = 1:3, ones = 4:6)
> mdply(x, function(tens, ones) { 10 * tens + ones })
  tens ones V1
1    1    4 14
2    2    5 25
3    3    6 36
```

Unfortunately, as Bertjan Broeksema pointed out, this approach fails if you don't use all the columns of the data frame in the mdply call. For example:

```r
> library(plyr)
> x <- data.frame(hundreds = 7:9, tens = 1:3, ones = 4:6)
> mdply(x, function(tens, ones) { 10 * tens + ones })
Error in (function (tens, ones)  : unused argument (hundreds = 7)
```
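The unused-argument failure comes from every column being passed to the function as a named argument. A base R sketch of the same row-wise call, swapping in do.call() plus mapply() for mdply() (this is not plyr's implementation), shows that giving the function a ... parameter lets it absorb columns it does not use, such as hundreds. strict_f and loose_f are hypothetical names.

```r
x <- data.frame(hundreds = 7:9, tens = 1:3, ones = 4:6)

# Every column becomes a named argument: mapply(FUN, hundreds =, tens =, ones =)
strict_f <- function(tens, ones) 10 * tens + ones
strict <- tryCatch(do.call(mapply, c(list(FUN = strict_f), as.list(x))),
                   error = function(e) "unused argument error")

# With ..., the extra hundreds argument is accepted and ignored
loose_f <- function(tens, ones, ...) 10 * tens + ones
value <- do.call(mapply, c(list(FUN = loose_f), as.list(x)))
x$value <- value
```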
