Adding a column to a data frame

Question

I have the data.frame below. I want to add a column that classifies my data according to column 1 (h_no), in such a way that the first series of h_no (1, 2, 3, 4) is class 1, the second series of h_no (1 to 7) is class 2, etc., as indicated in the last column.

    h_no  h_freq  h_freqsq
    1     0.09091 0.008264628 1
    2     0.00000 0.000000000 1
    3     0.04545 0.002065702 1
    4     0.00000 0.000000000 1
    1     0.13636 0.018594050 2
    2     0.00000 0.000000000 2
    3     0.00000 0.000000000 2
    4     0.04545 0.002065702 2
    5     0.31818 0.101238512 2
    6     0.00000 0.000000000 2
    7     0.50000 0.250000000 2
    1     0.13636 0.018594050 3
    2     0.09091 0.008264628 3
    3     0.40909 0.167354628 3
    4     0.04545 0.002065702 3
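To try the answers below, you can rebuild the table above as an R data frame. This is a sketch: the values are copied from the table, the name `data` matches the first answer, and `expected_class` is a hypothetical name for the unlabelled last column.

```r
# Rebuild the question's table as a data frame (values copied from the table above).
data <- data.frame(
  h_no     = c(1:4, 1:7, 1:4),
  h_freq   = c(0.09091, 0, 0.04545, 0,
               0.13636, 0, 0, 0.04545, 0.31818, 0, 0.5,
               0.13636, 0.09091, 0.40909, 0.04545),
  h_freqsq = c(0.008264628, 0, 0.002065702, 0,
               0.018594050, 0, 0, 0.002065702, 0.101238512, 0, 0.25,
               0.018594050, 0.008264628, 0.167354628, 0.002065702)
)

# The class each row should end up in (the unlabelled last column):
# four rows of class 1, seven rows of class 2, four rows of class 3.
expected_class <- rep(1:3, times = c(4, 7, 4))

str(data)
```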

User · Answer

If I understand the question correctly, you want to detect when h_no doesn't increase and then increment the class. (I'm going to walk through how I solved this problem; there is a self-contained function at the end.)

Working

We only care about the h_no column for the moment, so we can extract that from the data frame:

    > h_no <- data$h_no

We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is either negative or zero. R provides the diff function, which gives us the vector of differences:

    > d.h_no <- diff(h_no)
    > d.h_no
     [1]  1  1  1 -3  1  1  1  1  1  1 -6  1  1  1

Once we have that, it is a simple matter to find the ones that are non-positive:

    > nonpos <- d.h_no <= 0
    > nonpos
     [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
    [13] FALSE FALSE

In R, TRUE and FALSE behave like 1 and 0 in arithmetic, so if we take the cumulative sum of nonpos, it increases by 1 at (almost) the right spots. The cumsum function, which is roughly the inverse of diff, does this:

    > cumsum(nonpos)
     [1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2

But there are two problems: the numbers are one too small, and we are missing the first element (there should be four rows in the first class).

The first problem is simply solved with 1 + cumsum(nonpos). The second just requires putting a 1 at the front of the vector, since the first element is always in class 1:

    > classes <- c(1, 1 + cumsum(nonpos))
    > classes
     [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

Now we can attach it back onto our data frame with cbind (the class= syntax gives the column the heading "class"):

    > data_w_classes <- cbind(data, class = classes)

data_w_classes now contains the result.

Final result

We can compress the lines together and wrap it all up into a function to make it easier to use:

    classify <- function(data) {
      cbind(data, class = c(1, 1 + cumsum(diff(data$h_no) <= 0)))
    }

Or, since it makes sense for the class to be a factor:

    classify <- function(data) {
      cbind(data, class = factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
    }

You can use either function like this:

    > classified <- classify(data)  # doesn't overwrite data
    > data <- classify(data)        # data now has the "class" column

(This method is good because it avoids explicit iteration, which is generally recommended in R, and avoids building up intermediate vectors and lists in a loop. It's also kinda neat that it can be written on one line.)
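A quick way to check the one-liner is to run it on the question's data. This sketch rebuilds only the h_no column (the other columns do not affect the classification) and uses the factor version of classify() from the answer. The last call shows that the rule reacts to any non-increase, so it does not assume a series restarts at exactly 1.

```r
# Factor version of classify(), as given in the answer above.
classify <- function(data) {
  cbind(data, class = factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}

# Only h_no matters for the classification, so the other columns are omitted.
data <- data.frame(h_no = c(1:4, 1:7, 1:4))

classified <- classify(data)
classified$class
# expected classes: 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

# A drop from 4 back to 2 also starts a new class.
classify(data.frame(h_no = c(2, 3, 4, 2, 3)))$class
# expected classes: 1 1 1 2 2
```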

User · Answer

In addition to Roman's answer, something like this might be even simpler. Note that I haven't tested it because I do not have access to R right now.

    # Note that I use a global variable here: normally not advisable,
    # but I liked the use here to make the code shorter
    index <<- 0
    new_column <- sapply(df$h_no, function(x) {
      if (x == 1) index <<- index + 1  # <<- updates the global index; = would only create a local copy
      return(index)
    })

The function iterates over the values in h_no and always returns the category that the current value belongs to. If a value of 1 is detected, we increase the global variable index and continue.
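The difference between `=` and `<<-` inside the anonymous function is easy to miss, so here is a small sketch of both. To avoid the global variable the answer mentions, it wraps the counter in a hypothetical helper, classify_by_ones(); `<<-` then updates the `index` that lives in the helper's own environment. Like the answer, it assumes every series starts at 1.

```r
# Hypothetical helper: the counter lives in the helper's environment,
# so `<<-` updates it without touching the global workspace.
classify_by_ones <- function(h_no) {
  index <- 0
  sapply(h_no, function(x) {
    if (x == 1) index <<- index + 1  # assigns to the enclosing `index`
    index
  })
}

# Same helper with `=`: each call creates a local copy of `index`,
# so the outer counter never moves past 0.
classify_broken <- function(h_no) {
  index <- 0
  sapply(h_no, function(x) {
    if (x == 1) index = index + 1
    index
  })
}

h_no <- c(1:4, 1:7, 1:4)
classify_by_ones(h_no)  # 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
classify_broken(h_no)   # 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0
```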

User · Answer

Approach based on identifying the number of groups (x in mapply) and the length of each group (y in mapply):

    mytb <- read.table(text = "h_no  h_freq  h_freqsq group
    1     0.09091 0.008264628 1
    2     0.00000 0.000000000 1
    3     0.04545 0.002065702 1
    4     0.00000 0.000000000 1
    1     0.13636 0.018594050 2
    2     0.00000 0.000000000 2
    3     0.00000 0.000000000 2
    4     0.04545 0.002065702 2
    5     0.31818 0.101238512 2
    6     0.00000 0.000000000 2
    7     0.50000 0.250000000 2
    1     0.13636 0.018594050 3
    2     0.09091 0.008264628 3
    3     0.40909 0.167354628 3
    4     0.04545 0.002065702 3", header = TRUE, stringsAsFactors = FALSE)
    mytb$group <- NULL

    # rows where a group starts; grep(1, ...) would also match values such as 10 or 21
    positionsof1s <- which(mytb$h_no == 1)

    mytb$newgroup <- unlist(mapply(function(x, y) rep(x, y),  # repeat x, y times
      x = 1:length(positionsof1s),                            # x is the group number, g1 to g3
      y = c(diff(positionsof1s),                              # repeats for g1 to the penultimate group (g2): 4, 7
            nrow(mytb) - (positionsof1s[length(positionsof1s)] - 1))))  # repeats for the last group (g3): rows from its start to the end
    mytb
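Because rep() is vectorised over its `times` argument, the mapply() call above can be collapsed into a single rep(). This is a sketch of that alternative, not the answer's own code: the group lengths are the gaps between successive start positions, with `nrow(mytb) + 1` appended as a sentinel for the end of the last group. `starts` and `group_len` are hypothetical names.

```r
mytb <- data.frame(h_no = c(1:4, 1:7, 1:4))

starts    <- which(mytb$h_no == 1)              # rows where a series begins: 1, 5, 12
group_len <- diff(c(starts, nrow(mytb) + 1))    # rows in each series: 4, 7, 4

# rep(1:3, times = c(4, 7, 4)) repeats each group number by its group length
mytb$newgroup <- rep(seq_along(starts), times = group_len)
mytb$newgroup
```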

User · Answer

You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.

"Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list."

    my.dataframe["new.col"] <- a.vector
    my.dataframe[["new.col"]] <- a.vector

"The data.frame method for $ treats x as a list."

    my.dataframe$new.col <- a.vector

"When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they act like indexing a matrix."

    my.dataframe[ , "new.col"] <- a.vector

In the single-index forms you don't say whether you mean rows or columns, and the data.frame method assumes you mean columns.

For your example, this should work:

    # make some fake data
    your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

    # find where each sequence starts (where a 1 appears)
    from <- which(your.df$no == 1)
    to <- c((from - 1)[-1], nrow(your.df))  # up to which point each sequence runs

    # generate a sequence (len) and, based on its length, repeat a consecutive number len times
    get.seq <- mapply(from, to, 1:length(from), FUN = function(x, y, z) {
      len <- length(seq(from = x[1], to = y[1]))
      return(rep(z, times = len))
    })

    # when we unlist, we get a vector
    your.df$group <- unlist(get.seq)
    # and append it to your original data.frame; since this is
    # designating a group, it makes sense to make it a factor
    your.df$group <- as.factor(your.df$group)

Result (h_freq and h_freqsq come from runif, so your values will differ):

       no     h_freq   h_freqsq group
    1   1 0.40998238 0.06463876     1
    2   2 0.98086928 0.33093795     1
    3   3 0.28908651 0.74077119     1
    4   4 0.10476768 0.56784786     1
    5   1 0.75478995 0.60479945     2
    6   2 0.26974011 0.95231761     2
    7   3 0.53676266 0.74370154     2
    8   4 0.99784066 0.37499294     2
    9   5 0.89771767 0.83467805     2
    10  6 0.05363139 0.32066178     2
    11  7 0.71741529 0.84572717     2
    12  1 0.10654430 0.32917711     3
    13  2 0.41971959 0.87155514     3
    14  3 0.32432646 0.65789294     3
    15  4 0.77896780 0.27599187     3
    16  5 0.06100008 0.55399326     3
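To see that the quoted indexing forms are interchangeable for adding a whole column, this sketch applies each of them to a copy of the same data frame and checks that they all produce the same "class" column. The class vector here is computed with cumsum(h_no == 1), an assumption that, like the answer, relies on each series starting at 1.

```r
df <- data.frame(h_no = c(1:4, 1:7, 1:4))
v  <- cumsum(df$h_no == 1)  # 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

a <- df; a["class"]   <- v  # list-style, single bracket
b <- df; b[["class"]] <- v  # list-style, double bracket
d <- df; d$class      <- v  # the $ method
m <- df; m[, "class"] <- v  # matrix-style, empty row index means all rows

results  <- list(a, b, d, m)
all_same <- all(vapply(results, function(r) identical(r$class, v), logical(1)))
all_same  # TRUE
```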

User · Answer

Easily. Your data frame is A:

    b <- A[, 1]
    b <- b == 1
    b <- cumsum(b)

Then b is the class column.
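Spelled out with the intermediate values, the three steps look like the sketch below, run on the question's h_no values. `A` stands in for your data frame as in the answer; attaching the result as `A$class` is an added step. The `== 1` test only marks a new class where a series restarts at exactly 1, so if the data do not begin with 1, the first rows end up in class 0.

```r
A <- data.frame(h_no = c(1:4, 1:7, 1:4))

b <- A[, 1]      # the h_no column
b <- b == 1      # TRUE where a new series starts
b <- cumsum(b)   # running count of series so far: 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
A$class <- b

# If the first series does not start at 1, its rows are counted as class 0:
cumsum(c(2, 3, 1, 2) == 1)  # 0 0 1 1
```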

User · Answer

I believe that using cbind() is the simplest way to add a column to a data frame in R. Below is an example:

    myDf <- data.frame(index = seq(1, 10, 1), Val = seq(1, 10, 1))
    newCol <- seq(2, 20, 2)
    myDf <- cbind(myDf, newCol)
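When one argument is a data frame, cbind() uses its data.frame method, which builds the result with data.frame(). An unnamed vector argument therefore takes its column heading from the variable name, while a `name = value` argument lets you choose the heading. A small sketch using the answer's myDf; `doubled` is a hypothetical column name.

```r
myDf   <- data.frame(index = seq(1, 10, 1), Val = seq(1, 10, 1))
newCol <- seq(2, 20, 2)

with_default <- cbind(myDf, newCol)            # heading taken from the variable name
with_name    <- cbind(myDf, doubled = newCol)  # heading given explicitly

names(with_default)  # "index" "Val" "newCol"
names(with_name)     # "index" "Val" "doubled"
```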

User · Answer

    Data.frame[, "h_new_column"] <- as.integer(Data.frame[, "h_no"], breaks = c(1, 4, 7))
