Split data frame string column into multiple columns

Question

I'd like to take data of the form

before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
  attr          type
1    1   foo_and_bar
2   30 foo_and_bar_2
3    4   foo_and_bar
4    6 foo_and_bar_2

and use split() on the column "type" from above to get something like this:

  attr type_1 type_2
1    1    foo    bar
2   30    foo  bar_2
3    4    foo    bar
4    6    foo  bar_2

I came up with something unbelievably complex involving some form of apply that worked, but I've since misplaced that. It seemed far too complicated to be the best way. I can use strsplit as below, but then it is unclear how to get that back into 2 columns in the data frame.

> strsplit(as.character(before$type),'_and_')
[[1]]
[1] "foo" "bar"

[[2]]
[1] "foo"   "bar_2"

[[3]]
[1] "foo" "bar"

[[4]]
[1] "foo"   "bar_2"

Thanks for any pointers. I've not quite grokked R lists just yet.

User · Answer

An easy way is to use sapply() and the `[` function:

before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
out <- strsplit(as.character(before$type),'_and_')

For example:

> data.frame(t(sapply(out, `[`)))
   X1    X2
1 foo   bar
2 foo bar_2
3 foo   bar
4 foo bar_2

sapply()'s result is a matrix and needs transposing and casting back to a data frame. It is then some simple manipulations that yield the result you wanted:

after <- with(before, data.frame(attr = attr))
after <- cbind(after, data.frame(t(sapply(out, `[`))))
names(after)[2:3] <- paste("type", 1:2, sep = "_")

At this point, after is what you wanted:

> after
  attr type_1 type_2
1    1    foo    bar
2   30    foo  bar_2
3    4    foo    bar
4    6    foo  bar_2
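
To make the reason for the transpose visible, here is a small base R sketch of the shapes involved. It assumes every string contains the separator exactly once; the intermediate name `m` and the `fixed = TRUE` argument are additions for illustration, not part of the answer.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2"))
out <- strsplit(as.character(before$type), "_and_", fixed = TRUE)

# sapply() binds each list element as a column: pieces x original rows
m <- sapply(out, `[`)
dim(m)  # 2 4

# transpose so that each original row is a row again, then attach attr
after <- data.frame(attr = before$attr, t(m))
names(after)[2:3] <- paste("type", 1:2, sep = "_")
after
```

If some strings split into a different number of pieces, sapply() returns a list rather than a matrix, so this only works for regular input.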

User · Answer

Another approach, if you want to stick with strsplit(), is to use the unlist() command. Here's a solution along those lines.

tmp <- matrix(unlist(strsplit(as.character(before$type), '_and_')), ncol=2,
   byrow=TRUE)
after <- cbind(before$attr, as.data.frame(tmp))
names(after) <- c("attr", "type_1", "type_2")
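
One thing to watch with the unlist() route: if any string splits into more or fewer than two pieces, matrix(..., byrow = TRUE) shifts values into the wrong rows. A minimal sketch of a guard, assuming R 3.2.0 or later for lengths(); the name `pieces` and the check itself are additions, not part of the answer.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2"))
pieces <- strsplit(as.character(before$type), "_and_", fixed = TRUE)

# every row must yield exactly two pieces, otherwise the flattened
# vector would no longer line up with the rows of the matrix
stopifnot(all(lengths(pieces) == 2L))

tmp <- matrix(unlist(pieces), ncol = 2, byrow = TRUE)
after <- data.frame(attr = before$attr,
                    type_1 = tmp[, 1],
                    type_2 = tmp[, 2])
after
```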

User · Answer

To add to the options, you could also use my splitstackshape::cSplit function like this:

library(splitstackshape)
cSplit(before, "type", "_and_")
#    attr type_1 type_2
# 1:    1    foo    bar
# 2:   30    foo  bar_2
# 3:    4    foo    bar
# 4:    6    foo  bar_2

User · Answer

Here is a one-liner along the same lines as aniko's solution, but using hadley's stringr package:

library(stringr)
do.call(rbind, str_split(before$type, '_and_'))

User · Answer

Another option is to use the new tidyr package.

library(dplyr)
library(tidyr)

before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c('foo_and_bar', 'foo_and_bar_2')
)

before %>%
  separate(type, c("foo", "bar"), "_and_")

##   attr foo   bar
## 1    1 foo   bar
## 2   30 foo bar_2
## 3    4 foo   bar
## 4    6 foo bar_2

User · Answer

Notice that sapply with "[" can be used to extract either the first or second items in those lists, so:

before$type_1 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 1)
before$type_2 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 2)
before$type <- NULL

And here's a gsub method:

before$type_1 <- gsub("_and_.+$", "", before$type)
before$type_2 <- gsub("^.+_and_", "", before$type)
before$type <- NULL
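
The two regular expressions behave differently once a string contains the separator more than once, which is worth seeing before relying on them. A small sketch with made-up input (the vector `x` is hypothetical); sub() is used here instead of gsub() because an anchored pattern can match at most once per string.

```r
x <- c("foo_and_bar", "foo_and_bar_2", "foo_and_bar_2_and_bar_3")

# removes everything from the first "_and_" to the end of the string
first <- sub("_and_.+$", "", x)
first  # "foo" "foo" "foo"

# the greedy ".+" runs up to the last "_and_", so only the final piece is kept
last <- sub("^.+_and_", "", x)
last   # "bar" "bar_2" "bar_3"
```

So with more than two pieces, this method keeps the first and last piece and silently drops anything in between.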

User · Answer

Here is another base R solution. We can use read.table, but since it accepts only a one-byte sep argument and here we have a multi-byte separator, we can use gsub to replace the multi-byte separator with any one-byte separator and use that as the sep argument in read.table:

cbind(before[1], read.table(text = gsub('_and_', '\t', before$type),
                 sep = "\t", col.names = paste0("type_", 1:2)))

#  attr type_1 type_2
#1    1    foo    bar
#2   30    foo  bar_2
#3    4    foo    bar
#4    6    foo  bar_2

In this case, we can also make it shorter by replacing the separator with the default sep argument, so we don't have to mention it explicitly:

cbind(before[1], read.table(text = gsub('_and_', ' ', before$type),
                 col.names = paste0("type_", 1:2)))
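
Because read.table guesses column types, pieces that look like numbers would come back as numeric columns. If the pieces should stay as text, a sketch along the following lines may help; the colClasses and fixed arguments are additions to the answer's call, and the name `cols` is only for illustration.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2"))

# swap the multi-character separator for a tab, then parse as tab-separated
cols <- read.table(text = gsub("_and_", "\t", before$type, fixed = TRUE),
                   sep = "\t",
                   col.names = paste0("type_", 1:2),
                   colClasses = "character")  # keep every piece as a string

after <- cbind(before["attr"], cols)
after
```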

User · Answer

Yet another approach: use rbind on out:

before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
out <- strsplit(as.character(before$type),'_and_')
do.call(rbind, out)

     [,1]  [,2]
[1,] "foo" "bar"
[2,] "foo" "bar_2"
[3,] "foo" "bar"
[4,] "foo" "bar_2"

And to combine:

data.frame(before$attr, do.call(rbind, out))

User · Answer

5 years later, adding the obligatory data.table solution:

library(data.table) ## v 1.9.6+
setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_")]
before
#    attr          type type1 type2
# 1:    1   foo_and_bar   foo   bar
# 2:   30 foo_and_bar_2   foo bar_2
# 3:    4   foo_and_bar   foo   bar
# 4:    6 foo_and_bar_2   foo bar_2

We could also both make sure that the resulting columns will have correct types and improve performance by adding the type.convert and fixed arguments (since "_and_" isn't really a regex):

setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_", type.convert = TRUE, fixed = TRUE)]
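
What fixed = TRUE changes is easiest to see with base strsplit, which, as far as I know, is what tstrsplit forwards such arguments to. The sketch below uses plain base R rather than data.table, and the vector `x` is made-up input.

```r
x <- c("a.b", "c.d")

# fixed = TRUE treats "." as a literal dot rather than "any character"
parts <- strsplit(x, ".", fixed = TRUE)
parts[[1]]  # "a" "b"

# "_and_" contains no regex metacharacters, so both calls give the same
# pieces; fixed = TRUE just skips the regex engine
same <- identical(strsplit("foo_and_bar", "_and_"),
                  strsplit("foo_and_bar", "_and_", fixed = TRUE))
same  # TRUE
```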

User · Answer

This question is pretty old, but I'll add the solution I found to be the simplest at present.

library(reshape2)
before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
newColNames <- c("type1", "type2")
newCols <- colsplit(before$type, "_and_", newColNames)
after <- cbind(before, newCols)
after$type <- NULL
after

User · Answer

Since R version 3.4.0 you can use strcapture() from the utils package (included with base R installs), binding the output onto the other column(s).

out <- strcapture(
    "(.*)_and_(.*)",
    as.character(before$type),
    data.frame(type_1 = character(), type_2 = character())
)

cbind(before["attr"], out)
##   attr type_1 type_2
## 1    1    foo    bar
## 2   30    foo  bar_2
## 3    4    foo    bar
## 4    6    foo  bar_2
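
Two properties of strcapture() are worth a quick illustration: rows that do not match the pattern come back as NA, and the prototype data frame controls the column types. A minimal sketch with made-up input, assuming R 3.4.0 or later; the vectors and column names here are hypothetical.

```r
x <- c("foo_and_bar", "foo_and_bar_2", "no separator here")
out <- strcapture("(.*)_and_(.*)", x,
                  data.frame(type_1 = character(), type_2 = character(),
                             stringsAsFactors = FALSE))
out  # the third row is NA in both columns because the pattern did not match

# the prototype decides the column types: here the first group becomes integer
typed <- strcapture("([0-9]+)_and_(.*)", "30_and_bar",
                    data.frame(n = integer(), label = character(),
                               stringsAsFactors = FALSE))
str(typed)
```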

User · Answer

Use stringr::str_split_fixed

library(stringr)
str_split_fixed(before$type, "_and_", 2)

User · Answer

The subject is almost exhausted, but I'd like to offer a solution to a slightly more general version where you don't know the number of output columns a priori. So for example you have

before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2', 'foo_and_bar_2_and_bar_3', 'foo_and_bar'))
  attr                    type
1    1             foo_and_bar
2   30           foo_and_bar_2
3    4 foo_and_bar_2_and_bar_3
4    6             foo_and_bar

We can't use tidyr's separate() because we don't know the number of result columns before the split, so I have created a function that uses stringr to split a column, given the pattern and a name prefix for the generated columns. I hope the coding patterns used are correct.

library(stringr)
library(tibble)
library(dplyr)
library(tidyr)

split_into_multiple <- function(column, pattern = ", ", into_prefix){
  cols <- str_split_fixed(column, pattern, n = Inf)
  # Sub out the ""'s returned by filling the matrix to the right, with NAs which are useful
  cols[which(cols == "")] <- NA
  cols <- as.tibble(cols)
  # name the 'cols' tibble as 'into_prefix_1', 'into_prefix_2', ..., 'into_prefix_m'
  # where m = # columns of 'cols'
  m <- dim(cols)[2]

  names(cols) <- paste(into_prefix, 1:m, sep = "_")
  return(cols)
}

We can then use split_into_multiple in a dplyr pipe as follows:

after <- before %>%
  bind_cols(split_into_multiple(.$type, "_and_", "type")) %>%
  # selecting those that start with 'type_' will remove the original 'type' column
  select(attr, starts_with("type_"))

> after
  attr type_1 type_2 type_3
1    1    foo    bar   <NA>
2   30    foo  bar_2   <NA>
3    4    foo  bar_2  bar_3
4    6    foo    bar   <NA>

And then we can use gather to tidy up:

after %>%
  gather(key, val, -attr, na.rm = TRUE)

   attr    key   val
1     1 type_1   foo
2    30 type_1   foo
3     4 type_1   foo
4     6 type_1   foo
5     1 type_2   bar
6    30 type_2 bar_2
7     4 type_2 bar_2
8     6 type_2   bar
11    4 type_3 bar_3
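
For comparison, the same "unknown number of columns" case can be sketched in base R only. This is not the author's function, just an alternative under the assumption that padding short rows with NA is what you want; the names `pieces`, `width`, `padded` and `mat` are made up for the example.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2",
                              "foo_and_bar_2_and_bar_3", "foo_and_bar"),
                     stringsAsFactors = FALSE)

pieces <- strsplit(before$type, "_and_", fixed = TRUE)
width <- max(lengths(pieces))  # widest row decides the number of columns

# `length<-` pads each shorter vector with NA up to `width`
padded <- lapply(pieces, `length<-`, width)
mat <- do.call(rbind, padded)
colnames(mat) <- paste("type", seq_len(width), sep = "_")

after <- data.frame(attr = before$attr, mat, stringsAsFactors = FALSE)
after
```

Padding before rbind matters here: without it, rbind would recycle the shorter vectors instead of filling with NA.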

User · Answer

Here is a base R one liner that overlaps a number of previous solutions, but returns a data.frame with the proper names.

out <- setNames(data.frame(before$attr,
                  do.call(rbind, strsplit(as.character(before$type),
                                          split="_and_"))),
                  c("attr", paste0("type_", 1:2)))
out
  attr type_1 type_2
1    1    foo    bar
2   30    foo  bar_2
3    4    foo    bar
4    6    foo  bar_2

It uses strsplit to break up the variable, and data.frame with do.call/rbind to put the data back into a data.frame. The additional incremental improvement is the use of setNames to add variable names to the data.frame.

User · Answer

Base R, but probably slow:

n <- 1
for(i in strsplit(as.character(before$type),'_and_')){
     before[n, 'type_1'] <- i[[1]]
     before[n, 'type_2'] <- i[[2]]
     n <- n + 1
}

##   attr          type type_1 type_2
## 1    1   foo_and_bar    foo    bar
## 2   30 foo_and_bar_2    foo  bar_2
## 3    4   foo_and_bar    foo    bar
## 4    6 foo_and_bar_2    foo  bar_2
