Splitting a dataframe string column into multiple different columns

Question

What I am trying to accomplish is splitting a column into multiple columns. I would prefer the first column to contain "F", the second column "US", the third "CA6" or "DL", and the fourth "Z13" or "U13", and so on. My entire df follows the same pattern of X.XX.XXXX.XXX, X.XX.XXX.XXX, or X.XX.XX.XXX, and I know the third column is where my problem lies because of the different lengths. I have only used substr in the past, and I could use that here with some if statements, but I would like to learn how to use the stringr package and POSIX to do this (unless there is a better option). Thank you in advance.

Here is my df:

    c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
      "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
      "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13")

User · Answer

Is this what you are trying to do?

    # Our data
    text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
              "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
              "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13")

    # Split into individual elements by the "." character.
    # Remember to escape it, because "." by itself matches any single character
    elems <- unlist(strsplit(text, "\\."))

    # We know the data frame should have 4 columns, so make a matrix
    m <- matrix(elems, ncol = 4, byrow = TRUE)

    # Coerce to data frame; head() is just to illustrate the top portion
    head(as.data.frame(m))
    #   V1 V2  V3  V4
    # 1  F US CLE V13
    # 2  F US CA6 U13
    # 3  F US CA6 U13
    # 4  F US CA6 U13
    # 5  F US CA6 U13
    # 6  F US CA6 U13
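To see why the escape matters, here is a minimal sketch comparing an unescaped pattern with two ways of matching a literal dot. The variable names are illustrative and not part of the answer above; `fixed = TRUE` is a standard `strsplit` argument that switches off regex interpretation.

```r
# One value from the question's data
x <- "F.US.CLE.V13"

# Unescaped: "." is a regex wildcard, so every character is treated
# as a separator and only empty strings are left over
unescaped <- strsplit(x, ".")[[1]]

# Escaped: "\\." matches a literal dot
escaped <- strsplit(x, "\\.")[[1]]

# fixed = TRUE treats the pattern as a plain string, so no escape is needed
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

print(unescaped)
print(escaped)
print(identical(escaped, literal))
```

Either the escaped pattern or `fixed = TRUE` should work here; the latter may be easier to read when the separator is a single known character.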

User · Answer

A very direct way is to just use read.table on your character vector:

    > read.table(text = text, sep = ".", colClasses = "character")
       V1 V2  V3  V4
    1   F US CLE V13
    2   F US CA6 U13
    3   F US CA6 U13
    4   F US CA6 U13
    5   F US CA6 U13
    6   F US CA6 U13
    7   F US CA6 U13
    8   F US CA6 U13
    9   F US  DL U13
    10  F US  DL U13
    11  F US  DL U13
    12  F US  DL Z13
    13  F US  DL Z13

colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that).

Update (a year later):

Alternatively, you can use my cSplit function, like this:

    library(splitstackshape)  # provides cSplit()
    library(data.table)       # provides as.data.table()
    cSplit(as.data.table(text), "text", ".")
    #     text_1 text_2 text_3 text_4
    #  1:      F     US    CLE    V13
    #  2:      F     US    CA6    U13
    #  3:      F     US    CA6    U13
    #  4:      F     US    CA6    U13
    #  5:      F     US    CA6    U13
    #  6:      F     US    CA6    U13
    #  7:      F     US    CA6    U13
    #  8:      F     US    CA6    U13
    #  9:      F     US     DL    U13
    # 10:      F     US     DL    U13
    # 11:      F     US     DL    U13
    # 12:      F     US     DL    Z13
    # 13:      F     US     DL    Z13

Or, separate from "tidyr", like this:

    library(dplyr)
    library(tidyr)

    as.data.frame(text) %>%
      separate(text, into = paste("V", 1:4, sep = "_"))
    #    V_1 V_2 V_3 V_4
    # 1    F  US CLE V13
    # 2    F  US CA6 U13
    # 3    F  US CA6 U13
    # 4    F  US CA6 U13
    # 5    F  US CA6 U13
    # 6    F  US CA6 U13
    # 7    F  US CA6 U13
    # 8    F  US CA6 U13
    # 9    F  US  DL U13
    # 10   F  US  DL U13
    # 11   F  US  DL U13
    # 12   F  US  DL Z13
    # 13   F  US  DL Z13
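The colClasses remark can be checked with a small sketch. It uses only two rows of the question's data, and the names `guessed` and `kept` are illustrative assumptions rather than anything from the answer.

```r
# Two rows of the question's data, kept small for illustration
text <- c("F.US.CLE.V13", "F.US.DL.Z13")

# Default behaviour: read.table guesses column types, and a column
# that contains only "F" is read as logical
guessed <- read.table(text = text, sep = ".")

# Forcing every column to character keeps the original strings
kept <- read.table(text = text, sep = ".", colClasses = "character")

print(class(guessed$V1))
print(kept$V1)
```

Comparing the two first columns shows the conversion that `colClasses = "character"` is there to prevent.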

User · Answer

We could use tidyr::extract():

    x <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
           "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
           "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13")

    library(tidyr)
    # Four capture groups separated by literal dots, one group per new column
    extract(tibble(data = x), "data",
            regex = "^(.*?)\\.(.*?)\\.(.*?)\\.(.*?)$",
            into = LETTERS[1:4])
    #> # A tibble: 13 x 4
    #>    A     B     C     D
    #>    <chr> <chr> <chr> <chr>
    #>  1 F     US    CLE   V13
    #>  2 F     US    CA6   U13
    #>  3 F     US    CA6   U13
    #>  4 F     US    CA6   U13
    #>  5 F     US    CA6   U13
    #>  6 F     US    CA6   U13
    #>  7 F     US    CA6   U13
    #>  8 F     US    CA6   U13
    #>  9 F     US    DL    U13
    #> 10 F     US    DL    U13
    #> 11 F     US    DL    U13
    #> 12 F     US    DL    Z13
    #> 13 F     US    DL    Z13

Another option is to use unglue::unglue_data():

    # remotes::install_github("moodymudskipper/unglue")
    library(unglue)
    unglue_data(x, "{A}.{B}.{C}.{D}")
    #>    A  B   C   D
    #> 1  F US CLE V13
    #> 2  F US CA6 U13
    #> 3  F US CA6 U13
    #> 4  F US CA6 U13
    #> 5  F US CA6 U13
    #> 6  F US CA6 U13
    #> 7  F US CA6 U13
    #> 8  F US CA6 U13
    #> 9  F US  DL U13
    #> 10 F US  DL U13
    #> 11 F US  DL U13
    #> 12 F US  DL Z13
    #> 13 F US  DL Z13

Created on 2019-09-14 by the reprex package (v0.3.0)
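For readers without tidyr installed, the same capture-group idea can be sketched in base R with `regexec()` and `regmatches()`. This is a swapped-in technique, not the method used in the answer above; the pattern and the names `pattern`, `matches`, `groups`, and `result` are assumptions made for illustration, and only three of the question's values are used.

```r
x <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.Z13")

# Four capture groups of non-dot characters, separated by literal dots
pattern <- "^([^.]+)\\.([^.]+)\\.([^.]+)\\.([^.]+)$"

# regexec() records the full match plus each group; regmatches() extracts them
matches <- regmatches(x, regexec(pattern, x))

# Stack the per-string results into a matrix and drop the full-match column
groups <- do.call(rbind, matches)[, -1, drop = FALSE]

result <- as.data.frame(groups, stringsAsFactors = FALSE)
names(result) <- LETTERS[1:4]
print(result)
```

Strings that do not match the pattern produce an empty result from `regmatches()`, so real data may need a check before the `rbind` step.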

User · Answer

The way via unlist and matrix seems a bit convoluted, and requires you to hard-code the number of elements (this is actually a pretty big no-go; of course, you could circumvent hard-coding that number and determine it at run-time).

I would go a different route, and construct the result directly from the list that strsplit returns. For me, this is conceptually simpler. There are essentially two ways of doing this:

as.data.frame: but since the list is exactly the wrong way round (we have a list of rows rather than a list of columns), we have to transpose the result. We also clear the rownames since they are ugly by default (but that's strictly unnecessary):

    `rownames<-`(t(as.data.frame(strsplit(text, "\\."))), NULL)

Alternatively, use rbind to combine the list of rows. We use do.call to call rbind with all the rows as separate arguments:

    do.call(rbind, strsplit(text, "\\."))

Both ways yield the same result, a character matrix (wrap it in as.data.frame() if you need a data frame):

         [,1] [,2] [,3]  [,4]
    [1,] "F"  "US" "CLE" "V13"
    [2,] "F"  "US" "CA6" "U13"
    [3,] "F"  "US" "CA6" "U13"
    [4,] "F"  "US" "CA6" "U13"
    [5,] "F"  "US" "CA6" "U13"
    [6,] "F"  "US" "CA6" "U13"

Clearly, the second way is much simpler than the first.
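The point about determining the number of elements at run-time could look roughly like the sketch below. It uses three of the question's values, and the names `parts`, `n_parts`, and `df` are illustrative assumptions; the length check is an added safeguard, not something the answer prescribes.

```r
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.Z13")

parts <- strsplit(text, ".", fixed = TRUE)

# Number of pieces per string, determined from the data
n_parts <- unique(lengths(parts))

# rbind() recycles short rows with only a warning, so stop early
# if the strings do not all split into the same number of pieces
stopifnot(length(n_parts) == 1)

# rbind() on character vectors returns a character matrix;
# as.data.frame() converts it when a data frame is needed
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
print(dim(df))
print(df)
```

With this approach nothing about the column count is hard-coded, and inconsistent input fails loudly instead of being recycled.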
