Using strtok with a std::string

Question

I have a string that I would like to tokenize. But the C strtok() function requires my string to be a char*. How can I do this simply?

I tried:

token = strtok(str.c_str(), " ");

which fails because c_str() returns a const char*, not a char*.

User · Answer

Assuming that by "string" you're talking about std::string in C++, you might have a look at the Tokenizer package in Boost.

User · Answer

It fails because str.c_str() returns a constant string, but char* strtok(char* str, const char* delimiters) requires a modifiable (non-const) string. So you need to use const_cast<char*> in order to make it modifiable. Here is a complete but small program that tokenizes the string using the C strtok() function:

#include <iostream>
#include <string>
#include <string.h>
using namespace std;

int main()
{
    string s = "20#6 5, 3";
    // strtok requires a modifiable string, as it modifies the supplied string in order to tokenize it
    char *str = const_cast<char *>(s.c_str());
    char *tok;
    tok = strtok(str, "#, ");
    int arr[4], i = 0;
    while (tok != NULL) {
        arr[i++] = stoi(tok);
        tok = strtok(NULL, "#, ");
    }
    for (int i = 0; i < 4; i++) cout << arr[i] << endl;
    return 0;
}

NOTE: strtok may not be suitable in all situations, as the string passed to the function gets modified by being broken into smaller strings. Please refer to the strtok documentation to get a better understanding of its functionality.

How strtok works

I added a few print statements to show the changes happening to the string in each call to strtok and how it returns tokens:

#include <iostream>
#include <string>
#include <string.h>
using namespace std;

int main()
{
    string s = "20#6 5, 3";
    char *str = const_cast<char *>(s.c_str());
    char *tok;
    cout << "string: " << s << endl;
    tok = strtok(str, "#, ");
    cout << "String: " << s << "\tToken: " << tok << endl;
    while (tok != NULL) {
        tok = strtok(NULL, "#, ");
        // tok is NULL after the last token, so don't stream it directly
        cout << "String: " << s << "\t\tToken: " << (tok ? tok : "") << endl;
    }
    return 0;
}

Output:

string: 20#6 5, 3
String: 206 5, 3    Token: 20
String: 2065, 3     Token: 6
String: 2065 3      Token: 5
String: 2065 3      Token: 3
String: 2065 3      Token:

strtok iterates over the string. The first call finds the first non-delimiter character ('2' in this case) and marks it as the token start, then continues scanning for a delimiter and replaces it with a null character ('#' gets replaced in the actual string), and returns start, which points to the token's first character; i.e. it returns the token "20", which is now null-terminated. The '\0' bytes stay inside s and print as nothing, which is why "20#6" shows up as "206". Each subsequent call starts scanning from the next character and returns a token if one is found, else NULL. So it subsequently returns the tokens 6, 5 and 3.

User · Answer

First off, I would say use the Boost tokenizer.
Alternatively, if your data is space-separated, the string stream library is very useful.

But both of the above have already been covered.
So, as a third, C-like alternative, I propose copying the std::string into a buffer for modification.

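#include <cstring>
#include <string>
#include <vector>

std::string data("The data I want to tokenize");

// Create a buffer of the correct length (+1 for the terminating '\0'):
std::vector<char> buffer(data.size() + 1);

// Copy the string into the buffer
std::strcpy(&buffer[0], data.c_str());

// Tokenize (returns the first token; call std::strtok(NULL, " ") for the rest)
std::strtok(&buffer[0], " ");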
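A minimal sketch of the buffer-copy approach as a complete function, looping until strtok returns NULL and copying each token out. The name `tokenize_copy` is hypothetical, and the sketch assumes the input holds no embedded '\0' characters, because strcpy stops at the first one.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy the string into a writable buffer, then run strtok over it.
std::vector<std::string> tokenize_copy(const std::string& data, const char* delims) {
    // +1 leaves room for the terminating '\0' that strcpy writes
    std::vector<char> buffer(data.size() + 1);
    std::strcpy(buffer.data(), data.c_str());

    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass nullptr to continue where strtok stopped
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy the token out before the buffer goes away
    }
    return tokens;  // `data` itself was never modified
}
```

For example, tokenize_copy("The data I want to tokenize", " ") yields six tokens, and the caller's string is left untouched.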

User · Answer

EDIT: const_cast is used here only to demonstrate the effect of strtok() when applied to the pointer returned by string::c_str(). You should not use strtok() this way, since it modifies the tokenized string, which may lead to undesired (in fact undefined) behaviour, as the C string "belongs" to the string instance.

#include <cstring>
#include <string>
#include <iostream>

int main(int ac, char **av)
{
    std::string theString("hello world");
    std::cout << theString << " - " << theString.size() << std::endl;

    // --- this cast *only* to illustrate the effect of strtok() on std::string
    char *token = std::strtok(const_cast<char *>(theString.c_str()), " ");

    std::cout << theString << " - " << theString.size() << std::endl;

    return 0;
}

After the call to strtok(), the space was "removed" from the string (or rather turned into a non-printable '\0' character), but the length remains unchanged:

> ./a.out
hello world - 11
helloworld - 11

Therefore you have to resort to a native mechanism, duplication of the string, or a third-party library as previously mentioned.

User · Answer

If Boost is available on your system (I think it's standard on most Linux distros these days), it has a Tokenizer class you can use.

If not, then a quick Google turns up a hand-rolled tokenizer for std::string that you can probably just copy and paste. It's very short.

And, if you don't like either of those, then here's a split() function I wrote to make my life easier. It'll break a string into pieces using any of the chars in "delim" as separators. Pieces are appended to the "parts" vector:

#include <string>
#include <vector>
using std::string;
using std::vector;

void split(const string& str, const string& delim, vector<string>& parts) {
  size_t start, end = 0;
  while (end < str.size()) {
    start = end;
    while (start < str.size() && (delim.find(str[start]) != string::npos)) {
      start++;  // skip leading separators
    }
    end = start;
    while (end < str.size() && (delim.find(str[end]) == string::npos)) {
      end++;  // skip to end of word
    }
    if (end - start != 0) {  // just ignore zero-length strings
      parts.push_back(string(str, start, end - start));
    }
  }
}

User · Answer

#include <iostream>
#include <string>
#include <sstream>

int main()
{
    std::string myText("some-text-to-tokenize");
    std::istringstream iss(myText);
    std::string token;
    while (std::getline(iss, token, '-'))
    {
        std::cout << token << std::endl;
    }
    return 0;
}

Or, as mentioned, use Boost for more flexibility.
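For whitespace-separated data, the stream's `>>` operator alone is enough. One difference from strtok worth knowing: `std::getline` with a delimiter reports an empty token between two consecutive delimiters, whereas strtok skips them. A minimal sketch of both; the function names `split_ws` and `split_getline` are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Whitespace-separated: operator>> skips any run of spaces, tabs or newlines.
std::vector<std::string> split_ws(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> out;
    for (std::string word; iss >> word; )
        out.push_back(word);
    return out;
}

// Single-character delimiter: getline keeps empty tokens ("a--b" -> "a", "", "b").
std::vector<std::string> split_getline(const std::string& text, char delim) {
    std::istringstream iss(text);
    std::vector<std::string> out;
    for (std::string token; std::getline(iss, token, delim); )
        out.push_back(token);
    return out;
}
```

If strtok-like behaviour is wanted with getline, skip tokens for which token.empty() is true.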

User · Answer

If you don't mind open source, you could use the subbuffer and subparser classes from https://github.com/EdgeCast/json_parser. The original string is left intact, and there is no allocation and no copying of data. I have not compiled the following, so there may be errors.

std::string input_string("hello world");
subbuffer input(input_string);
subparser flds(input, ' ', subparser::SKIP_EMPTY);
while (!flds.empty())
{
    subbuffer fld = flds.next();
    // do something with fld
}

// or if you know it is only two fields
subbuffer fld1 = input.before(' ');
subbuffer fld2 = input.sub(fld1.length() + 1).ltrim();
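If a third-party library is not an option, the same idea (fields that are views into the original string, with no copying of characters) can be sketched with C++17's std::string_view. This is a substitute technique, not the subbuffer/subparser API; `split_views` is a hypothetical name. The vector holding the views still allocates, and the views are valid only while the original string is alive and unmodified.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits `input` on any char in `delims`, skipping empty fields.
// Each result is a view into `input`: no characters are copied,
// so the underlying string must outlive the returned views.
std::vector<std::string_view> split_views(std::string_view input, std::string_view delims) {
    std::vector<std::string_view> fields;
    std::size_t start = input.find_first_not_of(delims);  // skip leading separators
    while (start != std::string_view::npos) {
        std::size_t end = input.find_first_of(delims, start);  // end of this field
        if (end == std::string_view::npos) end = input.size();
        fields.push_back(input.substr(start, end - start));
        start = input.find_first_not_of(delims, end);  // skip the next run of separators
    }
    return fields;
}
```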

User · Answer

Duplicate the string, tokenize it, then free it.

char *dup = strdup(str.c_str());
token = strtok(dup, " ");
// use token before free(): it points into dup
free(dup);
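Fleshed out a little, assuming a POSIX system (strdup is POSIX, and only standard C as of C23, not standard C++). Two details matter: the tokens point into `dup`, so they must be copied out before the memory is freed, and strdup returns NULL if allocation fails. `tokenize_dup` and `FreeDeleter` are hypothetical names; the unique_ptr just guarantees free() is called.

```cpp
#include <cstdlib>   // std::free
#include <memory>    // std::unique_ptr
#include <new>       // std::bad_alloc
#include <string.h>  // strdup (POSIX), strtok
#include <string>
#include <vector>

// Releases memory obtained from strdup(), which allocates with malloc().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

std::vector<std::string> tokenize_dup(const std::string& str, const char* delims) {
    std::unique_ptr<char, FreeDeleter> dup(strdup(str.c_str()));
    if (!dup) throw std::bad_alloc();  // strdup returns NULL when allocation fails

    std::vector<std::string> tokens;
    for (char* tok = strtok(dup.get(), delims); tok != nullptr; tok = strtok(nullptr, delims))
        tokens.emplace_back(tok);  // copy out: tok points into memory freed when dup goes out of scope
    return tokens;  // dup's deleter calls free() on scope exit, even if an exception was thrown
}
```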

User · Answer

There is a more elegant solution.

With std::string you can use resize() to allocate a suitably large buffer, and &s[0] to get a pointer to the internal buffer.

At this point many fine folks will jump and yell at the screen. But this is the fact: about 2 years ago, the library working group decided (meeting at Lillehammer) that, just like std::vector, std::string should also formally, not just in practice, have a guaranteed contiguous buffer. (That guarantee became part of the standard with C++11.)

The other concern is: does strtok() increase the size of the string? The MSDN documentation says:

"Each call to strtok modifies strToken by inserting a null character after the token returned by that call."

But this is not correct. Actually, the function replaces the first separator character following each token with \0. There is no change in the size of the string. If we have this string:

one-two---three--four

we will end up with

one\0two\0--three\0-four

So my solution is very simple:

std::string str("some-text-to-split");
char seps[] = "-";
char *token;

token = strtok(&str[0], seps);
while (token != NULL)
{
    // Do your thing
    token = strtok(NULL, seps);
}

Read the discussion "does std::string have something like CString::GetBuffer" (comp.lang.c++, 2008-05) on archivum.info.
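A small sketch that checks the claims above, assuming C++11 or later (for the contiguity guarantee): tokenizing through &str[0] overwrites separators inside the std::string itself, and size() stays the same. `tokenize_in_place` is a hypothetical helper; the test uses the same one-two---three--four example.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenizes `str` in place: each separator that ends a token is overwritten with '\0'.
std::vector<std::string> tokenize_in_place(std::string& str, const char* seps) {
    std::vector<std::string> tokens;
    if (str.empty()) return tokens;  // &str[0] would point at the terminator, which must not be written
    for (char* token = std::strtok(&str[0], seps); token != nullptr;
         token = std::strtok(nullptr, seps))
        tokens.emplace_back(token);
    return tokens;
}
```

After tokenize_in_place(s, "-") on "one-two---three--four", s still has size 21 but now holds "one\0two\0--three\0-four".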

User · Answer

I suppose the language is C or C++...

strtok(), IIRC, replaces separators with \0. That's why it cannot use a const string. To work around that "quickly", if the string isn't huge, you can just strdup() it. That is wise if you need to keep the string unaltered (which is what the const suggests...).

On the other hand, you might want to use another tokenizer, perhaps hand-rolled, that is less violent on the given argument.

User · Answer

With C++17, std::string receives a data() overload that returns a pointer to a modifiable buffer, so the string can be used in strtok directly without any hacks:

#include <string>
#include <iostream>
#include <cstring>
#include <cstdlib>

int main()
{
    std::string text{"pop dop rop"};
    char const * const psz_delimiter{" "};
    char * psz_token{std::strtok(text.data(), psz_delimiter)};
    while (nullptr != psz_token)
    {
        std::cout << psz_token << std::endl;
        psz_token = std::strtok(nullptr, psz_delimiter);
    }
    return EXIT_SUCCESS;
}

Output:

pop
dop
rop
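A sketch that builds on the C++17 answer, assuming a C++17 compiler: taking the std::string by value gives the function its own copy, so strtok can write into the copy's data() while the caller's string stays intact. `tokenize17` is a hypothetical name. As with every strtok-based version here, strtok keeps hidden state between calls, so this is not safe to run concurrently from several threads.

```cpp
#include <cstring>
#include <string>
#include <vector>

// `text` is a private copy, so writing '\0' into it does not affect the caller's string.
std::vector<std::string> tokenize17(std::string text, const char* delims) {
    std::vector<std::string> tokens;
    // std::string::data() returns a non-const char* since C++17
    for (char* tok = std::strtok(text.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.emplace_back(tok);
    return tokens;
}
```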

User · Answer

Typecasting to (char*) got it working for me:

token = strtok((char *)str.c_str(), " ");
