[c] Tokenizing strings in C

I have been trying to tokenize a string using SPACE as delimiter but it doesn't work. Does any one have suggestion on why it doesn't work?

Edit: tokenizing using:

strtok(string, " ");

The code is like the following

pch = strtok (str," ");
while (pch != NULL)
{
  printf ("%s\n",pch);
  pch = strtok (NULL, " ");
}

This question is related to c string tokenize

The answer is


Here's an example of strtok usage, keep in mind that strtok is destructive of its input string (and therefore can't ever be used on a string constant

char *p = strtok(str, " ");
while(p != NULL) {
    printf("%s\n", p);
    p = strtok(NULL, " ");
}

Basically the thing to note is that passing a NULL as the first parameter to strtok tells it to get the next token from the string it was previously tokenizing.


strtok can be very dangerous. It is not thread safe. Its intended use is to be called over and over in a loop, passing in the output from the previous call. The strtok function has an internal variable that stores the state of the strtok call. This state is not unique to each thread - it is global. If any other code uses strtok in another thread, you get problems. Not the kind of problems you want to track down either!

I'd recommend looking for a regex implementation, or using sscanf to pull apart the string.

Try this:

char strprint[256];
char text[256];
strcpy(text, "My string to test");
while ( sscanf( text, "%s %s", strprint, text) > 0 ) {
   printf("token: %s\n", strprint);
}

Note: The 'text' string is destroyed as it's separated. This may not be the preferred behaviour =)


When reading the strtok documentation, I see you need to pass in a NULL pointer after the first "initializing" call. Maybe you didn't do that. Just a guess of course.


Here is another strtok() implementation, which has the ability to recognize consecutive delimiters (standard library's strtok() does not have this)

The function is a part of BSD licensed string library, called zString. You are more than welcome to contribute :)

https://github.com/fnoyanisi/zString

char *zstring_strtok(char *str, const char *delim) {
    static char *static_str=0;      /* var to store last address */
    int index=0, strlength=0;       /* integers for indexes */
    int found = 0;                  /* check if delim is found */

    /* delimiter cannot be NULL
    * if no more char left, return NULL as well
    */
    if (delim==0 || (str == 0 && static_str == 0))
        return 0;

    if (str == 0)
        str = static_str;

    /* get length of string */
    while(str[strlength])
        strlength++;

    /* find the first occurance of delim */
    for (index=0;index<strlength;index++)
        if (str[index]==delim[0]) {
            found=1;
            break;
        }

    /* if delim is not contained in str, return str */
    if (!found) {
        static_str = 0;
        return str;
    }

    /* check for consecutive delimiters
    *if first char is delim, return delim
    */
    if (str[0]==delim[0]) {
        static_str = (str + 1);
        return (char *)delim;
    }

    /* terminate the string
    * this assignmetn requires char[], so str has to
    * be char[] rather than *char
    */
    str[index] = '\0';

    /* save the rest of the string */
    if ((str + index + 1)!=0)
        static_str = (str + index + 1);
    else
        static_str = 0;

        return str;
}

As mentioned in previous posts, since strtok(), or the one I implmented above, relies on a static *char variable to preserve the location of last delimiter between consecutive calls, extra care should be taken while dealing with multi-threaded aplications.


Here's an example of strtok usage, keep in mind that strtok is destructive of its input string (and therefore can't ever be used on a string constant

char *p = strtok(str, " ");
while(p != NULL) {
    printf("%s\n", p);
    p = strtok(NULL, " ");
}

Basically the thing to note is that passing a NULL as the first parameter to strtok tells it to get the next token from the string it was previously tokenizing.


I've made some string functions in order to split values, by using less pointers as I could because this code is intended to run on PIC18F processors. Those processors does not handle really good with pointers when you have few free RAM available:

#include <stdio.h>
#include <string.h>

char POSTREQ[255] = "pwd=123456&apply=Apply&d1=88&d2=100&pwr=1&mpx=Internal&stmo=Stereo&proc=Processor&cmp=Compressor&ip1=192&ip2=168&ip3=10&ip4=131&gw1=192&gw2=168&gw3=10&gw4=192&pt=80&lic=&A=A";

int findchar(char *string, int Start, char C) {
    while((string[Start] != 0)) { Start++; if(string[Start] == C) return Start; }
    return -1;
}

int findcharn(char *string, int Times, char C) {
   int i = 0, pos = 0, fnd = 0;

    while(i < Times) {
       fnd = findchar(string, pos, C);
        if(fnd < 0) return -1;
        if(fnd > 0) pos = fnd;
       i++;
   }
   return fnd;
}

void mid(char *in, char *out, int start, int end) {
    int i = 0;
    int size = end - start;

    for(i = 0; i < size; i++){
        out[i] = in[start + i + 1];
    }
    out[size] = 0;
}

void getvalue(char *out, int index) {
    mid(POSTREQ, out, findcharn(POSTREQ, index, '='), (findcharn(POSTREQ, index, '&') - 1));
}

void main() {
   char n_pwd[7];
   char n_d1[7];

   getvalue(n_d1, 1);

   printf("Value: %s\n", n_d1);
} 

You can simplify the code by introducing an extra variable.

#include <string.h>
#include <stdio.h>

int main()
{
    char str[100], *s = str, *t = NULL;

    strcpy(str, "a space delimited string");
    while ((t = strtok(s, " ")) != NULL) {
        s = NULL;
        printf(":%s:\n", t);
    }
    return 0;
}

strtok can be very dangerous. It is not thread safe. Its intended use is to be called over and over in a loop, passing in the output from the previous call. The strtok function has an internal variable that stores the state of the strtok call. This state is not unique to each thread - it is global. If any other code uses strtok in another thread, you get problems. Not the kind of problems you want to track down either!

I'd recommend looking for a regex implementation, or using sscanf to pull apart the string.

Try this:

char strprint[256];
char text[256];
strcpy(text, "My string to test");
while ( sscanf( text, "%s %s", strprint, text) > 0 ) {
   printf("token: %s\n", strprint);
}

Note: The 'text' string is destroyed as it's separated. This may not be the preferred behaviour =)


When reading the strtok documentation, I see you need to pass in a NULL pointer after the first "initializing" call. Maybe you didn't do that. Just a guess of course.


strtok can be very dangerous. It is not thread safe. Its intended use is to be called over and over in a loop, passing in the output from the previous call. The strtok function has an internal variable that stores the state of the strtok call. This state is not unique to each thread - it is global. If any other code uses strtok in another thread, you get problems. Not the kind of problems you want to track down either!

I'd recommend looking for a regex implementation, or using sscanf to pull apart the string.

Try this:

char strprint[256];
char text[256];
strcpy(text, "My string to test");
while ( sscanf( text, "%s %s", strprint, text) > 0 ) {
   printf("token: %s\n", strprint);
}

Note: The 'text' string is destroyed as it's separated. This may not be the preferred behaviour =)


You can simplify the code by introducing an extra variable.

#include <string.h>
#include <stdio.h>

int main()
{
    char str[100], *s = str, *t = NULL;

    strcpy(str, "a space delimited string");
    while ((t = strtok(s, " ")) != NULL) {
        s = NULL;
        printf(":%s:\n", t);
    }
    return 0;
}

Here's an example of strtok usage, keep in mind that strtok is destructive of its input string (and therefore can't ever be used on a string constant

char *p = strtok(str, " ");
while(p != NULL) {
    printf("%s\n", p);
    p = strtok(NULL, " ");
}

Basically the thing to note is that passing a NULL as the first parameter to strtok tells it to get the next token from the string it was previously tokenizing.


Do it like this:

char s[256];
strcpy(s, "one two three");
char* token = strtok(s, " ");
while (token) {
    printf("token: %s\n", token);
    token = strtok(NULL, " ");
}

Note: strtok modifies the string its tokenising, so it cannot be a const char*.


Here's an example of strtok usage, keep in mind that strtok is destructive of its input string (and therefore can't ever be used on a string constant

char *p = strtok(str, " ");
while(p != NULL) {
    printf("%s\n", p);
    p = strtok(NULL, " ");
}

Basically the thing to note is that passing a NULL as the first parameter to strtok tells it to get the next token from the string it was previously tokenizing.


int not_in_delimiter(char c, char *delim){

    while(*delim != '\0'){
            if(c == *delim) return 0;
            delim++;
    }
    return 1;
}

char *token_separater(char *source, char *delimiter, char **last){

char *begin, *next_token;
char *sbegin;

/*Get the start of the token */
if(source)
  begin = source;
else
  begin = *last;

sbegin = begin;

/*Scan through the string till we find character in delimiter. */
while(*begin != '\0' && not_in_delimiter(*begin, delimiter)){
       begin++;
}

/* Check if we have reached at of the string */
if(*begin == '\0') {
/* We dont need to come further, hence return NULL*/
   *last = NULL;
    return sbegin;
}
/* Scan the string till we find a character which is not in delimiter */
 next_token  = begin;
 while(next_token != '\0' && !not_in_delimiter(*next_token, delimiter))    {
    next_token++;
 }
 /* If we have not reached at the end of the string */
 if(*next_token != '\0'){
  *last = next_token--;
  *next_token = '\0';
   return sbegin;
}
}

 void main(){

    char string[10] = "abcb_dccc";
    char delim[10] = "_";
    char *token = NULL;
    char *last = "" ;
    token  = token_separater(string, delim, &last);
    printf("%s\n", token);
    while(last){
            token  = token_separater(NULL, delim, &last);
            printf("%s\n", token);
    }

}

You can read detail analysis at blog mentioned in my profile :)


Do it like this:

char s[256];
strcpy(s, "one two three");
char* token = strtok(s, " ");
while (token) {
    printf("token: %s\n", token);
    token = strtok(NULL, " ");
}

Note: strtok modifies the string its tokenising, so it cannot be a const char*.


I've made some string functions in order to split values, by using less pointers as I could because this code is intended to run on PIC18F processors. Those processors does not handle really good with pointers when you have few free RAM available:

#include <stdio.h>
#include <string.h>

char POSTREQ[255] = "pwd=123456&apply=Apply&d1=88&d2=100&pwr=1&mpx=Internal&stmo=Stereo&proc=Processor&cmp=Compressor&ip1=192&ip2=168&ip3=10&ip4=131&gw1=192&gw2=168&gw3=10&gw4=192&pt=80&lic=&A=A";

int findchar(char *string, int Start, char C) {
    while((string[Start] != 0)) { Start++; if(string[Start] == C) return Start; }
    return -1;
}

int findcharn(char *string, int Times, char C) {
   int i = 0, pos = 0, fnd = 0;

    while(i < Times) {
       fnd = findchar(string, pos, C);
        if(fnd < 0) return -1;
        if(fnd > 0) pos = fnd;
       i++;
   }
   return fnd;
}

void mid(char *in, char *out, int start, int end) {
    int i = 0;
    int size = end - start;

    for(i = 0; i < size; i++){
        out[i] = in[start + i + 1];
    }
    out[size] = 0;
}

void getvalue(char *out, int index) {
    mid(POSTREQ, out, findcharn(POSTREQ, index, '='), (findcharn(POSTREQ, index, '&') - 1));
}

void main() {
   char n_pwd[7];
   char n_d1[7];

   getvalue(n_d1, 1);

   printf("Value: %s\n", n_d1);
} 

int not_in_delimiter(char c, char *delim){

    while(*delim != '\0'){
            if(c == *delim) return 0;
            delim++;
    }
    return 1;
}

char *token_separater(char *source, char *delimiter, char **last){

char *begin, *next_token;
char *sbegin;

/*Get the start of the token */
if(source)
  begin = source;
else
  begin = *last;

sbegin = begin;

/*Scan through the string till we find character in delimiter. */
while(*begin != '\0' && not_in_delimiter(*begin, delimiter)){
       begin++;
}

/* Check if we have reached at of the string */
if(*begin == '\0') {
/* We dont need to come further, hence return NULL*/
   *last = NULL;
    return sbegin;
}
/* Scan the string till we find a character which is not in delimiter */
 next_token  = begin;
 while(next_token != '\0' && !not_in_delimiter(*next_token, delimiter))    {
    next_token++;
 }
 /* If we have not reached at the end of the string */
 if(*next_token != '\0'){
  *last = next_token--;
  *next_token = '\0';
   return sbegin;
}
}

 void main(){

    char string[10] = "abcb_dccc";
    char delim[10] = "_";
    char *token = NULL;
    char *last = "" ;
    token  = token_separater(string, delim, &last);
    printf("%s\n", token);
    while(last){
            token  = token_separater(NULL, delim, &last);
            printf("%s\n", token);
    }

}

You can read detail analysis at blog mentioned in my profile :)


You can simplify the code by introducing an extra variable.

#include <string.h>
#include <stdio.h>

int main()
{
    char str[100], *s = str, *t = NULL;

    strcpy(str, "a space delimited string");
    while ((t = strtok(s, " ")) != NULL) {
        s = NULL;
        printf(":%s:\n", t);
    }
    return 0;
}

Do it like this:

char s[256];
strcpy(s, "one two three");
char* token = strtok(s, " ");
while (token) {
    printf("token: %s\n", token);
    token = strtok(NULL, " ");
}

Note: strtok modifies the string its tokenising, so it cannot be a const char*.


When reading the strtok documentation, I see you need to pass in a NULL pointer after the first "initializing" call. Maybe you didn't do that. Just a guess of course.


You can simplify the code by introducing an extra variable.

#include <string.h>
#include <stdio.h>

int main()
{
    char str[100], *s = str, *t = NULL;

    strcpy(str, "a space delimited string");
    while ((t = strtok(s, " ")) != NULL) {
        s = NULL;
        printf(":%s:\n", t);
    }
    return 0;
}

Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to tokenize

How to get rid of punctuation using NLTK tokenizer? How do I tokenize a string sentence in NLTK? Splitting string into multiple rows in Oracle Parse (split) a string in C++ using string delimiter (standard C++) How to use stringstream to separate comma separated strings Split string with PowerShell and do something with each token Splitting comma separated string in a PL/SQL stored proc Convert comma separated string to array in PL/SQL Is there a function to split a string in PL/SQL? How to split a string in shell and get the last field