Parse (split) a string in C++ using string delimiter (standard C++)

Question

I am parsing a string in C++ using the following:

using namespace std;

string parsed, input = "text to be parsed";
stringstream input_stringstream(input);

if (getline(input_stringstream, parsed, ' '))
{
    // do some processing.
}

Parsing with a single char delimiter is fine. But what if I want to use a string as delimiter?

Example: I want to split "scott>=tiger" with ">=" as delimiter so that I can get scott and tiger.

User · Answer

This is a complete method that splits the string on any delimiter and returns a vector of the chopped up strings.

It is an adaptation from the answer from ryanbwork. However, his check for: if(token != mystring) gives wrong results if you have repeating elements in your string. This is my solution to that problem.

vector<string> Split(string mystring, const string& delimiter)
{
    vector<string> subStringList;
    string token;
    if (delimiter.empty()) // an empty delimiter would match at position 0 forever
    {
        subStringList.push_back(mystring);
        return subStringList;
    }
    while (true)
    {
        // find() matches the whole delimiter; find_first_of() would match any single character of it
        size_t pos = mystring.find(delimiter);
        if (pos == string::npos) // find returns npos if it couldn't find the delimiter anymore
        {
            subStringList.push_back(mystring); // push back the final piece of mystring
            return subStringList;
        }
        token = mystring.substr(0, pos);
        mystring = mystring.substr(pos + delimiter.length()); // skip the whole delimiter
        subStringList.push_back(token);
    }
}
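To see why the choice between std::string::find and std::string::find_first_of matters for a multi-character delimiter, here is a minimal sketch. The function names split_on_string and split_on_any_char are made up for this illustration and do not come from any answer here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on the whole delimiter string (std::string::find).
std::vector<std::string> split_on_string(const std::string& text, const std::string& delim) {
    std::vector<std::string> out;
    if (delim.empty()) {            // nothing to split on
        out.push_back(text);
        return out;
    }
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        out.push_back(text.substr(start, pos - start));
        start = pos + delim.size(); // skip the whole delimiter
    }
    out.push_back(text.substr(start)); // final piece
    return out;
}

// Hypothetical helper: splits on ANY single character contained in `chars`
// (std::string::find_first_of), so ">=" means '>' or '='.
std::vector<std::string> split_on_any_char(const std::string& text, const std::string& chars) {
    std::vector<std::string> out;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = text.find_first_of(chars, start)) != std::string::npos) {
        out.push_back(text.substr(start, pos - start));
        start = pos + 1;            // skip exactly one character
    }
    out.push_back(text.substr(start));
    return out;
}
```

For the input "a>b>=c" and the delimiter ">=", split_on_string yields "a>b" and "c", while split_on_any_char yields "a", "b", an empty string and "c", because the lone '>' and each character of ">=" all count as separators.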

User · Answer

I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

struct my_tokenizer_func
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}

User · Answer

For string delimiter

Split string based on a string delimiter. Such as splitting string "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" based on string delimiter "-+", output will be {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// for string delimiter
vector<string> split (string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;

    while ((pos_end = s.find (delimiter, pos_start)) != string::npos) {
        token = s.substr (pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back (token);
    }

    res.push_back (s.substr (pos_start));
    return res;
}

int main() {
    string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    string delimiter = "-+";
    vector<string> v = split (str, delimiter);

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih

For single character delimiter

Split string based on a character delimiter. Such as splitting string "adsf+qwer+poui+fdgh" with delimiter "+" will output {"adsf", "qwer", "poui", "fdgh"}

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> split (const string &s, char delim) {
    vector<string> result;
    stringstream ss (s);
    string item;

    while (getline (ss, item, delim)) {
        result.push_back (item);
    }

    return result;
}

int main() {
    string str = "adsf+qwer+poui+fdgh";
    vector<string> v = split (str, '+');

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwer
poui
fdgh

User · Answer

Since this is the top-rated Stack Overflow Google search result for "C++ split string" or similar, I'll post a complete, copy/paste runnable example that shows both methods.

splitString uses stringstream (probably the better and easier option in most cases).

splitString2 uses find and substr (a more manual approach).

// SplitString.cpp

#include <iostream>
#include <vector>
#include <string>
#include <sstream>

// function prototypes
std::vector<std::string> splitString(const std::string& str, char delim);
std::vector<std::string> splitString2(const std::string& str, char delim);
std::string getSubstring(const std::string& str, int leftIdx, int rightIdx);

int main(void)
{
  // Test cases - all will pass

  std::string str = "ab,cd,ef";
  //std::string str = "abcdef";
  //std::string str = "";
  //std::string str = ",cd,ef";
  //std::string str = "ab,cd,";   // behavior of splitString and splitString2 is different for this final case only, if this case matters to you choose which one you need as applicable

  std::vector<std::string> tokens = splitString(str, ',');

  std::cout << "tokens: " << "\n";

  if (tokens.empty())
  {
    std::cout << "(tokens is empty)" << "\n";
  }
  else
  {
    for (auto& token : tokens)
    {
      if (token == "") std::cout << "(empty string)" << "\n";
      else std::cout << token << "\n";
    }
  }

  return 0;
}

std::vector<std::string> splitString(const std::string& str, char delim)
{
  std::vector<std::string> tokens;

  if (str == "") return tokens;

  std::string currentToken;

  std::stringstream ss(str);

  while (std::getline(ss, currentToken, delim))
  {
    tokens.push_back(currentToken);
  }

  return tokens;
}

std::vector<std::string> splitString2(const std::string& str, char delim)
{
  std::vector<std::string> tokens;

  if (str == "") return tokens;

  int leftIdx = 0;

  int delimIdx = str.find(delim);

  int rightIdx;

  while (delimIdx != std::string::npos)
  {
    rightIdx = delimIdx - 1;

    std::string token = getSubstring(str, leftIdx, rightIdx);
    tokens.push_back(token);

    // prep for next time around
    leftIdx = delimIdx + 1;

    delimIdx = str.find(delim, delimIdx + 1);
  }

  rightIdx = str.size() - 1;

  std::string token = getSubstring(str, leftIdx, rightIdx);
  tokens.push_back(token);

  return tokens;
}

std::string getSubstring(const std::string& str, int leftIdx, int rightIdx)
{
  return str.substr(leftIdx, rightIdx - leftIdx + 1);
}

User · Answer

This method uses std::string::find without mutating the original string by remembering the beginning and end of the previous substring token.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

    auto start = 0U;
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    std::cout << s.substr(start, end);
}

User · Answer

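#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Note: strtok treats every character of delim as a separate delimiter.
std::vector<std::string> parse(std::string str, std::string delim) {
    std::vector<std::string> tokens;
    char *str_c = strdup(str.c_str());
    char *token = NULL;

    token = strtok(str_c, delim.c_str());
    while (token != NULL) {
        tokens.push_back(std::string(token));
        token = strtok(NULL, delim.c_str());
    }

    free(str_c); // strdup allocates with malloc, so release with free, not delete[]

    return tokens;
}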

User · Answer

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"

The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (by default, up to the end of the string).

If the string contains several delimiters, you can remove each token (delimiter included) after extracting it and then proceed with the next extraction. If you want to preserve the original string, work on a copy, or use s = s.substr(pos + delimiter.length()); on a copy instead of erase.

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom
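The find/substr step described above can be wrapped in a small helper that also covers the case where the delimiter is absent. This is only a sketch; the name split_once is invented for the illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {text before the first delimiter, text after it}.
// If the delimiter does not occur (find returns npos), returns {whole text, ""}.
std::pair<std::string, std::string> split_once(const std::string& text,
                                               const std::string& delim) {
    const std::size_t pos = text.find(delim);
    if (pos == std::string::npos) {
        return {text, std::string()};
    }
    // substr(0, pos): the token; substr(pos + delim.size()): everything after the delimiter.
    return {text.substr(0, pos), text.substr(pos + delim.size())};
}
```

Calling split_once("scott>=tiger>=mushroom", ">=") gives "scott" and "tiger>=mushroom", and the second element can be fed back into the helper to get the next token without erasing from the original string.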

User · Answer

As a bonus, here is a code example of a split function and macro that is easy to use and where you can choose the container type:

#include <iostream>
#include <vector>
#include <string>

#define split(str, delim, type) (split_fn<type<std::string>>(str, delim))

template <typename Container>
Container split_fn(const std::string& str, char delim = ' ') {
    Container cont{};
    std::size_t current, previous = 0;
    current = str.find(delim);
    while (current != std::string::npos) {
        cont.push_back(str.substr(previous, current - previous));
        previous = current + 1;
        current = str.find(delim, previous);
    }
    cont.push_back(str.substr(previous, current - previous));

    return cont;
}

int main() {

    auto test = std::string{"This is a great test"};
    auto res = split(test, ' ', std::vector);

    for(auto &i : res) {
        std::cout << i << ", "; // "This", "is", "a", "great", "test"
    }

    return 0;
}

User · Answer

Function:

std::vector<std::string> WSJCppCore::split(const std::string& sWhat, const std::string& sDelim) {
    std::vector<std::string> vRet;
    size_t nPos = 0;
    size_t nLen = sWhat.length();
    size_t nDelimLen = sDelim.length();
    while (nPos < nLen) {
        std::size_t nFoundPos = sWhat.find(sDelim, nPos);
        if (nFoundPos != std::string::npos) {
            std::string sToken = sWhat.substr(nPos, nFoundPos - nPos);
            vRet.push_back(sToken);
            nPos = nFoundPos + nDelimLen;
            if (nFoundPos + nDelimLen == nLen) { // last delimiter
                vRet.push_back("");
            }
        } else {
            std::string sToken = sWhat.substr(nPos, nLen - nPos);
            vRet.push_back(sToken);
            break;
        }
    }
    return vRet;
}

Unit-tests:

bool UnitTestSplit::run() {
    bool bTestSuccess = true;

    struct LTest {
        LTest(
            const std::string &sStr,
            const std::string &sDelim,
            const std::vector<std::string> &vExpectedVector
        ) {
            this->sStr = sStr;
            this->sDelim = sDelim;
            this->vExpectedVector = vExpectedVector;
        };
        std::string sStr;
        std::string sDelim;
        std::vector<std::string> vExpectedVector;
    };
    std::vector<LTest> tests;
    tests.push_back(LTest("1 2 3 4 5", " ", {"1", "2", "3", "4", "5"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|2", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", "2"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", ""}));
    tests.push_back(LTest("some1 => some2 => some3", "=>", {"some1 ", " some2 ", " some3"}));
    tests.push_back(LTest("some1 => some2 => some3 =>", "=>", {"some1 ", " some2 ", " some3 ", ""}));

    for (int i = 0; i < tests.size(); i++) {
        LTest test = tests[i];
        std::string sPrefix = "test" + std::to_string(i) + "(\"" + test.sStr + "\")";
        std::vector<std::string> vSplitted = WSJCppCore::split(test.sStr, test.sDelim);
        compareN(bTestSuccess, sPrefix + ": size", vSplitted.size(), test.vExpectedVector.size());
        int nMin = std::min(vSplitted.size(), test.vExpectedVector.size());
        for (int n = 0; n < nMin; n++) {
            compareS(bTestSuccess, sPrefix + ", element: " + std::to_string(n), vSplitted[n], test.vExpectedVector[n]);
        }
    }

    return bTestSuccess;
}

User · Answer

#include<iostream>
#include<algorithm>
using namespace std;

int split_count(string str,char delimit){
    return count(str.begin(),str.end(),delimit);
}

void split(string str,char delimit,string res[]){
    int a=0,i=0;
    while(a<str.size()){
        res[i]=str.substr(a,str.find(delimit));
        a+=res[i].size()+1;
        i++;
    }
}

int main(){
    string a="abc.xyz.mno.def";
    int x=split_count(a,'.')+1;
    string res[x];
    split(a,'.',res);

    for(int i=0;i<x;i++)
        cout<<res[i]<<endl;
    return 0;
}

P.S.: Works only if the lengths of the strings after splitting are equal.

User · Answer

strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though the > and = are counted as individual delimiters).

EDIT: if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.

string token, mystring("scott>=tiger");
while(token != mystring){
  token = mystring.substr(0,mystring.find_first_of(">="));
  mystring = mystring.substr(mystring.find_first_of(">=") + 1);
  printf("%s ",token.c_str());
}
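The "each character is an individual delimiter" behaviour of strtok can be reproduced on std::string with find_first_of and find_first_not_of, without converting to char*. The sketch below is an approximation of that behaviour under my own naming (tokenize_any is hypothetical), not a drop-in replacement for strtok.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: treats every character of `delims` as a separate
// delimiter and skips empty tokens, similar to how strtok behaves.
std::vector<std::string> tokenize_any(const std::string& text, const std::string& delims) {
    std::vector<std::string> tokens;
    // Skip leading delimiter characters.
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the current token: next delimiter character, or npos at the end.
        const std::size_t end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start)); // npos length takes the rest
        // Skip the run of delimiter characters before the next token.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With the delimiter set ">=", the input "scott>=tiger" gives "scott" and "tiger", but so does "scott>tiger" or "scott=>=tiger", which is why this approach only suits inputs where the delimiter characters never appear inside a token.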

User · Answer

// insert_in_container is a helper that is not shown in this answer
Container splitR(const std::string& input, const std::string& delims)
{
    Container out;
    size_t delims_len = delims.size();
    auto begIdx = 0u;
    auto endIdx = input.find(delims, begIdx);
    if (endIdx == std::string::npos && input.size() != 0u)
    {
        insert_in_container(out, input);
    }
    while (endIdx != std::string::npos)
    {
        insert_in_container(out, input.substr(begIdx, endIdx - begIdx));
        begIdx = endIdx + delims_len;
        endIdx = input.find(delims, begIdx);
        if (endIdx == std::string::npos)
        {
            insert_in_container(out, input.substr(begIdx, input.length() - begIdx));
        }
    }
    return out;
}

User · Answer

Since C++11 it can be done like this:

std::vector<std::string> splitString(const std::string& str,
                                     const std::regex& regex)
{
  return {std::sregex_token_iterator{str.begin(), str.end(), regex, -1},
          std::sregex_token_iterator()};
}

// usually we have a predefined set of regular expressions: then
// let's build those only once and re-use them multiple times
static const std::regex regex1(R"(some-reg-exp1)", std::regex::optimize);
static const std::regex regex2(R"(some-reg-exp2)", std::regex::optimize);
static const std::regex regex3(R"(some-reg-exp3)", std::regex::optimize);

std::string str = "some string to split";
std::vector<std::string> tokens(splitString(str, regex1));

Notes:

this is a small improvement to this answer
see also Optimization techniques used by std::regex_constants::optimize

User · Answer

You can also use regex for this:

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str);
    std::vector<std::string> list(std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
                                  std::sregex_token_iterator());
    return list;
}

which is equivalent to:

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str);
    std::sregex_token_iterator token_iter(str.begin(), str.end(), regexz, -1);
    std::sregex_token_iterator end;
    std::vector<std::string> list;
    while (token_iter != end)
    {
        list.emplace_back(*token_iter++);
    }
    return list;
}

and use it like this:

#include <iostream>
#include <string>
#include <vector>
#include <regex>

std::vector<std::string> split(const std::string str, const std::string regex_str)
{   // a yet more concise form!
    // The regex must be a named object: passing a temporary std::regex to the iterator is rejected since C++14.
    std::regex regexz(regex_str);
    return { std::sregex_token_iterator(str.begin(), str.end(), regexz, -1), std::sregex_token_iterator() };
}

int main()
{
    std::string input_str = "lets split this";
    std::string regex_str = " ";
    auto tokens = split(input_str, regex_str);
    for (auto& item: tokens)
    {
        std::cout<<item <<std::endl;
    }
}

Play with it online: http://cpp.sh/9sumb

You can use plain substrings or characters as the delimiter, or actual regular expressions to do the splitting (regex metacharacters in a literal delimiter must be escaped). It's also concise and C++11.
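One caveat with the regex approach is that a literal delimiter such as "+" or "." is itself a regex metacharacter. A possible way around that, sketched here with made-up helper names (escape_regex and split_literal are not part of the standard library or of the answer above), is to escape the delimiter before building the std::regex.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escapes ECMAScript regex metacharacters
// so that `literal` matches itself when compiled as a regex.
std::string escape_regex(const std::string& literal) {
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    // In the replacement format, "$&" stands for the matched character.
    return std::regex_replace(literal, special, R"(\$&)");
}

// Hypothetical helper: splits `text` on the literal string `delim`.
std::vector<std::string> split_literal(const std::string& text, const std::string& delim) {
    // Named regex object: it must outlive the token iterators.
    const std::regex re(escape_regex(delim));
    return {std::sregex_token_iterator(text.begin(), text.end(), re, -1),
            std::sregex_token_iterator()};
}
```

With this, split_literal("1+2+3", "+") yields "1", "2" and "3", whereas compiling "+" directly as a regex would throw std::regex_error. For fixed delimiters, plain find and substr are likely faster; the regex route mainly pays off when the delimiter really is a pattern.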

User · Answer

Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
    std::vector<std::string> tokens;

    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
         size_t position = s.find(delimiter, start);
         end = position != std::string::npos ? position : s.length();

         std::string token = s.substr(start, end - start);
         if (!removeEmptyEntries || !token.empty())
         {
             tokens.push_back(token);
         }
    }

    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }

    return tokens;
}

Examples

split("a-b-c", "-"); // [3]("a","b","c")

split("a--c", "-"); // [3]("a","","c")

split("-b-", "-"); // [3]("","b","")

split("--c--", "-"); // [5]("","","c","","")

split("--c--", "-", true); // [1]("c")

split("a", "-"); // [1]("a")

split("", "-"); // [1]("")

split("", "-", true); // [0]()

User · Answer

This works for string (or single character) delimiters, as long as the first character of the delimiter never appears on its own in the input. Don't forget to include #include <sstream>.

std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+";
std::istringstream ss(input);
std::string token;
std::string::iterator it;

while (std::getline(ss, token, *(it = delimiter.begin()))) {
    std::cout << token << " " << '\n'; // Token is extracted using '='
    while (++it != delimiter.end()) ss.get(); // Skip the rest of delimiter if exists ",+"
}

The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.

User · Answer

The answer is already there, but the selected answer uses the erase function, which is very costly: think of some very big string (in MBs). Therefore I use the function below.

vector<string> split(const string& i_str, const string& i_delim)
{
    vector<string> result;

    size_t found = i_str.find(i_delim);
    size_t startIndex = 0;

    while(found != string::npos)
    {
        result.push_back(string(i_str.begin()+startIndex, i_str.begin()+found));
        startIndex = found + i_delim.size();
        found = i_str.find(i_delim, startIndex);
    }
    if(startIndex != i_str.size())
        result.push_back(string(i_str.begin()+startIndex, i_str.end()));
    return result;
}
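If copying is the concern for very large inputs, one further option, assuming C++17 is available, is to return std::string_view pieces that point into the original buffer instead of allocating a std::string per token. This is a sketch with an invented name (split_views); the views are only valid while the source string is alive and unmodified.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper (requires C++17): returns views into `text`, no copies.
// The caller must keep the underlying string alive while using the result.
std::vector<std::string_view> split_views(std::string_view text, std::string_view delim) {
    std::vector<std::string_view> parts;
    if (delim.empty()) {            // nothing to split on
        parts.push_back(text);
        return parts;
    }
    std::size_t start = 0;
    std::size_t pos = text.find(delim, start);
    while (pos != std::string_view::npos) {
        parts.push_back(text.substr(start, pos - start)); // substr on a view does not allocate
        start = pos + delim.size();
        pos = text.find(delim, start);
    }
    parts.push_back(text.substr(start)); // final piece (may be empty)
    return parts;
}
```

Unlike the function above, this sketch keeps a trailing empty piece, so "abc::" split on "::" gives "abc" and an empty view; dropping it is a one-line check if that behaviour is not wanted.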

User · Answer

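std::vector<std::string> split(const std::string& s, char c) {
  std::vector<std::string> v;
  std::string::size_type i = 0;          // start of the current token
  std::string::size_type j = s.find(c);  // position of the next delimiter
  while (j != std::string::npos) {
    v.push_back(s.substr(i, j - i));
    i = ++j;
    j = s.find(c, j);
  }
  v.push_back(s.substr(i));  // last token (the whole string if c never occurs)
  return v;
}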

User · Answer

If you do not want to modify the string (as in the answer by Vincenzo Pii) and want to output the last token as well, you may want to use this approach:

inline std::vector<std::string> splitString( const std::string &s, const std::string &delimiter ){
    std::vector<std::string> ret;
    size_t start = 0;
    size_t end = 0;
    size_t len = 0;
    std::string token;
    do{ end = s.find(delimiter,start);
        len = end - start;
        token = s.substr(start, len);
        ret.emplace_back( token );
        start += len + delimiter.length();
        std::cout << token << std::endl;
    }while ( end != std::string::npos );
    return ret;
}

User · Answer

A very simple, naive approach:

vector<string> words_seperate(string s){
    vector<string> ans;
    string w="";
    for(auto i:s){
        if(i==' '){
           ans.push_back(w);
           w="";
        }
        else{
           w+=i;
        }
    }
    ans.push_back(w);
    return ans;
}

Or you can use the boost library split function:

vector<string> result;
boost::split(result, input, boost::is_any_of("\t"));

Or you can try TOKEN or strtok:

char str[] = "DELIMIT-ME-C++";
char *token = strtok(str, "-");
while (token)
{
    cout<<token;
    token = strtok(NULL, "-");
}

Or you can do this:

char split_with=' ';
vector<string> words;
string token;
stringstream ss(our_string);
while(getline(ss , token , split_with)) words.push_back(token);

User · Answer

You can use the following function to split a string. Note that it drops empty tokens.

vector<string> split(const string& str, const string& delim)
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}

User · Answer

This code splits text into lines and adds each one to a vector.

vector<string> split(char *phrase, string delimiter){
    vector<string> list;
    string s = string(phrase);
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    list.push_back(s);
    return list;
}

Called by:

vector<string> listFilesMax = split(buffer, "\n");
