[c++] Parsing a comma-delimited std::string

If I have a std::string containing a comma-separated list of numbers, what's the simplest way to parse out the numbers and put them in an integer array?

I don't want to generalise this out into parsing anything else. Just a simple string of comma separated integer numbers such as "1,1,1,1,2,1,1,1,0".

This question is related to c++ string parsing stl csv

The answer is


void ExplodeString( const std::string& string, const char separator, std::list<int>& result ) {
    if( string.size() ) {
        std::string::const_iterator last = string.begin();
        for( std::string::const_iterator i=string.begin(); i!=string.end(); ++i ) {
            if( *i == separator ) {
                const std::string str(last,i);
                int id = atoi(str.c_str());
                result.push_back(id);
                last = i;
                ++ last;
            }
        }
        if( last != string.end() ) result.push_back( atoi(&*last) );
    }
}

Alternative solution using generic algorithms and Boost.Tokenizer:

struct ToInt
{
    int operator()(string const &str) { return atoi(str.c_str()); }
};

string values = "1,2,3,4,5,9,8,7,6";

vector<int> ints;
tokenizer<> tok(values);

transform(tok.begin(), tok.end(), back_inserter(ints), ToInt());

You could also use the following function.

void tokenize(const string& str, vector<string>& tokens, const string& delimiters = ",")
{
  // Skip delimiters at beginning.
  string::size_type lastPos = str.find_first_not_of(delimiters, 0);

  // Find first non-delimiter.
  string::size_type pos = str.find_first_of(delimiters, lastPos);

  while (string::npos != pos || string::npos != lastPos) {
    // Found a token, add it to the vector.
    tokens.push_back(str.substr(lastPos, pos - lastPos));

    // Skip delimiters.
    lastPos = str.find_first_not_of(delimiters, pos);

    // Find next non-delimiter.
    pos = str.find_first_of(delimiters, lastPos);
  }
}

The C++ String Toolkit Library (Strtk) has the following solution to your problem:

#include <string>
#include <deque>
#include <vector>
#include "strtk.hpp"
int main()
{ 
   std::string int_string = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15";
   std::vector<int> int_list;
   strtk::parse(int_string,",",int_list);

   std::string double_string = "123.456|789.012|345.678|901.234|567.890";
   std::deque<double> double_list;
   strtk::parse(double_string,"|",double_list);

   return 0;
}

More examples can be found Here


Yet another, rather different, approach: use a special locale that treats commas as white space:

#include <locale>
#include <vector>

struct csv_reader: std::ctype<char> {
    csv_reader(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());

        rc[','] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        rc[' '] = std::ctype_base::space;
        return &rc[0];
    }
}; 

To use this, you imbue() a stream with a locale that includes this facet. Once you've done that, you can read numbers as if the commas weren't there at all. Just for example, we'll read comma-delimited numbers from input, and write then out one-per line on standard output:

#include <algorithm>
#include <iterator>
#include <iostream>

int main() {
    std::cin.imbue(std::locale(std::locale(), new csv_reader()));
    std::copy(std::istream_iterator<int>(std::cin), 
              std::istream_iterator<int>(),
              std::ostream_iterator<int>(std::cout, "\n"));
    return 0;
}

std::string input="1,1,1,1,2,1,1,1,0";
std::vector<long> output;
for(std::string::size_type p0=0,p1=input.find(',');
        p1!=std::string::npos || p0!=std::string::npos;
        (p0=(p1==std::string::npos)?p1:++p1),p1=input.find(',',p0) )
    output.push_back( strtol(input.c_str()+p0,NULL,0) );

It would be a good idea to check for conversion errors in strtol(), of course. Maybe the code may benefit from some other error checks as well.


Simple Copy/Paste function, based on the boost tokenizer.

void strToIntArray(std::string string, int* array, int array_len) {
  boost::tokenizer<> tok(string);
  int i = 0;
  for(boost::tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
    if(i < array_len)
      array[i] = atoi(beg->c_str());
    i++;
}

bool GetList (const std::string& src, std::vector<int>& res)
  {
    using boost::lexical_cast;
    using boost::bad_lexical_cast;
    bool success = true;
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    boost::char_separator<char> sepa(",");
    tokenizer tokens(src, sepa);
    for (tokenizer::iterator tok_iter = tokens.begin(); 
         tok_iter != tokens.end(); ++tok_iter) {
      try {
        res.push_back(lexical_cast<int>(*tok_iter));
      }
      catch (bad_lexical_cast &) {
        success = false;
      }
    }
    return success;
  }

Something less verbose, std and takes anything separated by a comma.

stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;

while( ss.good() )
{
    string substr;
    getline( ss, substr, ',' );
    result.push_back( substr );
}

#include <sstream>
#include <vector>
#include <algorithm>
#include <iterator>

const char *input = ",,29870,1,abc,2,1,1,1,0";
int main()
{
    std::stringstream ss(input);
    std::vector<int> output;
    int i;
    while ( !ss.eof() )
    {
       int c =  ss.peek() ;
       if ( c < '0' || c > '9' )
       {
          ss.ignore(1);
          continue;
        }

       if (ss >> i)
       {
          output.push_back(i);
        }

    }

    std::copy(output.begin(), output.end(), std::ostream_iterator<int> (std::cout, " ") );
    return 0;
}

#include <sstream>
#include <vector>

const char *input = "1,1,1,1,2,1,1,1,0";

int main() {
    std::stringstream ss(input);
    std::vector<int> output;
    int i;
    while (ss >> i) {
        output.push_back(i);
        ss.ignore(1);
    }
}

Bad input (for instance consecutive separators) will mess this up, but you did say simple.


This is the simplest way, which I used a lot. It works for any one-character delimiter.

#include<bits/stdc++.h>
using namespace std;

int main() {
   string str;

   cin >> str;
   int temp;
   vector<int> result;
   char ch;
   stringstream ss(str);

   do
   {
       ss>>temp;
       result.push_back(temp);
   }while(ss>>ch);

   for(int i=0 ; i < result.size() ; i++)
       cout<<result[i]<<endl;

   return 0;
}

Lots of pretty terrible answers here so I'll add mine (including test program):

#include <string>
#include <iostream>
#include <cstddef>

template<typename StringFunction>
void splitString(const std::string &str, char delimiter, StringFunction f) {
  std::size_t from = 0;
  for (std::size_t i = 0; i < str.size(); ++i) {
    if (str[i] == delimiter) {
      f(str, from, i);
      from = i + 1;
    }
  }
  if (from <= str.size())
    f(str, from, str.size());
}


int main(int argc, char* argv[]) {
    if (argc != 2)
        return 1;

    splitString(argv[1], ',', [](const std::string &s, std::size_t from, std::size_t to) {
        std::cout << "`" << s.substr(from, to - from) << "`\n";
    });

    return 0;
}

Nice properties:

  • No dependencies (e.g. boost)
  • Not an insane one-liner
  • Easy to understand (I hope)
  • Handles spaces perfectly fine
  • Doesn't allocate splits if you don't want to, e.g. you can process them with a lambda as shown.
  • Doesn't add characters one at a time - should be fast.
  • If using C++17 you could change it to use a std::stringview and then it won't do any allocations and should be extremely fast.

Some design choices you may wish to change:

  • Empty entries are not ignored.
  • An empty string will call f() once.

Example inputs and outputs:

""      ->   {""}
","     ->   {"", ""}
"1,"    ->   {"1", ""}
"1"     ->   {"1"}
" "     ->   {" "}
"1, 2," ->   {"1", " 2", ""}
" ,, "  ->   {" ", "", " "}

simple structure, easily adaptable, easy maintenance.

std::string stringIn = "my,csv,,is 10233478,separated,by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
    if (stringIn[i] == ",") {
        commaSeparated.push_back("");
        commaCounter++;
    } else {
        commaSeparated.at(commaCounter) += stringIn[i];
    }
}

in the end you will have a vector of strings with every element in the sentence separated by spaces. empty strings are saved as separate items.


I cannot yet comment (getting started on the site) but added a more generic version of Jerry Coffin's fantastic ctype's derived class to his post.

Thanks Jerry for the super idea.

(Because it must be peer-reviewed, adding it here too temporarily)

struct SeparatorReader: std::ctype<char>
{
    template<typename T>
    SeparatorReader(const T &seps): std::ctype<char>(get_table(seps), true) {}

    template<typename T>
    std::ctype_base::mask const *get_table(const T &seps) {
        auto &&rc = new std::ctype_base::mask[std::ctype<char>::table_size]();
        for(auto &&sep: seps)
            rc[static_cast<unsigned char>(sep)] = std::ctype_base::space;
        return &rc[0];
    }
};

string exp = "token1 token2 token3";
char delimiter = ' ';
vector<string> str;
string acc = "";
for(int i = 0; i < exp.size(); i++)
{
    if(exp[i] == delimiter)
    {
        str.push_back(acc);
        acc = "";
    }
    else
        acc += exp[i];
}

I'm surprised no one has proposed a solution using std::regex yet:

#include <string>
#include <algorithm>
#include <vector>
#include <regex>

void parse_csint( const std::string& str, std::vector<int>& result ) {

    typedef std::regex_iterator<std::string::const_iterator> re_iterator;
    typedef re_iterator::value_type re_iterated;

    std::regex re("(\\d+)");

    re_iterator rit( str.begin(), str.end(), re );
    re_iterator rend;

    std::transform( rit, rend, std::back_inserter(result), 
        []( const re_iterated& it ){ return std::stoi(it[1]); } );

}

This function inserts all integers at the back of the input vector. You can tweak the regular expression to include negative integers, or floating point numbers, etc.


Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to stl

Why is it OK to return a 'vector' from a function? How to remove all the occurrences of a char in c++ string How to use the priority queue STL for objects? use std::fill to populate vector with increasing numbers What does iterator->second mean? How to set initial size of std::vector? Sorting a vector in descending order How do I reverse a C++ vector? Recommended way to insert elements into map Replace an element into a specific position of a vector

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file