You can try the Boost Tokenizer library, in particular the Escaped List Separator
I've worked with a lot of CSV files in my time. I'd like to add some advice:
1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".
2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.
3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes), as in "George ""Babe"" Ruth".
4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
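To make points 1 to 3 concrete, here is a minimal sketch of a quote-aware splitter for a single record. The name parseCsvLine is hypothetical and not from any library, and the sketch assumes the whole record is already in one string, so it does not deal with the embedded CR/LF case from point 4.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV record into fields,
// honouring double-quote delimiters and doubled-up quotes.
std::vector<std::string> parseCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';      // doubled quote -> one literal quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                field += c;            // commas are ordinary characters here
            }
        } else if (c == '"') {
            inQuotes = true;           // opening quote
        } else if (c == ',') {
            fields.push_back(field);   // unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);           // last field (possibly empty)
    return fields;
}
```

Feeding it the record "Boston, MA 02346",42,"George ""Babe"" Ruth" should give three fields, with the comma kept inside the first and single quotes around Babe in the third.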
This is a good exercise for yourself to work on :)
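You should break your library into three parts: loading the CSV file, representing its contents in memory so you can read and modify them, and saving the file back to disk.
So you are looking at writing a CSVDocument class that contains Load, Save and GetBody: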
So that you may use your library like this:
CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();

// Print the header row
CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}

for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}

// Change the first field of row 10 (assumes the file has that many rows)
body->GetRow(10)->GetField(0)->SetText("hello world");

// Append a new row with two fields
CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");

doc.Save("file.csv"); // doc is an object, not a pointer
Which gives us the following interfaces:
// Forward declarations so the classes can refer to each other
class CSVDocumentBody;
class CSVDocumentRow;
class CSVDocumentField;

class CSVDocument
{
public:
    void Load(const char* file);
    void Save(const char* file);
    CSVDocumentBody* GetBody();
};

class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};

class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField(); // appends a field, as used in the example above
};

class CSVDocumentField
{
public:
    const char* GetText();
    void SetText(const char* text);
};
Now you just have to fill in the blanks from here :)
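One of those blanks is what Save has to do with each field before writing it out. As a rough sketch, assuming the usual convention that a field is double-quote delimited only when it contains a comma, a double quote or a line break, and that embedded quotes are doubled, it could look like this. The name escapeCsvField is made up for illustration and is not part of the interfaces above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the text as it should appear in the CSV file.
std::string escapeCsvField(const std::string& text)
{
    // Plain fields are written unchanged.
    if (text.find_first_of(",\"\r\n") == std::string::npos)
        return text;

    std::string out = "\"";            // opening delimiter
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '"')
            out += '"';                // double every embedded quote
        out += text[i];
    }
    out += '"';                        // closing delimiter
    return out;
}
```

Save would then write escapeCsvField(field->GetText()) for each field, with commas between fields and a newline after each row.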
Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.
:)
EDIT
I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.
Here is some code you can use. The data from the CSV is stored inside an array of rows. Each row is an array of strings. Hope this helps.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>

typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;

void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);

int main(){
    std::fstream file("file.csv", std::ios::in);
    if(!file.is_open()){
        std::cout << "File not found!\n";
        return 1;
    }
    CSVDatabase db;
    readCSV(file, db);
    display(db);
}

void readCSV(std::istream &input, CSVDatabase &db){
    String csvLine;
    // read every line from the stream
    while( std::getline(input, csvLine) ){
        std::istringstream csvStream(csvLine);
        CSVRow csvRow;
        String csvCol;
        // read every element from the line that is separated by commas
        // and put it into the vector of strings
        while( std::getline(csvStream, csvCol, ',') )
            csvRow.push_back(csvCol);
        db.push_back(csvRow);
    }
}

void display(const CSVRow& row){
    if(!row.size())
        return;
    CSVRowCI i=row.begin();
    std::cout<<*(i++);
    for(;i != row.end();++i)
        std::cout<<','<<*i;
}

void display(const CSVDatabase& db){
    if(!db.size())
        return;
    CSVDatabaseCI i=db.begin();
    for(; i != db.end(); ++i){
        display(*i);
        std::cout<<std::endl;
    }
}
Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.
(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)
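Using the Boost Tokenizer to parse records; see the Boost Tokenizer documentation for more details.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main()
{
    string data("file.csv"); // assumed input path; the original snippet did not define `data`
    ifstream in(data.c_str());
    if (!in.is_open()) return 1;

    typedef tokenizer< escaped_list_separator<char> > Tokenizer;
    vector< string > vec;
    string line;

    while (getline(in,line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(),tok.end());

        // do something with the record; here, skip records with fewer than 3 fields
        if (vec.size() < 3) continue;

        copy(vec.begin(), vec.end(),
             ostream_iterator<string>(cout, "|"));
        cout << "\n----------------------" << endl;
    }
}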
More information would be useful.
But the simplest form:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream data("plop.csv");
    std::string line;
    while(std::getline(data,line))
    {
        std::stringstream lineStream(line);
        std::string cell;
        while(std::getline(lineStream,cell,','))
        {
            // You have a cell!!!!
        }
    }
}
Also see this question: CSV parser in C++
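One caveat with the getline-based loop above: a line that ends in a comma, such as a,b, yields only two cells, because the final std::getline reaches the end of the stream without extracting anything and so reports failure. If trailing empty cells matter, a small splitter based on std::string::find keeps them. This is only a sketch; splitCells is a made-up name and, like the loop above, it ignores quoting entirely.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on a separator, keeping empty cells,
// including one after a trailing separator.
std::vector<std::string> splitCells(const std::string& line, char sep = ',')
{
    std::vector<std::string> cells;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            cells.push_back(line.substr(start)); // remainder, possibly empty
            break;
        }
        cells.push_back(line.substr(start, pos - start));
        start = pos + 1;                         // continue after the separator
    }
    return cells;
}
```

Note that an empty line comes back as one empty cell, so the caller should decide whether to skip blank lines before splitting.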
I found this interesting approach:
Quote: CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.
Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.
Source: Stackoverflow.com