[parsing] Import CSV file with mixed data types

I'm working with MATLAB for few days and I'm having difficulties to import a CSV-file to a matrix.

My problem is that my CSV-file contains almost only Strings and some integer values, so that csvread() doesn't work. csvread() only gets along with integer values.

How can I store my strings in some kind of a 2-dimensional array to have free access to each element?

Here's a sample CSV for my needs:

04;abc;def;ghj;klm;;;;;
;;;;;Test;text;0xFF;;
;;;;;asdfhsdf;dsafdsag;0x0F0F;;

The main thing are the empty cells and the texts within the cells. As you see, the structure may vary.

This question is related to parsing matlab file-io csv import

The answer is


Given the sample you posted, this simple code should do the job:

fid = fopen('file.csv','r');
C = textscan(fid, repmat('%s',1,10), 'delimiter',';', 'CollectOutput',true);
C = C{1};
fclose(fid);

Then you could format the columns according to their type. For example if the first column is all integers, we can format it as such:

C(:,1) = num2cell( str2double(C(:,1)) )

Similarly, if you wish to convert the 8th column from hex to decimals, you can use HEX2DEC:

C(:,8) = cellfun(@hex2dec, strrep(C(:,8),'0x',''), 'UniformOutput',false);

The resulting cell array looks as follows:

C = 
    [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''                []    ''    ''
    [NaN]    ''       ''       ''       ''       'Test'        'text'        [ 255]    ''    ''
    [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    [3855]    ''    ''

Depending on the format of your file, importdata might work.

You can store Strings in a cell array. Type "doc cell" for more information.


Use xlsread, it works just as well on .csv files as it does on .xls files. Specify that you want three outputs:

[num char raw] = xlsread('your_filename.csv')

and it will give you an array containing only the numeric data (num), an array containing only the character data (char) and an array that contains all data types in the same format as the .csv layout (raw).


% Assuming that the dataset is ";"-delimited and each line ends with ";"
fid = fopen('sampledata.csv');
tline = fgetl(fid);
u=sprintf('%c',tline); c=length(u);
id=findstr(u,';'); n=length(id);
data=cell(1,n);
for I=1:n
    if I==1
        data{1,I}=u(1:id(I)-1);
    else
        data{1,I}=u(id(I-1)+1:id(I)-1);
    end
end
ct=1;
while ischar(tline)
    ct=ct+1;
    tline = fgetl(fid);
    u=sprintf('%c',tline);
    id=findstr(u,';');
    if~isempty(id)
        for I=1:n
            if I==1
                data{ct,I}=u(1:id(I)-1);
            else
                data{ct,I}=u(id(I-1)+1:id(I)-1);
            end
        end
    end
end
fclose(fid);

If your input file has a fixed amount of columns separated by commas and you know in which columns are the strings it might be best to use the function

textscan()

Note that you can specify a format where you read up to a maximum number of characters in the string or until a delimiter (comma) is found.


In R2013b or later you can use a table:

>> table = readtable('myfile.txt','Delimiter',';','ReadVariableNames',false)
>> table = 

    Var1    Var2     Var3     Var4     Var5        Var6          Var7         Var8      Var9    Var10
    ____    _____    _____    _____    _____    __________    __________    ________    ____    _____

      4     'abc'    'def'    'ghj'    'klm'    ''            ''            ''          NaN     NaN  
    NaN     ''       ''       ''       ''       'Test'        'text'        '0xFF'      NaN     NaN  
    NaN     ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    '0x0F0F'    NaN     NaN  

Here is more info.


Have you tried to use the "CSVIMPORT" function found in the file exchange? I haven't tried it myself, but it claims to handle all combinations of text and numbers.

http://www.mathworks.com/matlabcentral/fileexchange/23573-csvimport


I recommend looking at the dataset array.

The dataset array is a data type that ships with Statistics Toolbox. It is specifically designed to store hetrogeneous data in a single container.

The Statistics Toolbox demo page contains a couple vidoes that show some of the dataset array features. The first is titled "An Introduction to Dataset Arrays". The second is titled "An Introduction to Joins".

http://www.mathworks.com/products/statistics/demos.html


Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to matlab

how to open .mat file without using MATLAB? SQL server stored procedure return a table Python equivalent to 'hold on' in Matlab Octave/Matlab: Adding new elements to a vector How can I make a "color map" plot in matlab? How to display (print) vector in Matlab? Correlation between two vectors? How to plot a 2D FFT in Matlab? How can I find the maximum value and its index in array in MATLAB? How to save a figure in MATLAB from the command line?

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to import

Import functions from another js file. Javascript The difference between "require(x)" and "import x" pytest cannot import module while python can How to import an Excel file into SQL Server? When should I use curly braces for ES6 import? How to import a JSON file in ECMAScript 6? Python: Importing urllib.quote importing external ".txt" file in python beyond top level package error in relative import Reading tab-delimited file with Pandas - works on Windows, but not on Mac