[python] Load data from txt with pandas

I am loading a txt file containig a mix of float and string data. I want to store them in an array where I can access each element. Now I am just doing

import pandas as pd

data = pd.read_csv('output_list.txt', header = None)
print data

This is the structure of the input file: 1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt.

Now the data are imported as a unique column. How can I divide it, so to store different elements separately (so I can call data[i,j])? And how can I define a header?

This question is related to python io pandas

The answer is


Based on the latest changes in pandas, you can use, read_csv , read_table is deprecated:

import pandas as pd
pd.read_csv("file.txt", sep = "\t")

I usually take a look at the data first or just try to import it and do data.head(), if you see that the columns are separated with \t then you should specify sep="\t" otherwise, sep = " ".

import pandas as pd     
data = pd.read_csv('data.txt', sep=" ", header=None)

You can use it which is most helpful.

df = pd.read_csv(('data.txt'), sep="\t", skiprows=[0,1], names=['FromNode','ToNode'])

@Pietrovismara's solution is correct but I'd just like to add: rather than having a separate line to add column names, it's possible to do this from pd.read_csv.

df = pd.read_csv('output_list.txt', sep=" ", header=None, names=["a", "b", "c"])

I'd like to add to the above answers, you could directly use

df = pd.read_fwf('output_list.txt')

fwf stands for fixed width formatted lines.


You can use:

data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]

Add sep=" " in your code, leaving a blank space between the quotes. So pandas can detect spaces between values and sort in columns. Data columns is for naming your columns.


If you want to load the txt file with specified column name, you can use the code below. It worked for me.

import pandas as pd    
data = pd.read_csv('file_name.txt', sep = "\t", names = ['column1_name','column2_name', 'column3_name'])

you can use this

import pandas as pd
dataset=pd.read_csv("filepath.txt",delimiter="\t")

You can import the text file using the read_table command as so:

import pandas as pd
df=pd.read_table('output_list.txt',header=None)

Preprocessing will need to be done after loading


If you don't have an index assigned to the data and you are not sure what the spacing is, you can use to let pandas assign an index and look for multiple spaces.

df = pd.read_csv('filename.txt', delimiter= '\s+', index_col=False)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to io

Reading file using relative path in python project How to write to a CSV line by line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder Exception: Unexpected end of ZLIB input stream How to get File Created Date and Modified Date Printing Mongo query output to a file while in the mongo shell Load data from txt with pandas Writing File to Temp Folder How to get resources directory path programmatically ValueError : I/O operation on closed file

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float