[r] duplicate 'row.names' are not allowed error

I am trying to load a csv file that has 14 columns like this:

StartDate, var1, var2, var3, ..., var14

when I issue this command:

systems <- read.table("http://getfile.pl?test.csv", header = TRUE, sep = ",")

I get an error message.

duplicate row.names are not allowed

It seems to me that the first column name is causing the issue. When I manually download the file and remove the StartDate name from the file, R successfully reads the file and replaces the first column name with X. Can someone tell me what is going on? The file is a (comma separated) csv file.

This question is related to r csv r-faq

The answer is


The answer here (https://stackoverflow.com/a/22408965/2236315) by @adrianoesch should help (e.g., solves "If you know of a solution that does not require the awkward workaround mentioned in your comment (shift the column names, copy the data), that would be great." and "...requiring that the data be copied" proposed by @Frank).

Note that if you open in some text editor, you should see that the number of header fields less than number of columns below the header row. In my case, the data set had a "," missing at the end of the last header field.


Another possible reason for this error is that you have entire rows duplicated. If that is the case, the problem is solved by removing the duplicate rows.


This related question points out a part of the ?read.table documentation that explains your problem:

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.

Your header row likely has 1 fewer column than the rest of the file and so read.table assumes that the first column is the row.names (which must all be unique), not a column (which can contain duplicated values). You can fix this by using one of the following two Solutions:

  1. adding a delimiter (ie \t or ,) to the front or end of your header row in the source file, or,
  2. removing any trailing delimiters in your data

The choice will depend on the structure of your data.

Example:
Here the header row is interpreted as having one fewer column than the data because the delimiters don't match:

v1,v2,v3   # 3 items!!
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

This is how it is interpreted by default:

   v1,v2,v3   # 3 items!!
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

The first column (with no header) values are interpreted as row.names: a1 and b1. If this column contains duplicates, which is entirely possible, then you get the duplicate 'row.names' are not allowed error.

If you set row.names = FALSE, the shift doesn't happen, but you still have a mismatching number of items in the header and in the data because the delimiters don't match.

Solution 1 Add trailing delimiter to header:

v1,v2,v3,  # 4 items!!
a1,a2,a3,  # 4 items
b1,b2,b3,  # 4 items

Solution 2 Remove excess trailing delimiter from non-header rows:

v1,v2,v3   # 3 items
a1,a2,a3   # 3 items!!
b1,b2,b3   # 3 items!!

It seems the problem can arise from more than one reasons. Following two steps worked when I was having same error.

  1. I saved my file as MS-DOS csv. ( Earlier it was saved in as just csv , excel starter 2010 ). Opened the csv in notepad++. No coma was inconsistent (consistency as described above @Brian).
  2. Noticed I was not using argument sep="," . I used and it worked ( even though that is default argument!)

In my case was a comma at the end of every line. By removing that worked


I used read_csv from the readr package

In my experience, the parameter row.names=NULL in the read.csv function will lead to a wrong reading of the file if a column name is missing, i.e. every column will be shifted.

read_csv solves this.


I had this error when opening a CSV file and one of the fields had commas embedded in it. The field had quotes around it, and I had cut and paste the read.table with quote="" in it. Once I took quote="" out, the default behavior of read.table took over and killed the problem. So I went from this:

systems <- read.table("http://getfile.pl?test.csv", header=TRUE, sep=",", quote="")

to this:

systems <- read.table("http://getfile.pl?test.csv", header=TRUE, sep=",")

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc