[python] ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

I am trying to get a new dataset, or change the value of the current dataset columns to their unique values. Here is an example of what I am trying to get :

   A B
 -----
0| 1 1
1| 2 5
2| 1 5
3| 7 9
4| 7 9
5| 8 9

Wanted Result    Not Wanted Result
       A B            A B
     -----          -----
    0| 1 1         0| 1 1
    1| 2 5         1| 2 5
    2| 7 9         2| 
    3| 8           3| 7 9
                   4|
                   5| 8

I don't really care about the index but it seems to be the problem. My code so far is pretty simple, I tried 2 approaches, 1 with a new dataFrame and one without.

#With New DataFrame
 def UniqueResults(dataframe):
    df = pd.DataFrame()
    for col in dataframe:
        S=pd.Series(dataframe[col].unique())
        df[col]=S.values
    return df

#Without new DataFrame
def UniqueResults(dataframe):
    for col in dataframe:
        dataframe[col]=dataframe[col].unique()
    return dataframe

I have the error "Length of Values does not match length of index" both times.

This question is related to python pandas dataframe

The answer is


The error comes up when you are trying to assign a list of numpy array of different length to a data frame, and it can be reproduced as follows:

A data frame of four rows:

df = pd.DataFrame({'A': [1,2,3,4]})

Now trying to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4])

Both errors out:

ValueError: Length of values does not match length of index

Because the data frame has four rows but the list and array has only two elements.

Work around Solution (use with caution): convert the list/array to a pandas Series, and then when you do assignment, missing index in the Series will be filled with NaN:

df['B'] = pd.Series([3,4])

df
#   A     B
#0  1   3.0
#1  2   4.0
#2  3   NaN          # NaN because the value at index 2 and 3 doesn't exist in the Series
#3  4   NaN

For your specific problem, if you don't care about the index or the correspondence of values between columns, you can reset index for each column after dropping the duplicates:

df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

#   A     B
#0  1   1.0
#1  2   5.0
#2  7   9.0
#3  8   NaN

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe