I have a pandas dataframe. I want to print the unique values of one of its columns in ascending order. This is how I am doing it:
import pandas as pd
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
print(a.sort())
The problem is that I am getting None as the output.
I came across this question myself today. I think your code prints None (exactly what I got with the same method) because a.sort() sorts the NumPy array a in place and returns None instead of a sorted copy. To see the result, call a.sort() first and then print(a) on a separate line.
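Here is a minimal sketch of the difference, assuming a reasonably recent NumPy and pandas. The array method sort() reorders the array in place and returns None, while np.sort() leaves its input untouched and returns a sorted copy. The names result, b and c are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

a = df['A'].unique()      # NumPy array, values in order of first appearance
result = a.sort()         # sorts a in place; the call itself returns None
print(result)             # None
print(a)                  # [1 2 3 6 8]

b = df['A'].unique()
c = np.sort(b)            # returns a sorted copy and leaves b unchanged
print(b)                  # [1 3 2 6 8]
print(c)                  # [1 2 3 6 8]
```

If you want the sorted values in a single expression, use the form that returns a value (np.sort or the built-in sorted) rather than the in-place method.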
My solution, as I tried to keep everything in pandas:
pd.Series(df['A'].unique()).sort_values()
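One thing to keep in mind with this approach: sort_values() returns a new Series whose index labels travel with the values, so the index ends up out of order. A small sketch, using the same example data, of how to get a plain list or a renumbered index (s and s_clean are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

s = pd.Series(df['A'].unique()).sort_values()
print(s.index.tolist())   # labels follow the values: [0, 2, 1, 3, 4]
print(s.tolist())         # [1, 2, 3, 6, 8]

# Optional: renumber the index from 0 after sorting
s_clean = s.reset_index(drop=True)
print(s_clean.index.tolist())  # [0, 1, 2, 3, 4]
```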
Another way is to use the set data type.
Some characteristics of sets: they are unordered, can hold mixed data types, cannot contain repeated elements, and are mutable.
Solving your question:
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
sorted(set(df.A))
The result is a list:
[1, 2, 3, 6, 8]
I would suggest using NumPy's sort, since as far as I know that is what pandas relies on in the background anyway:
import numpy as np
np.sort(df.A.unique())
Doing it all in pandas is valid as well.
You can also use drop_duplicates() instead of unique():
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
# Series.sort() has been removed from pandas; sort_values() returns a sorted copy
a = df['A'].drop_duplicates().sort_values()
print(a)
sort sorts in place, so it returns nothing:
In [54]:
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
a.sort()
a
Out[54]:
array([1, 2, 3, 6, 8], dtype=int64)
So you have to call print(a) again after the call to sort.
E.g.:
In [55]:
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
a.sort()
print(a)
[1 2 3 6 8]
I prefer the one-liner:
print(sorted(df['Column Name'].unique()))
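One caveat, as far as I can tell: sorted() over a NumPy array gives a list of NumPy scalars, and recent NumPy releases (2.x) print those as np.int64(1) and so on rather than as bare numbers. A small sketch that converts to plain Python ints first, assuming the column is 'A' as in the question's example (values is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

# tolist() converts the NumPy array to a list of plain Python ints
values = sorted(df['A'].unique().tolist())
print(values)  # [1, 2, 3, 6, 8]
```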
Source: Stackoverflow.com