I'd like to clarify a few things:
pandas.Series.tolist()
. I'm not sure why the top voted answer
leads off with using pandas.Series.values.tolist()
since as far as I can tell, it adds syntax/confusion with no added benefit.tst[lookupValue][['SomeCol']]
is a dataframe (as stated in the
question), not a series (as stated in a comment to the question). This is because tst[lookupValue]
is a dataframe, and slicing it with [['SomeCol']]
asks for
a list of columns (that list that happens to have a length of 1), resulting in a dataframe being returned. If you
remove the extra set of brackets, as in
tst[lookupValue]['SomeCol']
, then you are asking for just that one
column rather than a list of columns, and thus you get a series back.pandas.Series.tolist()
, so you should
definitely skip the second set of brackets in this case. FYI, if you
ever end up with a one-column dataframe that isn't easily avoidable
like this, you can use pandas.DataFrame.squeeze()
to convert it to
a series.tst[lookupValue]['SomeCol']
is getting a subset of a particular column via
chained slicing. It slices once to get a dataframe with only certain rows
left, and then it slices again to get a certain column. You can get
away with it here since you are just reading, not writing, but
the proper way to do it is tst.loc[lookupValue, 'SomeCol']
(which returns a series).ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
Demo Code:
import pandas as pd
df = pd.DataFrame({'colA':[1,2,1],
'colB':[4,5,6]})
filter_value = 1
print "df"
print df
print type(df)
rows_to_keep = df['colA'] == filter_value
print "\ndf['colA'] == filter_value"
print rows_to_keep
print type(rows_to_keep)
result = df[rows_to_keep]['colB']
print "\ndf[rows_to_keep]['colB']"
print result
print type(result)
result = df[rows_to_keep][['colB']]
print "\ndf[rows_to_keep][['colB']]"
print result
print type(result)
result = df[rows_to_keep][['colB']].squeeze()
print "\ndf[rows_to_keep][['colB']].squeeze()"
print result
print type(result)
result = df.loc[rows_to_keep, 'colB']
print "\ndf.loc[rows_to_keep, 'colB']"
print result
print type(result)
result = df.loc[df['colA'] == filter_value, 'colB']
print "\ndf.loc[df['colA'] == filter_value, 'colB']"
print result
print type(result)
ID = df.loc[rows_to_keep, 'colB'].tolist()
print "\ndf.loc[rows_to_keep, 'colB'].tolist()"
print ID
print type(ID)
ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print "\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()"
print ID
print type(ID)
Result:
df
colA colB
0 1 4
1 2 5
2 1 6
<class 'pandas.core.frame.DataFrame'>
df['colA'] == filter_value
0 True
1 False
2 True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>
df[rows_to_keep]['colB']
0 4
2 6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>
df[rows_to_keep][['colB']]
colB
0 4
2 6
<class 'pandas.core.frame.DataFrame'>
df[rows_to_keep][['colB']].squeeze()
0 4
2 6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>
df.loc[rows_to_keep, 'colB']
0 4
2 6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>
df.loc[df['colA'] == filter_value, 'colB']
0 4
2 6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>
df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<type 'list'>
df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<type 'list'>