How to rearrange Pandas column sequence?

24

>>> df =DataFrame({'a':[1,2,3,4],'b':[2,4,6,8]})
>>> df['x']=df.a + df.b
>>> df['y']=df.a - df.b
>>> df
   a  b   x  y
0  1  2   3 -1
1  2  4   6 -2
2  3  6   9 -3
3  4  8  12 -4

Now I want to rearrange the column sequence, which makes 'x','y' column to be the first & second columns by :

>>> df = df[['x','y','a','b']]
>>> df
    x  y  a  b
0   3 -1  1  2
1   6 -2  2  4
2   9 -3  3  6
3  12 -4  4  8

But if I have a long coulmns 'a','b','c','d'....., and I don't want to explictly list the columns. How can I do that ?

Or Does Pandas provide a function like set_column_sequence(dataframe,col_name, seq) so that I can do : set_column_sequence(df,'x',0) and set_column_sequence(df,'y',1) ?

This question is tagged with python pandas

~ Asked on 2012-09-08 10:16:04

The Best Answer is


42

You could also do something like this:

df = df[['x', 'y', 'a', 'b']]

You can get the list of columns with:

cols = list(df.columns.values)

The output will produce something like this:

['a', 'b', 'x', 'y']

...which is then easy to rearrange manually before dropping it into the first function

~ Answered on 2014-05-19 15:32:06


11

There may be an elegant built-in function (but I haven't found it yet). You could write one:

# reorder columns
def set_column_sequence(dataframe, seq, front=True):
    '''Takes a dataframe and a subsequence of its columns,
       returns dataframe with seq as first columns if "front" is True,
       and seq as last columns if "front" is False.
    '''
    cols = seq[:] # copy so we don't mutate seq
    for x in dataframe.columns:
        if x not in cols:
            if front: #we want "seq" to be in the front
                #so append current column to the end of the list
                cols.append(x)
            else:
                #we want "seq" to be last, so insert this
                #column in the front of the new column list
                #"cols" we are building:
                cols.insert(0, x)
return dataframe[cols]

For your example: set_column_sequence(df, ['x','y']) would return the desired output.

If you want the seq at the end of the DataFrame instead simply pass in "front=False".

~ Answered on 2012-09-08 10:41:23


Most Viewed Questions: