Here are two ways to sort or change the order of columns in Pandas DataFrame.

(1) Use method reindex - custom sorts

df = df.reindex(sorted(df.columns), axis=1)

(2) Use method sort_index - sort with duplicate column names

df = df.sort_index(axis=1)

What is the difference between if need to change order of columns in DataFrame : reindex and sort_index.

The sort_index is a bit faster (depends on data and column number) and can be used with duplicate names. reindex is suitable if you need to apply custom order or sorting.

Both of them work for the two axis - rows and columns,

Suppose we have data like:

Region 1500 1600 1700 1750 1800 1850 1900
World 585 660 710 791 978 1262 1650
Africa 86 114 106 106 107 111 133
Asia 282 350 411 502 635 809 947
Europe 168 170 178 190 203 276 408
Latin America [Note 1] ​ 40 20 10 16 24 38 74

Where the full list of columns is:

Index(['Region', '1500', '1600', '1700', '1750', '1800', '1850', '1900',
   '1950', '1999', '2008', '2010', '2012', '2050', '2150'],
  dtype='object')

Data is available by:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/softhints/Pandas-Tutorials/master/data/population/population.csv')

Before the change of the order let's shuffle the columns and get the initial order:

import random

initial_order = df.columns.to_list()
cols = df.columns.to_list()

random.shuffle(cols)

1: Change order of columns by reindex

First example will show us how to use method reindex in order to sort the columns in alphabetical order:

df = df.reindex(sorted(df.columns), axis=1)

By df.columns we get all column names as they are stored in the DataFrame. We sort them by sorted and finally use the method reindex on columns.

A shorter code to sort the columns by name would be:

df = df[sorted(df.columns)]

Custom sort of columns with reindex

In order to change the column order in a custom way we can use method reindex. As we saw earlier we can get the list of columns and shuffle them:

cols = df.columns.to_list()

random.shuffle(cols)

Now we can apply this order to the DataFrame by:

df = df.reindex(df.columns, axis=1)

The columns of the updated DataFrame:

Index(['1500', '1900', '2000', '1900', '1850', '2000', '2000', '1900', '1700',
   '1800', '1600', '2000', '2150', '1750', 'Region'],
  dtype='object')

2: Sort columns by name by method sort_index

Alternative solution is to use the method sort_index. It doesn't support custom order but it's faster in general.

To update the column order by sort_index use this syntax:

df = df.sort_index(axis=1)

The official documentation for this method says:

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

This method has parameter inplace - which is not the case for reindex.

3: Sort with duplicate column names

Finally let's see what will happen if we apply method reindex on DataFrame with duplicate column names. To achieve this we are going to update column names manually:

df.columns = ['1500', '1600', '1700', '1750', '1800', '1850', '1900', '1900', '1900',
       '2000', '2000', '2000', '2000', '2150', 'Region']

Method reindex is raising error:

ValueError: cannot reindex from a duplicate axis

While sort_index is working successfully

4: Shift columns in Pandas DataFrame

Finally let's see how to shift columns in Pandas DataFrame. This is possible by getting a list of columns names and updating the list of columns:

cols = df.columns.to_list()
cols = cols[-2:] + cols[:-2]

result:

['2050', '2150', 'Region', '1500', '1600', '1700', '1750', '1800', '1850', '1900', '1950', '1999', '2008', '2010', '2012']

Finally we can update the DataFrame order by:

df = df.reindex(cols, axis=1)

or by:

df = df[cols]

The df.reindex is the faster than the second solution

5: Performance comparison for reindex and sort_index

Finally lets check the performance for a pretty small DataFrame - (7, 15) between:

  • reindex - 254 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • sort_index - 181 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The same comparison for (700000, 15):

  • reindex - 24.1 ms ± 740 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • sort_index - 22.7 ms ± 430 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For (7, 1500):

  • reindex - 383 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • sort_index - 826 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Resources