How to Sort MultiIndex in Pandas

This tutorial will show how to sort MultiIndex in Pandas.

Let's begin by showing the syntax for sorting MultiIndex:

.sort_values(by=[('Level 1', 'Level 2')], ascending=False)

In order to sort MultiIndex you need to provide all levels which will be used for the sort. Otherwise you will get error like:

ValueError: The column label 'Depth' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

Step 1: Create MultiIndex DataFrame

Very often multiple aggregation function will end into MultiIndex. It's quite common to sort the MultiIndex which is result of this aggregation.

So let's have this DataFrame:

Magnitude Type Depth Magnitude
MB 100.0 5.6
MWC 10.0 5.5
MWW 21.0 6.0
MWC 35.0 5.5
MWB 45.0 5.6

For this DataFrame we would like to group by Magnitude Type and get the mean, count and sum for columns - 'Depth', 'Magnitude'.

df_multi = df.groupby(['Magnitude Type'])[['Depth', 'Magnitude']].agg(['mean', 'count', 'sum'])

This would result into:

Depth Magnitude
mean count sum mean count sum
Magnitude Type
MB 81.579365 3761 306819.990 5.682957 3761 21373.60
MD 21.670000 6 130.020 5.966667 6 35.80
MH 8.074600 5 40.373 6.540000 5 32.70
ML 14.158273 77 1090.187 5.814675 77 447.73
MS 30.142226 1702 51302.068 5.994360 1702 10202.40
MW 77.034037 7722 594856.835 5.933794 7722 45820.76
MWB 76.989829 2458 189241.000 5.907282 2458 14520.10
MWC 66.808213 5669 378735.760 5.858176 5669 33210.00
MWR 22.445385 26 583.580 5.630769 26 146.40
MWW 67.568545 1983 133988.425 6.008674 1983 11915.20

In the next step we will see how to sort the MultiIndex above.

Step 2: Find the MultiIndex levels

Let's see what is stored as MultiIndex in the DataFrame above. Since we have MultiIndex for the columns we can get the information about the levels by:

df_multi.columns

result:

MultiIndex([(    'Depth',  'mean'),
            (    'Depth', 'count'),
            (    'Depth',   'sum'),
            ('Magnitude',  'mean'),
            ('Magnitude', 'count'),
            ('Magnitude',   'sum')],
           )

To get a specific level we can do:

df_multi.columns.get_level_values(1)

result:

Index(['mean', 'count', 'sum', 'mean', 'count', 'sum'], dtype='object')

Step 3: Sort MultiIndex in Pandas

Now let's say that we would like to sort by mean which is under Depth. From previous step we saw that we need to use: [('Depth', 'mean')] for the by parameter:

df_multi.sort_values(by=[('Depth', 'mean')], ascending=False).head(60)

So now the values are sorted by the pair of Depth - mean.

Depth Magnitude
mean count sum mean count sum
Magnitude Type
MB 81.579365 3761 306819.990 5.682957 3761 21373.60
MW 77.034037 7722 594856.835 5.933794 7722 45820.76
MWB 76.989829 2458 189241.000 5.907282 2458 14520.10
MWW 67.568545 1983 133988.425 6.008674 1983 11915.20
MWC 66.808213 5669 378735.760 5.858176 5669 33210.00

Step 4: Sort MultiIndex by multiple levels

What if you like to sort MultiIndex by multiple levels? In this case you can use the next syntax:

df_multi.sort_values(by=[('Depth', 'mean'), ('Depth', 'sum')], ascending=False)

This will sort :

  • first by - ('Depth', 'mean')
  • then by - ('Depth', 'sum')

Step 5: Sort MultiIndex by the level number

Finally let's say that you prefer to use the number of the level instead of providing a tuple.

In this case you can read the level info from Step 2 and use it.

For example sorting the MultiIndex by third level will be: df_multi.columns[2] - which is equivalent to ('Depth', 'sum'):

df_multi.sort_values(by=[df_multi.columns[2]], ascending=False).head(5)

Resources