This tutorial will show how to sort MultiIndex in Pandas.
Let's begin by showing the syntax for sorting MultiIndex:
.sort_values(by=[('Level 1', 'Level 2')], ascending=False)
In order to sort MultiIndex you need to provide all levels which will be used for the sort. Otherwise you will get error like:
ValueError: The column label 'Depth' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
Step 1: Create MultiIndex DataFrame
Very often multiple aggregation function will end into MultiIndex. It's quite common to sort the MultiIndex which is result of this aggregation.
So let's have this DataFrame:
Magnitude Type | Depth | Magnitude |
---|---|---|
MB | 100.0 | 5.6 |
MWC | 10.0 | 5.5 |
MWW | 21.0 | 6.0 |
MWC | 35.0 | 5.5 |
MWB | 45.0 | 5.6 |
For this DataFrame we would like to group by Magnitude Type
and get the mean, count and sum for columns - 'Depth', 'Magnitude'
.
df_multi = df.groupby(['Magnitude Type'])[['Depth', 'Magnitude']].agg(['mean', 'count', 'sum'])
This would result into:
Depth | Magnitude | |||||
---|---|---|---|---|---|---|
mean | count | sum | mean | count | sum | |
Magnitude Type | ||||||
MB | 81.579365 | 3761 | 306819.990 | 5.682957 | 3761 | 21373.60 |
MD | 21.670000 | 6 | 130.020 | 5.966667 | 6 | 35.80 |
MH | 8.074600 | 5 | 40.373 | 6.540000 | 5 | 32.70 |
ML | 14.158273 | 77 | 1090.187 | 5.814675 | 77 | 447.73 |
MS | 30.142226 | 1702 | 51302.068 | 5.994360 | 1702 | 10202.40 |
MW | 77.034037 | 7722 | 594856.835 | 5.933794 | 7722 | 45820.76 |
MWB | 76.989829 | 2458 | 189241.000 | 5.907282 | 2458 | 14520.10 |
MWC | 66.808213 | 5669 | 378735.760 | 5.858176 | 5669 | 33210.00 |
MWR | 22.445385 | 26 | 583.580 | 5.630769 | 26 | 146.40 |
MWW | 67.568545 | 1983 | 133988.425 | 6.008674 | 1983 | 11915.20 |
In the next step we will see how to sort the MultiIndex above.
Step 2: Find the MultiIndex levels
Let's see what is stored as MultiIndex in the DataFrame above. Since we have MultiIndex for the columns we can get the information about the levels by:
df_multi.columns
result:
MultiIndex([( 'Depth', 'mean'),
( 'Depth', 'count'),
( 'Depth', 'sum'),
('Magnitude', 'mean'),
('Magnitude', 'count'),
('Magnitude', 'sum')],
)
To get a specific level we can do:
df_multi.columns.get_level_values(1)
result:
Index(['mean', 'count', 'sum', 'mean', 'count', 'sum'], dtype='object')
Step 3: Sort MultiIndex in Pandas
Now let's say that we would like to sort by mean
which is under Depth
. From previous step we saw that we need to use: [('Depth', 'mean')]
for the by
parameter:
df_multi.sort_values(by=[('Depth', 'mean')], ascending=False).head(60)
So now the values are sorted by the pair of Depth - mean
.
Depth | Magnitude | |||||
---|---|---|---|---|---|---|
mean | count | sum | mean | count | sum | |
Magnitude Type | ||||||
MB | 81.579365 | 3761 | 306819.990 | 5.682957 | 3761 | 21373.60 |
MW | 77.034037 | 7722 | 594856.835 | 5.933794 | 7722 | 45820.76 |
MWB | 76.989829 | 2458 | 189241.000 | 5.907282 | 2458 | 14520.10 |
MWW | 67.568545 | 1983 | 133988.425 | 6.008674 | 1983 | 11915.20 |
MWC | 66.808213 | 5669 | 378735.760 | 5.858176 | 5669 | 33210.00 |
Step 4: Sort MultiIndex by multiple levels
What if you like to sort MultiIndex by multiple levels? In this case you can use the next syntax:
df_multi.sort_values(by=[('Depth', 'mean'), ('Depth', 'sum')], ascending=False)
This will sort :
- first by - ('Depth', 'mean')
- then by - ('Depth', 'sum')
Step 5: Sort MultiIndex by the level number
Finally let's say that you prefer to use the number of the level instead of providing a tuple.
In this case you can read the level info from Step 2 and use it.
For example sorting the MultiIndex by third level will be: df_multi.columns[2]
- which is equivalent to ('Depth', 'sum')
:
df_multi.sort_values(by=[df_multi.columns[2]], ascending=False).head(5)