What is a DataFrame MultiIndex in Pandas

In Pandas, a MultiIndex is an advanced indexing technique for DataFrames. It allows an index to have multiple levels. It is also called a hierarchical index or a multi-level index.
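A MultiIndex often appears without being built by hand. One common case is grouping by more than one column. The minimal sketch below illustrates this. The column names (`city`, `year`, `sales`) and the values are invented for the example.

```python
import pandas as pd

# Hypothetical sales data, used only to illustrate the idea
sales = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 12, 7, 9],
})

# Grouping by two keys produces a two-level row index
totals = sales.groupby(["city", "year"])["sales"].sum()

print(totals.index.nlevels)        # 2
print(list(totals.index.names))    # ['city', 'year']
print(totals.loc[("Rome", 2021)])  # 9
```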

In Pandas, an index is an object that stores the labels of an axis. Indexes help with:

  • identifying data
  • aligning data
  • getting and setting subsets of data
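To make these roles concrete, here is a small sketch with a two-level index. It shows identifying a value, getting and setting subsets, and aligning data. The labels, level names and values are invented for the example.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)
s1 = pd.Series([10, 20, 30], index=idx)

# Identify data: look up a single value by its full label
print(s1.loc[("a", 2)])  # 20

# Get a subset: selecting on the outer level returns all matching rows
print(s1.loc["a"])

# Set a subset: assign through the same labels
s1.loc[("b", 1)] = 35

# Alignment: arithmetic matches rows by label, not by position
s2 = pd.Series([1, 2, 3], index=idx[::-1])  # same labels, reversed order
total = s1 + s2
print(total.loc[("a", 1)])  # 13 (10 + 3)
```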

Step 1: DataFrame MultiIndex in Pandas

Let's create a MultiIndex DataFrame in Pandas. The first example uses a MultiIndex for the rows:

import pandas as pd
idx = pd.MultiIndex.from_tuples([(0, 1), (0, 1)])  # both rows get the label pair (0, 1)
pd.DataFrame([[1,2], [3,4]], index=idx)

which produces a DataFrame with a two-level row index:

  • 0 is the label in level 0
  • 1 is the label in level 1

Note: Here both rows carry the same label pair (0, 1). Repeating labels like this is allowed but not required. Each row can have its own combination.

     0  1
0 1  1  2
  1  3  4
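To see how the two levels are stored, you can inspect the index object itself. This sketch reuses the same tuples as above. `nlevels`, `levels` and `get_level_values` are standard MultiIndex attributes and methods.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(0, 1), (0, 1)])
df = pd.DataFrame([[1, 2], [3, 4]], index=idx)

print(df.index.nlevels)                       # 2
print(df.index.get_level_values(0).tolist())  # [0, 0]
print(df.index.get_level_values(1).tolist())  # [1, 1]

# Each level stores only its unique labels
print([level.tolist() for level in df.index.levels])  # [[0], [1]]
```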

The next example creates a hierarchical index for the columns. The column index has two levels:

  • a - level 0
  • b - level 1

cols = pd.MultiIndex.from_tuples([('a', 'b'), ('a', 'b')])
pd.DataFrame([[1,2], [3,4]], columns=cols)

which produces:

   a
   b  b
0  1  2
1  3  4
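With hierarchical columns, selecting by the top level returns every column underneath it, and `xs` can select on an inner level. The sketch below uses distinct, made-up labels (`x`, `y`) instead of the duplicated `('a', 'b')` pair so the results are easier to read.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Top-level selection: all columns under "a"
top = df["a"]
print(top.columns.tolist())  # ['x', 'y']

# Cross-section on level 1: every column whose inner label is "x"
inner = df.xs("x", axis=1, level=1)
print(inner.columns.tolist())  # ['a', 'b']

# A full (outer, inner) key selects a single column as a Series
print(df[("b", "x")].tolist())  # [3, 6]
```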

Step 2: Pandas MultiIndex to single index

In this step we will convert a multi-level index to a single-level one by dropping levels from the MultiIndex. The example uses the DataFrame with hierarchical columns from Step 1:

df = pd.DataFrame([[1,2], [3,4]], columns=cols)
df.droplevel(level=0, axis=1)

which will result in:

   b  b
0  1  2
1  3  4

and

df.droplevel(level=1, axis=1)

which will result in:

   a  a
0  1  2
1  3  4
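Dropping a level throws its labels away. Sometimes you want a single-level index that keeps the information from both levels. One common alternative, not part of the original steps, is to join the labels of each column tuple into one string. Here is a minimal sketch that assumes all labels are strings and uses `_` as an arbitrary separator:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Iterating a MultiIndex yields tuples such as ("a", "x"); join their parts
df.columns = ["_".join(col) for col in df.columns]
print(df.columns.tolist())  # ['a_x', 'a_y', 'b_x']
```

If you prefer to keep the tuples themselves, `MultiIndex.to_flat_index()` returns a single-level Index of tuples.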

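Note 1: The axis parameter selects where levels are dropped from: the row index (axis=0, the default) or the column index (axis=1).
Note 2: The level parameter expects a level position or a level name, not a label value. The levels here have no names. So passing a label such as 'a' or 'b':

df.droplevel(level='b', axis=1)

raises an error:

KeyError: 'Level b not found'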
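Since `level` accepts a level name, one way to drop a level by name is to name the levels first. Here is a short sketch. The names `outer` and `inner` are arbitrary choices for this example.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "b")], names=["outer", "inner"]
)
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# With named levels, a name works wherever a level position does
result = df.droplevel(level="outer", axis=1)
print(result.columns.tolist())  # ['b', 'b']

# A label value (not a level name) still raises KeyError
try:
    df.droplevel(level="b", axis=1)
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```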

Step 3: How to create Pandas MultiIndex

There are several ways to create a MultiIndex. We already saw one that uses the method from_tuples.

Now let's check another one, from_arrays, which takes one array per level:

# three arrays -> three levels, giving the entries (1, 3, 5) and (2, 4, 6)
mi = pd.MultiIndex.from_arrays(
    [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])

mi.to_frame()

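which returns a DataFrame with one column per level. By default, to_frame also keeps the MultiIndex as the row index, which is why x, y and z appear twice: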

       x  y  z
x y z
1 3 5  1  3  5
2 4 6  2  4  6
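Two more tools are worth knowing. `from_product` builds every combination of the given iterables. `to_frame(index=False)` returns the levels as ordinary columns with a default integer index, instead of repeating them as the row index. The values and names below are invented for illustration.

```python
import pandas as pd

# Every combination of the two lists: 2 x 2 = 4 entries,
# i.e. the pairs (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')
mi = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=["num", "let"])

# index=False keeps the levels only as columns
frame = mi.to_frame(index=False)
print(frame)
```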

Step 4: Convert Pandas MultiIndex into columns

Another useful operation for MultiIndex DataFrames is to convert levels into columns. This can be done in several ways:

  • stack/unstack
  • pivot
  • crosstab
  • reset_index
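Of the options listed, `unstack` is the most direct. It pivots a row level, the innermost one by default, into the column index. Here is a minimal sketch with invented labels and level names:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")], names=["outer", "inner"]
)
s = pd.Series([1, 2, 3, 4], index=idx)

# By default the innermost level ("x", "y") becomes the columns
wide = s.unstack()
print(wide)

# A specific level can be chosen by name instead
wide_outer = s.unstack(level="outer")
print(wide_outer)
```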

Let's check an example with reset_index, using the following DataFrame:

idx = pd.MultiIndex.from_tuples([('a', 'b'), ('a', 'b')])
df = pd.DataFrame([[1,2], [3,4]], index=idx)

     0  1
a b  1  2
  b  3  4

To move the index levels into regular columns, use:

df.reset_index()  

which will output:

  level_0 level_1  0  1
0       a       b  1  2
1       a       b  3  4

Because the index levels have no names, Pandas calls the new columns level_0 and level_1.
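`reset_index` can also move only some of the levels, and `set_index(..., append=True)` moves a column back into the index. The level names `first` and `second` below are assumptions added for the example. With unnamed levels you would pass the level position instead.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c")], names=["first", "second"]
)
df = pd.DataFrame([[1, 2], [3, 4]], index=idx)

# Move only the "second" level into a column; "first" stays as the index
partial = df.reset_index(level="second")
print(partial.columns.tolist())  # ['second', 0, 1]

# Append the column back onto the index to rebuild the MultiIndex
restored = partial.set_index("second", append=True)
print(list(restored.index.names))  # ['first', 'second']
```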
