How to Melt Pandas DataFrame - pd.melt in Examples

In this quick tutorial, we'll see how to use melt in Pandas. We'll first look into basic pd.melt usage, then pd.melt() parameters, and finally some advanced examples and alternatives of melt in Pandas and Python.

In short we can do:

(1) pd.melt() in Pandas

pd.melt(df, id_vars=['A'], value_vars=['B'])

(2) pd.melt() and MultiIndex

pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])

Setup

In this article we will use DataFrame which has information for population of countries:

import pandas as pd

data = [{"Country":"China","1950":"562,580","1955":"607,047","1960":"651,340","1980":"987,822","continent":"Asia"},
        {"Country":"India","1950":"369,881","1955":"404,268","1960":"445,394","1980":"684,888","continent":"Asia"},
        {"Country":"United States","1950":"151,869","1955":"165,070","1960":"179,980","1980":"227,225","continent":"N. America"},
        {"Country":"Indonesia","1950":"82,979","1955":"90,255","1960":"100,146","1980":"150,322","continent":"Asia"},
        {"Country":"Russia","1950":"101,937","1955":"111,126","1960":"119,632","1980":"139,039","continent":"Europe"},
        {"Country":"Brazil","1950":"53,444","1955":"61,652","1960":"71,412","1980":"121,064","continent":"S. America"}]

pd.DataFrame(data)

Final data format is perfect example of where Pandas melt function is useful:

Country 1950 1955 1960 1980 continent
0 China 562,580 607,047 651,340 987,822 Asia
1 India 369,881 404,268 445,394 684,888 Asia
2 United States 151,869 165,070 179,980 227,225 N. America
3 Indonesia 82,979 90,255 100,146 150,322 Asia
4 Russia 101,937 111,126 119,632 139,039 Europe
5 Brazil 53,444 61,652 71,412 121,064 S. America

1: What is melt in Pandas

The picture below shows melt function in action

There are 2 important parameters of this method:

  • id_vars - identifier variables
  • value_vars - measured variables, which are "melt" or "unpivoted" to row axis (non-identifier columns)
    • value - is the column values
    • variable - the column names

So the melt function will turn multiple columns - value_vars - to rows. There will be two non-identifier columns.

There are id_vars - identifier variables which will be considered as identifier columns.

Info
In short
pd.melt()
unpivots data. It makes it easier to filter, compare, visualize and use in DB like style.

2: Why is melt useful

Pandas melt function is useful when data is in form of:

  • pivot
  • pivot table
  • cross table

And we need to convert the multiple columns to rows.

Looking at the data above we might want to find the top 5 most populated pairs - year and country. Using melt() will help us to get this information much faster.

The reverse operation of melt can be found on: Opposite of Melt in Python and Pandas

3: Pandas melt example

Let's see how to use the melt function. First we will identify the parameters:

  • id_vars=['Country', 'continent']
  • value_vars=['1950', '1955']
pd.melt(df, id_vars=['Country', 'continent'], value_vars=['1950', '1955'])

This would result into:

Country continent variable value
0 China Asia 1950 562,580
1 India Asia 1950 369,881
2 United States N. America 1950 151,869
3 Indonesia Asia 1950 82,979
4 Russia Europe 1950 101,937
5 Brazil S. America 1950 53,444
6 China Asia 1955 607,047
7 India Asia 1955 404,268
8 United States N. America 1955 165,070
9 Indonesia Asia 1955 90,255
10 Russia Europe 1955 111,126
11 Brazil S. America 1955 61,652

So the output has 4 columns:

  • 2 identifier columns
    • Country
    • continent
  • 2 non identifier columns
    • variable
    • value

If we add more value_vars columns the number of the columns will be the same. Only id_vars change the number of the output columns.

4: Pandas melt - change names

If we like to change the non-identifier columns we can use 2 parameters:

  • var_name
  • value_name

and set the new names for the result columns:

pd.melt(df, id_vars=['Country', 'continent'], value_vars=['1950', '1955'],
              var_name ='year', value_name ='value')

5: Pandas melt - parameters

There are several parameters of melt function. Most of them were mentioned in the article.

Two remains to be described:

  • col_level - specifies the level to melt (MultiIndex)
  • ignore_index - ignore the original index

6: Pandas melt - MultiIndex

Melt function can be used for MultiIndex DataFrame.

To simulate melt function for MultiIndex we will turn the columns into MultiIndex by:

df = pd.concat({'year': df}, names=['Firstlevel'], axis=1)

More about: How to Add a Level to Index in Pandas DataFrame

We can check the columns by:

df.columns

result:

MultiIndex([('year',   'Country'),
        ('year',      '1950'),
        ('year',      '1955'),
        ('year',      '1960'),
        ('year',      '1980'),
        ('year', 'continent')],
       names=['Firstlevel', None])

Melt can be invoked on MultiIndex columns by:

pd.melt(df, id_vars=[('year', 'Country')], value_vars=[('year', '1950'), ('year', '1955')])

The result is:

(year, Country) Firstlevel None value
0 China year 1950 562,580
1 India year 1950 369,881
2 United States year 1950 151,869
3 Indonesia year 1950 82,979
4 Russia year 1950 101,937
5 Brazil year 1950 53,444
6 China year 1955 607,047
7 India year 1955 404,268
8 United States year 1955 165,070
9 Indonesia year 1955 90,255
10 Russia year 1955 111,126
11 Brazil year 1955 61,652

Conclusion

Pandas melt is a very useful function for reshaping DataFrames. It can multiple columns to rows.

We saw how it works, why it's important and several examples. The function can help "unpivoting" data.