How to Split Column into Multiple Columns in Pandas

1. Overview

Here are two approaches to split a column into multiple columns in Pandas:

  • list column
  • string column separated by a delimiter.

Below we can find both examples:

(1) Split column (list values) into multiple columns

pd.DataFrame(df["langs"].to_list(), columns=['prim_lang', 'sec_lang'])

(2) Split column by delimiter in Pandas

pd.DataFrame(df["skills"].str.split(',').fillna('[]').tolist())

In Pandas to split column we can use method .str.split(',') - which takes a delimiter as a parameter.

Next we will see how to apply both ways into practical examples.

2. Split List Column into Multiple Columns

For the first example we will create a simple DataFrame with 1 column which stores a list of two languages. We are going to generate 10 random lists of subset of languages:

import random

langs = ['Python', 'Java' , 'JS', 'C', 'C+']

df = pd.DataFrame({"langs": [ [random.choice(langs) for i in range(0,2)] for _ in range(10)]})

Our DataFrame looks like this:

langs
[C+, C+]
[Python, JS]
[C+, C+]
[Java, Java]
[Java, C]

In order to split this single column(which contain list values) into two columns we will use the next syntax:

pd.DataFrame(df["langs"].to_list(), columns=['prim_lang', 'sec_lang'])

The result of the split is:

prim_lang sec_lang
C+ C+
Python JS
C+ C+
Java Java
Java C

How does it work? The method df["langs"].to_list() is converting the initial column into list of lists:

[['C+', 'C+'],
 ['Python', 'JS'],
 ['C+', 'C+'],
 ['Java', 'Java'],
 ['Java', 'C'],
 ['Java', 'C+'],
 ['Python', 'C'],
 ['Python', 'C+'],
 ['C', 'Java'],
 ['C+', 'JS']]

Note: This method will work only if the stored values are lists. If you have string values separated by columns check Example 2.

3. Split Column by delimiter in Pandas

Now let's say that instead of storing lists like: ['C+', 'C+'] you have only the values separated by delimiter. In this case is a comma like 'C+', 'C+'.

Lets have data like the one below:

skills internship location
Software Testing Software Testing Noida
Java, SQL, Unix, Oracle, MS SQL Server, Hibernate (Java), Shell Scripting, Spring MVC, REST API Technical Operations - Networking And Monitoring Noida
English Proficiency (Spoken), English Proficiency (Written), Hindi Proficiency (Spoken), Hindi Proficiency (Written) Software Project Management Work From Home
HTML, CSS, Flask, Python, Django Web Development Work From Home
HTML, CSS, JavaScript, ReactJS, Redux Front End Development Work From Home

And we would like to split the column skills by delimiter into multiple columns. This time the number of elements is not fixed!

We can use Pandas string method .str.split(',') in order to split the values into lists of lists. If you have missing data you need to ensure that you default it by empty list by .fillna('[]'):

pd.DataFrame(df["skills"].str.split(',').fillna('[]').tolist())

This will create DataFrame like:

0 1 2 44 45
Software Testing None None None None
Java SQL Unix None None
English Proficiency (Spoken) English Proficiency (Written) Hindi Proficiency (Spoken) None None
HTML CSS Flask None None
HTML CSS JavaScript None None

As you can see the result DataFrame has 45 columns. Which means that one of the rows has 45 values separated by comma.

In order to find which row(s) have most values we can use syntax like - test the last column for all non null elements:

pd.DataFrame(df["skills"].str.split(',').fillna('[]').tolist())[45].dropna()

And we get as output:

1242     Gitlab

4. Conclusion and Resources

In this guide we saw how to split columns depending on the values which contain. We covered a column which contains lists and also splitting values separated by delimiter.

Below you can find some useful resources: