Here are two ways to split a column into multiple columns in a Pandas DataFrame:

(1) Split column of list values into multiple columns

pd.DataFrame(df["langs"].to_list(), columns=['prim_lang', 'sec_lang'])

(2) Split column of values separated by comma into multiple columns

pd.DataFrame(df["skills"].fillna('').str.split(',').tolist())

Next, we will apply both approaches to practical examples.

Example 1: Split column of list values into multiple columns

For the first example we will create a simple DataFrame with one column that stores lists of two languages. We generate 10 lists, each holding two languages picked at random (repeats are allowed):

import random

import pandas as pd

langs = ['Python', 'Java', 'JS', 'C', 'C+']

# 10 rows, each a list of two randomly chosen languages
df = pd.DataFrame({"langs": [[random.choice(langs) for _ in range(2)] for _ in range(10)]})

The first five rows of the DataFrame look like this:

langs
[C+, C+]
[Python, JS]
[C+, C+]
[Java, Java]
[Java, C]

To split this single column into two columns, we use the following syntax:

pd.DataFrame(df["langs"].to_list(), columns=['prim_lang', 'sec_lang'])

The result of the split is:

prim_lang sec_lang
C+ C+
Python JS
C+ C+
Java Java
Java C

How does it work? The call df["langs"].to_list() converts the column into a list of lists, and pd.DataFrame then turns each inner list into a row with one column per element:

[['C+', 'C+'],
 ['Python', 'JS'],
 ['C+', 'C+'],
 ['Java', 'Java'],
 ['Java', 'C'],
 ['Java', 'C+'],
 ['Python', 'C'],
 ['Python', 'C+'],
 ['C', 'Java'],
 ['C+', 'JS']]
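
As a minimal sketch of the two steps above, the snippet below uses a small hand-written DataFrame. The sample rows and the names `rows`, `split_df` and `result` are illustrative assumptions, not part of the original data. Passing `index=df.index` keeps the row labels aligned, so the new columns can be joined back to the original frame:

```python
import pandas as pd

# Small hand-written sample (illustrative, not the random data above)
df = pd.DataFrame({"langs": [["Python", "JS"], ["Java", "C"], ["C", "Java"]]})

# Step 1: the column of lists becomes a plain Python list of lists
rows = df["langs"].to_list()

# Step 2: each inner list becomes one row, each position one column
split_df = pd.DataFrame(rows, columns=["prim_lang", "sec_lang"], index=df.index)

# Optional: attach the new columns to the original DataFrame
result = df.join(split_df)
print(result)
```

Joining on the index is one possible design choice; if the original list column is no longer needed, it can be dropped afterwards.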

Note: This method works only if the stored values are lists. If you have string values separated by commas, see Example 2.
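
A quick way to check this precondition before splitting is sketched below. The sample data and the variable names `langs_are_lists` and `skills_are_lists` are illustrative assumptions:

```python
import pandas as pd

# Illustrative sample: one column of lists, one column of comma-separated strings
df = pd.DataFrame({
    "langs": [["Python", "JS"], ["Java", "C"]],
    "skills": ["HTML, CSS", "Java, SQL"],
})

# True only if every value in the column is a Python list
langs_are_lists = bool(df["langs"].map(lambda v: isinstance(v, list)).all())
skills_are_lists = bool(df["skills"].map(lambda v: isinstance(v, list)).all())

print(langs_are_lists, skills_are_lists)
```

A column that fails this check, such as `skills` here, is a candidate for the string-splitting approach instead.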

Example 2: Split column of values separated by comma into multiple columns

Now let's say that instead of storing lists like ['C+', 'C+'], the column holds plain strings with the values separated by commas, like 'C+, C+'.

Suppose we have data like this:

skills | internship | location
Software Testing | Software Testing | Noida
Java, SQL, Unix, Oracle, MS SQL Server, Hibernate (Java), Shell Scripting, Spring MVC, REST API | Technical Operations - Networking And Monitoring | Noida
English Proficiency (Spoken), English Proficiency (Written), Hindi Proficiency (Spoken), Hindi Proficiency (Written) | Software Project Management | Work From Home
HTML, CSS, Flask, Python, Django | Web Development | Work From Home
HTML, CSS, JavaScript, ReactJS, Redux | Front End Development | Work From Home

We would like to split the skills column into multiple columns. This time the number of elements is not fixed!

We can use the Pandas string method .str.split(',') to turn each string into a list of values, which gives a list of lists for the whole column. If the column has missing values, fill them with an empty string first with .fillna(''), so that every row is split into a list:

pd.DataFrame(df["skills"].fillna('').str.split(',').tolist())

This creates a DataFrame like:

0 | 1 | 2 | ... | 44 | 45
Software Testing | None | None | ... | None | None
Java | SQL | Unix | ... | None | None
English Proficiency (Spoken) | English Proficiency (Written) | Hindi Proficiency (Spoken) | ... | None | None
HTML | CSS | Flask | ... | None | None
HTML | CSS | JavaScript | ... | None | None

As you can see, the resulting DataFrame has columns labeled 0 to 45, which is 46 columns in total. This means that at least one row contains 46 comma-separated values.
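
As a side note, pandas can also build the wide DataFrame directly with `.str.split(',', expand=True)`. The sketch below is offered as an alternative, not as the method used in this article. It assumes a hypothetical three-row `skills` column with one missing value, and the `.str.strip()` step is an added assumption that you want the space after each comma removed:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the last row has no skills at all
df = pd.DataFrame({"skills": ["HTML, CSS, Flask", "Software Testing", np.nan]})

# expand=True makes str.split return a DataFrame with one column per piece;
# the row with the missing value stays empty in every column
wide = df["skills"].str.split(",", expand=True)

# Splitting on "," keeps the space after each comma, so strip every column
wide = wide.apply(lambda col: col.str.strip())
print(wide)
```

With this variant no separate handling of missing values is needed, at the cost of departing from the list-of-lists construction shown above.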

To find which row(s) have the most values, select the last column (labeled 45) and drop its null entries:

pd.DataFrame(df["skills"].fillna('').str.split(',').tolist())[45].dropna()

The output is:

1242     Gitlab
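
Another way to locate such rows, sketched here on a hypothetical three-row column, is to count the values per row and keep the rows that reach the maximum. This avoids hard-coding the label of the last column (45 in the example above). The names `counts` and `longest` are illustrative:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"skills": ["HTML, CSS", "Java, SQL, Unix, Gitlab", "Software Testing"]})

# On a column of lists, .str.len() returns the length of each list,
# i.e. the number of comma-separated values in each row
counts = df["skills"].str.split(",").str.len()

# Keep only the row(s) with the largest count
longest = df.loc[counts == counts.max(), "skills"]

print(counts.max())
print(longest)
```

The maximum count also tells you how many columns the split DataFrame will have.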
