How to Add New Column Based on List of Keywords in Pandas DataFrame

In this short guide, I'll show you the steps to extract a list of keywords matched in a Pandas DataFrame and create new column(s).

In particular, I'll show you how to return keywords from a given column.

The image below shows the final outcome:

To start with a simple example, let's say that you have the following Pandas DataFrame:

| url | title | keyword |
| --- | --- | --- |
| https://towardsdatascience.com/training-on-batch-how-to-split-data-effectively-3234f3918b07 | Training on batch: how to split data effectively? | how, data |
| https://uxdesign.cc/design-and-data-how-to-humanize-data-32a03079311f | Design & Data: how to humanize data | how, data |
| https://medium.com/swlh/fyre-festival-achieved-perfect-product-market-fit-and-thats-why-we-should-question-the-lean-a6a45fcb735a | Fyre Festival achieved perfect product-market fit (and that's why we should question the Lean Startup and VC dogma) | question |
| https://uxdesign.cc/using-a-sneak-attack-question-during-your-designer-interviews-b918ff600977 | Using a ‘sneak attack’ question during your Designer interviews | question |
| https://medium.com/swlh/a-question-you-may-never-have-asked-culture-fit-or-culture-add-cf65b00770cf | A question you may never have asked — culture fit or culture add? | question |

Notebook with the code: Extract list of keywords from a column in Pandas

Step 1: Read test DataFrame from Kaggle

The DataFrame above is available from Kaggle.

If you'd like to learn more about reading a Kaggle dataset into a Pandas DataFrame, check this article: How to Search and Download Kaggle Dataset to Pandas DataFrame

For this article we will use the following code to download the file:

import kaggle

kaggle.api.authenticate()
kaggle.api.dataset_download_file('dorianlazar/medium-articles-dataset', file_name='medium_data.csv', path='data/')

Then read it with:

import pandas as pd
df = pd.read_csv('data/medium_data.csv.zip')
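If you want to follow along without a Kaggle account, a small stand-in DataFrame is enough for the next steps. The sketch below is only an illustration: it copies a few titles from the sample table above, and the name `sample_df` is an assumption that does not appear in the original notebook.

```python
import pandas as pd

# A few titles copied from the sample table in this guide.
# `sample_df` is a hypothetical name used only for this illustration.
sample_df = pd.DataFrame({
    'title': [
        'Training on batch: how to split data effectively?',
        'Design & Data: how to humanize data',
        "Fyre Festival achieved perfect product-market fit (and that's why "
        "we should question the Lean Startup and VC dogma)",
        'Using a ‘sneak attack’ question during your Designer interviews',
    ]
})

print(sample_df.shape)
```

Assigning `df = sample_df` lets the rest of the steps run unchanged on this smaller table.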

Step 2: Extract list of keywords from a column to new column

Next, define the list of keywords that you'd like to extract:

keywords = ['how', 'data', 'question', 'guide']

Then extract the matches and add them to a new column:

df['keyword'] = df['title'].str.findall('|'.join(keywords)).apply(set).str.join(', ')

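At this point the new column keyword will contain the distinct keywords found in each title, separated by commas. Because the matches pass through set, duplicates are dropped and the order of the keywords within a cell is not guaranteed.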
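Note that the pattern above matches substrings and is case-sensitive, so 'how' would also match inside 'Show' and 'Data' would be missed. If that matters for your data, one possible variation is sketched below. It is not the notebook's code: the helper `unique_in_order` and the three sample titles are made up for illustration, and the approach assumes you want whole-word, case-insensitive matches in a stable order.

```python
import re
import pandas as pd

# Illustrative titles; the third one is invented to show the substring problem.
df = pd.DataFrame({'title': [
    'Training on batch: how to split data effectively?',
    'Design & Data: how to humanize data',
    'Show me the database',
]})
keywords = ['how', 'data', 'question', 'guide']

# \b restricts matches to whole words; re.escape protects keywords
# that contain regex metacharacters such as '+' or '.'.
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'

def unique_in_order(matches):
    # Hypothetical helper: lowercase the matches, then drop duplicates
    # while keeping the order in which they first appeared.
    return ', '.join(dict.fromkeys(m.lower() for m in matches))

df['keyword'] = (
    df['title']
    .str.findall(pattern, flags=re.IGNORECASE)
    .apply(unique_in_order)
)

print(df['keyword'].tolist())
# ['how, data', 'data, how', '']
```

Using dict.fromkeys instead of set keeps the first-seen order, so the output is the same on every run.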

Step 3: Extract list of keywords to multiple columns

To extract the list of keywords to separate columns, use the following syntax:

for keyword in keywords:
    df[keyword] = df['title'].str.contains(keyword)

This iterates over the list of keywords and creates one new column per keyword, containing True or False depending on whether the keyword occurs in the title.

If you'd like to style your DataFrame in the same way, please check: How to style boolean values by different colors in Pandas
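By default str.contains treats the keyword as a regular expression, is case-sensitive, and returns a missing value for a missing title. A hedged variation of the loop, assuming you prefer literal, case-insensitive matching and want missing titles to count as False, could look like the sketch below. The sample rows and the `n_keywords` column are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Illustrative data, including one missing title.
df = pd.DataFrame({'title': [
    'how to humanize data',
    'A question you may never have asked',
    None,
]})
keywords = ['how', 'data', 'question', 'guide']

for keyword in keywords:
    # regex=False: treat the keyword as plain text, not as a pattern.
    # case=False: ignore letter case.
    # na=False: report False for missing titles instead of NaN.
    df[keyword] = df['title'].str.contains(
        keyword, case=False, regex=False, na=False
    )

# Hypothetical summary column: how many keywords each title contains.
df['n_keywords'] = df[keywords].sum(axis=1)

print(df[keywords + ['n_keywords']])
```

Keeping the boolean columns free of missing values makes them safe to use directly as filters, for example `df[df['question']]`.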

The final result is shown in the image below: