In this tutorial, we'll learn how to select non-null values in Pandas. You can also easily count the NaN or non-NaN values in one or multiple columns.

In the next sections we will cover all the steps with a real-world example.

To learn more about NaN values in Pandas, see also:
How to Get First Non-NaN Value Per Row in Pandas.

Setup

Let's start with sample data and the problem for this tutorial.

DataFrame with NaN values

Each row is a different item and each column is a topic. A value of 1 means the item has that topic, so one item can have zero, one, or several topics.

Our goal is to get the first or last topic for each item.

import pandas as pd

details = {
    'topic_1': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 1},
    'topic_2': {'item_1': 0, 'item_2': 1, 'item_3': 1, 'item_4': 0, 'item_5': 0},
    'topic_3': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 0},
    'topic_4': {'item_1': 0, 'item_2': 0, 'item_3': 1, 'item_4': 0, 'item_5': 0}
}

df = pd.DataFrame(details)

result:

        topic_1  topic_2  topic_3  topic_4
item_1        1        0        1        0
item_2        0        1        0        0
item_3        0        1        0        1
item_4        0        0        0        0
item_5        1        0        0        0

Intro - Get Name of First Non NaN Column

Can you extract a topic for each row?

The problem is to identify, for each item, the first or the last of its topics.

Expected result (here with the last topic of each item):

        topic_1  topic_2  topic_3  topic_4 category
item_1        1        0        1        0  topic_3
item_2        0        1        0        0  topic_2
item_3        0        1        0        1  topic_4
item_4        0        0        0        0    other
item_5        1        0        0        0  topic_1

Step 1: Get Column Name for First Non NaN Column

Let's start by getting the first topic per row/item from a list of columns.

First we will identify a list of columns which are going to be used.

We have two options:

  • get the columns by df.columns
  • or write them explicitly - categories = reversed(['topic_1', 'topic_2', 'topic_3', 'topic_4'])
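Both options can be sketched as follows. The snippet rebuilds the sample DataFrame so that it runs on its own; the names topic_columns and explicit_columns are illustrative and do not come from the original code.

```python
import pandas as pd

details = {
    'topic_1': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 1},
    'topic_2': {'item_1': 0, 'item_2': 1, 'item_3': 1, 'item_4': 0, 'item_5': 0},
    'topic_3': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 0},
    'topic_4': {'item_1': 0, 'item_2': 0, 'item_3': 1, 'item_4': 0, 'item_5': 0}
}
df = pd.DataFrame(details)

# Option 1: take every column of the DataFrame
# (assumption: all columns are topic columns at this point)
topic_columns = df.columns.tolist()

# Option 2: list the columns explicitly, which also fixes their order
explicit_columns = ['topic_1', 'topic_2', 'topic_3', 'topic_4']

# For the sample data both options give the same list
print(topic_columns == explicit_columns)
```

Note that the first option should be evaluated before the new category column is added, otherwise category itself ends up in the list.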

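We are going to use a simple algorithm to combine the values from multiple columns:

  • create a new column with the default value 'other'
  • iterate over all categories
  • map the value 1 in the current column to the topic name
  • fill the unmatched values, which map turns into NaN, with the values already stored in the new column

The code below gets the name of the first matching column for each row. Each statement inside the loop works on a whole column at once, so no loop over the rows is needed. The columns are processed from last to first, so the earliest matching column is written last and wins:

categories = ['topic_1', 'topic_2', 'topic_3', 'topic_4']
df['category'] = 'other'

# Process the columns from last to first: later assignments overwrite
# earlier ones, so the first matching column ends up in 'category'
for cat in reversed(categories):
    # 1 -> column name, any other value -> NaN
    mapped = df[cat].map({1: cat})
    # keep the previous label wherever the current column did not match
    df['category'] = mapped.fillna(df['category'])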

To get the last matching column instead, change the processing order by removing reversed.
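As a rough sketch, the loop above can be wrapped in a small helper that switches between the first and the last match. The function label_topic and its parameters which and default are hypothetical names introduced here for illustration; only map and fillna come from the code above.

```python
import pandas as pd

def label_topic(df, columns, which='first', default='other'):
    """Return, per row, the name of the first or last column whose value is 1."""
    # Later iterations overwrite earlier ones, so the column that should
    # win has to be processed last.
    order = list(reversed(columns)) if which == 'first' else list(columns)
    result = pd.Series(default, index=df.index, dtype=object)
    for col in order:
        # 1 -> column name, everything else -> NaN, then keep the old label
        result = df[col].map({1: col}).fillna(result)
    return result

details = {
    'topic_1': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 1},
    'topic_2': {'item_1': 0, 'item_2': 1, 'item_3': 1, 'item_4': 0, 'item_5': 0},
    'topic_3': {'item_1': 1, 'item_2': 0, 'item_3': 0, 'item_4': 0, 'item_5': 0},
    'topic_4': {'item_1': 0, 'item_2': 0, 'item_3': 1, 'item_4': 0, 'item_5': 0}
}
df = pd.DataFrame(details)
topics = ['topic_1', 'topic_2', 'topic_3', 'topic_4']

first_topic = label_topic(df, topics, which='first')
last_topic = label_topic(df, topics, which='last')
print(pd.DataFrame({'first': first_topic, 'last': last_topic}))
```

For the sample data, the last column of this output matches the category column of the expected result, while item_1 and item_3 get topic_1 and topic_2 as their first topics.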

Step 2: Get Columns with NaN Values in Pandas - Explanation

In this step we will describe how the main part of the code works.

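The main part relies on two important functions:

  • map - replaces each value of 1 in the current column with the column name; values without a match (here 0) become NaN
  • fillna - fills those NaN values with the values already stored in the category column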

So the idea is to map the values of the current column to the column name and, for every row where the current column does not match, keep the value that has already been set.
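A single pass of that idea can be reproduced in isolation. The sketch below uses the topic_1 column from the sample data; the starting content of category is an assumed intermediate state (what the column holds after topic_4, topic_3 and topic_2 have been processed), and the variable names are illustrative.

```python
import pandas as pd

items = ['item_1', 'item_2', 'item_3', 'item_4', 'item_5']
topic_1 = pd.Series([1, 0, 0, 0, 1], index=items, name='topic_1')

# Assumed state of the new column before topic_1 is processed
category = pd.Series(['topic_3', 'topic_2', 'topic_2', 'other', 'other'], index=items)

# map: 1 becomes the column name, unmatched values become NaN
mapped = topic_1.map({1: 'topic_1'})
print(mapped)

# The NaN and non-NaN values produced by map can be counted directly
nan_count = mapped.isna().sum()
non_nan_count = mapped.notna().sum()
print(nan_count, non_nan_count)

# fillna: rows without a match keep the label that was already set
category = mapped.fillna(category)
print(category)
```

After this pass item_1 and item_5 are labeled topic_1, and the other three rows keep their previous labels.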

Conclusion

In this short article we saw how to combine multiple Pandas functions to implement more complex logic: getting the first or last of several columns that has a non-NaN value.

Using the described technique you can combine multiple columns into a single one based on their values.