How to Get First Non-NaN Value Per Row in Pandas

1. Overview

To get the first or the last non-NaN value per rows in Pandas we can use the next solutions:

(1) Get First/Last Non-NaN Values per row

df.fillna(method='bfill', axis=1).iloc[:, 0]

(2) Get non-NaN values with stack() and groupby()

df.stack().groupby(level=0).first().reindex(df.index)

(3) Get the column name of first non-NaN value per row

df.apply(pd.Series.first_valid_index, axis=1)

In the next steps we will cover all the examples in detail.

2. Setup

2.1. Data

For this tutorial we are going to use the following DataFrame:

import pandas as pd
import numpy as np

details = {
    'topic_1': {'item_1': 6, 'item_2': np.NaN, 'item_3': np.NaN, 'item_4': np.NaN, 'item_5': 5},
    'topic_2': {'item_1': np.NaN, 'item_2': 8, 'item_3': 5, 'item_4': np.NaN, 'item_5': np.NaN},
    'topic_3': {'item_1': 2, 'item_2': np.NaN, 'item_3': np.NaN, 'item_4': np.NaN, 'item_5': np.NaN},
    'topic_4': {'item_1': np.NaN, 'item_2': np.NaN, 'item_3': 7, 'item_4': np.NaN, 'item_5': 6}
}

df = pd.DataFrame(details)

Data looks like:

topic_1 topic_2 topic_3 topic_4
item_1 6.0 NaN 2.0 NaN
item_2 NaN 8.0 NaN NaN
item_3 NaN 5.0 NaN 7.0
item_4 NaN NaN NaN NaN
item_5 5.0 NaN NaN 6.0

2.2. Expected Result

The expectation is to get first non-NaN values per given row:

item_1    6.0
item_2    8.0
item_3    5.0
item_4    NaN
item_5    5.0
Name: topic_1, dtype: float64

3. Get First/Last Non-NaN Values per row

The first solution to get the non-NaN values per row from a list of columns use the next steps:

  • .fillna(method='bfill', axis=1) - to fill all non-NaN values from the last to the first one; axis=1 - means columns
  • .iloc[:, 0] - get the first column

So the final code will looks like:

df.fillna(method='bfill', axis=1).iloc[:, 0]

and the result Series will have all non-null values per given row:

item_1    6.0
item_2    8.0
item_3    5.0
item_4    NaN
item_5    5.0
Name: topic_1, dtype: float64

To get the last non-NaN value per row you need to change the code to:

df.fillna(method='ffill', axis=1).iloc[:, -1]

and the result would be:

item_1    2.0
item_2    8.0
item_3    7.0
item_4    NaN
item_5    6.0
Name: topic_4, dtype: float64

Note: To work only with the needed columns we can select them as df[['topic_1', 'topic_2']]

4. Get non-NaN values with stack() and groupby()

Another option to get the first or last non-NaN values is by combination of methods stack() and groupby(). The idea is to dynamically create a single column with first non-NaN value:

df.stack().groupby(level=0).first().reindex(df.index)

result:

item_1    6.0
item_2    8.0
item_3    5.0
item_4    NaN
item_5    5.0
dtype: float64

The following algorithm explains how the code works:

  • Use method stack() in order to stack the values:

    item_1 topic_1 6.0
    topic_3 2.0
    item_2 topic_2 8.0
    item_3 topic_2 5.0
    topic_4 7.0
    item_5 topic_1 5.0
    topic_4 6.0
    dtype: float64

  • .groupby(level=0).first() - will group by the level 0 of the multi-index and return the first values. The result would be:

    item_1 6.0
    item_2 8.0
    item_3 5.0
    item_5 5.0
    dtype: float64

  • .reindex(df.index) - the final step is to reindex values on the original DataFrame in order to add all missing values:

    item_1 6.0
    item_2 8.0
    item_3 5.0
    item_4 NaN
    item_5 5.0
    dtype: float64

5. Get the column name of first non-NaN value per row

If you need the column names instead of the values we can use the following code:

df.apply(pd.Series.first_valid_index, axis=1)

which will return the column name per each row which has the first non-NaN values:

item_1    topic_1
item_2    topic_2
item_3    topic_2
item_4       None
item_5    topic_1
dtype: object

The method first_valid_index() works as follows:

Return index for first non-NA value or None, if no NA value is found.

6. Conclusion

We covered how to work with multiple columns and NaN values. Now we know how to get the first or last non-empty values per row.

We also described and combined several Pandas functions like:

Finally we show how to understand the logic behind complex Pandas code.