1. Overview
To get the first or the last non-NaN value per rows in Pandas we can use the next solutions:
(1) Get First/Last Non-NaN Values per row
df.fillna(method='bfill', axis=1).iloc[:, 0]
(2) Get non-NaN values with stack() and groupby()
df.stack().groupby(level=0).first().reindex(df.index)
(3) Get the column name of first non-NaN value per row
- If you like to learn more about this problem please check: Pandas Combine Multiple Columns Into a Single in Pandas
df.apply(pd.Series.first_valid_index, axis=1)
In the next steps we will cover all the examples in detail.
2. Setup
2.1. Data
For this tutorial we are going to use the following DataFrame:
import pandas as pd
import numpy as np
details = {
'topic_1': {'item_1': 6, 'item_2': np.NaN, 'item_3': np.NaN, 'item_4': np.NaN, 'item_5': 5},
'topic_2': {'item_1': np.NaN, 'item_2': 8, 'item_3': 5, 'item_4': np.NaN, 'item_5': np.NaN},
'topic_3': {'item_1': 2, 'item_2': np.NaN, 'item_3': np.NaN, 'item_4': np.NaN, 'item_5': np.NaN},
'topic_4': {'item_1': np.NaN, 'item_2': np.NaN, 'item_3': 7, 'item_4': np.NaN, 'item_5': 6}
}
df = pd.DataFrame(details)
Data looks like:
topic_1 | topic_2 | topic_3 | topic_4 | |
---|---|---|---|---|
item_1 | 6.0 | NaN | 2.0 | NaN |
item_2 | NaN | 8.0 | NaN | NaN |
item_3 | NaN | 5.0 | NaN | 7.0 |
item_4 | NaN | NaN | NaN | NaN |
item_5 | 5.0 | NaN | NaN | 6.0 |
2.2. Expected Result
The expectation is to get first non-NaN values per given row:
item_1 6.0
item_2 8.0
item_3 5.0
item_4 NaN
item_5 5.0
Name: topic_1, dtype: float64
3. Get First/Last Non-NaN Values per row
The first solution to get the non-NaN values per row from a list of columns use the next steps:
.fillna(method='bfill', axis=1)
- to fill all non-NaN values from the last to the first one;axis=1
- means columns.iloc[:, 0]
- get the first column
So the final code will looks like:
df.fillna(method='bfill', axis=1).iloc[:, 0]
and the result Series will have all non-null values per given row:
item_1 6.0
item_2 8.0
item_3 5.0
item_4 NaN
item_5 5.0
Name: topic_1, dtype: float64
To get the last non-NaN value per row you need to change the code to:
df.fillna(method='ffill', axis=1).iloc[:, -1]
and the result would be:
item_1 2.0
item_2 8.0
item_3 7.0
item_4 NaN
item_5 6.0
Name: topic_4, dtype: float64
Note: To work only with the needed columns we can select them as df[['topic_1', 'topic_2']]
4. Get non-NaN values with stack() and groupby()
Another option to get the first or last non-NaN values is by combination of methods stack()
and groupby()
. The idea is to dynamically create a single column with first non-NaN value:
df.stack().groupby(level=0).first().reindex(df.index)
result:
item_1 6.0
item_2 8.0
item_3 5.0
item_4 NaN
item_5 5.0
dtype: float64
The following algorithm explains how the code works:
-
Use method
stack()
in order to stack the values:item_1 topic_1 6.0
topic_3 2.0
item_2 topic_2 8.0
item_3 topic_2 5.0
topic_4 7.0
item_5 topic_1 5.0
topic_4 6.0
dtype: float64 -
.groupby(level=0).first()
- will group by the level 0 of the multi-index and return the first values. The result would be:item_1 6.0
item_2 8.0
item_3 5.0
item_5 5.0
dtype: float64 -
.reindex(df.index)
- the final step is to reindex values on the original DataFrame in order to add all missing values:item_1 6.0
item_2 8.0
item_3 5.0
item_4 NaN
item_5 5.0
dtype: float64
5. Get the column name of first non-NaN value per row
If you need the column names instead of the values we can use the following code:
df.apply(pd.Series.first_valid_index, axis=1)
which will return the column name per each row which has the first non-NaN values:
item_1 topic_1
item_2 topic_2
item_3 topic_2
item_4 None
item_5 topic_1
dtype: object
The method first_valid_index()
works as follows:
Return index for first non-NA value or None, if no NA value is found.
6. Conclusion
We covered how to work with multiple columns and NaN values. Now we know how to get the first or last non-empty values per row.
We also described and combined several Pandas functions like:
- pandas.DataFrame.stack
- pandas.DataFrame.fillna
- pandas.DataFrame.groupby
- pandas.DataFrame.first
- pandas.DataFrame.reindex
- pandas.DataFrame.first_valid_index
Finally we show how to understand the logic behind complex Pandas code.