In this quick tutorial, we'll cover how to apply a function to a single column in Pandas.
Here are two ways to apply a function to a column in a DataFrame:
(1) Apply a user-defined function to a column
(2) Apply a lambda to a column
df['col'].apply(lambda x: x + 1)
For multiple columns check: How to apply function to multiple columns in Pandas.
In the next section we will cover several different use cases and important details on the topic.
Let's say that we have the following DataFrame:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))
df
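Because the values are random, every run produces a different table. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator (the seed `42` is an arbitrary, illustrative choice and is not part of the original example):

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed value 42 is an arbitrary, illustrative choice
rng = np.random.default_rng(42)

# Integers drawn from 0..4 (the upper bound 5 is exclusive), 5 rows x 4 columns
df = pd.DataFrame(rng.integers(0, 5, size=(5, 4)), columns=list("ABCD"))
print(df.shape)
```

With a fixed seed, the same table is generated on every run, which makes the outputs in the sections below easier to compare.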
For large DataFrames use Dask or swifter - check Option 4!
Option 1: Pandas apply function to column
The first example will show how to define a function and then apply it on a column from a Pandas DataFrame.
First we will define a function, then call it on the column with the Series method `apply`:
def my_function(x):
    return x ** 2

df['A'].apply(my_function)
The result is squared values for each cell:
0    16
1     1
2     0
3     9
4     1
Name: A, dtype: int64
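To make this result reproducible, here is a small sketch that hard-codes column `A` with the values implied by the squared output above (4, 1, 0, 3, 1). The fixed data is an assumption for illustration, since the original DataFrame is random:

```python
import pandas as pd

# Column A fixed to the values implied by the squared output above (assumption)
df = pd.DataFrame({"A": [4, 1, 0, 3, 1]})


def my_function(x):
    """Square a single cell value."""
    return x ** 2


# Series.apply calls my_function once per element of column A
squared = df["A"].apply(my_function)
print(squared.tolist())  # [16, 1, 0, 9, 1]
```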
Option 2: Pandas apply function to column by map
**A better way to apply a function to a single column is to use the Pandas `map` method.**
Why is it better? Because `apply` is also defined for whole DataFrames and is typically used across multiple columns, while `map` is intended for a Pandas Series. A single column selected from a DataFrame is a Pandas Series, a one-dimensional labeled array.
`map` can also be slightly faster than `apply` for large DataFrames.
So applying the function with `map` looks like this:
def my_function(x):
    return x ** 2

df['A'].map(my_function)
The result is the same as Option 1.
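A quick way to check that claim is to compare the two results directly. The sketch below uses small fixed data (an assumption for illustration) and `Series.equals`:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 1, 0, 3, 1]})  # illustrative fixed data


def my_function(x):
    return x ** 2


via_apply = df["A"].apply(my_function)
via_map = df["A"].map(my_function)

# Both calls return a Series with the same index and values
print(via_apply.equals(via_map))
```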
Option 3: Pandas apply anonymous function / lambda to column
Sometimes a lambda or anonymous function is what you would like to apply to a column.
The syntax is very simple:
df['A'].map(lambda x: x ** 2)
df['A'].apply(lambda x: x ** 2)
The difference is the same as before: `apply` also works at the DataFrame level, while `map` here works at the Series level.
So you can do:
df[['A', 'B']].apply(lambda x: x ** 2)
in order to apply a lambda to multiple columns.
It's recommended to make your intention explicit: (1) use `map` for a single column, or (2) use `apply` if other columns may be added in the future.
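To see what each call actually passes to the function, the sketch below records whether the argument is a whole column or a single value. The helper `square` and the sample data are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 1, 0], "B": [2, 3, 1]})  # illustrative data

got_series = []


def square(x):
    # Record whether the function received a whole column or a single value
    got_series.append(isinstance(x, pd.Series))
    return x ** 2


frame_result = df[["A", "B"]].apply(square)  # called with each column as a Series
column_result = df["A"].map(square)          # called with each cell value

print(frame_result)
print(column_result)
```

At the DataFrame level the function receives each column as a Series, so `x ** 2` runs on the whole column at once. With `map` on a single column it is called once per value.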
Option 4: Speed up Pandas apply function to column
Finally, let's cover how to speed up applying a function to a single column in Pandas.
To optimize execution there are several options, such as Dask and swifter.
You can find more info in the Resource section.
To test all options we will create a DataFrame with shape 10000 rows × 4 columns.
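A minimal sketch of such a test setup, using the standard-library `timeit` module to compare plain `apply` and `map`. The variable name `big_df`, the body of `fnc`, and the number of runs are assumptions for illustration, and absolute timings depend on the machine:

```python
import timeit

import numpy as np
import pandas as pd

# 10000 rows x 4 columns, same value range as the small example
big_df = pd.DataFrame(
    np.random.randint(0, 5, size=(10000, 4)), columns=list("ABCD")
)


def fnc(x):
    # Assumed stand-in for the function being timed
    return x ** 2


runs = 20  # arbitrary number of repetitions

# Average seconds per call over the given number of runs
apply_time = timeit.timeit(lambda: big_df["A"].apply(fnc), number=runs) / runs
map_time = timeit.timeit(lambda: big_df["A"].map(fnc), number=runs) / runs

print(f"apply: {apply_time * 1e3:.2f} ms per call")
print(f"map:   {map_time * 1e3:.2f} ms per call")
```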
Dask - apply function to column - the fastest tested way:
import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=2)
ddf["A"].apply(fnc, meta=('A', 'int64'))
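The result is below. Note that Dask evaluates lazily: as written, without `.compute()`, the call only builds a task graph, so this timing likely does not include running the function on the data: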
612 µs ± 3.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
swifter - fast apply function to column - it's much faster than the `apply` method and a bit slower than Dask in the tested example:
752 µs ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.63 ms ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.57 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)