In this quick tutorial, we'll cover how to apply a function to a single column in Pandas.

Here are two ways to apply a function to a column in a DataFrame:

(1) Apply a user-defined function to a column

df['col'].map(my_function)

(2) Apply a lambda to a column

df['col'].apply(lambda x: x + 1)

For multiple columns check: How to apply function to multiple columns in Pandas.

In the next section we will cover several different use cases and important details on the topic.

Let's say that we have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
df

DataFrame:

   A  B  C  D
0  4  2  2  3
1  1  0  0  3
2  0  2  1  0
3  3  2  4  2
4  1  1  4  1
Pro Tip!
For large DataFrames use Dask or swifter - check Option 4!
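
Because the DataFrame above is generated randomly, rerunning the snippet will produce different numbers. A minimal sketch for reproducing the exact values shown in the table is to type them in directly (the values are copied from the table, nothing else is assumed):

```python
import pandas as pd

# Values copied column by column from the table above,
# so the outputs in the following sections can be reproduced exactly.
df = pd.DataFrame(
    {
        "A": [4, 1, 0, 3, 1],
        "B": [2, 0, 2, 2, 1],
        "C": [2, 0, 1, 4, 4],
        "D": [3, 3, 0, 2, 1],
    }
)
print(df)
```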

Option 1: Pandas apply function to column

The first example will show how to define a function and then apply it on a column from a Pandas DataFrame.

First we define a function, which will be applied to the column with the Series.apply method. Then we call it on column A:

def my_function(x):
    return x ** 2
    
df['A'].apply(my_function)

The result is the squared value of each cell:

0    16
1     1
2     0
3     9
4     1
Name: A, dtype: int64
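
If the function needs more than one argument, apply can forward extra keyword arguments to it. The sketch below assumes a hypothetical helper named power (not part of the original example) and reuses the values of column A:

```python
import pandas as pd

def power(x, exponent=2):
    # Hypothetical helper: raises a single value to the given exponent.
    return x ** exponent

# Same values as column A in the example DataFrame
col = pd.Series([4, 1, 0, 3, 1], name="A")

squared = col.apply(power)              # uses the default exponent=2
cubed = col.apply(power, exponent=3)    # exponent=3 is passed through to power

print(squared.tolist())
print(cubed.tolist())
```

This avoids wrapping the call in a lambda just to fix a parameter value.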

Option 2: Pandas apply function to column by map

**A better way to apply a function to a single column is to use the Pandas map method.**

Why is it better? Because map exists only on a Pandas Series and always works element by element, while apply is a more general method that is also available on a DataFrame, where it processes whole columns or rows. A single column of a DataFrame is a Pandas Series, i.e. a one-dimensional labeled array.

The map method can be slightly faster than apply on large DataFrames.

Applying the function with map looks like this:

def my_function(x):
    return x ** 2
    
df['A'].map(my_function)

The result is the same as Option 1.
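
A short sketch that checks this claim, and also shows one thing map can do that goes beyond calling a function: it accepts a dict as a lookup table. The dict of labels here is made up for illustration.

```python
import pandas as pd

def my_function(x):
    return x ** 2

# Same values as column A in the example DataFrame
col = pd.Series([4, 1, 0, 3, 1], name="A")

# Raises an AssertionError if map and apply disagree
pd.testing.assert_series_equal(col.map(my_function), col.apply(my_function))

# map also accepts a dict; values without a matching key become NaN
labels = col.map({0: "zero", 1: "one"})
print(labels)
```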

Option 3: Pandas apply anonymous function / lambda to column

Sometimes a lambda or anonymous function is what you would like to apply to a column.

The syntax is very simple:

df['A'].map(lambda x: x ** 2)

or:

df['A'].apply(lambda x: x ** 2)

The difference is the same as before: map is a Series-only method, while apply can also be called at the DataFrame level.

So you can do:

df[['A', 'B']].apply(lambda x: x ** 2)

in order to apply the lambda to multiple columns. Note that at the DataFrame level x is a whole column rather than a single value.
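
To make the DataFrame-level behaviour concrete, here is a small sketch. The function name column_range is hypothetical; the point is that DataFrame.apply calls the function once per column (the default axis=0) and passes the whole column as a Series.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 1, 0, 3, 1], "B": [2, 0, 2, 2, 1]})

def column_range(col):
    # col is an entire column (a pandas Series), so Series methods are available
    return col.max() - col.min()

ranges = df[["A", "B"]].apply(column_range)           # one value per column
squared = df[["A", "B"]].apply(lambda col: col ** 2)  # same shape as the input

print(ranges)
print(squared)
```

A function that reduces a column to one value yields a Series indexed by column name, while a function that returns a column of the same length yields a DataFrame.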

Pro Tip!
It's recommended to make your intention explicit: 1) use map for a single column, or 2) use apply if other columns may be added in the future!

Option 4: Speed up Pandas apply function to column

Finally, let's cover how to speed up applying a function to a single column in Pandas.

To optimize execution there are several options like:

  • Dask
  • Swifter

You can find more info in the Resources section.

To test all options we will create a DataFrame with shape 10000 rows × 4 columns.
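
The test setup itself is not shown, so the following is only a sketch of what it might look like. The seed is arbitrary, and the body of fnc is an assumption (it mirrors my_function from Option 1); the original fnc may differ.

```python
import numpy as np
import pandas as pd

# Arbitrary seed so that repeated runs use the same data
rng = np.random.default_rng(seed=0)

# 10000 rows x 4 columns of integers in the range 0..4
df = pd.DataFrame(rng.integers(0, 5, size=(10_000, 4)), columns=list("ABCD"))

def fnc(x):
    # Assumed to match my_function from Option 1
    return x ** 2

print(df.shape)
```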

Dask - apply function to column - the fastest way in this test:

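import dask.dataframe as dd

# fnc is the function to apply; it is not defined in this snippet (for example my_function from Option 1)
ddf = dd.from_pandas(df, npartitions=2)
# Dask is lazy: this line only builds the task graph; append .compute() to get the values
ddf["A"].apply(fnc, meta=('A', 'int64'))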

The result is:

612 µs ± 3.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

swifter - fast apply function to column - in the tested example it is much faster than the apply method and a bit slower than Dask:

import swifter  # registers the .swifter accessor on Series and DataFrame
df['A'].swifter.apply(fnc)

The result is:

752 µs ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Pandas apply

3.63 ms ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Pandas map

3.57 ms ± 25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
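
The timings above have the format printed by the %timeit magic in a notebook. Outside a notebook, a rough equivalent for the two Pandas variants can be sketched with the standard timeit module. The definition of fnc is an assumption, and the numbers will differ between machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(10_000, 4)), columns=list("ABCD"))

def fnc(x):
    # Assumed test function; the original fnc is not shown
    return x ** 2

candidates = [
    ("apply", lambda: df["A"].apply(fnc)),
    ("map", lambda: df["A"].map(fnc)),
]

for name, stmt in candidates:
    # Best of 3 repeats, 10 calls per repeat, reported per single call
    seconds = min(timeit.repeat(stmt, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```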

Resources