You can use the following code to apply a function to multiple columns in a Pandas DataFrame:
def get_date_time(row, date, time):
    return row[date] + ' ' + row[time]

df.apply(get_date_time, axis=1, date='Date', time='Time')
To apply a function to a single column, and for tips on speeding up apply, see How to apply function to single column in Pandas.
Next, you'll see several examples of how to apply a function to two or more columns in Pandas.
The DataFrame below is available from Kaggle:
Date | Latitude | Longitude | Depth | Type |
---|---|---|---|---|
12/24/2016 | -5.1460 | 153.5166 | 30.00 | Earthquake |
12/25/2016 | -43.4029 | -73.9395 | 38.00 | Earthquake |
12/25/2016 | -43.4810 | -74.4771 | 14.93 | Earthquake |
12/27/2016 | 45.7192 | 26.5230 | 97.00 | Earthquake |
12/28/2016 | 38.3754 | -118.8977 | 10.80 | Earthquake |
You can download it from Kaggle directly or load it with Python, as described in How to Search and Download Kaggle Dataset to Pandas DataFrame.
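If you just want to follow along without downloading anything, the five sample rows from the table above can be typed in by hand. This is a minimal sketch assuming only pandas; it covers just the columns shown in the table, while the full Kaggle file has more (the later examples also use Time and Magnitude).

```python
import pandas as pd

# The five sample rows from the table above, entered by hand.
# Only the columns shown in the table are included here.
df = pd.DataFrame(
    {
        "Date": ["12/24/2016", "12/25/2016", "12/25/2016", "12/27/2016", "12/28/2016"],
        "Latitude": [-5.1460, -43.4029, -43.4810, 45.7192, 38.3754],
        "Longitude": [153.5166, -73.9395, -74.4771, 26.5230, -118.8977],
        "Depth": [30.00, 38.00, 14.93, 97.00, 10.80],
        "Type": ["Earthquake"] * 5,
    }
)

print(df)
```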
Option 1: Apply function to two columns in Pandas DataFrame
Suppose you would like to create a new column with the country based on the pair Latitude and Longitude.
For this purpose, we define a new function geo_rev(x), which is applied to each row and returns the country for that row's coordinates:
import geocoder

def geo_rev(x):
    g = geocoder.osm([x['Latitude'], x['Longitude']], method='reverse').json
    if g:
        return g.get('country')
    else:
        return 'no country'

df.apply(geo_rev, axis=1)
The apply method is called with the argument axis=1. The axis argument can take these values:
- 0 or 'index': apply function to each column.
- 1 or 'columns': apply function to each row.
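To see what the function actually receives in each direction, here is a small sketch on a two-row DataFrame (values taken from the sample table). Each inner Series carries a name: the column label for axis=0 and the row's index label for axis=1.

```python
import pandas as pd

df = pd.DataFrame({"Latitude": [-5.1460, 45.7192], "Longitude": [153.5166, 26.5230]})

# axis=0 ('index'): called once per column; each call gets a whole column,
# whose .name is the column label.
col_names = df.apply(lambda s: s.name, axis=0)

# axis=1 ('columns'): called once per row; each call gets one row,
# whose .name is the row's index label.
row_labels = df.apply(lambda row: row.name, axis=1)

# Inside a row-wise function, values are looked up by column name.
lats = df.apply(lambda row: row["Latitude"], axis=1)

print(col_names.tolist())   # ['Latitude', 'Longitude']
print(row_labels.tolist())  # [0, 1]
```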
With axis=1, the function receives the current row as a Series, so each value can be accessed by its column name, for example x['Latitude'].
To store the result in a new column, assign it to the DataFrame:
df['country'] = df.apply(geo_rev, axis=1)
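Reverse geocoding needs network access, so the sketch below swaps geo_rev for a hypothetical offline function, hemisphere(x), that only looks at the signs of the coordinates. The pattern is the same: a row-wise function applied with axis=1, with the result assigned as a new column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Latitude": [-5.1460, -43.4029, 45.7192], "Longitude": [153.5166, -73.9395, 26.5230]}
)

def hemisphere(x):
    # Hypothetical offline stand-in for geo_rev: instead of a network lookup,
    # classify the point by the signs of its coordinates.
    ns = "N" if x["Latitude"] >= 0 else "S"
    ew = "E" if x["Longitude"] >= 0 else "W"
    return ns + ew

# apply returns a Series aligned on df's index, so it can be assigned
# directly as a new column.
df["hemisphere"] = df.apply(hemisphere, axis=1)
print(df)
```

The new column holds SE, SW and NE for the three rows.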
Option 2: Apply function to multiple columns with parameters
If you need to apply a function to a DataFrame and pass parameters to the function at the same time, you can use the following syntax:
def get_date_time(row, date, time):
    return row[date] + ' ' + row[time]

df.apply(get_date_time, axis=1, date='Date', time='Time')
Any extra keyword arguments given to apply are passed on to the function, so there's no limit on the number of parameters.
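As a runnable sketch, the example below adds a third parameter, sep, which is a hypothetical addition for illustration. The Date values come from the sample table, but the Time values are made up. It also shows that positional parameters can be passed with apply's args= tuple instead of keywords.

```python
import pandas as pd

# Date values from the sample table; the Time values are made up.
df = pd.DataFrame({"Date": ["12/24/2016", "12/25/2016"], "Time": ["10:00:00", "14:30:00"]})

def get_date_time(row, date, time, sep=" "):
    # 'date' and 'time' are column names; 'sep' is a hypothetical extra parameter.
    return row[date] + sep + row[time]

# Extra keyword arguments to DataFrame.apply are forwarded to the function...
a = df.apply(get_date_time, axis=1, date="Date", time="Time", sep="T")

# ...and positional arguments can be passed as a tuple via args=.
b = df.apply(get_date_time, axis=1, args=("Date", "Time", "T"))

print(a.tolist())  # ['12/24/2016T10:00:00', '12/25/2016T14:30:00']
```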
Option 3: Apply function with lambda and multiple columns
In this example, we use the apply method together with a lambda to pass several columns to a function.
Again, we convert Latitude and Longitude to a country by applying a function:
import pandas as pd
import geocoder

def geo_rev(lat, lon):
    g = geocoder.osm([lat, lon], method='reverse').json
    if g:
        return g.get('country')
    else:
        return 'no country'

df.apply(lambda x: geo_rev(x['Latitude'], x['Longitude']), axis=1)
The result is:
23402 Papua Niugini
23403 Chile
23404 Chile
23405 România
23406 United States
23407 United States
23408 United States
23409 日本
23410 Indonesia
23411 no country
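The same lambda pattern can be tried offline with a hypothetical stand-in, hemisphere(lat, lon), in place of the network-based geo_rev. Because the function takes plain values rather than a row, it is not tied to particular column names. The lambda does the column lookup, and the function can be reused outside of apply.

```python
import pandas as pd

df = pd.DataFrame(
    {"Latitude": [-5.1460, 45.7192, 38.3754], "Longitude": [153.5166, 26.5230, -118.8977]}
)

def hemisphere(lat, lon):
    # Hypothetical offline stand-in for geo_rev(lat, lon).
    return ("N" if lat >= 0 else "S") + ("E" if lon >= 0 else "W")

# The lambda receives a row and picks out the two columns the function needs.
result = df.apply(lambda x: hemisphere(x["Latitude"], x["Longitude"]), axis=1)
print(result.tolist())  # ['SE', 'NE', 'NW']
```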
Option 4: Select and apply function to multiple columns
You can select several columns from a Pandas DataFrame and apply a function to them with:
import geocoder

def geo_rev(lat, lon, mag):
    g = geocoder.osm([lat, lon], method='reverse').json
    if g:
        return g.get('country') + ' ' + str(mag)
    else:
        return 'no country'

df[['Latitude', 'Longitude', 'Magnitude']].apply(lambda x: geo_rev(*x), axis=1)
The result of this operation is:
23402 Papua Niugini 5.8
23403 Chile 7.6
23404 Chile 5.6
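Here `*x` unpacks each row's values positionally, so the order of the selected columns must match the function's parameters. Selecting ['Longitude', 'Latitude', ...] would silently swap lat and lon. Below is an offline sketch with a hypothetical stand-in, label(lat, lon, mag). It uses the first three table rows, with the magnitudes taken from the output above. Depth is included only to show that unselected columns are ignored.

```python
import pandas as pd

# Coordinates from the sample table; magnitudes from the output above.
df = pd.DataFrame(
    {
        "Latitude": [-5.1460, -43.4029, -43.4810],
        "Longitude": [153.5166, -73.9395, -74.4771],
        "Depth": [30.00, 38.00, 14.93],
        "Magnitude": [5.8, 7.6, 5.6],
    }
)

def label(lat, lon, mag):
    # Hypothetical offline stand-in for geo_rev(lat, lon, mag).
    hemi = ("N" if lat >= 0 else "S") + ("E" if lon >= 0 else "W")
    return hemi + " " + str(mag)

# Select the columns in parameter order; *x unpacks each row into lat, lon, mag.
result = df[["Latitude", "Longitude", "Magnitude"]].apply(lambda x: label(*x), axis=1)
print(result.tolist())  # ['SE 5.8', 'SW 7.6', 'SW 5.6']
```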
Option 5: Apply function to multiple columns without using apply
Finally, let's look at an alternative that applies a function to several columns without the apply method.
This can be done by combining list and map. This technique is much faster than Pandas apply, because it calls the function directly on the column values instead of building a Series for every row:
import geocoder

def geo_rev(lat, lon):
    g = geocoder.osm([lat, lon], method='reverse').json
    if g:
        return g.get('country')
    else:
        return 'no country'

list(map(geo_rev, df['Latitude'], df['Longitude']))
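map() accepts several iterables and walks them in parallel. It calls the function with one value from each column per step, so the function takes plain arguments rather than a row. An offline sketch with a hypothetical hemisphere(lat, lon) stand-in also shows that the resulting list can be stored as a new column:

```python
import pandas as pd

df = pd.DataFrame(
    {"Latitude": [-5.1460, -43.4029, 45.7192], "Longitude": [153.5166, -73.9395, 26.5230]}
)

def hemisphere(lat, lon):
    # Hypothetical offline stand-in for geo_rev(lat, lon).
    return ("N" if lat >= 0 else "S") + ("E" if lon >= 0 else "W")

# map() pairs up the two columns element by element: hemisphere(lat, lon).
labels = list(map(hemisphere, df["Latitude"], df["Longitude"]))

# The result is a plain list with one entry per row, so it can be assigned as a column.
df["hemisphere"] = labels
print(labels)  # ['SE', 'SW', 'NE']
```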
The advantage of this approach is speed, as the comparison below on a small dataset shows:
- %timeit list(map(… : 12 µs per loop
- %timeit df.apply(… : 760 µs per loop
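To reproduce a comparison like this outside Jupyter, the standard timeit module can be used. This sketch makes a few assumptions. It times a cheap hypothetical stand-in, hemisphere(lat, lon), instead of the geocoder call, because network latency would swamp the overhead being measured. It also runs on 1,000 rows of random coordinates rather than the dataset. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic coordinates (not from the dataset) just to have something to time.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"Latitude": rng.uniform(-90, 90, 1_000), "Longitude": rng.uniform(-180, 180, 1_000)}
)

def hemisphere(lat, lon):
    # Hypothetical cheap stand-in; a real geocoder call would be dominated
    # by network time and hide the difference between the two approaches.
    return ("N" if lat >= 0 else "S") + ("E" if lon >= 0 else "W")

def with_map():
    return list(map(hemisphere, df["Latitude"], df["Longitude"]))

def with_apply():
    return df.apply(lambda x: hemisphere(x["Latitude"], x["Longitude"]), axis=1).tolist()

# Check both approaches agree before comparing their speed.
assert with_map() == with_apply()

# Best of 3 repeats, each running the function 10 times.
t_map = min(timeit.repeat(with_map, number=10, repeat=3))
t_apply = min(timeit.repeat(with_apply, number=10, repeat=3))
print(f"list(map(...)): {t_map / 10 * 1e6:.0f} µs per call")
print(f"df.apply(...):  {t_apply / 10 * 1e6:.0f} µs per call")
```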