How to Normalize Column or DataFrame in Pandas

In this tutorial, we'll learn how to normalize columns or the whole DataFrame in Pandas. We will show different ways like:

(1) Min Max normalization

for whole DataFrame

(df-df.min())/(df.max()-df.min())

for column:

(df['col'] - df['col'].mean())/df['col'].std()

(2) Mean normalization

(df-df.mean())/df.std()

(3) biased normalization

scaler.fit_transform(df.iloc[:,:].to_numpy())

Let's cover all examples in more detail.

Setup

For this post we are creating example DataFrame with 3 numeric columns:

import pandas as pd

data = {'day': [1, 2, 3, 4, 5, 6, 7, 8],
        'temp': [9, 8, 6, 13, 10, 15, 9, 10],
        'humidity': [0.89, 0.86, 0.54, 0.73, 0.45, 0.63, 0.95, 0.67]}

df = pd.DataFrame(data=data)

Data looks like:

	day	temp	humidity
0	1	9	0.89
1	2	8	0.86
2	3	6	0.54
3	4	13	0.73
4	5	10	0.45

1: Min Max normalization in Pandas

So let's start by min max normalization (called also min max scaling) in Pandas and Python.

Single column

To do min max scaling for a single column we can do:

(df['humidity']-df['humidity'].min())/(df['humidity'].max()-df['humidity'].min())

The result is normalized Series:

0    0.88
1    0.82
2    0.18
3    0.56
4    0.00
5    0.36
6    1.00
7    0.44
Name: humidity, dtype: float64

Checking data next to the original column:

	humidity_norm	humidity
0	0.88	0.89
1	0.82	0.86
2	0.18	0.54
3	0.56	0.73
4	0.00	0.45

All columns

To normalize all columns of a DataFrame we can use:

(df-df.min())/(df.max()-df.min())

Which will result into:

	day	temp	humidity
0	0.000000	0.333333	0.88
1	0.142857	0.222222	0.82
2	0.285714	0.000000	0.18
3	0.428571	0.777778	0.56
4	0.571429	0.444444	0.00

2: Mean normalization in Pandas

Next we can see how to do mean normalization in Pandas and Python.

Single column

For a single column we can apply mean normalization by:

(df['humidity'] - df['humidity'].mean())/df['humidity'].std()

The result and the original values:

	humidity_norm	humidity
0	0.993475	0.89
1	0.823165	0.86
2	-0.993475	0.54
3	0.085155	0.73
4	-1.504406	0.45

All columns

To normalize the whole DataFrame with mean normalization we can do:

(df-df.mean())/df.std()

result:

	day	temp	humidity
0	-1.428869	-0.353553	0.993475
1	-1.020621	-0.707107	0.823165
2	-0.612372	-1.414214	-0.993475
3	-0.204124	1.060660	0.085155
4	0.204124	0.000000	-1.504406

3: Biased normalization in Pandas

To perform biased normalization in Pandas we can use the library sklearn. The results will differ from the Pandas normalization.

import pandas as pd
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaler.fit_transform(df.to_numpy())

The results are:

	0	1	2
0	-1.527525	-0.377964	1.062070
1	-1.091089	-0.755929	0.880001
2	-0.654654	-1.511858	-1.062070
3	-0.218218	1.133893	0.091035
4	0.218218	0.000000	-1.608277

4: Normalize rows in Pandas

There are multiple ways to normalize rows:

per sum
mean
min max

Normalize rows by their sum

To normalize row based on the sum of the row in Pandas we can do:

df.div(df.sum(axis=1), axis=0)

which will give use:

	day	temp	humidity
0	0.091827	0.826446	0.081726
1	0.184162	0.736648	0.079190
2	0.314465	0.628931	0.056604
3	0.225606	0.733221	0.041173
4	0.323625	0.647249	0.029126

Transpose

To normalize row wise in Pandas we can combine:

.T to transpose rows to columns
df.values to get the values as numpy array

Let's see an example:

import pandas as pd
from sklearn import preprocessing

data = df.T.values

scaler = preprocessing.MinMaxScaler()
pd.DataFrame(scaler.fit_transform(data)).T

So after using df.values we get:

array([[0.0135635 , 1.        , 0.        ],
       [0.15966387, 1.        , 0.        ],
       [0.45054945, 1.        , 0.        ],
       [0.26650367, 1.        , 0.        ],
       [0.47643979, 1.        , 0.        ],
       [0.3736952 , 1.        , 0.        ],
       [0.7515528 , 1.        , 0.        ],
       [0.78563773, 1.        , 0.        ]])

which are transformed to:

array([[0.        , 0.33333333, 0.88      ],
       [0.14285714, 0.22222222, 0.82      ],
       [0.28571429, 0.        , 0.18      ],
       [0.42857143, 0.77777778, 0.56      ],
       [0.57142857, 0.44444444, 0.        ],
       [0.71428571, 1.        , 0.36      ],
       [0.85714286, 0.33333333, 1.        ],
       [1.        , 0.44444444, 0.44      ]])

Conclusion

In this article we learned how to normalize columns and DataFrame in Pandas. Different ways of normalization were covered like - biased, unbiased, normalization per sum.

We also saw how to normalize rows of a DataFrame. Normalizing data is very useful in machine learning and visualizing data.

> Basic concepts

> Installations

> Series

> DataFrame

> Create

> Data Types

> Exercise

> Cheat Sheet

> Basic concepts

> Row

> Column

> Index

> MultiIndex

> Exercise

> Basic concepts

> read_csv()

> read_excel()

> Kaggle

> Exercise

> read_xml()

> read_json()

> to_csv()

> to_dict()

> to_json()

> Basic concepts

> groupby()

> Reshape

> melt()

> Exercise

> Pivot

> merge()

> Filter

> Basic concepts

> replace()

> split()

> Regex

> Search

> Exercise

> Find

> Basic concepts

> apply()

> aggfunc

> Convert

> count()

> Other

> Exercise

> map()

> Basic concepts

> Data Validation

> Data Cleaning

> Duplicate

> Time Series

> Pandas Error

> Get

> Basic concepts

> Styling

> Table

> Display

> DataIsBeautiful

> Beginners

> Data Science Projects

> Newsletter

Setup

1: Min Max normalization in Pandas

Single column

All columns

2: Mean normalization in Pandas

Single column

All columns

3: Biased normalization in Pandas

4: Normalize rows in Pandas

Normalize rows by their sum

Transpose

Conclusion