In this tutorial, we'll learn how to normalize columns or the whole DataFrame in Pandas. We will show different ways like:

(1) Min Max normalization

for whole DataFrame

``````(df-df.min())/(df.max()-df.min())
``````

for column:

``````(df['col'] - df['col'].mean())/df['col'].std()
``````

(2) Mean normalization

``````(df-df.mean())/df.std()
``````

(3) biased normalization

``````scaler.fit_transform(df.iloc[:,:].to_numpy())
``````

Let's cover all examples in more detail.

## Setup

For this post we are creating example DataFrame with 3 numeric columns:

``````import pandas as pd

data = {'day': [1, 2, 3, 4, 5, 6, 7, 8],
'temp': [9, 8, 6, 13, 10, 15, 9, 10],
'humidity': [0.89, 0.86, 0.54, 0.73, 0.45, 0.63, 0.95, 0.67]}

df = pd.DataFrame(data=data)
``````

Data looks like:

day temp humidity
0 1 9 0.89
1 2 8 0.86
2 3 6 0.54
3 4 13 0.73
4 5 10 0.45

## 1: Min Max normalization in Pandas

So let's start by min max normalization (called also min max scaling) in Pandas and Python.

### Single column

To do min max scaling for a single column we can do:

``````(df['humidity']-df['humidity'].min())/(df['humidity'].max()-df['humidity'].min())
``````

The result is normalized Series:

``````0    0.88
1    0.82
2    0.18
3    0.56
4    0.00
5    0.36
6    1.00
7    0.44
Name: humidity, dtype: float64
``````

Checking data next to the original column:

humidity_norm humidity
0 0.88 0.89
1 0.82 0.86
2 0.18 0.54
3 0.56 0.73
4 0.00 0.45

### All columns

To normalize all columns of a DataFrame we can use:

``````(df-df.min())/(df.max()-df.min())
``````

Which will result into:

day temp humidity
0 0.000000 0.333333 0.88
1 0.142857 0.222222 0.82
2 0.285714 0.000000 0.18
3 0.428571 0.777778 0.56
4 0.571429 0.444444 0.00

## 2: Mean normalization in Pandas

Next we can see how to do mean normalization in Pandas and Python.

### Single column

For a single column we can apply mean normalization by:

``````(df['humidity'] - df['humidity'].mean())/df['humidity'].std()
``````

The result and the original values:

humidity_norm humidity
0 0.993475 0.89
1 0.823165 0.86
2 -0.993475 0.54
3 0.085155 0.73
4 -1.504406 0.45

### All columns

To normalize the whole DataFrame with mean normalization we can do:

``````(df-df.mean())/df.std()
``````

result:

day temp humidity
0 -1.428869 -0.353553 0.993475
1 -1.020621 -0.707107 0.823165
2 -0.612372 -1.414214 -0.993475
3 -0.204124 1.060660 0.085155
4 0.204124 0.000000 -1.504406

## 3: Biased normalization in Pandas

To perform biased normalization in Pandas we can use the library `sklearn`. The results will differ from the Pandas normalization.

``````import pandas as pd
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaler.fit_transform(df.to_numpy())
``````

The results are:

0 1 2
0 -1.527525 -0.377964 1.062070
1 -1.091089 -0.755929 0.880001
2 -0.654654 -1.511858 -1.062070
3 -0.218218 1.133893 0.091035
4 0.218218 0.000000 -1.608277

## 4: Normalize rows in Pandas

There are multiple ways to normalize rows:

• per sum
• mean
• min max

### Normalize rows by their sum

To normalize row based on the sum of the row in Pandas we can do:

``````df.div(df.sum(axis=1), axis=0)
``````

which will give use:

day temp humidity
0 0.091827 0.826446 0.081726
1 0.184162 0.736648 0.079190
2 0.314465 0.628931 0.056604
3 0.225606 0.733221 0.041173
4 0.323625 0.647249 0.029126

### Transpose

To normalize row wise in Pandas we can combine:

• `.T` to transpose rows to columns
• `df.values` to get the values as numpy array

Let's see an example:

``````import pandas as pd
from sklearn import preprocessing

data = df.T.values

scaler = preprocessing.MinMaxScaler()
pd.DataFrame(scaler.fit_transform(data)).T

``````

So after using `df.values` we get:

``````array([[0.0135635 , 1.        , 0.        ],
[0.15966387, 1.        , 0.        ],
[0.45054945, 1.        , 0.        ],
[0.26650367, 1.        , 0.        ],
[0.47643979, 1.        , 0.        ],
[0.3736952 , 1.        , 0.        ],
[0.7515528 , 1.        , 0.        ],
[0.78563773, 1.        , 0.        ]])
``````

which are transformed to:

``````array([[0.        , 0.33333333, 0.88      ],
[0.14285714, 0.22222222, 0.82      ],
[0.28571429, 0.        , 0.18      ],
[0.42857143, 0.77777778, 0.56      ],
[0.57142857, 0.44444444, 0.        ],
[0.71428571, 1.        , 0.36      ],
[0.85714286, 0.33333333, 1.        ],
[1.        , 0.44444444, 0.44      ]])
``````

## Conclusion

In this article we learned how to normalize columns and DataFrame in Pandas. Different ways of normalization were covered like - biased, unbiased, normalization per sum.

We also saw how to normalize rows of a DataFrame. Normalizing data is very useful in machine learning and visualizing data.