In this post, we'll explore how to map DataFrame Index values using a dictionary in Pandas.
Setup
Consider a DataFrame with the following data:
import pandas as pd
data = {'Value': [10, 15, 20, 25]}
df = pd.DataFrame(data, index=[1,2,3,4])
result:
|   | Value |
|---|---|
| 1 | 10 |
| 2 | 15 |
| 3 | 20 |
| 4 | 25 |
This will create a DataFrame with an index labeled 1, 2, 3 and 4.
1: Map index with df.index.map
To map a DataFrame index with a Python dictionary, we can use the df.index.map method:
index_mapping = {1: 'Red', 2: 'Blue', 3: 'Green', 4: 'White' }
df.index = df.index.map(index_mapping)
The new index is based on the mapping of the provided values in the dictionary:
|   | Value |
|---|---|
| Red | 10 |
| Blue | 15 |
| Green | 20 |
| White | 25 |
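Note that map returns a new Index rather than changing the DataFrame in place, which is why the result is assigned back to df.index. A minimal sketch using the same data (the names new_index, labels_before and green_value are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 15, 20, 25]}, index=[1, 2, 3, 4])
index_mapping = {1: 'Red', 2: 'Blue', 3: 'Green', 4: 'White'}

# map() builds a new Index; df keeps its old labels until we assign it
new_index = df.index.map(index_mapping)
labels_before = list(df.index)

df.index = new_index

# rows can now be selected by the new labels
green_value = df.loc['Green', 'Value']
print(labels_before, list(df.index), green_value)
```

Keeping the mapped index in a separate variable first makes it easy to inspect the result before overwriting the original labels.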
2: Map with a function
To map a Pandas index with a function, we have two options:
- lambda
- predefined functions
lambda
Recall that a lambda function is a small anonymous function.
df.index.map(lambda x: x + 1)
The result is a new index with changed values:
Index([2, 3, 4, 5], dtype='int64')
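Another lambda example, this time for string labels (such as the color index from section 1; on the integer index above it raises an AttributeError):
df.index.map(lambda x: x.upper())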
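Since upper() only exists on strings, a call like this needs an index of string labels. A small sketch, assuming the color labels from section 1 (color_df and upper_index are hypothetical names):

```python
import pandas as pd

color_df = pd.DataFrame({'Value': [10, 15, 20, 25]},
                        index=['Red', 'Blue', 'Green', 'White'])

# every label is a str here, so calling x.upper() is safe
upper_index = color_df.index.map(lambda x: x.upper())
print(upper_index)
```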
predefined functions
The example below maps all values and formats them:
df.index.map('Index {}'.format)
The result is a new index with changed values:
Index(['Index 1', 'Index 2', 'Index 3', 'Index 4'], dtype='object')
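Any callable that takes a single label can be used the same way, including a function defined with def. The sketch below uses a hypothetical helper, make_label, that zero-pads the integer labels:

```python
import pandas as pd

def make_label(x):
    # hypothetical helper: zero-pad the label to three digits
    return f'row_{int(x):03d}'

df = pd.DataFrame({'Value': [10, 15, 20, 25]}, index=[1, 2, 3, 4])

# the function is called once per index label
padded_index = df.index.map(make_label)
print(padded_index)
```

A named function is usually easier to test and reuse than a long lambda.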
3: Missing values in the index
The na_action parameter controls how missing values are handled. If the index contains missing values, they can be excluded from the mapping function with na_action='ignore':
import pandas as pd
data = {'Value': [10, 15, 20, 25]}
df = pd.DataFrame(data, index=[1,2,3, None])
df.index.map('Index {}'.format, na_action='ignore')
This results in:
Index(['Index 1.0', 'Index 2.0', 'Index 3.0', nan], dtype='object')
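To see what na_action='ignore' changes, it can help to compare both calls on the same index. In this sketch the index is built directly with pd.Index and float('nan'), which is an assumption standing in for the None label above:

```python
import pandas as pd

idx = pd.Index([1.0, 2.0, 3.0, float('nan')])

# default: the function is also called on the missing label,
# so the last entry becomes a formatted string
formatted_all = idx.map('Index {}'.format)

# na_action='ignore': the missing label is passed through untouched
formatted_skip = idx.map('Index {}'.format, na_action='ignore')

print(list(formatted_all))
print(list(formatted_skip))
```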
4: Map with values without mapping
If an index value is not found in the dictionary, it is mapped to NaN:
index_mapping = {1: 'Red', 2: 'Blue'}
df.index = df.index.map(index_mapping)
result:
|   | Value |
|---|---|
| Red | 10 |
| Blue | 15 |
| NaN | 20 |
| NaN | 25 |
To keep the unmapped labels instead, we can use the replace method on the original index (before the mapping above is applied):
index_mapping = {1: 'Red', 2: 'Blue'}
df.index = pd.Series(df.index).replace(index_mapping)
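Two other ways to keep unmapped labels are a lambda that uses dict.get with a fallback, and DataFrame.rename, which leaves labels it cannot find in the dictionary unchanged. A sketch on the original integer index (kept_index and renamed are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 15, 20, 25]}, index=[1, 2, 3, 4])
index_mapping = {1: 'Red', 2: 'Blue'}

# option 1: fall back to the original label when the key is missing
kept_index = df.index.map(lambda x: index_mapping.get(x, x))

# option 2: rename returns a new DataFrame and skips unknown labels
renamed = df.rename(index=index_mapping)

print(list(kept_index))
print(list(renamed.index))
```

Both options leave df itself untouched, so the result has to be assigned back if the new labels should replace the old ones.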
Conclusion
Mapping a DataFrame index using a dictionary might help to control index values. Some use cases are:
- data cleaning
- memory efficiency
- anonymization