To wrap or break long column names in Pandas we can use the textwrap module
and map the column names to new names that contain line breaks:
(1) Wrap DataFrame column names
import textwrap
cols_wrap = {x: '<br>'.join(textwrap.wrap(x, width=20)) for x in df.columns}
cols_wrap
(2) Truncate column names
import textwrap
cols_wrap = {x: textwrap.wrap(x, width=15)[-1] for x in df.columns}
This helps prevent formatting issues and horizontal overflow in the displayed data. Let's look at both approaches in more detail, with examples:
Data
Let's use this data:
import pandas as pd
df = pd.DataFrame({
"Test Data Type N0 extract 1": [0, 1, 2, 3],
"Test Data Type N101 extract 1": [3, 5, 7, 9],
"Prod Data Type N0 extract 0": [1, 2, 3, 4],
"Prod Data Type N101 extract 0": [0.5, 1.0, 1.5, 2.0],
})
This DataFrame has long column names. With 20+ columns, such a table can be visually hard to digest:
| | Test Data Type N0 extract 1 | Test Data Type N101 extract 1 | Prod Data Type N0 extract 0 | Prod Data Type N101 extract 0 |
|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 0.5 |
| 1 | 1 | 5 | 2 | 1.0 |
| 2 | 2 | 7 | 3 | 1.5 |
| 3 | 3 | 9 | 4 | 2.0 |
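Before wrapping anything, it can help to see which names are actually long. The sketch below measures every column name of the DataFrame above; the threshold of 20 characters is an arbitrary choice that happens to match the wrap width used in section 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Test Data Type N0 extract 1": [0, 1, 2, 3],
    "Test Data Type N101 extract 1": [3, 5, 7, 9],
    "Prod Data Type N0 extract 0": [1, 2, 3, 4],
    "Prod Data Type N101 extract 0": [0.5, 1.0, 1.5, 2.0],
})

# Length of every column name, longest first
name_lengths = pd.Series(
    [len(str(col)) for col in df.columns], index=df.columns
).sort_values(ascending=False)

# Names longer than an arbitrary threshold of 20 characters
long_names = name_lengths[name_lengths > 20].index.tolist()

print(name_lengths)
print(long_names)
```

Here all four names are 27 or 29 characters long, so every column would be a wrapping candidate.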
1. Wrap column names
We can wrap every column name, whether it is short or too long, by inserting `<br>`
tags to break them:
import textwrap
cols_wrap = {x: '<br>'.join(textwrap.wrap(x, width=20)) for x in df.columns}
cols_wrap
This will create a dictionary:
{'Test Data Type N0 extract 1': 'Test Data Type N0<br>extract 1',
'Test Data Type N101 extract 1': 'Test Data Type N101<br>extract 1',
'Prod Data Type N0 extract 0': 'Prod Data Type N0<br>extract 0',
'Prod Data Type N101 extract 0': 'Prod Data Type N101<br>extract 0'}
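If you wrap column names often, the dictionary step can be packaged into a small function. This is a minimal sketch; the name `wrap_columns` and its `sep` parameter are hypothetical, not part of pandas or textwrap.

```python
import textwrap

import pandas as pd


def wrap_columns(df: pd.DataFrame, width: int = 20, sep: str = "<br>") -> pd.DataFrame:
    """Hypothetical helper: return a renamed copy of `df` with wrapped column names.

    Chunks from textwrap.wrap() are joined with `sep`; names that already fit
    within `width` come back unchanged. The input DataFrame is not modified.
    """
    mapping = {
        # str() lets non-string labels (e.g. integers) be wrapped too;
        # such labels come back as strings
        col: sep.join(textwrap.wrap(str(col), width=width))
        for col in df.columns
    }
    return df.rename(columns=mapping)


df = pd.DataFrame({
    "Test Data Type N0 extract 1": [0, 1, 2, 3],
    "Test Data Type N101 extract 1": [3, 5, 7, 9],
    "Prod Data Type N0 extract 0": [1, 2, 3, 4],
    "Prod Data Type N101 extract 0": [0.5, 1.0, 1.5, 2.0],
})

wrapped = wrap_columns(df, width=20)
print(list(wrapped.columns))
# ['Test Data Type N0<br>extract 1', 'Test Data Type N101<br>extract 1',
#  'Prod Data Type N0<br>extract 0', 'Prod Data Type N101<br>extract 0']
```

In JupyterLab the result would then be displayed with `wrap_columns(df).style.format()`.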
Now we can display the DataFrame with wrapped column names:
df.rename(columns=cols_wrap).style.format()
| | Test Data Type N0<br>extract 1 | Test Data Type N101<br>extract 1 | Prod Data Type N0<br>extract 0 | Prod Data Type N101<br>extract 0 |
|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 0.500000 |
| 1 | 1 | 5 | 2 | 1.000000 |
| 2 | 2 | 7 | 3 | 1.500000 |
| 3 | 3 | 9 | 4 | 2.000000 |
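- we can control the length of each wrapped line with `width=20`
- column names shorter than `width` remain the same
- the `<br>` tag works in JupyterLab in combination with `.style.format()`, because the Styler renders the table, including its headers, as HTML
- the original data is unchanged, since `rename` returns a new DataFrame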
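The notes above can be checked with a short script. It uses one of the long names from the example plus a hypothetical short column named `short`.

```python
import textwrap

import pandas as pd

name = "Test Data Type N101 extract 1"

# width controls where the name is split: a smaller width gives more chunks
chunks_20 = textwrap.wrap(name, width=20)  # ['Test Data Type N101', 'extract 1']
chunks_10 = textwrap.wrap(name, width=10)  # ['Test Data', 'Type N101', 'extract 1']

# "short" is a hypothetical column whose name already fits within the width
df = pd.DataFrame({name: [3, 5, 7, 9], "short": [1, 2, 3, 4]})
cols_wrap = {x: "<br>".join(textwrap.wrap(x, width=10)) for x in df.columns}
renamed = df.rename(columns=cols_wrap)

print(list(renamed.columns))  # ['Test Data<br>Type N101<br>extract 1', 'short']
print(list(df.columns))       # the original labels are untouched
```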
2. Truncate column names
We can also shorten long column names with a similar approach:
import textwrap
cols_wrap = {x: textwrap.wrap(x, width=15)[-1] for x in df.columns}  # [-1] keeps the last chunk, also for names shorter than 15
cols_wrap
This time we get shorter names that consist only of the last part of the wrap:
{'Test Data Type N0 extract 1': 'N0 extract 1',
'Test Data Type N101 extract 1': 'N101 extract 1',
'Prod Data Type N0 extract 0': 'N0 extract 0',
'Prod Data Type N101 extract 0': 'N101 extract 0'}
The result of `df.rename(columns=cols_wrap)`:
| | N0 extract 1 | N101 extract 1 | N0 extract 0 | N101 extract 0 |
|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 0.5 |
| 1 | 1 | 5 | 2 | 1.0 |
| 2 | 2 | 7 | 3 | 1.5 |
| 3 | 3 | 9 | 4 | 2.0 |
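Keeping only the last chunk discards the beginning of each name, so two columns that differ only in their prefix (for example "Test ..." vs "Prod ...") can end up with the same label. The sketch below adds a collision check; `truncate_columns` is a hypothetical helper, and it assumes string column names.

```python
import textwrap
from collections import Counter

import pandas as pd


def truncate_columns(columns, width=15):
    """Hypothetical helper: map each name to the last chunk of textwrap.wrap(name, width).

    Raises ValueError instead of returning labels that collide.
    Assumes string column names.
    """
    mapping = {}
    for col in columns:
        chunks = textwrap.wrap(col, width=width)
        # [-1] keeps the last chunk; a name that fits within `width` maps to itself
        mapping[col] = chunks[-1] if chunks else col
    counts = Counter(mapping.values())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"truncated names collide: {duplicates}")
    return mapping


df = pd.DataFrame({
    "Test Data Type N0 extract 1": [0, 1, 2, 3],
    "Test Data Type N101 extract 1": [3, 5, 7, 9],
    "Prod Data Type N0 extract 0": [1, 2, 3, 4],
    "Prod Data Type N101 extract 0": [0.5, 1.0, 1.5, 2.0],
})
print(df.rename(columns=truncate_columns(df.columns)).columns.tolist())

# Hypothetical pair of names that differ only in their first chunk
try:
    truncate_columns(["Test Data Type N0 extract 1", "Prod Data Type N0 extract 1"])
except ValueError as err:
    print(err)  # truncated names collide: ['N0 extract 1']
```

Raising an error is one design choice; another would be to fall back to the full name for colliding columns.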
3. Transpose for better vertical readability
If the dataset is small, transposing can help:
df.T
or by printing:
print(df.T.to_string())
This prints the column names as row labels, which makes even long names easier to read:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Test Data Type N0 extract 1 | 0.0 | 1.0 | 2.0 | 3.0 |
| Test Data Type N101 extract 1 | 3.0 | 5.0 | 7.0 | 9.0 |
| Prod Data Type N0 extract 0 | 1.0 | 2.0 | 3.0 | 4.0 |
| Prod Data Type N101 extract 0 | 0.5 | 1.0 | 1.5 | 2.0 |
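Transposing a large DataFrame would produce one column per row, which quickly becomes unreadable. One option (an assumption, not part of the original approach) is to transpose only a preview taken with `head()`. The 25-column, 1,000-row frame below is hypothetical.

```python
import pandas as pd

# Hypothetical wider frame: 25 long-named columns, 1,000 rows each
df = pd.DataFrame(
    {f"Test Data Type N{i} extract 1": range(1000) for i in range(25)}
)

# Transpose only the first 3 rows: one row per column name,
# one column per sampled record
preview = df.head(3).T
print(preview.shape)  # (25, 3)
print(preview.to_string())
```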