How to Reset Column Names (Index) in Pandas
To reset column names (column index) in Pandas to numbers from 0 to N-1, where N is the number of columns, we can use several different approaches:
(1) Range from df.columns.size
df.columns = range(df.columns.size)
(2) Transpose to rows and reset_index
- the slowest option
df.T.reset_index(drop=True).T
(3) Range from column number - df.shape[1]
df.columns = range(df.shape[1])
Which one to use depends on the data and the context in which it is used. If you would like to change the order of the columns you can check: How to Change the Order of Columns in Pandas DataFrame
In Pandas there are two axes: rows (the index) and columns. Information about both is returned by the axes attribute.
Below you can find a simple example and a performance comparison:
import pandas as pd
# sample DataFrame with named columns and a custom row index
df = pd.DataFrame({
    'name':['Softhints', 'DataScientyst', 'DataPlotPlus'],
    'url':['https://www.softhints.com', 'https://datascientyst.com', 'https://dataplotplus.com'],
    'id':['a', 'b', 'c']
}, index=[2, 3, 4])
result:
name | url | id |
---|---|---|
Softhints | https://www.softhints.com | a |
DataScientyst | https://datascientyst.com | b |
DataPlotPlus | https://dataplotplus.com | c |
The columns are:
Index(['name', 'url', 'id'], dtype='object')
After reset by any of the above we get:
RangeIndex(start=0, stop=3, step=1)
0 | 1 | 2 |
---|---|---|
Softhints | https://www.softhints.com | a |
DataScientyst | https://datascientyst.com | b |
DataPlotPlus | https://dataplotplus.com | c |
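As a quick check, the three approaches listed at the top can be run side by side on the same sample data. This is only a minimal sketch: the names df1, df2 and df3 are introduced here for illustration, and copies are used so the original df stays untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Softhints', 'DataScientyst', 'DataPlotPlus'],
    'url': ['https://www.softhints.com', 'https://datascientyst.com', 'https://dataplotplus.com'],
    'id': ['a', 'b', 'c']
}, index=[2, 3, 4])

# (1) assign a range built from the number of columns
df1 = df.copy()
df1.columns = range(df1.columns.size)

# (2) transpose, drop the (former column) labels, transpose back
df2 = df.T.reset_index(drop=True).T

# (3) assign a range built from the second element of shape
df3 = df.copy()
df3.columns = range(df3.shape[1])

# all three should end up with the same integer column labels
print(list(df1.columns), list(df2.columns), list(df3.columns))
```

Note that approach (2) goes through a transpose, which can convert columns of mixed types to object dtype, so the assignment-based variants are usually the safer choice.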
Reset row and column index
To reset the row and column index at the same time you can use Python multiple assignment (unpacking):
df.index, df.columns = [range(df.index.size), range(df.columns.size)]
Before the reset:
 | name | url | id |
---|---|---|---|
2 | Softhints | https://www.softhints.com | a |
3 | DataScientyst | https://datascientyst.com | b |
4 | DataPlotPlus | https://dataplotplus.com | c |
After the reset:
 | 0 | 1 | 2 |
---|---|---|---|
0 | Softhints | https://www.softhints.com | a |
1 | DataScientyst | https://datascientyst.com | b |
2 | DataPlotPlus | https://dataplotplus.com | c |
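The same reset can be written in a couple of equivalent ways. The sketch below uses a smaller, made-up two-row frame (the names df_a and df_b are illustrative only) and compares the one-line assignment with a variant that uses reset_index(drop=True) for the rows.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Softhints', 'DataScientyst'],
    'id': ['a', 'b']
}, index=[2, 3])

# variant A: reset both axes in a single assignment
df_a = df.copy()
df_a.index, df_a.columns = range(df_a.index.size), range(df_a.columns.size)

# variant B: reset the rows with reset_index, then the columns by assignment
df_b = df.reset_index(drop=True)
df_b.columns = range(df_b.shape[1])

# both variants should produce identical frames
print(df_a.equals(df_b))
```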
To get axes information from a Pandas DataFrame we can use the axes attribute:
df.axes
result (here the columns are already reset, the row index is not):
[Int64Index([2, 3, 4], dtype='int64'), RangeIndex(start=0, stop=3, step=1)]
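A short sketch of how axes can be inspected before and after a column reset is shown below. The variable names row_axis and column_axis are assumptions made for this example. The printed representation depends on the Pandas version: from Pandas 2.0 on, the row index is shown as a plain Index with dtype int64 rather than Int64Index.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Softhints', 'DataScientyst', 'DataPlotPlus'],
    'url': ['https://www.softhints.com', 'https://datascientyst.com', 'https://dataplotplus.com'],
    'id': ['a', 'b', 'c']
}, index=[2, 3, 4])

# axes is a list: first the row index, then the column index
row_axis, column_axis = df.axes
print(row_axis)
print(column_axis)

# after resetting the columns, the second axis holds integer labels
df.columns = range(df.shape[1])
print(df.axes[1])
```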
Performance comparison for resetting column names
Let's increase the number of DataFrame rows with:
df_perf = pd.concat([df] * 10 ** 4)
which changes the shape to:
(30000, 3)
So the timings are:
range(df.shape[1])
- 10.4 µs ± 88.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
df.T.reset_index(drop=True).T
- 449 ms ± 4.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
df_perf.columns = range(df.columns.size)
- 10.1 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For a DataFrame with 30000 columns and shape (3, 30000):
range(df.shape[1])
- 10.4 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
df.T.reset_index(drop=True).T
- 444 ms ± 7.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
df_perf.columns = range(df.columns.size)
- 10.1 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
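The timings above look like output of the %timeit magic. Outside a notebook, a rough comparison can be sketched with the standard timeit module. This is an illustrative setup, not the original benchmark: the helper function names are made up here, the frame is smaller (3000 rows) to keep the run short, and absolute numbers will differ by machine and Pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'name': ['Softhints', 'DataScientyst', 'DataPlotPlus'],
    'url': ['https://www.softhints.com', 'https://datascientyst.com', 'https://dataplotplus.com'],
    'id': ['a', 'b', 'c']
}, index=[2, 3, 4])

# smaller than the 30000 rows used in the article, to keep the run short
df_perf = pd.concat([df] * 10 ** 3)


def reset_with_size():
    df_perf.columns = range(df_perf.columns.size)


def reset_with_shape():
    df_perf.columns = range(df_perf.shape[1])


def reset_with_transpose():
    # returns a new frame instead of modifying df_perf in place
    return df_perf.T.reset_index(drop=True).T


runs = 5
timings = {}
for func in (reset_with_size, reset_with_shape, reset_with_transpose):
    # average seconds per call over a few runs
    timings[func.__name__] = timeit.timeit(func, number=runs) / runs

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```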