In this post you'll learn how to replace the header of a Pandas DataFrame with its first row.
1: Replace the Header with the First Row
If your DataFrame only has the default integer column labels and the first row contains the real column names, you can promote that row to the header:
import pandas as pd
# Example DataFrame without a header
df = pd.DataFrame([
["Name", "Age", "City"],
["Alice", 25, "New York"],
["Bob", 30, "Los Angeles"]
])
df
The result is:
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Name | Age | City |
| 1 | Alice | 25 | New York |
| 2 | Bob | 30 | Los Angeles |
Now we can promote the first row to the header:
# Set the first row as header and remove it from the data
df.columns = df.iloc[0].tolist() # Assign first-row values as column names (a plain list leaves no axis name behind)
df = df.iloc[1:].reset_index(drop=True) # Drop the first row (now redundant) and reset the index
Result:
| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
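The same steps can be wrapped in a small helper. This is only a sketch: the function name `promote_first_row` is made up for illustration, and the final `infer_objects()` call is an optional extra that asks pandas to re-infer column dtypes, since every column starts out as `object` when the header text is mixed in with the data.

```python
import pandas as pd

def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame whose first row has become the header."""
    # Keep only the data rows and renumber them from 0
    out = df.iloc[1:].reset_index(drop=True)
    # Use the first-row values as labels; a plain list carries no axis name
    out.columns = df.iloc[0].tolist()
    # Optional: let pandas re-infer dtypes now that the header text is gone
    return out.infer_objects()

raw = pd.DataFrame([
    ["Name", "Age", "City"],
    ["Alice", 25, "New York"],
    ["Bob", 30, "Los Angeles"],
])
clean = promote_first_row(raw)
```

Because the helper returns a new object, the original `raw` DataFrame is left untouched.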
2: When Reading a CSV File
If you know your data structure beforehand, you can handle this during the import process:
CSV has no headers
df = pd.read_csv("file.csv", header=None) # Read without header
df.columns = ['header1', 'header2'] # One name per column in the file
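The two steps can also be combined by passing the labels to `read_csv` through its `names` parameter. The sketch below reads from an in-memory string instead of a real file so it runs on its own; the file content and the column names are invented for the example.

```python
import io
import pandas as pd

# Hypothetical two-column file content with no header line
csv_text = "Alice,25\nBob,30\n"

# header=None keeps the first line as data; names= supplies the labels
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["name", "age"])
```

When assigning `df.columns` directly, the list must contain exactly as many names as there are columns, otherwise pandas raises a `ValueError`.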
Use Some Row as a Header
If the column names sit on a later line of the file, pass that line's zero-based position as header:
df = pd.read_csv('file.csv', header=3)
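Note: lines above the header row are discarded, so with header=3 the first three lines of the file do not appear in the DataFrame (blank lines are not counted by default).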
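To see what `header=3` does, here is a small sketch with a made-up file that has three lines of preamble before the real header. The content is hypothetical and is read from a string so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical file: three preamble lines, then the real header, then data
csv_text = (
    "Report,2024\n"
    "Source,internal\n"
    "Generated,today\n"
    "Name,Age\n"
    "Alice,25\n"
    "Bob,30\n"
)

# Line 3 (zero-based) becomes the header; the lines above it are dropped
df = pd.read_csv(io.StringIO(csv_text), header=3)
```

Only the two data rows after the header line end up in `df`.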
3: Multiple Header Rows
Sometimes a dataset's header is spread across several rows. With two header rows, you can join them into flattened, single-level column names:
df.columns = df.iloc[0].astype(str) + '_' + df.iloc[1].astype(str)
df = df[2:].reset_index(drop=True)
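The sketch below runs the flattening approach on invented data and, as an alternative, keeps both header rows as a two-level `MultiIndex` instead of joining them. The sample values and the variable names (`top`, `bottom`, `flat`, `multi`) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 together describe each column
raw = pd.DataFrame([
    ["Sales", "Sales", "Costs"],
    ["Q1", "Q2", "Q1"],
    [100, 120, 80],
    [110, 130, 85],
])

top = raw.iloc[0].astype(str)
bottom = raw.iloc[1].astype(str)
data = raw.iloc[2:].reset_index(drop=True)

# Option A: flatten both rows into single-level names such as "Sales_Q1"
flat = data.copy()
flat.columns = (top + "_" + bottom).tolist()

# Option B: keep the two levels as a MultiIndex
multi = data.copy()
multi.columns = pd.MultiIndex.from_arrays([top.tolist(), bottom.tolist()])
```

Flattened names are easier to export and type; the `MultiIndex` form keeps the grouping, so `multi["Sales"]` selects both quarterly columns at once.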
4: Cleaning Header Names
After setting new headers, you might need to clean them:
df.columns = df.columns.str.strip().fillna('Unknown')
This strips leading and trailing whitespace from each column name and replaces missing (NaN) names with 'Unknown'.
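A quick way to check the cleanup is to run it on a DataFrame with deliberately messy labels. The labels below are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical messy header: padded whitespace and one missing name
df = pd.DataFrame([[1, 2, 3]], columns=["  Name ", "Age", np.nan])

# Strip whitespace first, then give missing labels a placeholder name
df.columns = df.columns.str.strip().fillna("Unknown")
```

Note that the `.str` accessor only transforms string labels, so this pattern suits headers that are text apart from the missing entries.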