Do you need to skip rows while reading a CSV file with read_csv in Pandas? If so, this article will show you how to skip the first rows, or specific rows, when reading the file.
The read_csv method has a skiprows parameter, which can be used as follows:
(1) Skip the first rows when reading a CSV file in Pandas
pd.read_csv(csv_file, skiprows=3, header=None)
(2) Skip rows by index with read_csv
pd.read_csv(csv_file, skiprows=[0,2])
Let's look at several practical examples that cover skipping rows while reading a CSV file.
To start, let's say that we have the following CSV file:
!cat '../data/csv/multine_header.csv'
The CSV file has a header that spans two lines:
Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
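To follow along without the file on disk, the same content can be loaded from memory. This is a minimal sketch: the names CSV_TEXT and raw are assumptions introduced here for illustration, and io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Same five lines as the sample file above (CSV_TEXT is a name made up for this sketch)
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# read_csv accepts file-like objects, so StringIO can replace a path
raw = pd.read_csv(io.StringIO(CSV_TEXT))

print(raw.shape)          # (4, 5)
print(list(raw.columns))  # duplicate names get a .1 suffix
```

Without skiprows, only the first line becomes the header, so the second header line (Rank, Points, ...) ends up as the first data row.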
Step 1: Skip first N rows while reading CSV file
The first example shows how to skip consecutive rows with the Pandas read_csv method.
There are 2 options:
- skip rows without using a header
- skip the first N rows and use a header for the DataFrame (see Step 2)
In this step, read_csv skips the first three lines and starts reading data from row 4 (the index of this row is 3). Because header=None is passed, the newly created DataFrame gets autogenerated integer column names:
df = pd.read_csv(csv_file, skiprows=3, header=None)
This results in:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2021-09-07 | 1 | 8.5 | 2 | 7.0 |
2021-09-08 | 2 | 8.0 | 1 | 8.1 |
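A self-contained sketch of this step is shown below. It loads the sample data from a string instead of a file, and it also passes the names parameter of read_csv to replace the autogenerated column names. The column names in the list are hypothetical, chosen only for this illustration.

```python
import io
import pandas as pd

# Sample data from this article, held in memory for the sketch
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip the first 3 lines and treat everything that remains as data
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3, header=None)
print(list(df.columns))  # [0, 1, 2, 3, 4]

# Hypothetical column names, supplied through the names parameter
col_names = ["date", "a_rank", "a_points", "b_rank", "b_points"]
named = pd.read_csv(
    io.StringIO(CSV_TEXT), skiprows=3, header=None, names=col_names
)
print(named)
```

Supplying names can be convenient when the original header lines are skipped but readable column labels are still wanted.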
Step 2: Skip first N rows and use header
If the header parameter of read_csv is not provided, the first row is used as the header. When header and skiprows are combined, the rows are skipped first, and then the first of the remaining rows is used as the header.
In the example below, the first 3 rows of the CSV file are skipped. The fourth one is used as the header of the new DataFrame.
df = pd.read_csv(csv_file, skiprows=3)
2021-09-07 | 1 | 8.5 | 2 | 7 |
---|---|---|---|---|
2021-09-08 | 2 | 8 | 1 | 8.1 |
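The effect is easy to miss, so here is a small sketch that inspects the result, assuming the sample data is loaded from a string rather than from a file. It shows that a data row has been consumed as the header and that only one row of data remains.

```python
import io
import pandas as pd

# Sample data from this article, held in memory for the sketch
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# No header argument: the first line left after skipping becomes the header
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3)

print(list(df.columns))  # ['2021-09-07', '1', '8.5', '2', '7']
print(len(df))           # 1
```

The column labels are strings taken from the fourth line of the file, which is usually a sign that header=None or a different skiprows value was intended.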
Step 3: Keep the header and skip the first rows
What if you need to keep the header and then skip N rows? This can be achieved in several different ways.
The simplest one is to build a sequence of the row indices to be skipped:
rows_to_skip = range(1,3)
df = pd.read_csv(csv_file, skiprows=rows_to_skip)
result:
Date | Company A | Company A.1 | Company B | Company B.1 |
---|---|---|---|---|
2021-09-07 | 1 | 8.5 | 2 | 7.0 |
2021-09-08 | 2 | 8.0 | 1 | 8.1 |
As you can see, read_csv keeps the header and skips the first 2 rows after it.
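Another of the possible ways is to pass a callable to skiprows. Pandas calls it with each row index and skips the rows for which it returns True. The sketch below, which assumes the sample data is loaded from a string, compares the range approach with a callable.

```python
import io
import pandas as pd

# Sample data from this article, held in memory for the sketch
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip rows 1 and 2 (0-indexed); row 0 stays and becomes the header
by_range = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, 3))

# Same selection as a callable: return True for rows to skip
by_callable = pd.read_csv(
    io.StringIO(CSV_TEXT), skiprows=lambda i: 1 <= i <= 2
)

print(by_range.equals(by_callable))  # True
print(list(by_range.columns))
```

A callable can be handy when the rows to skip follow a rule (for example, a condition on the index) rather than a fixed list.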
Step 4: Skip non-consecutive rows with read_csv by index
The skiprows parameter is defined as:
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
So, to skip rows 0 and 2, we can pass a list of values to skiprows:
df = pd.read_csv(csv_file, skiprows=[0,2])
Unnamed: 0 | Rank | Points | Rank.1 | Points.1 |
---|---|---|---|---|
2021-09-07 | 1 | 8.5 | 2 | 7.0 |
2021-09-08 | 2 | 8.0 | 1 | 8.1 |
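As a runnable sketch of this step, the snippet below loads the sample data from a string and skips rows 0 and 2. Since the first header field is empty, Pandas labels that column Unnamed: 0. Renaming it to Date is an optional follow-up added here for illustration, not part of the original example.

```python
import io
import pandas as pd

# Sample data from this article, held in memory for the sketch
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip line 0 (first header line) and line 2 (the 2021-09-06 row)
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2])
print(list(df.columns))  # ['Unnamed: 0', 'Rank', 'Points', 'Rank.1', 'Points.1']

# Optional: give the unnamed first column a readable label
renamed = df.rename(columns={"Unnamed: 0": "Date"})
print(renamed)
```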