Do you need to skip rows while reading a CSV file with read_csv in Pandas? If so, this article shows how to skip the first rows when reading a file.
read_csv has the parameter skiprows, which can be used as follows:
(1) Skip the first rows when reading a CSV file in Pandas
pd.read_csv(csv_file, skiprows=3, header=None)
(2) Skip rows by index with read_csv
pd.read_csv(csv_file, skiprows=[0,2])
Let's check several practical examples that cover reading a CSV file and skipping rows.
To start, let's say that we have the following CSV file.
CSV file with multiple headers:
Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
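To follow along without creating a file on disk, the sample data can be kept in a string and wrapped in io.StringIO. This is only a minimal sketch: the name CSV_TEXT and the in-memory buffer are assumptions for illustration, and csv_file in the snippets of this article may just as well be a path to a file.

```python
import io

import pandas as pd

# Sample data from the article, held in memory (illustrative assumption;
# a path to a real file works the same way with read_csv).
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

csv_file = io.StringIO(CSV_TEXT)

# No rows are skipped yet: the first line becomes the header and the
# second header line (",Rank,Points,...") is read as an ordinary data row.
df = pd.read_csv(csv_file)
print(df)
```

Because the first line repeats the names Company A and Company B, Pandas renames the duplicates to Company A.1 and Company B.1.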
Step 1: Skip first N rows while reading CSV file
The first example shows how to skip consecutive rows with Pandas.
There are 2 options:
- skip rows in Pandas without using header
- skip first N rows and use header for the DataFrame - check Step 2
In this step the Pandas read_csv method reads data starting from row 4 (the index of this row is 3). The newly created DataFrame has autogenerated column names:
df = pd.read_csv(csv_file, skiprows=3, header=None)
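This results in a DataFrame that holds the two remaining rows (2021-09-07 and 2021-09-08) under the integer column labels 0 to 4.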
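A minimal sketch that reproduces this step with the sample data held in memory (the name CSV_TEXT and the use of io.StringIO are assumptions for illustration, not part of the original example):

```python
import io

import pandas as pd

# In-memory copy of the article's sample file (illustrative assumption).
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip the first 3 lines and do not treat any line as a header,
# so the columns get the integer labels 0..4.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3, header=None)
print(df)
```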
Step 2: Skip first N rows and use header
If the header parameter of read_csv is not provided, the first row is used as the header. When header is combined with skiprows, the rows are skipped first and then the first of the remaining rows is used as the header.
In the example below, 3 rows from the CSV file are skipped. The fourth one is used as the header of the new DataFrame.
df = pd.read_csv(csv_file, skiprows=3)
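As a sketch of how skiprows and the default header interact, the snippet below runs the same call on an in-memory copy of the sample data (CSV_TEXT and io.StringIO are illustrative assumptions). With this particular file the fourth line is a data row, so its values end up as the column names.

```python
import io

import pandas as pd

# In-memory copy of the article's sample file (illustrative assumption).
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip 3 lines; the first remaining line is then used as the header.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3)

print(list(df.columns))  # values of the fourth line, read as strings
print(len(df))           # number of data rows left after the header
```

This pattern is most useful when a file starts with a fixed number of comment or title lines before the real header.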
Step 3: Pandas keep the header and skip first rows
What if you need to keep the header and then skip N rows? This can be achieved in several different ways.
The simplest one is to build a list of the rows to be skipped:
rows_to_skip = range(1,3)
df = pd.read_csv(csv_file, skiprows=rows_to_skip)
The resulting DataFrame has the columns Date, Company A, Company A.1, Company B and Company B.1.
As you can see, the read_csv method keeps the header and skips the first 2 rows after the header.
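Another way to express the same thing is to pass a callable to skiprows. Pandas evaluates the callable against each row index and skips the rows for which it returns True. The sketch below assumes the sample data is held in a string named CSV_TEXT (an illustrative name) and compares the callable with the range approach.

```python
import io

import pandas as pd

# In-memory copy of the article's sample file (illustrative assumption).
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Keep line 0 as the header and skip the lines with index 1 and 2.
df_range = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, 3))

# Same selection written as a predicate on the row index.
df_callable = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=lambda i: i in (1, 2))

print(df_callable)
print(df_range.equals(df_callable))
```

A callable is convenient when the rows to skip follow a rule (for example every second line) rather than a short fixed list.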
Step 4: Skip non-consecutive rows with read_csv by index
skiprows is defined as:
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
So to skip rows 0 and 2 we can pass a list of values to skiprows:
df = pd.read_csv(csv_file, skiprows=[0,2])
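The difference between an integer and a list is easy to miss, so here is a small sketch on an in-memory copy of the sample data (CSV_TEXT and the read helper are illustrative names, not part of Pandas): skiprows=2 skips the first two lines, while skiprows=[2] skips only the line with index 2.

```python
import io

import pandas as pd

# In-memory copy of the article's sample file (illustrative assumption).
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""


def read(**kwargs):
    """Hypothetical helper: read the sample data with the given options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


# List: skip lines 0 and 2, so line 1 (",Rank,Points,...") becomes the header.
df_list = read(skiprows=[0, 2])

# Integer: skip the first 2 lines, so line 2 (a data row) becomes the header.
df_int = read(skiprows=2)

# Single-element list: skip only line 2 and keep line 0 as the header.
df_single = read(skiprows=[2])

print(list(df_list.columns))
print(list(df_int.columns))
print(list(df_single.columns))
```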