How To Read Multiple CSV Files into Pandas DataFrame
To read multiple CSV files into a single Pandas DataFrame, we can use the following syntax:
(1) Pandas read multiple CSV files
import glob
import pandas as pd

path = r'/home/user/Downloads'
all_files = glob.glob(path + "/*.csv")
lst = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    lst.append(df)
merged_df = pd.concat(lst, axis=0, ignore_index=True)
(2) Read multiple CSV files - Dask
import dask.dataframe as dd
df = dd.read_csv("~/Downloads/test*.csv")
Pandas Example
Suppose that we would like to read all CSV files:
- located in the folder /home/user/Downloads
- matching the pattern /test_*.csv, that is, starting with test_ and ending with .csv
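To preview which file names a pattern such as test_*.csv would select before reading anything, one option is the standard-library fnmatch module, which applies the same shell-style wildcard rules that glob relies on. This is only a sketch: the names test_3.txt and mytest_4.csv are made up for illustration.

```python
import fnmatch

# File names to check; test_3.txt and mytest_4.csv are hypothetical extras.
names = ["other.csv", "test_1.csv", "test_2.csv", "test_3.txt", "mytest_4.csv"]

# "*" matches any run of characters, so a name must start with
# "test_" and end with ".csv" to be selected.
matches = fnmatch.filter(names, "test_*.csv")
print(matches)
```

Names that merely contain test_ somewhere in the middle, or that have a different extension, are not selected.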
We can use the following code:
import glob
import pandas as pd

path = r'/home/user/Downloads'
pattern = "/test_*.csv"
all_files = glob.glob(path + pattern)
lst = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    lst.append(df)
merged_df = pd.concat(lst, axis=0, ignore_index=True)
Let's say that we have the following files in this folder:
- other.csv
- test_1.csv
- test_2.csv
The final DataFrame, merged_df, will contain rows only from test_1.csv and test_2.csv. The file other.csv does not match the pattern, so it is skipped.
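The behavior can be checked end to end with a small self-contained sketch. It assumes a temporary folder and made-up file contents instead of /home/user/Downloads, sorts the matched paths because glob does not guarantee any particular order, and adds a hypothetical source_file column (not part of the code above) to record where each row came from.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway folder with small sample files (made-up data).
folder = tempfile.mkdtemp()
samples = {
    "other.csv": "id,value\n9,z\n",
    "test_1.csv": "id,value\n1,a\n2,b\n",
    "test_2.csv": "id,value\n3,c\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as fh:
        fh.write(text)

# sorted() makes the file order deterministic.
all_files = sorted(glob.glob(os.path.join(folder, "test_*.csv")))

frames = []
for filename in all_files:
    df = pd.read_csv(filename)
    # Hypothetical extra column: remember which file each row came from.
    df["source_file"] = os.path.basename(filename)
    frames.append(df)

merged_df = pd.concat(frames, ignore_index=True)
print(merged_df)
```

The merged result holds the three rows from the two test_ files, and the row from other.csv is absent.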
Read multiple CSV files with Dask
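As an alternative, we can use the Dask library to read multiple CSV files. To install it, see the Dask documentation or run: pip install dask. If dask.dataframe then fails to import because of missing dependencies, pip install "dask[dataframe]" installs them as well.
To read multiple files from a folder by pattern, we can use: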
import dask.dataframe as dd
df = dd.read_csv("~/Downloads/test*.csv")