How to Save a Pandas DataFrame as a Compressed CSV/JSON File

To save a DataFrame as a compressed CSV/JSON file using Pandas, pass the parameter compression='gzip' as follows:

CSV

df.to_csv('data.csv.gz', index=False, compression='gzip')

JSON

df.to_json('data.json.gz', compression='gzip')

Saving a DataFrame as a Compressed CSV

You can save a Pandas DataFrame as a compressed CSV or JSON file using the compression parameter of the to_csv() or to_json() method. Pandas supports multiple compression formats:

gzip
bz2
zip
xz
zstd
tar

You can read more in the documentation for pandas.DataFrame.to_csv.
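As a minimal sketch of how the compression parameter can be used, the example below relies on two behaviours of recent Pandas versions: the default compression='infer' picks the format from the file extension, and a dict can carry extra options for the compressor. The file names, the temporary directory, and the compresslevel value are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Temporary directory (an assumption of this sketch) so no files are left behind.
out_dir = tempfile.mkdtemp()

# 1) compression defaults to 'infer', so the '.gz' extension alone selects gzip.
inferred_path = os.path.join(out_dir, 'inferred.csv.gz')
df.to_csv(inferred_path, index=False)

# 2) A dict passes extra options to the compressor; compresslevel=1 favours speed.
fast_path = os.path.join(out_dir, 'fast.csv.gz')
df.to_csv(fast_path, index=False,
          compression={'method': 'gzip', 'compresslevel': 1})

# Both files should start with the gzip magic bytes 0x1f 0x8b.
with open(inferred_path, 'rb') as fh:
    inferred_magic = fh.read(2)
with open(fast_path, 'rb') as fh:
    fast_magic = fh.read(2)
```

Passing the format explicitly, as in the rest of this article, is still useful when the file name does not carry a recognizable extension.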

Example: Saving a DataFrame with gzip Compression

Below you can see a basic example of on-the-fly compression of the output data:

import pandas as pd  

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'London', 'Paris']}  
df = pd.DataFrame(data)  
 
df.to_csv('data.csv.gz', index=False, compression='gzip')

Other Compression Formats

You can use different compression formats by changing the compression parameter:

df.to_csv('data.csv.bz2', index=False, compression='bz2')
df.to_csv('data.zip', index=False, compression='zip')
df.to_csv('data.csv.xz', index=False, compression='xz')
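For zip output it can help to control the name of the CSV stored inside the archive. The sketch below assumes a recent Pandas version that accepts a dict for compression with an archive_name key; the paths and the member name data.csv are illustrative.

```python
import os
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Temporary directory (an assumption of this sketch) for the output archive.
out_dir = tempfile.mkdtemp()
zip_path = os.path.join(out_dir, 'data.zip')

# archive_name sets the name of the CSV file stored inside the zip archive.
df.to_csv(zip_path, index=False,
          compression={'method': 'zip', 'archive_name': 'data.csv'})

# List the archive members to confirm what was written.
with zipfile.ZipFile(zip_path) as zf:
    members = zf.namelist()
print(members)
```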

Reading the Compressed CSV File

To read a compressed CSV file back into a Pandas DataFrame, use pd.read_csv() with the compression parameter:

df = pd.read_csv('data.csv.gz', compression='gzip')
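The readers can also infer the compression from the file extension, since compression='infer' is their default. Here is a small round-trip sketch for both CSV and JSON; the temporary paths are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Temporary directory (an assumption of this sketch) for the output files.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'data.csv.gz')
json_path = os.path.join(out_dir, 'data.json.gz')

# Write both formats with gzip compression.
df.to_csv(csv_path, index=False, compression='gzip')
df.to_json(json_path, compression='gzip')

# No compression argument needed: the '.gz' extension is detected on read.
csv_back = pd.read_csv(csv_path)
json_back = pd.read_json(json_path)
```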

Compression Results

By using compression, you can significantly reduce file size. In my tests I worked with a file that contains two columns:

https://www.example.com/south,Q6RnAzwGYA
https://www.example.com/mawson,zwGYAZc
https://www.example.com/sea,ZciVimr4
https://www.example.com/moo,4o6PwPjg
https://www.example.com/paul,Vimr4kvJw

You can find the results below:

original file is 4.1 GB
Pandas compression - 689 MB - 1.5 min

Ubuntu's default compression took a similar amount of time and produced the same size, 689 MB.
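To run a comparable measurement yourself, a rough sketch could look like the following. It uses a small synthetic stand-in for the URL/token file (the column names, row count, and values are assumptions, not the data from the test above), so the numbers will differ from the ones reported here.

```python
import os
import tempfile
import time

import pandas as pd

# Synthetic two-column data, loosely shaped like the URL/token file.
n_rows = 50_000
df = pd.DataFrame({
    'url': [f'https://www.example.com/page{i % 500}' for i in range(n_rows)],
    'token': [f'tok{i:08d}' for i in range(n_rows)],
})

out_dir = tempfile.mkdtemp()
plain_path = os.path.join(out_dir, 'data.csv')
gz_path = os.path.join(out_dir, 'data.csv.gz')

# Uncompressed baseline.
df.to_csv(plain_path, index=False, header=False)

# Time the compressed write.
start = time.perf_counter()
df.to_csv(gz_path, index=False, header=False, compression='gzip')
elapsed = time.perf_counter() - start

plain_size = os.path.getsize(plain_path)
gz_size = os.path.getsize(gz_path)
print(f'plain: {plain_size} bytes, gzip: {gz_size} bytes, {elapsed:.2f} s')
```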

The advantage of Pandas is that you can exclude some columns before writing and get a smaller file after compression.
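One way to exclude columns is the columns parameter of to_csv(), which writes only the listed columns. The sketch below is illustrative: the column names url and token and the output path are assumptions.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'url': ['https://www.example.com/south', 'https://www.example.com/sea'],
    'token': ['Q6RnAzwGYA', 'ZciVimr4'],
})

# Temporary directory (an assumption of this sketch) for the output file.
out_dir = tempfile.mkdtemp()
url_only_path = os.path.join(out_dir, 'urls.csv.gz')

# Only the 'url' column is written; 'token' is left out of the compressed file.
df.to_csv(url_only_path, index=False, columns=['url'], compression='gzip')

written = pd.read_csv(url_only_path)
print(list(written.columns))
```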
