In this post, we'll take a brief look at Kaggle Datasets and how to download and import them with Python. By the end, we'll have seen how to list datasets, download a single file or a whole dataset, and read the result into a Pandas DataFrame.

Step 1: Create a Kaggle API token

First, visit Kaggle and create an account. You can sign up with your Google account.

To create a new Kaggle API token:

  • Click your profile picture (top right)
  • Open Account - the URL is: https://www.kaggle.com/<username>/account
  • Scroll to the API section
  • Click Create new API Token
  • This will generate a kaggle.json file
  • Place the file in your home folder as: ~/.kaggle/kaggle.json
  • For more security (optional), restrict access to the file: chmod 600 ~/.kaggle/kaggle.json

More information is available on the Kaggle API page.
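Before moving on, it can help to confirm that the token file ended up where the steps above put it. The following is a minimal sketch, not part of the Kaggle tooling: check_kaggle_token is a hypothetical helper, and it assumes the token is a JSON object with "username" and "key" fields.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(token_path=None):
    """Return a list of problems with the token file; an empty list means it looks fine."""
    # Default location taken from the steps above.
    path = Path(token_path) if token_path else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} does not exist"]

    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]

    problems = []
    # Assumption: the token is a JSON object with "username" and "key" fields.
    for field in ("username", "key"):
        if not isinstance(data, dict) or field not in data:
            problems.append(f"missing field: {field}")

    # Permission bits are only meaningful on POSIX systems.
    if os.name == "posix" and stat.S_IMODE(path.stat().st_mode) & 0o077:
        problems.append("file is accessible to other users; run chmod 600")
    return problems


if __name__ == "__main__":
    print(check_kaggle_token() or "token file looks fine")
```

An empty list means the file exists, parses as JSON, and is readable only by you.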

Step 2: Install the Kaggle Python package

Next, we install the package that downloads the datasets from Kaggle. You can install the kaggle package in a virtual environment with:

pip install kaggle

or for the current user only:

pip install --user kaggle
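To confirm that the install landed in the environment you are actually running, a small check like the one below may be useful. It is only a sketch: installed_version is a hypothetical helper built on the standard library's importlib.metadata (Python 3.8+), and it deliberately avoids importing kaggle itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("kaggle")
    print(f"kaggle {found}" if found else "kaggle is not installed in this environment")
```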

Step 3: Download a single file from a Kaggle dataset

Now we'll download a single CSV file from a Kaggle dataset. This works only if the previous steps completed successfully:

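import kaggle

# Load the credentials from ~/.kaggle/kaggle.json
kaggle.api.authenticate()

# Download one file from the dataset into the data/ folder
kaggle.api.dataset_download_file('dorianlazar/medium-articles-dataset', file_name='medium_data.csv', path='data/')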

In the example above, we download the file medium_data.csv from the dataset dorianlazar/medium-articles-dataset.

The file is saved in the folder data/.

Here it arrives as a zip archive, medium_data.csv.zip, which Pandas can read directly:

import pandas as pd
pd.read_csv('data/medium_data.csv.zip')

which produces a DataFrame:

[Figure: preview of the resulting DataFrame]
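Since the downloaded file may end up either as a plain CSV or as a zipped one, a small lookup can save some guessing. This is a sketch under that assumption; locate_download is a hypothetical helper and the folder and file names are the ones used in the example above.

```python
from pathlib import Path


def locate_download(folder, file_name):
    """Return the path of the downloaded file, zipped or not, or None if neither exists."""
    folder = Path(folder)
    # Check the plain file first, then the zipped variant used in the example above.
    for candidate in (folder / file_name, folder / (file_name + ".zip")):
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    path = locate_download("data", "medium_data.csv")
    if path is None:
        print("medium_data.csv was not found in data/")
    else:
        import pandas as pd  # imported here so the helper itself needs only the standard library

        # read_csv infers zip compression from the .zip extension
        df = pd.read_csv(path)
        print(df.shape)
```

Either way the same read_csv call works, because Pandas infers the compression from the file extension.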

Step 4: Download multiple files from a Kaggle dataset

To get all files from a Kaggle dataset, use:

import kaggle

kaggle.api.authenticate()

kaggle.api.dataset_download_files('dorianlazar/medium-articles-dataset', path='data/')

This downloads every file in the dataset mentioned above, so it can be slow for large datasets.
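The files may arrive packed in a single zip archive. The sketch below unpacks such an archive with the standard library and reports the CSV files it contained. extract_dataset is a hypothetical helper, and the archive name in the usage comment (the dataset name with a .zip extension) is an assumption, so check the data/ folder for the actual file name.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, dest_folder):
    """Extract a downloaded dataset archive and return the sorted names of its CSV files."""
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    return sorted(name for name in names if name.lower().endswith(".csv"))


# Assumed archive name; adjust it to whatever the download left in data/.
# csv_files = extract_dataset("data/medium-articles-dataset.zip", "data/")
```

Recent versions of the kaggle package also accept unzip=True in dataset_download_files, which may make this step unnecessary.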

Step 5: List and search Kaggle datasets with the API

Finally, let's see how to list and search for Kaggle datasets. This can be done with the following command:

!kaggle datasets list -s article

Here we search for the keyword article; the leading ! runs the command from a Jupyter notebook cell. The output is:

ref                                                          title                                            size   lastUpdated          downloadCount  voteCount  usabilityRating
-----------------------------------------------------------  -----------------------------------------------  -----  -------------------  -------------  ---------  ---------------
dorianlazar/medium-articles-dataset                          Medium articles dataset                          1GB    2020-06-30 14:13:56  1804           103        0.9411765
hsankesara/medium-articles                                   Medium Articles                                  1MB    2018-06-17 08:45:49  1983           72         0.88235295
gspmoreira/articles-sharing-reading-from-cit-deskdrop        Articles sharing and reading from CI&T DeskDrop  8MB    2017-08-27 21:33:01  10062          135        0.8235294
jkkphys/english-wikipedia-articles-20170820-sqlite           English Wikipedia Articles 2017-08-20 SQLite     7GB    2018-11-27 21:54:22  1417           84         0.875
asad1m9a9h6mood/news-articles                                News Articles                                    2MB    2017-04-30 11:02:29  2731           31         0.8235294
residentmario/wikipedia-article-titles                       Wikipedia Article Titles                         73MB   2017-09-22 16:42:20  726            26         0.75
abhishek/10k-german-news-articles                            10k German News Articles                         123MB  2019-11-07 08:50:32  552            89         0.8235294
yufengdev/bbc-fulltext-and-category                          BBC articles fulltext and category               2MB    2018-06-08 05:44:22  3799           35         0.64705884
danofer/dbpedia-classes                                      DBPedia Classes                                  166MB  2019-07-04 11:30:52  979            26         1.0
vetrirah/janatahack-independence-day-2020-ml-hackathon       NLP on Research Articles                         11MB   2020-08-19 14:35:13  309            24         1.0
jkkphys/english-wikipedia-articles-20170820-models           English Wikipedia Articles 2017-08-20 Models     925MB  2018-11-28 17:09:32  379            19         0.8125
blessondensil294/topic-modeling-for-research-articles        Topic Modeling for Research Articles             11MB   2020-08-18 08:53:26  321            21         1.0
szymonjanowski/internet-articles-data-with-users-engagement  Internet news data with readers engagement       3MB    2020-11-21 17:09:57  4069           330        0.9411765
maxscheijen/dutch-news-articles                              Dutch News Articles                              135MB  2021-05-24 08:01:12  104            13         1.0
aiswaryaramachandran/medium-articles-with-content            Medium Articles (with Content)                   218MB  2018-11-10 18:17:46  569            29         0.7352941
jeet2016/us-financial-news-articles                          US Financial News Articles                       1GB    2018-09-05 01:27:43  2530           54         0.625
urbanbricks/wikipedia-promotional-articles                   Wikipedia Promotional Articles                   201MB  2019-10-27 16:31:06  276            15         1.0
hkapoor/indian-financial-news-articles-20032020              Indian financial news articles (2003-2020)       3MB    2020-05-26 20:41:29  233            25         1.0
naharrison/particle-identification-from-detector-responses   Particle Identification from Detector Responses  83MB   2018-10-24 21:14:33  298            28         0.7058824
zshujon/40k-bangla-newspaper-article                         40k Bangla Newspaper Article                     64MB   2018-09-22 09:54:40  276            10         0.5625
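The listing above is meant for reading, and titles containing spaces make it awkward to parse. One possible approach, sketched below, is to ask the CLI for CSV output with the --csv flag and parse that instead. parse_dataset_listing and search_datasets are hypothetical helpers, the sample text is two rows from the output above rewritten by hand as CSV, and the sketch assumes the CLI prints nothing but the CSV on standard output.

```python
import csv
import io
import subprocess


def parse_dataset_listing(csv_text):
    """Turn the CSV listing into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def search_datasets(keyword):
    """Run the CLI search and return the parsed rows (needs the kaggle CLI and a valid token)."""
    result = subprocess.run(
        ["kaggle", "datasets", "list", "-s", keyword, "--csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_dataset_listing(result.stdout)


# Two rows from the output above, rewritten by hand as CSV for illustration.
sample = (
    "ref,title,size,lastUpdated,downloadCount,voteCount,usabilityRating\n"
    "dorianlazar/medium-articles-dataset,Medium articles dataset,1GB,2020-06-30 14:13:56,1804,103,0.9411765\n"
    "hsankesara/medium-articles,Medium Articles,1MB,2018-06-17 08:45:49,1983,72,0.88235295\n"
)
rows = parse_dataset_listing(sample)
print([row["ref"] for row in rows])
```

All values come back as strings, so numeric columns such as downloadCount need converting before sorting or filtering. Passing the same text to pd.read_csv through io.StringIO would give a DataFrame instead.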