In this post, we will learn how to convert an API response to a Pandas DataFrame using the Python requests module. First we will read the API response to a data structure as:

  • CSV
  • JSON
  • XML
  • list of dictionaries

and then we use the:

  • pd.DataFrame constructor
  • pd.DataFrame.from_dict(data) etc

to create a DataFrame from that data structure.

Or simply use df=pd.read_json(url) to convert the API to Pandas DataFrame.

The image below shows the steps from API to DataFrame:

Here's simple example using a JSON weather API data:

import requests
import pandas as pd

url = "https://archive-api.open-meteo.com/v1/era5?latitude=52.52" +\
      "&longitude=13.41&start_date=2021-01-01&end_date=2021-12-31&hourly=temperature_2m"

response = requests.get(url)
data = response.json()
pd.DataFrame(data)

This will return the API response as Pandas DataFrame(some column are truncated for readiness):

latitude longitude generationtime_ms utc_offset_seconds timezone
time 52.5 13.400009 0.561953 0 GMT
temperature_2m 52.5 13.400009 0.561953 0 GMT

Next we will cover the process step by step.

Select suitable API

There are hundreds of free available APIs. Below you can find several curated lists to experiment with:

Choose one of them and test conversion from API to Pandas DataFrame with Python

Read API in Python

To read API requests in Python we will use the requests library. To make a GET request to an API endpoint and retrieve the response we do:

import requests

url = "https://www.reddit.com/r/Python/comments/21q40a/why_is_pandas_so_hard/.json"

response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print("Request failed with status code:", response.status_code)

The requests.get method sends a GET request to the API at the target url. The response is stored in the response variable.

We check the status code of the response by response.status_code attribute. If the status code is 200, the response is considered as successful.

The data is converted to a JSON object using response.json().

If you face error like Too Many Requests - 429 check this section

Convert API Response to DataFrame

At this step we will convert the API response to Pandas DataFrame. The conversion depends on the API response type and structure.

Let's cover the case for this URL: Sample weather API response

which contains:

{
  "latitude": 52.5,
  "longitude": 13.400009,
  "generationtime_ms": 0.4019737243652344,
  "utc_offset_seconds": 0,
  "timezone": "GMT",
  "timezone_abbreviation": "GMT",
  "elevation": 38,
  "hourly_units": {
    "time": "iso8601",
    "temperature_2m": "°C"
  },
  "hourly": {
    "time": [
      "2021-01-01T00:00",
      "2021-01-01T01:00",
      "2021-01-01T02:00",
      "2021-01-01T03:00",

We can convert this JSON to DataFrame by:

response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)

So we will have a DataFrame of two rows and multiple columns.

latitude longitude generationtime_ms utc_offset_seconds timezone
time 52.5 13.400009 0.561953 0 GMT
temperature_2m 52.5 13.400009 0.561953 0 GMT

Expand nested data and plot

What if you need to expand nested API data and plot it to a time series plot. As we saw there is hourly data for the temperature which is nested(you can also check the image at the start).

To expand nested data with Pandas we can use method pd.Series - this will convert nested data to Pandas Series:

pd.DataFrame(data)['hourly'].apply(pd.Series)

Expanded data will be accessible as new DataFrame with multiple columns:

0 1 2 3 4
time 2021-01-01T00:00 2021-01-01T01:00 2021-01-01T02:00 2021-01-01T03:00 2021-01-01T04:00
temperature_2m 1.1 0.9 0.7 0.6 0.6

On this page you can find more examples on expanding JSON data: expand nested JSON or Dict in Pandas

Finally we can plot data with:

pd.DataFrame(data)['hourly'].apply(pd.Series).T.set_index('time').plot()

Too Many Requests - 429

If we get error like:

{'message': 'Too Many Requests', 'error': 429}

Then we can add headers to the request in order to solve it. The simplest solution is by adding user agent:

import requests
import pandas as pd

url = "https://www.reddit.com/r/Python/comments/21q40a/why_is_pandas_so_hard/.json"

headers= {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.3'}

response = requests.get(url, headers = headers)
data = response.json()

data

Not using headers will return error code 429 - Too Many Requests. Using headers we are able to read the data.

Note:

Do you know that adding .json to the end of URL - converts Reddit posts into API?

Easily read API with read_json

We can also use the method read_json and pass API URL to it. So just with one line of code: df=pd.read_json(url) Pandas can easily read and convert most API-s to Pandas DataFrame.

import pandas as pd

url = "https://archive-api.open-meteo.com/v1/era5?latitude=52.52" +\
  	"&longitude=13.41&start_date=2021-01-01&end_date=2021-12-31&hourly=temperature_2m"

df=pd.read_json(url)
df.head()

The result is the same as before.

We can read more about read_json parameters on this link: pandas.read_json. The most important ones are:

  • path_or_buf - a valid JSON str, path object or file-like object
  • orient - Indication of expected JSON string format
    • split
    • records
    • values

You can find many useful examples on this link: How to Read JSON Files in Pandas

Conclusion

We covered the most basic ways to read and convert API strings and streams to a Pandas DataFrame.