In this post, we will learn how to convert an API response to a Pandas DataFrame using the Python requests module. First we will read the API response to a data structure as:
- CSV
- JSON
- XML
- list of dictionaries
and then we use the:
pd.DataFrame
constructorpd.DataFrame.from_dict(data)
etc
to create a DataFrame from that data structure.
Or simply use df=pd.read_json(url)
to convert the API to Pandas DataFrame.
The image below shows the steps from API to DataFrame:
Here's simple example using a JSON weather API data:
import requests
import pandas as pd
url = "https://archive-api.open-meteo.com/v1/era5?latitude=52.52" +\
"&longitude=13.41&start_date=2021-01-01&end_date=2021-12-31&hourly=temperature_2m"
response = requests.get(url)
data = response.json()
pd.DataFrame(data)
This will return the API response as Pandas DataFrame(some column are truncated for readiness):
latitude | longitude | generationtime_ms | utc_offset_seconds | timezone | |
---|---|---|---|---|---|
time | 52.5 | 13.400009 | 0.561953 | 0 | GMT |
temperature_2m | 52.5 | 13.400009 | 0.561953 | 0 | GMT |
Next we will cover the process step by step.
Select suitable API
There are hundreds of free available APIs. Below you can find several curated lists to experiment with:
- A collective list of free APIs for use in software and web development - github collection
- What are your favorite free public API Free ones - reddit post
Choose one of them and test conversion from API to Pandas DataFrame with Python
Read API in Python
To read API requests in Python we will use the requests
library. To make a GET request to an API endpoint and retrieve the response we do:
import requests
url = "https://www.reddit.com/r/Python/comments/21q40a/why_is_pandas_so_hard/.json"
response = requests.get(url)
if response.status_code == 200:
data = response.json()
print(data)
else:
print("Request failed with status code:", response.status_code)
The requests.get
method sends a GET request to the API at the target url. The response is stored in the response variable.
We check the status code of the response by response.status_code
attribute. If the status code is 200, the response is considered as successful.
The data is converted to a JSON object using response.json()
.
If you face error like Too Many Requests - 429
check this section
Convert API Response to DataFrame
At this step we will convert the API response to Pandas DataFrame. The conversion depends on the API response type and structure.
Let's cover the case for this URL: Sample weather API response
which contains:
{
"latitude": 52.5,
"longitude": 13.400009,
"generationtime_ms": 0.4019737243652344,
"utc_offset_seconds": 0,
"timezone": "GMT",
"timezone_abbreviation": "GMT",
"elevation": 38,
"hourly_units": {
"time": "iso8601",
"temperature_2m": "°C"
},
"hourly": {
"time": [
"2021-01-01T00:00",
"2021-01-01T01:00",
"2021-01-01T02:00",
"2021-01-01T03:00",
We can convert this JSON to DataFrame by:
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)
So we will have a DataFrame of two rows and multiple columns.
latitude | longitude | generationtime_ms | utc_offset_seconds | timezone | |
---|---|---|---|---|---|
time | 52.5 | 13.400009 | 0.561953 | 0 | GMT |
temperature_2m | 52.5 | 13.400009 | 0.561953 | 0 | GMT |
Expand nested data and plot
What if you need to expand nested API data and plot it to a time series plot. As we saw there is hourly data for the temperature which is nested(you can also check the image at the start).
To expand nested data with Pandas we can use method pd.Series
- this will convert nested data to Pandas Series:
pd.DataFrame(data)['hourly'].apply(pd.Series)
Expanded data will be accessible as new DataFrame with multiple columns:
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
time | 2021-01-01T00:00 | 2021-01-01T01:00 | 2021-01-01T02:00 | 2021-01-01T03:00 | 2021-01-01T04:00 |
temperature_2m | 1.1 | 0.9 | 0.7 | 0.6 | 0.6 |
On this page you can find more examples on expanding JSON data: expand nested JSON or Dict in Pandas
Finally we can plot data with:
pd.DataFrame(data)['hourly'].apply(pd.Series).T.set_index('time').plot()
Too Many Requests - 429
If we get error like:
{'message': 'Too Many Requests', 'error': 429}
Then we can add headers to the request in order to solve it. The simplest solution is by adding user agent:
import requests
import pandas as pd
url = "https://www.reddit.com/r/Python/comments/21q40a/why_is_pandas_so_hard/.json"
headers= {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.3'}
response = requests.get(url, headers = headers)
data = response.json()
data
Not using headers will return error code 429 - Too Many Requests. Using headers we are able to read the data.
Do you know that adding .json to the end of URL - converts Reddit posts into API?
Easily read API with read_json
We can also use the method read_json
and pass API URL to it. So just with one line of code: df=pd.read_json(url)
Pandas can easily read and convert most API-s to Pandas DataFrame.
import pandas as pd
url = "https://archive-api.open-meteo.com/v1/era5?latitude=52.52" +\
"&longitude=13.41&start_date=2021-01-01&end_date=2021-12-31&hourly=temperature_2m"
df=pd.read_json(url)
df.head()
The result is the same as before.
We can read more about read_json
parameters on this link: pandas.read_json. The most important ones are:
path_or_buf
- a valid JSON str, path object or file-like objectorient
- Indication of expected JSON string format- split
- records
- values
You can find many useful examples on this link: How to Read JSON Files in Pandas
Conclusion
We covered the most basic ways to read and convert API strings and streams to a Pandas DataFrame.