In this post you can find how to solve the following Pandas and Python error:
HTTPError: HTTP Error 403: Forbidden
This error occurs when we try to scrape tables from a URL with the Pandas read_html
method. For example:
import pandas as pd
url_cur = 'https://tradingeconomics.com/currencies'
pd.read_html(url_cur)[0]
This results in the error HTTPError: HTTP Error 403: Forbidden.
To solve this error we can mimic a browser: download the page with a browser-like User-Agent header and pass the HTML to Pandas.
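Headers help because, when given a URL, Pandas downloads the page with Python's standard urllib, which identifies itself with a User-Agent such as Python-urllib/3.x, and some sites refuse that client. The minimal sketch below uses only the standard library and sends no network request; it shows the default value and how a custom header is attached. The shortened BROWSER_UA string is an illustrative assumption, not a required value.

```python
import urllib.request

url_cur = "https://tradingeconomics.com/currencies"

# Default User-Agent that urllib sends when no header is supplied
opener = urllib.request.build_opener()
default_ua = dict(opener.addheaders)["User-agent"]
print(default_ua)  # something like Python-urllib/3.x

# Illustrative browser-like value (shortened; an assumption for this sketch)
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"

# Build a request carrying the custom header; nothing is sent yet
req = urllib.request.Request(url_cur, headers={"User-Agent": BROWSER_UA})
print(req.get_header("User-agent"))  # urllib stores the key as "User-agent"
```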
Solution
Below you can find how to fix the error:
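from io import StringIO

import requests
import pandas as pd

url_cur = 'https://tradingeconomics.com/currencies'
# Browser-like headers, so the site does not reject the request
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
# Download the page ourselves, then hand the HTML to Pandas
r = requests.get(url_cur, headers=header)
r.raise_for_status()  # fail early if the site still refuses the request
# Pandas 2.1+ deprecates passing a literal HTML string, so wrap it in StringIO
pd.read_html(StringIO(r.text))[0]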
Result:
| | Unnamed: 0 | Major | Price | Day | % | Weekly | Monthly | YoY | Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | EURUSD | 1.08909 | 0.0052 | 0.48% | 0.88% | 1.99% | -0.72% | Apr/03 |
| 1 | NaN | GBPUSD | 1.24050 | 0.0072 | 0.58% | 0.99% | 3.19% | -5.40% | Apr/03 |
| 2 | NaN | AUDUSD | 0.67726 | 0.0088 | 1.31% | 1.86% | 0.68% | -10.20% | Apr/03 |
| 3 | NaN | NZDUSD | 0.62818 | 0.0025 | 0.40% | 1.42% | 1.42% | -9.61% | Apr/03 |
| 4 | NaN | USDJPY | 132.53900 | 0.2510 | -0.19% | 0.74% | -2.48% | 7.95% | Apr/03 |
| 5 | NaN | USDCNY | 6.88090 | 0.0068 | 0.10% | -0.01% | -0.99% | 7.97% | Apr/03 |
| 6 | NaN | USDCHF | 0.91320 | 0.0016 | -0.17% | -0.27% | -1.88% | -1.41% | Apr/03 |
We solve the error by:
- using the requests module
- downloading the page with browser-like headers
- parsing the downloaded HTML with Pandas
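The three steps above can be wrapped in a small reusable helper. This is a sketch rather than a definitive implementation: the names BROWSER_HEADERS, fetch_html and read_tables, and the 10 second timeout, are assumptions made for illustration.

```python
from io import StringIO

import pandas as pd
import requests

# Assumed browser-like headers; any current browser User-Agent should do
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
}


def fetch_html(url, headers=None, timeout=10):
    """Download a page with browser-like headers and return its HTML text."""
    response = requests.get(url, headers=headers or BROWSER_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def read_tables(url, headers=None):
    """Return every table found on the page as a list of DataFrames."""
    return pd.read_html(StringIO(fetch_html(url, headers)))
```

With these helpers, the first table on the page would be read_tables(url_cur)[0]. Calling raise_for_status means a remaining 403 surfaces as a clear exception instead of a confusing parse error.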
Pandas authorization by user and password
Sometimes you may need to log in with a username and password. The example below shows how to use the requests
library to send such a request with HTTP Basic authentication:
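from io import StringIO

import requests
import pandas as pd

url = 'https://example.com'
username = 'your_username'
password = 'your_password'
# A (username, password) tuple makes requests send HTTP Basic credentials
response = requests.get(url, auth=(username, password))
if response.status_code == 200:
    # Parse the first table on the page
    df = pd.read_html(StringIO(response.text))[0]
    print(df.head())
else:
    print(f'Request failed with status code {response.status_code}')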
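To see what auth=(username, password) actually sends, a request can be prepared locally without contacting the server. This sketch assumes the site uses HTTP Basic authentication. It also adds a browser-like User-Agent (an illustrative value), since a protected site may reject non-browser clients as well.

```python
import base64

import requests

url = "https://example.com"
username = "your_username"
password = "your_password"

# Build the request locally; prepare() does not send anything
prepared = requests.Request(
    "GET",
    url,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},  # illustrative value
    auth=(username, password),
).prepare()

# Basic auth is "username:password", base64-encoded, in a single header
expected = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
print(prepared.headers["Authorization"] == expected)
print(prepared.headers["User-Agent"])
```

Because Basic credentials are only encoded, not encrypted, they should be sent over HTTPS.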