If you're seeing the warning:
FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a future version
this means you're using pandas' read_html() function in a way that will soon be unsupported.
Why the Change?
The pandas development team is deprecating the direct passing of literal HTML strings to read_html() (the deprecation was introduced in pandas 2.1.0). A plain string argument should now be a file path or URL; literal markup has to be wrapped in a file-like object such as StringIO.
How to Fix It
Instead of:
pd.read_html("<table>...</table>")
which raises the warning today and will raise an error in a future version:
FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a future version
You should now use:
from io import StringIO
pd.read_html(StringIO("<table>...</table>"))
Or for HTML files:
pd.read_html("path/to/file.html") # This is still valid
Use StringIO for HTML strings
import pandas as pd
from io import StringIO
html_data = """
<table>
<tr><th>name</th><th>age</th></tr>
<tr><td>Alice</td><td>25</td></tr>
<tr><td>Bob</td><td>30</td></tr>
</table>
"""
df = pd.read_html(StringIO(html_data))[0]
print(df)
Result:
    name  age
0  Alice   25
1    Bob   30
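If one call site can receive either literal HTML or a path/URL, a small wrapper can decide when StringIO is needed. The following is a minimal sketch, not part of pandas: the helper names `as_read_html_input` and `read_first_table` are hypothetical, and the "starts with `<`" check is an assumed heuristic that may not suit every input.

```python
from io import StringIO


def as_read_html_input(source: str):
    """Return a value that can be handed to pd.read_html() without the warning.

    Heuristic (an assumption of this sketch): a string whose first
    non-whitespace character is "<" is treated as literal HTML and wrapped
    in StringIO; anything else is assumed to be a path or URL and is
    returned unchanged.
    """
    if source.lstrip().startswith("<"):
        return StringIO(source)
    return source


def read_first_table(source: str):
    """Parse the first table found in `source` (literal HTML, path, or URL)."""
    # Imported lazily so the helper above stays usable without pandas installed.
    import pandas as pd

    return pd.read_html(as_read_html_input(source))[0]
```

With this in place, `read_first_table(html_data)` and `read_first_table("path/to/file.html")` go through the same code path, and only the literal string gets wrapped.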
Use requests for web content
import requests
import pandas as pd
from io import StringIO
response = requests.get('https://example.com/data.html')
# read_html returns a list of DataFrames; take the first table
df = pd.read_html(StringIO(response.text))[0]
Why This Matters
Making this change now will:
- Future-proof your code
- Remove annoying warning messages
- Ensure compatibility with upcoming pandas versions
Security is one reason given for the change: accepting arbitrary HTML strings can expose an application to vulnerabilities. By requiring an explicit file path, URL, or file-like object such as StringIO, pandas encourages safer data handling.
API clarity is another. When a single string argument can mean a path, a URL, or literal markup, it is hard to predict how the input will be interpreted and how errors will be reported. Making the caller state which one is meant keeps the API predictable and easier to maintain.
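To find call sites that still pass literal HTML, one option is to promote this specific warning to an exception while running your tests. This is a rough sketch using only the standard-library `warnings` module; `run_strict` and `legacy_call` are hypothetical names, and `legacy_call` merely imitates the pandas warning so the example runs without pandas.

```python
import warnings

# Start of the message pandas emits, copied from the warning quoted earlier.
DEPRECATION_PREFIX = "Passing literal html to 'read_html' is deprecated"


def run_strict(func, *args, **kwargs):
    """Call func with the read_html deprecation promoted to an exception."""
    with warnings.catch_warnings():
        # The message pattern is matched against the start of the warning text.
        warnings.filterwarnings(
            "error", message=DEPRECATION_PREFIX, category=FutureWarning
        )
        return func(*args, **kwargs)


def legacy_call():
    # Hypothetical stand-in for code that still passes a literal HTML string.
    warnings.warn(
        DEPRECATION_PREFIX + " and will be removed in a future version",
        FutureWarning,
    )
    return "parsed"
```

Calling `run_strict(legacy_call)` raises `FutureWarning` instead of printing it, which makes the offending line show up in a traceback. Code that already wraps its input in StringIO runs unchanged.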