When working with user input, web scraping, or data validation, you often need to verify whether a string represents a valid domain name. Python offers several approaches to accomplish this task, from simple regex patterns to specialized libraries.
In this guide, you'll learn different methods to validate domain names in Python with practical examples.
Here's a quick overview of the solutions:
(1) Using the validators library
import validators
validators.domain('example.com')
(2) Using regex patterns
import re
pattern = r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
re.match(pattern, 'example.com')
(3) Using DNS lookup with socket
import socket
socket.gethostbyname('example.com')
(4) Using custom validation function
def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))
Let's explore each method with detailed examples.
1: Using the validators library (Recommended)
The simplest and most reliable approach is using the validators library, which handles most edge cases automatically:
import validators
# Valid domain
result = validators.domain('example.com')
print(result) # True
# Invalid domain with trailing slash
result = validators.domain('example.com/')
print(result) # ValidationFailure(func=domain, ...)
# Invalid domain
result = validators.domain('not a domain!')
print(result) # ValidationFailure(func=domain, ...)
Installation:
pip install validators
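The validators library checks domain syntax against RFC-based rules, which makes it a robust choice for production use. It does not confirm that the domain exists. On success it returns True; on failure it returns a falsy failure object (named ValidationFailure in older releases and ValidationError in newer ones) rather than False, so wrap the call in bool() when you need a strict boolean.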
2: Using regex for basic validation
For a lightweight solution without external dependencies, you can use regular expressions. This approach validates the syntactic structure of domain names:
import re
def validate_domain_regex(domain):
    # Pattern for basic domain validation
    pattern = r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
    # fullmatch() rejects a trailing newline, which match() with "$" would accept
    return bool(re.fullmatch(pattern, domain))
# Test cases
print(validate_domain_regex('example.com')) # True
print(validate_domain_regex('sub.example.com')) # True
print(validate_domain_regex('example')) # False
print(validate_domain_regex('example..com')) # False
print(validate_domain_regex('-example.com')) # False
Important considerations:
- This checks syntax only, not whether the domain actually exists
- Does not validate internationalized domain names (IDN)
- May not catch all RFC edge cases
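One way to work around the IDN limitation is to convert the name to its ASCII (punycode) form before applying the regex. The sketch below uses Python's built-in idna codec, which implements the older IDNA 2003 rules; the helper names to_ascii_domain and validate_idn_domain are illustrative, not part of any library. Note that the pattern's letters-only TLD rule still rejects internationalized TLDs, whose ASCII form starts with xn--.

```python
import re

# Same basic pattern as above, compiled once and applied with fullmatch().
_ASCII_DOMAIN_RE = re.compile(
    r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}'
)

def to_ascii_domain(domain):
    """Return the ASCII (punycode) form of a domain, or None if it cannot be encoded."""
    try:
        # The built-in "idna" codec encodes each label separately.
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty or overlong labels and for disallowed characters.
        return None

def validate_idn_domain(domain):
    """Validate a possibly internationalized domain by checking its ASCII form."""
    ascii_domain = to_ascii_domain(domain)
    if ascii_domain is None:
        return False
    return bool(_ASCII_DOMAIN_RE.fullmatch(ascii_domain))
```

With this in place, validate_idn_domain('münchen.de') is checked against its xn-- encoded form rather than being rejected for the non-ASCII character.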
3: Checking domain existence with DNS lookup
To verify that a domain actually exists on the internet, you can perform a DNS lookup:
import socket
def domain_exists(domain):
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:  # socket.error is an alias of OSError; gaierror is a subclass
        return False
# Test with real domains
print(domain_exists('google.com')) # True
print(domain_exists('thisisnotarealdomain123456.com')) # False
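This method confirms that the name resolves to an IPv4 address (socket.gethostbyname does not support IPv6), so a registered domain without an A record still returns False. It also requires network access and is much slower than syntax validation.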
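Because a lookup is slow, it usually pays to reject malformed input before touching the network. The following is a minimal sketch of that ordering, not a definitive implementation: check_domain and its resolver parameter are names invented for this example, and the resolver defaults to socket.getaddrinfo, which handles both IPv4 and IPv6. Making the resolver injectable also lets you test the function without network access.

```python
import re
import socket

_DOMAIN_RE = re.compile(
    r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}'
)

def check_domain(domain, resolver=socket.getaddrinfo):
    """Return 'invalid', 'unresolved', or 'ok' for a domain string."""
    # Step 1: cheap syntax check; malformed names never reach the resolver.
    if not _DOMAIN_RE.fullmatch(domain):
        return "invalid"
    # Step 2: network lookup, only for names that passed the syntax check.
    try:
        resolver(domain, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "unresolved"
    return "ok"
```

Returning three distinct states instead of a boolean keeps "badly formed" separate from "well formed but not resolvable", which tends to matter when reporting errors back to users.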
4: Custom validation function with comprehensive rules
For complete control over validation rules, you can implement a custom function following RFC specifications:
import re
def is_valid_hostname(hostname):
    """
    Validate hostname according to RFC 1035
    - Maximum length: 255 characters
    - Labels separated by dots
    - Each label: 1-63 characters
    - Labels can contain letters, digits, hyphens
    - Labels cannot start or end with hyphen
    """
    # Reject empty input (indexing it below would raise IndexError) and overlong names
    if not hostname or len(hostname) > 255:
        return False
    # Remove trailing dot if present
    if hostname[-1] == ".":
        hostname = hostname[:-1]
    # Check each label; re.ASCII keeps \d and case-insensitive matching ASCII-only,
    # and fullmatch() rejects a trailing newline that "$" would allow
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)", re.IGNORECASE | re.ASCII)
    return all(allowed.fullmatch(x) for x in hostname.split("."))
# Test cases
print(is_valid_hostname('example.com')) # True
print(is_valid_hostname('sub.domain.example.com')) # True
print(is_valid_hostname('a' * 256)) # False (too long)
print(is_valid_hostname('-invalid.com')) # False (starts with hyphen)
print(is_valid_hostname('valid-domain.com')) # True
This function validates:
- Total hostname length (max 255 characters)
- Individual label length (1-63 characters)
- Valid characters (alphanumeric and hyphens)
- Proper hyphen placement (not at start or end)
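When a boolean is not enough, the same four rules can be reported individually. This is a sketch under the same assumptions as the function above (including its 255-character limit); hostname_problems is a hypothetical helper name, and the message wording is only illustrative.

```python
import re

# ASCII letters, digits, and hyphens only.
_LABEL_CHARS_RE = re.compile(r"[A-Za-z0-9-]+")

def hostname_problems(hostname):
    """Return a list of rule violations for a hostname (empty list if none)."""
    # A single trailing dot (fully qualified form) is allowed and ignored.
    if hostname.endswith("."):
        hostname = hostname[:-1]
    if not hostname:
        return ["hostname is empty"]
    problems = []
    if len(hostname) > 255:
        problems.append("hostname is longer than 255 characters")
    for label in hostname.split("."):
        if not label:
            problems.append("empty label (consecutive dots)")
        elif len(label) > 63:
            problems.append(f"label starting {label[:10]!r} is longer than 63 characters")
        elif not _LABEL_CHARS_RE.fullmatch(label):
            problems.append(f"label {label!r} contains invalid characters")
        elif label.startswith("-") or label.endswith("-"):
            problems.append(f"label {label!r} starts or ends with a hyphen")
    return problems
```

An empty list means the hostname passed every check, so `not hostname_problems(name)` behaves like a boolean validator.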
5: Validating URLs vs domain names
When dealing with URLs instead of pure domain names, you need to extract the domain first:
from urllib.parse import urlparse
import validators
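def validate_url_domain(url):
    """Extract and validate domain from URL"""
    try:
        parsed = urlparse(url)
        # hostname drops any port or credentials; it is None when the URL has no scheme
        domain = parsed.hostname or parsed.path.split('/')[0]
    except ValueError:
        # urlparse() raises ValueError for malformed input such as a broken IPv6 literal
        return False
    # validators.domain() returns a falsy failure object, so coerce it to a real bool
    return bool(validators.domain(domain))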
# Test with URLs
print(validate_url_domain('https://www.example.com/path')) # True
print(validate_url_domain('http://example.com')) # True
print(validate_url_domain('www.example.com')) # True
print(validate_url_domain('not-a-url')) # False
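The extraction step on its own needs only the standard library. The sketch below shows one possible approach: extract_hostname is a hypothetical helper that prepends "//" to scheme-less input so that urlsplit() treats the leading part as a network location, then reads the hostname attribute, which strips any port and credentials and lowercases the result.

```python
from urllib.parse import urlsplit

def extract_hostname(url):
    """Return the lowercase hostname from a URL-like string, or None."""
    # Without a scheme, urlsplit() treats the whole string as a path, so
    # prepend "//" to make the host part parse as a network location.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    try:
        return urlsplit(url).hostname
    except ValueError:
        # Raised for malformed input such as an unclosed IPv6 bracket.
        return None
```

The returned hostname can then be passed to whichever syntax validator you prefer from the earlier sections.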
6: Checking domain registration with WHOIS
To check if a domain is registered (not just syntactically valid), you can use the python-whois library:
import whois
def is_registered(domain):
    """Check if domain is registered using WHOIS"""
    try:
        w = whois.whois(domain)
        return bool(w.domain_name)
    except Exception:
        # Lookup or parsing failed; a bare except would also swallow Ctrl+C
        return False
# Check domain registration
print(is_registered('google.com')) # True
print(is_registered('thisisnotarealdomain99999.com')) # False
Installation:
pip install python-whois
Note: WHOIS lookups are slower and may be rate-limited by registrars.
7: Batch validation with error handling
When validating multiple domains, proper error handling is essential:
import validators
def validate_domains_batch(domains):
    """Validate multiple domains and return results"""
    results = {}
    for domain in domains:
        try:
            is_valid = validators.domain(domain)
            results[domain] = {
                'valid': bool(is_valid),
                'error': None if is_valid else 'Invalid format'
            }
        except Exception as e:
            results[domain] = {
                'valid': False,
                'error': str(e)
            }
    return results
# Test with multiple domains
domains = [
    'google.com',
    'invalid..domain',
    'sub.example.com',
    'not a domain',
    'example.co.uk'
]
results = validate_domains_batch(domains)
for domain, result in results.items():
    status = "✓" if result['valid'] else "✗"
    print(f"{status} {domain}: {result}")
Output:
✓ google.com: {'valid': True, 'error': None}
✗ invalid..domain: {'valid': False, 'error': 'Invalid format'}
✓ sub.example.com: {'valid': True, 'error': None}
✗ not a domain: {'valid': False, 'error': 'Invalid format'}
✓ example.co.uk: {'valid': True, 'error': None}
Best practices
When choosing a validation method, consider:
- Use validators library for production - It handles edge cases and follows RFC specifications
- Validate syntax before DNS lookups - Save time and resources by checking format first
- Handle internationalized domains - Use punycode conversion for IDN support
- Add timeout for network operations - DNS and WHOIS lookups can hang
- Cache validation results - Avoid repeated lookups for the same domains
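The timeout and caching points can be combined in a small wrapper. The following is a sketch under a few assumptions: resolves and resolves_cached are hypothetical names, the lookup runs in a worker thread because the system resolver has no per-call timeout, and the resolver parameter (defaulting to socket.getaddrinfo) exists so the logic can be exercised without a network.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from functools import lru_cache

# Shared worker pool. A lookup that times out keeps running in its worker
# thread until the system resolver gives up; we only stop waiting for it.
_POOL = ThreadPoolExecutor(max_workers=4)

def resolves(domain, timeout=3.0, resolver=socket.getaddrinfo):
    """Return True if `resolver` succeeds for `domain` within `timeout` seconds."""
    future = _POOL.submit(resolver, domain, None)
    try:
        future.result(timeout=timeout)
        return True
    except (FutureTimeout, OSError):
        # Timed out, or the name did not resolve (gaierror is an OSError).
        return False

@lru_cache(maxsize=1024)
def resolves_cached(domain, timeout=3.0, resolver=socket.getaddrinfo):
    """Memoized variant: repeated calls with the same arguments skip the lookup."""
    return resolves(domain, timeout, resolver)
```

Keep in mind that lru_cache never expires entries, so negative results are remembered too. For a long-running service, a cache with a time-to-live is likely a better fit.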
Conclusion
Validating domain names in Python can be accomplished through various methods depending on your requirements. The validators library offers the most comprehensive solution for syntax checking, while DNS lookups confirm actual domain existence. For custom needs, implementing RFC-compliant regex patterns provides full control over validation logic.
Choose the approach that best fits your use case, considering factors like performance, accuracy requirements, and whether you need to verify domain existence or just syntax.