How to Validate Domain Names in Pandas and Python

When working with user input, web scraping, or data validation, you often need to verify whether a string represents a valid domain name. Python offers several approaches to accomplish this task, from simple regex patterns to specialized libraries.

In this guide, you'll learn different methods to validate domain names in Python with practical examples.

Here's a quick overview of the solutions:

(1) Using the validators library

import validators
validators.domain('example.com')

(2) Using regex patterns

import re
pattern = r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
re.fullmatch(pattern, 'example.com')

(3) Using DNS lookup with socket

import socket
socket.gethostbyname('example.com')

(4) Using custom validation function

def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

Let's explore each method with detailed examples.

1: Using the validators library (Recommended)

The simplest and most reliable approach is using the validators library, which handles most edge cases automatically:

import validators

# Valid domain
result = validators.domain('example.com')
print(result)  # True

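# Invalid domain with trailing slash
# The result is a falsy failure object (named ValidationFailure in older
# releases and ValidationError in newer ones), not the value False
result = validators.domain('example.com/')
print(result)  # ValidationFailure(func=domain, ...)

# Invalid domain
result = validators.domain('not a domain!')
print(result)  # ValidationFailure(func=domain, ...)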

Installation:

pip install validators

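The validators library checks domain syntax against RFC rules and handles many edge cases, which makes it a solid default for production use. It validates syntax only and does not confirm that the domain exists. An invalid input produces a falsy failure object rather than False, so wrap the result in bool() when you need a strict boolean.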

2: Using regex for basic validation

For a lightweight solution without external dependencies, you can use regular expressions. This approach validates the syntactic structure of domain names:

import re

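def validate_domain_regex(domain):
    # Pattern for basic domain validation: dot-separated labels of 1-63
    # letters, digits, or hyphens, followed by an alphabetic TLD
    pattern = r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
    # fullmatch, unlike match with $, also rejects a trailing newline
    return bool(re.fullmatch(pattern, domain))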

# Test cases
print(validate_domain_regex('example.com'))        # True
print(validate_domain_regex('sub.example.com'))    # True
print(validate_domain_regex('example'))            # False
print(validate_domain_regex('example..com'))       # False
print(validate_domain_regex('-example.com'))       # False

Important considerations:

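This checks syntax only, not whether the domain actually exists
Does not validate internationalized domain names (IDN) unless they are first converted to punycode
Does not enforce the overall length limit, and rejects single-label names such as localhost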
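
The IDN limitation can be worked around by converting the name to its ASCII (punycode) form before applying the pattern. The sketch below uses Python's built-in idna codec, which implements the older IDNA 2003 rules. The helper names to_ascii_domain and validate_idn_domain are illustrative and not part of any library. Note that the pattern's alphabetic TLD rule still rejects punycode TLDs (those starting with xn--).

```python
import re

# Same pattern as in the regex method; the helper names are hypothetical
DOMAIN_PATTERN = re.compile(
    r'^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
)

def to_ascii_domain(domain):
    """Return the punycode (ASCII) form of a domain, or None if it cannot be encoded."""
    try:
        return domain.encode('idna').decode('ascii')
    except UnicodeError:
        # Raised for empty or over-long labels and for disallowed characters
        return None

def validate_idn_domain(domain):
    """Convert to punycode first, then apply the syntax pattern."""
    ascii_domain = to_ascii_domain(domain)
    return ascii_domain is not None and bool(DOMAIN_PATTERN.fullmatch(ascii_domain))

print(to_ascii_domain('münchen.de'))        # xn--mnchen-3ya.de
print(validate_idn_domain('münchen.de'))    # True
print(validate_idn_domain('example..com'))  # False
```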

3: Checking domain existence with DNS lookup

To verify that a domain actually exists on the internet, you can perform a DNS lookup:

import socket

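def domain_exists(domain):
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):
        # OSError covers socket.gaierror (lookup failed); UnicodeError is
        # raised for names with empty or over-long labels
        return False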

# Test with real domains
print(domain_exists('google.com'))           # True
print(domain_exists('thisisnotarealdomain123456.com'))  # False

This method confirms that the domain resolves to an IPv4 address. It requires a network connection, is slower than syntax validation, and can return False for a registered domain that has no address record.
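
socket.gethostbyname only looks up IPv4 addresses. As a sketch of a broader check, the variant below uses socket.getaddrinfo, which can also return IPv6 results. The function name domain_resolves and its resolver parameter are illustrative; the parameter exists only so the lookup can be swapped out and the example can run offline.

```python
import socket

def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if the name resolves to at least one IPv4 or IPv6 address."""
    try:
        # Port is None because only name resolution matters here
        return len(resolver(domain, None)) > 0
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names the idna codec rejects (empty or over-long labels)
        return False

# Offline demonstration with stand-in resolvers (hypothetical, no network needed)
def fake_found(host, port):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', ('192.0.2.1', 0))]

def fake_missing(host, port):
    raise socket.gaierror(socket.EAI_NONAME, 'Name or service not known')

print(domain_resolves('example.com', resolver=fake_found))    # True
print(domain_resolves('example.com', resolver=fake_missing))  # False
```

In real use, call domain_resolves('example.com') without the resolver argument to perform an actual lookup.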

4: Custom validation function with comprehensive rules

For complete control over validation rules, you can implement a custom function following RFC specifications:

import re

def is_valid_hostname(hostname):
    """
    Validate hostname according to RFC 1035
    - Maximum length: 255 characters
    - Labels separated by dots
    - Each label: 1-63 characters
    - Labels can contain letters, digits, hyphens
    - Labels cannot start or end with hyphen
    """
    # Reject empty input up front; indexing it would raise IndexError
    if not hostname or len(hostname) > 255:
        return False

    # Remove trailing dot if present
    if hostname.endswith("."):
        hostname = hostname[:-1]

    # Check each label; fullmatch also rejects a trailing newline
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)", re.IGNORECASE)
    return all(allowed.fullmatch(x) for x in hostname.split("."))

# Test cases
print(is_valid_hostname('example.com'))           # True
print(is_valid_hostname('sub.domain.example.com')) # True
print(is_valid_hostname('a' * 256))               # False (too long)
print(is_valid_hostname('-invalid.com'))          # False (starts with hyphen)
print(is_valid_hostname('valid-domain.com'))      # True

This function validates:

Total hostname length (max 255 characters)
Individual label length (1-63 characters)
Valid characters (alphanumeric and hyphens)
Proper hyphen placement (not at start or end)
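
To report which of these rules a hostname breaks, the same checks can be unrolled into a function that returns a reason instead of a boolean. This is a sketch under that assumption; hostname_problem is a hypothetical name and the messages are illustrative.

```python
import re

LABEL_CHARS = re.compile(r"[A-Za-z0-9-]+")

def hostname_problem(hostname):
    """Return None if the hostname passes every rule, else a short reason."""
    if not hostname:
        return "empty hostname"
    if len(hostname) > 255:
        return "hostname longer than 255 characters"
    if hostname.endswith("."):
        hostname = hostname[:-1]  # a single trailing dot is allowed
    for label in hostname.split("."):
        if not 1 <= len(label) <= 63:
            return f"label {label!r} must be 1-63 characters"
        if not LABEL_CHARS.fullmatch(label):
            return f"label {label!r} has invalid characters"
        if label.startswith("-") or label.endswith("-"):
            return f"label {label!r} starts or ends with a hyphen"
    return None

print(hostname_problem("example.com"))   # None
print(hostname_problem("-invalid.com"))  # label '-invalid' starts or ends with a hyphen
print(hostname_problem("exa mple.com"))  # label 'exa mple' has invalid characters
print(hostname_problem("example..com"))  # label '' must be 1-63 characters
```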

5: Validating URLs vs domain names

When dealing with URLs instead of pure domain names, you need to extract the domain first:

from urllib.parse import urlparse
import validators

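def validate_url_domain(url):
    """Extract and validate domain from URL"""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a bad IPv6 netloc
        return False
    # hostname drops any port or credentials; fall back to the first path
    # segment for scheme-less input such as 'www.example.com'
    domain = parsed.hostname or parsed.path.split('/')[0]
    # bool() turns the library's failure object into a plain False
    return bool(validators.domain(domain))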

# Test with URLs
print(validate_url_domain('https://www.example.com/path'))  # True
print(validate_url_domain('http://example.com'))            # True
print(validate_url_domain('www.example.com'))               # True
print(validate_url_domain('not-a-url'))                     # False

6: Checking domain registration with WHOIS

To check if a domain is registered (not just syntactically valid), you can use the python-whois library:

import whois

def is_registered(domain):
    """Check if domain is registered using WHOIS"""
    try:
        w = whois.whois(domain)
        return bool(w.domain_name)
    except Exception:
        # Treat lookup errors and "no match" responses as not registered
        return False

# Check domain registration
print(is_registered('google.com'))      # True
print(is_registered('thisisnotarealdomain99999.com'))  # False

Installation:

pip install python-whois

Note: WHOIS lookups are slower than DNS lookups and may be rate-limited by WHOIS servers.

7: Batch validation with error handling

When validating multiple domains, proper error handling is essential:

import validators

def validate_domains_batch(domains):
    """Validate multiple domains and return results"""
    results = {}
    
    for domain in domains:
        try:
            is_valid = validators.domain(domain)
            results[domain] = {
                'valid': bool(is_valid),
                'error': None if is_valid else 'Invalid format'
            }
        except Exception as e:
            results[domain] = {
                'valid': False,
                'error': str(e)
            }
    
    return results

# Test with multiple domains
domains = [
    'google.com',
    'invalid..domain',
    'sub.example.com',
    'not a domain',
    'example.co.uk'
]

results = validate_domains_batch(domains)
for domain, result in results.items():
    status = "✓" if result['valid'] else "✗"
    print(f"{status} {domain}: {result}")

Output:

✓ google.com: {'valid': True, 'error': None}
✗ invalid..domain: {'valid': False, 'error': 'Invalid format'}
✓ sub.example.com: {'valid': True, 'error': None}
✗ not a domain: {'valid': False, 'error': 'Invalid format'}
✓ example.co.uk: {'valid': True, 'error': None}
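
For domains stored in a pandas DataFrame, the same kind of check can be applied to a whole column. The sketch below is one possible way to wire this up, not a prescribed method: it reuses the regex from method 2 with Series.str.fullmatch rather than the validators library, and the column names domain and valid are illustrative.

```python
import pandas as pd

# Regex from method 2; str.fullmatch requires the whole string to match
PATTERN = r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}'

df = pd.DataFrame({'domain': [
    'google.com', 'invalid..domain', 'sub.example.com', 'not a domain', None
]})

# na=False marks missing values as invalid instead of propagating NaN
df['valid'] = df['domain'].str.fullmatch(PATTERN, na=False)

print(df[df['valid']]['domain'].tolist())  # ['google.com', 'sub.example.com']
```

The vectorized string method avoids a Python-level loop over the rows, and the boolean column can be used directly as a filter mask.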

Best practices

When choosing a validation method, consider:

Use the validators library for production - It handles edge cases and follows RFC specifications
Validate syntax before DNS lookups - Save time and resources by checking format first
Handle internationalized domains - Use punycode conversion for IDN support
Add timeout for network operations - DNS and WHOIS lookups can hang
Cache validation results - Avoid repeated lookups for the same domains
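
Several of these practices can be combined in one helper. The following is a minimal sketch, not a definitive implementation: it checks syntax first, caches results with functools.lru_cache, and bounds the lookup time by running it in a worker thread, because the standard socket lookup functions take no timeout argument. The names check_domain and SYNTAX and the resolver parameter are hypothetical; the parameter exists so the example can run without a network.

```python
import re
import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from functools import lru_cache

SYNTAX = re.compile(r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}')
_pool = ThreadPoolExecutor(max_workers=4)  # shared worker threads for lookups

@lru_cache(maxsize=1024)
def check_domain(domain, timeout=3.0, resolver=socket.getaddrinfo):
    """Syntax check first, then a cached, time-bounded DNS lookup."""
    if not SYNTAX.fullmatch(domain):
        return False  # skip the network entirely for malformed input
    future = _pool.submit(resolver, domain, None)
    try:
        future.result(timeout=timeout)  # wait at most `timeout` seconds
        return True
    except (FutureTimeout, OSError, UnicodeError):
        # Timed out, failed to resolve, or rejected by the idna codec
        return False

# Offline demonstration with a stand-in resolver (no network needed)
def fake_resolver(host, port):
    return [("stub-address",)]

print(check_domain("example.com", resolver=fake_resolver))  # True
print(check_domain("bad..domain", resolver=fake_resolver))  # False
```

Two trade-offs are worth noting. A timed-out lookup keeps running in its worker thread until the resolver gives up, and the cache also stores negative results, so a temporary DNS failure is remembered until check_domain.cache_clear() is called.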

Conclusion

Validating domain names in Python can be accomplished through various methods depending on your requirements. The validators library offers the most comprehensive solution for syntax checking, while DNS lookups confirm actual domain existence. For custom needs, implementing RFC-compliant regex patterns provides full control over validation logic.

Choose the approach that best fits your use case, considering factors like performance, accuracy requirements, and whether you need to verify domain existence or just syntax.
