What Is an Address Validation Service?
An address validation service takes raw user-entered address text, standardizes it to a canonical form, verifies it against authoritative postal data, enriches it with geocoordinates, and assigns a deliverability score. It is a critical component in e-commerce checkout, shipping systems, and CRM data hygiene pipelines. At low level it must handle abbreviation normalization, external API circuit breaking, fuzzy correction suggestions, and aggressive caching since validated addresses rarely change.
Validation Pipeline
Each raw address flows through five stages in sequence:
- Parse: tokenize the raw input into components — street number, street name, unit, city, state, zip, country.
- Standardize: normalize abbreviations (St → Street, Ave → Avenue, Apt → Apartment), uppercase state codes, zero-pad zip codes.
- Verify: call external postal verification API (SmartyStreets or USPS) to confirm the address exists in the postal authority database.
- Geocode: resolve the verified address to a latitude/longitude coordinate pair for proximity queries.
- Score deliverability: classify as residential, commercial, vacant, or active delivery point based on API response flags.
Standardization Rules
Street type abbreviations follow USPS Publication 28 standards. Common mappings:
- St → Street, Ave → Avenue, Blvd → Boulevard, Dr → Drive, Ln → Lane, Ct → Court, Rd → Road
- N/S/E/W directionals are expanded: N → North, S → South, E → East, W → West
- Unit designators: Apt → Apartment, Ste → Suite, Fl → Floor
Normalization is applied before the external API call to improve cache hit rates — two inputs that differ only in abbreviation style should map to the same cache key.
Schema
CREATE TABLE Address (
id BIGSERIAL PRIMARY KEY,
raw_input TEXT NOT NULL,
line1 VARCHAR(255),
line2 VARCHAR(128),
city VARCHAR(128),
state VARCHAR(64),
zip VARCHAR(16),
country VARCHAR(8) NOT NULL DEFAULT 'US',
lat NUMERIC(10, 7),
lng NUMERIC(10, 7),
deliverability VARCHAR(32), -- 'deliverable', 'undeliverable', 'vacant', 'unknown'
validated_at TIMESTAMPTZ
);
CREATE TABLE AddressValidationCache (
normalized_key VARCHAR(512) PRIMARY KEY,
result JSONB NOT NULL,
cached_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_address_geo ON Address USING GIST (
ll_to_earth(lat, lng)
);
External API Integration with Circuit Breaker
The SmartyStreets or USPS API is a synchronous dependency. Network failures must not cascade into checkout failures. A circuit breaker wraps the API call:
- Closed: normal operation, requests pass through.
- Open: after N consecutive failures within a window, the breaker opens and all calls fast-fail with a cached or degraded response.
- Half-open: after a timeout, one probe request is allowed; if it succeeds, the breaker closes.
In the open state, the service returns the standardized address without external verification, flagging deliverability as unknown. This is preferable to blocking checkout entirely.
Python Implementation
import hashlib
import json
import requests
from Levenshtein import distance as levenshtein_distance
from db import get_db
from circuit_breaker import CircuitBreaker
ABBREVIATION_MAP = {
'St': 'Street', 'Ave': 'Avenue', 'Blvd': 'Boulevard',
'Dr': 'Drive', 'Ln': 'Lane', 'Ct': 'Court', 'Rd': 'Road',
'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West',
'Apt': 'Apartment', 'Ste': 'Suite', 'Fl': 'Floor'
}
smarty_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
geocode_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)
CACHE_TTL_DAYS = 30
def standardize_address(parsed: dict) -> dict:
"""Expand abbreviations in street name and unit designator."""
result = dict(parsed)
tokens = result.get('street_name', '').split()
expanded = [ABBREVIATION_MAP.get(t, t) for t in tokens]
result['street_name'] = ' '.join(expanded)
if result.get('unit_designator') in ABBREVIATION_MAP:
result['unit_designator'] = ABBREVIATION_MAP[result['unit_designator']]
result['state'] = result.get('state', '').upper()
result['zip'] = result.get('zip', '').zfill(5)
return result
def _cache_key(standardized: dict) -> str:
canonical = json.dumps(standardized, sort_keys=True)
return hashlib.sha256(canonical.encode()).hexdigest()
def _get_cached(key: str) -> dict | None:
db = get_db()
row = db.execute("""
SELECT result FROM AddressValidationCache
WHERE normalized_key = %s
AND cached_at > NOW() - INTERVAL '%s days'
""", (key, CACHE_TTL_DAYS)).fetchone()
return row['result'] if row else None
def _set_cached(key: str, result: dict):
db = get_db()
db.execute("""
INSERT INTO AddressValidationCache (normalized_key, result, cached_at)
VALUES (%s, %s, NOW())
ON CONFLICT (normalized_key) DO UPDATE
SET result = EXCLUDED.result, cached_at = NOW()
""", (key, json.dumps(result)))
db.commit()
def geocode_address(standardized: dict) -> tuple[float | None, float | None]:
"""Resolve address to lat/lng via Google Maps Geocoding API."""
address_str = "{line1}, {city}, {state} {zip}".format(**standardized)
with geocode_breaker:
resp = requests.get(
'https://maps.googleapis.com/maps/api/geocode/json',
params={'address': address_str, 'key': 'GOOGLE_API_KEY'},
timeout=3
)
data = resp.json()
if data['status'] == 'OK':
loc = data['results'][0]['geometry']['location']
return loc['lat'], loc['lng']
return None, None
def validate_address(raw_input: str) -> dict:
"""Full validation pipeline: parse, standardize, verify, geocode, score."""
parsed = parse_address(raw_input) # assume external parser (usaddress lib)
standardized = standardize_address(parsed)
cache_key = _cache_key(standardized)
cached = _get_cached(cache_key)
if cached:
return cached
result = {
'line1': standardized.get('address_number', '') + ' ' + standardized.get('street_name', ''),
'line2': standardized.get('unit', ''),
'city': standardized.get('city', ''),
'state': standardized.get('state', ''),
'zip': standardized.get('zip', ''),
'country': 'US',
'deliverability': 'unknown',
'lat': None,
'lng': None
}
with smarty_breaker:
ss_resp = requests.post(
'https://us-street.api.smartystreets.com/street-address',
json=[{'street': result['line1'], 'city': result['city'],
'state': result['state'], 'zipcode': result['zip']}],
params={'auth-id': 'SS_AUTH_ID', 'auth-token': 'SS_AUTH_TOKEN'},
timeout=3
)
candidates = ss_resp.json()
if candidates:
c = candidates[0]
result['deliverability'] = c['analysis'].get('dpv_match_code', 'unknown')
result['line1'] = c['delivery_line_1']
result['zip'] = c['components']['zipcode']
lat, lng = geocode_address(standardized)
result['lat'] = lat
result['lng'] = lng
_set_cached(cache_key, result)
db = get_db()
db.execute("""
INSERT INTO Address (raw_input, line1, line2, city, state, zip, country, lat, lng, deliverability, validated_at)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, NOW())
""", (raw_input, result['line1'], result['line2'], result['city'],
result['state'], result['zip'], result['country'],
result['lat'], result['lng'], result['deliverability']))
db.commit()
return result
def fuzzy_suggest(raw_input: str, known_addresses: list[str], threshold: int = 5) -> list[str]:
"""Return known addresses within Levenshtein distance threshold of raw_input."""
return [
addr for addr in known_addresses
if levenshtein_distance(raw_input.lower(), addr.lower()) <= threshold
]
Geocoding and Proximity Queries
Latitude and longitude are stored using PostgreSQL's earthdistance extension for efficient proximity queries. A query for all addresses within 10 km of a point runs as:
SELECT * FROM Address
WHERE earth_box(ll_to_earth(37.7749, -122.4194), 10000) @> ll_to_earth(lat, lng)
AND earth_distance(ll_to_earth(37.7749, -122.4194), ll_to_earth(lat, lng)) < 10000;
Caching Strategy
Validated addresses are cached for 30 days keyed by the SHA-256 hash of the standardized address JSON. This is long relative to most cache TTLs because addresses change infrequently — streets are renamed or renumbered rarely. The cache key uses the standardized form so pre-normalization typos that resolve to the same address hit the same cache entry. Cache entries are invalidated proactively only when the postal authority issues bulk updates, which can be ingested via USPS monthly address change files.
See also: Shopify Interview Guide
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering