Internal system names, providers, and exact numbers have been abstracted or generalized for confidentiality — the architecture patterns and trade-offs described are accurate.
Context
Brazilian tax authorities published a normative instruction that introduces alphanumeric tax identifiers (CNPJs): letters start appearing in identifiers that, for the entire history of the platform, had been treated as strictly numeric.
The platform was built on the implicit assumption that these identifiers are digits. That assumption was everywhere:
- API serializers stripping non-digit characters
- Backend validation regexes
- Django admin widgets and masks
- Frontend input masks
- Check-digit calculation routines
- Search fields stored as bigint
- Document type configs keyed by identifier shape
Ignoring the change wasn't an option: once the new identifiers start being issued, any company registered with an alphanumeric one would be invisible to the system.
The work was a stack-wide audit followed by a coordinated rollout. Every place that touched the identifier had to keep working for both shapes (legacy numeric and new alphanumeric) without flipping a global switch that could break in-flight operations.
Architecture
                    Alphanumeric tax-id support
                                 │
        ┌────────────────────────┼────────────────────────────┐
        ▼                        ▼                            ▼
     Backend                   Admin                      Frontend
                                                              │
 - DV validation          - Input widget                      ├─► Form masks
   (numeric + alpha)      - Search lookup                     ├─► Paste handlers
 - Format check           - Display formatting                ├─► Validation
 - API serializers        - Filter logic                      └─► Capture-phase
 - Document type configs                                          coercion
Shared validation helpers
The first piece was a small core utility module: validate_cnpj_dv, validate_cpf_dv, validate_cpf_or_cnpj. Every caller (API views, admin, batch jobs, lote ingestion) moved onto the same helpers, with full test coverage of the alphanumeric DV (check-digit) math. No more local regex variants per app.
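As an illustration of what a shape-agnostic check-digit helper can look like, here is a minimal sketch. Only the name validate_cnpj_dv comes from the text above; the body, the private helpers, and the rejection of single-character sequences are assumptions. The character mapping (ASCII code minus 48) and the modulo-11 weights follow the published rule for the alphanumeric format as commonly described, so verify them against the official specification before relying on this.

```python
import re

# Assumed canonical form: 12 base characters (digits or A-Z) followed by
# 2 numeric check digits, already stripped of punctuation and upper-cased.
_CNPJ_RE = re.compile(r"[0-9A-Z]{12}[0-9]{2}")


def _char_value(ch: str) -> int:
    # ASCII code minus 48: '0'-'9' map to 0-9, 'A' maps to 17, 'Z' to 42.
    # Legacy numeric identifiers therefore produce the same sums as before.
    return ord(ch) - 48


def _check_digit(chars: str) -> int:
    # Weights run 2..9 from right to left and then wrap back to 2.
    total = sum(
        _char_value(ch) * (2 + i % 8)
        for i, ch in enumerate(reversed(chars))
    )
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def validate_cnpj_dv(cnpj: str) -> bool:
    """Return True if the two trailing check digits match the base."""
    if not _CNPJ_RE.fullmatch(cnpj):
        return False
    # Assumption: reject degenerate inputs such as "00000000000000",
    # which pass the arithmetic but are conventionally treated as invalid.
    if len(set(cnpj)) == 1:
        return False
    base = cnpj[:12]
    dv1 = _check_digit(base)
    dv2 = _check_digit(base + str(dv1))
    return cnpj[12:] == f"{dv1}{dv2}"
```

Because digits keep their face value under this mapping, the same function covers both shapes: validate_cnpj_dv("11222333000181") and validate_cnpj_dv("12ABC34501DE35") both return True, while changing the last digit of either makes the check fail.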
API surface
The API serializers and views started accepting alphanumeric input alongside numeric. The Swagger spec was updated with explicit alphanumeric examples so consumers couldn't claim ignorance. The DV check became part of the API-level validation, so malformed identifiers fail at the boundary rather than reaching downstream services.
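The boundary change is easiest to see in the normalization step. The sketch below is an assumption about how such a step could look, not the platform's serializer code; normalize_tax_id and both regexes are hypothetical names. It removes only known formatting characters and rejects anything else, where a digits-only cleanup would silently drop letters.

```python
import re

# Formatting characters that are safe to drop: dots, dashes, slashes, spaces.
_FORMATTING = re.compile(r"[.\-/\s]")
# After cleanup, only digits and upper-case ASCII letters may remain.
_ALLOWED = re.compile(r"[0-9A-Z]+")


def normalize_tax_id(raw: str) -> str:
    """Strip mask punctuation and upper-case, keeping letters intact."""
    cleaned = _FORMATTING.sub("", raw).upper()
    if not _ALLOWED.fullmatch(cleaned):
        # Fail loudly at the boundary instead of silently dropping characters.
        raise ValueError(f"unexpected characters in tax identifier: {raw!r}")
    return cleaned


# The legacy approach, shown for contrast: letters are lost without any error.
legacy = re.sub(r"\D", "", "12.ABC.345/01DE-35")   # "123450135"
fixed = normalize_tax_id("12.abc.345/01de-35")     # "12ABC34501DE35"
```

Running the check-digit validation on the normalized value, rather than on the raw input, keeps the two concerns separate: one function decides what counts as formatting, the other decides whether the identifier is valid.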
Admin
The Django admin had its own mask layer in JavaScript. The fix accepted alphanumeric input on the admin search, on filters, and on free-form input fields — keeping the experience identical for operators who still type numeric identifiers most of the day, but no longer rejecting valid alphanumeric ones.
Frontend masks (the long tail)
The frontend was where most of the iceberg lived. Masks were duplicated across pages — search bars, due diligence forms, asset evaluation, purchasing flows, RTD lookup, lote ingestion. The work had two halves:
- Extract a shared partial. The CPF/CNPJ mask handlers, previously copy-pasted, became a single partial with consistent paste, keydown, and capture-phase behavior.
- Make it alpha-aware end-to-end. When the user starts typing letters, the mask switches modes (instead of stripping them). Paste handling accepts alphanumeric strings without losing characters. Ctrl+A and selection are preserved through the mask logic.
A pile of bugs surfaced during this — letter-in-CPF-field crashes, CNPJ paste in custom due-diligence fields, lookup widgets that silently dropped letters. Each got its own targeted fix on top of the shared partial.
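The mode switch itself is a small pure function once it is separated from the event handling. The sketch below is written in Python only to stay consistent with the other examples here; the real handlers run in the browser, and apply_mask, the mask strings, and the rule "a letter or more than 11 characters means CNPJ" are all assumptions made for illustration.

```python
import re

CPF_MASK = "###.###.###-##"
CNPJ_MASK = "##.###.###/####-##"


def apply_mask(raw: str) -> str:
    """Format partial input, switching to the CNPJ mask when letters appear."""
    # Keep letters and digits; drop whatever punctuation was typed or pasted.
    chars = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    # Assumption: CPFs stay numeric, so any letter implies a CNPJ.
    use_cnpj = len(chars) > 11 or any(c.isalpha() for c in chars)
    mask = CNPJ_MASK if use_cnpj else CPF_MASK

    out = []
    i = 0
    for slot in mask:
        if i >= len(chars):
            break  # stop before emitting a trailing separator
        if slot == "#":
            out.append(chars[i])
            i += 1
        else:
            out.append(slot)
    return "".join(out)
```

With this shape, apply_mask("12345678909") gives "123.456.789-09", apply_mask("12A") gives "12.A", and a pasted "12abc34501de35" becomes "12.ABC.345/01DE-35" without losing characters. Keeping the function free of DOM access is a design choice that lets the same cases be unit-tested outside the browser.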
Per-product validation
The change rippled into product code: search for documents by CNPJ, asset evaluation flows that previously bailed on non-digits, lote spreadsheet ingestion that tokenized identifiers. Each of those got the shared validator pulled in, replacing local heuristics.
Trade-offs
Shared helpers, not per-app validation. Moving every app onto the same helpers was upfront work — auditing call sites, removing local copies. The benefit is one place to fix the next regulatory tweak, and a guarantee that the DV math matches across the platform. The cost was a few weeks of small, scattered refactor PRs.
Coexistence, not big-bang migration. The platform accepts both numeric and alphanumeric simultaneously. That meant every code path had to handle two shapes — more branches, more tests. The alternative — a flag day where the platform flips to alphanumeric-only — was a non-starter: tax authorities were issuing both shapes in parallel.
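To make the two-shape requirement concrete, here is a minimal sketch of a shape classifier, assuming identifiers arrive already normalized (no punctuation, upper case). CnpjShape and classify_cnpj are hypothetical names, not the platform's.

```python
import re
from enum import Enum
from typing import Optional


class CnpjShape(Enum):
    LEGACY_NUMERIC = "legacy_numeric"
    ALPHANUMERIC = "alphanumeric"


_NUMERIC = re.compile(r"[0-9]{14}")
# Assumed layout: letters allowed in the first 12 positions,
# check digits in the last 2 positions stay numeric.
_ALPHANUMERIC = re.compile(r"[0-9A-Z]{12}[0-9]{2}")


def classify_cnpj(value: str) -> Optional[CnpjShape]:
    """Return the identifier's shape, or None if it matches neither."""
    # Order matters: every all-digit value also matches the wider pattern.
    if _NUMERIC.fullmatch(value):
        return CnpjShape.LEGACY_NUMERIC
    if _ALPHANUMERIC.fullmatch(value):
        return CnpjShape.ALPHANUMERIC
    return None
```

Only values classified as LEGACY_NUMERIC survive an integer cast, which is why bigint-backed search fields were on the audit list: code paths that still need the numeric form can branch on the shape instead of assuming it.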
Capture-phase mask switching, not blur-time validation. Catching the letter-typed event at the capture phase is more fragile than validating on form submit (different browsers, IME edge cases). The benefit is that the input field never enters a broken intermediate state — the moment the user types a letter, the mask is already alpha-aware. The submit-time path is still there as the last line of defense.
Front + back validation, not back-only. Duplicating the validator in JavaScript means the rules can drift. The benefit was eliminating an entire class of round-trip errors during ingestion — operators caught typos in the form instead of in a webhook callback two hours later.
Outcome
- Compliance. The platform accepts identifiers in the new alphanumeric format across every entry point.
- Consolidated validation. One pair of shared utilities (*_dv, *_or_*) backs API, admin, batch ingestion, and frontend. Fixing the next regulatory tweak is one PR, not twenty.
- Frontend mask consolidation. A historical drift across pages collapsed into a single partial that every form imports.
- Defense in depth. Identifiers get validated at the form, at the API boundary, and in the persistence layer — typos surface at the closest moment to the user, not in downstream services.