Series navigation
- Part 1: Deploy Recursor + Authoritative + PowerAdmin
- Part 2 (this post): Operations and troubleshooting
- Part 3: AXFR workflows and the BIND backend
What to validate after rollout
A redundant DNS setup is only useful if you can prove failure behavior. This section is intentionally practical and test-driven.
1) Validate Recursor redundancy (client perspective)
If clients are configured with both dns01 and dns02 via DHCP:
- Stop the Recursor container on dns01:
docker stop dns_recursor
- Repeat typical queries from a client and confirm lookups still work via dns02.
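The client-side check above can be scripted. The following is a minimal sketch, not part of the original setup: `check_resolver` is a hypothetical helper name, and the resolver address and query names in the usage comment are placeholders. It treats a missing reply or an empty answer as a failure.

```shell
# Hypothetical helper: query one resolver for a list of names and report failures.
# Usage: check_resolver <resolver-ip> <name>...
check_resolver() {
  resolver="$1"
  shift
  failed=0
  for name in "$@"; do
    # +short prints only the answer data. dig exits non-zero when no server
    # replies, so both the exit status and the output are checked.
    if answer="$(dig +short +time=2 +tries=1 "@${resolver}" "$name" A)" && [ -n "$answer" ]; then
      echo "ok   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}

# Example, run from a client while dns_recursor on dns01 is stopped
# (address and names are placeholders):
# check_resolver <dns02-ip> host1.site.internal example.com
```

Pointing the helper directly at dns02 confirms that the second resolver answers on its own; the client's stub resolver still decides when to fail over, so repeating ordinary lookups without `@` remains a useful second check.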
2) Validate Authoritative redundancy (internal zones)
Your design forwards internal zones to both authoritative endpoints on port 5353. Test this explicitly:
- Stop the Authoritative container on dns01:
docker stop dns_authoritative
- Query an internal name via the Recursor and confirm it still resolves via the second forwarder.
This test exercises the core design decision: the Authoritative service listens on port 5353 behind the Recursor.
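One caveat with this test: the Recursor caches answers, so a cached record can make the name resolve even when no forwarder is reachable. The sketch below wipes the cached entry before querying. It assumes that `rec_control` is usable inside the `dns_recursor` container, which depends on the image and its control socket configuration; `verify_internal_name` is a hypothetical helper name.

```shell
# Sketch: wipe the cached entry, then re-query so the answer has to come
# from a live forwarder. Arguments: <recursor-ip> <name>.
verify_internal_name() {
  recursor="$1"
  name="$2"
  # Assumption: rec_control works inside the container. Return 2 if it does not,
  # so a cache-wipe problem is not mistaken for a resolution failure.
  docker exec dns_recursor rec_control wipe-cache "$name" >/dev/null || return 2
  answer="$(dig +short +time=2 +tries=1 "@${recursor}" "$name" A)" || return 1
  [ -n "$answer" ] || return 1
  echo "$answer"
}

# Example, with dns_authoritative on dns01 stopped:
# verify_internal_name 10.26.2.53 host1.site.internal
```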
Troubleshooting workflow
When something fails, verify each hop separately.
Step 1: Check the Recursor
Query the Recursor on port 53:
dig @10.26.2.53 host1.site.internal A
Step 2: Check the Authoritative server directly (debug path)
Query the Authoritative server directly on port 5353:
dig @10.26.2.53 -p 5353 host1.site.internal A
If the direct query fails, the problem is likely in the Authoritative configuration, backend connectivity, or zone data. If the direct query works but the query via the Recursor fails, focus on forwarding and Recursor configuration.
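The decision rule above can be sketched as a small helper that runs both queries and names the hop to investigate. `diagnose_hops` is a hypothetical name, and treating an empty `+short` answer as a failure is a simplification (a name with no A record would also show up as a failure).

```shell
# Sketch of the hop-by-hop check. Arguments: <server-ip> <name>.
diagnose_hops() {
  server="$1"
  name="$2"
  # Direct query to the Authoritative server on port 5353 (debug path).
  direct="$(dig +short +time=2 +tries=1 "@${server}" -p 5353 "$name" A)" || direct=""
  # Same query through the Recursor on the default port 53.
  via_rec="$(dig +short +time=2 +tries=1 "@${server}" "$name" A)" || via_rec=""
  if [ -z "$direct" ]; then
    echo "authoritative: check configuration, backend connectivity, zone data"
  elif [ -z "$via_rec" ]; then
    echo "recursor: check forwarding and recursor configuration"
  else
    echo "ok: both hops answer"
  fi
}

# Example:
# diagnose_hops 10.26.2.53 host1.site.internal
```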
Step 3: Confirm port ownership
This stack depends on the intentional port split:
- Recursor owns host port 53.
- Authoritative owns host port 5353.
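One way to confirm the split on the Docker host is to list the listening sockets for both ports. This is a sketch assuming a Linux host with `ss` from iproute2; the helper names are hypothetical. Depending on how the ports are published, the owning process may be shown as a Docker proxy process rather than the DNS daemon itself, and the `-p` process column is only populated when run as root.

```shell
# Print listening TCP and UDP sockets on a host port.
# -H: no header, -l: listening, -n: numeric, -t/-u: TCP/UDP, -p: owning process.
listeners_on_port() {
  ss -H -lntup "sport = :$1"
}

# Report whether anything listens on each port of the intended split.
check_port_split() {
  status=0
  for port in 53 5353; do
    if [ -n "$(listeners_on_port "$port")" ]; then
      echo "port $port: listener present"
    else
      echo "port $port: nothing listening"
      status=1
    fi
  done
  return "$status"
}

# Example (on dns01 or dns02):
# check_port_split
# listeners_on_port 53
```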
DNSSEC pitfall: zone not properly signed
I ran into an issue where DNSSEC was enabled by default on a zone, and the Recursor refused to answer because the zone was not properly signed. Rectifying the zone fixed it:
pdnsutil rectify-zone <ZONE>
If internal DNS suddenly stops responding for a specific zone after enabling DNSSEC or migrating zone data, this is a fast and effective remediation step.
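Before rectifying, it can help to confirm that DNSSEC validation is really the cause. A common check is to repeat the query with dig's `+cd` (checking disabled) flag: if the normal query fails but the `+cd` query succeeds, validation is the likely culprit. The sketch below assumes the failure shows up as SERVFAIL, which may not match every setup; `is_validation_failure` is a hypothetical helper name.

```shell
# Sketch: returns 0 when the normal query is SERVFAIL but the same query with
# DNSSEC checking disabled (+cd) is NOERROR. Arguments: <recursor-ip> <name>.
is_validation_failure() {
  recursor="$1"
  name="$2"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  normal="$(dig +time=2 +tries=1 "@${recursor}" "$name" A | grep -c 'status: SERVFAIL' || true)"
  unchecked="$(dig +cd +time=2 +tries=1 "@${recursor}" "$name" A | grep -c 'status: NOERROR' || true)"
  [ "$normal" -ge 1 ] && [ "$unchecked" -ge 1 ]
}

# Example:
# is_validation_failure 10.26.2.53 host1.site.internal && echo "looks like DNSSEC validation"
```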
Operational notes
Logging
Your authoritative template enables query logging and detailed logging, which is helpful during rollout and troubleshooting. Once the setup is stable, consider reducing verbosity based on your operational needs.
Persistence and configuration management
The Compose approach in this series mounts configuration read-only and persists runtime data. This supports configuration management workflows and makes container restarts safe.
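To check that a running container actually has the mounts you expect, `docker inspect` can list each mount with its read/write flag. A minimal sketch, with `mount_modes` as a hypothetical helper name; the paths it prints depend on your Compose files.

```shell
# Print one line per mount: "<destination> rw=<true|false>".
# Argument: <container-name>.
mount_modes() {
  docker inspect --format '{{range .Mounts}}{{.Destination}} rw={{.RW}}{{"\n"}}{{end}}' "$1"
}

# Example: configuration mounts should report rw=false, data mounts rw=true.
# mount_modes dns_recursor
# mount_modes dns_authoritative
```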
Conclusion
Operations is about proving the design: redundancy via multiple forwarders, clear port separation (53 for Recursor, 5353 for Authoritative), and a small set of high-signal troubleshooting steps. The DNSSEC rectify-zone fix is the most valuable “real world” gotcha I documented from running this stack.
Next in series
Continue with Part 3: AXFR workflows and the BIND backend.
All configuration files and scripts for this post are available here: https://git.spacewars.ch/blog/public
