Operations and troubleshooting

Series navigation

Part 1: Deploy Recursor + Authoritative + PowerAdmin
Part 2 (this post): Operations and troubleshooting
Part 3: AXFR workflows and the BIND backend

What to validate after rollout

A redundant DNS setup is only useful if you can prove how it behaves when a component fails. This section is intentionally practical and test-driven.

1) Validate Recursor redundancy (client perspective)

If clients receive both dns01 and dns02 as DNS servers via DHCP:

Stop the Recursor container on dns01:

docker stop dns_recursor

Repeat typical queries from a client and confirm lookups still work via dns02.
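The client-side check can be made repeatable with a small Bash helper. This is a minimal sketch, assuming `dig` is installed on the client. `has_answer` and `check_resolvers` are hypothetical names, and dns02's address is not given in this post, so substitute your own.

```shell
# has_answer SERVER PORT NAME [TYPE]
# Succeeds if SERVER:PORT returns at least one answer record for NAME.
# An empty answer (e.g. SERVFAIL or NXDOMAIN) or a ';;' error line counts as failure;
# the short timeout keeps a stopped container from stalling the check.
has_answer() {
  local out
  out=$(dig +short +time=3 +tries=1 -p "$2" "@$1" "$3" "${4:-A}") || return 1
  [ -n "$out" ] && ! printf '%s\n' "$out" | grep -q '^;;'
}

# check_resolvers NAME SERVER...  (hypothetical helper)
# Queries NAME against each resolver on port 53 and prints OK/FAIL per server.
# Returns non-zero if at least one resolver failed.
check_resolvers() {
  local name=$1 server failed=0
  shift
  for server in "$@"; do
    if has_answer "$server" 53 "$name"; then
      echo "OK   $server $name"
    else
      echo "FAIL $server $name"
      failed=1
    fi
  done
  return "$failed"
}
```

Assuming 10.26.2.53 is dns01, running `check_resolvers host1.site.internal 10.26.2.53 <dns02-address>` while `dns_recursor` is stopped should report FAIL for dns01 and OK for dns02. Bring the Recursor back afterwards with `docker start dns_recursor`.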

2) Validate Authoritative redundancy (internal zones)

Your design has the Recursor forward internal zones to both Authoritative endpoints on port 5353. Test this explicitly:

Stop the Authoritative container on dns01:

docker stop dns_authoritative

Query an internal name via the Recursor and confirm it still resolves via the second forwarder.
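The same idea works for the Authoritative failover test. This is a minimal sketch, assuming it runs on (or can reach) dns01 while `dns_authoritative` is stopped there; `verify_auth_failover` is a hypothetical name. It first confirms that dns01:5353 really is down, so the test cannot pass by accident, and then queries the Recursor on dns01:53.

```shell
# has_answer SERVER PORT NAME [TYPE]
# Succeeds if SERVER:PORT returns at least one answer record for NAME.
# An empty answer or a ';;' error line from dig counts as failure.
has_answer() {
  local out
  out=$(dig +short +time=3 +tries=1 -p "$2" "@$1" "$3" "${4:-A}") || return 1
  [ -n "$out" ] && ! printf '%s\n' "$out" | grep -q '^;;'
}

# verify_auth_failover HOST NAME  (hypothetical helper)
# Run while dns_authoritative is stopped on HOST.
verify_auth_failover() {
  local host=$1 name=$2
  # The local Authoritative on 5353 should no longer answer.
  if has_answer "$host" 5353 "$name"; then
    echo "WARN: $host:5353 still answers; is dns_authoritative really stopped?"
  else
    echo "OK:   $host:5353 is down, as expected"
  fi
  # The Recursor on 53 should still resolve NAME via the remaining forwarder.
  if has_answer "$host" 53 "$name"; then
    echo "OK:   Recursor on $host:53 resolved $name"
  else
    echo "FAIL: Recursor on $host:53 could not resolve $name"
    return 1
  fi
}
```

A cached answer would let the test pass without exercising the second forwarder, so use a name you have not queried recently, or flush the zone first, for example with `docker exec dns_recursor rec_control wipe-cache 'site.internal$'` (assuming `rec_control` is available in the Recursor container). Then run `verify_auth_failover 10.26.2.53 host1.site.internal`.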

This test exercises the core design decision that the Authoritative service listens on port 5353 behind the Recursor.

Troubleshooting workflow

When something fails, verify each hop separately.

Step 1: Check the Recursor

Query the Recursor on port 53:

dig @10.26.2.53 host1.site.internal A

Step 2: Check the Authoritative server directly (debug path)

Query the Authoritative server directly on port 5353:

dig @10.26.2.53 -p 5353 host1.site.internal A

If the direct query fails, the problem is likely in Authoritative configuration, backend connectivity, or zone data. If the direct query works but the Recursor query fails, focus on forwarding and Recursor configuration.
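This two-step decision rule can be scripted so both hops are checked in one go. This is a minimal sketch, assuming the Recursor and Authoritative server share a host address as in the examples above; `diagnose` is a hypothetical name. The "both fail" and "only the Recursor fails" messages restate the guidance in this section; the other two are suggestions under the same design assumptions.

```shell
# has_answer SERVER PORT NAME [TYPE]
# Succeeds if SERVER:PORT returns at least one answer record for NAME.
# An empty answer or a ';;' error line from dig counts as failure.
has_answer() {
  local out
  out=$(dig +short +time=3 +tries=1 -p "$2" "@$1" "$3" "${4:-A}") || return 1
  [ -n "$out" ] && ! printf '%s\n' "$out" | grep -q '^;;'
}

# diagnose HOST NAME [TYPE]  (hypothetical helper)
# Queries the Recursor (HOST:53) and the Authoritative server (HOST:5353)
# separately and prints which hop to investigate.
diagnose() {
  local host=$1 name=$2 type=${3:-A} rec=fail auth=fail
  if has_answer "$host" 53 "$name" "$type"; then rec=ok; fi
  if has_answer "$host" 5353 "$name" "$type"; then auth=ok; fi
  echo "recursor:53=$rec authoritative:5353=$auth"
  case "$rec/$auth" in
    fail/fail) echo "Direct query fails: check Authoritative configuration, backend connectivity, or zone data." ;;
    fail/ok)   echo "Direct query works but the Recursor fails: check forwarding and Recursor configuration." ;;
    ok/fail)   echo "Recursor answers without the local Authoritative: it may be using its cache or the other forwarder; check dns_authoritative here." ;;
    ok/ok)     echo "Both hops answer from here: check which DNS servers the failing client actually uses." ;;
  esac
}
```

For example, `diagnose 10.26.2.53 host1.site.internal` runs both queries from this section and prints a one-line verdict.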

Step 3: Confirm port ownership

This stack depends on the intentional port split:

Recursor owns host port 53.
Authoritative owns host port 5353.
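One way to confirm port ownership on each DNS host is to ask the kernel which processes hold the two ports. This is a minimal sketch, assuming iproute2's `ss` is available and the shell runs as root (without root, `ss -p` cannot show other users' processes); `port_owners` is a hypothetical name.

```shell
# port_owners PORT  (hypothetical helper)
# Prints the distinct process names listening on TCP or UDP PORT on this host.
port_owners() {
  # -l listening, -n numeric, -t/-u TCP and UDP, -p owning process.
  # The filter matches the local (source) port exactly, so :53 does not match :5353.
  ss -lntup "sport = :$1" | grep -o '"[^"]*",pid=' | cut -d'"' -f2 | sort -u
}
```

Run `port_owners 53` and `port_owners 5353`. With host networking you would expect the PowerDNS processes themselves (something like `pdns_recursor` and `pdns_server`); with published ports, Docker's `docker-proxy` may appear instead (or nothing, if the userland proxy is disabled), so cross-check with `docker ps --format '{{.Names}}\t{{.Ports}}'`. Any unrelated process on either port breaks the split.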

DNSSEC pitfall: zone not properly signed

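I ran into a DNSSEC issue where DNSSEC was enabled by default on a zone that was not properly signed, so the Recursor refused to answer queries for it. Rectifying the zone, which recalculates the ordername and auth fields the Authoritative server uses to build correct DNSSEC answers, fixed the problem: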

pdnsutil rectify-zone <ZONE>

If internal DNS suddenly stops responding for a specific zone after enabling DNSSEC or migrating zone data, this is a fast and effective remediation step [1].
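Because `pdnsutil` reads the Authoritative server's configuration and talks to its backend, in this Docker setup it runs inside the Authoritative container. This is a minimal sketch, assuming the containers are named `dns_authoritative` and `dns_recursor` as above and that the Recursor image ships `rec_control`; `rectify_and_check` is a hypothetical name. It checks the zone, rectifies it, checks again, and then flushes the zone from the Recursor's cache so a cached failure is not served after the fix.

```shell
# rectify_and_check ZONE  (hypothetical helper)
# Container names default to the ones used in this series and can be overridden.
rectify_and_check() {
  local zone=$1
  local auth=${AUTH_CONTAINER:-dns_authoritative} rec=${REC_CONTAINER:-dns_recursor}

  echo "== check-zone (before)"
  docker exec "$auth" pdnsutil check-zone "$zone" || true   # report problems, keep going

  echo "== rectify-zone"
  docker exec "$auth" pdnsutil rectify-zone "$zone" || return 1

  echo "== check-zone (after)"
  docker exec "$auth" pdnsutil check-zone "$zone" || return 1

  # A trailing '$' tells rec_control to wipe every cached name at or below ZONE.
  docker exec "$rec" rec_control wipe-cache "${zone}\$"
}
```

For example, `rectify_and_check site.internal`. If many zones are affected, `pdnsutil rectify-all-zones` is also available.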

Operational notes

Logging

Your Authoritative template enables query logging and detailed logging, which helps during rollout and troubleshooting [2]. Once the setup is stable, consider reducing verbosity to match your operational needs, since per-query logs grow quickly on a busy server.
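Since the container only sees a read-only mount, verbosity is changed in the host copy of the template. This is a minimal sketch, assuming the template uses the standard Authoritative settings `log-dns-queries` and `log-dns-details` (plus `loglevel`) in `key=value` form; the function names and the config path you pass in are hypothetical.

```shell
# show_log_settings CONF  (hypothetical helper)
# Prints the logging-related lines from a pdns.conf-style file.
show_log_settings() {
  grep -E '^[[:space:]]*(log-dns-queries|log-dns-details|loglevel)[[:space:]]*=' "$1"
}

# quiet_logging CONF  (hypothetical helper)
# Sets log-dns-queries and log-dns-details to "no" and leaves loglevel untouched.
# sed keeps a backup of the original file as CONF.bak.
quiet_logging() {
  sed -i.bak -E 's/^([[:space:]]*log-dns-(queries|details)[[:space:]]*=).*/\1no/' "$1"
}
```

After editing, restart the container so it rereads the file, for example with `docker restart dns_authoritative`.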

Persistence and configuration management

The Docker Compose setup in this series mounts configuration read-only and keeps runtime data in persistent storage. Configuration can therefore be managed on the host (for example under version control), and containers can be restarted or recreated without losing state.
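The read-only and persistent mounts can be verified from Docker's own metadata. This is a minimal sketch, assuming the container names used in this series; `show_mounts` and `mount_is_readonly` are hypothetical names, and the path you check is whatever your Compose file mounts.

```shell
# show_mounts CONTAINER  (hypothetical helper)
# Prints one line per mount: destination path inside the container and whether it is writable.
show_mounts() {
  docker inspect --format '{{range .Mounts}}{{.Destination}} rw={{.RW}}{{"\n"}}{{end}}' "$1"
}

# mount_is_readonly CONTAINER PATH  (hypothetical helper)
# Succeeds if PATH is mounted read-only in CONTAINER.
mount_is_readonly() {
  show_mounts "$1" | grep -qxF "$2 rw=false"
}
```

`show_mounts dns_authoritative` should list the configuration mounts with `rw=false` and the data volume with `rw=true`; `mount_is_readonly dns_authoritative <config-path>` turns that into a pass/fail check.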

Conclusion

Operations is about proving the design: redundancy via multiple forwarders, clear port separation (53 for Recursor, 5353 for Authoritative), and a small set of high-signal troubleshooting steps. The DNSSEC rectify-zone fix is the most valuable “real-world” gotcha I documented from running this stack.

Next in series

Continue with Part 3: AXFR workflows and the BIND backend.

All configuration files and scripts for this post are available here: https://git.spacewars.ch/blog/public