Operations and troubleshooting

What to validate after rollout

A redundant DNS setup is only useful if you can prove failure behavior. This section is intentionally practical and test-driven.

1) Validate Recursor redundancy (client perspective)

If clients are configured with both dns01 and dns02 via DHCP:

  1. Stop the Recursor container on dns01:
  • docker stop dns_recursor
  2. Repeat typical queries from a client and confirm lookups still work via dns02.
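
The second step can be turned into a repeatable check. The following is a minimal sketch, assuming dig is installed on the client; the function name check_resolver is hypothetical and not part of this series, and you need to pass the real address of dns02 yourself.

```shell
# Hypothetical helper: query each name against one resolver and report
# any name that returns no answer.
# Usage: check_resolver <resolver-address> <name>...
check_resolver() {
    resolver="$1"
    shift
    failed=0
    for name in "$@"; do
        # +short prints only the answer data. Lines starting with ';' are
        # dig diagnostics (for example timeouts), so they are filtered out.
        answer="$(dig +short +time=2 +tries=1 @"$resolver" "$name" A 2>/dev/null |
            grep -v '^;' | head -n 1)"
        if [ -n "$answer" ]; then
            echo "OK $name -> $answer"
        else
            echo "FAIL $name"
            failed=1
        fi
    done
    return "$failed"
}
```

Call it with the address of dns02 followed by a mix of internal and external names. A non-zero exit status means at least one name got no answer.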

2) Validate Authoritative redundancy (internal zones)

Your design forwards internal zones to both authoritative endpoints on port 5353. Test this explicitly:

  1. Stop the Authoritative container on dns01:
  • docker stop dns_authoritative
  2. Query an internal name via the Recursor and confirm it still resolves via the second forwarder.

This test exercises the core design decision: the authoritative service listens on port 5353 behind the Recursor.
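
A failover test is easy to leave half finished, with the container still stopped. The sketch below wraps the stop, check, and restart sequence so the container is always started again. The helper name with_container_stopped is hypothetical. Keep in mind that a recursor caches answers, so use an internal name that has not been queried recently, otherwise the check may pass from cache without touching any forwarder.

```shell
# Hypothetical helper: stop a container, run a check command, then start
# the container again regardless of the check result.
# Usage: with_container_stopped <container> <command> [args...]
with_container_stopped() {
    container="$1"
    shift
    docker stop "$container" >/dev/null || return 1
    # Run the check and remember its exit status.
    if "$@"; then
        status=0
    else
        status=$?
    fi
    # Restore the service before reporting the result.
    docker start "$container" >/dev/null
    return "$status"
}
```

For example, with_container_stopped dns_authoritative dig @10.26.2.53 host1.site.internal A runs the query while the Authoritative container on that host is down and brings it back afterwards.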


Troubleshooting workflow

When something fails, verify each hop separately.

Step 1: Check the Recursor

Query the Recursor on port 53:

  • dig @10.26.2.53 host1.site.internal A

Step 2: Check the Authoritative server directly (debug path)

Query the Authoritative server directly on port 5353:

  • dig @10.26.2.53 -p 5353 host1.site.internal A

If the direct query fails, the problem is likely in Authoritative configuration, backend connectivity, or zone data. If the direct query works but the recursor query fails, focus on forwarding and recursor configuration.
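
The decision logic from steps 1 and 2 fits in a few lines of shell. This is a sketch only: the function names has_answer and diagnose are hypothetical, and it assumes both services are reachable on the same address, as in the dig examples above.

```shell
# Hypothetical helper: succeed when <server>:<port> returns an A record
# for <name>. Lines starting with ';' are dig diagnostics, not answers.
has_answer() {
    answer="$(dig +short +time=2 +tries=1 @"$1" -p "$2" "$3" A 2>/dev/null |
        grep -v '^;' | head -n 1)"
    [ -n "$answer" ]
}

# Check the Recursor (53) first, then the Authoritative server (5353),
# and print where to look next.
# Usage: diagnose <server-address> <name>
diagnose() {
    if has_answer "$1" 53 "$2"; then
        echo "recursor path OK"
    elif has_answer "$1" 5353 "$2"; then
        echo "authoritative OK, check recursor forwarding and configuration"
    else
        echo "authoritative failing, check its configuration, backend connectivity, and zone data"
    fi
}
```

Running diagnose 10.26.2.53 host1.site.internal gives a one-line pointer to the hop that needs attention.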

Step 3: Confirm port ownership

This stack depends on the intentional port split:

  • Recursor owns host port 53.
  • Authoritative owns host port 5353.
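
One way to confirm the split is to ask Docker which host ports each container publishes. The sketch below assumes the container names used earlier (dns_recursor, dns_authoritative) and that the ports are published through Docker port mappings; the function names are hypothetical. If your containers use host networking instead, docker port prints nothing and a host-level listing such as ss -tulpn is the better check.

```shell
# Hypothetical helper: succeed when <container> publishes <port> on the host.
# "docker port" prints lines such as "53/udp -> 0.0.0.0:53".
publishes_host_port() {
    docker port "$1" 2>/dev/null | grep -Eq ":${2}\$"
}

# Verify the intended split: Recursor on 53, Authoritative on 5353.
check_port_split() {
    status=0
    if ! publishes_host_port dns_recursor 53; then
        echo "dns_recursor does not publish host port 53"
        status=1
    fi
    if ! publishes_host_port dns_authoritative 5353; then
        echo "dns_authoritative does not publish host port 5353"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "port split OK"
    fi
    return "$status"
}
```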

DNSSEC pitfall: zone not properly signed

I ran into an issue where DNSSEC was enabled by default on a zone and the Recursor refused to answer because the zone was not properly signed. The zone can be rectified with:

pdnsutil rectify-zone <ZONE>

If internal DNS suddenly stops responding for a specific zone after enabling DNSSEC or migrating zone data, this is a fast and effective remediation step.
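
Before rectifying, it helps to confirm that validation is the likely culprit. The sketch below compares a normal query with one that sets the checking-disabled flag (dig +cd). It assumes the refusal shows up as SERVFAIL, which is the usual response from a validating resolver for a badly signed zone; the function names rcode and looks_like_dnssec_failure are hypothetical.

```shell
# Hypothetical helper: print the response code (NOERROR, SERVFAIL, ...)
# that <resolver> returns for <name>. Extra arguments are passed to dig.
# Usage: rcode <resolver-address> <name> [dig flags...]
rcode() {
    resolver="$1"
    name="$2"
    shift 2
    dig +time=2 +tries=1 "$@" @"$resolver" "$name" A 2>/dev/null |
        sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Succeed when the normal query fails but the same query with DNSSEC
# checking disabled (+cd) works, which points at a validation problem.
looks_like_dnssec_failure() {
    normal="$(rcode "$1" "$2")"
    unchecked="$(rcode "$1" "$2" +cd)"
    [ "$normal" = "SERVFAIL" ] && [ "$unchecked" = "NOERROR" ]
}
```

If looks_like_dnssec_failure 10.26.2.53 host1.site.internal succeeds, running pdnsutil rectify-zone for the affected zone is a reasonable next step. Depending on your setup, that may mean running it inside the Authoritative container with docker exec.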


Operational notes

Logging

Your authoritative template enables query logging and detailed logging, which is helpful during rollout and troubleshooting. Once the setup is stable, consider reducing verbosity based on your operational needs.

Persistence and configuration management

The Compose approach in this series mounts configuration read-only and persists runtime data. This supports configuration management workflows and makes container restarts safe.
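
To check that a running container really matches this layout, you can list its mounts and see which ones are writable. This is a sketch with a hypothetical function name, writable_mounts; it relies on the Mounts field that docker inspect reports and assumes mount paths without spaces.

```shell
# Hypothetical helper: print the container paths of all writable mounts.
# Configuration mounts should not appear in this list; data mounts should.
# Usage: writable_mounts <container>
writable_mounts() {
    # Each output line from docker inspect is "<destination> <true|false>",
    # where the second field tells whether the mount is read-write.
    docker inspect --format '{{range .Mounts}}{{println .Destination .RW}}{{end}}' "$1" |
        awk '$2 == "true" { print $1 }'
}
```

Running writable_mounts dns_authoritative should list only the paths that hold runtime data.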


Conclusion

Operations is about proving the design: redundancy via multiple forwarders, clear port separation (53 for Recursor, 5353 for Authoritative), and a small set of high-signal troubleshooting steps. The DNSSEC rectify-zone fix is the most valuable “real world” gotcha I documented from running this stack.


Next in series

Continue with Part 3: AXFR workflows and the BIND backend.


All configuration files and scripts for this post are available here: https://git.spacewars.ch/blog/public
