
NodeLocalDNS: Serve stale DNS responses on upstream failures to prevent workload outages #747

@nitin-nizhawan

Description


What would you like to be added?

Introduce stale DNS response serving in NodeLocalDNS, similar to “serve‑stale” mechanisms implemented in production resolvers (e.g., Unbound, BIND).

Behavior

If a cached record exists (even if its TTL has expired), NodeLocalDNS should return the stale response when any of the following conditions occurs:

  1. Upstream DNS is unreachable or times out
  2. Upstream DNS returns a temporary NXDOMAIN
  3. Upstream DNS returns a response with no IP addresses (empty A/AAAA records)

This mechanism is intended as a resiliency feature, not a replacement for normal TTL‑based resolution.

Why is this needed?

NodeLocalDNS currently does not prevent outages due to DNS resolution failures when the upstream DNS resolver(s):

  • Are temporarily unavailable or not functioning
  • Return transient NXDOMAIN responses
  • Return responses without any IP addresses (empty A/AAAA answers)

These failures can directly propagate to workloads and cause application outages, even when valid DNS data existed shortly before the failure.

RFC 8767 already defines serving stale DNS data when upstream DNS servers are unavailable or not functioning. This proposal goes further and also serves stale responses for transient NXDOMAIN and empty responses.

Intermittent DNS failures are a well‑known source of cascading outages in distributed systems. A recent high‑profile AWS outage (caused by transient DNS resolution failures) highlighted how short‑lived DNS unavailability can lead to widespread service impact.

References

  • AWS DNS‑related outage (high‑level incident summary)
  • RFC 8767 – Serving Stale Data to Improve DNS Resiliency
