[BUG] TCP query times out, but UDP query is OK #745

@joy717

Issue

I'm using nodelocaldns, with CoreDNS as the upstream.
It's weird: a few minutes after all nodes reboot, TCP DNS queries to nodelocaldns start timing out, while UDP queries still work.

netshoot-66bc59cdd7-wrj9t:~# while true; do date; dig kubernetes.default.svc.cluster.local +short +retries=0 +tcp && sleep 5 || break; done;
...
...
Wed Jan  7 10:19:22 UTC 2026
10.233.0.1
Wed Jan  7 10:19:27 UTC 2026
10.233.0.1
Wed Jan  7 10:19:32 UTC 2026
10.233.0.1
Wed Jan  7 10:19:37 UTC 2026
10.233.0.1
Wed Jan  7 10:19:42 UTC 2026
10.233.0.1
Wed Jan  7 10:19:47 UTC 2026
10.233.0.1
Wed Jan  7 10:19:52 UTC 2026
;; communications error to 169.254.25.10#53: timed out

; <<>> DiG 9.20.10 <<>> kubernetes.default.svc.cluster.local +short +retries=0 +tcp
;; global options: +cmd
;; no servers could be reached

The nodelocaldns ConfigMap is below.
10.233.0.3 is the CoreDNS Service IP.
169.254.25.10 is the IP of the nodelocaldns network interface.

apiVersion: v1
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 169.254.25.10
        forward . 10.233.0.3 {
            force_tcp
        }
        prometheus :9253
        health 169.254.25.10:9254
    }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.25.10
        forward . 10.233.0.3 {
            force_tcp
        }
        prometheus :9253
    }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.25.10
        forward . 10.233.0.3 {
            force_tcp
        }
        prometheus :9253
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.25.10
        forward . 223.5.5.5
        prometheus :9253
    }
kind: ConfigMap

If I query CoreDNS directly, it works.
If I delete the nodelocaldns pods, it works.
If I add the pprof plugin, it works.

I debugged with a netshoot container and found that the TCP handshake is established, but nodelocaldns sends no data back to the client.
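
For anyone who wants to reproduce the symptom without dig, here is a minimal sketch of a probe that separates "TCP connect failed" from "TCP connected but no reply", and runs the same query over UDP for comparison. It is not part of the original report: the helper names (build_query, frame_tcp, probe_udp, probe_tcp) and the 5 second timeout are assumptions made for illustration. The only protocol facts it relies on are the standard DNS wire format and the 2-byte length prefix used for DNS over TCP.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in standard wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def frame_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes every message with its length as 2 bytes."""
    return struct.pack("!H", len(message)) + message


def probe_udp(server: str, name: str, timeout: float = 5.0) -> str:
    """Send one query over UDP and report whether any reply arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
        return "ok"


def probe_tcp(server: str, name: str, timeout: float = 5.0) -> str:
    """Send one query over TCP, reporting connect and reply phases separately."""
    try:
        sock = socket.create_connection((server, 53), timeout=timeout)
    except OSError as exc:
        return f"connect failed: {exc}"
    with sock:
        try:
            sock.sendall(frame_tcp(build_query(name)))
            data = sock.recv(2)  # start of the reply's length prefix
        except socket.timeout:
            return "connected, but no response (timeout)"
        except OSError as exc:
            return f"error after connect: {exc}"
    if not data:
        return "connected, but peer closed without responding"
    return "ok"


# Example usage from a pod on an affected node (addresses taken from the report):
#   print("udp:", probe_udp("169.254.25.10", "kubernetes.default.svc.cluster.local"))
#   print("tcp:", probe_tcp("169.254.25.10", "kubernetes.default.svc.cluster.local"))
```

If the behaviour matches what is described above, the UDP probe should report "ok" while the TCP probe reports "connected, but no response (timeout)", which would point at nodelocaldns accepting the connection and then never answering rather than at a connectivity problem.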

Can somebody help me?
Thanks in advance.


k8s-dns-node-cache:1.21.1
coredns:v1.10.1
kubernetes: v1.24.6
