
Bug 1711439

Summary:	Cluster DNS requests can be routed to un-ready CoreDNS pods
Product:	OpenShift Container Platform
Component:	Networking
Networking sub component:	DNS
Version:	4.1.0
Target Release:	4.2.0
Status:	CLOSED ERRATA
Severity:	high
Priority:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Type:	Bug
Reporter:	Dan Mace <dmace>
Assignee:	Daneyon Hansen <dhansen>
QA Contact:	Hongan Li <hongli>
CC:	aos-bugs, dhansen, mmasters
Last Closed:	2019-10-16 06:29:06 UTC

Description Dan Mace 2019-05-17 19:44:46 UTC

Description of problem:

Because the CoreDNS container has no readiness probe, CoreDNS pods can report Ready while they are unable to serve requests (for example, during initialization or after losing connectivity to the API server).

In this state the pod still receives traffic sent to the cluster DNS service, so DNS queries that are routed to it can time out or fail with other connection errors.
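
A simplified model of why the missing probe matters: a pod is published as a service endpoint only while it is Ready, and without a readiness probe a running container counts as Ready whether or not CoreDNS can answer. The `Pod` type and the helper functions below are hypothetical illustrations, not Kubernetes APIs, and the model ignores probe timing and thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    """Hypothetical, simplified view of a CoreDNS pod (not a Kubernetes API type)."""
    name: str
    running: bool              # the container process has started
    has_readiness_probe: bool  # whether a readiness probe is configured
    can_serve_dns: bool        # what a health check against CoreDNS would observe


def is_ready(pod: Pod) -> bool:
    if not pod.running:
        return False
    # Without a readiness probe, a running container is treated as ready,
    # even if CoreDNS cannot answer queries yet.
    if not pod.has_readiness_probe:
        return True
    # With a probe, readiness follows the health check (thresholds ignored here).
    return pod.can_serve_dns


def service_endpoints(pods: List[Pod]) -> List[str]:
    # Only ready pods receive service traffic.
    return [p.name for p in pods if is_ready(p)]
```

With two running pods where only one can serve DNS, the unhealthy pod stays in the endpoint list when no probe is configured and drops out once a probe is added.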


Version-Release number of selected component (if applicable):

Payload: 4.1.0-0.nightly-2019-05-15-151517

How reproducible:

See https://bugzilla.redhat.com/show_bug.cgi?id=1711364

In such a cluster state, the following query fails intermittently, whenever it is load-balanced to the affected CoreDNS pod (172.30.0.10 is the cluster IP of the openshift-dns/dns-default service):

dig @172.30.0.10 kubernetes.default.svc.cluster.local
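
Because only some queries land on the affected pod, a single `dig` may succeed. A minimal Python sketch that repeats the query and counts failures is shown below, assuming the host running it can reach 172.30.0.10 on UDP port 53; `build_a_query` and `count_failures` are hypothetical helpers, not part of any OpenShift tooling.

```python
import os
import socket
import struct


def build_a_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query (RFC 1035) for an A record, recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def count_failures(server: str, name: str, attempts: int = 50,
                   timeout: float = 2.0) -> int:
    """Send `attempts` UDP queries and count timeouts, socket errors, or error replies."""
    failures = 0
    for _ in range(attempts):
        query_id = struct.unpack("!H", os.urandom(2))[0]
        # A fresh socket per query usually means a new source port, so each
        # query is likely a new flow that is load-balanced independently.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_a_query(name, query_id), (server, 53))
                reply, _ = sock.recvfrom(512)
                # Count short replies, mismatched IDs, or a non-zero RCODE as failures.
                ok = (len(reply) >= 4
                      and struct.unpack("!H", reply[:2])[0] == query_id
                      and reply[3] & 0x0F == 0)
                if not ok:
                    failures += 1
            except OSError:  # includes socket timeouts
                failures += 1
    return failures
```

For example, `count_failures("172.30.0.10", "kubernetes.default.svc.cluster.local")` returns how many of 50 queries failed; in the cluster state described above a non-zero count would be expected.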

Comment 1 Dan Mace 2019-05-17 19:45:52 UTC

This bug was first observed while debugging https://bugzilla.redhat.com/show_bug.cgi?id=1711364

Comment 3 Hongan Li 2019-06-26 08:30:14 UTC

Verified with 4.2.0-0.nightly-2019-06-25-003324; the issue has been fixed. The dns container now has a readiness probe:

$ oc get ds/dns-default -n openshift-dns -o yaml
spec:
  template:
    spec:
      containers:
      - args:
        name: dns
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
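
With these settings the kubelet starts probing `/health` on port 8080 about 10 seconds after the container starts, keeps the pod not ready until a probe succeeds (`successThreshold: 1`), and marks it not ready again only after three consecutive failures (`failureThreshold: 3`) at 10-second intervals. A pod that stops serving can therefore stay in the service for a time on the order of failureThreshold × periodSeconds, about 30 seconds. The function below is a hypothetical, simplified model of that threshold logic (it ignores `initialDelaySeconds` and `timeoutSeconds`), not kubelet code.

```python
from typing import Iterable, List, Optional


def readiness_over_time(results: Iterable[bool], success_threshold: int = 1,
                        failure_threshold: int = 3) -> List[bool]:
    """Return the Ready state after each probe result (True = probe passed).

    Simplified model: the pod starts not ready, becomes ready after
    `success_threshold` consecutive successes, and becomes not ready after
    `failure_threshold` consecutive failures.
    """
    ready = False
    last: Optional[bool] = None
    run = 0  # length of the current streak of identical results
    states = []
    for ok in results:
        run = run + 1 if ok == last else 1
        last = ok
        if ok and run >= success_threshold:
            ready = True
        elif not ok and run >= failure_threshold:
            ready = False
        states.append(ready)
    return states
```

Using the defaults from the probe above, two failing probes leave the pod Ready; only the third consecutive failure removes it, and one success restores it.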


Checked iptables to confirm that DNS queries are not distributed to the affected pod:

-A KUBE-SERVICES -d 172.30.0.10/32 -p udp -m comment --comment "openshift-dns/dns-default:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-BGNS3J6UB7MMLVDO
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-5AKZ7CS5F5VMNQSF
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-6ZXZTDOJTY36K2QO
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-YQXEHXFWXRE6J2LE
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-UVEJRI5KEK43CMS5
-A KUBE-SVC-BGNS3J6UB7MMLVDO -j KUBE-SEP-IABYQC3JKZQOBY4V
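
The chain spreads new flows evenly over the listed endpoints: each rule is reached only if every earlier random match failed, so rule i matches with probability 1/(n-i) and every one of the five KUBE-SEP targets ends up with about 1/5 of the traffic. A pod that is not ready gets no KUBE-SEP target in this chain at all. The sketch below computes those shares from the listed probabilities. It also assumes, without confirmation in this report, that the odd trailing digits (0.20000000019, 0.33332999982) come from kube-proxy writing 1/n rounded to five decimals and the iptables `statistic` match storing the value as a fixed-point fraction of 2^31; `iptables_display` is a hypothetical helper modeling that round trip.

```python
import math
from typing import List


def effective_shares(rule_probabilities: List[float]) -> List[float]:
    """Fraction of new flows that reach each rule's target, evaluated in order.

    Rule i's share is p_i * prod(1 - p_j for j < i); an unconditional final
    rule is modeled as probability 1.0.
    """
    shares = []
    remaining = 1.0  # fraction of flows not yet matched by an earlier rule
    for p in rule_probabilities:
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares


def iptables_display(prob: float) -> str:
    """Assumed fixed-point round trip: store round(prob * 2**31), print 11 decimals."""
    stored = math.floor(prob * 2**31 + 0.5)
    return f"{stored / 2**31:.11f}"


# Probabilities as listed in the KUBE-SVC-BGNS3J6UB7MMLVDO chain above.
listed = [0.20000000019, 0.25000000000, 0.33332999982, 0.50000000000, 1.0]
shares = effective_shares(listed)
```

Each value in `shares` is within about 1e-5 of 0.2, so the rounding in the rules has no practical effect on the balancing.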

Comment 5 errata-xmlrpc 2019-10-16 06:29:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922