Description of problem: dns operator status flaps between True/False/False and True/True/(False|True) after updating dnses.operator.openshift.io/default OpenShift release version: 4.10.0-0.nightly-2021-12-23-153012 Cluster Platform: any How reproducible: 100% Steps to Reproduce (in detail): 1. update dnses.operator.openshift.io/default, e.g $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge 2. Open another terminal and check/watch the co/dns status $ oc get co/dns -w Actual results: $ oc get co/dns -w NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE dns 4.10.0-0.nightly-2021-12-23-153012 True False False 6h7m dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h8m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True False False 6h8m dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h8m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True True True 6h8m DNS default is degraded dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h8m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True True True 6h8m DNS default is degraded dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h8m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True False False 6h9m dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h9m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True False False 6h9m dns 4.10.0-0.nightly-2021-12-23-153012 True True False 6h9m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.10.0-0.nightly-2021-12-23-153012 True False False 6h9m During the rolling update, co/dns status flaps between True/False/False and True/True/(False|True). Expected results: During the DNS rolling update, co/dns status should always be True/True/False until the update is finished. Impact of the problem: co/dns doesn't show the real reconciling status Additional info: ** Please do not disregard the report template; filling the template out as much as possible will allow us to help you. Please consider attaching a must-gather archive (via `oc adm must-gather`). Please review must-gather contents for sensitive information before attaching any must-gathers to a bugzilla report. You may also mark the bug private if you wish.
Setting blocker- because this is not a regression or upgrade issue, but it looks like we are doing a lot of spurious status updates when the user uses the new logging API. Miheer, please take a look, and let me know if you need help with the status logic.
After some investigation, I believe this is a longstanding, general issue that doesn't apply only to the log-level API and predates 4.10. I can reproduce the issue on OpenShift 4.9 using the following steps: 1. Launch a 4.9 cluster. I used 4.9.0-0.nightly-2022-01-24-212243. 2. In one terminal, watch pods: oc -n openshift-dns get pods -w 2. In a second terminal, watch the clusteroperator: oc get co/dns -w 4. In a third terminal, watch dns pod logs: oc -n openshift-dns logs -c dns -f -l dns.operator.openshift.io/daemonset-dns=default --max-log-requests=6 5. Patch the dnses.operator.openshift.io/default in some way that causes a change to the Corefile config; for example: oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"test","zones":["test"],"forwardPlugin":{"upstreams":["127.0.0.1:5353"]}}]}}' After performing these steps, I see the status updates for the clusteroperator: % oc get co/dns -w NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE dns 4.9.0-0.nightly-2022-01-24-212243 True False False 25m dns 4.9.0-0.nightly-2022-01-24-212243 True True False 25m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True False False 25m dns 4.9.0-0.nightly-2022-01-24-212243 True True False 25m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True True True 26m DNS default is degraded dns 4.9.0-0.nightly-2022-01-24-212243 True True False 26m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True False False 26m dns 4.9.0-0.nightly-2022-01-24-212243 True True False 26m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True True True 26m DNS default is degraded dns 4.9.0-0.nightly-2022-01-24-212243 True True False 26m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True True True 26m DNS default is degraded dns 4.9.0-0.nightly-2022-01-24-212243 True True False 26m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.9.0-0.nightly-2022-01-24-212243 True False False 27m I also see each dns pod briefly go unready, then ready again: % oc -n openshift-dns get pods -w NAME READY STATUS RESTARTS AGE dns-default-242pw 2/2 Running 0 25m dns-default-28grx 2/2 Running 0 18m dns-default-44sv4 2/2 Running 0 25m dns-default-9wnx9 2/2 Running 0 18m dns-default-bn6kt 2/2 Running 0 19m dns-default-jj2kf 2/2 Running 0 25m node-resolver-7x9gq 1/1 Running 0 19m node-resolver-dldk4 1/1 Running 0 19m node-resolver-drvrj 1/1 Running 0 25m node-resolver-jqsnv 1/1 Running 0 19m node-resolver-scmnj 1/1 Running 0 25m node-resolver-ztnnm 1/1 Running 0 25m dns-default-44sv4 1/2 Running 0 25m dns-default-44sv4 2/2 Running 0 26m dns-default-28grx 1/2 Running 0 19m dns-default-9wnx9 1/2 Running 0 19m dns-default-28grx 2/2 Running 0 19m dns-default-9wnx9 2/2 Running 0 19m dns-default-242pw 1/2 Running 0 26m dns-default-jj2kf 1/2 Running 0 27m dns-default-242pw 2/2 Running 0 27m dns-default-bn6kt 1/2 Running 0 21m dns-default-jj2kf 2/2 Running 0 27m dns-default-bn6kt 2/2 Running 0 21m And finally, the logs reveal the issue; following is a snippet from one dns pod's logs: [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = 746216286a0ba8ae7a62a714fa095855 [INFO] Reloading complete I see the above repeated for every dns pod. Every time the Corefile configmap is updated, each CoreDNS instance reloads the Corefile, and when CoreDNS reloads, it goes into "lameduck mode" for 20 seconds, during which time it reports not-ready to the kubelet. We can add hysteresis to the status reporting logic to prevent flapping. I believe this work is best left till after 4.10.0.
I think still there is some flapping, but reduced than previous time. melvinjoseph@mjoseph-mac Downloads % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False 40m Cluster version is 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest melvinjoseph@mjoseph-mac Downloads % oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"test","zones":["test"],"forwardPlugin":{"upstreams":["127.0.0.1:5353"]}}]}}' dns.operator.openshift.io/default patched melvinjoseph@mjoseph-mac Downloads % oc get co/dns -w NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False False 57m dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True True False 57m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True True True 57m DNS default is degraded dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True True False 57m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False False 57m dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False True 58m DNS default is degraded dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False False 58m dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True True False 58m DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6." dns 4.11.0-0.ci.test-2022-04-13-042208-ci-ln-i9fzbyk-latest True False False 58m melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pods -w NAME READY STATUS RESTARTS AGE dns-default-44sv6 2/2 Running 0 57m dns-default-9fffj 2/2 Running 0 50m dns-default-b9mgt 2/2 Running 0 57m dns-default-ln49d 2/2 Running 0 57m dns-default-wb66v 2/2 Running 0 50m dns-default-zlj5f 2/2 Running 0 51m node-resolver-59dsx 1/1 Running 0 52m node-resolver-6trnh 1/1 Running 0 52m node-resolver-gdhc6 1/1 Running 0 57m node-resolver-h6dt4 1/1 Running 0 57m node-resolver-szstk 1/1 Running 0 57m node-resolver-t27dz 1/1 Running 0 52m dns-default-ln49d 1/2 Running 0 57m dns-default-44sv6 1/2 Running 0 57m dns-default-ln49d 2/2 Running 0 57m dns-default-b9mgt 1/2 Running 0 57m dns-default-44sv6 2/2 Running 0 57m dns-default-b9mgt 2/2 Running 0 58m dns-default-wb66v 1/2 Running 0 52m dns-default-9fffj 1/2 Running 0 52m dns-default-wb66v 2/2 Running 0 52m dns-default-9fffj 2/2 Running 0 52m dns-default-zlj5f 1/2 Running 0 52m dns-default-zlj5f 2/2 Running 0 52m melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns logs -c dns -f -l dns.operator.openshift.io/daemonset-dns=default --max-log-requests=6 [INFO] plugin/reload: Running configuration MD5 = 477feca3db059e6b9063274509ca76ee CoreDNS-1.8.7 linux/amd64, go1.17.5, [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] plugin/reload: Running configuration MD5 = ca3dd6b3842292b76f3c39bece74187e [INFO] Reloading complete
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-06-04-014713 True False 7h9m Cluster version is 4.11.0-0.nightly-2022-06-04-014713 melvinjoseph@mjoseph-mac Downloads % melvinjoseph@mjoseph-mac Downloads % oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"test","zones":["test"],"forwardPlugin":{"upstreams":["127.0.0.1:5353"]}}]}}' dns.operator.openshift.io/default patched melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pods -w NAME READY STATUS RESTARTS AGE dns-default-4j8gl 2/2 Running 0 6h37m dns-default-cnv29 2/2 Running 0 6h37m dns-default-pjsrd 2/2 Running 0 6h29m dns-default-vbqn8 2/2 Running 0 6h30m dns-default-vqkpp 2/2 Running 0 6h37m node-resolver-6js2l 1/1 Running 0 6h37m node-resolver-9p6vn 1/1 Running 0 6h30m node-resolver-n9257 1/1 Running 0 6h30m node-resolver-qp5wm 1/1 Running 0 6h37m node-resolver-qr5lb 1/1 Running 0 6h37m dns-default-4j8gl 1/2 Running 0 6h40m dns-default-pjsrd 1/2 Running 0 6h33m dns-default-4j8gl 2/2 Running 0 6h40m dns-default-pjsrd 2/2 Running 0 6h33m dns-default-vbqn8 1/2 Running 0 6h33m dns-default-vbqn8 2/2 Running 0 6h34m dns-default-vqkpp 1/2 Running 0 6h41m dns-default-cnv29 1/2 Running 0 6h41m dns-default-vqkpp 2/2 Running 0 6h41m dns-default-cnv29 2/2 Running 0 6h41m melvinjoseph@mjoseph-mac Downloads % oc get co/dns -w NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE dns 4.11.0-0.nightly-2022-06-04-014713 True False False 6h37m dns 4.11.0-0.nightly-2022-06-04-014713 True True False 6h40m DNS "default" reports Progressing=True: "Have 4 available DNS pods, want 5." dns 4.11.0-0.nightly-2022-06-04-014713 True True True 6h40m DNS default is degraded dns 4.11.0-0.nightly-2022-06-04-014713 True True False 6h40m DNS "default" reports Progressing=True: "Have 4 available DNS pods, want 5." dns 4.11.0-0.nightly-2022-06-04-014713 True False False 6h40m melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns logs -c dns -f -l dns.operator.openshift.io/daemonset-dns=default --max-log-requests=6 [INFO] 10.128.0.51:48836 - 5141 "AAAA IN registry.redhat.io.us-west-2.compute.internal. udp 63 false 512" NXDOMAIN qr,rd,ra 63 0.003339211s [INFO] 10.128.0.51:45118 - 61895 "A IN registry.redhat.io.mjoseph-ip18.qe.devcluster.openshift.com. udp 77 false 512" NXDOMAIN qr,rd,ra 77 0.002902669s [INFO] 10.128.0.51:34660 - 34948 "AAAA IN registry.redhat.io.mjoseph-ip18.qe.devcluster.openshift.com. udp 77 false 512" NXDOMAIN qr,rd,ra 77 0.003045752s [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s [INFO] 10.129.0.1:4062 - 14450 "AAAA IN api.openshift.com.us-west-2.compute.internal. udp 62 false 512" NXDOMAIN qr,rd,ra 62 0.007191427s [INFO] 10.129.0.1:60402 - 11971 "A IN api.openshift.com.us-west-2.compute.internal. udp 62 false 512" NXDOMAIN qr,rd,ra 62 0.007292815s [INFO] Reloading [INFO] plugin/health: Going into lameduck mode for 20s Based on the output and https://bugzilla.redhat.com/show_bug.cgi?id=2037190#c8 i marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069