Bug 1946929

Summary:	the default dns operator's Progressing status is always True and cluster operator dns Progressing status is False
Product:	OpenShift Container Platform	Reporter:	Hongan Li <hongli>
Component:	Networking	Assignee:	Miciah Dashiel Butler Masters <mmasters>
Networking sub component:	DNS	QA Contact:	jechen <jechen>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	amcdermo, aos-bugs, jechen
Version:	4.8
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 22:57:54 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Hongan Li 2021-04-07 09:42:20 UTC

Description of problem:
the default dns operator's Progressing status is always True and cluster operator dns Progressing status is False.
And the message shows null/nil for node selector and tolerations.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-06-162113

How reproducible:
100%

Steps to Reproduce:
1. fresh install 4.8 cluster
2. $ oc get dnses.operator.openshift.io default -oyaml
3. $ oc get co/dns

Actual results:
$ oc get dnses.operator.openshift.io default -oyaml
<---snip--->
spec:
nodePlacement: {}
status:
clusterDomain: cluster.local
clusterIP: 172.30.0.10
conditions:
- lastTransitionTime: "2021-04-07T08:53:24Z"
message: Enough DNS and node-resolver pods are available, and the DNS service
has a cluster IP address.
reason: AsExpected
status: "False"
type: Degraded
- lastTransitionTime: "2021-04-07T08:53:30Z"
message: |-
Have DNS daemonset with node selector map[kubernetes.io/os:linux], want map[].
Have DNS daemonset with tolerations [{node-role.kubernetes.io/master Exists <nil>}], want [].
reason: Reconciling
status: "True"
type: Progressing

$ oc get co/dns
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
dns 4.8.0-0.nightly-2021-04-06-162113 True False False 38m

Expected results:
1. normally the dnses.operator/default Progressing status should be False
2. the message should not show null/nil for node selector and tolerations.
3. dns.operator and co/dns should keep consistent status conditions.

Additional info:

Comment 1 Hongan Li 2021-04-07 10:02:10 UTC

*** Bug 1946931 has been marked as a duplicate of this bug. ***

Comment 2 Hongan Li 2021-04-07 10:03:33 UTC

*** Bug 1946933 has been marked as a duplicate of this bug. ***

Comment 3 Hongan Li 2021-04-13 08:59:02 UTC

### if explicitly set default value in spec.nodePlacement, the Progressing status shows False:
spec:
  nodePlacement:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
<---snip--->
   - lastTransitionTime: "2021-04-13T08:20:29Z"
    message: All DNS and node-resolver pods are available, and the DNS service has
      a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing


### use custom config in spec.nodePlacement also get "False" status:
spec:
  nodePlacement:
    nodeSelector:
      node-role.kubernetes.io/worker: ""
    tolerations:
    - key: my-test
      operator: Exists
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
<---snip--->
  - lastTransitionTime: "2021-04-13T08:54:16Z"
    message: All DNS and node-resolver pods are available, and the DNS service has
      a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing

Comment 5 jechen 2021-05-03 13:51:50 UTC

Verified in  4.8.0-0.nightly-2021-04-30-201824

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-30-201824   True        False         35m     Cluster version is 4.8.0-0.nightly-2021-04-30-201824


$ oc get dnses.operator.openshift.io default -oyaml
<--snip-->
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2021-05-03T12:44:39Z"
    message: Enough DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-05-03T12:45:22Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"      <--verified expected result 
    type: Progressing
  - lastTransitionTime: "2021-05-03T12:35:25Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available


$ oc get co/dns
NAME   VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns    4.8.0-0.nightly-2021-04-30-201824   True        False         False      68m             <--verified DNS operator's Progressing status

Comment 6 jechen 2021-05-03 13:56:18 UTC

waiting for new build to verify https://github.com/openshift/cluster-dns-operator/pull/262

Comment 7 jechen 2021-05-04 14:04:34 UTC

Verified in 4.8.0-0.nightly-2021-05-04-042616

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-04-042616   True        False         20m     Cluster version is 4.8.0-0.nightly-2021-05-04-042616


1. update the node selector in dns.operator default, check "Progressing" status
$ oc edit dns.operator default
spec:
  nodePlacement:
    nodeSelector:
      node-role.kubernetes.io/worker: ""


$ oc get dnses.operator.openshift.io default -oyaml
<--snip-->
spec:
  nodePlacement:
    nodeSelector:
      node-role.kubernetes.io/worker: ""
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2021-05-04T12:41:23Z"
    message: Enough DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-05-04T13:08:23Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"                  <--verified expected result 
    type: Progressing
  - lastTransitionTime: "2021-05-04T12:35:02Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available

2. config custom tolerations of dns pod (to not tolerate master node taints), check "Processing" status
$ oc edit dns.operator default
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operators: Equal
      value: abc
      tolerationSeconds: 3600


$ oc get dnses.operator.openshift.io default -oyaml
<--snip--->
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      tolerationSeconds: 3600
      value: abc
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2021-05-04T13:37:55Z"
    message: Enough DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-05-04T13:52:04Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"               <--verified expected result 
    type: Progressing
  - lastTransitionTime: "2021-05-04T12:35:02Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available


$ oc get co/dns
NAME   VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns    4.8.0-0.nightly-2021-05-04-042616   True        False         False      83m

Comment 10 errata-xmlrpc 2021-07-27 22:57:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438