Bug 2093236
| Summary: | DNS operator stopped reconciling after 4.10 to 4.11 upgrade / 4.11 nightly to 4.11 nightly upgrade | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Karis <akaris> |
| Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
| Networking sub component: | DNS | QA Contact: | Melvin Joseph <mjoseph> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aos-bugs, hongli, mmasters |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:16:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Andreas Karis
2022-06-03 11:33:03 UTC
We can see the exact same issue in run 1526731236049948672 https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1526731236049948672 {code} [akaris@linux analysis-1526731236049948672]$ omg get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE authentication 4.11.0-0.nightly-2022-05-18-010528 True False False 2h10m baremetal 4.11.0-0.nightly-2022-05-18-010528 True False False 3h11m cloud-controller-manager 4.11.0-0.nightly-2022-05-18-010528 True False False 3h14m cloud-credential 4.11.0-0.nightly-2022-05-18-010528 True False False 2h12m cluster-autoscaler 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m config-operator 4.11.0-0.nightly-2022-05-18-010528 True False False 3h11m console 4.11.0-0.nightly-2022-05-18-010528 True False False 2h12m csi-snapshot-controller 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m dns 4.11.0-0.nightly-2022-05-11-054135 True True False 2h3m etcd 4.11.0-0.nightly-2022-05-18-010528 True False False 2h30m image-registry 4.11.0-0.nightly-2022-05-18-010528 True False False 2h11m ingress 4.11.0-0.nightly-2022-05-18-010528 True False False 2h12m insights 4.11.0-0.nightly-2022-05-18-010528 True False False 3h4m kube-apiserver 4.11.0-0.nightly-2022-05-18-010528 True False False 2h26m kube-controller-manager 4.11.0-0.nightly-2022-05-18-010528 True False False 2h19m kube-scheduler 4.11.0-0.nightly-2022-05-18-010528 True False False 2h19m kube-storage-version-migrator 4.11.0-0.nightly-2022-05-18-010528 True False False 2h10m machine-api 4.11.0-0.nightly-2022-05-18-010528 True False False 2h14m machine-approver 4.11.0-0.nightly-2022-05-18-010528 True False False 3h11m machine-config 4.11.0-0.nightly-2022-05-11-054135 True False False 3h10m marketplace 4.11.0-0.nightly-2022-05-18-010528 True False False 3h10m monitoring 4.11.0-0.nightly-2022-05-18-010528 True False False 2h10m network 4.11.0-0.nightly-2022-05-18-010528 True False False 1h55m node-tuning 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m openshift-apiserver 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m openshift-controller-manager 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m openshift-samples 4.11.0-0.nightly-2022-05-18-010528 True False False 2h12m operator-lifecycle-manager 4.11.0-0.nightly-2022-05-18-010528 True False False 3h11m operator-lifecycle-manager-catalog 4.11.0-0.nightly-2022-05-18-010528 True False False 2h7m operator-lifecycle-manager-packageserver 4.11.0-0.nightly-2022-05-18-010528 True False False 2h9m service-ca 4.11.0-0.nightly-2022-05-18-010528 True False False 2h11m storage 4.11.0-0.nightly-2022-05-18-010528 True False False 2h5m [akaris@linux analysis-1526731236049948672]$ killall omg [akaris@linux analysis-1526731236049948672]$ omg get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version True True 2h21m Working towards 4.11.0-0.nightly-2022-05-18-010528: 658 of 802 done (82% complete) [akaris@linux analysis-1526731236049948672]$ omg get co dns -o yaml apiVersion: config.openshift.io/v1 kind: ClusterOperator metadata: annotations: include.release.openshift.io/ibm-cloud-managed: 'true' include.release.openshift.io/self-managed-high-availability: 'true' include.release.openshift.io/single-node-developer: 'true' creationTimestamp: '2022-05-18T01:25:25Z' generation: 1 managedFields: - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:include.release.openshift.io/ibm-cloud-managed: {} 
f:include.release.openshift.io/self-managed-high-availability: {} f:include.release.openshift.io/single-node-developer: {} f:ownerReferences: .: {} k:{"uid":"46bbbb00-9d1f-4d2f-80ac-f874fea89e79"}: {} f:spec: {} manager: Go-http-client operation: Update time: '2022-05-18T01:25:25Z' - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:status: .: {} f:extension: {} manager: Go-http-client operation: Update subresource: status time: '2022-05-18T01:25:26Z' - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:status: f:conditions: {} f:relatedObjects: {} f:versions: {} manager: dns-operator operation: Update subresource: status time: '2022-05-18T01:36:55Z' name: dns ownerReferences: - apiVersion: config.openshift.io/v1 kind: ClusterVersion name: version uid: 46bbbb00-9d1f-4d2f-80ac-f874fea89e79 resourceVersion: '61304' uid: 06ec1647-6916-4e4b-af6f-1a6f6ccac50b spec: {} status: conditions: - lastTransitionTime: '2022-05-18T01:37:14Z' message: DNS "default" is available. reason: AsExpected status: 'True' type: Available - lastTransitionTime: '2022-05-18T02:44:17Z' message: 'Upgrading operator to "4.11.0-0.nightly-2022-05-18-010528". Upgrading coredns to "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b". Upgrading kube-rbac-proxy to "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109".' reason: Upgrading status: 'True' type: Progressing - lastTransitionTime: '2022-05-18T01:49:27Z' reason: DNSNotDegraded status: 'False' type: Degraded - lastTransitionTime: '2022-05-18T01:36:56Z' message: 'DNS default is upgradeable: DNS Operator can be upgraded' reason: DNSUpgradeable status: 'True' type: Upgradeable extension: null relatedObjects: - group: '' name: openshift-dns-operator resource: namespaces - group: operator.openshift.io name: default resource: dnses - group: '' name: openshift-dns resource: namespaces versions: - name: operator version: 4.11.0-0.nightly-2022-05-11-054135 - name: coredns version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:596c58ad0fb3a58712b27b051a95571d630374dc26d5a00afa7245b8c327de07 - name: openshift-cli version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c25547a9165593735b7dacfbf6abbcaeb1ffc4cb941d2e0c0b65bea946bc008 - name: kube-rbac-proxy version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a175aec15d91bafafc84946593f9432736c3ef8643f1118fad49beb47d54cf57 {code} We can see the same here, all dns pods are already updated: {code} [akaris@linux analysis-1526731236049948672]$ omg get pods -n openshift-dns | awk '/dns-default/ {print $1}' | while read p ; do omg get pod -n openshift-dns $p -o json | jq '.spec.containers[] | .image'; done "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" 
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:26956e07a594b8665740d9cff7d9c30361ce8dbb1523a996c3aadf95ae77363b" "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:090ae5e7554012e1c0f1925f8dd7a02e110cb98f94d8774d3e17039115b8a109" [akaris@linux analysis-1526731236049948672]$ omg get pods -n openshift-dns NAME READY STATUS RESTARTS AGE dns-default-kmfjw 2/2 Running 0 2h4m dns-default-n2tf6 2/2 Running 0 2h5m dns-default-pq2zp 2/2 Running 0 2h5m dns-default-tcgn7 2/2 Running 0 2h6m dns-default-wtf8z 2/2 Running 0 2h7m dns-default-z6hhn 2/2 Running 0 2h4m node-resolver-kkjsz 1/1 Running 0 2h7m node-resolver-mjnk9 1/1 Running 0 2h7m node-resolver-ngqr2 1/1 Running 0 2h7m node-resolver-w426l 1/1 Running 0 2h7m node-resolver-x7wls 1/1 Running 0 2h7m node-resolver-zxj8t 1/1 Running 0 2h7m {code} And the logs: {code} [akaris@linux analysis-1526731236049948672]$ omg logs -n openshift-dns-operator dns-operator-67f99d6557-2g6g5 -c dns-operator | tail -n 30 2022-05-18T02:42:36.954999590Z time="2022-05-18T02:42:36Z" level=info msg="reconciling request: /default" 2022-05-18T02:42:54.845966990Z time="2022-05-18T02:42:54Z" level=info msg="reconciling request: /default" 2022-05-18T02:42:54.932624410Z time="2022-05-18T02:42:54Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 42, 36, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 42, 54, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), 
Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-18T02:42:54.935050953Z time="2022-05-18T02:42:54Z" level=info msg="reconciling request: /default" 2022-05-18T02:43:15.631198953Z time="2022-05-18T02:43:15Z" level=info msg="reconciling request: /default" 2022-05-18T02:43:15.745417942Z time="2022-05-18T02:43:15Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 42, 54, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 43, 15, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-18T02:43:15.748076288Z time="2022-05-18T02:43:15Z" level=info msg="reconciling request: /default" 2022-05-18T02:43:15.973438911Z time="2022-05-18T02:43:15Z" level=info msg="reconciling request: /default" 2022-05-18T02:43:16.067891155Z time="2022-05-18T02:43:16Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 43, 15, 0, time.Local), 
Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 43, 16, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 5 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-18T02:43:16.070968109Z time="2022-05-18T02:43:16Z" level=info msg="reconciling request: /default" 2022-05-18T02:44:15.791945235Z W0518 02:44:15.791860 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.791945235Z W0518 02:44:15.791862 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.791945235Z W0518 02:44:15.791884 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.791945235Z W0518 02:44:15.791933 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.791956 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.791963 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an 
error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.791967 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ClusterOperator ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.791979 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DNS ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.791993 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.792001 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.792003 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792019036Z W0518 02:44:15.792009 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:15.792118238Z W0518 02:44:15.792073 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-18T02:44:16.672301571Z time="2022-05-18T02:44:16Z" level=info msg="reconciling request: /default" 2022-05-18T02:44:16.747104965Z time="2022-05-18T02:44:16Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 43, 16, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 5 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", 
Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 49, 27, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 18, 2, 44, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 37, 14, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 18, 1, 36, 56, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-18T02:44:16.750168018Z time="2022-05-18T02:44:16Z" level=info msg="reconciling request: /default" 2022-05-18T02:44:16.813141408Z time="2022-05-18T02:44:16Z" level=info msg="reconciling request: /default" 2022-05-18T02:44:17.098547647Z time="2022-05-18T02:44:17Z" level=info msg="reconciling request: /default" 2022-05-18T02:44:17.164908296Z time="2022-05-18T02:44:17Z" level=info msg="reconciling request: /default" {code} And the same here - run 1525510596311650304] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1525510596311650304 {code} [akaris@linux analysis-1525510596311650304]$ omg get co | grep -v 'True False False' NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE dns 4.10.14 True True False 1h58m [akaris@linux analysis-1525510596311650304]$ omg get co dns -o yaml apiVersion: config.openshift.io/v1 kind: ClusterOperator metadata: annotations: include.release.openshift.io/ibm-cloud-managed: 'true' include.release.openshift.io/self-managed-high-availability: 'true' include.release.openshift.io/single-node-developer: 'true' creationTimestamp: '2022-05-14T16:32:20Z' generation: 1 managedFields: - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:include.release.openshift.io/ibm-cloud-managed: {} f:include.release.openshift.io/self-managed-high-availability: {} f:include.release.openshift.io/single-node-developer: {} f:ownerReferences: .: {} k:{"uid":"8df8df3a-7725-47b7-9326-50bdbed53979"}: {} f:spec: {} manager: Go-http-client operation: Update time: '2022-05-14T16:32:20Z' - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:status: .: {} f:extension: {} manager: Go-http-client operation: Update subresource: status time: '2022-05-14T16:32:20Z' - apiVersion: config.openshift.io/v1 fieldsType: FieldsV1 fieldsV1: f:status: f:conditions: {} f:relatedObjects: {} f:versions: {} manager: dns-operator operation: Update subresource: status time: '2022-05-14T16:38:12Z' name: dns ownerReferences: - apiVersion: config.openshift.io/v1 kind: ClusterVersion name: version uid: 8df8df3a-7725-47b7-9326-50bdbed53979 resourceVersion: '57174' uid: 68846a48-eed5-442f-8e5c-a04e9e1cc21e spec: {} status: conditions: - 
lastTransitionTime: '2022-05-14T16:38:30Z' message: DNS "default" is available. reason: AsExpected status: 'True' type: Available - lastTransitionTime: '2022-05-14T17:35:42Z' message: 'Upgrading operator to "4.11.0-0.ci-2022-05-14-160619". Upgrading coredns to "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236". Upgrading kube-rbac-proxy to "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c".' reason: Upgrading status: 'True' type: Progressing - lastTransitionTime: '2022-05-14T16:45:16Z' reason: DNSNotDegraded status: 'False' type: Degraded - lastTransitionTime: '2022-05-14T16:38:12Z' message: 'DNS default is upgradeable: DNS Operator can be upgraded' reason: DNSUpgradeable status: 'True' type: Upgradeable extension: null relatedObjects: - group: '' name: openshift-dns-operator resource: namespaces - group: operator.openshift.io name: default resource: dnses - group: '' name: openshift-dns resource: namespaces versions: - name: operator version: 4.10.14 - name: coredns version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ceb0d1d2015b87e9daf3e57b93f5464f15a1386a6bcab5442b7dba594b058b24 - name: openshift-cli version: registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:b634a8ede5ffec8e4068475de9746424e34f73416959a241592736fd1cdf5ab8 - name: kube-rbac-proxy version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:08e8b4004edaeeb125ced09ab2c4cd6d690afaf3a86309c91a994dec8e3ccbf3 [akaris@linux analysis-1525510596311650304]$ omg get pods -n openshift-dns | awk '/dns-default/ {print $1}' | while read p ; do omg get pod -n openshift-dns $p -o json | jq '.spec.containers[] | .image'; done "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:c371180cfc6ba0c7ccf4d1b5da89beee8b2ea575e6e89bc89f06280884753236" "registry.ci.openshift.org/ocp/4.11-2022-05-14-160619@sha256:5b01b4dccbca6d9f3526d861b92cb64885a3bd748a508bd1228ec10170a4485c" [akaris@linux analysis-1525510596311650304]$ omg get pods -n openshift-dns NAME READY STATUS RESTARTS AGE dns-default-249dg 2/2 Running 0 2h1m dns-default-czz5x 2/2 Running 0 1h59m dns-default-dcjm7 2/2 Running 0 1h58m dns-default-l4vc9 2/2 Running 0 
2h0m dns-default-mprr6 2/2 Running 0 2h2m dns-default-vjpn2 2/2 Running 0 2h0m node-resolver-4r7tx 1/1 Running 0 2h2m node-resolver-754hp 1/1 Running 0 2h2m node-resolver-c5dnq 1/1 Running 0 2h2m node-resolver-jf5l4 1/1 Running 0 2h2m node-resolver-sx4nq 1/1 Running 0 2h2m node-resolver-zstzk 1/1 Running 0 2h2m [akaris@linux analysis-1525510596311650304]$ omg logs -n openshift-dns-operator dns-operator-67f99d6557-2g6g5 -c dns-operator | tail -n 30 [ERROR] Pod directory not found: /home/akaris/cases/dns-operator/analysis-1525510596311650304/registry-ci-openshift-org-ocp-4-11-2022-05-14-160619-sha256-090ae5109f7a1d071e12a49ae62460328b1bbe39e4bf4a3ff909f35629ae07a2/namespaces/openshift-dns-operator/pods/dns-operator-67f99d6557-2g6g5 [akaris@linux analysis-1525510596311650304]$ omg get pods -n openshift-dns-operator NAME READY STATUS RESTARTS AGE dns-operator-5d5bf79f5d-5llxf 2/2 Running 0 2h2m [akaris@linux analysis-1525510596311650304]$ omg logs -n openshift-dns-operator dns-operator-5d5bf79f5d-5llxf | tail -n 30 [ERROR] This pod has more than one containers: ['dns-operator', 'kube-rbac-proxy'] Use -c/--container to specify the container [akaris@linux analysis-1525510596311650304]$ omg logs -n openshift-dns-operator dns-operator-5d5bf79f5d-5llxf -c dns-operator | tail -n 30 2022-05-14T17:34:02.952228784Z time="2022-05-14T17:34:02Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 33, 41, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 3 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 2, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-14T17:34:02.955049734Z 
time="2022-05-14T17:34:02Z" level=info msg="reconciling request: /default" 2022-05-14T17:34:20.918978128Z time="2022-05-14T17:34:20Z" level=info msg="reconciling request: /default" 2022-05-14T17:34:21.008638550Z time="2022-05-14T17:34:21Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 2, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 21, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-14T17:34:21.013241525Z time="2022-05-14T17:34:21Z" level=info msg="reconciling request: /default" 2022-05-14T17:34:42.326082681Z time="2022-05-14T17:34:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:34:42.467891086Z time="2022-05-14T17:34:42Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 21, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 4 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, 
v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 42, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 5 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-14T17:34:42.472562810Z time="2022-05-14T17:34:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:41.511932339Z W0514 17:35:41.511893 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512072961Z W0514 17:35:41.512053 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ClusterOperator ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512106215Z W0514 17:35:41.512089 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512139893Z W0514 17:35:41.512117 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512167522Z W0514 17:35:41.512148 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512232053Z time="2022-05-14T17:35:41Z" level=error msg="failed to ensure default dns Get \"https://172.30.0.1:443/apis/operator.openshift.io/v1/dnses/default\": http2: client connection lost" 2022-05-14T17:35:41.512260744Z W0514 17:35:41.512247 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from 
succeeding 2022-05-14T17:35:41.512318475Z W0514 17:35:41.512304 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DNS ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512340839Z W0514 17:35:41.512325 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512352962Z W0514 17:35:41.512348 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512381130Z W0514 17:35:41.512371 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512427727Z W0514 17:35:41.512415 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512458488Z W0514 17:35:41.512451 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:41.512496706Z W0514 17:35:41.512477 1 reflector.go:442] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding 2022-05-14T17:35:42.365704114Z time="2022-05-14T17:35:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:42.451387428Z time="2022-05-14T17:35:42Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 34, 42, 0, time.Local), Reason:\"Reconciling\", Message:\"Have 5 available DNS pods, want 6.\\nHave 5 up-to-date DNS pods, want 6.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can 
be upgraded\"}}}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 45, 16, 0, time.Local), Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"False\", LastTransitionTime:time.Date(2022, time.May, 14, 17, 35, 42, 0, time.Local), Reason:\"AsExpected\", Message:\"All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 30, 0, time.Local), Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:time.Date(2022, time.May, 14, 16, 38, 12, 0, time.Local), Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}" 2022-05-14T17:35:42.454446312Z time="2022-05-14T17:35:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:42.572214603Z time="2022-05-14T17:35:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:42.904078610Z time="2022-05-14T17:35:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:42.949765309Z time="2022-05-14T17:35:42Z" level=info msg="reconciling request: /default" 2022-05-14T17:35:43.033849849Z time="2022-05-14T17:35:43Z" level=info msg="reconciling request: /default" [akaris@linux analysis-1525510596311650304]$ omg get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version True True 2h3m Working towards 4.11.0-0.ci-2022-05-14-160619: 658 of 802 done (82% complete) [akaris@linux analysis-1525510596311650304]$ {code} Just some further info - we can indeed see that the DNS object of name "default" is correctly updated:
~~~
[akaris@linux analysis-2]$ cat registry-ci-openshift-org-ocp-4-11-2022-05-16-095559-sha256-090ae5109f7a1d071e12a49ae62460328b1bbe39e4bf4a3ff909f35629ae07a2/cluster-scoped-resources/operator.openshift.io/dnses/default.yaml
---
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2022-05-16T10:19:50Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 1
  managedFields:
  - apiVersion: operator.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"dns.operator.openshift.io/dns-controller": {}
      f:spec:
        .: {}
        f:logLevel: {}
        f:nodePlacement: {}
        f:operatorLogLevel: {}
        f:upstreamResolvers:
          .: {}
          f:policy: {}
          f:upstreams: {}
    manager: dns-operator
    operation: Update
    time: "2022-05-16T10:19:50Z"
  - apiVersion: operator.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:clusterDomain: {}
        f:clusterIP: {}
        f:conditions: {}
    manager: dns-operator
    operation: Update
    subresource: status
    time: "2022-05-16T10:19:50Z"
  name: default
  resourceVersion: "57484"
  uid: 56af9aae-2124-4840-a378-2b3847073df6
spec:
  logLevel: Normal
  nodePlacement: {}
  operatorLogLevel: Normal
  upstreamResolvers:
    policy: Sequential
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2022-05-16T10:28:41Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2022-05-16T11:20:41Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-05-16T10:20:05Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2022-05-16T10:19:50Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable
~~~
So the DNS status update path works: https://github.com/openshift/cluster-dns-operator/blob/d50df32df68f53c1d47db8f5e51a8b27c402f278/pkg/operator/controller/dns_status.go#L36

This path, which updates the ClusterOperator status, does not: https://github.com/openshift/cluster-dns-operator/blob/d50df32df68f53c1d47db8f5e51a8b27c402f278/pkg/operator/controller/status/controller.go#L175

Possibly related to https://github.com/openshift/cluster-dns-operator/pull/318.

From Prow CI, the jobs for the mentioned profiles are passing:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1536577259970760704
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-azure-ovn-upgrade/1536550068805439488
Hence marking as verified.

The issue was introduced in 4.11 by https://github.com/openshift/cluster-dns-operator/pull/318 and was fixed before we shipped a release with the issue, so no doc text is needed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
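For background on why the dns clusteroperator keeps reporting Progressing=True here even though every DNS pod already runs the new images: the ClusterOperator's versions[] and its Progressing condition are only updated by the operator's status controller (the second link above), so if that controller stops reconciling, the old operator version stays in versions[] and the CVO keeps waiting on it. The sketch below illustrates that general pattern only; it is not the cluster-dns-operator's actual code, and the computeStatus helper, reasons, and messages are hypothetical.
~~~
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

// computeStatus is a hypothetical helper showing the usual contract between a
// cluster operator's reported versions and its Progressing condition:
// Progressing stays True, and the previously reported versions are kept, until
// the observed operand versions match what the new release payload expects.
// If the controller that runs this logic stops reconciling, the symptom is
// exactly what the must-gathers above show: all operand pods updated, but the
// clusteroperator still Progressing=True with stale versions[].
func computeStatus(observed, desired []configv1.OperandVersion) (configv1.ClusterOperatorStatusCondition, []configv1.OperandVersion) {
	want := map[string]string{}
	for _, v := range desired {
		want[v.Name] = v.Version
	}

	upToDate := true
	for _, v := range observed {
		if want[v.Name] != v.Version {
			upToDate = false
			break
		}
	}

	if !upToDate {
		return configv1.ClusterOperatorStatusCondition{
			Type:    configv1.OperatorProgressing,
			Status:  configv1.ConditionTrue,
			Reason:  "Upgrading",
			Message: "Upgrading operator and operands to the new release payload.",
		}, observed // keep the old versions until the rollout completes
	}

	return configv1.ClusterOperatorStatusCondition{
		Type:    configv1.OperatorProgressing,
		Status:  configv1.ConditionFalse,
		Reason:  "AsExpected",
		Message: "All operands are running the expected versions.",
	}, desired // bump versions[] so the CVO can consider this operator done
}

func main() {
	observed := []configv1.OperandVersion{{Name: "operator", Version: "4.11.0-0.nightly-2022-05-11-054135"}}
	desired := []configv1.OperandVersion{{Name: "operator", Version: "4.11.0-0.nightly-2022-05-18-010528"}}

	cond, _ := computeStatus(observed, desired)
	fmt.Printf("Progressing=%s (%s)\n", cond.Status, cond.Reason) // Progressing=True (Upgrading)
}
~~~
On a live cluster the equivalent check is to compare `oc get clusteroperator dns -o jsonpath='{.status.versions}'` against the images actually running in the openshift-dns pods, which is what the omg commands against the must-gather data do above.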