Bug 1952174 - DNS operator claims to be done upgrading before it even starts
Summary: DNS operator claims to be done upgrading before it even starts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: DNS
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-21 16:52 UTC by W. Trevor King
Modified: 2021-11-26 11:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:02:20 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-dns-operator pull 269 0 None Merged Bug 1952174: status: Report old versions while progressing 2022-02-22 06:14:25 UTC
Github openshift cluster-dns-operator pull 274 0 None Merged Bug 1952174: status: Report old versions while rolling out new 2022-02-22 06:14:23 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:02:40 UTC

Description W. Trevor King 2021-04-21 16:52:30 UTC
Similar to bug 1928157, but different operator.  Example update from 4.7.8 to 4.8.0-0.ci-2021-04-21-123839 [1]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep clusteroperator/dns
  Apr 21 14:18:02.677 I clusteroperator/dns versions: operator 4.7.8 -> 4.8.0-0.ci-2021-04-21-123839, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7a9a3f1de9e270b76dc7c384d4bbf740ad62c5cd365e5319f376b6db8c182360 -> registry.ci.openshift.org/ocp/4.8-2021-04-21-123839@sha256:d247f824e85acc1c6eb145fc59a8a3267ead8a2707f4731141761b7cbd5aa6d6, openshift-cli quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d5711192410a93070ddee1c39ef4cde0bbed7a72230fe7cfbf31fb0a1488ba03 -> registry.ci.openshift.org/ocp/4.8-2021-04-21-123839@sha256:5047e6868ffbd00e22ca0aa1ceee3bfb49bb3837962f85950b739fb13e23e637, kube-rbac-proxy quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:88a6ca499208c40f88d0b60294e7331846cfa36e941ff6d0bb2956172a54880e -> registry.ci.openshift.org/ocp/4.8-2021-04-21-123839@sha256:5584994d300418ece9fc5036bd931f560f665c8f77bb5bad27795eb80c6adc20
  Apr 21 14:18:03.463 E clusteroperator/dns condition/Degraded status/True reason/DNSDegraded changed: DNS default is degraded
  Apr 21 14:18:03.463 - 6s    E clusteroperator/dns condition/Degraded status/True reason/DNS default is degraded
  ...

Going Degraded=True at all is being hashed out in bug 1939723, this bug is just about the timing of the 'versions' bump.  The update started at 13:30Z:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + .state + " " + .version'
  2021-04-21T13:30:27Z 2021-04-21T14:42:14ZCompleted 4.8.0-0.ci-2021-04-21-123839
  2021-04-21T13:02:50Z 2021-04-21T13:26:26ZCompleted 4.7.8
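
Note that the jq expression above joins completionTime and state with no separator, which is why the output reads "14:42:14ZCompleted". A rough Python equivalent with explicit spaces is sketched below. The function name `history_lines` is hypothetical, and `SAMPLE` is a hand-built document that reuses only the fields the jq expression reads and the values printed above; it is not the real clusterversion.json artifact.

```python
import json

# Hand-built sample (assumption): same field names as the jq expression,
# values copied from the output above. Not the real artifact.
SAMPLE = json.dumps({"items": [{"status": {"history": [
    {"startedTime": "2021-04-21T13:30:27Z",
     "completionTime": "2021-04-21T14:42:14Z",
     "state": "Completed",
     "version": "4.8.0-0.ci-2021-04-21-123839"},
    {"startedTime": "2021-04-21T13:02:50Z",
     "completionTime": "2021-04-21T13:26:26Z",
     "state": "Completed",
     "version": "4.7.8"},
]}}]})


def history_lines(doc: str) -> list:
    """Format each history entry as 'started completed state version'."""
    lines = []
    for item in json.loads(doc)["items"]:
        for entry in item["status"]["history"]:
            # Mirror jq's `// "-"`: fall back to "-" when completionTime is unset.
            completed = entry.get("completionTime") or "-"
            lines.append(" ".join(
                [entry["startedTime"], completed, entry["state"], entry["version"]]))
    return lines


for line in history_lines(SAMPLE):
    print(line)
```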

Looking at the relevant namespace, the cluster-version operator bumps the DNS operator at 14:17:55Z.  From above, the DNS operator bumps the ClusterOperator versions at 14:18:02Z.  The first new node-resolver pod is created at 14:18:03Z.

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
  2021-04-21T13:03:11Z 1 Created pod: dns-operator-576bb9fc75-974r9
  2021-04-21T13:08:43Z 1 Created pod: dns-default-m8qmg
  ...
  2021-04-21T13:15:26Z 1 Created pod: dns-default-nl252
  2021-04-21T14:17:55Z 1 Created pod: dns-operator-8b55dc57-zbp2n
  2021-04-21T14:18:03Z 1 Created pod: node-resolver-22b4n
  ...
  2021-04-21T14:18:03Z 1 Created pod: node-resolver-mnfg8
  2021-04-21T14:18:36Z 1 Created pod: dns-default-j49sm
  ...
  2021-04-21T14:19:57Z 1 Created pod: dns-default-cjqmr
  2021-04-21T14:29:00Z 1 Created pod: dns-operator-8b55dc57-zwgvx
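
To make the ordering explicit, here is a small Python sketch that takes the versions-bump time and event lines in the format printed above, and lists the operand pods that were only created after the bump. The helper names `parse_z` and `operands_created_after` are hypothetical, the bump time is truncated to whole seconds, and `EVENTS` holds just a few of the lines shown above.

```python
from datetime import datetime, timezone


def parse_z(stamp: str) -> datetime:
    """Parse a UTC timestamp such as 2021-04-21T14:18:03Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def operands_created_after(bump: str, event_lines: list) -> list:
    """Return operand pods whose SuccessfulCreate event postdates the bump."""
    bump_time = parse_z(bump)
    late = []
    for line in event_lines:
        stamp, _count, message = line.split(" ", 2)
        pod = message.split(": ", 1)[1]
        if pod.startswith("dns-operator-"):
            continue  # the operator pod itself is not an operand
        if parse_z(stamp) > bump_time:
            late.append(pod)
    return late


EVENTS = [
    "2021-04-21T13:08:43Z 1 Created pod: dns-default-m8qmg",
    "2021-04-21T14:17:55Z 1 Created pod: dns-operator-8b55dc57-zbp2n",
    "2021-04-21T14:18:03Z 1 Created pod: node-resolver-22b4n",
    "2021-04-21T14:18:36Z 1 Created pod: dns-default-j49sm",
]

# Versions bump logged at 14:18:02.677, truncated to the second.
print(operands_created_after("2021-04-21T14:18:02Z", EVENTS))
```

Any non-empty result means the operator claimed the new version before those operand pods even existed.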

The DNS operator needs to delay the versions bump until the operands have leveled.  For example, from [2]:

  An operator reports a new "operator" version when it has rolled out the new version to all of its operands.

I.e. once all the outdated versions are gone.
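
A minimal sketch of that rule, written in Python rather than the operator's Go. All names here (`Operand`, `leveled`, `versions_to_report`) are hypothetical and are not taken from cluster-dns-operator; the idea is simply to keep publishing the old versions until every operand has fully rolled out the new one.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    """Hypothetical rollout summary for one operand, e.g. a DaemonSet."""
    name: str
    desired: int    # pods the operand wants
    updated: int    # pods already running the new image
    available: int  # pods currently available


def leveled(op: Operand) -> bool:
    # Leveled means every desired pod is both updated and available.
    return op.updated == op.desired and op.available == op.desired


def versions_to_report(old: dict, new: dict, operands: list) -> dict:
    # Report the new versions only once all operands have leveled;
    # until then keep reporting the old ones.
    return new if all(leveled(op) for op in operands) else old


old = {"operator": "4.7.8"}
new = {"operator": "4.8.0-0.ci-2021-04-21-123839"}
mid_rollout = [Operand("dns-default", 6, 2, 5), Operand("node-resolver", 6, 6, 6)]
print(versions_to_report(old, new, mid_rollout)["operator"])
```

With one DaemonSet still partway through its rollout, the sketch keeps reporting 4.7.8.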

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1384851693719523328
[2]: https://github.com/openshift/api/blob/a99ffa1cac6709edf8f502b16890b16f9a557e00/config/v1/types_cluster_operator.go#L43-L47

Comment 1 W. Trevor King 2021-04-21 17:27:37 UTC
Setting high and blocker+, per Clayton.

Comment 5 Arvind iyengar 2021-05-04 09:57:10 UTC
In recent upgrade CI jobs run after the PR merged, the DNS operator appears to delay the version bump long enough for the operands to level.

Most recent jobs as of writing:
------
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848/artifacts/e2e-aws-upgrade/
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388974264626974720/artifacts/e2e-aws-upgrade/
------

Excerpts from the logs:
------
Upgrade timeframe:
curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + .state + " " + .version'

2021-05-02T16:30:09Z 2021-05-02T17:43:34ZCompleted 4.8.0-0.ci-2021-05-02-153814
2021-05-02T15:56:10Z 2021-05-02T16:25:58ZCompleted 4.7.9

DNS operator upgrade timeframe:
curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep -i clusteroperator/dns
May 02 17:17:47.412 I clusteroperator/dns versions: operator 4.7.9 -> 4.8.0-0.ci-2021-05-02-153814, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:914f281d6d337d223403f7f3a80815ca34fc0302b2941452bbdd78fc44f59f85 -> registry.ci.openshift.org/ocp/4.8-2021-05-02-153814@sha256:57617715df51c7f4a1faf1cd59bfdc12cd57aea0a611abc4c85cfe57b30b404c, openshift-cli quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:725f4165c9a3327bea15daf342adc6542f308be829e66d3b4d43abd9ad2f9bef -> registry.ci.openshift.org/ocp/4.8-2021-05-02-153814@sha256:ed85237c5637b701478ed75affddb8fc0fdc5eb7da1cab5fbf188e29984118b9, kube-rbac-proxy quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e754d2f1c01e200774c5ab9023d3ed4652e9caedac64653929d9c7e44e215b14 -> registry.ci.openshift.org/ocp/4.8-2021-05-02-153814@sha256:5584994d300418ece9fc5036bd931f560f665c8f77bb5bad27795eb80c6adc20
May 02 17:17:48.132 E clusteroperator/dns condition/Degraded status/True reason/DNSDegraded changed: DNS default is degraded


Operator version bump timeframe:
curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json |jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
...
2021-05-02T16:10:12Z 1 Created pod: dns-default-827wf
2021-05-02T16:10:52Z 1 Created pod: dns-default-f8r7l
2021-05-02T17:17:36Z 1 Created pod: dns-operator-5df7b59df7-d584w
2021-05-02T17:17:48Z 1 Created pod: node-resolver-c4cv7
------

Hence marking this as "verified"

Comment 6 W. Trevor King 2021-05-04 18:19:22 UTC
Hrm.  Job-detail page from the previous comment is [1].  [2] has:

        {
            "level": "Warning",
            "locator": "clusteroperator/dns",
            "message": "condition/Progressing status/True reason/DNS \"default\" reports Progressing=True: \"Have 0 available node-resolver pods, want 6.\"",
            "from": "2021-05-02T17:17:48Z",
            "to": "2021-05-02T17:20:03Z"
        },

as the first round of DNS Progressing=True.  But as you showed, the 'operator' version was bumped at 17:17:47, a second before we went Progressing=True, and ~10s after the new operator deployment was created at 17:17:36.  So I'm going to move this back to ASSIGNED to see if folks can find the race between going Progressing=True and bumping the version...
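
The comparison above can be sketched in Python as follows. The function names are hypothetical, the e2e.log stamp carries no year so 2021 is assumed, and both sources are assumed to be in UTC. The interval entry is the one quoted above.

```python
from datetime import datetime, timezone


def parse_log_time(stamp: str, year: int = 2021) -> datetime:
    """Parse an e2e.log stamp such as 'May 02 17:17:47.412' (year assumed)."""
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def parse_interval_time(stamp: str) -> datetime:
    """Parse an e2e-intervals stamp such as '2021-05-02T17:17:48Z'."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def bump_precedes_progressing(bump_stamp: str, intervals: list) -> bool:
    """True when the versions bump lands before the first Progressing=True."""
    starts = [
        parse_interval_time(entry["from"])
        for entry in intervals
        if entry["locator"] == "clusteroperator/dns"
        and "condition/Progressing status/True" in entry["message"]
    ]
    return bool(starts) and parse_log_time(bump_stamp) < min(starts)


INTERVALS = [{
    "level": "Warning",
    "locator": "clusteroperator/dns",
    "message": 'condition/Progressing status/True reason/DNS "default" reports '
               'Progressing=True: "Have 0 available node-resolver pods, want 6."',
    "from": "2021-05-02T17:17:48Z",
    "to": "2021-05-02T17:20:03Z",
}]

print(bump_precedes_progressing("May 02 17:17:47.412", INTERVALS))
```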

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848
[2]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1388882557772238848/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e-intervals_20210502-162627.json

Comment 8 W. Trevor King 2021-06-03 03:42:31 UTC
Checking 4.7.13 -> 4.7.14 [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1400078704817737728/artifacts/e2e-aws-upgrade/e2e.log | grep clusteroperator/dns
Jun 02 15:22:23.035 I clusteroperator/dns versions: operator 4.7.13 -> 4.7.14, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02c8a72eb9ab786c587f56135df845e7e4d0ca1511ddc879c7b4b02a0cb6d8cb -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad08b23717af078a89f93a097f32abe9262daf9e32d124f8b1c6437efddb82e7, openshift-cli quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:157fcd230a3ed545c205cf6349b5331386aa43fa8b8cd8b5565e3134d996af90 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9484e8b8af6f2c4a74a3f2be7cf64002d7d57647ae85ab69e73ff4191463100b, kube-rbac-proxy quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fe71083cf24c14bdb98b9e7f32506bb2908f7974773423a260568fab37637908 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:37ee4cf8184666792caa983611ab8d58dfd533c7cc7abe9f81a22a81876d9cd2
Jun 02 15:31:57.593 E clusteroperator/dns changed Degraded to True: DNSDegraded: DNS default is degraded
Jun 02 15:33:07.389 W clusteroperator/dns changed Degraded to False
Jun 02 15:36:48.414 E clusteroperator/dns changed Degraded to True: DNSDegraded: DNS default is degraded
Jun 02 15:37:12.681 W clusteroperator/dns changed Degraded to False

And the pod events:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1400078704817737728/artifacts/e2e-aws-upgrade/events.json | jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
...
2021-06-02T14:20:20Z 1 Created pod: dns-default-gbk7z
2021-06-02T15:22:12Z 1 Created pod: dns-operator-5d8fbbc7cf-zrmzd
2021-06-02T15:22:50Z 1 Created pod: dns-default-wkwf4
2021-06-02T15:23:40Z 1 Created pod: dns-default-vqwqm
...

So same deal.  Going back to 4.5.39 to 4.5.40 [2]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1395743198168485888/artifacts/e2e-aws-upgrade/e2e.log | grep clusteroperator/dns
May 21 15:12:10.734 I clusteroperator/dns versions: operator 4.5.39 -> 4.5.40
May 21 15:17:25.454 W clusteroperator/dns changed Progressing to True: Reconciling: At least 1 DNS DaemonSet is progressing.
May 21 15:17:25.647 E clusteroperator/dns changed Degraded to True: NotAllDNSesAvailable: Not all desired DNS DaemonSets available
...

And the pod events:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1395743198168485888/artifacts/e2e-aws-upgrade/events.json | jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
...
2021-05-21T14:48:42Z 1 Created pod: dns-default-rvn5b
2021-05-21T15:12:03Z 1 Created pod: dns-operator-66495f9764-kvvs4
2021-05-21T15:14:27Z 1 Created pod: dns-operator-66495f9764-blr8g
2021-05-21T15:22:07Z 1 Created pod: dns-operator-66495f9764-d5dx6

Huh, I guess no DaemonSet image bump there.  Looking at 4.5.36 to 4.5.40 [3]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1395743293937029120/artifacts/launch/e2e.log | grep clusteroperator/dns
May 21 15:24:44.671 W clusteroperator/dns changed Progressing to True: Reconciling: At least 1 DNS DaemonSet is progressing.
May 21 15:24:54.678 W clusteroperator/dns changed Progressing to False: AsExpected: Desired and available number of DNS DaemonSets are equal
May 21 15:30:23.231 I clusteroperator/dns versions: operator 4.5.36 -> 4.5.40, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:594a45f4b1577d4eb88ca1b33f388629e817b6150effa65fcde73e09a180ee8d -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1314585ea7847a35d1c10cd65f839e59fba2c061673f3241d3584793ff1b31d8, openshift-cli quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:80b02997edf4e257d0b3e276f9080ccc45cfa3878216de127c0a2c1c9862e684 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cfd21deb6894118c6671ec9731ace1841250ab2f2aea2583c939a626bff703d7
May 21 15:30:24.226 W clusteroperator/dns changed Progressing to True: Reconciling: At least 1 DNS DaemonSet is progressing.
May 21 15:30:48.183 W clusteroperator/dns changed Progressing to False: AsExpected: Desired and available number of DNS DaemonSets are equal
...

And the pod events:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1395743293937029120/artifacts/launch/events.json | jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
...
2021-05-21T15:30:16Z 1 Created pod: dns-operator-66495f9764-wqln9
2021-05-21T15:30:29Z 1 Created pod: dns-default-x25tk
2021-05-21T15:30:52Z 1 Created pod: dns-default-v72s7
...

I'm moving the affected version back to 4.5 based on that, although I expect this is likely a "since the operator was born" thing.
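
For telling apart runs like [2], where only the "operator" version changed, from runs like [3], where operand images changed as well, a small Python sketch that parses a "versions:" log line. The function names are hypothetical, and the parser assumes the comma-separated "name old -> new" layout seen in the log lines above.

```python
def parse_versions(line: str) -> dict:
    """Split a 'clusteroperator/dns versions: ...' line into {name: (old, new)}."""
    _, _, payload = line.partition(" versions: ")
    changes = {}
    if not payload:
        return changes  # not a versions line
    for part in payload.split(", "):
        name, old, _arrow, new = part.split()
        changes[name] = (old, new)
    return changes


def operand_images_changed(line: str) -> bool:
    """True when anything besides the 'operator' version was bumped."""
    return any(name != "operator" for name in parse_versions(line))


LINE = "May 21 15:12:10.734 I clusteroperator/dns versions: operator 4.5.39 -> 4.5.40"
print(parse_versions(LINE))
print(operand_images_changed(LINE))
```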

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1400078704817737728
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1395743198168485888
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1395743293937029120

Comment 9 Arvind iyengar 2021-06-15 11:24:07 UTC
Upgrade from 4.7.16 -> 4.8.0-0.nightly-2021-06-13-101614 [1]

The cluster-version operator bumps the DNS operator at roughly 11:45:49Z, the first new node-resolver pod is created at 11:46:01Z, and the DNS operator bumps the ClusterOperator versions at 11:48:21Z.

curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1404020398542032896/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + .state + " " + .version'
2021-06-13T11:05:50Z 2021-06-13T12:14:20ZCompleted 4.8.0-0.nightly-2021-06-13-101614
2021-06-13T10:33:47Z 2021-06-13T11:02:06ZCompleted 4.7.16

curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1404020398542032896/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep -i clusteroperator/dns
Jun 13 11:48:21.930 I clusteroperator/dns versions: operator 4.7.16 -> 4.8.0-0.nightly-2021-06-13-101614, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad08b23717af078a89f93a097f32abe9262daf9e32d124f8b1c6437efddb82e7 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b74076f1f7fa27993b9009ed49aefcdb4df9125efff8e44e66e38b9512947ab, kube-rbac-proxy quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:37ee4cf8184666792caa983611ab8d58dfd533c7cc7abe9f81a22a81876d9cd2 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fb30e949355d8fb178870e7707fca97d0acca0b584b561e2a018bb659c9bcfc6
Jun 13 11:58:17.526 W clusteroperator/dns condition/Progressing status/True reason/DNSReportsProgressingIsTrue changed: DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."


curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1404020398542032896/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json |jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort 
2021-06-13T10:34:10Z 1 Created pod: dns-operator-74db48c654-jpzs2
2021-06-13T10:40:18Z 1 Created pod: dns-default-n6j82
...
2021-06-13T11:45:49Z 1 Created pod: dns-operator-7bb59b9b79-nxmvg
2021-06-13T11:46:01Z 1 Created pod: node-resolver-44tmj  <--- 1st node resolver pod
...
2021-06-13T11:48:03Z 1 Created pod: dns-default-79vls
2021-06-13T12:00:20Z 1 Created pod: dns-operator-7bb59b9b79-c6znp


Upgrade from 4.7.16 -> 4.8.0-0.nightly-2021-06-13-024004 [2]

The cluster-version operator bumps the DNS operator at roughly 04:19:25Z, the first new node-resolver pod is created at 04:19:37Z, and the DNS operator bumps the ClusterOperator versions at 04:23:30Z.

curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1403907283217289216/artifacts/e2e-aws-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + .state + " " + .version'
2021-06-13T03:36:00Z 2021-06-13T04:49:22ZCompleted 4.8.0-0.nightly-2021-06-13-024004
2021-06-13T03:03:14Z 2021-06-13T03:32:24ZCompleted 4.7.16

curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1403907283217289216/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep -i clusteroperator/dns
Jun 13 04:23:30.102 I clusteroperator/dns versions: operator 4.7.16 -> 4.8.0-0.nightly-2021-06-13-024004, coredns quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad08b23717af078a89f93a097f32abe9262daf9e32d124f8b1c6437efddb82e7 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b74076f1f7fa27993b9009ed49aefcdb4df9125efff8e44e66e38b9512947ab, kube-rbac-proxy quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:37ee4cf8184666792caa983611ab8d58dfd533c7cc7abe9f81a22a81876d9cd2 -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fb30e949355d8fb178870e7707fca97d0acca0b584b561e2a018bb659c9bcfc6
Jun 13 04:32:47.406 W clusteroperator/dns condition/Progressing status/True reason/DNSReportsProgressingIsTrue changed: DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."

curl https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1403907283217289216/artifacts/e2e-aws-upgrade/gather-extra/artifacts/events.json |jq -r '.items[] | select((.metadata.namespace | contains("dns")) and .reason == "SuccessfulCreate") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
2021-06-13T03:03:37Z 1 Created pod: dns-operator-74db48c654-w6m7b
2021-06-13T03:09:28Z 1 Created pod: dns-default-b86zp
...
2021-06-13T04:19:25Z 1 Created pod: dns-operator-5bbd55f9f4-dtrt7
2021-06-13T04:19:37Z 1 Created pod: node-resolver-8hcck
...
2021-06-13T04:21:37Z 1 Created pod: dns-default-4mqr8
2021-06-13T04:23:12Z 1 Created pod: dns-default-w8b4g
2021-06-13T04:35:23Z 1 Created pod: dns-operator-5bbd55f9f4-jxlgg
2021-06-13T04:41:13Z 1 Created pod: dns-operator-5bbd55f9f4-jkb8n


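In both recent samples, the DNS operator bumps its versions roughly 2.5 to 4 minutes after the new operator pod is created (11:45:49Z to 11:48:21Z, and 04:19:25Z to 04:23:30Z), and only after the new node-resolver and dns-default pods shown above have been created.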
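
As a rough cross-check of those gaps, a Python sketch computing the seconds from the new operator pod to the versions bump, and from the last dns-default pod shown in each excerpt to the bump. `seconds_between` and `SAMPLES` are hypothetical names; the timestamps are copied from the excerpts above, with bump times truncated to whole seconds.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end, both as UTC 'Z' timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# job ID -> (new operator pod created, last dns-default pod shown, versions bump)
SAMPLES = {
    "1404020398542032896": (
        "2021-06-13T11:45:49Z", "2021-06-13T11:48:03Z", "2021-06-13T11:48:21Z"),
    "1403907283217289216": (
        "2021-06-13T04:19:25Z", "2021-06-13T04:23:12Z", "2021-06-13T04:23:30Z"),
}

for job, (operator_pod, last_dns_default, bump) in SAMPLES.items():
    print(job,
          seconds_between(operator_pod, bump),
          seconds_between(last_dns_default, bump))
```

A positive second number means the bump came after the last dns-default pod in the excerpt was created, which is the ordering this bug asks for.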

[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1404020398542032896
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade/1403907283217289216

Comment 11 Brandi Munilla 2021-06-24 16:44:41 UTC
Hi, does this bug require doc text? If so, please update the doc text field.

Comment 13 errata-xmlrpc 2021-07-27 23:02:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

