Seen in master/4.8 CI [1]:

1. 17:55:04Z, update to 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b initiated [2].
2. 18:34:42Z, machine-config goes Degraded=True on [4]: RenderConfigFailed: Failed to resync 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b because: refusing to read osImageURL version "4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b", operator version "4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b"
3. 18:34:43Z, machine-config goes Available=False on [4]: Cluster not available for 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b
4. 18:37:33Z, machine-config goes Progressing=True on [4]: Working towards 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
5. 18:37:33Z, machine-config goes Degraded=False [4].
6. 18:40:39Z, rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095 created [3].
7. 18:40:44Z, ci-op-m36h5d5b-8aaeb-szd6x-master-1 asked to update to rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095 [7].
8. 18:41:12Z, machine-config sets Available=True [4].
9. 18:41:12Z, machine-config bumps its 'operator' version [4], but 'operator' should only be bumped after the operator and all its operands have completed their update [5]. This unblocks the CVO, because machine-config is Available=True, Degraded=False, and 'operator' is the only version the CVO has been told to wait for [6].
10. 18:41:13Z, machine-config sets Progressing=False on [4]: Cluster version is 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
11. 18:43:25Z, update nominally completes (per ClusterVersion [2]).
12. 18:48:13Z, ci-op-m36h5d5b-8aaeb-szd6x-master-0 asked to update to rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095 [7].
13. 18:55:33Z, ci-op-m36h5d5b-8aaeb-szd6x-master-2 asked to update to rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095 [7].

So the issues are:

a. machine-config should not go Available=False (step 3) or Degraded=True (step 2) during this healthy update. Clayton has [9] up to guard against this.
b. machine-config should delay bumping the 'operator' entry in status.versions (step 9) until the 'master' pool completes (after step 13). Clayton has [8] up to help guard against this.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640
[2]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version'
2021-02-23T17:55:04Z 2021-02-23T18:43:25Z Completed 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
2021-02-23T17:22:37Z 2021-02-23T17:52:39Z Completed 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b
[3]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/machineconfigs.json | jq -r '.items[].metadata | .creationTimestamp + " " + .name' | grep render | sort
2021-02-23T17:33:03Z rendered-master-0791a04211638c1b7706db957757d22c
2021-02-23T17:33:03Z rendered-worker-94205363958f282f378ae9bb3586a9c0
2021-02-23T18:40:39Z rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095
2021-02-23T18:40:39Z rendered-worker-99e80e3ee204abddcbf2411760dc2e93
[4]: $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640/build-log.txt | grep clusteroperator/machine-config
Feb 23 18:34:42.844 E clusteroperator/machine-config changed Degraded to True: RenderConfigFailed: Failed to resync 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b because: refusing to read osImageURL version "4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b", operator version "4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b"
Feb 23 18:34:43.247 W clusteroperator/machine-config changed Available to False: Cluster not available for 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b
Feb 23 18:37:33.310 W clusteroperator/machine-config changed Progressing to True: Working towards 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
Feb 23 18:37:33.311 I /machine-config reason/OperatorVersionChanged clusteroperator/machine-config-operator started a version change from [{operator 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b}] to [{operator 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b}]
Feb 23 18:37:33.348 W clusteroperator/machine-config changed Degraded to False
Feb 23 18:41:12.774 W clusteroperator/machine-config changed Available to True: Cluster has deployed 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
Feb 23 18:41:12.834 I /machine-config reason/OperatorVersionChanged clusteroperator/machine-config-operator version changed from [{operator 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b}] to [{operator 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b}]
Feb 23 18:41:12.904 I clusteroperator/machine-config versions: operator 4.8.0-0.ci.test-2021-02-23-170341-ci-op-m36h5d5b -> 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
Feb 23 18:41:13.497 W clusteroperator/machine-config changed Progressing to False: Cluster version is 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b
[5]: https://github.com/openshift/api/blob/70b287c0a0a51be5cdc88d40858de59e31222f5f/config/v1/types_cluster_operator.go#L45
[6]: https://github.com/openshift/machine-config-operator/blob/1402af23b7b6f5acb6dd42ab2ed41c2e01acbbcc/install/0000_80_machine-config-operator_06_clusteroperator.yaml#L12-L13
[7]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.reason == "SetDesiredConfig") | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort | grep master
2021-02-23T18:40:44Z 1 Targeted node ci-op-m36h5d5b-8aaeb-szd6x-master-1 to config rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095
2021-02-23T18:48:13Z 1 Targeted node ci-op-m36h5d5b-8aaeb-szd6x-master-0 to config rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095
2021-02-23T18:55:33Z 1 Targeted node ci-op-m36h5d5b-8aaeb-szd6x-master-2 to config rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095
[8]: https://github.com/openshift/origin/pull/25922
[9]: https://github.com/openshift/origin/pull/25920
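The ordering constraint from [5] can be sketched in Go: the 'operator' entry in status.versions should only move once every required operand (here, the machine-config pools) has leveled. This is an illustrative model, not the MCO's actual code; the type and function names (PoolStatus, shouldBumpOperatorVersion) are hypothetical.

```go
package main

import "fmt"

// PoolStatus is a hypothetical, simplified stand-in for a
// MachineConfigPool's status; it is not the real MCO type.
type PoolStatus struct {
	Name    string
	Updated bool // all machines in the pool run the target rendered config
}

// shouldBumpOperatorVersion models the contract from the ClusterOperator
// API docs: only report the new 'operator' version once every required
// operand (pool) has finished updating.
func shouldBumpOperatorVersion(pools []PoolStatus) bool {
	for _, p := range pools {
		if !p.Updated {
			return false
		}
	}
	return true
}

func main() {
	// At steps 8/9 in the timeline above, master-1 had been targeted but
	// master-0 and master-2 had not yet updated, so the bump was premature.
	pools := []PoolStatus{
		{Name: "master", Updated: false},
		{Name: "worker", Updated: false},
	}
	fmt.Println(shouldBumpOperatorVersion(pools))
}
```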
Hey Trevor, a few questions:

1. Is this a new issue in 4.8 only, and not in previous versions? Do you know if this happens every time (in terms of unavailable + degraded during the upgrade timeframe)?
2. For "a. machine-config should not go Available=False (step 3) or Degraded=True (step 2) during this healthy update. Clayton has [9] up to guard against this.": how exactly does this failure manifest? Does it stop the upgrade?
3. For "b. machine-config should delay bumping the 'operator' entry in status.versions (step 9) until the 'master' pool completes (after step 13). Clayton has [8] up to help guard against this.": is the MCO prematurely reporting upgrade completion before master has finished? Is this also new behaviour?
> ...is this a new issue in 4.8 only and not in previous versions?

No information yet on frequency. Clayton's origin#25920 and origin#25922 should give us that information. We might be able to query CI search in the meantime, but I haven't looked for a fingerprint to use.

> How exactly does this failure manifest? Does it stop the upgrade?

It temporarily blocks the update. If it lasts for 10+ minutes (it was 6m or less in this update), it would trigger a page-at-midnight ClusterOperatorDown alert. And anyone who polls ClusterOperator during the update and sees Available=False from machine-config might justifiably believe that their cluster was exploding, and then be more likely to ignore a later Available=False that represented a real "my component is dead" concern.

> Is the MCO prematurely reporting upgrade completion before master has finished?

Yes.

> Is this also new behaviour?

Unclear.
Rough handle on how often this happens:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=build-log&search=Pool+master+is+still+reporting.*Updating:+true' | grep 'failures match' | sort
pull-ci-openshift-cluster-cloud-controller-manager-operator-master-e2e-aws-upgrade - 7 runs, 71% failed, 20% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade - 10 runs, 30% failed, 33% of failures match
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-upgrade - 5 runs, 60% failed, 100% of failures match
pull-ci-openshift-cluster-openshift-controller-manager-operator-release-4.7-e2e-upgrade - 2 runs, 50% failed, 100% of failures match
pull-ci-openshift-installer-master-e2e-aws-upgrade - 28 runs, 75% failed, 29% of failures match
pull-ci-openshift-kubernetes-release-4.7-e2e-azure-upgrade - 2 runs, 50% failed, 100% of failures match
pull-ci-openshift-machine-api-operator-master-e2e-aws-upgrade - 9 runs, 78% failed, 29% of failures match
pull-ci-openshift-oc-master-e2e-aws-upgrade - 14 runs, 50% failed, 43% of failures match
pull-ci-openshift-openshift-apiserver-master-e2e-aws-upgrade - 3 runs, 67% failed, 50% of failures match
pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn-upgrade - 4 runs, 25% failed, 100% of failures match
pull-ci-openshift-service-ca-operator-master-e2e-aws-upgrade - 2 runs, 50% failed, 100% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 11 runs, 36% failed, 50% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.7 - 4 runs, 75% failed, 100% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.8 - 4 runs, 25% failed, 100% of failures match

Searching with [1] turns up [2] as one of the older entries still in CI-search's 14d history, and [2] has:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25884/pull-ci-openshift-origin-release-4.7-e2e-gcp-upgrade/1359664951764979712/build-log.txt | grep -1 'Pool master is still reporting.*Updating: true'
Feb 11 02:16:04.771: INFO: Waiting on pools to be upgraded
Feb 11 02:16:04.816: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
...
Feb 11 02:33:54.879: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
Feb 11 02:34:04.894: INFO: All pools completed upgrade

So that suggests at least 4.7 is impacted, this failure mode is not new, and this failure mode is easy to reproduce.

[1]: https://search.ci.openshift.org/?search=Pool+master+is+still+reporting.*Updating%3A+true&maxAge=336h&context=1&type=build-log&name=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/25884/pull-ci-openshift-origin-release-4.7-e2e-gcp-upgrade/1359664951764979712
4.6.18 -> 4.7.0 CI job [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1363777305826037760/artifacts/launch/container-logs/test.log | grep -A1 'Waiting on pools'
Feb 22 10:51:05.232: INFO: Waiting on pools to be upgraded
Feb 22 10:51:05.460: INFO: All pools completed upgrade

4.7.0-0.nightly-2021-02-22-135912 -> 4.7.0-0.nightly-2021-02-22-210958 [2]:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upgrade-4.7/1363960100204580864/build-log.txt | grep -A1 'Waiting on pools to be upgraded'
Feb 22 22:06:11.253: INFO: Waiting on pools to be upgraded
Feb 22 22:06:11.332: INFO: All pools completed upgrade

4.8.0-0.nightly-2021-02-23-160229 -> 4.8.0-0.nightly-2021-02-23-200827 [3]:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upgrade-4.8/1364308785111240704/build-log.txt | grep -A1 'Waiting on pools'
Feb 23 21:57:36.319: INFO: Waiting on pools to be upgraded
Feb 23 21:57:36.372: INFO: Pool worker is still reporting (Updated: false, Updating: true, Degraded: false)
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upgrade-4.8/1364308785111240704/build-log.txt | grep -c 'Pool master is'
0

So those three release-informers all look good. It's not clear to me why the 'failures match' percentages from the previous comment are so high, but the number of matching job names is so limited.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1363777305826037760
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upgrade-4.7/1363960100204580864
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upgrade-4.8/1364308785111240704
Looking at all e2e failures over the past 4d:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=96h&type=build-log&context=2&search=Waiting+on+pools+to+be+upgraded' | jq -r 'length as $all | ([to_entries[] | tostring | select(contains("Pool master is still reporting"))] | length) as $late_master | ($late_master | tostring) + " / " + ($all | tostring)'
72 / 519

So 72 jobs where we had late masters, out of 519 update jobs which failed e2e. Breaking down by job name:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=96h&type=build-log&context=2&search=Waiting+on+pools+to+be+upgraded' | jq -r '. as $data | ([to_entries[].value | to_entries[0].value[0].name]) | unique[] | . as $name | ([$data | to_entries[].value | to_entries[0].value[0] | select(.name == $name)]) as $jobs | ($jobs | length) as $all | ([$jobs[] | tostring | select(contains("Pool master is still reporting"))] | length) as $late_master | ($late_master / $all | tostring) + " " + ($late_master | tostring) + " / " + ($all | tostring) + " " + $name' | sort -n
0 0 / 1 endurance-upgrade-aws-4.6
...
0 0 / 9 periodic-ci-openshift-release-master-origin-4.6-e2e-azure-upgrade
0.08695652173913043 2 / 23 release-openshift-origin-installer-e2e-aws-upgrade
0.125 1 / 8 pull-ci-openshift-origin-master-e2e-gcp-upgrade
0.125 2 / 16 periodic-ci-openshift-release-master-ocp-4.8-e2e-metal-ipi-upgrade
0.16666666666666666 1 / 6 release-openshift-origin-installer-e2e-azure-upgrade-4.9
0.18181818181818182 2 / 11 pull-ci-openshift-service-ca-operator-master-e2e-aws-upgrade
0.2 2 / 10 release-openshift-ocp-installer-e2e-aws-upgrade-4.8
0.25 1 / 4 pull-ci-openshift-cluster-cloud-controller-manager-operator-master-e2e-aws-upgrade
0.2608695652173913 6 / 23 pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade
0.2682926829268293 11 / 41 pull-ci-openshift-installer-master-e2e-aws-upgrade
0.3333333333333333 1 / 3 pull-ci-openshift-cluster-ingress-operator-master-e2e-upgrade
0.3333333333333333 2 / 6 pull-ci-openshift-machine-api-operator-master-e2e-aws-upgrade
0.3333333333333333 3 / 9 pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-upgrade
0.36363636363636365 4 / 11 pull-ci-openshift-oc-master-e2e-aws-upgrade
0.5 1 / 2 pull-ci-openshift-openshift-apiserver-master-e2e-aws-upgrade
0.5384615384615384 7 / 13 release-openshift-okd-installer-e2e-aws-upgrade
0.6666666666666666 2 / 3 pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade
0.75 3 / 4 pull-ci-openshift-cluster-node-tuning-operator-master-e2e-upgrade
1 1 / 1 pull-ci-openshift-cluster-api-provider-aws-master-e2e-aws-upgrade
1 1 / 1 pull-ci-openshift-cluster-openshift-controller-manager-operator-release-4.7-e2e-upgrade
1 1 / 1 pull-ci-openshift-images-master-e2e-aws-upgrade
1 1 / 1 pull-ci-openshift-kubernetes-release-4.7-e2e-azure-upgrade
1 1 / 1 pull-ci-openshift-openshift-apiserver-release-4.7-e2e-aws-upgrade
1 1 / 1 pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn-upgrade
1 1 / 1 release-openshift-origin-installer-e2e-azure-upgrade-4.8
1 5 / 5 release-openshift-origin-installer-e2e-gcp-upgrade-4.7
1 9 / 9 release-openshift-origin-installer-e2e-azure-upgrade-4.7

So 4.7 OKD Azure and GCP updates seem like reliable reproducers. Getting a bunch of job links:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=96h&name=^release-openshift-origin-installer-e2e-azure-upgrade-4.7$&type=build-log&context=2&search=Waiting+on+pools+to+be+upgraded' | jq -r 'to_entries[] | .key + "\n " + ([.value | to_entries[].value[].context[]] | join("\n "))' | grep '^http\|Pool master'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1362996944573566976
 Feb 20 06:56:02.052: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363087725145624578
 Feb 20 12:59:13.866: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363269331844075520
 Feb 21 01:06:58.210: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363632187471040512
 Feb 22 01:03:30.494: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363813696526094336
 Feb 22 13:26:27.738: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363904442780356608
 Feb 22 19:01:16.860: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364085961394229248
 Feb 23 06:58:17.556: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364267562774302720
 Feb 23 19:07:57.801: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364358329765203968
 Feb 24 01:10:24.861: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)

And the releases used in those updates:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=96h&name=^release-openshift-origin-installer-e2e-azure-upgrade-4.7$&type=build-log&context=1&search=Resolved+release' | jq -r 'to_entries[] | .key + "\n " + ([.value | to_entries[].value[].context[]] | join("\n "))' | grep '^http\|Resolved release'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1362996944573566976
 2021/02/20 05:26:34 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-184351
 2021/02/20 05:26:34 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-205909
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363087725145624578
 2021/02/20 11:27:03 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-184351
 2021/02/20 11:27:03 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-205909
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363269331844075520
 2021/02/20 23:28:40 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-184351
 2021/02/20 23:28:40 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-205909
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363632187471040512
 2021/02/21 23:30:44 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-184351
 2021/02/21 23:30:44 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-205909
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363722874543673344
 2021/02/22 05:30:59 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-18-205909
 2021/02/22 05:30:59 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-030505
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363813696526094336
 2021/02/22 11:31:44 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-030505
 2021/02/22 11:31:44 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-033505
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1363904442780356608
 2021/02/22 17:32:23 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-033505
 2021/02/22 17:32:23 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-100111
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364085961394229248
 2021/02/23 05:33:42 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-164056
 2021/02/23 05:33:42 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-185213
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364267562774302720
 2021/02/23 17:35:20 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-164056
 2021/02/23 17:35:20 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-185213
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade-4.7/1364358329765203968
 2021/02/23 23:35:57 Resolved release initial to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-164056
 2021/02/23 23:35:57 Resolved release latest to registry.ci.openshift.org/ocp/release:4.7.0-0.ci-2021-02-22-185213

But it's currently:

$ date --utc --iso=m
2021-02-24T05:52+00:00

So maybe something has been fixed in 4.7 in this space in the past two days? Nope, it looks like 4.7 is just really quiet [1]. [2] shows the only change between the impacted 4.7.0-0.ci-2021-02-22-185213 and the current tip to be a rebuild of the ovirt-csi-driver container.

[1]: https://amd64.ocp.releases.ci.openshift.org/#4.7.0-0.ci
[2]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.7.0-0.ci/release/4.7.0-0.ci-2021-02-22-185213?from=4.7.0-0.ci-2021-02-22-171056
I observed a 4.7.0 to 4.8-nightly upgrade on GCP to see the state of the upgrade; dumping some of my findings here.

The upgrade progresses to 83% (MCO) and the MCO gets to the following state in its ClusterOperator status:

Conditions:
  Last Transition Time: 2021-02-24T17:53:59Z
  Message: Working towards 4.8.0-0.nightly-2021-02-24-063313
  Status: True <- correct
  Type: Progressing
  Last Transition Time: 2021-02-24T17:53:59Z
  Status: False <- correct
  Type: Degraded
  Last Transition Time: 2021-02-24T17:52:01Z
  Message: Cluster not available for 4.7.0
  Status: False <- incorrect?
  Type: Available
  Last Transition Time: 2021-02-24T16:04:40Z
  Reason: AsExpected
  Status: True <- also incorrect?
  Type: Upgradeable

The availability of the operator should be true, and in the code,

  cov1helpers.IsStatusConditionTrue(co.Status.Conditions, configv1.OperatorDegraded)

is the only evaluation criterion we seem to have for setting available=false. That is obviously false, so how come available is evaluating false? This ties a bit back into the fact that we're using available==degraded, which is also wrong, but that's another discussion.

The operator does actually not go progressing=True or update its version until the master pool is done, which is correct, so this is not consistently reproducible. Will try to look into the above job logs to see what's happening there.

A few more observations: DNS, auth, and openshift-apiserver go degraded while the MCO is updating. This in turn causes the clusterversion status to report

  Unable to apply 4.8.0-0.nightly-2021-02-24-063313: wait has exceeded 40 minutes for these operators: openshift-apiserver

and then eventually wraps around to:

  Working towards 4.8.0-0.nightly-2021-02-24-063313: 172 of 669 done (25% complete)

I assume this is the CVO attempting to reconcile and starting from the beginning?
After the master MCP completes, the MCO reports complete, all other operators are back to not-degraded, and eventually the clusterversion reports:

  Cluster version is 4.8.0-0.nightly-2021-02-24-063313

and the update completes successfully.

It feels like this is a bit of a mess. My first investigation will be into why the availability of the operator is not being evaluated correctly. If you have some idea of whether the above is "normal", @Trevor, it'd be appreciated.
Looking at the status sync, the majority of the code hasn't changed in a while: https://github.com/openshift/machine-config-operator/blob/master/pkg/operator/status.go

The version sync happens here: https://github.com/openshift/machine-config-operator/blob/789951039423aa1745af3b23f06ce00ba5ebe8f9/pkg/operator/sync.go#L95

which would happen only if https://github.com/openshift/machine-config-operator/blob/789951039423aa1745af3b23f06ce00ba5ebe8f9/pkg/operator/sync.go#L66 has finished, which can only happen if https://github.com/openshift/machine-config-operator/blob/789951039423aa1745af3b23f06ce00ba5ebe8f9/pkg/operator/sync.go#L568 has completed. So theoretically, while the pool is still progressing, the version should never get updated.

Now, unfortunately, in some of the failing runs I looked at, the MCO pod logs start after the version has been updated, which means a previous MCO pod set the version wrongly and those logs are now lost. I will try to replicate this to see what exactly is happening, but without the relevant logs I don't quite see how we got to a situation where the version is updated wrongly (and why this wouldn't have been the case two releases ago, since the code hasn't really changed).
(In reply to Yu Qi Zhang from comment #6)

> Last Transition Time: 2021-02-24T17:52:01Z
> Message: Cluster not available for 4.7.0
> Status: False <- incorrect?
> Type: Available

Yeah. You should only go Available=False when an operand (the pool here?) is non-functional or non-available [1].

> Last Transition Time: 2021-02-24T16:04:40Z
> Reason: AsExpected
> Status: True <- also incorrect?
> Type: Upgradeable

Yeah, ideally we'd go Upgradeable=False during 4.7->4.8 to block folks from redirecting their update to 4.9 until the pools had all caught up to 4.8, unless we get [2] or similar to get safe n-2 version skew.

> The availability of the operator should be true, and in the code,
>
> cov1helpers.IsStatusConditionTrue(co.Status.Conditions,
> configv1.OperatorDegraded)
>
> Is the only evaluation criteria we seem to have for setting available=false.
> That is obviously false, so how come available is evaluating false?

I'm not sure about this. Will poke around some.

> The operator does actually not go progressing=True or update its version
> until master pool is done, which is correct, so this is not consistently
> reproducible. Will try to look into the above job logs to see what's
> happening there.

I think you mean "back to progressing=False". And yeah, I haven't been able to reproduce the operator-leveling-before-master-pool behavior in the few jobs I've launched myself.

> Working towards 4.8.0-0.nightly-2021-02-24-063313: 172 of 669 done (25%
> complete)
>
> I assume this is the CVO attempting to reconcile and starting from the
> beginning?

The CVO doesn't persist the % completion, so if the CVO pod is drained, the replacement CVO pod will come back up at 0% and should quickly burn through the bulk of the already-reconciled update before sticking somewhere near where it had been before.
(In reply to Yu Qi Zhang from comment #7)

> Now unfortunately in some of the failing runs I looked at, the MCO pod logs
> start after the version has been updated, which means a previous MCO pod set
> the version wrongly and the logs are now lost.

Yeah, this is a pain. We're rolling out some hosted-Loki stuff, e.g. [3], which should help mitigate.

[1]: https://github.com/openshift/api/blob/70b287c0a0a51be5cdc88d40858de59e31222f5f/config/v1/types_cluster_operator.go#L143-L145
[2]: https://github.com/openshift/enhancements/blob/41692eb45ffaddea8bcb8c002462c527fd59c707/enhancements/update/eus-upgrades-mvp.md
[3]: https://github.com/openshift/release/pull/16186
Still digging into some details, but https://github.com/openshift/machine-config-operator/pull/2433 should be a potential remediation. Let me know if you are able to reproduce this outside of specific CI runs.
Sanity check showing that the initial CI job discussed in comment 0 does have the required-for-upgrade label set [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/machineconfigpools.json | jq -r '.items[].metadata | .name + "\n " + (.labels | keys | join("\n "))'
master
 machineconfiguration.openshift.io/mco-built-in
 operator.machineconfiguration.openshift.io/required-for-upgrade
 pools.operator.machineconfiguration.openshift.io/master
worker
 machineconfiguration.openshift.io/mco-built-in
 pools.operator.machineconfiguration.openshift.io/worker

[1]: https://github.com/openshift/machine-config-operator/blame/f53cb9aa4a9c81ec4d0dfe6cb734957d4523676a/pkg/operator/sync.go#L38
Jerry and I dug into this some more today. Going back to comment 0, this window is just too short:

6. 18:40:39Z, rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095 created...
7. 18:40:44Z, ci-op-m36h5d5b-8aaeb-szd6x-master-1 asked to update to rendered-master-4c88d1c343b9e2e390bbc5fdca7b6095...
8. 18:41:12Z, machine-config sets Available=True...
9. 18:41:12Z, machine-config bumps its 'operator' version...

So we suspect syncRequiredMachineConfigPools, which was changed in 4.7 by [1] to watch all pools instead of just the required pools. Before that PR, the file had [2]:

    if pool.Generation <= pool.Status.ObservedGeneration &&
        isPoolStatusConditionTrue(pool, mcfgv1.MachineConfigPoolUpdated) && !degraded {
        continue
    }
    lastErr = fmt.Errorf("error pool %s is not ready, retrying. Status: (pool degraded: %v total: %d, ready %d, updated: %d, unavailable: %d)", pool.Name, degraded, pool.Status.MachineCount, pool.Status.ReadyMachineCount, pool.Status.UpdatedMachineCount, pool.Status.UnavailableMachineCount)
    return false, nil

That accepts the pool as done only when:

a. The pool controller is caught up (metadata.generation <= status.observedGeneration). I.e. the pool controller has noticed the bumped rendered MachineConfig, because that bump brought a metadata.generation bump, and the controller has passed the metadata.generation bump on into status.observedGeneration.
b. The pool is updated. And we know the pool controller's claim about up-to-date-ness isn't too stale, because of (a).
c. The pool isn't degraded.

If any of those conditions fails, we set the "pool ... not ready" error and take another pass through the Poll loop.

After [1]:

a. Any degraded pool will block the MCO from reconciling [3], which is a bit harsh. I'd rather have any degraded pool block future minor updates via Upgradeable=False; non-required pools should not be able to directly block the current update by going degraded.
b. We grow an isMachineConfigPoolConfigurationValid call [4]. Not quite sure what that's about.
c. If we pass the generation and updated checks, we continue [5], like before.
d. But (and this is the main bug) after that continue, there is no longer the "pool ... not ready" error: that error is now only triggered by a degraded pool, where it used to be triggered by every required pool that was not yet level.

[1]: https://github.com/openshift/machine-config-operator/pull/2231
[2]: https://github.com/openshift/machine-config-operator/blame/6531aac963b1ef9c2508838f7593ed6e91cdad62/pkg/operator/sync.go#L607-L613
[3]: https://github.com/openshift/machine-config-operator/pull/2231/files#diff-3e205577bec1a1d1711df8bfeff30e63f044b24cfbdc69bbd7d0052fdc881891R595
[4]: https://github.com/openshift/machine-config-operator/pull/2231/files#diff-3e205577bec1a1d1711df8bfeff30e63f044b24cfbdc69bbd7d0052fdc881891R601
[5]: https://github.com/openshift/machine-config-operator/pull/2231/files#diff-3e205577bec1a1d1711df8bfeff30e63f044b24cfbdc69bbd7d0052fdc881891R608
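For illustration, the pre-[1] gate can be sketched as a small self-contained Go program. The `pool` type and `poolLevel` helper below are simplified, hypothetical stand-ins for the real MachineConfigPool status handling in pkg/operator/sync.go; the point is the shape of the check, not the exact API:

```go
package main

import "fmt"

// pool is a simplified, hypothetical stand-in for the MachineConfigPool
// fields the pre-#2231 gate inspected.
type pool struct {
	Name               string
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	Updated            bool  // MachineConfigPoolUpdated status condition
	Degraded           bool
}

// poolLevel mirrors the old gate: a pool only counts as done when the pool
// controller has caught up with metadata.generation, the pool reports
// Updated=True, and it is not degraded. Every other state must keep
// returning a "not ready" error so the caller's Poll loop retries instead
// of declaring the update complete.
func poolLevel(p pool) error {
	if p.Generation <= p.ObservedGeneration && p.Updated && !p.Degraded {
		return nil
	}
	return fmt.Errorf("error pool %s is not ready, retrying", p.Name)
}

func main() {
	// Mid-rollout: controller caught up, but nodes are still updating.
	fmt.Println(poolLevel(pool{Name: "master", Generation: 3, ObservedGeneration: 3, Updated: false}))
	// Stale controller: generation bumped, not yet observed, so even
	// Updated=True cannot be trusted.
	fmt.Println(poolLevel(pool{Name: "master", Generation: 4, ObservedGeneration: 3, Updated: true}))
	// Fully levelled pool.
	fmt.Println(poolLevel(pool{Name: "master", Generation: 3, ObservedGeneration: 3, Updated: true}))
}
```

The final error branch is exactly what the refactor dropped for the non-degraded case: a required pool that is mid-rollout no longer blocks the sync, which is consistent with the too-early 'operator' version bump seen in comment 0.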
Down to one referenced PR, and that one has merged. Moving to MODIFIED.
Reassigning to Kirsten, who provided the fix.
Still seeing this in CI, even though the jobs are now using mco#2440:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=the+%22master%22+pool+should+be+updated+before+the+CVO+reports+available+at+the+new+version&maxAge=12h' | grep 'failures match' | sort
periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade - 3 runs, 67% failed, 100% of failures match
pull-ci-openshift-cluster-image-registry-operator-master-e2e-upgrade - 2 runs, 100% failed, 50% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-agnostic-upgrade - 2 runs, 50% failed, 100% of failures match
pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade - 2 runs, 100% failed, 50% of failures match
pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade - 3 runs, 100% failed, 33% of failures match
rehearse-15646-pull-ci-openshift-cluster-api-provider-baremetal-master-e2e-metal-ipi-upgrade - 1 runs, 100% failed, 100% of failures match
rehearse-16324-pull-ci-openshift-cloud-credential-operator-master-e2e-upgrade - 2 runs, 50% failed, 100% of failures match
rehearse-16324-pull-ci-openshift-machine-api-operator-master-e2e-vsphere-upgrade - 2 runs, 50% failed, 100% of failures match
release-openshift-ocp-installer-e2e-aws-upgrade-4.8 - 3 runs, 67% failed, 100% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 8 runs, 25% failed, 50% of failures match

Moving back to ASSIGNED while we investigate.
Upgraded from 4.7.1 to 4.8.0-0.nightly-2021-03-08-133419 and watched `oc get clusterversion`, `oc get co/machine-config`, and `oc get mcp/master`. The mcp/master UPDATED column flips from False to True at the same time co/machine-config goes from unavailable to available:

Tue Mar 9 15:37:03 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          60m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      17m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      88m

Tue Mar 9 15:37:34 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          60m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      18m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      88m

Tue Mar 9 15:38:05 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          61m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      18m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      89m

Tue Mar 9 15:38:37 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          61m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      19m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      89m

Tue Mar 9 15:39:08 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          62m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      19m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      90m

Tue Mar 9 15:39:47 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          63m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 173 of 668 done (25% complete), waiting up to 40 minutes on openshift-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.1     False       True          False      20m
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-7639ea368e9a6940c480cd4c7ebc9570   False     True       False      3              2                   2                     0                      91m

Tue Mar 9 15:40:18 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          63m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 260 of 668 done (38% complete)
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.8.0-0.nightly-2021-03-08-133419   True        False         False      27s
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c   True      False      False      3              3                   3                     0                      91m

Tue Mar 9 15:40:50 EST 2021
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        True          64m     Working towards 4.8.0-0.nightly-2021-03-08-133419: 664 of 668 done (99% complete)
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.8.0-0.nightly-2021-03-08-133419   True        False         False      59s
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c   True      False      False      3              3                   3                     0                      92m

Tue Mar 9 15:41:21 EST 2021
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-08-133419   True        False         31s     Cluster version is 4.8.0-0.nightly-2021-03-08-133419
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.8.0-0.nightly-2021-03-08-133419   True        False         False      90s
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c   True      False      False      3              3                   3                     0                      92m

MCO LOG ------------------
ion rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c, retrying
I0309 20:39:49.983283 1 sync.go:613] pool master has not progressed to latest configuration: controller version mismatch for rendered-master-7639ea368e9a6940c480cd4c7ebc9570 expected 82868e63176fee2bc806c1deb308ed1fc8965d84 has ff9431921ef6c22f602da46c07980622f2809546: 2 (ready 2) out of 3 nodes are updating to latest configuration rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c, retrying
I0309 20:39:50.982471 1 sync.go:613] pool master has not progressed to latest configuration: controller version mismatch for rendered-master-7639ea368e9a6940c480cd4c7ebc9570 expected 82868e63176fee2bc806c1deb308ed1fc8965d84 has ff9431921ef6c22f602da46c07980622f2809546: 2 (ready 2) out of 3 nodes are updating to latest configuration rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c, retrying
I0309 20:39:51.983070 1 status.go:395] pool master references machine config rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:51.983177 1 status.go:395] pool master references machine config 00-master at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:51.983205 1 status.go:395] pool master references machine config 01-master-container-runtime at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:51.983229 1 status.go:395] pool master references machine config 01-master-kubelet at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:51.983252 1 status.go:395] pool master references machine config 99-master-generated-registries at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:51.983277 1 status.go:397] pool master references machine config 99-master-ssh, which does not declare a version
I0309 20:39:51.983302 1 sync.go:619] pool master is up to date
I0309 20:39:51.995090 1 sync.go:631] required machine-config pools synchronized
I0309 20:39:52.378496 1 event.go:282] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"5c085e65-0442-4af8-9786-a61807ed558e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorVersionChanged' clusteroperator/machine-config-operator version changed from [{operator 4.7.1}] to [{operator 4.8.0-0.nightly-2021-03-08-133419}]
W0309 20:39:53.196631 1 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
I0309 20:39:58.706206 1 sync.go:569] syncing Required MachineConfigPools
I0309 20:39:59.706361 1 status.go:395] pool master references machine config rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:59.706484 1 status.go:395] pool master references machine config 00-master at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:59.706498 1 status.go:395] pool master references machine config 01-master-container-runtime at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:59.706773 1 status.go:395] pool master references machine config 01-master-kubelet at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:59.706839 1 status.go:395] pool master references machine config 99-master-generated-registries at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:39:59.706852 1 status.go:397] pool master references machine config 99-master-ssh, which does not declare a version
I0309 20:39:59.706875 1 sync.go:619] pool master is up to date
I0309 20:39:59.717163 1 sync.go:631] required machine-config pools synchronized
W0309 20:40:19.603073 1 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
I0309 20:40:24.707010 1 sync.go:569] syncing Required MachineConfigPools
I0309 20:40:25.707241 1 status.go:395] pool master references machine config rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:40:25.707281 1 status.go:395] pool master references machine config 00-master at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:40:25.707287 1 status.go:395] pool master references machine config 01-master-container-runtime at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:40:25.707292 1 status.go:395] pool master references machine config 01-master-kubelet at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:40:25.707298 1 status.go:395] pool master references machine config 99-master-generated-registries at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
I0309 20:40:25.707303 1 status.go:397] pool master references machine config 99-master-ssh, which does not declare a version
I0309 20:40:25.707309 1 sync.go:619] pool master is up to date
I0309 20:40:25.721609 1 sync.go:631] required machine-config pools synchronized
W0309 20:40:26.519764 1 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
I0309 20:40:31.982034 1 sync.go:569] syncing Required MachineConfigPools
I0309 20:40:32.982249 1 status.go:395] pool master references machine config rendered-master-a5c62ad5e856b0d6b72504e1c2c6bf4c at the expected 82868e63176fee2bc806c1deb308ed1fc8965d84
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438