Bug 2025396 - "master" pool should be updated before the CVO reports available at the new version occurred
Summary: "master" pool should be updated before the CVO reports available at the new v...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Kirsten Garrison
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On: 2025474
Blocks: 2025470
Reported: 2021-11-22 04:49 UTC by Rio Liu
Modified: 2022-03-16 11:30 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Neither the RHCOS image nor the MCO image changes in a patch bump from one 4.y.z release to a later 4.y.z release, for example an update from 4.8.20 to 4.8.21.
Consequence: The upgrade is marked as completed while the control-plane nodes are still in the updating state. Because the upgrade process has not actually finished, errors can occur if the user starts other operations based on this result.
Workaround (if any): Wait for the update on the control-plane nodes to complete, checking progress with the command oc get mcp/master.
Result: The upgrade eventually completes without any issue.
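The workaround amounts to not trusting the reported upgrade result until the "master" pool itself says it is done. A minimal sketch of that check in Python follows, assuming the pool object has been fetched as JSON (for example with oc get mcp/master -o json) and parsed into a dict. The function name pool_is_updated is hypothetical, and the exact set of status fields worth checking is an assumption, not the MCO's own logic.

```python
def pool_is_updated(mcp: dict) -> bool:
    """Return True only if a MachineConfigPool looks fully rolled out.

    `mcp` is the parsed JSON of a MachineConfigPool object. The combination
    of checks below is an illustrative assumption, not the operator's code.
    """
    status = mcp.get("status", {})
    # Map condition type -> status string ("True" / "False" / "Unknown").
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}

    # The rendered config the pool wants versus the one it has reached.
    desired = mcp.get("spec", {}).get("configuration", {}).get("name")
    current = status.get("configuration", {}).get("name")

    machine_count = status.get("machineCount")
    updated_count = status.get("updatedMachineCount")

    return (
        conditions.get("Updated") == "True"
        and conditions.get("Updating") != "True"
        and desired is not None
        and desired == current
        and machine_count is not None
        and machine_count == updated_count
    )
```

A caller would poll this until it returns True before starting any follow-up operations on the cluster.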
Clone Of: 1999556
Environment:
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] job=periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-upgrade=all job=periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade=all
Last Closed: 2022-03-16 11:30:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift machine-config-operator pull 2972 | 0 | None | open | [release-4.8] Bug 2025396: annotate rendered config with OCP version | 2022-03-02 19:38:56 UTC
Red Hat Product Errata RHBA-2022:0795 | 0 | None | None | None | 2022-03-16 11:30:33 UTC

Description Rio Liu 2021-11-22 04:49:18 UTC
+++ This bug was initially created as a clone of Bug #1999556 +++

We're seeing instances of the following error still:

    the "master" pool should be updated before the CVO reports available at the new version occurred

It looks like maybe a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1970150?

See: 

https://search.ci.openshift.org/?search=pool+should+be+updated+before+the+CVO+reports+available+at+the+new+version&maxAge=168h&context=1&type=bug%2Bjunit&name=4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

--- Additional comment from Kirsten Garrison on 2021-09-01 23:45:17 UTC ---

I'll take a look at this since I worked on the previous fix and want to figure out what's happening here.

--- Additional comment from Kirsten Garrison on 2021-09-02 02:11:54 UTC ---

timeline from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade/1432377737405796352


17:42:20 default machineconfigoperator machine-config OperatorVersionChanged clusteroperator/machine-config-operator started a version change from [{operator 4.9.0-0.nightly-2021-08-30-070917}] to [{operator 4.9.0-0.nightly-2021-08-30-161832}]

  - lastTransitionTime: "2021-08-30T17:42:49Z"
    message: sync completed towards (2) generation using controller version v4.9.0-202108281318.p0.git.00f349e.assembly.stream-dirty
    status: "True"
    type: TemplateControllerCompleted

Same osimageurl, same controller version: machineconfiguration.openshift.io/generated-by-version: v4.9.0-202108281318.p0.git.00f349e.assembly.stream-dirty

17:42:51 default machineconfigoperator machine-config OperatorVersionChanged clusteroperator/machine-config-operator version changed from [{operator 4.9.0-0.nightly-2021-08-30-070917}] to [{operator 4.9.0-0.nightly-2021-08-30-161832}]

17:42:53 default machineconfigcontroller-rendercontroller master RenderedConfigGenerated rendered-master-5d8b2493d5d9fd8e6a6762f985bf5828 successfully generated

This is only happening intermittently, probably due to some sort of timing issue, especially on the metal platform. It looks like we'll have to harden this up more, likely by verifying checks against the release version. TBD.
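To illustrate why the controller version alone cannot tell the two releases apart in this timeline, and what a check against the release version might look like, here is a rough Python sketch. The generated-by-version annotation key is taken from the comment above. The release-version annotation key and the function name are hypothetical placeholders, and the code is not the MCO's implementation.

```python
# Annotation key quoted in the timeline above.
GENERATED_BY = "machineconfiguration.openshift.io/generated-by-version"
# Hypothetical key, used here only for illustration.
RELEASE_VERSION_KEY = "example.invalid/release-version"


def rendered_for_release(annotations: dict, target_release: str,
                         key: str = RELEASE_VERSION_KEY) -> bool:
    """Return True if a rendered config's annotations name the target release.

    A missing annotation is treated as "not rendered for this release".
    """
    return annotations.get(key) == target_release


# Both nightlies in the timeline share the same controller version, so a
# comparison on GENERATED_BY would accept a rendered config from either one.
controller = "v4.9.0-202108281318.p0.git.00f349e.assembly.stream-dirty"
old_rendered = {
    GENERATED_BY: controller,
    RELEASE_VERSION_KEY: "4.9.0-0.nightly-2021-08-30-070917",
}
new_rendered = {
    GENERATED_BY: controller,
    RELEASE_VERSION_KEY: "4.9.0-0.nightly-2021-08-30-161832",
}
target = "4.9.0-0.nightly-2021-08-30-161832"
```

With these sample values, rendered_for_release(old_rendered, target) is False and rendered_for_release(new_rendered, target) is True, even though both dicts carry an identical controller version.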

--- Additional comment from Scott Dodson on 2021-11-18 15:44:27 UTC ---

This issue comes up as a late emergency debugging situation any time we ship a z-stream which hasn't updated the MCO and osImageURL (suspected). It happens infrequently enough that everyone forgets about this bug and sinks a substantial amount of effort into figuring out what's wrong at the 11th hour before we ship a release. As such, I'm going to mark this as blocker+ so that we can avoid that fire drill in the future. Once fixed, we should backport this to all currently supported releases unless there's a technical reason not to.

Comment 7 errata-xmlrpc 2022-03-16 11:30:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.34 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0795

