Bug 1970150 - master pool is still upgrading when machine config reports level / restarts on osimageurl change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1955929
Depends On:
Blocks: 1970154
 
Reported: 2021-06-09 22:31 UTC by Kirsten Garrison
Modified: 2021-07-28 10:10 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:12:27 UTC
Target Upstream Version:


Attachments
upgrade progression 1 (128.00 KB, image/png), 2021-06-10 13:47 UTC, Michael Nguyen
upgrade progression 2 (129.14 KB, image/png), 2021-06-10 13:47 UTC, Michael Nguyen
upgrade progression 3 (128.80 KB, image/png), 2021-06-10 13:48 UTC, Michael Nguyen
upgrade progression 4 (127.57 KB, image/png), 2021-06-10 13:49 UTC, Michael Nguyen
upgrade progression 5 (126.20 KB, image/png), 2021-06-10 13:50 UTC, Michael Nguyen


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2585 | 0 | None | closed | Bug 1970150: operator/sync.go confirm renderedconfig osimageurl matches cvo | 2021-06-09 22:36:10 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:12:42 UTC

Description Kirsten Garrison 2021-06-09 22:31:57 UTC
When a release contains no new MCO commit but does change the osImageURL, the master pool is still upgrading at the point where the MCO reports level to the CVO.

This is a copy of Bug #1955929. The fix made for that bug seems to address the issue, but some runs are still failing because of https://bugzilla.redhat.com/show_bug.cgi?id=1968754. This BZ is being used to carry the fix, which drastically reduced failures; the other BZ stays open so it can be audited after the new metal-ipi bug is fixed.
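
To make the condition above concrete, here is a minimal sketch of the kind of gate the linked PR title suggests (confirm that the rendered config's osImageURL matches the release before reporting level). It is not the code in operator/sync.go: the `Pool` class, its fields, and `may_report_level` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Pool:
    """Hypothetical, simplified view of a machine config pool."""
    name: str
    rendered_os_image_url: str  # osImageURL carried by the pool's rendered config
    updated: bool               # True when the pool reports Updated


def may_report_level(release_os_image_url: str, pools: Iterable[Pool]) -> bool:
    """Return True only if every pool has picked up the release's osImageURL
    and has finished rolling it out.

    A pool that still points at the old osImageURL can look "Updated" with
    respect to its old rendered config, so both conditions are checked.
    """
    return all(
        p.rendered_os_image_url == release_os_image_url and p.updated
        for p in pools
    )
```

With a gate like this, a master pool whose rendered config still carries the previous osImageURL, or one that is mid-rollout, would keep the operator from reporting the new version.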


This bug was initially created as a copy of Bug #1955929

May  1 01:39:28.369: INFO: cluster upgrade is Progressing: Working towards 4.8.0-0.nightly-2021-05-01-000412: 652 of 675 done (96% complete)
May  1 01:39:38.369: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-ns22yv9h/release@sha256:1aeba3cfeb93d5912390fbffafaa3d024ae8db26489b01b2fa034d421f69b5db
May  1 01:39:38.460: INFO: Waiting on pools to be upgraded
May  1 01:39:38.632: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
May  1 01:39:38.632: INFO: Invariant violation detected: the "master" pool should be updated before the CVO reports available at the new version
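
The invariant in the log excerpt above can be checked mechanically. The following is a rough sketch, assuming the log lines keep exactly the wording shown here; `parse_pool_status` and `find_violations` are hypothetical helpers, not part of the e2e test suite.

```python
import re

# Matches lines such as:
#   Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
POOL_LINE = re.compile(
    r"Pool (?P<name>\S+) is still reporting "
    r"\(Updated: (?P<updated>true|false), "
    r"Updating: (?P<updating>true|false), "
    r"Degraded: (?P<degraded>true|false)\)"
)


def parse_pool_status(line):
    """Return the pool name and its three condition flags, or None if the
    line is not a pool status line."""
    m = POOL_LINE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "updated": m.group("updated") == "true",
        "updating": m.group("updating") == "true",
        "degraded": m.group("degraded") == "true",
    }


def find_violations(log_lines):
    """Return names of pools reported as not updated after the log has
    already announced "Completed upgrade to"."""
    completed = False
    violations = []
    for line in log_lines:
        if "Completed upgrade to" in line:
            completed = True
            continue
        status = parse_pool_status(line)
        if status and completed and not status["updated"]:
            violations.append(status["name"])
    return violations
```

Run against the five lines quoted above, the only pool status line appears after the "Completed upgrade to" line and reports `Updated: false`, so the master pool is flagged.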

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade/1388283995501891584

Urgent because it has happened in 38% of the last 16 nightly upgrade jobs.

https://search.ci.openshift.org/?search=Pool+master+is+still+reporting&maxAge=48h&context=1&type=build-log&name=upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Michael Nguyen 2021-06-10 13:47:20 UTC
Created attachment 1789861 [details]
upgrade progression 1

Comment 2 Michael Nguyen 2021-06-10 13:47:55 UTC
Created attachment 1789863 [details]
upgrade progression 2

Comment 3 Michael Nguyen 2021-06-10 13:48:47 UTC
Created attachment 1789864 [details]
upgrade progression 3

Comment 4 Michael Nguyen 2021-06-10 13:49:11 UTC
Created attachment 1789865 [details]
upgrade progression 4

Comment 5 Michael Nguyen 2021-06-10 13:50:23 UTC
Created attachment 1789866 [details]
upgrade progression 5

Comment 6 Michael Nguyen 2021-06-10 13:52:28 UTC
Verified on registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-10-014052. Upgraded to registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-10-045932, which has no new MCO commit and a new osImageURL. Watched `oc get co/machine-config`, `oc get clusterversion`, and `oc get mcp`. Verified that `co/machine-config` did not transition to the new version until the master pool completed updating. See attachments.
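
The manual check described above could also be scripted. This is only a sketch under assumptions: it expects a logged-in `oc` on the PATH, reads the operator version from `status.versions` of the ClusterOperator (entry named `operator`) and the `Updated` condition from `status.conditions` of the MachineConfigPool, and all function names are hypothetical.

```python
import json
import subprocess


def oc_get_json(*args):
    """Run `oc get <args> -o json` and return the parsed output.
    Requires a logged-in `oc` client; not exercised by the pure helpers below."""
    result = subprocess.run(
        ["oc", "get", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def operator_version(cluster_operator):
    """Version the ClusterOperator reports under the name "operator"."""
    for entry in cluster_operator.get("status", {}).get("versions", []):
        if entry.get("name") == "operator":
            return entry.get("version")
    return None


def pool_is_updated(pool):
    """True when the pool's "Updated" condition has status "True"."""
    for cond in pool.get("status", {}).get("conditions", []):
        if cond.get("type") == "Updated":
            return cond.get("status") == "True"
    return False


def reported_too_early(cluster_operator, master_pool, target_version):
    """True when machine-config already reports the target version while
    the master pool has not finished updating."""
    return (
        operator_version(cluster_operator) == target_version
        and not pool_is_updated(master_pool)
    )
```

A polling loop during an upgrade might call `reported_too_early(oc_get_json("co", "machine-config"), oc_get_json("mcp", "master"), target)` every few seconds and stop at the first `True`.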

Comment 8 Kirsten Garrison 2021-07-02 00:42:14 UTC
*** Bug 1955929 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2021-07-27 23:12:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

