Bug 1970154 - master pool is still upgrading when machine config reports level / restarts on osimageurl change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1970150
Blocks:
Reported: 2021-06-09 22:37 UTC by Kirsten Garrison
Modified: 2021-06-29 04:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-29 04:20:14 UTC
Target Upstream Version:
Embargoed:



Links
System                   ID                                            Private   Priority   Status   Summary                                                                                     Last Updated
Github                   openshift machine-config-operator pull 2607   0         None       open     [release-4.7] Bug 1970154: operator/sync.go confirm renderedconfig osimageurl matches cvo   2021-06-09 22:57:25 UTC
Red Hat Product Errata   RHBA-2021:2502                                0         None       None     None                                                                                        2021-06-29 04:20:37 UTC

Description Kirsten Garrison 2021-06-09 22:37:05 UTC
This bug was initially created as a copy of Bug #1970150

I am copying this bug because:

When there is no new MCO commit but there is an osimageurl change, the master pool is still upgrading when the MCO reports level to the CVO.
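
The condition above can be read as a missing gate, in line with the title of the linked pull request ("confirm renderedconfig osimageurl matches cvo"). The Go sketch below is only an illustration of that idea, not the code in operator/sync.go: the `pool` type, its fields, the `readyToReportLevel` function, and the image references are all hypothetical.

```go
package main

import "fmt"

// pool is a hypothetical, simplified view of a MachineConfigPool.
type pool struct {
	Name               string
	RenderedOSImageURL string // osImageURL carried by the pool's rendered config
	Updated            bool
	Updating           bool
}

// readyToReportLevel reports whether the operator may tell the CVO that it
// has reached the new version. The pool must be rendered against the
// release's osImageURL and must have finished rolling that config out.
func readyToReportLevel(releaseOSImageURL string, p pool) bool {
	if p.RenderedOSImageURL != releaseOSImageURL {
		return false // rendered config still points at the old OS image
	}
	return p.Updated && !p.Updating
}

func main() {
	// Placeholder image references, for illustration only.
	const release = "example.invalid/os@sha256:new"
	stale := pool{
		Name:               "master",
		RenderedOSImageURL: "example.invalid/os@sha256:old",
		Updated:            true,
	}
	fmt.Println(readyToReportLevel(release, stale))
}
```

With the placeholder values in `main`, the pool looks Updated, but its rendered config still references the old image, so the gate returns false.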

This is a copy of Bug #1955929, which seems to address the issue. However, some runs are still failing in relation to https://bugzilla.redhat.com/show_bug.cgi?id=1968754, so this BZ is being used to carry the fix, which drastically reduced failures, and the other BZ is being kept open to audit after the new metal-ipi bug is fixed.


This bug was initially created as a copy of Bug #1955929

May  1 01:39:28.369: INFO: cluster upgrade is Progressing: Working towards 4.8.0-0.nightly-2021-05-01-000412: 652 of 675 done (96% complete)
May  1 01:39:38.369: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-ns22yv9h/release@sha256:1aeba3cfeb93d5912390fbffafaa3d024ae8db26489b01b2fa034d421f69b5db
May  1 01:39:38.460: INFO: Waiting on pools to be upgraded
May  1 01:39:38.632: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)
May  1 01:39:38.632: INFO: Invariant violation detected: the "master" pool should be updated before the CVO reports available at the new version

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade/1388283995501891584
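
The invariant reported in the log can be expressed as a small check. This is a minimal Go sketch under stated assumptions, not the CI test's implementation: `poolStatus` and `checkPoolsAfterUpgrade` are hypothetical names, and the struct only mirrors the three conditions printed in the log line.

```go
package main

import "fmt"

// poolStatus mirrors the three conditions printed in the CI log line.
type poolStatus struct {
	Name     string
	Updated  bool
	Updating bool
	Degraded bool
}

// checkPoolsAfterUpgrade returns an error for the first pool that has not
// finished updating once the CVO reports the upgrade as complete.
func checkPoolsAfterUpgrade(cvoComplete bool, pools []poolStatus) error {
	if !cvoComplete {
		return nil // the invariant only applies after the CVO reports completion
	}
	for _, p := range pools {
		if !p.Updated || p.Updating {
			return fmt.Errorf("the %q pool should be updated before the CVO reports available at the new version", p.Name)
		}
	}
	return nil
}

func main() {
	// State taken from the log: Updated: false, Updating: true, Degraded: false.
	pools := []poolStatus{{Name: "master", Updated: false, Updating: true}}
	if err := checkPoolsAfterUpgrade(true, pools); err != nil {
		fmt.Println("Invariant violation detected:", err)
	}
}
```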

Urgent because it has happened in 38% of the last 16 nightly upgrade jobs.

https://search.ci.openshift.org/?search=Pool+master+is+still+reporting&maxAge=48h&context=1&type=build-log&name=upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 3 Michael Nguyen 2021-06-15 15:59:40 UTC
Verified. Upgraded from 4.7.0-0.nightly-2021-06-12-025733 to 4.7.0-0.nightly-2021-06-12-053330, which has an update for the OS but not for the MCO. The transition below shows that co/machine-config was not available at the upgraded version until the pools had completed upgrading.


Tue Jun 15 11:50:04 EDT 2021
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-12-025733   True        True          26m     Working towards 4.7.0-0.nightly-2021-06-12-053330: 560 of 669 done (83% complete)
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.0-0.nightly-2021-06-12-025733   False       True          True       6m49s
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-12-025733   True        True          26m     Working towards 4.7.0-0.nightly-2021-06-12-053330: 560 of 669 done (83% complete)
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-961197f37ebdfb636b9fc64c25140468   False     True       False      3              2                   2                     0                      59m
worker   rendered-worker-5cf6eda5df86647a1827238a63337466   True      False      False      3              3                   3                     0                      59m
---------------------
Tue Jun 15 11:50:36 EDT 2021
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-12-053330   True        False         6s      Cluster version is 4.7.0-0.nightly-2021-06-12-053330
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.7.0-0.nightly-2021-06-12-053330   True        False         False      17s
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-12-053330   True        False         7s      Cluster version is 4.7.0-0.nightly-2021-06-12-053330
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-8b817162cd075e7fcc238986f3a3398b   True      False      False      3              3                   3                     0                      60m
worker   rendered-worker-5cf6eda5df86647a1827238a63337466   True      False      False      3              3                   3                     0                      60m
---------------------
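
For anyone repeating this verification, the pool rows above can be checked mechanically. The Go sketch below parses one row of the pool table into its first five columns. It assumes whitespace-separated columns in the order shown in the header (NAME, CONFIG, UPDATED, UPDATING, DEGRADED); `mcpRow` and `parseMCPRow` are hypothetical names, not part of any OpenShift tooling.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mcpRow holds the first five columns of one row of the pool table.
type mcpRow struct {
	Name     string
	Config   string
	Updated  bool
	Updating bool
	Degraded bool
}

// parseMCPRow splits a row on whitespace and parses the three status columns.
func parseMCPRow(line string) (mcpRow, error) {
	f := strings.Fields(line)
	if len(f) < 5 {
		return mcpRow{}, fmt.Errorf("expected at least 5 columns, got %d", len(f))
	}
	row := mcpRow{Name: f[0], Config: f[1]}
	for i, dst := range []*bool{&row.Updated, &row.Updating, &row.Degraded} {
		v, err := strconv.ParseBool(f[2+i]) // accepts "True" and "False"
		if err != nil {
			return mcpRow{}, fmt.Errorf("column %d: %w", 3+i, err)
		}
		*dst = v
	}
	return row, nil
}

func main() {
	// The two master rows from the snapshots above.
	rows := []string{
		"master   rendered-master-961197f37ebdfb636b9fc64c25140468   False     True       False      3              2                   2                     0                      59m",
		"master   rendered-master-8b817162cd075e7fcc238986f3a3398b   True      False      False      3              3                   3                     0                      60m",
	}
	for _, line := range rows {
		row, err := parseMCPRow(line)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s %s done=%t\n", row.Name, row.Config, row.Updated && !row.Updating)
	}
}
```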

Comment 4 OpenShift Automated Release Tooling 2021-06-17 12:29:08 UTC
OpenShift engineering has decided not to ship Red Hat OpenShift Container Platform 4.7.17 due to a regression: https://bugzilla.redhat.com/show_bug.cgi?id=1973006. All the fixes that were part of 4.7.17 will now be part of 4.7.18, which is planned to be available in the candidate channel on June 23, 2021 and in the fast channel on June 28.

Comment 8 errata-xmlrpc 2021-06-29 04:20:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2502

