Description of problem:
From Administration > Cluster Settings > Details, even after the upgrade successfully completes, the console still shows "0 of N, 0%".

Version-Release number of selected component (if applicable):
4.9.17

How reproducible:
Upgrade from 4.9.15 to 4.9.17
Reassigning to the MCO team for investigation as I believe this is an API issue. We've had multiple reports [1][2] of this same bug over the last couple of days. In both cases, the underlying data from the worker MCP that we use to determine whether the worker nodes have completed their update does not appear to be changing. That data point is the worker MCP Updating condition lastTransitionTime. We compare this time to the CVO status.history[0].startedTime for the spec.desiredUpdate.version, since the worker nodes update later in the update cycle and can continue updating after the CVO has finished its updates. Any ideas on why the MCPs appear to not be updating in this scenario?

[1] https://coreos.slack.com/archives/C6A3NV5J9/p1643841199166689
[2] https://coreos.slack.com/archives/C6A3NV5J9/p1643970982911949
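For reference, these are the two values the console compares; a minimal sketch of how to read them with oc, assuming the default worker pool and the cluster-scoped ClusterVersion object named 'version':

$ oc get machineconfigpool worker -o jsonpath='{.status.conditions[?(@.type=="Updating")].lastTransitionTime}{"\n"}'
$ oc get clusterversion version -o jsonpath='{.status.history[0].startedTime}{"\n"}'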
The diffs between https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.15 and https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.17 indicate that there was no change to either the base OS or the MCO templates. So there was no update to either pool, and the MCO reported the upgrade as successful. The must-gather in https://access.redhat.com/support/cases/#/case/03138509/discussion?attachmentId=a092K0000336kzKQAQ corroborates that claim. The most recent rendered MCs are:

rendered-master-5741dcbb1c6dc2460c89871e158a9138
rendered-worker-189700adbe95c6e3c93fd10805794a6b

Both were created Jan. 27th, which was presumably the previous update, so the lastTransitionTime still reflects that earlier update, which is expected.

TL;DR: this will happen in any situation where the MCO needs to perform no update between versions, such that we simply don't roll out a pool update. Is there a reason `lastTransitionTime` is being used to determine success? It will cause problems in situations like this.
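For anyone triaging similar reports, the staleness can be confirmed directly against the cluster (or a must-gather); a sketch using the rendered config name from above:

$ oc get machineconfigpool worker -o jsonpath='{.spec.configuration.name}{"\n"}'
$ oc get machineconfig rendered-worker-189700adbe95c6e3c93fd10805794a6b -o jsonpath='{.metadata.creationTimestamp}{"\n"}'

If the creation timestamp predates the current update, the pool never rolled out and the Updating condition's lastTransitionTime will not move.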
Reassigning to console as this is indeed a console bug.
*** Bug 2052046 has been marked as a duplicate of this bug. ***
This issue was also reported for 4.8 in https://bugzilla.redhat.com/show_bug.cgi?id=2052046. Updating the 'Version' to 4.8 accordingly.
*** Bug 2054722 has been marked as a duplicate of this bug. ***
@yapei@redhat.com, can you please see that this gets verified? We've seen a number of duplicate bugs filed.
Created attachment 1862812 [details]
410 to 411 upgrade finish

1. Set up a 4.10.0-rc.3 cluster and create a custom MCP:

$ oc label node yapeiup-l77hw-worker-c-2f26s.c.openshift-qe.internal node-role.kubernetes.io/infra=""
$ cat infra-mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: null
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
$ oc create -f infra-mcp.yaml

2. Upgrade the testing cluster to 4.11.0-0.nightly-2022-02-18-121223, which contains the bug fix. When the upgrade finished, the 'Worker Nodes' and 'infra Nodes' progress bars both showed the correct status and no longer appear on the Cluster Settings page.

Verified on 4.11.0-0.nightly-2022-02-18-121223
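For completeness, the underlying pool and version status can also be confirmed from the CLI after the upgrade (a quick sketch; pool names match the ones created above):

$ oc get machineconfigpool worker infra
$ oc get clusterversion

Both pools should report UPDATED=True and UPDATING=False once the rollout completes.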
Created attachment 1862813 [details]
410 to 411 upgrade just happened
*** Bug 1921529 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069