Bug 2050698 - After upgrading the cluster, the console still shows 0 of N, 0% progress for worker nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yadan Pei
QA Contact: Yadan Pei
URL:
Whiteboard:
Duplicates: 1921529 2052046 2054722
Depends On:
Blocks: 2074571
 
Reported: 2022-02-04 13:37 UTC by Gabriel Meghnagi
Modified: 2022-11-03 17:02 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:47:09 UTC
Target Upstream Version:
Embargoed:


Attachments
410 to 411 upgrade finish (305.03 KB, image/png) - 2022-02-23 07:41 UTC, Yadan Pei
410 to 411 upgrade just happen (186.44 KB, image/png) - 2022-02-23 07:42 UTC, Yadan Pei


Links
Github openshift console pull 11038: [WIP] Bug 2050698: fix bug where Cluster Settings shows 0 of N, 0% progres… (open, last updated 2022-02-10 20:37:06 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:47:34 UTC)

Description Gabriel Meghnagi 2022-02-04 13:37:32 UTC
Description of problem:

On the Administration > Cluster Settings > Details page, even after the upgrade completes successfully, the console still shows 0 of N, 0% progress for the worker nodes.


Version-Release number of selected component (if applicable): 

4.9.17


How reproducible: 

Upgrade from 4.9.15 to 4.9.17

Comment 2 Robb Hamilton 2022-02-04 15:48:23 UTC
Reassigning to the MCO team for investigation as I believe this is an API issue.

We’ve had multiple reports [1][2] of this same bug over the last couple of days. In both cases, the underlying data from the worker MCP that we use to determine whether the worker nodes have completed their update does not appear to be updating. That data point is the worker MCP Updating condition lastTransitionTime. We compare this time to the CVO status.history[0].startedTime for spec.desiredUpdate.version, since the worker nodes update later in the update cycle and can continue updating after the CVO has finished its updates. Any ideas on why the MCPs appear not to be updating in this scenario?

[1] https://coreos.slack.com/archives/C6A3NV5J9/p1643841199166689
[2] https://coreos.slack.com/archives/C6A3NV5J9/p1643970982911949
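
A minimal way to inspect the two data points being compared here, assuming a live cluster and the standard `oc` CLI (a sketch, not the console code itself):

# lastTransitionTime of the worker MCP's Updating condition
$ oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="Updating")].lastTransitionTime}{"\n"}'

# desired version and startedTime of the most recent CVO history entry
$ oc get clusterversion version -o jsonpath='{.spec.desiredUpdate.version}{"\n"}{.status.history[0].startedTime}{"\n"}'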

Comment 3 Yu Qi Zhang 2022-02-04 17:36:02 UTC
The diffs between https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.15 and https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.17 indicate that there was no change in either the base OS or the MCO templates. As a result, there was no update to either pool, and the MCO reported the upgrade as successful.

The must-gather in https://access.redhat.com/support/cases/#/case/03138509/discussion?attachmentId=a092K0000336kzKQAQ corroborates that claim. The most recent rendered MCs are:

rendered-master-5741dcbb1c6dc2460c89871e158a9138
rendered-worker-189700adbe95c6e3c93fd10805794a6b

Both were created on Jan. 27th, presumably during the previous update, so the lastTransitionTime still reflects that earlier update, which is expected.

TL;DR: this will happen whenever the MCO has no update to perform between versions, in which case we simply don't roll out a pool update. Is there a reason `lastTransitionTime` is being used to determine success? It will cause problems in situations like this one.
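
A hedged way to confirm this from the CLI on an affected cluster (assuming the rendered config names above): check whether the pool's target rendered config actually changed across the upgrade, and when it was created.

# creation time of the rendered worker config named above
$ oc get mc rendered-worker-189700adbe95c6e3c93fd10805794a6b -o jsonpath='{.metadata.creationTimestamp}{"\n"}'

# if spec.configuration.name is unchanged after the upgrade, no pool rollout happened,
# so the Updating condition's lastTransitionTime still points at the previous update
$ oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}{.status.conditions[?(@.type=="Updating")].lastTransitionTime}{"\n"}'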

Comment 4 Robb Hamilton 2022-02-04 18:15:27 UTC
Reassigning to console as this is indeed a console bug.

Comment 5 Jakub Hadvig 2022-02-10 07:54:09 UTC
*** Bug 2052046 has been marked as a duplicate of this bug. ***

Comment 6 Jakub Hadvig 2022-02-10 07:55:12 UTC
This issue was also reported against 4.8 in https://bugzilla.redhat.com/show_bug.cgi?id=2052046, so I'm updating the 'Version' field to 4.8.

Comment 8 Jakub Hadvig 2022-02-15 15:31:44 UTC
*** Bug 2054722 has been marked as a duplicate of this bug. ***

Comment 9 Robb Hamilton 2022-02-21 13:12:32 UTC
@yapei@redhat.com, can you please see that this gets verified? We've seen a number of duplicate bugs filed.

Comment 11 Yadan Pei 2022-02-23 07:41:40 UTC
Created attachment 1862812 [details]
410 to 411 upgrade finish

1. Set up a 4.10.0-rc.3 cluster, create a custom MCP
$ oc label node yapeiup-l77hw-worker-c-2f26s.c.openshift-qe.internal node-role.kubernetes.io/infra=""
$ cat infra-mcp.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: null
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
$ oc create -f infra-mcp.yaml 
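
As a quick sanity check before upgrading (a sketch, assuming the node and pool names above), confirm the custom pool picked up the labeled node:

# expect MACHINECOUNT=1 and UPDATED=True once the infra node converges on its rendered config
$ oc get mcp infra
# the labeled node should be listed here
$ oc get nodes -l node-role.kubernetes.io/infra=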

2. Upgrade the test cluster to 4.11.0-0.nightly-2022-02-18-121223, which contains the bug fix. When the upgrade finishes, the 'Worker Nodes' and 'infra Nodes' progress bars both show the correct status and no longer appear on the Cluster Settings page.

verified on 4.11.0-0.nightly-2022-02-18-121223
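
For completeness, a CLI cross-check of the state the console now reports correctly (a sketch; exact column output depends on the cluster):

# VERSION should be 4.11.0-0.nightly-2022-02-18-121223 with PROGRESSING=False once the upgrade is done
$ oc get clusterversion
# every pool (master, worker, infra) should show UPDATED=True and UPDATING=False
$ oc get mcp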

Comment 12 Yadan Pei 2022-02-23 07:42:25 UTC
Created attachment 1862813 [details]
410 to 411 upgrade just happen

Comment 13 Kirsten Garrison 2022-03-02 21:05:23 UTC
*** Bug 1921529 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2022-08-10 10:47:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

