Description of problem:
From Administration > Cluster Settings > Details, even after the upgrade successfully completes, the console still shows "0 of N, 0%".

Version-Release number of selected component (if applicable):
4.9.17

How reproducible:
Upgrade from 4.9.15 to 4.9.17
Reassigning to the MCO team for investigation as I believe this is an API issue. We've had multiple reports [1][2] of this same bug over the last couple of days. In both cases, the underlying data from the worker MCP that we use to determine whether the worker nodes have completed their update does not appear to be changing. That data point is the worker MCP Updating condition lastTransitionTime. We compare this time to the CVO status.history[0].startedTime for the spec.desiredUpdate.version, since the worker nodes update later in the update cycle and can continue updating after the CVO has finished its updates. Any ideas on why the MCPs appear to not be updating in this scenario?

[1] https://coreos.slack.com/archives/C6A3NV5J9/p1643841199166689
[2] https://coreos.slack.com/archives/C6A3NV5J9/p1643970982911949
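For reference, these are the two values the console compares; a minimal sketch of how to read them with oc, assuming the default worker pool and the cluster-scoped ClusterVersion object named 'version':

$ oc get machineconfigpool worker -o jsonpath='{.status.conditions[?(@.type=="Updating")].lastTransitionTime}{"\n"}'
$ oc get clusterversion version -o jsonpath='{.status.history[0].startedTime}{"\n"}'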
The diffs between https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.15 and https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.9.17 indicate that there was no change to either the base OS or the MCO templates. So there was no update to either pool, and the MCO reported the upgrade as successful. The must-gather in https://access.redhat.com/support/cases/#/case/03138509/discussion?attachmentId=a092K0000336kzKQAQ corroborates that claim. The most recent rendered MCs are:

rendered-master-5741dcbb1c6dc2460c89871e158a9138
rendered-worker-189700adbe95c6e3c93fd10805794a6b

Both were created Jan. 27th, which was presumably the previous update, so the lastTransitionTime still reflects that earlier update, which is expected.

TL;DR: this will happen in any situation where the MCO needs to perform no update between versions, such that we simply don't roll out a pool update. Is there a reason `lastTransitionTime` is being used to determine success? It will cause problems in situations like this.
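For anyone triaging similar reports, the staleness can be confirmed directly against the cluster (or a must-gather); a sketch using the rendered config name from above:

$ oc get machineconfigpool worker -o jsonpath='{.spec.configuration.name}{"\n"}'
$ oc get machineconfig rendered-worker-189700adbe95c6e3c93fd10805794a6b -o jsonpath='{.metadata.creationTimestamp}{"\n"}'

If the creation timestamp predates the current update, the pool never rolled out and the Updating condition's lastTransitionTime will not move.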
Reassigning to console as this is indeed a console bug.
*** Bug 2052046 has been marked as a duplicate of this bug. ***
This issue was also reported for 4.8 in https://bugzilla.redhat.com/show_bug.cgi?id=2052046. Updating the 'Version' to 4.8 accordingly.
*** Bug 2054722 has been marked as a duplicate of this bug. ***
@yapei@redhat.com, can you please see that this gets verified? We've seen a number of duplicate bugs filed.
Created attachment 1862812 [details]
410 to 411 upgrade finish

1. Set up a 4.10.0-rc.3 cluster and create a custom MCP:

$ oc label node yapeiup-l77hw-worker-c-2f26s.c.openshift-qe.internal node-role.kubernetes.io/infra=""
$ cat infra-mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  maxUnavailable: null
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
  paused: false
$ oc create -f infra-mcp.yaml

2. Upgrade the testing cluster to 4.11.0-0.nightly-2022-02-18-121223, which contains the bug fix. When the upgrade finished, the 'Worker Nodes' and 'infra Nodes' progress bars both showed the correct status and no longer appear on the Cluster Settings page.

Verified on 4.11.0-0.nightly-2022-02-18-121223
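For completeness, the underlying pool and version status can also be confirmed from the CLI after the upgrade (a quick sketch; pool names match the ones created above):

$ oc get machineconfigpool worker infra
$ oc get clusterversion

Both pools should report UPDATED=True and UPDATING=False once the rollout completes.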
Created attachment 1862813 [details]
410 to 411 upgrade just happened
*** Bug 1921529 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069