Bug 2079418 - cluster update status is stuck, also update is not even visible
Summary: cluster update status is stuck, also update is not even visible
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Cluster Lifecycle
Version: rhacm-2.4.z
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: rhacm-2.4.6
Assignee: Jian Qiu
QA Contact: Hui Chen
Christopher Dawson
Depends On:
Reported: 2022-04-27 14:11 UTC by Ilkka Tengvall
Modified: 2022-09-26 14:53 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2022-09-26 14:52:19 UTC
Target Upstream Version:
bot-tracker-sync: rhacm-2.4.z+

Attachments
ManagedCluster, ManagedClusterInfo, ClusterCurator (30.55 KB, text/plain)
2022-04-27 14:11 UTC, Ilkka Tengvall

System                  ID                               Private  Priority  Status  Summary  Last Updated
Github                  stolostron backlog issues 21996  0        None      None    None     2022-04-27 15:08:23 UTC
Red Hat Product Errata  RHSA-2022:6696                   0        None      None    None     2022-09-26 14:53:33 UTC

Description Ilkka Tengvall 2022-04-27 14:11:31 UTC
Created attachment 1875388
ManagedCluster, ManagedClusterInfo, ClusterCurator

Description of the problem:

I don't see a cluster update option for the local-cluster, and after manually going to the OpenShift console and applying the update, the status gets stuck in the ACM GUI.

Release version: 2.4.3

Operator snapshot version:

OCP version: 4.10.11

Browser Info: FF latest on F35

Steps to reproduce:
1. Go to ACM clusters
2. Monitor the status of the local cluster

Actual results:

While an update was available in the cluster console, ACM doesn't show it. ACM shows an available update for the remote cluster, which was also on the same version, 4.10.9.

Expected results:

There should be an update button. And when monitoring the manually triggered update, the state should say up to date instead of "progressing 84%".

Additional info:

Chat here internally: https://chat.google.com/room/AAAAWskU424/njKNFzKsDko with screenshots

Also see stuff.yml for the requested resources.

Comment 1 Ilkka Tengvall 2022-04-27 14:14:09 UTC
Logs from cluster-curator-controller:

bin/sh: ./cluster-curator-controller: No such file or directory
I0426 21:36:32.633030       1 request.go:665] Waited for 1.600608866s due to client-side throttling, not priority and fairness, request: GET:
2022-04-26T21:36:35.436Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2022-04-26T21:36:36.232Z	INFO	setup	starting manager
2022-04-26T21:36:36.531Z	INFO	starting metrics server	{"path": "/metrics"}
I0426 21:36:36.532049       1 leaderelection.go:248] attempting to acquire leader lease open-cluster-management/d362c584.cluster.open-cluster-management.io...
I0426 23:01:06.832334       1 leaderelection.go:258] successfully acquired lease open-cluster-management/d362c584.cluster.open-cluster-management.io
2022-04-26T23:01:06.832Z	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","namespace":"open-cluster-management","name":"d362c584.cluster.open-cluster-management.io","uid":"aa8d8ae1-5f68-4506-ba94-1c823584dee4","apiVersion":"v1","resourceVersion":"45350134"}, "reason": "LeaderElection", "message": "cluster-curator-controller-76bc4968b5-zbsfn_d2f9927a-e079-4a75-acfb-240d7cda7ac8 became leader"}
2022-04-26T23:01:06.832Z	DEBUG	events	Normal	{"object": {"kind":"Lease","namespace":"open-cluster-management","name":"d362c584.cluster.open-cluster-management.io","uid":"42d06700-a958-4c44-8e39-709217f798d2","apiVersion":"coordination.k8s.io/v1","resourceVersion":"45350139"}, "reason": "LeaderElection", "message": "cluster-curator-controller-76bc4968b5-zbsfn_d2f9927a-e079-4a75-acfb-240d7cda7ac8 became leader"}
2022-04-26T23:01:07.133Z	INFO	controller.clustercurator	Starting EventSource	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ClusterCurator", "source": "kind source: /, Kind="}
2022-04-26T23:01:07.133Z	INFO	controller.clustercurator	Starting Controller	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ClusterCurator"}
2022-04-26T23:01:07.632Z	INFO	controller.clustercurator	Starting workers	{"reconciler group": "cluster.open-cluster-management.io", "reconciler kind": "ClusterCurator", "worker count": 1}
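
For readers scanning the log above: the two leaderelection lines are far apart in time. A minimal sketch, assuming the klog-style prefix (severity letter, month and day, then a clock time with microseconds) seen in those lines, computes how long the controller waited for the lease. The helper name klog_time is made up for illustration, and the two sample lines are shortened copies of the log lines. Whether this wait is related to the stuck status is not established anywhere in this report.

```python
from datetime import datetime, timedelta


def klog_time(line: str) -> datetime:
    """Parse the timestamp from a klog-style line such as 'I0426 21:36:36.532049 ...'.

    klog omits the year, so the result carries strptime's default year (1900);
    only differences between two timestamps are meaningful here.
    """
    severity_and_date, clock = line.split()[:2]
    # Drop the leading severity letter ("I") and parse month/day plus clock time.
    return datetime.strptime(f"{severity_and_date[1:]} {clock}", "%m%d %H:%M:%S.%f")


# Shortened copies of the two leaderelection lines from the log above.
attempt = "I0426 21:36:36.532049       1 leaderelection.go:248] attempting to acquire leader lease"
acquired = "I0426 23:01:06.832334       1 leaderelection.go:258] successfully acquired lease"

wait: timedelta = klog_time(acquired) - klog_time(attempt)
print(wait)  # 1:24:30.300285
```

In other words, roughly an hour and twenty-four minutes passed between the controller attempting to acquire the leader lease and acquiring it.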

Comment 2 Kevin Cormier 2022-04-27 14:17:27 UTC
Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=2005759
It looks like we did not deliver a fix for 2.4.z, but something was improved for 2.5.

Comment 3 Le Yang 2022-06-14 03:48:55 UTC
When upgrading a managed cluster, cluster-curator-controller creates a job to monitor the progress of the upgrade and update the status of the clustercurator CR accordingly. If the job fails for some reason, the status of the clustercurator CR will be stuck in a stale state, even though the upgrade may already have completed. The ACM console reads the upgrade status from the clustercurator CR; if it finds the cluster is in an upgrading state, it will not show upgrade options in the UI. That's the root cause of this issue. It has been fixed in the 2.5 release, and the fix should be backported to 2.4 as well.
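
The failure mode described in this comment can be sketched as a toy model. This is only an illustration, not the actual controller or console code: the class CuratorStatus, its fields, and both function names are hypothetical, and the real clustercurator CR status has a different shape. It does not attempt to show how the fix works, since that is not described here.

```python
from dataclasses import dataclass


@dataclass
class CuratorStatus:
    """Hypothetical, simplified stand-in for the status of a clustercurator CR."""
    upgrade_in_progress: bool  # what the CR status currently reports
    monitor_job_failed: bool   # whether the monitoring job died before finishing


def console_shows_upgrade_option(status: CuratorStatus) -> bool:
    # Behaviour as described in the comment: the console trusts the CR status
    # alone and hides upgrade options while an upgrade is reported as running.
    return not status.upgrade_in_progress


def status_is_stale(status: CuratorStatus, upgrade_actually_done: bool) -> bool:
    # The CR still says "upgrading", but nothing is left to update it
    # (the job failed) and the cluster has in fact finished upgrading.
    return (
        status.upgrade_in_progress
        and status.monitor_job_failed
        and upgrade_actually_done
    )


# The situation from this bug: job failed mid-upgrade, upgrade completed anyway.
stuck = CuratorStatus(upgrade_in_progress=True, monitor_job_failed=True)
print(console_shows_upgrade_option(stuck))                  # False
print(status_is_stale(stuck, upgrade_actually_done=True))   # True
```

In this model the console keeps hiding the upgrade option indefinitely, because the only writer of the status (the monitoring job) is gone.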

Comment 4 Le Yang 2022-06-30 10:37:47 UTC
The fix has been merged. It will be available in ACM 2.4.6.

Comment 5 Napoco Agbetra 2022-09-15 17:14:58 UTC
Verified on 2.4.6-DOWNSTREAM-2022-09-07-18-40-46
Cluster update status was visible on both UI and cluster curator yaml
Upgrade was completed

Comment 10 errata-xmlrpc 2022-09-26 14:52:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Advanced Cluster Management 2.4.6 security update and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

