1906142 – Moving from Patched ACM 2.1.x CSV to Default Results in Degraded Cluster

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Advanced Cluster Management for Kubernetes
Classification:	Red Hat
Component:	App Lifecycle
Sub Component:
Version:	rhacm-2.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Xiangjing Li
QA Contact:	Eveline Cai
Docs Contact:	bswope@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-09 18:05 UTC by James Young
Modified:	2024-03-25 17:29 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-17 18:19:07 UTC
Target Upstream Version:
Embargoed:
Flags:	amcnamar: rhacm-2.1.z+

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	open-cluster-management backlog issues 7654	0	None	None	None	2021-02-22 14:22:32 UTC
Red Hat Product Errata	RHSA-2021:0607	0	None	None	None	2021-02-17 18:19:24 UTC

Comment 1 Mike Ng 2020-12-10 19:25:43 UTC

G2Bsync 742736374 comment
xiangjingli Thu, 10 Dec 2020 19:15:45 UTC
G2Bsync

We just resolved an issue found in the latest subscription image, `multicluster-operators-subscription:community-latest`: the watch functionality in the managed cluster subscription controller stopped working after we applied the V0.6.3 k8s runtime controller.

Please verify whether it helps resolve the issue.

1. Delete the hub subscription pod. Since the CSV has been patched to use the `community-latest` tag, the newly created hub subscription pod should fetch the latest subscription image.
```
open-cluster-management multicluster-operators-hub-subscription-6f45b4456d-6824d
```
2. Also apply the latest subscription image to the managed cluster subscription pod. The CSV patch does not affect the managed cluster subscription pod.
```
open-cluster-management-agent-addon klusterlet-addon-appmgr-76fd9f6f75-f8fzh
```
For example, to patch the subscription pod in the managed cluster `cluster1`:

```
# Step 1: pause the klusterletaddonconfig for cluster1
$ oc annotate klusterletaddonconfig -n cluster1 cluster1 klusterletaddonconfig-pause=true --overwrite=true

# Step 2: edit the appmgr manifestwork and set the image override
$ oc edit manifestwork -n cluster1 cluster1-klusterlet-addon-appmgr
imageOverrides:
  multicluster_operators_subscription: quay.io/open-cluster-management/multicluster-operators-subscription:community-latest
```
If the hub cluster is self-managed, apply this patch to it as well. The self-managed hub cluster is named `local-cluster` by default.
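The pause step above can be repeated for several managed clusters. The following is a minimal sketch, assuming each cluster's namespace and `klusterletaddonconfig` share the cluster name, as in the `cluster1` example. The `OC` variable and the function names are hypothetical helpers, not part of ACM; setting `OC=echo` prints the commands instead of running them.

```shell
# Sketch only: pause the klusterletaddonconfig for each named managed cluster.
# OC is a hypothetical override; it defaults to the real `oc` client.
OC="${OC:-oc}"

# Annotate one cluster, using the same command as the cluster1 example.
# Assumes the namespace and the resource are both named after the cluster.
pause_addon_config() {
  "$OC" annotate klusterletaddonconfig -n "$1" "$1" \
    klusterletaddonconfig-pause=true --overwrite=true
}

# Annotate every cluster passed as an argument; stop at the first failure.
pause_all() {
  for cluster in "$@"; do
    pause_addon_config "$cluster" || return 1
  done
}
```

For instance, `pause_all cluster1 local-cluster` would cover one managed cluster and a self-managed hub. The `oc edit manifestwork` step still has to be done for each cluster.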

Please note that applying the `community-latest` image to an older version of ACM can be risky. As its name indicates, it is the latest image and may include new features from roadmap projects. It may depend on resources, such as new CRDs or configmaps, that are not bundled in the older version of ACM.

In this case, since the customer cluster is running ACM 2.1, it should be safe to patch in the `community-2.1` tag images to get new fixes related to ACM 2.1.

Comment 3 Mike Ng 2020-12-16 18:52:23 UTC

G2Bsync 746807346 comment 
 xiangjingli Wed, 16 Dec 2020 18:44:01 UTC 
 G2Bsync

Thanks James. From the log you attached, it seems the existing `multicluster-operators-standalone-subscription` pod was not terminated successfully, which caused the new CSV patch to fail.

```
Normal   InstallWaiting      6m26s (x5 over 10m)    operator-lifecycle-manager  installing: waiting for deployment multicluster-operators-standalone-subscription to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
  Warning  InstallCheckFailed  92s                    operator-lifecycle-manager  install failed: deployment multicluster-operators-standalone-subscription not ready before timeout: deployment "multicluster-operators-standalone-subscription" exceeded its progress deadline
```
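When there are many events, it can help to pull out just the deployments that OLM reported as failed. This is a rough sketch that reads event text like the excerpt above on standard input; the function name `failed_deployments` is hypothetical, and the pattern is based only on the `install failed` message shown here.

```shell
# Sketch only: print the deployment names from "install failed" event lines.
# Reads event text on stdin; lines that do not match are ignored.
failed_deployments() {
  sed -n 's/.*install failed: deployment \([^ ]*\) not ready before timeout.*/\1/p' \
    | sort -u
}
```

A reported deployment can then be watched with a command such as `oc rollout status deployment/multicluster-operators-standalone-subscription -n open-cluster-management`.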
There are a couple of approaches worth trying.

1. Force delete the existing `multicluster-operators-standalone-subscription` and `multicluster-operators-hub-subscription` pods.
```
% oc get pods -n open-cluster-management |grep multicluster-operators
multicluster-operators-hub-subscription-699574fb5c-jkdmz          1/1     Running     0          4d18h
multicluster-operators-standalone-subscription-7ccb4bd766-67ddb   1/1     Running     0          4d18h

% oc delete pods -n open-cluster-management multicluster-operators-hub-subscription-699574fb5c-jkdmz   multicluster-operators-standalone-subscription-7ccb4bd766-67ddb 
```
Then the two pods should be restarted with the new image tag.
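The pod names in the example above carry generated suffixes, so they differ on every cluster. As a sketch, the two names can be picked out of the `oc get pods` listing by prefix instead of being typed by hand. The function names and the `OC` override are hypothetical; the namespace default is taken from the commands above.

```shell
# Sketch only: filter an `oc get pods` listing (stdin) down to the names of
# the hub-subscription and standalone-subscription pods.
subscription_pod_names() {
  awk '$1 ~ /^multicluster-operators-(hub|standalone)-subscription-/ { print $1 }'
}

# Delete the matching pods in the given namespace.
# OC is a hypothetical override; it defaults to the real `oc` client.
delete_subscription_pods() {
  ns="${1:-open-cluster-management}"
  oc_bin="${OC:-oc}"
  names="$("$oc_bin" get pods -n "$ns" --no-headers | subscription_pod_names)"
  [ -n "$names" ] || { echo "no subscription pods found in $ns" >&2; return 1; }
  # Word splitting is intended here: one pod name per word.
  "$oc_bin" delete pods -n "$ns" $names
}
```

Running `oc get pods -n open-cluster-management --no-headers | subscription_pod_names` first shows what would be deleted.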

Please note that the CSV contains two deployments, `hub-subscription` and `standalone-subscription`, that use the multicluster-operators-subscription image. It would be better to replace the image tag in both deployments.

```
      - name: multicluster-operators-hub-subscription
      ...
                image: quay.io/open-cluster-management/multicluster-operators-subscription@sha256:19b3d1add31e5e7026ade1eb0487cbb5618c52b219a83f3c5473ce16beaa7d88

      - name: multicluster-operators-standalone-subscription
      ...
                image: quay.io/open-cluster-management/multicluster-operators-subscription@sha256:19b3d1add31e5e7026ade1eb0487cbb5618c52b219a83f3c5473ce16beaa7d88
```
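To keep both deployments in step, the image reference can be rewritten everywhere it appears in a local copy of the CSV. The following is a minimal sketch that works on YAML text from standard input; `set_subscription_image` is a hypothetical helper, and it assumes the image lines look like the ones in the excerpt above.

```shell
# Sketch only: replace every multicluster-operators-subscription image
# reference (tag or digest) in YAML read from stdin with the given reference.
# The new reference must not contain '#' or '&', which sed treats specially here.
set_subscription_image() {
  new_ref="$1"
  sed -E "s#(image:[[:space:]]*)quay\.io/open-cluster-management/multicluster-operators-subscription[@:][^[:space:]]+#\1${new_ref}#"
}
```

One possible use is to export the CSV with `oc get csv -n open-cluster-management <csv-name> -o yaml`, pipe it through `set_subscription_image quay.io/open-cluster-management/multicluster-operators-subscription:community-2.1`, and review the result before applying it.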

2. Check whether the image `quay.io/open-cluster-management/multicluster-operators-subscription:community-2.1` works.

The `community-2.1` image is for the ACM 2.1 release. It also does not use the newer V0.6.3 k8s runtime controller.

After the CSV patch is done, we expect to see the three pods in Running status:
```
% oc get pods -n open-cluster-management |grep multicluster-operators
multicluster-operators-application-556d678cdd-dpj48               5/5     Running     4          4d18h
multicluster-operators-hub-subscription-699574fb5c-jkdmz          1/1     Running     0          4d18h
multicluster-operators-standalone-subscription-7ccb4bd766-67ddb   1/1     Running     0          4d18h
```
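The same check can be scripted. This is a rough sketch that reads a listing like the one above on standard input and succeeds only if at least three `multicluster-operators-` pods are present, each with all containers ready and a `Running` status. The function name `operator_pods_ready` is hypothetical, and the threshold of three comes from the expectation stated above.

```shell
# Sketch only: exit 0 if the listing on stdin shows three or more
# multicluster-operators-* pods, all fully ready and in Running status.
operator_pods_ready() {
  awk '
    $1 ~ /^multicluster-operators-/ {
      seen++
      split($2, ready, "/")            # READY column, e.g. 5/5
      if (ready[1] != ready[2] || $3 != "Running") bad++
    }
    END { exit ((seen >= 3 && bad == 0) ? 0 : 1) }
  '
}
```

It could be fed with `oc get pods -n open-cluster-management --no-headers | operator_pods_ready && echo ok`.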

Comment 4 Mike Ng 2021-02-01 19:45:52 UTC

G2Bsync 762396989 comment 
 juliana-hsu Mon, 18 Jan 2021 17:58:09 UTC 
 G2Bsync @YoungJM Is this resolved, can it be closed?

Comment 5 James Young 2021-02-04 07:07:51 UTC

This can be closed as https://github.com/open-cluster-management/backlog/issues/7171 has been closed. Fix should be in 2.1.3.

Comment 11 errata-xmlrpc 2021-02-17 18:19:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.1.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0607
