G2Bsync 742736374 comment xiangjingli Thu, 10 Dec 2020 19:15:45 UTC

We just resolved an issue found in the latest subscription image `multicluster-operators-subscription:community-latest`: the watch functionality in the managed cluster subscription controller stopped working after we applied the v0.6.3 k8s controller-runtime. Please verify whether the following resolves the issue.

1. Delete the hub subscription pod. Since the CSV already points at the `community-latest` tag, the newly created hub subscription pod should fetch the latest subscription image.

   ```
   open-cluster-management    multicluster-operators-hub-subscription-6f45b4456d-6824d
   ```

2. The latest subscription image also needs to be applied to the managed cluster subscription pod; the CSV patch does not affect that pod.

   ```
   open-cluster-management-agent-addon    klusterlet-addon-appmgr-76fd9f6f75-f8fzh
   ```

   For example, to patch the subscription pod in the managed cluster `cluster1`:

   ```
   $ oc annotate klusterletaddonconfig -n cluster1 cluster1 klusterletaddonconfig-pause=true --overwrite=true
   $ oc edit manifestwork -n cluster1 cluster1-klusterlet-addon-appmgr
     imageOverrides:
       multicluster_operators_subscription: quay.io/open-cluster-management/multicluster-operators-subscription:community-latest
   ```

If the hub cluster is self-managed, it needs this patch as well; the self-managed hub cluster is named `local-cluster` by default.

Please note that applying the `community-latest` image to an older version of ACM can be risky. As the name indicates, it is the latest image and may contain new features from roadmap projects, along with dependencies (new CRDs, ConfigMaps, etc.) that are not bundled in the older ACM version. In this case, since the customer cluster is running ACM 2.1, it should be safe to patch with the `community-2.1` tag images to pick up the fixes for ACM 2.1.
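The manifestwork edit above amounts to swapping whatever tag or digest is currently on the `multicluster_operators_subscription` override for `community-latest`. As a minimal local sketch of that substitution with `sed` (the YAML fragment below is a hypothetical sample mirroring the snippet above, not pulled from a real cluster):

```shell
# Hypothetical imageOverrides fragment, shaped like the manifestwork snippet above
cat > /tmp/appmgr-overrides.yaml <<'EOF'
imageOverrides:
  multicluster_operators_subscription: quay.io/open-cluster-management/multicluster-operators-subscription@sha256:0123456789abcdef
EOF

# Replace the existing tag or digest with the community-latest tag
sed 's|\(multicluster-operators-subscription\)[:@].*|\1:community-latest|' /tmp/appmgr-overrides.yaml
# Prints the fragment with the image ending in :community-latest
```

The same one-line change is what you would make interactively inside `oc edit manifestwork`.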
G2Bsync 746807346 comment xiangjingli Wed, 16 Dec 2020 18:44:01 UTC

Thanks James. From the log you attached, it seems the existing `multicluster-operators-standalone-subscription` pod did not terminate successfully, which caused the new CSV patch to fail.

```
Normal   InstallWaiting      6m26s (x5 over 10m)  operator-lifecycle-manager  installing: waiting for deployment multicluster-operators-standalone-subscription to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
Warning  InstallCheckFailed  92s                  operator-lifecycle-manager  install failed: deployment multicluster-operators-standalone-subscription not ready before timeout: deployment "multicluster-operators-standalone-subscription" exceeded its progress deadline
```

There are a couple of approaches worth trying.

1. Force delete the existing `multicluster-operators-standalone-subscription` and `multicluster-operators-hub-subscription` pods:

   ```
   % oc get pods -n open-cluster-management | grep multicluster-operators
   multicluster-operators-hub-subscription-699574fb5c-jkdmz          1/1     Running   0          4d18h
   multicluster-operators-standalone-subscription-7ccb4bd766-67ddb   1/1     Running   0          4d18h

   % oc delete pods -n open-cluster-management multicluster-operators-hub-subscription-699574fb5c-jkdmz multicluster-operators-standalone-subscription-7ccb4bd766-67ddb
   ```

   The two pods should then restart with the new image tag. Please note that the CSV contains two deployments, `hub-subscription` and `standalone-subscription`, that use the multicluster-operators-subscription image; it is better to replace the image tag in both deployments:

   ```
   - name: multicluster-operators-hub-subscription
     ...
     image: quay.io/open-cluster-management/multicluster-operators-subscription@sha256:19b3d1add31e5e7026ade1eb0487cbb5618c52b219a83f3c5473ce16beaa7d88
   - name: multicluster-operators-standalone-subscription
     ...
     image: quay.io/open-cluster-management/multicluster-operators-subscription@sha256:19b3d1add31e5e7026ade1eb0487cbb5618c52b219a83f3c5473ce16beaa7d88
   ```

2. Try whether the image `quay.io/open-cluster-management/multicluster-operators-subscription:community-2.1` works. The `community-2.1` image is for the ACM 2.1 release, and it does not use the newer v0.6.3 k8s controller-runtime.

After the CSV patch is done, we expect to see all three pods in Running status:

```
% oc get pods -n open-cluster-management | grep multicluster-operators
multicluster-operators-application-556d678cdd-dpj48               5/5     Running   4          4d18h
multicluster-operators-hub-subscription-699574fb5c-jkdmz          1/1     Running   0          4d18h
multicluster-operators-standalone-subscription-7ccb4bd766-67ddb   1/1     Running   0          4d18h
```
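Rather than copying the hashed pod names by hand, the two subscription pods can be picked out by name prefix. A sketch, assuming pod names begin with the prefixes shown in the listings above; the `awk` filter is demonstrated offline against a sample of that output, and the final `oc delete` (commented out) would need a live cluster:

```shell
# Sample of `oc get pods -n open-cluster-management --no-headers` output,
# copied from the listing above
sample='multicluster-operators-application-556d678cdd-dpj48 5/5 Running 4 4d18h
multicluster-operators-hub-subscription-699574fb5c-jkdmz 1/1 Running 0 4d18h
multicluster-operators-standalone-subscription-7ccb4bd766-67ddb 1/1 Running 0 4d18h'

# Select only the hub/standalone subscription pod names (first column)
printf '%s\n' "$sample" | awk '/^multicluster-operators-(hub|standalone)-subscription/ {print $1}'
# Prints the two subscription pod names; the application pod is filtered out

# Against a real cluster, the same filter would feed oc delete:
# oc get pods -n open-cluster-management --no-headers \
#   | awk '/^multicluster-operators-(hub|standalone)-subscription/ {print $1}' \
#   | xargs -r oc delete pods -n open-cluster-management
```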
G2Bsync 762396989 comment juliana-hsu Mon, 18 Jan 2021 17:58:09 UTC

@YoungJM Is this resolved? Can it be closed?
This can be closed, as https://github.com/open-cluster-management/backlog/issues/7171 has been closed. The fix should be in 2.1.3.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.1.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0607