Bug 2122292

Summary: pods in CrashLoopBackoff on 3.11 managed cluster
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Ryan Spagnola <rspagnol>
Component: Cluster Lifecycle
Assignee: Le Yang <leyan>
Status: CLOSED ERRATA
QA Contact: Hui Chen <huichen>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhacm-2.5
CC: dhuynh, jverreng, njean, zyin
Target Milestone: ---
Flags: huichen: qe_test_coverage+, bot-tracker-sync: rhacm-2.5.z+
Target Release: rhacm-2.5.2
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-13 20:06:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ryan Spagnola 2022-08-29 17:49:02 UTC
Description of the problem:
After the MCIR (ManagedClusterImageRegistry) was updated with the appropriate registries, the 4.x clusters picked up the change and are working as expected. The 3.11 clusters, however, have not picked up the change, and their pods are still in CrashLoopBackoff.

Release version:
2.5

Operator snapshot version:

OCP version:
3.11

Browser Info:

Steps to reproduce:
1. Attach a 3.11 cluster to a hub using the MCIR settings from the 2.4 docs
2. Upgrade ACM from 2.4 to 2.5
3. Modify the MCIR with the registries listed in the 2.5 docs

Actual results:
3.11 clusters do not pick up the change and remain in CrashLoopBackoff

Expected results:
3.11 clusters behave the same as 4.x and pick up the MCIR change

Additional info:
2.4 MCIR docs
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#imp[…]ride

2.5 MCIR docs
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.5/html/clusters/managing-your-clusters#imp[…]ride

Comment 1 zyin@redhat.com 2022-08-30 15:02:36 UTC
Could you help collect the information below? Alternatively, the MCE and MCH must-gather output would be best.

oc get placements.cluster.open-cluster-management.io -A -o yaml
oc get placementdecisions.cluster.open-cluster-management.io -A -o yaml
oc get managedclusterset -o yaml
oc get mcl -o yaml
oc get managedclusterimageregistries.imageregistry.open-cluster-management.io -A -o yaml
oc get managedclusteraddons.addon.open-cluster-management.io -A -o yaml
oc get manifestworks.work.open-cluster-management.io -A -o yaml
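
The commands above can be run one at a time, but it may be easier to save each result to its own file for attaching to the bug. The following is only a sketch of that idea, assuming `oc` is already logged in to the hub cluster; the `OC` and `OUT_DIR` variables and the `dump` and `collect_info` function names are hypothetical and not part of any Red Hat tooling.

```shell
# Sketch: save each requested resource dump to its own file.
# Assumptions: `oc` is logged in to the hub; OC and OUT_DIR are hypothetical knobs.
OC="${OC:-oc}"                        # command used to query the cluster
OUT_DIR="${OUT_DIR:-acm-debug-info}"  # hypothetical output directory

# dump NAME ARGS...: run "$OC get ARGS... -o yaml" and store the result in NAME.yaml
dump() {
  name="$1"; shift
  if ! "$OC" get "$@" -o yaml > "$OUT_DIR/$name.yaml" 2> "$OUT_DIR/$name.err"; then
    echo "warning: could not collect $name (see $OUT_DIR/$name.err)" >&2
  fi
}

collect_info() {
  mkdir -p "$OUT_DIR" || return 1
  # Resource types requested across all namespaces (-A)
  for r in \
    placements.cluster.open-cluster-management.io \
    placementdecisions.cluster.open-cluster-management.io \
    managedclusterimageregistries.imageregistry.open-cluster-management.io \
    managedclusteraddons.addon.open-cluster-management.io \
    manifestworks.work.open-cluster-management.io
  do
    dump "$r" "$r" -A
  done
  # Resource types requested without -A
  for r in managedclusterset mcl; do
    dump "$r" "$r"
  done
}
```

Calling `collect_info` writes one `.yaml` file per resource type into `OUT_DIR`, plus a `.err` file holding any error output, so a failure on one resource type does not stop the rest from being collected.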

Comment 6 zyin@redhat.com 2022-08-31 15:02:42 UTC
From the logs, the cluster dmzsecuretestrr2 is not selected by the placement used in the ManagedClusterImageRegistry, because the cluster has a taint:
    taints:
    - effect: NoSelect
      key: cluster.open-cluster-management.io/unreachable
      timeAdded: null


In 2.5, the placement used in the ManagedClusterImageRegistry must add a toleration to its spec:
  tolerations:
  - key: "cluster.open-cluster-management.io/unreachable"
    operator: Exists

Please refer to section 1.7.3.1, step 2, in the doc: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.5/html/clusters/managing-your-clusters#imp-clust-custom-image-override
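
One possible way to add that toleration to an existing Placement is a merge patch with `oc patch`. This is a sketch only, not the procedure from the linked doc: the `add_unreachable_toleration` function and the `OC` variable are hypothetical, and the Placement name and namespace must be replaced with those referenced by the ManagedClusterImageRegistry. Note that a merge patch replaces the whole `tolerations` list, so any tolerations already present would need to be included in the patch as well.

```shell
# Sketch: add the "unreachable" toleration to the Placement used by the MCIR.
# Assumptions: OC and the function name are hypothetical; the caller supplies
# the real Placement name and namespace.
OC="${OC:-oc}"

# Merge patch carrying the toleration quoted in the comment above.
# A merge patch REPLACES spec.tolerations rather than appending to it.
TOLERATION_PATCH='{"spec":{"tolerations":[{"key":"cluster.open-cluster-management.io/unreachable","operator":"Exists"}]}}'

# add_unreachable_toleration PLACEMENT_NAME NAMESPACE
add_unreachable_toleration() {
  "$OC" patch placements.cluster.open-cluster-management.io "$1" \
    -n "$2" --type merge -p "$TOLERATION_PATCH"
}
```

For example, `add_unreachable_toleration myPlacement myNamespace` (both names are placeholders). Afterwards, `oc get placementdecisions.cluster.open-cluster-management.io -A -o yaml` should show whether the 3.11 cluster is now selected.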

Comment 14 errata-xmlrpc 2022-09-13 20:06:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Advanced Cluster Management 2.5.2 security fixes and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6507

Comment 16 Red Hat Bugzilla 2023-09-19 04:25:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days