Bug 2122292

Summary:	pods in CrashLoopBackOff on 3.11 managed cluster
Product:	Red Hat Advanced Cluster Management for Kubernetes	Reporter:	Ryan Spagnola <rspagnol>
Component:	Cluster Lifecycle	Assignee:	Le Yang <leyan>
Status:	CLOSED ERRATA	QA Contact:	Hui Chen <huichen>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhacm-2.5	CC:	dhuynh, jverreng, njean, zyin
Target Milestone:	---	Flags:	huichen: qe_test_coverage+ bot-tracker-sync: rhacm-2.5.z+
Target Release:	rhacm-2.5.2
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-09-13 20:06:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Ryan Spagnola 2022-08-29 17:49:02 UTC

Description of the problem:
Updated the MCIR (ManagedClusterImageRegistry) with the appropriate registries. The 4.x clusters picked up the change and are working as expected. However, the 3.11 clusters have not picked up the change and are still in CrashLoopBackOff.

Release version:
2.5

Operator snapshot version:

OCP version:
3.11

Browser Info:

Steps to reproduce:
1. Attach a 3.11 cluster to a hub using the MCIR settings from the 2.4 docs
2. Upgrade ACM from 2.4 to 2.5
3. Modify the MCIR to use the registries described in the 2.5 docs
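For reference, a 2.5-style MCIR for step 3 might look roughly like the sketch below. It assumes the `imageregistry.open-cluster-management.io/v1alpha1` API and the `registries` list of `mirror`/`source` pairs described in the 2.5 docs (2.4 used a single registry field instead). The namespace, placement name, pull secret, and mirror host are hypothetical placeholders, and the source registries are assumptions; check the linked docs for the exact values.

```yaml
# Sketch only: all names and the mirror host are hypothetical placeholders.
apiVersion: imageregistry.open-cluster-management.io/v1alpha1
kind: ManagedClusterImageRegistry
metadata:
  name: example-image-registry          # hypothetical
  namespace: example-registry-ns        # hypothetical; same namespace as the Placement
spec:
  placementRef:
    group: cluster.open-cluster-management.io
    resource: placements
    name: example-placement             # hypothetical Placement that selects the clusters
  pullSecret:
    name: example-pull-secret           # hypothetical pull secret for the mirror
  registries:
  # Each entry maps an upstream image source to a mirror (assumed values).
  - mirror: mirror.example.com:5000/rhacm2
    source: registry.redhat.io/rhacm2
  - mirror: mirror.example.com:5000/multicluster-engine
    source: registry.redhat.io/multicluster-engine
```

Only clusters selected by the referenced Placement receive the override, so this change alone does not affect a cluster that the Placement filters out.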

Actual results:
3.11 clusters do not pick up the change and remain in CrashLoopBackOff.

Expected results:
3.11 clusters behave the same as 4.x clusters and pick up the MCIR change

Additional info:
2.4 MCIR docs
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#imp[…]ride

2.5 MCIR docs
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.5/html/clusters/managing-your-clusters#imp[…]ride

Comment 1 zyin@redhat.com 2022-08-30 15:02:36 UTC

Could you help collect some info? Alternatively, the MCE and MCH must-gather output would be best.

oc get placements.cluster.open-cluster-management.io -A -o yaml
oc get placementdecisions.cluster.open-cluster-management.io -A -o yaml
oc get managedclusterset -o yaml
oc get mcl -o yaml
oc get managedclusterimageregistries.imageregistry.open-cluster-management.io -A -o yaml
oc get managedclusteraddons.addon.open-cluster-management.io -A -o yaml
oc get manifestworks.work.open-cluster-management.io -A -o yaml

Comment 6 zyin@redhat.com 2022-08-31 15:02:42 UTC

From the logs, the cluster dmzsecuretestrr2 is not selected by the placement used in the ManagedClusterImageRegistry, because the cluster has a taint:
    taints:
    - effect: NoSelect
      key: cluster.open-cluster-management.io/unreachable
      timeAdded: null


In 2.5, the placement used in the ManagedClusterImageRegistry should include a toleration in its spec:
  tolerations:
  - key: "cluster.open-cluster-management.io/unreachable"
    operator: Exists

Please refer to section 1.7.3.1, step 2, in the doc: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.5/html/clusters/managing-your-clusters#imp-clust-custom-image-override
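Put together, a Placement carrying that toleration might look like the sketch below. It assumes the `cluster.open-cluster-management.io/v1beta1` Placement API shipped with 2.5, and that the Placement's namespace already has a ManagedClusterSetBinding for the cluster set. The Placement name, namespace, and cluster set name are hypothetical placeholders.

```yaml
# Sketch only: names are hypothetical placeholders.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: example-placement               # hypothetical; referenced by the MCIR placementRef
  namespace: example-registry-ns        # hypothetical; assumes a ManagedClusterSetBinding exists here
spec:
  clusterSets:
  - example-clusterset                  # hypothetical ManagedClusterSet containing the 3.11 clusters
  tolerations:
  # Tolerate the NoSelect taint added when a cluster is marked unreachable,
  # so tainted clusters such as dmzsecuretestrr2 are still selected.
  # No effect is given, so the toleration matches any effect for this key.
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
```

Without the toleration, the NoSelect taint keeps the cluster out of the PlacementDecision. The image override is therefore never delivered to it, which matches the 3.11 behavior reported above.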

Comment 14 errata-xmlrpc 2022-09-13 20:06:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Advanced Cluster Management 2.5.2 security fixes and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6507

Comment 16 Red Hat Bugzilla 2023-09-19 04:25:10 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days