Bug 2055415

Summary: Possible race condition in console operator managed cluster sync
Product: OpenShift Container Platform
Reporter: Jon Jackson <jonjacks>
Component: Management Console
Assignee: Jon Jackson <jonjacks>
Status: CLOSED WONTFIX
QA Contact: Yadan Pei <yapei>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.11
CC: aos-bugs, kcormier
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-22 18:28:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jon Jackson 2022-02-16 21:14:09 UTC
Description of problem:
There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment has rolled out, so the update doesn't trigger a new rollout.

Version-Release number of selected component (if applicable):
4.10+

How reproducible:
Rarely

Steps to Reproduce:
1. Enable the multicluster tech preview by adding the TechPreviewNoUpgrade featureSet to the FeatureGate config. (NOTE THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED)
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster

Actual results:
Sometimes the second managed cluster will never show up in the cluster dropdown

Expected results:
The second managed cluster eventually shows up in the cluster dropdown after a page refresh


Additional info:
The workaround is to delete the console pod to force a rollout. I suspect the problem is that sometimes the deployment rolls out a new pod before the managed cluster config has been updated, so the new cluster config doesn't get parsed. The subsequent update to the managed cluster config map doesn't trigger another rollout, so the new config is never consumed until a rollout is forced.
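
If that suspicion is right, one common mitigation (sketched below purely as an illustration, not as the operator's actual implementation) is to hash the managed cluster ConfigMap contents into a pod-template annotation on the console deployment, so any later ConfigMap update changes the pod template and forces a new rollout. The package, function, and annotation names in the sketch are assumptions.

package consolesync

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// hashConfigMapData returns a deterministic hash of a ConfigMap's data.
func hashConfigMapData(cm *corev1.ConfigMap) string {
	keys := make([]string, 0, len(cm.Data))
	for k := range cm.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for a stable hash

	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, cm.Data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// stampManagedClusterConfigHash records the hash on the console deployment's pod
// template. If the managed cluster config changes after the deployment has rolled
// out, the next sync produces a different annotation value, the pod template
// changes, and the deployment controller rolls out new pods.
func stampManagedClusterConfigHash(deploy *appsv1.Deployment, cm *corev1.ConfigMap) {
	if deploy.Spec.Template.Annotations == nil {
		deploy.Spec.Template.Annotations = map[string]string{}
	}
	// Illustrative annotation key, not one the operator is known to use.
	deploy.Spec.Template.Annotations["console.openshift.io/managed-cluster-config-hash"] = hashConfigMapData(cm)
}

With a pattern like this, the relative ordering of the ConfigMap update and the deployment rollout stops mattering, since whichever change lands last still alters the pod template and triggers another rollout.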

Comment 1 Jakub Hadvig 2022-02-17 11:34:38 UTC
Jon, I have a feeling that we are not picking up the configmap informer event for the update to the managed-clusters.yaml CM.
I see that we are filtering based on a label: https://github.com/openshift/console-operator/blob/master/pkg/console/operator/operator.go#L152
I'm wondering if that's what is causing the race.
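
For reference, a minimal client-go sketch of a label-filtered ConfigMap informer, using a hypothetical label key and namespace rather than the operator's actual values: updates to a ConfigMap that does not carry the matching label are never delivered to the event handler, which is one way an update to managed-clusters.yaml could be missed.

package consolesync

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newFilteredConfigMapInformer watches only ConfigMaps that match the label
// selector; events for ConfigMaps outside the selector never reach the handler.
func newFilteredConfigMapInformer(client kubernetes.Interface, handler cache.ResourceEventHandler) cache.SharedIndexInformer {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute, // resync period
		informers.WithNamespace("openshift-console"), // assumed namespace
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.LabelSelector = "console.openshift.io/managed-cluster-config=true" // assumed label
		}),
	)

	inf := factory.Core().V1().ConfigMaps().Informer()
	inf.AddEventHandler(handler)
	return inf
}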

Comment 2 Kevin Cormier 2022-03-25 21:34:43 UTC
I encountered a problem today where a managed cluster was not added to the cluster switcher. @jonjacks believes it was an instance of this issue, and he was able to resolve the problem by forcing a rollout of new console pods.

Comment 3 Jon Jackson 2022-11-22 18:28:24 UTC
Migrated to Jira: https://issues.redhat.com/browse/OCPBUGS-4008