2055415 – Possible race condition in console operator managed cluster sync

Summary: Possible race condition in console operator managed cluster sync

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Management Console
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Jon Jackson
QA Contact:	Yadan Pei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-16 21:14 UTC by Jon Jackson
Modified:	2022-11-22 18:28 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-11-22 18:28:24 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OCPBUGS-4008	0	None	None	None	2022-11-22 18:28:23 UTC

Description Jon Jackson 2022-02-16 21:14:09 UTC

Description of problem:
There is a possible race condition in the console operator: when the managed cluster config is updated after the console deployment has already rolled out, the update does not trigger a new rollout.

Version-Release number of selected component (if applicable):
4.10+

How reproducible:
Rarely

Steps to Reproduce:
1. Enable the multicluster tech preview by adding the TechPreviewNoUpgrade featureSet to the FeatureGate config. (NOTE: THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED.)
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster

Actual results:
Sometimes the second managed cluster never shows up in the cluster dropdown, even after a page refresh.

Expected results:
The second managed cluster eventually shows up in the cluster dropdown after a page refresh


Additional info:
The workaround is to delete the console pod to force a rollout. I suspect the problem is that the deployment sometimes rolls out a new pod before the managed cluster config has been updated, so the new pod never parses the new cluster's config. The subsequent update to the managed cluster config map doesn't trigger another rollout, so the new config is not consumed until a rollout is forced.
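
For illustration only, here is a minimal Go sketch of one common way to close this kind of gap: record a hash of the config data in a pod template annotation, so that any later change to the config map alters the pod template and forces a rollout. This is not the console operator's actual code. The annotation key, the function names, and the sample config contents are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configHashAnnotation is a hypothetical annotation key, used here only
// to illustrate the technique.
const configHashAnnotation = "example.com/managed-cluster-config-hash"

// configHash returns a deterministic SHA-256 digest of a ConfigMap-like
// data map. Keys are sorted so map iteration order cannot change the result.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so different key/value splits of the
		// same bytes produce different digests.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(data[k]), data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRollout reports whether the hash recorded in the pod template
// annotations is stale relative to the current config data.
func needsRollout(podAnnotations, data map[string]string) bool {
	return podAnnotations[configHashAnnotation] != configHash(data)
}

func main() {
	// Hypothetical config contents before and after the second import.
	oneCluster := map[string]string{"managed-clusters.yaml": "- name: cluster-a\n"}
	twoClusters := map[string]string{"managed-clusters.yaml": "- name: cluster-a\n- name: cluster-b\n"}

	// The running pods were rolled out while only one cluster was configured.
	annotations := map[string]string{configHashAnnotation: configHash(oneCluster)}

	fmt.Println(needsRollout(annotations, oneCluster))  // config unchanged
	fmt.Println(needsRollout(annotations, twoClusters)) // config changed after rollout
}
```

With a check like this, the late config map update described above would be detected on the next sync regardless of the order in which the deployment and the config map were written.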

Comment 1 Jakub Hadvig 2022-02-17 11:34:38 UTC

Jon, I have a feeling that we are not picking up the configmap informer's event for the update to the managed-clusters.yaml CM.
I see that we are filtering based on the label: https://github.com/openshift/console-operator/blob/master/pkg/console/operator/operator.go#L152
I wonder whether that filtering is causing the race.
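
To make the hypothesis concrete, here is a small Go sketch of how a label-based event filter can silently drop an update. It is a simplified stand-in, not the operator's informer code. The `configMap` type, the label key and value, and the helper names are all hypothetical.

```go
package main

import "fmt"

// configMap is a simplified stand-in for a Kubernetes ConfigMap.
type configMap struct {
	Name   string
	Labels map[string]string
}

// Hypothetical label used by the filter; the real key and value may differ.
const (
	filterLabelKey   = "example.com/managed-by"
	filterLabelValue = "console-operator"
)

// passesLabelFilter mimics an event filter that only lets through
// objects carrying the expected label.
func passesLabelFilter(cm configMap) bool {
	return cm.Labels[filterLabelKey] == filterLabelValue
}

// enqueueIfMatching simulates an event handler: it queues a sync for the
// object only when the filter accepts it, and returns the updated queue.
func enqueueIfMatching(queue []string, cm configMap) []string {
	if passesLabelFilter(cm) {
		return append(queue, cm.Name)
	}
	return queue
}

func main() {
	labeled := configMap{
		Name:   "managed-clusters",
		Labels: map[string]string{filterLabelKey: filterLabelValue},
	}
	// Same object, but observed without the label (for example, if the
	// label is applied in a separate write from the data).
	unlabeled := configMap{Name: "managed-clusters"}

	var queue []string
	queue = enqueueIfMatching(queue, unlabeled) // dropped: no sync is triggered
	queue = enqueueIfMatching(queue, labeled)   // accepted: sync is queued
	fmt.Println(len(queue), queue)
}
```

If the update event that carries the new cluster data is the one that fails the filter, no sync runs for it, which would match the behavior reported in this bug.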

Comment 2 Kevin Cormier 2022-03-25 21:34:43 UTC

I encountered a problem today where a managed cluster was not added to the cluster switcher. @jonjacks believes it was an instance of this issue, and he was able to resolve the problem by forcing deployment of new console pods.

Comment 3 Jon Jackson 2022-11-22 18:28:24 UTC

Migrated to Jira: https://issues.redhat.com/browse/OCPBUGS-4008