Bug 2046553

Summary: klusterlet-addon-search Out of Sync on Managed Cluster
Product: Red Hat Advanced Cluster Management for Kubernetes Reporter: James Young <jayoung>
Component: Search / Analytics Assignee: Jorge Padilla <jpadilla>
Status: CLOSED DUPLICATE QA Contact: Atif <ashafi>
Severity: high Docs Contact: Mikela Dockery <mdockery>
Priority: unspecified    
Version: rhacm-2.4 CC: clasohm, rspagnol
Target Milestone: --- Flags: ashafi: qe_test_coverage-
bot-tracker-sync: rhacm-2.4.z+
bot-tracker-sync: needinfo+
Target Release: rhacm-2.4.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-24 17:07:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 1 Jorge Padilla 2022-01-27 17:40:57 UTC
Is this problem happening on the same ACM instance as bug #2043568? Both problems seem to be related.

We need to take a closer look at the resync logic. The frequent resyncs from multiple managed clusters could be adding too much stress to the search service on the hub. The backoff logic of up to 10 minutes after each error is meant to prevent problems like this from spiraling, but this is a contributing factor in preventing the service on the hub from fully recovering.
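
For reference, a minimal sketch of a capped backoff like the one described above. This is not the collector's actual code: the function name `nextBackoff`, the doubling policy, and the 1-second base are assumptions for illustration. Only the 10-minute ceiling comes from this comment.

```go
package main

import (
	"fmt"
	"time"
)

// maxBackoff mirrors the 10-minute ceiling mentioned in the comment above.
const maxBackoff = 10 * time.Minute

// nextBackoff is a hypothetical helper. It returns how long to wait after
// the given number of consecutive errors, doubling from base on each
// additional error and never exceeding maxBackoff.
func nextBackoff(base time.Duration, failures int) time.Duration {
	if failures < 1 {
		return 0 // no errors yet, no wait
	}
	d := base
	for i := 1; i < failures; i++ {
		d *= 2
		if d >= maxBackoff {
			return maxBackoff // return early, which also avoids overflow
		}
	}
	if d > maxBackoff {
		return maxBackoff
	}
	return d
}

func main() {
	// Assumed base of 1 second, for illustration only.
	for n := 0; n <= 12; n++ {
		fmt.Printf("consecutive errors=%d wait=%s\n", n, nextBackoff(time.Second, n))
	}
}
```

If many managed clusters hit errors at the same moment, they would also retry at the same moments under a scheme like this, which is one reason random jitter is commonly added to the wait.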

Comment 2 Jorge Padilla 2022-01-27 18:14:41 UTC
The aggregator log on the hub is showing parsing errors with some labels.  Could we get more information about the resource(s) using those labels?


2022-01-18T21:25:12.686283758Z I0118 21:25:12.686111       1 resyncCluster.go:30] Resync for cluster: [REMOVED] edges to insert: 4524
2022-01-18T21:25:13.125077072Z W0118 21:25:13.124019       1 resyncCluster.go:297] Unable to parse string value from interface{} :  ['app.kubernetes.io/part-of=day2-ops']
2022-01-18T21:25:13.125077072Z W0118 21:25:13.124046       1 resyncCluster.go:297] Unable to parse string value from interface{} :  <nil>

2022-01-18T21:23:47.536773446Z W0118 21:23:47.536703       1 resyncCluster.go:297] Unable to parse string value from interface{} :  ['operators.coreos.com/advanced-cluster-management.open-cluster-management=']
2022-01-18T21:23:47.536773446Z W0118 21:23:47.536731       1 resyncCluster.go:297] Unable to parse string value from interface{} :  <nil>

2022-01-18T21:23:38.778877208Z W0118 21:23:38.778751       1 resyncCluster.go:297] Unable to parse string value from interface{} :  ['olm.owner=compliance-operator.v0.1.47']
2022-01-18T21:23:38.778877208Z W0118 21:23:38.778794       1 resyncCluster.go:297] Unable to parse string value from interface{} :  <nil>
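
The warnings above suggest the value reached the parser as a list (for example `['olm.owner=compliance-operator.v0.1.47']`) or as nil rather than as a plain string. A minimal sketch of a type switch that tolerates those shapes follows. The helper name `stringValue` and the comma-joined output are assumptions for illustration. This is not the code in `resyncCluster.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// stringValue is a hypothetical helper. It converts a property decoded into
// interface{} to a string, and reports false when no string can be produced.
func stringValue(v interface{}) (string, bool) {
	switch t := v.(type) {
	case nil:
		// Matches the "<nil>" warnings: nothing to convert.
		return "", false
	case string:
		return t, true
	case []string:
		return strings.Join(t, ","), true
	case []interface{}:
		// Decoded JSON arrays arrive as []interface{}; accept only string items.
		parts := make([]string, 0, len(t))
		for _, item := range t {
			s, ok := item.(string)
			if !ok {
				return "", false
			}
			parts = append(parts, s)
		}
		return strings.Join(parts, ","), true
	default:
		return "", false
	}
}

func main() {
	// Inputs shaped like the values in the warnings above.
	inputs := []interface{}{
		[]interface{}{"app.kubernetes.io/part-of=day2-ops"},
		nil,
		"plain-string",
	}
	for _, in := range inputs {
		s, ok := stringValue(in)
		fmt.Printf("value=%q ok=%v\n", s, ok)
	}
}
```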

Comment 4 James Young 2022-01-28 13:56:41 UTC
This is indeed the same ACM instance as https://bugzilla.redhat.com/show_bug.cgi?id=2043568

Comment 5 Jorge Padilla 2022-02-09 20:57:18 UTC
*** Bug 2043568 has been marked as a duplicate of this bug. ***

Comment 6 Jorge Padilla 2022-02-10 18:46:10 UTC
This problem seems to be similar to BZ 2030005, for which a fix has been merged for ACM 2.4.2.

Comment 8 Jorge Padilla 2022-03-24 17:07:06 UTC

*** This bug has been marked as a duplicate of bug 2030005 ***