Created attachment 1848318 [details]
Screenshot of situation

Description of problem:
After some workarounds and several restarts, the Application overview in the web UI stays empty; changing browsers etc. does not help. Chrome reports "GraphQL Error: Search service is unavailable" in the dev tools view.

Version-Release number of selected component (if applicable):
- OCP 4.8.18
- ACM 2.4

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Application view shows a blank page

Expected results:

Additional info:
From logs open-cluster-management/multicluster-operators-application-5968d497c5-grh4r/:

2021-12-29T09:04:14.766799176Z I1229 09:04:14.766648 1 event.go:282] Event(v1.ObjectReference{Kind:"Application", Namespace:"openshift-config", Name:"openshift-config", UID:"317b6fc0-15f6-43a4-b95a-919518f10917", APIVersion:"app.k8s.io/v1beta1", ResourceVersion:"1028375760", FieldPath:""}): type: 'Normal' reason: 'Update' The app annotations updated. App:openshift-config/openshift-config
2021-12-29T09:04:14.866829429Z I1229 09:04:14.866289 1 application_controller.go:199] Reconciling Application:openshift-gitops/gitops-operator with Get err:<nil>
2021-12-29T09:04:15.067522210Z I1229 09:04:15.067433 1 event.go:282] Event(v1.ObjectReference{Kind:"Application", Namespace:"openshift-gitops", Name:"gitops-operator", UID:"92030c52-f4ab-49ff-a22c-0e3b131b228b", APIVersion:"app.k8s.io/v1beta1", ResourceVersion:"1028375779", FieldPath:""}): type: 'Normal' reason: 'Update' The app annotations updated. App:openshift-gitops/gitops-operator
2021-12-29T09:04:15.078776542Z I1229 09:04:15.078713 1 application_controller.go:199] Reconciling Application:openshift-operators/gitops-operator-base with Get err:<nil>
2021-12-29T09:04:15.080274555Z I1229 09:04:15.079000 1 application_controller.go:199] Reconciling Application:harbor/harbor with Get err:<nil>
2021-12-29T09:04:15.080274555Z I1229 09:04:15.079118 1 application_controller.go:199] Reconciling Application:openshift-config/openshift-config with Get err:<nil>
Hi, is there anything new with this issue?
G2Bsync 1004900920 comment KevinFCormier Tue, 04 Jan 2022 15:21:19 UTC G2Bsync Please check the status of all pods in `open-cluster-management`. The error message you provided suggests the search pods may not be running properly. Usually the UI can tolerate this scenario and degrades gracefully, but it looks like you are seeing a completely blank screen. Can you provide the full console output from the browser dev tools? Can you also check whether clearing your browser cache helps?
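For reference, a minimal way to check those pods (a sketch; the namespace name is taken from this report):

    oc get pods -n open-cluster-management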
oc logs -l 'component in (search-ui,search-api,redisgraph)'

[2022-01-10T08:41:45.417] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:52.549] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.973] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.978] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.978] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.980] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.982] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:41:58.982] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:42:43.341] [INFO] [search-api] [server] Role configuration has changed. User RBAC cache has been deleted
[2022-01-10T08:44:01.558] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:01.559] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:01.560] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:01.562] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:03.302] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:03.303] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:03.312] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:03.313] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:03.314] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:01.571] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
[2022-01-10T08:44:04.111] [ERROR] [search-api] [server] Unable to resolve search request because RedisGraph is unavailable.
However, all pods are running:

oc get po
NAME                                                             READY   STATUS    RESTARTS   AGE
application-chart-a9e0f-applicationui-6978c7b79c-btfsf           1/1     Running   0          89m
application-chart-a9e0f-applicationui-6978c7b79c-cgjbc           1/1     Running   0          89m
application-chart-a9e0f-consoleapi-b79f7dbf7-4jwtc               1/1     Running   0          89m
application-chart-a9e0f-consoleapi-b79f7dbf7-g7wjf               1/1     Running   0          89m
cluster-curator-controller-85f9864cf8-5nmqj                      1/1     Running   0          89m
cluster-curator-controller-85f9864cf8-d8jrp                      1/1     Running   0          89m
cluster-manager-9847c6469-bwv7b                                  1/1     Running   0          89m
cluster-manager-9847c6469-m4vgq                                  1/1     Running   0          89m
cluster-manager-9847c6469-zgfnn                                  1/1     Running   0          89m
clusterclaims-controller-54657d4f4-qhvzf                         2/2     Running   0          89m
clusterclaims-controller-54657d4f4-sxjsv                         2/2     Running   0          89m
clusterlifecycle-state-metrics-v2-6f6f9d64bf-prt59               1/1     Running   4          89m
console-chart-f367a-console-v2-76b7b98fbb-l6lgv                  1/1     Running   0          89m
console-chart-f367a-console-v2-76b7b98fbb-wqc6h                  1/1     Running   0          89m
discovery-operator-65cf659f5b-dc956                              1/1     Running   0          89m
grc-3d889-grcui-6576675b94-4nljz                                 1/1     Running   0          89m
grc-3d889-grcui-6576675b94-stl9p                                 1/1     Running   0          89m
grc-3d889-grcuiapi-65f6cbf74b-8xhd7                              1/1     Running   0          89m
grc-3d889-grcuiapi-65f6cbf74b-94ktb                              1/1     Running   0          89m
grc-3d889-policy-propagator-6b989f5f4f-bzx24                     2/2     Running   0          89m
grc-3d889-policy-propagator-6b989f5f4f-ffvsk                     2/2     Running   0          89m
hive-operator-747df56cf8-c8xft                                   1/1     Running   0          89m
infrastructure-operator-6cd64dbd97-zbn2f                         1/1     Running   0          89m
klusterlet-addon-controller-v2-5bd775d554-jzn8x                  1/1     Running   0          88m
klusterlet-addon-controller-v2-5bd775d554-r785q                  1/1     Running   0          89m
managedcluster-import-controller-v2-6fd89db8f5-jnxz8             1/1     Running   0          88m
managedcluster-import-controller-v2-6fd89db8f5-wxskm             1/1     Running   0          88m
management-ingress-dc13c-546c5d798d-2pgtj                        2/2     Running   0          88m
management-ingress-dc13c-546c5d798d-xrdh6                        2/2     Running   0          88m
multicluster-observability-operator-7dd888d9bb-zvbdx             1/1     Running   0          88m
multicluster-operators-application-5968d497c5-dpdgt              4/4     Running   0          88m
multicluster-operators-channel-c996f46d-l5qx6                    1/1     Running   0          88m
multicluster-operators-hub-subscription-7c9ff6cfc8-wzk28         1/1     Running   0          88m
multicluster-operators-standalone-subscription-59db98c89-2tkhm   1/1     Running   0          88m
multiclusterhub-operator-75b9cc9858-sz4j8                        1/1     Running   0          88m
multiclusterhub-repo-86947c88c8-8h647                            1/1     Running   0          88m
ocm-controller-5cdd8568b-dkdb5                                   1/1     Running   0          88m
ocm-controller-5cdd8568b-pcj5q                                   1/1     Running   0          88m
ocm-proxyserver-7558bcd957-bj7rc                                 1/1     Running   0          88m
ocm-proxyserver-7558bcd957-cxmjv                                 1/1     Running   0          88m
ocm-webhook-79664f88fb-64jfb                                     1/1     Running   0          88m
ocm-webhook-79664f88fb-vhsl4                                     1/1     Running   0          88m
policyreport-d58bb-insights-client-6c84d88f4d-7h2jp              1/1     Running   0          88m
policyreport-d58bb-metrics-7c8ff59785-7t7ts                      2/2     Running   0          88m
provider-credential-controller-6c5547dc45-8fcbf                  2/2     Running   0          88m
search-operator-57549b59c7-gd8t9                                 1/1     Running   0          88m
search-prod-df6df-search-aggregator-6799d668b6-nl6td             1/1     Running   1          88m
search-prod-df6df-search-api-866f9dc-bvdmf                       1/1     Running   0          88m
search-prod-df6df-search-api-866f9dc-w9gzh                       1/1     Running   0          88m
search-prod-df6df-search-collector-6645d984b5-hhl9n              1/1     Running   0          88m
search-redisgraph-0                                              1/1     Running   0          88m
search-ui-d4cdd8554-8r6sx                                        1/1     Running   0          88m
search-ui-d4cdd8554-hcrss                                        1/1     Running   0          88m
submariner-addon-7cc684b685-vp4tn                                1/1     Running   0          88m

The customer has an issue with Redis: search-redisgraph-0 consumes ~40GB RAM.
Asked for details about this pod:

oc get pvc
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
trident-premium-search-redisgraph-0   Bound    pvc-2c662619-d0bf-4346-ab48-3ef6bf860330   99Gi       RWO            trident-premium   6d23

oc describe pvc trident-premium-search-redisgraph-0
Name:          trident-premium-search-redisgraph-0
Namespace:     open-cluster-management
StorageClass:  trident-premium
Status:        Bound
Volume:        pvc-2c662619-d0bf-4346-ab48-3ef6bf860330
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      99Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       search-redisgraph-0
Events:        <none>

The PVC is attached and 12GB is used:

sh-4.4$ df
Filesystem                                                                                            1K-blocks   Used       Available   Use%   Mounted on
overlay                                                                                               125293548   44352500   80941048    36%    /
tmpfs                                                                                                 65536       0          65536       0%     /dev
tmpfs                                                                                                 32844544    0          32844544    0%     /sys/fs/cgroup
shm                                                                                                   65536       12         65524       1%     /dev/shm
tmpfs                                                                                                 32844544    139944     32704600    1%     /etc/hostname
tmpfs                                                                                                 32844544    8          32844536    1%     /certs
/dev/sda4                                                                                             125293548   44352500   80941048    36%    /rg
fs-trident13-data.dst.tk-inline.net:/osmgmt_trident_premium_pvc_2c662619_d0bf_4346_ab48_3ef6bf860330  103809024   12846080   90962944    13%    /redis-data
tmpfs                                                                                                 32844544    28         32844516    1%     /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                                                                 32844544    0          32844544    0%     /proc/acpi
tmpfs                                                                                                 32844544    0          32844544    0%     /proc/scsi
tmpfs                                                                                                 32844544    0          32844544    0%     /sys/firmware

sh-4.4$ cd /redis-data/
sh-4.4$ ls
dump.rdb
sh-4.4$ ls -lah
total 12G
drwxrwxrwx. 2 root  root 4.0K Jan 8 17:14 .
dr-xr-xr-x. 1 root  root   59 Jan 10 09:41 ..
-rw-r--r--. 1 redis root  12G Jan 8 17:08 dump.rdb

Details from web dev tools not sent.
G2Bsync 1009007731 comment KevinFCormier Mon, 10 Jan 2022 15:42:47 UTC G2Bsync Re-assigning to observability-usa to investigate.
The search service seems to be under scalability stress. It looks like the Application UI behaves differently when search is disabled vs. enabled but experiencing problems. We have some options, but first I want to understand this specific scenario.
- Could I get the must-gather data for this cluster?
- How many clusters are managed by this instance of ACM? A possible workaround is to disable search collection for some of the managed clusters.
- Another option is to completely disable the search service. This should restore the Application UI, but it will run in degraded mode.
Logs from web browser attached, but must-gather is too big...
There are 5 managed clusters with a total of around 150 nodes.
5 managed clusters and 150 nodes is not a large environment, so we'll need the logs from the search service pods to understand what is happening. Is it possible to extract only this data from the must-gather output? There should be 5 pods with names starting with `search-*`; these are in the open-cluster-management namespace. Also, could you please confirm that there's a similar error in the ACM Search UI?
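If it helps, a minimal sketch for pulling only the search-related pod logs directly from the cluster (assumes oc access to the hub; pods are matched by the search- prefix mentioned above):

    for p in $(oc get pods -n open-cluster-management -o name | grep 'search-'); do
      # write each pod's logs to a file named after the pod
      oc logs -n open-cluster-management "$p" --all-containers > "$(basename "$p").log"
    done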
I missed that the must-gather is not compressed. The attached must-gather is almost complete; the only parts missing are the logs in the open-cluster-management-observability namespace.
And the customer verified that the Search UI throws an error as well. The customer once fixed the issue by deleting the Redis PVC, but the error came back a few days later.
The problem comes back every 2-3 days; we need a workaround that lasts longer.
First, I want to confirm how much memory the redisgraph pod is actually using. I see that the memory limit is set to 24Gi, but Kubernetes doesn't guarantee that memory if it's needed by another process with higher priority. You can use the OCP console to monitor whether the pod is reaching the memory limit or being terminated before it gets there.

If the pod is being terminated before reaching the memory limit, the solution is to add a memory REQUEST so that the memory is guaranteed to be reserved for the pod.

If the pod is reaching the memory limit, then our next option is to disable the search collector for some of the managed clusters. Unfortunately, this means that we won't be able to show some data from those clusters. To disable search for a managed cluster, edit the KlusterletAddonConfig resource and set searchCollector enabled to false:

`oc edit klusterletaddonconfig <clusterName> -n <clusterName>`
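For the memory REQUEST option, a minimal sketch (this assumes the search-redisgraph StatefulSet can be patched directly; the search operator may reconcile manual changes back, and the 24Gi value is illustrative only):

    # reserve memory for the redisgraph pod so it is not evicted under node pressure
    oc set resources statefulset/search-redisgraph -n open-cluster-management --requests=memory=24Gi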
Thanks for the additional information. I was going in the wrong direction; this problem isn't caused by memory. We can confirm that because there are no restarts of the pod and the memory graph shows very low consumption.

Redis is in a bad state. I see several issues, but I need to investigate more to determine the root cause. The main suspect is a problem reading from the PVC. As reported in comment #12, deleting the PVC temporarily resolves the problem, but I couldn't find any details in the attached logs. A current copy of the logs could help.

Can I get clarification on when the problem appeared? From comment #3 the age of the ACM pods is only 89 minutes, so the problem appeared quickly the first time, but from comment #13 it took 2-3 days after the workaround of deleting the PVC. I'm trying to understand how we can recreate it.

Other notes:
1. There seems to be a discrepancy with the PVC size. From comment #3 the PVC capacity is 99Gi, with 12Gi used. However, I see storageSize set to 10Gi in the operator logs:
   {"persistence": true, "storageClass": "", "storageSize": "10Gi", "fallbackToEmptyDir": true}
2. The Redis log shows a high number of dropped connections:
   SSL_accept: Peer suddenly disconnected
3. The aggregator log suggests that Redis is stuck in a LOADING state:
   redisinterfacev2.go:47] Error fetching results from RedisGraph V2 : LOADING Redis is loading the dataset in memory
   clusterWatch.go:150] Error on UpdateByName() LOADING Redis is loading the dataset in memory
4. The API log shows frequent connection drops and reconnects:
   [2021-12-28T18:43:44.988] [INFO] [search-api] [server] Error with Redis connection.
   [2021-12-28T18:44:00.809] [INFO] [search-api] [server] Redis Client connected.
Adding to Jorge's comment #20.

(1) Notice from the logs that the queries to Redis stop working after a restart of the search-aggregator or search-api pod. Does the Redis connection problem occur ONLY after the search-redisgraph-0 pod restarts, OR do all functions work fine and then, all of a sudden, the search pods throw the Redis connection error [Error on UpdateByName() LOADING Redis is loading the dataset in memory]? Please clarify.

(2) The data in memory is periodically synchronized to the PVC. If the search-redisgraph-0 pod restarts, the data from the PVC is read back to rebuild the cache quickly. My impression is that the data in the PVC is somehow not in a good state when it is read back into memory. This theory holds only if you are seeing the problem after the search-redisgraph-0 pod is restarted.

(3) If we get into this situation again, may I request that you collect a must-gather before any of the pods are restarted.
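To help answer (1) and (2), a minimal sketch for checking whether search-redisgraph-0 has restarted and, if so, why (standard Kubernetes status fields; the namespace is assumed from this report):

    # prints the restart count and the reason for the last termination, e.g. "1 OOMKilled"
    oc get pod search-redisgraph-0 -n open-cluster-management \
      -o jsonpath='{.status.containerStatuses[0].restartCount}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'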
In the latest log, the search-aggregator pod was OOMKilled at 2022-01-19T16:36:32Z. I believe the aggregator was killed in the middle of a write and corrupted the data in Redis. Unfortunately, the logs for the search-redisgraph and search-aggregator pods around this time aren't in the must-gather data; the earliest log starts at 2022-01-22T11:16:17 for Redis and 2022-01-22T08:46:06 for the aggregator. A potential workaround is to increase the memory limit on the search-aggregator. We'll need a code fix if we confirm this is the real root cause.
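If we try that workaround, a hedged sketch of raising the search-aggregator memory (the deployment name includes the instance-specific prefix seen in the pod list above, the search operator may reconcile manual changes, and the values are illustrative only):

    oc set resources deployment/search-prod-df6df-search-aggregator \
      -n open-cluster-management --requests=memory=2Gi --limits=memory=4Gi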
The customer verified that the Redis pod consumes up to ~40GB of RAM before being OOMKilled. He has increased the RAM limit for this pod to 48GB.
Thank you @mheppler .

(1) We noticed that the Redis connection failed after the search-aggregator pod was OOMKilled, so along with the search-redisgraph pod, please increase the limit for the search-aggregator pod (try double the size for now).

(2) Fundamentally, search uses an in-memory database, whose memory requirement grows as the number of resources from the managed clusters goes up. We realize at this point that the customer's ACM has a lot of resources, pushing the redisgraph memory requirement as high as 48Gi, which is near our limits. At this point, can we suggest that the customer turn off search on some of the managed clusters? This will reduce the memory requirement; the downside is that search results from those managed clusters will not be available in the UI. To disable search for a managed cluster, edit the KlusterletAddonConfig resource and set searchCollector enabled to false:

`oc edit klusterletaddonconfig <clusterName> -n <clusterName>`

Please note that this is a workaround for now. We are working on replacing the in-memory database with a scalable database in future releases.
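A hedged non-interactive equivalent of the edit above (assumes the spec.searchCollector.enabled field described in the ACM documentation; replace <clusterName> with the managed cluster name):

    # disable the search collector add-on for one managed cluster
    oc patch klusterletaddonconfig <clusterName> -n <clusterName> \
      --type merge -p '{"spec":{"searchCollector":{"enabled":false}}}'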
This topic has more information about turning off the search add-on on managed clusters. https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/clusters/managing-your-clusters#modifying-the-klusterlet-add-ons-settings-of-your-cluster
The customer accepted the workaround, but he is not very happy; search is a huge benefit of ACM for him. Do you have a time plan for replacing the in-memory database, as this workaround is limiting for the customer?
Product management, Scott Berens, has offered to meet with the customer to discuss the Search replacement roadmap.
Thank you @mheppler . We are working to meet the customer's technical contact. In the meantime, we also want to suggest that the customer disable search persistence for now, which will make the system stop backing up the search data to the PVC. This will not affect search functionality. You can do this by creating a SearchCustomization CR. You can read more here:
https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.4/html/web_console/web-console#search-customization

Example:

apiVersion: search.open-cluster-management.io/v1alpha1
kind: SearchCustomization
metadata:
  name: searchcustomization
  namespace: open-cluster-management
spec:
  persistence: false
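A minimal way to apply the example above (assuming it is saved locally as searchcustomization.yaml):

    oc apply -f searchcustomization.yaml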
(In reply to Xavier from comment #29)

We had the same issue: the "Overview" page just showed "Search service is unavailable". I don't know if you have any applications, but after applying

> apiVersion: search.open-cluster-management.io/v1alpha1
> kind: SearchCustomization
> metadata:
>   name: searchcustomization
>   namespace: open-cluster-management
> spec:
>   persistence: false

our deployed application vanished from the "Applications" view, but happily only there.

Second, the volume became empty:

oc get pvc search-redisgraph-pvc-0
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
search-redisgraph-pvc-0   Bound    pvc-d8796b8f-b1e1-4b8c-a349-d9e5795377c4   10Gi       RWO            gp2            27d

which might explain why the application vanished from the view.
@dmitri.voronov The ACM Overview page and the ACM Applications table all leverage Search as the backend for their UI. If there is a problem with search/redisgraph, those pages cannot properly serve the request, even though in your case those applications really do exist.
@dmitri.voronov Removing the persistence setting also clears the redis cache, which explains why some data vanished. This should be temporary until the cache is repopulated. This process can take from a few minutes up to a few hours depending on the size of the managed clusters.
Thanks to all!

(In reply to Scott Berens from comment #32)
That's clear: if search is not working, it affects RHACM, at least the UI.

(In reply to Jorge Padilla from comment #33)
> This process can take from a few minutes up to a few hours depending on the size of the managed clusters.

That's clear too, but I think the prerequisite for cache repopulation is a working search function, which is not working after disabling persistence :-( Or is there any other way to force the cache repopulation?
After switching search persistence off, which shifted the issue to the search functionality, I switched search persistence back ON, but this time I also restarted the search-operator, search-aggregator and search-collector pods. The search feature now seems to have normalized; both the "overview" and "applications" views look good and search is working again. I'm not sure which restart caused the recovery, but it is finally working. The only question: what might have caused such behavior?
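For anyone hitting the same state, a hedged sketch of those restarts (the deployment names include the instance-specific prefix seen in the pod list earlier in this report; adjust to your environment):

    oc rollout restart deployment/search-operator -n open-cluster-management
    oc rollout restart deployment/search-prod-df6df-search-aggregator -n open-cluster-management
    oc rollout restart deployment/search-prod-df6df-search-collector -n open-cluster-management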
@dmitri.voronov After deleting the database, the search-collector detected the missing data and most likely hit BZ 2046553 during the resync. Restarting the pod got it out of the error state. A permanent fix for BZ 2046553 is included in ACM 2.4.2.