Bug 2260838 - [ODF] "Data Services" not visible after setting up Multicluster storage health, error "Failed to update monitoring-endpoint-monitoring-work work", "the size of manifests is 58935 bytes which exceeds the 50k limit"
Summary: [ODF] "Data Services" not visible after setting up Multicluster storage healt...
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.13
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.13.8
Assignee: Olive Lakra
QA Contact: Parikshith
URL:
Whiteboard:
Depends On: 2223461 2260839
Blocks:
 
Reported: 2024-01-29 10:47 UTC by Olive Lakra
Modified: 2024-06-27 09:14 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2223461
Environment:
Last Closed:
Embargoed:


Attachments

Description Olive Lakra 2024-01-29 10:47:23 UTC
+++ This bug was initially created as a clone of Bug #2223461 +++

Description of problem (please be as detailed as possible and provide log
snippets):

The customer is setting up Multicluster Storage Health on their sandbox cluster, following the instructions in "Chapter 2. Multicluster storage health Red Hat OpenShift Data Foundation 4.12 | Red Hat Customer Portal" [1].

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html/monitoring_openshift_data_foundation/multicluster_storage_health

Configmap `observability-metrics-custom-allowlist` has been added to namespace `open-cluster-management-observability`, however, upon verification, `Data Services` is not visible on the RHACM console.
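
For reference, a minimal check that the allowlist configmap is actually present on the hub cluster (a sketch, assuming you are logged in to the hub with `oc`):

~~~
# Confirm the custom allowlist configmap exists on the hub and inspect it
oc get configmap observability-metrics-custom-allowlist \
  -n open-cluster-management-observability -o yaml
~~~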


Version of all relevant components (if applicable):

ODF v4.12.2


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

The customer cannot move forward with the testing phase.


Is there any workaround available to the best of your knowledge?  No.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?  3


Is this issue reproducible?  No.


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.  N/A
2.
3.


Actual results:


Expected results:


Additional info:

--- Additional comment from  on 2023-07-18 01:37:36 UTC ---

* The configmap `observability-metrics-custom-allowlist` is in namespace `open-cluster-management-observability`:

~~~
[redhat@ch1opnlvdev4 ODF-Monitoring]$ oc get cm -n open-cluster-management-observability
NAME                                                              DATA   AGE
alertmanager-ca-bundle                                            1      37d
config-service-cabundle                                           1      37d
config-trusted-cabundle                                           1      37d
grafana-dashboard-acm-clusters-overview                           1      37d
grafana-dashboard-acm-clusters-overview-ocp311                    1      37d
grafana-dashboard-acm-optimization-overview                       1      37d
grafana-dashboard-acm-optimization-overview-ocp311                1      37d
grafana-dashboard-cluster-rsrc-use                                1      37d
grafana-dashboard-k8s-apiserver                                   1      37d
grafana-dashboard-k8s-capacity-management-ocp311                  1      37d
grafana-dashboard-k8s-compute-resources-cluster                   1      37d
grafana-dashboard-k8s-compute-resources-namespace-pods            1      37d
grafana-dashboard-k8s-compute-resources-namespace-pods-ocp311     1      37d
grafana-dashboard-k8s-compute-resources-namespace-workloads       1      37d
grafana-dashboard-k8s-compute-resources-node-pods                 1      37d
grafana-dashboard-k8s-compute-resources-pod                       1      37d
grafana-dashboard-k8s-compute-resources-pod-ocp311                1      37d
grafana-dashboard-k8s-compute-resources-workload                  1      37d
grafana-dashboard-k8s-etcd-cluster                                1      37d
grafana-dashboard-k8s-namespaces-in-cluster-ocp311                1      37d
grafana-dashboard-k8s-networking-cluster                          1      37d
grafana-dashboard-k8s-pods-in-namespace-ocp311                    1      37d
grafana-dashboard-k8s-service-level-overview                      1      37d
grafana-dashboard-k8s-service-level-overview-api-server-cluster   1      37d
grafana-dashboard-k8s-summary-by-node-ocp311                      1      37d
grafana-dashboard-node-rsrc-use                                   1      37d
kube-root-ca.crt                                                  1      37d
observability-metrics-allowlist                                   2      37d
observability-metrics-custom-allowlist                            1      10d
observability-observatorium-api                                   1      37d
observability-thanos-receive-controller-tenants                   1      37d
observability-thanos-receive-controller-tenants-generated         1      37d
openshift-service-ca.crt                                          1      37d
rbac-query-proxy-probe                                            1      37d
thanos-ruler-config                                               1      37d
thanos-ruler-default-rules                                        1      37d
~~~


* The YAML used to create the configmap:
 
~~~
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      - odf_system_health_status
      - odf_system_map
      - odf_system_raw_capacity_total_bytes
      - odf_system_raw_capacity_used_bytes
    matches:
      - __name__="csv_succeeded",exported_namespace="openshift-storage",name=~"odf-operator.*"
~~~
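
For completeness, a sketch of how this configmap would typically be applied and its payload inspected (assuming the YAML above is saved as `observability-metrics-custom-allowlist.yaml`; the filename is illustrative):

~~~
# Apply the custom allowlist on the hub cluster
oc apply -f observability-metrics-custom-allowlist.yaml

# Print just the embedded metrics_list.yaml payload to verify its contents
oc get configmap observability-metrics-custom-allowlist \
  -n open-cluster-management-observability \
  -o jsonpath='{.data.metrics_list\.yaml}'
~~~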


* `.../namespaces/open-cluster-management/pods/multicluster-observability-operator-674cbcff85-ndp7c/multicluster-observability-operator/multicluster-observability-operator/logs/rotated/529.log.20230707-230347`

~~~
2023-07-07T22:29:19.345124878+00:00 stderr F 2023-07-07T22:29:19.344Z   ERROR   controller_placementrule        Failed to update monitoring-endpoint-monitoring-work work       {"error": "admission webhook \"manifestworkvalidators.admission.work.open-cluster-management.io\" denied the request: the size of manifests is 58935 bytes which exceeds the 50k limit"}
2023-07-07T22:29:19.345124878+00:00 stderr F github.com/stolostron/multicluster-observability-operator/operators/multiclusterobservability/controllers/placementrule.createManagedClusterRes
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/operators/multiclusterobservability/controllers/placementrule/placementrule_controller.go:434
2023-07-07T22:29:19.345124878+00:00 stderr F github.com/stolostron/multicluster-observability-operator/operators/multiclusterobservability/controllers/placementrule.createAllRelatedRes
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/operators/multiclusterobservability/controllers/placementrule/placementrule_controller.go:354
2023-07-07T22:29:19.345124878+00:00 stderr F github.com/stolostron/multicluster-observability-operator/operators/multiclusterobservability/controllers/placementrule.(*PlacementRuleReconciler).Reconcile
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/operators/multiclusterobservability/controllers/placementrule/placementrule_controller.go:158
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-07-07T22:29:19.345124878+00:00 stderr F 2023-07-07T22:29:19.345Z   ERROR   controller_placementrule        Failed to create manifestwork   {"error": "admission webhook \"manifestworkvalidators.admission.work.open-cluster-management.io\" denied the request: the size of manifests is 58935 bytes which exceeds the 50k limit"}
2023-07-07T22:29:19.345124878+00:00 stderr F github.com/stolostron/multicluster-observability-operator/operators/multiclusterobservability/controllers/placementrule.(*PlacementRuleReconciler).Reconcile
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/operators/multiclusterobservability/controllers/placementrule/placementrule_controller.go:158
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-07-07T22:29:19.345124878+00:00 stderr F 2023-07-07T22:29:19.345Z   ERROR   controller_placementrule        Failed to create managedcluster resources       {"namespace": "local-cluster", "error": "admission webhook \"manifestworkvalidators.admission.work.open-cluster-management.io\" denied the request: the size of manifests is 58935 bytes which exceeds the 50k limit"}
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
2023-07-07T22:29:19.345124878+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
2023-07-07T22:29:19.345124878+00:00 stderr F    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-07-07T22:29:19.345124878+00:00 stderr F 2023-07-07T22:29:19.345Z   INFO    controller_placementrule        Monitoring operator should be installed in cluster      {"cluster_name": "nyocplab1", "request.name": "mch-updated-request", "request.namespace": "open-cluster-management"}
2023-07-07T22:29:19.345179625+00:00 stderr F 2023-07-07T22:29:19.345Z   INFO    controller_placementrule        observabilityaddon already existed/unchanged    {"namespace": "nyocplab1"}
2023-07-07T22:29:19.345179625+00:00 stderr F 2023-07-07T22:29:19.345Z   INFO    controller_placementrule        clusterrolebinding endpoint-observability-mco-rolebinding already existed/unchanged     {"namespace": "nyocplab1"}
2023-07-07T22:29:19.345179625+00:00 stderr F 2023-07-07T22:29:19.345Z   INFO    controller_placementrule        rolebinding endpoint-observability-res-rolebinding already existed/unchanged    {"namespace": "nyocplab1"}
2023-07-07T22:29:19.345251693+00:00 stderr F 2023-07-07T22:29:19.345Z   INFO    controller_placementrule        Updating manifestwork   {"nyocplab1": "nyocplab1", "name": "nyocplab1-observability"}
~~~
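
For reference, one way to confirm the 50k manifest limit being hit is to inspect the observability ManifestWork in the managed cluster namespace (a sketch; the name `local-cluster-observability` follows the `<cluster>-observability` pattern seen in the log above and may differ in other environments):

~~~
# List ManifestWorks in the managed cluster namespace named in the error
oc get manifestwork -n local-cluster

# Rough size check of the observability ManifestWork (the admission webhook
# rejects manifests larger than 50k); this measures the whole object, so it
# is only an approximation of the manifests payload size
oc get manifestwork local-cluster-observability -n local-cluster -o yaml | wc -c
~~~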

--- Additional comment from RHEL Program Management on 2023-07-18 01:37:44 UTC ---

This bug, having no release flag set previously, now has the release flag 'odf-4.14.0' set to '?', and so is being proposed to be fixed in the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from gowtham on 2023-07-24 12:35:37 UTC ---

I suspect there is an issue with ACM observability. Please check that the MultiClusterObservability status is Ready (meaning ACM observability is deployed and healthy):

    oc get MultiClusterObservability observability -o jsonpath='{.status.conditions}'

The output of the above command should include "message:Observability components are deployed and running reason:Ready status:True type:Ready".
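
As a shorthand, a jsonpath filter can pull out just the Ready condition (a sketch; output formatting may vary slightly between oc versions):

~~~
# Show only the Ready condition message of the MultiClusterObservability CR
oc get MultiClusterObservability observability \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}{"\n"}'
~~~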

--- Additional comment from Joydeep Banerjee on 2023-07-24 13:36:24 UTC ---

Can you check whether the following metrics are visible in the ACM Grafana Explore view:
      - odf_system_health_status
      - odf_system_map
      - odf_system_raw_capacity_total_bytes
      - odf_system_raw_capacity_used_bytes
Grafana is at https://grafana-open-cluster-management-observability.[openshift_ingress_domain]/. Click the Explore icon on the left-hand bar, then type in each metric name and check whether data appears.
If data for all metrics can be seen, then it is not an issue with ACM.
If you do not see data, please run the ACM must-gather: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.8/html/troubleshooting/troubleshooting#procedure
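
If the exact Grafana URL is needed, it can usually be read from the route on the hub (a sketch; the route name `grafana` is an assumption and may differ in some installs):

~~~
# Look up the ACM Grafana host from its route on the hub cluster
oc get route grafana -n open-cluster-management-observability \
  -o jsonpath='{.spec.host}{"\n"}'
~~~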

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:31:20 UTC ---

Account disabled by LDAP Audit

--- Additional comment from  on 2023-08-04 01:48:48 UTC ---

Thanks Gowtham, Joydeep.

The customer has confirmed that they can search for the ODF metrics in Grafana [1]. However, they did not confirm whether any data is available.

They did confirm, though, that after installing the Multicluster Orchestrator 4.12 operator with the console plugin enabled, they are able to see Data Services on the hub console. They also said they have submitted feedback that this operator should be included in the prerequisites for setting up the Multicluster Storage Health dashboard.

Can the documentation [2] be updated to include this prerequisite?

[1] supportshell:/cases/03557262/0050-grafana-screenshot.png

[2] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html/monitoring_openshift_data_foundation/multicluster_storage_health#enabling-multicluster-dashboard-on-hub-cluster_rhodf
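
A quick way to verify this prerequisite on the hub (a sketch; the CSV grep pattern and plugin listing are assumptions and may vary by release):

~~~
# Check that the Multicluster Orchestrator operator is installed
oc get csv -A | grep -i multicluster-orchestrator

# List the console plugins that are currently enabled on the cluster
oc get consoles.operator.openshift.io cluster -o jsonpath='{.spec.plugins}{"\n"}'
~~~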

--- Additional comment from Sunil Kumar Acharya on 2023-09-01 08:14:24 UTC ---

We have the dev freeze for ODF-4.14.0 on 04-SEP-2023. Since this BZ has not been approved and is not marked as a blocker/exception, it will be moved out to ODF-4.15.0 on 04-SEP-2023.

If you think this BZ should be considered as an exception/blocker, feel free to set the flag with a justification note. Also, please mention the estimated date by which this BZ can be moved to the MODIFIED state.

--- Additional comment from Sunil Kumar Acharya on 2023-09-12 06:21:42 UTC ---

ODF-4.14 has entered 'blocker only' phase on 12-SEP-2023. Hence, moving the non-blocker BZs to ODF-4.15. If you think this BZ needs to be evaluated for ODF-4.14, please feel free to propose the BZ as a blocker/exception to ODF-4.14 with a justification note.

--- Additional comment from gowtham on 2024-01-08 11:47:02 UTC ---

Ack, I will inform the documentation team to update the doc.

--- Additional comment from RHEL Program Management on 2024-01-17 15:50:18 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product.

The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from RHEL Program Management on 2024-01-25 10:58:35 UTC ---

This BZ is being approved for an ODF 4.12.z z-stream update, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.12.z', and having been marked for an approved z-stream update.

--- Additional comment from RHEL Program Management on 2024-01-25 10:58:35 UTC ---

Since this bug has been approved for ODF 4.12.11 release, through release flag 'odf-4.12.z+', and appropriate update number entry at the 'Internal Whiteboard', the Target Release is being set to 'ODF 4.12.11'

--- Additional comment from Olive Lakra on 2024-01-29 08:24:26 UTC ---

Hi Gowtham,

Does this request apply to 4.13, 4.14, and 4.15, apart from 4.12?

--- Additional comment from Olive Lakra on 2024-01-29 08:50:47 UTC ---

Doc updated. Added the following bullet point to the prerequisites:

----------------------------------------------------------
* Ensure that you have installed Multicluster Orchestrator 4.12 operator with plugin for console enabled.

----------------------------------------------------------


Staging url for review: https://dxp-docp-prod.apps.ext-waf.spoke.prod.us-west-2.aws.paas.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/monitoring_openshift_data_foundation/index?lb_target=stage#enabling-multicluster-dashboard-on-hub-cluster_rhodf

--- Additional comment from gowtham on 2024-01-29 10:00:41 UTC ---

This change is applicable to 4.12, 4.13, 4.14, and 4.15.

--- Additional comment from gowtham on 2024-01-29 10:02:12 UTC ---

It's up to the documentation team to decide which versions to backport this change to.

