Description of problem:
When the user opens the "Observe" dashboard (previously "Monitoring") page on the Dev Sandbox cluster, the user sees many "An error occurred: Forbidden" alert messages on all charts.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always on Sandbox; not on other clusters.

Steps to Reproduce:
1. Set up a sandbox account on https://developers.redhat.com/developer-sandbox/get-started
2. Open the cluster, which automatically opens the dev console.
3. Navigate to the "Observe" dashboard.

Actual results:
Many "An error occurred: Forbidden" alert messages are shown.

Expected results:
No errors should be shown; the live charts should be displayed instead.

Additional info:
*Maybe* connected to https://github.com/openshift/console/pull/10344 (at the moment merged in master for 4.10, but not in 4.9).
Created attachment 1841422: forbidden-errors.png
I could not reproduce this on 4.8 (where the dev console has its own monitoring dashboard). I also could not reproduce it on a shared cluster with 4.9 or 4.10 (where the dev console uses a shared dashboard), testing with both kubeadmin and clusterdeveloper.

It looks like the queries to Prometheus contain the namespace.

On Sandbox, for example, one of the broken (403 Forbidden) API calls is:
https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29

Clicking 'inspect' on CPU usage and showing the PromQL gives:
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="cjerolim-dev"}) by (pod)

On a 4.9 cluster, a working API call is:
https://console-openshift-console.apps.dev-svc-4.9-111207.devcluster.openshift.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29+%2F+sum%28cluster%3Anamespace%3Apod_cpu%3Aactive%3Akube_pod_container_resource_requests%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29

Clicking 'inspect' on CPU usage and showing the PromQL gives:
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="christoph"}) by (pod)

So these queries look fine to me.
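The percent-encoded API URLs above are hard to compare by eye. As a minimal sketch using only Python's standard library, the `query` parameter can be decoded so the PromQL and its `namespace` matchers are visible directly; the helper names `decode_prometheus_query` and `namespaces_in` are hypothetical, and the URL is the Sandbox call quoted above.

```python
import re
from typing import List
from urllib.parse import parse_qs, urlsplit


def decode_prometheus_query(url: str) -> str:
    """Return the PromQL carried in the `query` parameter of a console
    Prometheus proxy URL (percent-escapes and '+' are decoded)."""
    params = parse_qs(urlsplit(url).query)
    return params["query"][0]


def namespaces_in(promql: str) -> List[str]:
    """List the value of every namespace="..." label matcher in a PromQL string."""
    return re.findall(r'namespace="([^"]*)"', promql)


# The broken (403 Forbidden) Sandbox call quoted in this comment.
sandbox_url = (
    "https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com"
    "/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container"
    "%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C"
    "+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28kube_pod_container_resource_limits"
    "%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29"
)

query = decode_prometheus_query(sandbox_url)
print(urlsplit(sandbox_url).path)  # /api/prometheus/api/v1/query
print(query)
print(namespaces_in(query))        # ['cjerolim-dev', 'cjerolim-dev']
```

The decoded query is namespace-scoped, yet the request goes to the cluster-wide `/api/prometheus` path, which is consistent with the endpoint comparison later in this thread.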
When selecting one of the predefined metrics on the "Metrics" tab, for example CPU Usage:
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace='cjerolim-dev'}) by (pod)
On Sandbox, the API endpoints return 200 OK, but no data is shown. On a 4.9 cluster, this shows a graph.
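Two different symptoms appear here: 403 on the dashboard calls, and 200 OK with nothing to plot on the Metrics tab. A hypothetical triage helper can keep them apart. This sketch assumes the standard Prometheus HTTP API JSON envelope (`"status": "success"` with the series under `data.result`); the function name and the sample body are illustrative, not captured from Sandbox.

```python
import json


def classify_query_response(status_code: int, body: str) -> str:
    """Hypothetical triage of a Prometheus /api/v1/query response.

    Returns "forbidden" for HTTP 403, "empty" for a successful query that
    returned no series, "data" when at least one series came back, and
    "error" for anything else (other status codes, non-JSON, status=error).
    """
    if status_code == 403:
        return "forbidden"
    if status_code != 200:
        return "error"
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return "error"
    if payload.get("status") != "success":
        return "error"
    result = payload.get("data", {}).get("result", [])
    return "data" if result else "empty"


# Illustrative body only: a successful instant query with no matching series.
EMPTY_BODY = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
print(classify_query_response(403, "Forbidden"))  # forbidden
print(classify_query_response(200, EMPTY_BODY))   # empty
```

Under this classification, the dashboard calls on Sandbox would be "forbidden", while the Metrics tab calls would be "empty" rather than a failure.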
So it looks like https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus-tenancy/ is working and https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/ returns forbidden errors.
I've confirmed that this wasn't an issue with the Dev Perspective Monitoring Dashboard in 4.8 with a non-privileged account, so this is a regression.
Upgrading this to Urgent priority.
It sounds very similar to https://github.com/openshift/console/pull/10344. If so, the fix will be to pass the necessary `namespace` component prop(s). Passing that prop then automatically makes the requests hit `/api/prometheus-tenancy` instead of `/api/prometheus`.
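The suggested fix hinges on which proxy path a chart's request goes to. The following Python sketch models that decision for illustration only: the real logic lives in the console's TypeScript code, and the function name `prometheus_query_url` and the extra `namespace` query parameter on the tenancy path are assumptions, not taken from PR 10344.

```python
from typing import Optional
from urllib.parse import urlencode


def prometheus_query_url(console_base: str, promql: str,
                         namespace: Optional[str] = None) -> str:
    """Hypothetical model of the routing described above: with a namespace,
    use the tenancy proxy; without one, fall back to the cluster-wide proxy."""
    params = {"query": promql}
    if namespace:
        # Assumption: the tenancy proxy is told which namespace to scope to
        # through a `namespace` query parameter.
        path = "/api/prometheus-tenancy/api/v1/query"
        params["namespace"] = namespace
    else:
        path = "/api/prometheus/api/v1/query"
    return f"{console_base}{path}?{urlencode(params)}"


sandbox_base = "https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com"
cpu_query = ("sum(node_namespace_pod_container:container_cpu_usage_seconds_total"
             ":sum_irate{namespace='cjerolim-dev'}) by (pod)")

# No namespace prop: cluster-wide path (403 Forbidden on Sandbox).
print(prometheus_query_url(sandbox_base, cpu_query))
# Namespace prop passed: tenancy path.
print(prometheus_query_url(sandbox_base, cpu_query, namespace="cjerolim-dev"))
```

In this model, a chart that omits `namespace` silently falls back to the cluster-wide path, which would be consistent with the Forbidden errors seen only by non-privileged Sandbox users.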
*** Bug 2020501 has been marked as a duplicate of this bug. ***
Tested on 4.10.0-0.nightly-2021-12-16-185411: dashboards in both perspectives work fine. I did not test this on Sandbox; for that, the change needs to be backported to 4.9, see https://bugzilla.redhat.com/show_bug.cgi?id=2026414
Hi Team, can we get the dates for the backport of this, and when will the fix land in 4.10? The customer wants to know when the issue will be fixed in both versions. Regards, Triveni.
Hi @ttadala, the backport was already released with 4.9.22 (OCP shipped 22-Feb; the OLM advisory shipped live 24-Feb due to a push outage). As Vikram said, the 4.10 GA will contain this fix as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056