Bug 2022707 - Observe / monitoring dashboard shows forbidden errors on Dev Sandbox
Summary: Observe / monitoring dashboard shows forbidden errors on Dev Sandbox
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Dev Console
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Vikram Raj
QA Contact: spathak@redhat.com
URL:
Whiteboard:
Duplicates: 2020501
Depends On:
Blocks: 2026414
 
Reported: 2021-11-12 11:28 UTC by Christoph Jerolimov
Modified: 2022-04-28 09:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:26:48 UTC
Target Upstream Version:
Embargoed:


Attachments
forbidden-errors.png (104.60 KB, image/png)
2021-11-12 11:29 UTC, Christoph Jerolimov


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift console pull 10460 | 0 | None | open | Bug 2022707: use prometheus tenancy URL to load data in dev console observe dashboard | 2021-11-15 12:21:10 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:27:10 UTC

Description Christoph Jerolimov 2021-11-12 11:28:10 UTC
Description of problem:
When the user opens the "Observe" dashboard (previously "Monitoring") page on the Dev Sandbox cluster, the user sees many "An error occurred: Forbidden" alert messages, one for each chart.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always on Sandbox, not on other clusters

Steps to Reproduce:
1. Set up a sandbox account on https://developers.redhat.com/developer-sandbox/get-started
2. Open the cluster, which automatically opens the dev console
3. Navigate to "Observe" dashboard

Actual results:
Shows many "An error occurred: Forbidden" alert messages

Expected results:
Should show no errors, should show the live charts instead

Additional info:
*Maybe* connected to https://github.com/openshift/console/pull/10344 (at the moment merged in master for 4.10 but not in 4.9)

Comment 1 Christoph Jerolimov 2021-11-12 11:29:19 UTC
Created attachment 1841422
forbidden-errors.png

Comment 2 Christoph Jerolimov 2021-11-12 11:45:55 UTC
I could not reproduce this on 4.8 (where the dev console has its own monitoring dashboard).

I also could not reproduce this on a shared cluster with 4.9 or 4.10 (where the dev console uses a shared dashboard), tested with kubeadmin and clusterdeveloper.

It looks like the queries to Prometheus contain the namespace.

On Sandbox for example, one of the broken (403 Forbidden) API calls:
  https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29

Clicking 'Inspect' on the CPU usage chart and showing the PromQL gives:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="cjerolim-dev"}) by (pod)

On a 4.9 cluster a working API call:
  https://console-openshift-console.apps.dev-svc-4.9-111207.devcluster.openshift.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29+%2F+sum%28cluster%3Anamespace%3Apod_cpu%3Aactive%3Akube_pod_container_resource_requests%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29

Clicking 'Inspect' on the CPU usage chart and showing the PromQL gives:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="christoph"}) by (pod)

So these queries look fine to me.
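
To make the two percent-encoded requests above easier to compare, the following is a minimal sketch that decodes them with Python's standard library. The URLs are copied from this comment; the `summarize` helper and the constant names are hypothetical and exist only for this illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The two request URLs quoted in this comment, split across lines for readability.
SANDBOX_URL = (
    "https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com"
    "/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container"
    "%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C"
    "+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28"
    "kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C"
    "+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29"
)
WORKING_URL = (
    "https://console-openshift-console.apps.dev-svc-4.9-111207.devcluster.openshift.com"
    "/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container"
    "%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C"
    "+namespace%3D%22christoph%22%7D%29+by+%28pod%29+%2F+sum%28"
    "cluster%3Anamespace%3Apod_cpu%3Aactive%3Akube_pod_container_resource_requests"
    "%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29"
)


def summarize(url: str) -> dict:
    """Hypothetical helper: return the request path, decoded PromQL and namespaces."""
    parts = urlsplit(url)
    # parse_qs decodes both %XX escapes and '+' (space) in the query string.
    promql = parse_qs(parts.query)["query"][0]
    # Collect the values of every namespace="..." label matcher in the PromQL.
    namespaces = sorted(set(re.findall(r'namespace="([^"]*)"', promql)))
    return {"path": parts.path, "promql": promql, "namespaces": namespaces}


if __name__ == "__main__":
    for label, url in (("sandbox", SANDBOX_URL), ("working", WORKING_URL)):
        info = summarize(url)
        print(label, info["path"], info["namespaces"])
        print("  ", info["promql"])
```

Both requests use the same `/api/prometheus/api/v1/query` path. They differ in cluster host, namespace, the metric in the denominator, and the `by (pod)` grouping.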

Comment 3 Christoph Jerolimov 2021-11-12 11:47:46 UTC
When selecting one of the predefined metrics on the "Metrics" tab, for example:

CPU Usage:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace='cjerolim-dev'}) by (pod)

On Sandbox, the API endpoint returns 200 OK, but the chart doesn't show any data.

On a 4.9 cluster this shows a graph.

Comment 5 Serena 2021-11-12 14:45:22 UTC
I've confirmed that this wasn't an issue with the Dev Perspective Monitoring Dashboard in 4.8 with a non-privileged account, so this is a regression.

Comment 6 Serena 2021-11-12 16:06:15 UTC
Upgrading this to Urgent priority.

Comment 7 Andrew Pickering 2021-11-15 08:30:41 UTC
It sounds very similar to https://github.com/openshift/console/pull/10344

If so, it will be a case of needing to pass the necessary `namespace` component prop(s). Passing that prop will then automatically cause it to hit `/api/prometheus-tenancy` instead of `/api/prometheus`.
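
The endpoint switch described here can be sketched as follows. This is not the console's actual code: `build_query_url` is a hypothetical helper, and forwarding the namespace as a `namespace` query parameter to the tenancy endpoint is an assumption made for illustration. Only the two base paths come from this comment.

```python
from typing import Optional
from urllib.parse import urlencode

# Base paths named in this comment.
PROMETHEUS_BASE = "/api/prometheus"
PROMETHEUS_TENANCY_BASE = "/api/prometheus-tenancy"


def build_query_url(query: str, namespace: Optional[str] = None) -> str:
    """Hypothetical helper: pick the Prometheus endpoint based on `namespace`.

    With a namespace, use the tenancy endpoint and forward the namespace
    (assumed here to be a `namespace` query parameter). Without one, fall
    back to the cluster-wide endpoint.
    """
    params = {"query": query}
    if namespace:
        base = PROMETHEUS_TENANCY_BASE
        params["namespace"] = namespace
    else:
        base = PROMETHEUS_BASE
    return f"{base}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("up"))
    print(build_query_url("up", namespace="cjerolim-dev"))
```

In this sketch, a caller that omits the namespace silently falls back to the cluster-wide endpoint, which a non-privileged user may not be allowed to query.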

Comment 9 Vikram Raj 2021-11-23 15:49:05 UTC
*** Bug 2020501 has been marked as a duplicate of this bug. ***

Comment 11 Christoph Jerolimov 2021-12-20 10:57:11 UTC
Tested on 4.10.0-0.nightly-2021-12-16-185411; the dashboards in both perspectives work fine. I didn't test this on Sandbox; for that we should backport the change into 4.9, see https://bugzilla.redhat.com/show_bug.cgi?id=2026414

Comment 16 Triveni Tadala 2022-03-02 10:40:49 UTC
Hi Team,

Can we get the dates for the backport of this, and for when the fix is landing in 4.10?
The customer wants to know when the issue will be fixed in both versions.

Regards.

Triveni.

Comment 18 Christoph Jerolimov 2022-03-07 13:42:48 UTC
Hi @ttadala

The backport is already released with 4.9.22 - OCP shipped 22-Feb (OLM Advisory shipped live 24-Feb due to push outage).

As Vikram said, the 4.10 GA will contain this fix as well.

Comment 20 errata-xmlrpc 2022-03-10 16:26:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

