2022707 – Observe / monitoring dashboard shows forbidden errors on Dev Sandbox

Summary: Observe / monitoring dashboard shows forbidden errors on Dev Sandbox

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Dev Console
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Vikram Raj
QA Contact:	spathak@redhat.com
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	2020501
Depends On:
Blocks:	2026414

Reported:	2021-11-12 11:28 UTC by Christoph Jerolimov
Modified:	2022-04-28 09:36 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-10 16:26:48 UTC
Target Upstream Version:
Embargoed:

Attachments
forbidden-errors.png (104.60 KB, image/png) 2021-11-12 11:29 UTC, Christoph Jerolimov

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 10460	0	None	open	Bug 2022707: use prometheus tenancy URL to load data in dev console observe dashboard	2021-11-15 12:21:10 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:27:10 UTC

Description Christoph Jerolimov 2021-11-12 11:28:10 UTC

Description of problem:
When the user opens the "Observe" dashboard (previously Monitoring) page on the Dev Sandbox cluster, the user sees many "An error occurred: Forbidden" alert messages for all charts.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always on Sandbox, not on other clusters

Steps to Reproduce:
1. Set up a sandbox account on https://developers.redhat.com/developer-sandbox/get-started
2. Open the cluster which automatically opens the dev console
3. Navigate to "Observe" dashboard

Actual results:
Shows many "An error occurred: Forbidden" alert messages

Expected results:
Should show no errors, should show the live charts instead

Additional info:
*Maybe* connected to https://github.com/openshift/console/pull/10344 (at the moment merged in master for 4.10 but not in 4.9)

Comment 1 Christoph Jerolimov 2021-11-12 11:29:19 UTC

Created attachment 1841422
forbidden-errors.png

Comment 2 Christoph Jerolimov 2021-11-12 11:45:55 UTC

I could not reproduce this on 4.8 (where the dev console has its own monitoring dashboard).

I also could not reproduce this on a shared cluster with 4.9 or 4.10 (where the dev console uses a shared dashboard), tested with kubeadmin and clusterdeveloper.

It looks like the queries to Prometheus contain the namespace.

On Sandbox for example, one of the broken (403 Forbidden) API calls:
  https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29

When clicking on CPU usage 'inspect' and show promQL:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="cjerolim-dev"}) by (pod)

On a 4.9 cluster a working API call:
  https://console-openshift-console.apps.dev-svc-4.9-111207.devcluster.openshift.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29+%2F+sum%28cluster%3Anamespace%3Apod_cpu%3Aactive%3Akube_pod_container_resource_requests%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29

When clicking on CPU usage 'inspect' and show promQL:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="christoph"}) by (pod)

So these queries look fine to me.
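
The percent-encoded `query` parameters in the two calls above are hard to compare by eye. As a minimal sketch, assuming `+` stands for a space as it does in those URLs, the parameter value can be decoded back to PromQL. The helper name `decodePromQuery` is made up for illustration and is not part of the console code; the sample string is a shortened expression in the same shape as the calls above.

```typescript
// Hypothetical helper, not taken from the console code base.
// Decodes a form-encoded `query` parameter value back into readable PromQL.
function decodePromQuery(encoded: string): string {
  // In query strings `+` encodes a space; decodeURIComponent does not handle
  // that, so replace it first and then decode the %XX escapes.
  return decodeURIComponent(encoded.replace(/\+/g, ' '));
}

// Shortened sample, same encoding style as the API calls in this comment.
const sample =
  'sum%28kube_pod_container_resource_limits%7Bnamespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29';
const decoded = decodePromQuery(sample);
```

Decoding both URLs this way makes it easier to see that the expressions themselves are ordinary namespace-scoped queries, and that the difference in behaviour comes from the endpoint being called.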

Comment 3 Christoph Jerolimov 2021-11-12 11:47:46 UTC

When selecting one of the predefined metrics on the "Metrics" tab, for example:

CPU Usage:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace='cjerolim-dev'}) by (pod)

On Sandbox, the API endpoint returns 200 OK, but the chart doesn't show any data.

On a 4.9 cluster this shows a graph.

Comment 4 Christoph Jerolimov 2021-11-12 11:49:37 UTC

So it looks like https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus-tenancy/ is working and https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/ returns forbidden errors.

Comment 5 Serena 2021-11-12 14:45:22 UTC

I've confirmed that this wasn't an issue with the Dev Perspective Monitoring Dashboard in 4.8 with a non-privileged account, thus this is a regression.

Comment 6 Serena 2021-11-12 16:06:15 UTC

Upgrading this to Urgent priority.

Comment 7 Andrew Pickering 2021-11-15 08:30:41 UTC

It sounds very similar to https://github.com/openshift/console/pull/10344

If so, it will be a case of needing to pass the necessary `namespace` component prop(s). Passing that prop will then automatically cause it to hit `/api/prometheus-tenancy` instead of `/api/prometheus`.
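
The behaviour described here can be sketched roughly as follows. This is not the console's actual implementation: the function name `buildQueryUrl` and the constants are hypothetical, and sending the namespace as a `namespace` query parameter is an assumption. Only the two base paths, `/api/prometheus` and `/api/prometheus-tenancy`, come from this bug report.

```typescript
// Base paths as named in this bug report.
const PROMETHEUS_BASE_PATH = '/api/prometheus';
const PROMETHEUS_TENANCY_BASE_PATH = '/api/prometheus-tenancy';

// Hypothetical helper: picks the tenancy endpoint whenever a namespace is
// passed, and the cluster-wide endpoint otherwise.
function buildQueryUrl(query: string, namespace?: string): string {
  const base = namespace ? PROMETHEUS_TENANCY_BASE_PATH : PROMETHEUS_BASE_PATH;
  const params = [`query=${encodeURIComponent(query)}`];
  if (namespace) {
    // Assumption: the tenancy endpoint receives the namespace as a parameter.
    params.push(`namespace=${encodeURIComponent(namespace)}`);
  }
  return `${base}/api/v1/query?${params.join('&')}`;
}

// Without a namespace the cluster-wide path is used (forbidden on Sandbox).
const clusterWideUrl = buildQueryUrl('up');
// With a namespace the request goes to the tenancy path instead.
const tenancyUrl = buildQueryUrl('up', 'cjerolim-dev');
```

Under these assumptions, a dashboard component that forgets to pass the namespace silently falls back to the cluster-wide path, which matches the Forbidden responses seen for users without cluster-wide monitoring access.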

Comment 9 Vikram Raj 2021-11-23 15:49:05 UTC

*** Bug 2020501 has been marked as a duplicate of this bug. ***

Comment 11 Christoph Jerolimov 2021-12-20 10:57:11 UTC

Tested on 4.10.0-0.nightly-2021-12-16-185411; the dashboards in both perspectives work fine. I did not test this on Sandbox; for that we should backport the change into 4.9, see https://bugzilla.redhat.com/show_bug.cgi?id=2026414

Comment 16 Triveni Tadala 2022-03-02 10:40:49 UTC

Hi Team,

Can we get the dates for the backport of this, and when is the fix landing in 4.10?
The customer wants to know when the issue will be fixed in both versions.

Regards.

Triveni.

Comment 18 Christoph Jerolimov 2022-03-07 13:42:48 UTC

Hi @ttadala

The backport was already released with 4.9.22 - OCP shipped 22-Feb (the OLM advisory shipped live 24-Feb due to a push outage).

As Vikram said, the 4.10 GA will contain this fix as well.

Comment 20 errata-xmlrpc 2022-03-10 16:26:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
