Bug 2022707

Summary: Observe / monitoring dashboard shows forbidden errors on Dev Sandbox
Product: OpenShift Container Platform Reporter: Christoph Jerolimov <cjerolim>
Component: Dev Console Assignee: Vikram Raj <viraj>
Status: CLOSED ERRATA QA Contact: spathak <spathak>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.9 CC: anpicker, aos-bugs, juzhao, kjeeyar, nmukherj, oarribas, sdoyle, ttadala, viraj
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:26:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2026414    
Attachments:
Description Flags
forbidden-errors.png none

Description Christoph Jerolimov 2021-11-12 11:28:10 UTC
Description of problem:
When the user opens the "Observe" dashboard (previously "Monitoring") page on the Dev Sandbox cluster, the user sees many "An error occurred: Forbidden" alert messages, one for each chart.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always on Sandbox, not on other clusters

Steps to Reproduce:
1. Set up a sandbox account at https://developers.redhat.com/developer-sandbox/get-started
2. Open the cluster, which automatically opens the dev console
3. Navigate to "Observe" dashboard

Actual results:
Shows many "An error occurred: Forbidden" alert messages

Expected results:
Should show no errors, should show the live charts instead

Additional info:
*Maybe* connected to https://github.com/openshift/console/pull/10344 (at the moment merged in master for 4.10 but not in 4.9)

Comment 1 Christoph Jerolimov 2021-11-12 11:29:19 UTC
Created attachment 1841422
forbidden-errors.png

Comment 2 Christoph Jerolimov 2021-11-12 11:45:55 UTC
I could not reproduce this on 4.8 (where the dev console has its own monitoring dashboard).

I also could not reproduce this on a shared cluster with 4.9 or 4.10 (where the dev console uses a shared dashboard), tested with kubeadmin and clusterdeveloper.

It looks like the queries to Prometheus contain the namespace.

On Sandbox for example, one of the broken (403 Forbidden) API calls:
  https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%7D%29+%2F+sum%28kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29

When clicking on CPU usage 'inspect' and show promQL:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="cjerolim-dev"}) by (pod)

On a 4.9 cluster a working API call:
  https://console-openshift-console.apps.dev-svc-4.9-111207.devcluster.openshift.com/api/prometheus/api/v1/query?query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29+%2F+sum%28cluster%3Anamespace%3Apod_cpu%3Aactive%3Akube_pod_container_resource_requests%7Bcluster%3D%22%22%2C+namespace%3D%22christoph%22%7D%29+by+%28pod%29

When clicking on CPU usage 'inspect' and show promQL:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", namespace="christoph"}) by (pod)

So these queries look fine to me.
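
To compare the API calls above more easily, the URL-encoded `query` parameter can be decoded back into PromQL. This is a minimal sketch using only the Python standard library; the helper name `decode_promql` is hypothetical and is not part of the console:

```python
from urllib.parse import parse_qs, urlsplit


def decode_promql(url: str) -> str:
    """Return the decoded PromQL expression from the `query` URL parameter."""
    # parse_qs percent-decodes the values and turns '+' into spaces.
    params = parse_qs(urlsplit(url).query)
    return params["query"][0]


# The failing (403 Forbidden) Sandbox call quoted in this comment.
sandbox_url = (
    "https://console-openshift-console.apps.sandbox.x8i5.p1.openshiftapps.com"
    "/api/prometheus/api/v1/query?query="
    "sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total"
    "%3Asum_irate%7Bcluster%3D%22%22%2C+namespace%3D%22cjerolim-dev%22%7D%29"
    "+%2F+sum%28kube_pod_container_resource_limits%7Bcluster%3D%22%22%2C"
    "+namespace%3D%22cjerolim-dev%22%2C+resource%3D%22cpu%22%7D%29"
)

print(decode_promql(sandbox_url))
```

The same helper works for the 4.9 URL, so both expressions can be compared as plain PromQL.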

Comment 3 Christoph Jerolimov 2021-11-12 11:47:46 UTC
When selecting one of the predefined metrics on the "Metrics" tab, for example:

CPU Usage:
  sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace='cjerolim-dev'}) by (pod)

On Sandbox, the API endpoint returns 200 OK, but the chart doesn't show any data.

On a 4.9 cluster this shows a graph.

Comment 5 Serena 2021-11-12 14:45:22 UTC
I've confirmed that this wasn't an issue with the Dev Perspective Monitoring Dashboard in 4.8 with a non-privileged account, thus this is a regression.

Comment 6 Serena 2021-11-12 16:06:15 UTC
Upgrading this to Urgent priority.

Comment 7 Andrew Pickering 2021-11-15 08:30:41 UTC
It sounds very similar to https://github.com/openshift/console/pull/10344

If so, it will be a case of needing to pass the necessary `namespace` component prop(s). Passing that prop will then automatically cause it to hit `/api/prometheus-tenancy` instead of `/api/prometheus`.
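
A rough illustration of the endpoint selection described above. This is not the console's actual code (the console is not written in Python). The function name `build_query_url` is hypothetical, and sending the namespace to the tenancy endpoint as a `namespace` query parameter is an assumption:

```python
from typing import Optional
from urllib.parse import urlencode

# The two proxy paths named in this comment.
PROMETHEUS_BASE_PATH = "/api/prometheus"
PROMETHEUS_TENANCY_BASE_PATH = "/api/prometheus-tenancy"


def build_query_url(query: str, namespace: Optional[str] = None) -> str:
    """Build the console proxy URL for an instant PromQL query.

    With a namespace, the request goes to the tenancy endpoint;
    without one, it goes to the cluster-wide endpoint.
    """
    params = {"query": query}
    if namespace:
        base = PROMETHEUS_TENANCY_BASE_PATH
        # Assumption: the tenancy endpoint reads the namespace from this parameter.
        params["namespace"] = namespace
    else:
        base = PROMETHEUS_BASE_PATH
    return f"{base}/api/v1/query?{urlencode(params)}"


print(build_query_url("up"))
print(build_query_url("up", namespace="cjerolim-dev"))
```

Under this sketch, a dashboard that never receives the namespace always falls through to `/api/prometheus`, which would match the failing Sandbox URLs in comment 2.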

Comment 9 Vikram Raj 2021-11-23 15:49:05 UTC
*** Bug 2020501 has been marked as a duplicate of this bug. ***

Comment 11 Christoph Jerolimov 2021-12-20 10:57:11 UTC
Tested on 4.10.0-0.nightly-2021-12-16-185411; the dashboards in both perspectives work fine. I didn't test this on Sandbox. For that, we should backport the change to 4.9, see https://bugzilla.redhat.com/show_bug.cgi?id=2026414

Comment 16 Triveni Tadala 2022-03-02 10:40:49 UTC
Hi Team,

Can we get the dates for the backport of this, and for when the fix is landing in 4.10?
The customer wants to know when the issue will be fixed in both versions.

Regards.

Triveni.

Comment 18 Christoph Jerolimov 2022-03-07 13:42:48 UTC
Hi @ttadala

The backport was already released with 4.9.22: OCP shipped 22-Feb (the OLM advisory shipped live 24-Feb due to a push outage).

As Vikram said, the 4.10 GA will contain this fix as well.

Comment 20 errata-xmlrpc 2022-03-10 16:26:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056