1619132 – "User not found" error in grafana pod logs, and some metrics could not be viewed from grafana UI

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.11.0
Assignee:	Frederic Branczyk
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-20 07:33 UTC by Junqi Zhao
Modified:	2018-10-11 07:25 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-10-11 07:25:20 UTC
Target Upstream Version:
Embargoed:

Attachments
grafana container logs (27.78 KB, text/plain) 2018-08-20 07:33 UTC, Junqi Zhao	no flags	Details
grafana-proxy container logs (2.90 KB, text/plain) 2018-08-20 07:34 UTC, Junqi Zhao	no flags	Details
no instance listed under Nodes and CPU/Memory/Disk data is empty (86.60 KB, image/png) 2018-08-20 07:35 UTC, Junqi Zhao	no flags	Details
data in Headlines part are all N/A (142.52 KB, image/png) 2018-08-20 07:36 UTC, Junqi Zhao	no flags	Details
cluster headlines show has data (139.05 KB, image/png) 2018-08-27 02:43 UTC, Junqi Zhao	no flags	Details
metrics data under "Nodes" (202.62 KB, image/png) 2018-08-27 02:44 UTC, Junqi Zhao	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:2652	0	None	None	None	2018-10-11 07:25:45 UTC

Internal Links: 1679504

Description Junqi Zhao 2018-08-20 07:33:48 UTC

Created attachment 1477083 [details]
grafana container logs

Description of problem:
Checked the grafana container log of the grafana pod and found the error "Failed to find user". See the attached log file for more information.

BTW, lvl=eror looks like a typo; it should probably be lvl=error.

# oc logs -c grafana grafana-7476cc5c4b-zw4dn
t=2018-08-20T03:23:39+0000 lvl=eror msg="Failed to find user" logger=context error="User not found"
t=2018-08-20T03:23:39+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=0 uname= method=GET path=/api/datasources/proxy/1/api/v1/query status=500 remote_addr="119.254.120.72, 10.129.0.1" time_ms=7 size=1705 referer="https://grafana-openshift-monitoring.apps.0820-brf.qe.rhcloud.com/d/efa86fd1d0c121a26444b636a3f509a8/k8s-compute-resources-cluster?refresh=10s&orgId=1"


The URL
https://grafana-openshift-monitoring.apps.0820-brf.qe.rhcloud.com/d/efa86fd1d0c121a26444b636a3f509a8/k8s-compute-resources-cluster?refresh=10s&orgId=1
is the "K8s / Compute Resources / Cluster" dashboard. In its Headlines section, all of the following show N/A:
"CPU Utilisation, CPU Requests Commitment, CPU Limits Commitment, Memory Utilisation, Memory Requests Commitment, Memory Limits Commitment"

No instances are listed under Nodes, and the CPU/Memory/Disk data is empty; see the attached pictures.

Version-Release number of selected component (if applicable):
Cluster monitoring component images version: v3.11.0-0.17.0.0


How reproducible:
Always

Steps to Reproduce:
1. Deploy Cluster monitoring and check data in grafana UI.
2.
3.

Actual results:
"User not found" error in grafana pod logs and some metrics could not be viewed from grafana UI

Expected results:
No "User not found" error in the grafana pod logs, and all metrics can be viewed from the grafana UI

Additional info:

Comment 1 Junqi Zhao 2018-08-20 07:34:31 UTC

Created attachment 1477085 [details]
grafana-proxy container logs

Comment 2 Junqi Zhao 2018-08-20 07:35:10 UTC

Created attachment 1477086 [details]
no instance listed under Nodes and CPU/Memory/Disk data is empty

Comment 3 Junqi Zhao 2018-08-20 07:36:18 UTC

Created attachment 1477087 [details]
data in Headlines part are all N/A

Comment 5 Frederic Branczyk 2018-08-20 10:31:33 UTC

The "user not found" message appears simply because we are not using the Grafana backend for authentication; we just tell Grafana to use whatever user is proxied to it through the OpenShift oauth-proxy. It is therefore expected that Grafana does not find the user.
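
When reading the grafana container log, it can help to set the expected "Failed to find user" entries aside so that the remaining entries stand out. The following is only a minimal sketch, assuming the log keeps the key=value layout of the two lines quoted in the description; the helper names `parse_logfmt` and `is_expected_auth_message` are hypothetical and are not part of Grafana or OpenShift.

```python
import shlex


def parse_logfmt(line):
    """Split one key=value log line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields


def is_expected_auth_message(fields):
    """True for the 'Failed to find user' entry described as expected in this report."""
    return (
        fields.get("msg") == "Failed to find user"
        and fields.get("error") == "User not found"
    )


# The two log lines quoted in the description of this bug.
SAMPLE_LINES = [
    't=2018-08-20T03:23:39+0000 lvl=eror msg="Failed to find user" '
    'logger=context error="User not found"',
    't=2018-08-20T03:23:39+0000 lvl=eror msg="Request Completed" logger=context '
    'userId=0 orgId=0 uname= method=GET path=/api/datasources/proxy/1/api/v1/query '
    'status=500 remote_addr="119.254.120.72, 10.129.0.1" time_ms=7 size=1705 '
    'referer="https://grafana-openshift-monitoring.apps.0820-brf.qe.rhcloud.com/d/'
    'efa86fd1d0c121a26444b636a3f509a8/k8s-compute-resources-cluster?refresh=10s&orgId=1"',
]

# Keep only the entries that are not the expected authentication message.
remaining = [
    fields
    for fields in map(parse_logfmt, SAMPLE_LINES)
    if not is_expected_auth_message(fields)
]

for fields in remaining:
    print(fields["msg"], fields.get("path"), fields.get("status"))
```

The sketch only separates the two kinds of entries; it makes no claim about what caused the remaining request to fail.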

I have identified the cause of the missing metrics in Grafana: the dashboards were built on node-exporter v0.15.2, but we are deploying v0.16.0, which has a number of breaking changes to its metrics. I'll take this to the team and we will figure out what to do.
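
To illustrate the kind of breakage involved: node-exporter v0.16.0 renamed many metrics, mostly by adding unit suffixes, so a dashboard query written for v0.15.x returns no data. The sketch below rewrites old metric names in a PromQL expression. The mapping is a small subset taken from the upstream v0.16.0 changelog as an assumption about which metrics matter; this report does not say which metrics the affected dashboards used, and the function name `migrate_expr` is hypothetical.

```python
import re

# Illustrative subset of the node-exporter v0.15 -> v0.16 metric renames.
# Assumption: chosen as examples only, not the list of metrics fixed for this bug.
RENAMED = {
    "node_cpu": "node_cpu_seconds_total",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
    "node_memory_MemFree": "node_memory_MemFree_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
}

# Match an old name only as a whole metric name: the characters around it must
# not be valid metric-name characters, so already-migrated names such as
# node_cpu_seconds_total and recording rules such as instance:node_cpu:rate
# are left alone. Limitation: a label value equal to an old name would also be
# rewritten, because the pattern does not parse PromQL.
_NAME_CHARS = r"A-Za-z0-9_:"
_PATTERN = re.compile(
    r"(?<![" + _NAME_CHARS + r"])("
    + "|".join(sorted(map(re.escape, RENAMED), key=len, reverse=True))
    + r")(?![" + _NAME_CHARS + r"])"
)


def migrate_expr(expr):
    """Return expr with v0.15-style metric names replaced by their v0.16 names."""
    return _PATTERN.sub(lambda match: RENAMED[match.group(1)], expr)


old_query = 'sum(rate(node_cpu{mode!="idle"}[5m]))'
print(migrate_expr(old_query))
```

Because already-renamed metrics are not matched, running the function a second time on its own output leaves the expression unchanged.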

Comment 6 Frederic Branczyk 2018-08-23 09:53:14 UTC

We just merged a number of pull requests that should fix most of these problems. We also noticed some incorrect behavior in the filesystem graphs; a fix for that is already in the works, but I would suggest creating a new issue for it.

Comment 7 Frederic Branczyk 2018-08-23 19:34:55 UTC

https://github.com/coreos/prometheus-operator/releases/tag/v0.20.0

Comment 8 Junqi Zhao 2018-08-27 02:42:00 UTC

Please change to ON_QA. Most of the issues are fixed, and new Bug 1622387 has been filed.

Comment 9 Junqi Zhao 2018-08-27 02:43:58 UTC

Created attachment 1478849 [details]
cluster headlines show has data

Comment 10 Junqi Zhao 2018-08-27 02:44:58 UTC

Created attachment 1478850 [details]
metrics data under "Nodes"

Comment 11 Junqi Zhao 2018-08-27 02:45:19 UTC

cluster monitoring version: v3.11.0-0.22.0.0
# openshift version
openshift v3.11.0-0.22.0

Comment 12 Frederic Branczyk 2018-08-27 13:40:26 UTC

What exactly are you referring to with "cluster headlines show has data"? That in the table there are not always numbers displayed? That is correct behavior as a lot of OpenShift components do not have resource requests and limits configured resulting in those blank fields.

The metric display error in the nodes dashboard is being fixed.

Comment 13 Junqi Zhao 2018-08-28 00:20:03 UTC

Per Comment 9 - Comment 11, set to VERIFIED

Comment 14 Junqi Zhao 2018-08-28 00:22:43 UTC

(In reply to Frederic Branczyk from comment #12)
> What exactly are you referring to with "cluster headlines show has data"?
> That in the table there are not always numbers displayed? That is correct
> behavior as a lot of OpenShift components do not have resource requests and
> limits configured resulting in those blank fields.
> 
> The metric display error in the nodes dashboard is being fixed.

See the picture "data in Headlines part are all N/A": the data was N/A there, and now it is fixed.

Comment 16 errata-xmlrpc 2018-10-11 07:25:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
