Bug 1619132

Summary: "User not found" error in grafana pod logs, and some metrics could not be viewed from grafana UI
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 3.11.0
CC: dma
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-10-11 07:25:20 UTC
Type: Bug
Attachments (no flags set):
1477083  grafana container logs
1477085  grafana-proxy container logs
1477086  no instance listed under Nodes and CPU/Memory/Disk data is empty
1477087  data in Headlines part are all N/A
1478849  cluster headlines show has data
1478850  metrics data under "Nodes"

Description Junqi Zhao 2018-08-20 07:33:48 UTC
Created attachment 1477083 [details]
grafana container logs

Description of problem:
Checked the grafana container log in the grafana pod. It contains the error "Failed to find user"; for more information, see the attached log file.

BTW, lvl=eror looks like a typo; lvl=error would probably be better.

# oc logs -c grafana grafana-7476cc5c4b-zw4dn
t=2018-08-20T03:23:39+0000 lvl=eror msg="Failed to find user" logger=context error="User not found"
t=2018-08-20T03:23:39+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=0 uname= method=GET path=/api/datasources/proxy/1/api/v1/query status=500 remote_addr="119.254.120.72, 10.129.0.1" time_ms=7 size=1705 referer="https://grafana-openshift-monitoring.apps.0820-brf.qe.rhcloud.com/d/efa86fd1d0c121a26444b636a3f509a8/k8s-compute-resources-cluster?refresh=10s&orgId=1"
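The two entries above are of different kinds: the first carries only an error message (User not found), while the second records a GET on the datasource proxy path that completed with status=500. A minimal Python sketch for pulling those fields out of Grafana's key=value log output, assuming every line follows the format shown above (parse_logfmt and is_server_error are hypothetical helper names, and the referer field of the second line is omitted for brevity):

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split a Grafana key=value log line into a dict.

    shlex.split honours the double quotes around values such as
    msg="Failed to find user" and remote_addr="a, b".
    Tokens without '=' are ignored.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_server_error(fields: dict[str, str]) -> bool:
    """True only for entries that carry an HTTP status of 500 or above."""
    status = fields.get("status", "")
    return status.isdigit() and int(status) >= 500


# The two lines from the report (referer field of the second omitted).
LINES = [
    't=2018-08-20T03:23:39+0000 lvl=eror msg="Failed to find user" logger=context error="User not found"',
    't=2018-08-20T03:23:39+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=0 uname= '
    'method=GET path=/api/datasources/proxy/1/api/v1/query status=500 '
    'remote_addr="119.254.120.72, 10.129.0.1" time_ms=7 size=1705',
]

if __name__ == "__main__":
    for raw in LINES:
        f = parse_logfmt(raw)
        kind = "server error" if is_server_error(f) else "other"
        detail = f.get("path") or f.get("error", "")
        print(f"{f['lvl']:5} {kind:12} {f['msg']}: {detail}")
```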


https://grafana-openshift-monitoring.apps.0820-brf.qe.rhcloud.com/d/efa86fd1d0c121a26444b636a3f509a8/k8s-compute-resources-cluster?refresh=10s&orgId=1
is the "K8s / Compute Resources / Cluster" dashboard. All panels in its Headlines section (CPU Utilisation, CPU Requests Commitment, CPU Limits Commitment, Memory Utilisation, Memory Requests Commitment, Memory Limits Commitment) show N/A.

No instances are listed under Nodes, and the CPU/Memory/Disk data is empty; see the attached screenshots.

Version-Release number of selected component (if applicable):
Cluster monitoring component images version: v3.11.0-0.17.0.0


How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring and check the data in the Grafana UI.
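To check outside Grafana whether a panel's query returns any data, the same Prometheus query API that Grafana reaches through its datasource proxy (/api/v1/query, visible in the log path above) can be queried directly. A minimal sketch, assuming a hypothetical Prometheus base URL; a real request would usually also need credentials, which the sketch leaves out. It only builds the URL and counts the series in a JSON response:

```python
import json
from urllib.parse import urlencode


def query_url(base: str, expr: str) -> str:
    """Build a Prometheus instant-query URL: <base>/api/v1/query?query=<expr>."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def series_count(body: str) -> int:
    """Return the number of series in a /api/v1/query JSON response.

    0 means the expression matched nothing; a Grafana singlestat panel
    typically shows such a result as N/A.
    Raises ValueError when Prometheus reports an error.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    return len(payload["data"]["result"])


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    print(query_url("https://prometheus.example:9091/", "node_cpu"))
    empty = '{"status":"success","data":{"resultType":"vector","result":[]}}'
    print(series_count(empty))
```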

Actual results:
"User not found" error in the grafana pod logs, and some metrics cannot be viewed in the Grafana UI.

Expected results:
Metrics can be viewed in the Grafana UI.

Additional info:

Comment 1 Junqi Zhao 2018-08-20 07:34:31 UTC
Created attachment 1477085 [details]
grafana-proxy container logs

Comment 2 Junqi Zhao 2018-08-20 07:35:10 UTC
Created attachment 1477086 [details]
no instance listed under Nodes and CPU/Memory/Disk data is empty

Comment 3 Junqi Zhao 2018-08-20 07:36:18 UTC
Created attachment 1477087 [details]
data in Headlines part are all N/A

Comment 5 Frederic Branczyk 2018-08-20 10:31:33 UTC
The "user not found" message appears simply because we do not use the Grafana backend for authentication. Instead, we tell Grafana to accept whatever user the OpenShift oauth-proxy passes to it, so Grafana not finding that user in its own database is expected.
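Grafana's built-in mechanism for this kind of setup is the [auth.proxy] section of grafana.ini, which makes Grafana take the username from a request header set by the proxy in front of it. A minimal sketch that renders such a section with Python's configparser, assuming the proxy sends the username in an X-Forwarded-User header (the header name and the auth_proxy_ini helper are assumptions, not taken from the actual cluster-monitoring configuration):

```python
import configparser
import io


def auth_proxy_ini(header_name: str = "X-Forwarded-User") -> str:
    """Render a grafana.ini [auth.proxy] fragment as text.

    ASSUMPTION: header_name is illustrative; it must match the header
    the proxy in front of Grafana actually sets.
    """
    cfg = configparser.ConfigParser()
    cfg["auth.proxy"] = {
        "enabled": "true",
        # Header carrying the already-authenticated identity.
        "header_name": header_name,
        # Interpret the header value as a username (not an email).
        "header_property": "username",
        # Create the Grafana user on first request instead of rejecting it.
        "auto_sign_up": "true",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(auth_proxy_ini())
```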

I have identified the cause of the missing metrics in Grafana: the dashboards were built against node-exporter v0.15.2, but we are deploying v0.16.0, which includes a number of breaking changes to its metrics. I'll take this to the team and we will figure out what to do.
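A PromQL selector that names a metric which no longer exists matches no series, so queries written for the old names come back empty instead of failing, which is consistent with the empty and N/A panels reported above. A minimal sketch for finding and rewriting old names in a dashboard's PromQL expressions; the mapping is a partial, illustrative subset of the v0.16.0 renames, not the full list, and migrate_expr and stale_metrics are hypothetical helpers:

```python
import re

# Partial, illustrative subset of node-exporter 0.15 -> 0.16 metric renames.
# NOT exhaustive: check the node-exporter 0.16.0 changelog for the full list.
RENAMES = {
    "node_cpu": "node_cpu_seconds_total",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
    "node_memory_MemFree": "node_memory_MemFree_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
    "node_network_receive_bytes": "node_network_receive_bytes_total",
}

# \b on both sides: "node_cpu" does not match inside "node_cpu_seconds_total",
# so already-migrated expressions are left unchanged.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_expr(expr: str) -> str:
    """Rewrite old metric names in a PromQL expression to their 0.16 names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], expr)


def stale_metrics(expr: str) -> list[str]:
    """Return the old (pre-0.16) metric names still referenced by expr."""
    return sorted(set(_PATTERN.findall(expr)))


if __name__ == "__main__":
    old = 'sum(rate(node_cpu{mode!="idle"}[5m]))'
    print(stale_metrics(old))
    print(migrate_expr(old))
```

Using word boundaries on both sides keeps the rewrite idempotent, so running it over a dashboard that is already partly migrated is harmless.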

Comment 6 Frederic Branczyk 2018-08-23 09:53:14 UTC
We just merged a number of pull requests that should fix most of these problems. We also noticed incorrect behavior in the filesystem graphs, which is already being worked on; I would suggest creating a new issue for that, though.

Comment 8 Junqi Zhao 2018-08-27 02:42:00 UTC
Please change the status to ON_QA. Most of the issues are fixed, and new Bug 1622387 has been filed.

Comment 9 Junqi Zhao 2018-08-27 02:43:58 UTC
Created attachment 1478849 [details]
cluster headlines show has data

Comment 10 Junqi Zhao 2018-08-27 02:44:58 UTC
Created attachment 1478850 [details]
metrics data under "Nodes"

Comment 11 Junqi Zhao 2018-08-27 02:45:19 UTC
cluster monitoring version: v3.11.0-0.22.0.0
# openshift version
openshift v3.11.0-0.22.0

Comment 12 Frederic Branczyk 2018-08-27 13:40:26 UTC
What exactly are you referring to with "cluster headlines show has data"? That in the table there are not always numbers displayed? That is correct behavior as a lot of OpenShift components do not have resource requests and limits configured resulting in those blank fields.

The metric display error in the nodes dashboard is being fixed.
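The blank fields follow from how PromQL divides two instant vectors: with default one-to-one matching, an output sample is produced only where both sides have a series with the same labels, and unmatched series are dropped rather than reported as zero. A simplified Python model of that rule, assuming the table column is a ratio of usage to requests (the exact dashboard query is not part of this report, and the pod names are hypothetical):

```python
from typing import Dict, FrozenSet, Tuple

# A series is identified by its label set (metric name already stripped,
# as PromQL does for the result of an arithmetic operator).
Labels = FrozenSet[Tuple[str, str]]


def labels(**kv: str) -> Labels:
    """Build a hashable label set, e.g. labels(pod="x")."""
    return frozenset(kv.items())


def divide(lhs: Dict[Labels, float], rhs: Dict[Labels, float]) -> Dict[Labels, float]:
    """Model of PromQL `lhs / rhs` with default one-to-one matching.

    Only label sets present on BOTH sides produce an output sample;
    unmatched series are dropped, not reported as 0 or NaN.
    (Division by a zero request, which PromQL turns into +Inf or NaN,
    is not modeled here.)
    """
    return {k: lhs[k] / rhs[k] for k in lhs.keys() & rhs.keys()}


if __name__ == "__main__":
    # Hypothetical data: one pod sets a CPU request, the other does not.
    usage = {labels(pod="with-requests"): 0.25, labels(pod="no-requests"): 0.1}
    requests = {labels(pod="with-requests"): 0.5}
    ratio = divide(usage, requests)
    for pod in ("with-requests", "no-requests"):
        value = ratio.get(labels(pod=pod))
        # A missing sample is what ends up as an empty table cell.
        print(pod, "" if value is None else f"{value:.0%}")
```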

Comment 13 Junqi Zhao 2018-08-28 00:20:03 UTC
Per Comments 9 to 11, setting to VERIFIED.

Comment 14 Junqi Zhao 2018-08-28 00:22:43 UTC
(In reply to Frederic Branczyk from comment #12)
> What exactly are you referring to with "cluster headlines show has data"?
> That in the table there are not always numbers displayed? That is correct
> behavior as a lot of OpenShift components do not have resource requests and
> limits configured resulting in those blank fields.
> 
> The metric display error in the nodes dashboard is being fixed.

See the picture "data in Headlines part are all N/A": the data was N/A at that time, and it is fixed now.

Comment 16 errata-xmlrpc 2018-10-11 07:25:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652