Bug 1733548

Summary: No data in grafana dashboard
Product: OpenShift Container Platform
Reporter: Rajnikant <rkant>
Component: Monitoring
Assignee: Lili Cosic <lcosic>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Junqi Zhao <juzhao>
Severity: low
Priority: unspecified
Version: 3.11.0
CC: alegrand, anpicker, erooth, fbranczy, fshaikh, jmalde, lcosic, mirollin, mloibl, openshift-bugs-escalate, pkrupa, prthakur, spasquie, srengan, surbania
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-01-13 07:15:01 UTC
Type: Bug

Description Rajnikant 2019-07-26 13:42:04 UTC
Description of problem:
No data in grafana dashboard. 

Version-Release number of selected component (if applicable):
3.11.117

How reproducible:
- Prometheus installation completed
- All pods are running
- Prometheus targets are available

<snip sample output>
- All pods are running; no pod restarts.
[root@master-0 ~]# oc get pods
NAME                                           READY     STATUS      RESTARTS   AGE
alertmanager-main-0                            3/3       Running     13         10d
alertmanager-main-1                            3/3       Running     0          6d
alertmanager-main-2                            3/3       Running     0          6d
cluster-monitoring-operator-6f5fbd6f8b-qxg5c   1/1       Running     6          10d
grafana-857fc848bf-lljwm                       2/2       Running     7          10d
kube-state-metrics-75c4d6dc-ffs7j              3/3       Running     8          10d
node-exporter-896gs                            2/2       Running     0          18h
node-exporter-jd8rx                            2/2       Running     4          10d
node-exporter-qxfhr                            2/2       Running     2          10d
node-exporter-t92fz                            2/2       Running     0          1d
node-exporter-xmqf2                            0/2       Completed   0          10d
node-exporter-zbqlv                            2/2       Running     3          10d
prometheus-k8s-0                               4/4       Running     1          6d
prometheus-k8s-1                               0/4       Unknown     1          10d
prometheus-operator-7855c8646b-659w6           0/1       Unknown     0          10d
prometheus-operator-7855c8646b-dxhnz           1/1       Running     5          6d
</snip>


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 Lili Cosic 2019-07-29 15:51:19 UTC
I tried to reproduce this but could not. I spun up a v3.11 OpenShift cluster, and I can access the Grafana dashboard and see all the data.

Can you provide additional information? Cluster events and logs from the grafana container would be useful. How long was the cluster up and running? Is anything else failing besides Grafana?

Also, I noticed it says "- All pods are running no pod restart." But in the output of the `oc get pods` command it seems like multiple pods were restarted. Can you clarify that? Thanks!

Another question: it says "- Prometheus installation done completely"; does this mean the installation was done manually?

Comment 8 Lili Cosic 2019-07-30 15:31:55 UTC
Thanks for the access to the logs. While looking at the grafana logs, we noticed that Grafana itself seems to be having problems with a locked database table. We suggest deleting the grafana pod and checking whether the problem occurs again. (There are many upstream issues open about this on Grafana itself [1].) If data in Grafana is still not visible after the pod is deleted and restarted, can you provide the logs and events again? Thank you.
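For reference, restarting the pod could look roughly like this. This is a sketch, assuming the default `openshift-monitoring` namespace, the grafana pod name from the `oc get pods` output above, and an `app=grafana` label on the pod; adjust all three to your cluster.

```shell
# Delete the grafana pod; its deployment recreates a replacement automatically.
oc -n openshift-monitoring delete pod grafana-857fc848bf-lljwm

# Watch for the replacement pod to come up (assumed app=grafana label).
oc -n openshift-monitoring get pods -l app=grafana -w
```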

FYI: We could not reproduce this on a new cluster.


[1] https://github.com/grafana/grafana/issues/9870

Comment 9 Praveen Thakur 2019-07-30 16:14:50 UTC
Lili, 

The pod has been deleted and the issue still persists.

In fact, the logs provided were captured after restarting the pod.

Comment 11 Lili Cosic 2019-07-31 07:12:23 UTC
Hello,

Can you double-check that you can see the data in both the Alertmanager and Prometheus dashboards, and that the only place you are not seeing it is Grafana? That should help us pin down the problem.

Also, we noticed a lot of "unauthorised access" errors in the grafana proxy logs; were these just failed login attempts? I believe you haven't provided the events and other information we asked for above; that would make it easier to debug.
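For completeness, the events and logs we are asking for can be gathered roughly as below. This is a sketch assuming the default `openshift-monitoring` namespace, the grafana pod name from the output above, and the usual `grafana` and `grafana-proxy` container names.

```shell
# Recent events in the monitoring namespace, newest last.
oc -n openshift-monitoring get events --sort-by=.lastTimestamp

# Logs from both containers of the grafana pod (substitute the current pod name).
oc -n openshift-monitoring logs grafana-857fc848bf-lljwm -c grafana
oc -n openshift-monitoring logs grafana-857fc848bf-lljwm -c grafana-proxy
```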

Thanks!

Comment 16 Lili Cosic 2019-08-01 09:12:48 UTC
Can you make sure that Grafana can reach Prometheus? Exec into the grafana container and try to reach https://prometheus-k8s.openshift-monitoring.svc:9091. The ping binary does not seem to be available, but curl should work.
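A sketch of that connectivity check, assuming the grafana pod name from the `oc get pods` output above; any HTTP status code in the response, even a 401/403 from the auth proxy, shows the service is reachable at the network level.

```shell
# Exec into the grafana container and probe the Prometheus service endpoint.
# -k skips TLS verification since the service uses the cluster-internal CA.
oc -n openshift-monitoring exec grafana-857fc848bf-lljwm -c grafana -- \
  curl -sk -o /dev/null -w '%{http_code}\n' \
  https://prometheus-k8s.openshift-monitoring.svc:9091
```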

If the above works, the other thing we can try is to increase the log verbosity.

Thanks!

Comment 25 Mitchell Rollinson 2019-08-19 22:38:10 UTC
Created attachment 1605901 [details]
UI screenshots of the Grafana and Prometheus target UI - grafana cluster-ui