Description of problem:
- Prometheus shows different monitoring history when refreshing the Grafana dashboard.
- Prometheus also does not appear to honor storage.tsdb.retention: it stores less than 12h of monitoring data instead of the configured 15d:

# curl -sk -H "Authorization: Bearer $(oc whoami -t)" https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091/api/v1/status/flags | python -m json.tool | grep storage
    "storage.remote.flush-deadline": "1m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "true",
    "storage.tsdb.path": "/prometheus",
    "storage.tsdb.retention": "15d",

Version-Release number of selected component (if applicable):
OCP v3.11.69

How reproducible:
Not always. I haven't been able to reproduce this issue in a lab environment.

Steps to Reproduce:
1. N/A

Actual results:
storage.tsdb.retention is not honored: less than 12h of monitoring data is stored instead of the configured 15d, and different retention frames are shown.

Expected results:
storage.tsdb.retention is honored and the same retention frames are shown in Prometheus.

Additional info:
- The monitoring stack does not use persistent storage.
- The Prometheus pods have been deleted. The same issue is seen after the pods have been recreated.
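The curl-plus-grep check above can also be scripted. A minimal sketch, assuming the same /api/v1/status/flags endpoint and the standard Prometheus v1 API envelope (the sample payload below is copied from the flag values in the report, not fetched live):

```python
import json

# Sample response in the shape returned by Prometheus' /api/v1/status/flags
# endpoint; the flag values are the ones quoted in this report.
response = json.loads("""
{
  "status": "success",
  "data": {
    "storage.remote.flush-deadline": "1m",
    "storage.tsdb.max-block-duration": "36h",
    "storage.tsdb.min-block-duration": "2h",
    "storage.tsdb.no-lockfile": "true",
    "storage.tsdb.path": "/prometheus",
    "storage.tsdb.retention": "15d"
  }
}
""")

# Filter to the storage-related flags, mirroring the `grep storage` step.
storage_flags = {k: v for k, v in response["data"].items()
                 if k.startswith("storage.")}

print(storage_flags["storage.tsdb.retention"])  # → 15d
```

This confirms the configured retention flag is 15d; the actual on-disk history can still be shorter, e.g. when the pods run without persistent storage and are restarted.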
That Prometheus setup doesn't have persistent storage configured, so deleting the Prometheus pods deletes the "historic" data. The short retention observed after pod deletion is therefore expected rather than a bug (this would also be the first time we have heard of such an issue, both upstream and in OpenShift). What is the case, however, is that this stack currently does not set session affinity appropriately, so Prometheus's HA model causes inconsistent data to be shown (see the HA model documentation here for further insight: https://github.com/coreos/prometheus-operator/blob/master/Documentation/high-availability.md#prometheus). We have opened https://github.com/openshift/cluster-monitoring-operator/pull/313 to fix the session affinity issue so that graphs are consistent when viewed in Grafana.
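In Kubernetes terms, the fix amounts to enabling ClientIP session affinity on the Service in front of the Prometheus replicas, so a given client keeps being routed to the same replica instead of alternating between replicas with independent TSDBs. A minimal illustrative sketch of such a Service (field values are assumptions for illustration, not the exact manifest changed by the PR):

```yaml
# Illustrative Service with ClientIP session affinity (not the exact
# change from the PR). With sessionAffinity: ClientIP, kube-proxy sends
# all traffic from a given client IP to the same backing pod, so repeated
# Grafana refreshes query one Prometheus replica consistently.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  selector:
    app: prometheus
    prometheus: k8s
  ports:
    - name: web
      port: 9091
      targetPort: web
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # Kubernetes default: keep affinity for 3h
```

Note that session affinity only makes the *displayed* data consistent per client; the replicas' datasets can still differ, which is inherent to the Prometheus HA model described in the linked documentation.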
The PR has been merged, so moving to MODIFIED.
Verified: when refreshing the Grafana UI, there is no longer a significant difference between refreshes; the issue is fixed.

Tested with:
ose-cluster-monitoring-operator-v3.11.105-1
Firefox 52.0.2 (64-bit)
Chrome Version 58.0.3029.81 (64-bit)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0794