Bug 1674270

Summary: PVs for Prometheus pods show different usage

Product: OpenShift Container Platform
Component: Monitoring
Version: 3.11.0
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Reporter: hgomes
Assignee: Frederic Branczyk <fbranczy>
QA Contact: Junqi Zhao <juzhao>
CC: hgomes, kgeorgie, minden, mloibl, spasquie, surbania
Flags: kgeorgie: needinfo-
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-02-27 14:59:02 UTC
Type: Bug
Regression: ---

Comment 1 minden 2019-02-12 12:46:51 UTC
That indeed looks like an inconsistency. Off the top of my head, I can see two scenarios causing this:

1. One of the Prometheus instances was down for some period of time.

2. One Prometheus instance compacts its data differently than the other.

Given that Prometheus scrapes itself, we could verify the former by looking at the `scrape_samples_scraped` metric. Would you mind querying this metric in the Prometheus UI and looking for anomalies over a long time range?
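A minimal sketch of that check over the HTTP API (the localhost URL assumes a port-forward to the pod, e.g. `oc port-forward prometheus-k8s-0 9090 -n openshift-monitoring`; the `job` label is an assumption and may be named differently in your deployment):

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

# Assumed: Prometheus reachable locally via a port-forward.
PROM_URL = "http://localhost:9090"

end = datetime.utcnow()
start = end - timedelta(days=7)
step = 300  # seconds

params = urllib.parse.urlencode({
    # Assumed job label; adjust to whatever /targets shows for the self-scrape.
    "query": 'scrape_samples_scraped{job="prometheus"}',
    "start": start.timestamp(),
    "end": end.timestamp(),
    "step": step,
})
with urllib.request.urlopen(f"{PROM_URL}/api/v1/query_range?{params}") as resp:
    series_list = json.load(resp)["data"]["result"]

# A hole in the returned samples larger than the step means the instance
# produced no data for that window, i.e. it was likely down (scenario 1).
for series in series_list:
    values = series["values"]
    for (t1, _), (t2, _) in zip(values, values[1:]):
        if t2 - t1 > 2 * step:
            print(series["metric"], "gap:",
                  datetime.utcfromtimestamp(t1), "->",
                  datetime.utcfromtimestamp(t2))
```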

I am also adding Krasi here. He is our Prometheus time series database expert. Krasi: Have you seen similar reports?

Comment 2 Krasi 2019-02-12 12:56:33 UTC
Nope, I haven't had such reports so far.

I would say there must be some difference somewhere: retention policy, compaction, recording rules, etc.

Compare the /config, /rules, /targets, and /flags HTTP endpoints on both instances and let us know if you see any difference there.
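A minimal sketch of that comparison using the JSON counterparts of those pages (pod names and local ports are illustrative and assume an `oc port-forward` to each replica; `/api/v1/rules` needs a reasonably recent Prometheus 2.x, and `/api/v1/targets` output embeds scrape timestamps, so some diff noise there is expected):

```python
import difflib
import urllib.request

# Assumed: each Prometheus replica port-forwarded to its own local port.
PODS = {
    "prometheus-k8s-0": "http://localhost:9090",
    "prometheus-k8s-1": "http://localhost:9091",
}

# JSON API counterparts of the /config, /flags, /rules and /targets UI pages.
ENDPOINTS = [
    "/api/v1/status/config",
    "/api/v1/status/flags",
    "/api/v1/rules",
    "/api/v1/targets",
]

for endpoint in ENDPOINTS:
    bodies = {}
    for pod, base in PODS.items():
        with urllib.request.urlopen(base + endpoint) as resp:
            bodies[pod] = resp.read().decode().splitlines()
    a, b = bodies.values()
    diff = list(difflib.unified_diff(a, b, *PODS.keys(), lineterm=""))
    print(endpoint, "identical" if not diff else "DIFFERS")
    print("\n".join(diff[:40]))  # first few differing lines, if any
```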