Bug 1751602 - [4.1.z]monitoring reports Degraded after scaling up/down prometheus-k8s pods which bound with PVs
Summary: [4.1.z]monitoring reports Degraded after scaling up/down prometheus-k8s pods which bound with PVs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard: wip
Depends On:
Blocks: 1751607
 
Reported: 2019-09-12 08:54 UTC by Junqi Zhao
Modified: 2019-10-16 08:47 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1751607
Environment:
Last Closed: 2019-10-16 08:47:54 UTC
Target Upstream Version:
Embargoed:


Attachments
ClusterMonitoringOperatorErrors alert (78.20 KB, image/png), 2019-09-12 08:55 UTC, Junqi Zhao
monitoring dump (367.84 KB, application/gzip), 2019-09-12 08:56 UTC, Junqi Zhao
4.3 monitoring dump (627.01 KB, application/gzip), 2019-10-16 06:43 UTC, Junqi Zhao


Links
Github coreos/prometheus-operator pull 2801 (closed): Improve detection of StatefulSet changes (last updated 2020-06-25 14:39:05 UTC)
Github openshift/prometheus-operator pull 40 (closed): Bug 1751602: Fix StatefulSet reconciliation (last updated 2020-06-25 14:39:05 UTC)

Comment 1 Junqi Zhao 2019-09-12 08:55:59 UTC
Created attachment 1614388
ClusterMonitoringOperatorErrors alert

Comment 2 Junqi Zhao 2019-09-12 08:56:26 UTC
Created attachment 1614389
monitoring dump

Comment 11 Junqi Zhao 2019-10-16 06:43:50 UTC
Created attachment 1626292
4.3 monitoring dump

Comment 12 Pawel Krupa 2019-10-16 08:17:32 UTC
We don't want to delete PVs after a StatefulSet is scaled down, as this might lead to data loss. Essentially, we want the setup to stay healthy even after a user does something unsupported (manually scaling the StatefulSet); to that end, the operator ensures that the number of replicas in the StatefulSet stays equal to what is specified in the Prometheus CR. From what I can see, that is exactly what happened, and it is expected.

In short: manually scaling the prometheus or alertmanager StatefulSet is not supported and might leave some artifacts behind. However, it should not affect the cluster, and the number of pods should be the same before and after the manual scaling.
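
To illustrate the idea, here is a minimal client-go sketch (not the operator's actual code; the real logic lives in the prometheus-operator StatefulSet sync loop linked above, and "desired" here stands in for the Prometheus CR's spec.replicas):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// In the operator this value comes from the Prometheus CR's
	// spec.replicas; hard-coded here for illustration (2 in this bug).
	var desired int32 = 2

	sts, err := client.AppsV1().StatefulSets("openshift-monitoring").
		Get(context.TODO(), "prometheus-k8s", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// If someone scaled the StatefulSet by hand, put it back.
	if sts.Spec.Replicas == nil || *sts.Spec.Replicas != desired {
		sts.Spec.Replicas = &desired
		if _, err := client.AppsV1().StatefulSets("openshift-monitoring").
			Update(context.TODO(), sts, metav1.UpdateOptions{}); err != nil {
			panic(err)
		}
		fmt.Printf("reset prometheus-k8s to %d replicas\n", desired)
	}
}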

Comment 15 Pawel Krupa 2019-10-16 08:47:54 UTC
> We scaled up statefulset prometheus-k8s to 3, not scale down.

Yes, but the operator then needs to react to that and immediately scale it back down to the number of replicas specified in the Prometheus CR (2 in this case), which is exactly what happens.
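
As a way to observe this, here is a minimal client-go sketch (hypothetical verification code, not part of the operator) that watches the StatefulSet while you scale it by hand; spec.replicas should briefly read 3 and then be reverted to 2:

package main

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch only the prometheus-k8s StatefulSet.
	w, err := client.AppsV1().StatefulSets("openshift-monitoring").Watch(
		context.TODO(),
		metav1.ListOptions{FieldSelector: "metadata.name=prometheus-k8s"},
	)
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	// While this runs, do in another terminal:
	//   oc scale statefulset prometheus-k8s --replicas=3 -n openshift-monitoring
	// and watch the operator revert spec.replicas to the CR value.
	for event := range w.ResultChan() {
		sts, ok := event.Object.(*appsv1.StatefulSet)
		if !ok || sts.Spec.Replicas == nil {
			continue
		}
		fmt.Printf("%s: spec.replicas=%d ready=%d\n",
			event.Type, *sts.Spec.Replicas, sts.Status.ReadyReplicas)
	}
}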

Closing as NOTABUG.