Bug 1751602 - [4.1.z]monitoring reports Degraded after scaling up/down prometheus-k8s pods which bound with PVs
Summary: [4.1.z]monitoring reports Degraded after scaling up/down prometheus-k8s pods which bound with PVs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard: wip
Depends On:
Blocks: 1751607
 
Reported: 2019-09-12 08:54 UTC by Junqi Zhao
Modified: 2019-10-16 08:47 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1751607
Environment:
Last Closed: 2019-10-16 08:47:54 UTC
Target Upstream Version:
Embargoed:


Attachments
ClusterMonitoringOperatorErrors alert (78.20 KB, image/png), 2019-09-12 08:55 UTC, Junqi Zhao
monitoring dump (367.84 KB, application/gzip), 2019-09-12 08:56 UTC, Junqi Zhao
4.3 monitoring dump (627.01 KB, application/gzip), 2019-10-16 06:43 UTC, Junqi Zhao


Links
Github coreos/prometheus-operator pull 2801 (closed): Improve detection of StatefulSet changes (last updated 2020-06-25 14:39:05 UTC)
Github openshift/prometheus-operator pull 40 (closed): Bug 1751602: Fix StatefulSet reconciliation (last updated 2020-06-25 14:39:05 UTC)

Comment 1 Junqi Zhao 2019-09-12 08:55:59 UTC
Created attachment 1614388
ClusterMonitoringOperatorErrors alert

Comment 2 Junqi Zhao 2019-09-12 08:56:26 UTC
Created attachment 1614389
monitoring dump

Comment 11 Junqi Zhao 2019-10-16 06:43:50 UTC
Created attachment 1626292
4.3 monitoring dump

Comment 12 Pawel Krupa 2019-10-16 08:17:32 UTC
We don't want to delete PVs after a StatefulSet is scaled down, as this might lead to data loss. Essentially, we want the setup to stay healthy even after a user does something unsupported (manually scaling the StatefulSet); to that end, the operator ensures that the number of replicas in the StatefulSet stays equal to what is specified in the Prometheus CR. From what I can see, that is exactly what happened, and it is expected.

In short: manually scaling the prometheus or alertmanager StatefulSet is not supported and might leave some artifacts behind. However, it should not affect the cluster, and the number of pods should be the same before and after the manual scaling.
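
To illustrate the idea, here is a minimal client-go sketch (not the operator's actual code; the real logic lives in the prometheus-operator StatefulSet sync loop linked above, and "desired" here stands in for the Prometheus CR's spec.replicas):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// In the operator this value comes from the Prometheus CR's
	// spec.replicas; hard-coded here for illustration (2 in this bug).
	var desired int32 = 2

	sts, err := client.AppsV1().StatefulSets("openshift-monitoring").
		Get(context.TODO(), "prometheus-k8s", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// If someone scaled the StatefulSet by hand, put it back.
	if sts.Spec.Replicas == nil || *sts.Spec.Replicas != desired {
		sts.Spec.Replicas = &desired
		if _, err := client.AppsV1().StatefulSets("openshift-monitoring").
			Update(context.TODO(), sts, metav1.UpdateOptions{}); err != nil {
			panic(err)
		}
		fmt.Printf("reset prometheus-k8s to %d replicas\n", desired)
	}
}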

Comment 15 Pawel Krupa 2019-10-16 08:47:54 UTC
> We scaled up statefulset prometheus-k8s to 3, not scale down.

Yes, but the operator then needs to react to that and immediately scale it back down to the number of replicas specified in the Prometheus CR (2 in this case), which is exactly what happens.
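
As a way to observe this, here is a minimal client-go sketch (hypothetical verification code, not part of the operator) that watches the StatefulSet while you scale it by hand; spec.replicas should briefly read 3 and then be reverted to 2:

package main

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch only the prometheus-k8s StatefulSet.
	w, err := client.AppsV1().StatefulSets("openshift-monitoring").Watch(
		context.TODO(),
		metav1.ListOptions{FieldSelector: "metadata.name=prometheus-k8s"},
	)
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	// While this runs, do in another terminal:
	//   oc scale statefulset prometheus-k8s --replicas=3 -n openshift-monitoring
	// and watch the operator revert spec.replicas to the CR value.
	for event := range w.ResultChan() {
		sts, ok := event.Object.(*appsv1.StatefulSet)
		if !ok || sts.Spec.Replicas == nil {
			continue
		}
		fmt.Printf("%s: spec.replicas=%d ready=%d\n",
			event.Type, *sts.Spec.Replicas, sts.Status.ReadyReplicas)
	}
}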

Closing as NOTABUG.