Bug 1703032

Summary: etcd monitoring configuration is completely reset when performing a minor upgrade from v3.11.88 to v3.11.98
Product: OpenShift Container Platform
Reporter: K Chandra Sekar <csekar>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: adeshpan, anpicker, cvogel, dcaldwel, erooth, fbranczy, gferrazs, jharding, juzhao, kelly.brown1, kgeorgie, lserven, mloibl, mmariyan, nnosenzo, ocasalsa, pkrupa, spasquie, surbania
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: groom
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the Cluster Monitoring Operator (CMO) playbook resets the CMO ConfigMap every time it is executed.
Consequence: manual changes to the ConfigMap that enable etcd monitoring are lost.
Fix: etcd monitoring can be configured with Ansible.
Result: the etcd monitoring configuration persists when the CMO playbook is executed again.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-20 00:12:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                        Flags
etcd is down after upgrade         none
etcd is still up after upgrade     none

Description K Chandra Sekar 2019-04-25 10:46:29 UTC
Description of problem:

etcd monitoring configuration is completely reset when performing a minor upgrade from v3.11.88 to v3.11.98.
etcd monitoring is not enabled by default when the OpenShift Monitoring stack is set up, so we followed the guide [1] to configure it. After etcd monitoring was set up successfully, upgrading the cluster to a minor version removed the whole etcd monitoring setup: the configuration reverted to the default OpenShift Monitoring stack, and as a result all etcd targets are shown as down. Minor upgrades should not do this, because etcd is a critical component that requires continuous monitoring.


How reproducible: Always


Steps to Reproduce:
1. Set up an OpenShift Monitoring stack on OpenShift v3.11.
2. Set up etcd monitoring as described in the guide [1].
3. Upgrade the whole cluster to a minor version. The OpenShift Monitoring stack is reverted to its original state and the etcd configuration is missing.

Actual results:

After a minor cluster upgrade, the whole etcd monitoring setup disappears and the configuration reverts to the default OpenShift Monitoring stack; as a result, all etcd targets are shown as down.


Expected results:

Minor upgrades should not do this, because etcd is a critical component that requires continuous monitoring. Minor cluster upgrades should preserve the configuration when moving to the next version, unless major breaking changes are involved.

Additional info:
[1]- https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html#configuring-etcd-monitoring

Comment 1 Frederic Branczyk 2019-04-25 11:58:19 UTC
Yes, I can see how this happens; this is indeed a bug. As a workaround for now, you can reapply the configuration without any issue and you should get back into the expected state. Of course, that's not how it should be, but it gives the customer a way to move forward in the immediate situation until we fix this. This needs a fix in the OpenShift Ansible playbooks.
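The workaround above amounts to putting the etcd section back into the CMO configuration after each upgrade. A minimal Python sketch of that idea, assuming the `config.yaml` payload of the ConfigMap has already been parsed into a dict. The helper names `etcd_monitoring_configured` and `reapply_etcd_config` are hypothetical, and the selector labels are modeled on the etcd monitoring guide [1]; check them against that guide rather than treating this sketch as the actual fix.

```python
import copy

# Hypothetical etcd section, modeled on the layout in the OpenShift 3.11
# "Configuring etcd monitoring" guide; verify the keys against the guide.
ETCD_SECTION = {
    "targets": {
        "selector": {
            "openshift.io/component": "etcd",
            "openshift.io/control-plane": "true",
        }
    }
}


def etcd_monitoring_configured(config: dict) -> bool:
    """Return True if the parsed config.yaml has an etcd section with targets."""
    etcd = config.get("etcd") or {}
    return bool(etcd.get("targets"))


def reapply_etcd_config(config: dict, etcd_section: dict = ETCD_SECTION) -> dict:
    """Return a copy of config with the etcd section restored.

    The input dict is not modified, and the result is the same no matter
    how many times the function is applied (for example after every upgrade).
    """
    restored = copy.deepcopy(config)
    restored["etcd"] = copy.deepcopy(etcd_section)
    return restored


if __name__ == "__main__":
    # Illustrative config after the playbook reset it: other settings remain,
    # but the etcd section is gone.
    after_upgrade = {"prometheusK8s": {"retention": "15d"}}
    print(etcd_monitoring_configured(after_upgrade))  # False
    fixed = reapply_etcd_config(after_upgrade)
    print(etcd_monitoring_configured(fixed))  # True
```

Returning a copy instead of mutating the input keeps the check-then-restore step safe to run repeatedly; the fix shipped in the advisory instead moves the etcd settings into Ansible so the playbook no longer drops them.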

Comment 10 Simon Pasquier 2019-11-18 11:23:47 UTC
*** Bug 1772948 has been marked as a duplicate of this bug. ***

Comment 11 Simon Pasquier 2019-11-18 11:25:10 UTC
*** Bug 1772729 has been marked as a duplicate of this bug. ***

Comment 12 Simon Pasquier 2020-01-13 09:20:17 UTC
*** Bug 1748871 has been marked as a duplicate of this bug. ***

Comment 16 Junqi Zhao 2020-02-10 16:12:47 UTC
Created attachment 1662196
etcd is down after upgrade

Comment 28 Junqi Zhao 2020-02-27 13:51:43 UTC
Created attachment 1666211
etcd is still up after upgrade

Comment 31 errata-xmlrpc 2020-03-20 00:12:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0793

Comment 32 Pawel Krupa 2020-05-28 10:35:06 UTC
*** Bug 1839179 has been marked as a duplicate of this bug. ***