Bug 1804067 - etcd-member-add.sh should check at the start whether the host has already been added to the etcd cluster.
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Suresh Kolichala
QA Contact: ge liu
Reported: 2020-02-18 07:21 UTC by ggore
Modified: 2023-10-06 19:13 UTC

Doc Type: Enhancement
Doc Text:
Feature: Prevent the user from running etcd-member-add.sh on the wrong master node. Reason: If the script is run on the wrong member, the etcd cluster could lose quorum. Result: With the additional check, the script exits if etcd is already running on the given master node.
Last Closed: 2020-06-17 20:27:11 UTC


Links
Red Hat Product Errata: RHBA-2020:2436 (last updated 2020-06-17 20:28:20 UTC)

Description ggore 2020-02-18 07:21:41 UTC
Description of problem:

If etcd-member-add.sh is run on an existing etcd member, the etcd pod is removed and the etcd cluster goes down.
# https://docs.openshift.com/container-platform/4.2/backup_and_restore/replacing-failed-master.html#restore-add-master_replacing-failed-master-host

Although the above doc says "You must run this procedure on the master host that is being added to the etcd cluster", the script should check at the very beginning whether the host has already been added to the etcd cluster.
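
A minimal sketch of such an up-front check, assuming the etcd member runs in a container whose name matches "etcd-member" and that crictl is available on the host. The function names and the CRICTL override variable are hypothetical illustrations, not the code that shipped in RHBA-2020:2436:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard for a script like etcd-member-add.sh.
# Illustrative sketch only, not the shipped fix.
# Assumption: the etcd member runs in a container whose name matches
# "etcd-member" and which is visible to crictl on this host.

# CRI client to query; overridable (e.g. with a stub when testing).
CRICTL="${CRICTL:-crictl}"

# Prints the IDs of running etcd-member containers on this host.
# "crictl ps" lists running containers by default, --name filters by a
# name regex, and -q prints only container IDs.
# Exits non-zero if the CRI client cannot be queried.
running_etcd_containers() {
  "${CRICTL}" ps --name 'etcd-member' -q
}

# Returns 0 if it looks safe to add this host as a new etcd member,
# 1 if etcd already appears to be running here,
# 2 if the container state could not be determined.
etcd_member_add_preflight() {
  local ids
  if ! ids="$(running_etcd_containers)"; then
    echo "ERROR: cannot list containers with '${CRICTL}'; refusing to continue." >&2
    return 2
  fi
  if [ -n "${ids}" ]; then
    echo "ERROR: etcd is already running on ${HOSTNAME:-this host}." >&2
    echo "Run this script only on the master host that is being added to the etcd cluster." >&2
    return 1
  fi
  return 0
}
```

A script could call this first, before it makes any changes, for example with "etcd_member_add_preflight || exit $?". The check fails closed: if the container runtime cannot be queried, it refuses to continue rather than assuming the host is safe to modify.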

Comment 1 Masaki Hatada 2020-02-18 08:45:14 UTC
>If etcd-member-add.sh is ran on an existing etcd member then etcd pod is removed and etcd cluster goes down.
># https://docs.openshift.com/container-platform/4.2/backup_and_restore/replacing-failed-master.html#restore-add-master_replacing-failed-master-host
>
>Although above doc says "You must run this procedure on the master host that is being added to the etcd cluster", script should check whether the host hasn't been added to etcd cluster already at the beginning itself.

Several of our customers have broken their clusters because of this.
(It did not become a big problem, since every case happened in a test environment.)

Although we know that the warning is in the OCP 4 manual, people sometimes make mistakes.
Running the etcd-*.sh scripts is very risky, so we think they should include some logic to prevent misoperation.

We hope that Red Hat will consider improving this.

Comment 4 Sam Batschelet 2020-03-11 19:46:05 UTC
This should be fixed in 4.3.z.

Comment 7 Michal Fojtik 2020-05-12 10:53:32 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

As such, we're marking this bug as "LifecycleStale".

If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Comment 9 Michal Fojtik 2020-05-14 11:48:22 UTC
Thanks! Moving this back to backlog.

Comment 20 errata-xmlrpc 2020-06-17 20:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2436

