Bug 1734554

Summary: Create a script to remove a failed etcd member and to allow it to be replaced
Product: OpenShift Container Platform Reporter: Suresh Kolichala <skolicha>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: medium Docs Contact:
Priority: high    
Version: 4.1.0 CC: ahoffer, mfojtik, sbatsche, xtian
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:34:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Suresh Kolichala 2019-07-30 20:53:03 UTC
Description of problem:
Currently, 4.1 covers various Disaster Recovery (DR) scenarios, with docs and scripts describing the recovery processes.

As a more general admin action, we want to provide a script that removes a failed etcd member and allows it to be replaced while the cluster is still running. The script would assume that the TLS certs already exist.

Version-Release number of selected component (if applicable):


How reproducible:
This is a request for a new script to remove and replace one of the etcd members.

Comment 1 Suresh Kolichala 2019-07-30 21:13:04 UTC
Sam adds in a personal communication:
The idea is to provide something like the following.

$ ./etcd-member-remove.sh $name

$ ./etcd-member-add.sh $peer-urls
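
For illustration only, here is a rough sketch of what such wrappers might do with the etcdctl v3 commands `member list`, `member remove`, and `member add`. It is not the merged scripts. The function names are hypothetical, and it assumes `ETCDCTL_API=3` plus the endpoint and TLS settings (for example `ETCDCTL_ENDPOINTS`, `ETCDCTL_CACERT`, `ETCDCTL_CERT`, `ETCDCTL_KEY`) are already exported, in line with the assumption that the certs exist.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the scripts that were merged for this bug.
# Assumes etcdctl v3 is on PATH and already configured through ETCDCTL_* variables.

# Print the ID of the member named $1. Reads the default comma-separated output
# of `etcdctl member list` on stdin (fields: ID, status, name, peer URLs, ...).
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Remove the member named $1 from the running cluster.
# `etcdctl member remove` expects the member ID, so the name is resolved first.
remove_member_by_name() {
  local id
  id="$(etcdctl member list | member_id_by_name "$1")"
  if [ -z "$id" ]; then
    echo "no etcd member named $1" >&2
    return 1
  fi
  etcdctl member remove "$id"
}

# Register a replacement member named $1 with comma-separated peer URLs $2.
add_member() {
  etcdctl member add "$1" --peer-urls="$2"
}
```

With these definitions, a replacement would roughly be `remove_member_by_name <name>` followed by `add_member <name> <peer-urls>`, after which the new etcd process still has to be started on the replacement host.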

Comment 5 ge liu 2019-09-03 10:05:02 UTC
Hello Sam, are the scripts ready for testing? If so, I would very much like to test them. Thanks.

Comment 6 Sam Batschelet 2019-09-03 11:47:23 UTC
Ge,

Yes, member remove [1] and member add [2] have merged.

[1] https://github.com/openshift/machine-config-operator/pull/1056
[2] https://github.com/openshift/machine-config-operator/pull/1073

Comment 8 ge liu 2019-09-05 08:27:16 UTC
The scripts are in the 4.2 payload and I have tested them. I filed a separate bug to track an issue with the script itself:
https://bugzilla.redhat.com/show_bug.cgi?id=1748798

Comment 9 errata-xmlrpc 2019-10-16 06:34:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922