Description of problem:
Several Disaster Recovery (DR) scenarios are covered in 4.1, with docs and scripts describing the recovery processes. As a more general admin action, we want to provide a script that removes a failed etcd member and allows us to replace it while the cluster is still running. This script would assume that TLS certs already exist.
Version-Release number of selected component (if applicable):
How reproducible:
This is a request for a new script to remove/replace one of the etcd members.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Sam adds in a personal communication: The idea is to provide something like the following.
$ ./etcd-member-remove.sh $name
$ ./etcd-member-add.sh $peer-urls
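For readers wondering what such scripts might do internally, here is a minimal sketch built on the etcdctl v3 `member list`, `member remove`, and `member add` commands. It is not the merged implementation. The function names (`member_id_by_name`, `remove_member`, `add_member`) are hypothetical, and the sketch assumes that etcdctl finds the existing TLS certs and endpoints through its usual `ETCDCTL_CACERT`, `ETCDCTL_CERT`, `ETCDCTL_KEY`, and `ETCDCTL_ENDPOINTS` environment variables.

```shell
#!/bin/sh
# Hypothetical sketch only; NOT the merged etcd-member-remove.sh / etcd-member-add.sh.
# Assumption: TLS certs and endpoints are already exported for etcdctl
# (ETCDCTL_CACERT, ETCDCTL_CERT, ETCDCTL_KEY, ETCDCTL_ENDPOINTS).
export ETCDCTL_API=3

# Print the ID of the member whose name equals $1.
# Reads the default (simple) `etcdctl member list` output on stdin, e.g.:
#   8211f1d0f64f3269, started, etcd-0, https://10.0.0.1:2380, https://10.0.0.1:2379
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Remove the named (failed) member while the rest of the cluster keeps running.
remove_member() {
  name="$1"
  id="$(etcdctl member list | member_id_by_name "$name")"
  if [ -z "$id" ]; then
    echo "no etcd member named '$name'" >&2
    return 1
  fi
  etcdctl member remove "$id"
}

# Register a replacement member under the given name and peer URLs.
add_member() {
  name="$1"
  peer_urls="$2"
  etcdctl member add "$name" --peer-urls="$peer_urls"
}
```

With these functions sourced, `remove_member etcd-1` followed by `add_member etcd-1 https://10.0.0.2:2380` would mirror the two-step flow proposed above (the member name and URL here are placeholders). Looking up the member ID by name keeps the admin from having to copy hexadecimal IDs by hand.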
Hello Sam, are the scripts ready for testing? If so, I am very interested in testing them. Thanks.
Ge, yes, member remove [1] and member add [2] have merged.
[1] https://github.com/openshift/machine-config-operator/pull/1056
[2] https://github.com/openshift/machine-config-operator/pull/1073
The scripts are available in the 4.2 payload, and I have tested them. I filed a separate bug to track an issue with the script itself: https://bugzilla.redhat.com/show_bug.cgi?id=1748798
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922