Verified in OCP 4.5.0-0.nightly-2020-04-24-091134 on top of RHOS-16.1-RHEL-8-20210311.n.1 with OVN-Octavia.
The new master is created successfully, with a different port name:
$ openstack port list -c Name -f value| grep master
ostest-pz7zk-master-port-1
ostest-pz7zk-master-3
ostest-pz7zk-master-port-0
Procedure: Replacing an unhealthy etcd member whose machine is not running or whose node is not ready:
1. Remove the master-2 machine (ostest-pz7zk-master-2):
$ oc -n openshift-machine-api get machines
NAME                        PHASE     TYPE        REGION      ZONE   AGE
ostest-pz7zk-master-0       Running   m4.xlarge   regionOne   nova   33m
ostest-pz7zk-master-1       Running   m4.xlarge   regionOne   nova   33m
ostest-pz7zk-master-2       Running   m4.xlarge   regionOne   nova   33m
ostest-pz7zk-worker-bmpb2   Running   m4.xlarge   regionOne   nova   18m
ostest-pz7zk-worker-swgx5   Running   m4.xlarge   regionOne   nova   18m
ostest-pz7zk-worker-vfbqn   Running   m4.xlarge   regionOne   nova   18m
$ oc -n openshift-machine-api delete machine ostest-pz7zk-master-2
2. Remove the etcd member (ostest-pz7zk-master-2):
$ oc rsh -n openshift-etcd etcd-ostest-pz7zk-master-0
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-ostest-pz7zk-master-0 -n openshift-etcd' to see all of the containers in this pod.
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| 5c13780e866b473b | started | ostest-pz7zk-master-1 |   https://10.196.1.4:2380 |   https://10.196.1.4:2379 |      false |
| 794465c1dc67a32b | started | ostest-pz7zk-master-0 | https://10.196.1.227:2380 | https://10.196.1.227:2379 |      false |
| 8f899f0814a46849 | started | ostest-pz7zk-master-2 | https://10.196.2.247:2380 | https://10.196.2.247:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
sh-4.4# etcdctl member remove 8f899f0814a46849
sh-4.4# exit
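The member ID passed to `etcdctl member remove` was read off the table by hand. A minimal sketch of doing that lookup with awk, assuming the table layout shown above (the function name `etcd_member_id` is hypothetical, not part of etcdctl):

```shell
# Hypothetical helper: print the ID of the etcd member with the given name.
# $1:    member name, e.g. ostest-pz7zk-master-2
# stdin: output of `etcdctl member list -w table`
etcd_member_id() {
  awk -F'|' -v name="$1" '
    # Data and header rows contain pipes; border rows (+----+) do not.
    NF > 3 {
      id = $2; member = $4
      gsub(/ /, "", id); gsub(/ /, "", member)
      if (member == name) print id
    }
  '
}
```

Inside the etcdctl container this could be used as `etcdctl member list -w table | etcd_member_id ostest-pz7zk-master-2`, provided the function has been defined in that shell first.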
3. Remove the etcd secrets of the removed master (ostest-pz7zk-master-2):
$ oc -n openshift-etcd get secrets | grep ostest-pz7zk-master-2
etcd-peer-ostest-pz7zk-master-2              kubernetes.io/tls   2      31m
etcd-serving-metrics-ostest-pz7zk-master-2   kubernetes.io/tls   2      31m
etcd-serving-ostest-pz7zk-master-2           kubernetes.io/tls   2      31m
$ oc -n openshift-etcd delete secret etcd-peer-ostest-pz7zk-master-2 etcd-serving-metrics-ostest-pz7zk-master-2 etcd-serving-ostest-pz7zk-master-2
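The three secret names follow one pattern: a fixed prefix plus the node name. A small sketch that generates them, assuming only the three prefixes seen in the listing above (the function name `etcd_secret_names` is hypothetical):

```shell
# Hypothetical helper: print the etcd secret names for one master node.
# $1: node name, e.g. ostest-pz7zk-master-2
etcd_secret_names() {
  # Prefixes taken from the `oc get secrets` output in this step.
  for prefix in etcd-peer etcd-serving-metrics etcd-serving; do
    printf '%s-%s\n' "$prefix" "$1"
  done
}
```

The delete command could then be written as `oc -n openshift-etcd delete secret $(etcd_secret_names ostest-pz7zk-master-2)`.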
4. Create new master machine:
$ oc get machine ostest-pz7zk-master-0 -n openshift-machine-api -o yaml > new-master-machine.yaml
Edit new-master-machine.yaml: remove the entire status section and the metadata annotations, then change the metadata.name field to a new name (ostest-pz7zk-master-3).
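The edit was done by hand in this verification. As a rough sketch, the same three changes could be scripted with awk, assuming the two-space indentation that `oc get -o yaml` normally prints; the function name `clean_machine_manifest` is hypothetical, and the output should still be reviewed before applying it:

```shell
# Hypothetical helper: edit a Machine manifest as described in this step.
# $1:     new machine name, e.g. ostest-pz7zk-master-3
# stdin:  YAML from `oc get machine ... -o yaml`
# stdout: manifest without status and metadata.annotations, with the new name
clean_machine_manifest() {
  awk -v new_name="$1" '
    # A line starting in column 1 opens a new top-level section.
    /^[^ #]/ { top = $1; in_annotations = 0 }
    # Drop the whole status section.
    top == "status:" { next }
    # Inside metadata, drop the annotations key and its indented children.
    top == "metadata:" && /^  annotations:/ { in_annotations = 1; next }
    in_annotations && /^    / { next }
    { in_annotations = 0 }
    # Rename the machine (metadata.name sits at two-space indentation).
    top == "metadata:" && /^  name:/ { print "  name: " new_name; next }
    { print }
  '
}
```

For example: `oc get machine ostest-pz7zk-master-0 -n openshift-machine-api -o yaml | clean_machine_manifest ostest-pz7zk-master-3 > new-master-machine.yaml`.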
$ oc apply -f new-master-machine.yaml
machine.machine.openshift.io/ostest-pz7zk-master-3 created
$ oc get machines -A
NAMESPACE               NAME                        PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-pz7zk-master-0       Running   m4.xlarge   regionOne   nova   158m
openshift-machine-api   ostest-pz7zk-master-1       Running   m4.xlarge   regionOne   nova   158m
openshift-machine-api   ostest-pz7zk-master-3       Running   m4.xlarge   regionOne   nova   13m
openshift-machine-api   ostest-pz7zk-worker-bmpb2   Running   m4.xlarge   regionOne   nova   143m
openshift-machine-api   ostest-pz7zk-worker-swgx5   Running   m4.xlarge   regionOne   nova   143m
openshift-machine-api   ostest-pz7zk-worker-vfbqn   Running   m4.xlarge   regionOne   nova   143m
$ openstack port list | grep master
| a9d7e427-6c05-45cd-b7bf-b619afa04611 | ostest-pz7zk-master-port-1 | fa:16:3e:7d:5d:a9 | ip_address='10.196.1.4', subnet_id='0bc1dbe7-88d2-497d-8d9f-e4d55417c349' | ACTIVE |
| af2ca250-0bd7-4c16-bc6c-72d5f5710f37 | ostest-pz7zk-master-3 | fa:16:3e:42:32:02 | ip_address='10.196.1.208', subnet_id='0bc1dbe7-88d2-497d-8d9f-e4d55417c349' | ACTIVE |
| e90c91fb-32d4-49ad-a260-b5029baa342a | ostest-pz7zk-master-port-0 | fa:16:3e:f5:6d:ee | ip_address='10.196.1.227', subnet_id='0bc1dbe7-88d2-497d-8d9f-e4d55417c349' | ACTIVE |
$ oc get nodes
NAME                        STATUS   ROLES    AGE     VERSION
ostest-pz7zk-master-0       Ready    master   157m    v1.18.3+cdb0358
ostest-pz7zk-master-1       Ready    master   157m    v1.18.3+cdb0358
ostest-pz7zk-master-3       Ready    master   5m22s   v1.18.3+cdb0358
ostest-pz7zk-worker-bmpb2   Ready    worker   134m    v1.18.3+cdb0358
ostest-pz7zk-worker-swgx5   Ready    worker   134m    v1.18.3+cdb0358
ostest-pz7zk-worker-vfbqn   Ready    worker   135m    v1.18.3+cdb0358
$ oc rsh -n openshift-etcd etcd-ostest-pz7zk-master-0
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| 463e8f41186ae6db | started | ostest-pz7zk-master-3 | https://10.196.1.208:2380 | https://10.196.1.208:2379 |      false |
| 5c13780e866b473b | started | ostest-pz7zk-master-1 |   https://10.196.1.4:2380 |   https://10.196.1.4:2379 |      false |
| 794465c1dc67a32b | started | ostest-pz7zk-master-0 | https://10.196.1.227:2380 | https://10.196.1.227:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
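The final check is that every member, including the new ostest-pz7zk-master-3, is started and is not a learner. A sketch of that check over the table output, again assuming the column layout shown above (the function name `etcd_members_not_ready` is hypothetical):

```shell
# Hypothetical helper: print the IDs of members that are not fully joined.
# stdin: output of `etcdctl member list -w table`
# Prints nothing when every member is "started" and IS LEARNER is "false".
etcd_members_not_ready() {
  awk -F'|' '
    # Rows with all six columns split into 8 fields (empty first and last).
    NF > 6 {
      id = $2; status = $3; learner = $7
      gsub(/ /, "", id); gsub(/ /, "", status); gsub(/ /, "", learner)
      if (id == "ID") next                      # skip the header row
      if (status != "started" || learner != "false") print id
    }
  '
}
```

Empty output for the table above would match what was observed here: three started, non-learner members.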
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.5.37 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:1015