Verified on OCP 4.6.0-0.nightly-2021-03-21-131139 on top of OSP 13.0.14 (2021-01-20.1) using OVS and the amphora provider. The new master is successfully created with a different port name:

$ openstack port list -c Name -f value | grep master
ostest-snksz-master-port-0
ostest-snksz-master-port-1
ostest-snksz-master-3

Procedure: Replacing an unhealthy etcd member whose machine is not running or whose node is not ready:

1. Create the new master manifest:

$ cat new-master-machine.yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: ostest-snksz-master-3
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: openstackproviderconfig.openshift.io/v1alpha1
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m4.xlarge
      image: ostest-snksz-rhcos
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: ostest-snksz-nodes
            tags: openshiftClusterID=ostest-snksz
      securityGroups:
      - filter: {}
        name: ostest-snksz-master
      serverGroupName: ostest-snksz-master
      serverMetadata:
        Name: ostest-snksz-master
        openshiftClusterID: ostest-snksz
      tags:
      - openshiftClusterID=ostest-snksz
      trunk: true
      userDataSecret:
        name: master-user-data

2. Remove master-2:

$ oc -n openshift-machine-api get machines
NAME                          PHASE     TYPE        REGION      ZONE   AGE
ostest-snksz-master-0         Running   m4.xlarge   regionOne   nova   43m
ostest-snksz-master-1         Running   m4.xlarge   regionOne   nova   43m
ostest-snksz-master-2         Running   m4.xlarge   regionOne   nova   43m
ostest-snksz-worker-0-9c4rw   Running   m4.xlarge   regionOne   nova   33m
ostest-snksz-worker-0-pjj2x   Running   m4.xlarge   regionOne   nova   33m
ostest-snksz-worker-0-xvlvb   Running   m4.xlarge   regionOne   nova   33m

$ oc -n openshift-machine-api delete machine ostest-snksz-master-2

3. Remove the failed etcd member (ostest-snksz-master-2):

$ oc rsh -n openshift-etcd etcd-ostest-snksz-master-0
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-ostest-snksz-master-0 -n openshift-etcd' to see all of the containers in this pod.

sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| 4f06537b0009ab3f | started | ostest-snksz-master-1 | https://10.196.3.3:2380   | https://10.196.3.3:2379   |      false |
| 98618b9a875b38e8 | started | ostest-snksz-master-2 | https://10.196.1.99:2380  | https://10.196.1.99:2379  |      false |
| f6fcd785775989d7 | started | ostest-snksz-master-0 | https://10.196.2.121:2380 | https://10.196.2.121:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
sh-4.4# etcdctl member remove 98618b9a875b38e8
sh-4.4# exit

4. Remove the secrets for the failed master:

$ oc -n openshift-etcd get secrets | grep ostest-snksz-master-2
etcd-peer-ostest-snksz-master-2              kubernetes.io/tls   2   37m
etcd-serving-metrics-ostest-snksz-master-2   kubernetes.io/tls   2   37m
etcd-serving-ostest-snksz-master-2           kubernetes.io/tls   2   37m

$ oc -n openshift-etcd delete secret etcd-peer-ostest-snksz-master-2 etcd-serving-metrics-ostest-snksz-master-2 etcd-serving-ostest-snksz-master-2

5.
Create the new master machine:

$ oc apply -f new-master-machine.yaml
machine.machine.openshift.io/ostest-snksz-master-3 created

$ oc get machines -A
NAMESPACE               NAME                          PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-snksz-master-0         Running   m4.xlarge   regionOne   nova   85m
openshift-machine-api   ostest-snksz-master-1         Running   m4.xlarge   regionOne   nova   85m
openshift-machine-api   ostest-snksz-master-3         Running   m4.xlarge   regionOne   nova   5m56s
openshift-machine-api   ostest-snksz-worker-0-9c4rw   Running   m4.xlarge   regionOne   nova   75m
openshift-machine-api   ostest-snksz-worker-0-pjj2x   Running   m4.xlarge   regionOne   nova   75m
openshift-machine-api   ostest-snksz-worker-0-xvlvb   Running   m4.xlarge   regionOne   nova   75m

$ openstack port list | grep master
| 466ca7e9-d282-42c6-a855-37b9bb1e3212 | ostest-snksz-master-port-0 | fa:16:3e:23:09:35 | ip_address='10.196.2.121', subnet_id='fe034996-da63-4b51-a1bc-f1b8452a9069' | ACTIVE |
| bba49258-571c-49a7-b268-5277f55472f7 | ostest-snksz-master-3      | fa:16:3e:6e:25:68 | ip_address='10.196.1.23', subnet_id='fe034996-da63-4b51-a1bc-f1b8452a9069'  | ACTIVE |
| fec14ade-774f-407d-b2d6-94fe7069ca77 | ostest-snksz-master-port-1 | fa:16:3e:08:76:58 | ip_address='10.196.3.3', subnet_id='fe034996-da63-4b51-a1bc-f1b8452a9069'   | ACTIVE |

(shiftstack) [stack@undercloud-0 ~]$ oc get nodes
NAME                          STATUS   ROLES    AGE    VERSION
ostest-snksz-master-0         Ready    master   130m   v1.19.0+263ee0d
ostest-snksz-master-1         Ready    master   129m   v1.19.0+263ee0d
ostest-snksz-master-3         Ready    master   48m    v1.19.0+263ee0d
ostest-snksz-worker-0-9c4rw   Ready    worker   116m   v1.19.0+263ee0d
ostest-snksz-worker-0-pjj2x   Ready    worker   112m   v1.19.0+263ee0d
ostest-snksz-worker-0-xvlvb   Ready    worker   116m   v1.19.0+263ee0d

$ oc rsh -n openshift-etcd etcd-ostest-snksz-master-0
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| 19e9bc16baef4507 | started | ostest-snksz-master-3 | https://10.196.1.23:2380  | https://10.196.1.23:2379  |      false |
| 4f06537b0009ab3f | started | ostest-snksz-master-1 | https://10.196.3.3:2380   | https://10.196.3.3:2379   |      false |
| f6fcd785775989d7 | started | ostest-snksz-master-0 | https://10.196.2.121:2380 | https://10.196.2.121:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
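The cleanup half of the procedure (delete the failed machine, then delete its three etcd TLS secrets) can be sketched as a small dry-run helper. This is a sketch based on the naming convention seen in this log (etcd-peer-, etcd-serving-metrics-, and etcd-serving- secret prefixes); failed_master_cleanup is a hypothetical name, and the helper only prints the commands so they can be reviewed before running against a real cluster:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry-run helper: given the failed control-plane machine name,
# print the cleanup commands from the procedure above. Nothing is executed.
# The secret name prefixes are assumptions taken from this verification log;
# confirm them with 'oc -n openshift-etcd get secrets' on your own cluster.
failed_master_cleanup() {
  local failed="$1"
  # Step 2: delete the Machine object for the failed master.
  echo "oc -n openshift-machine-api delete machine ${failed}"
  # Step 3 is interactive (etcdctl member remove <ID> inside an etcd pod),
  # so it is only noted here rather than printed as a runnable command.
  echo "# then: oc rsh -n openshift-etcd <healthy-etcd-pod>; etcdctl member remove <MEMBER_ID>"
  # Step 4: delete the three TLS secrets tied to the failed member.
  echo "oc -n openshift-etcd delete secret" \
       "etcd-peer-${failed}" \
       "etcd-serving-metrics-${failed}" \
       "etcd-serving-${failed}"
}

failed_master_cleanup ostest-snksz-master-2
```

After the printed commands are applied, the remaining step is 'oc apply -f new-master-machine.yaml' as shown in step 5 above.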
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.23 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0952