Verified on OCP4.7.0-0.nightly-2021-03-14-223051 over OSP16.1 (RHOS-16.1-RHEL-8-20201214.n.3) with OVN-Octavia.

The new master is created successfully with a different port name:

$ openstack port list -c Name -f value| grep master
ostest-858gf-master-port-0
ostest-858gf-master-port-1
ostest-858gf-master-3

Procedure: Replacing an unhealthy etcd member whose machine is not running or whose node is not ready:

1. Create new master manifest:

$ cat new-master-machine.yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: ostest-858gf-master-3
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: openstackproviderconfig.openshift.io/v1alpha1
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m4.xlarge
      image: ostest-858gf-rhcos
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: ostest-858gf-nodes
            tags: openshiftClusterID=ostest-858gf
      securityGroups:
      - filter: {}
        name: ostest-858gf-master
      serverGroupName: ostest-858gf-master
      serverMetadata:
        Name: ostest-858gf-master
        openshiftClusterID: ostest-858gf
      tags:
      - openshiftClusterID=ostest-858gf
      trunk: true
      userDataSecret:
        name: master-user-data

2. Remove failed etcd member (ostest-858gf-master-2):

$ oc rsh -n openshift-etcd etcd-ostest-858gf-master-0
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-ostest-858gf-master-0 -n openshift-etcd' to see all of the containers in this pod.
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| ce181a6303f59023 | started | ostest-858gf-master-0 | https://10.196.3.229:2380 | https://10.196.3.229:2379 |      false |
| daab8b22de58ce9d | started | ostest-858gf-master-2 | https://10.196.2.78:2380  | https://10.196.2.78:2379  |      false |
| e945b77b066c2312 | started | ostest-858gf-master-1 | https://10.196.0.178:2380 | https://10.196.0.178:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
sh-4.4# etcdctl member remove daab8b22de58ce9d
sh-4.4# exit

3. Remove secrets from failed master:

$ oc get secrets -n openshift-etcd | grep ostest-858gf-master-2
etcd-peer-ostest-858gf-master-2              kubernetes.io/tls   2   6h7m
etcd-serving-metrics-ostest-858gf-master-2   kubernetes.io/tls   2   6h7m
etcd-serving-ostest-858gf-master-2           kubernetes.io/tls   2   6h7m
$ oc delete secret -n openshift-etcd etcd-peer-ostest-858gf-master-2 etcd-serving-metrics-ostest-858gf-master-2 etcd-serving-ostest-858gf-master-2
secret "etcd-peer-ostest-858gf-master-2" deleted
secret "etcd-serving-metrics-ostest-858gf-master-2" deleted
secret "etcd-serving-ostest-858gf-master-2" deleted

4. Destroy the failed master and create a new one:

$ oc apply -f new-master-machine.yaml && oc delete machine -n openshift-machine-api ostest-858gf-master-2
$ oc get machines -A
NAMESPACE               NAME                          PHASE         TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-858gf-master-0         Running       m4.xlarge   regionOne   nova   6h19m
openshift-machine-api   ostest-858gf-master-1         Running       m4.xlarge   regionOne   nova   6h19m
openshift-machine-api   ostest-858gf-master-2         Deleting      m4.xlarge   regionOne   nova   6h19m
openshift-machine-api   ostest-858gf-master-3         Provisioned   m4.xlarge   regionOne   nova   112s
openshift-machine-api   ostest-858gf-worker-0-9pgwp   Running       m4.xlarge   regionOne   nova   6h7m
openshift-machine-api   ostest-858gf-worker-0-qtc8n   Running       m4.xlarge   regionOne   nova   6h7m
openshift-machine-api   ostest-858gf-worker-0-w6psd   Running       m4.xlarge   regionOne   nova   6h7m
$ openstack port list | grep master
| 0a1bd5ad-0fb4-405a-af2f-3a2e83acb789 | ostest-858gf-master-port-0 | fa:16:3e:a6:af:77 | ip_address='10.196.3.229', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
| 1c3eee82-e07d-48e7-83e0-4c5be72218d5 | ostest-858gf-master-port-1 | fa:16:3e:fa:b1:b5 | ip_address='10.196.0.178', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
| 86e14de6-e6e7-436d-a33d-161c7f18e8b5 | ostest-858gf-master-3 | fa:16:3e:38:3d:6a | ip_address='10.196.0.204', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
| e613411e-785e-48ed-a4e5-f898b5c6fab3 | ostest-858gf-master-port-2 | fa:16:3e:3e:1b:e9 | ip_address='10.196.2.78', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | DOWN |

5. Wait until the new master is ready:

$ openstack port list | grep master
| 0a1bd5ad-0fb4-405a-af2f-3a2e83acb789 | ostest-858gf-master-port-0 | fa:16:3e:a6:af:77 | ip_address='10.196.3.229', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
| 1c3eee82-e07d-48e7-83e0-4c5be72218d5 | ostest-858gf-master-port-1 | fa:16:3e:fa:b1:b5 | ip_address='10.196.0.178', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
| 86e14de6-e6e7-436d-a33d-161c7f18e8b5 | ostest-858gf-master-3 | fa:16:3e:38:3d:6a | ip_address='10.196.0.204', subnet_id='7bbfcc1c-247f-4d72-927a-e188c082848c' | ACTIVE |
(shiftstack) [stack@undercloud-0 ~]$ oc get nodes
NAME                          STATUS   ROLES    AGE     VERSION
ostest-858gf-master-0         Ready    master   6h31m   v1.20.0+bafe72f
ostest-858gf-master-1         Ready    master   6h30m   v1.20.0+bafe72f
ostest-858gf-master-3         Ready    master   4m40s   v1.20.0+bafe72f
ostest-858gf-worker-0-9pgwp   Ready    worker   6h11m   v1.20.0+bafe72f
ostest-858gf-worker-0-qtc8n   Ready    worker   6h10m   v1.20.0+bafe72f
ostest-858gf-worker-0-w6psd   Ready    worker   6h10m   v1.20.0+bafe72f
(shiftstack) [stack@undercloud-0 ~]$ oc get machines -A
NAMESPACE               NAME                          PHASE     TYPE        REGION      ZONE   AGE
openshift-machine-api   ostest-858gf-master-0         Running   m4.xlarge   regionOne   nova   6h34m
openshift-machine-api   ostest-858gf-master-1         Running   m4.xlarge   regionOne   nova   6h34m
openshift-machine-api   ostest-858gf-master-3         Running   m4.xlarge   regionOne   nova   17m
openshift-machine-api   ostest-858gf-worker-0-9pgwp   Running   m4.xlarge   regionOne   nova   6h22m
openshift-machine-api   ostest-858gf-worker-0-qtc8n   Running   m4.xlarge   regionOne   nova   6h22m
openshift-machine-api   ostest-858gf-worker-0-w6psd   Running   m4.xlarge   regionOne   nova   6h22m
$ oc rsh -n openshift-etcd etcd-ostest-858gf-master-0
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |         NAME          |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
| 325baa0d738d4617 | started | ostest-858gf-master-0 | https://10.196.3.229:2380 | https://10.196.3.229:2379 |      false |
| 4482884f4b163114 | started | ostest-858gf-master-3 | https://10.196.0.204:2380 | https://10.196.0.204:2379 |      false |
| e945b77b066c2312 | started | ostest-858gf-master-1 | https://10.196.0.178:2380 | https://10.196.0.178:2379 |      false |
+------------------+---------+-----------------------+---------------------------+---------------------------+------------+
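
In step 2 the member ID was copied by hand from the table. If that lookup needs to be scripted, something like the following might work. It is only a sketch: etcd_member_id is a hypothetical helper name, not part of etcdctl or oc, and it assumes the table layout shown above (ID in the first column, NAME in the third).

```shell
#!/bin/sh
# Hypothetical helper (not an etcdctl/oc feature): read the output of
# `etcdctl member list -w table` on stdin and print the ID of the member
# whose NAME column equals $1. Exits non-zero if no such member is listed.
etcd_member_id() {
    awk -F'|' -v name="$1" '
        NF >= 4 {                       # data and header rows; border rows have no "|"
            id = $2; member = $4        # $1 is the empty text before the leading "|"
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", member)
            if (member == name) { print id; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Assuming the table was saved to a file called members.txt (an illustrative name), usage would be `etcd_member_id ostest-858gf-master-2 < members.txt`, and the printed ID could then be passed to `etcdctl member remove` inside the etcd pod.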
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.3 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0821