Description of problem:
In an OCP 4.0 install, when one of the master instances is deleted, a new instance is automatically created. The etcd static pod created on the new master never joins the existing cluster, and the etcd member belonging to the deleted master is never removed from the cluster.

Version-Release number of selected component (if applicable):
4.0 v0.12 installer

How reproducible:
100%

Steps to Reproduce:
1. Install a cluster.
2. Terminate a master EC2 instance:
   aws ec2 terminate-instances --region us-west-2 --instance-ids i-007f05ba505734630
3. Check the nodes and see that a new master instance was started and registered.

Actual results:
The etcd pod started on the new master fails during startup and is never added to the existing cluster.

Expected results:
The new etcd member is added to the existing cluster, and the old etcd member is removed from the cluster.

Additional info:

# oc get pod --all-namespaces | grep etcd
kube-system   etcd-member-ip-10-0-15-225.us-west-2.compute.internal   1/1   Running   0   1d
kube-system   etcd-member-ip-10-0-23-205.us-west-2.compute.internal   1/1   Running   0   1d
kube-system   etcd-member-ip-10-0-38-249.us-west-2.compute.internal   1/1   Running   0   1d

# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-142-75.us-west-2.compute.internal    Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-15-225.us-west-2.compute.internal    Ready    master   1d    v1.11.0+406fc897d8
ip-10-0-155-141.us-west-2.compute.internal   Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-166-138.us-west-2.compute.internal   Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-23-205.us-west-2.compute.internal    Ready    master   1d    v1.11.0+406fc897d8
ip-10-0-38-249.us-west-2.compute.internal    Ready    master   1d    v1.11.0+406fc897d8

Delete instance "ip-10-0-23-205.us-west-2.compute.internal":

# oc get pod --all-namespaces | grep etcd
kube-system   etcd-member-ip-10-0-15-225.us-west-2.compute.internal   1/1   Running    0   1d
kube-system   etcd-member-ip-10-0-23-155.us-west-2.compute.internal   0/1   Init:0/2   5   38m
kube-system   etcd-member-ip-10-0-38-249.us-west-2.compute.internal   1/1   Running    0   1d

# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-142-75.us-west-2.compute.internal    Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-15-225.us-west-2.compute.internal    Ready    master   1d    v1.11.0+406fc897d8
ip-10-0-155-141.us-west-2.compute.internal   Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-166-138.us-west-2.compute.internal   Ready    worker   1d    v1.11.0+406fc897d8
ip-10-0-23-155.us-west-2.compute.internal    Ready    master   39m   v1.11.0+406fc897d8
ip-10-0-38-249.us-west-2.compute.internal    Ready    master   1d    v1.11.0+406fc897d8

# oc rsh etcd-member-ip-10-0-15-225.us-west-2.compute.internal
# ETCDCTL_API=3 etcdctl --cert=/peer.crt --key=/peer.key --cacert=/etc/ssl/etcd/ca.crt --endpoints https://localhost:2379 member list
6852a310452cfe52, started, etcd-member-ip-10-0-15-225.us-west-2.compute.internal, https://rtest-etcd-0.test.redhat.com:2380, https://10.0.15.225:2379
68f9eddcc9186b35, started, etcd-member-ip-10-0-23-205.us-west-2.compute.internal, https://rtest-etcd-1.test.redhat.com:2380, https://10.0.23.205:2379
d764bbf70b8f188f, started, etcd-member-ip-10-0-38-249.us-west-2.compute.internal, https://rtest-etcd-2.test.redhat.com:2380, https://10.0.38.249:2379

# ETCDCTL_API=3 etcdctl --cert=/peer.crt --key=/peer.key --cacert=/etc/ssl/etcd/ca.crt --endpoints https://10.0.38.249:2379,https://10.0.23.205:2379,https://10.0.15.225:2379,https://10.0.23.155:2379 endpoint status --write-out=table
Failed to get the status of endpoint https://10.0.23.205:2379 (context deadline exceeded)
Failed to get the status of endpoint https://10.0.23.155:2379 (context deadline exceeded)
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.0.38.249:2379 | d764bbf70b8f188f | 3.3.10  | 88 MB   | false     |        66 |    1924899 |
| https://10.0.15.225:2379 | 6852a310452cfe52 | 3.3.10  | 88 MB   | true      |        66 |    1924979 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
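For reference, the stale member can be identified mechanically by comparing the names in the `member list` output against the current node names: the member whose node no longer exists is the one that should be removed. This is a minimal sketch using the data captured above; in a live cluster the two variables would be populated from the `etcdctl member list` and `oc get nodes` commands shown in this report, and the `member remove` call at the end is the standard etcdctl v3 command but is shown here only as a comment, untested against this cluster.

```shell
# Sample data copied from the report: member ID + member name, and the
# node names present after the instance was replaced.
members='6852a310452cfe52 etcd-member-ip-10-0-15-225.us-west-2.compute.internal
68f9eddcc9186b35 etcd-member-ip-10-0-23-205.us-west-2.compute.internal
d764bbf70b8f188f etcd-member-ip-10-0-38-249.us-west-2.compute.internal'

nodes='ip-10-0-15-225.us-west-2.compute.internal
ip-10-0-23-155.us-west-2.compute.internal
ip-10-0-38-249.us-west-2.compute.internal'

# Print the ID of every member whose backing node no longer exists.
stale_id=$(echo "$members" | while read -r id name; do
  node=${name#etcd-member-}
  echo "$nodes" | grep -Fqx "$node" || echo "$id"
done)
echo "stale member: $stale_id"
# prints: stale member: 68f9eddcc9186b35

# The stale entry would then be removed from inside a healthy etcd-member
# pod (oc rsh etcd-member-...), e.g.:
#   ETCDCTL_API=3 etcdctl --cert=/peer.crt --key=/peer.key \
#     --cacert=/etc/ssl/etcd/ca.crt --endpoints https://localhost:2379 \
#     member remove "$stale_id"
```

This matches what the expected results describe: the member of the terminated instance (68f9eddcc9186b35, ip-10-0-23-205) is removed, after which the replacement master's static pod should be able to join as a new member.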
Sounds similar to bug 1667557?
*** This bug has been marked as a duplicate of bug 1667557 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days