Created attachment 1687791 [details]
etcd-master1 logs

Description of problem:
After provisioning a new OCP 4.4.3 cluster on VMware using UPI, etcd-operator reports all of the etcd members as unhealthy.

Version-Release number of selected component (if applicable):
4.4.3

How reproducible:
Every time

Steps to Reproduce:
1. Provision a new OCP 4.4.3 cluster on VMware using UPI

Actual results:
The following error is continuously reported:

Generated from openshift-cluster-etcd-operator-etcd-member-ip-migrator
unhealthy members: master2,master1,master3

Additional error(s):

Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready
EtcdMembersDegraded: master2 members are unhealthy, members are unknown" to "NodeControllerDegraded: All master nodes are ready
EtcdMemberIPMigratorDegraded: rpc error: code = Canceled desc = grpc: the client connection is closing
EtcdMembersDegraded: master2 members are unhealthy, members are unknown"

Status for clusteroperator/etcd changed: Available message changed from "StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 2
EtcdMembersAvailable: master2,master1,master3 members are available, have not started, are unhealthy, are unknown" to "StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 2
EtcdMembersAvailable: master1,master3 members are available, have not started, master2 are unhealthy, are unknown"

Expected results:
etcd members should not be reported as unhealthy.

Additional info:

$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
master2,master1,master3 members are available, have not started, are unhealthy, are unknown

sh-4.2# etcdctl endpoint health --cluster
https://192.168.50.61:2379 is healthy: successfully committed proposal: took = 18.957755ms
https://192.168.50.63:2379 is healthy: successfully committed proposal: took = 21.740721ms
https://192.168.50.62:2379 is healthy: successfully committed proposal: took = 26.35663ms

sh-4.2# etcdctl member list -w table
+------------------+---------+---------+----------------------------+----------------------------+
|        ID        | STATUS  |  NAME   |         PEER ADDRS         |        CLIENT ADDRS        |
+------------------+---------+---------+----------------------------+----------------------------+
| 3c3af7d7f9c3b05d | started | master2 | https://192.168.50.62:2380 | https://192.168.50.62:2379 |
| 79a810c120bd61aa | started | master1 | https://192.168.50.61:2380 | https://192.168.50.61:2379 |
| be34329b46ef3c2f | started | master3 | https://192.168.50.63:2380 | https://192.168.50.63:2379 |
+------------------+---------+---------+----------------------------+----------------------------+

$ oc get pods -n openshift-etcd -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP              NODE      NOMINATED NODE   READINESS GATES
etcd-master1   3/3     Running   0          64m   192.168.50.61   master1   <none>           <none>
etcd-master2   3/3     Running   0          60m   192.168.50.62   master2   <none>           <none>
etcd-master3   3/3     Running   0          60m   192.168.50.63   master3   <none>           <none>

etcd-master1 etcd container errors:

2020-05-12 19:00:52.084812 I | embed: rejected connection from "192.168.50.62:57686" (error "EOF", ServerName "")
2020-05-12 19:01:04.824170 I | embed: rejected connection from "192.168.50.62:58252" (error "EOF", ServerName "")
2020-05-12 19:01:16.218079 I | embed: rejected connection from "192.168.50.62:58716" (error "read tcp 192.168.50.61:2379->192.168.50.62:58716: read: connection reset by peer", ServerName "")
2020-05-12 19:01:22.230850 I | embed: rejected connection from "192.168.50.62:58988" (error "EOF", ServerName "")
2020-05-12 19:02:31.635544 I | embed: rejected connection from "192.168.50.62:33666" (error "EOF", ServerName "")

etcd-master2 etcd container errors:

2020-05-12 19:09:39.752452 I | embed: rejected connection from "10.254.0.13:37826" (error "EOF", ServerName "")
2020-05-12 19:09:39.752937 I | embed: rejected connection from "10.254.0.13:37842" (error "read tcp 192.168.50.62:2379->10.254.0.13:37842: read: connection reset by peer", ServerName "")
2020-05-12 19:10:28.006962 I | embed: rejected connection from "10.254.0.13:39802" (error "EOF", ServerName "")
2020-05-12 19:10:31.023392 I | embed: rejected connection from "10.254.0.13:39924" (error "EOF", ServerName "")

etcd-master3 etcd container errors:

2020-05-12 19:05:38.544817 I | embed: rejected connection from "192.168.50.62:36284" (error "EOF", ServerName "")
2020-05-12 19:06:04.892292 I | embed: rejected connection from "192.168.50.62:37378" (error "EOF", ServerName "")
2020-05-12 19:06:53.933542 I | embed: rejected connection from "192.168.50.62:39444" (error "read tcp 192.168.50.63:2379->192.168.50.62:39444: read: connection reset by peer", ServerName "")
2020-05-12 19:06:59.982829 I | embed: rejected connection from "192.168.50.62:39710" (error "EOF", ServerName "")
2020-05-12 19:07:05.270823 I | embed: rejected connection from "192.168.50.62:39922" (error "EOF", ServerName "")
Created attachment 1687793 [details] etcd-master2 logs
Created attachment 1687795 [details] etcd-master3 logs
*** This bug has been marked as a duplicate of bug 1832986 ***