Description of problem:
OCP 4.6 installation failed with below error

The connection to the server api.vavuthu-pr2714.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?

Version-Release number of the following components:
openshift client (4.6.0-0.nightly-2020-08-18-165040)
openshift installer (4.6.0-0.nightly-2020-08-18-165040)
RHCOS template: rhcos-46.82.202008111140-0-vmware.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6 using branch release-4.6 ( https://github.com/openshift/installer.git )
2. After bootstrapping is completed and node is removed, getting csr is giving connection refused error

Actual results:
$ oc get csr
The connection to the server api.vavuthu-pr2714.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?
$

Expected results:
Above command should give csr data

Additional info:
Jenkins Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/11229/console
[core@control-plane-0 ~]$ sudo crictl ps -a | grep kube-apiserver
5ef48d79eea3b  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Running  kube-apiserver-operator                      2  fbcab7e08a07a
5cb454dd7dbf8  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Running  kube-apiserver-check-endpoints               0  ceb051274a960
6c3189fde99f9  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Running  kube-apiserver-insecure-readyz               0  ceb051274a960
7b587eaa35336  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Running  kube-apiserver-cert-regeneration-controller  0  ceb051274a960
9eae13840c3f3  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Running  kube-apiserver-cert-syncer                   0  ceb051274a960
e857e4f32a133  805e2144af41b2f76f4c5fd8f8eac33a7cb16357cfddca7d3c6f6c23bd3bf9eb  7 hours ago  Running  kube-apiserver                               0  ceb051274a960
2b7b912fe78a4  e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1  7 hours ago  Exited   kube-apiserver-operator                      1  fbcab7e08a07a
[core@control-plane-0 ~]$

> errors in kube-apiserver logs

W0819 06:51:39.178529 18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://10.1.160.27:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused". Reconnecting...
I0819 06:51:39.178570 18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0010e7b20, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused"}
I0819 06:51:39.178720 18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc000660980, {CONNECTING <nil>}
W0819 06:51:39.178824 18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://localhost:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [::1]:2379: connect: connection refused". Reconnecting...
I0819 06:51:39.178882 18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc000660980, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp [::1]:2379: connect: connection refused"}
W0819 06:51:39.188268 18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://10.1.160.27:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused". Reconnecting...

> kube-apiserver and kube-apiserver-cert-syncer logs are uploaded to http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/vavuthu/bug1870183/
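For anyone triaging the uploaded logs, it may help to summarize which endpoints kube-apiserver is being refused on. The following is a minimal sketch, not part of any OpenShift tooling; the function name refused_endpoints and the regular expression are assumptions based only on the log format quoted above.

```python
import re
from collections import Counter

# Matches the dial target in grpc failures such as:
#   dial tcp 10.1.160.27:2379: connect: connection refused
#   dial tcp [::1]:2379: connect: connection refused
# The pattern is an assumption derived from the log lines in this report.
REFUSED = re.compile(
    r"dial tcp (\[[0-9a-fA-F:]+\]|[^\s:]+):(\d+): connect: connection refused"
)


def refused_endpoints(log_text):
    """Count connection-refused dial failures per (host, port).

    Each failure is usually logged twice (a W line and an I line),
    so the counts are log-line counts, not distinct dial attempts.
    """
    counts = Counter()
    for host, port in REFUSED.findall(log_text):
        counts[(host, int(port))] += 1
    return counts
```

Run against the kube-apiserver log, this should make it obvious whether only the etcd client port (2379) is affected, as the excerpt above suggests, or whether other endpoints are refusing connections too.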
> After bootstrapping is completed and node is removed, getting csr is giving connection refused error

After bootstrapping is finished, the installer is not really involved in keeping the API running, so moving to the API server team to triage why the API server is not running.
The connection refused part of the issue is addressed in https://github.com/openshift/installer/pull/4012. The root cause is most probably etcd, triggering the haproxy issue fixed in that PR.
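To tell a refused connection apart from a timeout or a DNS problem when reproducing this, a plain TCP connect against the API port (6443) and the etcd client port (2379) is usually enough. This is a rough sketch using only the Python standard library; the helper name probe_tcp and the returned labels are made up for illustration.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with RST: the host is reachable but nothing
        # is listening (or a proxy has no healthy backend to hand off to).
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


# Example calls, using the addresses from this report:
#   probe_tcp("api.vavuthu-pr2714.qe.rh-ocs.com", 6443)
#   probe_tcp("10.1.160.27", 2379)
```

A "refused" on 6443 together with "refused" on 2379 from a control plane node would be consistent with the etcd theory above, but that is only an inference from the probe results, not a diagnosis.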
> 4.6.0-0.nightly-2020-08-18-165040

We had some performance issues with 4.6 CI nightlies around this time, which were resolved in more recent builds. Can you please try with a more recent nightly and let us know if the problem still exists? Also, we will need access to the cluster or a log bundle to debug.

$ openshift-install gather bootstrap --bootstrap $BOOTSTRAP_IP --master $MASTER0_IP --master $MASTER1_IP --master $MASTER2_IP
Based on Comment 5, it looks like this will be fixed by moving the installer to /readyz for vSphere UPI.

*** This bug has been marked as a duplicate of bug 1836017 ***
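For reference, the /readyz endpoint mentioned above can also be queried by hand to see what a readiness-based health check would observe. The snippet below is only a sketch under a few assumptions: the helper name readyz is hypothetical, the endpoint is assumed to be reachable without credentials, and certificate verification is disabled because the cluster CA is normally not in the local trust store during an install.

```python
import ssl
import urllib.error
import urllib.request


def readyz(base_url, timeout=5.0):
    """Return (ready, detail) for a GET on <base_url>/readyz."""
    # Assumption: skip TLS verification, since this is a debugging probe
    # against a cluster whose CA is not trusted locally.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url = base_url.rstrip("/") + "/readyz"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = resp.read().decode(errors="replace").strip()
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status (not ready).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, TLS error, and so on.
        return False, str(exc)


# Example call, using the API address from this report:
#   readyz("https://api.vavuthu-pr2714.qe.rh-ocs.com:6443")
```

The distinction it draws is that an open TCP port alone does not mean the API server is ready to serve, whereas a 200 from /readyz does indicate readiness.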