Bug 1870183 - [vSphere]: Connection to server refused during installation of OCP 4.6
Summary: [vSphere]: Connection to server refused during installation of OCP 4.6
Keywords:
Status: CLOSED DUPLICATE of bug 1836017
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-19 13:38 UTC by Vijay Avuthu
Modified: 2020-09-10 22:14 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-10 22:14:20 UTC
Target Upstream Version:
Embargoed:


Description Vijay Avuthu 2020-08-19 13:38:17 UTC
Description of problem:

OCP 4.6 installation failed with the error below:

The connection to the server api.vavuthu-pr2714.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?

Version-Release number of the following components:

openshift client (4.6.0-0.nightly-2020-08-18-165040)
openshift installer (4.6.0-0.nightly-2020-08-18-165040)

RHCOS template: rhcos-46.82.202008111140-0-vmware.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6 using branch release-4.6 ( https://github.com/openshift/installer.git )
2. After bootstrapping completes and the bootstrap node is removed, `oc get csr` fails with a connection refused error


Actual results:

$ oc get csr
The connection to the server api.vavuthu-pr2714.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?
$

Expected results:
The above command should return the CSR data.

Additional info:

Jenkins Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/11229/console

Comment 2 Vijay Avuthu 2020-08-19 14:04:52 UTC
[core@control-plane-0 ~]$ sudo crictl ps -a | grep kube-apiserver
5ef48d79eea3b       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Running             kube-apiserver-operator                       2                   fbcab7e08a07a
5cb454dd7dbf8       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Running             kube-apiserver-check-endpoints                0                   ceb051274a960
6c3189fde99f9       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Running             kube-apiserver-insecure-readyz                0                   ceb051274a960
7b587eaa35336       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Running             kube-apiserver-cert-regeneration-controller   0                   ceb051274a960
9eae13840c3f3       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Running             kube-apiserver-cert-syncer                    0                   ceb051274a960
e857e4f32a133       805e2144af41b2f76f4c5fd8f8eac33a7cb16357cfddca7d3c6f6c23bd3bf9eb                                                         7 hours ago         Running             kube-apiserver                                0                   ceb051274a960
2b7b912fe78a4       e4d2c0a1679ffb86b584f3563ceb45d8ce5b4fe01af5faef3ac1bf0f4ce474c1                                                         7 hours ago         Exited              kube-apiserver-operator                       1                   fbcab7e08a07a
[core@control-plane-0 ~]$ 


> errors in kube-apiserver logs

W0819 06:51:39.178529      18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://10.1.160.27:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused". Reconnecting...
I0819 06:51:39.178570      18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc0010e7b20, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused"}
I0819 06:51:39.178720      18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc000660980, {CONNECTING <nil>}
W0819 06:51:39.178824      18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://localhost:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [::1]:2379: connect: connection refused". Reconnecting...
I0819 06:51:39.178882      18 balancer_conn_wrappers.go:78] pickfirstBalancer: HandleSubConnStateChange: 0xc000660980, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp [::1]:2379: connect: connection refused"}
W0819 06:51:39.188268      18 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://10.1.160.27:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.1.160.27:2379: connect: connection refused". Reconnecting...

> kube-apiserver and kube-apiserver-cert-syncer logs are uploaded to http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/vavuthu/bug1870183/
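
> To see at a glance which etcd endpoints are refusing connections in a log like the one above, something along these lines may help. It is only a sketch: the function name `refused_endpoints` is made up for illustration, and it assumes the log uses the klog line prefix seen in the excerpt (a severity letter followed by the date, e.g. "W0819 "). It tolerates hard-wrapped lines by gluing continuations back onto the preceding entry.

```python
import re
from collections import Counter

# Matches the "dial tcp <addr>: connect: connection refused" fragment
# that appears in the kube-apiserver excerpt.
REFUSED = re.compile(r"dial tcp (\S+): connect: connection refused")

# Assumed klog prefix: severity letter + MMDD + space, e.g. "W0819 ".
KLOG_PREFIX = re.compile(r"[IWEF]\d{4} ")


def refused_endpoints(log_text: str) -> Counter:
    """Count connection-refused dial errors per endpoint address."""
    entries = []
    for line in log_text.splitlines():
        if KLOG_PREFIX.match(line) or not entries:
            entries.append(line)   # start of a new log entry
        else:
            entries[-1] += line    # wrapped continuation of the previous entry
    return Counter(
        match.group(1)
        for entry in entries
        for match in REFUSED.finditer(entry)
    )


if __name__ == "__main__":
    import sys

    # Usage (file name is hypothetical): python refused.py < kube-apiserver.log
    for endpoint, count in refused_endpoints(sys.stdin.read()).most_common():
        print(f"{count:6d}  {endpoint}")
```

> Both the W (clientconn) and I (TRANSIENT_FAILURE) entries contain the dial error, so each failed attempt may be counted more than once; the relative counts per endpoint are what matter here.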

Comment 3 Abhinav Dahiya 2020-08-21 17:22:22 UTC
> After bootstrapping completes and the bootstrap node is removed, `oc get csr` fails with a connection refused error

After bootstrapping is finished, the installer is no longer involved in keeping the API running, so moving to the API server team to triage why the API server is not running.

Comment 5 Stefan Schimanski 2020-08-25 16:52:15 UTC
The connection refused part of the issue is addressed in https://github.com/openshift/installer/pull/4012. The root cause is most probably etcd, triggering the haproxy issue fixed in that PR.

Comment 7 Sam Batschelet 2020-08-31 14:29:33 UTC
> 4.6.0-0.nightly-2020-08-18-165040

We had some performance issues with the 4.6 CI nightlies around this time, which were resolved in more recent builds. Can you please try a more recent nightly and let us know if the problem still exists?

Also, we will need access to the cluster or a log bundle to debug.

$ openshift-install gather bootstrap --bootstrap $BOOTSTRAP_IP --master $MASTER0_IP --master $MASTER1_IP --master $MASTER2_IP

Comment 11 Abhinav Dahiya 2020-09-10 22:14:20 UTC
Based on Comment 5, it looks like this will be fixed by moving the installer to /readyz for vSphere UPI.
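
For anyone wanting to check readiness from a client rather than a bare TCP connect, a rough sketch follows. It is not the haproxy change from the PR; it only polls the kube-apiserver /readyz endpoint over HTTP(S). The function name `api_ready` is hypothetical, and the sketch assumes /readyz is reachable without credentials and that the caller supplies an `ssl.SSLContext` that trusts the cluster CA.

```python
import http.client
import ssl
import urllib.error
import urllib.request
from typing import Optional


def api_ready(base_url: str, timeout: float = 5.0,
              context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True only if GET <base_url>/readyz answers HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/readyz",
                                    timeout=timeout,
                                    context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, TLS failures and non-2xx replies
        # are all treated as "not ready".
        return False


if __name__ == "__main__":
    import sys

    # Usage: python readyz.py https://<api-host>:6443
    print("ready" if api_ready(sys.argv[1]) else "not ready")
```

Unlike a port check, this reports "not ready" when the API server accepts connections but is not yet able to serve requests.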

*** This bug has been marked as a duplicate of bug 1836017 ***

