1871814 – OCP installation times out but after few minutes the cluster is up

Summary: OCP installation times out but after few minutes the cluster is up

Keywords:
Status:	CLOSED DUPLICATE of bug 1875005
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Michał Dulko
QA Contact:	GenadiC
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-24 11:00 UTC by rlobillo
Modified:	2020-09-23 09:51 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-09-23 09:51:18 UTC
Target Upstream Version:
Embargoed:

Description rlobillo 2020-08-24 11:00:02 UTC

Description of problem:

Approximately 1 out of 2 times, the OCP installer times out, but the installation completes successfully a few minutes later.

The kuryr-controller logs contain a large number of the following errors:

ERROR kuryr_kubernetes.handlers.logging requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

So the k8s internal API is aborting connections for some time, which delays the installation.
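
To quantify how often this happens in a given run, the resets can be counted in a saved copy of the kuryr-controller log. The snippet below is only a minimal sketch: the function name is hypothetical, and it matches nothing more than the error text quoted above, so it makes no assumption about the rest of the log line format.

```python
# Marker copied from the kuryr-controller error quoted above.
RESET_MARKER = "ConnectionResetError(104, 'Connection reset by peer')"


def count_connection_resets(lines):
    """Return how many log lines contain the connection-reset marker.

    `lines` is any iterable of strings, e.g. an open log file.
    (Hypothetical helper, not part of kuryr-kubernetes.)
    """
    return sum(1 for line in lines if RESET_MARKER in line)
```

It can be fed an open file object for a saved log, which makes it easy to compare the count between a run that timed out and a run that did not.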

Version-Release number of selected component (if applicable):
openshift_puddle: 4.5.0-0.nightly-2020-08-21-084032

How reproducible:


Steps to Reproduce:
1. Install OSP16.1 + OVN + Ceph + TLS-everywhere
2. Install OCP4.5.

Actual results: Unstable installation results; the installer intermittently times out even though the cluster comes up a few minutes later.


Expected results: Stable successful installation.


Additional info:
Logs for two different executions: 
- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/53/artifact/.sh/ir-openshift-install.log

- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/54/artifact/.sh/ir-openshift-install.log

Comment 1 Ben Bennett 2020-09-03 14:10:37 UTC

Set the target to 4.7 because I don't think this will block 4.6.  However, please feel free to work on it, and if you have a PR that is ready to merge, please update the target to 4.6.

Comment 3 Itzik Brown 2020-09-07 13:55:30 UTC

This also happened using OpenShiftSDN with OCP 4.5.3 and 4.5.8.

It is also worth mentioning that we sometimes see that kube-controller-manager cannot reach api-int:

E0907 04:49:57.854644       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.ostest.shiftstack.com:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s": dial tcp 10.196.0.5:6443: connect: connection refused
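
To measure how long api-int stays unreachable, a plain TCP probe against the endpoint may be enough. The following is a rough sketch, not a definitive tool: `wait_for_port` and its parameters are hypothetical, the default timeout and interval are arbitrary values rather than the installer's, and it only checks that the port accepts connections, not that the API answers requests.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=1800.0, interval_s=5.0,
                  connect=socket.create_connection,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns the number of failed attempts before the first success,
    or None if the deadline passes without a successful connection.
    `connect`, `clock` and `sleep` are injectable so the loop can be
    exercised without a real network.
    """
    deadline = clock() + timeout_s
    failures = 0
    while True:
        try:
            conn = connect((host, port), timeout=interval_s)
            conn.close()
            return failures
        except OSError:
            # Covers "connection refused" and "connection reset by peer".
            failures += 1
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

For the environment in the log line above, the call would be `wait_for_port("api-int.ostest.shiftstack.com", 6443)`. Multiplying the returned failure count by the interval gives a rough lower bound on how long the endpoint was refusing connections.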
