Bug 1871814 - OCP installation times out but after few minutes the cluster is up
Summary: OCP installation times out but after few minutes the cluster is up
Status: CLOSED DUPLICATE of bug 1875005
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Michał Dulko
QA Contact: GenadiC
Depends On:
Reported: 2020-08-24 11:00 UTC by rlobillo
Modified: 2020-09-23 09:51 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-09-23 09:51:18 UTC
Target Upstream Version:


Description rlobillo 2020-08-24 11:00:02 UTC
Description of problem:

Approximately one in two times, the OCP installer times out, but the installation completes successfully a few minutes later.

The kuryr-controller logs contain a large number of errors like the following:

ERROR kuryr_kubernetes.handlers.logging requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

So the internal k8s API is resetting connections for some time, which delays the installation.
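
To quantify how often this happens in a given run, the error quoted above can be counted in a saved copy of the kuryr-controller log. This is a minimal sketch, not part of any kuryr tooling: the function name count_connection_resets is hypothetical, and it assumes the log has been saved as plain text with one entry per line.

```python
import re

# Matches the exception text from the kuryr-controller error quoted above.
RESET_PATTERN = re.compile(
    r"ConnectionResetError\(104, 'Connection reset by peer'\)"
)


def count_connection_resets(lines):
    """Return how many log lines report a connection reset by peer.

    `lines` is any iterable of strings, for example an open log file.
    """
    return sum(1 for line in lines if RESET_PATTERN.search(line))
```

For example, passing an open file handle for the saved log to count_connection_resets gives the total for that run, which makes it easier to compare a run that timed out with one that did not.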

Version-Release number of selected component (if applicable):
openshift_puddle: 4.5.0-0.nightly-2020-08-21-084032

How reproducible:

Steps to Reproduce:
1. Install OSP16.1 + OVN + Ceph + TLS-everywhere
2. Install OCP4.5.

Actual results: Unstable results on installation.

Expected results: Stable successful installation.

Additional info:
Logs for two different executions: 
- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/53/artifact/.sh/ir-openshift-install.log

- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/54/artifact/.sh/ir-openshift-install.log

Comment 1 Ben Bennett 2020-09-03 14:10:37 UTC
Set the target to 4.7 because I don't think this will block 4.6.  However, please feel free to work on it, and if you have a PR that is ready to merge, please update the target to 4.6.

Comment 3 Itzik Brown 2020-09-07 13:55:30 UTC
This also happened using OpenShiftSDN with OCP 4.5.3 and 4.5.8.

It is also worth mentioning that we sometimes see that kube-controller-manager cannot reach api-int:

E0907 04:49:57.854644       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.ostest.shiftstack.com:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s": dial tcp connect: connection refused
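
The symptom in this bug (api-int refusing or resetting connections for a while, then recovering) is the kind of transient failure that a polling client can ride out if its deadline is long enough. The sketch below only illustrates that idea and is not the installer's actual wait logic: wait_until_ready and the probe callable are hypothetical names, and the timeout and interval values are up to the caller.

```python
import time


def wait_until_ready(probe, timeout_s, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s elapses.

    Built-in connection errors (refused, reset) are treated as
    "not ready yet". Returns True if the probe succeeded before the
    deadline, otherwise False.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except ConnectionError:
            # Transient failure: the endpoint is not reachable yet.
            # Note: the requests library raises its own ConnectionError
            # type, which would need to be caught separately.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The clock and sleep parameters are injectable only so that the loop can be exercised without real waiting. With a deadline shorter than the recovery time, this loop reports failure even though the endpoint comes up shortly afterwards, which matches the behaviour described in this report.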
