Bug 1871814

Summary: OCP installation times out but after few minutes the cluster is up
Product: OpenShift Container Platform Reporter: rlobillo
Component: Installer Assignee: Michał Dulko <mdulko>
Installer sub component: OpenShift on OpenStack QA Contact: GenadiC <gcheresh>
Status: CLOSED DUPLICATE Docs Contact:
Severity: medium    
Priority: medium CC: bbennett, itbrown, ltomasbo
Version: 4.5 Keywords: AutomationBlocker
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-23 09:51:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description rlobillo 2020-08-24 11:00:02 UTC
Description of problem:

Approximately 1 out of 2 times, the OCP installer times out, but the installation completes successfully a few minutes later.

The kuryr-controller logs contain a large number of the following errors:

ERROR kuryr_kubernetes.handlers.logging requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

So the k8s internal API is aborting connections for some time, which delays the installation.
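To see how long the API keeps resetting connections, the errors can be counted per minute of log time. The following is only a rough sketch: the marker string comes from the error quoted above, while the function name `resets_per_minute`, the log file name, and the assumed `YYYY-MM-DD HH:MM:SS` timestamp prefix are illustrative assumptions and may not match the actual kuryr-controller log format.

```python
import re
from collections import Counter

# Substring taken from the kuryr-controller error quoted in this report.
RESET_MARKER = "ConnectionResetError(104"

# ASSUMPTION: each log line starts with "YYYY-MM-DD HH:MM:SS".
# Lines without such a prefix are grouped under "unknown".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def resets_per_minute(lines):
    """Count connection-reset errors per minute of log time."""
    counts = Counter()
    for line in lines:
        if RESET_MARKER not in line:
            continue
        match = TS_RE.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; point it at a saved kuryr-controller log.
    with open("kuryr-controller.log") as log:
        for minute, count in sorted(resets_per_minute(log).items()):
            print(minute, count)
```

A contiguous run of minutes with high counts would suggest a single window during which the API was unavailable, rather than sporadic resets.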

Version-Release number of selected component (if applicable):
openshift_puddle: 4.5.0-0.nightly-2020-08-21-084032

How reproducible:


Steps to Reproduce:
1. Install OSP16.1 + OVN + Ceph + TLS-everywhere
2. Install OCP4.5.

Actual results: Unstable results on installation.


Expected results: Stable successful installation.


Additional info:
Logs for two different executions: 
- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/53/artifact/.sh/ir-openshift-install.log

- https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/osasinfra/view/shiftstack_on_vms/job/DFG-osasinfra-shiftstack_on_vms-ocp_verification-osp16.1/54/artifact/.sh/ir-openshift-install.log

Comment 1 Ben Bennett 2020-09-03 14:10:37 UTC
Set the target to 4.7 because I don't think this will block 4.6.  However, please feel free to work on it, and if you have a PR that is ready to merge, please update the target to 4.6.

Comment 3 Itzik Brown 2020-09-07 13:55:30 UTC
This also happened using OpenShiftSDN with OCP 4.5.3 and 4.5.8.

It is also worth mentioning that we sometimes see that kube-controller-manager cannot reach api-int:

E0907 04:49:57.854644       1 leaderelection.go:321] error retrieving resource lock kube-system/kube-controller-manager: Get "https://api-int.ostest.shiftstack.com:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s": dial tcp 10.196.0.5:6443: connect: connection refused
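A simple TCP probe can help tell a refused connection (nothing listening on the port) apart from a timeout or routing problem while the installer is running. This is a minimal sketch and not part of any OpenShift tooling: the `probe` helper is hypothetical, and the host and port are the ones that appear in the log line above.

```python
import socket


def probe(host, port, timeout=3.0):
    """Return "open", "refused", or "unreachable" for a TCP endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except OSError:
        # Timeouts, DNS failures, and routing errors all end up here.
        return "unreachable"


if __name__ == "__main__":
    # Host and port taken from the kube-controller-manager error above.
    print(probe("api-int.ostest.shiftstack.com", 6443))
```

Running the probe in a loop from a node on the cluster network during installation would show whether the endpoint flaps between "open" and "refused".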